Scripting and Regular Expressions
Prof. Brian D. Davison
Computer Science & Engineering, Lehigh University
Announcements
- Welcome back! Hope you enjoyed the extra couple of days off!
- New homework #4 online, due Friday in class
- Today's schedule
- Return short quiz #6 -- range 3-9, mean 7
- Continue Shell Scripting with Regular Expressions
A longer example
- First, let's review the shell scripting we saw before break in
a longer example (remember to open this and any other link in a new tab)
As we read through the file, we see the shebang (#!) line, comments, and
local variable declarations. The script then tests whether a file is
missing; if it is, the script prints an error and exits. We then create
a global (environment) variable, followed by a local numeric variable
set to the result of an expression that runs the system command date
(asking only for the hour) and adds 1 to it. We then make sure the
hour is not in military (24-hour) time. A local array of foods is
declared, along with an integer index. For each person in the list generated
by the system command cat on a file (except for root), the script runs the
command mail with a message (with variable substitutions) to send. The index
variable is updated on each iteration, and is reset once we have run through
the entire set of food choices.
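The walkthrough above can be sketched roughly as follows. This is not the linked script: the function name send_invitations, the variables guest_file, MEETING_PLACE, and foods, and the message text are all invented for illustration, the logic is wrapped in a function that returns instead of exiting, and echo stands in for mail so that nothing is actually sent.

```shell
#!/bin/bash
# Hypothetical sketch of the kind of script walked through above.
# All names (send_invitations, guest_file, MEETING_PLACE, foods) are invented.

send_invitations() {
    local guest_file="$1"           # local variable

    # Test for the non-existence of the file; report an error and stop.
    if [ ! -e "$guest_file" ]; then
        echo "error: $guest_file does not exist" >&2
        return 1                    # a standalone script would use: exit 1
    fi

    export MEETING_PLACE="the lab"  # global (environment) variable

    # Hour from date, plus 1. The 10# prefix forces base 10 so that
    # 08 and 09 are not rejected as invalid octal numbers.
    local hour=$(( 10#$(date +%H) + 1 ))

    # Convert 24-hour time to 12-hour time.
    if (( hour > 12 )); then
        hour=$(( hour - 12 ))
    fi

    local -a foods=(cheese crackers fruit drinks)   # local array
    local -i n=0                                    # integer index

    local person
    for person in $(cat "$guest_file"); do
        [ "$person" = "root" ] && continue          # skip root

        # echo stands in for: mail -s "Party" "$person"
        echo "To $person: meet at $MEETING_PLACE at $hour o'clock; please bring ${foods[n]}."

        n=n+1                                       # arithmetic, since n is an integer
        if (( n == ${#foods[@]} )); then
            n=0                                     # reset after the last food
        fi
    done
}
```

Calling send_invitations with the name of a file that lists one login per line prints one message for each login other than root.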
Globbing
bash$ ls -l [^ab]*
-rw-rw-r-- 1 bozo bozo 0 Aug 6 18:42 c.1
-rw-rw-r-- 1 bozo bozo 466 Aug 6 17:48 t2.sh
-rw-rw-r-- 1 bozo bozo 758 Jul 30 09:02 test1.txt
bash$ echo *
a.1 b.1 c.1 t2.sh test1.txt
bash$ echo t*
t2.sh test1.txt
Long ago, wildcard expansion was provided by an external program called
glob, but has since been incorporated into the shell directly.
The ability to perform globbing is also provided to unix programs,
using the standard glob(3c) C library function.
You can read about the POSIX standard for globbing in the
Linux man page for glob(7).
The ^ within brackets provides negation; POSIX specifies ! for this, and bash accepts either.
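To try the patterns from the listing without touching real files, something like the following could be run in bash. The scratch directory and the variable names are assumptions made for this sketch; the five file names are the ones shown in the listing.

```shell
#!/bin/bash
# Recreate the five file names from the listing in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.1 b.1 c.1 t2.sh test1.txt

neg_caret=$(echo [^ab]*)   # names that do not start with a or b
neg_bang=$(echo [!ab]*)    # the same pattern with the POSIX spelling of negation
one_char=$(echo ?.1)       # ? matches exactly one character

echo "$neg_caret"
echo "$neg_bang"
echo "$one_char"
```

The first two lines of output should be identical in bash, since both spellings of negation select the same files.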
Regular Expressions
Try changing this slightly to find other words in the dictionary file.
grep
- grep is an acronym that stands for global regular expression print.
- Typically we'll use grep to match a string literal (a regular expression without metacharacters)
% ps -ef | grep root
- grep has a number of useful options, including
- -i ignore case
- -n include line numbers in output
- -w match whole words only
- -l list filenames only
- -v show non-matching lines only
- Today, however, we're focusing on regular expressions such as those
that grep uses (described in Quigley, Ch 4, and in the grep man page).
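The options listed above can be compared side by side on a small file. The sample lines and the variable name sample are made up for this sketch.

```shell
#!/bin/bash
# A four-line sample file (contents invented for illustration).
sample=$(mktemp)
printf '%s\n' 'root daemon' 'Root shell' 'rooted tree' 'user login' > "$sample"

grep root "$sample"      # literal match: lines containing "root"
grep -i root "$sample"   # ignore case: also matches "Root shell"
grep -n root "$sample"   # prefix each matching line with its line number
grep -w root "$sample"   # whole word only: "rooted tree" is excluded
grep -l root "$sample"   # print only the name of the file that matches
grep -v root "$sample"   # print the lines that do NOT contain "root"
```

Note that -v inverts the match line by line, so "Root shell" appears in its output because the search is still case-sensitive.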
Regular Expressions
- Regular Expressions
- A regular expression is a way of describing a pattern of characters to a program so that it can either display or modify the occurrences of that pattern.
- Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.
- The fundamental building blocks are the regular expressions that match a single character.
- Most characters, including all letters and digits, are regular expressions that match themselves.
- The simplest form of a regular expression is just a literal string.
- In addition to single characters and literal strings, regular expressions use a set of metacharacters that have special meaning when used in a regular expression.
Basic Regular Expression Metacharacters
- Wildcard matching
- . -- matches any one character
- [abc] -- matches any one of the letters 'a', 'b', or 'c'
- [^abc] -- matches any character except 'a', 'b', or 'c'
- [a-zA-Z0-9] -- matches any one lowercase letter, uppercase letter, or digit
- Quantifiers
- * -- matches the preceding character (or bracket expression) zero or more times
- Anchors
- ^ -- matches beginning of line
- $ -- matches end of line
- \< -- matches beginning of word
- \> -- matches end of word
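Each of the basic metacharacters above can be tried with grep on a short word list. The words and the variable name words are invented for this sketch, and the \< and \> anchors are assumed to be supported by the installed grep (they are in GNU grep).

```shell
#!/bin/bash
# A small word list (contents invented for illustration).
words=$(mktemp)
printf '%s\n' 'cat' 'cot' 'coat' 'scatter' 'the cat sat' 'ct' > "$words"

grep 'c.t' "$words"       # c, any one character, t
grep 'c[ao]t' "$words"    # c, then a or o, then t
grep 'c[^a]t' "$words"    # c, then anything except a, then t
grep 'co*t' "$words"      # c, zero or more o's, then t
grep '^c' "$words"        # lines that begin with c
grep 't$' "$words"        # lines that end with t
grep '\<cat\>' "$words"   # cat as a whole word, so "scatter" is excluded
```

Single quotes keep the shell from interpreting characters such as *, $, and \ before grep sees them.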
Extended Regular Expression Metacharacters
- Wildcard matching
- | -- matches either the expression before it or the expression after it
a|b|c
- () -- groups characters for more complicated expressions
Lehi(gh)*
- Quantifiers
- ? -- matches the preceding character or group zero or one time
- + -- matches the preceding character or group one or more times
- These are available with egrep(1), or equivalently with grep -E
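The extended metacharacters can be exercised the same way using grep -E. The sample lines and the variable name names are made up for this sketch; the ^ and $ anchors are added so that each pattern has to match the whole line.

```shell
#!/bin/bash
# Sample lines (contents invented for illustration).
names=$(mktemp)
printf '%s\n' 'Lehi' 'Lehigh' 'Lehighgh' 'color' 'colour' 'a' 'b' 'd' > "$names"

grep -E '^(a|b|c)$' "$names"     # alternation: a line that is exactly a, b, or c
grep -E '^Lehi(gh)*$' "$names"   # group repeated zero or more times
grep -E '^Lehi(gh)+$' "$names"   # group repeated one or more times: "Lehi" is excluded
grep -E '^colou?r$' "$names"     # optional u: matches both spellings
```

Because the parentheses group "gh" into one unit, the * and + apply to the pair of letters rather than to the h alone.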
Regular Expressions in Emacs
- UNIX editors such as vi and emacs also support regular expressions to search and transform files
- C-M-s will allow you to type a regular expression and interactively find the next match (C-s will move to the next, as with regular search)
- M-x replace-regexp will prompt for a regular expression for matching, and then prompt for the replacement string
Try running emacs now. Enter five lines of the form Lehigh X, where X is the line number.
Now use the regex "L.*igh [0-9]" with C-M-s, pressing C-s to step through the matches, and see that it matches each one.
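The same regex can be checked outside the editor. This sketch uses grep to count the matching lines and sed as a rough command-line stand-in for replace-regexp; the file name variable and the replacement text "Line" are assumptions made for illustration.

```shell
#!/bin/bash
# Build the five-line file described in the exercise.
lehigh_lines=$(mktemp)
for i in 1 2 3 4 5; do
    echo "Lehigh $i"
done > "$lehigh_lines"

# Count the lines that the regex matches.
match_count=$(grep -c 'L.*igh [0-9]' "$lehigh_lines")
echo "$match_count"

# Rough analogue of replace-regexp: \( \) captures the digit, \1 reuses it.
sed 's/L.*igh \([0-9]\)/Line \1/' "$lehigh_lines"
```

If the regex matches every line, the count printed is 5 and each line of sed output begins with "Line".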
Lab exercise (from Quigley)
- Given the file datebook.txt
- Print all lines containing the string San.
- Print all lines where the person's first name starts with J.
- Print all lines ending in 700.
- Print all lines that don't contain 834.
- Print all lines where birthdays are in December.
- Print all lines where the phone number is in the 408 area code.
- Print all lines containing an uppercase letter, followed by four
lowercase letters, a comma, a space, and one uppercase letter.
- Print lines where the last name begins with a K or k.
- Print lines preceded by a line number where the salary (last field) is a six-figure number.
- Print lines containing Lincoln or lincoln.