Part 1 – Mastering GREP – Introduction and Invoking
Introduction to Grep
`grep’ searches input files for lines containing a match to a given pattern list. When it finds a match in a line, it copies the line to standard output (by default), or produces whatever other sort of output you have requested with options.
Though `grep’ expects to do the matching on text, it has no limits on input line length other than available memory, and it can match arbitrary characters within a line. If the final byte of an input file is not a newline, `grep’ silently supplies one. Since newline is also a separator for the list of patterns, there is no way to match newline characters in a text.
The general synopsis of the `grep’ command line is
grep OPTIONS PATTERN INPUT_FILE_NAMES
There can be zero or more OPTIONS. PATTERN will only be seen as such (and not as an INPUT_FILE_NAME) if it wasn’t already specified within OPTIONS (by using the `-e PATTERN’ or `-f FILE’ options). There can be zero or more INPUT_FILE_NAMES.
1 Command-line Options
1.1 Generic Program Information
`--help’
Print a usage message briefly summarizing the command-line options and the bug-reporting address, then exit.
`-V’ | `--version’
Print the version number of `grep’ to the standard output stream. This version number should be included in all bug reports.
1.2 Matching Control
`-e PATTERN’ | `--regexp=PATTERN’
Use PATTERN as the pattern. This can be used to specify multiple search patterns, or to protect a pattern beginning with a `-’. (`-e’ is specified by POSIX.)
`-f FILE’ | `--file=FILE’
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing. (`-f’ is specified by POSIX.)
`-i’ | `-y’ | `--ignore-case’
Ignore case distinctions in both the pattern and the input files. `-y’ is an obsolete synonym that is provided for compatibility. (`-i’ is specified by POSIX.)
`-v’ | `--invert-match’
Invert the sense of matching, to select non-matching lines. (`-v’ is specified by POSIX.)
`-w’ | `--word-regexp’
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
`-x’ | `--line-regexp’
Select only those matches that exactly match the whole line. (`-x’ is specified by POSIX.)
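The matching-control options above can be tried on a small input. The following is a minimal sketch; the file name `sample.txt` and its contents are made up for illustration.

```shell
# Create a throwaway sample file (name and contents are hypothetical).
tmp=$(mktemp -d)
printf '%s\n' 'foo bar' 'foobar' 'Foo' '-v flag' > "$tmp/sample.txt"

# -w: "foo" must form a whole word, so "foobar" is not selected.
whole_word=$(grep -w 'foo' "$tmp/sample.txt")
# -x with -i: the whole line must match, ignoring case.
whole_line=$(grep -x -i 'foo' "$tmp/sample.txt")
# -e protects a pattern that begins with "-".
dash_pattern=$(grep -e '-v' "$tmp/sample.txt")
# -v with -i: select lines that do not contain "foo" in any case.
inverted=$(grep -v -i 'foo' "$tmp/sample.txt")

printf '%s\n' "$whole_word" "$whole_line" "$dash_pattern" "$inverted"
rm -rf "$tmp"
```

Without `-e`, the pattern `-v` in the third command would be parsed as the `-v’ option rather than as a pattern.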
1.3 General Output Control
`-c’ | `--count’
Suppress normal output; instead print a count of matching lines for each input file. With the `-v’ (`--invert-match’) option, count non-matching lines. (`-c’ is specified by POSIX.)
`--color[=WHEN]’ | `--colour[=WHEN]’
Surround the matched (non-empty) strings, matching lines, context lines, file names, line numbers, byte offsets, and separators (for fields and groups of context lines) with escape sequences to display them in color on the terminal.
The colors are defined by the environment variable `GREP_COLORS’ and default to `ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36’ for bold red matched text, magenta file names, green line numbers, green byte offsets, cyan separators, and default terminal colors otherwise.
The deprecated environment variable `GREP_COLOR’ is still supported, but its setting does not have priority; it defaults to `01;31’ (bold red) which only covers the color for matched text. WHEN is `never’, `always’, or `auto’.
`-L’ | `--files-without-match’
Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning of each file stops on the first match.
`-l’ | `--files-with-matches’
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning of each file stops on the first match. (`-l’ is specified by POSIX.)
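As a rough illustration of `-c’, `-l’, and `-L’, the sketch below searches two small log files. The file names `a.log` and `b.log` and their contents are assumptions made for this example.

```shell
# Two hypothetical log files: one with matches, one without.
tmp=$(mktemp -d)
printf '%s\n' 'error: disk' 'ok' 'error: net' > "$tmp/a.log"
printf '%s\n' 'ok' 'ok' > "$tmp/b.log"

# -c prints a count of matching lines (no file name for a single file).
count=$(grep -c 'error' "$tmp/a.log")
# -l lists files that contain at least one match.
with_match=$(grep -l 'error' "$tmp/a.log" "$tmp/b.log")
# -L lists files that contain no match.
without_match=$(grep -L 'error' "$tmp/a.log" "$tmp/b.log")

echo "count: $count"
echo "with match: $(basename "$with_match")"
echo "without match: $(basename "$without_match")"
rm -rf "$tmp"
```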
`-m NUM’ | `--max-count=NUM’
Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, `grep’ ensures that the standard input is positioned just after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search.
For example, the following shell script makes use of it:
while grep -m 1 PATTERN
do
  echo xxxx
done < FILE
But the following probably will not work because a pipe is not a regular file:
# This probably will not work.
cat FILE |
while grep -m 1 PATTERN
do
  echo xxxx
done
When `grep’ stops after NUM matching lines, it outputs any trailing context lines. Since context does not include matching lines, `grep’ will stop when it encounters another matching line. When the `-c’ or `--count’ option is also used, `grep’ does not output a count greater than NUM. When the `-v’ or `--invert-match’ option is also used, `grep’ stops after outputting NUM non-matching lines.
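The interaction between `-m’ and `-c’ can be checked with a short sketch. The input file `in.txt` and its contents are hypothetical.

```shell
# Hypothetical input with three matching lines.
tmp=$(mktemp -d)
printf '%s\n' 'hit 1' 'miss' 'hit 2' 'hit 3' > "$tmp/in.txt"

# -m 2 stops reading after the second matching line.
first_two=$(grep -m 2 'hit' "$tmp/in.txt")
# Combined with -c, the reported count is capped at NUM.
capped_count=$(grep -c -m 2 'hit' "$tmp/in.txt")

printf '%s\n' "$first_two"
echo "capped count: $capped_count"
rm -rf "$tmp"
```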
`-o’ | `--only-matching’
Print only the matched (non-empty) parts of matching lines, with each such part on a separate output line.
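A small sketch of `-o’ follows; the `id=` input lines are invented for the example.

```shell
# Two matches on the first line, none on the second (made-up input).
matches=$(printf '%s\n' 'id=17 id=42' 'none' | grep -o 'id=[0-9]*')

# Each matched part is printed on its own line.
printf '%s\n' "$matches"
```

This makes `-o’ convenient for extracting tokens from a line rather than selecting whole lines.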
`-q’ | `--quiet’ | `--silent’
Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the `-s’ or `--no-messages’ option. (`-q’ is specified by POSIX.)
`-s’ | `--no-messages’
Suppress error messages about nonexistent or unreadable files. Portability note: unlike GNU `grep’, 7th Edition Unix `grep’ did not conform to POSIX, because it lacked `-q’ and its `-s’ option behaved like GNU `grep’’s `-q’ option. USG-style `grep’ also lacked `-q’ but its `-s’ option behaved like GNU `grep’’s. Portable shell scripts should avoid both `-q’ and `-s’ and should redirect standard and error output to `/dev/null’ instead. (`-s’ is specified by POSIX.)
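The portability advice above can be sketched as follows: instead of `-q’ and `-s’, redirect both output streams and test only the exit status. The file `words.txt` and the variable names are assumptions for illustration.

```shell
# Hypothetical word list.
tmp=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' > "$tmp/words.txt"

# Portable replacement for "grep -q -s": discard stdout and stderr.
if grep 'beta' "$tmp/words.txt" > /dev/null 2>&1; then
  found_beta=yes
else
  found_beta=no
fi

if grep 'gamma' "$tmp/words.txt" > /dev/null 2>&1; then
  found_gamma=yes
else
  found_gamma=no
fi

echo "beta: $found_beta, gamma: $found_gamma"
rm -rf "$tmp"
```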
1.4 Output Line Prefix Control
When several prefix fields are to be output, the order is always file name, line number, and byte offset, regardless of the order in which these options were specified.
`-b’ | `--byte-offset’
Print the 0-based byte offset within the input file before each line of output. If `-o’ (`--only-matching’) is specified, print the offset of the matching part itself. When `grep’ runs on MS-DOS or MS-Windows, the printed byte offsets depend on whether the `-u’ (`--unix-byte-offsets’) option is used; see below.
`-H’ | `--with-filename’
Print the file name for each match. This is the default when there is more than one file to search.
`-h’ | `--no-filename’
Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search.
`--label=LABEL’
Display input actually coming from standard input as input coming from file LABEL. This is especially useful when implementing tools like `zgrep’; e.g.:
gzip -cd foo.gz | grep --label=foo -H something
`-n’ | `--line-number’
Prefix each line of output with the 1-based line number within its input file. (`-n’ is specified by POSIX.)
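To see the fixed prefix order (file name, line number, byte offset), the sketch below deliberately passes the options in a different order. The file `nums.txt` is a made-up example.

```shell
# Hypothetical three-line file.
tmp=$(mktemp -d)
printf '%s\n' 'one' 'two' 'three' > "$tmp/nums.txt"

# Options given as -b -n -H; the prefixes still appear as name:line:offset.
prefixed=$(cd "$tmp" && grep -b -n -H 'two' nums.txt)

# With --label, standard input is reported under the chosen name.
labeled=$(printf '%s\n' 'something' | grep --label=foo -H 'something')

echo "$prefixed"
echo "$labeled"
rm -rf "$tmp"
```

The byte offset is 4 here because the preceding line `one` plus its newline occupies bytes 0 through 3.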
`-T’ | `--initial-tab’
Make sure that the first character of actual line content lies on a tab stop, so that the alignment of tabs looks normal. This is useful with options that prefix their output to the actual content: `-H’, `-n’, and `-b’. In order to improve the probability that lines from a single file will all start at the same column, this also causes the line number and byte offset (if present) to be printed in a minimum-size field width.
`-u’ | `--unix-byte-offsets’
Report Unix-style byte offsets. This option causes `grep’ to report byte offsets as if the file were a Unix-style text file, i.e., the byte offsets ignore the `CR’ characters that were stripped. This will produce results identical to running `grep’ on a Unix machine. This option has no effect unless the `-b’ option is also used; it has no effect on platforms other than MS-DOS and MS-Windows.
`-Z’ | `--null’
Output a zero byte (the ASCII `NUL’ character) instead of the character that normally follows a file name. For example, `grep -lZ’ outputs a zero byte after each file name instead of the usual newline. This option makes the output unambiguous, even in the presence of file names containing unusual characters like newlines. This option can be used with commands like `find -print0’, `perl -0’, `sort -z’, and `xargs -0’ to process arbitrary file names, even those that contain newline characters.
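A minimal sketch of pairing `grep -lZ’ with `xargs -0’ is shown below. The file names (one of them containing a space) are hypothetical, and the example assumes an `xargs` that supports `-0`.

```shell
# One file name contains a space, which would break plain word splitting.
tmp=$(mktemp -d)
printf 'needle\n' > "$tmp/with space.txt"
printf 'hay\n' > "$tmp/plain.txt"

# -l lists matching files, -Z terminates each name with a NUL byte,
# and xargs -0 splits its input on NUL instead of whitespace.
matched_content=$(grep -lZ 'needle' "$tmp"/*.txt | xargs -0 cat)

echo "$matched_content"
rm -rf "$tmp"
```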
1.5 Context Line Control
Regardless of how these options are set, `grep’ will never print any given line more than once. If the `-o’ (`--only-matching’) option is specified, these options have no effect and a warning is given upon their use.
`-A NUM’ | `--after-context=NUM’
Print NUM lines of trailing context after matching lines.
`-B NUM’ | `--before-context=NUM’
Print NUM lines of leading context before matching lines.
`-C NUM’ | `-NUM’ | `--context=NUM’
Print NUM lines of leading and trailing output context.
`--group-separator=STRING’
When `-A’, `-B’ or `-C’ are in use, print STRING instead of `--’ around disjoint groups of lines.
`--no-group-separator’
When `-A’, `-B’ or `-C’ are in use, print disjoint groups of lines adjacent to each other.
Here are some points about how `grep’ chooses the separator to print between prefix fields and line content:
- Matching lines normally use `:’ as a separator between prefix fields and actual line content.
- Context (i.e., non-matching) lines use `-’ instead.
- When no context is specified, matching lines are simply output one right after another.
- When nonzero context is specified, lines that are adjacent in the input form a group and are output one right after another, while a separator appears by default between disjoint groups on a line of its own and without any prefix.
- The default separator is `--’, however whether to include it and its appearance can be changed with the options above.
- Each group may contain several matching lines when they are close enough to each other that two otherwise adjacent but divided groups connect and can just merge into a single contiguous one.
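The separator rules listed above can be observed with one line of trailing context and line numbers enabled. The input lines are invented for the example.

```shell
# Nine made-up input lines with two matches far enough apart
# to form two disjoint groups.
context=$(printf '%s\n' a b MATCH c d e f MATCH g | grep -n -A 1 'MATCH')

# Matching lines use ":" after the prefix, context lines use "-",
# and "--" on its own line separates the two groups.
printf '%s\n' "$context"
```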
1.6 File and Directory Selection
`-a’ | `--text’
Process a binary file as if it were text; this is equivalent to the `--binary-files=text’ option.
`--binary-files=TYPE’
If a file’s allocation metadata or its first few bytes indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is `binary’, and `grep’ normally outputs either a one-line message saying that a binary file matches, or no message if there is no match.
If TYPE is `without-match’, `grep’ assumes that a binary file does not match; this is equivalent to the `-I’ option.
If TYPE is `text’, `grep’ processes a binary file as if it were text; this is equivalent to the `-a’ option.
_Warning:_ `--binary-files=text’ might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.
`-D ACTION’ | `--devices=ACTION’
If an input file is a device, FIFO, or socket, use ACTION to process it. If ACTION is `read’, all devices are read just as if they were ordinary files. If ACTION is `skip’, devices, FIFOs, and sockets are silently skipped. By default, devices are read if they are on the command line or if the `-R’ (`--dereference-recursive’) option is used, and are skipped if they are encountered recursively and the `-r’ (`--recursive’) option is used. This option has no effect on a file that is read via standard input.
`-d ACTION’ | `--directories=ACTION’
If an input file is a directory, use ACTION to process it. By default, ACTION is `read’, which means that directories are read just as if they were ordinary files (some operating systems and file systems disallow this, and will cause `grep’ to print error messages for every directory or silently skip them). If ACTION is `skip’, directories are silently skipped. If ACTION is `recurse’, `grep’ reads all files under each directory, recursively, following command-line symbolic links and skipping other symlinks; this is equivalent to the `-r’ option.
`--exclude=GLOB’
Skip files whose base name matches GLOB (using wildcard matching). A file-name glob can use `*’, `?’, and `[’…`]’ as wildcards, and `\’ to quote a wildcard or backslash character literally.
`--exclude-from=FILE’
Skip files whose base name matches any of the file-name globs read from FILE (using wildcard matching as described under `--exclude’).
`--exclude-dir=DIR’
Exclude directories matching the pattern DIR from recursive directory searches.
`-I’
Process a binary file as if it did not contain matching data; this is equivalent to the `--binary-files=without-match’ option.
`--include=GLOB’
Search only files whose base name matches GLOB (using wildcard matching as described under `--exclude’).
`-r’ | `--recursive’
For each directory operand, read and process all files in that directory, recursively. Follow symbolic links on the command line, but skip symlinks that are encountered recursively. This is the same as the `--directories=recurse’ option.
`-R’ | `--dereference-recursive’
For each directory operand, read and process all files in that directory, recursively, following all symbolic links.
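A recursive search restricted by `--include’ and `--exclude-dir’ might look like the sketch below. The directory layout (`src`, `build`) and the file names are assumptions made for the example.

```shell
# Hypothetical source tree with a generated-files directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/build"
printf 'TODO: fix\n' > "$tmp/src/main.c"
printf 'TODO: doc\n' > "$tmp/src/notes.txt"
printf 'TODO: gen\n' > "$tmp/build/gen.c"

# Recurse from ".", consider only *.c files, and skip the build directory.
hits=$(cd "$tmp" && grep -r -l --include='*.c' --exclude-dir=build 'TODO' .)

echo "$hits"
rm -rf "$tmp"
```

Quoting the glob (`'*.c'`) keeps the shell from expanding it before `grep’ sees it.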
1.7 Other Options
`--line-buffered’
Use line buffering on output. This can cause a performance penalty.
`--mmap’
This option is deprecated and now elicits a warning, but is otherwise a no-op. It used to make `grep’ read input with the `mmap’ system call, instead of the default `read’ system call. On modern systems, `mmap’ would rarely if ever yield better performance.
`-U’ | `--binary’
Treat the file(s) as binary. By default, under MS-DOS and MS-Windows, `grep’ guesses whether a file is text or binary as described for the `--binary-files’ option. If `grep’ decides the file is a text file, it strips the `CR’ characters from the original file contents (to make regular expressions with `^’ and `$’ work correctly). Specifying `-U’ overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with `CR/LF’ pairs at the end of each line, this will cause some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.
`-z’ | `--null-data’
Treat the input as a set of lines, each terminated by a zero byte (the ASCII `NUL’ character) instead of a newline. Like the `-Z’ or `--null’ option, this option can be used with commands like `sort -z’ to process arbitrary file names.
2 Environment Variables
The behavior of `grep’ is affected by the following environment variables.
The locale for category `LC_FOO’ is specified by examining the three environment variables `LC_ALL’, `LC_FOO’, and `LANG’, in that order.
The first of these variables that is set specifies the locale.
For example, if `LC_ALL’ is not set, but `LC_MESSAGES’ is set to `pt_BR’, then the Brazilian Portuguese locale is used for the `LC_MESSAGES’ category.
The `C’ locale is used if none of these environment variables are set, if the locale catalog is not installed, or if `grep’ was not compiled with national language support (NLS).
Many of the environment variables in the following list let you control highlighting using Select Graphic Rendition (SGR) commands interpreted by the terminal or terminal emulator. (See the section in the documentation of your text terminal for permitted values and their meanings as character attributes.) These substring values are integers in decimal representation and can be concatenated with semicolons.
`grep’ takes care of assembling the result into a complete SGR sequence (`\33[’…`m’).
Common values to concatenate include `1’ for bold, `4’ for underline, `5’ for blink, `7’ for inverse, `39’ for default foreground color, `30’ to `37’ for foreground colors, `90’ to `97’ for 16-color mode foreground colors, `38;5;0’ to `38;5;255’ for 88-color and 256-color modes foreground colors, `49’ for default background color, `40’ to `47’ for background colors, `100’ to `107’ for 16-color mode background colors, and `48;5;0’ to `48;5;255’ for 88-color and 256-color modes background colors.
The two-letter names used in the `GREP_COLORS’ environment variable (and some of the others) refer to terminal “capabilities,” the ability of a terminal to highlight text, or change its color, and so on. These capabilities are stored in an online database and accessed by the `terminfo’ library.
`GREP_OPTIONS’
This variable specifies default options to be placed in front of any explicit options. For example, if `GREP_OPTIONS’ is `--binary-files=without-match --directories=skip’, `grep’ behaves as if the two options `--binary-files=without-match’ and `--directories=skip’ had been specified before any explicit options. Option specifications are separated by whitespace. A backslash escapes the next character, so it can be used to specify an option containing whitespace or a backslash.
The `GREP_OPTIONS’ value does not affect whether `grep’ without file operands searches standard input or the working directory; that is affected only by command-line options. For example, the command `grep PAT’ searches standard input and the command `grep -r PAT’ searches the working directory, regardless of whether `GREP_OPTIONS’ contains `-r’.
`GREP_COLOR’
This variable specifies the color used to highlight matched (non-empty) text. It is deprecated in favor of `GREP_COLORS’, but still supported. The `mt’, `ms’, and `mc’ capabilities of `GREP_COLORS’ have priority over it. It can only specify the color used to highlight the matching non-empty text in any matching line (a selected line when the `-v’ command-line option is omitted, or a context line when `-v’ is specified). The default is `01;31’, which means a bold red foreground text on the terminal’s default background.
`GREP_COLORS’
This variable specifies the colors and other attributes used to highlight various parts of the output. Its value is a colon-separated list of `terminfo’ capabilities that defaults to `ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36’ with the `rv’ and `ne’ boolean capabilities omitted (i.e., false). Supported capabilities are as follows.
`sl=’
SGR substring for whole selected lines (i.e., matching lines when the `-v’ command-line option is omitted, or non-matching lines when `-v’ is specified). If however the boolean `rv’ capability and the `-v’ command-line option are both specified, it applies to context matching lines instead. The default is empty (i.e., the terminal’s default color pair).
`cx=’
SGR substring for whole context lines (i.e., non-matching lines when the `-v’ command-line option is omitted, or matching lines when `-v’ is specified). If however the boolean `rv’ capability and the `-v’ command-line option are both specified, it applies to selected non-matching lines instead. The default is empty (i.e., the terminal’s default color pair).
`rv’
Boolean value that reverses (swaps) the meanings of the `sl=’ and `cx=’ capabilities when the `-v’ command-line option is specified. The default is false (i.e., the capability is omitted).
`mt=01;31’
SGR substring for matching non-empty text in any matching line (i.e., a selected line when the `-v’ command-line option is omitted, or a context line when `-v’ is specified). Setting this is equivalent to setting both `ms=’ and `mc=’ at once to the same value. The default is a bold red text foreground over the current line background.
`ms=01;31’
SGR substring for matching non-empty text in a selected line. (This is used only when the `-v’ command-line option is omitted.) The effect of the `sl=’ (or `cx=’ if `rv’) capability remains active when this takes effect. The default is a bold red text foreground over the current line background.
`mc=01;31’
SGR substring for matching non-empty text in a context line. (This is used only when the `-v’ command-line option is specified.) The effect of the `cx=’ (or `sl=’ if `rv’) capability remains active when this takes effect. The default is a bold red text foreground over the current line background.
`fn=35’
SGR substring for file names prefixing any content line. The default is a magenta text foreground over the terminal’s default background.
`ln=32’
SGR substring for line numbers prefixing any content line. The default is a green text foreground over the terminal’s default background.
`bn=32’
SGR substring for byte offsets prefixing any content line. The default is a green text foreground over the terminal’s default background.
`se=36’
SGR substring for separators that are inserted between selected line fields (`:’), between context line fields (`-’), and between groups of adjacent lines when nonzero context is specified (`--’). The default is a cyan text foreground over the terminal’s default background.
`ne’
Boolean value that prevents clearing to the end of line using Erase in Line (EL) to Right (`\33[K’) each time a colorized item ends. This is needed on terminals on which EL is not supported. It is otherwise useful on terminals for which the `back_color_erase’ (`bce’) boolean `terminfo’ capability does not apply, when the chosen highlight colors do not affect the background, or when EL is too slow or causes too much flicker. The default is false (i.e., the capability is omitted).
Note that boolean capabilities have no `=’… part. They are omitted (i.e., false) by default and become true when specified.
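As a rough sketch of how `GREP_COLORS’ feeds into the output, the command below overrides the matched-text color with `mt=01;32` (bold green, an arbitrary choice for this example) and forces coloring so the escape sequences appear even when the output is not a terminal.

```shell
# Override the matched-text color for a single invocation.
# --color=always forces escape sequences even when output is captured.
colored=$(printf 'hello\n' | GREP_COLORS='mt=01;32' grep --color=always 'ell')

# Show the raw bytes; the SGR parameters 01;32 precede the matched text.
printf '%s\n' "$colored" | od -c | head -n 3
```

The exact bytes around the match can vary between `grep’ versions, so the sketch only relies on the chosen SGR substring appearing in the output.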
`LC_ALL’ | `LC_COLLATE’ | `LANG’
These variables specify the locale for the `LC_COLLATE’ category, which determines the collating sequence used to interpret range expressions like `[a-z]’.
`LC_ALL’ | `LC_CTYPE’ | `LANG’
These variables specify the locale for the `LC_CTYPE’ category, which determines the type of characters, e.g., which characters are whitespace.
`LC_ALL’ | `LC_MESSAGES’ | `LANG’
These variables specify the locale for the `LC_MESSAGES’ category, which determines the language that `grep’ uses for messages. The default `C’ locale uses American English messages.
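Because range expressions depend on the collating sequence, scripts that need plain ASCII behavior commonly pin the locale for the single command. A minimal sketch, with made-up input:

```shell
# In the C locale, [a-z] covers only the ASCII lowercase letters,
# so the uppercase line is not selected.
ascii_lower=$(printf '%s\n' 'a' 'B' | LC_ALL=C grep '[a-z]')

echo "$ascii_lower"
```

Setting `LC_ALL’ on the command line affects only that invocation, since it takes precedence over `LC_COLLATE’ and `LANG’.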
`POSIXLY_CORRECT’
If set, `grep’ behaves as POSIX requires; otherwise, `grep’ behaves more like other GNU programs. POSIX requires that options that follow file names must be treated as file names; by default, such options are permuted to the front of the operand list and are treated as options. Also, `POSIXLY_CORRECT’ disables special handling of an invalid bracket expression.
`_N_GNU_nonoption_argv_flags_’
(Here `N’ is `grep’’s numeric process ID.) If the Ith character of this environment variable’s value is `1’, do not consider the Ith operand of `grep’ to be an option, even if it appears to be one. A shell can put this variable in the environment for each command it runs, specifying which operands are the results of file name wildcard expansion and therefore should not be treated as options. This behavior is available only with the GNU C library, and only when `POSIXLY_CORRECT’ is not set.
3 Exit Status
Normally, the exit status is 0 if selected lines are found and 1 otherwise. But the exit status is 2 if an error occurred, unless the `-q’ or `--quiet’ or `--silent’ option is used and a selected line is found. Note, however, that POSIX only mandates, for programs such as `grep’, `cmp’, and `diff’, that the exit status in case of error be greater than 1; it is therefore advisable, for the sake of portability, to use logic that tests for this general condition instead of strict equality with 2.
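The portable test described above (error means "greater than 1", not "equal to 2") can be sketched as follows. The file names and variable names are hypothetical.

```shell
# One existing file; missing.txt is deliberately never created.
tmp=$(mktemp -d)
printf 'alpha\n' > "$tmp/f.txt"

# Capture each exit status without letting a nonzero status abort the script.
status_found=0
grep 'alpha' "$tmp/f.txt" > /dev/null 2>&1 || status_found=$?
status_none=0
grep 'zzz' "$tmp/f.txt" > /dev/null 2>&1 || status_none=$?
status_error=0
grep 'alpha' "$tmp/missing.txt" > /dev/null 2>&1 || status_error=$?

# Portable classification: 0 = match, 1 = no match, anything above 1 = error.
if [ "$status_error" -gt 1 ]; then
  echo "error detected (status $status_error)"
fi
echo "found: $status_found, none: $status_none"
rm -rf "$tmp"
```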
4 `grep’ Programs
`grep’ searches the named input files for lines containing a match to the given pattern. By default, `grep’ prints the matching lines. A file named `-’ stands for standard input. If no input is specified, `grep’ searches the working directory `.’ if given a command-line option specifying recursion; otherwise, `grep’ searches standard input. There are four major variants of `grep’, controlled by the following options.
`-G’ | `--basic-regexp’
Interpret the pattern as a basic regular expression (BRE). This is the default.
`-E’ | `--extended-regexp’
Interpret the pattern as an extended regular expression (ERE). (`-E’ is specified by POSIX.)
`-F’ | `--fixed-strings’
Interpret the pattern as a list of fixed strings, separated by newlines, any of which is to be matched. (`-F’ is specified by POSIX.)
`-P’ | `--perl-regexp’
Interpret the pattern as a Perl regular expression. This is highly experimental and `grep -P’ may warn of unimplemented features.
In addition, two variant programs `egrep’ and `fgrep’ are available. `egrep’ is the same as `grep -E’. `fgrep’ is the same as `grep -F’. Direct invocation as either `egrep’ or `fgrep’ is deprecated, but is provided to allow historical applications that rely on them to run unmodified.
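The difference between the variants shows up when a pattern contains characters such as `+` or `|`. The sketch below runs related patterns under `-G’ (the default), `-E’, and `-F’ against three made-up input lines.

```shell
# Three hypothetical input lines containing regex metacharacters.
input=$(printf '%s\n' 'a+b' 'aab' 'a|b')

# Default (BRE): an unescaped "+" is an ordinary character.
basic=$(printf '%s\n' "$input" | grep 'a+b')
# -E (ERE): "+" means one or more of the preceding item.
extended=$(printf '%s\n' "$input" | grep -E 'a+b')
# -F: the pattern is a fixed string, so "|" is matched literally.
fixed=$(printf '%s\n' "$input" | grep -F 'a|b')

echo "basic: $basic"
echo "extended: $extended"
echo "fixed: $fixed"
```

Using `grep -E’ and `grep -F’ in place of `egrep’ and `fgrep’ keeps scripts clear of the deprecated program names.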