UGREP-INDEXER(1) User Commands UGREP-INDEXER(1)
NAME
ugrep-indexer -- file indexer to accelerate recursive searching
SYNOPSIS
ugrep-indexer [-0...9] [-c|-d|-f] [-I] [-q] [-S] [-s] [-X] [-z] [PATH]
DESCRIPTION
The ugrep-indexer utility recursively indexes files to accelerate
recursive searching with the ug --index PATTERN commands:
$ ugrep-indexer [-I] [-z]
...
$ ug --index [-I] [-z] [-r|-R] OPTIONS PATTERN
$ ugrep --index [-I] [-z] [-r|-R] OPTIONS PATTERN
where option -I or --ignore-binary ignores binary files, which is
recommended to limit indexing storage overhead and to reduce search
time. Option -z or --decompress indexes and searches archives and
compressed files.
Indexing speeds up searching file systems that are large and cold
(not recently cached in RAM) and file systems that are generally slow
to search. Note that indexing may not speed up searching few files
or recursively searching fast file systems.
Searching with ug --index is safe and never skips modified files that
may match after indexing; the ug --index PATTERN command always
searches files and directories that were added or modified after
indexing. When option --stats is used with ug --index, a search
report is produced showing the number of files skipped not matching
any indexes and the number of files and directories that were added
or modified after indexing. Note that searching with ug --index may
significantly increase the start-up time when complex regex patterns
are specified that contain large Unicode character classes combined
with `*' or `+' repeats, which should be avoided.
ugrep-indexer stores a hidden index file in each directory indexed.
The size of an index file depends on the number of files indexed and
the specified indexing accuracy. Higher accuracy produces larger
index files to improve search performance by reducing false positives
(a false positive is a match prediction for a file when the file does
not match the regex pattern.)
ugrep-indexer accepts an optional PATH to the root of the directory
tree to index. The default is to index the working directory tree.
ugrep-indexer incrementally updates indexes. To force reindexing,
specify option -f or --force. Indexes are deleted with option -d or
--delete.
ugrep-indexer may be stopped and restarted to continue indexing at
any time. Incomplete index files do not cause errors.
ASCII, UTF-8, UTF-16 and UTF-32 files are indexed and searched as
text files unless their UTF encoding is invalid. Files with other
encodings are indexed as binary files and can be searched with
non-Unicode regex patterns using ug --index -U.
When ugrep-indexer option -I or --ignore-binary is specified, binary
files are ignored and not indexed. To avoid searching these
non-indexed binary files, also specify option -I when searching:
ug --index -I.
ugrep-indexer option -X or --ignore-files respects gitignore rules.
Likewise, to avoid searching non-indexed ignored files, also specify
option --ignore-files when searching: ug --index --ignore-files.
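Because the search options should mirror the indexing options, a small
helper can keep the two in step. The following is a minimal sketch and
not part of ugrep-indexer: the function name search_flags_for is
hypothetical, and it covers only the -I, -X and -z pairings named in
this description.

```shell
# Hypothetical helper (not part of ugrep-indexer): given the options that
# were passed to ugrep-indexer, print the matching options for ug --index.
# The body runs in a subshell so that its variables stay local.
search_flags_for() (
  out="--index"
  for flag in "$@"; do
    case $flag in
      -I|--ignore-binary) out="$out -I" ;;              # skip non-indexed binary files
      -X|--ignore-files)  out="$out --ignore-files" ;;  # skip non-indexed ignored files
      -z|--decompress)    out="$out -z" ;;              # search archives and compressed files
      *) ;;                                             # no search-side counterpart handled here
    esac
  done
  printf '%s\n' "$out"
)

# Example: the index was built with "ugrep-indexer -I -X -v".
search_flags_for -I -X -v   # prints: --index -I --ignore-files
```

The printed flags could then be passed along, for example as
ug $(search_flags_for -I -X) PATTERN.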
Archives and compressed files are indexed with ugrep-indexer option
-z or --decompress. Otherwise, archives and compressed files are
indexed as binary files or are ignored with option -I or
--ignore-binary. Note that once an archive or compressed file is
indexed as a binary file, it will not be reindexed with option -z to
index the contents of the archive or compressed file. Only files that
are modified after indexing are reindexed, which is determined by
comparing time stamps.
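The time stamp comparison can be pictured with a short shell sketch.
This illustrates the idea only and is not ugrep-indexer's actual
implementation; the function name needs_reindex and the demo file names
are hypothetical.

```shell
# Sketch of a time-stamp-based staleness check (illustration only).

# needs_reindex FILE INDEX
# Succeeds (returns 0) when INDEX does not exist or FILE is newer than INDEX.
needs_reindex() {
  [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# Demo in a scratch directory; both file names are stand-ins.
demo_dir=$(mktemp -d)
touch -t 202401010000 "$demo_dir/data.txt"    # file last modified on Jan 1
touch -t 202401020000 "$demo_dir/index.demo"  # "indexed" on Jan 2

if needs_reindex "$demo_dir/data.txt" "$demo_dir/index.demo"; then
  echo "reindex"
else
  echo "up to date"      # printed: the file is older than its index
fi

touch -t 202401030000 "$demo_dir/data.txt"    # file modified after indexing
if needs_reindex "$demo_dir/data.txt" "$demo_dir/index.demo"; then
  echo "reindex"         # printed: the file is now newer than its index
else
  echo "up to date"
fi

rm -rf "$demo_dir"
```

A check of this kind sees only modification times, which would explain
why a file already indexed as binary is not picked up again when -z is
added later; forcing reindexing with -f sidesteps the comparison.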
Symlinked files are indexed with ugrep-indexer option -S or
--dereference-files. Symlinks to directories are never followed.
To save a log file of the indexing process, specify option -v or
--verbose and redirect standard output to a log file. All messages
and warnings are sent to standard output and captured by the log
file.
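As a minimal sketch of the logging setup, the wrapper below indexes a
directory tree verbosely and saves the output. The function name
index_with_log and the log file name are hypothetical; only option -v
and the redirection come from the text above.

```shell
# Hypothetical wrapper: index PATH ($1) verbosely and save the output to
# LOGFILE ($2). Messages and warnings go to standard output, so redirecting
# standard output alone captures them.
index_with_log() {
  ugrep-indexer -v "$1" > "$2"
}

# Usage (commented out so that sourcing this file does not start indexing):
#   index_with_log . ugrep-indexer.log
```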
A .ugrep-indexer configuration file with configuration options is
loaded when present in the working directory or in the home
directory. A configuration option consists of the name of a long
option and its argument when applicable.
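The sketch below writes a sample configuration file. The exact file
syntax is an assumption here: one long option name per line without the
leading dashes, NAME=ARG for options that take an argument, and lines
starting with # treated as comments. The helper name
write_sample_config is hypothetical; verify the format against the
installed version before relying on it.

```shell
# Hypothetical helper: write a sample .ugrep-indexer file into directory $1.
# ASSUMPTION: one long option name per line, written without the leading
# "--", with NAME=ARG for options that take an argument, and "#" starting
# a comment line.
write_sample_config() {
  cat > "$1/.ugrep-indexer" <<'EOF'
# sample ugrep-indexer configuration (format assumed, see above)
ignore-binary
decompress
accuracy=5
EOF
}

# Demo: write the sample into a scratch directory and display it.
demo_dir=$(mktemp -d)
write_sample_config "$demo_dir"
cat "$demo_dir/.ugrep-indexer"
rm -rf "$demo_dir"
```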
The following options are available:
-0, -1, -2, -3, ..., -9, --accuracy=DIGIT
    Specifies indexing accuracy. A low accuracy reduces the
    indexing storage overhead at the cost of a higher rate of
    false positive pattern matches (more noise). A high accuracy
    reduces the rate of false positive regex pattern matches (less
    noise) at the cost of an increased indexing storage overhead.
    An accuracy between 2 and 7 is recommended. The default
    accuracy is 4.
-., --hidden
    Index hidden files and directories.
-?, --help
    Display a help message and exit.
-c, --check
    Recursively check and report indexes without reindexing files.
-d, --delete
    Recursively remove index files.
-f, --force
    Force reindexing of files, even those that are already
    indexed.
-I, --ignore-binary
    Do not index binary files.
-q, --quiet, --silent
    Quiet mode: do not display indexing statistics.
-S, --dereference-files
    Follow symbolic links to files. Symbolic links to directories
    are never followed.
-s, --no-messages
    Silent mode: nonexistent and unreadable files are ignored,
    i.e. their error messages and warnings are suppressed.
-V, --version
    Display version and exit.
-v, --verbose
    Produce verbose output. Files are marked A for archive, C for
    compressed, and B for binary or I for ignored binary.
    Deletions are marked D.
-X, --ignore-files, --ignore-files=FILE
    Do not index files and directories matching the globs in FILE
    encountered during indexing. The default FILE is
    `.gitignore'. This option may be repeated to specify
    additional files.
-z, --decompress
    Index the contents of compressed files and archives. Hidden
    files in archives are ignored unless option -. or --hidden is
    specified. Option -I or --ignore-binary ignores compressed
    binary files. When used with option --zmax=NUM, indexes the
    contents of compressed files and archives stored within
    archives up to NUM levels deep. Supported compression
    formats: gzip (.gz), compress (.Z), zip, 7z, bzip2 (requires
    suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2), lzma and
    xz (requires suffix .lzma, .tlz, .xz, .txz), lz4 (requires
    suffix .lz4), zstd (requires suffix .zst, .zstd, .tzst),
    brotli (requires suffix .br), bzip3 (requires suffix .bz3).
--zmax=NUM
    When used with option -z (--decompress), indexes the contents
    of compressed files and archives stored within archives up
    to NUM expansion levels deep. The default --zmax=1 only
    permits indexing uncompressed files stored in cpio, pax, tar,
    zip and 7z archives; compressed files and archives are
    detected as binary files and are effectively ignored. Specify
    --zmax=2 to index compressed files and archives stored in
    cpio, pax, tar, zip and 7z archives. NUM may range from 1 to
    99 for up to 99 decompression and de-archiving steps.
    Increasing NUM values gradually degrades performance.
EXIT STATUS
The ugrep-indexer utility exits with one of the following values:
0   Indexes are up to date.
1   Indexing check -c detected missing and outdated index files.
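A script can branch on these values. The following is a minimal sketch
under the exit statuses listed above; the function name describe_status
is hypothetical, and the check runs only when ugrep-indexer is found on
the PATH.

```shell
# describe_status CODE: translate a documented exit status into a message.
describe_status() {
  case $1 in
    0) echo "indexes are up to date" ;;
    1) echo "missing or outdated index files detected" ;;
    *) echo "undocumented exit status: $1" ;;
  esac
}

# Run the check only when ugrep-indexer is installed. Option -c checks and
# reports indexes without reindexing files.
if command -v ugrep-indexer >/dev/null 2>&1; then
  status=0
  ugrep-indexer -c || status=$?
  describe_status "$status"
fi
```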
EXAMPLES
Recursively and incrementally index all non-binary files showing
progress:
$ ugrep-indexer -I -v
Recursively and incrementally index all non-binary files, including
non-binary files stored in archives and in compressed files, showing
progress:
$ ugrep-indexer -z -I -v
Incrementally index all non-binary files, including archives and
compressed files, show progress, follow symbolic links to files (but
not to directories), but do not index files and directories matching
the globs in .gitignore:
$ ugrep-indexer -z -I -v -S -X
Force re-indexing of all non-binary files, including archives and
compressed files, follow symbolic links to files (but not to
directories), but do not index files and directories matching the
globs in .gitignore:
$ ugrep-indexer -f -z -I -v -S -X
Same, but decrease index file storage to a minimum by decreasing
indexing accuracy from 4 (the default) to 0:
$ ugrep-indexer -f -0 -z -I -v -S -X
Increase search performance by increasing the indexing accuracy from
4 (the default) to 7 at a cost of larger index files:
$ ugrep-indexer -f7zIvSX
Recursively delete all hidden ._UG#_Store index files to restore the
directory tree to non-indexed:
$ ugrep-indexer -d
COPYRIGHT
Copyright (c) 2021-2025 Robert A. van Engelen <engelen@acm.org>
ugrep-indexer is released under the BSD-3 license. All parts of the
software have reasonable copyright terms permitting free
redistribution. This includes the ability to reuse all or parts of
the ugrep source tree.
SEE ALSO
ug(1), ugrep(1).
BUGS
Report bugs at:
https://github.com/Genivia/ugrep-indexer/issues
ugrep-indexer 7.3.0 March 3, 2025 UGREP-INDEXER(1)