PCRE2CONVERT(3) Introduction to Library Functions PCRE2CONVERT(3)
NAME
PCRE2 - Perl-compatible regular expressions (revised API)
EXPERIMENTAL PATTERN CONVERSION FUNCTIONS
This document describes a set of functions that can be used to
convert "foreign" patterns into PCRE2 regular expressions. This
facility is currently experimental, and may be changed in future
releases. Two kinds of pattern, globs and POSIX patterns, are
supported.
THE CONVERT CONTEXT
pcre2_convert_context *pcre2_convert_context_create(
  pcre2_general_context *gcontext);
pcre2_convert_context *pcre2_convert_context_copy(
  pcre2_convert_context *cvcontext);
void pcre2_convert_context_free(pcre2_convert_context *cvcontext);
int pcre2_set_glob_escape(pcre2_convert_context *cvcontext,
  uint32_t escape_char);
int pcre2_set_glob_separator(pcre2_convert_context *cvcontext,
  uint32_t separator_char);
A convert context is used to hold parameters that affect the way that
pattern conversion works. Like all PCRE2 contexts, you need to use a
context only if you want to override the defaults. There are the
usual create, copy, and free functions. If custom memory management
functions are set in a general context that is passed to
pcre2_convert_context_create(), they are used for all memory
management within the conversion functions.
There are only two parameters in the convert context at present. Both
apply only to glob conversions. The escape character defaults to
grave accent under Windows, otherwise backslash. It can be set to
zero, meaning no escape character, or to any punctuation character
with a code point less than 256. The separator character defaults to
backslash under Windows, otherwise forward slash. It can be set to
forward slash, backslash, or dot.
The two setting functions return zero on success, or
PCRE2_ERROR_BADDATA if their second argument is invalid.
THE CONVERSION FUNCTION
int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length,
  uint32_t options, PCRE2_UCHAR **buffer,
  PCRE2_SIZE *blength, pcre2_convert_context *cvcontext);
void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern);
The first two arguments of pcre2_pattern_convert() define the foreign
pattern that is to be converted. The length may be given as
PCRE2_ZERO_TERMINATED. The options argument defines how the pattern
is to be processed. If the input is UTF, the PCRE2_CONVERT_UTF option
should be set. PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are
sure the input is valid. One or more of the glob options, or one of
the following POSIX options, must be set to define the type of
conversion that is required:
PCRE2_CONVERT_GLOB
PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
PCRE2_CONVERT_GLOB_NO_STARSTAR
PCRE2_CONVERT_POSIX_BASIC
PCRE2_CONVERT_POSIX_EXTENDED
Details of the conversions are given below. The buffer and blength
arguments define how the output is handled:
If buffer is NULL, the function just returns the length of the
converted pattern via blength. This is one less than the length of
buffer needed, because a terminating zero is always added to the
output.
If buffer points to a NULL pointer, an output buffer is obtained
using the allocator in the context or malloc() if no context is
supplied. A pointer to this buffer is placed in the variable to which
buffer points. When no longer needed the output buffer must be freed
by calling pcre2_converted_pattern_free(). If this function is called
with a NULL argument, it returns immediately without doing anything.
If buffer points to a non-NULL pointer, blength must be set to the
actual length of the buffer provided (in code units).
In all cases, after successful conversion, the variable pointed to by
blength is updated to the length actually used (in code units),
excluding the terminating zero that is always added.
If an error occurs, the length (via blength) is set to the offset
within the input pattern where the error was detected. Only gross
syntax errors are caught; there are plenty of errors that will get
passed on for pcre2_compile() to discover.
The return from pcre2_pattern_convert() is zero on success or a
non-zero PCRE2 error code. Note that PCRE2 error codes may be positive
or negative: pcre2_compile() uses mostly positive codes and
pcre2_match() negative ones; pcre2_pattern_convert() uses existing
codes of both kinds. A textual error message can be obtained by
calling pcre2_get_error_message().
CONVERTING GLOBS
Globs are used to match file names, and consequently have the concept
of a "path separator", which defaults to backslash under Windows and
forward slash otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards
* and ? are not permitted to match separator characters, but the
double-star (**) feature (which does match separators) is supported.
PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards
allowed to match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR
matches globs with the double-star feature disabled. These options
may be given together.
CONVERTING POSIX PATTERNS
POSIX defines two kinds of regular expression pattern: basic and
extended. These can be processed by setting
PCRE2_CONVERT_POSIX_BASIC or PCRE2_CONVERT_POSIX_EXTENDED,
respectively.
In POSIX patterns, backslash is not special in a character class.
Unmatched closing parentheses are treated as literals.
In basic patterns, ? + | {} and () must be escaped to be recognized
as metacharacters outside a character class. If the first character
in the pattern is * it is treated as a literal. ^ is a metacharacter
only at the start of a branch.
In extended patterns, a backslash not in a character class always
makes the next character literal, whatever it is. There are no
backreferences.
Note: POSIX mandates that the longest possible match at the first
matching position must be found. This is not what pcre2_match() does;
it yields the first match that is found. An application can use
pcre2_dfa_match() to find the longest match, but that does not
support backreferences (but then neither do POSIX extended patterns).
AUTHOR
Philip Hazel
Retired from University Computing Service
Cambridge, England.
REVISION
Last updated: 14 November 2023
Copyright (c) 1997-2018 University of Cambridge.
PCRE2 10.45 14 November 2023 PCRE2CONVERT(3)