LLnextgen(1) LLnextgen parser generator LLnextgen(1)

NAME

LLnextgen - an Extended-LL(1) parser generator

SYNOPSIS

LLnextgen [OPTIONS] [FILES]

DESCRIPTION

LLnextgen is a (partial) reimplementation of the LLgen ELL(1) parser
generator created by D. Grune and C.J.H. Jacobs (note: this is not
the same as the LLgen parser generator by Fischer and LeBlanc). It
takes an EBNF-like description of the grammar as input(s), and
produces a parser in C.

Input files are expected to end in .g. The output files will have .g
removed and .c and .h added. If the input file does not end in .g,
the extensions .c and .h will simply be added to the name of the
input file. Output files can also be given a different base name
using the option --base-name (see below).
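
As an illustration of the naming rule above, the following C sketch
derives an output name from an input name. The helper output_name is
hypothetical and not part of LLnextgen; it ignores --base-name,
--extensions and any directory handling.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of LLnextgen: build an output file name
   from an input name by dropping a trailing ".g" (if there is one) and
   appending "." plus the given extension.
   Returns 0 on success, -1 if the result does not fit in out. */
static int output_name(const char *input, const char *ext,
                       char *out, size_t size)
{
    size_t len = strlen(input);
    int n;

    /* Drop a trailing ".g"; otherwise keep the whole name. */
    if (len > 2 && strcmp(input + len - 2, ".g") == 0)
        len -= 2;

    n = snprintf(out, size, "%.*s.%s", (int)len, input, ext);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With this sketch, output_name("calc.g", "c", buf, sizeof buf) stores
calc.c in buf, and an input named grammar yields grammar.c.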

OPTIONS

LLnextgen accepts the following options:

-c, --max-compatibility
Set options required for maximum source-level compatibility.
This is different from running as LLgen, as all extensions are
still allowed. LLreissue and the prototypes in the header file
are still generated. This option turns on the
--llgen-arg-style, --llgen-escapes-only and
--llgen-output-style options.

-e, --warnings-as-errors
Treat warnings as errors.

-Enum, --error-limit=num
Set the maximum number of errors before LLnextgen aborts. If
num is set to 0, the error limit is set to infinity. This option
overrides the error limit specified in the grammar file.

-h[which], --help[=which]
Print out a help message, describing the options. The optional
which argument allows selection of which options to print.
which can be set to all, depend, error, or extra.

-V, --version
Print the program version and copyright information, and exit.

-v[level], --verbose[=level]
Increase (without explicit level) or set (with explicit level)
the verbosity level. LLnextgen uses this option differently
than LLgen. At level 1, LLnextgen will output traces of the
conflicts to standard error. At level 2, LLnextgen will also
write a file named LL.output with the rules containing
conflicts. At level 3, LLnextgen will include the entire
grammar in LL.output.
LLgen will write the LL.output file from level 1, but cannot
generate conflict traces. It also has an intermediate setting
between LLnextgen levels 2 and 3.

-w[warnings], --suppress-warnings[=warnings]
Suppress all or selected warnings. Available warnings are:
arg-separator, option-override, unbalanced-c, multiple-parser,
eofile, unused[:<identifier>], datatype and unused-retval. The
unused warning can suppress all warnings about unused tokens
and non-terminals, or can be used to suppress warnings about
specific tokens or non-terminals by adding a colon and a name.
For example, to suppress warning messages about FOO not being
used, use -wunused:FOO. Several comma separated warnings can
be specified with one option on the command line.
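
The structure of the -w argument can be illustrated with a small C
sketch. The function suppresses_unused is hypothetical and only models
the unused and unused:<identifier> entries of a comma separated list;
it is not taken from the LLnextgen sources and does not handle -w
without an argument.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the comma separated list arg (the
   text following -w) suppresses "unused" warnings for ident, else 0. */
static int suppresses_unused(const char *arg, const char *ident)
{
    size_t ident_len = strlen(ident);

    while (*arg != '\0') {
        size_t item_len = strcspn(arg, ",");

        /* A plain "unused" covers all tokens and non-terminals. */
        if (item_len == 6 && strncmp(arg, "unused", 6) == 0)
            return 1;

        /* "unused:NAME" covers only the named symbol. */
        if (item_len == 7 + ident_len &&
            strncmp(arg, "unused:", 7) == 0 &&
            strncmp(arg + 7, ident, ident_len) == 0)
            return 1;

        arg += item_len;
        if (*arg == ',')
            arg++;
    }
    return 0;
}
```

For the list eofile,unused:FOO this sketch reports that warnings about
FOO are suppressed, while warnings about any other name are not.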

--abort
Generate the LLabort function.

--base-name=name
Set the base name for the output files. Normally LLnextgen
uses the name of the first input file without any trailing .g
as the base name. This option can be used to override the
default. The files created will be name.c and name.h. This
option cannot be used in combination with
--llgen-output-style.

--depend[=modifiers]
Generate dependency information to be used by the make(1)
program. The modifiers can be used to change the make targets
(targets:<targets>, and extra-targets:<targets>) and the
output (file:<file>). The defaults are to use the output names
as they would be created by running with the same arguments as
targets, and to output to standard output. Using the targets
modifier, the list of targets can be specified manually. The
extra-targets modifier allows targets to be added to the
default list of targets. Finally, the phony modifier will add
phony targets for all dependencies to avoid make(1) problems
when removing or renaming dependencies. This is like the
gcc(1) -MP option.

--depend-cpp
Dump all top-level C code to standard output. This can be used to
generate dependency information for the generated files by
piping the output from LLnextgen through the C preprocessor
with the appropriate options.

--dump-lexer-wrapper
Write the lexer wrapper function to standard output, and exit.

--dump-llmessage
Write the default LLmessage function to standard output, and
exit.

--dump-tokens[=modifier]
Dump %token directives for unknown identifiers that match the
--token-pattern pattern. The default is to generate a single
%token directive with all the unknown identifiers separated by
commas. This default can be overridden by modifier. The
modifier separate produces a separate %token directive for
each identifier, while label produces a %label directive. The
text of the label will be the name of the identifier. If the
label modifier and the --lowercase-symbols option are both
specified the label will contain only lowercase characters.
Note: this option is not always available. It requires the
POSIX regex API. If the POSIX regex API is not available on
your platform, or the LLnextgen binary was compiled without
support for the API, you will not be able to use this option.

--extensions=list
Specify the extensions to be used for the generated files. The
list must be comma separated, and should not contain the .
before the extension. The first item in the list is the C
source file and the second item is the header file. You can
omit the extension for the C source file and only specify the
extension for the header file.

--generate-lexer-wrapper[=yes|no]
Indicate whether to generate a wrapper for the lexical
analyser. After detecting an error that must be repaired by
inserting a token, LLnextgen requires the lexical analyser to
return its last token again, so most lexical analysers need a
wrapper to accommodate LLnextgen. As the wrapper is identical
for almost every grammar, LLnextgen can provide one. Use
--dump-lexer-wrapper to see the code. If you do not specify this
option, LLnextgen will generate a warning to help remind you
that a wrapper is required.
If you do not want the automatically generated wrapper, you
should specify this option followed by =no.
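
The idea behind such a wrapper can be sketched in C as follows. This is
not the code LLnextgen generates (use --dump-lexer-wrapper for that);
all names here, including raw_lexer, wrapped_lexer, request_reissue and
NO_REISSUE, are made up for the example, and the token stream is a
plain array standing in for a real lexical analyser.

```c
#include <stddef.h>

/* Hypothetical marker meaning "no token is waiting to be reissued". */
#define NO_REISSUE (-2)

static const int *token_stream;     /* input for the stand-in lexer */
static size_t token_pos;
static int last_token;
static int reissue = NO_REISSUE;

/* Stand-in for a real lexical analyser: hands out the next token. */
static int raw_lexer(void)
{
    return token_stream[token_pos++];
}

/* Wrapper: return the remembered token if one was requested again,
   otherwise fetch a fresh token and remember it. */
static int wrapped_lexer(void)
{
    if (reissue != NO_REISSUE) {
        last_token = reissue;
        reissue = NO_REISSUE;
    } else {
        last_token = raw_lexer();
    }
    return last_token;
}

/* Called by the (imagined) parser after it repaired an error by
   inserting a token: the last real token must be returned once more. */
static void request_reissue(void)
{
    reissue = last_token;
}
```

In this sketch, calling request_reissue after reading a token makes the
next call to wrapped_lexer return that same token before the stream
continues.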

--generate-llmessage
Generate an LLmessage function. LLnextgen requires programs to
provide a function for informing the user about errors in the
input. When developing a parser, it is often desirable to have
a default LLmessage. The provided LLmessage is very simple
and should be replaced by a more elaborate one, once the
parser is beyond the first testing phase. Use --dump-llmessage
to see the code. This option automatically turns on
--generate-symbol-table.

--generate-symbol-table
Generate a symbol table. The symbol table will contain strings
for all tokens and character literals. By default, the symbol
table contains the token name as specified in the grammar. To
change the string, for both tokens and character literals, use
the %label directive.

--gettext[=macro,guard]
Add gettext support. A macro call is added around symbol table
entries generated from %label directives. The macro will
expand to the string itself. This is meant to allow
xgettext(1) to extract the strings. The default is N_, because
that is what most people use. A guard will be included such
that compilation without gettext is possible by not defining
the guard. The guard is set to USE_NLS by default.
Translations will be done automatically in LLgetSymbol in the
generated parser through a call to gettext.
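
A minimal C sketch of this arrangement is shown below, assuming the
default macro N_ and the default guard USE_NLS. The table
symbol_labels, the helper lookup_label and the TRANSLATE macro are
hypothetical and only illustrate the pattern; the generated parser
does the translation in LLgetSymbol.

```c
#include <stddef.h>

#ifdef USE_NLS
#include <libintl.h>
#define TRANSLATE(s) gettext(s)     /* translate at lookup time */
#else
#define TRANSLATE(s) (s)            /* no gettext: use the string as is */
#endif

/* Marks a string for extraction by xgettext(1); expands to the string
   itself, so it can be used in a static initialiser. */
#define N_(s) s

/* Hypothetical symbol table entries as they might come from %label. */
static const char *const symbol_labels[] = {
    N_("identifier"),
    N_("number")
};

/* Hypothetical lookup: returns the (possibly translated) label, or
   NULL if index is out of range. */
static const char *lookup_label(size_t index)
{
    size_t count = sizeof symbol_labels / sizeof symbol_labels[0];
    return index < count ? TRANSLATE(symbol_labels[index]) : NULL;
}
```

Compiled without USE_NLS defined, the sketch needs no gettext headers
or libraries and lookup_label simply returns the untranslated label.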

--keep-dir
Do not remove the directory component of the input file name when
creating the output file name. By default, outputs are created
in the current directory. This option will generate the
output in the directory of the input.
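
The default behaviour, removing the directory component, amounts to
something like the following C sketch. The helper strip_dir is
hypothetical and assumes / as the directory separator.

```c
#include <string.h>

/* Hypothetical helper: return the part of path after the last '/',
   or path itself if it contains no '/'. */
static const char *strip_dir(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}
```

Under this sketch an input of src/parser/calc.g leads to outputs based
on calc.g in the current directory, whereas --keep-dir would keep the
src/parser/ prefix.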

--llgen-arg-style
Use semicolons as argument separators in rule headers.
LLnextgen uses commas by default, as this is what ANSI C
does.

--llgen-escapes-only
Only allow the escape sequences defined by LLgen in character
literals. By default LLnextgen also allows \a, \v, \?, \",
and hexadecimal constants with \x.

--llgen-output-style
Generate one .c output per input, and the files Lpars.c and
Lpars.h, instead of one .c and one .h file based on the name
of the first input.

--lowercase-symbols
Convert the token names used for generating the symbol table
to lower case. This only applies to tokens for which no
%label directive has been specified.
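
The conversion itself is simple; a C sketch of it could look like
this. The function lowercase_symbol is hypothetical and works on a
single token name, leaving the decision whether a %label exists to the
caller.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy token into out, converted to lower case.
   At most size - 1 characters are copied; out is always terminated
   when size is non-zero. */
static void lowercase_symbol(const char *token, char *out, size_t size)
{
    size_t i;

    if (size == 0)
        return;
    for (i = 0; token[i] != '\0' && i + 1 < size; i++)
        out[i] = (char)tolower((unsigned char)token[i]);
    out[i] = '\0';
}
```

A token named IDENTIFIER would thus appear in the symbol table as
identifier.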

--no-allow-label-create
Do not allow the %label directive to create new tokens. Note
that this requires that the token being labelled is either a
character literal or a %token directive creating the named
token has preceded the %label directive.

--no-arg-count
Do not check argument counts for rules. By default, LLnextgen checks
whether a rule is used with the same number of arguments as it
is defined with. LLnextgen also checks that any rule for which a
%start directive is specified takes 0 arguments.

--no-eof-zero
Do not use 0 as end-of-file token. (f)lex(1) uses 0 as the
end-of-file token. Other lexical-analyser generators may use
-1, and may use 0 for something else (e.g. the nul character).

--no-init-llretval
Do not initialise LLretval with 0 bytes. Note that you have to
take care of initialisation of LLretval yourself when using
this option.

--no-line-directives
Do not generate #line directives in the output. This means all
errors will be reported relative to the output file. By
default LLnextgen generates #line directives to make the C
compiler generate errors relative to the LLnextgen input file.
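
The effect of a #line directive can be seen in the following C sketch.
The file name calc.g and the line number 42 are made up for the
example; they stand for the position of a piece of C code in a grammar
file.

```c
/* Pretend the statement below was copied from line 42 of a grammar
   file named calc.g. */
static int reported_line(void)
{
#line 42 "calc.g"
    return __LINE__;    /* the compiler counts this as line 42 */
}

/* The file name set by #line stays in effect for the code after it. */
static const char *reported_file(void)
{
    return __FILE__;
}
```

Any diagnostic for the return statement in reported_line would point
at calc.g, line 42, rather than at the generated C file. With
--no-line-directives no such directives are emitted and the compiler
reports positions in the generated file instead.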

--no-llreissue
Do not generate the LLreissue variable, which is used to
indicate when a token should be reissued by the lexical
analyser.

--no-prototypes-header
Do not generate prototypes for the parser and other functions
in the header file.

--not-only-reachable
Also analyse unreachable rules. LLnextgen by default does
not take unreachable rules into account when doing conflict
analysis, as these can cause spurious conflicts. However, if
the unreachable rules will be used in the future, one might
already want to be notified of problems with these rules.
LLgen by default does analyse unreachable rules.
Note: in the case where a rule is unreachable because the only
alternative of another reachable rule that mentions it is
never chosen (because of a %avoid directive), the rule is
still deemed reachable for the analysis. The only way to avoid
this behaviour is by doing the complete analysis twice, which
is an excessive amount of work to do for a very rare case.

--reentrant
Generate a reentrant parser. By default, LLnextgen generates
non-reentrant parsers. A reentrant parser can be called from
itself, but not from another thread. Use --thread-safe to
generate a thread-safe parser.
Note that when multiple parsers are specified in one grammar
(using multiple %start directives), and one of these parsers
calls another, either the --reentrant option or the
--thread-safe option is also required. If these parsers are only called
when none of the others is running, the option is not
necessary.
Use only in combination with a reentrant lexical analyser.

--show-dir
Show directory names of source files in error and warning
messages. These are usually omitted for readability, but may
sometimes be necessary for tracing errors.

--thread-safe
Generate a thread-safe parser. Thread-safe parsers can be run
in parallel in different threads of the same program. The
interface of a thread-safe parser is different from that of the
regular (and the reentrant) version. See the detailed manual
for more details.

--token-pattern=pattern
Specify a regular expression to match with unknown identifiers
used in the grammar. If an unknown identifier matches,
LLnextgen will generate a token declaration for the
identifier. This option is primarily implemented to aid in the
first stages of development, to allow for quick testing for
conflicts without having to specify all the tokens yet. A list
of tokens can be generated with the --dump-tokens option.
Note: this option is not always available. It requires the
POSIX regex API. If the POSIX regex API is not available on
your platform, or the LLnextgen binary was compiled without
support for the API, you will not be able to use this option.

By running LLnextgen using the name LLgen, LLnextgen goes into
LLgen-mode. This is implemented by turning off all default extra
functionality like LLreissue, and disallowing all extensions to the
LLgen language. When running as LLgen, LLnextgen accepts the
following options from LLgen:

-a Ignored. LLnextgen only generates ANSI C.

-hnum Ignored. LLnextgen leaves optimisation of jump tables entirely
up to the C-compiler.

-j[num]
Ignored. LLnextgen leaves optimisation of jump tables entirely
up to the C-compiler.

-l[num]
Ignored. LLnextgen leaves optimisation of jump tables entirely
up to the C-compiler.

-v Increase the verbosity level. See the description of the -v
option above for details.

-w Suppress all warnings.

-x Ignored. LLnextgen will only generate token sets in LL.output.
The extensive error-reporting mechanisms in LLnextgen make
this feature obsolete.

LLnextgen cannot create parsers with non-correcting error-recovery.
Therefore, using the -n or -s options will cause LLnextgen to print
an error message and exit.

COMPATIBILITY WITH LLGEN

At this time the basic LLgen functionality is implemented. This
includes everything apart from the extended user error-handling with
the %onerror directive and the non-correcting error-recovery.

Although I've tried to copy the behaviour of LLgen accurately, I have
implemented some aspects slightly differently. The following is a
list of the differences in behaviour between LLgen and LLnextgen:

o LLgen generates both K&R style C code and ANSI C code.
LLnextgen only supports generation of ANSI C code.

o There is a minor difference in the determination of the
default choices. LLnextgen simply chooses the first
production with the shortest possible terminal production,
while LLgen also takes the complexity in terms of
non-terminals and terms into account. There is also a minor
difference when there is more than one shortest alternative
and some of them are marked with %avoid. Neither difference is
very important, as the user can specify which alternative
should be the default, thereby circumventing the differences
in the algorithms.

o The default behaviour of generating one output C file per
input and Lpars.c and Lpars.h has been changed in favour of
generating one .c file and one .h file. The rationale given
for creating multiple output files in the first place was that
it would reduce the compilation time for the generated parser.
As computing power has become much more abundant, this
feature is no longer necessary, and the difficult interaction
with the make program makes it undesirable. The LLgen
behaviour is still supported through a command-line switch.

o In LLgen one could have a parser and a %first macro with the
same name. LLnextgen forbids this, as it leads to name
collisions in the new file naming scheme. For the old LLgen
file naming scheme it could also easily lead to name
collisions, although they could be circumvented by not
mentioning the parser in any of the C code in the .g files.

o LLgen names the labels it generates L_X, where X is a number.
LLnextgen names these LL_X.

o LLgen parsers are always reentrant. As this feature is not
used very often, LLnextgen parsers are non-reentrant unless
the option --reentrant is used.

Furthermore, LLnextgen has many extended features, for easier
development.

BUGS

If you think you have found a bug, please check that you are using
the latest version of LLnextgen [http://os.ghalkes.nl/LLnextgen].
When reporting bugs, please include a minimal grammar that
demonstrates the problem.

AUTHOR

G.P. Halkes <llnextgen@ghalkes.nl>

COPYRIGHT

Copyright (C) 2005-2008 G.P. Halkes
LLnextgen is licensed under the GNU General Public License version 3.
For more details on the license, see the file COPYING in the
documentation directory. On Un*x systems this is usually
/usr/share/doc/LLnextgen-0.5.5.