Tribblix: manual page: flawfinder.1

FLAWFINDER(1) Flawfinder FLAWFINDER(1)

NAME

flawfinder - lexically find potential security flaws ("hits") in
source code

SYNOPSIS

DESCRIPTION

Flawfinder searches through C/C++ source code looking for potential
security flaws. To run flawfinder, simply give flawfinder a list of
directories or files. For each directory given, all files that have
C/C++ filename extensions in that directory (and its subdirectories,
recursively) will be examined. Thus, for most projects, simply give
flawfinder the name of the source code's topmost directory (use ``.''
for the current directory), and flawfinder will examine all of the
project's C/C++ source code. Flawfinder does not require that you be
able to build your software, so it can be used even with incomplete
source code. If you only want to have changes reviewed, save a
unified diff of those changes (created by GNU "diff -u" or "svn diff"
or "git diff") in a patch file and use the --patch (-P) option.

Flawfinder will produce a list of ``hits'' (potential security flaws,
also called findings), sorted by risk; the riskiest hits are shown
first. The risk level is shown inside square brackets and varies
from 0, very little risk, to 5, great risk. This risk level depends
not only on the function, but on the values of the parameters of the
function. For example, constant strings are often less risky than
fully variable strings in many contexts, and in those contexts the
hit will have a lower risk level. Flawfinder knows about gettext (a
common library for internationalized programs) and will treat
constant strings passed through gettext as though they were constant
strings; this reduces the number of false hits in internationalized
programs. Flawfinder will do the same sort of thing with _T() and
_TEXT(), common Microsoft macros for handling internationalized
programs. Flawfinder correctly ignores text inside comments and
strings. Normally flawfinder shows all hits with a risk level of at
least 1, but you can use the --minlevel option to show only hits with
higher risk levels if you wish. Hit descriptions also note the
relevant Common Weakness Enumeration (CWE) identifier(s) in
parentheses, as discussed below. Flawfinder is officially CWE-
Compatible. Hit descriptions with "[MS-banned]" indicate functions
that are in the banned list of functions released by Microsoft; see
http://msdn.microsoft.com/en-us/library/bb288454.aspx for more
information about banned functions.

Not every hit (aka finding) is actually a security vulnerability, and
not every security vulnerability is necessarily found. Nevertheless,
flawfinder can be an aid in finding and removing security
vulnerabilities. A common way to use flawfinder is to first apply
flawfinder to a set of source code and examine the highest-risk
items. Then, use --inputs to examine the input locations, and check
to make sure that only legal and safe input values are accepted from
untrusted users.

Once you've audited a program, you can mark source code lines that
are actually fine but cause spurious warnings so that flawfinder will
stop complaining about them. To mark a line so that these warnings
are suppressed, put a specially-formatted comment either on the same
line (after the source code) or all by itself in the previous line.
The comment must have one of the two following formats:

+o // Flawfinder: ignore

+o /* Flawfinder: ignore */

For compatibility's sake, you can replace "Flawfinder:" with "ITS4:"
or "RATS:" in these specially-formatted comments. Since it's
possible that such lines are wrong, you can use the --neverignore
option, which causes flawfinder to never ignore any line no matter
what the comment directives say (more confusingly, --neverignore
ignores the ignores).

Flawfinder uses an internal database called the ``ruleset''; the
ruleset identifies functions that are common causes of security
flaws. The standard ruleset includes a large number of different
potential problems, including both general issues that can impact any
C/C++ program, as well as a number of specific Unix-like and Windows
functions that are especially problematic. The --listrules option
reports the list of current rules and their default risk levels. As
noted above, every potential security flaw found in a given source
code file (matching an entry in the ruleset) is called a ``hit,'' and
the set of hits found during any particular run of the program is
called the ``hitlist.'' Hitlists can be saved (using --savehitlist),
reloaded back for redisplay (using --loadhitlist), and you can show
only the hits that are different from another run (using
--diffhitlist).

Flawfinder is a simple tool, leading to some fundamental pros and
cons. Flawfinder works by doing simple lexical tokenization
(skipping comments and correctly tokenizing strings), looking for
token matches to the database (particularly to find function calls).
Flawfinder is thus similar to RATS and ITS4, which also use simple
lexical tokenization. Flawfinder then examines the text of the
function parameters to estimate risk. Unlike tools such as splint,
gcc's warning flags, and clang, flawfinder does not use or have
access to information about control flow, data flow, or data types
when searching for potential vulnerabilities or estimating the level
of risk. Thus, flawfinder will necessarily produce many false
positives for vulnerabilities and fail to report many
vulnerabilities. On the other hand, flawfinder can find
vulnerabilities in programs that cannot be built or cannot be linked.
It can often work with programs that cannot even be compiled (at
least by the reviewer's tools). Flawfinder also doesn't get as
confused by macro definitions and other oddities that more
sophisticated tools have trouble with. Flawfinder can also be useful
as a simple introduction to static analysis tools in general, since
it is easy to start using and easy to understand.

Any filename given on the command line will be examined (even if it
doesn't have a usual C/C++ filename extension); thus you can force
flawfinder to examine any specific files you desire. While searching
directories recursively, flawfinder only opens and examines regular
files that have C/C++ filename extensions. Flawfinder presumes that
files are C/C++ files if they have the extensions ".c", ".h", ".ec",
".ecp", ".pgc", ".C", ".cpp", ".CPP", ".cxx", ".c++", ".cc", ".CC",
".pcc", ".hpp", or ".H". The filename ``-'' means the standard
input. To prevent security problems, special files (such as device
special files and named pipes) are always skipped, and by default
symbolic links are skipped (the --allowlink option follows symbolic
links).

After the list of hits is a brief summary of the results (use -D to
remove this information). It will show the number of hits, lines
analyzed (as reported by wc -l), and the physical source lines of
code (SLOC) analyzed. A physical SLOC is a non-blank, non-comment
line. It will then show the number of hits at each level; note that
there will never be a hit at a level lower than minlevel (1 by
default). Thus, "[0] 0 [1] 9" means that at level 0 there were
0 hits reported, and at level 1 there were 9 hits reported. It will
next show the number of hits at a given level or larger (so level 3+
has the sum of the number of hits at level 3, 4, and 5). Thus, an
entry of "[0+] 37" shows that at level 0 or higher there were 37
hits (the 0+ entry will always be the same as the "hits" number
above). Hits per KSLOC is next shown; this is each of the "level or
higher" values multiplied by 1000 and divided by the physical SLOC.
If symlinks were skipped, the count of those is reported. If hits
were suppressed (using the "ignore" directive in source code comments
as described above), the number suppressed is reported. The minimum
risk level to be included in the report is displayed; by default this
is 1 (use --minlevel to change this). The summary ends with
important reminders: Not every hit is necessarily a security
vulnerability, and there may be other security vulnerabilities not
reported by the tool.

Flawfinder can easily integrate into a continuous integration system.
You might want to check out the --error-level option to help do that,
e.g., using --error-level=4 will cause an error to be returned if
flawfinder finds a vulnerability of level 4 or higher.

Flawfinder is released under the GNU GPL license version 2 or later
(GPLv2+).

Flawfinder works similarly to another program, ITS4, which is not
fully open source software (as defined in the Open Source Definition)
nor free software (as defined by the Free Software Foundation). The
author of Flawfinder has never seen ITS4's source code. Flawfinder
is similar in many ways to RATS, if you are familiar with RATS.

BRIEF TUTORIAL

Here's a brief example of how flawfinder might be used. Imagine that
you have the C/C++ source code for some program named xyzzy (which
you may or may not have written), and you're searching for security
vulnerabilities (so you can fix them before customers encounter the
vulnerabilities). For this tutorial, I'll assume that you're using a
Unix-like system, such as Linux, OpenBSD, or MacOS X.

If the source code is in a subdirectory named xyzzy, you would
probably start by opening a text window and using flawfinder's
default settings, to analyze the program and report a prioritized
list of potential security vulnerabilities (the ``less'' just makes
sure the results stay on the screen):
flawfinder xyzzy | less

At this point, you will see a large number of entries. Each entry
has a filename, a colon, a line number, a risk level in brackets
(where 5 is the most risky), a category, the name of the function,
and a description of why flawfinder thinks the line is a
vulnerability. Flawfinder normally sorts by risk level, showing the
riskiest items first; if you have limited time, it's probably best to
start working on the riskiest items and continue until you run out of
time. If you want to limit the display to risks with only a certain
risk level or higher, use the --minlevel option. If you're getting
an extraordinary number of false positives because variable names
look like dangerous function names, use the -F option to remove
reports about them. If you don't understand the error message,
please see documents such as the Secure Programming HOWTO
<https://dwheeler.com/secure-programs> at
https://dwheeler.com/secure-programs which provides more information
on writing secure programs.

Once you identify the problem and understand it, you can fix it.
Occasionally you may want to re-do the analysis, both because the
line numbers will change and to make sure that the new code doesn't
introduce yet a different vulnerability.

If you've determined that some line isn't really a problem, and
you're sure of it, you can insert just before or on the offending
line a comment like
/* Flawfinder: ignore */
to keep them from showing up in the output.

Once you've done that, you should go back and search for the
program's inputs, to make sure that the program strongly filters any
of its untrusted inputs. Flawfinder can identify many program inputs
by using the --inputs option, like this:
flawfinder --inputs xyzzy

Flawfinder can integrate well with text editors and integrated
development environments; see the examples for more information.

Flawfinder includes many other options, including ones to create HTML
versions of the output (useful for prettier displays) and OASIS
Static Analysis Results Interchange Format (SARIF) output. The next
section describes those options in more detail.

OPTIONS

Flawfinder has a number of options, which can be grouped into options
that control its own documentation, select input data, select which
hits to display, select the output format, and perform hitlist
management. The commonly-used flawfinder options support the
standard option syntax defined in the POSIX (Issue 7, 2013 Edition)
section ``Utility Conventions''. Flawfinder also supports the GNU
long options (double-dash options of form --option) as defined in the
GNU C Library Reference Manual ``Program Argument Syntax
Conventions'' and GNU Coding Standards ``Standards for Command Line
Interfaces''. Long option arguments can be provided as
``--name=value'' or ``-name value''. All options can be accessed
using the more readable GNU long option conventions; some less
commonly used options can only be accessed using long option
conventions.

Documentation

--help

-h Show usage (help) information.

--version Shows (just) the version number and exits.

--listrules List the terms (tokens) that trigger further examination,
their default risk level, and the default warning
(including the CWE identifier(s), if applicable), all
tab-separated. The terms are primarily names of
potentially-dangerous functions. Note that the reported
risk level and warning for some specific code may be
different than the default, depending on how the term is
used. Combine with -D if you do not want the usual
header. Flawfinder version 1.29 changed the separator
from spaces to tabs, and added the default warning field.

Selecting Input Data

--allowlink Allow the use of symbolic links; normally symbolic links
are skipped. Don't use this option if you're analyzing
code by others; attackers could do many things to cause
problems for an analysis with this option enabled. For
example, an attacker could insert symbolic links to files
such as /etc/passwd (leaking information about the file)
or create a circular loop, which would cause flawfinder
to run ``forever''. Another problem with enabling this
option is that if the same file is referenced multiple
times using symbolic links, it will be analyzed multiple
times (and thus reported multiple times). Note that
flawfinder already includes some protection against
symbolic links to special file types such as device file
types (e.g., /dev/zero or C:\mystuff\com1). Note that
for flawfinder version 1.01 and before, this was the
default.

--followdotdir
Enter directories whose names begin with ".". Normally
such directories are ignored, since they normally include
version control private data (such as .git/ or .svn/),
build metadata (such as .makepp), configuration
information, and so on.

--nolink Ignored. Historically this disabled following symbolic
links; this behavior is now the default.

--patch=patchfile

-P patchfile
Examine the selected files or directories, but only
report hits in lines that are added or modified as
described in the given patch file. The patch file must
be in a recognized unified diff format (e.g., the output
of GNU "diff -u old new", "svn diff", or "git diff
[commit]"). Flawfinder assumes that the patch has
already been applied to the files. The patch file can
also include changes to irrelevant files (they will
simply be ignored). The line numbers given in the patch
file are used to determine which lines were changed, so
if you have modified the files since the patch file was
created, regenerate the patch file first. Beware that
the file names of the new files given in the patch file
must match exactly, including upper/lower case, path
prefix, and directory separator (\ vs. /). Only unified
diff format is accepted (GNU diff, svn diff, and git diff
output is okay); if you have a different format, again
regenerate it first. Only hits that occur on resultant
changed lines, or immediately above and below them, are
reported. This option implies --neverignore. Warning:
Do not pass a patch file without the -P, because
flawfinder will then try to treat the file as a source
file. This will often work, but the line numbers will be
relative to the beginning of the patch file, not the
positions in the source code. Note that you must also
provide the actual files to analyze, and not just the
patch file; when using -P files are only reported if they
are both listed in the patch and also listed (directly or
indirectly) in the list of files to analyze.

Selecting Hits to Display

--inputs

-I Show only functions that obtain data from outside the program;
this also sets minlevel to 0.

--minlevel=X

-m X Set minimum risk level to X for inclusion in hitlist. This
can be from 0 (``no risk'') to 5 (``maximum risk''); the
default is 1.

--falsepositive

-F Do not include hits that are likely to be false positives.
Currently, this means that function names are ignored if
they're not followed by "(", and that declarations of
character arrays aren't noted. Thus, if you have use a
variable named "access" everywhere, this will eliminate
references to this ordinary variable. This isn't the default,
because this also increases the likelihood of missing
important hits; in particular, function names in #define
clauses and calls through function pointers will be missed.

--neverignore

-n Never ignore security issues, even if they have an ``ignore''
directive in a comment.

--regexp=PATTERN

-e PATTERN
Only report hits with text that matches the regular expression
pattern PATTERN. For example, to only report hits containing
the text "CWE-120", use ``--regex CWE-120''. These option
flag names are the same as grep.

Selecting Output Format

--columns

-C Show the column number (as well as the file name and line
number) of each hit; this is shown after the line number
by adding a colon and the column number in the line (the
first character in a line is column number 1). This is
useful for editors that can jump to specific columns, or
for integrating with other tools (such as those to
further filter out false positives).

--context

-c Show context, i.e., the line having the "hit"/potential
flaw. By default the line is shown immediately after the
warning.

--csv Generate output in comma-separated-value (CSV) format.
This is the recommended format for sending to other tools
for processing. It will always generate a header row,
followed by 0 or more data rows (one data row for each
hit). Selecting this option automatically enables
--quiet and --dataonly. The headers are mostly self-
explanatory. "File" is the filename, "Line" is the line
number, "Column" is the column (starting from 1), "Level"
is the risk level (0-5, 5 is riskiest), "Category" is the
general flawfinder category, "Name" is the name of the
triggering rule, "Warning" is text explaining why it is a
hit (finding), "Suggestion" is text suggesting how it
might be fixed, "Note" is other explanatory notes, "CWEs"
is the list of one or more CWEs, "Context" is the source
code line triggering the hit, and "Fingerprint" is the
SHA-256 hash of the context once its leading and trailing
whitespace have been removed (the fingerprint may help
detect and eliminate later duplications). If you use
Python3, the hash is of the context when encoded as
UTF-8.

--dataonly

-D Don't display the header and footer. Use this along with
--quiet to see just the data itself.

--html

-H Format the output as HTML instead of as simple text.

--immediate

-i Immediately display hits (don't just wait until the end).

--sarif Produce output in the OASIS Static Analysis Results
Interchange Format (SARIF) format (a JSON-based format).
The goals of the SARIF format, as explained in version
2.1.0 (27 March 2020) of its specification, include being
able to "comprehensively capture the range of data
produced by commonly used static analysis tools." SARIF
output identifies the tool name as "Flawfinder". The
flawfinder levels 0 through 5 are mapped to SARIF rank
(by dividing by 5), SARIF level, and the default viewer
action as follows:

Flawfinder 0: SARIF rank 0.0, SARIF level note, Does not
display by default

Flawfinder 1: SARIF rank 0.2, SARIF level note, Does not
display by default

Flawfinder 2: SARIF rank 0.4, SARIF level note, Does not
display by default

Flawfinder 3: SARIF rank 0.6, SARIF level warning,
Displays by default, does not break build / other
processes

Flawfinder 4: SARIF rank 0.8, SARIF level error, Displays
by default, breaks build/ other processes

Flawfinder 5: SARIF rank 1.0, SARIF level error, Displays
by default, breaks build/ other processes

A big thanks to Yong Yan implementing SARIF output
generation for flawfinder! For more about the SARIF
format, see: https://www.oasis-
open.org/committees/tc_home.php?wg_abbrev=sarif

--singleline

-S Display as single line of text output for each hit.
Useful for interacting with compilation tools.

--omittime Omit timing information. This is useful for regression
tests of flawfinder itself, so that the output doesn't
vary depending on how long the analysis takes.

--quiet

-Q Don't display status information (i.e., which files are
being examined) while the analysis is going on.

--error-level=LEVEL
Return a nonzero (false) error code if there is at least
one hit of LEVEL or higher. If a diffhitlist is
provided, hits noted in it are ignored. This option can
be useful within a continuous integration script,
especially if you mark known-okay lines as "flawfinder:
ignore". Usually you want level to be fairly high, such
as 4 or 5. By default, flawfinder returns 0 (true) on a
successful run.

Hitlist Management

--savehitlist=F
Save all resulting hits (the "hitlist") to F.

--loadhitlist=F
Load the hitlist from F instead of analyzing source
programs. Warning: Do not load hitlists from untrusted
sources (for security reasons). These are internally
implemented using Python's "pickle" facility, which
trusts the input. Note that stored hitlists often cannot
be read when using an older version of Python, in
particular, if savehitlist was used but flawfinder was
run using Python 3, the hitlist can't be loaded by
running flawfinder with Python 2.

--diffhitlist=F
Show only hits (loaded or analyzed) not in F. F was
presumably created previously using --savehitlist.
Warning: Do not diff hitlists from untrusted sources (for
security reasons). If the --loadhitlist option is not
provided, this will show the hits in the analyzed source
code files that were not previously stored in F. If used
along with --loadhitlist, this will show the hits in the
loaded hitlist not in F. The difference algorithm is
conservative; hits are only considered the ``same'' if
they have the same filename, line number, column
position, function name, and risk level.

Character Encoding Errors

Flawfinder uses the character encoding rules set by Python.
Sometimes source code does not perfectly follow some encoding rules.
If you run flawfinder with Python 2 these non-conformities often do
not impact processing in practice.

However, if you run flawfinder with Python 3, this can be a problem.
Python 3 developers wants the world to always use encodings perfectly
correctly, everywhere, and in general wants everyone to only use
UTF-8. UTF-8 is a great encoding, and it is very popular, but the
world often doesn't care what the Python 3 developers want.

When running flawfinder using Python 3, the program will crash hard
if any source file has any non-conforming text. It will do this even
if the non-conforming text is in comments or strings (where it often
doesn't matter). Python 3 fails to provide useful built-ins to deal
with the messiness of the real world, so it's non-trivial to deal
with this problem without depending on external libraries (which
we're trying to avoid).

A symptom of this problem is if you run flawfinder and you see an
error message like this:

Error: encoding error in ,1.c

'utf-8' codec can't decode byte 0xff in position 45: invalid start
byte

What you are seeing is the result of an internal UnicodeDecodeError.

If this happens to you, there are several options:

Option #1 (special case): if your system normally uses an encoding
other than UTF-8, is properly set up to use that encoding (using
LC_ALL and maybe LC_CTYPE), and the input files are in that non-UTF-8
encoding, it may be that Python3 is (incorrectly) ignoring your
configuration. In that case, simply tell Python3 to use your
configuration by setting the environment variable PYTHONUTF8=0, e.g.,
run flawfinder as: "PYTHONUTF8=0 python3 flawfinder ...".

Option #2 (special case): If you know what the encoding of the files
is, you can force use of that encoding. E.g., if the encoding is
BLAH, run flawfinder as: "PYTHONUTF8=0 LC_ALL=C.BLAH python3
flawfinder ...". You can replace "C" after LC_ALL= with your real
language locale (e.g., "en_US").

Option #3: If you don't know what the encoding is, or the encoding is
inconsistent (e.g., the common case of UTF-8 files with some
characters encoded using Windows-1252 instead), then you can force
the system to use the ISO-8859-1 (Latin-1) encoding in which all
bytes are allowed. If the inconsistencies are only in comments and
strings, and the underlying character set is "close enough" to ASCII,
this can get you going in a hurry. You can do this by running:
"PYTHONUTF8=0 LC_ALL=C.ISO-8859-1 python3 flawfinder ...". In some
cases you may not need the "PYTHONUTF8=0". You may be able to
replace "C" after LC_ALL= with your real language locale (e.g.,
"en_US").

Option #4: Convert the encoding of the files to be analyzed so that
it's a single encoding - it's highly recommended to convert to UTF-8.
For example, the system program "iconv" or the Python program cvt2utf
can be used to convert encodings. (You can install cvt2utf with "pip
install cvtutf"). This works well if some files have one encoding,
and some have another, but they are consistent within a single file.
If the files have encoding errors, you'll have to fix them.

Option #5: Run flawfinder using Python 2 instead of Python 3. E.g.,
"python2 flawfinder ...".

To be clear: I strongly recommend using the UTF-8 encoding for all
source code, and use continuous integration tests to ensure that the
source code is always valid UTF-8. If you do that, many problems
disappear. But in the real world this is not always the situation.
Hopefully this information will help you deal with real-world
encoding problems.

EXAMPLES

Here are various examples of how to invoke flawfinder. The first
examples show various simple command-line options. Flawfinder is
designed to work well with text editors and integrated development
environments, so the next sections show how to integrate flawfinder
into vim and emacs.

Simple command-line options
flawfinder /usr/src/linux-3.16
Examine all the C/C++ files in the directory
/usr/src/linux-3.16 and all its subdirectories
(recursively), reporting on all hits found. By default
flawfinder will skip symbolic links and directories with
names that start with a period.

flawfinder --error-level=4 .
Examine all the C/C++ files in the current directory and
its subdirectories (recursively); return an error code if
there are vulnerabilities level 4 and up (the two highest
risk levels). This is a plausible way to use flawfinder
in a continuous integration system.

flawfinder --minlevel=4 .
Examine all the C/C++ files in the current directory and
its subdirectories (recursively); only report
vulnerabilities level 4 and up (the two highest risk
levels).

flawfinder --inputs mydir
Examine all the C/C++ files in mydir and its
subdirectories (recursively), and report functions that
take inputs (so that you can ensure that they filter the
inputs appropriately).

flawfinder --neverignore mydir
Examine all the C/C++ files in the directory mydir and
its subdirectories, including even the hits marked for
ignoring in the code comments.

flawfinder --csv .
Examine the current directory down (recursively), and
report all hits in CSV format. This is the recommended
form if you want to further process flawfinder output
using other tools (such as data correlation tools).

flawfinder -QD mydir
Examine mydir and report only the actual results
(removing the header and footer of the output). This
form may be useful if the output will be piped into other
tools for further analysis, though CSV format is probably
the better choice in that case. The -C (--columns) and
-S (--singleline) options can also be useful if you're
piping the data into other tools.

flawfinder -QDSC mydir
Examine mydir, reporting only the actual results (no
header or footer). Each hit is reported on one line, and
column numbers are reported. This can be a useful
command if you are feeding flawfinder output to other
tools.

flawfinder --quiet --html --context mydir > results.html
Examine all the C/C++ files in the directory mydir and
its subdirectories, and produce an HTML formatted version
of the results. Source code management systems (such as
SourceForge and Savannah) might use a command like this.

flawfinder --quiet --savehitlist saved.hits *.[ch]
Examine all .c and .h files in the current directory.
Don't report on the status of processing, and save the
resulting hitlist (the set of all hits) in the file
saved.hits.

flawfinder --diffhitlist saved.hits *.[ch]
Examine all .c and .h files in the current directory, and
show any hits that weren't already in the file
saved.hits. This can be used to show only the ``new''
vulnerabilities in a modified program, if saved.hits was
created from the older version of the program being
analyzed.

flawfinder --patch recent.patch .
Examine the current directory recursively, but only
report lines that were changed or added in the already-
applied patchfile named recent.patch.

flawfinder --regex "CWE-120|CWE-126" src/
Examine directory src recursively, but only report hits
where CWE-120 or CWE-126 apply.

Invoking from vim

The text editor vim includes a "quickfix" mechanism that works well
with flawfinder, so that you can easily view the warning messages and
jump to the relevant source code.

First, you need to invoke flawfinder to create a list of hits, and
there are two ways to do this. The first way is to start flawfinder
first, and then (using its output) invoke vim. The second way is to
start (or continue to run) vim, and then invoke flawfinder (typically
from inside vim).

For the first way, run flawfinder and store its output in some
FLAWFILE (say "flawfile"), then invoke vim using its -q option, like
this: "vim -q flawfile". The second way (starting flawfinder after
starting vim) can be done a legion of ways. One is to invoke
flawfinder using a shell command, ":!flawfinder-command > FLAWFILE",
then follow that with the command ":cf FLAWFILE". Another way is to
store the flawfinder command in your makefile (as, say, a
pseudocommand like "flaw"), and then run ":make flaw".

In all these cases you need a command for flawfinder to run. A
plausible command, which places each hit in its own line (-S) and
removes headers and footers that would confuse it, is:

flawfinder -SQD .

You can now use various editing commands to view the results. The
command ":cn" displays the next hit; ":cN" displays the previous hit,
and ":cr" rewinds back to the first hit. ":copen" will open a window
to show the current list of hits, called the "quickfix window";
":cclose" will close the quickfix window. If the buffer in the used
window has changed, and the error is in another file, jumping to the
error will fail. You have to make sure the window contains a buffer
which can be abandoned before trying to jump to a new file, say by
saving the file; this prevents accidental data loss.

Invoking from emacs

The text editor / operating system emacs includes "grep mode" and
"compile mode" mechanisms that work well with flawfinder, making it
easy to view warning messages, jump to the relevant source code, and
fix any problems you find.

First, you need to invoke flawfinder to create a list of warning
messages. You can use "grep mode" or "compile mode" to create this
list. Often "grep mode" is more convenient; it leaves compile mode
untouched so you can easily recompile once you've changed something.
However, if you want to jump to the exact column position of a hit,
compile mode may be more convenient because emacs can use the column
output of flawfinder to directly jump to the right location without
any special configuration.

To use grep mode, enter the command "M-x grep" and then enter the
needed flawfinder command. To use compile mode, enter the command
"M-x compile" and enter the needed flawfinder command. This is a
meta-key command, so you'll need to use the meta key for your
keyboard (this is usually the ESC key). As with all emacs commands,
you'll need to press RETURN after typing "grep" or "compile". So on
many systems, the grep mode is invoked by typing ESC x g r e p
RETURN.

You then need to enter a command, removing whatever was there before
if necessary. A plausible command is:

flawfinder -SQDC .

This command makes every hit report a single line, which is much
easier for tools to handle. The quiet and dataonly options remove
the other status information not needed for use inside emacs. The
trailing period means that the current directory and all descendents
are searched for C/C++ code, and analyzed for flaws.

Once you've invoked flawfinder, you can use emacs to jump around in
its results. The command C-x ` (Control-x backtick) visits the
source code location for the next warning message. C-u C-x `
(control-u control-x backtick) restarts from the beginning. You can
visit the source for any particular error message by moving to that
hit message in the *compilation* buffer or *grep* buffer and typing
the return key. (Technical note: in the compilation buffer, this
invokes compile-goto-error.) You can also click the Mouse-2 button
on the error message (you don't need to switch to the *compilation*
buffer first).

If you want to use grep mode to jump to specific columns of a hit,
you'll need to specially configure emacs to do this. To do this,
modify the emacs variable "grep-regexp-alist". This variable tells
Emacs how to parse output of a "grep" command, similar to the
variable "compilation-error-regexp-alist" which lists various formats
of compilation error messages.

Invoking from Integrated Development Environments (IDEs)
For (other) IDEs, consult your IDE's set of plug-ins.

COMMON WEAKNESS ENUMERATION (CWE)
The Common Weakness Enumeration (CWE) is ``a formal list or
dictionary of common software weaknesses that can occur in software's
architecture, design, code or implementation that can lead to
exploitable security vulnerabilities... created to serve as a common
language for describing software security weaknesses''
(https://cwe.mitre.org/about/faq.html). For more information on
CWEs, see https://cwe.mitre.org.

Flawfinder supports the CWE and is officially CWE-Compatible. Hit
descriptions typically include a relevant Common Weakness Enumeration
(CWE) identifier in parentheses where there is known to be a relevant
CWE. For example, many of the buffer-related hits mention CWE-120,
the CWE identifier for ``buffer copy without checking size of input''
(aka ``Classic Buffer Overflow''). In a few cases more than one CWE
identifier may be listed. The HTML report also includes hypertext
links to the CWE definitions hosted at MITRE. In this way,
flawfinder is designed to meet the CWE-Output requirement.

In some cases there are CWE mapping and usage challenges; here is how
flawfinder handles them. If the same entry maps to multiple CWEs
simultaneously, all the CWE mappings are listed as separated by
commas. This often occurs with CWE-20, Improper Input Validation;
thus the report "CWE-676, CWE-120" maps to two CWEs. In addition,
flawfinder provides additional information for those who are are
interested in the CWE/SANS top 25 list 2011
(https://cwe.mitre.org/top25/) when mappings are not directly to
them. Many people will want to search for specific CWEs in this top
25 list, such as CWE-120 (classic buffer overflow). The challenge is
that some flawfinder hits map to a more general CWE that would
include a top 25 item, while in some other cases hits map to a more
specific vulnerability that is only a subset of a top 25 item. To
resolve this, in some cases flawfinder will list a sequence of CWEs
in the format "more-general/more-specific", where the CWE actually
being mapped is followed by a "!". This is always done whenever a
flaw is not mapped directly to a top 25 CWE, but the mapping is
related to such a CWE. So "CWE-119!/CWE-120" means that the
vulnerability is mapped to CWE-119 and that CWE-120 is a subset of
CWE-119. In contrast, "CWE-362/CWE-367!" means that the hit is
mapped to CWE-367, a subset of CWE-362. Note that this is a subtle
syntax change from flawfinder version 1.31; in flawfinder version
1.31, the form "more-general:more-specific" meant what is now listed
as "more-general!/more-specific", while "more-general/more-specific"
meant "more-general/more-specific!". Tools can handle both the
version 1.31 and the current format, if they wish, by noting that the
older format did not use "!" at all (and thus this is easy to
distinguish). These mapping mechanisms simplify searching for
certain CWEs.

CWE version 2.7 (released June 23, 2014) was used for the mapping.
The current CWE mappings select the most specific CWE the tool can
determine. In theory, most CWE security elements
(signatures/patterns that the tool searches for) could theoretically
be mapped to CWE-676 (Use of Potentially Dangerous Function), but
such a mapping would not be useful. Thus, more specific mappings
were preferred where one could be found. Flawfinder is a lexical
analysis tool; as a result, it is impractical for it to be more
specific than the mappings currently implemented. This also means
that it is unlikely to need much updating for map currency; it simply
doesn't have enough information to refine to a detailed CWE level
that CWE changes would typically affect. The list of CWE identifiers
was generated automatically using "make show-cwes", so there is
confidence that this list is correct. Please report CWE mapping
problems as bugs if you find any.

Flawfinder may fail to find a vulnerability, even if flawfinder
covers one of these CWE weaknesses. That said, flawfinder does find
vulnerabilities listed by the CWEs it covers, and it will not report
lines without those vulnerabilities in many cases. Thus, as required
for any tool intending to be CWE compatible, flawfinder has a rate of
false positives less than 100% and a rate of false negatives less
than 100%. Flawfinder almost always reports whenever it finds a
match to a CWE security element (a signature/pattern as defined in
its database), though certain obscure constructs can cause it to fail
(see BUGS below).

Flawfinder can report on the following CWEs (these are the CWEs that
flawfinder covers; ``*'' marks those in the CWE/SANS top 25 list):

+o CWE-20: Improper Input Validation

+o CWE-22: Improper Limitation of a Pathname to a Restricted Directory
(``Path Traversal'')

+o CWE-78: Improper Neutralization of Special Elements used in an OS
Command (``OS Command Injection'')*

+o CWE-119: Improper Restriction of Operations within the Bounds of a
Memory Buffer (a parent of CWE-120*, so this is shown as
CWE-119!/CWE-120)

+o CWE-120: Buffer Copy without Checking Size of Input (``Classic
Buffer Overflow'')*

+o CWE-126: Buffer Over-read

+o CWE-134: Uncontrolled Format String*

+o CWE-190: Integer Overflow or Wraparound*

+o CWE-250: Execution with Unnecessary Privileges

+o CWE-327: Use of a Broken or Risky Cryptographic Algorithm*

+o CWE-362: Concurrent Execution using Shared Resource with Improper
Synchronization (``Race Condition'')

+o CWE-377: Insecure Temporary File

+o CWE-676: Use of Potentially Dangerous Function*

+o CWE-732: Incorrect Permission Assignment for Critical Resource*

+o CWE-785: Use of Path Manipulation Function without Maximum-sized
Buffer (child of CWE-120*, so this is shown as CWE-120/CWE-785)

+o CWE-807: Reliance on Untrusted Inputs in a Security Decision*

+o CWE-829: Inclusion of Functionality from Untrusted Control Sphere*

You can select a specific subset of CWEs to report by using the
``--regex'' (-e) option. This option accepts a regular expression,
so you can select multiple CWEs, e.g., ``--regex "CWE-120|CWE-126"''.
If you select multiple CWEs with ``|'' on a command line you will
typically need to quote the parameters (since an unquoted ``|'' is
the pipe symbol). Flawfinder is designed to meet the CWE-Searchable
requirement.

If your goal is to report a subset of CWEs that are listed in a file,
that can be achieved on a Unix-like system using the ``--regex'' aka
``-e'' option. The file must be in regular expression format. For
example, ``flawfinder -e $(cat file1)'' would report only hits that
matched the pattern in ``file1''. If file1 contained
``CWE-120|CWE-126'' it would only report hits matching those CWEs.

A list of all CWE security elements (the signatures/patterns that
flawfinder looks for) can be found by using the ``--listrules''
option. Each line lists the signature token (typically a function
name) that may lead to a hit, the default risk level, and the default
warning (which includes the default CWE identifier). For most
purposes this is also enough if you want to see what CWE security
elements map to which CWEs, or the reverse. For example, to see the
most of the signatures (function names) that map to CWE-327, without
seeing the default risk level or detailed warning text, run
``flawfinder --listrules | grep CWE-327 | cut -f1''. You can also
see the tokens without a CWE mapping this way by running ``flawfinder
-D --listrules | grep -v CWE-''. However, while --listrules lists
all CWE security elements, it only lists the default mappings from
CWE security elements to CWE identifiers. It does not include the
refinements that flawfinder applies (e.g., by examining function
parameters).

If you want a detailed and exact mapping between the CWE security
elements and CWE identifiers, the flawfinder source code (included in
the distribution) is the best place for that information. This
detailed information is primarily of interest to those few people who
are trying to refine the CWE mappings of flawfinder or refine CWE in
general. The source code documents the mapping between the security
elements to the respective CWE identifiers, and is a single Python
file. The ``c_rules'' dataset defines most rules, with reference to
a function that may make further refinements. You can search the
dataset for function names to see what CWE it generates by default;
if first parameter is not ``normal'' then that is the name of a
refining Python method that may select different CWEs (depending on
additional information). Conversely, you can search for ``CWE-
number'' and find what security elements (signatures or patterns)
refer to that CWE identifier. For most people, this is much more
than they need; most people just want to scan their source code to
quickly find problems.

SECURITY

The whole point of this tool is to help find vulnerabilities so they
can be fixed. However, developers and reviewers must know how to
develop secure software to use this tool, because otherwise, a fool
with a tool is still a fool. My book at https://dwheeler.com/secure-
programs may help.

This tool should be, at most, a small part of a larger software
development process designed to eliminate or reduce the impact of
vulnerabilities. Developers and reviewers need know how to develop
secure software, and they need to apply this knowledge to reduce the
risks of vulnerabilities in the first place.

Different vulnerability-finding tools tend to find different
vulnerabilities. Thus, you are best off using human review and a
variety of tools. This tool can help find some vulnerabilities, but
by no means all.

You should always analyze a copy of the source program being
analyzed, not a directory that can be modified by a developer while
flawfinder is performing the analysis. This is especially true if
you don't necessily trust a developer of the program being analyzed.
If an attacker has control over the files while you're analyzing
them, the attacker could move files around or change their contents
to prevent the exposure of a security problem (or create the
impression of a problem where there is none). If you're worried
about malicious programmers you should do this anyway, because after
analysis you'll need to verify that the code eventually run is the
code you analyzed. Also, do not use the --allowlink option in such
cases; attackers could create malicious symbolic links to files
outside of their source code area (such as /etc/passwd).

Source code management systems (like GitHub, SourceForge, and
Savannah) definitely fall into this category; if you're maintaining
one of those systems, first copy or extract the files into a separate
directory (that can't be controlled by attackers) before running
flawfinder or any other code analysis tool.

Note that flawfinder only opens regular files, directories, and (if
requested) symbolic links; it will never open other kinds of files,
even if a symbolic link is made to them. This counters attackers who
insert unusual file types into the source code. However, this only
works if the filesystem being analyzed can't be modified by an
attacker during the analysis, as recommended above. This protection
also doesn't work on Cygwin platforms, unfortunately.

Cygwin systems (Unix emulation on top of Windows) have an additional
problem if flawfinder is used to analyze programs that the analyst
cannot trust. The problem is due to a design flaw in Windows (that
it inherits from MS-DOS). On Windows and MS-DOS, certain filenames
(e.g., ``com1'') are automatically treated by the operating system as
the names of peripherals, and this is true even when a full pathname
is given. Yes, Windows and MS-DOS really are designed this badly.
Flawfinder deals with this by checking what a filesystem object is,
and then only opening directories and regular files (and symlinks if
enabled). Unfortunately, this doesn't work on Cygwin; on at least
some versions of Cygwin on some versions of Windows, merely trying to
determine if a file is a device type can cause the program to hang.
A workaround is to delete or rename any filenames that are
interpreted as device names before performing the analysis. These
so-called ``reserved names'' are CON, PRN, AUX, CLOCK$, NUL,
COM1-COM9, and LPT1-LPT9, optionally followed by an extension (e.g.,
``com1.txt''), in any directory, and in any case (Windows is case-
insensitive).

Do not load or diff hitlists from untrusted sources. They are
implemented using the Python pickle module, and the pickle module is
not intended to be secure against erroneous or maliciously
constructed data. Stored hitlists are intended for later use by the
same user who created the hitlist; in that context this restriction
is not a problem.

BUGS

Flawfinder is based on simple text pattern matching, which is part of
its fundamental design and not easily changed. This design approach
leads to a number of fundamental limitations, e.g., a higher false
positive rate, and is the underlying cause of most of the bugs listed
here. On the positive side, flawfinder doesn't get confused by many
complicated preprocessor sequences that other tools sometimes choke
on; flawfinder can often handle code that cannot link, and sometimes
cannot even compile or build.

Flawfinder is currently limited to C/C++. In addition, when
analyzing C++ it focuses primarily on the C subset of C++. For
example, flawfinder does not report on expressions like cin >>
charbuf, where charbuf is a char array. That is because flawfinder
doesn't have type information, and ">>" is safe with many other
types; reporting on all ">>" would lead to too many false positives.
That said, it's designed so that adding support for other languages
should be easy where its text-based approach can usefully apply.

Flawfinder can be fooled by user-defined functions or method names
that happen to be the same as those defined as ``hits'' in its
database, and will often trigger on definitions (as well as uses) of
functions with the same name. This is typically not a problem for C
code. In C code, a function with the same name as a common library
routine name often indicates that the developer is simply rewriting a
common library routine with the same interface, say for portability's
sake. C programs tend to avoid reusing the same name for a different
purpose (since in C function names are global by default). There are
reasonable odds that these rewritten routines will be vulnerable to
the same kinds of misuse, and thus, reusing these rules is a
reasonable approach. However, this can be a much more serious
problem in C++ code which heavily uses classes and namespaces, since
the same method name may have many different meanings. The
--falsepositive option can help somewhat in this case. If this is a
serious problem, feel free to modify the program, or process the
flawfinder output through other tools to remove the false positives.

Preprocessor commands embedded in the middle of a parameter list of a
call can cause problems in parsing, in particular, if a string is
opened and then closed multiple times using an #ifdef .. #else
construct, flawfinder gets confused. Such constructs are bad style,
and will confuse many other tools too. If you must analyze such
files, rewrite those lines. Thankfully, these are quite rare.

Flawfinder reports vulnerabilities regardless of the parameters of
"#if" or "#ifdef". A construct "#if VALUE" will often have VALUE of
0 in some cases, and non-zero in others. Similarly, "#ifdef VALUE"
will have VALUE defined in some cases, and not defined in others.
Flawfinder reports in all cases, which means that flawfinder has a
chance of reporting vulnerabilities in all alternatives. This is not
a bug, this is intended behavior.

Flawfinder will report hits even if they are between a literal "#if
0" and "#endif". It would be possible to change this particular
situation, but directly using "#if 0" to comment-out code (other than
during debugging) indicates (1) the removal is very temporary (in
which case we should still report it) or (2) very poor code
practices. If you want to permanently get rid of code, then delete
it instead of using "#if 0", since you can always see what it was
using your version control software. If you don't use version
control software, then that's the bug you need to fix right now.

Some complex or unusual constructs can mislead flawfinder. In
particular, if a parameter begins with gettext(" and ends with ),
flawfinder will presume that the parameter of gettext is a constant.
This means it will get confused by patterns like gettext("hi") +
function("bye"). In practice, this doesn't seem to be a problem;
gettext() is usually wrapped around the entire parameter.

The routine to detect statically defined character arrays uses simple
text matching; some complicated expressions can cause it to trigger
or not trigger unexpectedly.

Flawfinder looks for specific patterns known to be common mistakes.
Flawfinder (or any tool like it) is not a good tool for finding
intentionally malicious code (e.g., Trojan horses); malicious
programmers can easily insert code that would not be detected by this
kind of tool.

Flawfinder looks for specific patterns known to be common mistakes in
application code. Thus, it is likely to be less effective analyzing
programs that aren't application-layer code (e.g., kernel code or
self-hosting code). The techniques may still be useful; feel free to
replace the database if your situation is significantly different
from normal.

Flawfinder's default output format (filename:linenumber, followed
optionally by a :columnnumber) can be misunderstood if any source
files have very weird filenames. Filenames embedding a
newline/linefeed character will cause odd breaks, and filenames
including colon (:) are likely to be misunderstood. This is
especially important if flawfinder's output is being used by other
tools, such as filters or text editors. If you are using
flawfinder's output in other tools, consider using its CSV format
instead (which can handle this). If you're looking at new code,
examine the files for such characters. It's incredibly unwise to
have such filenames anyway; many tools can't handle such filenames at
all. Newline and linefeed are often used as internal data
delimeters. The colon is often used as special characters in
filesystems: MacOS uses it as a directory separator, Windows/MS-DOS
uses it to identify drive letters, Windows/MS-DOS inconsistently uses
it to identify special devices like CON:, and applications on many
platforms use the colon to identify URIs/URLs. Filenames including
spaces and/or tabs don't cause problems for flawfinder, though note
that other tools might have problems with them.

Flawfinder is not internationalized, so it currently does not support
localization.

In general, flawfinder attempts to err on the side of caution; it
tends to report hits, so that they can be examined further, instead
of silently ignoring them. Thus, flawfinder prefers to have false
positives (reports that turn out to not be problems) rather than
false negatives (failures to report security vulnerabilities). But
this is a generality; flawfinder uses simplistic heuristics and
simply can't get everything "right".

Security vulnerabilities might not be identified as such by
flawfinder, and conversely, some hits aren't really security
vulnerabilities. This is true for all static security scanners, and
is especially true for tools like flawfinder that use a simple
lexical analysis and pattern analysis to identify potential
vulnerabilities. Still, it can serve as a useful aid for humans,
helping to identify useful places to examine further, and that's the
point of this simple tool.

AUTHOR

David A. Wheeler (dwheeler@dwheeler.com).

Flawfinder 3 Jun 2021 FLAWFINDER(1)