STYLE(7) Standards, Environments, and Macros STYLE(7)
NAME
STYLE - C Style and Coding Standards for illumos
SYNOPSIS
This document describes a set of coding standards for C programs
written in illumos gate, the illumos source repository.
This document is based on
C Style and Coding Standards for SunOS by
Bill Shannon.
INTRODUCTION
The purpose of the document is to establish a consistent style for C
program source files within the illumos gate. Collectively, this
document describes the "illumos C style", and the scope is limited to
the application of illumos coding style to the C language.
Source code tends to be read many more times than it is written or
modified. Using a consistent style makes it easier for multiple people
to co-operate in the development and maintenance of programs. It
reduces cognitive complexity by eliminating superficial differences,
freeing the programmer to concentrate on the task at hand. This in
turn aids review and analysis of code, since small stylistic
distractions are eliminated. Further, eliding such distractions makes
it easier for programmers to work on unfamiliar parts of the code base.
Finally, it facilitates the construction of tools that incorporate the
rules in this standard to help programmers prepare programs. For
example, automated formatters, text editor integration, and so on, can
refer to this document to understand the rules of illumos C.
Of necessity, these standards cannot cover all situations. Experience
and informed judgment count for much. Inexperienced programmers who
encounter unusual situations should refer to code written by
experienced C programmers following these rules, or consult with
experienced illumos programmers for help with creating a stylistically
acceptable solution.
The illumos code base has a long history, dating back to the original
Unix from AT&T and Bell Labs. Furthermore, for many years the C style
was not formally defined, and there was much variation and many corner
cases as it evolved. As such, it is possible to find examples of code
that do not conform to this standard in the source tree. If possible,
strongly consider converting to this style before beginning substantial
work on such code. If that is not practical, then favor consistency
with surrounding code over conformity. All new code should conform to
these rules.
Character Set
Source files predominantly use ASCII printing characters. However,
UTF-8 may be used when required for accurate representation of names
and proper nouns in comments and string literals. Exercise some care
here, however: be aware that source files are consumed by many tools
beyond just compilers, and some may not be able to cope with multi-byte
extended characters. In particular, ISO/IEC 9899:1999 ("ISO C99") is
used for most illumos source and technically limits its "extended"
character set to characters that can fit into a single byte. C11
relaxes this, and most illumos tools are already fairly tolerant here,
but use sound judgment with non-ASCII characters: people should not be
forced to change their names, but do not add emoji or other extraneous
content. UTF-8 may not be used in identifiers.
Generally favor ASCII-only in header files. For new code, avoid
non-ASCII characters from non-UTF-8 character encodings such as ISO-8859-1
or similar. Pseudo-graphical line printing characters and similar
glyphs are not permitted, though diagrams made with "ASCII art" using
`+', `-', `|' and so on are permitted. Non-printing characters, such
as control characters (including form-feeds, backspace, and similar)
should not appear in source files. Terminal escape sequences to change
text color or position are similarly prohibited.
Inside of string constants, prefer C escape sequences instead of
literal characters for tabs, form-feeds, carriage returns, newlines,
and so on. Obviously, use a literal space character when a space is
required in a string: do not use octal or hex escape sequences when a
space literal will do.
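As a minimal sketch of the guidance on string constants (the function name format_record is hypothetical and used only for illustration), the format string below writes the tab and newline as escape sequences while keeping ordinary spaces as literal spaces:

```c
#include <stdio.h>

/*
 * Format a name and value as a tab-separated record.  The tab and
 * newline are written as escape sequences; the spaces after the
 * colons are literal spaces.
 */
int
format_record(char *buf, size_t len, const char *name, int value)
{
	return (snprintf(buf, len, "name: %s\tvalue: %d\n", name, value));
}
```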
Generally prefer the use of C character constants to numeric code
points. For example, use
if (*p == '\n')
return (EOL); /* end of line */
instead of,
#define NL 10
if (*p == NL)
return (EOL); /* end of line */
An exception here may be if reading octet-oriented data where specific
values are known in advance, such as when parsing data read from a
socket.
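The exception might look like the following sketch. The frame layout, the FRAME_MAGIC value, and the function name are all invented for illustration; the point is only that an octet whose value is fixed by a protocol is compared as a number rather than as a character constant:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	FRAME_MAGIC	0x7e	/* hypothetical start-of-frame octet */

/*
 * Return true if the buffer begins with the start-of-frame octet.
 */
bool
frame_is_valid(const uint8_t *buf, size_t len)
{
	if (len == 0)
		return (false);
	return (buf[0] == FRAME_MAGIC);
}
```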
Lines in Source Files
Lines in source files are limited to 80 columns. If a logical line
exceeds this, it must be broken and continued on a new line.
Continuation Lines
Continuation lines are used when a logical statement or expression will
not fit in the available space, such as a procedure call with many
arguments, or a complex boolean or arithmetic expression. When this
happens, the line should be broken as follows:
+o After a comma in the case of a function call or function
definition. Note, never break in the middle of a parameter
expression, such as between the type and argument name.
+o After the last operator that fits on the line for arithmetic,
boolean and ternary expressions.
A continuation line should never start with a logical or binary
operator. The next line should be further indented by four literal
space characters (half a tab stop). If needed, subsequent continuation
lines should be broken in the same manner, and aligned with each other.
For example,
if (long_logical_test_1 || long_logical_test_2 ||
long_logical_test_3) {
statements;
}
a = (long_identifier_term1 - long_identifier_term2) *
long_identifier_term3;
function(long_complicated_expression1, long_complicated_expression2,
long_complicated_expression3, long_complicated_expression4,
long_complicated_expression5, long_complicated_expression6);
It is acceptable to break a line earlier than necessary in order to
keep constructs together to aid readability or understanding. For
example,
if ((flag & FLAG1) != 0 ||
(flag & FLAG2) != 0 ||
(flag & FLAG3) != 0) {
statements;
}
Continuation lines usually occur when blocks are deeply nested or very
long identifiers are used, or functions have many parameters. Often,
this is a sign that code should be rewritten or broken up, or that the
variable name is not fit for purpose. A strategically introduced
temporary variable may help clarify the code. Breaking a particularly
large function with deeply nested blocks up into multiple, smaller
functions can be an improvement. Using a structure to group arguments
together instead of having many positional parameters can make function
signatures shorter and easier to understand.
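One way to apply the last suggestion is sketched here. The rect_args structure and rect_area function are hypothetical; they only show how a parameter block can replace several positional parameters and keep the signature on one line:

```c
/* Hypothetical parameter block; all names are illustrative. */
typedef struct rect_args {
	int	ra_x;		/* left edge */
	int	ra_y;		/* top edge */
	int	ra_width;	/* width in pixels */
	int	ra_height;	/* height in pixels */
} rect_args_t;

/*
 * Return the area described by the arguments, or 0 if either
 * dimension is not positive.
 */
long
rect_area(const rect_args_t *rap)
{
	if (rap->ra_width <= 0 || rap->ra_height <= 0)
		return (0);
	return ((long)rap->ra_width * rap->ra_height);
}
```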
Indentation and White Space
Initial indentation must use only tab characters, with tabs set to
eight spaces. Continuation lines are indented with tabs to the
continued line, and then further indented by another four spaces, as
described above. If indentation causes the code to be too wide to fit
in 80 columns, it may be too complex and would be clearer if it were
rewritten, as described above. The rules for how to indent particular
C constructs such as
if,
for and
switch are described in
Compound Statements.
Tab characters may also be used for alignment beyond indentation within
source files, such as to line up comments, but avoid using spaces for
this. Note that "ASCII art" diagrams in block comments are explicitly
exempt from this rule, and may use spaces for alignment as needed. A
space followed by a tab outside of a string constant is forbidden.
Trailing white space is not permitted, whether at the ends of lines or
at the end of the file. That is, neither trailing blanks or tabs at
the ends of lines nor additional newlines at the end of a file are
allowed. The last character in each source file should be a newline
character.
Comments
Comments should be used to give overviews of code and provide
additional information that is not readily apparent from the source
itself. Comments should only be used to describe
what code does or
why it is implemented the way that it is, but should not describe
how code
works. Very rare exceptions are allowed for cases where the
implementation is particularly subtle.
Source files should begin with a block comment that includes license
information for that file, as well as a list of copyright holders.
However, source files should not contain comments listing authors or
the modification history for the file: this information belongs in the
revision control system and issue tracker. Following the copyright
material, an explanatory comment that describes the file's purpose,
provides background, references to relevant standards, or similar
information, is helpful. A suitable template for new files can be
found in
usr/src/prototypes within the illumos-gate code repository.
Front-matter aside, comments should only contain information that is
germane to reading and understanding the program. External
information, such as how the corresponding package is built or what
directory it should reside in should not be in a comment in a source
file. Discussions of non-trivial design decisions are appropriate if
they aid in understanding the code, but again avoid duplicating
information that is present in, and clear from, the code. In general,
avoid including information that is likely to become out-of-date in
comments; for example, specific section numbers of rapidly evolving
documents may change over time.
Comments should
not be enclosed in large boxes drawn with asterisks or
other characters. Comments should never include special characters,
such as form-feed or backspace, or terminal drawing characters.
There are three styles of comments: block, single-line, and trailing.
Block Comments
The opening `/*' of a block comment that appears at the top-level of a
file (that is, outside of a function, structure definition, or similar
construct) should be in column one. There should be a `*' in column 2
before each line of text in the block comment, and the closing `*/'
should be in columns 2-3, so that the `*'s line up. This enables `grep
^.\*' to match all of the top-level block comments in a file. There is
never any text on the first or last lines of a block comment. The
initial text line is separated from the * by a single space, although
later text lines may be further indented, as appropriate for clarity.
/*
* Here is a block comment.
* The comment text should be spaced or tabbed over
* and the opening slash-star and closing star-slash
* should be alone on a line.
*/
Block comments are used to provide high-level, natural language
descriptions of the content of files, the purpose of functions, and to
describe data structures and algorithms. Block comments should be used
at the beginning of each file and before functions as necessary.
The very first comment in a file should be front-matter containing
license and copyright information, as mentioned above.
Following the front-matter, files should have block comments that
describe their contents and any special considerations the reader
should take note of while reading.
A block comment preceding a function should document what it does,
input parameters, algorithm, and returned value. For example,
/*
* index(c, str) returns a pointer to the first occurrence of
* character c in string str, or NULL if c doesn't occur
* in the string.
*/
In many cases, block comments inside a function are appropriate, and
they should be indented to the same indentation level as the code that
they describe.
Block comments should contain complete, correct sentences and should
follow the English language rules for punctuation, grammar, and
capitalization. Sentences should be separated by either a single space
or two space characters, and such spacing should be consistent within a
comment. That is, either always separate sentences with a single space
or with two spaces, but do not mix styles within a comment (and ideally
do not mix styles within a source file). Paragraphs within a block
comment should be separated by an empty line containing only a space,
`*' and newline. For example,
/*
* This is a block comment. It consists of several sentences
* that are separated by two space characters. It should say
* something significant about the code.
*
* This comment also contains two separate paragraphs, separated
* by an "empty" line. Note that the "empty" line still has the
* leading ' *'.
*/
Do not indent paragraphs with spaces or tabs.
Single-Line Comments
A single-line comment is a short comment that may appear on a single
line indented so that it matches the code that follows. Short phrases
or sentence fragments are acceptable in single-line comments.
if (argc > 1) {
/* get input file from command line */
if (freopen(argv[1], "r", stdin) == NULL)
err(EXIT_FAILURE, "can't open %s", argv[1]);
}
The comment text should be separated from the opening `/*' and closing
`*/' by a space.
The closing `*/'s of several adjacent single-line comments should
not be forced to be aligned vertically. In general, a block comment should
be used when a single line is insufficient.
Trailing Comments
Very short comments may appear on the same line as the code they
describe, but should be tabbed over far enough to separate them from
the statements. If more than one short comment appears in a block of
code, they should all be tabbed to the same indentation level.
Trailing comments are most often sentence fragments or short phrases.
if (a == 2)
return (TRUE); /* special case */
else
return (isprime(a)); /* works only for odd a */
Trailing comments are most useful for documenting declarations and non-
obvious cases. Avoid the assembly language style of commenting every
line of executable code with a trailing comment.
Trailing comments are often also used on preprocessor
#else and
#endif statements if they are far away from the corresponding test. See
Preprocessor for more guidance on this.
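A small sketch of trailing comments on preprocessor directives follows. DEBUG_TRACE and TRACE_ENABLED are hypothetical names, and the exact wording of the trailing comments is an assumption rather than a rule stated here; in real code the directives would typically be much further apart:

```c
/* DEBUG_TRACE is a hypothetical build-time switch. */
#ifdef DEBUG_TRACE
#define	TRACE_ENABLED	1
#else	/* DEBUG_TRACE */
#define	TRACE_ENABLED	0
#endif	/* DEBUG_TRACE */

/*
 * Return nonzero if tracing was enabled at build time.
 */
int
trace_enabled(void)
{
	return (TRACE_ENABLED);
}
```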
XXX and TODO comments
Do not add "XXX" or "TODO" comments in new code.
Naming Conventions
It has been said that naming things is the hardest problem in computer
science, and the longevity of illumos means that there is wide
variation across the source base when it comes to identifiers. Much of
this was driven by the demands of early C dialects, which restricted
externally visible identifiers to 6 significant characters. While this
ancient restriction no longer applies in modern C, there is still an
aesthetic preference for brevity and some argument about backwards
compatibility with third-party compilers. Regardless, consistent
application of conventions for identifiers can make programs more
understandable and easier to read. Naming conventions can also give
information about the function of the identifier, whether constants,
named types, variables, or similar, that can be helpful in
understanding code. Programmers should therefore be consistent in
using naming conventions within a project. Individual projects will
undoubtedly have their own naming conventions incorporating terminology
specific to that project.
In general, the following guidelines should be followed:
+o The length of a name should be proportional to its scope. An
identifier declared at global scope would generally be longer
than one declared in a small block; an index variable used in
a one-line loop might be a single character.
+o Names should be short but meaningful. Favor brevity.
+o One character names should be avoided except for temporary
variables of short scope. If one uses a single character
name, then use variables
i,
j,
k,
m,
n for integers,
c,
d,
e for characters,
p,
q for pointers, and
s,
t for character
pointers. Avoid variable
l (lower-case L) because it is hard
to distinguish from `1' (the digit one) and `I' (capital
i) on some printers and displays.
+o Pointers may have a `p' appended to their names for each
level of indirection. For example, a pointer to the variable
dbaddr can be named
dbaddrp (or perhaps simply
dp), if the
scope is small enough. Similarly,
dbaddrpp would be a
pointer to a pointer to
dbaddr.
+o Separate "words" in a long identifier with underscores:
create_panel_item
Mixed case names like `CreatePanelItem', or `javaStyleName',
are strongly discouraged and should not be used for new code.
+o Leading underscores are reserved by the C standard and
generally should not be used in identifiers for user-space
programs. They may be used in user-space libraries or in the
kernel with some caution, though be careful to avoid
conflicts with constructs from standard C, such as `_Bool',
`_Alignof', and so on. Trailing underscores should be
similarly avoided in user-space programs.
+o Two conventions are used for named types in the form of
typedefs. Within the kernel and in many places in userland,
named types are given a name ending in `_t', for example,
typedef enum { FALSE, TRUE } bool_t;
typedef struct node node_t;
Technically such names are reserved by POSIX, but some
liberties are taken here given both the age and provenance of
the illumos code base. Note that typedefs for function
pointer types may end in `_f' to signify that they refer to
function types.
In some user programs named types have their first letter
capitalized, as in,
typedef enum { FALSE, TRUE } Bool;
typedef struct node Node;
This practice is deprecated; all new code must use the `_f'
and `_t' suffixes for named types.
+o #define names for constants should be in all CAPS. Separate
words with underscores, as for variable names.
+o Function-like macro names may be all CAPS or all lower case.
Prefer all upper case macro names for new code. Some macros
(such as
getchar(3C) and
putchar(3C)) are in lower case since
they may also exist as functions. Others, such as
major(3C),
minor(3C), and
makedev(3C) are macros for historical reasons.
+o Variable names, structure tag names, and function names
should be lower case.
Note: in general, with the exception of named types, it is
best to avoid names that differ only in case, like
foo and
FOO. The potential for confusion is considerable. However,
it is acceptable to use a name which differs only in
capitalization from its base type for a typedef, such as,
typedef struct node Node;
It is also acceptable to give a variable of this type a name
that is the all lower case version of the type name. For
example,
Node node;
+o Struct members should be prefixed with an identifier as
described in
Structures and Unions.
+o The individual items of enums should be made unique names by
prefixing them with a tag identifying the package to which
they belong. For example,
enum rainbow { RB_red, RB_orange, RB_green, RB_blue };
The
mdb(1) debugger supports enums in that it can print out
the value of an enum, and can also perform assignment
statements using an item in the range of an enum. Thus, the
use of enums over equivalent
#defines may aid debugging
programs. For example, rather than writing:
#define SUNDAY 0
#define MONDAY 1
write:
enum day_of_week { DW_SUNDAY, DW_MONDAY, ... };
Enums of this sort can be particularly useful for bitfields,
as the
mdb(1) debugger can decode them symbolically. For
example, an instance of:
enum vmx_caps {
VMX_CAP_NONE = 0,
VMX_CAP_TPR_SHADOW = (1UL << 0),
VMX_CAP_APICV = (1UL << 1),
VMX_CAP_APICV_X2APIC = (1UL << 2),
VMX_CAP_APICV_PIR = (1UL << 3),
};
with all bits set is printed by
mdb(1) as
0xf (VMX_CAP_{TPR_SHADOW|APICV|APICV_X2APIC|APICV_PIR})
+o Implementors of libraries should take care to hide symbols
that are private to the library. If a symbol is local to a
single module, one may simply declare it as
static. For
symbols that are shared between several translation units in
the same library, and therefore must be declared
extern, the
programmer should use the linker and mapfiles to hide private
symbols. For symbols that are logically private to a group of
libraries, one may use a naming convention, such as prefixing
the name with an underscore and a tag that is unique to the
package, such as, `_panel_caret_mpr', but it is not necessary
to use stylistic conventions to hide symbols that will not be
exported. Programmers may optionally use such a naming
convention as an additional signal that symbols are internal
to a library, but this is not required.
+o One should always use care to avoid conflicts with
identifiers reserved by C.
+o Generally use nouns for type names and verbs or verb phrases
for functions.
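Several of these guidelines can be seen together in the following sketch. Every identifier here (the queue names, the QS_ tag, and the double_item helper) is invented for illustration and does not come from the illumos source:

```c
#include <stddef.h>

#define	QUEUE_DEPTH_MAX	16	/* constant: all caps with underscores */

/* Enumerators carry a tag identifying the package they belong to. */
typedef enum queue_state {
	QS_IDLE,
	QS_BUSY
} queue_state_t;

/* A named function pointer type may end in _f. */
typedef int (*queue_visit_f)(int);

/*
 * Example visitor: return twice the item.
 */
static int
double_item(int item)
{
	return (item * 2);
}

/*
 * Apply a visitor to each item and return the sum of the results.
 * The pointer parameter has a `p' appended to its name, and the
 * short-lived index is a single character.
 */
int
queue_visit_all(const int *itemp, size_t nitems, queue_visit_f visit)
{
	int total = 0;

	for (size_t i = 0; i < nitems; i++)
		total += visit(itemp[i]);
	return (total);
}
```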
Declarations
There is considerable variation in the format of declarations within
the illumos gate. As an example, there are many places that use one
declaration per line, and employ tab characters to line up the variable
names:
int level; /* indentation level */
int size; /* size of symbol table */
int lines; /* lines read from input */
and it is also common to see declarations combined into a single line,
particularly when the variable names are self-explanatory or temporary:
int level, size, lines;
Indentation between type names or qualifiers and identifiers also
varies. Some use no such indentation:
int level;
volatile uint8_t byte;
char *ptr;
while many programmers feel that aligning variable declarations makes
code more readable:
int x;
extern int y;
volatile int count;
char **pointer_to_string;
However note that declarations such as the following probably make code
harder to read:
struct very_long_structure_name *p;
struct another_very_long_structure_name *q;
char *s;
int i;
short r;
While these styles vary, there are some rules which should be applied
consistently:
+o Always use function prototypes in preference to old-style
function declarations for new code.
+o Variables and functions should not be declared on the same
line.
+o Variables which are initialized at the time of declaration
should be declared on separate lines. That is, one should
write:
int size, lines;
int level = 0;
instead of:
int level = 0, size, lines;
+o Variable declarations should be scoped to the smallest
possible block in which they are used.
+o Variable names within inner blocks should not shadow those at
higher levels.
+o For code compiled with flags that enable ISO/IEC 9899:1999
("ISO C99") features, additionally:
+o A
for loop may declare and initialize its counting
variable. Note that the most appropriate type for
counting variables is often
size_t or
uint_t rather
than
int. In particular, take care when indexing
into arrays:
size_t is guaranteed to be large
enough to index any array, whereas
uint_t is not.
+o Variables do not have to be declared at the start
of a block. However, care should be taken to use
this feature only where it makes the code more
readable.
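A brief sketch of these rules, assuming a C99 compilation environment: the initialized variable has its own declaration line, the loop counter is a size_t declared in the for statement, and val is scoped to the loop body. The function name count_negative is hypothetical:

```c
#include <stddef.h>

/*
 * Return the number of negative elements in the array.
 */
size_t
count_negative(const int *arr, size_t nelems)
{
	size_t count = 0;

	for (size_t i = 0; i < nelems; i++) {
		int val = arr[i];	/* scoped to the loop body */

		if (val < 0)
			count++;
	}
	return (count);
}
```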
External Declarations
External declarations should begin in column 1. Each declaration
should be on a separate line. A comment describing the role of the
object being declared should be included, with the exception that a
list of defined constants does not need comments if the constant names
themselves are sufficient documentation. Constant names and their
defined values should be tabbed so that they line up underneath each
other. For a block of related objects, a single block comment is
sufficient. However, if trailing comments are used, these should also
be tabbed to line up underneath each other.
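The layout described above might look like the following sketch, in which the constant names are treated as self-documenting and the trailing comments on the variables are tabbed to line up. All names are hypothetical:

```c
#define	BUF_SMALL	64
#define	BUF_MEDIUM	512
#define	BUF_LARGE	4096

/*
 * Counters shared across a hypothetical parser module.
 */
int	parse_errors;		/* number of errors seen */
int	parse_warnings;		/* number of warnings seen */
```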
Structures and Unions
For structure and union template declarations, each element should be
on its own line with a comment describing it. The
struct keyword and
opening brace `{' should be on the same line as the structure tag, and
the closing brace should be alone on a line in column 1. Each member
is indented by one tab:
struct boat {
int b_wllength; /* water line length in feet */
int b_type; /* see below */
long b_sarea; /* sail area in square feet */
};
Struct members should be prefixed with an abbreviation of the struct
name followed by an underscore (`_'). Typically the first character of
each word in the struct's name is used for the prefix. While not
required by the language, this convention disambiguates the members for
tools such as
cscope(1). For example, an unprefixed member named `len' could
lead to many ambiguous references.
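To illustrate with hypothetical structures: both types below carry a length, but the prefixes make a search for either member unambiguous.

```c
#include <stddef.h>

struct packet_header {
	size_t	ph_len;		/* length of the header in bytes */
};

struct packet_body {
	size_t	pb_len;		/* length of the payload in bytes */
};

/*
 * Return the combined length of a header and its body.
 */
size_t
packet_total_len(const struct packet_header *php,
    const struct packet_body *pbp)
{
	return (php->ph_len + pbp->pb_len);
}
```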
Use of `static'
In any file which is part of a larger whole rather than a self-contained
program, maximum use should be made of the
static keyword to
make functions and variables local to single files. Variables in
particular should be accessible from other files only when there is a
clear need that cannot be filled in another way. Such usage, and in
particular its rationale, should be made clear with comments, and
possibly with a private header file.
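A minimal sketch of this practice, using invented names: the counter and the helper are private to the file, and only id_alloc is visible to other translation units.

```c
#include <limits.h>
#include <stdbool.h>

/* Private to this file: not visible to other translation units. */
static int next_id = 1;

/*
 * Private helper: return true if no more identifiers are available.
 */
static bool
id_space_exhausted(void)
{
	return (next_id == INT_MAX);
}

/*
 * Public entry point: return a fresh identifier, or -1 if none remain.
 */
int
id_alloc(void)
{
	if (id_space_exhausted())
		return (-1);
	return (next_id++);
}
```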
Qualifiers
Qualifiers, like `const', `volatile', and `restrict' are used to
communicate information to the compiler about how an object is used.
This can be very useful for facilitating optimizations that can
dramatically improve the runtime performance of code. Appropriate
qualification can prevent bugs. For example, a `const' qualified
pointer points to an object that cannot be modified; an attempt to do
so will give a compile-time error, rather than runtime data corruption.
Additionally, use of such qualifiers can communicate attributes of an
interface to a programmer who uses that interface; a programmer who
passes a pointer to a function that expects a `const' qualified
parameter knows that the function will not modify the value the pointer
refers to. Use qualifiers, but beware of some caveats.
Do not cast away the qualifier from a `const' qualified pointer;
the compiler may make optimizations based on the
qualification that are invalid if applied in a non-const context.
Similarly, it is undefined behavior to discard the qualifier for
variables that are `volatile'. Note that this means that one cannot,
for example, pass a volatile-qualified pointer to many functions, such
as
memcpy(3C) or
memset(3C).
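As a small sketch of how a qualifier documents an interface, the hypothetical function below takes a `const' qualified pointer, so callers can see from the prototype alone that the string is not modified:

```c
#include <stddef.h>

/*
 * Return the length of the string.  The const qualifier tells
 * callers that the string is not modified.
 */
size_t
str_length(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}
```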
Function Definitions
A complex function should be preceded by a prologue in a block comment
that gives the name and a short description of what the function does.
The type of the value returned should be alone on a line in column 1,
including any qualifiers, such as `const' or `static'. Functions that
return
int should have that return type explicitly specified:
traditional C's default of
int for the return type of unqualified
functions is deprecated. If the function does not return a value then
it should be given the return type
void. If the return value requires
explanation, it should be given in the block comment. Functions and
variables that are not used outside of the file they are defined in
should be declared as
static. This lets the reader know explicitly
that they are private, and also eliminates the possibility of name
conflicts with variables and procedures in other files.
Functions must be declared using ANSI X3.159-1989 ("ANSI C89") syntax
rather than K&R. There are still places within the illumos gate that
use K&R syntax and these should be converted as work is done in those
areas.
All local declarations and code within the function body should be
tabbed over at least one tab, with the level of indentation reflecting
the structure of the code. Labels should appear in column 1. If the
function uses any external variables or functions that are not
otherwise declared
extern at the file level or in a header file, these
should have their own declarations in the function body using the
extern keyword. If the external variable is an array, the array bounds
must be repeated in the
extern declaration.
If an external variable or value of a parameter passed by pointer is
changed by the function, that should be noted in the block comment.
All comments about parameters and local variables should be tabbed so
that they line up vertically. The declarations should be separated
from the function's statements by a blank line.
Note that functions that take no parameters must always have a void
parameter, as shown in the first example below.
The following examples illustrate many of the rules for function
definitions.
/*
* sky_is_blue()
*
* Return true if the sky is blue, else false.
*/
bool
sky_is_blue(void)
{
extern int hour;
if (hour < MORNING || hour > EVENING)
return (false); /* black */
else
return (true); /* blue */
}
/*
* tail(nodep)
*
* Find the last element in the linked list
* pointed to by nodep and return a pointer to it.
*/
Node *
tail(Node *nodep)
{
Node *np; /* current pointer advances to NULL */
Node *lp; /* last pointer follows np */
np = lp = nodep;
while ((np = np->next) != NULL)
lp = np;
return (lp);
}
/*
* ANSI C Form 1.
* Use this form when the arguments easily fit on one line,
* and no per-argument comments are needed.
*/
int
foo(int alpha, char *beta, struct bar gamma)
{
...
}
/*
* ANSI C Form 2.
* This is a variation on form 1, using the standard continuation
* line technique (indent by 4 spaces). Use this form when no
* per-argument comments are needed, but all argument declarations
* won't fit on one line.
*/
int
foo(int alpha, char *beta,
struct bar gamma)
{
...
}
/*
* ANSI C Form 3.
* Use this form when per-argument comments are needed.
* Note that each line of arguments is indented by a full
* tab stop. Note carefully the placement of the left
* and right parentheses.
*/
int
foo(
int alpha, /* first arg */
char *beta, /* arg with a long comment needed */
/* to describe its purpose */
struct bar gamma) /* big arg */
{
...
}
A single blank line should separate function definitions.
Type Declarations
Many programmers use named types, such as,
typedefs, liberally. They
feel that the use of typedefs simplifies declaration lists and can make
program modification easier when types must change. Other programmers
feel that the use of a typedef hides the underlying type when they want
to know what the type is. This is particularly true for programmers
who need to be concerned with efficiency, like kernel programmers, and
therefore need to be aware of the implementation details. The choice
of whether or not to use typedef is left to the implementor.
If one elects to use a typedef in conjunction with a pointer type, the
underlying type should be typedef-ed, rather than typedef-ing a pointer
to the underlying type, because it is often necessary and usually helpful
to be able to tell if a type is a pointer.
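A sketch of this recommendation, with hypothetical names: list_node_t names the structure itself, so the `*' remains visible wherever a pointer is declared.

```c
#include <stddef.h>

typedef struct list_node {
	struct list_node	*ln_next;	/* next node, or NULL */
	int			ln_value;	/* payload */
} list_node_t;

/*
 * Return the number of nodes in the list.  The parameter is visibly
 * a pointer because list_node_t is not itself a pointer type.
 */
size_t
list_length(const list_node_t *lnp)
{
	size_t n = 0;

	for (; lnp != NULL; lnp = lnp->ln_next)
		n++;
	return (n);
}
```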
The use of ISO/IEC 9899:1999 ("ISO C99") unsigned integer identifiers
of the form
uintXX_t is preferred over the older BSD-style
u_intXX_t.
New code should use the former, and old code should be converted to the
new form if other work is being done in that area.
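For instance, a hypothetical helper written with the preferred ISO C99 names might look like this:

```c
#include <stdint.h>

/*
 * Combine two octets into a 16-bit value, most significant first.
 */
uint16_t
make_u16(uint8_t hi, uint8_t lo)
{
	return ((uint16_t)(((uint16_t)hi << 8) | lo));
}
```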
Boolean Types
ISO/IEC 9899:1999 ("ISO C99") introduced the `_Bool' keyword and
preprocessor macros for the
bool,
true, and
false symbols in the <stdbool.h> header (<sys/stdbool.h> in the
kernel). Prior to this, C
had no standard boolean type, but illumos provided an `enum',
boolean_t, with variants B_FALSE and B_TRUE that is widely used.
Sadly, these two types differ significantly:
+o bool tends to be defined by ABIs as being a single byte wide,
while enumerations, and thus
boolean_t, use the same
representation as an
int.
+o bool is defined to be unsigned, while the enumerated type
boolean_t is signed.
+o The type "rank" of
_Bool is defined to be lower than all
other integer types.
+o The only legal values of variables of type
bool are 0 and 1
(false and true respectively), and while
boolean_t is only
defined with two variants, nothing structurally prevents an
assignment from a different value.
+o Type conversion to
_Bool has different semantics than
assignment to other integer types: conversion results in a 0
if and only if the original value compares equal to 0,
otherwise the result is a 1. For an
int, truncating,
rounding behavior, or sign extending behavior is used.
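The difference in conversion semantics can be seen in a short sketch. The two function names are hypothetical; the first converts to bool, the second to an 8-bit unsigned integer:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Conversion to bool: the result is 1 for any nonzero value.
 */
bool
to_bool(int value)
{
	return ((bool)value);
}

/*
 * Conversion to an 8-bit unsigned integer: the value is reduced
 * modulo 256, so high-order bits are discarded.
 */
uint8_t
to_u8(int value)
{
	return ((uint8_t)value);
}
```

With these definitions, to_bool(256) yields true while to_u8(256) yields 0, because 256 is nonzero but has no bits set in its low-order byte.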
Thus, programmers must exercise significant care when mixing code using
the standard type and
boolean_t.
Broadly, new code should prefer the use of
bool when available.
However, code that makes extensive use of
boolean_t should generally
continue to do so. Do not mix
bool and
boolean_t in the same
struct,
for example. Similarly, if a file makes extensive use of one, then do
not use the other. Furthermore be aware that using
bool requires at
least ISO/IEC 9899:1999 ("ISO C99"), which is not mandated across the
system, so exercise care in public interfaces. Be particularly aware
that transitive includes of header files could mean that code using
constructs such as
bool might leak into code that targets an older
version of the language; the programmer must not allow this to happen.
For example, should a use of bool inadvertently end up in <stdlib.h>,
<sys/types.h>, or another standard-mandated or traditional Unix header
file and be available outside of an ISO/IEC 9899:1999 ("ISO C99")
compilation environment, older programs could fail to compile.
Do not use
int or another type to represent boolean values in new code.
Guidelines for mixing boolean types
As mentioned above, care must be taken when mixing
bool and
boolean_t types. In particular:
+o Assigning from a variable of type
bool to one of
boolean_t,
or vice versa, is generally safe. This includes assigning
the value returned from a function of one type to the other.
+o Passing arguments of one type to a function expecting the
other is generally safe.
+o Simple comparisons between the two types are generally safe.
However, taking a pointer to a variable of one type and casting it to
the other is not safe and should never be done. Similarly, changing
the definition of one type to another in a
struct or
union is not safe
unless one can guarantee that the element of such a compound type is
never referred to by pointer and that the type is never used as part of
a public interface, such as an
ioctl(2).
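The safe patterns can be sketched as follows. The boolean_t typedef is
again a local stand-in for the illumos definition, and the function
names legacy_is_even() and both_even() are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the illumos enum; an assumption made for this sketch. */
typedef enum { B_FALSE, B_TRUE } boolean_t;

/* A legacy interface that returns boolean_t. */
boolean_t
legacy_is_even(int n)
{
	return (((n % 2) == 0) ? B_TRUE : B_FALSE);
}

/* New code using bool may assign from, and compare with, the legacy type. */
bool
both_even(int a, int b)
{
	bool first = legacy_is_even(a);		/* assignment across types */

	if (legacy_is_even(b) != first)		/* simple comparison */
		return (false);

	/*
	 * By contrast, casting a boolean_t * to a bool * (or the reverse)
	 * is never safe, since the two types may differ in size.
	 */
	return (first);
}
```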
Statements
Each line should contain at most one statement. In particular, do not
use the comma operator to group multiple statements on one line, or to
avoid using braces. For example,
argv++; argc--;		/* WRONG */
if (err)
	fprintf(stderr, "error"),
	    exit(1);	/* VERY WRONG */
Nesting the ternary conditional operator (?:) can lead to confusing,
hard to follow code. For example:
num = cnt < tcnt ? (cnt < fcnt ? fcnt : cnt) :
    tcnt < bcnt ? tcnt : bcnt > fcnt ? fcnt : bcnt;	/* WRONG */
Avoid expressions like these, and in general do not nest the ternary
operator unless doing so is unavoidable.
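One way to untangle the expression above is to spell the selection out
with ordinary conditionals. The following is only a sketch; the function
name pick_count() is hypothetical, and the variables are assumed to be
of type int.

```c
/* Same selection as the nested ternary expression, written out. */
int
pick_count(int cnt, int tcnt, int fcnt, int bcnt)
{
	if (cnt < tcnt) {
		if (cnt < fcnt)
			return (fcnt);
		return (cnt);
	}

	if (tcnt < bcnt)
		return (tcnt);

	if (bcnt > fcnt)
		return (fcnt);

	return (bcnt);
}
```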
If the
return statement is used to return a value, the expression
should always be enclosed in parentheses.
Functions that return no value should
not include a return statement as
the last statement in the function, though early return via a bare
return; on a line by itself is permitted.
Compound Statements
Compound statements are statements that contain lists of statements
enclosed in braces (`{}'). The enclosed list should be indented one
more level than the compound statement itself. The opening left brace
should be at the end of the line beginning the compound statement, and
the closing right brace should be alone on a line, positioned under the
beginning of the compound statement (see examples below). Note that
the left brace that begins a function body is the only occurrence of a
left brace which should be alone on a line.
Braces are also used around a single statement when it is part of a
control structure, such as an
if-else or
for statement, as in:
if (condition) {
	if (other_condition)
		statement;
}
Some programmers feel that braces should be used to surround
all statements that are part of control structures, even singletons,
because this makes it easier to add or delete statements without
thinking about whether braces should be added or removed. Some
programmers reason that, since some apparent function calls might
actually be macros that expand into multiple statements, always using
braces allows such macros to always work safely. Thus, they would
write:
if (condition) {
	return (0);
}
Here, the braces are optional and may be omitted to save vertical
space. However:
+o if one arm of an
if-else statement contains braces, all arms
should contain braces;
+o if the condition or singleton occupies more than one line,
braces should always be used;
if (condition) {
	fprintf(stderr, "wrapped singleton: %d\n",
	    errno);
}
if (strncmp(str, "long condition",
    sizeof ("long condition") - 1) == 0) {
	fprintf(stderr, "singleton: %d\n", errno);
}
+o if the body of a
for or
while loop is empty, no braces are
needed:
while (*p++ != c)
	;
Examples
if, if-else, if-else if-else statements
if (condition) {
	statements;
}
if (condition) {
	statements;
} else {
	statements;
}
if (condition) {
	statements;
} else if (condition) {
	statements;
}
Note that the right brace before the
else and the right brace before
the
while of a
do-while statement (see below) are the only places where
a right brace appears that is not alone on a line.
for statements
for (initialization; condition; update) {
	statements;
}
When using the comma operator in the initialization or update clauses
of a
for statement, it is suggested that no more than three variables
should be updated. More than this tends to make the expression too
complex. In this case it is generally better to use separate
statements outside the
for loop (for the initialization clause), or at
the end of the loop (for the update clause).
The initialization, condition, and update portions of a
for loop may be
omitted.
The infinite loop is written using a
for loop.
for (;;) {
	statements;
}
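As an illustration of a modest use of the comma operator in the
initialization and update clauses, the sketch below walks two indices
toward each other. The function name reverse_in_place() is hypothetical.

```c
#include <string.h>

/* Reverse a NUL-terminated string in place. */
void
reverse_in_place(char *s)
{
	size_t i, j;
	char tmp;

	if (s[0] == '\0')
		return;

	/* Two variables are updated; this stays within the suggested limit. */
	for (i = 0, j = strlen(s) - 1; i < j; i++, j--) {
		tmp = s[i];
		s[i] = s[j];
		s[j] = tmp;
	}
}
```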
while statements
while (condition) {
	statements;
}
When writing
while loops, prefer nested assignment inside of
comparison. That is, prefer:
while ((c = getchar()) != EOF) {
	statements;
}
over,
c = getchar();
while (c != EOF) {
	statements;
	c = getchar();
}
do-while statements
do {
	statements;
} while (condition);
switch statements
switch (condition) {
case ABC:
case DEF:
	statements;
	break;
case GHI:
	statements;
	/* FALLTHROUGH */
case JKL: {
	int local;
	statements;
	break;
}
case XYZ:
	statements;
	break;
default:
	statements;
	break;
}
The last
break is, strictly speaking, redundant, but it is recommended
form nonetheless because it prevents a fall-through error if another
case is added later after the last one.
When using the fall-through feature of
switch, a comment of the style
shown above should be used. In addition to being a useful note for
future maintenance, it serves as a hint to the compiler that this is
intentional and should not therefore generate a warning.
All
switch statements should include a default case with the possible
exception of a switch on an
enum variable for which all possible values
of the
enum are listed.
Don't assume that the list of cases covers all possible cases. New,
unanticipated, cases may be added later, or bugs elsewhere in the
program may cause variables to take on unexpected values.
Each
case statement should be indented to the same level as the
switch statement. Each
case statement should be on a line separate from the
statements within the case.
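A concrete instance of these rules might look like the following
sketch. The function name option_level() and the option letters are
hypothetical and chosen only for illustration.

```c
/* Map a command line option letter to a verbosity level. */
int
option_level(char opt)
{
	int level = 0;

	switch (opt) {
	case 'V':
		level++;
		/* FALLTHROUGH */
	case 'v':
		level++;
		break;
	case 'q':
		level = -1;
		break;
	default:
		break;
	}

	return (level);
}
```

Here 'V' deliberately falls through to 'v' so that it counts twice, and
the default case leaves the level at 0 for any unexpected letter.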
White Space
Vertical White Space
Judicious use of blank lines can improve readability by setting off sections
of code that are logically related. Use vertical white space to make
it clear that stanzas are logically separated.
A blank line should always be used in the following circumstances:
+o After the
#include section at the top of a source file.
+o After blocks of
#defines of constants, and before and after
#defines of macros.
+o Between structure declarations.
+o Between functions.
+o After local variable declarations.
Form-feeds should never be used to separate functions.
Horizontal White Space
Here are the guidelines for blank spaces:
+o A blank should follow a keyword whenever a parenthesis
follows the keyword. Note that both
sizeof and
return are
keywords, whereas things like
strlen(3C) and
exit(3C) are
not.
Blanks should not be used between procedure names (or macro
calls) and their argument list. This helps to distinguish
keywords from procedure calls.
/*
 * No space between strncmp and '(' but
 * there is one between sizeof and '('
 */
if (strncmp(x, "done", sizeof ("done") - 1) == 0)
	...
+o Blanks should appear after commas in argument lists.
+o Blanks should
not appear immediately after a left parenthesis
or immediately before a right parenthesis.
+o All binary operators except `.' and `->' should be separated
from their operands by blanks. In other words, blanks should
appear around assignment, arithmetic, relational, and logical
operators.
Blanks should never separate unary operators such as unary
minus, address (`&'), indirection (`*'), increment (`++'),
and decrement (`--') from their operands. Note that this
includes the unary `*' that is a part of pointer
declarations.
Examples:
char *d, *s;
a += c + d;
a = (a + b) / (c * d);
strp->field = str.fl - ((x & MASK) >> DISP);
while ((*d++ = *s++) != ' ')
	n++;
+o The expressions in a
for statement should be separated by
blanks:
for (expr1; expr2; expr3)
If an expression is omitted, no space should be left in its
place:
for (expr1; expr2;)
+o Casts should not be followed by a blank, with the exception
of function calls whose return values are ignored:
(void) myfunc((uintptr_t)ptr, (char *)x);
Hidden White Space
There are many uses of blanks that will not be visible when viewed on a
terminal, and it is often difficult to distinguish blanks from tabs.
However, inconsistent use of blanks and tabs may produce unexpected
results when the code is printed with a pretty-printer, and may make
simple regular expression searches fail unexpectedly. The following
guidelines are helpful:
+o Spaces and tabs at the end of a line are not permitted.
+o Spaces between tabs, and tabs between spaces, are not
permitted.
+o Use tabs to line things up in columns (such as for indenting
code, and to line up elements within a series of
declarations) and spaces to separate items within a line.
+o Use tabs to separate single line comments from the
corresponding code.
Parentheses
Since C has complex precedence rules, parentheses can clarify the
programmer's intent in complex expressions that mix operators.
Programmers should feel free to use parentheses if they feel that they
make the code clearer and easier to understand. However, bear in mind
that this can be taken too far, so some judgment must be applied to
prevent making things less readable. For example, compare:
x = ((x * 2) * 3) + (((y / 2) * 3) + 1);
to,
x = x * 2 * 3 + y / 2 * 3 + 1;
It is also important to remember that complex expressions can be used
as parameters to macros, and operator-precedence problems can arise
unless
all occurrences of parameters in the body of a macro definition
have parentheses around them.
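The precedence problem can be seen in a small sketch. The macro and
function names below are hypothetical and exist only for illustration.

```c
#define	BAD_DOUBLE(x)	(x * 2)		/* parameter not parenthesized */
#define	GOOD_DOUBLE(x)	((x) * 2)	/* parameter parenthesized */

int
bad_double_of_sum(int a, int b)
{
	return (BAD_DOUBLE(a + b));	/* expands to (a + b * 2) */
}

int
good_double_of_sum(int a, int b)
{
	return (GOOD_DOUBLE(a + b));	/* expands to ((a + b) * 2) */
}
```

With arguments 1 and 2, the first function yields 5 and the second
yields 6.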
Constants
Numeric constants should not generally be written directly. Instead,
give the constant a meaningful name using a
const variable, an
enum or
the
#define feature of the C preprocessor. This makes it easier to
maintain large programs since the constant value can be changed
uniformly by changing only the constant's definition.
The enum data type is the preferred way to handle situations where a
variable takes on only a discrete set of values, since additional type
checking is available through the compiler and, as mentioned above,
tools such as the
mdb(1) debugger also support enums.
There are some cases where the constants 0 and 1 may appear as
themselves instead of as
#defines. For example if a
for loop indexes
through an array, then
for (i = 0; i < ARYBOUND; i++)
is reasonable.
In rare cases, other constants may appear as themselves. Some judgment
is required to determine whether the semantic meaning of the constant
is obvious from its value, or whether the code would be easier to
understand if a symbolic name were used for the value.
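As a sketch of these guidelines, the fragment below names its array
bound with #define and its discrete states with an enum, while the loop
index still starts from a literal 0. The names NSLOTS, slot_state_t,
and count_busy() are hypothetical.

```c
#include <stddef.h>

#define	NSLOTS	4

typedef enum {
	SLOT_FREE,
	SLOT_BUSY
} slot_state_t;

/* Count the busy entries in a table of NSLOTS slots. */
size_t
count_busy(const slot_state_t *slots)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < NSLOTS; i++) {
		if (slots[i] == SLOT_BUSY)
			n++;
	}

	return (n);
}
```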
Goto
While not completely avoidable, use of
goto is generally discouraged.
In many cases, breaking a procedure into smaller pieces, or using a
different language construct can eliminate the need for
gotos. For
example, instead of:
again:
	if (s = proc(args))
		if (s == -1 && errno == EINTR)
			goto again;
write:
do {
	s = proc(args);
} while (s == -1 && errno == EINTR);
The main place where
gotos can be usefully employed is to break out of
several levels of
switch or loop nesting, or to centralize error path
cleanup code in a function. For example:
	for (...)
		for (...) {
			...
			if (disaster)
				goto error;
		}
	...
error:
	clean up the mess;
However the need to do such things may indicate that the inner
constructs should be broken out into a separate function. Never use a
goto outside of a given block to branch to a label within a block:
	goto label;	/* WRONG */
	...
	for (...) {
		...
label:
		statement;
		...
	}
When a
goto is necessary, the accompanying label should be alone on a
line.
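A minimal sketch of centralized error path cleanup follows. The
function name copy_twice() and its interface are hypothetical; the
point is only that every failure branches to a single label that
releases whatever has been acquired so far.

```c
#include <stdlib.h>
#include <string.h>

/* Copy src into two freshly allocated buffers; return 0 or -1. */
int
copy_twice(const char *src, char **firstp, char **secondp)
{
	size_t len = strlen(src) + 1;
	char *first;
	char *second = NULL;

	first = malloc(len);
	if (first == NULL)
		goto error;

	second = malloc(len);
	if (second == NULL)
		goto error;

	(void) memcpy(first, src, len);
	(void) memcpy(second, src, len);
	*firstp = first;
	*secondp = second;
	return (0);

error:
	/* free(NULL) is a no-op, so a single cleanup path suffices. */
	free(second);
	free(first);
	return (-1);
}
```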
Variable Initialization
C permits initializing a variable where it is declared. Programmers
are equally divided about whether or not this is a good idea:
"I like to think of declarations and executable code as separate
units. Intermixing them only confuses the issue. If only a scattered
few declarations are initialized, it is easy not to see them."
"The major purpose of code style is clarity. I think the less hunting
around for the connections between different places in the code, the
better. I don't think variables should be initialized for no reason,
however. If the variable doesn't need to be initialized, don't waste
the reader's time by making him/her think that it does."
A convention used by some programmers is to only initialize automatic
variables in declarations if the value of the variable is constant
throughout the block; such variables should be declared
const. Note
that as a matter of correctness, all automatic variables must be
initialized before use, either in the declaration or elsewhere.
The decision about whether or not to initialize a variable in a
declaration is therefore left to the implementor. Use good taste. For
example, don't bury a variable initialization in the middle of a long
declaration:
int a, b, c, d = 4, e, f; /* This is NOT good style */
Multiple Assignments
C also permits assigning several variables to the same value in a
single statement, as in,
x = y = z = 0;
Good taste is required here also. For example, assigning several
variables that are used the same way in the program in a single
statement clarifies the relationship between the variables by making it
more explicit:
x = y = z = 0;
vx = vy = vz = 1;
count = 0;
scale = 1;
is good, whereas:
x = y = z = count = 0;
vx = vy = vz = scale = 1;
sacrifices clarity for brevity. In any case, the variables that are so
assigned should all be of the same type (or all pointers being
initialized to NULL). It is not a good idea to use multiple
assignments for complex expressions, as this can be significantly
harder to read. E.g.,
foo_bar.fb_name.firstch = bar_foo.fb_name.lastch = 'c'; /* Yecch */
Preprocessor
The C preprocessor provides support for textual inclusion of files
(most often header files), conditional compilation, and macro
definitions and substitutions.
It should be noted that the preprocessor works at the lexical,
not syntactic level of the language. It is possible to define macros
that are not syntactically valid when expanded, and the programmer
should take care when using the preprocessor. Some general advice
follows.
Do not rename members of a structure using
#define within a subsystem;
instead, use a
union. The legacy practice of using
#define to define
shorthand notations for referencing members of a union should not be
used in new code.
Be
extremely careful when choosing names for
#defines. For example,
never use something like
#define size 10
especially in a header file, since it is not unlikely that the user
might want to declare a variable named
size.
Remember that names used in
#define statements come out of a global
preprocessor name space and can conflict with names in any other
namespace. For this reason, this use of
#define is discouraged.
Note that
#define follows indentation rules similar to other
declarations; see the section on
Indentation for details.
Care is needed when defining macros that replace functions since
functions pass their parameters by value whereas macros pass their
arguments by name substitution.
At the end of an
#ifdef construct used to select among a required set
of options (such as machine types), include a final
#else clause
containing a useful but illegal statement so that the compiler will
generate an error message if none of the options has been defined:
#ifdef vax
...
#elif sun
...
#elif u3b2
...
#else
#error unknown machine type;
#endif /* machine type */
Header files should make use of "include guards" to prevent their
contents from being evaluated multiple times. For example,
#ifndef _FOOBAR_H
#define _FOOBAR_H
/* Header contents.... */
#endif /* !_FOOBAR_H */
The symbol defined for the include guard should be uniquely derived
from the header file's name. Note that this is one area where library
authors often use a leading underscore in an identifier. While this is
technically in violation of the ISO C standard, the practice is common.
Don't change C syntax via macro substitution. For example,
#define BEGIN {
It makes the program unintelligible to all but the perpetrator.
Be especially aware that function-like macros are textually
substituted, and side-effects in their arguments may be multiply-
evaluated if the arguments are referred to more than once in the body
of the macro. Similarly, variables defined inside of a macro's body
may conflict with variables in the outer scope. Finally, macros are
not generally type safe. For most macros and most programs, these are
non-issues, but programmers who run into problems here may consider
judicious use of `inline' functions as an alternative.
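The multiple evaluation hazard, and the inline alternative, can be
sketched as follows. The names MAX(), max_int(), bump_with_macro(), and
bump_with_inline() are hypothetical, and ISO C99 `inline' support is
assumed.

```c
#define	MAX(a, b)	(((a) > (b)) ? (a) : (b))

static inline int
max_int(int a, int b)
{
	return ((a > b) ? a : b);
}

/* The argument (*ip)++ appears twice in the expansion of MAX(). */
int
bump_with_macro(int *ip)
{
	return (MAX((*ip)++, 3));
}

/* The argument is evaluated exactly once before the call. */
int
bump_with_inline(int *ip)
{
	return (max_int((*ip)++, 3));
}
```

Starting from a value of 5, the macro version increments the variable
twice and returns 6, whereas the inline version increments it once and
returns 5.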
Whitespace and the Preprocessor
Use the following conventions with respect to whitespace and the
preprocessor:
+o `#include' should be followed by a single space character.
+o `#define' should be followed by a single tab character.
+o `#if', `#ifdef', and other preprocessor statements may be
followed by either a tab or space, but be consistent with the
surrounding code.
Miscellaneous Comments on Good Taste
Avoid undefined behavior wherever possible. Note that the rules of C
are very subtle, and many things that at first appear well-defined can
actually conceal undefined behavior. When in doubt, consult the C
standard.
Traditional Unix style favors guard clauses, which check a precondition
and fail (possibly via an early return) over deeply nested control
structures. For example, prefer:
void
foo(void)
{
	struct foo *foo;
	struct bar *bar;
	struct baz *baz;

	foo = some_foo();
	if (!is_valid_foo(foo))
		return;
	bar = some_bar(foo);
	if (!is_valid_bar(bar))
		return;
	baz = some_baz(bar);
	if (!is_valid_baz(baz))
		return;
	/* All of the preconditions are met */
	do_something(baz);
}
over,
void
foo(void)
{
	struct foo *foo;
	struct bar *bar;
	struct baz *baz;

	foo = some_foo();
	if (is_valid_foo(foo)) {
		bar = some_bar(foo);
		if (is_valid_bar(bar)) {
			baz = some_baz(bar);
			if (is_valid_baz(baz)) {
				/* Preconditions met */
				do_something(baz);
			}
		}
	}
}
Try to make the structure of your program match the intent. For
example, replace:
if (boolean_expression)
	return (TRUE);
else
	return (FALSE);
with:
return (boolean_expression);
Similarly,
if (condition)
	return (x);
return (y);
is usually clearer than:
if (condition)
	return (x);
else
	return (y);
or even better, if the condition and return expressions are short:
return (condition ? x : y);
Do not default the boolean test for nonzero. Prefer
if (f() != 0)
rather than
if (f())
even though 0 is considered to be "false" in boolean contexts in C. An
exception is commonly made for predicate functions, which encapsulate
(possibly complex) boolean expressions. Predicates must meet the
following restrictions:
+o Has no other purpose than to return true or false.
+o Returns 0 for false, non-zero for true.
+o Is named so that the meaning of the return value is obvious.
Call a predicate
is_valid() or
valid(), not
check_valid(). Note that
isvalid() and similar names with the `is' prefix followed by a letter
or number (but not underscore) are reserved by the ISO C standard.
The set of POSIX ctype functions including
isalpha(3C),
isalnum(3C),
and
isdigit(3C) are examples of predicates.
A particularly notorious case of not obeying the rules around
predicates is using
strcmp(3C) to test for string equality, where the
result should never be defaulted (and indeed, a return value of 0
denotes equality).
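A sketch of two predicates that follow these restrictions is shown
below. The names is_valid_port() and is_help_flag() are hypothetical,
and ISO C99 bool is assumed to be available.

```c
#include <stdbool.h>
#include <string.h>

/* A predicate: its only purpose is to answer a yes/no question. */
bool
is_valid_port(int port)
{
	return (port > 0 && port <= 65535);
}

/* strcmp() is not a predicate, so its result is compared explicitly. */
bool
is_help_flag(const char *arg)
{
	return (strcmp(arg, "-h") == 0);
}
```

A caller may then default the test, as in "if (is_valid_port(p))",
because the name makes the meaning of the return value obvious.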
Never use the boolean negation operator (`!') with non-boolean
expressions. In particular, never use it to test for a NULL pointer or
to test for success of comparison functions like
strcmp(3C) or
memcmp(3C). E.g.,
char *p;
...
if (!p)				/* WRONG */
	return;
if (!strcmp(*argv, "-a"))	/* WRONG */
	aflag++;
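The same tests written with explicit comparisons might look like the
following sketch. The function name count_aflags() is hypothetical, and
the argument vector is assumed to be NULL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Count the occurrences of "-a" in a NULL-terminated argument vector. */
int
count_aflags(char **argv)
{
	int aflag = 0;

	if (argv == NULL)
		return (0);

	for (; *argv != NULL; argv++) {
		if (strcmp(*argv, "-a") == 0)
			aflag++;
	}

	return (aflag);
}
```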
When testing whether a bit is set in a value, it is good style to
explicitly test the result of a bitwise operation against 0, rather
than defaulting the boolean condition. Prefer
if ((flags & FLAG_VERBOSE) != 0)
if ((flags & FLAG_VERBOSE) == 0)
rather than the following:
if (flags & FLAG_VERBOSE)
if (!(flags & FLAG_VERBOSE))
Do not use the assignment operator in a place where it could be easily
confused with the equality operator. For instance, in the simple
expression
if (x = y)
	statement;
it is hard to tell whether the programmer really meant assignment or
mistyped an equality test. Instead, use
if ((x = y) != 0)
	statement;
or something similar, if the assignment is actually needed within the
if statement.
There is a time and a place for embedded assignments. The
++ and
-- operators count as assignments; so, for many purposes, do functions
with side effects.
In some constructs there is no better way to accomplish the results
without making the code bulkier and less readable. To repeat the
earlier loop example:
while ((c = getchar()) != EOF) {
	process the character
}
Embedded assignments used to provide modest improvement in run-time
performance, but this is no longer the case with modern optimizing
compilers. Do not write, for example,
d = (a = b + c) + 4; /* WRONG */
believing that it will somehow be "faster" than
a = b + c;
d = a + 4;
In general, avoid such premature micro-optimization unless performance
is clearly a bottleneck, and a profile shows that the optimization
provides a significant performance boost. Be aware that in the long run
hand-optimized code often turns into a pessimization, and maintenance
difficulty will increase as the human memory of what's going on in a
given piece of code fades. Note also that side effects within
expressions can result in code whose semantics are compiler-dependent,
since C's order of evaluation is explicitly undefined in most places.
Compilers do differ.
There is also a time and place for the ternary (`? :') operator and the
binary comma operator. If an expression containing a binary operator
appears before the `?', it should be parenthesized:
(x >= 0) ? x : -x
Nested ternary operators can be confusing and should be avoided if
possible.
The comma operator can be useful in
for statements to provide multiple
initializations or incrementations.
SEE ALSO
Bill Shannon,
C Style and Coding Standards for SunOS, 1996.
mdb(1)
illumos March 17, 2024 illumos