TRAPSTAT(8) Maintenance Commands and Procedures TRAPSTAT(8)

NAME

trapstat - report trap statistics

SYNOPSIS

DESCRIPTION

The trapstat utility gathers and displays run-time trap statistics on
UltraSPARC-based systems. The default output is a table of trap types
and CPU IDs, with each row of the table denoting a trap type and each
column of the table denoting a CPU. If standard output is a terminal,
the table contains as many columns of data as can fit within the
terminal width; if standard output is not a terminal, the table
contains at most six columns of data. By default, data is gathered
and displayed for all CPUs; if the data cannot fit in a single table,
it is printed across multiple tables. The set of CPUs for which data
is gathered and displayed can be optionally specified with the -c or
-C option.

Unless the -r option or the -a option is specified, the value
displayed in each entry of the table corresponds to the number of
traps per second. If the -r option is specified, the value
corresponds to the number of traps over the interval implied by the
specified sampling rate; if the -a option is specified, the value
corresponds to the accumulated number of traps since the invocation
of trapstat.

By default, trapstat displays data once per second, and runs
indefinitely; both of these behaviors can be optionally controlled
with the interval and count parameters, respectively. The interval is
specified in seconds; the count indicates the number of intervals to
be executed before exiting. Alternatively, command can be specified,
in which case trapstat executes the provided command and continues to
run until the command exits. A positive integer is assumed to be an
interval; if the desired command cannot be distinguished from an
integer, the full path of command must be specified.

UltraSPARC I (obsolete), II, and III handle translation lookaside
buffer (TLB) misses by trapping to the operating system. TLB miss
traps can be a significant component of overall system performance
for some workloads; the -t option provides in-depth information on
these traps. When run with this option, trapstat displays both the
rate of TLB miss traps and the percentage of time spent processing
those traps. Additionally, TLB misses that hit in the translation
storage buffer (TSB) are differentiated from TLB misses that further
miss in the TSB. (The TSB is a software structure used as a
translation entry cache to allow the TLB to be quickly filled; it is
discussed in detail in the UltraSPARC II User's Manual.) The TLB and
TSB miss information is further broken down into user- and kernel-
mode misses.

Workloads with working sets that exceed the TLB reach may spend a
significant amount of time missing in the TLB. To accommodate such
workloads, the operating system supports multiple page sizes: larger
page sizes increase the effective TLB reach and thereby reduce the
number of TLB misses. To provide insight into the relationship
between page size and TLB miss rate, trapstat optionally provides in-
depth TLB miss information broken down by page size using the -T
option. The information provided by the -T option is a superset of
that provided by the -t option; only one of -t and -T can be
specified.
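
As a rough worked example of why larger pages help: TLB reach is the number of TLB entries multiplied by the page size. The sketch below assumes a hypothetical 64-entry TLB, chosen only for illustration and not taken from any UltraSPARC specification; the page sizes are those shown in Example 4.

```python
# Hypothetical TLB size, for illustration only (not from any processor manual).
TLB_ENTRIES = 64

def tlb_reach(entries, page_size):
    """Return the number of bytes a fully populated TLB can map."""
    return entries * page_size

# Page sizes as they appear in the -T output of Example 4.
PAGE_SIZES = [("8k", 8 << 10), ("64k", 64 << 10), ("512k", 512 << 10), ("4m", 4 << 20)]

for label, size in PAGE_SIZES:
    reach_kb = tlb_reach(TLB_ENTRIES, size) // 1024
    print(f"{label:>4} pages: reach = {reach_kb} KB")
```

Under this assumption, moving a working set from 8K to 4M pages raises the reach from 512 KB to 256 MB, which is why the per-page-size breakdown from -T is useful when tuning.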

OPTIONS

The following options are supported:

-a
Displays the number of traps as accumulating,
monotonically increasing values instead of
per-second or per-interval rates.

-c cpulist
Enables trapstat only on the CPUs specified
by cpulist.

cpulist can be a single processor ID (for
example, 4), a range of processor IDs (for
example, 4-6), or a comma-separated list of
processor IDs or processor ID ranges (for
example, 4,5,6 or 4,6-8).
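
The cpulist syntax above can be expanded by a script that needs to know which CPU columns to expect. The following is a minimal sketch of that syntax only; it is not trapstat's own parser, and the handling of reversed ranges is an assumption.

```python
def parse_cpulist(cpulist):
    """Expand a cpulist such as '4,6-8' into a sorted list of CPU IDs."""
    cpus = set()
    for item in cpulist.split(","):
        if "-" in item:
            first, last = (int(part) for part in item.split("-", 1))
            if first > last:
                # Assumption: a reversed range is treated as an error.
                raise ValueError(f"bad range: {item!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(item))
    return sorted(cpus)

print(parse_cpulist("4,6-8"))  # prints [4, 6, 7, 8]
```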

-C processor_set_id
Enables trapstat only on the CPUs in the
processor set specified by processor_set_id.

trapstat modifies its output to always
reflect the CPUs in the specified processor
set. If a CPU is added to the set, trapstat
modifies its output to include the added CPU;
if a CPU is removed from the set, trapstat
modifies its output to exclude the removed
CPU. At most one processor set can be
specified.

-e entrylist
Enables trapstat only for the trap table
entry or entries specified by entrylist. A
trap table entry can be specified by trap
number or by trap name (for example, the
level-10 trap can be specified as 74, 0x4A,
0x4a, or level-10).

entrylist can be a single trap table entry or
a comma-separated list of trap table entries.
If the specified trap table entry is not
valid, trapstat prints a table of all valid
trap table entries and values. A list of
valid trap table entries is also found in The
SPARC Architecture Manual, Version 9 and the
Sun Microelectronics UltraSPARC II User's
Manual. If the parsable option (-P) is
specified in addition to the -e option, the
format of the data is as follows:

Field Contents
1 Timestamp (nanoseconds since start)
2 CPU ID
3 Trap number (in hexadecimal)
4 Trap name
5 Trap rate per interval

Each field is separated with whitespace. If
the format is modified, it will be modified
by adding potentially new fields beginning
with field 6; extant fields will remain
unchanged.
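
A wrapper script that builds an entrylist may want to tell numeric entries from names. The sketch below classifies the forms listed above (decimal, 0x-prefixed hexadecimal, or a name). The function names are hypothetical, and treating every non-numeric string as a trap name is an assumption; trapstat itself remains the authority on which entries are valid.

```python
def classify_trap_spec(spec):
    """Return ('number', n) for a numeric entry, else ('name', spec)."""
    try:
        # Base 0 lets int() accept decimal ("74") and 0x-prefixed hex ("0x4A").
        return ("number", int(spec, 0))
    except ValueError:
        return ("name", spec)

def split_entrylist(entrylist):
    """Split a comma-separated entrylist into classified entries."""
    return [classify_trap_spec(item) for item in entrylist.split(",")]

print(split_entrylist("74,0x4A,0x4a,level-10"))
```

The three numeric spellings of the level-10 trap all classify as the number 74, while level-10 is kept as a name.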

-l
Lists trap table entries. By default, a table
is displayed containing all valid trap
numbers, their names and a brief description.
The trap name is used in both the default
output and in the entrylist parameter for the
-e argument. If the parsable option (-P) is
specified in addition to the -l option, the
format of the data is as follows:

Field Contents
1 Trap number in hexadecimal
2 Trap number in decimal
3 Trap name
Remaining Trap description

-P
Generates parsable output. When run without
options that modify data gathering (that
is, -e, -t, or -T), trapstat's parsable
output has the following format:

Field Contents
1 Timestamp (nanoseconds since start)
2 CPU ID
3 Trap number (in hexadecimal)
4 Trap name
5 Trap rate per interval

Each field is separated with whitespace. If
the format is modified, it will be modified
by adding potentially new fields beginning
with field 6; extant fields will remain
unchanged.
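
The five-field format can be read as shown in the following sketch, which ignores any fields from field 6 onward so that it keeps working if the format is extended. The record and function names are hypothetical, and treating field 5 as an integer is an assumption based on Example 6.

```python
from collections import namedtuple

TrapSample = namedtuple("TrapSample", "timestamp_ns cpu trap_number trap_name count")

def parse_trap_line(line):
    """Parse one line of 'trapstat -P' output into a TrapSample."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields: {line!r}")
    # Only the first five fields are defined; later fields are ignored.
    ts, cpu, trap_hex, name, count = fields[:5]
    return TrapSample(int(ts), int(cpu), int(trap_hex, 16), name, int(count))

# Sample line taken from Example 6.
print(parse_trap_line("3030400 0 4a level-10 1"))
```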

-r rate
Explicitly sets the sampling rate to be rate
samples per second. If this option is
specified, trapstat's output changes from
traps-per-second to
traps-per-sampling-interval.
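
Per-interval counts gathered with -r can be converted back to an average per-second rate by scaling with the sampling rate. The sketch below assumes that every sampling interval lasts exactly 1/rate seconds, which real output only approximates; the function name is hypothetical.

```python
def average_rate_per_second(counts, rate):
    """Average traps per second over a run of per-interval counts.

    counts: traps seen in each sampling interval
    rate:   the value given to -r (samples per second)
    """
    if not counts:
        raise ValueError("no samples")
    # Total traps divided by elapsed time, where elapsed = len(counts) / rate.
    return sum(counts) * rate / len(counts)

# One trap in ten 1-millisecond intervals (-r 1000).
print(average_rate_per_second([0, 0, 1, 0, 0, 0, 0, 0, 0, 0], 1000))  # prints 100.0
```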

-t
Enables TLB statistics.

A table is displayed with four principal
columns of data: itlb-miss, itsb-miss, dtlb-
miss, and dtsb-miss. The columns contain both
the rate of the corresponding event and the
percentage of CPU time spent processing the
event. The percentage of CPU time is given
only in terms of a single CPU. The rows of
the table correspond to CPUs, with each CPU
consuming two rows: one row for user-mode
events (denoted with u) and one row for
kernel-mode events (denoted with k). For each
row, the percentage of CPU time is totalled
and displayed in the rightmost column. The
CPUs are delineated with a solid line. If the
parsable option (-P) is specified in addition
to the -t option, the format of the data is
as follows:

Field Contents
1 Timestamp (nanoseconds since start)
2 CPU ID
3 Mode (k denotes kernel, u denotes user)
4 I-TLB misses
5 Percentage of time in I-TLB miss handler
6 I-TSB misses
7 Percentage of time in I-TSB miss handler
8 D-TLB misses
9 Percentage of time in D-TLB miss handler
10 D-TSB misses
11 Percentage of time in D-TSB miss handler

Each field is separated with whitespace. If
the format is modified, it will be modified
by adding potentially new fields beginning
with field 12; extant fields will remain
unchanged.
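
The eleven-field format can be read as in the sketch below, which also sums the four percentages in the way the rightmost %tim column does. The names are hypothetical. The sample line is assembled from the first row of Example 3 with a made-up timestamp, and the numeric formatting of real parsable output may differ.

```python
from collections import namedtuple

TlbSample = namedtuple(
    "TlbSample",
    "timestamp_ns cpu mode itlb_miss itlb_pct itsb_miss itsb_pct "
    "dtlb_miss dtlb_pct dtsb_miss dtsb_pct",
)

def parse_tlb_line(line):
    """Parse one line of 'trapstat -t -P' output into a TlbSample."""
    f = line.split()
    if len(f) < 11:
        raise ValueError(f"expected at least 11 fields: {line!r}")
    # Fields from 12 onward, if any, are ignored.
    return TlbSample(
        int(f[0]), int(f[1]), f[2],
        int(f[3]), float(f[4]), int(f[5]), float(f[6]),
        int(f[7]), float(f[8]), int(f[9]), float(f[10]),
    )

def total_pct(sample):
    """Sum the four handler percentages for one row."""
    return sample.itlb_pct + sample.itsb_pct + sample.dtlb_pct + sample.dtsb_pct

# Hypothetical timestamp; counts and percentages from CPU 0, user mode, Example 3.
row = parse_tlb_line("1000000000 0 u 2571 0.3 0 0.0 10802 1.3 0 0.0")
print(row.mode, round(total_pct(row), 1))
```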

-T
Enables TLB statistics, with page size
information. As with the -t option, a table
is displayed with four principal columns of
data: itlb-miss, itsb-miss, dtlb-miss, and
dtsb-miss. The columns contain both the
absolute number of the corresponding event,
and the percentage of CPU time spent
processing the event. The percentage of CPU
time is given only in terms of a single CPU.
The rows of the table correspond to CPUs,
with each CPU consuming two sets of rows: one
set for user-level events (denoted with u)
and one set for kernel-level events (denoted
with k). Each set, in turn, contains as many
rows as there are page sizes supported (see
getpagesizes(3C)). For each row, the
percentage of CPU time is totalled and
displayed in the right-most column. The two
sets are delineated with a dashed line; CPUs
are delineated with a solid line. If the
parsable option (-P) is specified in addition
to the -T option, the format of the data is
as follows:

Field Contents
1 Timestamp (nanoseconds since start)
2 CPU ID
3 Mode (k denotes kernel, u denotes user)
4 Page size, in decimal
5 I-TLB misses
6 Percentage of time in I-TLB miss handler
7 I-TSB misses
8 Percentage of time in I-TSB miss handler
9 D-TLB misses
10 Percentage of time in D-TLB miss handler
11 D-TSB misses
12 Percentage of time in D-TSB miss handler

Each field is separated with whitespace. If
the format is modified, it will be modified
by adding potentially new fields beginning
with field 13; extant fields will remain
unchanged.
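
With the page size in field 4, the parsable -T output lends itself to aggregation by page size. The following sketch totals the D-TLB miss percentage (field 10) per page size for one mode. The function name is hypothetical, and the sample lines are assembled from Example 4 with a made-up timestamp.

```python
from collections import defaultdict

def dtlb_pct_by_page_size(lines, mode="u"):
    """Total the D-TLB miss %tim per page size for the given mode."""
    totals = defaultdict(float)
    for line in lines:
        f = line.split()
        if len(f) < 12 or f[2] != mode:
            continue  # skip short lines and rows for the other mode
        page_size = int(f[3])        # field 4: page size, in decimal
        totals[page_size] += float(f[9])  # field 10: % time in D-TLB handler
    return dict(totals)

# Hypothetical timestamps; remaining values taken from Example 4.
sample = [
    "1000000000 0 u 8192 1300 0.1 15 0.0 104897 7.9 90 0.0",
    "1000000000 0 u 65536 0 0.0 0 0.0 29935 2.3 7 0.0",
    "1000000000 0 k 8192 13 0.0 0 0.0 71733 6.5 110 0.0",
]
print(dtlb_pct_by_page_size(sample))
```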

EXAMPLES

Example 1: Using trapstat Without Options

Example 2: Using trapstat with CPU Filtering

The -c option can be used to limit the CPUs on which trapstat is
enabled. This example limits trapstat to CPU 1 and CPUs 12 through 15.

example# trapstat -c 1,12-15

vct name | cpu1 cpu12 cpu13 cpu14 cpu15
------------------------+---------------------------------------------
24 cleanwin | 6923 3072 2500 3518 2261
44 level-4 | 3 0 0 1 1
49 level-9 | 100 100 100 100 100
4d level-13 | 23 8 14 19 14
60 int-vec | 2559 2699 2752 2688 2792
64 itlb-miss | 3296 1548 1174 1698 1087
68 dtlb-miss | 114788 54313 43040 58336 38057
6c dtlb-prot | 1046 549 417 545 370
84 spill-user-32 | 66551 29480 301588 26522 213032
88 spill-user-64 | 0 318652 111239 299829 221716
8c spill-user-32-cln | 856 347 331 416 293
90 spill-user-64-cln | 0 55 21 59 39
98 spill-kern-64 | 66464 31803 24758 34004 22277
a4 spill-asuser-32 | 1423 569 560 698 483
a8 spill-asuser-64 | 0 74 32 98 46
ac spill-asuser-32-cln | 4875 2250 1728 2384 1584
b0 spill-asuser-64-cln | 0 2 0 1 0
c4 fill-user-32 | 64193 28418 287516 27055 202093
c8 fill-user-64 | 0 305016 106692 288542 210654
cc fill-user-32-cln | 6733 3520 15185 2396 12035
d0 fill-user-64-cln | 0 13226 3506 12933 11032
d8 fill-kern-64 | 66220 31680 24674 33892 22196
108 syscall-32 | 2446 967 817 1196 755

Example 3: Using trapstat with TLB Statistics

The -t option displays in-depth TLB statistics, including the amount
of time spent performing TLB miss processing. The following example
shows that the machine is spending 14.1 percent of its time just
handling D-TLB misses:

example# trapstat -t
cpu m| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
-----+-------------------------------+-------------------------------+----
0 u| 2571 0.3 0 0.0 | 10802 1.3 0 0.0 | 1.6
0 k| 0 0.0 0 0.0 | 106420 13.4 184 0.1 |13.6
-----+-------------------------------+-------------------------------+----
1 u| 3069 0.3 0 0.0 | 10983 1.2 100 0.0 | 1.6
1 k| 27 0.0 0 0.0 | 106974 12.6 19 0.0 |12.7
-----+-------------------------------+-------------------------------+----
2 u| 3033 0.3 0 0.0 | 11045 1.2 105 0.0 | 1.6
2 k| 43 0.0 0 0.0 | 107842 12.7 108 0.0 |12.8
-----+-------------------------------+-------------------------------+----
3 u| 2924 0.3 0 0.0 | 10380 1.2 121 0.0 | 1.6
3 k| 54 0.0 0 0.0 | 102682 12.2 16 0.0 |12.2
-----+-------------------------------+-------------------------------+----
4 u| 3064 0.3 0 0.0 | 10832 1.2 120 0.0 | 1.6
4 k| 31 0.0 0 0.0 | 107977 13.0 236 0.1 |13.1
=====+===============================+===============================+====
ttl | 14816 0.3 0 0.0 | 585937 14.1 1009 0.0 |14.5

Example 4: Using trapstat with TLB Statistics and Page Size Information

By specifying the -T option, trapstat shows TLB misses broken down by
page size. In this example, CPU 0 is spending 7.9 percent of its time
handling user-mode TLB misses on 8K pages, and another 2.3 percent of
its time handling user-mode TLB misses on 64K pages.

example# trapstat -T -c 0
cpu m size| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
----------+-------------------------------+-------------------------------+----
0 u 8k| 1300 0.1 15 0.0 | 104897 7.9 90 0.0 | 8.0
0 u 64k| 0 0.0 0 0.0 | 29935 2.3 7 0.0 | 2.3
0 u 512k| 0 0.0 0 0.0 | 3569 0.2 2 0.0 | 0.2
0 u 4m| 0 0.0 0 0.0 | 233 0.0 2 0.0 | 0.0
- - - - - + - - - - - - - - - - - - - - - + - - - - - - - - - - - - - - - + - -
0 k 8k| 13 0.0 0 0.0 | 71733 6.5 110 0.0 | 6.5
0 k 64k| 0 0.0 0 0.0 | 0 0.0 0 0.0 | 0.0
0 k 512k| 0 0.0 0 0.0 | 0 0.0 206 0.1 | 0.1
0 k 4m| 0 0.0 0 0.0 | 0 0.0 0 0.0 | 0.0
==========+===============================+===============================+====
ttl | 1313 0.1 15 0.0 | 210367 17.1 417 0.2 |17.5

Example 5: Using trapstat with Entry Filtering

By specifying the -e option, trapstat displays statistics for only
specific trap types. Using this option minimizes the probe effect
when seeking specific data. This example yields statistics for only
the dtlb-prot and syscall-32 traps on CPUs 12 through 15:

example# trapstat -e dtlb-prot,syscall-32 -c 12-15
vct name | cpu12 cpu13 cpu14 cpu15
------------------------+------------------------------------
6c dtlb-prot | 817 754 1018 560
108 syscall-32 | 1426 1647 2186 1142

vct name | cpu12 cpu13 cpu14 cpu15
------------------------+------------------------------------
6c dtlb-prot | 1085 996 800 707
108 syscall-32 | 2578 2167 1638 1452

Example 6: Using trapstat with a Higher Sampling Rate

The following example uses the -r option to specify a sampling rate
of 1000 samples per second, and the -e option to gather data for only
the level-10 trap. Additionally, specifying the -P option yields
parsable output.

Notice the timestamp difference between the level-10 events:
9,998,000 nanoseconds and 10,007,000 nanoseconds. These level-10
events correspond to the system clock, which by default ticks at 100
hertz (that is, every 10,000,000 nanoseconds).

example# trapstat -e level-10 -P -r 1000
1070400 0 4a level-10 0
2048600 0 4a level-10 0
3030400 0 4a level-10 1
4035800 0 4a level-10 0
5027200 0 4a level-10 0
6027200 0 4a level-10 0
7027400 0 4a level-10 0
8028200 0 4a level-10 0
9026400 0 4a level-10 0
10029600 0 4a level-10 0
11028600 0 4a level-10 0
12024000 0 4a level-10 0
13028400 0 4a level-10 1
14031200 0 4a level-10 0
15027200 0 4a level-10 0
16027600 0 4a level-10 0
17025000 0 4a level-10 0
18026000 0 4a level-10 0
19027800 0 4a level-10 0
20025600 0 4a level-10 0
21025200 0 4a level-10 0
22025000 0 4a level-10 0
23035400 0 4a level-10 1
24027400 0 4a level-10 0
25026000 0 4a level-10 0
26027000 0 4a level-10 0
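
The timestamp differences quoted above can be recomputed from the parsable output. The following sketch uses a subset of the lines shown; the function name is hypothetical.

```python
def event_gaps(lines):
    """Return the nanosecond gaps between consecutive non-zero samples."""
    stamps = []
    for line in lines:
        f = line.split()
        if len(f) >= 5 and int(f[4]) > 0:
            stamps.append(int(f[0]))  # field 1: timestamp in nanoseconds
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

# A subset of the output above, including all three level-10 events.
output = [
    "2048600 0 4a level-10 0",
    "3030400 0 4a level-10 1",
    "12024000 0 4a level-10 0",
    "13028400 0 4a level-10 1",
    "22025000 0 4a level-10 0",
    "23035400 0 4a level-10 1",
]
print(event_gaps(output))  # prints [9998000, 10007000]
```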

ATTRIBUTES

See attributes(7) for descriptions of the following attributes:

+-----------------------------+-----------------+
|       ATTRIBUTE TYPE        | ATTRIBUTE VALUE |
+-----------------------------+-----------------+
| Interface Stability         |                 |
|   Human Readable Output     | Unstable        |
|   Parsable Output           | Evolving        |
+-----------------------------+-----------------+

NOTES

When enabled, trapstat induces a varying probe effect, depending on
the type of information collected. While the precise probe effect
depends upon the specifics of the hardware, the following table can
be used as a rough guide:

Option      Approximate probe effect
default     3-5% per trap
-e          3-5% per specified trap
-t, -T      40-45% per TLB miss trap hitting in the TSB,
            25-30% per TLB miss trap missing in the TSB

These probe effects are per trap, not for the system as a whole. For
example, running trapstat with the default options on a system that
spends 7% of total time handling traps induces a performance
degradation of less than one half of one percent; running trapstat
with the -t or -T option on a system spending 5% of total time
processing TLB misses induces a performance degradation of no more
than 2.5%.
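
The arithmetic behind these figures is the fraction of time spent in traps multiplied by the per-trap probe effect. The sketch below uses the upper bound of each range from the table; the function name is hypothetical.

```python
def system_overhead(trap_time_fraction, per_trap_probe_effect):
    """Estimate system-wide slowdown from a per-trap probe effect."""
    return trap_time_fraction * per_trap_probe_effect

# Default options: 7% of time in traps, at most 5% probe effect per trap.
default_overhead = system_overhead(0.07, 0.05)

# -t or -T: 5% of time in TLB misses, at most 45% probe effect per trap.
tlb_overhead = system_overhead(0.05, 0.45)

print(f"default: {default_overhead:.2%}")
print(f"-t/-T:   {tlb_overhead:.2%}")
```

Both results fall within the bounds stated above: about 0.35% for the default case and about 2.25% for the TLB case.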

When run with the -t or -T option, trapstat accounts for its probe
effect when calculating the %tim fields. This assures that the %tim
fields are a reasonably accurate indicator of the time a given
workload is spending handling TLB misses -- regardless of the
perturbing presence of trapstat.

While the %tim fields include the explicit cost of executing the TLB
miss handler, they do not include the implicit costs of TLB miss
traps (for example, pipeline effects and cache pollution). These
implicit costs become more significant as the trap rate grows; if
high %tim values are reported (greater than 50%), you can accurately
infer that much of the balance of time is being spent on the implicit
costs of the TLB miss traps.

Due to the potential system-wide degradation induced, only the
superuser can run trapstat.

Due to limitations of the underlying statistics-gathering
methodology, only one instance of trapstat can run at a time.

April 9, 2016 TRAPSTAT(8)