HSW_EVENTS(3CPC) CPU Performance Counters Library Functions
NAME
hsw_events - processor model specific performance counter events
DESCRIPTION
This manual page describes events specific to the following Intel CPU
models and is derived from Intel's perfmon data. For more information,
please consult the Intel Software Developer's Manual or Intel's perfmon
website.
CPU models described by this document:
+o Family 0x6, Model 0x46
+o Family 0x6, Model 0x45
+o Family 0x6, Model 0x3c
The following events are supported:
ld_blocks.store_forward This event counts loads that followed a store to the same
address, where the data could not be forwarded inside the
pipeline from the store to the load. The most common reason
why store forwarding would be blocked is when a load's address
range overlaps with a preceding smaller uncompleted store. The
penalty for blocked store forwarding is that the load must wait
for the store to write its value to the cache before it can be
issued.
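The overlapping-smaller-store pattern described above can be reproduced with a short C fragment (illustrative only; the function name is hypothetical):

```c
#include <stdint.h>

/*
 * Hypothetical illustration: a 1-byte store immediately followed by a
 * 4-byte load over the same address.  The load's range overlaps a
 * smaller, not-yet-completed store, so the store cannot be forwarded
 * and ld_blocks.store_forward increments while the load waits for the
 * store to reach the cache.
 */
uint32_t
narrow_store_wide_load(uint32_t *p)
{
	*(uint8_t *)p = 0xff;	/* narrow 1-byte store */
	return (*p);		/* wider 4-byte load over the same bytes */
}
```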
ld_blocks.no_sr The number of times that split load operations are temporarily
blocked because all resources for handling the split accesses
are in use.
misalign_mem_ref.loads Speculative cache-line split load uops dispatched to L1D.
misalign_mem_ref.stores Speculative cache-line split store-address uops dispatched to
L1D.
ld_blocks_partial.address_alias Aliasing occurs when a load is issued after a store and their
memory addresses are offset by 4K. This event counts the
number of loads that aliased with a preceding store, resulting
in an extended address check in the pipeline which can have a
performance impact.
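A sketch of an access pattern that triggers this 4K aliasing (helper name and sizes are hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: stores and loads whose addresses differ by
 * exactly 4096 bytes.  Because the low 12 address bits match, each
 * load falsely aliases the store issued just before it, and
 * ld_blocks_partial.address_alias counts the extended address checks.
 */
long
aliased_copy(void)
{
	enum { N = 1024 };
	char *buf = malloc(2 * 4096);
	long sum = 0;
	int i;

	memset(buf, 1, 2 * 4096);
	for (i = 0; i < N; i++) {
		buf[i] = 2;		/* store to address X */
		sum += buf[i + 4096];	/* load from X + 4K: false alias */
	}
	free(buf);
	return (sum);
}
```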
dtlb_load_misses.miss_causes_a_walk Misses in all TLB levels that cause a page walk of any page
size.
dtlb_load_misses.walk_completed_4k Completed page walks due to demand load misses that caused 4K
page walks in any TLB levels.
dtlb_load_misses.walk_completed_2m_4m Completed page walks due to demand load misses that caused
2M/4M page walks in any TLB levels.
dtlb_load_misses.walk_completed_1g Load miss in all TLB levels causes a page walk that completes.
(1G)
dtlb_load_misses.walk_completed Completed page walks in any TLB of any page size due to demand
load misses.
dtlb_load_misses.walk_duration This event counts cycles when the page miss handler (PMH) is
servicing page walks caused by DTLB load misses.
dtlb_load_misses.stlb_hit_4k This event counts load operations from a 4K page that miss the
first DTLB level but hit the second and do not cause page
walks.
dtlb_load_misses.stlb_hit_2m This event counts load operations from a 2M page that miss the
first DTLB level but hit the second and do not cause page
walks.
dtlb_load_misses.stlb_hit Number of cache load STLB hits. No page walk.
dtlb_load_misses.pde_cache_miss DTLB demand load misses with low part of linear-to-physical
address translation missed.
int_misc.recovery_cycles This event counts the number of cycles spent waiting for a
recovery after an event such as a processor nuke, JEClear,
assist, hle/rtm abort etc.
int_misc.recovery_cycles_any Core cycles the allocator was stalled due to recovery from
earlier clear event for any thread running on the physical core
(e.g. misprediction or memory nuke).
uops_issued.any This event counts the number of uops issued by the Front-end of
the pipeline to the Back-end. This event is counted at the
allocation stage and will count both retired and non-retired
uops.
uops_issued.stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops
to Reservation Station (RS) for the thread.
uops_issued.core_stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops
to Reservation Station (RS) for all threads.
uops_issued.flags_merge Number of flags-merge uops allocated. Such uops add delay.
uops_issued.slow_lea Number of slow LEA or similar uops allocated. Such a uop has 3
sources (for example, 2 sources plus an immediate) regardless of
whether it is the result of an LEA instruction or not.
uops_issued.single_mul Number of multiply packed/scalar single precision uops
allocated.
arith.divider_uops Any uop executed by the Divider. (This includes all divide
uops, sqrt, ...)
l2_rqsts.demand_data_rd_miss Demand data read requests that missed L2, no rejects.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.rfo_miss Counts the number of store RFO requests that miss the L2 cache.
l2_rqsts.code_rd_miss Number of instruction fetches that missed the L2 cache.
l2_rqsts.all_demand_miss Demand requests that miss L2 cache.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.l2_pf_miss Counts all L2 HW prefetcher requests that missed L2.
l2_rqsts.miss All requests that missed L2.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.demand_data_rd_hit Counts the number of demand Data Read requests, initiated by
load instructions, that hit the L2 cache.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.rfo_hit Counts the number of store RFO requests that hit the L2 cache.
l2_rqsts.code_rd_hit Number of instruction fetches that hit the L2 cache.
l2_rqsts.l2_pf_hit Counts all L2 HW prefetcher requests that hit L2.
l2_rqsts.all_demand_data_rd Counts any demand and L1 HW prefetch data load requests to L2.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.all_rfo Counts all L2 store RFO requests.
l2_rqsts.all_code_rd Counts all L2 code requests.
l2_rqsts.all_demand_references Demand requests to L2 cache.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.all_pf Counts all L2 HW prefetcher requests.
l2_rqsts.references All requests to L2 cache.
The following errata may apply to this: HSD78, HSM80
l2_demand_rqsts.wb_hit Not rejected writebacks that hit L2 cache.
longest_lat_cache.miss This event counts each cache miss condition for references to
the last level cache.
longest_lat_cache.reference This event counts requests originating from the core that
reference a cache line in the last level cache.
cpu_clk_unhalted.thread_p Counts the number of thread cycles while the thread is not in a
halt state. The thread enters the halt state when it is running
the HLT instruction. The core frequency may change from time to
time due to power or thermal throttling.
cpu_clk_unhalted.thread_p_any Core cycles when at least one thread on the physical core is
not in halt state.
cpu_clk_thread_unhalted.ref_xclk Increments at the frequency of XCLK (100 MHz) when not halted.
cpu_clk_thread_unhalted.ref_xclk_any Reference cycles when at least one thread on the physical core
is unhalted (counts at 100 MHz rate).
cpu_clk_unhalted.ref_xclk Reference cycles when the thread is unhalted. (counts at 100
MHz rate)
cpu_clk_unhalted.ref_xclk_any Reference cycles when at least one thread on the physical core
is unhalted (counts at 100 MHz rate).
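Because cpu_clk_unhalted.thread_p advances at the (varying) core frequency while the ref_xclk counters tick at a fixed 100 MHz, the ratio of the two gives the average unhalted core frequency over a sampling interval. A minimal sketch (hypothetical helper name):

```c
#include <stdint.h>

/*
 * Average core frequency over a sampling interval:
 * elapsed unhalted time = ref_xclk / 100 MHz, so
 * average frequency (MHz) = thread_p / ref_xclk * 100.
 */
double
avg_core_freq_mhz(uint64_t thread_p, uint64_t ref_xclk)
{
	if (ref_xclk == 0)
		return (0.0);
	return ((double)thread_p / (double)ref_xclk * 100.0);
}
```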
cpu_clk_thread_unhalted.one_thread_active Count XClk pulses when this thread is unhalted and the other
thread is halted.
cpu_clk_unhalted.one_thread_active Count XClk pulses when this thread is unhalted and the other
thread is halted.
l1d_pend_miss.pending Increments the number of outstanding L1D misses every cycle.
Set Cmask = 1 and Edge =1 to count occurrences.
l1d_pend_miss.pending_cycles Cycles with L1D load Misses outstanding.
l1d_pend_miss.pending_cycles_any Cycles with L1D load Misses outstanding from any thread on
physical core.
l1d_pend_miss.request_fb_full Number of times a request needed a Fill Buffer (FB) entry but
no entry was available for it; that is, FB unavailability was
the dominant reason for blocking the request. A request
includes cacheable and uncacheable demand accesses (load,
store, or SW prefetch); HW prefetches are excluded.
l1d_pend_miss.fb_full Cycles a demand request was blocked due to Fill Buffer
unavailability.
dtlb_store_misses.miss_causes_a_walk Miss in all TLB levels causes a page walk of any page size
(4K/2M/4M/1G).
dtlb_store_misses.walk_completed_4k Completed page walks due to store misses in one or more TLB
levels of 4K page structure.
dtlb_store_misses.walk_completed_2m_4m Completed page walks due to store misses in one or more TLB
levels of 2M/4M page structure.
dtlb_store_misses.walk_completed_1g Store misses in all DTLB levels that cause completed page
walks. (1G)
dtlb_store_misses.walk_completed Completed page walks due to store miss in any TLB levels of any
page size (4K/2M/4M/1G).
dtlb_store_misses.walk_duration This event counts cycles when the page miss handler (PMH) is
servicing page walks caused by DTLB store misses.
dtlb_store_misses.stlb_hit_4k This event counts store operations from a 4K page that miss the
first DTLB level but hit the second and do not cause page
walks.
dtlb_store_misses.stlb_hit_2m This event counts store operations from a 2M page that miss the
first DTLB level but hit the second and do not cause page
walks.
dtlb_store_misses.stlb_hit Store operations that miss the first TLB level but hit the
second and do not cause page walks.
dtlb_store_misses.pde_cache_miss DTLB store misses with low part of linear-to-physical address
translation missed.
load_hit_pre.sw_pf Non-SW-prefetch load dispatches that hit fill buffer allocated
for S/W prefetch.
load_hit_pre.hw_pf Non-SW-prefetch load dispatches that hit fill buffer allocated
for H/W prefetch.
ept.walk_cycles Cycle count for an Extended Page table walk.
l1d.replacement This event counts when new data lines are brought into the L1
Data cache, which cause other lines to be evicted from the
cache.
tx_mem.abort_conflict Number of times a transactional abort was signaled due to a
data conflict on a transactionally accessed address.
tx_mem.abort_capacity_write Number of times a transactional abort was signaled due to a
data capacity limitation for transactional writes.
tx_mem.abort_hle_store_to_elided_lock Number of times a HLE transactional region aborted due to a non
XRELEASE prefixed instruction writing to an elided lock in the
elision buffer.
tx_mem.abort_hle_elision_buffer_not_empty Number of times an HLE transactional execution aborted due to
NoAllocatedElisionBuffer being non-zero.
tx_mem.abort_hle_elision_buffer_mismatch Number of times an HLE transactional execution aborted due to
XRELEASE lock not satisfying the address and value requirements
in the elision buffer.
tx_mem.abort_hle_elision_buffer_unsupported_alignment Number of times an HLE transactional execution aborted due to
an unsupported read alignment from the elision buffer.
tx_mem.hle_elision_buffer_full Number of times HLE lock could not be elided due to
ElisionBufferAvailable being zero.
move_elimination.int_eliminated Number of integer move elimination candidate uops that were
eliminated.
move_elimination.simd_eliminated Number of SIMD move elimination candidate uops that were
eliminated.
move_elimination.int_not_eliminated Number of integer move elimination candidate uops that were not
eliminated.
move_elimination.simd_not_eliminated Number of SIMD move elimination candidate uops that were not
eliminated.
cpl_cycles.ring0 Unhalted core cycles when the thread is in ring 0.
cpl_cycles.ring0_trans Number of intervals between processor halts while thread is in
ring 0.
cpl_cycles.ring123 Unhalted core cycles when the thread is not in ring 0.
tx_exec.misc1 Counts the number of times a class of instructions that may
cause a transactional abort was executed. Since this is the
count of execution, it may not always cause a transactional
abort.
tx_exec.misc2 Counts the number of times a class of instructions (e.g.,
vzeroupper) that may cause a transactional abort was executed
inside a transactional region.
tx_exec.misc3 Counts the number of times an instruction execution caused the
transactional nest count supported to be exceeded.
tx_exec.misc4 Counts the number of times a XBEGIN instruction was executed
inside an HLE transactional region.
tx_exec.misc5 Counts the number of times an HLE XACQUIRE instruction was
executed inside an RTM transactional region.
rs_events.empty_cycles This event counts cycles when the Reservation Station (RS) is
empty for the thread. The RS is a structure that buffers
allocated micro-ops from the Front-end. If there are many
cycles when the RS is empty, it may represent an underflow of
instructions delivered from the Front-end.
rs_events.empty_end Counts end of periods where the Reservation Station (RS) was
empty. Could be useful to precisely locate Frontend Latency
Bound issues.
offcore_requests_outstanding.demand_data_rd Offcore outstanding demand data read transactions in SQ to
uncore. Set Cmask=1 to count cycles.
The following errata may apply to this: HSD78, HSD62, HSD61,
HSM63, HSM80
offcore_requests_outstanding.cycles_with_demand_data_rd Cycles when offcore outstanding Demand Data Read transactions
are present in SuperQueue (SQ), queue to uncore.
The following errata may apply to this: HSD78, HSD62, HSD61,
HSM63, HSM80
offcore_requests_outstanding.demand_data_rd_ge_6 Cycles with at least 6 offcore outstanding Demand Data Read
transactions in uncore queue.
The following errata may apply to this: HSD78, HSD62, HSD61,
HSM63, HSM80
offcore_requests_outstanding.demand_code_rd Offcore outstanding Demand code Read transactions in SQ to
uncore. Set Cmask=1 to count cycles.
The following errata may apply to this: HSD62, HSD61, HSM63
offcore_requests_outstanding.demand_rfo Offcore outstanding RFO store transactions in SQ to uncore. Set
Cmask=1 to count cycles.
The following errata may apply to this: HSD62, HSD61, HSM63
offcore_requests_outstanding.cycles_with_demand_rfo Offcore outstanding demand RFO transactions present in the
SuperQueue (SQ), queue to uncore, every cycle.
The following errata may apply to this: HSD62, HSD61, HSM63
offcore_requests_outstanding.all_data_rd Offcore outstanding cacheable data read transactions in SQ to
uncore. Set Cmask=1 to count cycles.
The following errata may apply to this: HSD62, HSD61, HSM63
offcore_requests_outstanding.cycles_with_data_rd Cycles when offcore outstanding cacheable Core Data Read
transactions are present in SuperQueue (SQ), queue to uncore.
The following errata may apply to this: HSD62, HSD61, HSM63
lock_cycles.split_lock_uc_lock_duration Cycles in which the L1D and L2 are locked, due to a UC lock or
split lock.
lock_cycles.cache_lock_duration Cycles in which the L1D is locked.
idq.empty Counts cycles the IDQ is empty.
The following errata may apply to this: HSD135
idq.mite_uops Increment each cycle # of uops delivered to IDQ from MITE path.
Set Cmask = 1 to count cycles.
idq.mite_cycles Cycles when uops are being delivered to Instruction Decode
Queue (IDQ) from MITE path.
idq.dsb_uops Increment each cycle # of uops delivered to IDQ from DSB path.
Set Cmask = 1 to count cycles.
idq.dsb_cycles Cycles when uops are being delivered to Instruction Decode
Queue (IDQ) from Decode Stream Buffer (DSB) path.
idq.ms_dsb_uops Increment each cycle # of uops delivered to IDQ when MS_busy by
DSB. Set Cmask = 1 to count cycles. Add Edge=1 to count # of
deliveries.
idq.ms_dsb_cycles Cycles when uops initiated by Decode Stream Buffer (DSB) are
being delivered to Instruction Decode Queue (IDQ) while the
Microcode Sequencer (MS) is busy.
idq.ms_dsb_occur Deliveries to Instruction Decode Queue (IDQ) initiated by
Decode Stream Buffer (DSB) while the Microcode Sequencer (MS)
is busy.
idq.all_dsb_cycles_4_uops Counts cycles in which the DSB delivers four uops. Set Cmask =
4.
idq.all_dsb_cycles_any_uops Counts cycles in which the DSB delivers at least one uop. Set
Cmask = 1.
idq.ms_mite_uops Increment each cycle # of uops delivered to IDQ when MS_busy by
MITE. Set Cmask = 1 to count cycles.
idq.all_mite_cycles_4_uops Counts cycles in which MITE delivers four uops. Set Cmask = 4.
idq.all_mite_cycles_any_uops Counts cycles in which MITE delivers at least one uop. Set
Cmask = 1.
idq.ms_uops This event counts uops delivered by the Front-end with the
assistance of the microcode sequencer. Microcode assists are
used for complex instructions or scenarios that can't be
handled by the standard decoder. Using other instructions, if
possible, will usually improve performance.
idq.ms_cycles This event counts cycles during which the microcode sequencer
assisted the Front-end in delivering uops. Microcode assists
are used for complex instructions or scenarios that can't be
handled by the standard decoder. Using other instructions, if
possible, will usually improve performance.
idq.ms_switches Number of switches from DSB (Decode Stream Buffer) or MITE
(legacy decode pipeline) to the Microcode Sequencer.
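The three delivery paths above (DSB, MITE, MS) can be combined into a simple decode-path mix metric; a low DSB share together with a high idq.ms_switches count usually indicates decode-bandwidth problems. A sketch using hypothetical raw counter reads:

```c
#include <stdint.h>

/*
 * Share of IDQ uops delivered by the Decode Stream Buffer, computed
 * from raw reads of idq.dsb_uops, idq.mite_uops and idq.ms_uops
 * (helper name and input values are hypothetical).
 */
double
dsb_coverage(uint64_t dsb_uops, uint64_t mite_uops, uint64_t ms_uops)
{
	uint64_t total = dsb_uops + mite_uops + ms_uops;

	if (total == 0)
		return (0.0);
	return ((double)dsb_uops / (double)total);
}
```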
idq.mite_all_uops Number of uops delivered to IDQ from any path.
icache.hit Number of Instruction Cache, Streaming Buffer and Victim Cache
reads, both cacheable and noncacheable, including UC fetches.
icache.misses This event counts Instruction Cache (ICACHE) misses.
icache.ifetch_stall Cycles where a code fetch is stalled due to L1 instruction-
cache miss.
icache.ifdata_stall Cycles where a code fetch is stalled due to L1 instruction-
cache miss.
itlb_misses.miss_causes_a_walk Misses in ITLB that cause a page walk of any page size.
itlb_misses.walk_completed_4k Completed page walks due to misses in ITLB 4K page entries.
itlb_misses.walk_completed_2m_4m Completed page walks due to misses in ITLB 2M/4M page entries.
itlb_misses.walk_completed_1g Instruction fetch misses in all TLB levels that cause a page
walk that completes. (1G)
itlb_misses.walk_completed Completed page walks in ITLB of any page size.
itlb_misses.walk_duration This event counts cycles when the page miss handler (PMH) is
servicing page walks caused by ITLB misses.
itlb_misses.stlb_hit_4k ITLB misses that hit STLB (4K).
itlb_misses.stlb_hit_2m ITLB misses that hit STLB (2M).
itlb_misses.stlb_hit ITLB misses that hit STLB. No page walk.
ild_stall.lcp This event counts cycles where the decoder is stalled on an
instruction with a length changing prefix (LCP).
ild_stall.iq_full Stall cycles due to the IQ being full.
br_inst_exec.nontaken_conditional Not taken macro-conditional branches.
br_inst_exec.taken_conditional Taken speculative and retired macro-conditional branches.
br_inst_exec.taken_direct_jump Taken speculative and retired macro-unconditional branch
instructions excluding calls and indirects.
br_inst_exec.taken_indirect_jump_non_call_ret Taken speculative and retired indirect branches excluding calls
and returns.
br_inst_exec.taken_indirect_near_return Taken speculative and retired indirect branches with return
mnemonic.
br_inst_exec.taken_direct_near_call Taken speculative and retired direct near calls.
br_inst_exec.taken_indirect_near_call Taken speculative and retired indirect calls.
br_inst_exec.all_conditional Speculative and retired macro-conditional branches.
br_inst_exec.all_direct_jmp Speculative and retired macro-unconditional branches excluding
calls and indirects.
br_inst_exec.all_indirect_jump_non_call_ret Speculative and retired indirect branches excluding calls and
returns.
br_inst_exec.all_indirect_near_return Speculative and retired indirect return branches.
br_inst_exec.all_direct_near_call Speculative and retired direct near calls.
br_inst_exec.all_branches Counts all near executed branches (not necessarily retired).
br_misp_exec.nontaken_conditional Not taken speculative and retired mispredicted macro
conditional branches.
br_misp_exec.taken_conditional Taken speculative and retired mispredicted macro conditional
branches.
br_misp_exec.taken_indirect_jump_non_call_ret Taken speculative and retired mispredicted indirect branches
excluding calls and returns.
br_misp_exec.taken_return_near Taken speculative and retired mispredicted indirect branches
with return mnemonic.
br_misp_exec.taken_indirect_near_call Taken speculative and retired mispredicted indirect calls.
br_misp_exec.all_conditional Speculative and retired mispredicted macro conditional
branches.
br_misp_exec.all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns.
br_misp_exec.all_branches Counts all near executed branches (not necessarily retired).
idq_uops_not_delivered.core This event counts the number of undelivered (unallocated) uops
from the Front-end to the Resource Allocation Table (RAT) while
the Back-end of the processor is not stalled. The Front-end can
allocate up to 4 uops per cycle so this event can increment 0-4
times per cycle depending on the number of unallocated uops.
This event is counted on a per-core basis.
The following errata may apply to this: HSD135
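Since the Front-end can fill up to 4 allocation slots per cycle, this counter feeds the standard top-down "front-end bound" estimate. A minimal sketch (hypothetical helper; inputs are raw reads of idq_uops_not_delivered.core and cpu_clk_unhalted.thread_p):

```c
#include <stdint.h>

/*
 * Top-down "front-end bound" estimate: of the 4 allocation slots
 * available each cycle, the fraction the Front-end failed to fill
 * while the Back-end could have accepted uops.
 */
double
frontend_bound(uint64_t uops_not_delivered, uint64_t unhalted_cycles)
{
	if (unhalted_cycles == 0)
		return (0.0);
	return ((double)uops_not_delivered /
	    (4.0 * (double)unhalted_cycles));
}
```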
idq_uops_not_delivered.cycles_0_uops_deliv.core This event counts the number of cycles during which the Front-end
allocated exactly zero uops to the Resource Allocation Table
(RAT) while the Back-end of the processor is not stalled. This
event is counted on a per-core basis.
The following errata may apply to this: HSD135
idq_uops_not_delivered.cycles_le_1_uop_deliv.core Cycles per thread when 3 or more uops are not delivered to the
Resource Allocation Table (RAT) while the Back-end of the
machine is not stalled; that is, at most one uop was delivered.
The following errata may apply to this: HSD135
idq_uops_not_delivered.cycles_le_2_uop_deliv.core Cycles with less than 2 uops delivered by the front end.
The following errata may apply to this: HSD135
idq_uops_not_delivered.cycles_le_3_uop_deliv.core Cycles with less than 3 uops delivered by the front end.
The following errata may apply to this: HSD135
idq_uops_not_delivered.cycles_fe_was_ok Counts cycles in which the Front-end delivered 4 uops or the
Resource Allocation Table (RAT) was stalling the Front-end.
The following errata may apply to this: HSD135
uops_executed_port.port_0 Cycles in which a uop is dispatched on port 0 in this thread.
uops_executed_port.port_0_core Cycles per core when uops are executed on port 0.
uops_dispatched_port.port_0 Cycles per thread when uops are executed on port 0.
uops_executed_port.port_1 Cycles in which a uop is dispatched on port 1 in this thread.
uops_executed_port.port_1_core Cycles per core when uops are executed on port 1.
uops_dispatched_port.port_1 Cycles per thread when uops are executed on port 1.
uops_executed_port.port_2 Cycles in which a uop is dispatched on port 2 in this thread.
uops_executed_port.port_2_core Cycles per core when uops are dispatched to port 2.
uops_dispatched_port.port_2 Cycles per thread when uops are executed on port 2.
uops_executed_port.port_3 Cycles in which a uop is dispatched on port 3 in this thread.
uops_executed_port.port_3_core Cycles per core when uops are dispatched to port 3.
uops_dispatched_port.port_3 Cycles per thread when uops are executed on port 3.
uops_executed_port.port_4 Cycles in which a uop is dispatched on port 4 in this thread.
uops_executed_port.port_4_core Cycles per core when uops are executed on port 4.
uops_dispatched_port.port_4 Cycles per thread when uops are executed on port 4.
uops_executed_port.port_5 Cycles in which a uop is dispatched on port 5 in this thread.
uops_executed_port.port_5_core Cycles per core when uops are executed on port 5.
uops_dispatched_port.port_5 Cycles per thread when uops are executed on port 5.
uops_executed_port.port_6 Cycles in which a uop is dispatched on port 6 in this thread.
uops_executed_port.port_6_core Cycles per core when uops are executed on port 6.
uops_dispatched_port.port_6 Cycles per thread when uops are executed on port 6.
uops_executed_port.port_7 Cycles in which a uop is dispatched on port 7 in this thread.
uops_executed_port.port_7_core Cycles per core when uops are dispatched to port 7.
uops_dispatched_port.port_7 Cycles per thread when uops are executed on port 7.
resource_stalls.any Cycles allocation is stalled due to resource related reason.
The following errata may apply to this: HSD135
resource_stalls.rs Cycles stalled due to no eligible RS entry available.
resource_stalls.sb This event counts cycles during which no instructions were
allocated because no Store Buffers (SB) were available.
resource_stalls.rob Cycles stalled due to re-order buffer full.
cycle_activity.cycles_l2_pending Cycles with pending L2 miss loads. Set Cmask=2 to count cycles.
The following errata may apply to this: HSD78, HSM63, HSM80
cycle_activity.cycles_ldm_pending Cycles with pending memory loads. Set Cmask=2 to count cycles.
cycle_activity.cycles_no_execute This event counts cycles during which no instructions were
executed in the execution stage of the pipeline.
cycle_activity.stalls_l2_pending Number of loads missed L2.
The following errata may apply to this: HSM63, HSM80
cycle_activity.stalls_ldm_pending This event counts cycles during which no instructions were
executed in the execution stage of the pipeline and there were
memory instructions pending (waiting for data).
cycle_activity.cycles_l1d_pending Cycles with pending L1 data cache miss loads. Set Cmask=8 to
count cycles.
cycle_activity.stalls_l1d_pending Execution stalls due to L1 data cache miss loads. Set
Cmask=0CH.
lsd.uops Number of uops delivered by the LSD.
lsd.cycles_active Cycles in which uops are delivered by the LSD and do not come
from the decoder.
lsd.cycles_4_uops Cycles in which 4 uops are delivered by the LSD and do not come
from the decoder.
dsb2mite_switches.penalty_cycles Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles.
itlb.itlb_flush Counts the number of ITLB flushes, includes 4k/2M/4M pages.
offcore_requests.demand_data_rd Demand data read requests sent to uncore.
The following errata may apply to this: HSD78, HSM80
offcore_requests.demand_code_rd Demand code read requests sent to uncore.
offcore_requests.demand_rfo Demand RFO read requests sent to uncore, including regular
RFOs, locks, ItoM.
offcore_requests.all_data_rd Data read requests sent to uncore (demand and prefetch).
uops_executed.stall_cycles Counts number of cycles no uops were dispatched to be executed
on this thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.cycles_ge_1_uop_exec This event counts the cycles where at least one uop was
executed. It is counted per thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.cycles_ge_2_uops_exec This event counts the cycles where at least two uops were
executed. It is counted per thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.cycles_ge_3_uops_exec This event counts the cycles where at least three uops were
executed. It is counted per thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.cycles_ge_4_uops_exec Cycles where at least 4 uops were executed per-thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.core Counts total number of uops to be executed per-core each cycle.
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_ge_1 Cycles at least 1 micro-op is executed from any thread on
physical core.
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_ge_2 Cycles at least 2 micro-ops are executed from any thread on
physical core.
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_ge_3 Cycles at least 3 micro-ops are executed from any thread on
physical core.
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_ge_4 Cycles at least 4 micro-ops are executed from any thread on
physical core.
The following errata may apply to this: HSD30, HSM31
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_none Cycles with no micro-ops executed from any thread on physical
core.
The following errata may apply to this: HSD30, HSM31
offcore_requests_buffer.sq_full Offcore requests buffer cannot take more entries for this
thread core.
page_walker_loads.dtlb_l1 Number of DTLB page walker loads that hit in the L1+FB.
page_walker_loads.dtlb_l2 Number of DTLB page walker loads that hit in the L2.
page_walker_loads.dtlb_l3 Number of DTLB page walker loads that hit in the L3.
The following errata may apply to this: HSD25
page_walker_loads.dtlb_memory Number of DTLB page walker loads from memory.
The following errata may apply to this: HSD25
page_walker_loads.itlb_l1 Number of ITLB page walker loads that hit in the L1+FB.
page_walker_loads.itlb_l2 Number of ITLB page walker loads that hit in the L2.
page_walker_loads.itlb_l3 Number of ITLB page walker loads that hit in the L3.
The following errata may apply to this: HSD25
page_walker_loads.itlb_memory Number of ITLB page walker loads from memory.
The following errata may apply to this: HSD25
page_walker_loads.ept_dtlb_l1 Counts the number of Extended Page Table walks from the DTLB
that hit in the L1 and FB.
page_walker_loads.ept_dtlb_l2 Counts the number of Extended Page Table walks from the DTLB
that hit in the L2.
page_walker_loads.ept_dtlb_l3 Counts the number of Extended Page Table walks from the DTLB
that hit in the L3.
page_walker_loads.ept_dtlb_memory Counts the number of Extended Page Table walks from the DTLB
that hit in memory.
page_walker_loads.ept_itlb_l1 Counts the number of Extended Page Table walks from the ITLB
that hit in the L1 and FB.
page_walker_loads.ept_itlb_l2 Counts the number of Extended Page Table walks from the ITLB
that hit in the L2.
page_walker_loads.ept_itlb_l3 Counts the number of Extended Page Table walks from the ITLB
that hit in the L3.
page_walker_loads.ept_itlb_memory Counts the number of Extended Page Table walks from the ITLB
that hit in memory.
tlb_flush.dtlb_thread DTLB flush attempts of the thread-specific entries.
tlb_flush.stlb_any Count number of STLB flush attempts.
inst_retired.any_p Number of instructions at retirement.
The following errata may apply to this: HSD11, HSD140
inst_retired.prec_dist Precise instruction retired event with HW to reduce effect of
PEBS shadow in IP distribution.
The following errata may apply to this: HSD140
inst_retired.x87 This is a non-precise version (that is, it does not use PEBS)
of the event that counts X87 FP operations retired. For X87 FP
operations that have no exceptions, counting also includes
flows that have several X87 uops, or flows that use X87 uops in
exception handling.
other_assists.avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty
applicable.
The following errata may apply to this: HSD56, HSM57
other_assists.sse_to_avx Number of transitions from SSE to AVX-256 when penalty
applicable.
The following errata may apply to this: HSD56, HSM57
other_assists.any_wb_assist Number of microcode assists invoked by HW upon uop writeback.
uops_retired.all Counts the number of micro-ops retired. Use Cmask=1 and invert
to count active cycles or stalled cycles.
uops_retired.stall_cycles Cycles without actually retired uops.
uops_retired.total_cycles Cycles with less than 10 actually retired uops.
uops_retired.core_stall_cycles Cycles without actually retired uops.
uops_retired.retire_slots This event counts the number of retirement slots used each
cycle. There are potentially 4 slots that can be used each
cycle - meaning, 4 uops or 4 instructions could retire each
cycle.
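With 4 retirement slots available per cycle, this counter yields the top-down "retiring" fraction, the counterpart of the front-end bound estimate. A minimal sketch (hypothetical helper; inputs are raw reads of uops_retired.retire_slots and cpu_clk_unhalted.thread_p):

```c
#include <stdint.h>

/*
 * Top-down "retiring" estimate: fraction of the 4 retirement slots
 * per cycle that were actually used.
 */
double
retiring_fraction(uint64_t retire_slots, uint64_t unhalted_cycles)
{
	if (unhalted_cycles == 0)
		return (0.0);
	return ((double)retire_slots / (4.0 * (double)unhalted_cycles));
}
```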
machine_clears.cycles Cycles during which there was a nuke. Accounts for both
thread-specific and all-thread nukes.
machine_clears.memory_ordering This event counts the number of memory ordering machine clears
detected. Memory ordering machine clears can result from memory
address aliasing or snoops from another hardware thread or core
to data inflight in the pipeline. Machine clears can have a
significant performance impact if they are happening
frequently.
machine_clears.smc This event is incremented when self-modifying code (SMC) is
detected, which causes a machine clear. Machine clears can
have a significant performance impact if they are happening
frequently.
machine_clears.maskmov This event counts the number of executed Intel AVX masked load
operations that refer to an illegal address range with the mask
bits set to 0.
br_inst_retired.all_branches Branch instructions at retirement.
br_inst_retired.conditional Counts the number of conditional branch instructions retired.
br_inst_retired.near_call Direct and indirect near call instructions retired.
br_inst_retired.near_call_r3 Direct and indirect macro near call instructions retired
(captured in ring 3).
br_inst_retired.all_branches_pebs All (macro) branch instructions retired.
br_inst_retired.near_return Counts the number of near return instructions retired.
br_inst_retired.not_taken Counts the number of not taken branch instructions retired.
br_inst_retired.near_taken Number of near taken branches retired.
br_inst_retired.far_branch Number of far branches retired.
br_misp_retired.all_branches Mispredicted branch instructions at retirement.
br_misp_retired.conditional Mispredicted conditional branch instructions retired.
br_misp_retired.all_branches_pebs This event counts all mispredicted branch instructions retired.
This is a precise event.
br_misp_retired.near_taken Number of near branch instructions retired that were taken but
mispredicted.
avx_insts.all Counts all AVX instructions. Note that a whole rep string
counts AVX_INST.ALL only once.
hle_retired.start Number of times an HLE execution started.
hle_retired.commit Number of times an HLE execution successfully committed.
hle_retired.aborted Number of times an HLE execution aborted due to any reason
(multiple categories may count as one).
hle_retired.aborted_misc1 Number of times an HLE execution aborted due to various memory
events (e.g., read/write capacity and conflicts).
hle_retired.aborted_misc2 Number of times an HLE execution aborted due to uncommon
conditions.
hle_retired.aborted_misc3 Number of times an HLE execution aborted due to HLE-unfriendly
instructions.
hle_retired.aborted_misc4 Number of times an HLE execution aborted due to incompatible
memory type.
The following errata may apply to this: HSD65
hle_retired.aborted_misc5 Number of times an HLE execution aborted due to none of the
previous 4 categories (e.g. interrupts).
rtm_retired.start Number of times an RTM execution started.
rtm_retired.commit Number of times an RTM execution successfully committed.
rtm_retired.aborted Number of times an RTM execution aborted due to any reason
(multiple categories may count as one).
rtm_retired.aborted_misc1 Number of times an RTM execution aborted due to various memory
events (e.g. read/write capacity and conflicts).
rtm_retired.aborted_misc2 Number of times an RTM execution aborted due to various memory
events (e.g., read/write capacity and conflicts).
rtm_retired.aborted_misc3 Number of times an RTM execution aborted due to HLE-unfriendly
instructions.
rtm_retired.aborted_misc4 Number of times an RTM execution aborted due to incompatible
memory type.
The following errata may apply to this: HSD65
rtm_retired.aborted_misc5 Number of times an RTM execution aborted due to none of the
previous 4 categories (e.g. interrupt).
fp_assist.x87_output Number of X87 FP assists due to output values.
fp_assist.x87_input Number of X87 FP assists due to input values.
fp_assist.simd_output Number of SIMD FP assists due to output values.
fp_assist.simd_input Number of SIMD FP assists due to input values.
fp_assist.any Cycles with any input/output SSE* or FP assists.
rob_misc_events.lbr_inserts Counts cases where hardware saves a new LBR record.
mem_uops_retired.stlb_miss_loads Retired load uops that miss the STLB.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.stlb_miss_stores Retired store uops that miss the STLB.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.lock_loads Retired load uops with locked access.
The following errata may apply to this: HSD76, HSD29, HSM30
mem_uops_retired.split_loads Retired load uops that split across a cacheline boundary.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.split_stores Retired store uops that split across a cacheline boundary.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.all_loads All retired load uops.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.all_stores All retired store uops.
The following errata may apply to this: HSD29, HSM30
mem_load_uops_retired.l1_hit Retired load uops with L1 cache hits as data sources.
The following errata may apply to this: HSD29, HSM30
mem_load_uops_retired.l2_hit Retired load uops with L2 cache hits as data sources.
The following errata may apply to this: HSD76, HSD29, HSM30
mem_load_uops_retired.l3_hit Retired load uops with L3 cache hits as data sources.
The following errata may apply to this: HSD74, HSD29, HSD25,
HSM26, HSM30
mem_load_uops_retired.l1_miss Retired load uops that missed the L1 cache.
The following errata may apply to this: HSM30
mem_load_uops_retired.l2_miss Retired load uops that missed L2. Unknown data
sources are excluded.
The following errata may apply to this: HSD29, HSM30
mem_load_uops_retired.l3_miss Retired load uops that missed L3. Unknown data
sources are excluded.
The following errata may apply to this: HSD74, HSD29, HSD25,
HSM26, HSM30
mem_load_uops_retired.hit_lfb Retired load uops whose data source was a fill
buffer (FB) hit: the load missed L1 but hit a fill buffer already allocated
by a preceding miss to the same cache line whose data was not yet ready.
The following errata may apply to this: HSM30
mem_load_uops_l3_hit_retired.xsnp_miss Retired load uops whose data sources
were L3 hits for which a cross-core snoop missed in an on-package core cache.
The following errata may apply to this: HSD29, HSD25, HSM26,
HSM30
mem_load_uops_l3_hit_retired.xsnp_hit Retired load uops whose data sources
were L3 hits for which a cross-core snoop hit in an on-package core cache.
The following errata may apply to this: HSD29, HSD25, HSM26,
HSM30
mem_load_uops_l3_hit_retired.xsnp_hitm Retired load uops whose data sources
were HitM responses from the shared L3.
The following errata may apply to this: HSD29, HSD25, HSM26,
HSM30
mem_load_uops_l3_hit_retired.xsnp_none Retired load uops whose data sources
were L3 hits that required no snoops.
The following errata may apply to this: HSD74, HSD29, HSD25,
HSM26, HSM30
mem_load_uops_l3_miss_retired.local_dram This event counts retired load uops where the data came from
local DRAM. This does not include hardware prefetches.
The following errata may apply to this: HSD74, HSD29, HSD25,
HSM30
baclears.any Number of front end re-steers due to BPU misprediction.
l2_trans.demand_data_rd Demand data read requests that access L2 cache.
l2_trans.rfo RFO requests that access L2 cache.
l2_trans.code_rd L2 cache accesses when fetching instructions.
l2_trans.all_pf Any MLC or L3 HW prefetch accessing L2, including rejects.
l2_trans.l1d_wb L1D writebacks that access L2 cache.
l2_trans.l2_fill L2 fill requests that access L2 cache.
l2_trans.l2_wb L2 writebacks that access L2 cache.
l2_trans.all_requests Transactions accessing L2 pipe.
l2_lines_in.i L2 cache lines in I state filling L2.
l2_lines_in.s L2 cache lines in S state filling L2.
l2_lines_in.e L2 cache lines in E state filling L2.
l2_lines_in.all This event counts the number of L2 cache lines brought into the
L2 cache. Lines are filled into the L2 cache when there was an
L2 miss.
l2_lines_out.demand_clean Clean L2 cache lines evicted by demand.
l2_lines_out.demand_dirty Dirty L2 cache lines evicted by demand.
sq_misc.split_lock Split locks in the SQ (super queue).
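EXAMPLE
The events above are programmed through the cpc(3CPC) interface. The
following sketch counts one of them, br_inst_retired.all_branches, for
the calling LWP using libcpc on illumos (compile with -lcpc). The
choice of event, the CPC_COUNT_USER flag, and the error handling are
illustrative assumptions; consult cpc(3CPC) and libcpc(3LIB) for the
authoritative interface.

```c
/*
 * Sketch: count br_inst_retired.all_branches for the calling LWP.
 * illumos-specific; requires libcpc and a Haswell-class CPU that
 * supports the event.
 */
#include <stdio.h>
#include <libcpc.h>

int
main(void)
{
	cpc_t *cpc;
	cpc_set_t *set;
	cpc_buf_t *buf;
	uint64_t val;
	int idx;

	if ((cpc = cpc_open(CPC_VER_CURRENT)) == NULL) {
		perror("cpc_open");
		return (1);
	}
	if ((set = cpc_set_create(cpc)) == NULL) {
		perror("cpc_set_create");
		return (1);
	}
	/* Request the retired-branches event, user mode only. */
	idx = cpc_set_add_request(cpc, set, "br_inst_retired.all_branches",
	    0, CPC_COUNT_USER, 0, NULL);
	if (idx == -1) {
		perror("cpc_set_add_request");
		return (1);
	}
	if ((buf = cpc_buf_create(cpc, set)) == NULL) {
		perror("cpc_buf_create");
		return (1);
	}
	/* Bind the set to this LWP and start counting. */
	if (cpc_bind_curlwp(cpc, set, 0) == -1) {
		perror("cpc_bind_curlwp");
		return (1);
	}

	/* ... workload under measurement ... */

	if (cpc_set_sample(cpc, set, buf) == -1) {
		perror("cpc_set_sample");
		return (1);
	}
	(void) cpc_buf_get(cpc, buf, idx, &val);
	(void) printf("retired branches: %llu\n",
	    (unsigned long long)val);

	(void) cpc_unbind(cpc, set);
	(void) cpc_close(cpc);
	return (0);
}
```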
SEE ALSO
cpc(3CPC), https://download.01.org/perfmon/index/

illumos                          June 18, 2018                          illumos