AMD_F17H_ZEN1_EVENTS(3CPC) CPU Performance Counters Library Functions
NAME
amd_f17h_zen1_events - AMD Family 17h Zen1 processor performance
monitoring events
DESCRIPTION
This manual page describes events specific to AMD Family 17h Zen1
processors. For more information, consult the appropriate AMD
BIOS and Kernel Developer's Guide or Open-Source Register Reference.
Each of the events listed below includes the AMD mnemonic, which matches
the name found in the AMD manual, and a brief summary of the event. If
available, a more detailed description of the event follows, and then
any additional unit values that modify the event. A unit can be
combined with its event to create a new event in the system by placing
the '.' character between the event name and the unit name.
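The C helper below is a minimal sketch of the naming rule above. It joins an event and one of its units with '.'.

The function name zen1_event_name() is hypothetical and is not part of libcpc. The resulting string (for example, FpuPipeAssignment.Total0) is presumably what you would pass wherever an event name is expected, such as the event argument of cpc_set_add_request(3CPC).

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "event.unit" into buf, or just "event" when
 * unit is NULL or empty. Returns what snprintf() returns, so a value of
 * len or more means the name did not fit and was truncated.
 */
int
zen1_event_name(char *buf, size_t len, const char *event, const char *unit)
{
	if (unit == NULL || unit[0] == '\0')
		return (snprintf(buf, len, "%s", event));
	return (snprintf(buf, len, "%s.%s", event, unit));
}
```

Reusing snprintf() semantics lets a caller detect truncation without a separate length check.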
The following events are supported:
FpuPipeAssignment Core::X86::Pmc::Core::FpuPipeAssignment - FPU Pipe Assignment
The number of operations (uOps) and dual-pipe uOps dispatched to
each of the 4 FPU execution pipelines. This event reflects how
busy the FPU pipelines are and may be used for workload
characterization. This includes all operations performed by
x87, MMX, and SSE instructions, including moves. Each
increment represents a one-cycle dispatch event. This event is
speculative (see Core::X86::Pmc::Core::ExRetMmxFpInstr for a
retire-based count). Since this event includes non-numeric
operations, it is not suitable for measuring MFLOPS.
This event has the following units which may be used to modify
the behavior of the event:
Dual3 Total number of multi-pipe uOps assigned to Pipe 3
Dual2 Total number of multi-pipe uOps assigned to Pipe 2
Dual1 Total number of multi-pipe uOps assigned to Pipe 1
Dual0 Total number of multi-pipe uOps assigned to Pipe 0
Total3 Total number of uOps assigned to Pipe 3
Total2 Total number of uOps assigned to Pipe 2
Total1 Total number of uOps assigned to Pipe 1
Total0 Total number of uOps assigned to Pipe 0
FpSchedEmpty Core::X86::Pmc::Core::FpSchedEmpty - FP Scheduler Empty
This is a speculative event. The number of cycles in which the
FPU scheduler is empty. Note that some Ops like FP loads bypass
the scheduler. Invert this (Core::X86::Msr::PERF_CTL[Inv] == 1)
to count cycles in which at least one FPU operation is present
in the FPU.
FpRetx87FpOps Core::X86::Pmc::Core::FpRetx87FpOps - Retired x87 Floating
Point Operations
The number of x87 floating-point Ops that have retired. The
number of events logged per cycle can vary from 0 to 8.
This event has the following units which may be used to modify
the behavior of the event:
DivSqrROps Divide and square root Ops
MulOps Multiply Ops
AddSubOps Add/subtract Ops
FpRetSseAvxOps Core::X86::Pmc::Core::FpRetSseAvxOps - Retired SSE/AVX
Operations
This is a retire-based event that counts the number of retired
SSE/AVX FLOPS. The number of events logged per cycle can vary from
0 to 64, so this event can count above 15 in a single cycle; see
section 2.1.11.2 [Large Increment per Cycle Events] of the AMD
reference.
This event has the following units which may be used to modify
the behavior of the event:
DpMultAddFlops Double-precision multiply-add FLOPS. Multiply-add
counts as 2 FLOPS.
DpDivFlops Double-precision divide/square root FLOPS.
DpMultFlops Double-precision multiply FLOPS.
DpAddSubFlops Double-precision add/subtract FLOPS.
SpMultAddFlops Single-precision multiply-add FLOPS. Multiply-add counts
as 2 FLOPS.
SpDivFlops Single-precision divide/square root FLOPS.
SpMultFlops Single-precision multiply FLOPS.
SpAddSubFlops Single-precision add/subtract FLOPS.
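The multiply-add units already count each operation as 2 FLOPS. Total SSE/AVX FLOPS over an interval is therefore the sum of the eight units.

The sketch below rests on a few assumptions:

- Each unit has already been read for the same interval, possibly across several runs, since there may be fewer counters than units.
- The structure and function names are hypothetical.
- It ignores the large-increment counter handling referenced above.

It converts the sum to GFLOPS.

```c
#include <stdint.h>

/* Hypothetical container for the FpRetSseAvxOps unit counts. */
struct sse_avx_flops {
	uint64_t dp_mult_add;	/* FpRetSseAvxOps.DpMultAddFlops */
	uint64_t dp_div;	/* FpRetSseAvxOps.DpDivFlops */
	uint64_t dp_mult;	/* FpRetSseAvxOps.DpMultFlops */
	uint64_t dp_add_sub;	/* FpRetSseAvxOps.DpAddSubFlops */
	uint64_t sp_mult_add;	/* FpRetSseAvxOps.SpMultAddFlops */
	uint64_t sp_div;	/* FpRetSseAvxOps.SpDivFlops */
	uint64_t sp_mult;	/* FpRetSseAvxOps.SpMultFlops */
	uint64_t sp_add_sub;	/* FpRetSseAvxOps.SpAddSubFlops */
};

/* Multiply-add is already counted as 2 FLOPS, so a plain sum suffices. */
uint64_t
total_flops(const struct sse_avx_flops *f)
{
	return (f->dp_mult_add + f->dp_div + f->dp_mult + f->dp_add_sub +
	    f->sp_mult_add + f->sp_div + f->sp_mult + f->sp_add_sub);
}

/* GFLOPS over an interval of the given length; 0.0 for a bad interval. */
double
gflops(const struct sse_avx_flops *f, double seconds)
{
	if (seconds <= 0.0)
		return (0.0);
	return ((double)total_flops(f) / seconds / 1e9);
}
```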
FpNumMovElimScalOp Core::X86::Pmc::Core::FpNumMovElimScalOp - Number of Move
Elimination and Scalar Op Optimization
This is a dispatch-based speculative event, useful for
measuring the effectiveness of the move elimination and scalar
code optimization schemes.
This event has the following units which may be used to modify
the behavior of the event:
Optimized Number of Scalar Ops optimized
OptPotential Number of Ops that are candidates for optimization
(have Z-bit either set or pass).
SseMovOpsElim Number of SSE Move Ops eliminated
SseMovOps Number of SSE Move Ops
FpRetiredSerOps Core::X86::Pmc::Core::FpRetiredSerOps - Retired Serializing Ops
The number of serializing Ops retired.
This event has the following units which may be used to modify
the behavior of the event:
X87CtrlRet x87 control word mispredict traps due to mispredictions
in RC or PC, or changes in mask bits
X87BotRet x87 bottom-executing uOps retired
SseCtrlRet SSE control word mispredict traps due to mispredictions
in RC, FTZ or DAZ, or changes in mask bits
SseBotRet SSE bottom-executing uOps retired
LsBadStatus2 Core::X86::Pmc::Core::LsBadStatus2 - Bad Status 2
Store-to-load interlocks (STLI) are loads that were unable to
complete because of a possible match with an older store that
could not perform store-to-load forwarding (STLF) for some
reason. There are a number of reasons why this occurs, and this
event organizes them into three major groups.
This event has the following units which may be used to modify
the behavior of the event:
StlfNoData The load is capable of forwarding from an older store
(i.e. the address match/overlap between the load and
the older store is good and everything works from an
address perspective), but the store's data has not yet
been produced by EX or FP, so it can't be forwarded.
StliOther All the other reasons. The most common among these is
that there is only a partial overlap between the store
and the load, for example there's an 8B store to
address A and a 16B load starting at address A. STLF
can't be performed in this case because only some of
the load's data is coming from the store, so the load
gets StliOther. Another StliOther case is if the load
hits a non-cacheable store that's sitting in the
non-cacheable buffers (WCBs).
StliNoState The STLF is validated using DC way instead of an
address compare. The store that wants to STLF is
required to be a DC hit and have a valid DC way. The
STLF candidate store is chosen based on address bits
11:0 overlap, and the DC way of that store is compared
to the way of the load. If the store is in a DC miss
state, then it doesn't have a valid DC way and so
cannot validate STLF. The load gets StliNoState and
can't complete.
LsLocks Core::X86::Pmc::Core::LsLocks - Locks
LsRetClClush Core::X86::Pmc::Core::LsRetClClush - Retired CLFLUSH
Instructions
The number of retired CLFLUSH instructions. This is a
non-speculative event.
LsRetCpuid Core::X86::Pmc::Core::LsRetCpuid - Retired CPUID Instructions
The number of CPUID instructions retired.
LsDispatch Core::X86::Pmc::Core::LsDispatch - LS Dispatch
Counts the number of operations dispatched to the LS unit.
LsSmiRx Core::X86::Pmc::Core::LsSmiRx - SMIs Received
Counts the number of SMIs received.
LsSTLF Core::X86::Pmc::Core::LsSTLF - Store to Load Forward
Number of STLF hits.
LsStCommitCancel2 Core::X86::Pmc::Core::LsStCommitCancel2 - Store Commit Cancels
2
This event has the following units which may be used to modify
the behavior of the event:
StCommitCancelWcbFull A non-cacheable store commit was cancelled because
the non-cacheable commit buffer was full.
LsDcAccesses Core::X86::Pmc::Core::LsDcAccesses - Data Cache Accesses
The number of accesses to the data cache for load and store
references. This may include certain microcode scratchpad
accesses, although these are generally rare. Each increment
represents an eight-byte access, although the instruction may
only be accessing a portion of that. This event is a
speculative event.
LsRefillsFromSys Core::X86::Pmc::Core::LsRefillsFromSys - Data Cache Refills
from System
Demand Data Cache Fills by Data Source.
This event has the following units which may be used to modify
the behavior of the event:
LS_MABRESP_RMT_DRAM DRAM or IO from different die.
LS_MABRESP_RMT_CACHE Hit in cache; Remote CCX and the address's Home Node is
on a different die.
LS_MABRESP_LCL_DRAM DRAM or IO from this thread's die.
LS_MABRESP_LCL_CACHE Hit in cache; local CCX (not Local L2), or Remote CCX
and the address's Home Node is on this thread's die.
MABRESP_LCL_L2 Local L2 hit.
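The data-source units lend themselves to a locality breakdown. The following illustrative sketch computes the fraction of demand fills reported under the two RMT (different-die) units.

- It assumes all five units were read for the same interval.
- The structure and function names are hypothetical.
- The same unit layout also appears in LsSwPfDcFills, LsHwPfDcFills, and LsTwDcFills.

```c
#include <stdint.h>

/* Hypothetical container for the five LsRefillsFromSys unit counts. */
struct dc_fill_sources {
	uint64_t lcl_l2;	/* MABRESP_LCL_L2 */
	uint64_t lcl_cache;	/* LS_MABRESP_LCL_CACHE */
	uint64_t lcl_dram;	/* LS_MABRESP_LCL_DRAM */
	uint64_t rmt_cache;	/* LS_MABRESP_RMT_CACHE */
	uint64_t rmt_dram;	/* LS_MABRESP_RMT_DRAM */
};

/*
 * Fraction of fills attributed to the RMT (different-die) units.
 * Returns 0.0 when no fills were counted.
 */
double
remote_fill_fraction(const struct dc_fill_sources *s)
{
	uint64_t remote = s->rmt_cache + s->rmt_dram;
	uint64_t total = remote + s->lcl_l2 + s->lcl_cache + s->lcl_dram;

	if (total == 0)
		return (0.0);
	return ((double)remote / (double)total);
}
```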
LsL1DTlbMiss Core::X86::Pmc::Core::LsL1DTlbMiss - L1 DTLB Miss
This event has the following units which may be used to modify
the behavior of the event:
TlbReload1GL2Miss
TlbReload2ML2Miss
TlbReload32KL2Miss
TlbReload4KL2Miss
TlbReload1GL2Hit
TlbReload2ML2Hit
TlbReload32KL2Hit
TlbReload4KL2Hit
LsTablewalker Core::X86::Pmc::Core::LsTablewalker - Tablewalker allocation
This event has the following units which may be used to modify
the behavior of the event:
PerfMonTablewalkAllocIside1
PerfMonTablewalkAllocIside0
PerfMonTablewalkAllocDside1
PerfMonTablewalkAllocDside0
LsMisalAccesses Core::X86::Pmc::Core::LsMisalAccesses - Misaligned loads
LsPrefInstrDisp Core::X86::Pmc::Core::LsPrefInstrDisp - Prefetch Instructions
Dispatched
Software Prefetch Instructions Dispatched.
This event has the following units which may be used to modify
the behavior of the event:
PrefetchNTA
StorePrefetchW
LoadPrefetchW
Prefetch Prefetch_T0_T1_T2
LsInefSwPref Core::X86::Pmc::Core::LsInefSwPref - Ineffective Software
Prefetches
The number of software prefetches that did not fetch data
outside of the processor core.
This event has the following units which may be used to modify
the behavior of the event:
MabMchCnt Software PREFETCH instruction saw a match on an
already-allocated miss request buffer.
DataPipeSwPfDcHit Software PREFETCH instruction saw a DC hit.
LsSwPfDcFills Core::X86::Pmc::Core::LsSwPfDcFills - Software Prefetch Data
Cache Fills
Software Prefetch Data Cache Fills by Data Source
This event has the following units which may be used to modify
the behavior of the event:
LS_MABRESP_RMT_DRAM DRAM or IO from different die.
LS_MABRESP_RMT_CACHE Hit in cache; Remote CCX and the address's Home Node is
on a different die.
LS_MABRESP_LCL_DRAM DRAM or IO from this thread's die.
LS_MABRESP_LCL_CACHE Hit in cache; local CCX (not Local L2), or Remote CCX
and the address's Home Node is on this thread's die.
MABRESP_LCL_L2 Local L2 hit.
LsHwPfDcFills Core::X86::Pmc::Core::LsHwPfDcFills - Hardware Prefetch Data
Cache Fills
Hardware Prefetch Data Cache Fills by Data Source
This event has the following units which may be used to modify
the behavior of the event:
LS_MABRESP_RMT_DRAM DRAM or IO from different die.
LS_MABRESP_RMT_CACHE Hit in cache; Remote CCX and the address's Home Node is
on a different die.
LS_MABRESP_LCL_DRAM DRAM or IO from this thread's die.
LS_MABRESP_LCL_CACHE Hit in cache; local CCX (not Local L2), or Remote CCX
and the address's Home Node is on this thread's die.
MABRESP_LCL_L2 Local L2 hit.
LsTwDcFills Core::X86::Pmc::Core::LsTwDcFills - Table Walker Data Cache
Fills by Data Source
This event has the following units which may be used to modify
the behavior of the event:
LS_MABRESP_RMT_DRAM DRAM or IO from different die.
LS_MABRESP_RMT_CACHE Hit in cache; Remote CCX and the address's Home Node is
on a different die.
LS_MABRESP_LCL_DRAM DRAM or IO from this thread's die.
LS_MABRESP_LCL_CACHE Hit in cache; local CCX (not Local L2), or Remote CCX
and the address's Home Node is on this thread's die.
MABRESP_LCL_L2 Local L2 hit.
LsNotHaltedCyc Core::X86::Pmc::Core::LsNotHaltedCyc - Cycles not in Halt
IcFw32 Core::X86::Pmc::Core::IcFw32 - 32 Byte Instruction Cache Fetch
The number of 32B fetch windows transferred from IC pipe to DE
instruction decoder (includes non-cacheable and cacheable fill
responses).
IcFw32Miss Core::X86::Pmc::Core::IcFw32Miss - 32 Byte Instruction Cache
Misses
The number of 32B fetch windows that tried to read the L1 IC and
missed in the full tag.
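A common way to summarize instruction-fetch behavior is to compare these two events. The sketch below reports IcFw32Miss as a fraction of IcFw32.

- The function name is hypothetical.
- Treating the miss count as comparable to the fetch-window count is an assumption made for this ratio. The event descriptions do not state it.

```c
#include <stdint.h>

/*
 * Hypothetical helper: IcFw32Miss as a fraction of IcFw32 for the same
 * interval. Returns 0.0 when no fetch windows were counted.
 */
double
ic_fetch_miss_ratio(uint64_t ic_fw32, uint64_t ic_fw32_miss)
{
	if (ic_fw32 == 0)
		return (0.0);
	return ((double)ic_fw32_miss / (double)ic_fw32);
}
```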
IcCacheFillL2 Core::X86::Pmc::Core::IcCacheFillL2 - Instruction Cache Refills
from L2
The number of 64-byte instruction cache lines filled from
the L2 cache.
IcCacheFillSys Core::X86::Pmc::Core::IcCacheFillSys - Instruction Cache
Refills from System
The number of 64-byte instruction cache lines filled from
system memory or another cache.
BpL1TlbMissL2Hit Core::X86::Pmc::Core::BpL1TlbMissL2Hit - L1 ITLB Miss, L2 ITLB
Hit
The number of instruction fetches that miss in the L1 ITLB but
hit in the L2 ITLB.
BpL1TlbMissL2Miss Core::X86::Pmc::Core::BpL1TlbMissL2Miss - L1 ITLB Miss, L2 ITLB
Miss
The number of instruction fetches that miss in both the L1 and
L2 ITLBs.
IcFetchStall Core::X86::Pmc::Core::IcFetchStall - Instruction Pipe Stall
This event has the following units which may be used to modify
the behavior of the event:
IcStallAny Instruction Cache pipeline was stalled during this
clock cycle for any reason.
IcStallDqEmpty Instruction Cache pipeline was stalled during this
clock cycle due to upstream not providing fetch
addresses quickly.
IcStallBackPressure Instruction Cache pipeline was stalled during this
clock cycle due to downstream queues being full.
BpL1BTBCorrect Core::X86::Pmc::Core::BpL1BTBCorrect - L1 BTB Correction
BpL2BTBCorrect Core::X86::Pmc::Core::BpL2BTBCorrect - L2 BTB Correction
IcCacheInval Core::X86::Pmc::Core::IcCacheInval - Instruction Cache Lines
Invalidated
The number of instruction cache lines invalidated. A non-SMC
(self-modifying code) event is CMC (cross-modifying code), either
from the other thread of the core or from another core.
This event has the following units which may be used to modify
the behavior of the event:
L2InvalidatingProbe IC line invalidated due to L2 invalidating probe
(external or LS).
FillInvalidated IC line invalidated due to overwriting fill response.
BpTlbRel Core::X86::Pmc::Core::BpTlbRel - ITLB Reloads
The number of ITLB reload requests.
IcOcModeSwitch Core::X86::Pmc::Core::IcOcModeSwitch - OC Mode Switch
This event has the following units which may be used to modify
the behavior of the event:
OcIcModeSwitch OC to IC mode switch
IcOcModeSwitch IC to OC mode switch
DeDisDispatchTokenStalls0 Core::X86::Pmc::Core::DeDisDispatchTokenStalls0 - Dynamic
Tokens Dispatch Stall Cycles 0
Cycles where a dispatch group is valid but does not get
dispatched due to a token stall.
This event has the following units which may be used to modify
the behavior of the event:
RetireTokenStall RETIRE Tokens unavailable
AGSQTokenStall AGSQ Tokens unavailable
ALUTokenStall ALU tokens total unavailable
ALSQ3_0_TokenStall
ALSQ3TokenStall ALSQ 3 Tokens unavailable
ALSQ2TokenStall ALSQ 2 Tokens unavailable
ALSQ1TokenStall ALSQ 1 Tokens unavailable
ExRetInstr Core::X86::Pmc::Core::ExRetInstr - Retired Instructions
ExRetCops Core::X86::Pmc::Core::ExRetCops - Retired Uops
The number of uOps retired. This includes all processor
activity (instructions, exceptions, interrupts, microcode
assists, etc.). The number of events logged per cycle can vary
from 0 to 4.
ExRetBrn Core::X86::Pmc::Core::ExRetBrn - Retired Branch Instructions
The number of branch instructions retired. This includes all
types of architectural control flow changes, including
exceptions and interrupts.
ExRetBrnMisp Core::X86::Pmc::Core::ExRetBrnMisp - Retired Branch
Instructions Mispredicted
The number of branch instructions retired, of any type, that
were not correctly predicted. This includes those for which
prediction is not attempted (far control transfers, exceptions
and interrupts).
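The branch counts above combine into the usual misprediction metrics. This sketch computes two of them:

- the mispredicted fraction of retired branches, ExRetBrnMisp / ExRetBrn;
- mispredictions per 1000 retired instructions, using ExRetInstr.

The function names are hypothetical. Both ExRetBrn and ExRetBrnMisp include far transfers, exceptions, and interrupts, so the rate reflects those as well.

```c
#include <stdint.h>

/* Mispredicted fraction of retired branches; 0.0 if none retired. */
double
branch_mispredict_rate(uint64_t ret_brn, uint64_t ret_brn_misp)
{
	if (ret_brn == 0)
		return (0.0);
	return ((double)ret_brn_misp / (double)ret_brn);
}

/* Mispredictions per 1000 retired instructions (ExRetInstr). */
double
branch_mpki(uint64_t ret_instr, uint64_t ret_brn_misp)
{
	if (ret_instr == 0)
		return (0.0);
	return ((double)ret_brn_misp * 1000.0 / (double)ret_instr);
}
```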
ExRetBrnTkn Core::X86::Pmc::Core::ExRetBrnTkn - Retired Taken Branch
Instructions
The number of taken branches that were retired. This includes
all types of architectural control flow changes, including
exceptions and interrupts.
ExRetBrnTknMisp Core::X86::Pmc::Core::ExRetBrnTknMisp - Retired Taken Branch
Instructions Mispredicted
The number of retired taken branch instructions that were
mispredicted.
ExRetBrnFar Core::X86::Pmc::Core::ExRetBrnFar - Retired Far Control
Transfers
The number of far control transfers retired including far
call/jump/return, IRET, SYSCALL and SYSRET, plus exceptions and
interrupts. Far control transfers are not subject to branch
prediction.
ExRetBrnResync Core::X86::Pmc::Core::ExRetBrnResync - Retired Branch Resyncs
The number of resync branches. These reflect pipeline restarts
due to certain microcode assists and events such as writes to
the active instruction stream, among other things. Each
occurrence reflects a restart penalty similar to a branch
mispredict. This is relatively rare.
ExRetNearRet Core::X86::Pmc::Core::ExRetNearRet - Retired Near Returns
The number of near return instructions (RET or RET Iw) retired.
ExRetNearRetMispred Core::X86::Pmc::Core::ExRetNearRetMispred - Retired Near
Returns Mispredicted
The number of near returns retired that were not correctly
predicted by the return address predictor. Each such mispredict
incurs the same penalty as a mispredicted conditional branch
instruction.
ExRetBrnIndMisp Core::X86::Pmc::Core::ExRetBrnIndMisp - Retired Indirect Branch
Instructions Mispredicted
ExRetMmxFpInstr Core::X86::Pmc::Core::ExRetMmxFpInstr - Retired MMX/FP
Instructions
The number of MMX, SSE or x87 instructions retired. The units
below select individual classes of instructions. Each increment
represents one complete instruction. Since this event includes
non-numeric instructions, it is not suitable for measuring MFLOPS.
This event has the following units which may be used to modify
the behavior of the event:
SseInstr SSE instructions (SSE, SSE2, SSE3, SSSE3, SSE4A, SSE41,
SSE42, AVX).
MmxInstr MMX instructions.
X87Instr x87 instructions.
ExRetCond Core::X86::Pmc::Core::ExRetCond - Retired Conditional Branch
Instructions
ExDivBusy Core::X86::Pmc::Core::ExDivBusy - Div Cycles Busy count
ExDivCount Core::X86::Pmc::Core::ExDivCount - Div Op Count
ExTaggedIbsOps Core::X86::Pmc::Core::ExTaggedIbsOps - Tagged IBS Ops
This event has the following units which may be used to modify
the behavior of the event:
IbsCountRollover Number of times an op could not be tagged by IBS
because of a previous tagged op that has not retired.
IbsTaggedOpsRet Number of Ops tagged by IBS that retired
IbsTaggedOps Number of Ops tagged by IBS
ExRetFusBrnchInst Core::X86::Pmc::Core::ExRetFusBrnchInst - Retired Fused Branch
Instructions
The number of fused branch instructions retired. The number of
events logged per cycle can vary from 0 to 3.
L2RequestG1 Core::X86::Pmc::Core::L2RequestG1 - Requests to L2 Group1
This event has the following units which may be used to modify
the behavior of the event:
RdBlkL
RdBlkX
LsRdBlkC_S
CacheableIcRead
ChangeToX
PrefetchL2 Assume core should also count these and allow the
breakdown between H/W vs. S/W and LS vs. IC.
L2HwPf
OtherRequests Events covered by Core::X86::Pmc::Core::L2RequestG2.
L2RequestG2 Core::X86::Pmc::Core::L2RequestG2 - Requests to L2 Group2
This is a multi-event, in that LS and IF requests can be received
simultaneously.
This event has the following units which may be used to modify
the behavior of the event:
Group1 All Group 1 commands not in unit0.
LsRdSized RdSized, RdSized32, RdSized64.
LsRdSizedNC RdSizedNC, RdSized32NC, RdSized64NC.
IcRdSized
IcRdSizedNC
SmcInval
BusLocksOriginator
BusLocksResponses
L2Latancy Core::X86::Pmc::Core::L2Latancy - L2 Latency
Total cycles spent waiting for L2 fills to complete from L3 or
memory, divided by four. This may be used to calculate average
latency by multiplying this count by four and then dividing by
the total number of L2 fills (unit mask
Core::X86::Pmc::Core::L2RequestG1 == FEh). Event counts are for
both threads. To calculate average latency, the number of fills
from both threads must be used.
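The average-latency calculation described above can be written out directly. In this sketch the function and parameter names are hypothetical:

- waiting is the L2CyclesWaitingOnFills count, which the hardware has already divided by four.
- fills_t0 and fills_t1 are the L2RequestG1 (FEh) fill counts from the core's two threads.

Both fill counts must be included because the latency count covers both threads.

```c
#include <stdint.h>

/*
 * Average L2 fill latency in cycles: (waiting * 4) / total fills, where
 * total fills is summed over both threads. Returns 0.0 if no fills.
 */
double
l2_avg_fill_latency(uint64_t waiting, uint64_t fills_t0, uint64_t fills_t1)
{
	uint64_t fills = fills_t0 + fills_t1;

	if (fills == 0)
		return (0.0);
	return ((double)waiting * 4.0 / (double)fills);
}
```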
This event has the following units which may be used to modify
the behavior of the event:
L2CyclesWaitingOnFills
L2WbcReq Core::X86::Pmc::Core::L2WbcReq - LS to L2 WBC requests
This event has the following units which may be used to modify
the behavior of the event:
WcbWrite
WcbClose
CacheLineFlush
I_LineFlush
ZeroByteStore This becomes WriteNoData at SDP; this count does not
include DVM Sync Ops and bus locks, which are counted in
Core::X86::Pmc::Core::L2RequestG2.
LocalIcClr Local IC Clear
CLZero Cache Line Zero
L2CacheReqStat Core::X86::Pmc::Core::L2CacheReqStat - Core to L2 Cacheable
Request Access Status
This event does not count accesses to the L2 cache by the L2
prefetcher, but it does count accesses by the L1 prefetcher.
This event has the following units which may be used to modify
the behavior of the event:
LsRdBlkCS LS ReadBlock C/S Hit
LsRdBlkLHitX LS Read Block L Hit X
LsRdBlkLHitS LsRdBlkL Hit Shared
LsRdBlkX LsRdBlkX/ChgToX Hit X. Count RdBlkX finding Shared as a
Miss.
LsRdBlkC LS Read Block C S L X Change to X Miss
IcFillHitX IC Fill Hit Exclusive Stale
IcFillHitS IC Fill Hit Shared
IcFillMiss IC Fill Miss
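Reading the unit descriptions above, LsRdBlkC and IcFillMiss are the miss cases and the remaining units are hits. The following sketch computes an L2 miss ratio for core-originated cacheable requests under that reading.

- The hit/miss split is an interpretation of the unit descriptions, not stated outright.
- The names are hypothetical.
- As noted above, accesses by the L2 prefetcher are not included.

```c
#include <stdint.h>

/* Hypothetical container for the L2CacheReqStat unit counts. */
struct l2_req_stat {
	uint64_t ls_rd_blk_cs;		/* LsRdBlkCS: hit */
	uint64_t ls_rd_blk_l_hit_x;	/* LsRdBlkLHitX: hit */
	uint64_t ls_rd_blk_l_hit_s;	/* LsRdBlkLHitS: hit */
	uint64_t ls_rd_blk_x;		/* LsRdBlkX: hit */
	uint64_t ls_rd_blk_c;		/* LsRdBlkC: miss */
	uint64_t ic_fill_hit_x;		/* IcFillHitX: hit */
	uint64_t ic_fill_hit_s;		/* IcFillHitS: hit */
	uint64_t ic_fill_miss;		/* IcFillMiss: miss */
};

/* Misses as a fraction of all counted requests; 0.0 if none. */
double
l2_miss_ratio(const struct l2_req_stat *s)
{
	uint64_t miss = s->ls_rd_blk_c + s->ic_fill_miss;
	uint64_t hit = s->ls_rd_blk_cs + s->ls_rd_blk_l_hit_x +
	    s->ls_rd_blk_l_hit_s + s->ls_rd_blk_x + s->ic_fill_hit_x +
	    s->ic_fill_hit_s;
	uint64_t total = hit + miss;

	if (total == 0)
		return (0.0);
	return ((double)miss / (double)total);
}
```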
L2FillPending Core::X86::Pmc::Core::L2FillPending - Cycles with fill pending
from L2
Total cycles spent with one or more fill requests in flight
from L2.
This event has the following units which may be used to modify
the behavior of the event:
L2FillBusy
SEE ALSO
cpc(3CPC)
illumos March 25, 2019 illumos