Tribblix: manual page: amd_f17h_zen1

AMD_F17H_ZEN1_EVENTS(3CPC) CPU Performance Counters Library Functions

NAME

amd_f17h_zen1_events - AMD Family 17h Zen1 processor performance
monitoring events

DESCRIPTION

This manual page describes events specific to AMD Family 17h Zen1
processors. For more information, please consult the appropriate AMD
BIOS and Kernel Developer's guide or Open-Source Register Reference.

Each of the events listed below includes the AMD mnemonic which matches
the name found in the AMD manual and a brief summary of the event. If
available, a more detailed description of the event follows and then
any additional unit values that modify the event. Each unit can be
combined to create a new event in the system by placing the '.'
character between the event name and the unit name.
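
As a minimal sketch of the naming rule above, the following Python helper joins an event name and a unit name with the '.' character. The function name qualified_event is hypothetical and not part of any library; the event and unit names used are taken from the list in this page.

```python
def qualified_event(event: str, unit: str) -> str:
    """Join an event name and one of its unit names with '.'.

    Hypothetical helper for illustration; it only builds the string and
    does not check that the unit actually belongs to the event.
    """
    if not event or not unit:
        raise ValueError("both an event name and a unit name are required")
    return f"{event}.{unit}"


# Unit DpMultAddFlops of event FpRetSseAvxOps, both documented in this page.
print(qualified_event("FpRetSseAvxOps", "DpMultAddFlops"))
```

The helper deliberately performs no validation against the event list, so a caller would still need to confirm that the unit is one documented for that event.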

The following events are supported:

FpuPipeAssignment
Core::X86::Pmc::Core::FpuPipeAssignment - FPU Pipe Assignment

The number of operations (uOps) and dual-pipe uOps dispatched to
each of the 4 FPU execution pipelines. This event reflects how
busy the FPU pipelines are and may be used for workload
characterization. This includes all operations performed by
x87, MMX(TM), and SSE instructions, including moves. Each
increment represents a one-cycle dispatch event. This event is
a speculative event. (See
Core::X86::Pmc::Core::ExRetMmxFpInstr). Since this event
includes non-numeric operations it is not suitable for
measuring MFLOPS.

This event has the following units which may be used to modify
the behavior of the event:

Dual3 Total number of multi-pipe uOps assigned to Pipe 3

Dual2 Total number of multi-pipe uOps assigned to Pipe 2

Dual1 Total number of multi-pipe uOps assigned to Pipe 1

Dual0 Total number of multi-pipe uOps assigned to Pipe 0

Total3 Total number of uOps assigned to Pipe 3

Total2 Total number of uOps assigned to Pipe 2

Total1 Total number of uOps assigned to Pipe 1

Total0 Total number of uOps assigned to Pipe 0

FpSchedEmpty
Core::X86::Pmc::Core::FpSchedEmpty - FP Scheduler Empty

This is a speculative event. The number of cycles in which the
FPU scheduler is empty. Note that some Ops like FP loads bypass
the scheduler. Invert this (Core::X86::Msr::PERF_CTL[Inv] == 1)
to count cycles in which at least one FPU operation is present
in the FPU.

FpRetx87FpOps
Core::X86::Pmc::Core::FpRetx87FpOps - Retired x87 Floating
Point Operations

The number of x87 floating-point Ops that have retired. The
number of events logged per cycle can vary from 0 to 8.

This event has the following units which may be used to modify
the behavior of the event:

DivSqrROps
Divide and square root Ops

MulOps Multiply Ops

AddSubOps
Add/subtract Ops

FpRetSseAvxOps
Core::X86::Pmc::Core::FpRetSseAvxOps - Retired SSE/AVX
Operations

This is a retire-based event. The number of retired SSE/AVX
FLOPS. The number of events logged per cycle can vary from 0 to
64. This event can count above 15. See 2.1.11.2 [Large
Increment per Cycle Events].

This event has the following units which may be used to modify
the behavior of the event:

DpMultAddFlops
Double precision multiply-add FLOPS. Multiply-add
counts as 2 FLOPS.

DpDivFlops
Double precision divide/square root FLOPS.

DpMultFlops
Double precision multiply FLOPS.

DpAddSubFlops
Double precision add/subtract FLOPS.

SpMultAddFlops
Single precision multiply-add FLOPS. Multiply-add counts
as 2 FLOPS.

SpDivFlops
Single-precision divide/square root FLOPS

SpMultFlops
Single-precision multiply FLOPS

SpAddSubFlops
Single-precision add/subtract FLOPS
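
The FLOPS units above can be combined into a rate. The following Python sketch assumes the per-unit counts have already been collected over a known wall-clock interval; the function name mflops and the sample counts are made up for illustration.

```python
def mflops(flop_counts: dict, elapsed_seconds: float) -> float:
    """Return millions of floating-point operations per second.

    flop_counts maps a FpRetSseAvxOps unit name to its counter reading.
    No extra weighting is applied: per the unit descriptions, a
    multiply-add is already counted as 2 FLOPS by the hardware.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sum(flop_counts.values()) / elapsed_seconds / 1e6


# Made-up counter readings keyed by unit name, for illustration only.
sample = {
    "SpAddSubFlops": 4_000_000,
    "SpMultFlops": 3_000_000,
    "SpMultAddFlops": 2_000_000,
    "SpDivFlops": 1_000_000,
}
print(mflops(sample, elapsed_seconds=2.0))
```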

FpNumMovElimScalOp
Core::X86::Pmc::Core::FpNumMovElimScalOp - Number of Move
Elimination and Scalar Op Optimization

This is a dispatch-based speculative event, and is useful for
measuring the effectiveness of the Move elimination and Scalar
code optimization schemes.

This event has the following units which may be used to modify
the behavior of the event:

Optimized
Number of Scalar Ops optimized

OptPotential
Number of Ops that are candidates for optimization
(have Z-bit either set or pass).

SseMovOpsElim
Number of SSE Move Ops eliminated

SseMovOps
Number of SSE Move Ops
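
One way to turn these units into an effectiveness figure is to take ratios between them. The Python sketch below assumes, without confirmation from the AMD documentation, that SseMovOpsElim is a subset of SseMovOps and that Optimized is a subset of OptPotential; the helper name and the counts are hypothetical.

```python
def fraction(part: int, whole: int) -> float:
    """Return part / whole, or 0.0 when nothing was counted."""
    return part / whole if whole else 0.0


# Made-up counter readings keyed by unit name, for illustration only.
counts = {
    "SseMovOps": 800,
    "SseMovOpsElim": 200,
    "OptPotential": 50,
    "Optimized": 40,
}

# Share of SSE move Ops that were eliminated (assumed subset relationship).
move_elim_rate = fraction(counts["SseMovOpsElim"], counts["SseMovOps"])
# Share of candidate scalar Ops that were optimized (assumed subset relationship).
scalar_opt_rate = fraction(counts["Optimized"], counts["OptPotential"])

print(move_elim_rate, scalar_opt_rate)
```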

FpRetiredSerOps
Core::X86::Pmc::Core::FpRetiredSerOps - Retired Serializing Ops

The number of serializing Ops retired.

This event has the following units which may be used to modify
the behavior of the event:

X87CtrlRet
x87 control word mispredict traps due to mispredictions
in RC or PC, or changes in mask bits

X87BotRet
x87 bottom-executing uOps retired

SseCtrlRet
SSE control word mispredict traps due to mispredictions
in RC, FTZ or DAZ, or changes in mask bits

SseBotRet
SSE bottom-executing uOps retired

LsBadStatus2
Core::X86::Pmc::Core::LsBadStatus2 - Bad Status 2

Store To Load Interlock (STLI) events are loads that were unable
to complete because of a possible match with an older store that
could not perform store-to-load forwarding (STLF) for some
reason. There are a number of reasons why this occurs, and this
event organizes them into three major groups.

This event has the following units which may be used to modify
the behavior of the event:

StlfNoData
The load is capable of forwarding from an older store
(i.e. the address match/overlap between the load and
the older store was good and everything works from an
address perspective), but the store's data has not been
produced by EX or FP yet so it can't be forwarded.

StliOther
All the other reasons. The most common among these is
that there is only a partial overlap between the store
and the load, for example there's an 8B store to
address A and a 16B load starting at address A. STLF
can't be performed in this case because only some of
the load's data is coming from the store, so the load
gets StliOther. Another StliOther case is if the load
hits a non-cacheable store that's sitting in the
non-cacheable buffers (WCBs).

StliNoState
The STLF is validated using DC way instead of an
address compare. The store that wants to STLF is
required to be a DC hit and have a valid DC way. The
STLF candidate store is chosen based on address bits
11:0 overlap, and the DC way of that store is compared
to the way of the load. If the store is in a DC miss
state, then it doesn't have a valid DC way and so
cannot validate STLF. The load gets StliNoState and
can't complete.
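
The partial-overlap case described under StliOther can be illustrated with simple byte-range arithmetic. This Python sketch is a simplified model only: the function names are hypothetical, and the real hardware conditions for forwarding are more involved than a range check.

```python
def overlaps(store_addr: int, store_size: int,
             load_addr: int, load_size: int) -> bool:
    """True if the store and load byte ranges share at least one byte."""
    return (store_addr < load_addr + load_size
            and load_addr < store_addr + store_size)


def store_covers_load(store_addr: int, store_size: int,
                      load_addr: int, load_size: int) -> bool:
    """True if every byte the load reads is written by the store."""
    return (store_addr <= load_addr
            and load_addr + load_size <= store_addr + store_size)


A = 0x1000  # arbitrary example address

# The example from the text: an 8B store to A and a 16B load starting at A.
partial = overlaps(A, 8, A, 16) and not store_covers_load(A, 8, A, 16)
print(partial)
```

In this model the load overlaps the store but is not fully covered by it, which corresponds to the situation where only some of the load's data comes from the store.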

LsLocks
Core::X86::Pmc::Core::LsLocks - Locks

LsRetClClush
Core::X86::Pmc::Core::LsRetClClush - Retired CLFLUSH
Instructions

The number of retired CLFLUSH instructions. This is a
non-speculative event.

LsRetCpuid
Core::X86::Pmc::Core::LsRetCpuid - Retired CPUID Instructions

The number of CPUID instructions retired.

LsDispatch
Core::X86::Pmc::Core::LsDispatch - LS Dispatch

Counts the number of operations dispatched to the LS unit.

LsSmiRx
Core::X86::Pmc::Core::LsSmiRx - SMIs Received

Counts the number of SMIs received.

LsSTLF Core::X86::Pmc::Core::LsSTLF - Store to Load Forward

Number of STLF hits.

LsStCommitCancel2
Core::X86::Pmc::Core::LsStCommitCancel2 - Store Commit Cancels
2

This event has the following units which may be used to modify
the behavior of the event:

StCommitCancelWcbFull
A non-cacheable store and the non-cacheable commit
buffer is full.

LsDcAccesses
Core::X86::Pmc::Core::LsDcAccesses - Data Cache Accesses

The number of accesses to the data cache for load and store
references. This may include certain microcode scratchpad
accesses, although these are generally rare. Each increment
represents an eight-byte access, although the instruction may
only be accessing a portion of that. This event is a
speculative event.

LsRefillsFromSys
Core::X86::Pmc::Core::LsRefillsFromSys - Data Cache Refills
from System

Demand Data Cache Fills by Data Source.

This event has the following units which may be used to modify
the behavior of the event:

LS_MABRESP_RMT_DRAM
DRAM or IO from different die.

LS_MABRESP_RMT_CACHE
Hit in cache; Remote CCX and the address's Home Node is
on a different die.

LS_MABRESP_LCL_DRAM
DRAM or IO from this thread's die.

LS_MABRESP_LCL_CACHE
Hit in cache; local CCX (not Local L2), or Remote CCX
and the address's Home Node is on this thread's die.

MABRESP_LCL_L2
Local L2 hit.

LsL1DTlbMiss
Core::X86::Pmc::Core::LsL1DTlbMiss - L1 DTLB Miss

This event has the following units which may be used to modify
the behavior of the event:

TlbReload1GL2Miss

TlbReload2ML2Miss

TlbReload32KL2Miss

TlbReload4KL2Miss

TlbReload1GL2Hit

TlbReload2ML2Hit

TlbReload32KL2Hit

TlbReload4KL2Hit

LsTablewalker
Core::X86::Pmc::Core::LsTablewalker - Tablewalker allocation

This event has the following units which may be used to modify
the behavior of the event:

PerfMonTablewalkAllocIside1

PerfMonTablewalkAllocIside0

PerfMonTablewalkAllocDside1

PerfMonTablewalkAllocDside0

LsMisalAccesses
Core::X86::Pmc::Core::LsMisalAccesses - Misaligned loads

LsPrefInstrDisp
Core::X86::Pmc::Core::LsPrefInstrDisp - Prefetch Instructions
Dispatched

Software Prefetch Instructions Dispatched.

This event has the following units which may be used to modify
the behavior of the event:

PrefetchNTA

StorePrefetchW

LoadPrefetchW
Prefetch, Prefetch_T0_T1_T2

LsInefSwPref
Core::X86::Pmc::Core::LsInefSwPref - Ineffective Software
Prefetches

The number of software prefetches that did not fetch data
outside of the processor core.

This event has the following units which may be used to modify
the behavior of the event:

MabMchCnt
Software PREFETCH instruction saw a match on an
already-allocated miss request buffer.

DataPipeSwPfDcHit
Software PREFETCH instruction saw a DC hit.

LsSwPfDcFills
Core::X86::Pmc::Core::LsSwPfDcFills - Software Prefetch Data
Cache Fills

Software Prefetch Data Cache Fills by Data Source

This event has the following units which may be used to modify
the behavior of the event:

LS_MABRESP_RMT_DRAM
DRAM or IO from different die.

LS_MABRESP_RMT_CACHE
Hit in cache; Remote CCX and the address's Home Node is
on a different die.

LS_MABRESP_LCL_DRAM
DRAM or IO from this thread's die.

LS_MABRESP_LCL_CACHE
Hit in cache; local CCX (not Local L2), or Remote CCX
and the address's Home Node is on this thread's die.

MABRESP_LCL_L2
Local L2 hit.

LsHwPfDcFills
Core::X86::Pmc::Core::LsHwPfDcFills - Hardware Prefetch Data
Cache Fills

Hardware Prefetch Data Cache Fills by Data Source

This event has the following units which may be used to modify
the behavior of the event:

LS_MABRESP_RMT_DRAM
DRAM or IO from different die.

LS_MABRESP_RMT_CACHE
Hit in cache; Remote CCX and the address's Home Node is
on a different die.

LS_MABRESP_LCL_DRAM
DRAM or IO from this thread's die.

LS_MABRESP_LCL_CACHE
Hit in cache; local CCX (not Local L2), or Remote CCX
and the address's Home Node is on this thread's die.

MABRESP_LCL_L2
Local L2 hit.

LsTwDcFills
Core::X86::Pmc::Core::LsTwDcFills - Table Walker Data Cache
Fills by Data Source

This event has the following units which may be used to modify
the behavior of the event:

LS_MABRESP_RMT_DRAM
DRAM or IO from different die.

LS_MABRESP_RMT_CACHE
Hit in cache; Remote CCX and the address's Home Node is
on a different die.

LS_MABRESP_LCL_DRAM
DRAM or IO from this thread's die.

LS_MABRESP_LCL_CACHE
Hit in cache; local CCX (not Local L2), or Remote CCX
and the address's Home Node is on this thread's die.

MABRESP_LCL_L2
Local L2 hit.

LsNotHaltedCyc
Core::X86::Pmc::Core::LsNotHaltedCyc - Cycles not in Halt

IcFw32 Core::X86::Pmc::Core::IcFw32 - 32 Byte Instruction Cache Fetch

The number of 32B fetch windows transferred from IC pipe to DE
instruction decoder (includes non-cacheable and cacheable fill
responses).

IcFw32Miss
Core::X86::Pmc::Core::IcFw32Miss - 32 Byte Instruction Cache
Misses

The number of 32B fetch windows that tried to read the L1 IC and
missed in the full tag.

IcCacheFillL2
Core::X86::Pmc::Core::IcCacheFillL2 - Instruction Cache Refills
from L2

The number of 64-byte instruction cache lines fulfilled from
the L2 cache.

IcCacheFillSys
Core::X86::Pmc::Core::IcCacheFillSys - Instruction Cache
Refills from System

The number of 64-byte instruction cache lines fulfilled from
system memory or another cache.

BpL1TlbMissL2Hit
Core::X86::Pmc::Core::BpL1TlbMissL2Hit - L1 ITLB Miss, L2 ITLB
Hit

The number of instruction fetches that miss in the L1 ITLB but
hit in the L2 ITLB.

BpL1TlbMissL2Miss
Core::X86::Pmc::Core::BpL1TlbMissL2Miss - L1 ITLB Miss, L2 ITLB
Miss

The number of instruction fetches that miss in both the L1 and
L2 ITLBs.

IcFetchStall
Core::X86::Pmc::Core::IcFetchStall - Instruction Pipe Stall

This event has the following units which may be used to modify
the behavior of the event:

IcStallAny
Instruction Cache pipeline was stalled during this
clock cycle for any reason.

IcStallDqEmpty
Instruction Cache pipeline was stalled during this
clock cycle due to upstream not providing fetch
addresses quickly.

IcStallBackPressure
Instruction Cache pipeline was stalled during this
clock cycle due to downstream queues being full.

BpL1BTBCorrect
Core::X86::Pmc::Core::BpL1BTBCorrect - L1 BTB Correction

BpL2BTBCorrect
Core::X86::Pmc::Core::BpL2BTBCorrect - L2 BTB Correction

IcCacheInval
Core::X86::Pmc::Core::IcCacheInval - Instruction Cache Lines
Invalidated

The number of instruction cache lines invalidated. A non-SMC
(self-modifying code) event is CMC (cross-modifying code), either
from the other thread of the core or another core.

This event has the following units which may be used to modify
the behavior of the event:

L2InvalidatingProbe
IC line invalidated due to L2 invalidating probe
(external or LS).

FillInvalidated
IC line invalidated due to overwriting fill response.

BpTlbRel
Core::X86::Pmc::Core::BpTlbRel - ITLB Reloads

The number of ITLB reload requests.

IcOcModeSwitch
Core::X86::Pmc::Core::IcOcModeSwitch - OC Mode Switch

This event has the following units which may be used to modify
the behavior of the event:

OcIcModeSwitch
OC to IC mode switch

IcOcModeSwitch
IC to OC mode switch

DeDisDispatchTokenStalls0
Core::X86::Pmc::Core::DeDisDispatchTokenStalls0 - Dynamic
Tokens Dispatch Stall Cycles 0

Cycles where a dispatch group is valid but does not get
dispatched due to a token stall.

This event has the following units which may be used to modify
the behavior of the event:

RetireTokenStall
RETIRE Tokens unavailable

AGSQTokenStall
AGSQ Tokens unavailable

ALUTokenStall
ALU tokens total unavailable

ALSQ3_0_TokenStall

ALSQ3TokenStall
ALSQ 3 Tokens unavailable

ALSQ2TokenStall
ALSQ 2 Tokens unavailable

ALSQ1TokenStall
ALSQ 1 Tokens unavailable

ExRetInstr
Core::X86::Pmc::Core::ExRetInstr - Retired Instructions

ExRetCops
Core::X86::Pmc::Core::ExRetCops - Retired Uops

The number of uOps retired. This includes all processor
activity (instructions, exceptions, interrupts, microcode
assists, etc.). The number of events logged per cycle can vary
from 0 to 4.

ExRetBrn
Core::X86::Pmc::Core::ExRetBrn - Retired Branch Instructions

The number of branch instructions retired. This includes all
types of architectural control flow changes, including
exceptions and interrupts.

ExRetBrnMisp
Core::X86::Pmc::Core::ExRetBrnMisp - Retired Branch
Instructions Mispredicted

The number of branch instructions retired, of any type, that
were not correctly predicted. This includes those for which
prediction is not attempted (far control transfers, exceptions
and interrupts).
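
A common derived figure is the share of retired branches that were mispredicted. The Python sketch below assumes ExRetBrn and ExRetBrnMisp were counted over the same interval; the function name and the sample counts are made up for illustration.

```python
def branch_mispredict_rate(ex_ret_brn_misp: int, ex_ret_brn: int) -> float:
    """Return mispredicted retired branches as a fraction of all retired branches."""
    if ex_ret_brn <= 0:
        raise ValueError("ExRetBrn count must be positive")
    return ex_ret_brn_misp / ex_ret_brn


# Made-up counter readings, for illustration only.
rate = branch_mispredict_rate(ex_ret_brn_misp=5_000, ex_ret_brn=200_000)
print(f"{rate:.3%}")
```

Because both events include exceptions and interrupts, the ratio also reflects control transfers for which no prediction is attempted.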

ExRetBrnTkn
Core::X86::Pmc::Core::ExRetBrnTkn - Retired Taken Branch
Instructions

The number of taken branches that were retired. This includes
all types of architectural control flow changes, including
exceptions and interrupts.

ExRetBrnTknMisp
Core::X86::Pmc::Core::ExRetBrnTknMisp - Retired Taken Branch
Instructions Mispredicted

The number of retired taken branch instructions that were
mispredicted.

ExRetBrnFar
Core::X86::Pmc::Core::ExRetBrnFar - Retired Far Control
Transfers

The number of far control transfers retired including far
call/jump/return, IRET, SYSCALL and SYSRET, plus exceptions and
interrupts. Far control transfers are not subject to branch
prediction.

ExRetBrnResync
Core::X86::Pmc::Core::ExRetBrnResync - Retired Branch Resyncs

The number of resync branches. These reflect pipeline restarts
due to certain microcode assists and events such as writes to
the active instruction stream, among other things. Each
occurrence reflects a restart penalty similar to a branch
mispredict. This is relatively rare.

ExRetNearRet
Core::X86::Pmc::Core::ExRetNearRet - Retired Near Returns

The number of near return instructions (RET or RET Iw) retired.

ExRetNearRetMispred
Core::X86::Pmc::Core::ExRetNearRetMispred - Retired Near
Returns Mispredicted

The number of near returns retired that were not correctly
predicted by the return address predictor. Each such mispredict
incurs the same penalty as a mispredicted conditional branch
instruction.

ExRetBrnIndMisp
Core::X86::Pmc::Core::ExRetBrnIndMisp - Retired Indirect Branch
Instructions Mispredicted

ExRetMmxFpInstr
Core::X86::Pmc::Core::ExRetMmxFpInstr - Retired MMX(TM)/FP
Instructions

The number of MMX, SSE or x87 instructions retired. The
UnitMask allows the selection of the individual classes of
instructions as given in the table. Each increment represents
one complete instruction. Since this event includes
non-numeric instructions it is not suitable for measuring MFLOPS.

This event has the following units which may be used to modify
the behavior of the event:

SseInstr
SSE instructions (SSE, SSE2, SSE3, SSSE3, SSE4A, SSE41,
SSE42, AVX).

MmxInstr
MMX instructions.

X87Instr
x87 instructions

ExRetCond
Core::X86::Pmc::Core::ExRetCond - Retired Conditional Branch
Instructions

ExDivBusy
Core::X86::Pmc::Core::ExDivBusy - Div Cycles Busy count

ExDivCount
Core::X86::Pmc::Core::ExDivCount - Div Op Count

ExTaggedIbsOps
Core::X86::Pmc::Core::ExTaggedIbsOps - Tagged IBS Ops

This event has the following units which may be used to modify
the behavior of the event:

IbsCountRollover
Number of times an op could not be tagged by IBS
because of a previous tagged op that has not retired.

IbsTaggedOpsRet
Number of Ops tagged by IBS that retired

IbsTaggedOps
Number of Ops tagged by IBS

ExRetFusBrnchInst
Core::X86::Pmc::Core::ExRetFusBrnchInst - Retired Fused Branch
Instructions

The number of fused branch instructions retired per cycle. The
number of events logged per cycle can vary from 0 to 3.

L2RequestG1
Core::X86::Pmc::Core::L2RequestG1 - Requests to L2 Group1

This event has the following units which may be used to modify
the behavior of the event:

RdBlkL

RdBlkX

LsRdBlkC_S

CacheableIcRead

ChangeToX

PrefetchL2
Assume core should also count these and allow the
breakdown between H/W vs. S/W and LS vs. IC.

L2HwPf

OtherRequests
Events covered by Core::X86::Pmc::Core::L2RequestG2.

L2RequestG2
Core::X86::Pmc::Core::L2RequestG2 - Requests to L2 Group2

Multi-events, in that LS and IF requests can be received
simultaneously.

This event has the following units which may be used to modify
the behavior of the event:

Group1 All Group 1 commands not in unit0.

LsRdSized
RdSized, RdSized32, RdSized64.

LsRdSizedNC
RdSizedNC, RdSized32NC, RdSized64NC.

IcRdSized

IcRdSizedNC

SmcInval

BusLocksOriginator

BusLocksResponses

L2Latancy
Core::X86::Pmc::Core::L2Latancy - L2 Latency

Total cycles spent waiting for L2 fills to complete from L3 or
memory, divided by four. This may be used to calculate average
latency by multiplying this count by four and then dividing by
the total number of L2 fills (unit mask
Core::X86::Pmc::Core::L2RequestG1 == FEh). Event counts are for
both threads. To calculate average latency, the number of fills
from both threads must be used.

This event has the following units which may be used to modify
the behavior of the event:

L2CyclesWaitingOnFills
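
The average-latency calculation described for L2Latancy can be written out as a short Python sketch. It assumes the L2Latancy count and the L2 fill count (L2RequestG1 with unit mask FEh) were both gathered across both threads, as the description requires; the function name and the sample counts are hypothetical.

```python
def average_l2_fill_latency(l2_latency_count: int, l2_fills: int) -> float:
    """Return the average L2 fill latency in cycles.

    The L2Latancy event reports waiting cycles divided by four, so the
    count is multiplied by four before dividing by the number of fills.
    """
    if l2_fills <= 0:
        raise ValueError("number of L2 fills must be positive")
    return l2_latency_count * 4 / l2_fills


# Made-up counter readings summed over both threads, for illustration only.
latency = average_l2_fill_latency(l2_latency_count=250_000, l2_fills=10_000)
print(latency)
```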

L2WbcReq
Core::X86::Pmc::Core::L2WbcReq - LS to L2 WBC requests

This event has the following units which may be used to modify
the behavior of the event:

WcbWrite

WcbClose

CacheLineFlush

I_LineFlush

ZeroByteStore
This becomes WriteNoData at SDP; this count does not
include DVM Sync Ops and bus locks which are counted in
Core::X86::Pmc::Core::L2RequestG2.

LocalIcClr
Local IC Clear

CLZero Cache Line Zero

L2CacheReqStat
Core::X86::Pmc::Core::L2CacheReqStat - Core to L2 Cacheable
Request Access Status

This event does not count accesses to the L2 cache by the L2
prefetcher, but it does count accesses by the L1 prefetcher.

This event has the following units which may be used to modify
the behavior of the event:

LsRdBlkCS
LS ReadBlock C/S Hit

LsRdBlkLHitX
LS Read Block L Hit X

LsRdBlkLHitS
LsRdBlkL Hit Shared

LsRdBlkX
LsRdBlkX/ChgToX Hit X. Count RdBlkX finding Shared as a
Miss.

LsRdBlkC
LS Read Block C S L X Change to X Miss

IcFillHitX
IC Fill Hit Exclusive Stale

IcFillHitS
IC Fill Hit Shared

IcFillMiss
IC Fill Miss

L2FillPending
Core::X86::Pmc::Core::L2FillPending - Cycles with fill pending
from L2

Total cycles spent with one or more fill requests in flight
from L2.

This event has the following units which may be used to modify
the behavior of the event:

L2FillBusy
