MAC_CAPAB_RINGS(9E)          Driver Entry Points         MAC_CAPAB_RINGS(9E)
NAME
     mac_capab_rings - MAC ring capability
SYNOPSIS
     #include <sys/mac_provider.h>
     typedef struct mac_capab_rings_s mac_capab_rings_t;
INTERFACE LEVEL
     Uncommitted - This interface is still evolving.  API and ABI stability
     is not guaranteed.
DESCRIPTION
     The MAC_CAPAB_RINGS capability provides a means for device drivers to
     take advantage of the additional resources offered by hardware beyond
     the basic operations to transmit and receive.  There are two primary
     concepts that this MAC capability relies on: rings and groups.
     The ring is an abstract concept which must be mapped to some hardware
     construct by the driver.  It typically takes the form of a DMA memory
     region which is divided into many smaller units, called descriptors or
     entries.  Each entry in the ring describes a location in memory of a
     packet, which the hardware is to read from (to transmit it) or write to
     (upon reception).  Entries also typically contain metadata and
     attributes about the packet.  These entries are typically arranged in a
     fixed-size circular buffer (hence the "ring" name) which is shared
     between the operating system and the hardware via the DMA-backed
     memory.  Most NICs, regardless of their support for this capability,
     use something resembling a descriptor ring under the hood.  Some
     vendors may also refer to rings as queues.  The ring concept is
     intentionally general, so that more unusual underlying hardware
     constructs can also be used to implement it.
     A collection of one or more rings is called a group.  Each group
     usually has a collection of filters that can be associated with it.
     These filters are usually defined in terms of matching something like a
     MAC address, VLAN, or Ethertype, though more complex filters may exist
     in hardware.  When a packet matches a filter, it will then be directed
     to the group and eventually delivered to one of the rings in the group.
     In the MAC framework, rings and groups are separated into categories
     based on their purpose: transmitting and receiving.  While the MAC
     framework thinks of transmit and receive rings as different physical
     constructs, they may map to the same underlying resources in the
     hardware.  The device driver may implement the MAC_CAPAB_RINGS
     capability for transmitting, receiving, or both.
   Mapping Hardware to Rings and Groups
     There are many different ways that hardware resources may map to this
     capability.  Consider the following examples:
     1.   Hardware may support a feature commonly known as receive side
          scaling (RSS).  With RSS, the hardware has multiple rings and uses
          a hash function calculated over packet headers to choose which
          ring receives a particular packet.  Rings are associated with
          different interrupts, allowing multiple rings to be processed in
          parallel.  Supporting RSS in isolation would result in a device
          which has a single group, and multiple rings within that group.
     2.   Some hardware may have a single ring, but still support multiple
          receive filters.  This is commonly seen with some 1 GbE devices.
          While the hardware only has one ring, it has support for multiple
          independent MAC address filters, each of which can be programmed
          to receive traffic for a single MAC address.  The driver should
          map this situation to a single group with a single ring.  However,
          it would implement the ability to program several filters.  While
          this may not seem useful at first, when virtual NICs are created
          on top of a physical NIC, the additional hardware filters will be
          used to avoid putting the device in promiscuous mode.
     3.   Finally, some hardware has many rings, which can be placed in many
          different groups.  Each group has its own filtering capabilities.
          For such hardware, the device driver would declare support for
          multiple groups, each of which has its own independent set of
          rings.
     When choosing hardware constructs to implement rings and groups, it is
     also important to consider interrupts.  In order to support polling,
     each receive ring must be able to independently toggle whether that
     ring will generate an interrupt on packet reception, even when many
     rings share the same hardware level interrupt (e.g. the same MSI or
     MSI-X interrupt number and handler).
   Filters
     The mac_group_info(9S) structure is used to define several different
     kinds of filters that the group might implement.  There are three
     different classes of filters that exist:
     MAC Address
             A given frame matches a MAC Address filter if the destination
             address in the Ethernet Header matches the specified MAC
             address.
     VLAN    A given frame matches a VLAN filter if it both has an 802.1Q
             VLAN tag and that tag matches the VLAN number specified in the
             filter.  If the frame's outer ethertype is not 0x8100, then the
             filter will not match.
     MAC Address and VLAN
             A given frame matches a MAC Address and VLAN filter if it
             matches both the specified MAC address and the specified VLAN.
             This is constructed as a logical AND of the previous two
             filters.  If only one of the two matches, then the frame does
             not match this filter.
             Note: this filter type is still under development and has not
             been plumbed through our APIs yet.
     Devices may support many different filter types.  If the hardware
     resources required for a combined filter type (e.g. MAC Address and
     VLAN) are similar to the resources required for each in isolation,
     drivers should prefer to implement just the combined type and should
     not implement the individual types.
     The MAC framework assumes that the following rules hold regarding
     filters:
     1.   When there are multiple filters of the same kind with different
          addresses, then the hardware will accept a frame if it matches ANY
          of the specified filters.  In other words, if there are two VLAN
          filters defined, one for VLAN 23 and one for VLAN 42, then if a
          frame has either VLAN 23 or VLAN 42, it will be accepted for the
          group.
     2.   If multiple different classes of filters are defined, then the
          hardware should only accept a frame if it passes ALL of the filter
          classes.  For example, if there is a MAC address filter and a
          separate VLAN filter, the hardware will only accept the frame if
          it passes both sets of filters.
     3.   If there are multiple different classes of filters and there are
          multiple filters present in each class, then the driver will
          accept a packet as long as it matches ALL filter classes.
          However, within a given filter class, it may match ANY of the
          filters.  See the following boolean logic as an alternative way to
          phrase this case:
                match = MAC AND VLAN
                MAC = 00:11:22:33:44:55 OR 00:66:77:88:99:aa OR ...
                VLAN = 11 OR 12 OR ...
     The following pseudocode summarizes the behavior for a device that
     supports independent MAC and VLAN filters.  If the hardware only
     supports a single family of filters, then simply treat that in the
     pseudocode as though it is always true:
           for each packet p:
               for each MAC filter m:
                   if m matches p's mac:
                       for each VLAN filter v:
                           if v matches p's vlan:
                               accept p for group
                               proceed to next packet
               reject packet p
               proceed to next packet
     The following pseudocode summarizes the behavior for a device that
     supports a combined MAC address and VLAN filter:
           for each packet p:
               for each filter f:
                   if f.mac matches p's mac and f.vlan matches p's vlan:
                       accept p for group
                       proceed to next packet
               reject packet p
               proceed to next packet
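     The first pseudocode block, for independent MAC and VLAN filters, can
     be modeled in C as shown below.  This is only a sketch for reasoning
     about the rules: the group_filters_t and frame_t types and the
     group_accepts() function are hypothetical, are not part of the MAC
     framework, and stand in for matching that is normally performed by
     hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	ETHERADDRL	6	/* bytes in an Ethernet MAC address */

/* Hypothetical model of the filters programmed on one receive group. */
typedef struct {
	bool gf_mac_supported;		/* hardware has MAC filters */
	const uint8_t (*gf_macs)[ETHERADDRL];
	size_t gf_nmacs;
	bool gf_vlan_supported;		/* hardware has VLAN filters */
	const uint16_t *gf_vlans;
	size_t gf_nvlans;
} group_filters_t;

/* Hypothetical summary of the frame fields that the filters examine. */
typedef struct {
	uint8_t f_dst[ETHERADDRL];	/* destination MAC address */
	bool f_tagged;			/* outer ethertype is 0x8100 */
	uint16_t f_vlan;		/* VLAN ID, valid only if tagged */
} frame_t;

/* ANY MAC filter may match; an unsupported class is always true. */
static bool
mac_class_matches(const group_filters_t *g, const frame_t *f)
{
	if (!g->gf_mac_supported)
		return (true);
	for (size_t i = 0; i < g->gf_nmacs; i++) {
		if (memcmp(g->gf_macs[i], f->f_dst, ETHERADDRL) == 0)
			return (true);
	}
	return (false);
}

/* ANY VLAN filter may match, but only frames with a VLAN tag qualify. */
static bool
vlan_class_matches(const group_filters_t *g, const frame_t *f)
{
	if (!g->gf_vlan_supported)
		return (true);
	if (!f->f_tagged)
		return (false);
	for (size_t i = 0; i < g->gf_nvlans; i++) {
		if (g->gf_vlans[i] == f->f_vlan)
			return (true);
	}
	return (false);
}

/* A frame is accepted for the group only if ALL filter classes pass. */
bool
group_accepts(const group_filters_t *g, const frame_t *f)
{
	return (mac_class_matches(g, f) && vlan_class_matches(g, f));
}
```

     The two helper functions correspond to the nested loops in the
     pseudocode; evaluating them separately gives the same result because
     the MAC and VLAN classes are independent of one another.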
   MAC Capability Structure
     When the device driver's mc_getcapab(9E) function entry point is called
     with the capability requested set to MAC_CAPAB_RINGS, then the value of
     the capability structure is a pointer to a mac_capab_rings_t structure
     with the following members:
           mac_ring_type_t         mr_type;
           mac_group_type_t        mr_group_type;
           uint_t                  mr_rnum;
           uint_t                  mr_gnum;
           mac_get_ring_t          mr_rget;
           mac_get_group_t         mr_gget;
     If the driver supports the MAC_CAPAB_RINGS capability, then it should
     first check the mr_type member of the structure.  This member has the
     following possible values:
     MAC_RING_TYPE_RX
                   Indicates that this group is for receive rings.
     MAC_RING_TYPE_TX
                   Indicates that this group is for transmit rings.
     The driver will be asked to fill in this capability structure
     separately for receive and transmit groups and rings.  This allows a
     driver to have different entry points for each type.  If neither of
     these values is specified, then the device driver must return B_FALSE
     from its mc_getcapab(9E) entry point.  Once it has identified the type,
     it should fill in the capability structure based on the following
     rules:
     mr_type       The mr_type member is used to indicate whether this group
                   is for transmit or receive rings.  The mr_type member
                   should not be modified by the device driver.  It is set
                   by the MAC framework when the driver's mc_getcapab(9E)
                   entry point is called.  As indicated above, the driver
                   must check the value to determine which group this
                   mc_getcapab(9E) call is referring to.
     mr_group_type
                   This member is used to indicate the group type.  This
                   should be set to MAC_GROUP_TYPE_STATIC, which indicates
                   that the assignment of rings to groups is fixed, and each
                   ring can only ever belong to one specific group.  The
                   number of rings per group may vary from group to group
                   and can be set by the driver.
     mr_rnum       This indicates the total number of rings that are
                   available.  The number exposed may be less than the
                   number supported in hardware.  This is often due to
                   receiving fewer resources such as interrupts.
     mr_gnum       This indicates the total number of groups that are
                   available from hardware.  The number exposed may be less
                   than the number supported in hardware.  This is often due
                   to receiving fewer resources such as interrupts.
                   When working with transmit rings, this value may be zero.
                   In this case, each ring is treated independently and
                   separate groups for each transmit ring are not required.
     mr_rget       This member is a function pointer that will be called to
                   provide information about a ring inside of a specific
                   group.  See mr_rget(9E) for information on the function,
                   its signature, and responsibilities.
     mr_gget       This member is a function pointer that will be called to
                   provide information about a group.  See mr_gget(9E) for
                   information on the function, its signature, and
                   responsibilities.
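     The rules above can be sketched as a helper that a driver's
     mc_getcapab(9E) entry point might call when asked for MAC_CAPAB_RINGS.
     So that the sketch compiles outside the kernel, it declares local
     stand-ins for the types that a real driver obtains from
     <sys/mac_provider.h>; those stand-ins mirror only the member names
     listed above, simplify the callback types, and use bool in place of the
     kernel's boolean type.  The example_t structure, its members, and all
     example_* functions are hypothetical.  Setting mr_gget to NULL when
     mr_gnum is zero is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Local stand-ins, NOT the real declarations.  A real driver includes
 * <sys/mac_provider.h> instead of defining any of these.
 */
typedef unsigned int uint_t;
typedef enum { MAC_RING_TYPE_RX, MAC_RING_TYPE_TX } mac_ring_type_t;
typedef enum { MAC_GROUP_TYPE_STATIC } mac_group_type_t;
typedef void (*mac_get_ring_t)(void);	/* simplified; see mr_rget(9E) */
typedef void (*mac_get_group_t)(void);	/* simplified; see mr_gget(9E) */

typedef struct mac_capab_rings_s {
	mac_ring_type_t		mr_type;
	mac_group_type_t	mr_group_type;
	uint_t			mr_rnum;
	uint_t			mr_gnum;
	mac_get_ring_t		mr_rget;
	mac_get_group_t		mr_gget;
} mac_capab_rings_t;

/* Hypothetical per-instance driver state. */
typedef struct {
	uint_t ex_nrx_rings;
	uint_t ex_nrx_groups;
	uint_t ex_ntx_rings;
} example_t;

/* Hypothetical callbacks; their bodies are out of scope for this sketch. */
static void example_fill_rx_ring(void) {}
static void example_fill_tx_ring(void) {}
static void example_fill_rx_group(void) {}

/* Returns false when mr_type is neither RX nor TX, as required above. */
bool
example_capab_rings(const example_t *ex, mac_capab_rings_t *cap)
{
	/* mr_type is set by the framework: read it, never write it. */
	switch (cap->mr_type) {
	case MAC_RING_TYPE_RX:
		cap->mr_group_type = MAC_GROUP_TYPE_STATIC;
		cap->mr_rnum = ex->ex_nrx_rings;
		cap->mr_gnum = ex->ex_nrx_groups;
		cap->mr_rget = example_fill_rx_ring;
		cap->mr_gget = example_fill_rx_group;
		return (true);
	case MAC_RING_TYPE_TX:
		cap->mr_group_type = MAC_GROUP_TYPE_STATIC;
		cap->mr_rnum = ex->ex_ntx_rings;
		cap->mr_gnum = 0;	/* no separate transmit groups */
		cap->mr_rget = example_fill_tx_ring;
		cap->mr_gget = NULL;	/* assumption: unused without groups */
		return (true);
	default:
		return (false);
	}
}
```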
DRIVER IMPLICATIONS
   MAC Callback Entry Points
     When a driver implements the MAC_CAPAB_RINGS capability, then it must
     not implement some of the traditional MAC callbacks.  If the driver
     supports MAC_CAPAB_RINGS for receiving, then it must not implement the
     mc_unicst(9E) entry point.  This is instead handled through the filters
     that were described earlier.  The filter entry points are defined as
     part of the mac_group_info(9S) structure.
     If the driver supports MAC_CAPAB_RINGS for transmitting, then it should
     not implement the mc_tx(9E) entry point; it will not be used.  The MAC
     framework will instead use the mri_tx(9E) entry point that is provided
     by the driver in the mac_ring_info(9S) structure.
   Locking and Concurrency
     One of the main points of the MAC_CAPAB_RINGS capability is to increase
     the parallelism and concurrency that is actively going on in the
     driver.  This means that a driver may be asked to transmit, poll, or
     receive interrupts on all of its rings in parallel.  This usually calls
     for fine-grained locking in a driver's own data structures to ensure
     that the various rings can be populated and used without having to
     block on one another.  In general, most drivers have their own
     independent set of locks for each transmit and receive ring.  They also
     usually have separate locks for each group.
     The fact that one driver performs locking in a particular way does not
     mean that other drivers must mimic it.  The design of a driver and its
     locking is often tightly coupled to how the underlying hardware works
     and its complexity.
   Polling on rings
     When the MAC_CAPAB_RINGS capability is implemented, then additional
     functionality for receiving becomes available.  A receive ring has the
     ability to be polled.  When the operating system desires to begin
     polling the ring, it will make a function call into the driver, asking
     it to receive packets from this ring.  When receiving packets while
     polling, the process is generally identical to that described in the
     Receiving Data section of mac(9E).  For more details, see mri_poll(9E).
     When the MAC framework wants to enable polling, it will first turn off
     interrupts through the mi_disable(9E) entry point on the driver.  The
     driver must ensure that there is proper serialization between the
     interrupt enablement, interrupt disablement, the interrupt handler for
     that ring, and the mri_poll(9E) entry point.  For more information on
     the locking requirements related to polling, see the discussions in
     mri_poll(9E) and mi_disable(9E).
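     One possible way to obtain this serialization is a single per-ring
     lock that is held by all four code paths.  The following user-space
     sketch illustrates the idea with a POSIX mutex in place of a kernel
     mutex.  The rx_ring_t type and the rx_ring_* functions are
     hypothetical, a packet counter stands in for the descriptor ring, and
     the poll budget is counted in packets purely for simplicity.  It is
     not the real signature of any entry point; consult mri_poll(9E) and
     mi_disable(9E) for the actual requirements.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-ring state; rr_lock protects every other member. */
typedef struct {
	pthread_mutex_t rr_lock;
	bool rr_intr_enabled;	/* false while the ring is being polled */
	unsigned rr_pending;	/* stand-in for filled descriptors */
} rx_ring_t;

void
rx_ring_init(rx_ring_t *r)
{
	pthread_mutex_init(&r->rr_lock, NULL);
	r->rr_intr_enabled = true;
	r->rr_pending = 0;
}

/* Models interrupt disablement before polling begins. */
void
rx_ring_intr_disable(rx_ring_t *r)
{
	pthread_mutex_lock(&r->rr_lock);
	r->rr_intr_enabled = false;
	pthread_mutex_unlock(&r->rr_lock);
}

/* Models interrupt enablement once polling has finished. */
void
rx_ring_intr_enable(rx_ring_t *r)
{
	pthread_mutex_lock(&r->rr_lock);
	r->rr_intr_enabled = true;
	pthread_mutex_unlock(&r->rr_lock);
}

/* Interrupt path: delivers nothing if the ring is being polled. */
unsigned
rx_ring_intr(rx_ring_t *r)
{
	unsigned n = 0;

	pthread_mutex_lock(&r->rr_lock);
	if (r->rr_intr_enabled) {
		n = r->rr_pending;
		r->rr_pending = 0;
	}
	pthread_mutex_unlock(&r->rr_lock);
	return (n);
}

/* Poll path: takes at most budget packets under the same lock. */
unsigned
rx_ring_poll(rx_ring_t *r, unsigned budget)
{
	unsigned n;

	pthread_mutex_lock(&r->rr_lock);
	n = r->rr_pending < budget ? r->rr_pending : budget;
	r->rr_pending -= n;
	pthread_mutex_unlock(&r->rr_lock);
	return (n);
}
```

     Because the interrupt path rechecks rr_intr_enabled under the lock, an
     interrupt that was already in flight when polling began cannot deliver
     packets behind the poller's back.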
   Updated callback functions
     When using rings, two of the primary functions that a driver calls
     change.  First, the mac_rx(9F) function should be replaced with the
     mac_rx_ring(9F) function.  Secondly, the mac_tx_update(9F) function
     should be replaced with the mac_tx_ring_update(9F) function.
   Interrupt and Ring Mapping
     Drivers often vary the number of rings that they expose based on the
     number of interrupts that exist.  When a driver only supports a single
     group, there is often no reason to have more rings than interrupts.
     However, most hardware supports a means of having multiple rings tie to
     the same interrupt.  Drivers then tie the rings in different groups to
     the same interrupts and, when an interrupt is triggered, iterate over
     all of the rings tied to it.
     Tying multiple rings together into a single interrupt should only be
     done if hardware has the ability to control whether or not each ring
     contributes to the interrupt.  For the mi_disable(9E) entry point to
     work, each ring must be able to independently control whether or not
     receipt of a packet generates the shared interrupt.
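     A handler for a shared interrupt might be structured as in the
     following sketch, which walks every ring tied to the interrupt and
     skips any ring whose interrupt contribution has been disabled.  The
     rx_ring_t and intr_vector_t types and the example_intr() function are
     hypothetical, counters stand in for descriptor processing, and locking
     is omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical receive ring state; locking is omitted for brevity. */
typedef struct {
	bool rr_intr_enabled;	/* cleared while the ring is being polled */
	unsigned rr_pending;	/* stand-in for filled descriptors */
	unsigned rr_delivered;	/* packets handed up from the interrupt */
} rx_ring_t;

/* Hypothetical record of the rings that share one interrupt. */
typedef struct {
	rx_ring_t **iv_rings;
	size_t iv_nrings;
} intr_vector_t;

/* Services each eligible ring; returns how many rings were serviced. */
size_t
example_intr(intr_vector_t *iv)
{
	size_t serviced = 0;

	for (size_t i = 0; i < iv->iv_nrings; i++) {
		rx_ring_t *r = iv->iv_rings[i];

		if (!r->rr_intr_enabled)
			continue;	/* left for the polling path */
		r->rr_delivered += r->rr_pending;
		r->rr_pending = 0;
		serviced++;
	}
	return (serviced);
}
```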
   Filter Management
     As part of general operation, the device driver will be asked to add
     various filters to groups.  The MAC framework does not keep track of
     the assigned filters in a way that lets it give them to the driver
     again after a device reset.  Therefore, it is recommended that the
     driver keep track of all filters it has been assigned so that they can
     be reinstated after a driver- or system-initiated device reset of some
     kind.  There is no need to persist anything across a call to
     detach(9E) or similar.
     For more information, see the TX STALL DETECTION, DEVICE RESETS, AND
     FAULT MANAGEMENT section of mac(9E).
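     One way to follow the recommendation on tracking filters is to keep a
     shadow copy of every filter in driver memory and to replay it after a
     reset.  The sketch below models this for MAC address filters on a
     single group.  All names in it (ex_group_t, ex_slot_t, EX_NFILTERS,
     and the ex_* functions) are hypothetical, and the eg_hw array merely
     stands in for device registers that a reset would clear.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	ETHERADDRL	6	/* bytes in an Ethernet MAC address */
#define	EX_NFILTERS	4	/* hypothetical number of hardware slots */

typedef struct {
	bool s_valid;
	uint8_t s_mac[ETHERADDRL];
} ex_slot_t;

typedef struct {
	ex_slot_t eg_hw[EX_NFILTERS];		/* stand-in for registers */
	ex_slot_t eg_shadow[EX_NFILTERS];	/* driver's own record */
} ex_group_t;

/* Copies one shadow slot to the (simulated) hardware. */
static void
ex_hw_write(ex_group_t *g, size_t i)
{
	g->eg_hw[i] = g->eg_shadow[i];
}

/* Records the filter in the first free slot, then programs it. */
int
ex_group_addmac(ex_group_t *g, const uint8_t *mac)
{
	for (size_t i = 0; i < EX_NFILTERS; i++) {
		if (g->eg_shadow[i].s_valid)
			continue;
		g->eg_shadow[i].s_valid = true;
		memcpy(g->eg_shadow[i].s_mac, mac, ETHERADDRL);
		ex_hw_write(g, i);
		return (0);
	}
	return (ENOSPC);
}

/* Clears the matching slot in both the shadow copy and the hardware. */
int
ex_group_remmac(ex_group_t *g, const uint8_t *mac)
{
	for (size_t i = 0; i < EX_NFILTERS; i++) {
		ex_slot_t *s = &g->eg_shadow[i];

		if (!s->s_valid || memcmp(s->s_mac, mac, ETHERADDRL) != 0)
			continue;
		memset(s, 0, sizeof (*s));
		ex_hw_write(g, i);
		return (0);
	}
	return (ENOENT);
}

/* Simulates a device reset, which loses all programmed filters. */
void
ex_hw_reset(ex_group_t *g)
{
	memset(g->eg_hw, 0, sizeof (g->eg_hw));
}

/* Replays the shadow copy after a reset. */
void
ex_restore_filters(ex_group_t *g)
{
	for (size_t i = 0; i < EX_NFILTERS; i++)
		ex_hw_write(g, i);
}
```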
   Broadcast, Multicast, and Promiscuous Mode
     Rings and groups are currently designed to emphasize and enhance the
     receipt of filtered, unicast frames.  This means that special handling
     is required when working with broadcast traffic, multicast traffic, and
     enabling promiscuous mode.  This only applies to receive groups and
     rings.
     By default, only the first group, which has index zero and is sometimes
     called the default group, should ever be programmed to receive
     broadcast traffic.  This group should always be programmed to receive
     broadcast traffic, the same way that the broader device is programmed
     to always receive broadcast traffic when the MAC_CAPAB_RINGS
     capability has not been negotiated.
     When multicast addresses are assigned to the device through the
     mc_multicst(9E) entry point, those should also be assigned to the first
     group.
     Similarly, when enabling promiscuous mode, the driver should only
     enable promiscuous traffic to be received by the first group.
     No other groups or rings should ever receive broadcast, multicast, or
     promiscuous mode traffic.
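     The steering rules in this section can be summarized by the following
     sketch of the decision that the hardware is expected to make once the
     driver has programmed it accordingly.  The ex_rx_group_for() function,
     its parameters, and the EX_* constants are hypothetical.  The sketch
     assumes that a unicast frame matching a group's filters goes to that
     group, and that any other frame is dropped unless it qualifies for the
     default group.

```c
#include <stdbool.h>
#include <stdint.h>

#define	ETHERADDRL		6	/* bytes in a MAC address */
#define	EX_DEFAULT_GROUP	0	/* the first group, index zero */
#define	EX_DROP			(-1)	/* frame is not received */

/* The broadcast address is ff:ff:ff:ff:ff:ff. */
static bool
is_broadcast(const uint8_t *dst)
{
	for (int i = 0; i < ETHERADDRL; i++) {
		if (dst[i] != 0xff)
			return (false);
	}
	return (true);
}

/* Group addresses have the low bit of the first octet set. */
static bool
is_multicast(const uint8_t *dst)
{
	return ((dst[0] & 0x01) != 0);
}

/*
 * unicast_group is the index of the group whose filters matched the
 * frame, or a negative value if none did.  mcast_programmed is true if
 * the destination was added through the multicast entry point.
 */
int
ex_rx_group_for(const uint8_t *dst, int unicast_group,
    bool mcast_programmed, bool promisc)
{
	if (is_broadcast(dst))
		return (EX_DEFAULT_GROUP);
	if (is_multicast(dst)) {
		return ((mcast_programmed || promisc) ?
		    EX_DEFAULT_GROUP : EX_DROP);
	}
	if (unicast_group >= 0)
		return (unicast_group);
	return (promisc ? EX_DEFAULT_GROUP : EX_DROP);
}
```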
SEE ALSO
     mac(9E), mc_getcapab(9E), mc_multicst(9E), mc_tx(9E), mc_unicst(9E),
     mi_disable(9E), mr_gaddring(9E), mr_gget(9E), mr_gremring(9E),
     mr_rget(9E), mri_poll(9E), mac_rx(9F), mac_rx_ring(9F),
     mac_tx_ring_update(9F), mac_tx_update(9F), mac_group_info(9S)
illumos                         July 17, 2023                        illumos