INTRO(9E) Driver Entry Points INTRO(9E)
NAME
Intro - introduction to device driver entry points
DESCRIPTION
Section 9E of the manual describes the entry points and building blocks
that are used to build and implement all kinds of device drivers and
kernel modules. Oftentimes, modules and device drivers are talked
about interchangeably. The operating system is built around the idea
of loadable kernel modules. Device drivers are the primary type that
we think about; however, there are loadable kernel modules for file
systems, STREAMS devices, and even system calls!
The vast majority of this section focuses on documenting device (and
STREAMS) drivers. Device drivers are further broken down into different
categories depending on what they are targeting. For example, there
are dedicated frameworks for SCSI/SAS HBA drivers, networking drivers,
USB drivers, and then general character and block device drivers.
While most of the time we think about device drivers as corresponding
to a piece of physical hardware, there are also pseudo-device drivers
which are device drivers that provide functionality, but aren't backed
by any hardware. For example, dtrace(4D) and lofi(4D) are both
pseudo-device drivers.
To help understand the relationship between these different types of
things, consider the following image:
+--------------------+
|                    |
|  Loadable Modules  |
|                    |
+--------------------+
  |                          +--------------+     +------------+
  |                          |              |     |            |
  +------------------------->| Cryptography | ... | Scheduling | ...
  |                          |              |     |            |
  |                          +--------------+     +------------+
  |   +----------------+     +--------------+     +--------------+
  |   |                |     |              |     |              |
  +-->| Device Drivers | ... | File Systems | ... | System Calls | ...
      |                |     |              |     |              |
      +----------------+     +--------------+     +--------------+
              v
  +-----------+
  |
  |   +------------+  +---------+     +-----------+     +-----------+
  +-->| Networking |->| igb(4D) | ... | mlxcx(4D) | ... | cxgbe(4D) | ...
  |   +------------+  +---------+     +-----------+     +-----------+
  |
  |   +-------+       +----------+     +-------------+     +----------+
  +-->|  HBA  |------>| smrt(4D) | ... | mpt_sas(4D) | ... | ahci(4D) | ...
  |   +-------+       +----------+     +-------------+     +----------+
  |
  |   +-------+       +--------------+     +----------+     +---------+
  +-->|  USB  |------>| scsa2usb(4D) | ... | ccid(4D) | ... | hid(4D) | ...
  |   +-------+       +--------------+     +----------+     +---------+
  |
  |   +---------+     +-------------+     +-------------+
  +-->| Sensors |---->| smntemp(4D) | ... | pchtemp(4D) | ...
  |   +---------+     +-------------+     +-------------+
  |
  +-------+-------------+-----------+----------+
          |             v           v          |
          v       +-----------+  +-----+       v
      +-------+   | Character |  | USB |   +-------+
      | Audio |   | and Block |  | HCD |   | Nexus | ...
      +-------+   |  Devices  |  +-----+   +-------+
                  +-----------+
The above diagram attempts to explain some of the relationships that
were mentioned above at a high level. All device drivers are loadable
modules that leverage the modldrv(9S) structure and implement similar
_init(9E) and _fini(9E) entry points.
Some hardware implements more than one type of thing. The most common
example here would be a NIC that implements a temperature sensor or a
current sensor. Many devices also implement and leverage the kernel
statistics framework called "kstats". A device driver is not strictly
limited to only a single class of thing. For example, many USB client
devices are networking device drivers. In the subsequent sections
we'll go into the functions and structures that are related to creating
the different device drivers and their associated functions.
Kernel Initialization
To begin with, all loadable modules in the system are required to
implement three entry points. If these entry points are not present,
then the module cannot be installed in the system. These entry points
are _init(9E), _fini(9E), and _info(9E).
The _init(9E) entry point will be the first thing called in the module
and this is where any global initialization should be taken care of.
Once all global state has been successfully created, the driver should
call mod_install(9F) to actually register with the system. Conversely,
_fini(9E) is used to tear down the module. The driver uses
mod_remove(9F) to first remove the driver from the system and then it
can tear down any global state that was added there.
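The ordering described above can be sketched as a small userland model. This is not kernel code and everything in it is an assumption made for illustration: model_mod_install() and model_mod_remove() are stand-ins for mod_install(9F) and mod_remove(9F), and drv_global_state is a hypothetical piece of global state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical global state owned by the module. */
int *drv_global_state = NULL;

/* Knobs for the stand-ins below: 0 means success, otherwise an errno. */
int model_install_result = 0;
int model_remove_result = 0;
bool model_registered = false;

/* Stand-in for mod_install(9F). */
static int
model_mod_install(void)
{
	if (model_install_result == 0)
		model_registered = true;
	return (model_install_result);
}

/* Stand-in for mod_remove(9F). */
static int
model_mod_remove(void)
{
	if (model_remove_result == 0)
		model_registered = false;
	return (model_remove_result);
}

/* Models _init(9E): create global state first, then register. */
int
model_init(void)
{
	int ret;

	assert(drv_global_state == NULL);
	drv_global_state = calloc(1, sizeof (int));
	if (drv_global_state == NULL)
		return (ENOMEM);

	ret = model_mod_install();
	if (ret != 0) {
		/* Registration failed, so undo the global setup. */
		free(drv_global_state);
		drv_global_state = NULL;
	}
	return (ret);
}

/* Models _fini(9E): unregister first, tear down only if that worked. */
int
model_fini(void)
{
	int ret = model_mod_remove();

	if (ret == 0) {
		free(drv_global_state);
		drv_global_state = NULL;
	}
	return (ret);
}
```

The point of the model is the ordering: global state is left untouched when removal fails, because the module is still registered and may still be in use.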
While we mention global state here, this isn't widely used in most
device drivers. A device driver can have multiple instances
instantiated, one for each instance of a hardware device that is found
and most state is tied to those instances. We'll discuss that more in
the next section.
The _info(9E) entry point these days just calls mod_info(9F) directly
and returns its result.
All of these entry points directly or indirectly require a struct
modlinkage. This structure is used by all types of loadable kernel
modules and is filled in with information that varies based on the type
of module one is creating. Here, everything that we're creating is
going to use a struct modldrv, which describes a loadable driver. Every
device driver will declare a static global variable for these and fill
them out. They are documented in modlinkage(9S) and modldrv(9S)
respectively.
The following is an example of these structures borrowed from igc(4D):
static struct modldrv igc_modldrv = {
	.drv_modops = &mod_driverops,
	.drv_linkinfo = "Intel I226/226 Ethernet Controller",
	.drv_dev_ops = &igc_dev_ops
};
static struct modlinkage igc_modlinkage = {
	.ml_rev = MODREV_1,
	.ml_linkage = { &igc_modldrv, NULL }
};
From this there are a few important things to take away. A single
kernel module may implement more than one type of linkage, though this
is the exception and not the norm. The second thing to call out here is
that while the drv_modops will be the same for all drivers that use the
struct modldrv, the drv_linkinfo and drv_dev_ops will be unique to each
driver. The next section discusses the struct dev_ops.
The Device Tree and Instances
Device drivers have a unique challenge that makes them different from
other kinds of loadable modules: there may very well be more than a
single instance of the hardware that they support. Consider a few
examples: a user can plug in two distinct USB mass storage devices or
keyboards. A system may have more than one NIC present or the hardware
may expose multiple physical ports as distinct devices. Many systems
have more than one disk device. Conversely, if a given piece of
hardware isn't present then there's no reason for the driver for it to
be loaded. There is nothing that the Intel 1 GbE Ethernet NIC driver,
igb(4D), can do if there are no supported devices plugged in.
Devices are organized into a tree that is full of parent and child
relationships. This tree is what you see when you run prtconf(8). As an
example, a USB device is plugged into a port on a hub, which may be
plugged into another hub, and then is eventually plugged into a PCI
device that is the USB host controller, which itself may be under a
PCI-PCI bridge, and this chain continues all the way up to the root of
the tree, which we call "rootnex". Device drivers that can enumerate
children and provide operations for them are called "nexus" drivers.
The system automatically fills out the device tree through a
combination of built-in mechanisms and through operations on other
nexus drivers. When a new hardware unit is discovered, a dev_info_t
structure, the device information, is created for it and it is linked
into the tree. Generally, the system can then automatically use
information embedded in the device to determine what driver is
responsible for the piece of hardware through the use of the
"compatible" property which the system and nexus drivers set up on
their children. For example, PCI and PCIe drivers automatically set up
the compatible property based on information discovered in PCI
configuration space like the device's vendor ID, device ID, and class
IDs. The same is true of USB.
When a device driver is packaged, it contains metadata that indicates
which devices it supports. For example, the aforementioned igb driver
will have a rule that it matches "pciex8086,10a7". When the kernel
discovers a device with this alias present, it will know that it should
assign it to the igb driver and then it will assign the dev_info_t
structure a new instance number.
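As a rough illustration of the matching step, here is a userland sketch that walks a device's compatible entries and returns the first driver with a matching alias. The table, the types, and model_bind_driver() are all hypothetical; only the "pciex8086,10a7" to igb pairing comes from the text above, and the second table entry is made up. It assumes the compatible entries are ordered from most to least specific, so the first match wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table; real bindings come from packaging metadata. */
typedef struct {
	const char *da_alias;
	const char *da_driver;
} model_alias_t;

static const model_alias_t model_aliases[] = {
	{ "pciex8086,10a7", "igb" },
	{ "pciexample,0001", "exampledrv" }	/* made-up entry */
};

#define	MODEL_NALIASES	(sizeof (model_aliases) / sizeof (model_aliases[0]))

/*
 * Return the name of the first driver whose alias matches one of the
 * device's compatible entries, or NULL if no driver claims the device.
 */
const char *
model_bind_driver(const char *const *compat, size_t ncompat)
{
	assert(compat != NULL || ncompat == 0);

	for (size_t i = 0; i < ncompat; i++) {
		for (size_t j = 0; j < MODEL_NALIASES; j++) {
			if (strcmp(compat[i], model_aliases[j].da_alias) == 0)
				return (model_aliases[j].da_driver);
		}
	}
	return (NULL);
}
```

A device with no matching alias simply gets no driver, which is why such nodes show up in prtconf(8) without a driver name.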
To emphasize here, each time the device is discovered in the tree, it
will have an independent instance number and an independent dev_info_t
that accompanies it. Each instance has an independent lifetime too.
The most obvious way to think about this is with something that can be
physically removed while the system is on, like a USB device. Just
because you pull one USB keyboard doesn't mean it impacts the other one
there. They are inherently different devices (although if they were
plugged into the same hub and the hub was removed, then they both would
be removed; however, each would be acted on independently).
Here is a slimmed down example from a system's prtconf(8) output:
Oxide,Gimlet (driver name: rootnex)
    scsi_vhci, instance #0 (driver name: scsi_vhci)
    pci, instance #0 (driver name: npe)
        pci1022,1480, instance #13 (driver name: amdzen_stub)
        pci1022,164f
        pci1022,1482
        pci1de,fff9, instance #0 (driver name: pcieb)
            pci1344,3100, instance #4 (driver name: nvme)
                blkdev, instance #10 (driver name: blkdev)
        pci1022,1482
        pci1022,1482
        pci1de,fff9, instance #1 (driver name: pcieb)
            pci1b96,0, instance #7 (driver name: nvme)
                blkdev, instance #0 (driver name: blkdev)
        pci1de,fff9, instance #2 (driver name: pcieb)
            pci1b96,0, instance #8 (driver name: nvme)
                blkdev, instance #4 (driver name: blkdev)
        pci1de,fff9, instance #3 (driver name: pcieb)
            pci1b96,0, instance #10 (driver name: nvme)
                blkdev, instance #1 (driver name: blkdev)
From this we can see that there are multiple instances of the NVMe
(nvme), PCIe bridge (pcieb), and generic block device (blkdev) drivers
present. Each of these has its own dev_info_t and has its various
entry points called in parallel. With that, let's dig into the
specifics of what the struct dev_ops actually is and the different
operations to be aware of.
struct dev_ops
The device operations structure, struct dev_ops, controls all of the
basic entry points that a loadable device driver contains. This is
something that every driver has to implement, no matter the type. The
most important things that will be present are the devo_attach and
devo_detach members which are used to create and destroy instances of
the driver and then a pointer to any subsequent operations that exist,
such as the devo_cb_ops, which is used for character and block device
drivers and the devo_bus_ops, which is used for nexus drivers.
Attach and detach are the most important entry points in this
structure. Attach can practically be thought of as the "main" function
entry point for a device driver. This is where any initialization of
the instance will occur. This would include many traditional things
like setting up access to registers, allocating and assigning
interrupts, and interfacing with the various other device driver
frameworks such as mac(9E).
The actions taken here are generally device-specific, while certain
classes of devices (e.g. PCI, USB, etc.) will have overlapping
concerns. In addition, this is where the driver will take care of
creating anything like a minor node which will be used to access it by
userland software if it's a character or block device driver.
There is generally a per-instance data structure that a driver creates.
It may do this by calling kmem_zalloc(9F) and assigning the structure
with the ddi_set_driver_private(9F) entry point or it may use the DDI's
soft state management functions rooted in ddi_soft_state_init(9F). A
driver should try to tie as much state to the instance as possible.
There should not be anything like a fixed size global array of possible
instances. Someone usually finds a way to attach many more instances
of some type of hardware than you might expect!
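To make the "no fixed size global array" advice concrete, here is a userland sketch of a per-instance table that grows on demand. It is only loosely inspired by the idea behind the DDI's soft state functions; model_soft_state_t, the model_ss_*() functions, and MODEL_MAX_INST are hypothetical names, and the real kernel interfaces have different signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define	MODEL_MAX_INST	(1U << 20)	/* arbitrary sanity limit for the sketch */

typedef struct {
	void	**ss_items;	/* one pointer per possible instance */
	size_t	ss_nitems;	/* current capacity of ss_items */
	size_t	ss_size;	/* size of each per-instance structure */
} model_soft_state_t;

/* Allocate zeroed state for an instance, growing the table if needed. */
int
model_ss_zalloc(model_soft_state_t *ss, size_t inst)
{
	assert(ss != NULL && ss->ss_size > 0);

	if (inst >= MODEL_MAX_INST)
		return (-1);
	if (inst >= ss->ss_nitems) {
		size_t n = (ss->ss_nitems == 0) ? 1 : ss->ss_nitems;
		void **items;

		while (n <= inst)
			n *= 2;
		items = realloc(ss->ss_items, n * sizeof (void *));
		if (items == NULL)
			return (-1);
		/* Mark the newly added tail of the table as unallocated. */
		memset(items + ss->ss_nitems, 0,
		    (n - ss->ss_nitems) * sizeof (void *));
		ss->ss_items = items;
		ss->ss_nitems = n;
	}
	if (ss->ss_items[inst] != NULL)
		return (-1);	/* instance already has state */
	ss->ss_items[inst] = calloc(1, ss->ss_size);
	return (ss->ss_items[inst] == NULL ? -1 : 0);
}

/* Look up the state for an instance; NULL if none was allocated. */
void *
model_ss_get(const model_soft_state_t *ss, size_t inst)
{
	return (inst < ss->ss_nitems ? ss->ss_items[inst] : NULL);
}

/* Release the state for an instance, if any. */
void
model_ss_free(model_soft_state_t *ss, size_t inst)
{
	if (inst < ss->ss_nitems) {
		free(ss->ss_items[inst]);
		ss->ss_items[inst] = NULL;
	}
}
```

Nothing in the table assumes how many instances will exist, and instance numbers don't need to be contiguous.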
The attach(9E) and detach(9E) entry points both have a unique command
argument that is used to describe a specific action that is going on.
This action may be a normal attach or it could be related to putting
the system into the ACPI S3 sleep or similar state with the suspend and
resume commands.
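The command handling described above usually takes the shape of a switch statement. The following userland model shows that shape only. The MODEL_CMD_* values stand in for the DDI's DDI_ATTACH, DDI_RESUME, DDI_DETACH, and DDI_SUSPEND commands, and model_inst_t and the two functions are hypothetical; a real driver takes a dev_info_t and does far more work in each case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the DDI's attach and detach commands. */
typedef enum {
	MODEL_CMD_ATTACH,
	MODEL_CMD_RESUME,
	MODEL_CMD_DETACH,
	MODEL_CMD_SUSPEND
} model_cmd_t;

#define	MODEL_SUCCESS	0
#define	MODEL_FAILURE	(-1)

/* Hypothetical per-instance state. */
typedef struct {
	bool	mi_attached;
	bool	mi_suspended;
} model_inst_t;

/* Models attach(9E): handles a normal attach and a resume. */
int
model_attach(model_inst_t *inst, model_cmd_t cmd)
{
	assert(inst != NULL);

	switch (cmd) {
	case MODEL_CMD_ATTACH:
		if (inst->mi_attached)
			return (MODEL_FAILURE);
		inst->mi_attached = true;
		inst->mi_suspended = false;
		return (MODEL_SUCCESS);
	case MODEL_CMD_RESUME:
		if (!inst->mi_attached || !inst->mi_suspended)
			return (MODEL_FAILURE);
		inst->mi_suspended = false;
		return (MODEL_SUCCESS);
	default:
		/* Commands that this entry point doesn't handle. */
		return (MODEL_FAILURE);
	}
}

/* Models detach(9E): handles a normal detach and a suspend. */
int
model_detach(model_inst_t *inst, model_cmd_t cmd)
{
	assert(inst != NULL);

	switch (cmd) {
	case MODEL_CMD_DETACH:
		if (!inst->mi_attached)
			return (MODEL_FAILURE);
		inst->mi_attached = false;
		inst->mi_suspended = false;
		return (MODEL_SUCCESS);
	case MODEL_CMD_SUSPEND:
		if (!inst->mi_attached || inst->mi_suspended)
			return (MODEL_FAILURE);
		inst->mi_suspended = true;
		return (MODEL_SUCCESS);
	default:
		return (MODEL_FAILURE);
	}
}
```

Failing unknown commands in the default case is a deliberate choice in this sketch: a driver that doesn't support suspend and resume would simply not have those cases.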
The following table lists the common entry points that most drivers end
up having to think a little bit about:
    struct dev_ops:
        attach(9E)
        detach(9E)
        getinfo(9E)
        quiesce(9E)
Briefly, the getinfo(9E) entry point is used to map between instances
of a device driver and the minor nodes it creates. Drivers that
participate in a framework like the SCSI HBA, Networking, or related
don't usually end up implementing this. However, drivers that manually
create minor nodes generally do. The quiesce(9E) entry point is used
as part of the fast reboot operation. It is basically intended to stop
and/or reset the hardware and discard any ongoing I/O. Pseudo-device
drivers and drivers which do not perform I/O can use the symbol
`ddi_quiesce_not_needed' in lieu of a standard implementation.
In addition, the following entry points exist, but are less commonly
required because the system generally takes care of them, as is the
case with probe(9E):
    identify(9E)
    power(9E)
    probe(9E)
For more information on the structure, see also dev_ops(9S). The
following are a few examples of the struct dev_ops structure from a few
drivers. We recommend using the C99 style for all new instances.
static struct dev_ops ksensor_dev_ops = {
	.devo_rev = DEVO_REV,
	.devo_refcnt = 0,
	.devo_getinfo = ksensor_getinfo,
	.devo_identify = nulldev,
	.devo_probe = nulldev,
	.devo_attach = ksensor_attach,
	.devo_detach = ksensor_detach,
	.devo_reset = nodev,
	.devo_power = ddi_power,
	.devo_quiesce = ddi_quiesce_not_needed,
	.devo_cb_ops = &ksensor_cb_ops
};
static struct dev_ops igc_dev_ops = {
	.devo_rev = DEVO_REV,
	.devo_refcnt = 0,
	.devo_getinfo = NULL,
	.devo_identify = nulldev,
	.devo_probe = nulldev,
	.devo_attach = igc_attach,
	.devo_detach = igc_detach,
	.devo_reset = nodev,
	.devo_quiesce = ddi_quiesce_not_supported,
	.devo_cb_ops = &igc_cb_ops
};
static struct dev_ops pchtemp_dev_ops = {
	.devo_rev = DEVO_REV,
	.devo_refcnt = 0,
	.devo_getinfo = nodev,
	.devo_identify = nulldev,
	.devo_probe = nulldev,
	.devo_attach = pchtemp_attach,
	.devo_detach = pchtemp_detach,
	.devo_reset = nodev,
	.devo_quiesce = ddi_quiesce_not_needed
};
Character and Block Operations
In the history of UNIX, the most common device drivers that were
created were for block and character devices. The interfaces in block
and character devices are usually in service of common I/O patterns
that the system exposes. For example, when you call open(2), ioctl(2),
or read(2) on a device, it goes through the device's corresponding
entry point here. Both block and character devices operate on the
shared struct cb_ops structure, with different members being expected
for each of them. While they both require that someone implement the
cb_open and cb_close members, block devices perform I/O through the
strategy(9E) entry point and support the dump(9E) entry point for
kernel crash dumps, while character devices implement the more
historically familiar read(9E) and write(9E) entry points, and the
devmap(9E) entry point for supporting memory-mapping.
While the device operations structures worked with the dev_info_t
structure and there was one per-instance, character and block
operations work with minor nodes: named entities that exist in the file
system. UNIX has long had the idea of a major and minor number that is
encoded in the dev_t which is embedded in the file system, which is
what you see in the st_rdev member of the stat structure when you call
stat(2). The major number is assigned to the driver as a whole, not an
instance. The minor number space is shared between all instances of a
driver. Minor node numbers are assigned by the driver when it calls
ddi_create_minor_node(9F) to create a minor node and when one of its
character or block entry points is called, it will get this minor
number back and it must translate it to the corresponding instance on
its own.
A special property of the open(9E) entry point is that it can change
the minor number a client gets during its call to open which it will
use for all subsequent calls. This is called a "cloning" open.
Whether this is used or not depends on the type of driver that you are
creating. For example, many pseudo-device drivers like DTrace will use
this so each client has its own state. Similarly, devices that have
certain internal locking and transaction schemes will give each caller
a unique minor. The ccid(4D) and nvme(4D) drivers are examples of this.
However, many drivers will have just a single minor node per instance
and just say that the minor node's number is the instance number,
making it very simple to figure out the mapping. When it's not so
simple, often an AVL tree or some other structure is used to help map
this together.
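Between "the minor is the instance" and a full lookup structure sits a common middle ground: packing the instance number and a small per-instance node index into the minor number. The following userland sketch shows one such layout. The bit split and all of the model_* names are assumptions for illustration; no particular driver is being described.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 4 bits pick a node within an instance. */
#define	MODEL_NODE_BITS	4
#define	MODEL_NODE_MASK	((1U << MODEL_NODE_BITS) - 1)

/* Build the minor number the driver would hand out for a node. */
uint32_t
model_minor_encode(uint32_t inst, uint32_t node)
{
	assert(node <= MODEL_NODE_MASK);
	assert(inst <= (UINT32_MAX >> MODEL_NODE_BITS));
	return ((inst << MODEL_NODE_BITS) | node);
}

/* Recover the instance number from a minor passed to an entry point. */
uint32_t
model_minor_to_inst(uint32_t minor)
{
	return (minor >> MODEL_NODE_BITS);
}

/* Recover which of the instance's nodes the minor refers to. */
uint32_t
model_minor_to_node(uint32_t minor)
{
	return (minor & MODEL_NODE_MASK);
}
```

With this layout, instance 3 and node 5 become minor number 53. A driver that performs cloning opens can't rely on arithmetic like this for the cloned minors, which is where the lookup structures mentioned above come in.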
The following entry points are generally used for character devices:
ioctl(9E)
    The I/O control or ioctl entry point is used extensively
    throughout the system to perform different kinds of operations.
    These operations are often driver specific, though there are
    also some common operations that are used across multiple
    devices like the disk operations described in dkio(4I) or the
    ioctls that are used under the hood by cfgadm(8) and friends.
    Whether ioctls are supported or not is up to the driver. If it
    does support them, it is up to the driver to always perform any
    requisite privilege and permission checking as well as take
    care in copying in and out any kind of memory from the user
    process through calls like ddi_copyin(9F) and ddi_copyout(9F).
    The ioctl interface gives the driver writer great flexibility
    to create equally useful or hard to consume interfaces. When
    crafting a new committed interface over an ioctl, take care to
    ensure there is an ability to version the structure or use
    something that has more flexibility like an nvlist_t. See the
    `Copying Data to and from Userland' section of Intro(9F) for
    more information.
read(9E), write(9E), aread(9E), and awrite(9E)
    These are the classic I/O routines of the system. A driver's
    read and write routines operate on a uio(9S) structure which
    describes the I/O that is occurring, the offset into the device
    that the I/O should occur at, and has various flags that
    describe properties of the I/O request, such as whether or not
    it is a non-blocking request.
    The majority of device drivers that implement these entry
    points are using them to create some kind of file-like
    abstraction for a device. For example, the ccid(4D) driver
    uses these interfaces for submitting commands and reading
    responses back from an underlying device.
    For most use cases read(9E) and write(9E) are sufficient;
    however, aread(9E) and awrite(9E) are versions that tie into
    the kernel's asynchronous I/O engine.
chpoll(9E)
    This entry point allows a device to be polled by user code for
    an event of interest and connects through the kernel to
    different polling mechanisms such as poll(2), port_get(3C), and
    many others. Currently this interface only allows a driver to
    define the classic poll style events such as POLLIN, POLLOUT,
    and POLLHUP. The exact semantics of these are up to the
    driver; however, it is expected that the read and write
    oriented semantics of the various events will be honored by the
    device driver.
devmap(9E) and segmap(9E)
    These are entry points that are used to set up memory mappings
    for a device and replace the older mmap(9E) entry point. When
    a process calls mmap(2) on a device, it'll reach these,
    starting with the devmap(9E) entry point. The driver is
    responsible for confirming that the mapping request and its
    semantics are sensible, after which it will set up memory for
    consumption. The devmap(9E) manual page has more details on
    the specifics here and the related entry points that can be
    implemented as part of the devmap_callback_ctl(9S) structure
    such as devmap_access(9E). The segment mapping is an optional
    part that provides some additional controls for a driver such
    as assigning certain mapping attributes or wanting to maintain
    separate contexts for different mappings. See segmap(9E) for
    more information. It is common for drivers to just provide a
    devmap(9E) entry point.
prop_op(9E)
    This entry point is used for drivers to manage and deal with
    property creation. While this is its own entry point, most
    callers can just specify ddi_prop_op(9F) for this and don't
    need any special handling.
The following entry points are used uniquely for block devices:
strategy(9E)
    A driver's strategy entry point is used to actually perform I/O
    as described by the buf(9S) structure. It is responsible for
    allocating all resources and then initiating the actual
    request. The actual request will finish potentially
    asynchronously through calls to biodone(9F) or bioerror(9F).
    HBA or blkdev-based drivers do not usually end up implementing
    this interface.
dump(9E)
    A driver's dump implementation is used when the operating
    system has had a fatal error and is trying to persist a crash
    dump to disk. This is a delicate operation as the system has
    already failed, which means many normal operations like
    interrupt handlers, timeouts, and blocking will no longer work.
In general, the print(9E) entry point for block devices is vestigial
and users should fill in nodev(9F) there instead.
The following are some examples of different character device
operations structures that drivers have employed. Note that using C99
structure definitions is preferred:
static struct cb_ops ksensor_cb_ops = {
	.cb_open = ksensor_open,
	.cb_close = ksensor_close,
	.cb_strategy = nodev,
	.cb_print = nodev,
	.cb_dump = nodev,
	.cb_read = nodev,
	.cb_write = nodev,
	.cb_ioctl = ksensor_ioctl,
	.cb_devmap = nodev,
	.cb_mmap = nodev,
	.cb_segmap = nodev,
	.cb_chpoll = nochpoll,
	.cb_prop_op = ddi_prop_op,
	.cb_flag = D_MP,
	.cb_rev = CB_REV,
	.cb_aread = nodev,
	.cb_awrite = nodev
};
static struct cb_ops vio9p_cb_ops = {
	.cb_rev = CB_REV,
	.cb_flag = D_NEW | D_MP,
	.cb_open = vio9p_open,
	.cb_close = vio9p_close,
	.cb_read = vio9p_read,
	.cb_write = vio9p_write,
	.cb_ioctl = vio9p_ioctl,
	.cb_strategy = nodev,
	.cb_print = nodev,
	.cb_dump = nodev,
	.cb_devmap = nodev,
	.cb_mmap = nodev,
	.cb_segmap = nodev,
	.cb_chpoll = nochpoll,
	.cb_prop_op = ddi_prop_op,
	.cb_str = NULL,
	.cb_aread = nodev,
	.cb_awrite = nodev,
};
static struct cb_ops bd_cb_ops = {
	bd_open,		/* open */
	bd_close,		/* close */
	bd_strategy,		/* strategy */
	nodev,			/* print */
	bd_dump,		/* dump */
	bd_read,		/* read */
	bd_write,		/* write */
	bd_ioctl,		/* ioctl */
	nodev,			/* devmap */
	nodev,			/* mmap */
	nodev,			/* segmap */
	nochpoll,		/* poll */
	bd_prop_op,		/* cb_prop_op */
	0,			/* streamtab */
	D_64BIT | D_MP,		/* Driver compatibility flag */
	CB_REV,			/* cb_rev */
	bd_aread,		/* async read */
	bd_awrite		/* async write */
};
Networking Drivers
Networking device drivers come in many forms and flavors. They may
interface with the host via PCIe or USB, be a pseudo-device, or use
something entirely different like SPI (Serial Peripheral Interface).
The system provides a dedicated networking interface driver framework
that is documented in mac(9E). This framework is sometimes also
referred to as GLDv3 (Generic LAN Device version 3).
All networking drivers will still implement a basic struct dev_ops and
a minimal struct cb_ops. The mac(9E) framework takes care of
implementing all of the standard character device entry points at the
end of the day and instead provides a number of different
networking-specific entry points that take care of things like getting
and setting properties, installing and removing MAC addresses and
filters, and actually transmitting and providing callbacks for
receiving packets.
Each instance of a device driver will generally have a separate
registration with mac(9E). In other words, there is usually a one to
one relationship between a driver having its attach(9E) entry point
called and it registering with the mac(9E) framework.
STREAMS Modules
STREAMS modules are a historical way to provide certain services in the
kernel. For networking device drivers, instead see the prior section
and mac(9E). Conceptually STREAMS break things into queues, with one
side being designed for a module to read data and another side for it
to write or produce data. These modules are arranged in a stack, with
additional modules being pushed on for additional processing. For
example, the TTY subsystem has a serial console as a base STREAMS
module, but it then pushes on additional modules like the
pseudo-terminal emulation (ptem(4M)), the standard line discipline
(ldterm(4M)), etc.
STREAMS drivers don't use the normal character device entry points
(though sometimes they do define them) or even the struct modldrv.
Instead they use the struct modlstrmod which is discussed in
modlstrmod(9S), which in turn requires one to fill out the fmodsw(9S),
streamtab(9S), and qinit(9S) structures. The last of these has two of
the more common entry points:
    put(9E)
    srv(9E)
These entry points are used when different kinds of messages are
received by the device driver on a queue. In addition, the qinit(9S)
structure defines an alternative set of entry points for open(9E) and
close(9E), as a STREAMS module's open and close routines all operate in
the context of a given queue_t. There are other differences here. An
ioctl is not a dedicated entry point, but rather a specific message
type (M_IOCTL) that is received in a driver's put(9E) routine.
Finally, it's worth noting the mt-streams(9F) manual page which
discusses several concurrency related considerations for STREAMS
related drivers.
HBA Drivers
Host bus adapters are used to interface with the various SCSI and SAS
controllers. Like with networking, the kernel provides a framework
under the name of SCSA. HBA drivers still often implement character
device entry points; however, they generally end up calling into shared
framework entry points for open(9E), ioctl(9E), and close(9E). For
several of the concepts related to the third version of the framework,
see iport(9).
The following entry points are associated with HBA drivers:
    tran_abort(9E)
    tran_bus_reset(9E)
    tran_dmafree(9E)
    tran_getcap(9E)
    tran_init_pkt(9E)
    tran_quiesce(9E)
    tran_reset(9E)
    tran_reset_notify(9E)
    tran_setup_pkt(9E)
    tran_start(9E)
    tran_sync_pkt(9E)
    tran_tgt_free(9E)
    tran_tgt_init(9E)
    tran_tgt_probe(9E)
In addition to these, when using SCSAv3 with iports, drivers will call
scsi_hba_iport_register(9F) to create various iports. This has the
unique effect of causing the driver's top-level attach(9E) entry point
to be called again, but referring to the iport instead of the main
hardware instance.
USB Drivers
The kernel provides a framework for USB client devices to access
various USB services such as getting access to device and configuration
descriptors, issuing control, bulk, interrupt, and isochronous
requests, and being notified when they are removed from the system.
Generally a USB device driver leverages a framework of some kind, like
mac(9E), in addition to the USB pieces. As such, there are no entry
points specific to USB device drivers; however, there are plenty of
provided functions.
To get started with a USB device driver, one will generally perform
some of the following steps:
1. Register with the USB framework by calling usb_client_attach(9F).
2. Ask the kernel to fetch all of the device and class descriptors
   that are appropriate with the usb_get_dev_data(9F) function.
3. Parse the relevant descriptors to figure out which endpoints to
   attach.
4. Open up pipes to the specific USB endpoints by using
   usb_lookup_ep_data(9F), usb_ep_xdescr_fill(9F), and
   usb_pipe_xopen(9F).
5. Proceed with the rest of device initialization and service.
Sensors
Many devices embed sensors in them, such as a networking ASIC that
tracks its junction temperature. The kernel provides the ksensor(9E)
(kernel sensor) framework to allow device drivers to implement sensors
with a minimal set of callback functions. Any device driver, whether
it's providing services through another framework or not, can implement
the ksensor operations. Drivers do not need to implement any character
device operations directly. They are instead provided via the
ksensor(4D) driver.
A driver registers with the ksensor framework during its attach(9E)
entry point and must implement the functions described in
ksensor_ops(9E) for each sensor that it creates. These interfaces
include:
    kso_kind(9E)
    kso_scalar(9E)
Virtio Drivers
The kernel provides an uncommitted interface for Virtio device drivers,
which is discussed in some detail in uts/common/io/virtio/virtio.h. A
client device driver will register with the framework and then use that
registration to begin feature and interrupt negotiation. As part of
that, it is given the ability to set up virtqueues which can be used
for communicating to and from the hypervisor.
Kernel Statistics
Drivers have the ability to export kstats (kernel statistics) that will
appear in the kstat(8) command. Any kind of module in the system can
create and register a kstat; it is not strictly tied to anything like a
dev_info_t. kstats come in different types. The most common kstat type
is KSTAT_TYPE_NAMED, which allows for multiple, typed name-value pairs
to be part of the stat. This is what the kernel uses under the hood
for many things such as the various mac(9E) statistics that are managed
on behalf of drivers.
To create a kstat, a driver utilizes the kstat_create(9F) function,
after which it has a chance to set up the kstat and make choices about
which entry points it will implement. A kstat will not be made visible
until the caller calls kstat_install(9F) on it. The two entry points
that a driver may implement are:
    ks_snapshot(9E)
    ks_update(9E)
First, let's discuss the ks_update(9E) entry point. A kstat may be
updated in one of two ways: either by having its ks_update(9E) function
called or by having the system update information as it goes in the
kstat's data. One would use the former when it involves doing
something like going out to hardware and reading registers, whereas
the latter approach might be used when operations can be tracked as
part of a normal flow, such as the number of errors or particular
requests a driver has encountered. The ks_snapshot(9E) entry point is
not as commonly used by comparison and allows a caller to interpose on
the data marshalling process for copying out to userland.
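The two update styles can be contrasted in a small userland model. None of this is the kstat API: model_kstat_t, its members, and the functions are hypothetical, and the "hardware register" is just a variable that the update callback reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct model_kstat model_kstat_t;
typedef int (*model_update_f)(model_kstat_t *);

struct model_kstat {
	uint64_t	mk_errors;	/* bumped inline as the driver runs */
	uint64_t	mk_temp;	/* refreshed on demand by mk_update */
	model_update_f	mk_update;	/* optional, in the spirit of ks_update(9E) */
	void		*mk_private;	/* here: pointer to a fake register */
};

/* Update callback: go out to the "hardware" and read a register. */
int
model_temp_update(model_kstat_t *ksp)
{
	const uint64_t *reg = ksp->mk_private;

	ksp->mk_temp = *reg;
	return (0);
}

/* Normal driver flow: track an error as it happens. */
void
model_record_error(model_kstat_t *ksp)
{
	ksp->mk_errors++;
}

/* Reader side: run the update callback, if any, then copy values out. */
int
model_kstat_read(model_kstat_t *ksp, uint64_t *errors, uint64_t *temp)
{
	assert(ksp != NULL && errors != NULL && temp != NULL);

	if (ksp->mk_update != NULL) {
		int ret = ksp->mk_update(ksp);

		if (ret != 0)
			return (ret);
	}
	*errors = ksp->mk_errors;
	*temp = ksp->mk_temp;
	return (0);
}
```

The error counter costs nothing to read because it is always current, while the temperature is only fetched when someone actually asks for it.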
Upgradable Firmware Modules
The UFM (Upgradable Firmware Module) system in the kernel allows a
device driver to provide information about the firmware modules that
are present on a device and is generally used as supplementary
information about a device. The UFM framework allows a driver to
declare a given number of modules that exist on a given dev_info_t.
Each module has some number of slots with different versions. This
information is automatically exported into various consumers such as
fwflash(8), the Fault Management Architecture, and the ufm(4D) driver's
specific ioctls.
A driver fills in the operations vector discussed in ddi_ufm(9E) and
registers it with the kernel by calling ddi_ufm_init(9F). The entry
points include:
    ddi_ufm_op_getcaps(9E)
    ddi_ufm_op_nimages(9E)
    ddi_ufm_op_fill_image(9E)
    ddi_ufm_op_fill_slot(9E)
    ddi_ufm_op_readimg(9E)
The ddi_ufm_op_getcaps(9E) entry point describes the capabilities of
the device and what other entry points the kernel and callers can
expect to exist. The ddi_ufm_op_nimages(9E) entry point tells the
system how many images there are and if it is not implemented, then the
system assumes there is a single image. The ddi_ufm_op_fill_image(9E)
and ddi_ufm_op_fill_slot(9E) entry points are used to fill in
information about images and slots respectively, while the
ddi_ufm_op_readimg(9E) entry point is used to read an image from the
device for the operating system. That entry point is often supported
when dealing with EEPROMs as many devices do not have a way of
retrieving the actual current firmware.
USB Host Interface Drivers
The opposite of USB device drivers are the device drivers that make the
USB abstractions work: USB host interface controllers. The kernel
provides a private framework for these, which is discussed in
usba_hcdi(9E). An HCDI driver is a character device driver and ends up
also instantiating a root hub as part of its operation and forwards
many of its open, close, and ioctl routines to the corresponding usba
hubdi functions.
To get started with the framework, a driver will need to call
usba_hcdi_register(9F) with a filled out usba_hcdi_register_args_t(9S)
structure. That registration structure includes the operation vector
of callbacks that the driver fills in, which involve opening and
closing pipes (usba_hcdi_pipe_open(9E)), issuing the various control,
interrupt, bulk, and isochronous transfers
(usba_hcdi_pipe_bulk_xfer(9E), etc.), and more.
DTRACE PROBES
By default, the DTrace fbt(4D) (function boundary tracing) provider
will create DTrace probes based on the entry and return points of most
functions in a module (the primary exception being for some
hand-written assembler). While this is very powerful, there are often
times that driver writers want to define their own semantic probes.
The sdt(4D) (statically defined tracing) provider can be used for this.
To define an SDT probe, a driver should include <sys/sdt.h>, which
defines several macros for probes based on the number of arguments that
are present. Each probe takes a name, which is constrained by the
rules of a C identifier. If two underscore characters are present in a
row (`__') they will be transformed into a hyphen (`-'). That is, a
probe declared with a name of `hello__world' will be named
`hello-world' and accessible as the DTrace probe `sdt:::hello-world'.
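The naming rule can be written down as a small userland function. This is only an illustration of the rule as described above, not how the kernel implements it; model_sdt_name() is a hypothetical name, and how it treats an odd run of underscores (pairing them up from the left) is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert a probe name as written in C into the name as seen from
 * DTrace by replacing each "__" with "-". Returns the length of the
 * result, or -1 if the output buffer is too small.
 */
int
model_sdt_name(const char *in, char *out, size_t outlen)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);

	for (size_t i = 0; in[i] != '\0'; i++) {
		char c = in[i];

		if (c == '_' && in[i + 1] == '_') {
			c = '-';
			i++;	/* consume the second underscore */
		}
		/* Leave room for this character and the terminator. */
		if (o + 1 >= outlen)
			return (-1);
		out[o++] = c;
	}
	if (outlen == 0)
		return (-1);
	out[o] = '\0';
	return ((int)o);
}
```

Feeding it `hello__world' produces `hello-world', while single underscores are left alone.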
Each probe can present a varying number of arguments in DTrace, ranging
from 0-8. For each DTrace probe argument, one passes both the type of
the argument and the actual value. The following example from the
igc(4D) driver shows a DTrace probe that provides four arguments and
would be accessible using the probe `sdt:::igc-context-desc':
	DTRACE_PROBE4(igc__context__desc, igc_t *, igc, igc_tx_ring_t *,
	    ring, igc_tx_state_t *, tx, struct igc_adv_tx_context_desc *,
	    ctx);
In the above example, igc, ring, tx, and ctx are local variables and
function parameters.
By default SDT probes are considered Volatile, in other words they can
change at any time and disappear. This is used to encourage widespread
use of SDT probes for what may be useful for a particular problem or
issue that is being investigated. SDT probes that are stabilized are
transformed into their own first class provider.
SEE ALSO
Intro(9), Intro(9F), Intro(9S)
illumos May 23, 2024 illumos