Archive-only list for patches
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Jonathan Corbet <corbet@lwn.net>,
	Itay Avraham <itayavr@nvidia.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Leon Romanovsky <leon@kernel.org>,
	linux-doc@vger.kernel.org, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>
Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>,
	Aron Silverton <aron.silverton@oracle.com>,
	Dan Williams <dan.j.williams@intel.com>,
	David Ahern <dsahern@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Jiri Pirko <jiri@nvidia.com>, Leonid Bloch <lbloch@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	linux-cxl@vger.kernel.org, patches@lists.linux.dev
Subject: [PATCH 6/8] fwctl: Add documentation
Date: Mon,  3 Jun 2024 12:53:22 -0300	[thread overview]
Message-ID: <6-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com> (raw)
In-Reply-To: <0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com>

Document the purpose and rules for the fwctl subsystem.

Link in kdocs to the doc tree.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 Documentation/userspace-api/fwctl.rst | 269 ++++++++++++++++++++++++++
 Documentation/userspace-api/index.rst |   1 +
 2 files changed, 270 insertions(+)
 create mode 100644 Documentation/userspace-api/fwctl.rst

diff --git a/Documentation/userspace-api/fwctl.rst b/Documentation/userspace-api/fwctl.rst
new file mode 100644
index 00000000000000..630e75a91838f0
--- /dev/null
+++ b/Documentation/userspace-api/fwctl.rst
@@ -0,0 +1,269 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+fwctl subsystem
+===============
+
+:Author: Jason Gunthorpe
+
+Overview
+========
+
+Modern devices contain extensive amounts of FW, and in many cases, are largely
+software defined pieces of hardware. The evolution of this approach is largely a
+reaction to Moore's Law where a chip tape out is now highly expensive, and the
+chip design is extremely large. Replacing fixed HW logic with a flexible and
+tightly coupled FW/HW combination is an effective risk mitigation against chip
+respin. Problems in the HW design can be counteracted in device FW. This is
+especially true for devices which present a stable and backwards compatible
+interface to the operating system driver (such as NVMe).
+
+The FW layer in devices has grown to incredible sizes and devices frequently
+integrate clusters of fast processors to run it. For example, mlx5 devices have
+over 30MB of FW code, and big configurations operate with over 1GB of FW managed
+runtime state.
+
+The availability of such a flexible layer has created quite a variety in the
+industry where single pieces of silicon are now configurable software defined
+devices and can operate in substantially different ways depending on the need.
+Further we often see cases where specific sites wish to operate devices in ways
+that are highly specialized and require applications that have been tailored to
+their unique configuration.
+
+Further, devices have become multi-functional and integrated to the point they
+no longer fit neatly into the kernel's division of subsystems. Modern
+multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
+subsystems while sharing the underlying hardware using the auxiliary device
+system.
+
+All together this creates a challenge for the operating system, where devices
+have an expansive FW environment that needs robust device-specific debugging
+support, and FW driven functionality that is not well suited to “generic”
+interfaces. fwctl seeks to allow access to the full device functionality from
+user space in the areas of debuggability, management, and first-boot/nth-boot
+provisioning.
+
+fwctl is aimed at the common device design pattern where the OS and FW
+communicate via an RPC message layer constructed with a queue or mailbox scheme.
+In this case the driver will typically have some layer to deliver RPC messages
+and collect RPC responses from device FW. The in-kernel subsystem drivers that
+operate the device for its primary purposes will use these RPCs to build their
+drivers, but devices also usually have a set of ancillary RPCs that don't really
+fit into any specific subsystem. For example, a HW RAID controller is primarily
+operated by the block layer but also comes with a set of RPCs to administer the
+construction of drives within the HW RAID.
+
+In the past when devices were more single function individual subsystems would
+grow different approaches to solving some of these common problems, for instance
+monitoring device health, manipulating its FLASH, debugging the FW,
+provisioning, all have various unique interfaces across the kernel.
+
+fwctl's purpose is to define a common set of limited rules, described below,
+that allow user space to securely construct and execute RPCs inside device FW.
+The rules serve as an agreement between the operating system and FW on how to
+correctly design the RPC interface. As a uAPI the subsystem provides a thin
+layer of discovery and a generic uAPI to deliver the RPCs and collect the
+response. It supports a system of user space libraries and tools which will
+use this interface to control the device using the device native protocols.
+
+Scope of Action
+---------------
+
+fwctl drivers are strictly restricted to being a way to operate the device FW.
+It is not an avenue to access random kernel internals, or other operating system
+SW states.
+
+fwctl instances must operate on a well-defined device function, and the device
+should have a well-defined security model for what scope within the physical
+device the function is permitted to access. For instance, the most complex PCIe
+device today may broadly have several function level scopes:
+
+ 1. A privileged function with full access to the on-device global state and
+    configuration
+
+ 2. Multiple hypervisor functions with control over itself and child functions
+    used with VMs
+
+ 3. Multiple VM functions tightly scoped within the VM
+
+The device may create a logical parent/child relationship between these scopes,
+for instance a child VM's FW may be within the scope of the hypervisor FW. It is
+quite common in the VFIO world that the hypervisor environment has a complex
+provisioning/profiling/configuration responsibility for the function VFIO
+assigns to the VM.
+
+Further, within the function, devices often have RPC commands that fall within
+some general scopes of action:
+
+ 1. Access to function & child configuration, flash, etc that becomes live at a
+    function reset.
+
+ 2. Access to function & child runtime configuration that kernel drivers can
+    discover at runtime.
+
+ 3. Read only access to function debug information that may report on FW objects
+    in the function & child, including FW objects owned by other kernel
+    subsystems.
+
+ 4. Write access to function & child debug information strictly compatible with
+    the principles of kernel lockdown and kernel integrity protection. Triggers
+    a kernel Taint.
+
+ 5. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
+
+Userspace will provide a scope label on each RPC and the kernel must enforce the
+above CAP's and taints based on that scope. A combination of kernel and FW can
+enforce that RPCs are placed in the correct scope by userspace.
+
+Denied behavior
+---------------
+
+There are many things this interface must not allow user space to do (without a
+Taint or CAP), broadly derived from the principles of kernel lockdown. Some
+examples:
+
+ 1. DMA to/from arbitrary memory, hang the system, run code in the device, or
+    otherwise compromise device or system security and integrity.
+
+ 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
+    objects owned by kernel drivers.
+
+ 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
+    driver can react to the device configuration at function reset/driver load
+    time, but otherwise should not be coupled to fwctl.
+
+ 4. Operate the HW in a way that overlaps with the core purpose of another
+    primary kernel subsystem, such as read/write to LBAs, send/receive of
+    network packets, or operate an accelerator's data plane.
+
+fwctl is not a replacement for device direct access subsystems like uacce or
+VFIO.
+
+fwctl User API
+==============
+
+.. kernel-doc:: include/uapi/fwctl/fwctl.h
+.. kernel-doc:: include/uapi/fwctl/mlx5.h
+
+sysfs Class
+-----------
+
+fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
+(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
+operates the iotcl uAPI described above.
+
+fwctl devices can be related to driver components in other subsystems through
+sysfs::
+
+    $ ls /sys/class/fwctl/fwctl0/device/infiniband/
+    ibp0s10f0
+
+    $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
+    fwctl0/
+
+    $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
+    dev  device  power  subsystem  uevent
+
+User space Community
+--------------------
+
+Drawing inspiration from nvme-cli, participating in the kernel side must come
+with a user space in a common TBD git tree, at a minimum to usefully operate the
+kernel driver. Providing such an implementation is a pre-condition to merging a
+kernel driver.
+
+The goal is to build user space community around some of the shared problems
+we all have, and ideally develop some common user space programs with some
+starting themes of:
+
+ - Device in-field debugging
+
+ - HW provisioning
+
+ - VFIO child device profiling before VM boot
+
+ - Confidential Compute topics (attestation, secure provisioning)
+
+That stretches across all subsystems in the kernel. fwupd is a great example of
+how an excellent user space experience can emerge out of kernel-side diversity.
+
+fwctl Kernel API
+================
+
+.. kernel-doc:: drivers/fwctl/main.c
+   :export:
+.. kernel-doc:: include/linux/fwctl.h
+
+fwctl Driver design
+-------------------
+
+In many cases a fwctl driver is going to be part of a larger cross-subsystem
+device possibly using the auxiliary_device mechanism. In that case several
+subsystems are going to be sharing the same device and FW interface layer so the
+device design must already provide for isolation and co-operation between kernel
+subsystems. fwctl should fit into that same model.
+
+Part of the driver should include a description of how its scope restrictions
+and security model work. The driver and FW together must ensure that RPCs
+provided by user space are mapped to the appropriate scope. If the validation is
+done in the driver then the validation can read a 'command effects' report from
+the device, or hardwire the enforcement. If the validation is done in the FW,
+then the driver should pass the fwctl_rpc_scope to the FW along with the command.
+
+The driver and FW must co-operate to ensure that either fwctl cannot allocate
+any FW resources, or any resources it does allocate are freed on FD closure.  A
+driver primarily constructed around FW RPCs may find that its core PCI function
+and RPC layer belongs under fwctl with auxiliary devices connecting to other
+subsystems.
+
+Each device type must represent a stable FW ABI, such that the userspace
+components have the same general stability we expect from the kernel. FW upgrade
+should not break the userspace tools.
+
+Security Response
+=================
+
+The kernel remains the gatekeeper for this interface. If violations of the
+scopes, security or isolation principles are found, we have options to let
+devices fix them with a FW update, push a kernel patch to parse and block RPC
+commands or push a kernel patch to block entire firmware versions, or devices.
+
+While the kernel can always directly parse and restrict RPCs, it is expected
+that the existing kernel pattern of allowing drivers to delegate validation to
+FW to be a useful design.
+
+Existing Similar Examples
+=========================
+
+The approach described in this document is not a new idea. Direct, or near
+direct device access has been offered by the kernel in different areas for
+decades. With more devices wanting to follow this design pattern it is becoming
+clear that it is not entirely well understood and, more importantly, the
+security considerations are not well defined or agreed upon.
+
+Some examples:
+
+ - HW RAID controllers. This includes RPCs to do things like compose drives into
+   a RAID volume, configure RAID parameters, monitor the HW and more.
+
+ - Baseboard managers. RPCs for configuring settings in the device and more
+
+ - NVMe vendor command capsules. nvme-cli provides access to some monitoring
+   functions that different products have defined, but more exists.
+
+ - CXL also has a NVMe like vendor command system.
+
+ - DRM allows user space drivers to send commands to the device via kernel
+   mediation
+
+ - RDMA allows user space drivers to directly push commands to the device
+   without kernel involvement
+
+ - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc
+
+The first 4 would be examples of areas that fwctl intends to cover.
+
+Some key lessons learned from these past efforts are the importance of having a
+common user space project to use as a pre-condition for obtaining a kernel
+driver. Developing good community around useful software in user space is key to
+getting companies to fund participation to enable their products.
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 5926115ec0ed86..9685942fc8a21f 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -43,6 +43,7 @@ Devices and I/O
 
    accelerators/ocxl
    dma-buf-alloc-exchange
+   fwctl
    gpio/index
    iommu
    iommufd
-- 
2.45.2


  parent reply	other threads:[~2024-06-03 15:53 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2024-06-04  9:32   ` Leon Romanovsky
2024-06-04 15:50     ` Jason Gunthorpe
2024-06-04 17:05       ` Jonathan Cameron
2024-06-04 18:52         ` Jason Gunthorpe
2024-06-05 11:08           ` Jonathan Cameron
2024-06-04 16:42   ` Randy Dunlap
2024-06-04 16:44     ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
2024-06-04 12:16   ` Zhu Yanjun
2024-06-04 12:22     ` Leon Romanovsky
2024-06-04 16:50       ` Jonathan Cameron
2024-06-04 16:58         ` Jason Gunthorpe
2024-06-05 11:07           ` Jonathan Cameron
2024-06-05 18:27             ` Jason Gunthorpe
2024-06-06 13:34               ` Jonathan Cameron
2024-06-06 15:37                 ` Randy Dunlap
2024-06-05 15:42   ` Przemek Kitszel
2024-06-05 15:49     ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
2024-06-13 23:32   ` Dave Jiang
2024-06-13 23:40     ` Jason Gunthorpe
2024-06-14 16:37       ` Dave Jiang
2024-06-03 15:53 ` [PATCH 4/8] taint: Add TAINT_FWCTL Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 5/8] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
2024-06-03 15:53 ` Jason Gunthorpe [this message]
2024-06-05  2:31   ` [PATCH 6/8] fwctl: Add documentation Randy Dunlap
2024-06-05 16:03     ` Jason Gunthorpe
2024-06-05 20:14       ` Randy Dunlap
2024-06-03 15:53 ` [PATCH 7/8] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 8/8] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
2024-06-03 18:42 ` [PATCH 0/8] Introduce fwctl subystem Jakub Kicinski
2024-06-04  3:01   ` David Ahern
2024-06-04 14:04     ` Jakub Kicinski
2024-06-04 21:28       ` Saeed Mahameed
2024-06-04 22:32         ` Jakub Kicinski
2024-06-05 14:50           ` Jason Gunthorpe
2024-06-05 15:41             ` Jakub Kicinski
2024-06-04 23:56       ` Dan Williams
2024-06-05  3:05         ` Jakub Kicinski
2024-06-05 11:19         ` Jonathan Cameron
2024-06-05 13:59         ` Jason Gunthorpe
2024-06-06  2:35           ` David Ahern
2024-06-06 14:18             ` Jakub Kicinski
2024-06-06 14:48               ` Jason Gunthorpe
2024-06-06 15:05                 ` Jakub Kicinski
2024-06-06 17:47                   ` David Ahern
2024-06-07  6:48                     ` Jiri Pirko
2024-06-07 14:50                       ` David Ahern
2024-06-07 15:14                         ` Jason Gunthorpe
2024-06-07 15:50                           ` Jiri Pirko
2024-06-07 17:24                             ` Jason Gunthorpe
2024-06-07  7:34               ` Jiri Pirko
2024-06-07 12:49                 ` Andrew Lunn
2024-06-07 13:34                   ` Jiri Pirko
2024-06-08  1:43                     ` Jakub Kicinski
2024-06-06  4:56           ` Dan Williams
2024-06-06  8:50             ` Leon Romanovsky
2024-06-06 22:11               ` Dan Williams
2024-06-07  0:02                 ` Jason Gunthorpe
2024-06-07 13:12                 ` Leon Romanovsky
2024-06-06 14:41             ` Jason Gunthorpe
2024-06-06 14:58               ` Jakub Kicinski
2024-06-06 17:24               ` Dan Williams
2024-06-07  0:25                 ` Jason Gunthorpe
2024-06-07 10:47                   ` Przemek Kitszel
2024-06-11 15:36           ` Daniel Vetter
2024-06-11 16:17             ` Jason Gunthorpe
2024-06-11 16:54               ` Daniel Vetter
2024-06-06  1:58       ` David Ahern
2024-06-05  3:11 ` Jakub Kicinski
2024-06-05 12:06   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=andrew.gospodarek@broadcom.com \
    --cc=aron.silverton@oracle.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dsahern@kernel.org \
    --cc=hch@infradead.org \
    --cc=itayavr@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=lbloch@nvidia.com \
    --cc=leon@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=saeedm@nvidia.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox