* [PATCH v4 00/10] Introduce fwctl subsystem
@ 2025-02-07 0:13 Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
` (13 more replies)
0 siblings, 14 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
[
Many people were away around the holiday period, but work is back in full
swing now with Dave already at v3 on his CXL work over the past couple
weeks. We are looking at a good chance of reaching this merge window. I
will work out some shared branches with CXL and get it into linux-next
once all three drivers can be assembled and reviews seem to be concluding.
There are a couple of open notes:
- Greg was interested in a new name, but nobody offered any bikesheds
- I would like a co-maintainer
]
fwctl is a new subsystem intended to bring some common rules and order to
the growing pattern of exposing a secure FW interface directly to
userspace. Unlike existing subsystems such as RDMA/DRM/VFIO/uacce, which
expose a device for datapath operations, fwctl is focused on debugging,
configuration and provisioning of the device. It will not have the
features, such as interrupt delivery, needed to support a datapath.
This concept is similar to the long standing practice in the "HW" RAID
space of having a device specific misc device to manage the RAID
controller FW. fwctl generalizes this notion of a companion debug and
management interface that goes along with a dataplane implemented in an
appropriate subsystem.
The need for this has reached a critical point as many users are moving to
run lockdown enabled kernels. Several existing devices have had long
standing tooling for management that relied on /sys/../resource0 or PCI
config space access which is not permitted in lockdown. A major point of
fwctl is to define and document the rules that a device must follow to
expose a lockdown compatible RPC.
Based on some discussion, fwctl splits the RPCs into four categories:
FWCTL_RPC_CONFIGURATION
FWCTL_RPC_DEBUG_READ_ONLY
FWCTL_RPC_DEBUG_WRITE
FWCTL_RPC_DEBUG_WRITE_FULL
The latter two trigger a new TAINT_FWCTL, and the final one also requires
CAP_SYS_RAWIO, which excludes it from lockdown. The device driver and its
FW are responsible for restricting RPCs to the requested security scope,
while the core code handles the tainting and CAP checks.
For details see the final patch which introduces the documentation.
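As a rough illustration of the split described above, the following userspace sketch models how a core-side check might treat each scope. It is not the code from this series: the enum values, the `scope_decision` type, the `check_scope()` helper and the ordering of the capability check before tainting are all assumptions made for the example. Only the four scope names, TAINT_FWCTL and CAP_SYS_RAWIO come from the text.

```c
#include <errno.h>
#include <stdbool.h>

/* Scope names from the cover letter; the numeric values are illustrative. */
enum fwctl_rpc_scope {
	FWCTL_RPC_CONFIGURATION,
	FWCTL_RPC_DEBUG_READ_ONLY,
	FWCTL_RPC_DEBUG_WRITE,
	FWCTL_RPC_DEBUG_WRITE_FULL,
};

/* Hypothetical result type: what the core would do for one request. */
struct scope_decision {
	int err;     /* 0, or a negative errno-style value */
	bool taint;  /* would TAINT_FWCTL be set? */
};

/* has_rawio stands in for a capable(CAP_SYS_RAWIO) check. */
static struct scope_decision check_scope(enum fwctl_rpc_scope scope,
					 bool has_rawio)
{
	struct scope_decision d = { 0, false };

	switch (scope) {
	case FWCTL_RPC_CONFIGURATION:
	case FWCTL_RPC_DEBUG_READ_ONLY:
		/* No taint, no extra capability. */
		break;
	case FWCTL_RPC_DEBUG_WRITE_FULL:
		if (!has_rawio) {
			d.err = -EPERM;
			break;
		}
		/* fall through */
	case FWCTL_RPC_DEBUG_WRITE:
		d.taint = true;
		break;
	default:
		d.err = -EOPNOTSUPP;
		break;
	}
	return d;
}
```

In this model a FWCTL_RPC_DEBUG_WRITE_FULL request without the capability is rejected before any taint is recorded; whether the real core orders things that way is for the later patches to say.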
The CXL FWCTL driver is now in its own series, currently at v3:
https://lore.kernel.org/r/20250204220430.4146187-1-dave.jiang@intel.com
I'm expecting a 3rd driver (from Shannon @ Pensando) to be posted right
away; the GitHub version I saw looked good. I've got soft commitments for
about 6 drivers in total now.
There have been three LWN articles written discussing various aspects of
this proposal:
https://lwn.net/Articles/955001/
https://lwn.net/Articles/969383/
https://lwn.net/Articles/990802/
A really giant ksummit thread preceding a discussion at the Maintainer
Summit:
https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/
Several have expressed general support for this concept:
AMD/Pensando - https://lore.kernel.org/linux-rdma/20241205222818.44439-1-shannon.nelson@amd.com
Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org
Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org
NVIDIA Networking
Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com
Work is ongoing for userspace; currently the Mellanox tool suite has been
ported over:
https://github.com/Mellanox/mstflint
And a simpler example of how to use it:
https://github.com/jgunthorpe/mlx5ctl.git
This is on github: https://github.com/jgunthorpe/linux/commits/fwctl
v4:
- Rebase to v6.14-rc1
- Fine tune comments and rst documentation
- Adjust cleanup.h usage - remove places that add more obfuscation than
value
- CXL is back to its own independent series
- Increase FWCTL_MAX_DEVICES to 4096, someone hit the limit
- Fix mlx5ctl_validate_rpc() logic around scope checking
- Disable mlx5ctl on SFs
v3: https://patch.msgid.link/r/0-v3-960f17f90f17+516-fwctl_jgg@nvidia.com
- Rebase to v6.11-rc4
- Add a squashed version of David's CXL series as the 2nd driver
- Add missing includes
- Improve comments based on feedback
- Use the kdoc format that puts the member docs inside the struct
- Rewrite fwctl_alloc_device() to be clearer
- Incorporate all remarks for the documentation
v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com
- Rebase to v6.10-rc5
- Minor style changes
- Follow the style consensus for the guard stuff
- Documentation grammar/spelling
- Add missed length output for mlx5 get_info
- Add two more missed MLX5 CMD's
- Collect tags
v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com
Andy Gospodarek (2):
fwctl/bnxt: Support communicating with bnxt fw
bnxt: Create an auxiliary device for fwctl_bnxt
Jason Gunthorpe (6):
fwctl: Add basic structure for a class subsystem with a cdev
fwctl: Basic ioctl dispatch for the character device
fwctl: FWCTL_INFO to return basic information about the device
taint: Add TAINT_FWCTL
fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
fwctl: Add documentation
Saeed Mahameed (2):
fwctl/mlx5: Support for communicating with mlx5 fw
mlx5: Create an auxiliary device for fwctl_mlx5
Documentation/admin-guide/tainted-kernels.rst | 5 +
Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++
Documentation/userspace-api/fwctl/index.rst | 12 +
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 16 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/fwctl/Kconfig | 32 ++
drivers/fwctl/Makefile | 6 +
drivers/fwctl/bnxt/Makefile | 4 +
drivers/fwctl/bnxt/bnxt.c | 167 +++++++
drivers/fwctl/main.c | 416 ++++++++++++++++++
drivers/fwctl/mlx5/Makefile | 4 +
drivers/fwctl/mlx5/main.c | 340 ++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 +
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 126 +++++-
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 4 +
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 9 +
include/linux/fwctl.h | 135 ++++++
include/linux/panic.h | 3 +-
include/uapi/fwctl/bnxt.h | 27 ++
include/uapi/fwctl/fwctl.h | 140 ++++++
include/uapi/fwctl/mlx5.h | 36 ++
kernel/panic.c | 1 +
tools/debugging/kernel-chktaint | 8 +
27 files changed, 1782 insertions(+), 5 deletions(-)
create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
create mode 100644 Documentation/userspace-api/fwctl/index.rst
create mode 100644 drivers/fwctl/Kconfig
create mode 100644 drivers/fwctl/Makefile
create mode 100644 drivers/fwctl/bnxt/Makefile
create mode 100644 drivers/fwctl/bnxt/bnxt.c
create mode 100644 drivers/fwctl/main.c
create mode 100644 drivers/fwctl/mlx5/Makefile
create mode 100644 drivers/fwctl/mlx5/main.c
create mode 100644 include/linux/fwctl.h
create mode 100644 include/uapi/fwctl/bnxt.h
create mode 100644 include/uapi/fwctl/fwctl.h
create mode 100644 include/uapi/fwctl/mlx5.h
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
--
2.43.0
* [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subsystem Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-07 23:32 ` Dan Williams
2025-02-08 0:08 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
` (12 subsequent siblings)
13 siblings, 2 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
Create the class, character device and functions for a fwctl driver to
register with and unregister from the subsystem.
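The allocation helper in this patch relies on the driver embedding the core struct as the first member of its own struct, so that the core can allocate the whole object from just a size. A minimal userspace sketch of that layout rule follows. `struct core_device`, `struct my_driver_dev`, `core_alloc()` and `alloc_device()` are stand-ins invented for the example and are not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct fwctl_device; the real one embeds a struct device. */
struct core_device {
	int refcount;
	size_t alloc_size;
};

/* Hypothetical driver struct: the core struct must be the first member. */
struct my_driver_dev {
	struct core_device core;
	int fw_channel; /* driver-private data follows the core struct */
};

/* Compile-time check mirroring the offsetof() == 0 assertion in the patch. */
static_assert(offsetof(struct my_driver_dev, core) == 0,
	      "core struct must be the first member");

/* Core-side allocator: it only sees a size, never the driver's type. */
static struct core_device *core_alloc(size_t size)
{
	struct core_device *core = calloc(1, size);

	if (!core)
		return NULL;
	core->refcount = 1;
	core->alloc_size = size;
	return core;
}

/* Driver-facing wrapper: allocate the full struct, hand back the driver type. */
#define alloc_device(drv_struct) \
	((drv_struct *)core_alloc(sizeof(drv_struct)))
```

Because the core struct sits at offset zero, a pointer to the driver struct and a pointer to its core member are the same address, which is what lets the core free the memory without calling back into the driver.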
A typical fwctl driver has a sysfs presence like:
$ ls -l /dev/fwctl/fwctl0
crw------- 1 root root 250, 0 Apr 25 19:16 /dev/fwctl/fwctl0
$ ls /sys/class/fwctl/fwctl0
dev device power subsystem uevent
$ ls /sys/class/fwctl/fwctl0/device/infiniband/
ibp0s10f0
$ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
fwctl0/
$ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
dev device power subsystem uevent
Which allows userspace to link all the multi-subsystem driver components
together and learn the subsystem specific names for the device's
components.
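A userspace tool could follow these links programmatically. The sketch below builds the `/sys/class/fwctl/<name>/device` path shown in the listing and resolves it to the parent device directory. The helper names `fwctl_parent_link()` and `fwctl_parent_device()` are made up for this example, and it assumes a Linux system with POSIX `realpath()`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "/sys/class/fwctl/<name>/device" into buf.
 * Returns 0 on success, -1 if the path does not fit.
 */
static int fwctl_parent_link(const char *name, char *buf, size_t len)
{
	int n;

	assert(name && buf);
	n = snprintf(buf, len, "/sys/class/fwctl/%s/device", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Resolve the "device" symlink to the parent device directory, e.g. a
 * PCI function under /sys/devices. out must hold at least PATH_MAX bytes.
 * Returns 0 on success, -1 if the device does not exist.
 */
static int fwctl_parent_device(const char *name, char *out)
{
	char link[PATH_MAX];

	if (fwctl_parent_link(name, link, sizeof(link)))
		return -1;
	return realpath(link, out) ? 0 : -1;
}
```

From the resolved directory a tool can then list sibling subsystem directories such as `infiniband/` to find the other names the same hardware goes by.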
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
MAINTAINERS | 8 ++
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/fwctl/Kconfig | 9 +++
drivers/fwctl/Makefile | 4 +
drivers/fwctl/main.c | 170 +++++++++++++++++++++++++++++++++++++++++
include/linux/fwctl.h | 69 +++++++++++++++++
7 files changed, 263 insertions(+)
create mode 100644 drivers/fwctl/Kconfig
create mode 100644 drivers/fwctl/Makefile
create mode 100644 drivers/fwctl/main.c
create mode 100644 include/linux/fwctl.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 896a307fa06545..ff418a77f39e4d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9557,6 +9557,14 @@ F: kernel/futex/*
F: tools/perf/bench/futex*
F: tools/testing/selftests/futex/
+FWCTL SUBSYSTEM
+M: Jason Gunthorpe <jgg@nvidia.com>
+M: Saeed Mahameed <saeedm@nvidia.com>
+S: Maintained
+F: Documentation/userspace-api/fwctl.rst
+F: drivers/fwctl/
+F: include/linux/fwctl.h
+
GALAXYCORE GC0308 CAMERA SENSOR DRIVER
M: Sebastian Reichel <sre@kernel.org>
L: linux-media@vger.kernel.org
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 7bdad836fc6207..7c556c5ac4fddc 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -21,6 +21,8 @@ source "drivers/connector/Kconfig"
source "drivers/firmware/Kconfig"
+source "drivers/fwctl/Kconfig"
+
source "drivers/gnss/Kconfig"
source "drivers/mtd/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index 45d1c3e630f754..b5749cf67044ce 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -135,6 +135,7 @@ obj-y += ufs/
obj-$(CONFIG_MEMSTICK) += memstick/
obj-$(CONFIG_INFINIBAND) += infiniband/
obj-y += firmware/
+obj-$(CONFIG_FWCTL) += fwctl/
obj-$(CONFIG_CRYPTO) += crypto/
obj-$(CONFIG_SUPERH) += sh/
obj-y += clocksource/
diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
new file mode 100644
index 00000000000000..37147a695add9a
--- /dev/null
+++ b/drivers/fwctl/Kconfig
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menuconfig FWCTL
+ tristate "fwctl device firmware access framework"
+ help
+ fwctl provides a userspace API for restricted access to communicate
+ with on-device firmware. The communication channel is intended to
+ support a wide range of lockdown compatible device behaviors including
+ manipulating device FLASH, debugging, and other activities that don't
+ fit neatly into an existing subsystem.
diff --git a/drivers/fwctl/Makefile b/drivers/fwctl/Makefile
new file mode 100644
index 00000000000000..1cad210f6ba580
--- /dev/null
+++ b/drivers/fwctl/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FWCTL) += fwctl.o
+
+fwctl-y += main.o
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
new file mode 100644
index 00000000000000..34946bdc3bf3d7
--- /dev/null
+++ b/drivers/fwctl/main.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#define pr_fmt(fmt) "fwctl: " fmt
+#include <linux/fwctl.h>
+
+#include <linux/container_of.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+enum {
+ FWCTL_MAX_DEVICES = 4096,
+};
+static_assert(FWCTL_MAX_DEVICES < (1U << MINORBITS));
+
+static dev_t fwctl_dev;
+static DEFINE_IDA(fwctl_ida);
+
+static int fwctl_fops_open(struct inode *inode, struct file *filp)
+{
+ struct fwctl_device *fwctl =
+ container_of(inode->i_cdev, struct fwctl_device, cdev);
+
+ get_device(&fwctl->dev);
+ filp->private_data = fwctl;
+ return 0;
+}
+
+static int fwctl_fops_release(struct inode *inode, struct file *filp)
+{
+ struct fwctl_device *fwctl = filp->private_data;
+
+ fwctl_put(fwctl);
+ return 0;
+}
+
+static const struct file_operations fwctl_fops = {
+ .owner = THIS_MODULE,
+ .open = fwctl_fops_open,
+ .release = fwctl_fops_release,
+};
+
+static void fwctl_device_release(struct device *device)
+{
+ struct fwctl_device *fwctl =
+ container_of(device, struct fwctl_device, dev);
+
+ ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
+ kfree(fwctl);
+}
+
+static char *fwctl_devnode(const struct device *dev, umode_t *mode)
+{
+ return kasprintf(GFP_KERNEL, "fwctl/%s", dev_name(dev));
+}
+
+static struct class fwctl_class = {
+ .name = "fwctl",
+ .dev_release = fwctl_device_release,
+ .devnode = fwctl_devnode,
+};
+
+static struct fwctl_device *
+_alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
+{
+ struct fwctl_device *fwctl __free(kfree) = kzalloc(size, GFP_KERNEL);
+ int devnum;
+
+ if (!fwctl)
+ return NULL;
+
+ fwctl->dev.class = &fwctl_class;
+ fwctl->dev.parent = parent;
+
+ devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
+ if (devnum < 0)
+ return NULL;
+ fwctl->dev.devt = fwctl_dev + devnum;
+
+ device_initialize(&fwctl->dev);
+ return_ptr(fwctl);
+}
+
+/* Drivers use the fwctl_alloc_device() wrapper */
+struct fwctl_device *_fwctl_alloc_device(struct device *parent,
+ const struct fwctl_ops *ops,
+ size_t size)
+{
+ struct fwctl_device *fwctl __free(fwctl) =
+ _alloc_device(parent, ops, size);
+
+ if (!fwctl)
+ return NULL;
+
+ cdev_init(&fwctl->cdev, &fwctl_fops);
+ /*
+ * The driver module is protected by fwctl_register/unregister(),
+ * unregister won't complete until we are done with the driver's module.
+ */
+ fwctl->cdev.owner = THIS_MODULE;
+
+ if (dev_set_name(&fwctl->dev, "fwctl%d", fwctl->dev.devt - fwctl_dev))
+ return NULL;
+
+ fwctl->ops = ops;
+ return_ptr(fwctl);
+}
+EXPORT_SYMBOL_NS_GPL(_fwctl_alloc_device, "FWCTL");
+
+/**
+ * fwctl_register - Register a new device to the subsystem
+ * @fwctl: Previously allocated fwctl_device
+ *
+ * On return the device is visible through sysfs and /dev, driver ops may be
+ * called.
+ */
+int fwctl_register(struct fwctl_device *fwctl)
+{
+ return cdev_device_add(&fwctl->cdev, &fwctl->dev);
+}
+EXPORT_SYMBOL_NS_GPL(fwctl_register, "FWCTL");
+
+/**
+ * fwctl_unregister - Unregister a device from the subsystem
+ * @fwctl: Previously allocated and registered fwctl_device
+ *
+ * Undoes fwctl_register(). On return no driver ops will be called. The
+ * caller must still call fwctl_put() to free the fwctl.
+ *
+ * The design of fwctl allows this sort of disassociation of the driver from the
+ * subsystem primarily by keeping memory allocations owned by the core subsystem.
+ * The fwctl_device and fwctl_uctx can both be freed without requiring a driver
+ * callback. This allows the module to remain unlocked while FDs are open.
+ */
+void fwctl_unregister(struct fwctl_device *fwctl)
+{
+ cdev_device_del(&fwctl->cdev, &fwctl->dev);
+}
+EXPORT_SYMBOL_NS_GPL(fwctl_unregister, "FWCTL");
+
+static int __init fwctl_init(void)
+{
+ int ret;
+
+ ret = alloc_chrdev_region(&fwctl_dev, 0, FWCTL_MAX_DEVICES, "fwctl");
+ if (ret)
+ return ret;
+
+ ret = class_register(&fwctl_class);
+ if (ret)
+ goto err_chrdev;
+ return 0;
+
+err_chrdev:
+ unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
+ return ret;
+}
+
+static void __exit fwctl_exit(void)
+{
+ class_unregister(&fwctl_class);
+ unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
+}
+
+module_init(fwctl_init);
+module_exit(fwctl_exit);
+MODULE_DESCRIPTION("fwctl device firmware access framework");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
new file mode 100644
index 00000000000000..68ac2d5ab87481
--- /dev/null
+++ b/include/linux/fwctl.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#ifndef __LINUX_FWCTL_H
+#define __LINUX_FWCTL_H
+#include <linux/device.h>
+#include <linux/cdev.h>
+#include <linux/cleanup.h>
+
+struct fwctl_device;
+struct fwctl_uctx;
+
+struct fwctl_ops {
+};
+
+/**
+ * struct fwctl_device - Per-driver registration struct
+ * @dev: The sysfs (class/fwctl/fwctlXX) device
+ *
+ * Each driver instance will have one of these structs with the driver private
+ * data following immediately after. This struct is refcounted, it is freed by
+ * calling fwctl_put().
+ */
+struct fwctl_device {
+ struct device dev;
+ /* private: */
+ struct cdev cdev;
+ const struct fwctl_ops *ops;
+};
+
+struct fwctl_device *_fwctl_alloc_device(struct device *parent,
+ const struct fwctl_ops *ops,
+ size_t size);
+/**
+ * fwctl_alloc_device - Allocate a fwctl
+ * @parent: Physical device that provides the FW interface
+ * @ops: Driver ops to register
+ * @drv_struct: 'struct driver_fwctl' that holds the struct fwctl_device
+ * @member: Name of the struct fwctl_device in @drv_struct
+ *
+ * This allocates and initializes the fwctl_device embedded in the drv_struct.
+ * Upon success the pointer must be freed via fwctl_put(). Returns a 'drv_struct
+ * \*' on success, NULL on error.
+ */
+#define fwctl_alloc_device(parent, ops, drv_struct, member) \
+ ({ \
+ static_assert(__same_type(struct fwctl_device, \
+ ((drv_struct *)NULL)->member)); \
+ static_assert(offsetof(drv_struct, member) == 0); \
+ (drv_struct *)_fwctl_alloc_device(parent, ops, \
+ sizeof(drv_struct)); \
+ })
+
+static inline struct fwctl_device *fwctl_get(struct fwctl_device *fwctl)
+{
+ get_device(&fwctl->dev);
+ return fwctl;
+}
+static inline void fwctl_put(struct fwctl_device *fwctl)
+{
+ put_device(&fwctl->dev);
+}
+DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
+
+int fwctl_register(struct fwctl_device *fwctl);
+void fwctl_unregister(struct fwctl_device *fwctl);
+
+#endif
--
2.43.0
* [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subsystem Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-07 12:59 ` Jonathan Cameron
` (2 more replies)
2025-02-07 0:13 ` [PATCH v4 03/10] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
` (11 subsequent siblings)
13 siblings, 3 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
Each file descriptor gets a chunk of per-FD, driver specific context to
which the driver can attach a device specific struct. The core code
takes care of the memory lifetime for this structure.
The ioctl dispatch and design is based on what was built for iommufd. The
ioctls have a struct which has a combined in/out behavior with a typical
'zero pad' scheme for future extension and backwards compatibility.
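The 'zero pad' rule can be modelled in plain userspace C. The sketch below mimics the semantics the patch gets from copy_struct_from_user(): a shorter, older struct is zero-extended, and a longer, newer struct is accepted only if the bytes the kernel does not know about are zero. `copy_struct_model()` and the two `cmd_v*` structs are hypothetical names for this illustration, not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical revisions of one ioctl struct; v2 appended fields. */
struct cmd_v1 {
	uint32_t size;
	uint32_t flags;
};

struct cmd_v2 {
	uint32_t size;
	uint32_t flags;
	uint32_t extra;
	uint32_t reserved;
};

/*
 * dst/ksize: the struct the "kernel" was built with.
 * src/usize: what the caller passed, with usize taken from its size field.
 */
static int copy_struct_model(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *tail = (const unsigned char *)src + size;

	assert(dst && src);

	/* Newer userspace: bytes beyond what we understand must be zero. */
	for (size_t i = 0; i < usize - size; i++)
		if (tail[i])
			return -E2BIG;

	/* Older userspace: fields it did not supply read as zero. */
	memset((unsigned char *)dst + size, 0, ksize - size);
	memcpy(dst, src, size);
	return 0;
}
```

This is why appending fields is backward compatible in both directions, as long as zero in a new field means "behave as before".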
Like iommufd, some shared logic does most of the ioctl marshalling and
compatibility work, and a table dispatches to function pointers for
each unique ioctl.
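The table dispatch can be sketched outside the kernel as well. The example below indexes a table by the ioctl's sequence number and then confirms the full command value matches before calling the handler, which is the same two-step check the patch performs. The `DEMO_*` commands and handlers are invented for the illustration, it assumes Linux's `<linux/ioctl.h>` for `_IO()` and `_IOC_NR()`, and it returns -ENOTTY directly where the kernel code returns -ENOIOCTLCMD.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/ioctl.h>

#define DEMO_TYPE 0x9A  /* same type byte the patch reserves for fwctl */
#define DEMO_CMD_BASE 0
#define DEMO_PING  _IO(DEMO_TYPE, DEMO_CMD_BASE + 0)  /* hypothetical */
#define DEMO_RESET _IO(DEMO_TYPE, DEMO_CMD_BASE + 1)  /* hypothetical */

struct demo_op {
	unsigned int ioctl_num;
	int (*execute)(void *cmd);
};

static int demo_ping(void *cmd) { (void)cmd; return 1; }
static int demo_reset(void *cmd) { (void)cmd; return 2; }

/* One slot per command, indexed by sequence number relative to the base. */
static const struct demo_op demo_ops[] = {
	[_IOC_NR(DEMO_PING) - DEMO_CMD_BASE] = { DEMO_PING, demo_ping },
	[_IOC_NR(DEMO_RESET) - DEMO_CMD_BASE] = { DEMO_RESET, demo_reset },
};

static int demo_dispatch(unsigned int cmd, void *arg)
{
	unsigned int nr = _IOC_NR(cmd);
	const struct demo_op *op;

	/* Step 1: the sequence number must land inside the table. */
	if (nr - DEMO_CMD_BASE >= sizeof(demo_ops) / sizeof(demo_ops[0]))
		return -ENOTTY;

	/* Step 2: type, direction and size bits must match as well. */
	op = &demo_ops[nr - DEMO_CMD_BASE];
	if (op->ioctl_num != cmd)
		return -ENOTTY;

	return op->execute(arg);
}
```

The second check matters because two different ioctls can share a sequence number while differing in type or encoded size.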
This approach has proven to work quite well in the iommufd and rdma
subsystems.
Allocate an ioctl number space for the subsystem.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 1 +
drivers/fwctl/main.c | 145 +++++++++++++++++-
include/linux/fwctl.h | 46 ++++++
include/uapi/fwctl/fwctl.h | 38 +++++
5 files changed, 226 insertions(+), 5 deletions(-)
create mode 100644 include/uapi/fwctl/fwctl.h
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 6d1465315df328..3410b020a9d093 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -331,6 +331,7 @@ Code Seq# Include File Comments
0x97 00-7F fs/ceph/ioctl.h Ceph file system
0x99 00-0F 537-Addinboard driver
<mailto:buk@buks.ipn.de>
+0x9A 00-0F include/uapi/fwctl/fwctl.h
0xA0 all linux/sdp/sdp.h Industrial Device Project
<mailto:kenji@bitgate.com>
0xA1 0 linux/vtpm_proxy.h TPM Emulator Proxy Driver
diff --git a/MAINTAINERS b/MAINTAINERS
index ff418a77f39e4d..5f30adbe6c8521 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9564,6 +9564,7 @@ S: Maintained
F: Documentation/userspace-api/fwctl.rst
F: drivers/fwctl/
F: include/linux/fwctl.h
+F: include/uapi/fwctl/
GALAXYCORE GC0308 CAMERA SENSOR DRIVER
M: Sebastian Reichel <sre@kernel.org>
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index 34946bdc3bf3d7..d561deaf2b86d8 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -10,6 +10,8 @@
#include <linux/module.h>
#include <linux/slab.h>
+#include <uapi/fwctl/fwctl.h>
+
enum {
FWCTL_MAX_DEVICES = 4096,
};
@@ -18,20 +20,128 @@ static_assert(FWCTL_MAX_DEVICES < (1U << MINORBITS));
static dev_t fwctl_dev;
static DEFINE_IDA(fwctl_ida);
+struct fwctl_ucmd {
+ struct fwctl_uctx *uctx;
+ void __user *ubuffer;
+ void *cmd;
+ u32 user_size;
+};
+
+/* On stack memory for the ioctl structs */
+union ucmd_buffer {
+};
+
+struct fwctl_ioctl_op {
+ unsigned int size;
+ unsigned int min_size;
+ unsigned int ioctl_num;
+ int (*execute)(struct fwctl_ucmd *ucmd);
+};
+
+#define IOCTL_OP(_ioctl, _fn, _struct, _last) \
+ [_IOC_NR(_ioctl) - FWCTL_CMD_BASE] = { \
+ .size = sizeof(_struct) + \
+ BUILD_BUG_ON_ZERO(sizeof(union ucmd_buffer) < \
+ sizeof(_struct)), \
+ .min_size = offsetofend(_struct, _last), \
+ .ioctl_num = _ioctl, \
+ .execute = _fn, \
+ }
+static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
+};
+
+static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
+ unsigned long arg)
+{
+ struct fwctl_uctx *uctx = filp->private_data;
+ const struct fwctl_ioctl_op *op;
+ struct fwctl_ucmd ucmd = {};
+ union ucmd_buffer buf;
+ unsigned int nr;
+ int ret;
+
+ nr = _IOC_NR(cmd);
+ if ((nr - FWCTL_CMD_BASE) >= ARRAY_SIZE(fwctl_ioctl_ops))
+ return -ENOIOCTLCMD;
+
+ op = &fwctl_ioctl_ops[nr - FWCTL_CMD_BASE];
+ if (op->ioctl_num != cmd)
+ return -ENOIOCTLCMD;
+
+ ucmd.uctx = uctx;
+ ucmd.cmd = &buf;
+ ucmd.ubuffer = (void __user *)arg;
+ ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
+ if (ret)
+ return ret;
+
+ if (ucmd.user_size < op->min_size)
+ return -EINVAL;
+
+ ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
+ ucmd.user_size);
+ if (ret)
+ return ret;
+
+ guard(rwsem_read)(&uctx->fwctl->registration_lock);
+ if (!uctx->fwctl->ops)
+ return -ENODEV;
+ return op->execute(&ucmd);
+}
+
static int fwctl_fops_open(struct inode *inode, struct file *filp)
{
struct fwctl_device *fwctl =
container_of(inode->i_cdev, struct fwctl_device, cdev);
+ int ret;
+
+ guard(rwsem_read)(&fwctl->registration_lock);
+ if (!fwctl->ops)
+ return -ENODEV;
+
+ struct fwctl_uctx *uctx __free(kfree) =
+ kzalloc(fwctl->ops->uctx_size, GFP_KERNEL_ACCOUNT);
+ if (!uctx)
+ return -ENOMEM;
+
+ uctx->fwctl = fwctl;
+ ret = fwctl->ops->open_uctx(uctx);
+ if (ret)
+ return ret;
+
+ scoped_guard(mutex, &fwctl->uctx_list_lock) {
+ list_add_tail(&uctx->uctx_list_entry, &fwctl->uctx_list);
+ }
get_device(&fwctl->dev);
- filp->private_data = fwctl;
+ filp->private_data = no_free_ptr(uctx);
return 0;
}
+static void fwctl_destroy_uctx(struct fwctl_uctx *uctx)
+{
+ lockdep_assert_held(&uctx->fwctl->uctx_list_lock);
+ list_del(&uctx->uctx_list_entry);
+ uctx->fwctl->ops->close_uctx(uctx);
+}
+
static int fwctl_fops_release(struct inode *inode, struct file *filp)
{
- struct fwctl_device *fwctl = filp->private_data;
+ struct fwctl_uctx *uctx = filp->private_data;
+ struct fwctl_device *fwctl = uctx->fwctl;
+ scoped_guard(rwsem_read, &fwctl->registration_lock) {
+ /*
+ * fwctl_unregister() has already removed the driver and
+ * destroyed the uctx.
+ */
+ if (fwctl->ops) {
+ guard(mutex)(&fwctl->uctx_list_lock);
+ fwctl_destroy_uctx(uctx);
+ }
+ }
+
+ kfree(uctx);
fwctl_put(fwctl);
return 0;
}
@@ -40,6 +150,7 @@ static const struct file_operations fwctl_fops = {
.owner = THIS_MODULE,
.open = fwctl_fops_open,
.release = fwctl_fops_release,
+ .unlocked_ioctl = fwctl_fops_ioctl,
};
static void fwctl_device_release(struct device *device)
@@ -48,6 +159,7 @@ static void fwctl_device_release(struct device *device)
container_of(device, struct fwctl_device, dev);
ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
+ mutex_destroy(&fwctl->uctx_list_lock);
kfree(fwctl);
}
@@ -71,14 +183,17 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
if (!fwctl)
return NULL;
- fwctl->dev.class = &fwctl_class;
- fwctl->dev.parent = parent;
-
devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
if (devnum < 0)
return NULL;
fwctl->dev.devt = fwctl_dev + devnum;
+ fwctl->dev.class = &fwctl_class;
+ fwctl->dev.parent = parent;
+ init_rwsem(&fwctl->registration_lock);
+ mutex_init(&fwctl->uctx_list_lock);
+ INIT_LIST_HEAD(&fwctl->uctx_list);
+
device_initialize(&fwctl->dev);
return_ptr(fwctl);
}
@@ -129,6 +244,10 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, "FWCTL");
* Undoes fwctl_register(). On return no driver ops will be called. The
* caller must still call fwctl_put() to free the fwctl.
*
+ * Unregister will return even if userspace still has file descriptors open.
+ * This will call ops->close_uctx() on any open FDs and after return no driver
+ * op will be called. The FDs remain open but all fops will return -ENODEV.
+ *
* The design of fwctl allows this sort of disassociation of the driver from the
* subsystem primarily by keeping memory allocations owned by the core subsystem.
* The fwctl_device and fwctl_uctx can both be freed without requiring a driver
@@ -136,7 +255,23 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, "FWCTL");
*/
void fwctl_unregister(struct fwctl_device *fwctl)
{
+ struct fwctl_uctx *uctx;
+
cdev_device_del(&fwctl->cdev, &fwctl->dev);
+
+ /* Disable and free the driver's resources for any still open FDs. */
+ guard(rwsem_write)(&fwctl->registration_lock);
+ guard(mutex)(&fwctl->uctx_list_lock);
+ while ((uctx = list_first_entry_or_null(&fwctl->uctx_list,
+ struct fwctl_uctx,
+ uctx_list_entry)))
+ fwctl_destroy_uctx(uctx);
+
+ /*
+ * The driver module may unload after this returns, the op pointer will
+ * not be valid.
+ */
+ fwctl->ops = NULL;
}
EXPORT_SYMBOL_NS_GPL(fwctl_unregister, "FWCTL");
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
index 68ac2d5ab87481..93b470efb9dbc3 100644
--- a/include/linux/fwctl.h
+++ b/include/linux/fwctl.h
@@ -11,7 +11,30 @@
struct fwctl_device;
struct fwctl_uctx;
+/**
+ * struct fwctl_ops - Driver provided operations
+ *
+ * fwctl_unregister() will wait until all executing ops are completed before it
+ * returns. Drivers should be mindful to not let their ops run for too long as
+ * it will block device hot unplug and module unloading.
+ */
struct fwctl_ops {
+ /**
+ * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
+ * bytes of this memory will be a fwctl_uctx. The driver can use the
+ * remaining bytes as its private memory.
+ */
+ size_t uctx_size;
+ /**
+ * @open_uctx: Called when a file descriptor is opened before the uctx
+ * is ever used.
+ */
+ int (*open_uctx)(struct fwctl_uctx *uctx);
+ /**
+ * @close_uctx: Called when the uctx is destroyed, usually when the FD
+ * is closed.
+ */
+ void (*close_uctx)(struct fwctl_uctx *uctx);
};
/**
@@ -26,6 +49,15 @@ struct fwctl_device {
struct device dev;
/* private: */
struct cdev cdev;
+
+ /* Protect uctx_list */
+ struct mutex uctx_list_lock;
+ struct list_head uctx_list;
+ /*
+ * Protect ops, held for write when ops becomes NULL during unregister,
+ * held for read whenever ops is loaded or an ops function is running.
+ */
+ struct rw_semaphore registration_lock;
const struct fwctl_ops *ops;
};
@@ -66,4 +98,18 @@ DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
int fwctl_register(struct fwctl_device *fwctl);
void fwctl_unregister(struct fwctl_device *fwctl);
+/**
+ * struct fwctl_uctx - Per user FD context
+ * @fwctl: fwctl instance that owns the context
+ *
+ * Every FD opened by userspace will get a unique context allocation. Any driver
+ * private data will follow immediately after.
+ */
+struct fwctl_uctx {
+ struct fwctl_device *fwctl;
+ /* private: */
+ /* Head at fwctl_device::uctx_list */
+ struct list_head uctx_list_entry;
+};
+
#endif
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
new file mode 100644
index 00000000000000..f4718a6240f281
--- /dev/null
+++ b/include/uapi/fwctl/fwctl.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
+ */
+#ifndef _UAPI_FWCTL_H
+#define _UAPI_FWCTL_H
+
+#define FWCTL_TYPE 0x9A
+
+/**
+ * DOC: General ioctl format
+ *
+ * The ioctl interface follows a general format to allow for extensibility. Each
+ * ioctl is passed a structure pointer as the argument providing the size of
+ * the structure in the first u32. The kernel checks that any structure space
+ * beyond what it understands is 0. This allows userspace to use the backward
+ * compatible portion while consistently using the newer, larger, structures.
+ *
+ * ioctls use a standard meaning for common errnos:
+ *
+ * - ENOTTY: The IOCTL number itself is not supported at all
+ * - E2BIG: The IOCTL number is supported, but the provided structure has
+ * non-zero in a part the kernel does not understand.
+ * - EOPNOTSUPP: The IOCTL number is supported, and the structure is
+ * understood, however a known field has a value the kernel does not
+ * understand or support.
+ * - EINVAL: Everything about the IOCTL was understood, but a field is not
+ * correct.
+ * - ENOMEM: Out of memory.
+ * - ENODEV: The underlying device has been hot-unplugged and the FD is
+ * orphaned.
+ *
+ * As well as additional errnos, within specific ioctls.
+ */
+enum {
+ FWCTL_CMD_BASE = 0,
+};
+
+#endif
--
2.43.0
* [PATCH v4 03/10] fwctl: FWCTL_INFO to return basic information about the device
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subsystem Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-07 13:06 ` Jonathan Cameron
2025-02-08 0:21 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 04/10] taint: Add TAINT_FWCTL Jason Gunthorpe
` (10 subsequent siblings)
13 siblings, 2 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
Userspace will need to know some details about the fwctl interface being
used to locate the correct userspace code to communicate with the
kernel. Provide a simple device_type enum indicating what the kernel
driver is.
Allow the device to provide a device specific info struct that contains
any additional information that the driver may need to provide to
userspace.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/fwctl/main.c | 51 ++++++++++++++++++++++++++++++++++++++
include/linux/fwctl.h | 12 +++++++++
include/uapi/fwctl/fwctl.h | 32 ++++++++++++++++++++++++
3 files changed, 95 insertions(+)
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index d561deaf2b86d8..4b6792f2031e86 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -27,8 +27,58 @@ struct fwctl_ucmd {
u32 user_size;
};
+static int ucmd_respond(struct fwctl_ucmd *ucmd, size_t cmd_len)
+{
+ if (copy_to_user(ucmd->ubuffer, ucmd->cmd,
+ min_t(size_t, ucmd->user_size, cmd_len)))
+ return -EFAULT;
+ return 0;
+}
+
+static int copy_to_user_zero_pad(void __user *to, const void *from,
+ size_t from_len, size_t user_len)
+{
+ size_t copy_len;
+
+ copy_len = min(from_len, user_len);
+ if (copy_to_user(to, from, copy_len))
+ return -EFAULT;
+ if (copy_len < user_len) {
+ if (clear_user(to + copy_len, user_len - copy_len))
+ return -EFAULT;
+ }
+ return 0;
+}
+
+static int fwctl_cmd_info(struct fwctl_ucmd *ucmd)
+{
+ struct fwctl_device *fwctl = ucmd->uctx->fwctl;
+ struct fwctl_info *cmd = ucmd->cmd;
+ size_t driver_info_len = 0;
+
+ if (cmd->flags)
+ return -EOPNOTSUPP;
+
+ if (cmd->device_data_len) {
+ void *driver_info __free(kfree) =
+ fwctl->ops->info(ucmd->uctx, &driver_info_len);
+ if (IS_ERR(driver_info))
+ return PTR_ERR(driver_info);
+
+ if (copy_to_user_zero_pad(u64_to_user_ptr(cmd->out_device_data),
+ driver_info, driver_info_len,
+ cmd->device_data_len))
+ return -EFAULT;
+ }
+
+ cmd->out_device_type = fwctl->ops->device_type;
+ cmd->device_data_len = driver_info_len;
+ return ucmd_respond(ucmd, sizeof(*cmd));
+}
+
/* On stack memory for the ioctl structs */
union ucmd_buffer {
+ struct fwctl_info info;
};
struct fwctl_ioctl_op {
@@ -48,6 +98,7 @@ struct fwctl_ioctl_op {
.execute = _fn, \
}
static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
+ IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
};
static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
index 93b470efb9dbc3..9b6cc8ae1aa0ca 100644
--- a/include/linux/fwctl.h
+++ b/include/linux/fwctl.h
@@ -7,6 +7,7 @@
#include <linux/device.h>
#include <linux/cdev.h>
#include <linux/cleanup.h>
+#include <uapi/fwctl/fwctl.h>
struct fwctl_device;
struct fwctl_uctx;
@@ -19,6 +20,10 @@ struct fwctl_uctx;
* it will block device hot unplug and module unloading.
*/
struct fwctl_ops {
+ /**
+ * @device_type: The driver's assigned device_type number. This is uABI.
+ */
+ enum fwctl_device_type device_type;
/**
* @uctx_size: The size of the fwctl_uctx struct to allocate. The first
* bytes of this memory will be a fwctl_uctx. The driver can use the
@@ -35,6 +40,13 @@ struct fwctl_ops {
* is closed.
*/
void (*close_uctx)(struct fwctl_uctx *uctx);
+ /**
+ * @info: Implement FWCTL_INFO. Return kmalloc() memory that is copied
+ * to out_device_data. On input, length indicates the size of the user
+ * buffer; on output, it indicates the size of the memory. The driver can
+ * ignore length on input; the core code will handle everything.
+ */
+ void *(*info)(struct fwctl_uctx *uctx, size_t *length);
};
/**
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index f4718a6240f281..ac66853200a5a8 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -4,6 +4,9 @@
#ifndef _UAPI_FWCTL_H
#define _UAPI_FWCTL_H
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
#define FWCTL_TYPE 0x9A
/**
@@ -33,6 +36,35 @@
*/
enum {
FWCTL_CMD_BASE = 0,
+ FWCTL_CMD_INFO = 0,
+ FWCTL_CMD_RPC = 1,
};
+enum fwctl_device_type {
+ FWCTL_DEVICE_TYPE_ERROR = 0,
+};
+
+/**
+ * struct fwctl_info - ioctl(FWCTL_INFO)
+ * @size: sizeof(struct fwctl_info)
+ * @flags: Must be 0
+ * @out_device_type: Returns the type of the device from enum fwctl_device_type
+ * @device_data_len: On input the length of the out_device_data memory. On
+ * output the size of the kernel's device_data which may be larger or
+ * smaller than the input. May be 0 on input.
+ * @out_device_data: Pointer to a memory of device_data_len bytes. Kernel will
+ * fill the entire memory, zeroing as required.
+ *
+ * Returns basic information about this fwctl instance, particularly what driver
+ * is being used to define the device_data format.
+ */
+struct fwctl_info {
+ __u32 size;
+ __u32 flags;
+ __u32 out_device_type;
+ __u32 device_data_len;
+ __aligned_u64 out_device_data;
+};
+#define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)
+
#endif
--
2.43.0
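
The FWCTL_INFO output semantics described above (the kernel fills the
whole user buffer, zeroing what it has no data for, and reports its own
length back in device_data_len) can be modelled in userspace. The sketch
below is an analogue of copy_to_user_zero_pad() built on memcpy()/memset();
copy_zero_pad() and info_was_truncated() are hypothetical names and no
real ioctl is issued.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace analogue of the zero-padding copy: copy as much of 'from' as
 * fits into 'to' and clear the remainder of the destination. Returns the
 * number of bytes taken from 'from'.
 */
static size_t copy_zero_pad(void *to, size_t to_len,
			    const void *from, size_t from_len)
{
	size_t copy_len = from_len < to_len ? from_len : to_len;

	memcpy(to, from, copy_len);
	if (copy_len < to_len)
		memset((unsigned char *)to + copy_len, 0, to_len - copy_len);
	return copy_len;
}

/*
 * device_data_len comes back as the kernel's length, so a caller can
 * compare it with the buffer size it passed in to detect truncation.
 */
static int info_was_truncated(size_t user_len, size_t kernel_len)
{
	return kernel_len > user_len;
}
```

A caller that passes device_data_len = 0 first and then retries with the
returned length would follow the same pattern.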
* [PATCH v4 04/10] taint: Add TAINT_FWCTL
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (2 preceding siblings ...)
2025-02-07 0:13 ` [PATCH v4 03/10] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-07 13:09 ` Jonathan Cameron
2025-02-08 0:24 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
` (9 subsequent siblings)
13 siblings, 2 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC (permalink / raw)
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
Requesting a fwctl scope of access that includes mutating device debug
data will cause the kernel to be tainted. Changing the device operation
through things in the debug scope may cause the device to malfunction in
undefined ways. This should be reflected in the TAINT flags to help any
debuggers understand that something has been done.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
Documentation/admin-guide/tainted-kernels.rst | 5 +++++
include/linux/panic.h | 3 ++-
kernel/panic.c | 1 +
tools/debugging/kernel-chktaint | 8 ++++++++
4 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index 700aa72eecb169..a0cc017e44246f 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -101,6 +101,7 @@ Bit Log Number Reason that got the kernel tainted
16 _/X 65536 auxiliary taint, defined for and used by distros
17 _/T 131072 kernel was built with the struct randomization plugin
18 _/N 262144 an in-kernel test has been run
+ 19 _/J 524288 userspace used a mutating debug operation in fwctl
=== === ====== ========================================================
Note: The character ``_`` is representing a blank in this table to make reading
@@ -184,3 +185,7 @@ More detailed explanation for tainting
build time.
18) ``N`` if an in-kernel test, such as a KUnit test, has been run.
+
+ 19) ``J`` if userspace opened /dev/fwctl/* and performed a FWCTL_RPC_DEBUG_WRITE
+ to use the device's debugging features. Device debugging features could
+ cause the device to malfunction in undefined ways.
diff --git a/include/linux/panic.h b/include/linux/panic.h
index 54d90b6c5f47bd..2494d51707ef42 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -74,7 +74,8 @@ static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
#define TAINT_AUX 16
#define TAINT_RANDSTRUCT 17
#define TAINT_TEST 18
-#define TAINT_FLAGS_COUNT 19
+#define TAINT_FWCTL 19
+#define TAINT_FLAGS_COUNT 20
#define TAINT_FLAGS_MAX ((1UL << TAINT_FLAGS_COUNT) - 1)
struct taint_flag {
diff --git a/kernel/panic.c b/kernel/panic.c
index d8635d5cecb250..0c55eec9e8744a 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -511,6 +511,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
TAINT_FLAG(AUX, 'X', ' ', true),
TAINT_FLAG(RANDSTRUCT, 'T', ' ', true),
TAINT_FLAG(TEST, 'N', ' ', true),
+ TAINT_FLAG(FWCTL, 'J', ' ', true),
};
#undef TAINT_FLAG
diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
index 279be06332be99..e7da0909d09707 100755
--- a/tools/debugging/kernel-chktaint
+++ b/tools/debugging/kernel-chktaint
@@ -204,6 +204,14 @@ else
echo " * an in-kernel test (such as a KUnit test) has been run (#18)"
fi
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+ addout " "
+else
+ addout "J"
+ echo " * fwctl's mutating debug interface was used (#19)"
+fi
+
echo "For a more detailed explanation of the various taint flags see"
echo " Documentation/admin-guide/tainted-kernels.rst in the Linux kernel sources"
echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html"
--
2.43.0
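
For tooling that wants to test the new flag without shelling out to
kernel-chktaint, the check reduces to one bit test on the taint value.
A minimal sketch, assuming the bit number 19 and the letter 'J' from this
patch; the EXAMPLE_ prefix and function names are made up for
illustration.

```c
#include <stdbool.h>

/* Bit number taken from this patch: TAINT_FWCTL is 19, shown as 'J' */
#define EXAMPLE_TAINT_FWCTL 19

/* True if the fwctl taint bit is set in a taint mask */
static bool taint_has_fwctl(unsigned long taint)
{
	return (taint >> EXAMPLE_TAINT_FWCTL) & 1UL;
}

/* Same convention as the taint table: 'J' when set, blank otherwise */
static char taint_fwctl_char(unsigned long taint)
{
	return taint_has_fwctl(taint) ? 'J' : ' ';
}
```

The mask would normally be the number read from
/proc/sys/kernel/tainted; 524288 (1 << 19) alone means only the fwctl
taint is set.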
* [PATCH v4 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (3 preceding siblings ...)
2025-02-07 0:13 ` [PATCH v4 04/10] taint: Add TAINT_FWCTL Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-08 0:28 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 06/10] fwctl: Add documentation Jason Gunthorpe
` (8 subsequent siblings)
13 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC (permalink / raw)
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
Add the FWCTL_RPC ioctl which allows a request/response RPC call to device
firmware. Drivers implementing this call must follow the security
guidelines under Documentation/userspace-api/fwctl.rst
The core code provides some memory management helpers to get the messages
copied from and back to userspace. The driver is responsible for
allocating the output message memory and delivering the message to the
device.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/fwctl/main.c | 60 +++++++++++++++++++++++++++++++++
include/linux/fwctl.h | 8 +++++
include/uapi/fwctl/fwctl.h | 68 ++++++++++++++++++++++++++++++++++++++
3 files changed, 136 insertions(+)
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index 4b6792f2031e86..a5e26944b830b5 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -8,17 +8,20 @@
#include <linux/container_of.h>
#include <linux/fs.h>
#include <linux/module.h>
+#include <linux/sizes.h>
#include <linux/slab.h>
#include <uapi/fwctl/fwctl.h>
enum {
FWCTL_MAX_DEVICES = 4096,
+ MAX_RPC_LEN = SZ_2M,
};
static_assert(FWCTL_MAX_DEVICES < (1U << MINORBITS));
static dev_t fwctl_dev;
static DEFINE_IDA(fwctl_ida);
+static unsigned long fwctl_tainted;
struct fwctl_ucmd {
struct fwctl_uctx *uctx;
@@ -76,9 +79,65 @@ static int fwctl_cmd_info(struct fwctl_ucmd *ucmd)
return ucmd_respond(ucmd, sizeof(*cmd));
}
+static int fwctl_cmd_rpc(struct fwctl_ucmd *ucmd)
+{
+ struct fwctl_device *fwctl = ucmd->uctx->fwctl;
+ struct fwctl_rpc *cmd = ucmd->cmd;
+ size_t out_len;
+
+ if (cmd->in_len > MAX_RPC_LEN || cmd->out_len > MAX_RPC_LEN)
+ return -EMSGSIZE;
+
+ switch (cmd->scope) {
+ case FWCTL_RPC_CONFIGURATION:
+ case FWCTL_RPC_DEBUG_READ_ONLY:
+ break;
+
+ case FWCTL_RPC_DEBUG_WRITE_FULL:
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+ fallthrough;
+ case FWCTL_RPC_DEBUG_WRITE:
+ if (!test_and_set_bit(0, &fwctl_tainted)) {
+ dev_warn(
+ &fwctl->dev,
+ "%s(%d): has requested full access to the physical device",
+ current->comm, task_pid_nr(current));
+ add_taint(TAINT_FWCTL, LOCKDEP_STILL_OK);
+ }
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ void *inbuf __free(kvfree) = kvzalloc(cmd->in_len, GFP_KERNEL_ACCOUNT);
+ if (!inbuf)
+ return -ENOMEM;
+ if (copy_from_user(inbuf, u64_to_user_ptr(cmd->in), cmd->in_len))
+ return -EFAULT;
+
+ out_len = cmd->out_len;
+ void *outbuf __free(kvfree) = fwctl->ops->fw_rpc(
+ ucmd->uctx, cmd->scope, inbuf, cmd->in_len, &out_len);
+ if (IS_ERR(outbuf))
+ return PTR_ERR(outbuf);
+ if (outbuf == inbuf) {
+ /* The driver can re-use inbuf as outbuf */
+ inbuf = NULL;
+ }
+
+ if (copy_to_user(u64_to_user_ptr(cmd->out), outbuf,
+ min(cmd->out_len, out_len)))
+ return -EFAULT;
+
+ cmd->out_len = out_len;
+ return ucmd_respond(ucmd, sizeof(*cmd));
+}
+
/* On stack memory for the ioctl structs */
union ucmd_buffer {
struct fwctl_info info;
+ struct fwctl_rpc rpc;
};
struct fwctl_ioctl_op {
@@ -99,6 +158,7 @@ struct fwctl_ioctl_op {
}
static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
+ IOCTL_OP(FWCTL_RPC, fwctl_cmd_rpc, struct fwctl_rpc, out),
};
static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
index 9b6cc8ae1aa0ca..c2fcaa17a2bcd5 100644
--- a/include/linux/fwctl.h
+++ b/include/linux/fwctl.h
@@ -47,6 +47,14 @@ struct fwctl_ops {
* ignore length on input, the core code will handle everything.
*/
void *(*info)(struct fwctl_uctx *uctx, size_t *length);
+ /**
+ * @fw_rpc: Implement FWCTL_RPC. Deliver rpc_in/in_len to the FW and
+ * return the response and set out_len. rpc_in can be returned as the
+ * response pointer. Otherwise the returned pointer is freed with
+ * kvfree().
+ */
+ void *(*fw_rpc)(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
+ void *rpc_in, size_t in_len, size_t *out_len);
};
/**
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index ac66853200a5a8..7a21f2f011917a 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -67,4 +67,72 @@ struct fwctl_info {
};
#define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)
+/**
+ * enum fwctl_rpc_scope - Scope of access for the RPC
+ *
+ * Refer to fwctl.rst for a more detailed discussion of these scopes.
+ */
+enum fwctl_rpc_scope {
+ /**
+ * @FWCTL_RPC_CONFIGURATION: Device configuration access scope
+ *
+ * Read/write access to device configuration. When configuration
+ * is written to the device it remains in a fully supported state.
+ */
+ FWCTL_RPC_CONFIGURATION = 0,
+ /**
+ * @FWCTL_RPC_DEBUG_READ_ONLY: Read only access to debug information
+ *
+ * Readable debug information. Debug information is compatible with
+ * kernel lockdown, and does not disclose any sensitive information. For
+ * instance exposing any encryption secrets from this information is
+ * forbidden.
+ */
+ FWCTL_RPC_DEBUG_READ_ONLY = 1,
+ /**
+ * @FWCTL_RPC_DEBUG_WRITE: Writable access to lockdown compatible debug information
+ *
+ * Allows write access to data in the device which may leave a fully
+ * supported state. This is intended to permit intensive and possibly
+ * invasive debugging. This scope will taint the kernel.
+ */
+ FWCTL_RPC_DEBUG_WRITE = 2,
+ /**
+ * @FWCTL_RPC_DEBUG_WRITE_FULL: Write access to all debug information
+ *
+ * Allows read/write access to everything. Requires CAP_SYS_RAWIO, so
+ * it is not required to follow lockdown principles. If in doubt
+ * debugging should be placed in this scope. This scope will taint the
+ * kernel.
+ */
+ FWCTL_RPC_DEBUG_WRITE_FULL = 3,
+};
+
+/**
+ * struct fwctl_rpc - ioctl(FWCTL_RPC)
+ * @size: sizeof(struct fwctl_rpc)
+ * @scope: One of enum fwctl_rpc_scope, required scope for the RPC
+ * @in_len: Length of the in memory
+ * @out_len: Length of the out memory
+ * @in: Request message in device specific format
+ * @out: Response message in device specific format
+ *
+ * Deliver a Remote Procedure Call to the device FW and return the response. The
+ * call's parameters and return are marshaled into linear buffers of memory. Any
+ * errno indicates that delivery of the RPC to the device failed. Return status
+ * originating in the device during a successful delivery must be encoded into
+ * out.
+ *
+ * The format of the buffers matches the out_device_type from FWCTL_INFO.
+ */
+struct fwctl_rpc {
+ __u32 size;
+ __u32 scope;
+ __u32 in_len;
+ __u32 out_len;
+ __aligned_u64 in;
+ __aligned_u64 out;
+};
+#define FWCTL_RPC _IO(FWCTL_TYPE, FWCTL_CMD_RPC)
+
#endif
--
2.43.0
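
The scope handling in fwctl_cmd_rpc() boils down to a small policy
table: two scopes pass untouched, DEBUG_WRITE taints, DEBUG_WRITE_FULL
needs CAP_SYS_RAWIO and taints, anything else is rejected. The userspace
model below restates that switch so the outcomes are easy to see. It is
a sketch only: the EX_ names and example_scope_check() are hypothetical,
and the capability check is replaced by a boolean argument.

```c
#include <errno.h>
#include <stdbool.h>

/* Values mirror enum fwctl_rpc_scope from this patch */
enum example_rpc_scope {
	EX_RPC_CONFIGURATION = 0,
	EX_RPC_DEBUG_READ_ONLY = 1,
	EX_RPC_DEBUG_WRITE = 2,
	EX_RPC_DEBUG_WRITE_FULL = 3,
};

/*
 * Returns 0 if the scope is accepted or a negative errno otherwise.
 * '*taints' reports whether accepting the RPC would taint the kernel.
 * 'has_rawio' stands in for capable(CAP_SYS_RAWIO).
 */
static int example_scope_check(unsigned int scope, bool has_rawio,
			       bool *taints)
{
	*taints = false;

	switch (scope) {
	case EX_RPC_CONFIGURATION:
	case EX_RPC_DEBUG_READ_ONLY:
		return 0;
	case EX_RPC_DEBUG_WRITE_FULL:
		if (!has_rawio)
			return -EPERM;
		*taints = true;
		return 0;
	case EX_RPC_DEBUG_WRITE:
		*taints = true;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```

Note that an unknown scope yields EOPNOTSUPP rather than EINVAL, which
matches the errno convention in the uAPI header: the structure is
understood, but a known field carries an unsupported value.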
* [PATCH v4 06/10] fwctl: Add documentation
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (4 preceding siblings ...)
2025-02-07 0:13 ` [PATCH v4 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-07 14:42 ` Jonathan Cameron
2025-02-08 0:40 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 07/10] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
` (7 subsequent siblings)
13 siblings, 2 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC (permalink / raw)
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
Document the purpose and rules for the fwctl subsystem.
Link in kdocs to the doc tree.
Nacked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++++++++++
Documentation/userspace-api/fwctl/index.rst | 12 +
Documentation/userspace-api/index.rst | 1 +
MAINTAINERS | 2 +-
4 files changed, 299 insertions(+), 1 deletion(-)
create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
create mode 100644 Documentation/userspace-api/fwctl/index.rst
diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst
new file mode 100644
index 00000000000000..428f6f5bb9b4f9
--- /dev/null
+++ b/Documentation/userspace-api/fwctl/fwctl.rst
@@ -0,0 +1,285 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+fwctl subsystem
+===============
+
+:Author: Jason Gunthorpe
+
+Overview
+========
+
+Modern devices contain extensive amounts of FW, and in many cases, are largely
+software-defined pieces of hardware. The evolution of this approach is largely a
+reaction to Moore's Law where a chip tape out is now highly expensive, and the
+chip design is extremely large. Replacing fixed HW logic with a flexible and
+tightly coupled FW/HW combination is an effective risk mitigation against chip
+respin. Problems in the HW design can be counteracted in device FW. This is
+especially true for devices which present a stable and backwards compatible
+interface to the operating system driver (such as NVMe).
+
+The FW layer in devices has grown to incredible size and devices frequently
+integrate clusters of fast processors to run it. For example, mlx5 devices have
+over 30MB of FW code, and big configurations operate with over 1GB of FW managed
+runtime state.
+
+The availability of such a flexible layer has created quite a variety in the
+industry where single pieces of silicon are now configurable software-defined
+devices and can operate in substantially different ways depending on the need.
+Further, we often see cases where specific sites wish to operate devices in ways
+that are highly specialized and require applications that have been tailored to
+their unique configuration.
+
+Further, devices have become multi-functional and integrated to the point they
+no longer fit neatly into the kernel's division of subsystems. Modern
+multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
+subsystems while sharing the underlying hardware using the auxiliary device
+system.
+
+All together this creates a challenge for the operating system, where devices
+have an expansive FW environment that needs robust device-specific debugging
+support, and FW-driven functionality that is not well suited to “generic”
+interfaces. fwctl seeks to allow access to the full device functionality from
+user space in the areas of debuggability, management, and first-boot/nth-boot
+provisioning.
+
+fwctl is aimed at the common device design pattern where the OS and FW
+communicate via an RPC message layer constructed with a queue or mailbox scheme.
+In this case the driver will typically have some layer to deliver RPC messages
+and collect RPC responses from device FW. The in-kernel subsystem drivers that
+operate the device for its primary purposes will use these RPCs to build their
+drivers, but devices also usually have a set of ancillary RPCs that don't really
+fit into any specific subsystem. For example, a HW RAID controller is primarily
+operated by the block layer but also comes with a set of RPCs to administer the
+construction of drives within the HW RAID.
+
+In the past when devices were more single function, individual subsystems would
+grow different approaches to solving some of these common problems. For instance
+monitoring device health, manipulating its FLASH, debugging the FW,
+provisioning, all have various unique interfaces across the kernel.
+
+fwctl's purpose is to define a common set of limited rules, described below,
+that allow user space to securely construct and execute RPCs inside device FW.
+The rules serve as an agreement between the operating system and FW on how to
+correctly design the RPC interface. As a uAPI the subsystem provides a thin
+layer of discovery and a generic uAPI to deliver the RPCs and collect the
+response. It supports a system of user space libraries and tools which will
+use this interface to control the device using the device native protocols.
+
+Scope of Action
+---------------
+
+fwctl drivers are strictly restricted to being a way to operate the device FW.
+It is not an avenue to access random kernel internals, or other operating system
+SW states.
+
+fwctl instances must operate on a well-defined device function, and the device
+should have a well-defined security model for what scope within the physical
+device the function is permitted to access. For instance, the most complex PCIe
+device today may broadly have several function-level scopes:
+
+ 1. A privileged function with full access to the on-device global state and
+ configuration
+
+ 2. Multiple hypervisor functions, each with control over itself and the
+ child functions used with VMs
+
+ 3. Multiple VM functions tightly scoped within the VM
+
+The device may create a logical parent/child relationship between these scopes.
+For instance a child VM's FW may be within the scope of the hypervisor FW. It is
+quite common in the VFIO world that the hypervisor environment has a complex
+provisioning/profiling/configuration responsibility for the function VFIO
+assigns to the VM.
+
+Further, within the function, devices often have RPC commands that fall within
+some general scopes of action (see enum fwctl_rpc_scope):
+
+ 1. Access to function & child configuration, FLASH, etc. that becomes live at a
+ function reset. Access to function & child runtime configuration that is
+ transparent or non-disruptive to any driver or VM.
+
+ 2. Read-only access to function debug information that may report on FW objects
+ in the function & child, including FW objects owned by other kernel
+ subsystems.
+
+ 3. Write access to function & child debug information strictly compatible with
+ the principles of kernel lockdown and kernel integrity protection. Triggers
+ a kernel Taint.
+
+ 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
+
+User space will provide a scope label on each RPC and the kernel must enforce the
+above CAPs and taints based on that scope. A combination of kernel and FW can
+enforce that RPCs are placed in the correct scope by user space.
+
+Denied behavior
+---------------
+
+There are many things this interface must not allow user space to do (without a
+Taint or CAP), broadly derived from the principles of kernel lockdown. Some
+examples:
+
+ 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
+ untrusted code, or otherwise compromise device or system security and
+ integrity.
+
+ 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
+ objects owned by kernel drivers.
+
+ 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
+ driver can react to the device configuration at function reset/driver load
+ time, but otherwise must not be coupled to fwctl.
+
+ 4. Operate the HW in a way that overlaps with the core purpose of another
+ primary kernel subsystem, such as read/write to LBAs, send/receive of
+ network packets, or operate an accelerator's data plane.
+
+fwctl is not a replacement for device direct access subsystems like uacce or
+VFIO.
+
+Operations exposed through fwctl's non-tainting interfaces should be fully
+sharable with other users of the device. For instance exposing a RPC through
+fwctl should never prevent a kernel subsystem from also concurrently using that
+same RPC or hardware unit down the road. In such cases fwctl will be less
+important than proper kernel subsystems that eventually emerge. Mistakes in this
+area resulting in clashes will be resolved in favour of a kernel implementation.
+
+fwctl User API
+==============
+
+.. kernel-doc:: include/uapi/fwctl/fwctl.h
+.. kernel-doc:: include/uapi/fwctl/mlx5.h
+
+sysfs Class
+-----------
+
+fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
+(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
+operates the ioctl uAPI described above.
+
+fwctl devices can be related to driver components in other subsystems through
+sysfs::
+
+ $ ls /sys/class/fwctl/fwctl0/device/infiniband/
+ ibp0s10f0
+
+ $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
+ fwctl0/
+
+ $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
+ dev device power subsystem uevent
+
+User space Community
+--------------------
+
+Drawing inspiration from nvme-cli, participating in the kernel side must come
+with a user space in a common TBD git tree, at a minimum to usefully operate the
+kernel driver. Providing such an implementation is a pre-condition to merging a
+kernel driver.
+
+The goal is to build user space community around some of the shared problems
+we all have, and ideally develop some common user space programs with some
+starting themes of:
+
+ - Device in-field debugging
+
+ - HW provisioning
+
+ - VFIO child device profiling before VM boot
+
+ - Confidential Compute topics (attestation, secure provisioning)
+
+that stretch across all subsystems in the kernel. fwupd is a great example of
+how an excellent user space experience can emerge out of kernel-side diversity.
+
+fwctl Kernel API
+================
+
+.. kernel-doc:: drivers/fwctl/main.c
+ :export:
+.. kernel-doc:: include/linux/fwctl.h
+
+fwctl Driver design
+-------------------
+
+In many cases a fwctl driver is going to be part of a larger cross-subsystem
+device possibly using the auxiliary_device mechanism. In that case several
+subsystems are going to be sharing the same device and FW interface layer so the
+device design must already provide for isolation and cooperation between kernel
+subsystems. fwctl should fit into that same model.
+
+Part of the driver should include a description of how its scope restrictions
+and security model work. The driver and FW together must ensure that RPCs
+provided by user space are mapped to the appropriate scope. If the validation is
+done in the driver then the validation can read a 'command effects' report from
+the device, or hardwire the enforcement. If the validation is done in the FW,
+then the driver should pass the fwctl_rpc_scope to the FW along with the command.
+
+The driver and FW must cooperate to ensure that either fwctl cannot allocate
+any FW resources, or any resources it does allocate are freed on FD closure. A
+driver primarily constructed around FW RPCs may find that its core PCI function
+and RPC layer belongs under fwctl with auxiliary devices connecting to other
+subsystems.
+
+Each device type must be mindful of Linux's philosophy for stable ABI. The FW
+RPC interface does not have to meet a strictly stable ABI, but it does need to
+meet an expectation that userspace tools that are deployed and in significant
+use don't needlessly break. FW upgrade and kernel upgrade should keep widely
+deployed tooling working.
+
+Development and debugging focused RPCs under more permissive scopes can have
+less stability if the tools using them are only run under exceptional
+circumstances and not for every day use of the device. Debugging tools may even
+require exact version matching as they may require something similar to DWARF
+debug information from the FW binary.
+
+Security Response
+=================
+
+The kernel remains the gatekeeper for this interface. If violations of the
+scopes, security or isolation principles are found, we have options to let
+devices fix them with a FW update, push a kernel patch to parse and block RPC
+commands or push a kernel patch to block entire firmware versions/devices.
+
+While the kernel can always directly parse and restrict RPCs, it is expected
+that the existing kernel pattern of allowing drivers to delegate validation to
+FW will be a useful design.
+
+Existing Similar Examples
+=========================
+
+The approach described in this document is not a new idea. Direct, or near
+direct device access has been offered by the kernel in different areas for
+decades. With more devices wanting to follow this design pattern it is becoming
+clear that it is not entirely well understood and, more importantly, the
+security considerations are not well defined or agreed upon.
+
+Some examples:
+
+ - HW RAID controllers. This includes RPCs to do things like compose drives into
+ a RAID volume, configure RAID parameters, monitor the HW and more.
+
+ - Baseboard managers. RPCs for configuring settings in the device and more
+
+ - NVMe vendor command capsules. nvme-cli provides access to some monitoring
+ functions that different products have defined, but more exist.
+
+ - CXL also has a NVMe-like vendor command system.
+
+ - DRM allows user space drivers to send commands to the device via kernel
+ mediation
+
+ - RDMA allows user space drivers to directly push commands to the device
+ without kernel involvement
+
+ - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.
+
+The first four are examples of areas that fwctl intends to cover. The latter three
+are examples of denied behavior as they fully overlap with the primary purpose
+of a kernel subsystem.
+
+Some key lessons learned from these past efforts are the importance of having a
+common user space project to use as a pre-condition for obtaining a kernel
+driver. Developing good community around useful software in user space is key to
+getting companies to fund participation to enable their products.
diff --git a/Documentation/userspace-api/fwctl/index.rst b/Documentation/userspace-api/fwctl/index.rst
new file mode 100644
index 00000000000000..06959fbf154743
--- /dev/null
+++ b/Documentation/userspace-api/fwctl/index.rst
@@ -0,0 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Firmware Control (FWCTL) Userspace API
+======================================
+
+A framework that defines a common set of limited rules that allows user space
+to securely construct and execute RPCs inside device firmware.
+
+.. toctree::
+ :maxdepth: 1
+
+ fwctl
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index b1395d94b3fd0a..e8e861f767fd5c 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -45,6 +45,7 @@ Devices and I/O
accelerators/ocxl
dma-buf-alloc-exchange
+ fwctl/index
gpio/index
iommufd
media/index
diff --git a/MAINTAINERS b/MAINTAINERS
index 5f30adbe6c8521..319169f7cb7e1c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9561,7 +9561,7 @@ FWCTL SUBSYSTEM
M: Jason Gunthorpe <jgg@nvidia.com>
M: Saeed Mahameed <saeedm@nvidia.com>
S: Maintained
-F: Documentation/userspace-api/fwctl.rst
+F: Documentation/userspace-api/fwctl/
F: drivers/fwctl/
F: include/linux/fwctl.h
F: include/uapi/fwctl/
--
2.43.0
* [PATCH v4 07/10] fwctl/mlx5: Support for communicating with mlx5 fw
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (5 preceding siblings ...)
2025-02-07 0:13 ` [PATCH v4 06/10] fwctl: Add documentation Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-13 13:19 ` Przemek Kitszel
2025-02-07 0:13 ` [PATCH v4 08/10] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
` (6 subsequent siblings)
13 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC (permalink / raw)
From: Saeed Mahameed <saeedm@nvidia.com>
mlx5 FW has a built in security context called UID. Each UID has a set of
permissions controlled by the kernel when it is created and every command
is tagged by the kernel with a particular UID. In general commands cannot
reach objects outside of their UID and commands cannot exceed their UID's
permissions. These restrictions are enforced by FW.
This mechanism has long been used in RDMA for the devx interface where
RDMA will send commands directly to the FW and the UID limitations
restrict those commands to an ib_device/verbs security domain. For instance,
commands that would affect other VFs or global device resources are
rejected. The model is suitable for unprivileged userspace to operate the
RDMA functionality.
The UID has been extended with a "tools resources" permission which allows
additional commands and sub-commands that are intended to match with the
scope limitations set in FWCTL. This is an alternative design to the
"command intent log" where the FW does the enforcement rather than having
the FW report the enforcement the kernel should do.
Consistent with the fwctl definitions the "tools resources" security
context is limited to the FWCTL_RPC_CONFIGURATION,
FWCTL_RPC_DEBUG_READ_ONLY, FWCTL_RPC_DEBUG_WRITE, and
FWCTL_RPC_DEBUG_WRITE_FULL security scopes.
Like RDMA devx, each opened fwctl file descriptor will get its own unique
UID.
The fwctl driver is kept simple and we reject commands that can create
objects as the UID mechanism relies on the kernel to track and destroy
objects prior to destroying the UID. Filtering into fwctl sub scopes is
done inside the driver with a switch statement. This substantially limits
what is possible to primarily query functions and a few limited set
operations.
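As a rough illustration of the switch-based filtering described above, here
is a minimal userspace sketch. It is not the driver code: the scope enum,
the opcode names and their values are all hypothetical stand-ins, and the
only property carried over from the patch is that scopes are ordered so a
">=" comparison grants everything at or below the caller's level, while
unknown or object-creating opcodes fall through to a rejection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scope levels, ordered from least to most privileged. */
enum rpc_scope {
	SCOPE_CONFIGURATION = 0,
	SCOPE_DEBUG_READ_ONLY = 1,
	SCOPE_DEBUG_WRITE = 2,
	SCOPE_DEBUG_WRITE_FULL = 3,
};

/* The ">=" checks below only make sense if the levels are ordered. */
static_assert(SCOPE_DEBUG_WRITE_FULL > SCOPE_DEBUG_WRITE,
	      "scopes must be ordered by privilege");

/* Hypothetical opcodes; real values come from the device command set. */
enum {
	OP_QUERY_CAPS = 0x100,
	OP_QUERY_COUNTERS = 0x101,
	OP_SET_DIAG_PARAMS = 0x102,
	OP_ACCESS_REG = 0x103,
	OP_CREATE_OBJECT = 0x104,
};

/* Allow-list filter: anything not named explicitly is rejected. */
static bool validate_rpc(uint16_t opcode, enum rpc_scope scope)
{
	switch (opcode) {
	case OP_QUERY_CAPS:
		return true; /* allowed at every scope */
	case OP_QUERY_COUNTERS:
		return scope >= SCOPE_DEBUG_READ_ONLY;
	case OP_SET_DIAG_PARAMS:
		return scope >= SCOPE_DEBUG_WRITE;
	case OP_ACCESS_REG:
		return scope >= SCOPE_DEBUG_WRITE_FULL;
	default:
		/* Covers OP_CREATE_OBJECT and every unknown opcode. */
		return false;
	}
}
```

The default-deny arm is the important design choice here: a new FW opcode
stays unreachable until someone adds it to the list deliberately.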
mlx5 already has a robust infrastructure for delivering RPC messages to
fw. Trivially connect fwctl's RPC mechanism to mlx5_cmd_do(). Enforce the
User Context ID in every RPC header accepted from the FD so the FW knows
the security context of the issuing ID.
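To make the "enforce the User Context ID in every RPC header" step concrete,
here is a small standalone sketch. It assumes the header layout given by
mlx5_ifc_mbox_in_hdr_bits in this patch (16 bits of opcode followed by 16
bits of uid, 0x80 bits in total) and assumes the fields are stored
big-endian on the wire. The helper names are made up for illustration; the
driver itself uses MLX5_SET()/MLX5_GET().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header size assumed from the 0x80 bits declared in the patch. */
enum { HDR_BYTES = 16 };
static_assert(HDR_BYTES == 0x80 / 8, "header is 0x80 bits");

/* Read the 16-bit opcode, assumed big-endian at byte offset 0. */
static uint16_t hdr_get_opcode(const uint8_t *msg)
{
	return (uint16_t)((msg[0] << 8) | msg[1]);
}

/* Read the 16-bit uid, assumed big-endian at byte offset 2. */
static uint16_t hdr_get_uid(const uint8_t *msg)
{
	return (uint16_t)((msg[2] << 8) | msg[3]);
}

/*
 * Overwrite whatever uid the caller supplied with the one bound to the
 * file descriptor. Returns -1 if the message is too short to hold a
 * header, 0 otherwise.
 */
static int hdr_force_uid(uint8_t *msg, size_t len, uint16_t uid)
{
	if (len < HDR_BYTES)
		return -1;
	msg[2] = (uint8_t)(uid >> 8);
	msg[3] = (uint8_t)(uid & 0xff);
	return 0;
}
```

Because the uid is always overwritten rather than checked, userspace cannot
borrow another context's identity by filling in the field itself.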
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
MAINTAINERS | 7 +
drivers/fwctl/Kconfig | 14 ++
drivers/fwctl/Makefile | 1 +
drivers/fwctl/mlx5/Makefile | 4 +
drivers/fwctl/mlx5/main.c | 340 ++++++++++++++++++++++++++++++++++++
include/uapi/fwctl/fwctl.h | 1 +
include/uapi/fwctl/mlx5.h | 36 ++++
7 files changed, 403 insertions(+)
create mode 100644 drivers/fwctl/mlx5/Makefile
create mode 100644 drivers/fwctl/mlx5/main.c
create mode 100644 include/uapi/fwctl/mlx5.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 319169f7cb7e1c..413ab79bf2f43a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9566,6 +9566,13 @@ F: drivers/fwctl/
F: include/linux/fwctl.h
F: include/uapi/fwctl/
+FWCTL MLX5 DRIVER
+M: Saeed Mahameed <saeedm@nvidia.com>
+R: Itay Avraham <itayavr@nvidia.com>
+L: linux-kernel@vger.kernel.org
+S: Maintained
+F: drivers/fwctl/mlx5/
+
GALAXYCORE GC0308 CAMERA SENSOR DRIVER
M: Sebastian Reichel <sre@kernel.org>
L: linux-media@vger.kernel.org
diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
index 37147a695add9a..f802cf5d4951e8 100644
--- a/drivers/fwctl/Kconfig
+++ b/drivers/fwctl/Kconfig
@@ -7,3 +7,17 @@ menuconfig FWCTL
support a wide range of lockdown compatible device behaviors including
manipulating device FLASH, debugging, and other activities that don't
fit neatly into an existing subsystem.
+
+if FWCTL
+config FWCTL_MLX5
+ tristate "mlx5 ConnectX control fwctl driver"
+ depends on MLX5_CORE
+ help
+ MLX5 provides an interface for the user process to access the debug and
+ configuration registers of the ConnectX hardware family
+ (NICs, PCI switches and SmartNIC SoCs).
+ This will allow configuration and debug tools to work out of the box on
+ a mainstream kernel.
+
+ If you don't know what to do here, say N.
+endif
diff --git a/drivers/fwctl/Makefile b/drivers/fwctl/Makefile
index 1cad210f6ba580..1c535f694d7fe4 100644
--- a/drivers/fwctl/Makefile
+++ b/drivers/fwctl/Makefile
@@ -1,4 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_FWCTL) += fwctl.o
+obj-$(CONFIG_FWCTL_MLX5) += mlx5/
fwctl-y += main.o
diff --git a/drivers/fwctl/mlx5/Makefile b/drivers/fwctl/mlx5/Makefile
new file mode 100644
index 00000000000000..139a23e3c7c517
--- /dev/null
+++ b/drivers/fwctl/mlx5/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FWCTL_MLX5) += mlx5_fwctl.o
+
+mlx5_fwctl-y += main.o
diff --git a/drivers/fwctl/mlx5/main.c b/drivers/fwctl/mlx5/main.c
new file mode 100644
index 00000000000000..a8564bac27b5c1
--- /dev/null
+++ b/drivers/fwctl/mlx5/main.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/fwctl.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/mlx5/device.h>
+#include <linux/mlx5/driver.h>
+#include <uapi/fwctl/mlx5.h>
+
+#define mlx5ctl_err(mcdev, format, ...) \
+ dev_err(&mcdev->fwctl.dev, format, ##__VA_ARGS__)
+
+#define mlx5ctl_dbg(mcdev, format, ...) \
+ dev_dbg(&mcdev->fwctl.dev, "PID %u: " format, current->pid, \
+ ##__VA_ARGS__)
+
+struct mlx5ctl_uctx {
+ struct fwctl_uctx uctx;
+ u32 uctx_caps;
+ u32 uctx_uid;
+};
+
+struct mlx5ctl_dev {
+ struct fwctl_device fwctl;
+ struct mlx5_core_dev *mdev;
+};
+DEFINE_FREE(mlx5ctl, struct mlx5ctl_dev *, if (_T) fwctl_put(&_T->fwctl));
+
+struct mlx5_ifc_mbox_in_hdr_bits {
+ u8 opcode[0x10];
+ u8 uid[0x10];
+
+ u8 reserved_at_20[0x10];
+ u8 op_mod[0x10];
+
+ u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_mbox_out_hdr_bits {
+ u8 status[0x8];
+ u8 reserved_at_8[0x18];
+
+ u8 syndrome[0x20];
+
+ u8 reserved_at_40[0x40];
+};
+
+enum {
+ MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES = 0x4,
+};
+
+enum {
+ MLX5_CMD_OP_QUERY_DIAGNOSTIC_PARAMS = 0x819,
+ MLX5_CMD_OP_SET_DIAGNOSTIC_PARAMS = 0x820,
+ MLX5_CMD_OP_QUERY_DIAGNOSTIC_COUNTERS = 0x821,
+ MLX5_CMD_OP_POSTPONE_CONNECTED_QP_TIMEOUT = 0xb2e,
+};
+
+static int mlx5ctl_alloc_uid(struct mlx5ctl_dev *mcdev, u32 cap)
+{
+ u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(create_uctx_in)] = {};
+ void *uctx;
+ int ret;
+ u16 uid;
+
+ uctx = MLX5_ADDR_OF(create_uctx_in, in, uctx);
+
+ mlx5ctl_dbg(mcdev, "%s: caps 0x%x\n", __func__, cap);
+ MLX5_SET(create_uctx_in, in, opcode, MLX5_CMD_OP_CREATE_UCTX);
+ MLX5_SET(uctx, uctx, cap, cap);
+
+ ret = mlx5_cmd_exec(mcdev->mdev, in, sizeof(in), out, sizeof(out));
+ if (ret)
+ return ret;
+
+ uid = MLX5_GET(create_uctx_out, out, uid);
+ mlx5ctl_dbg(mcdev, "allocated uid %u with caps 0x%x\n", uid, cap);
+ return uid;
+}
+
+static void mlx5ctl_release_uid(struct mlx5ctl_dev *mcdev, u16 uid)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_uctx_in)] = {};
+ struct mlx5_core_dev *mdev = mcdev->mdev;
+ int ret;
+
+ MLX5_SET(destroy_uctx_in, in, opcode, MLX5_CMD_OP_DESTROY_UCTX);
+ MLX5_SET(destroy_uctx_in, in, uid, uid);
+
+ ret = mlx5_cmd_exec_in(mdev, destroy_uctx, in);
+ mlx5ctl_dbg(mcdev, "released uid %u %pe\n", uid, ERR_PTR(ret));
+}
+
+static int mlx5ctl_open_uctx(struct fwctl_uctx *uctx)
+{
+ struct mlx5ctl_uctx *mfd =
+ container_of(uctx, struct mlx5ctl_uctx, uctx);
+ struct mlx5ctl_dev *mcdev =
+ container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
+ int uid;
+
+ /*
+ * New FW supports the TOOLS_RESOURCES uid security label
+ * which allows commands to manipulate the global device state.
+ * Otherwise only basic existing RDMA devx privileges are allowed.
+ */
+ if (MLX5_CAP_GEN(mcdev->mdev, uctx_cap) &
+ MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES)
+ mfd->uctx_caps |= MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES;
+
+ uid = mlx5ctl_alloc_uid(mcdev, mfd->uctx_caps);
+ if (uid < 0)
+ return uid;
+
+ mfd->uctx_uid = uid;
+ return 0;
+}
+
+static void mlx5ctl_close_uctx(struct fwctl_uctx *uctx)
+{
+ struct mlx5ctl_dev *mcdev =
+ container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
+ struct mlx5ctl_uctx *mfd =
+ container_of(uctx, struct mlx5ctl_uctx, uctx);
+
+ mlx5ctl_release_uid(mcdev, mfd->uctx_uid);
+}
+
+static void *mlx5ctl_info(struct fwctl_uctx *uctx, size_t *length)
+{
+ struct mlx5ctl_uctx *mfd =
+ container_of(uctx, struct mlx5ctl_uctx, uctx);
+ struct fwctl_info_mlx5 *info;
+
+ info = kzalloc(sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return ERR_PTR(-ENOMEM);
+
+ info->uid = mfd->uctx_uid;
+ info->uctx_caps = mfd->uctx_caps;
+ *length = sizeof(*info);
+ return info;
+}
+
+static bool mlx5ctl_validate_rpc(const void *in, enum fwctl_rpc_scope scope)
+{
+ u16 opcode = MLX5_GET(mbox_in_hdr, in, opcode);
+
+ /*
+ * Currently the driver can't keep track of commands that allocate
+ * objects in the FW, these commands are safe from a security
+ * perspective but nothing will free the memory when the FD is closed.
+ * For now permit only query commands and set commands that don't alter
+ * objects. Also the caps for the scope have not been defined yet,
+ * filter commands manually for now.
+ */
+ switch (opcode) {
+ case MLX5_CMD_OP_POSTPONE_CONNECTED_QP_TIMEOUT:
+ case MLX5_CMD_OP_QUERY_ADAPTER:
+ case MLX5_CMD_OP_QUERY_ESW_FUNCTIONS:
+ case MLX5_CMD_OP_QUERY_HCA_CAP:
+ case MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT:
+ case MLX5_CMD_OP_QUERY_ROCE_ADDRESS:
+ case MLX5_CMD_OPCODE_QUERY_VUID:
+ /*
+ * FW limits SET_HCA_CAP on the tools UID to only the other function
+ * mode which is used for function pre-configuration
+ */
+ case MLX5_CMD_OP_SET_HCA_CAP:
+ return true; /* scope >= FWCTL_RPC_CONFIGURATION; */
+
+ case MLX5_CMD_OP_QUERY_CONG_PARAMS:
+ case MLX5_CMD_OP_QUERY_CONG_STATISTICS:
+ case MLX5_CMD_OP_QUERY_CONG_STATUS:
+ case MLX5_CMD_OP_QUERY_CQ:
+ case MLX5_CMD_OP_QUERY_DCT:
+ case MLX5_CMD_OP_QUERY_DIAGNOSTIC_COUNTERS:
+ case MLX5_CMD_OP_QUERY_DIAGNOSTIC_PARAMS:
+ case MLX5_CMD_OP_QUERY_EQ:
+ case MLX5_CMD_OP_QUERY_ESW_VPORT_CONTEXT:
+ case MLX5_CMD_OP_QUERY_FLOW_COUNTER:
+ case MLX5_CMD_OP_QUERY_FLOW_GROUP:
+ case MLX5_CMD_OP_QUERY_FLOW_TABLE_ENTRY:
+ case MLX5_CMD_OP_QUERY_FLOW_TABLE:
+ case MLX5_CMD_OP_QUERY_GENERAL_OBJECT:
+ case MLX5_CMD_OP_QUERY_ISSI:
+ case MLX5_CMD_OP_QUERY_L2_TABLE_ENTRY:
+ case MLX5_CMD_OP_QUERY_LAG:
+ case MLX5_CMD_OP_QUERY_MAD_DEMUX:
+ case MLX5_CMD_OP_QUERY_MKEY:
+ case MLX5_CMD_OP_QUERY_MODIFY_HEADER_CONTEXT:
+ case MLX5_CMD_OP_QUERY_PACKET_REFORMAT_CONTEXT:
+ case MLX5_CMD_OP_QUERY_PAGES:
+ case MLX5_CMD_OP_QUERY_Q_COUNTER:
+ case MLX5_CMD_OP_QUERY_QP:
+ case MLX5_CMD_OP_QUERY_RMP:
+ case MLX5_CMD_OP_QUERY_RQ:
+ case MLX5_CMD_OP_QUERY_RQT:
+ case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
+ case MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS:
+ case MLX5_CMD_OP_QUERY_SQ:
+ case MLX5_CMD_OP_QUERY_SRQ:
+ case MLX5_CMD_OP_QUERY_TIR:
+ case MLX5_CMD_OP_QUERY_TIS:
+ case MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE:
+ case MLX5_CMD_OP_QUERY_VNIC_ENV:
+ case MLX5_CMD_OP_QUERY_VPORT_COUNTER:
+ case MLX5_CMD_OP_QUERY_VPORT_STATE:
+ case MLX5_CMD_OP_QUERY_WOL_ROL:
+ case MLX5_CMD_OP_QUERY_XRC_SRQ:
+ case MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY:
+ case MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS:
+ case MLX5_CMD_OP_QUERY_XRQ:
+ return scope >= FWCTL_RPC_DEBUG_READ_ONLY;
+
+ case MLX5_CMD_OP_SET_DIAGNOSTIC_PARAMS:
+ return scope >= FWCTL_RPC_DEBUG_WRITE;
+
+ case MLX5_CMD_OP_ACCESS_REG:
+ return scope >= FWCTL_RPC_DEBUG_WRITE_FULL;
+ default:
+ return false;
+ }
+}
+
+static void *mlx5ctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
+ void *rpc_in, size_t in_len, size_t *out_len)
+{
+ struct mlx5ctl_dev *mcdev =
+ container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
+ struct mlx5ctl_uctx *mfd =
+ container_of(uctx, struct mlx5ctl_uctx, uctx);
+ void *rpc_out;
+ int ret;
+
+ if (in_len < MLX5_ST_SZ_BYTES(mbox_in_hdr) ||
+ *out_len < MLX5_ST_SZ_BYTES(mbox_out_hdr))
+ return ERR_PTR(-EMSGSIZE);
+
+ mlx5ctl_dbg(mcdev, "[UID %d] cmdif: opcode 0x%x inlen %zu outlen %zu\n",
+ mfd->uctx_uid, MLX5_GET(mbox_in_hdr, rpc_in, opcode),
+ in_len, *out_len);
+
+ if (!mlx5ctl_validate_rpc(rpc_in, scope))
+ return ERR_PTR(-EBADMSG);
+
+ /*
+ * mlx5_cmd_do() copies the input message to its own buffer before
+ * executing it, so we can reuse the allocation for the output.
+ */
+ if (*out_len <= in_len) {
+ rpc_out = rpc_in;
+ } else {
+ rpc_out = kvzalloc(*out_len, GFP_KERNEL);
+ if (!rpc_out)
+ return ERR_PTR(-ENOMEM);
+ }
+
+ /* Enforce the user context for the command */
+ MLX5_SET(mbox_in_hdr, rpc_in, uid, mfd->uctx_uid);
+ ret = mlx5_cmd_do(mcdev->mdev, rpc_in, in_len, rpc_out, *out_len);
+
+ mlx5ctl_dbg(mcdev,
+ "[UID %d] cmdif: opcode 0x%x status 0x%x retval %pe\n",
+ mfd->uctx_uid, MLX5_GET(mbox_in_hdr, rpc_in, opcode),
+ MLX5_GET(mbox_out_hdr, rpc_out, status), ERR_PTR(ret));
+
+ /*
+ * -EREMOTEIO means execution succeeded and the out is valid,
+ * but an error code was returned inside out. Everything else
+ * means the RPC did not make it to the device.
+ */
+ if (ret && ret != -EREMOTEIO) {
+ if (rpc_out != rpc_in)
+ kvfree(rpc_out);
+ return ERR_PTR(ret);
+ }
+ return rpc_out;
+}
+
+static const struct fwctl_ops mlx5ctl_ops = {
+ .device_type = FWCTL_DEVICE_TYPE_MLX5,
+ .uctx_size = sizeof(struct mlx5ctl_uctx),
+ .open_uctx = mlx5ctl_open_uctx,
+ .close_uctx = mlx5ctl_close_uctx,
+ .info = mlx5ctl_info,
+ .fw_rpc = mlx5ctl_fw_rpc,
+};
+
+static int mlx5ctl_probe(struct auxiliary_device *adev,
+ const struct auxiliary_device_id *id)
+
+{
+ struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
+ struct mlx5_core_dev *mdev = madev->mdev;
+ struct mlx5ctl_dev *mcdev __free(mlx5ctl) = fwctl_alloc_device(
+ &mdev->pdev->dev, &mlx5ctl_ops, struct mlx5ctl_dev, fwctl);
+ int ret;
+
+ if (!mcdev)
+ return -ENOMEM;
+
+ mcdev->mdev = mdev;
+
+ ret = fwctl_register(&mcdev->fwctl);
+ if (ret)
+ return ret;
+ auxiliary_set_drvdata(adev, no_free_ptr(mcdev));
+ return 0;
+}
+
+static void mlx5ctl_remove(struct auxiliary_device *adev)
+{
+ struct mlx5ctl_dev *mcdev = auxiliary_get_drvdata(adev);
+
+ fwctl_unregister(&mcdev->fwctl);
+ fwctl_put(&mcdev->fwctl);
+}
+
+static const struct auxiliary_device_id mlx5ctl_id_table[] = {
+ {.name = MLX5_ADEV_NAME ".fwctl",},
+ {}
+};
+MODULE_DEVICE_TABLE(auxiliary, mlx5ctl_id_table);
+
+static struct auxiliary_driver mlx5ctl_driver = {
+ .name = "mlx5_fwctl",
+ .probe = mlx5ctl_probe,
+ .remove = mlx5ctl_remove,
+ .id_table = mlx5ctl_id_table,
+};
+
+module_auxiliary_driver(mlx5ctl_driver);
+
+MODULE_IMPORT_NS("FWCTL");
+MODULE_DESCRIPTION("mlx5 ConnectX fwctl driver");
+MODULE_AUTHOR("Saeed Mahameed <saeedm@nvidia.com>");
+MODULE_LICENSE("Dual BSD/GPL");
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index 7a21f2f011917a..0790b8291ee1bd 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -42,6 +42,7 @@ enum {
enum fwctl_device_type {
FWCTL_DEVICE_TYPE_ERROR = 0,
+ FWCTL_DEVICE_TYPE_MLX5 = 1,
};
/**
diff --git a/include/uapi/fwctl/mlx5.h b/include/uapi/fwctl/mlx5.h
new file mode 100644
index 00000000000000..bcb4602ffdeee4
--- /dev/null
+++ b/include/uapi/fwctl/mlx5.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * These are definitions for the command interface for mlx5 HW. mlx5 FW has a
+ * User Context mechanism which allows the FW to understand a security scope.
+ * FWCTL binds each FD to a FW user context and then places the User Context ID
+ * (UID) in each command header. The created User Context has a capability set
+ * that is appropriate for FWCTL's security model.
+ *
+ * Command formation should use a copy of the structs in mlx5_ifc.h following
+ * the Programmers Reference Manual. An open release is available here:
+ *
+ * https://network.nvidia.com/files/doc-2020/ethernet-adapters-programming-manual.pdf
+ *
+ * The device_type for this file is FWCTL_DEVICE_TYPE_MLX5.
+ */
+#ifndef _UAPI_FWCTL_MLX5_H
+#define _UAPI_FWCTL_MLX5_H
+
+#include <linux/types.h>
+
+/**
+ * struct fwctl_info_mlx5 - ioctl(FWCTL_INFO) out_device_data
+ * @uid: The FW UID this FD is bound to. Each command header will force
+ * this value.
+ * @uctx_caps: The FW capabilities that are enabled for the uid.
+ *
+ * Return basic information about the FW interface available.
+ */
+struct fwctl_info_mlx5 {
+ __u32 uid;
+ __u32 uctx_caps;
+};
+
+#endif
--
2.43.0
* [PATCH v4 08/10] mlx5: Create an auxiliary device for fwctl_mlx5
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (6 preceding siblings ...)
2025-02-07 0:13 ` [PATCH v4 07/10] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 09/10] fwctl/bnxt: Support communicating with bnxt fw Jason Gunthorpe
` (5 subsequent siblings)
13 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC (permalink / raw)
From: Saeed Mahameed <saeedm@nvidia.com>
If the device supports User Context then it can support fwctl. Create an
auxiliary device to allow fwctl to bind to it.
This creates a sysfs entry like:
$ ls /sys/devices/pci0000:00/0000:00:0a.0/mlx5_core.fwctl.0/driver -l
lrwxrwxrwx 1 root root 0 Apr 25 19:46 /sys/devices/pci0000:00/0000:00:0a.0/mlx5_core.fwctl.0/driver -> ../../../../bus/auxiliary/drivers/mlx5_fwctl.mlx5_fwctl
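The gating added here can be pictured with a small standalone sketch: a
table indexed by protocol, each slot carrying a name suffix and an
is_supported predicate, with fwctl offered only when the device reports a
user context capability and is not an SF. The struct, field and function
names below are illustrative stand-ins, not the mlx5 core definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the few device properties consulted. */
struct sketch_dev {
	unsigned int uctx_cap; /* non-zero if FW supports user contexts */
	bool is_sf;            /* true for a sub-function */
	bool has_dpll;         /* unrelated capability, for contrast */
};

enum { PROTO_DPLL, PROTO_FWCTL, PROTO_MAX };

static bool dpll_supported(const struct sketch_dev *dev)
{
	return dev->has_dpll;
}

/* Mirrors the rule in this patch: user context support, and not an SF. */
static bool fwctl_supported(const struct sketch_dev *dev)
{
	return dev->uctx_cap && !dev->is_sf;
}

static const struct sketch_adev {
	const char *suffix;
	bool (*is_supported)(const struct sketch_dev *dev);
} sketch_adevs[PROTO_MAX] = {
	[PROTO_DPLL] = { .suffix = "dpll", .is_supported = dpll_supported },
	[PROTO_FWCTL] = { .suffix = "fwctl", .is_supported = fwctl_supported },
};

/* Collect the suffixes of every auxiliary device this dev would get. */
static size_t collect_supported(const struct sketch_dev *dev,
				const char *out[PROTO_MAX])
{
	size_t n = 0;

	assert(dev != NULL);
	for (size_t i = 0; i < PROTO_MAX; i++)
		if (sketch_adevs[i].is_supported(dev))
			out[n++] = sketch_adevs[i].suffix;
	return n;
}
```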
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index 9a79674d27f15a..891bbbbfbbf1a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -228,8 +228,15 @@ enum {
MLX5_INTERFACE_PROTOCOL_VNET,
MLX5_INTERFACE_PROTOCOL_DPLL,
+ MLX5_INTERFACE_PROTOCOL_FWCTL,
};
+static bool is_fwctl_supported(struct mlx5_core_dev *dev)
+{
+ /* fwctl is most useful on PFs, prevent fwctl on SFs for now */
+ return MLX5_CAP_GEN(dev, uctx_cap) && !mlx5_core_is_sf(dev);
+}
+
static const struct mlx5_adev_device {
const char *suffix;
bool (*is_supported)(struct mlx5_core_dev *dev);
@@ -252,6 +259,8 @@ static const struct mlx5_adev_device {
.is_supported = &is_mp_supported },
[MLX5_INTERFACE_PROTOCOL_DPLL] = { .suffix = "dpll",
.is_supported = &is_dpll_supported },
+ [MLX5_INTERFACE_PROTOCOL_FWCTL] = { .suffix = "fwctl",
+ .is_supported = &is_fwctl_supported },
};
int mlx5_adev_idx_alloc(void)
--
2.43.0
* [PATCH v4 09/10] fwctl/bnxt: Support communicating with bnxt fw
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (7 preceding siblings ...)
2025-02-07 0:13 ` [PATCH v4 08/10] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-07 14:59 ` Jonathan Cameron
2025-02-07 0:13 ` [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt Jason Gunthorpe
` (4 subsequent siblings)
13 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC (permalink / raw)
From: Andy Gospodarek <gospo@broadcom.com>
This patch adds basic support for the fwctl infrastructure. With the
appropriate tool, the most basic RPC to the bnxt_en firmware can be
called.
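For readers unfamiliar with the HWRM framing, the allow-list check amounts
to reading the little-endian req_type from the first two bytes of the
message and comparing it against the permitted set. Below is a minimal
userspace sketch of that idea. The 16-byte header size follows the struct
input layout quoted in the driver comment; the HWRM_VER_GET value of 0x0 is
an assumption here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed request type value; check it against the bnxt HSI header. */
#define SKETCH_HWRM_VER_GET 0x0

/* struct input: four __le16 fields followed by one __le64. */
enum { HWRM_INPUT_HDR_BYTES = 16 };
static_assert(HWRM_INPUT_HDR_BYTES == 4 * 2 + 8, "struct input is 16 bytes");

/* Decode a little-endian 16-bit value regardless of host endianness. */
static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Accept only messages long enough for a header and on the allow-list. */
static bool hwrm_req_allowed(const uint8_t *msg, size_t len)
{
	if (len < HWRM_INPUT_HDR_BYTES)
		return false;

	switch (read_le16(msg)) {
	case SKETCH_HWRM_VER_GET:
		return true;
	default:
		return false;
	}
}
```

Decoding the field byte by byte keeps the comparison correct on big-endian
hosts, which matters once request types other than zero are allowed.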
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/fwctl/Kconfig | 9 ++
drivers/fwctl/Makefile | 1 +
drivers/fwctl/bnxt/Makefile | 4 +
drivers/fwctl/bnxt/bnxt.c | 167 ++++++++++++++++++++++++++++++++++++
include/uapi/fwctl/bnxt.h | 27 ++++++
include/uapi/fwctl/fwctl.h | 1 +
6 files changed, 209 insertions(+)
create mode 100644 drivers/fwctl/bnxt/Makefile
create mode 100644 drivers/fwctl/bnxt/bnxt.c
create mode 100644 include/uapi/fwctl/bnxt.h
diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
index f802cf5d4951e8..0a542a247303d7 100644
--- a/drivers/fwctl/Kconfig
+++ b/drivers/fwctl/Kconfig
@@ -9,6 +9,15 @@ menuconfig FWCTL
fit neatly into an existing subsystem.
if FWCTL
+config FWCTL_BNXT
+ tristate "bnxt control fwctl driver"
+ depends on BNXT
+ help
+ BNXT provides an interface for the user process to access the debug and
+ configuration registers of the Broadcom NIC hardware family.
+
+ If you don't know what to do here, say N.
+
config FWCTL_MLX5
tristate "mlx5 ConnectX control fwctl driver"
depends on MLX5_CORE
diff --git a/drivers/fwctl/Makefile b/drivers/fwctl/Makefile
index 1c535f694d7fe4..5fb289243286ae 100644
--- a/drivers/fwctl/Makefile
+++ b/drivers/fwctl/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_FWCTL) += fwctl.o
+obj-$(CONFIG_FWCTL_BNXT) += bnxt/
obj-$(CONFIG_FWCTL_MLX5) += mlx5/
fwctl-y += main.o
diff --git a/drivers/fwctl/bnxt/Makefile b/drivers/fwctl/bnxt/Makefile
new file mode 100644
index 00000000000000..57c76e0e0c9ca7
--- /dev/null
+++ b/drivers/fwctl/bnxt/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FWCTL_BNXT) += bnxt_fwctl.o
+
+bnxt_fwctl-y += bnxt.o
diff --git a/drivers/fwctl/bnxt/bnxt.c b/drivers/fwctl/bnxt/bnxt.c
new file mode 100644
index 00000000000000..d2b9a64a1402bf
--- /dev/null
+++ b/drivers/fwctl/bnxt/bnxt.c
@@ -0,0 +1,167 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, Broadcom Corporation
+ */
+#include <linux/fwctl.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/slab.h>
+#include <linux/pci.h>
+#include <uapi/fwctl/bnxt.h>
+
+/* FIXME need a include/linux header for the aux related definitions */
+#include <../../../drivers/net/ethernet/broadcom/bnxt/bnxt.h>
+#include <../../../drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h>
+
+struct bnxtctl_uctx {
+ struct fwctl_uctx uctx;
+ u32 uctx_caps;
+};
+
+struct bnxtctl_dev {
+ struct fwctl_device fwctl;
+ struct bnxt_aux_priv *aux_priv;
+};
+
+DEFINE_FREE(bnxtctl, struct bnxtctl_dev *, if (_T) fwctl_put(&_T->fwctl))
+
+static int bnxtctl_open_uctx(struct fwctl_uctx *uctx)
+{
+ struct bnxtctl_uctx *bnxtctl_uctx =
+ container_of(uctx, struct bnxtctl_uctx, uctx);
+
+ bnxtctl_uctx->uctx_caps = BIT(FWCTL_BNXT_QUERY_COMMANDS) |
+ BIT(FWCTL_BNXT_SEND_COMMAND);
+ return 0;
+}
+
+static void bnxtctl_close_uctx(struct fwctl_uctx *uctx)
+{
+}
+
+static void *bnxtctl_info(struct fwctl_uctx *uctx, size_t *length)
+{
+ struct bnxtctl_uctx *bnxtctl_uctx =
+ container_of(uctx, struct bnxtctl_uctx, uctx);
+ struct fwctl_info_bnxt *info;
+
+ info = kzalloc(sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return ERR_PTR(-ENOMEM);
+
+ info->uctx_caps = bnxtctl_uctx->uctx_caps;
+
+ *length = sizeof(*info);
+ return info;
+}
+
+/*
+ * bnxt_fw_msg->msg has the whole command
+ * the start of message is of type struct input
+ * struct input {
+ * __le16 req_type;
+ * __le16 cmpl_ring;
+ * __le16 seq_id;
+ * __le16 target_id;
+ * __le64 resp_addr;
+ * };
+ * so the hwrm op should be ((struct input *)hwrm_in->msg)->req_type
+ */
+static bool bnxtctl_validate_rpc(struct fwctl_uctx *uctx,
+ struct bnxt_fw_msg *hwrm_in)
+{
+ struct input *req = (struct input *)hwrm_in->msg;
+
+ switch (le16_to_cpu(req->req_type)) {
+ case HWRM_VER_GET:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static void *bnxtctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
+ void *in, size_t in_len, size_t *out_len)
+{
+ struct bnxtctl_dev *bnxtctl =
+ container_of(uctx->fwctl, struct bnxtctl_dev, fwctl);
+ struct bnxt_aux_priv *bnxt_aux_priv = bnxtctl->aux_priv;
+ /* FIXME: Check me */
+ struct bnxt_fw_msg rpc_in = {
+ // FIXME: does bnxt_send_msg() copy?
+ .msg = in,
+ .msg_len = in_len,
+ .resp = in,
+ // FIXME: Dynamic memory for out_len
+ .resp_max_len = in_len,
+ };
+ int rc;
+
+ if (!bnxtctl_validate_rpc(uctx, &rpc_in))
+ return ERR_PTR(-EPERM);
+
+ rc = bnxt_send_msg(bnxt_aux_priv->edev, &rpc_in);
+ if (rc)
+ return ERR_PTR(rc);
+ return in;
+}
+
+static const struct fwctl_ops bnxtctl_ops = {
+ .device_type = FWCTL_DEVICE_TYPE_BNXT,
+ .uctx_size = sizeof(struct bnxtctl_uctx),
+ .open_uctx = bnxtctl_open_uctx,
+ .close_uctx = bnxtctl_close_uctx,
+ .info = bnxtctl_info,
+ .fw_rpc = bnxtctl_fw_rpc,
+};
+
+static int bnxtctl_probe(struct auxiliary_device *adev,
+ const struct auxiliary_device_id *id)
+{
+ struct bnxt_aux_priv *aux_priv =
+ container_of(adev, struct bnxt_aux_priv, aux_dev);
+ struct bnxtctl_dev *bnxtctl __free(bnxtctl) =
+ fwctl_alloc_device(&aux_priv->edev->pdev->dev, &bnxtctl_ops,
+ struct bnxtctl_dev, fwctl);
+ int rc;
+
+ if (!bnxtctl)
+ return -ENOMEM;
+
+ bnxtctl->aux_priv = aux_priv;
+
+ rc = fwctl_register(&bnxtctl->fwctl);
+ if (rc)
+ return rc;
+
+ auxiliary_set_drvdata(adev, no_free_ptr(bnxtctl));
+ return 0;
+}
+
+static void bnxtctl_remove(struct auxiliary_device *adev)
+{
+ struct bnxtctl_dev *ctldev = auxiliary_get_drvdata(adev);
+
+ fwctl_unregister(&ctldev->fwctl);
+ fwctl_put(&ctldev->fwctl);
+}
+
+static const struct auxiliary_device_id bnxtctl_id_table[] = {
+ { .name = "bnxt_en.fwctl", },
+ {},
+};
+MODULE_DEVICE_TABLE(auxiliary, bnxtctl_id_table);
+
+static struct auxiliary_driver bnxtctl_driver = {
+ .name = "bnxt_fwctl",
+ .probe = bnxtctl_probe,
+ .remove = bnxtctl_remove,
+ .id_table = bnxtctl_id_table,
+};
+
+module_auxiliary_driver(bnxtctl_driver);
+
+MODULE_IMPORT_NS("BNXT");
+MODULE_IMPORT_NS("FWCTL");
+MODULE_DESCRIPTION("BNXT fwctl driver");
+MODULE_AUTHOR("Broadcom Corporation");
+MODULE_LICENSE("GPL");
diff --git a/include/uapi/fwctl/bnxt.h b/include/uapi/fwctl/bnxt.h
new file mode 100644
index 00000000000000..ea47fdbbf6ea3e
--- /dev/null
+++ b/include/uapi/fwctl/bnxt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2024, Broadcom Corporation
+ *
+ */
+#ifndef _UAPI_FWCTL_BNXT_H_
+#define _UAPI_FWCTL_BNXT_H_
+
+#include <linux/types.h>
+
+enum fwctl_bnxt_commands {
+ FWCTL_BNXT_QUERY_COMMANDS = 0,
+ FWCTL_BNXT_SEND_COMMAND,
+};
+
+/**
+ * struct fwctl_info_bnxt - ioctl(FWCTL_INFO) out_device_data
+ * @uctx_caps: The command capabilities the driver accepts.
+ *
+ * Return basic information about the FW interface available.
+ */
+struct fwctl_info_bnxt {
+ __u32 uctx_caps;
+ __u32 reserved;
+};
+
+#endif
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index 0790b8291ee1bd..518f054f02d2d8 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -43,6 +43,7 @@ enum {
enum fwctl_device_type {
FWCTL_DEVICE_TYPE_ERROR = 0,
FWCTL_DEVICE_TYPE_MLX5 = 1,
+ FWCTL_DEVICE_TYPE_BNXT = 3,
};
/**
--
2.43.0
* [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (8 preceding siblings ...)
2025-02-07 0:13 ` [PATCH v4 09/10] fwctl/bnxt: Support communicating with bnxt fw Jason Gunthorpe
@ 2025-02-07 0:13 ` Jason Gunthorpe
2025-02-07 0:44 ` Jakub Kicinski
2025-02-07 21:41 ` [PATCH v4 00/10] Introduce fwctl subystem Dan Williams
` (3 subsequent siblings)
13 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:13 UTC (permalink / raw)
From: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 +
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 126 +++++++++++++++++-
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 4 +
4 files changed, 132 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7b8b5b39c7bbe8..bf33e7f26b1fd2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -16291,6 +16291,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
bnxt_init_ring_params(bp);
bnxt_set_ring_params(bp);
bnxt_rdma_aux_device_init(bp);
+ bnxt_fwctl_aux_device_init(bp);
+
rc = bnxt_set_dflt_rings(bp, true);
if (rc) {
if (BNXT_VF(bp) && rc == -ENODEV) {
@@ -16353,6 +16355,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
bnxt_dl_fw_reporters_create(bp);
bnxt_rdma_aux_device_add(bp);
+ bnxt_fwctl_aux_device_add(bp);
bnxt_print_device_info(bp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 2373f423a523ec..1951fdda8820d0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -30,6 +30,7 @@
#include <net/xdp.h>
#include <linux/dim.h>
#include <linux/io-64-nonatomic-lo-hi.h>
+#include "bnxt_hsi.h"
#ifdef CONFIG_TEE_BNXT_FW
#include <linux/firmware/broadcom/tee_bnxt_fw.h>
#endif
@@ -2326,7 +2327,9 @@ struct bnxt {
(BNXT_CHIP_P3(bp) || BNXT_CHIP_P4(bp) || BNXT_CHIP_P5(bp))
struct bnxt_aux_priv *aux_priv;
+ struct bnxt_aux_priv *aux_priv_fwctl;
struct bnxt_en_dev *edev;
+ struct bnxt_en_dev *edev_fwctl;
struct bnxt_napi **bnapi;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index e4a7f37036edba..7e39d9695af339 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -26,7 +26,8 @@
#include "bnxt_hwrm.h"
#include "bnxt_ulp.h"
-static DEFINE_IDA(bnxt_aux_dev_ids);
+static DEFINE_IDA(bnxt_rdma_aux_dev_ids);
+static DEFINE_IDA(bnxt_fwctl_aux_dev_ids);
static void bnxt_fill_msix_vecs(struct bnxt *bp, struct bnxt_msix_entry *ent)
{
@@ -413,7 +414,7 @@ static void bnxt_aux_dev_release(struct device *dev)
container_of(dev, struct bnxt_aux_priv, aux_dev.dev);
struct bnxt *bp = netdev_priv(aux_priv->edev->net);
- ida_free(&bnxt_aux_dev_ids, aux_priv->id);
+ ida_free(&bnxt_rdma_aux_dev_ids, aux_priv->id);
kfree(aux_priv->edev->ulp_tbl);
bp->edev = NULL;
kfree(aux_priv->edev);
@@ -488,7 +489,7 @@ void bnxt_rdma_aux_device_init(struct bnxt *bp)
if (!aux_priv)
goto exit;
- aux_priv->id = ida_alloc(&bnxt_aux_dev_ids, GFP_KERNEL);
+ aux_priv->id = ida_alloc(&bnxt_rdma_aux_dev_ids, GFP_KERNEL);
if (aux_priv->id < 0) {
netdev_warn(bp->dev,
"ida alloc failed for ROCE auxiliary device\n");
@@ -504,7 +505,7 @@ void bnxt_rdma_aux_device_init(struct bnxt *bp)
rc = auxiliary_device_init(aux_dev);
if (rc) {
- ida_free(&bnxt_aux_dev_ids, aux_priv->id);
+ ida_free(&bnxt_rdma_aux_dev_ids, aux_priv->id);
kfree(aux_priv);
goto exit;
}
@@ -536,3 +537,120 @@ void bnxt_rdma_aux_device_init(struct bnxt *bp)
exit:
bp->flags &= ~BNXT_FLAG_ROCE_CAP;
}
+
+void bnxt_fwctl_aux_device_uninit(struct bnxt *bp)
+{
+ struct bnxt_aux_priv *aux_priv;
+ struct auxiliary_device *adev;
+
+ /* Skip if no auxiliary device init was done. */
+ if (!bp->aux_priv_fwctl)
+ return;
+
+ aux_priv = bp->aux_priv_fwctl;
+ adev = &aux_priv->aux_dev;
+ auxiliary_device_uninit(adev);
+}
+
+
+void bnxt_fwctl_aux_device_del(struct bnxt *bp)
+{
+ if (!bp->edev_fwctl)
+ return;
+
+ auxiliary_device_delete(&bp->aux_priv_fwctl->aux_dev);
+}
+
+void bnxt_fwctl_aux_device_add(struct bnxt *bp)
+{
+ struct auxiliary_device *aux_dev;
+ int rc;
+
+ if (!bp->edev_fwctl) {
+ printk(KERN_CRIT "edev_fwctl is NULL %s\n", __func__);
+ return;
+ }
+
+ aux_dev = &bp->aux_priv_fwctl->aux_dev;
+ rc = auxiliary_device_add(aux_dev);
+ if (rc) {
+ netdev_warn(bp->dev, "Failed to add auxiliary device for FWCTL\n");
+ auxiliary_device_uninit(aux_dev);
+ bp->flags &= ~BNXT_FLAG_ROCE_CAP;
+ } else {
+ netdev_warn(bp->dev, "Added auxiliary device for FWCTL!!!\n");
+ }
+}
+
+static void bnxt_fwctl_aux_dev_release(struct device *dev)
+{
+ struct bnxt_aux_priv *aux_priv =
+ container_of(dev, struct bnxt_aux_priv, aux_dev.dev);
+ struct bnxt *bp = netdev_priv(aux_priv->edev->net);
+
+ ida_free(&bnxt_fwctl_aux_dev_ids, aux_priv->id);
+ kfree(aux_priv->edev);
+ bp->edev_fwctl = NULL;
+ kfree(bp->aux_priv_fwctl);
+ bp->aux_priv_fwctl = NULL;
+}
+
+
+void bnxt_fwctl_aux_device_init(struct bnxt *bp)
+{
+ struct auxiliary_device *aux_dev;
+ struct bnxt_aux_priv *aux_priv;
+ struct bnxt_en_dev *edev;
+ struct bnxt_ulp *ulp;
+ int rc;
+
+ aux_priv = kzalloc(sizeof(*bp->aux_priv), GFP_KERNEL);
+ if (!aux_priv)
+ return;
+
+ aux_priv->id = ida_alloc(&bnxt_fwctl_aux_dev_ids, GFP_KERNEL);
+ if (aux_priv->id < 0) {
+ netdev_warn(bp->dev,
+ "ida alloc failed for FWCTL auxiliary device\n");
+ kfree(aux_priv);
+ return;
+ }
+
+ aux_dev = &aux_priv->aux_dev;
+ aux_dev->id = aux_priv->id;
+ aux_dev->name = "fwctl";
+ aux_dev->dev.parent = &bp->pdev->dev;
+ aux_dev->dev.release = bnxt_fwctl_aux_dev_release;
+
+ rc = auxiliary_device_init(aux_dev);
+ if (rc) {
+ ida_free(&bnxt_fwctl_aux_dev_ids, aux_priv->id);
+ kfree(aux_priv);
+ return;
+ }
+ bp->aux_priv_fwctl = aux_priv;
+
+ /* From this point, all cleanup will happen via the .release callback &
+ * any error unwinding will need to include a call to
+ * auxiliary_device_uninit.
+ */
+ edev = kzalloc(sizeof(*edev), GFP_KERNEL);
+ if (!edev)
+ goto aux_dev_uninit;
+
+ aux_priv->edev = edev;
+
+ ulp = kzalloc(sizeof(*ulp), GFP_KERNEL);
+ if (!ulp)
+ goto aux_dev_uninit;
+
+ edev->ulp_tbl = ulp;
+ bp->edev_fwctl = edev;
+ bnxt_set_edev_info(edev, bp);
+ /* bp->ulp_num_msix_want = bnxt_set_dflt_ulp_msix(bp); */
+ printk(KERN_CRIT "Made it %s\n", __func__);
+ return;
+
+aux_dev_uninit:
+ auxiliary_device_uninit(aux_dev);
+}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
index 7fa3b8d1ebd288..148e3eb32be001 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
@@ -124,6 +124,10 @@ void bnxt_rdma_aux_device_uninit(struct bnxt *bp);
void bnxt_rdma_aux_device_del(struct bnxt *bp);
void bnxt_rdma_aux_device_add(struct bnxt *bp);
void bnxt_rdma_aux_device_init(struct bnxt *bp);
+void bnxt_fwctl_aux_device_uninit(struct bnxt *bp);
+void bnxt_fwctl_aux_device_del(struct bnxt *bp);
+void bnxt_fwctl_aux_device_add(struct bnxt *bp);
+void bnxt_fwctl_aux_device_init(struct bnxt *bp);
int bnxt_register_dev(struct bnxt_en_dev *edev, struct bnxt_ulp_ops *ulp_ops,
void *handle);
void bnxt_unregister_dev(struct bnxt_en_dev *edev);
--
2.43.0
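
The ownership rule in the comment inside bnxt_fwctl_aux_device_init() (after a
successful auxiliary_device_init(), cleanup goes through the .release callback
and error unwinding must call auxiliary_device_uninit()) can be modelled in
plain userspace C. This is only a sketch under stated assumptions: every
sketch_* name is hypothetical, the refcount is a bare int rather than a struct
device, and sketch_dev_put() stands in for the final reference drop that
auxiliary_device_uninit() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Tiny stand-in for a refcounted device with a release hook. */
struct sketch_dev {
	int refs;
	void (*release)(struct sketch_dev *dev);
};

struct sketch_priv {
	struct sketch_dev dev;	/* embedded, like aux_dev in bnxt_aux_priv */
	int *extra;		/* allocated only after init succeeded */
};

static int sketch_release_calls;

/* The one place that frees the object once init has happened. */
static void sketch_release(struct sketch_dev *dev)
{
	struct sketch_priv *priv = (struct sketch_priv *)
		((char *)dev - offsetof(struct sketch_priv, dev));

	sketch_release_calls++;
	free(priv->extra);	/* free(NULL) is a no-op */
	free(priv);
}

/* After this returns, only sketch_dev_put() may free the object. */
static void sketch_dev_init(struct sketch_dev *dev)
{
	dev->refs = 1;
}

static void sketch_dev_put(struct sketch_dev *dev)
{
	assert(dev->refs > 0);
	if (--dev->refs == 0)
		dev->release(dev);
}

/* fail_late simulates an allocation failure that happens after init. */
static struct sketch_priv *sketch_create(int fail_late)
{
	struct sketch_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return NULL;	/* before init: nothing else to undo */

	priv->dev.release = sketch_release;
	sketch_dev_init(&priv->dev);

	priv->extra = fail_late ? NULL : malloc(sizeof(*priv->extra));
	if (!priv->extra) {
		sketch_dev_put(&priv->dev);	/* after init: unwind via release */
		return NULL;
	}
	return priv;
}
```

The point of the split is that a failure before init frees the memory directly,
while a failure after init must drop the reference and let the release hook do
the freeing, so there is never a second free of the same object.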
^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 0:13 ` [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt Jason Gunthorpe
@ 2025-02-07 0:44 ` Jakub Kicinski
2025-02-07 3:17 ` Andy Gospodarek
0 siblings, 1 reply; 67+ messages in thread
From: Jakub Kicinski @ 2025-02-07 0:44 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote:
> From: Andy Gospodarek <gospo@broadcom.com>
>
> Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This is only needed for RDMA, why can't you make this part of bnxt_re ?
--
pw-bot: nap
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 0:44 ` Jakub Kicinski
@ 2025-02-07 3:17 ` Andy Gospodarek
2025-02-07 12:46 ` Jason Gunthorpe
2025-02-07 15:36 ` Jakub Kicinski
0 siblings, 2 replies; 67+ messages in thread
From: Andy Gospodarek @ 2025-02-07 3:17 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jason Gunthorpe, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon, Michael Chan
On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote:
> > From: Andy Gospodarek <gospo@broadcom.com>
> >
> > Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>
> This is only needed for RDMA, why can't you make this part of bnxt_re ?
This is not just needed for RDMA, so having the aux device for fwctl
as part of the base driver is preferred.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 3:17 ` Andy Gospodarek
@ 2025-02-07 12:46 ` Jason Gunthorpe
2025-02-07 15:36 ` Jakub Kicinski
1 sibling, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 12:46 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Jakub Kicinski, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon, Michael Chan
On Thu, Feb 06, 2025 at 10:17:58PM -0500, Andy Gospodarek wrote:
> On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote:
> > > From: Andy Gospodarek <gospo@broadcom.com>
> > >
> > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >
> > This is only needed for RDMA, why can't you make this part of bnxt_re ?
>
> This is not just needed for RDMA, so having the aux device for fwctl
> as part of the base driver is preferred.
Same for mlx5
I have to apologize: somehow the bnxt WIP patches got included here.
I did not intend that; it was late and I'm juggling too many things
this week.
Jason
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device
2025-02-07 0:13 ` [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
@ 2025-02-07 12:59 ` Jonathan Cameron
2025-02-07 13:52 ` Jason Gunthorpe
2025-02-08 0:16 ` Dave Jiang
2025-02-13 12:42 ` Przemek Kitszel
2 siblings, 1 reply; 67+ messages in thread
From: Jonathan Cameron @ 2025-02-07 12:59 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Thu, 6 Feb 2025 20:13:24 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:
> Each file descriptor gets a chunk of per-FD driver specific context that
> allows the driver to attach a device specific struct to. The core code
> takes care of the memory lifetime for this structure.
>
> The ioctl dispatch and design is based on what was built for iommufd. The
> ioctls have a struct which has a combined in/out behavior with a typical
> 'zero pad' scheme for future extension and backwards compatibility.
>
> Like iommufd some shared logic does most of the ioctl marshalling and
> compatibility work and table dispatches to some function pointers for
> each unique ioctl.
>
> This approach has proven to work quite well in the iommufd and rdma
> subsystems.
>
> Allocate an ioctl number space for the subsystem.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
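
The 'zero pad' extension scheme mentioned in the commit message above can be
sketched in userspace C. This is an illustration only, not the fwctl or
iommufd code: struct sketch_cmd and sketch_copy_struct() are hypothetical
names, and the rule is modelled on what the kernel's copy_struct_from_user()
helper is understood to do (a shorter caller struct is zero-extended, a longer
one is accepted only if its unknown tail is all zero).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout as the receiving side currently knows it. */
struct sketch_cmd {
	uint32_t size;	/* total size the caller passed */
	uint32_t flags;
};

/*
 * Copy a caller struct of usize bytes into a local struct of ksize bytes.
 * - Shorter caller struct (older userspace): the missing tail reads as zero.
 * - Longer caller struct (newer userspace): accepted only if every byte
 *   beyond ksize is zero, otherwise -E2BIG.
 */
static int sketch_copy_struct(void *dst, size_t ksize,
			      const void *src, size_t usize)
{
	size_t common = usize < ksize ? usize : ksize;
	const unsigned char *tail;
	size_t i;

	assert(dst && src);

	tail = (const unsigned char *)src + common;
	for (i = 0; i < usize - common; i++)
		if (tail[i])
			return -E2BIG;

	memset(dst, 0, ksize);
	memcpy(dst, src, common);
	return 0;
}
```

With this rule a new field can be appended to the struct later: old binaries
keep working because the field reads as zero, and new binaries work on old
kernels as long as they leave the new field zero.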
Hi Jason,
Fresh read through given it's been a while.
A few really trivial things inline + one passing comment on a future
entertaining corner.
Jonathan
> M: Sebastian Reichel <sre@kernel.org>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index 34946bdc3bf3d7..d561deaf2b86d8 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> static int fwctl_fops_release(struct inode *inode, struct file *filp)
> {
> - struct fwctl_device *fwctl = filp->private_data;
> + struct fwctl_uctx *uctx = filp->private_data;
> + struct fwctl_device *fwctl = uctx->fwctl;
>
> + scoped_guard(rwsem_read, &fwctl->registration_lock) {
> + /*
> + * fwctl_unregister() has already removed the driver and
> + * destroyed the uctx.
Comment is a little odd given it is I think referring to why
the code that follows wouldn't run. Perhaps just add a 'may'
fwctl_unregister() may have already removed the driver and destroyed
the uctx.
> + */
> + if (fwctl->ops) {
> + guard(mutex)(&fwctl->uctx_list_lock);
> + fwctl_destroy_uctx(uctx);
> + }
> + }
> +
> + kfree(uctx);
> fwctl_put(fwctl);
> return 0;
> }
>
> @@ -71,14 +183,17 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> if (!fwctl)
> return NULL;
>
> - fwctl->dev.class = &fwctl_class;
> - fwctl->dev.parent = parent;
> -
> devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
> if (devnum < 0)
> return NULL;
> fwctl->dev.devt = fwctl_dev + devnum;
>
> + fwctl->dev.class = &fwctl_class;
> + fwctl->dev.parent = parent;
Shunt this move back to previous patch?
> + init_rwsem(&fwctl->registration_lock);
> + mutex_init(&fwctl->uctx_list_lock);
> + INIT_LIST_HEAD(&fwctl->uctx_list);
> +
> device_initialize(&fwctl->dev);
> return_ptr(fwctl);
> }
> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> index 68ac2d5ab87481..93b470efb9dbc3 100644
> --- a/include/linux/fwctl.h
> +++ b/include/linux/fwctl.h
> @@ -11,7 +11,30 @@
> struct fwctl_device;
> struct fwctl_uctx;
>
> +/**
> + * struct fwctl_ops - Driver provided operations
> + *
> + * fwctl_unregister() will wait until all executing ops are completed before it
> + * returns. Drivers should be mindful to not let their ops run for too long as
> + * it will block device hot unplug and module unloading.
A passing comment on this. Seems likely that at some point we'll want an
abort op to enable cancelling if the particular driver supports it
(abort background command in CXL). Anyhow, problem for another day.
> + */
> struct fwctl_ops {
> + /**
> + * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
> + * bytes of this memory will be a fwctl_uctx. The driver can use the
> + * remaining bytes as its private memory.
> + */
> + size_t uctx_size;
> + /**
> + * @open_uctx: Called when a file descriptor is opened before the uctx
> + * is ever used.
> + */
> + int (*open_uctx)(struct fwctl_uctx *uctx);
> + /**
> + * @close_uctx: Called when the uctx is destroyed, usually when the FD
> + * is closed.
> + */
> + void (*close_uctx)(struct fwctl_uctx *uctx);
> };
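
The @uctx_size description above (the first bytes are a fwctl_uctx, the rest is
driver private memory) is the usual embed-and-recover pattern. A minimal
userspace sketch follows; struct sketch_uctx, struct sketch_drv_uctx and the
helper functions are hypothetical stand-ins, and the real fwctl_uctx has
fields that are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the core's per-FD context. */
struct sketch_uctx {
	int core_state;
};

/* Hypothetical driver context: the core struct must be the first member. */
struct sketch_drv_uctx {
	struct sketch_uctx uctx;
	unsigned int drv_caps;
};

/* What a driver would advertise through the ops table's uctx_size. */
static const size_t sketch_uctx_size = sizeof(struct sketch_drv_uctx);

/* Core side: allocate the whole chunk but only touch the leading part. */
static struct sketch_uctx *sketch_core_alloc(size_t uctx_size)
{
	assert(uctx_size >= sizeof(struct sketch_uctx));
	return calloc(1, uctx_size);
}

/* Driver side: recover the containing struct from the core pointer. */
static struct sketch_drv_uctx *sketch_to_drv(struct sketch_uctx *uctx)
{
	return (struct sketch_drv_uctx *)
		((char *)uctx - offsetof(struct sketch_drv_uctx, uctx));
}
```

Because the core owns the allocation, it can also own the lifetime, which is
what the commit message means by the core taking care of the memory for this
structure.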
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 03/10] fwctl: FWCTL_INFO to return basic information about the device
2025-02-07 0:13 ` [PATCH v4 03/10] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
@ 2025-02-07 13:06 ` Jonathan Cameron
2025-02-07 14:23 ` Jason Gunthorpe
2025-02-08 0:21 ` Dave Jiang
1 sibling, 1 reply; 67+ messages in thread
From: Jonathan Cameron @ 2025-02-07 13:06 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Thu, 6 Feb 2025 20:13:25 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:
> Userspace will need to know some details about the fwctl interface being
> used to locate the correct userspace code to communicate with the
> kernel. Provide a simple device_type enum indicating what the kernel
> driver is.
>
> Allow the device to provide a device specific info struct that contains
> any additional information that the driver may need to provide to
> userspace.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Trivial comment inline.
> ---
> drivers/fwctl/main.c | 51 ++++++++++++++++++++++++++++++++++++++
> include/linux/fwctl.h | 12 +++++++++
> include/uapi/fwctl/fwctl.h | 32 ++++++++++++++++++++++++
> 3 files changed, 95 insertions(+)
>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index d561deaf2b86d8..4b6792f2031e86 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -27,8 +27,58 @@ struct fwctl_ucmd {
> diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> index f4718a6240f281..ac66853200a5a8 100644
> --- a/include/uapi/fwctl/fwctl.h
> +++ b/include/uapi/fwctl/fwctl.h
> @@ -4,6 +4,9 @@
> #ifndef _UAPI_FWCTL_H
> #define _UAPI_FWCTL_H
>
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> #define FWCTL_TYPE 0x9A
>
> /**
> @@ -33,6 +36,35 @@
> */
> enum {
> FWCTL_CMD_BASE = 0,
> + FWCTL_CMD_INFO = 0,
> + FWCTL_CMD_RPC = 1,
Trivial but in theory should perhaps leave RPC for patch 5?
> };
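
For readers unfamiliar with how FWCTL_TYPE and the FWCTL_CMD_* values combine,
here is a small userspace sketch. It assumes the commands are encoded as
size-less _IO() numbers in the style of iommufd, which the quoted hunk does not
show; the SKETCH_* macros and sketch_cmd_index() are hypothetical names, and
only the 0x9A type value and the enum values come from the patch.

```c
#include <assert.h>
#include <linux/ioctl.h>

#define SKETCH_FWCTL_TYPE 0x9A	/* value taken from the quoted header */

enum {
	SKETCH_CMD_BASE = 0,
	SKETCH_CMD_INFO = 0,
	SKETCH_CMD_RPC = 1,
};

static_assert(SKETCH_CMD_INFO == SKETCH_CMD_BASE,
	      "first command sits at the base of the number space");

/* Assumption: each command is an _IO() number within the subsystem's type. */
#define SKETCH_FWCTL_INFO _IO(SKETCH_FWCTL_TYPE, SKETCH_CMD_INFO)
#define SKETCH_FWCTL_RPC  _IO(SKETCH_FWCTL_TYPE, SKETCH_CMD_RPC)

/* Map an ioctl number to a dispatch-table index, or -1 if it is not ours. */
static int sketch_cmd_index(unsigned int cmd)
{
	if (_IOC_TYPE(cmd) != SKETCH_FWCTL_TYPE)
		return -1;
	return (int)_IOC_NR(cmd) - SKETCH_CMD_BASE;
}
```

A table-driven dispatcher can then index an array of per-command handlers with
the returned value after a bounds check.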
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 04/10] taint: Add TAINT_FWCTL
2025-02-07 0:13 ` [PATCH v4 04/10] taint: Add TAINT_FWCTL Jason Gunthorpe
@ 2025-02-07 13:09 ` Jonathan Cameron
2025-02-08 0:24 ` Dave Jiang
1 sibling, 0 replies; 67+ messages in thread
From: Jonathan Cameron @ 2025-02-07 13:09 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Thu, 6 Feb 2025 20:13:26 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:
> Requesting a fwctl scope of access that includes mutating device debug
> data will cause the kernel to be tainted. Changing the device operation
> through things in the debug scope may cause the device to malfunction in
> undefined ways. This should be reflected in the TAINT flags to help any
> debuggers understand that something has been done.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Not something I've ever directly touched before, so more eyes on this
would be good, but FWIW looks in line with other flags and the
general principle seems sensible to me.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device
2025-02-07 12:59 ` Jonathan Cameron
@ 2025-02-07 13:52 ` Jason Gunthorpe
0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 13:52 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Fri, Feb 07, 2025 at 12:59:15PM +0000, Jonathan Cameron wrote:
> > static int fwctl_fops_release(struct inode *inode, struct file *filp)
> > {
> > - struct fwctl_device *fwctl = filp->private_data;
> > + struct fwctl_uctx *uctx = filp->private_data;
> > + struct fwctl_device *fwctl = uctx->fwctl;
> >
> > + scoped_guard(rwsem_read, &fwctl->registration_lock) {
> > + /*
> > + * fwctl_unregister() has already removed the driver and
> > + * destroyed the uctx.
>
> Comment is a little odd given it is I think referring to why
> the code that follows wouldn't run. Perhaps just add a 'may'
It is intended to describe the if:
> > + if (fwctl->ops) {
> > + guard(mutex)(&fwctl->uctx_list_lock);
> > + fwctl_destroy_uctx(uctx);
> > + }
So let's do:
/*
* NULL ops means fwctl_unregister() has already removed the
* driver and destroyed the uctx.
*/
if (fwctl->ops) {
> > + fwctl->dev.class = &fwctl_class;
> > + fwctl->dev.parent = parent;
>
> Shunt this move back to previous patch?
Yes
> > +/**
> > + * struct fwctl_ops - Driver provided operations
> > + *
> > + * fwctl_unregister() will wait until all executing ops are completed before it
> > + * returns. Drivers should be mindful to not let their ops run for too long as
> > + * it will block device hot unplug and module unloading.
>
> A passing comment on this. Seems likely that at some point we'll want an
> abort op to enable cancelling if the particular driver supports it
> (abort background command in CXL). Anyhow, problem for another day.
Hum, that will be an interesting thing to fit in. abort from userspace
as an ioctl would make sense. auto-abort to hotunplug seems a bit
tricky
Thanks,
Jason
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 03/10] fwctl: FWCTL_INFO to return basic information about the device
2025-02-07 13:06 ` Jonathan Cameron
@ 2025-02-07 14:23 ` Jason Gunthorpe
0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 14:23 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Fri, Feb 07, 2025 at 01:06:41PM +0000, Jonathan Cameron wrote:
> > enum {
> > FWCTL_CMD_BASE = 0,
> > + FWCTL_CMD_INFO = 0,
> > + FWCTL_CMD_RPC = 1,
>
> Trivial but in theory should perhaps leave RPC for patch 5?
Yeah, I think it is a rebasing error
Thanks,
Jason
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 06/10] fwctl: Add documentation
2025-02-07 0:13 ` [PATCH v4 06/10] fwctl: Add documentation Jason Gunthorpe
@ 2025-02-07 14:42 ` Jonathan Cameron
2025-02-10 15:17 ` Jason Gunthorpe
2025-02-08 0:40 ` Dave Jiang
1 sibling, 1 reply; 67+ messages in thread
From: Jonathan Cameron @ 2025-02-07 14:42 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Thu, 6 Feb 2025 20:13:28 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:
> Document the purpose and rules for the fwctl subsystem.
>
> Link in kdocs to the doc tree.
>
> Nacked-by: Jakub Kicinski <kuba@kernel.org>
> Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
A few tiny things inline.
> ---
> Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++++++++++
> Documentation/userspace-api/fwctl/index.rst | 12 +
> Documentation/userspace-api/index.rst | 1 +
> MAINTAINERS | 2 +-
> 4 files changed, 299 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
> create mode 100644 Documentation/userspace-api/fwctl/index.rst
>
> diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst
> new file mode 100644
> index 00000000000000..428f6f5bb9b4f9
> --- /dev/null
> +++ b/Documentation/userspace-api/fwctl/fwctl.rst
> @@ -0,0 +1,285 @@
> +Operations exposed through fwctl's non-tainting interfaces should be fully
> +sharable with other users of the device. For instance, exposing an RPC through
> +fwctl should never prevent a kernel subsystem from also concurrently using that
> +same RPC or hardware unit down the road. In such cases fwctl will be less
> +important than proper kernel subsystems that eventually emerge. Mistakes in this
> +area resulting in clashes will be resolved in favour of a kernel implementation.
> +
> +fwctl User API
> +==============
> +
> +.. kernel-doc:: include/uapi/fwctl/fwctl.h
> +.. kernel-doc:: include/uapi/fwctl/mlx5.h
Doesn't exist yet... I'm not sure if that actually causes a build issue
or not but probably better to just slip this in later in the series.
> +Development and debugging focused RPCs under more permissive scopes can have
> +less stablitiy if the tools using them are only run under exceptional
stability
> +circumstances and not for everyday use of the device. Debugging tools may even
> +require exact version matching as they may require something similar to DWARF
> +debug information from the FW binary.
> +
...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5f30adbe6c8521..319169f7cb7e1c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9561,7 +9561,7 @@ FWCTL SUBSYSTEM
> M: Jason Gunthorpe <jgg@nvidia.com>
> M: Saeed Mahameed <saeedm@nvidia.com>
> S: Maintained
> -F: Documentation/userspace-api/fwctl.rst
> +F: Documentation/userspace-api/fwctl/
Push back to patch 1 or introduce this here for the first time.
> F: drivers/fwctl/
> F: include/linux/fwctl.h
> F: include/uapi/fwctl/
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 09/10] fwctl/bnxt: Support communicating with bnxt fw
2025-02-07 0:13 ` [PATCH v4 09/10] fwctl/bnxt: Support communicating with bnxt fw Jason Gunthorpe
@ 2025-02-07 14:59 ` Jonathan Cameron
2025-02-07 15:03 ` Jason Gunthorpe
0 siblings, 1 reply; 67+ messages in thread
From: Jonathan Cameron @ 2025-02-07 14:59 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Thu, 6 Feb 2025 20:13:31 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:
> From: Andy Gospodarek <gospo@broadcom.com>
>
> This patch adds basic support for the fwctl infrastructure. With the
> appropriate tool, the most basic RPC to the bnxt_en firmware can be
> called.
>
> Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
As commented on below, this one should perhaps have been marked
RFC given there are a bunch of FIXME inline.
> diff --git a/drivers/fwctl/bnxt/bnxt.c b/drivers/fwctl/bnxt/bnxt.c
> new file mode 100644
> index 00000000000000..d2b9a64a1402bf
> --- /dev/null
> +++ b/drivers/fwctl/bnxt/bnxt.c
> @@ -0,0 +1,167 @@
> +
> +/*
> + * bnxt_fw_msg->msg has the whole command
> + * the start of message is of type struct input
> + * struct input {
> + * __le16 req_type;
> + * __le16 cmpl_ring;
> + * __le16 seq_id;
> + * __le16 target_id;
> + * __le64 resp_addr;
> + * };
> + * so the hwrm op should be (struct input *)(hwrm_in->msg)->req_type
> + */
> +static bool bnxtctl_validate_rpc(struct fwctl_uctx *uctx,
> + struct bnxt_fw_msg *hwrm_in)
> +{
> + struct input *req = (struct input *)hwrm_in->msg;
hwrm_in->msg is void * so should be no need to cast here.
> +
> + switch (req->req_type) {
> + case HWRM_VER_GET:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +static void *bnxtctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
> + void *in, size_t in_len, size_t *out_len)
> +{
> + struct bnxtctl_dev *bnxtctl =
> + container_of(uctx->fwctl, struct bnxtctl_dev, fwctl);
> + struct bnxt_aux_priv *bnxt_aux_priv = bnxtctl->aux_priv;
> + /* FIXME: Check me */
With the various FIXME in here I'm guessing this is an RFC for now?
Maybe better to make that clear in the patch title.
> + struct bnxt_fw_msg rpc_in = {
> + // FIXME: does bnxt_send_msg() copy?
> + .msg = in,
> + .msg_len = in_len,
> + .resp = in,
> + // FIXME: Dynamic memory for out_len
> + .resp_max_len = in_len,
> + };
> + int rc;
> +
> + if (!bnxtctl_validate_rpc(uctx, &rpc_in))
> + return ERR_PTR(-EPERM);
> +
> + rc = bnxt_send_msg(bnxt_aux_priv->edev, &rpc_in);
> + if (!rc)
> + return ERR_PTR(-EOPNOTSUPP);
> + return in;
> +}
> +
> +static int bnxtctl_probe(struct auxiliary_device *adev,
> + const struct auxiliary_device_id *id)
> +{
> + struct bnxt_aux_priv *aux_priv =
> + container_of(adev, struct bnxt_aux_priv, aux_dev);
> + struct bnxtctl_dev *bnxtctl __free(bnxtctl) =
> + fwctl_alloc_device(&aux_priv->edev->pdev->dev, &bnxtctl_ops,
Does this make more sense than setting parent to the
auxiliary device? (same applies to the mlx5 driver but I didn't notice
it there). That will result in a deeper nest in sysfs but
at least makes it obvious what the aux dev is doing.
> + struct bnxtctl_dev, fwctl);
> + int rc;
> +
> + if (!bnxtctl)
> + return -ENOMEM;
> +
> + bnxtctl->aux_priv = aux_priv;
> +
> + rc = fwctl_register(&bnxtctl->fwctl);
> + if (rc)
> + return rc;
> +
> + auxiliary_set_drvdata(adev, no_free_ptr(bnxtctl));
> + return 0;
> +}
> +static const struct auxiliary_device_id bnxtctl_id_table[] = {
> + { .name = "bnxt_en.fwctl", },
> + {},
Trivial but no need for that trailing comma given this will always
be the last entry.
> +};
> +MODULE_DEVICE_TABLE(auxiliary, bnxtctl_id_table);
> +
> +static struct auxiliary_driver bnxtctl_driver = {
> + .name = "bnxt_fwctl",
> + .probe = bnxtctl_probe,
> + .remove = bnxtctl_remove,
> + .id_table = bnxtctl_id_table,
> +};
> +
> +module_auxiliary_driver(bnxtctl_driver);
> diff --git a/include/uapi/fwctl/bnxt.h b/include/uapi/fwctl/bnxt.h
> new file mode 100644
> index 00000000000000..ea47fdbbf6ea3e
> --- /dev/null
> +++ b/include/uapi/fwctl/bnxt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Copyright (c) 2024, Broadcom Corporation
> + *
> + */
> +#ifndef _UAPI_FWCTL_BNXT_H_
> +#define _UAPI_FWCTL_BNXT_H_
> +
> +#include <linux/types.h>
> +
> +enum fwctl_bnxt_commands {
> + FWCTL_BNXT_QUERY_COMMANDS = 0,
> + FWCTL_BNXT_SEND_COMMAND,
> +};
> +
> +/**
> + * struct fwctl_info_bnxt - ioctl(FWCTL_INFO) out_device_data
> + * @uctx_caps: The command capabilities driver accepts.
Silly though it may be, if the kernel-doc script runs on this I'm fairly
sure it will moan about lack of docs for reserved.
> + *
> + * Return basic information about the FW interface available.
> + */
> +struct fwctl_info_bnxt {
> + __u32 uctx_caps;
> + __u32 reserved;
> +};
> +
> +#endif
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 09/10] fwctl/bnxt: Support communicating with bnxt fw
2025-02-07 14:59 ` Jonathan Cameron
@ 2025-02-07 15:03 ` Jason Gunthorpe
0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 15:03 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Fri, Feb 07, 2025 at 02:59:23PM +0000, Jonathan Cameron wrote:
> On Thu, 6 Feb 2025 20:13:31 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > From: Andy Gospodarek <gospo@broadcom.com>
> >
> > This patch adds basic support for the fwctl infrastructure. With the
> > appropriate tool, the most basic RPC to the bnxt_en firmware can be
> > called.
> >
> > Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> As commented on below, this one should perhaps have been marked
> RFC given there are a bunch of FIXME inline.
Yeah, please ignore, it was a mistake to include it
Jason
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 3:17 ` Andy Gospodarek
2025-02-07 12:46 ` Jason Gunthorpe
@ 2025-02-07 15:36 ` Jakub Kicinski
2025-02-07 20:25 ` Saeed Mahameed
2025-02-07 23:29 ` Andy Gospodarek
1 sibling, 2 replies; 67+ messages in thread
From: Jakub Kicinski @ 2025-02-07 15:36 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Jason Gunthorpe, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon, Michael Chan
On Thu, 6 Feb 2025 22:17:58 -0500 Andy Gospodarek wrote:
> On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote:
> > > From: Andy Gospodarek <gospo@broadcom.com>
> > >
> > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >
> > This is only needed for RDMA, why can't you make this part of bnxt_re ?
>
> This is not just needed for RDMA, so having the aux device for fwctl
> as part of the base driver is preferred.
Please elaborate. As you well know I have experience using Broadcom
devices in large TCP/IP networks, without the need for proprietary
tooling.
Now, I understand that it may be expedient for Broadcom and nVidia
to skip the upstream process and ship "features" to customers using
DOCA and whatever you call your tooling. But let's be honest that
this is the motivation here. Unified support for proprietary tooling
across subsystems and product lines for a given vendor. This way
migrating from in-tree networking to proprietary IPU/DPU networking
is easier, while migrating to another vendor would require full tooling
replacement.
I have nothing against RDMA and CXL subsystems adding whatever APIs
they want. But I don't understand why you think it's okay to force
this on normal networking, which does not need it.
nVidia is already refusing to add basic minoring features to their
upstream driver, and keeps asking its customers to migrate to libdoca.
So the concern that merging this will negatively impact standard
tooling is no longer theoretical.
Anyway, rant over. Give us some technical details.
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 15:36 ` Jakub Kicinski
@ 2025-02-07 20:25 ` Saeed Mahameed
2025-02-07 21:51 ` Jakub Kicinski
2025-02-07 23:29 ` Andy Gospodarek
1 sibling, 1 reply; 67+ messages in thread
From: Saeed Mahameed @ 2025-02-07 20:25 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andy Gospodarek, Jason Gunthorpe, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On 07 Feb 07:36, Jakub Kicinski wrote:
>On Thu, 6 Feb 2025 22:17:58 -0500 Andy Gospodarek wrote:
>> On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> > On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote:
>> > > From: Andy Gospodarek <gospo@broadcom.com>
>> > >
>> > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
>> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> >
>> > This is only needed for RDMA, why can't you make this part of bnxt_re ?
>>
>> This is not just needed for RDMA, so having the aux device for fwctl
>> as part of the base driver is preferred.
>
>Please elaborate. As you well know I have experience using Broadcom
>devices in large TCP/IP networks, without the need for proprietary
>tooling.
>
I think Andy was referring to the fact that the aux bus management is
implemented in the base driver, which currently lives under the netdev stack.
fwctl, like netdev and RDMA, is an aux driver sharing the same base "pci"
device, so it is not part of netdev TCP/IP. Because the pci base driver is
part of netdev for historical reasons, driver developers have to go through
the netdev mailing list to review pci/bus/aux device driver patches, which
have nothing to do with TCP/IP. If netdev doesn't welcome non-TCP/IP
patches, then I think we have a bigger problem here.
>Now, I understand that it may be expedient for Broadcom and nVidia
>to skip the upstream process and ship "features" to customers using
>DOCA and whatever you call your tooling. But let's be honest that
>this is the motivation here. Unified support for proprietary tooling
DOCA doesn't need FWCTL.
DOCA and DPDK have all the support they need to configure the pipeline
and their own transport without FWCTL. This patchset has nothing
to do with DOCA; this is pretty clear from the multi-vendor and
cross-kernel support for FWCTL.
>across subsystems and product lines for a given vendor. This way
>migrating from in-tree networking to proprietary IPU/DPU networking
>is easier, while migrating to another vendor would require full tooling
>replacement.
This is the old netdev mentality; we must not apply it to all subsystems.
And I don't understand why tooling has to be standardized in the kernel tree.
We have plans to standardize tooling in user-space, as was already done very
successfully in nvme-cli.
>
>I have nothing against RDMA and CXL subsystems adding whatever APIs
>they want. But I don't understand why you think it's okay to force
>this on normal networking, which does not need it.
>
As explained above, netdev doesn't need it, but the netdev subsystem also hosts
the pci base drivers, so you are going to see fwctl patches the same way you
see rdma and other non-netdev patches flowing through the netdev ML.
>nVidia is already refusing to add basic minoring features to their
>upstream driver, and keeps asking its customers to migrate to libdoca.
nVidia is one of the top contributors to netdev. We submit patches on a
weekly basis and, due to the netdev mailing list review backlog and policy,
we barely make quota, so please elaborate on what features we are refusing
to do?
>So the concern that merging this will negatively impact standard
>tooling is no longer theoretical.
>
>Anyway, rant over. Give us some technical details.
Technical details: the driver-specific aux subsystem implementation is the
responsibility of the base driver, which is hosted under netdev.
* Re: [PATCH v4 00/10] Introduce fwctl subystem
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (9 preceding siblings ...)
2025-02-07 0:13 ` [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt Jason Gunthorpe
@ 2025-02-07 21:41 ` Dan Williams
2025-02-07 21:58 ` Dave Jiang
` (2 subsequent siblings)
13 siblings, 0 replies; 67+ messages in thread
From: Dan Williams @ 2025-02-07 21:41 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
Jason Gunthorpe wrote:
> [
> Many people were away around the holiday period, but work is back in full
> swing now with Dave already at v3 on his CXL work over the past couple
> weeks. We are looking at a good chance of reaching this merge window. I
> will work out some shared branches with CXL and get it into linux-next
> once all three drivers can be assembled and reviews seem to be concluding.
>
> There are couple open notes
> - Greg was interested in a new name, but nobody offered any bikesheds
Here is a straw-bikeshed that I hope conveys the following sentiments:
- "This is the long tail interface for all the knobs and tunables that
are past the knee of the curve of diminishing returns for a
cross-vendor kernel-wrapped ABI."
- "This interface is for non-primary functionality of the device. An OSV
is free to disable this interface and your device will still fulfill
all its primary objectives."
In that light, how about "auxctl"?
> - I would like a co-maintainer
I expect someone from linux-cxl@ land to take you up on this, and I
assume you would not say no to more than 1 co-maintainer. ...stay
tuned.
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 20:25 ` Saeed Mahameed
@ 2025-02-07 21:51 ` Jakub Kicinski
2025-02-08 1:10 ` Saeed Mahameed
2025-02-08 1:16 ` Jason Gunthorpe
0 siblings, 2 replies; 67+ messages in thread
From: Jakub Kicinski @ 2025-02-07 21:51 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Andy Gospodarek, Jason Gunthorpe, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On Fri, 7 Feb 2025 12:25:28 -0800 Saeed Mahameed wrote:
> >nVidia is already refusing to add basic minoring features to their
> >upstream driver, and keeps asking its customers to migrate to libdoca.
>
> > nVidia is one of the top contributors to netdev,
That's inaccurate. I can't think of a single meaningful contribution
from nVidia's NIC team outside of your own driver in the last 2 years.
> we submit patches on a weekly basis and due to netdev mailing list
> review backlog and policy we barely make quota,
Luckily we have development statistics so we don't have to argue:
Top reviewers (cs):                  Top reviewers (msg):
   1 ( +1) [27] Meta                    1 ( +1) [68] Meta
   2 ( -1) [25] RedHat                  2 ( -1) [57] RedHat
   3 ( +1) [19] Intel                   3 ( +2) [49] Intel
   4 ( -1) [15] Andrew Lunn             4 (   ) [43] Andrew Lunn
   5 (   ) [12] Google                  5 ( -2) [32] Google
   6 ( +2) [ 5] Linaro                  6 ( +3) [13] NXP
   7 ( +3) [ 4] Oracle                  7 ( +5) [13] Oracle
Top authors (cs):                    Top authors (msg):
   1 (   ) [9] RedHat                   1 (   ) [48] Intel
   2 ( +2) [8] Google                   2 (   ) [42] RedHat
   3 ( -1) [7] Intel                    3 ( +1) [39] Meta
   4 ( -1) [7] Meta                     4 ( -1) [31] Huawei
   5 ( +2) [5] nVidia                   5 (   ) [31] nVidia
   6 ( +7) [3] Oracle                   6 (+11) [28] Oracle
   7 ( +9) [2] Linaro                   7 (+15) [23] Pengutronix
Top scores (positive):               Top scores (negative):
   1 ( +1) [329] Meta                   1 (   ) [92] Huawei
   2 ( +1) [265] Andrew Lunn            2 ( +1) [76] OpenVPN
   3 ( -2) [238] RedHat                 3 (***) [54] Pengutronix
   4 ( +3) [125] Intel                  4 ( +2) [53] Marvell
   5 ( -1) [116] Google                 5 ( +5) [50] Dent
   6 ( -1) [ 70] Linaro                 6 (***) [45] nVidia
   7 ( -1) [ 39] Broadcom               7 (+12) [43] AMD
https://lore.kernel.org/all/20250121200710.19126f7d@kernel.org/
nVidia has a negative review vs authorship score. It'd probably
be much worse if it wasn't for the work of the switch team.
> so please elaborate on what features we are refusing to do ??
nVidia likes to send these threads to my management so I need
to be careful. An issue was discovered during new platform evaluation.
That's all I'm gonna say.
> As explained above, netdev doesn't need it, but netdev subsystem also
> hosts the pci base drivers, so you are going to see fwctl patches the
> same as you see rdma and other non netdev patches flowing through
> netdev ML.
Sure, but we're deadlocked here. It may be a slight inconvenience to
redo the interface so that it's not a standalone aux bus driver. But if
you agree that netdev doesn't need it, that seems like a fairly
straightforward way to unblock your progress.
I am glad that you at least agree now that netdev doesn't need it.
* Re: [PATCH v4 00/10] Introduce fwctl subystem
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (10 preceding siblings ...)
2025-02-07 21:41 ` [PATCH v4 00/10] Introduce fwctl subystem Dan Williams
@ 2025-02-07 21:58 ` Dave Jiang
2025-02-11 9:33 ` Jonathan Cameron
2025-02-13 17:52 ` Jason Gunthorpe
2025-02-12 22:21 ` Zhu Yanjun
2025-02-13 2:30 ` Nelson, Shannon
13 siblings, 2 replies; 67+ messages in thread
From: Dave Jiang @ 2025-02-07 21:58 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On 2/6/25 5:13 PM, Jason Gunthorpe wrote:
> [
> Many people were away around the holiday period, but work is back in full
> swing now with Dave already at v3 on his CXL work over the past couple
> weeks. We are looking at a good chance of reaching this merge window. I
> will work out some shared branches with CXL and get it into linux-next
> once all three drivers can be assembled and reviews seem to be concluding.
>
> There are couple open notes
> - Greg was interested in a new name, but nobody offered any bikesheds
> - I would like a co-maintainer
I volunteer as tribute. :)
I got the CXL series rebased and tested on top of this series. So you can add
Tested-by: Dave Jiang <dave.jiang@intel.com>
for the core FWCTL bits in the series.
I'll post the CXL FWCTL series v4 shortly.
DJ
> ]
>
> fwctl is a new subsystem intended to bring some common rules and order to
> the growing pattern of exposing a secure FW interface directly to
> userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
> exposing a device for datapath operations fwctl is focused on debugging,
> configuration and provisioning of the device. It will not have the
> necessary features like interrupt delivery to support a datapath.
>
> This concept is similar to the long standing practice in the "HW" RAID
> space of having a device specific misc device to manage the RAID
> controller FW. fwctl generalizes this notion of a companion debug and
> management interface that goes along with a dataplane implemented in an
> appropriate subsystem.
>
> The need for this has reached a critical point as many users are moving to
> run lockdown enabled kernels. Several existing devices have had long
> standing tooling for management that relied on /sys/../resource0 or PCI
> config space access which is not permitted in lockdown. A major point of
> fwctl is to define and document the rules that a device must follow to
> expose a lockdown compatible RPC.
>
> Based on some discussion fwctl splits the RPCs into four categories
>
> FWCTL_RPC_CONFIGURATION
> FWCTL_RPC_DEBUG_READ_ONLY
> FWCTL_RPC_DEBUG_WRITE
> FWCTL_RPC_DEBUG_WRITE_FULL
>
> Where the latter two trigger a new TAINT_FWCTL, and the final one requires
> CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW
> would be responsible to restrict RPCs to the requested security scope,
> while the core code handles the tainting and CAP checks.
>
> For details see the final patch which introduces the documentation.
>
> The CXL FWCTL driver is now in its own series on v3:
> https://lore.kernel.org/r/20250204220430.4146187-1-dave.jiang@intel.com
>
> I'm expecting a 3rd driver (from Shannon @ Pensando) to be posted right
> away, the github version I saw looked good. I've got soft commitments for
> about 6 drivers in total now.
>
> There have been three LWN articles written discussing various aspects of
> this proposal:
>
> https://lwn.net/Articles/955001/
> https://lwn.net/Articles/969383/
> https://lwn.net/Articles/990802/
>
> A really giant ksummit thread preceding a discussion at the Maintainer
> Summit:
>
> https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/
>
> Several have expressed general support for this concept:
>
> AMD/Pensando - https://lore.kernel.org/linux-rdma/20241205222818.44439-1-shannon.nelson@amd.com
> Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
> Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org
> Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org
> NVIDIA Networking
> Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
> Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
> SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com
>
> Work is ongoing for userspace, currently the mellanox tool suite has been
> ported over:
> https://github.com/Mellanox/mstflint
>
> And a more simplified example how to use it:
> https://github.com/jgunthorpe/mlx5ctl.git
>
> This is on github: https://github.com/jgunthorpe/linux/commits/fwctl
>
> v4:
> - Rebase to v6.14-rc1
> - Fine tune comments and rst documentation
> - Adjust cleanup.h usage - remove places that add more obfuscation than
> value
> - CXL is back to its own independent series
> - Increase FWCTL_MAX_DEVICES to 4096, someone hit the limit
> - Fix mlx5ctl_validate_rpc() logic around scope checking
> - Disable mlx5ctl on SFs
> v3: https://patch.msgid.link/r/0-v3-960f17f90f17+516-fwctl_jgg@nvidia.com
> - Rebase to v6.11-rc4
> - Add a squashed version of David's CXL series as the 2nd driver
> - Add missing includes
> - Improve comments based on feedback
> - Use the kdoc format that puts the member docs inside the struct
> - Rewrite fwctl_alloc_device() to be clearer
> - Incorporate all remarks for the documentation
> v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com
> - Rebase to v6.10-rc5
> - Minor style changes
> - Follow the style consensus for the guard stuff
> - Documentation grammar/spelling
> - Add missed length output for mlx5 get_info
> - Add two more missed MLX5 CMD's
> - Collect tags
> v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com
>
> Andy Gospodarek (2):
> fwctl/bnxt: Support communicating with bnxt fw
> bnxt: Create an auxiliary device for fwctl_bnxt
>
> Jason Gunthorpe (6):
> fwctl: Add basic structure for a class subsystem with a cdev
> fwctl: Basic ioctl dispatch for the character device
> fwctl: FWCTL_INFO to return basic information about the device
> taint: Add TAINT_FWCTL
> fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
> fwctl: Add documentation
>
> Saeed Mahameed (2):
> fwctl/mlx5: Support for communicating with mlx5 fw
> mlx5: Create an auxiliary device for fwctl_mlx5
>
> Documentation/admin-guide/tainted-kernels.rst | 5 +
> Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++
> Documentation/userspace-api/fwctl/index.rst | 12 +
> Documentation/userspace-api/index.rst | 1 +
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> MAINTAINERS | 16 +
> drivers/Kconfig | 2 +
> drivers/Makefile | 1 +
> drivers/fwctl/Kconfig | 32 ++
> drivers/fwctl/Makefile | 6 +
> drivers/fwctl/bnxt/Makefile | 4 +
> drivers/fwctl/bnxt/bnxt.c | 167 +++++++
> drivers/fwctl/main.c | 416 ++++++++++++++++++
> drivers/fwctl/mlx5/Makefile | 4 +
> drivers/fwctl/mlx5/main.c | 340 ++++++++++++++
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 +
> drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 126 +++++-
> drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 4 +
> drivers/net/ethernet/mellanox/mlx5/core/dev.c | 9 +
> include/linux/fwctl.h | 135 ++++++
> include/linux/panic.h | 3 +-
> include/uapi/fwctl/bnxt.h | 27 ++
> include/uapi/fwctl/fwctl.h | 140 ++++++
> include/uapi/fwctl/mlx5.h | 36 ++
> kernel/panic.c | 1 +
> tools/debugging/kernel-chktaint | 8 +
> 27 files changed, 1782 insertions(+), 5 deletions(-)
> create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
> create mode 100644 Documentation/userspace-api/fwctl/index.rst
> create mode 100644 drivers/fwctl/Kconfig
> create mode 100644 drivers/fwctl/Makefile
> create mode 100644 drivers/fwctl/bnxt/Makefile
> create mode 100644 drivers/fwctl/bnxt/bnxt.c
> create mode 100644 drivers/fwctl/main.c
> create mode 100644 drivers/fwctl/mlx5/Makefile
> create mode 100644 drivers/fwctl/mlx5/main.c
> create mode 100644 include/linux/fwctl.h
> create mode 100644 include/uapi/fwctl/bnxt.h
> create mode 100644 include/uapi/fwctl/fwctl.h
> create mode 100644 include/uapi/fwctl/mlx5.h
>
>
> base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 15:36 ` Jakub Kicinski
2025-02-07 20:25 ` Saeed Mahameed
@ 2025-02-07 23:29 ` Andy Gospodarek
2025-02-08 0:08 ` Jakub Kicinski
1 sibling, 1 reply; 67+ messages in thread
From: Andy Gospodarek @ 2025-02-07 23:29 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jason Gunthorpe, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon, Michael Chan
On Fri, Feb 7, 2025 at 10:36 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 6 Feb 2025 22:17:58 -0500 Andy Gospodarek wrote:
> > On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote:
> > > > From: Andy Gospodarek <gospo@broadcom.com>
> > > >
> > > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > >
> > > This is only needed for RDMA, why can't you make this part of bnxt_re ?
> >
> > This is not just needed for RDMA, so having the aux device for fwctl
> > as part of the base driver is preferred.
>
> Please elaborate. As you well know I have experience using Broadcom
> devices in large TCP/IP networks, without the need for proprietary
> tooling.
I totally get that. As a user it is not satisfying to have to
download and attempt to compile complicated proprietary tools to use
hardware features that seem like they should just work. I don't think
fwctl should be used as a crutch to avoid doing the work that is
needed to get support upstream.
> Now, I understand that it may be expedient for Broadcom and nVidia
> to skip the upstream process and ship "features" to customers using
> DOCA and whatever you call your tooling. But let's be honest that
> this is the motivation here. Unified support for proprietary tooling
> across subsystems and product lines for a given vendor. This way
> migrating from in-tree networking to proprietary IPU/DPU networking
> is easier, while migrating to another vendor would require full tooling
> replacement.
>
> I have nothing against RDMA and CXL subsystems adding whatever APIs
> they want. But I don't understand why you think it's okay to force
> this on normal networking, which does not need it.
>
> nVidia is already refusing to add basic minoring features to their
> upstream driver, and keeps asking its customers to migrate to libdoca.
> So the concern that merging this will negatively impact standard
> tooling is no longer theoretical.
>
> Anyway, rant over. Give us some technical details.
The primary use-case that I find valuable is the ability to perform
debug of different parts of a hardware pipeline when devices are
already in the field. This could be the standard ethernet pipeline,
RoCE, crypto, etc.
We do have the ability to gather all the information we need via tools
like ethtool and devlink, but there are cases where running a tool in
real-time can help us know what is happening in a system on a per
packet basis. We actually did something like this this week.
When I look at fwctl, I don't see it as something that is valuable
only today -- I see it as something that is valuable 2 years from now.
When someone is still running v6.17 and we discover that the debug
counter/infra they need was added in v7.0, they cannot use it without
installing a new kernel or an OOB driver, and we don't have an option
to easily help narrow down the problem.
If a fairly simple tool can help perform RPC to FW to glean some of
this hardware information we save days of back and forth debugging
with special drivers to try and help narrow down the issue.
* Re: [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev
2025-02-07 0:13 ` [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
@ 2025-02-07 23:32 ` Dan Williams
2025-02-07 23:55 ` Jason Gunthorpe
2025-02-08 0:08 ` Dave Jiang
1 sibling, 1 reply; 67+ messages in thread
From: Dan Williams @ 2025-02-07 23:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
Jason Gunthorpe wrote:
> Create the class, character device and functions for a fwctl driver to
> un/register to the subsystem.
>
> A typical fwctl driver has a sysfs presence like:
>
> $ ls -l /dev/fwctl/fwctl0
> crw------- 1 root root 250, 0 Apr 25 19:16 /dev/fwctl/fwctl0
>
> $ ls /sys/class/fwctl/fwctl0
> dev device power subsystem uevent
>
> $ ls /sys/class/fwctl/fwctl0/device/infiniband/
> ibp0s10f0
>
> $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
> fwctl0/
>
> $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
> dev device power subsystem uevent
>
> Which allows userspace to link all the multi-subsystem driver components
> together and learn the subsystem specific names for the device's
> components.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[..]
> +struct fwctl_device *_fwctl_alloc_device(struct device *parent,
> + const struct fwctl_ops *ops,
> + size_t size);
> +/**
> + * fwctl_alloc_device - Allocate a fwctl
> + * @parent: Physical device that provides the FW interface
> + * @ops: Driver ops to register
> + * @drv_struct: 'struct driver_fwctl' that holds the struct fwctl_device
> + * @member: Name of the struct fwctl_device in @drv_struct
> + *
> + * This allocates and initializes the fwctl_device embedded in the drv_struct.
> + * Upon success the pointer must be freed via fwctl_put(). Returns a 'drv_struct
> + * \*' on success, NULL on error.
> + */
> +#define fwctl_alloc_device(parent, ops, drv_struct, member) \
> + ({ \
> + static_assert(__same_type(struct fwctl_device, \
> + ((drv_struct *)NULL)->member)); \
> + static_assert(offsetof(drv_struct, member) == 0); \
> + (drv_struct *)_fwctl_alloc_device(parent, ops, \
> + sizeof(drv_struct)); \
> + })
I have already suggested someone else copy this approach to context
allocation. What do you think of generalizing this in
include/linux/container_of.h as:
#define container_alloc(core_struct, drv_struct, member, alloc_fn, ...) \
({ \
static_assert(__same_type(core_struct, \
((drv_struct *)NULL)->member)); \
static_assert(offsetof(drv_struct, member) == 0); \
(drv_struct *)(alloc_fn)(sizeof(drv_struct), __VA_ARGS__); \
})
...and then fwctl_alloc_device becomes:
#define fwctl_alloc_device(parent, ops, drv_struct, member) \
container_alloc(struct fwctl_device, drv_struct, member, \
_fwctl_alloc_device, parent, ops)
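For illustration only, here is a minimal userspace sketch of the
container_alloc() idea, assuming GCC or Clang (it relies on statement
expressions and __builtin_types_compatible_p). The names same_type,
struct core_dev, core_alloc(), struct my_drv, my_drv_alloc() and
container_alloc_demo() are hypothetical stand-ins and are not part of the
patch. Note that container_alloc() as proposed passes sizeof(drv_struct) as
the first argument, while _fwctl_alloc_device() in the patch takes the size
last, so one of the two would presumably need to change; the sketch uses a
size-first allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's __same_type() helper. */
#define same_type(a, b) \
	__builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/*
 * Generic form from the proposal. The allocator is called with the size
 * of the driver struct first, followed by the remaining arguments.
 */
#define container_alloc(core_struct, drv_struct, member, alloc_fn, ...)   \
	({                                                                 \
		static_assert(same_type(core_struct,                       \
					((drv_struct *)NULL)->member),     \
			      "member has the wrong type");                \
		static_assert(offsetof(drv_struct, member) == 0,           \
			      "member must be the first field");           \
		(drv_struct *)(alloc_fn)(sizeof(drv_struct), __VA_ARGS__); \
	})

/* Hypothetical core object, playing the role of struct fwctl_device. */
struct core_dev {
	const char *name;
	size_t alloc_size;
};

/* Hypothetical core allocator; it allocates the whole driver struct. */
static struct core_dev *core_alloc(size_t size, const char *name)
{
	struct core_dev *core = calloc(1, size);

	if (!core)
		return NULL;
	core->name = name;
	core->alloc_size = size;
	return core;
}

/* Hypothetical driver struct embedding the core object at offset 0. */
struct my_drv {
	struct core_dev core;
	int private_state;
};

#define my_drv_alloc(name) \
	container_alloc(struct core_dev, struct my_drv, core, core_alloc, name)

/* Returns 0 if the allocation covers the whole driver struct. */
int container_alloc_demo(void)
{
	struct my_drv *drv = my_drv_alloc("demo0");
	int ret;

	if (!drv)
		return -1;
	drv->private_state = 42;
	ret = drv->core.alloc_size == sizeof(*drv) ? 0 : -1;
	free(drv);
	return ret;
}
```

Moving the core member away from offset 0, or changing its type, turns the
mistake into a build failure instead of a bad cast at runtime, which is the
point of the two static_assert() checks.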
Either way, you can add:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev
2025-02-07 23:32 ` Dan Williams
@ 2025-02-07 23:55 ` Jason Gunthorpe
0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 23:55 UTC (permalink / raw)
To: Dan Williams
Cc: Andy Gospodarek, Aron Silverton, Daniel Vetter, Dave Jiang,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Fri, Feb 07, 2025 at 03:32:00PM -0800, Dan Williams wrote:
> > +#define fwctl_alloc_device(parent, ops, drv_struct, member) \
> > + ({ \
> > + static_assert(__same_type(struct fwctl_device, \
> > + ((drv_struct *)NULL)->member)); \
> > + static_assert(offsetof(drv_struct, member) == 0); \
> > + (drv_struct *)_fwctl_alloc_device(parent, ops, \
> > + sizeof(drv_struct)); \
> > + })
>
> I have already suggested someone else copy this approach to context
> allocation. What do you think of generalizing this in
> include/linux/container_of.h as:
I also have several places doing that in iommufd, and I think we
have a variation in rdma as well.
Let me suggest we go around after the fact and propose a consolidation
patch. I think it will be easier to understand that way.
> #define container_alloc(core_struct, drv_struct, member, alloc_fn, ...) \
> ({ \
> static_assert(__same_type(core_struct, \
> ((drv_struct *)NULL)->member)); \
> static_assert(offsetof(drv_struct, member) == 0); \
> (drv_struct *)(alloc_fn)(sizeof(drv_struct), __VA_ARGS__); \
> })
It makes sense to me
Jason
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 23:29 ` Andy Gospodarek
@ 2025-02-08 0:08 ` Jakub Kicinski
0 siblings, 0 replies; 67+ messages in thread
From: Jakub Kicinski @ 2025-02-08 0:08 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Jason Gunthorpe, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon, Michael Chan
On Fri, 7 Feb 2025 18:29:15 -0500 Andy Gospodarek wrote:
> The primary use-case that I find valuable is the ability to perform
> debug of different parts of a hardware pipeline when devices are
> already in the field.
I think we covered the "debug by people who have access to the RTL"
in previous iterations. For networking that's not a sufficient use
case for a backdoor this powerful, sorry.
* Re: [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev
2025-02-07 0:13 ` [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2025-02-07 23:32 ` Dan Williams
@ 2025-02-08 0:08 ` Dave Jiang
1 sibling, 0 replies; 67+ messages in thread
From: Dave Jiang @ 2025-02-08 0:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On 2/6/25 5:13 PM, Jason Gunthorpe wrote:
> Create the class, character device and functions for a fwctl driver to
> un/register to the subsystem.
>
> A typical fwctl driver has a sysfs presence like:
>
> $ ls -l /dev/fwctl/fwctl0
> crw------- 1 root root 250, 0 Apr 25 19:16 /dev/fwctl/fwctl0
>
> $ ls /sys/class/fwctl/fwctl0
> dev device power subsystem uevent
>
> $ ls /sys/class/fwctl/fwctl0/device/infiniband/
> ibp0s10f0
>
> $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
> fwctl0/
>
> $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
> dev device power subsystem uevent
>
> Which allows userspace to link all the multi-subsystem driver components
> together and learn the subsystem specific names for the device's
> components.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> MAINTAINERS | 8 ++
> drivers/Kconfig | 2 +
> drivers/Makefile | 1 +
> drivers/fwctl/Kconfig | 9 +++
> drivers/fwctl/Makefile | 4 +
> drivers/fwctl/main.c | 170 +++++++++++++++++++++++++++++++++++++++++
> include/linux/fwctl.h | 69 +++++++++++++++++
> 7 files changed, 263 insertions(+)
> create mode 100644 drivers/fwctl/Kconfig
> create mode 100644 drivers/fwctl/Makefile
> create mode 100644 drivers/fwctl/main.c
> create mode 100644 include/linux/fwctl.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 896a307fa06545..ff418a77f39e4d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9557,6 +9557,14 @@ F: kernel/futex/*
> F: tools/perf/bench/futex*
> F: tools/testing/selftests/futex/
>
> +FWCTL SUBSYSTEM
> +M: Jason Gunthorpe <jgg@nvidia.com>
> +M: Saeed Mahameed <saeedm@nvidia.com>
> +S: Maintained
> +F: Documentation/userspace-api/fwctl.rst
> +F: drivers/fwctl/
> +F: include/linux/fwctl.h
> +
> GALAXYCORE GC0308 CAMERA SENSOR DRIVER
> M: Sebastian Reichel <sre@kernel.org>
> L: linux-media@vger.kernel.org
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 7bdad836fc6207..7c556c5ac4fddc 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -21,6 +21,8 @@ source "drivers/connector/Kconfig"
>
> source "drivers/firmware/Kconfig"
>
> +source "drivers/fwctl/Kconfig"
> +
> source "drivers/gnss/Kconfig"
>
> source "drivers/mtd/Kconfig"
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 45d1c3e630f754..b5749cf67044ce 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -135,6 +135,7 @@ obj-y += ufs/
> obj-$(CONFIG_MEMSTICK) += memstick/
> obj-$(CONFIG_INFINIBAND) += infiniband/
> obj-y += firmware/
> +obj-$(CONFIG_FWCTL) += fwctl/
> obj-$(CONFIG_CRYPTO) += crypto/
> obj-$(CONFIG_SUPERH) += sh/
> obj-y += clocksource/
> diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
> new file mode 100644
> index 00000000000000..37147a695add9a
> --- /dev/null
> +++ b/drivers/fwctl/Kconfig
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menuconfig FWCTL
> + tristate "fwctl device firmware access framework"
> + help
> + fwctl provides a userspace API for restricted access to communicate
> + with on-device firmware. The communication channel is intended to
> + support a wide range of lockdown compatible device behaviors including
> + manipulating device FLASH, debugging, and other activities that don't
> + fit neatly into an existing subsystem.
> diff --git a/drivers/fwctl/Makefile b/drivers/fwctl/Makefile
> new file mode 100644
> index 00000000000000..1cad210f6ba580
> --- /dev/null
> +++ b/drivers/fwctl/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_FWCTL) += fwctl.o
> +
> +fwctl-y += main.o
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> new file mode 100644
> index 00000000000000..34946bdc3bf3d7
> --- /dev/null
> +++ b/drivers/fwctl/main.c
> @@ -0,0 +1,170 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
> + */
> +#define pr_fmt(fmt) "fwctl: " fmt
> +#include <linux/fwctl.h>
> +
> +#include <linux/container_of.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +
> +enum {
> + FWCTL_MAX_DEVICES = 4096,
> +};
> +static_assert(FWCTL_MAX_DEVICES < (1U << MINORBITS));
> +
> +static dev_t fwctl_dev;
> +static DEFINE_IDA(fwctl_ida);
> +
> +static int fwctl_fops_open(struct inode *inode, struct file *filp)
> +{
> + struct fwctl_device *fwctl =
> + container_of(inode->i_cdev, struct fwctl_device, cdev);
> +
> + get_device(&fwctl->dev);
> + filp->private_data = fwctl;
> + return 0;
> +}
> +
> +static int fwctl_fops_release(struct inode *inode, struct file *filp)
> +{
> + struct fwctl_device *fwctl = filp->private_data;
> +
> + fwctl_put(fwctl);
> + return 0;
> +}
> +
> +static const struct file_operations fwctl_fops = {
> + .owner = THIS_MODULE,
> + .open = fwctl_fops_open,
> + .release = fwctl_fops_release,
> +};
> +
> +static void fwctl_device_release(struct device *device)
> +{
> + struct fwctl_device *fwctl =
> + container_of(device, struct fwctl_device, dev);
> +
> + ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
> + kfree(fwctl);
> +}
> +
> +static char *fwctl_devnode(const struct device *dev, umode_t *mode)
> +{
> + return kasprintf(GFP_KERNEL, "fwctl/%s", dev_name(dev));
> +}
> +
> +static struct class fwctl_class = {
> + .name = "fwctl",
> + .dev_release = fwctl_device_release,
> + .devnode = fwctl_devnode,
> +};
> +
> +static struct fwctl_device *
> +_alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> +{
> + struct fwctl_device *fwctl __free(kfree) = kzalloc(size, GFP_KERNEL);
> + int devnum;
> +
> + if (!fwctl)
> + return NULL;
> +
> + fwctl->dev.class = &fwctl_class;
> + fwctl->dev.parent = parent;
> +
> + devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
> + if (devnum < 0)
> + return NULL;
> + fwctl->dev.devt = fwctl_dev + devnum;
> +
> + device_initialize(&fwctl->dev);
> + return_ptr(fwctl);
> +}
> +
> +/* Drivers use the fwctl_alloc_device() wrapper */
> +struct fwctl_device *_fwctl_alloc_device(struct device *parent,
> + const struct fwctl_ops *ops,
> + size_t size)
> +{
> + struct fwctl_device *fwctl __free(fwctl) =
> + _alloc_device(parent, ops, size);
> +
> + if (!fwctl)
> + return NULL;
> +
> + cdev_init(&fwctl->cdev, &fwctl_fops);
> + /*
> + * The driver module is protected by fwctl_register/unregister(),
> + * unregister won't complete until we are done with the driver's module.
> + */
> + fwctl->cdev.owner = THIS_MODULE;
> +
> + if (dev_set_name(&fwctl->dev, "fwctl%d", fwctl->dev.devt - fwctl_dev))
> + return NULL;
> +
> + fwctl->ops = ops;
> + return_ptr(fwctl);
> +}
> +EXPORT_SYMBOL_NS_GPL(_fwctl_alloc_device, "FWCTL");
> +
> +/**
> + * fwctl_register - Register a new device to the subsystem
> + * @fwctl: Previously allocated fwctl_device
> + *
> + * On return the device is visible through sysfs and /dev, driver ops may be
> + * called.
> + */
> +int fwctl_register(struct fwctl_device *fwctl)
> +{
> + return cdev_device_add(&fwctl->cdev, &fwctl->dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(fwctl_register, "FWCTL");
> +
> +/**
> + * fwctl_unregister - Unregister a device from the subsystem
> + * @fwctl: Previously allocated and registered fwctl_device
> + *
> + * Undoes fwctl_register(). On return no driver ops will be called. The
> + * caller must still call fwctl_put() to free the fwctl.
> + *
> + * The design of fwctl allows this sort of disassociation of the driver from the
> + * subsystem primarily by keeping memory allocations owned by the core subsystem.
> + * The fwctl_device and fwctl_uctx can both be freed without requiring a driver
> + * callback. This allows the module to remain unlocked while FDs are open.
> + */
> +void fwctl_unregister(struct fwctl_device *fwctl)
> +{
> + cdev_device_del(&fwctl->cdev, &fwctl->dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(fwctl_unregister, "FWCTL");
> +
> +static int __init fwctl_init(void)
> +{
> + int ret;
> +
> + ret = alloc_chrdev_region(&fwctl_dev, 0, FWCTL_MAX_DEVICES, "fwctl");
> + if (ret)
> + return ret;
> +
> + ret = class_register(&fwctl_class);
> + if (ret)
> + goto err_chrdev;
> + return 0;
> +
> +err_chrdev:
> + unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
> + return ret;
> +}
> +
> +static void __exit fwctl_exit(void)
> +{
> + class_unregister(&fwctl_class);
> + unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
> +}
> +
> +module_init(fwctl_init);
> +module_exit(fwctl_exit);
> +MODULE_DESCRIPTION("fwctl device firmware access framework");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> new file mode 100644
> index 00000000000000..68ac2d5ab87481
> --- /dev/null
> +++ b/include/linux/fwctl.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
> + */
> +#ifndef __LINUX_FWCTL_H
> +#define __LINUX_FWCTL_H
> +#include <linux/device.h>
> +#include <linux/cdev.h>
> +#include <linux/cleanup.h>
> +
> +struct fwctl_device;
> +struct fwctl_uctx;
> +
> +struct fwctl_ops {
> +};
> +
> +/**
> + * struct fwctl_device - Per-driver registration struct
> + * @dev: The sysfs (class/fwctl/fwctlXX) device
> + *
> + * Each driver instance will have one of these structs with the driver private
> + * data following immediately after. This struct is refcounted, it is freed by
> + * calling fwctl_put().
> + */
> +struct fwctl_device {
> + struct device dev;
> + /* private: */
> + struct cdev cdev;
> + const struct fwctl_ops *ops;
> +};
> +
> +struct fwctl_device *_fwctl_alloc_device(struct device *parent,
> + const struct fwctl_ops *ops,
> + size_t size);
> +/**
> + * fwctl_alloc_device - Allocate a fwctl
> + * @parent: Physical device that provides the FW interface
> + * @ops: Driver ops to register
> + * @drv_struct: 'struct driver_fwctl' that holds the struct fwctl_device
> + * @member: Name of the struct fwctl_device in @drv_struct
> + *
> + * This allocates and initializes the fwctl_device embedded in the drv_struct.
> + * Upon success the pointer must be freed via fwctl_put(). Returns a 'drv_struct
> + * \*' on success, NULL on error.
> + */
> +#define fwctl_alloc_device(parent, ops, drv_struct, member) \
> + ({ \
> + static_assert(__same_type(struct fwctl_device, \
> + ((drv_struct *)NULL)->member)); \
> + static_assert(offsetof(drv_struct, member) == 0); \
> + (drv_struct *)_fwctl_alloc_device(parent, ops, \
> + sizeof(drv_struct)); \
> + })
> +
> +static inline struct fwctl_device *fwctl_get(struct fwctl_device *fwctl)
> +{
> + get_device(&fwctl->dev);
> + return fwctl;
> +}
> +static inline void fwctl_put(struct fwctl_device *fwctl)
> +{
> + put_device(&fwctl->dev);
> +}
> +DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
> +
> +int fwctl_register(struct fwctl_device *fwctl);
> +void fwctl_unregister(struct fwctl_device *fwctl);
> +
> +#endif
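The allocation pattern behind fwctl_alloc_device() can be tried outside the kernel. The following is only a userspace sketch of the idea, not the kernel code: struct base_device, base_alloc(), base_alloc_device(), struct demo_driver and demo_alloc() are hypothetical stand-ins, calloc() replaces kzalloc(), and the macro relies on the GNU statement-expression extension just as the kernel macro does. It shows why the embedded core struct has to sit at offset 0: the pointer returned by the core allocator is cast directly to the driver's type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fwctl_device (the real one embeds a struct device). */
struct base_device {
	int refcount;
	const void *ops;
};

/* Userspace analog of _fwctl_alloc_device(): one zeroed allocation for the whole driver struct. */
static struct base_device *base_alloc(const void *ops, size_t size)
{
	struct base_device *base;

	/* The driver struct must at least contain the core struct. */
	assert(size >= sizeof(*base));

	base = calloc(1, size);
	if (!base)
		return NULL;
	base->refcount = 1;
	base->ops = ops;
	return base;
}

/*
 * Analog of fwctl_alloc_device(). The compile-time check guarantees that a
 * pointer to the core struct is also a pointer to the driver struct, which
 * is what makes the cast below valid.
 */
#define base_alloc_device(ops, drv_struct, member)                         \
	({                                                                  \
		_Static_assert(offsetof(drv_struct, member) == 0,           \
			       "core struct must be the first member");     \
		(drv_struct *)base_alloc((ops), sizeof(drv_struct));        \
	})

/* Hypothetical driver struct: core struct first, private data after it. */
struct demo_driver {
	struct base_device base;
	int private_value;
};

struct demo_driver *demo_alloc(void)
{
	return base_alloc_device(NULL, struct demo_driver, base);
}
```

A caller would use demo_alloc() and later free() the result. In the patch the equivalent release happens through fwctl_put(), which drops the struct device reference so the core, not the driver, ends up freeing the memory.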
* Re: [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device
2025-02-07 0:13 ` [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
2025-02-07 12:59 ` Jonathan Cameron
@ 2025-02-08 0:16 ` Dave Jiang
2025-02-10 15:24 ` Jason Gunthorpe
2025-02-13 12:42 ` Przemek Kitszel
2 siblings, 1 reply; 67+ messages in thread
From: Dave Jiang @ 2025-02-08 0:16 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On 2/6/25 5:13 PM, Jason Gunthorpe wrote:
> Each file descriptor gets a chunk of per-FD driver specific context to
> which the driver can attach a device specific struct. The core code
> takes care of the memory lifetime for this structure.
>
> The ioctl dispatch and design is based on what was built for iommufd. The
> ioctls have a struct which has a combined in/out behavior with a typical
> 'zero pad' scheme for future extension and backwards compatibility.
>
> Like iommufd some shared logic does most of the ioctl marshalling and
s/marshalling/marshaling/
> compatibility work and tables diatches to some function pointers for
s/diatches/dispatches/
> each unique iotcl.
s/iotcl/ioctl/
>
> This approach has proven to work quite well in the iommufd and rdma
> subsystems.
>
> Allocate an ioctl number space for the subsystem.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> MAINTAINERS | 1 +
> drivers/fwctl/main.c | 145 +++++++++++++++++-
> include/linux/fwctl.h | 46 ++++++
> include/uapi/fwctl/fwctl.h | 38 +++++
> 5 files changed, 226 insertions(+), 5 deletions(-)
> create mode 100644 include/uapi/fwctl/fwctl.h
>
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 6d1465315df328..3410b020a9d093 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -331,6 +331,7 @@ Code Seq# Include File Comments
> 0x97 00-7F fs/ceph/ioctl.h Ceph file system
> 0x99 00-0F 537-Addinboard driver
> <mailto:buk@buks.ipn.de>
> +0x9A 00-0F include/uapi/fwctl/fwctl.h
> 0xA0 all linux/sdp/sdp.h Industrial Device Project
> <mailto:kenji@bitgate.com>
> 0xA1 0 linux/vtpm_proxy.h TPM Emulator Proxy Driver
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ff418a77f39e4d..5f30adbe6c8521 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9564,6 +9564,7 @@ S: Maintained
> F: Documentation/userspace-api/fwctl.rst
> F: drivers/fwctl/
> F: include/linux/fwctl.h
> +F: include/uapi/fwctl/
>
> GALAXYCORE GC0308 CAMERA SENSOR DRIVER
> M: Sebastian Reichel <sre@kernel.org>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index 34946bdc3bf3d7..d561deaf2b86d8 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -10,6 +10,8 @@
> #include <linux/module.h>
> #include <linux/slab.h>
>
> +#include <uapi/fwctl/fwctl.h>
> +
> enum {
> FWCTL_MAX_DEVICES = 4096,
> };
> @@ -18,20 +20,128 @@ static_assert(FWCTL_MAX_DEVICES < (1U << MINORBITS));
> static dev_t fwctl_dev;
> static DEFINE_IDA(fwctl_ida);
>
> +struct fwctl_ucmd {
> + struct fwctl_uctx *uctx;
> + void __user *ubuffer;
> + void *cmd;
> + u32 user_size;
> +};
> +
> +/* On stack memory for the ioctl structs */
> +union ucmd_buffer {
> +};
> +
> +struct fwctl_ioctl_op {
> + unsigned int size;
> + unsigned int min_size;
> + unsigned int ioctl_num;
> + int (*execute)(struct fwctl_ucmd *ucmd);
> +};
> +
> +#define IOCTL_OP(_ioctl, _fn, _struct, _last) \
> + [_IOC_NR(_ioctl) - FWCTL_CMD_BASE] = { \
> + .size = sizeof(_struct) + \
> + BUILD_BUG_ON_ZERO(sizeof(union ucmd_buffer) < \
> + sizeof(_struct)), \
> + .min_size = offsetofend(_struct, _last), \
> + .ioctl_num = _ioctl, \
> + .execute = _fn, \
> + }
> +static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
> +};
> +
> +static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
> + unsigned long arg)
> +{
> + struct fwctl_uctx *uctx = filp->private_data;
> + const struct fwctl_ioctl_op *op;
> + struct fwctl_ucmd ucmd = {};
> + union ucmd_buffer buf;
> + unsigned int nr;
> + int ret;
> +
> + nr = _IOC_NR(cmd);
> + if ((nr - FWCTL_CMD_BASE) >= ARRAY_SIZE(fwctl_ioctl_ops))
> + return -ENOIOCTLCMD;
> +
> + op = &fwctl_ioctl_ops[nr - FWCTL_CMD_BASE];
> + if (op->ioctl_num != cmd)
> + return -ENOIOCTLCMD;
> +
> + ucmd.uctx = uctx;
> + ucmd.cmd = &buf;
> + ucmd.ubuffer = (void __user *)arg;
> + ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
> + if (ret)
> + return ret;
> +
> + if (ucmd.user_size < op->min_size)
> + return -EINVAL;
> +
> + ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
> + ucmd.user_size);
> + if (ret)
> + return ret;
> +
> + guard(rwsem_read)(&uctx->fwctl->registration_lock);
> + if (!uctx->fwctl->ops)
> + return -ENODEV;
> + return op->execute(&ucmd);
> +}
> +
> static int fwctl_fops_open(struct inode *inode, struct file *filp)
> {
> struct fwctl_device *fwctl =
> container_of(inode->i_cdev, struct fwctl_device, cdev);
> + int ret;
> +
> + guard(rwsem_read)(&fwctl->registration_lock);
> + if (!fwctl->ops)
> + return -ENODEV;
> +
> + struct fwctl_uctx *uctx __free(kfree) =
> + kzalloc(fwctl->ops->uctx_size, GFP_KERNEL_ACCOUNT);
> + if (!uctx)
> + return -ENOMEM;
> +
> + uctx->fwctl = fwctl;
> + ret = fwctl->ops->open_uctx(uctx);
> + if (ret)
> + return ret;
> +
> + scoped_guard(mutex, &fwctl->uctx_list_lock) {
> + list_add_tail(&uctx->uctx_list_entry, &fwctl->uctx_list);
> + }
>
> get_device(&fwctl->dev);
> - filp->private_data = fwctl;
> + filp->private_data = no_free_ptr(uctx);
> return 0;
> }
>
> +static void fwctl_destroy_uctx(struct fwctl_uctx *uctx)
> +{
> + lockdep_assert_held(&uctx->fwctl->uctx_list_lock);
> + list_del(&uctx->uctx_list_entry);
> + uctx->fwctl->ops->close_uctx(uctx);
> +}
> +
> static int fwctl_fops_release(struct inode *inode, struct file *filp)
> {
> - struct fwctl_device *fwctl = filp->private_data;
> + struct fwctl_uctx *uctx = filp->private_data;
> + struct fwctl_device *fwctl = uctx->fwctl;
>
> + scoped_guard(rwsem_read, &fwctl->registration_lock) {
> + /*
> + * fwctl_unregister() has already removed the driver and
> + * destroyed the uctx.
> + */
> + if (fwctl->ops) {
> + guard(mutex)(&fwctl->uctx_list_lock);
> + fwctl_destroy_uctx(uctx);
> + }
> + }
> +
> + kfree(uctx);
> fwctl_put(fwctl);
> return 0;
> }
> @@ -40,6 +150,7 @@ static const struct file_operations fwctl_fops = {
> .owner = THIS_MODULE,
> .open = fwctl_fops_open,
> .release = fwctl_fops_release,
> + .unlocked_ioctl = fwctl_fops_ioctl,
> };
>
> static void fwctl_device_release(struct device *device)
> @@ -48,6 +159,7 @@ static void fwctl_device_release(struct device *device)
> container_of(device, struct fwctl_device, dev);
>
> ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
> + mutex_destroy(&fwctl->uctx_list_lock);
> kfree(fwctl);
> }
>
> @@ -71,14 +183,17 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> if (!fwctl)
> return NULL;
>
> - fwctl->dev.class = &fwctl_class;
> - fwctl->dev.parent = parent;
> -
> devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
> if (devnum < 0)
> return NULL;
> fwctl->dev.devt = fwctl_dev + devnum;
>
> + fwctl->dev.class = &fwctl_class;
> + fwctl->dev.parent = parent;
> + init_rwsem(&fwctl->registration_lock);
> + mutex_init(&fwctl->uctx_list_lock);
> + INIT_LIST_HEAD(&fwctl->uctx_list);
> +
> device_initialize(&fwctl->dev);
> return_ptr(fwctl);
> }
> @@ -129,6 +244,10 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, "FWCTL");
> * Undoes fwctl_register(). On return no driver ops will be called. The
> * caller must still call fwctl_put() to free the fwctl.
> *
> + * Unregister will return even if userspace still has file descriptors open.
> + * This will call ops->close_uctx() on any open FDs and after return no driver
> + * op will be called. The FDs remain open but all fops will return -ENODEV.
> + *
> * The design of fwctl allows this sort of disassociation of the driver from the
> * subsystem primarily by keeping memory allocations owned by the core subsystem.
> * The fwctl_device and fwctl_uctx can both be freed without requiring a driver
> @@ -136,7 +255,23 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, "FWCTL");
> */
> void fwctl_unregister(struct fwctl_device *fwctl)
> {
> + struct fwctl_uctx *uctx;
> +
> cdev_device_del(&fwctl->cdev, &fwctl->dev);
> +
> + /* Disable and free the driver's resources for any still open FDs. */
> + guard(rwsem_write)(&fwctl->registration_lock);
> + guard(mutex)(&fwctl->uctx_list_lock);
> + while ((uctx = list_first_entry_or_null(&fwctl->uctx_list,
> + struct fwctl_uctx,
> + uctx_list_entry)))
> + fwctl_destroy_uctx(uctx);
> +
> + /*
> + * The driver module may unload after this returns, the op pointer will
> + * not be valid.
> + */
> + fwctl->ops = NULL;
> }
> EXPORT_SYMBOL_NS_GPL(fwctl_unregister, "FWCTL");
>
> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> index 68ac2d5ab87481..93b470efb9dbc3 100644
> --- a/include/linux/fwctl.h
> +++ b/include/linux/fwctl.h
> @@ -11,7 +11,30 @@
> struct fwctl_device;
> struct fwctl_uctx;
>
> +/**
> + * struct fwctl_ops - Driver provided operations
> + *
> + * fwctl_unregister() will wait until all executing ops are completed before it
> + * returns. Drivers should be mindful to not let their ops run for too long as
> + * it will block device hot unplug and module unloading.
> + */
> struct fwctl_ops {
> + /**
> + * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
> + * bytes of this memory will be a fwctl_uctx. The driver can use the
> + * remaining bytes as its private memory.
> + */
> + size_t uctx_size;
> + /**
> + * @open_uctx: Called when a file descriptor is opened before the uctx
> + * is ever used.
> + */
> + int (*open_uctx)(struct fwctl_uctx *uctx);
> + /**
> + * @close_uctx: Called when the uctx is destroyed, usually when the FD
> + * is closed.
> + */
> + void (*close_uctx)(struct fwctl_uctx *uctx);
> };
>
> /**
> @@ -26,6 +49,15 @@ struct fwctl_device {
> struct device dev;
> /* private: */
> struct cdev cdev;
> +
> + /* Protect uctx_list */
> + struct mutex uctx_list_lock;
> + struct list_head uctx_list;
> + /*
> + * Protect ops, held for write when ops becomes NULL during unregister,
> + * held for read whenever ops is loaded or an ops function is running.
> + */
> + struct rw_semaphore registration_lock;
> const struct fwctl_ops *ops;
> };
>
> @@ -66,4 +98,18 @@ DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
> int fwctl_register(struct fwctl_device *fwctl);
> void fwctl_unregister(struct fwctl_device *fwctl);
>
> +/**
> + * struct fwctl_uctx - Per user FD context
> + * @fwctl: fwctl instance that owns the context
> + *
> + * Every FD opened by userspace will get a unique context allocation. Any driver
> + * private data will follow immediately after.
> + */
> +struct fwctl_uctx {
> + struct fwctl_device *fwctl;
> + /* private: */
> + /* Head at fwctl_device::uctx_list */
> + struct list_head uctx_list_entry;
> +};
> +
> #endif
> diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> new file mode 100644
> index 00000000000000..f4718a6240f281
> --- /dev/null
> +++ b/include/uapi/fwctl/fwctl.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
> + */
> +#ifndef _UAPI_FWCTL_H
> +#define _UAPI_FWCTL_H
> +
> +#define FWCTL_TYPE 0x9A
> +
> +/**
> + * DOC: General ioctl format
> + *
> + * The ioctl interface follows a general format to allow for extensibility. Each
> + * ioctl is passed a structure pointer as the argument providing the size of
> + * the structure in the first u32. The kernel checks that any structure space
> + * beyond what it understands is 0. This allows userspace to use the backward
> + * compatible portion while consistently using the newer, larger, structures.
> + *
> + * ioctls use a standard meaning for common errnos:
> + *
> + * - ENOTTY: The IOCTL number itself is not supported at all
> + * - E2BIG: The IOCTL number is supported, but the provided structure has
> + * non-zero in a part the kernel does not understand.
> + * - EOPNOTSUPP: The IOCTL number is supported, and the structure is
> + * understood, however a known field has a value the kernel does not
> + * understand or support.
> + * - EINVAL: Everything about the IOCTL was understood, but a field is not
> + * correct.
> + * - ENOMEM: Out of memory.
> + * - ENODEV: The underlying device has been hot-unplugged and the FD is
> + * orphaned.
> + *
> + * As well as additional errnos, within specific ioctls.
> + */
> +enum {
> + FWCTL_CMD_BASE = 0,
> +};
> +
> +#endif
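The size-prefixed "zero pad" scheme described in the DOC comment can be modelled in plain userspace C. This is a sketch under stated assumptions, not the kernel implementation: copy_struct_checked(), struct cmd_v1 and struct cmd_v2 are hypothetical names, and memcpy() stands in for the user copy, so the -EFAULT path is not modelled. It mirrors the checks in fwctl_fops_ioctl(): a struct shorter than min_size gives -EINVAL, non-zero bytes beyond what the kernel understands give -E2BIG, and a shorter-but-valid struct is zero extended.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical original layout of a command struct. */
struct cmd_v1 {
	uint32_t size;
	uint32_t flags;
};

/* Hypothetical later layout that appended fields to cmd_v1. */
struct cmd_v2 {
	uint32_t size;
	uint32_t flags;
	uint32_t extra;
	uint32_t reserved;
};

/*
 * Model of the size handling around copy_struct_from_user():
 *   dst/ksize  - the struct this "kernel" understands
 *   min_size   - bytes that every caller must provide
 *   src/usize  - the struct and size the caller passed
 */
int copy_struct_checked(void *dst, size_t ksize, size_t min_size,
			const void *src, size_t usize)
{
	const unsigned char *in = src;
	size_t copy = usize < ksize ? usize : ksize;
	size_t i;

	assert(min_size <= ksize);

	if (usize < min_size)
		return -EINVAL;

	/* A newer caller may pass a longer struct, but only if the tail is zero. */
	for (i = ksize; i < usize; i++)
		if (in[i])
			return -E2BIG;

	memcpy(dst, in, copy);
	/* An older caller passed a shorter struct: unseen fields read as zero. */
	memset((unsigned char *)dst + copy, 0, ksize - copy);
	return 0;
}
```

With this model an old binary built against cmd_v1 keeps working on a kernel that knows cmd_v2, and a new binary works on an old kernel as long as it leaves the fields that kernel does not know about set to zero.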
* Re: [PATCH v4 03/10] fwctl: FWCTL_INFO to return basic information about the device
2025-02-07 0:13 ` [PATCH v4 03/10] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
2025-02-07 13:06 ` Jonathan Cameron
@ 2025-02-08 0:21 ` Dave Jiang
1 sibling, 0 replies; 67+ messages in thread
From: Dave Jiang @ 2025-02-08 0:21 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On 2/6/25 5:13 PM, Jason Gunthorpe wrote:
> Userspace will need to know some details about the fwctl interface being
> used to locate the correct userspace code to communicate with the
> kernel. Provide a simple device_type enum indicating what the kernel
> driver is.
>
> Allow the device to provide a device specific info struct that contains
> any additional information that the driver may need to provide to
> userspace.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/fwctl/main.c | 51 ++++++++++++++++++++++++++++++++++++++
> include/linux/fwctl.h | 12 +++++++++
> include/uapi/fwctl/fwctl.h | 32 ++++++++++++++++++++++++
> 3 files changed, 95 insertions(+)
>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index d561deaf2b86d8..4b6792f2031e86 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -27,8 +27,58 @@ struct fwctl_ucmd {
> u32 user_size;
> };
>
> +static int ucmd_respond(struct fwctl_ucmd *ucmd, size_t cmd_len)
> +{
> + if (copy_to_user(ucmd->ubuffer, ucmd->cmd,
> + min_t(size_t, ucmd->user_size, cmd_len)))
> + return -EFAULT;
> + return 0;
> +}
> +
> +static int copy_to_user_zero_pad(void __user *to, const void *from,
> + size_t from_len, size_t user_len)
> +{
> + size_t copy_len;
> +
> + copy_len = min(from_len, user_len);
> + if (copy_to_user(to, from, copy_len))
> + return -EFAULT;
> + if (copy_len < user_len) {
> + if (clear_user(to + copy_len, user_len - copy_len))
> + return -EFAULT;
> + }
> + return 0;
> +}
> +
> +static int fwctl_cmd_info(struct fwctl_ucmd *ucmd)
> +{
> + struct fwctl_device *fwctl = ucmd->uctx->fwctl;
> + struct fwctl_info *cmd = ucmd->cmd;
> + size_t driver_info_len = 0;
> +
> + if (cmd->flags)
> + return -EOPNOTSUPP;
> +
> + if (cmd->device_data_len) {
> + void *driver_info __free(kfree) =
> + fwctl->ops->info(ucmd->uctx, &driver_info_len);
> + if (IS_ERR(driver_info))
> + return PTR_ERR(driver_info);
> +
> + if (copy_to_user_zero_pad(u64_to_user_ptr(cmd->out_device_data),
> + driver_info, driver_info_len,
> + cmd->device_data_len))
> + return -EFAULT;
> + }
> +
> + cmd->out_device_type = fwctl->ops->device_type;
> + cmd->device_data_len = driver_info_len;
> + return ucmd_respond(ucmd, sizeof(*cmd));
> +}
> +
> /* On stack memory for the ioctl structs */
> union ucmd_buffer {
> + struct fwctl_info info;
> };
>
> struct fwctl_ioctl_op {
> @@ -48,6 +98,7 @@ struct fwctl_ioctl_op {
> .execute = _fn, \
> }
> static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
> + IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
> };
>
> static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> index 93b470efb9dbc3..9b6cc8ae1aa0ca 100644
> --- a/include/linux/fwctl.h
> +++ b/include/linux/fwctl.h
> @@ -7,6 +7,7 @@
> #include <linux/device.h>
> #include <linux/cdev.h>
> #include <linux/cleanup.h>
> +#include <uapi/fwctl/fwctl.h>
>
> struct fwctl_device;
> struct fwctl_uctx;
> @@ -19,6 +20,10 @@ struct fwctl_uctx;
> * it will block device hot unplug and module unloading.
> */
> struct fwctl_ops {
> + /**
> + * @device_type: The driver's assigned device_type number. This is uABI.
> + */
> + enum fwctl_device_type device_type;
> /**
> * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
> * bytes of this memory will be a fwctl_uctx. The driver can use the
> @@ -35,6 +40,13 @@ struct fwctl_ops {
> * is closed.
> */
> void (*close_uctx)(struct fwctl_uctx *uctx);
> + /**
> + * @info: Implement FWCTL_INFO. Return kmalloc() memory that is copied
> + * to out_device_data. On input, length indicates the size of the user
> + * buffer; on output, it indicates the size of the memory. The driver can
> + * ignore length on input, the core code will handle everything.
> + */
> + void *(*info)(struct fwctl_uctx *uctx, size_t *length);
> };
>
> /**
> diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> index f4718a6240f281..ac66853200a5a8 100644
> --- a/include/uapi/fwctl/fwctl.h
> +++ b/include/uapi/fwctl/fwctl.h
> @@ -4,6 +4,9 @@
> #ifndef _UAPI_FWCTL_H
> #define _UAPI_FWCTL_H
>
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> #define FWCTL_TYPE 0x9A
>
> /**
> @@ -33,6 +36,35 @@
> */
> enum {
> FWCTL_CMD_BASE = 0,
> + FWCTL_CMD_INFO = 0,
> + FWCTL_CMD_RPC = 1,
> };
>
> +enum fwctl_device_type {
> + FWCTL_DEVICE_TYPE_ERROR = 0,
> +};
> +
> +/**
> + * struct fwctl_info - ioctl(FWCTL_INFO)
> + * @size: sizeof(struct fwctl_info)
> + * @flags: Must be 0
> + * @out_device_type: Returns the type of the device from enum fwctl_device_type
> + * @device_data_len: On input the length of the out_device_data memory. On
> + * output the size of the kernel's device_data which may be larger or
> + * smaller than the input. May be 0 on input.
> + * @out_device_data: Pointer to a memory of device_data_len bytes. Kernel will
> + * fill the entire memory, zeroing as required.
> + *
> + * Returns basic information about this fwctl instance, particularly what driver
> + * is being used to define the device_data format.
> + */
> +struct fwctl_info {
> + __u32 size;
> + __u32 flags;
> + __u32 out_device_type;
> + __u32 device_data_len;
> + __aligned_u64 out_device_data;
> +};
> +#define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)
> +
> #endif
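The output side of FWCTL_INFO (copy what fits, zero the rest of the user buffer, report the kernel's own length) can be sketched in userspace as follows. This is an illustration only: copy_out_zero_pad() is a hypothetical name, and memcpy()/memset() replace copy_to_user()/clear_user(), so fault handling is not modelled. In the patch the driver's info op is only called when the input device_data_len is non-zero, so this models that branch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of copy_to_user_zero_pad() plus the length reporting done by
 * fwctl_cmd_info():
 *   to/user_len   - the caller's buffer (out_device_data, device_data_len)
 *   from/from_len - the driver's info blob
 * Returns the driver's length, which may be larger or smaller than user_len.
 */
size_t copy_out_zero_pad(void *to, size_t user_len,
			 const void *from, size_t from_len)
{
	size_t copy_len = from_len < user_len ? from_len : user_len;

	assert(to != NULL || user_len == 0);
	assert(from != NULL || from_len == 0);

	if (copy_len)
		memcpy(to, from, copy_len);
	/* The whole user buffer is always written: pad the tail with zeros. */
	if (copy_len < user_len)
		memset((unsigned char *)to + copy_len, 0, user_len - copy_len);
	return from_len;
}
```

A caller that passes a buffer larger than the driver's data sees zeros in the tail, and a caller that passes a smaller buffer gets a truncated copy together with the full length, so it can retry with a larger buffer if it needs the rest.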
* Re: [PATCH v4 04/10] taint: Add TAINT_FWCTL
2025-02-07 0:13 ` [PATCH v4 04/10] taint: Add TAINT_FWCTL Jason Gunthorpe
2025-02-07 13:09 ` Jonathan Cameron
@ 2025-02-08 0:24 ` Dave Jiang
1 sibling, 0 replies; 67+ messages in thread
From: Dave Jiang @ 2025-02-08 0:24 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On 2/6/25 5:13 PM, Jason Gunthorpe wrote:
> Requesting a fwctl scope of access that includes mutating device debug
> data will cause the kernel to be tainted. Changing the device operation
> through things in the debug scope may cause the device to malfunction in
> undefined ways. This should be reflected in the TAINT flags to help any
> debuggers understand that something has been done.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> Documentation/admin-guide/tainted-kernels.rst | 5 +++++
> include/linux/panic.h | 3 ++-
> kernel/panic.c | 1 +
> tools/debugging/kernel-chktaint | 8 ++++++++
> 4 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
> index 700aa72eecb169..a0cc017e44246f 100644
> --- a/Documentation/admin-guide/tainted-kernels.rst
> +++ b/Documentation/admin-guide/tainted-kernels.rst
> @@ -101,6 +101,7 @@ Bit Log Number Reason that got the kernel tainted
> 16 _/X 65536 auxiliary taint, defined for and used by distros
> 17 _/T 131072 kernel was built with the struct randomization plugin
> 18 _/N 262144 an in-kernel test has been run
> + 19 _/J 524288 userspace used a mutating debug operation in fwctl
> === === ====== ========================================================
>
> Note: The character ``_`` is representing a blank in this table to make reading
> @@ -184,3 +185,7 @@ More detailed explanation for tainting
> build time.
>
> 18) ``N`` if an in-kernel test, such as a KUnit test, has been run.
> +
> + 19) ``J`` if userspace opened /dev/fwctl/* and performed a FWCTL_RPC_DEBUG_WRITE
> + to use the device's debugging features. Device debugging features could
> + cause the device to malfunction in undefined ways.
> diff --git a/include/linux/panic.h b/include/linux/panic.h
> index 54d90b6c5f47bd..2494d51707ef42 100644
> --- a/include/linux/panic.h
> +++ b/include/linux/panic.h
> @@ -74,7 +74,8 @@ static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
> #define TAINT_AUX 16
> #define TAINT_RANDSTRUCT 17
> #define TAINT_TEST 18
> -#define TAINT_FLAGS_COUNT 19
> +#define TAINT_FWCTL 19
> +#define TAINT_FLAGS_COUNT 20
> #define TAINT_FLAGS_MAX ((1UL << TAINT_FLAGS_COUNT) - 1)
>
> struct taint_flag {
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d8635d5cecb250..0c55eec9e8744a 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -511,6 +511,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
> TAINT_FLAG(AUX, 'X', ' ', true),
> TAINT_FLAG(RANDSTRUCT, 'T', ' ', true),
> TAINT_FLAG(TEST, 'N', ' ', true),
> + TAINT_FLAG(FWCTL, 'J', ' ', true),
> };
>
> #undef TAINT_FLAG
> diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
> index 279be06332be99..e7da0909d09707 100755
> --- a/tools/debugging/kernel-chktaint
> +++ b/tools/debugging/kernel-chktaint
> @@ -204,6 +204,14 @@ else
> echo " * an in-kernel test (such as a KUnit test) has been run (#18)"
> fi
>
> +T=`expr $T / 2`
> +if [ `expr $T % 2` -eq 0 ]; then
> + addout " "
> +else
> + addout "J"
> + echo " * fwctl's mutating debug interface was used (#19)"
> +fi
> +
> echo "For a more detailed explanation of the various taint flags see"
> echo " Documentation/admin-guide/tainted-kernels.rst in the Linux kernel sources"
> echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html"
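The relationship between the bit number and the "Number" column in tainted-kernels.rst is simple arithmetic: bit 19 corresponds to 1 << 19 = 524288. A small userspace sketch of decoding that bit from a taint value (for example one read from /proc/sys/kernel/tainted) might look like the following. The helper names taint_mask() and fwctl_taint_set() and the TAINT_FWCTL_BIT macro are hypothetical; only the bit number 19 comes from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bit number assigned to TAINT_FWCTL by the patch. */
#define TAINT_FWCTL_BIT 19

/* Convert a taint bit number into the value listed in the documentation table. */
unsigned long taint_mask(unsigned int bit)
{
	assert(bit < sizeof(unsigned long) * CHAR_BIT);
	return 1UL << bit;
}

/* Report whether the fwctl taint bit is set in a taint bitmask. */
bool fwctl_taint_set(unsigned long tainted)
{
	return (tainted >> TAINT_FWCTL_BIT) & 1UL;
}
```

This is the same test the kernel-chktaint hunk performs with repeated `expr $T / 2` and `expr $T % 2` steps, expressed as a shift and mask.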
* Re: [PATCH v4 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
2025-02-07 0:13 ` [PATCH v4 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
@ 2025-02-08 0:28 ` Dave Jiang
0 siblings, 0 replies; 67+ messages in thread
From: Dave Jiang @ 2025-02-08 0:28 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On 2/6/25 5:13 PM, Jason Gunthorpe wrote:
> Add the FWCTL_RPC ioctl which allows a request/response RPC call to device
> firmware. Drivers implementing this call must follow the security
> guidelines under Documentation/userspace-api/fwctl.rst
>
> The core code provides some memory management helpers to get the messages
> copied from and back to userspace. The driver is responsible for
> allocating the output message memory and delivering the message to the
> device.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/fwctl/main.c | 60 +++++++++++++++++++++++++++++++++
> include/linux/fwctl.h | 8 +++++
> include/uapi/fwctl/fwctl.h | 68 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 136 insertions(+)
>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index 4b6792f2031e86..a5e26944b830b5 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -8,17 +8,20 @@
> #include <linux/container_of.h>
> #include <linux/fs.h>
> #include <linux/module.h>
> +#include <linux/sizes.h>
> #include <linux/slab.h>
>
> #include <uapi/fwctl/fwctl.h>
>
> enum {
> FWCTL_MAX_DEVICES = 4096,
> + MAX_RPC_LEN = SZ_2M,
> };
> static_assert(FWCTL_MAX_DEVICES < (1U << MINORBITS));
>
> static dev_t fwctl_dev;
> static DEFINE_IDA(fwctl_ida);
> +static unsigned long fwctl_tainted;
>
> struct fwctl_ucmd {
> struct fwctl_uctx *uctx;
> @@ -76,9 +79,65 @@ static int fwctl_cmd_info(struct fwctl_ucmd *ucmd)
> return ucmd_respond(ucmd, sizeof(*cmd));
> }
>
> +static int fwctl_cmd_rpc(struct fwctl_ucmd *ucmd)
> +{
> + struct fwctl_device *fwctl = ucmd->uctx->fwctl;
> + struct fwctl_rpc *cmd = ucmd->cmd;
> + size_t out_len;
> +
> + if (cmd->in_len > MAX_RPC_LEN || cmd->out_len > MAX_RPC_LEN)
> + return -EMSGSIZE;
> +
> + switch (cmd->scope) {
> + case FWCTL_RPC_CONFIGURATION:
> + case FWCTL_RPC_DEBUG_READ_ONLY:
> + break;
> +
> + case FWCTL_RPC_DEBUG_WRITE_FULL:
> + if (!capable(CAP_SYS_RAWIO))
> + return -EPERM;
> + fallthrough;
> + case FWCTL_RPC_DEBUG_WRITE:
> + if (!test_and_set_bit(0, &fwctl_tainted)) {
> + dev_warn(
> + &fwctl->dev,
> + "%s(%d): has requested full access to the physical device",
> + current->comm, task_pid_nr(current));
> + add_taint(TAINT_FWCTL, LOCKDEP_STILL_OK);
> + }
> + break;
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + void *inbuf __free(kvfree) = kvzalloc(cmd->in_len, GFP_KERNEL_ACCOUNT);
> + if (!inbuf)
> + return -ENOMEM;
> + if (copy_from_user(inbuf, u64_to_user_ptr(cmd->in), cmd->in_len))
> + return -EFAULT;
> +
> + out_len = cmd->out_len;
> + void *outbuf __free(kvfree) = fwctl->ops->fw_rpc(
> + ucmd->uctx, cmd->scope, inbuf, cmd->in_len, &out_len);
> + if (IS_ERR(outbuf))
> + return PTR_ERR(outbuf);
> + if (outbuf == inbuf) {
> + /* The driver can re-use inbuf as outbuf */
> + inbuf = NULL;
> + }
> +
> + if (copy_to_user(u64_to_user_ptr(cmd->out), outbuf,
> + min(cmd->out_len, out_len)))
> + return -EFAULT;
> +
> + cmd->out_len = out_len;
> + return ucmd_respond(ucmd, sizeof(*cmd));
> +}
> +
> /* On stack memory for the ioctl structs */
> union ucmd_buffer {
> struct fwctl_info info;
> + struct fwctl_rpc rpc;
> };
>
> struct fwctl_ioctl_op {
> @@ -99,6 +158,7 @@ struct fwctl_ioctl_op {
> }
> static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
> IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
> + IOCTL_OP(FWCTL_RPC, fwctl_cmd_rpc, struct fwctl_rpc, out),
> };
>
> static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> index 9b6cc8ae1aa0ca..c2fcaa17a2bcd5 100644
> --- a/include/linux/fwctl.h
> +++ b/include/linux/fwctl.h
> @@ -47,6 +47,14 @@ struct fwctl_ops {
> * ignore length on input, the core code will handle everything.
> */
> void *(*info)(struct fwctl_uctx *uctx, size_t *length);
> + /**
> + * @fw_rpc: Implement FWCTL_RPC. Deliver rpc_in/in_len to the FW and
> + * return the response and set out_len. rpc_in can be returned as the
> + * response pointer. Otherwise the returned pointer is freed with
> + * kvfree().
> + */
> + void *(*fw_rpc)(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
> + void *rpc_in, size_t in_len, size_t *out_len);
> };
>
> /**
> diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> index ac66853200a5a8..7a21f2f011917a 100644
> --- a/include/uapi/fwctl/fwctl.h
> +++ b/include/uapi/fwctl/fwctl.h
> @@ -67,4 +67,72 @@ struct fwctl_info {
> };
> #define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)
>
> +/**
> + * enum fwctl_rpc_scope - Scope of access for the RPC
> + *
> + * Refer to fwctl.rst for a more detailed discussion of these scopes.
> + */
> +enum fwctl_rpc_scope {
> + /**
> + * @FWCTL_RPC_CONFIGURATION: Device configuration access scope
> + *
> + * Read/write access to device configuration. When configuration
> + * is written to the device it remains in a fully supported state.
> + */
> + FWCTL_RPC_CONFIGURATION = 0,
> + /**
> + * @FWCTL_RPC_DEBUG_READ_ONLY: Read only access to debug information
> + *
> + * Readable debug information. Debug information is compatible with
> + * kernel lockdown, and does not disclose any sensitive information. For
> + * instance exposing any encryption secrets from this information is
> + * forbidden.
> + */
> + FWCTL_RPC_DEBUG_READ_ONLY = 1,
> + /**
> + * @FWCTL_RPC_DEBUG_WRITE: Writable access to lockdown compatible debug information
> + *
> + * Allows write access to data in the device which may leave a fully
> + * supported state. This is intended to permit intensive and possibly
> + * invasive debugging. This scope will taint the kernel.
> + */
> + FWCTL_RPC_DEBUG_WRITE = 2,
> + /**
> + * @FWCTL_RPC_DEBUG_WRITE_FULL: Write access to all debug information
> + *
> + * Allows read/write access to everything. Requires CAP_SYS_RAWIO, so
> + * it is not required to follow lockdown principles. If in doubt
> + * debugging should be placed in this scope. This scope will taint the
> + * kernel.
> + */
> + FWCTL_RPC_DEBUG_WRITE_FULL = 3,
> +};
> +
> +/**
> + * struct fwctl_rpc - ioctl(FWCTL_RPC)
> + * @size: sizeof(struct fwctl_rpc)
> + * @scope: One of enum fwctl_rpc_scope, required scope for the RPC
> + * @in_len: Length of the in memory
> + * @out_len: Length of the out memory
> + * @in: Request message in device specific format
> + * @out: Response message in device specific format
> + *
> + * Deliver a Remote Procedure Call to the device FW and return the response. The
> + * call's parameters and return are marshaled into linear buffers of memory. Any
> + * errno indicates that delivery of the RPC to the device failed. Return status
> + * originating in the device during a successful delivery must be encoded into
> + * out.
> + *
> + * The format of the buffers matches the out_device_type from FWCTL_INFO.
> + */
> +struct fwctl_rpc {
> + __u32 size;
> + __u32 scope;
> + __u32 in_len;
> + __u32 out_len;
> + __aligned_u64 in;
> + __aligned_u64 out;
> +};
> +#define FWCTL_RPC _IO(FWCTL_TYPE, FWCTL_CMD_RPC)
> +
> #endif
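For readers who want to see the userspace side of struct fwctl_rpc, here is
a minimal sketch of filling in a request. It assumes a local mirror of the
struct quoted above (sketch_fwctl_rpc) and a hypothetical helper
sketch_fill_rpc(); neither name comes from the patch. The ioctl call itself
is left out because the FWCTL_TYPE and FWCTL_CMD_RPC values are not part of
this message.

```c
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of struct fwctl_rpc as quoted in the patch above.
 * Real code would include the uapi header instead; this copy exists
 * only so the sketch builds on its own.
 */
struct sketch_fwctl_rpc {
	uint32_t size;
	uint32_t scope;
	uint32_t in_len;
	uint32_t out_len;
	uint64_t in __attribute__((aligned(8)));   /* __aligned_u64 */
	uint64_t out __attribute__((aligned(8)));  /* __aligned_u64 */
};

/*
 * Hypothetical helper (not part of the patch): prepare a request that
 * points at caller-owned request/response buffers. User pointers are
 * carried as 64-bit integers, matching the u64_to_user_ptr() use on
 * the kernel side.
 */
static void sketch_fill_rpc(struct sketch_fwctl_rpc *rpc, uint32_t scope,
			    const void *req, uint32_t req_len,
			    void *resp, uint32_t resp_len)
{
	memset(rpc, 0, sizeof(*rpc));
	rpc->size = sizeof(*rpc);
	rpc->scope = scope;
	rpc->in_len = req_len;
	rpc->out_len = resp_len;
	rpc->in = (uint64_t)(uintptr_t)req;
	rpc->out = (uint64_t)(uintptr_t)resp;
}
```

A caller would then hand the filled struct to the FWCTL_RPC ioctl on an open
fwctl character device. Going by fwctl_cmd_rpc() above, the kernel copies
back at most min(requested out_len, produced out_len) bytes and rewrites
out_len to the length the driver produced.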
* Re: [PATCH v4 06/10] fwctl: Add documentation
2025-02-07 0:13 ` [PATCH v4 06/10] fwctl: Add documentation Jason Gunthorpe
2025-02-07 14:42 ` Jonathan Cameron
@ 2025-02-08 0:40 ` Dave Jiang
1 sibling, 0 replies; 67+ messages in thread
From: Dave Jiang @ 2025-02-08 0:40 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On 2/6/25 5:13 PM, Jason Gunthorpe wrote:
> Document the purpose and rules for the fwctl subsystem.
>
> Link in kdocs to the doc tree.
>
> Nacked-by: Jakub Kicinski <kuba@kernel.org>
> Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
One spelling correction below. Otherwise
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++++++++++
> Documentation/userspace-api/fwctl/index.rst | 12 +
> Documentation/userspace-api/index.rst | 1 +
> MAINTAINERS | 2 +-
> 4 files changed, 299 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
> create mode 100644 Documentation/userspace-api/fwctl/index.rst
>
> diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst
> new file mode 100644
> index 00000000000000..428f6f5bb9b4f9
> --- /dev/null
> +++ b/Documentation/userspace-api/fwctl/fwctl.rst
> @@ -0,0 +1,285 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============
> +fwctl subsystem
> +===============
> +
> +:Author: Jason Gunthorpe
> +
> +Overview
> +========
> +
> +Modern devices contain extensive amounts of FW, and in many cases, are largely
> +software-defined pieces of hardware. The evolution of this approach is largely a
> +reaction to Moore's Law where a chip tape out is now highly expensive, and the
> +chip design is extremely large. Replacing fixed HW logic with a flexible and
> +tightly coupled FW/HW combination is an effective risk mitigation against chip
> +respin. Problems in the HW design can be counteracted in device FW. This is
> +especially true for devices which present a stable and backwards compatible
> +interface to the operating system driver (such as NVMe).
> +
> +The FW layer in devices has grown to incredible size and devices frequently
> +integrate clusters of fast processors to run it. For example, mlx5 devices have
> +over 30MB of FW code, and big configurations operate with over 1GB of FW managed
> +runtime state.
> +
> +The availability of such a flexible layer has created quite a variety in the
> +industry where single pieces of silicon are now configurable software-defined
> +devices and can operate in substantially different ways depending on the need.
> +Further, we often see cases where specific sites wish to operate devices in ways
> +that are highly specialized and require applications that have been tailored to
> +their unique configuration.
> +
> +Further, devices have become multi-functional and integrated to the point they
> +no longer fit neatly into the kernel's division of subsystems. Modern
> +multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
> +subsystems while sharing the underlying hardware using the auxiliary device
> +system.
> +
> +All together this creates a challenge for the operating system, where devices
> +have an expansive FW environment that needs robust device-specific debugging
> +support, and FW-driven functionality that is not well suited to “generic”
> +interfaces. fwctl seeks to allow access to the full device functionality from
> +user space in the areas of debuggability, management, and first-boot/nth-boot
> +provisioning.
> +
> +fwctl is aimed at the common device design pattern where the OS and FW
> +communicate via an RPC message layer constructed with a queue or mailbox scheme.
> +In this case the driver will typically have some layer to deliver RPC messages
> +and collect RPC responses from device FW. The in-kernel subsystem drivers that
> +operate the device for its primary purposes will use these RPCs to build their
> +drivers, but devices also usually have a set of ancillary RPCs that don't really
> +fit into any specific subsystem. For example, a HW RAID controller is primarily
> +operated by the block layer but also comes with a set of RPCs to administer the
> +construction of drives within the HW RAID.
> +
> +In the past when devices were more single function, individual subsystems would
> +grow different approaches to solving some of these common problems. For instance
> +monitoring device health, manipulating its FLASH, debugging the FW,
> +provisioning, all have various unique interfaces across the kernel.
> +
> +fwctl's purpose is to define a common set of limited rules, described below,
> +that allow user space to securely construct and execute RPCs inside device FW.
> +The rules serve as an agreement between the operating system and FW on how to
> +correctly design the RPC interface. As a uAPI the subsystem provides a thin
> +layer of discovery and a generic uAPI to deliver the RPCs and collect the
> +response. It supports a system of user space libraries and tools which will
> +use this interface to control the device using the device native protocols.
> +
> +Scope of Action
> +---------------
> +
> +fwctl drivers are strictly restricted to being a way to operate the device FW.
> +It is not an avenue to access random kernel internals, or other operating system
> +SW states.
> +
> +fwctl instances must operate on a well-defined device function, and the device
> +should have a well-defined security model for what scope within the physical
> +device the function is permitted to access. For instance, the most complex PCIe
> +device today may broadly have several function-level scopes:
> +
> + 1. A privileged function with full access to the on-device global state and
> + configuration
> +
> + 2. Multiple hypervisor functions with control over itself and child functions
> + used with VMs
> +
> + 3. Multiple VM functions tightly scoped within the VM
> +
> +The device may create a logical parent/child relationship between these scopes.
> +For instance a child VM's FW may be within the scope of the hypervisor FW. It is
> +quite common in the VFIO world that the hypervisor environment has a complex
> +provisioning/profiling/configuration responsibility for the function VFIO
> +assigns to the VM.
> +
> +Further, within the function, devices often have RPC commands that fall within
> +some general scopes of action (see enum fwctl_rpc_scope):
> +
> + 1. Access to function & child configuration, FLASH, etc. that becomes live at a
> + function reset. Access to function & child runtime configuration that is
> + transparent or non-disruptive to any driver or VM.
> +
> + 2. Read-only access to function debug information that may report on FW objects
> + in the function & child, including FW objects owned by other kernel
> + subsystems.
> +
> + 3. Write access to function & child debug information strictly compatible with
> + the principles of kernel lockdown and kernel integrity protection. Triggers
> + a kernel Taint.
> +
> + 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
> +
> +User space will provide a scope label on each RPC and the kernel must enforce the
> +above CAPs and taints based on that scope. A combination of kernel and FW can
> +enforce that RPCs are placed in the correct scope by user space.
> +
> +Denied behavior
> +---------------
> +
> +There are many things this interface must not allow user space to do (without a
> +Taint or CAP), broadly derived from the principles of kernel lockdown. Some
> +examples:
> +
> + 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
> + untrusted code, or otherwise compromise device or system security and
> + integrity.
> +
> + 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
> + objects owned by kernel drivers.
> +
> + 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
> + driver can react to the device configuration at function reset/driver load
> + time, but otherwise must not be coupled to fwctl.
> +
> + 4. Operate the HW in a way that overlaps with the core purpose of another
> + primary kernel subsystem, such as read/write to LBAs, send/receive of
> + network packets, or operate an accelerator's data plane.
> +
> +fwctl is not a replacement for device direct access subsystems like uacce or
> +VFIO.
> +
> +Operations exposed through fwctl's non-tainting interfaces should be fully
> +sharable with other users of the device. For instance exposing a RPC through
> +fwctl should never prevent a kernel subsystem from also concurrently using that
> +same RPC or hardware unit down the road. In such cases fwctl will be less
> +important than proper kernel subsystems that eventually emerge. Mistakes in this
> +area resulting in clashes will be resolved in favour of a kernel implementation.
> +
> +fwctl User API
> +==============
> +
> +.. kernel-doc:: include/uapi/fwctl/fwctl.h
> +.. kernel-doc:: include/uapi/fwctl/mlx5.h
> +
> +sysfs Class
> +-----------
> +
> +fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
> +(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
> +operates the ioctl uAPI described above.
> +
> +fwctl devices can be related to driver components in other subsystems through
> +sysfs::
> +
> + $ ls /sys/class/fwctl/fwctl0/device/infiniband/
> + ibp0s10f0
> +
> + $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
> + fwctl0/
> +
> + $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
> + dev device power subsystem uevent
> +
> +User space Community
> +--------------------
> +
> +Drawing inspiration from nvme-cli, participating in the kernel side must come
> +with a user space in a common TBD git tree, at a minimum to usefully operate the
> +kernel driver. Providing such an implementation is a pre-condition to merging a
> +kernel driver.
> +
> +The goal is to build user space community around some of the shared problems
> +we all have, and ideally develop some common user space programs with some
> +starting themes of:
> +
> + - Device in-field debugging
> +
> + - HW provisioning
> +
> + - VFIO child device profiling before VM boot
> +
> + - Confidential Compute topics (attestation, secure provisioning)
> +
> +that stretch across all subsystems in the kernel. fwupd is a great example of
> +how an excellent user space experience can emerge out of kernel-side diversity.
> +
> +fwctl Kernel API
> +================
> +
> +.. kernel-doc:: drivers/fwctl/main.c
> + :export:
> +.. kernel-doc:: include/linux/fwctl.h
> +
> +fwctl Driver design
> +-------------------
> +
> +In many cases a fwctl driver is going to be part of a larger cross-subsystem
> +device possibly using the auxiliary_device mechanism. In that case several
> +subsystems are going to be sharing the same device and FW interface layer so the
> +device design must already provide for isolation and cooperation between kernel
> +subsystems. fwctl should fit into that same model.
> +
> +Part of the driver should include a description of how its scope restrictions
> +and security model work. The driver and FW together must ensure that RPCs
> +provided by user space are mapped to the appropriate scope. If the validation is
> +done in the driver then the validation can read a 'command effects' report from
> +the device, or hardwire the enforcement. If the validation is done in the FW,
> +then the driver should pass the fwctl_rpc_scope to the FW along with the command.
> +
> +The driver and FW must cooperate to ensure that either fwctl cannot allocate
> +any FW resources, or any resources it does allocate are freed on FD closure. A
> +driver primarily constructed around FW RPCs may find that its core PCI function
> +and RPC layer belongs under fwctl with auxiliary devices connecting to other
> +subsystems.
> +
> +Each device type must be mindful of Linux's philosophy for stable ABI. The FW
> +RPC interface does not have to meet a strictly stable ABI, but it does need to
> +meet an expectation that userspace tools that are deployed and in significant
> +use don't needlessly break. FW upgrade and kernel upgrade should keep widely
> +deployed tooling working.
> +
> +Development and debugging focused RPCs under more permissive scopes can have
> +less stablitiy if the tools using them are only run under exceptional
s/stablitiy/stability/
DJ
> +circumstances and not for every day use of the device. Debugging tools may even
> +require exact version matching as they may require something similar to DWARF
> +debug information from the FW binary.
> +
> +Security Response
> +=================
> +
> +The kernel remains the gatekeeper for this interface. If violations of the
> +scopes, security or isolation principles are found, we have options to let
> +devices fix them with a FW update, push a kernel patch to parse and block RPC
> +commands or push a kernel patch to block entire firmware versions/devices.
> +
> +While the kernel can always directly parse and restrict RPCs, it is expected
> +that the existing kernel pattern of allowing drivers to delegate validation to
> +FW will be a useful design.
> +
> +Existing Similar Examples
> +=========================
> +
> +The approach described in this document is not a new idea. Direct, or near
> +direct device access has been offered by the kernel in different areas for
> +decades. With more devices wanting to follow this design pattern it is becoming
> +clear that it is not entirely well understood and, more importantly, the
> +security considerations are not well defined or agreed upon.
> +
> +Some examples:
> +
> + - HW RAID controllers. This includes RPCs to do things like compose drives into
> + a RAID volume, configure RAID parameters, monitor the HW and more.
> +
> + - Baseboard managers. RPCs for configuring settings in the device and more
> +
> + - NVMe vendor command capsules. nvme-cli provides access to some monitoring
> + functions that different products have defined, but more exist.
> +
> + - CXL also has a NVMe-like vendor command system.
> +
> + - DRM allows user space drivers to send commands to the device via kernel
> + mediation
> +
> + - RDMA allows user space drivers to directly push commands to the device
> + without kernel involvement
> +
> + - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.
> +
> +The first 4 are examples of areas that fwctl intends to cover. The latter three
> +are examples of denied behavior as they fully overlap with the primary purpose
> +of a kernel subsystem.
> +
> +Some key lessons learned from these past efforts are the importance of having a
> +common user space project to use as a pre-condition for obtaining a kernel
> +driver. Developing good community around useful software in user space is key to
> +getting companies to fund participation to enable their products.
> diff --git a/Documentation/userspace-api/fwctl/index.rst b/Documentation/userspace-api/fwctl/index.rst
> new file mode 100644
> index 00000000000000..06959fbf154743
> --- /dev/null
> +++ b/Documentation/userspace-api/fwctl/index.rst
> @@ -0,0 +1,12 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Firmware Control (FWCTL) Userspace API
> +======================================
> +
> +A framework that defines a common set of limited rules that allows user space
> +to securely construct and execute RPCs inside device firmware.
> +
> +.. toctree::
> + :maxdepth: 1
> +
> + fwctl
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index b1395d94b3fd0a..e8e861f767fd5c 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -45,6 +45,7 @@ Devices and I/O
>
> accelerators/ocxl
> dma-buf-alloc-exchange
> + fwctl/index
> gpio/index
> iommufd
> media/index
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5f30adbe6c8521..319169f7cb7e1c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9561,7 +9561,7 @@ FWCTL SUBSYSTEM
> M: Jason Gunthorpe <jgg@nvidia.com>
> M: Saeed Mahameed <saeedm@nvidia.com>
> S: Maintained
> -F: Documentation/userspace-api/fwctl.rst
> +F: Documentation/userspace-api/fwctl/
> F: drivers/fwctl/
> F: include/linux/fwctl.h
> F: include/uapi/fwctl/
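The scope rules in the documentation above (configuration and read-only
debug are unrestricted, debug write taints, full debug write taints and
needs CAP_SYS_RAWIO) can be modelled in plain userspace C. This is only a
sketch that follows the switch statement in fwctl_cmd_rpc() from patch 05;
sketch_rpc_admit() and SKETCH_MAX_RPC_LEN are made-up names, and the sketch
reports whether a scope is a tainting one rather than modelling the
taint-once logic in the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from enum fwctl_rpc_scope as quoted in patch 05. */
enum fwctl_rpc_scope {
	FWCTL_RPC_CONFIGURATION = 0,
	FWCTL_RPC_DEBUG_READ_ONLY = 1,
	FWCTL_RPC_DEBUG_WRITE = 2,
	FWCTL_RPC_DEBUG_WRITE_FULL = 3,
};

/* Mirrors MAX_RPC_LEN = SZ_2M from drivers/fwctl/main.c in patch 05. */
#define SKETCH_MAX_RPC_LEN (2u * 1024u * 1024u)

/*
 * Hypothetical model of the admission checks: returns 0 if the RPC would
 * be passed on to the driver, or a negative errno. *taints is set when
 * the scope is one that taints the kernel.
 */
static int sketch_rpc_admit(uint32_t scope, uint32_t in_len, uint32_t out_len,
			    bool has_rawio, bool *taints)
{
	*taints = false;

	/* Oversized buffers are rejected before the scope is looked at. */
	if (in_len > SKETCH_MAX_RPC_LEN || out_len > SKETCH_MAX_RPC_LEN)
		return -EMSGSIZE;

	switch (scope) {
	case FWCTL_RPC_CONFIGURATION:
	case FWCTL_RPC_DEBUG_READ_ONLY:
		return 0;
	case FWCTL_RPC_DEBUG_WRITE_FULL:
		/* Full debug access needs CAP_SYS_RAWIO and also taints. */
		if (!has_rawio)
			return -EPERM;
		*taints = true;
		return 0;
	case FWCTL_RPC_DEBUG_WRITE:
		*taints = true;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```

Note that only the full debug scope is gated on the capability here; the
lockdown-compatible debug write scope taints but does not require it, which
matches the list of scopes in the document.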
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 21:51 ` Jakub Kicinski
@ 2025-02-08 1:10 ` Saeed Mahameed
2025-02-08 1:16 ` Jason Gunthorpe
1 sibling, 0 replies; 67+ messages in thread
From: Saeed Mahameed @ 2025-02-08 1:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andy Gospodarek, Jason Gunthorpe, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On 07 Feb 13:51, Jakub Kicinski wrote:
>On Fri, 7 Feb 2025 12:25:28 -0800 Saeed Mahameed wrote:
>> >nVidia is already refusing to add basic minoring features to their
>> >upstream driver, and keeps asking its customers to migrate to libdoca.
>>
>> nVidia is one of the top contributors to netdev,
>
>That's inaccurate. I can't think of a single meaningful contribution
>from nVidia's NIC team outside of your own driver in the last 2 years.
>
I can help refresh your memory.
Switchdev, devlink, XFRM, TLS, XDP (multi buff, meta data),
page pool, and I'm pretty sure much more.
I can also point out a lot of projects that are still stuck for many years
due to lack of agreements on design and communicated importance e.g:
PSP, TCP ZC, devlink params, and some more..
Maybe you mean meaningful to you, but it is very hard to predict what that
is without clear communication.
>> we submit patches on a weekly basis and due to netdev mailing list
>> review backlog and policy we barely make quota,
>
>Luckily we have development statistics so we don't have to argue:
>
Yes we don't have to argue, thanks for sharing.
[...] snip top reviewers since it's not part of the discussion.
>Top authors (cs): Top authors (msg):
> 1 ( ) [9] RedHat 1 ( ) [48] Intel
> 2 ( +2) [8] Google 2 ( ) [42] RedHat
> 3 ( -1) [7] Intel 3 ( +1) [39] Meta
> 4 ( -1) [7] Meta 4 ( -1) [31] Huawei
> 5 ( +2) [5] nVidia 5 ( ) [31] nVidia
^^^^^^ ^^^^^^
So we do contribute to netdev.. and we are not moving away from netdev
which was the whole point of your argument.
[...] snip Top scores, since doing reviews is not the issue here.
It's a separate topic. If you want we can discuss in a separate thread
since I have a lot to say on this.
>https://lore.kernel.org/all/20250121200710.19126f7d@kernel.org/
>
>nVidia has a negative review vs authorship score. It'd probably
>be much worse if it wasn't for the work of the switch team.
>
Irrelevant to FWCTL. And yes very important topic to discuss, we have
our own reasons and concerns. Let me know if you want to open this topic
for discussion in a separate thread.
>> so please elaborate on what features we are refusing to do ??
>
>nVidia likes to send these threads to my management so I need
>to be careful. An issue was discovered during new platform evaluation.
>That's all I'm gonna say.
>
I am not sure what you are talking about, but as one of the mlx5
maintainers I am 100% sure we are not refusing to do anything that we've
been asked. It is all about priorities; you have to sort this out with
whoever is reaching out to you :).
It's really hard to keep the discussion coherent and objective when you
are referring to private discussions I am not really part of, that we
can't discuss here, yet you brought them up.
>> As explained above, netdev doesn't need it, but netdev subsystem also
>> hosts the pci base drivers, so you are going to see fwctl patches the
>> same as you see rdma and other non netdev patches flowing through
>> netdev ML.
>
>Sure, but we're deadlocked here. It may be a slight inconvenience to
>redo the interface so that its not a standalone aux bus driver. But if
>you agree the netdev doesn't need it seems like a fairly straightforward
>way to unblock your progress.
Yes Aux needs some improvements and it must and can be abstracted out of
netdev relatively easily, to remove this unnecessary workload on netdev ML.
>
>I am glad that you at least agree now that netdev doesn't need it.
netdev can operate perfectly well with all the standard tooling we have, and
we will keep on developing it; TCP/IP configurability is well established.
That said, netdev is very bad at debug and really far behind. The few
debugfs entries and devlinks we have don't cut it and will never be as good
as fwctl, so mlx5 fwctl has to run side by side with netdev. I believe the
same is true for all other vendors.
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-07 21:51 ` Jakub Kicinski
2025-02-08 1:10 ` Saeed Mahameed
@ 2025-02-08 1:16 ` Jason Gunthorpe
2025-02-08 3:24 ` Andy Gospodarek
2025-02-11 1:04 ` Jakub Kicinski
1 sibling, 2 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-08 1:16 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Saeed Mahameed, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
> But if you agree the netdev doesn't need it seems like a fairly
> straightforward way to unblock your progress.
I'm trying to understand what you are suggesting here.
We have many scenarios where mlx5_core spawns all kinds of different
devices, including recovery cases where there is no networking at all
and only fwctl. So we can't just discard the aux dev or mlx5_core
triggered setup without breaking scenarios.
However, you seem to be suggesting that netdev-only configurations (ie
netdev loaded but no rdma loaded) should disable fwctl. Is that the
case? All else would remain the same. It is very ugly but I could see
a technical path to do it, and would consider it if that brings peace.
Jason
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-08 1:16 ` Jason Gunthorpe
@ 2025-02-08 3:24 ` Andy Gospodarek
2025-02-11 1:04 ` Jakub Kicinski
1 sibling, 0 replies; 67+ messages in thread
From: Andy Gospodarek @ 2025-02-08 3:24 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jakub Kicinski, Saeed Mahameed, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On Fri, Feb 7, 2025 at 8:16 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
>
> > But if you agree the netdev doesn't need it seems like a fairly
> > straightforward way to unblock your progress.
>
> I'm trying to understand what you are suggesting here.
>
> We have many scenarios where mlx5_core spawns all kinds of different
> devices, including recovery cases where there is no networking at all
> and only fwctl. So we can't just discard the aux dev or mlx5_core
> triggered setup without breaking scenarios.
>
> However, you seem to be suggesting that netdev-only configurations (ie
> netdev loaded but no rdma loaded) should disable fwctl. Is that the
> case? All else would remain the same. It is very ugly but I could see
> a technical path to do it, and would consider it if that brings peace.
>
We can probably live with that as well if it's required to keep fwctl
in an RDMA driver and out of pure netdevs.
* Re: [PATCH v4 06/10] fwctl: Add documentation
2025-02-07 14:42 ` Jonathan Cameron
@ 2025-02-10 15:17 ` Jason Gunthorpe
0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-10 15:17 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Fri, Feb 07, 2025 at 02:42:49PM +0000, Jonathan Cameron wrote:
> On Thu, 6 Feb 2025 20:13:28 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > Document the purpose and rules for the fwctl subsystem.
> >
> > Link in kdocs to the doc tree.
> >
> > Nacked-by: Jakub Kicinski <kuba@kernel.org>
> > Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
> > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>
> A few tiny things inline.
Got them all, thanks
Jason
* Re: [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device
2025-02-08 0:16 ` Dave Jiang
@ 2025-02-10 15:24 ` Jason Gunthorpe
0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-10 15:24 UTC (permalink / raw)
To: Dave Jiang
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Fri, Feb 07, 2025 at 05:16:12PM -0700, Dave Jiang wrote:
> > This approach has proven to work quite well in the iommufd and rdma
> > subsystems.
> >
> > Allocate an ioctl number space for the subsystem.
> >
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Got all the changes, thanks
Jason
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-08 1:16 ` Jason Gunthorpe
2025-02-08 3:24 ` Andy Gospodarek
@ 2025-02-11 1:04 ` Jakub Kicinski
2025-02-11 7:55 ` Leon Romanovsky
2025-02-11 16:24 ` David Ahern
1 sibling, 2 replies; 67+ messages in thread
From: Jakub Kicinski @ 2025-02-11 1:04 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Saeed Mahameed, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote:
> On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
>
> > But if you agree the netdev doesn't need it seems like a fairly
> > straightforward way to unblock your progress.
>
> I'm trying to understand what you are suggesting here.
>
> We have many scenarios where mlx5_core spawns all kinds of different
> devices, including recovery cases where there is no networking at all
> and only fwctl. So we can't just discard the aux dev or mlx5_core
> triggered setup without breaking scenarios.
>
> However, you seem to be suggesting that netdev-only configurations (ie
> netdev loaded but no rdma loaded) should disable fwctl. Is that the
> case? All else would remain the same. It is very ugly but I could see
> a technical path to do it, and would consider it if that brings peace.
Yes, when RDMA driver is not loaded there should be no access to fwctl.
When RDMA is disabled on the device via devlink there should be no
fwctl access.
To disincentivize "creative workarounds" we have to also agree and
document that fwctl must not be used to configure TCP/IP functions
of the device, or host queues used by the netdev stack.
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-11 1:04 ` Jakub Kicinski
@ 2025-02-11 7:55 ` Leon Romanovsky
2025-02-11 14:27 ` Andy Gospodarek
2025-02-11 18:36 ` Nelson, Shannon
2025-02-11 16:24 ` David Ahern
1 sibling, 2 replies; 67+ messages in thread
From: Leon Romanovsky @ 2025-02-11 7:55 UTC (permalink / raw)
To: Jakub Kicinski, Jason Gunthorpe
Cc: Saeed Mahameed, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, linux-cxl, linux-rdma, netdev, Nelson, Shannon,
Michael Chan
On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote:
> On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote:
> > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
> >
> > > But if you agree the netdev doesn't need it seems like a fairly
> > > straightforward way to unblock your progress.
> >
> > I'm trying to understand what you are suggesting here.
> >
> > We have many scenarios where mlx5_core spawns all kinds of different
> > devices, including recovery cases where there is no networking at all
> > and only fwctl. So we can't just discard the aux dev or mlx5_core
> > triggered setup without breaking scenarios.
> >
> > However, you seem to be suggesting that netdev-only configurations (ie
> > netdev loaded but no rdma loaded) should disable fwctl. Is that the
> > case? All else would remain the same. It is very ugly but I could see
> > a technical path to do it, and would consider it if that brings peace.
>
> Yes, when RDMA driver is not loaded there should be no access to fwctl.
The cover letter mentions users who need FWCTL without RDMA:
https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/
I want to suggest something different. What about moving all XXX_core
logic (mlx5_core, bnxt_core, etc.) from netdev to some other dedicated
place?
There is no technical need to have PCI/FW logic inside the networking stack.
Thanks
* Re: [PATCH v4 00/10] Introduce fwctl subystem
2025-02-07 21:58 ` Dave Jiang
@ 2025-02-11 9:33 ` Jonathan Cameron
2025-02-13 17:55 ` Jason Gunthorpe
2025-02-13 17:52 ` Jason Gunthorpe
1 sibling, 1 reply; 67+ messages in thread
From: Jonathan Cameron @ 2025-02-11 9:33 UTC (permalink / raw)
To: Dave Jiang
Cc: Jason Gunthorpe, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Fri, 7 Feb 2025 14:58:51 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> On 2/6/25 5:13 PM, Jason Gunthorpe wrote:
> > [
> > Many people were away around the holiday period, but work is back in full
> > swing now with Dave already at v3 on his CXL work over the past couple
> > weeks. We are looking at a good chance of reaching this merge window. I
> > will work out some shared branches with CXL and get it into linux-next
> > once all three drivers can be assembled and reviews seem to be concluding.
> >
> > There are a couple of open notes
> > - Greg was interested in a new name, but nobody offered any bikesheds
> > - I would like a co-maintainer
>
> I volunteer as tribute. :)
>
> I got the CXL series rebased and tested on top of this series. So you can add
> Tested-by: Dave Jiang <dave.jiang@intel.com>
> for the core FWCTL bits in the series.
This is an area I plan to keep reviewing (and adding more use cases), so feel
free to add me as a Reviewer or Maintainer (depending on how guilty you want
me to feel if there is a backlog to review :)). It will save me from having to
track these down as they get posted in different subsystems.
Thanks,
Jonathan
>
> I'll post the CXL FWCTL series v4 shortly.
>
> DJ
>
> > ]
> >
> > fwctl is a new subsystem intended to bring some common rules and order to
> > the growing pattern of exposing a secure FW interface directly to
> > userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
> > exposing a device for datapath operations fwctl is focused on debugging,
> > configuration and provisioning of the device. It will not have the
> > necessary features like interrupt delivery to support a datapath.
> >
> > This concept is similar to the long standing practice in the "HW" RAID
> > space of having a device specific misc device to manage the RAID
> > controller FW. fwctl generalizes this notion of a companion debug and
> > management interface that goes along with a dataplane implemented in an
> > appropriate subsystem.
> >
> > The need for this has reached a critical point as many users are moving to
> > run lockdown enabled kernels. Several existing devices have had long
> > standing tooling for management that relied on /sys/../resource0 or PCI
> > config space access which is not permitted in lockdown. A major point of
> > fwctl is to define and document the rules that a device must follow to
> > expose a lockdown compatible RPC.
> >
> > Based on some discussion fwctl splits the RPCs into four categories
> >
> > FWCTL_RPC_CONFIGURATION
> > FWCTL_RPC_DEBUG_READ_ONLY
> > FWCTL_RPC_DEBUG_WRITE
> > FWCTL_RPC_DEBUG_WRITE_FULL
> >
> > Where the latter two trigger a new TAINT_FWCTL, and the final one requires
> > CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW
> > would be responsible to restrict RPCs to the requested security scope,
> > while the core code handles the tainting and CAP checks.
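
The scope rules quoted above can be sketched as a small userspace model.
This is only an illustration under stated assumptions, not the code from
drivers/fwctl/main.c: the enum, struct scope_state and check_rpc_scope()
are hypothetical names, the has_rawio flag stands in for a CAP_SYS_RAWIO
check, and the tainted flag stands in for setting TAINT_FWCTL.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the four RPC scopes named in the cover letter. */
enum rpc_scope {
	RPC_CONFIGURATION,
	RPC_DEBUG_READ_ONLY,
	RPC_DEBUG_WRITE,
	RPC_DEBUG_WRITE_FULL,
};

/* Stand-in for kernel state; 'tainted' models TAINT_FWCTL being set. */
struct scope_state {
	bool tainted;
};

/*
 * Returns 0 if an RPC with this scope may proceed, or a negative errno.
 * 'has_rawio' models whether the caller holds CAP_SYS_RAWIO.
 */
static int check_rpc_scope(enum rpc_scope scope, bool has_rawio,
			   struct scope_state *st)
{
	switch (scope) {
	case RPC_CONFIGURATION:
	case RPC_DEBUG_READ_ONLY:
		/* The first two scopes neither taint nor need a capability. */
		return 0;
	case RPC_DEBUG_WRITE_FULL:
		/* The final scope requires the capability and also taints. */
		if (!has_rawio)
			return -EPERM;
		st->tainted = true;
		return 0;
	case RPC_DEBUG_WRITE:
		/* Debug writes taint but need no extra capability. */
		st->tainted = true;
		return 0;
	}
	/* Unknown scope values are rejected. */
	return -EINVAL;
}
```

In this model a caller without the capability gets -EPERM for the full
debug-write scope and no taint is recorded, while a plain debug write
succeeds and sets the taint flag. Restricting which firmware commands are
valid within each scope stays with the device driver and its FW, as the
cover letter says, and is not modelled here.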
> >
> > For details see the final patch which introduces the documentation.
> >
> > The CXL FWCTL driver is now in its own series, at v3:
> > https://lore.kernel.org/r/20250204220430.4146187-1-dave.jiang@intel.com
> >
> > I'm expecting a 3rd driver (from Shannon @ Pensando) to be posted right
> > away, the github version I saw looked good. I've got soft commitments for
> > about 6 drivers in total now.
> >
> > There have been three LWN articles written discussing various aspects of
> > this proposal:
> >
> > https://lwn.net/Articles/955001/
> > https://lwn.net/Articles/969383/
> > https://lwn.net/Articles/990802/
> >
> > A really giant ksummit thread preceding a discussion at the Maintainer
> > Summit:
> >
> > https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/
> >
> > Several have expressed general support for this concept:
> >
> > AMD/Pensando - https://lore.kernel.org/linux-rdma/20241205222818.44439-1-shannon.nelson@amd.com
> > Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
> > Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org
> > Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> > Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org
> > NVIDIA Networking
> > Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
> > Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
> > SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com
> >
> > Work is ongoing for userspace, currently the mellanox tool suite has been
> > ported over:
> > https://github.com/Mellanox/mstflint
> >
> > And a more simplified example of how to use it:
> > https://github.com/jgunthorpe/mlx5ctl.git
> >
> > This is on github: https://github.com/jgunthorpe/linux/commits/fwctl
> >
> > v4:
> > - Rebase to v6.14-rc1
> > - Fine tune comments and rst documentation
> > - Adjust cleanup.h usage - remove places that add more obfuscation than
> > value
> > - CXL is back to its own independent series
> > - Increase FWCTL_MAX_DEVICES to 4096, someone hit the limit
> > - Fix mlx5ctl_validate_rpc() logic around scope checking
> > - Disable mlx5ctl on SFs
> > v3: https://patch.msgid.link/r/0-v3-960f17f90f17+516-fwctl_jgg@nvidia.com
> > - Rebase to v6.11-rc4
> > - Add a squashed version of David's CXL series as the 2nd driver
> > - Add missing includes
> > - Improve comments based on feedback
> > - Use the kdoc format that puts the member docs inside the struct
> > - Rewrite fwctl_alloc_device() to be clearer
> > - Incorporate all remarks for the documentation
> > v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com
> > - Rebase to v6.10-rc5
> > - Minor style changes
> > - Follow the style consensus for the guard stuff
> > - Documentation grammar/spelling
> > - Add missed length output for mlx5 get_info
> > - Add two more missed MLX5 CMD's
> > - Collect tags
> > v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com
> >
> > Andy Gospodarek (2):
> > fwctl/bnxt: Support communicating with bnxt fw
> > bnxt: Create an auxiliary device for fwctl_bnxt
> >
> > Jason Gunthorpe (6):
> > fwctl: Add basic structure for a class subsystem with a cdev
> > fwctl: Basic ioctl dispatch for the character device
> > fwctl: FWCTL_INFO to return basic information about the device
> > taint: Add TAINT_FWCTL
> > fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
> > fwctl: Add documentation
> >
> > Saeed Mahameed (2):
> > fwctl/mlx5: Support for communicating with mlx5 fw
> > mlx5: Create an auxiliary device for fwctl_mlx5
> >
> > Documentation/admin-guide/tainted-kernels.rst | 5 +
> > Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++
> > Documentation/userspace-api/fwctl/index.rst | 12 +
> > Documentation/userspace-api/index.rst | 1 +
> > .../userspace-api/ioctl/ioctl-number.rst | 1 +
> > MAINTAINERS | 16 +
> > drivers/Kconfig | 2 +
> > drivers/Makefile | 1 +
> > drivers/fwctl/Kconfig | 32 ++
> > drivers/fwctl/Makefile | 6 +
> > drivers/fwctl/bnxt/Makefile | 4 +
> > drivers/fwctl/bnxt/bnxt.c | 167 +++++++
> > drivers/fwctl/main.c | 416 ++++++++++++++++++
> > drivers/fwctl/mlx5/Makefile | 4 +
> > drivers/fwctl/mlx5/main.c | 340 ++++++++++++++
> > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +
> > drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 +
> > drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 126 +++++-
> > drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 4 +
> > drivers/net/ethernet/mellanox/mlx5/core/dev.c | 9 +
> > include/linux/fwctl.h | 135 ++++++
> > include/linux/panic.h | 3 +-
> > include/uapi/fwctl/bnxt.h | 27 ++
> > include/uapi/fwctl/fwctl.h | 140 ++++++
> > include/uapi/fwctl/mlx5.h | 36 ++
> > kernel/panic.c | 1 +
> > tools/debugging/kernel-chktaint | 8 +
> > 27 files changed, 1782 insertions(+), 5 deletions(-)
> > create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
> > create mode 100644 Documentation/userspace-api/fwctl/index.rst
> > create mode 100644 drivers/fwctl/Kconfig
> > create mode 100644 drivers/fwctl/Makefile
> > create mode 100644 drivers/fwctl/bnxt/Makefile
> > create mode 100644 drivers/fwctl/bnxt/bnxt.c
> > create mode 100644 drivers/fwctl/main.c
> > create mode 100644 drivers/fwctl/mlx5/Makefile
> > create mode 100644 drivers/fwctl/mlx5/main.c
> > create mode 100644 include/linux/fwctl.h
> > create mode 100644 include/uapi/fwctl/bnxt.h
> > create mode 100644 include/uapi/fwctl/fwctl.h
> > create mode 100644 include/uapi/fwctl/mlx5.h
> >
> >
> > base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
>
>
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-11 7:55 ` Leon Romanovsky
@ 2025-02-11 14:27 ` Andy Gospodarek
2025-02-12 14:20 ` Leon Romanovsky
2025-02-11 18:36 ` Nelson, Shannon
1 sibling, 1 reply; 67+ messages in thread
From: Andy Gospodarek @ 2025-02-11 14:27 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jakub Kicinski, Jason Gunthorpe, Saeed Mahameed, Aron Silverton,
Dan Williams, Daniel Vetter, Dave Jiang, David Ahern,
Andy Gospodarek, Christoph Hellwig, Itay Avraham, Jiri Pirko,
Jonathan Cameron, Leonid Bloch, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On Tue, Feb 11, 2025 at 2:55 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote:
> > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote:
> > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
> > >
> > > > But if you agree the netdev doesn't need it seems like a fairly
> > > > straightforward way to unblock your progress.
> > >
> > > I'm trying to understand what you are suggesting here.
> > >
> > > We have many scenarios where mlx5_core spawns all kinds of different
> > > devices, including recovery cases where there is no networking at all
> > > and only fwctl. So we can't just discard the aux dev or mlx5_core
> > > triggered setup without breaking scenarios.
> > >
> > > However, you seem to be suggesting that netdev-only configurations (ie
> > > netdev loaded but no rdma loaded) should disable fwctl. Is that the
> > > case? All else would remain the same. It is very ugly but I could see
> > > a technical path to do it, and would consider it if that brings peace.
> >
> > Yes, when RDMA driver is not loaded there should be no access to fwctl.
>
> The cover letter mentions users who need FWCTL without RDMA:
> https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/
>
> I want to suggest something different. What about moving all XXX_core
> logic (mlx5_core, bnxt_core, etc.) from netdev to some other dedicated
> place?
>
I understand the logic in your statement, but I do not want to
separate/split PCI driver from the NIC driver for bnxt-based devices.
We can look at doing that for future generations of hardware, but
splitting/switching drivers for existing hardware creates a poor
user-experience for distro users.
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-11 1:04 ` Jakub Kicinski
2025-02-11 7:55 ` Leon Romanovsky
@ 2025-02-11 16:24 ` David Ahern
2025-02-18 20:05 ` Jason Gunthorpe
1 sibling, 1 reply; 67+ messages in thread
From: David Ahern @ 2025-02-11 16:24 UTC (permalink / raw)
To: Jakub Kicinski, Jason Gunthorpe
Cc: Saeed Mahameed, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Nelson, Shannon,
Michael Chan
On 2/10/25 6:04 PM, Jakub Kicinski wrote:
> Yes, when RDMA driver is not loaded there should be no access to fwctl.
> When RDMA is disabled on the device via devlink there should be no
> fwctl access.
>
> To disincentivize "creative workarounds" we have to also agree and
> document that fwctl must not be used to configure TCP/IP functions
> of the device, or host queues used by the netdev stack.
Your request is not about "RDMA only" since there are non-RDMA use cases
at play (e.g., CXL). It seems like what you are really asking for is a
hard exception for "netdev" use cases, right? So a summary along the
lines of:
"Any resources in use by the netdev stack can only be created and
modified by established netdev tools."
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-11 7:55 ` Leon Romanovsky
2025-02-11 14:27 ` Andy Gospodarek
@ 2025-02-11 18:36 ` Nelson, Shannon
2025-02-12 13:22 ` Leon Romanovsky
1 sibling, 1 reply; 67+ messages in thread
From: Nelson, Shannon @ 2025-02-11 18:36 UTC (permalink / raw)
To: Leon Romanovsky, Jakub Kicinski, Jason Gunthorpe
Cc: Saeed Mahameed, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, linux-cxl, linux-rdma, netdev, Michael Chan
On 2/10/2025 11:55 PM, Leon Romanovsky wrote:
>
> On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote:
>> On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote:
>>> On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
>>>
>>>> But if you agree the netdev doesn't need it seems like a fairly
>>>> straightforward way to unblock your progress.
>>>
>>> I'm trying to understand what you are suggesting here.
>>>
>>> We have many scenarios where mlx5_core spawns all kinds of different
>>> devices, including recovery cases where there is no networking at all
>>> and only fwctl. So we can't just discard the aux dev or mlx5_core
>>> triggered setup without breaking scenarios.
>>>
>>> However, you seem to be suggesting that netdev-only configurations (ie
>>> netdev loaded but no rdma loaded) should disable fwctl. Is that the
>>> case? All else would remain the same. It is very ugly but I could see
>>> a technical path to do it, and would consider it if that brings peace.
>>
>> Yes, when RDMA driver is not loaded there should be no access to fwctl.
>
> The cover letter mentions users who need FWCTL without RDMA:
> https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/
>
> I want to suggest something different. What about moving all XXX_core
> logic (mlx5_core, bnxt_core, etc.) from netdev to some other dedicated
> place?
>
> There is no technical need to have PCI/FW logic inside the networking stack.
>
> Thanks
Our pds_core device fits this description as well: it is not an ethernet
PCI device, but helps manage the FW/HW for Eth and other things that are
separate PCI functions. We ended up in the netdev arena because we
first went in as a support for vDPA VFs.
Should these 'core' devices live in linux-pci land? Is it possible that
some 'core' things might be platform devices rather than PCI?
sln
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-11 18:36 ` Nelson, Shannon
@ 2025-02-12 13:22 ` Leon Romanovsky
2025-02-14 1:03 ` Saeed Mahameed
0 siblings, 1 reply; 67+ messages in thread
From: Leon Romanovsky @ 2025-02-12 13:22 UTC (permalink / raw)
To: Nelson, Shannon
Cc: Jakub Kicinski, Jason Gunthorpe, Saeed Mahameed, Andy Gospodarek,
Aron Silverton, Dan Williams, Daniel Vetter, Dave Jiang,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Leonid Bloch, linux-cxl, linux-rdma,
netdev, Michael Chan
On Tue, Feb 11, 2025 at 10:36:37AM -0800, Nelson, Shannon wrote:
> On 2/10/2025 11:55 PM, Leon Romanovsky wrote:
> >
> > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote:
> > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote:
> > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
> > > >
> > > > > But if you agree the netdev doesn't need it seems like a fairly
> > > > > straightforward way to unblock your progress.
> > > >
> > > > I'm trying to understand what you are suggesting here.
> > > >
> > > > We have many scenarios where mlx5_core spawns all kinds of different
> > > > devices, including recovery cases where there is no networking at all
> > > > and only fwctl. So we can't just discard the aux dev or mlx5_core
> > > > triggered setup without breaking scenarios.
> > > >
> > > > However, you seem to be suggesting that netdev-only configurations (ie
> > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the
> > > > case? All else would remain the same. It is very ugly but I could see
> > > > a technical path to do it, and would consider it if that brings peace.
> > >
> > > Yes, when RDMA driver is not loaded there should be no access to fwctl.
> >
> > The cover letter mentions users who need FWCTL without RDMA:
> > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/
> >
> > I want to suggest something different. What about moving all XXX_core
> > logic (mlx5_core, bnxt_core, etc.) from netdev to some other dedicated
> > place?
> >
> > There is no technical need to have PCI/FW logic inside the networking stack.
> >
> > Thanks
>
> Our pds_core device fits this description as well: it is not an ethernet PCI
> device, but helps manage the FW/HW for Eth and other things that are
> separate PCI functions. We ended up in the netdev arena because we first
> went in as a support for vDPA VFs.
>
> Should these 'core' devices live in linux-pci land? Is it possible that
> some 'core' things might be platform devices rather than PCI?
IMHO, linux-pci was the right place before FWCTL and auxbus arrived, but now
these core drivers can be placed in drivers/fwctl instead. That would be a
natural place for them, as they would sit next to the UAPI that provides
access to them.
All other components would be auxbus devices in their respective
subsystems (eth, RDMA ...).
Thanks
>
> sln
>
>
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-11 14:27 ` Andy Gospodarek
@ 2025-02-12 14:20 ` Leon Romanovsky
0 siblings, 0 replies; 67+ messages in thread
From: Leon Romanovsky @ 2025-02-12 14:20 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Jakub Kicinski, Jason Gunthorpe, Saeed Mahameed, Aron Silverton,
Dan Williams, Daniel Vetter, Dave Jiang, David Ahern,
Andy Gospodarek, Christoph Hellwig, Itay Avraham, Jiri Pirko,
Jonathan Cameron, Leonid Bloch, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On Tue, Feb 11, 2025 at 09:27:08AM -0500, Andy Gospodarek wrote:
> On Tue, Feb 11, 2025 at 2:55 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote:
> > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote:
> > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
> > > >
> > > > > But if you agree the netdev doesn't need it seems like a fairly
> > > > > straightforward way to unblock your progress.
> > > >
> > > > I'm trying to understand what you are suggesting here.
> > > >
> > > > We have many scenarios where mlx5_core spawns all kinds of different
> > > > devices, including recovery cases where there is no networking at all
> > > > and only fwctl. So we can't just discard the aux dev or mlx5_core
> > > > triggered setup without breaking scenarios.
> > > >
> > > > However, you seem to be suggesting that netdev-only configurations (ie
> > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the
> > > > case? All else would remain the same. It is very ugly but I could see
> > > > a technical path to do it, and would consider it if that brings peace.
> > >
> > > Yes, when RDMA driver is not loaded there should be no access to fwctl.
> >
> > The cover letter mentions users who need FWCTL without RDMA:
> > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/
> >
> > I want to suggest something different. What about moving all XXX_core
> > logic (mlx5_core, bnxt_core, etc.) from netdev to some other dedicated
> > place?
> >
>
> I understand the logic in your statement, but I do not want to
> separate/split PCI driver from the NIC driver for bnxt-based devices.
It is just an idea and there is no need to worry yet. There is no
evidence that the netdev community will allow such a move.
>
> We can look at doing that for future generations of hardware, but
> splitting/switching drivers for existing hardware creates a poor
> user-experience for distro users.
It is already solved with module autoload, dependencies and aliases.
Thanks
* Re: [PATCH v4 00/10] Introduce fwctl subystem
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (11 preceding siblings ...)
2025-02-07 21:58 ` Dave Jiang
@ 2025-02-12 22:21 ` Zhu Yanjun
2025-02-13 2:30 ` Nelson, Shannon
13 siblings, 0 replies; 67+ messages in thread
From: Zhu Yanjun @ 2025-02-12 22:21 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
On 2025/2/7 1:13, Jason Gunthorpe wrote:
> [
> Many people were away around the holiday period, but work is back in full
> swing now with Dave already at v3 on his CXL work over the past couple
> weeks. We are looking at a good chance of reaching this merge window. I
> will work out some shared branches with CXL and get it into linux-next
> once all three drivers can be assembled and reviews seem to be concluding.
>
> There are a couple of open notes
> - Greg was interested in a new name, but nobody offered any bikesheds
> - I would like a co-maintainer
> ]
>
> fwctl is a new subsystem intended to bring some common rules and order to
> the growing pattern of exposing a secure FW interface directly to
> userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
> exposing a device for datapath operations fwctl is focused on debugging,
> configuration and provisioning of the device. It will not have the
> necessary features like interrupt delivery to support a datapath.
>
> This concept is similar to the long standing practice in the "HW" RAID
> space of having a device specific misc device to manage the RAID
> controller FW. fwctl generalizes this notion of a companion debug and
> management interface that goes along with a dataplane implemented in an
> appropriate subsystem.
>
> The need for this has reached a critical point as many users are moving to
> run lockdown enabled kernels. Several existing devices have had long
> standing tooling for management that relied on /sys/../resource0 or PCI
> config space access which is not permitted in lockdown. A major point of
> fwctl is to define and document the rules that a device must follow to
> expose a lockdown compatible RPC.
>
> Based on some discussion fwctl splits the RPCs into four categories
>
> FWCTL_RPC_CONFIGURATION
> FWCTL_RPC_DEBUG_READ_ONLY
> FWCTL_RPC_DEBUG_WRITE
> FWCTL_RPC_DEBUG_WRITE_FULL
>
> Where the latter two trigger a new TAINT_FWCTL, and the final one requires
> CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW
> would be responsible to restrict RPCs to the requested security scope,
> while the core code handles the tainting and CAP checks.
>
> For details see the final patch which introduces the documentation.
>
> The CXL FWCTL driver is now in its own series, at v3:
> https://lore.kernel.org/r/20250204220430.4146187-1-dave.jiang@intel.com
>
> I'm expecting a 3rd driver (from Shannon @ Pensando) to be posted right
> away, the github version I saw looked good. I've got soft commitments for
> about 6 drivers in total now.
>
> There have been three LWN articles written discussing various aspects of
> this proposal:
>
> https://lwn.net/Articles/955001/
> https://lwn.net/Articles/969383/
> https://lwn.net/Articles/990802/
>
> A really giant ksummit thread preceding a discussion at the Maintainer
> Summit:
>
> https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/
>
> Several have expressed general support for this concept:
>
> AMD/Pensando - https://lore.kernel.org/linux-rdma/20241205222818.44439-1-shannon.nelson@amd.com
> Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
> Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org
> Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org
> NVIDIA Networking
> Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
> Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
> SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com
>
> Work is ongoing for userspace, currently the mellanox tool suite has been
> ported over:
> https://github.com/Mellanox/mstflint
>
> And a more simplified example of how to use it:
> https://github.com/jgunthorpe/mlx5ctl.git
Hi, Jason
I read all the threads about this fwctl subsystem carefully. I think
that this fwctl tool is very nice and helpful in our work. But I cannot
find a user guide in the threads.
I would like to try it in our debug work with mlx5 devices. Can you share
a link to a user guide with us?
Your help is much appreciated.
Thanks a lot.
Zhu Yanjun
>
> This is on github: https://github.com/jgunthorpe/linux/commits/fwctl
>
> v4:
> - Rebase to v6.14-rc1
> - Fine tune comments and rst documentation
> - Adjust cleanup.h usage - remove places that add more obfuscation than
> value
> - CXL is back to its own independent series
> - Increase FWCTL_MAX_DEVICES to 4096, someone hit the limit
> - Fix mlx5ctl_validate_rpc() logic around scope checking
> - Disable mlx5ctl on SFs
> v3: https://patch.msgid.link/r/0-v3-960f17f90f17+516-fwctl_jgg@nvidia.com
> - Rebase to v6.11-rc4
> - Add a squashed version of David's CXL series as the 2nd driver
> - Add missing includes
> - Improve comments based on feedback
> - Use the kdoc format that puts the member docs inside the struct
> - Rewrite fwctl_alloc_device() to be clearer
> - Incorporate all remarks for the documentation
> v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com
> - Rebase to v6.10-rc5
> - Minor style changes
> - Follow the style consensus for the guard stuff
> - Documentation grammar/spelling
> - Add missed length output for mlx5 get_info
> - Add two more missed MLX5 CMD's
> - Collect tags
> v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com
>
> Andy Gospodarek (2):
> fwctl/bnxt: Support communicating with bnxt fw
> bnxt: Create an auxiliary device for fwctl_bnxt
>
> Jason Gunthorpe (6):
> fwctl: Add basic structure for a class subsystem with a cdev
> fwctl: Basic ioctl dispatch for the character device
> fwctl: FWCTL_INFO to return basic information about the device
> taint: Add TAINT_FWCTL
> fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
> fwctl: Add documentation
>
> Saeed Mahameed (2):
> fwctl/mlx5: Support for communicating with mlx5 fw
> mlx5: Create an auxiliary device for fwctl_mlx5
>
> Documentation/admin-guide/tainted-kernels.rst | 5 +
> Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++
> Documentation/userspace-api/fwctl/index.rst | 12 +
> Documentation/userspace-api/index.rst | 1 +
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> MAINTAINERS | 16 +
> drivers/Kconfig | 2 +
> drivers/Makefile | 1 +
> drivers/fwctl/Kconfig | 32 ++
> drivers/fwctl/Makefile | 6 +
> drivers/fwctl/bnxt/Makefile | 4 +
> drivers/fwctl/bnxt/bnxt.c | 167 +++++++
> drivers/fwctl/main.c | 416 ++++++++++++++++++
> drivers/fwctl/mlx5/Makefile | 4 +
> drivers/fwctl/mlx5/main.c | 340 ++++++++++++++
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 +
> drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 126 +++++-
> drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 4 +
> drivers/net/ethernet/mellanox/mlx5/core/dev.c | 9 +
> include/linux/fwctl.h | 135 ++++++
> include/linux/panic.h | 3 +-
> include/uapi/fwctl/bnxt.h | 27 ++
> include/uapi/fwctl/fwctl.h | 140 ++++++
> include/uapi/fwctl/mlx5.h | 36 ++
> kernel/panic.c | 1 +
> tools/debugging/kernel-chktaint | 8 +
> 27 files changed, 1782 insertions(+), 5 deletions(-)
> create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
> create mode 100644 Documentation/userspace-api/fwctl/index.rst
> create mode 100644 drivers/fwctl/Kconfig
> create mode 100644 drivers/fwctl/Makefile
> create mode 100644 drivers/fwctl/bnxt/Makefile
> create mode 100644 drivers/fwctl/bnxt/bnxt.c
> create mode 100644 drivers/fwctl/main.c
> create mode 100644 drivers/fwctl/mlx5/Makefile
> create mode 100644 drivers/fwctl/mlx5/main.c
> create mode 100644 include/linux/fwctl.h
> create mode 100644 include/uapi/fwctl/bnxt.h
> create mode 100644 include/uapi/fwctl/fwctl.h
> create mode 100644 include/uapi/fwctl/mlx5.h
>
>
> base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
* Re: [PATCH v4 00/10] Introduce fwctl subystem
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
` (12 preceding siblings ...)
2025-02-12 22:21 ` Zhu Yanjun
@ 2025-02-13 2:30 ` Nelson, Shannon
2025-02-13 18:02 ` Jason Gunthorpe
13 siblings, 1 reply; 67+ messages in thread
From: Nelson, Shannon @ 2025-02-13 2:30 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed
On 2/6/2025 4:13 PM, Jason Gunthorpe wrote:
>
> [
> Many people were away around the holiday period, but work is back in full
> swing now with Dave already at v3 on his CXL work over the past couple
> weeks. We are looking at a good chance of reaching this merge window. I
> will work out some shared branches with CXL and get it into linux-next
> once all three drivers can be assembled and reviews seem to be concluding.
>
> There are a couple of open notes
> - Greg was interested in a new name, but nobody offered any bikesheds
> - I would like a co-maintainer
> ]
>
> fwctl is a new subsystem intended to bring some common rules and order to
> the growing pattern of exposing a secure FW interface directly to
> userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
> exposing a device for datapath operations fwctl is focused on debugging,
> configuration and provisioning of the device. It will not have the
> necessary features like interrupt delivery to support a datapath.
>
> This concept is similar to the long standing practice in the "HW" RAID
> space of having a device specific misc device to manage the RAID
> controller FW. fwctl generalizes this notion of a companion debug and
> management interface that goes along with a dataplane implemented in an
> appropriate subsystem.
>
> The need for this has reached a critical point as many users are moving to
> run lockdown enabled kernels. Several existing devices have had long
> standing tooling for management that relied on /sys/../resource0 or PCI
> config space access which is not permitted in lockdown. A major point of
> fwctl is to define and document the rules that a device must follow to
> expose a lockdown compatible RPC.
>
> Based on some discussion fwctl splits the RPCs into four categories
>
> FWCTL_RPC_CONFIGURATION
> FWCTL_RPC_DEBUG_READ_ONLY
> FWCTL_RPC_DEBUG_WRITE
> FWCTL_RPC_DEBUG_WRITE_FULL
>
> Where the latter two trigger a new TAINT_FWCTL, and the final one requires
> CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW
> would be responsible to restrict RPCs to the requested security scope,
> while the core code handles the tainting and CAP checks.
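The core-side policy described in the quoted paragraph above can be sketched as a small userspace model. This is only an illustration, not the code from the series: the scope names come from the cover letter, while the numeric values, the helper name sketch_check_scope(), the bool standing in for a CAP_SYS_RAWIO check, and the specific error codes are all assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Scope names come from the cover letter; the numeric values are assumed. */
enum fwctl_rpc_scope {
	FWCTL_RPC_CONFIGURATION = 0,
	FWCTL_RPC_DEBUG_READ_ONLY = 1,
	FWCTL_RPC_DEBUG_WRITE = 2,
	FWCTL_RPC_DEBUG_WRITE_FULL = 3,
};

/*
 * Hypothetical model of the core check: returns 0 or a negative errno and
 * reports through *taint whether the kernel would be tainted.
 * has_cap_sys_rawio stands in for a real capability check.
 */
static int sketch_check_scope(enum fwctl_rpc_scope scope,
			      bool has_cap_sys_rawio, bool *taint)
{
	assert(taint);
	*taint = false;

	switch (scope) {
	case FWCTL_RPC_CONFIGURATION:
	case FWCTL_RPC_DEBUG_READ_ONLY:
		/* No taint and no extra capability needed. */
		return 0;
	case FWCTL_RPC_DEBUG_WRITE_FULL:
		/* The final scope additionally requires CAP_SYS_RAWIO. */
		if (!has_cap_sys_rawio)
			return -EPERM;
		*taint = true;
		return 0;
	case FWCTL_RPC_DEBUG_WRITE:
		/* The latter two scopes taint the kernel. */
		*taint = true;
		return 0;
	default:
		/* Unknown scope: error code chosen for illustration. */
		return -EOPNOTSUPP;
	}
}
```

In this model a caller asking for FWCTL_RPC_DEBUG_WRITE_FULL without the capability gets -EPERM and no taint, while the two read-oriented scopes always pass untainted. Restricting which firmware commands are legal within a scope stays with the device driver and its FW, as the cover letter says.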
>
> For details see the final patch which introduces the documentation.
>
> The CXL FWCTL driver is now in its own series on v3:
> https://lore.kernel.org/r/20250204220430.4146187-1-dave.jiang@intel.com
>
> I'm expecting a 3rd driver (from Shannon @ Pensando) to be posted right
> away, the github version I saw looked good. I've got soft commitments for
> about 6 drivers in total now.
Hi Jason,
I've looked through the core code and didn't see anything that others
haven't already commented on. I didn't go through the mlx5 or bnxt code
very carefully, but you can put my Reviewed-by on your first 6 patches.
We've been running successfully with an earlier version of the code, but
haven't set up our full test environment with this version yet. Since
there doesn't seem to be much change here, you are welcome to my
Tested-by as well.
For the first 6 patches:
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Shannon Nelson <shannon.nelson@amd.com>
Cheers,
sln
>
> There have been three LWN articles written discussing various aspects of
> this proposal:
>
> https://lwn.net/Articles/955001/
> https://lwn.net/Articles/969383/
> https://lwn.net/Articles/990802/
>
> A really giant ksummit thread preceding a discussion at the Maintainer
> Summit:
>
> https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/
>
> Several have expressed general support for this concept:
>
> AMD/Pensando - https://lore.kernel.org/linux-rdma/20241205222818.44439-1-shannon.nelson@amd.com
> Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
> Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org
> Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org
> NVIDIA Networking
> Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
> Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
> SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com
>
> Work is ongoing for userspace, currently the mellanox tool suite has been
> ported over:
> https://github.com/Mellanox/mstflint
>
> And a more simplified example how to use it:
> https://github.com/jgunthorpe/mlx5ctl.git
>
> This is on github: https://github.com/jgunthorpe/linux/commits/fwctl
>
> v4:
> - Rebase to v6.14-rc1
> - Fine tune comments and rst documentation
> - Adjust cleanup.h usage - remove places that add more obfuscation than
> value
> - CXL is back to its own independent series
> - Increase FWCTL_MAX_DEVICES to 4096, someone hit the limit
> - Fix mlx5ctl_validate_rpc() logic around scope checking
> - Disable mlx5ctl on SFs
> v3: https://patch.msgid.link/r/0-v3-960f17f90f17+516-fwctl_jgg@nvidia.com
> - Rebase to v6.11-rc4
> - Add a squashed version of David's CXL series as the 2nd driver
> - Add missing includes
> - Improve comments based on feedback
> - Use the kdoc format that puts the member docs inside the struct
> - Rewrite fwctl_alloc_device() to be clearer
> - Incorporate all remarks for the documentation
> v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com
> - Rebase to v6.10-rc5
> - Minor style changes
> - Follow the style consensus for the guard stuff
> - Documentation grammar/spelling
> - Add missed length output for mlx5 get_info
> - Add two more missed MLX5 CMD's
> - Collect tags
> v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com
>
> Andy Gospodarek (2):
> fwctl/bnxt: Support communicating with bnxt fw
> bnxt: Create an auxiliary device for fwctl_bnxt
>
> Jason Gunthorpe (6):
> fwctl: Add basic structure for a class subsystem with a cdev
> fwctl: Basic ioctl dispatch for the character device
> fwctl: FWCTL_INFO to return basic information about the device
> taint: Add TAINT_FWCTL
> fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
> fwctl: Add documentation
>
> Saeed Mahameed (2):
> fwctl/mlx5: Support for communicating with mlx5 fw
> mlx5: Create an auxiliary device for fwctl_mlx5
>
> Documentation/admin-guide/tainted-kernels.rst | 5 +
> Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++
> Documentation/userspace-api/fwctl/index.rst | 12 +
> Documentation/userspace-api/index.rst | 1 +
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> MAINTAINERS | 16 +
> drivers/Kconfig | 2 +
> drivers/Makefile | 1 +
> drivers/fwctl/Kconfig | 32 ++
> drivers/fwctl/Makefile | 6 +
> drivers/fwctl/bnxt/Makefile | 4 +
> drivers/fwctl/bnxt/bnxt.c | 167 +++++++
> drivers/fwctl/main.c | 416 ++++++++++++++++++
> drivers/fwctl/mlx5/Makefile | 4 +
> drivers/fwctl/mlx5/main.c | 340 ++++++++++++++
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 +
> drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 126 +++++-
> drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 4 +
> drivers/net/ethernet/mellanox/mlx5/core/dev.c | 9 +
> include/linux/fwctl.h | 135 ++++++
> include/linux/panic.h | 3 +-
> include/uapi/fwctl/bnxt.h | 27 ++
> include/uapi/fwctl/fwctl.h | 140 ++++++
> include/uapi/fwctl/mlx5.h | 36 ++
> kernel/panic.c | 1 +
> tools/debugging/kernel-chktaint | 8 +
> 27 files changed, 1782 insertions(+), 5 deletions(-)
> create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
> create mode 100644 Documentation/userspace-api/fwctl/index.rst
> create mode 100644 drivers/fwctl/Kconfig
> create mode 100644 drivers/fwctl/Makefile
> create mode 100644 drivers/fwctl/bnxt/Makefile
> create mode 100644 drivers/fwctl/bnxt/bnxt.c
> create mode 100644 drivers/fwctl/main.c
> create mode 100644 drivers/fwctl/mlx5/Makefile
> create mode 100644 drivers/fwctl/mlx5/main.c
> create mode 100644 include/linux/fwctl.h
> create mode 100644 include/uapi/fwctl/bnxt.h
> create mode 100644 include/uapi/fwctl/fwctl.h
> create mode 100644 include/uapi/fwctl/mlx5.h
>
>
> base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
> --
> 2.43.0
>
* Re: [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device
2025-02-07 0:13 ` [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
2025-02-07 12:59 ` Jonathan Cameron
2025-02-08 0:16 ` Dave Jiang
@ 2025-02-13 12:42 ` Przemek Kitszel
2025-02-13 18:52 ` Jason Gunthorpe
2 siblings, 1 reply; 67+ messages in thread
From: Przemek Kitszel @ 2025-02-13 12:42 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
On 2/7/25 01:13, Jason Gunthorpe wrote:
> Each file descriptor gets a chunk of per-FD driver specific context that
> allows the driver to attach a device specific struct to. The core code
> takes care of the memory lifetime for this structure.
>
> The ioctl dispatch and design is based on what was built for iommufd. The
> ioctls have a struct which has a combined in/out behavior with a typical
> 'zero pad' scheme for future extension and backwards compatibility.
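The 'zero pad' extension scheme mentioned above can be sketched in plain userspace C. This is a model of the general technique (the same idea as the kernel's copy_struct_from_user() helper), not the code in this patch; struct sketch_cmd, its fields, and sketch_copy_struct() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ioctl struct: 'size' comes first, fields are only appended. */
struct sketch_cmd {
	uint32_t size;
	uint32_t flags;
	uint32_t new_field;	/* imagined later extension */
};

/*
 * Copy a caller struct of usize bytes into a kernel-side struct of ksize
 * bytes. A shorter (older) caller struct gets its missing tail zero-filled.
 * A longer (newer) caller struct is accepted only if every byte the kernel
 * side does not know about is zero.
 */
static int sketch_copy_struct(void *dst, size_t ksize,
			      const void *src, size_t usize)
{
	const unsigned char *bytes = src;
	size_t copy = usize < ksize ? usize : ksize;
	size_t i;

	assert(dst && src);

	/* Reject unknown trailing data that is actually in use. */
	for (i = ksize; i < usize; i++)
		if (bytes[i] != 0)
			return -E2BIG;

	memset(dst, 0, ksize);
	memcpy(dst, src, copy);
	return 0;
}
```

With this rule an old binary that passes only size and flags keeps working against a newer kernel (new_field reads as zero), and a new binary works against an old kernel as long as it leaves the fields that kernel does not understand zeroed.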
>
> Like iommufd some shared logic does most of the ioctl marshalling and
> compatibility work and tables dispatch to some function pointers for
> each unique ioctl.
>
> This approach has proven to work quite well in the iommufd and rdma
> subsystems.
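The table-driven dispatch described above can be modelled roughly as follows. This is a userspace sketch under stated assumptions, not the fwctl or iommufd implementation: every sketch_* name, the two command structs, and the chosen error codes are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command structs, each starting with its own size. */
struct sketch_info { unsigned int size; unsigned int out_type; };
struct sketch_rpc  { unsigned int size; unsigned int scope; unsigned int out_status; };

/* One table entry per ioctl: smallest acceptable struct plus a handler. */
struct sketch_op {
	size_t min_size;
	int (*execute)(void *cmd);
};

static int sketch_do_info(void *cmd)
{
	((struct sketch_info *)cmd)->out_type = 1;
	return 0;
}

static int sketch_do_rpc(void *cmd)
{
	((struct sketch_rpc *)cmd)->out_status = 0;
	return 0;
}

enum { SKETCH_CMD_INFO, SKETCH_CMD_RPC, SKETCH_CMD_MAX };

static const struct sketch_op sketch_ops[SKETCH_CMD_MAX] = {
	[SKETCH_CMD_INFO] = { sizeof(struct sketch_info), sketch_do_info },
	[SKETCH_CMD_RPC]  = { sizeof(struct sketch_rpc),  sketch_do_rpc },
};

/* Shared entry point: validate once, then call through the table. */
static int sketch_dispatch(unsigned int nr, void *cmd, size_t user_size)
{
	const struct sketch_op *op;

	if (nr >= SKETCH_CMD_MAX)
		return -ENOTTY;		/* unknown command number */
	op = &sketch_ops[nr];
	assert(op->execute);
	if (user_size < op->min_size)
		return -EINVAL;		/* caller struct too short */
	return op->execute(cmd);
}
```

The point of the layout is that size validation and marshalling live in one shared function, so adding a command means adding one table row and one handler rather than repeating the checks.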
>
> Allocate an ioctl number space for the subsystem.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index 34946bdc3bf3d7..d561deaf2b86d8 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -10,6 +10,8 @@
> #include <linux/module.h>
> #include <linux/slab.h>
>
> +#include <uapi/fwctl/fwctl.h>
> +
> enum {
> FWCTL_MAX_DEVICES = 4096,
> };
> @@ -18,20 +20,128 @@ static_assert(FWCTL_MAX_DEVICES < (1U << MINORBITS));
> static dev_t fwctl_dev;
> static DEFINE_IDA(fwctl_ida);
>
> +struct fwctl_ucmd {
> + struct fwctl_uctx *uctx;
> + void __user *ubuffer;
> + void *cmd;
> + u32 user_size;
> +};
> +
> +/* On stack memory for the ioctl structs */
> +union ucmd_buffer {
for most names you follow the usual prefixing rules, would be good
to do for all
* Re: [PATCH v4 07/10] fwctl/mlx5: Support for communicating with mlx5 fw
2025-02-07 0:13 ` [PATCH v4 07/10] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
@ 2025-02-13 13:19 ` Przemek Kitszel
2025-02-13 14:25 ` Leon Romanovsky
2025-02-13 19:18 ` Jason Gunthorpe
0 siblings, 2 replies; 67+ messages in thread
From: Przemek Kitszel @ 2025-02-13 13:19 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
On 2/7/25 01:13, Jason Gunthorpe wrote:
> From: Saeed Mahameed <saeedm@nvidia.com>
In part this is a general feedback for the subsystem too.
> +FWCTL MLX5 DRIVER
I don't like this design.
That way each and every real driver would need to make another one to
just use fwctl.
Why not just require the real driver to call fwctl_register(opsstruct),
with the required .validate, .do_cmd, etc commands backed there?
There will be much less scaffolding.
Or the intention is to have this little driver replaced by OOT one,
but keep the real (say networking) driver as-is from intree?
> +++ b/drivers/fwctl/mlx5/main.c
> @@ -0,0 +1,340 @@
> +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
-2025
> + */
> +#include <linux/fwctl.h>
> +#include <linux/auxiliary_bus.h>
> +#include <linux/mlx5/device.h>
> +#include <linux/mlx5/driver.h>
this breaks abstraction (at least your headers are in nice place, but
this is rather uncommon, typical solution is to have them backed inside
the driver directory) - the two drivers will be tightly coupled
> +module_auxiliary_driver(mlx5ctl_driver);
> +
> +MODULE_IMPORT_NS("FWCTL");
> +MODULE_DESCRIPTION("mlx5 ConnectX fwctl driver");
> +MODULE_AUTHOR("Saeed Mahameed <saeedm@nvidia.com>");
> +MODULE_LICENSE("Dual BSD/GPL");
> diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> index 7a21f2f011917a..0790b8291ee1bd 100644
> --- a/include/uapi/fwctl/fwctl.h
> +++ b/include/uapi/fwctl/fwctl.h
> @@ -42,6 +42,7 @@ enum {
>
> enum fwctl_device_type {
> FWCTL_DEVICE_TYPE_ERROR = 0,
> + FWCTL_DEVICE_TYPE_MLX5 = 1,
is that for fwctl info to be able to properly report what device user
has asked ioctl on? Would be great to embed 32byte long cstring of
DRIVER_NAME, to don't need each and every device to come to you and
ask for inclusion, that would also resolve problem of conflicting IDs
(my-driver-id prior-to and after upstreaming)
* Re: [PATCH v4 07/10] fwctl/mlx5: Support for communicating with mlx5 fw
2025-02-13 13:19 ` Przemek Kitszel
@ 2025-02-13 14:25 ` Leon Romanovsky
2025-02-13 19:18 ` Jason Gunthorpe
1 sibling, 0 replies; 67+ messages in thread
From: Leon Romanovsky @ 2025-02-13 14:25 UTC (permalink / raw)
To: Przemek Kitszel
Cc: Jason Gunthorpe, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Jakub Kicinski, Leonid Bloch, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
On Thu, Feb 13, 2025 at 02:19:38PM +0100, Przemek Kitszel wrote:
> On 2/7/25 01:13, Jason Gunthorpe wrote:
> > From: Saeed Mahameed <saeedm@nvidia.com>
>
> In part this is a general feedback for the subsystem too.
>
> > +FWCTL MLX5 DRIVER
>
> I don't like this design.
> That way each and every real driver would need to make another one to
> just use fwctl.
>
> Why not just require the real driver to call fwctl_register(opsstruct),
> with the required .validate, .do_cmd, etc commands backed there?
> There will be much less scaffolding.
We invented auxiliary_bus to actually reduce scaffolding. Auxiliary
devices allow splitting complex, multi-subsystem devices without the need
to create a hard binding between their drivers.
It allows for every subsystem to have its own independent driver, which
can be loaded separately, something that is not possible with your idea.
>
> Or the intention is to have this little driver replaced by OOT one,
> but keep the real (say networking) driver as-is from intree?
No, please read the purpose stated in drivers/base/auxiliary.c:
....
 * In some subsystems, the functionality of the core device (PCI/ACPI/other) is
 * too complex for a single device to be managed by a monolithic driver (e.g.
 * Sound Open Firmware), multiple devices might implement a common intersection
 * of functionality (e.g. NICs + RDMA), or a driver may want to export an
 * interface for another subsystem to drive (e.g. SIOV Physical Function export
 * Virtual Function management). A split of the functionality into child-
 * devices representing sub-domains of functionality makes it possible to
 * compartmentalize, layer, and distribute domain-specific concerns via a Linux
 * device-driver model.
....
>
> > +++ b/drivers/fwctl/mlx5/main.c
> > @@ -0,0 +1,340 @@
> > +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> > +/*
> > + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
>
> -2025
>
> > + */
> > +#include <linux/fwctl.h>
> > +#include <linux/auxiliary_bus.h>
> > +#include <linux/mlx5/device.h>
> > +#include <linux/mlx5/driver.h>
>
> this breaks abstraction (at least your headers are in nice place, but
> this is rather uncommon, typical solution is to have them backed inside
> the driver directory) - the two drivers will be tightly coupled
The FWCTL driver binds to an auxiliary device that is managed by another
core driver (in our case mlx5_core). It is coupled by design.
>
> > +module_auxiliary_driver(mlx5ctl_driver);
> > +
> > +MODULE_IMPORT_NS("FWCTL");
> > +MODULE_DESCRIPTION("mlx5 ConnectX fwctl driver");
> > +MODULE_AUTHOR("Saeed Mahameed <saeedm@nvidia.com>");
> > +MODULE_LICENSE("Dual BSD/GPL");
> > diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> > index 7a21f2f011917a..0790b8291ee1bd 100644
> > --- a/include/uapi/fwctl/fwctl.h
> > +++ b/include/uapi/fwctl/fwctl.h
> > @@ -42,6 +42,7 @@ enum {
> > enum fwctl_device_type {
> > FWCTL_DEVICE_TYPE_ERROR = 0,
> > + FWCTL_DEVICE_TYPE_MLX5 = 1,
>
> is that for fwctl info to be able to properly report what device user
> has asked ioctl on? Would be great to embed 32byte long cstring of
> DRIVER_NAME, to don't need each and every device to come to you and
> ask for inclusion, that would also resolve problem of conflicting IDs
> (my-driver-id prior-to and after upstreaming)
Yes, we do want to make sure that FWCTL is used for upstream code and
don't want to open it up to out-of-tree drivers that want to use this
interface but were never sent upstream.
Thanks
>
>
* Re: [PATCH v4 00/10] Introduce fwctl subystem
2025-02-07 21:58 ` Dave Jiang
2025-02-11 9:33 ` Jonathan Cameron
@ 2025-02-13 17:52 ` Jason Gunthorpe
1 sibling, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 17:52 UTC (permalink / raw)
To: Dave Jiang
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
David Ahern, Andy Gospodarek, Christoph Hellwig, Itay Avraham,
Jiri Pirko, Jonathan Cameron, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Fri, Feb 07, 2025 at 02:58:51PM -0700, Dave Jiang wrote:
> > There are a couple of open notes
> > - Greg was interested in a new name, but nobody offered any bikesheds
> > - I would like a co-maintainer
>
> I volunteer as tribute. :)
Excellent choice of word, into the volcano with you!
> I got the CXL series rebased and tested on top of this series. So you can add
> Tested-by: Dave Jiang <dave.jiang@intel.com>
> for the core FWCTL bits in the series.
Got it
Thanks,
Jason
* Re: [PATCH v4 00/10] Introduce fwctl subystem
2025-02-11 9:33 ` Jonathan Cameron
@ 2025-02-13 17:55 ` Jason Gunthorpe
0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 17:55 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Dave Jiang, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
Leon Romanovsky, linux-cxl, linux-rdma, netdev, Saeed Mahameed,
Nelson, Shannon
On Tue, Feb 11, 2025 at 09:33:29AM +0000, Jonathan Cameron wrote:
> This is an area I plan to keep reviewing (and adding more use cases), so feel
> free to add me as a Reviewer or Maintainer (depending on how guilty you want
> me to feel if there is a backlog to review :) Will save me making sure to
> track these down as they get posted in different subsystems.
I'll put in R for now; you can upgrade yourself to M later if that is
appropriate. I'm hoping it will not be too many patches.
Thanks,
Jason
* Re: [PATCH v4 00/10] Introduce fwctl subystem
2025-02-13 2:30 ` Nelson, Shannon
@ 2025-02-13 18:02 ` Jason Gunthorpe
0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 18:02 UTC (permalink / raw)
To: Nelson, Shannon
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed
On Wed, Feb 12, 2025 at 06:30:38PM -0800, Nelson, Shannon wrote:
> We've been running successfully with an earlier version of the code, but
> haven't set up our full test environment with this version yet. Since there
> doesn't seem to be much change here, you are welcome to my Tested-by as
> well.
>
> For the first 6 patches:
> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
> Tested-by: Shannon Nelson <shannon.nelson@amd.com>
Got it, thanks
Jason
* Re: [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device
2025-02-13 12:42 ` Przemek Kitszel
@ 2025-02-13 18:52 ` Jason Gunthorpe
0 siblings, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 18:52 UTC (permalink / raw)
To: Przemek Kitszel
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
On Thu, Feb 13, 2025 at 01:42:44PM +0100, Przemek Kitszel wrote:
> > +/* On stack memory for the ioctl structs */
> > +union ucmd_buffer {
>
> for most names you follow the usual prefixing rules, would be good
> to do for all
Done
Thanks,
Jason
* Re: [PATCH v4 07/10] fwctl/mlx5: Support for communicating with mlx5 fw
2025-02-13 13:19 ` Przemek Kitszel
2025-02-13 14:25 ` Leon Romanovsky
@ 2025-02-13 19:18 ` Jason Gunthorpe
1 sibling, 0 replies; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 19:18 UTC (permalink / raw)
To: Przemek Kitszel
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Jakub Kicinski,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Saeed Mahameed, Nelson, Shannon
On Thu, Feb 13, 2025 at 02:19:38PM +0100, Przemek Kitszel wrote:
> On 2/7/25 01:13, Jason Gunthorpe wrote:
> > From: Saeed Mahameed <saeedm@nvidia.com>
>
> In part this is a general feedback for the subsystem too.
>
> > +FWCTL MLX5 DRIVER
>
> I don't like this design.
> That way each and every real driver would need to make another one to
> just use fwctl.
It is not mandatory, drivers could call fwctl_register() directly from
a pci probe function. That is just very undesirable for a secondary
subsystem like fwctl for reasons Leon explained. We want loose
coupling controlled by modules and driver binding, not strong coupling
where if you load, say, mlx5_core, you get a million other modules
automatically pulled in as well. Users should have control over this.
> Or the intention is to have this little driver replaced by OOT one,
> but keep the real (say networking) driver as-is from intree?
The design of the FW interface would have to be really off to motivate
an OOT one. IMHO you are more likely to see an intree fwctl driver and
maybe an OOT netdev or something.
> > +++ b/drivers/fwctl/mlx5/main.c
> > @@ -0,0 +1,340 @@
> > +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> > +/*
> > + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
>
> -2025
Time flies, thanks
>
> > + */
> > +#include <linux/fwctl.h>
> > +#include <linux/auxiliary_bus.h>
> > +#include <linux/mlx5/device.h>
> > +#include <linux/mlx5/driver.h>
>
> this breaks abstraction (at least your headers are in nice place, but
> this is rather uncommon, typical solution is to have them backed inside
> the driver directory) - the two drivers will be tightly coupled
It is part of the auxdev methodology. These headers pre-exist for all
the other mlx5 family aux devices to use.
> > enum fwctl_device_type {
> > FWCTL_DEVICE_TYPE_ERROR = 0,
> > + FWCTL_DEVICE_TYPE_MLX5 = 1,
>
> is that for fwctl info to be able to properly report what device user
> has asked ioctl on?
Yes.
> Would be great to embed 32byte long cstring of
> DRIVER_NAME, to don't need each and every device to come to you and
> ask for inclusion,
I think we want people to have to ask though, don't we? We don't want
to make it easy to write OOT drivers.
> that would also resolve problem of conflicting IDs (my-driver-id
> prior-to and after upstreaming)
Yes it would, but I suggest people get their driver posted before they
start shipping it :)
Jonathan had suggested using a uuid IIRC for the same reason.
Jason
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-12 13:22 ` Leon Romanovsky
@ 2025-02-14 1:03 ` Saeed Mahameed
2025-02-17 12:49 ` Jiri Pirko
0 siblings, 1 reply; 67+ messages in thread
From: Saeed Mahameed @ 2025-02-14 1:03 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Nelson, Shannon, Jakub Kicinski, Jason Gunthorpe, Saeed Mahameed,
Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
Dave Jiang, David Ahern, Andy Gospodarek, Christoph Hellwig,
Itay Avraham, Jiri Pirko, Jonathan Cameron, Leonid Bloch,
linux-cxl, linux-rdma, netdev, Michael Chan
On 12 Feb 15:22, Leon Romanovsky wrote:
>On Tue, Feb 11, 2025 at 10:36:37AM -0800, Nelson, Shannon wrote:
>> On 2/10/2025 11:55 PM, Leon Romanovsky wrote:
>> >
>> > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote:
>> > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote:
>> > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
>> > > >
>> > > > > But if you agree the netdev doesn't need it seems like a fairly
>> > > > > straightforward way to unblock your progress.
>> > > >
>> > > > I'm trying to understand what you are suggesting here.
>> > > >
>> > > > We have many scenarios where mlx5_core spawns all kinds of different
>> > > > devices, including recovery cases where there is no networking at all
>> > > > and only fwctl. So we can't just discard the aux dev or mlx5_core
>> > > > triggered setup without breaking scenarios.
>> > > >
>> > > > However, you seem to be suggesting that netdev-only configurations (ie
>> > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the
>> > > > case? All else would remain the same. It is very ugly but I could see
>> > > > a technical path to do it, and would consider it if that brings peace.
>> > >
>> > > Yes, when RDMA driver is not loaded there should be no access to fwctl.
>> >
>> > There are users mentioned in cover letter, which need FWCTL without RDMA.
>> > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/
>> >
>> > I want to suggest something different. What about to move all XXX_core
>> > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated
>> > place?
>> >
>> > There is no technical need to have PCI/FW logic inside networking stack.
>> >
>> > Thanks
>>
>> Our pds_core device fits this description as well: it is not an ethernet PCI
>> device, but helps manage the FW/HW for Eth and other things that are
>> separate PCI functions. We ended up in the netdev arena because we first
>> went in as a support for vDPA VFs.
>>
>> Should these 'core' devices live in linux-pci land? Is it possible that
>> some 'core' things might be platform devices rather than PCI?
>
>IMHO, linux-pci was right place before FWCTL and auxbus arrived, but now
>these core drivers can be placed in drivers/fwctl instead. It will be natural
+1
The fwctl subsystem is perfect for shared modules that need to initialize
the pci device to a minimal state where fwctl uAPIs are enabled for debug
and bare metal device configs, while the aux subsystem can carry out the
spawning of other subsystems.
>place for them as they will be located near the UAPI which provides an access
>to them.
>
>All other components will be auxbus devices in their respective
>subsystems (eth, RDMA ...).
>
>Thanks
>
>>
>> sln
>>
>>
>
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-14 1:03 ` Saeed Mahameed
@ 2025-02-17 12:49 ` Jiri Pirko
2025-02-17 19:02 ` Leon Romanovsky
0 siblings, 1 reply; 67+ messages in thread
From: Jiri Pirko @ 2025-02-17 12:49 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Leon Romanovsky, Nelson, Shannon, Jakub Kicinski, Jason Gunthorpe,
Saeed Mahameed, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, linux-cxl, linux-rdma, netdev, Michael Chan
Fri, Feb 14, 2025 at 02:03:56AM +0100, saeed@kernel.org wrote:
>On 12 Feb 15:22, Leon Romanovsky wrote:
>> On Tue, Feb 11, 2025 at 10:36:37AM -0800, Nelson, Shannon wrote:
>> > On 2/10/2025 11:55 PM, Leon Romanovsky wrote:
>> > >
>> > > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote:
>> > > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote:
>> > > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
>> > > > >
>> > > > > > But if you agree the netdev doesn't need it seems like a fairly
>> > > > > > straightforward way to unblock your progress.
>> > > > >
>> > > > > I'm trying to understand what you are suggesting here.
>> > > > >
>> > > > > We have many scenarios where mlx5_core spawns all kinds of different
>> > > > > devices, including recovery cases where there is no networking at all
>> > > > > and only fwctl. So we can't just discard the aux dev or mlx5_core
>> > > > > triggered setup without breaking scenarios.
>> > > > >
>> > > > > However, you seem to be suggesting that netdev-only configurations (ie
>> > > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the
>> > > > > case? All else would remain the same. It is very ugly but I could see
>> > > > > a technical path to do it, and would consider it if that brings peace.
>> > > >
>> > > > Yes, when RDMA driver is not loaded there should be no access to fwctl.
>> > >
>> > > There are users mentioned in cover letter, which need FWCTL without RDMA.
>> > > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/
>> > >
>> > > I want to suggest something different. What about to move all XXX_core
>> > > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated
>> > > place?
>> > >
>> > > There is no technical need to have PCI/FW logic inside networking stack.
>> > >
>> > > Thanks
>> >
>> > Our pds_core device fits this description as well: it is not an ethernet PCI
>> > device, but helps manage the FW/HW for Eth and other things that are
>> > separate PCI functions. We ended up in the netdev arena because we first
>> > went in as a support for vDPA VFs.
>> >
>> > Should these 'core' devices live in linux-pci land? Is it possible that
>> > some 'core' things might be platform devices rather than PCI?
>>
>> IMHO, linux-pci was right place before FWCTL and auxbus arrived, but now
>> these core drivers can be placed in drivers/fwctl instead. It will be natural
>+1
>
>Fwctl subsystem is perfect for shared modules that need to initialize the
>pci device to a minimal state where fwctl uAPIs are enabled for debug and
>bare metal device configs while aux subsystem can carry out the
>spawning of other subsystems.
Wouldn't it be better to call it drivers/core/ and have corectl or
corefwctl ?
>
>> place for them as they will be located near the UAPI which provides an access
>> to them.
>>
>> All other components will be auxbus devices in their respective
>> subsystems (eth, RDMA ...).
>>
>> Thanks
>>
>> >
>> > sln
>> >
>> >
>>
>
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-17 12:49 ` Jiri Pirko
@ 2025-02-17 19:02 ` Leon Romanovsky
0 siblings, 0 replies; 67+ messages in thread
From: Leon Romanovsky @ 2025-02-17 19:02 UTC (permalink / raw)
To: Jiri Pirko
Cc: Saeed Mahameed, Nelson, Shannon, Jakub Kicinski, Jason Gunthorpe,
Saeed Mahameed, Andy Gospodarek, Aron Silverton, Dan Williams,
Daniel Vetter, Dave Jiang, David Ahern, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, linux-cxl, linux-rdma, netdev, Michael Chan
On Mon, Feb 17, 2025 at 01:49:12PM +0100, Jiri Pirko wrote:
> Fri, Feb 14, 2025 at 02:03:56AM +0100, saeed@kernel.org wrote:
> >On 12 Feb 15:22, Leon Romanovsky wrote:
> >> On Tue, Feb 11, 2025 at 10:36:37AM -0800, Nelson, Shannon wrote:
> >> > On 2/10/2025 11:55 PM, Leon Romanovsky wrote:
> >> > >
> >> > > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote:
> >> > > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote:
> >> > > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote:
> >> > > > >
> >> > > > > > But if you agree the netdev doesn't need it seems like a fairly
> >> > > > > > straightforward way to unblock your progress.
> >> > > > >
> >> > > > > I'm trying to understand what you are suggesting here.
> >> > > > >
> >> > > > > We have many scenarios where mlx5_core spawns all kinds of different
> >> > > > > devices, including recovery cases where there is no networking at all
> >> > > > > and only fwctl. So we can't just discard the aux dev or mlx5_core
> >> > > > > triggered setup without breaking scenarios.
> >> > > > >
> >> > > > > However, you seem to be suggesting that netdev-only configurations (ie
> >> > > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the
> >> > > > > case? All else would remain the same. It is very ugly but I could see
> >> > > > > a technical path to do it, and would consider it if that brings peace.
> >> > > >
> >> > > > Yes, when RDMA driver is not loaded there should be no access to fwctl.
> >> > >
> >> > > There are users mentioned in cover letter, which need FWCTL without RDMA.
> >> > > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/
> >> > >
> >> > > I want to suggest something different. What about to move all XXX_core
> >> > > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated
> >> > > place?
> >> > >
> >> > > There is no technical need to have PCI/FW logic inside networking stack.
> >> > >
> >> > > Thanks
> >> >
> >> > Our pds_core device fits this description as well: it is not an ethernet PCI
> >> > device, but helps manage the FW/HW for Eth and other things that are
> >> > separate PCI functions. We ended up in the netdev arena because we first
> >> > went in as a support for vDPA VFs.
> >> >
> >> > Should these 'core' devices live in linux-pci land? Is it possible that
> >> > some 'core' things might be platform devices rather than PCI?
> >>
> >> IMHO, linux-pci was right place before FWCTL and auxbus arrived, but now
> >> these core drivers can be placed in drivers/fwctl instead. It will be natural
> >+1
> >
> >Fwctl subsystem is perfect for shared modules that need to initialize the
> >pci device to a minimal state where fwctl uAPIs are enabled for debug and
> >bare metal device configs while aux subsystem can carry out the
> >spawning of other subsystems.
>
> Wouldn't it be better to call it drivers/core/ and have corectl or
> corefwctl ?
Before names, let's first agree that this is the right thing to do.
I'm fine with any proposed name.
Thanks
>
> >
> >> place for them as they will be located near the UAPI which provides an access
> >> to them.
> >>
> >> All other components will be auxbus devices in their respective
> >> subsystems (eth, RDMA ...).
> >>
> >> Thanks
> >>
> >> >
> >> > sln
> >> >
> >> >
> >>
> >
>
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-11 16:24 ` David Ahern
@ 2025-02-18 20:05 ` Jason Gunthorpe
2025-02-18 21:42 ` David Ahern
0 siblings, 1 reply; 67+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 20:05 UTC (permalink / raw)
To: David Ahern
Cc: Jakub Kicinski, Saeed Mahameed, Andy Gospodarek, Aron Silverton,
Dan Williams, Daniel Vetter, Dave Jiang, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On Tue, Feb 11, 2025 at 09:24:35AM -0700, David Ahern wrote:
> "Any resources in use by the netdev stack can only be created and
> modified by established netdev tools."
That is already a restriction described in the doc, not just netdev,
but any kernel driver running with any kernel owned resource. You
can't reach in and change kernel owned objects.
Jason
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-18 20:05 ` Jason Gunthorpe
@ 2025-02-18 21:42 ` David Ahern
2025-02-18 23:31 ` Jakub Kicinski
0 siblings, 1 reply; 67+ messages in thread
From: David Ahern @ 2025-02-18 21:42 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jakub Kicinski, Saeed Mahameed, Andy Gospodarek, Aron Silverton,
Dan Williams, Daniel Vetter, Dave Jiang, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On 2/18/25 1:05 PM, Jason Gunthorpe wrote:
> On Tue, Feb 11, 2025 at 09:24:35AM -0700, David Ahern wrote:
>
>> "Any resources in use by the netdev stack can only be created and
>> modified by established netdev tools."
>
> That is already a restriction described in the doc, not just netdev,
> but any kernel driver running with any kernel owned resource. You
> can't reach in and change kernel owned objects.
>
ok, then Jakub's concerns should be met.
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-18 21:42 ` David Ahern
@ 2025-02-18 23:31 ` Jakub Kicinski
2025-02-24 22:34 ` Saeed Mahameed
0 siblings, 1 reply; 67+ messages in thread
From: Jakub Kicinski @ 2025-02-18 23:31 UTC (permalink / raw)
To: David Ahern
Cc: Jason Gunthorpe, Saeed Mahameed, Andy Gospodarek, Aron Silverton,
Dan Williams, Daniel Vetter, Dave Jiang, Andy Gospodarek,
Christoph Hellwig, Itay Avraham, Jiri Pirko, Jonathan Cameron,
Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma, netdev,
Nelson, Shannon, Michael Chan
On Tue, 18 Feb 2025 14:42:48 -0700 David Ahern wrote:
> On 2/18/25 1:05 PM, Jason Gunthorpe wrote:
> > On Tue, Feb 11, 2025 at 09:24:35AM -0700, David Ahern wrote:
> >
> >> "Any resources in use by the netdev stack can only be created and
> >> modified by established netdev tools."
> >
> > That is already a restriction described in the doc, not just netdev,
> > but any kernel driver running with any kernel owned resource. You
> > can't reach in and change kernel owned objects.
>
> ok, then Jakub's concerns should be met.
I appreciate the doc, but no, it's not enough. The fwctl interface must
not be exposed if RDMA is disabled or the driver is not loaded.
* Re: [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt
2025-02-18 23:31 ` Jakub Kicinski
@ 2025-02-24 22:34 ` Saeed Mahameed
0 siblings, 0 replies; 67+ messages in thread
From: Saeed Mahameed @ 2025-02-24 22:34 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Ahern, Jason Gunthorpe, Saeed Mahameed, Andy Gospodarek,
Aron Silverton, Dan Williams, Daniel Vetter, Dave Jiang,
Andy Gospodarek, Christoph Hellwig, Itay Avraham, Jiri Pirko,
Jonathan Cameron, Leonid Bloch, Leon Romanovsky, linux-cxl,
linux-rdma, netdev, Nelson, Shannon, Michael Chan
On 18 Feb 15:31, Jakub Kicinski wrote:
>On Tue, 18 Feb 2025 14:42:48 -0700 David Ahern wrote:
>> On 2/18/25 1:05 PM, Jason Gunthorpe wrote:
>> > On Tue, Feb 11, 2025 at 09:24:35AM -0700, David Ahern wrote:
>> >
>> >> "Any resources in use by the netdev stack can only be created and
>> >> modified by established netdev tools."
>> >
>> > That is already a restriction described in the doc, not just netdev,
>> > but any kernel driver running with any kernel owned resource. You
>> > can't reach in and change kernel owned objects.
>>
>> ok, then Jakub's concerns should be met.
>
>I appreciate the doc, but no, it's not enough. The fwctl interface must
>not be exposed if RDMA is disabled or driver not loaded.
>
Jason's proposal was completely different: he asked whether we could
explicitly block fwctl if only netdev is present. Tying fwctl to RDMA makes
no sense for most of the drivers that will be using it, so I don't support
such a suggestion, not even blocking fwctl for netdev-only systems. If one
can load RDMA and still control the device, then Jakub's concerns are
not met, so what's the point?
The whole idea of blocking fwctl in specific configurations has no
technical merit. If someone doesn't want fwctl in their system, then let's
implement a devlink knob like we have for all ULPs.
Thread overview: 67+ messages
2025-02-07 0:13 [PATCH v4 00/10] Introduce fwctl subystem Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2025-02-07 23:32 ` Dan Williams
2025-02-07 23:55 ` Jason Gunthorpe
2025-02-08 0:08 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
2025-02-07 12:59 ` Jonathan Cameron
2025-02-07 13:52 ` Jason Gunthorpe
2025-02-08 0:16 ` Dave Jiang
2025-02-10 15:24 ` Jason Gunthorpe
2025-02-13 12:42 ` Przemek Kitszel
2025-02-13 18:52 ` Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 03/10] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
2025-02-07 13:06 ` Jonathan Cameron
2025-02-07 14:23 ` Jason Gunthorpe
2025-02-08 0:21 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 04/10] taint: Add TAINT_FWCTL Jason Gunthorpe
2025-02-07 13:09 ` Jonathan Cameron
2025-02-08 0:24 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
2025-02-08 0:28 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 06/10] fwctl: Add documentation Jason Gunthorpe
2025-02-07 14:42 ` Jonathan Cameron
2025-02-10 15:17 ` Jason Gunthorpe
2025-02-08 0:40 ` Dave Jiang
2025-02-07 0:13 ` [PATCH v4 07/10] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
2025-02-13 13:19 ` Przemek Kitszel
2025-02-13 14:25 ` Leon Romanovsky
2025-02-13 19:18 ` Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 08/10] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 09/10] fwctl/bnxt: Support communicating with bnxt fw Jason Gunthorpe
2025-02-07 14:59 ` Jonathan Cameron
2025-02-07 15:03 ` Jason Gunthorpe
2025-02-07 0:13 ` [PATCH v4 10/10] bnxt: Create an auxiliary device for fwctl_bnxt Jason Gunthorpe
2025-02-07 0:44 ` Jakub Kicinski
2025-02-07 3:17 ` Andy Gospodarek
2025-02-07 12:46 ` Jason Gunthorpe
2025-02-07 15:36 ` Jakub Kicinski
2025-02-07 20:25 ` Saeed Mahameed
2025-02-07 21:51 ` Jakub Kicinski
2025-02-08 1:10 ` Saeed Mahameed
2025-02-08 1:16 ` Jason Gunthorpe
2025-02-08 3:24 ` Andy Gospodarek
2025-02-11 1:04 ` Jakub Kicinski
2025-02-11 7:55 ` Leon Romanovsky
2025-02-11 14:27 ` Andy Gospodarek
2025-02-12 14:20 ` Leon Romanovsky
2025-02-11 18:36 ` Nelson, Shannon
2025-02-12 13:22 ` Leon Romanovsky
2025-02-14 1:03 ` Saeed Mahameed
2025-02-17 12:49 ` Jiri Pirko
2025-02-17 19:02 ` Leon Romanovsky
2025-02-11 16:24 ` David Ahern
2025-02-18 20:05 ` Jason Gunthorpe
2025-02-18 21:42 ` David Ahern
2025-02-18 23:31 ` Jakub Kicinski
2025-02-24 22:34 ` Saeed Mahameed
2025-02-07 23:29 ` Andy Gospodarek
2025-02-08 0:08 ` Jakub Kicinski
2025-02-07 21:41 ` [PATCH v4 00/10] Introduce fwctl subystem Dan Williams
2025-02-07 21:58 ` Dave Jiang
2025-02-11 9:33 ` Jonathan Cameron
2025-02-13 17:55 ` Jason Gunthorpe
2025-02-13 17:52 ` Jason Gunthorpe
2025-02-12 22:21 ` Zhu Yanjun
2025-02-13 2:30 ` Nelson, Shannon
2025-02-13 18:02 ` Jason Gunthorpe