* [PATCH 0/8] Introduce fwctl subsystem
@ 2024-06-03 15:53 Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
` (9 more replies)
0 siblings, 10 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 15:53 UTC (permalink / raw)
To: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
fwctl is a new subsystem intended to bring some common rules and order to
the growing pattern of exposing a secure FW interface directly to
userspace. Unlike existing subsystems such as RDMA/DRM/VFIO/uacce, which
expose a device for datapath operations, fwctl is focused on debugging,
configuration and provisioning of the device. It will not have the
features, such as interrupt delivery, needed to support a datapath.
This concept is similar to the long standing practice in the "HW" RAID
space of having a device specific misc device to manage the RAID
controller FW. fwctl generalizes this notion of a companion debug and
management interface that goes along with a dataplane implemented in an
appropriate subsystem.
The need for this has reached a critical point as many users are moving to
run lockdown enabled kernels. Several existing devices have had long
standing tooling for management that relied on /sys/../resource0 or PCI
config space access which is not permitted in lockdown. A major point of
fwctl is to define and document the rules that a device must follow to
expose a lockdown compatible RPC.
Based on some discussion, fwctl splits the RPCs into four categories:
FWCTL_RPC_CONFIGURATION
FWCTL_RPC_DEBUG_READ_ONLY
FWCTL_RPC_DEBUG_WRITE
FWCTL_RPC_DEBUG_WRITE_FULL
The latter two trigger a new TAINT_FWCTL, and the final one also requires
CAP_SYS_RAWIO, which excludes it from lockdown. The device driver and its
FW are responsible for restricting RPCs to the requested security scope,
while the core code handles the tainting and CAP checks.
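As a rough illustration of the split described above (not code from this
series), the core-side policy can be modelled as a small lookup. The scope
names come from this letter; the numeric values, struct scope_policy and
scope_policy_for() are hypothetical names made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Scope names are from the cover letter; the numeric values are assumed. */
enum fwctl_rpc_scope {
	FWCTL_RPC_CONFIGURATION = 0,
	FWCTL_RPC_DEBUG_READ_ONLY = 1,
	FWCTL_RPC_DEBUG_WRITE = 2,
	FWCTL_RPC_DEBUG_WRITE_FULL = 3,
};

/* Hypothetical summary of what the core code enforces for a scope. */
struct scope_policy {
	bool valid;               /* scope is one of the four known values */
	bool taints_kernel;       /* would set TAINT_FWCTL */
	bool needs_cap_sys_rawio; /* would require CAP_SYS_RAWIO */
};

static struct scope_policy scope_policy_for(enum fwctl_rpc_scope scope)
{
	struct scope_policy p = { .valid = true };

	switch (scope) {
	case FWCTL_RPC_CONFIGURATION:
	case FWCTL_RPC_DEBUG_READ_ONLY:
		/* No taint and no extra capability. */
		break;
	case FWCTL_RPC_DEBUG_WRITE:
		p.taints_kernel = true;
		break;
	case FWCTL_RPC_DEBUG_WRITE_FULL:
		p.taints_kernel = true;
		p.needs_cap_sys_rawio = true;
		break;
	default:
		p.valid = false;
		break;
	}
	return p;
}
```

The driver and FW would still have to reject any RPC that exceeds the
requested scope; the sketch only covers the tainting and capability side.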
For details see the final patch which introduces the documentation.
This series incorporates a version of the mlx5ctl interface previously
proposed:
https://lore.kernel.org/r/20240207072435.14182-1-saeed@kernel.org/
For this series the memory registration mechanism was removed, but I
expect it will come back.
This series comes with mlx5 as a driver implementation, and I have soft
commitments for at least three more drivers.
There have been two LWN articles written discussing various aspects of
this proposal:
https://lwn.net/Articles/955001/
https://lwn.net/Articles/969383/
Several have expressed general support for this concept:
Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org/
Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org/
NVIDIA Networking
Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
Work is ongoing for a robust multi-device open source userspace. Currently
the mlx5ctl_user that was posted by Saeed has been updated to use fwctl.
https://github.com/saeedtx/mlx5ctl.git
https://github.com/jgunthorpe/mlx5ctl.git
This is on github: https://github.com/jgunthorpe/linux/commits/fwctl
Jason Gunthorpe (6):
fwctl: Add basic structure for a class subsystem with a cdev
fwctl: Basic ioctl dispatch for the character device
fwctl: FWCTL_INFO to return basic information about the device
taint: Add TAINT_FWCTL
fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
fwctl: Add documentation
Saeed Mahameed (2):
fwctl/mlx5: Support for communicating with mlx5 fw
mlx5: Create an auxiliary device for fwctl_mlx5
Documentation/admin-guide/tainted-kernels.rst | 5 +
Documentation/userspace-api/fwctl.rst | 269 ++++++++++++
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 16 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/fwctl/Kconfig | 23 +
drivers/fwctl/Makefile | 5 +
drivers/fwctl/main.c | 411 ++++++++++++++++++
drivers/fwctl/mlx5/Makefile | 4 +
drivers/fwctl/mlx5/main.c | 333 ++++++++++++++
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 8 +
include/linux/fwctl.h | 112 +++++
include/linux/panic.h | 3 +-
include/uapi/fwctl/fwctl.h | 137 ++++++
include/uapi/fwctl/mlx5.h | 36 ++
kernel/panic.c | 1 +
18 files changed, 1367 insertions(+), 1 deletion(-)
create mode 100644 Documentation/userspace-api/fwctl.rst
create mode 100644 drivers/fwctl/Kconfig
create mode 100644 drivers/fwctl/Makefile
create mode 100644 drivers/fwctl/main.c
create mode 100644 drivers/fwctl/mlx5/Makefile
create mode 100644 drivers/fwctl/mlx5/main.c
create mode 100644 include/linux/fwctl.h
create mode 100644 include/uapi/fwctl/fwctl.h
create mode 100644 include/uapi/fwctl/mlx5.h
base-commit: c3f38fa61af77b49866b006939479069cd451173
--
2.45.2
* [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subsystem Jason Gunthorpe
@ 2024-06-03 15:53 ` Jason Gunthorpe
2024-06-04 9:32 ` Leon Romanovsky
2024-06-04 16:42 ` Randy Dunlap
2024-06-03 15:53 ` [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
` (8 subsequent siblings)
9 siblings, 2 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 15:53 UTC (permalink / raw)
Create the class, character device and functions for a fwctl driver to
un/register to the subsystem.
A typical fwctl driver has a sysfs presence like:
$ ls -l /dev/fwctl/fwctl0
crw------- 1 root root 250, 0 Apr 25 19:16 /dev/fwctl/fwctl0
$ ls /sys/class/fwctl/fwctl0
dev device power subsystem uevent
$ ls /sys/class/fwctl/fwctl0/device/infiniband/
ibp0s10f0
$ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
fwctl0/
$ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
dev device power subsystem uevent
This allows userspace to link all the multi-subsystem driver components
together and learn the subsystem specific names for the device's
components.
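As a small userspace sketch of the linkage shown above, a tool could build
the sysfs directory that lists a fwctl device's siblings in another
subsystem. The helper name fwctl_sibling_dir() is hypothetical; the path
layout is only what the listing in this message shows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/class/fwctl/<fwctl_name>/device/<subsystem>" into buf.
 * Returns 0 on success, -1 if the result does not fit in len bytes.
 * (Hypothetical helper, not part of this series.)
 */
static int fwctl_sibling_dir(char *buf, size_t len, const char *fwctl_name,
			     const char *subsystem)
{
	int n = snprintf(buf, len, "/sys/class/fwctl/%s/device/%s",
			 fwctl_name, subsystem);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Listing the resulting directory (for example with opendir()) would then
yield subsystem specific names such as ibp0s10f0 in the example above.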
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
MAINTAINERS | 8 ++
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/fwctl/Kconfig | 9 +++
drivers/fwctl/Makefile | 4 +
drivers/fwctl/main.c | 174 +++++++++++++++++++++++++++++++++++++++++
include/linux/fwctl.h | 68 ++++++++++++++++
7 files changed, 266 insertions(+)
create mode 100644 drivers/fwctl/Kconfig
create mode 100644 drivers/fwctl/Makefile
create mode 100644 drivers/fwctl/main.c
create mode 100644 include/linux/fwctl.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 8754ac2c259dc9..833b853808421e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9077,6 +9077,14 @@ F: kernel/futex/*
F: tools/perf/bench/futex*
F: tools/testing/selftests/futex/
+FWCTL SUBSYSTEM
+M: Jason Gunthorpe <jgg@nvidia.com>
+M: Saeed Mahameed <saeedm@nvidia.com>
+S: Maintained
+F: Documentation/userspace-api/fwctl.rst
+F: drivers/fwctl/
+F: include/linux/fwctl.h
+
GALAXYCORE GC0308 CAMERA SENSOR DRIVER
M: Sebastian Reichel <sre@kernel.org>
L: linux-media@vger.kernel.org
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 7bdad836fc6207..7c556c5ac4fddc 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -21,6 +21,8 @@ source "drivers/connector/Kconfig"
source "drivers/firmware/Kconfig"
+source "drivers/fwctl/Kconfig"
+
source "drivers/gnss/Kconfig"
source "drivers/mtd/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index fe9ceb0d2288ad..f6a241b747b29c 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -133,6 +133,7 @@ obj-$(CONFIG_MEMSTICK) += memstick/
obj-y += leds/
obj-$(CONFIG_INFINIBAND) += infiniband/
obj-y += firmware/
+obj-$(CONFIG_FWCTL) += fwctl/
obj-$(CONFIG_CRYPTO) += crypto/
obj-$(CONFIG_SUPERH) += sh/
obj-y += clocksource/
diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
new file mode 100644
index 00000000000000..6ceee3347ae642
--- /dev/null
+++ b/drivers/fwctl/Kconfig
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menuconfig FWCTL
+ tristate "fwctl device firmware access framework"
+ help
+ fwctl provides a userspace API for restricted access to communicate
+ with on-device firmware. The communication channel is intended to
+ support a wide range of lockdown compatible device behaviors including
+ manipulating device FLASH, debugging, and other activities that don't
+ fit neatly into an existing subsystem.
diff --git a/drivers/fwctl/Makefile b/drivers/fwctl/Makefile
new file mode 100644
index 00000000000000..1cad210f6ba580
--- /dev/null
+++ b/drivers/fwctl/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FWCTL) += fwctl.o
+
+fwctl-y += main.o
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
new file mode 100644
index 00000000000000..ff9b7bad5a2b0d
--- /dev/null
+++ b/drivers/fwctl/main.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#define pr_fmt(fmt) "fwctl: " fmt
+#include <linux/fwctl.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/container_of.h>
+#include <linux/fs.h>
+
+enum {
+ FWCTL_MAX_DEVICES = 256,
+};
+static dev_t fwctl_dev;
+static DEFINE_IDA(fwctl_ida);
+
+static int fwctl_fops_open(struct inode *inode, struct file *filp)
+{
+ struct fwctl_device *fwctl =
+ container_of(inode->i_cdev, struct fwctl_device, cdev);
+
+ get_device(&fwctl->dev);
+ filp->private_data = fwctl;
+ return 0;
+}
+
+static int fwctl_fops_release(struct inode *inode, struct file *filp)
+{
+ struct fwctl_device *fwctl = filp->private_data;
+
+ fwctl_put(fwctl);
+ return 0;
+}
+
+static const struct file_operations fwctl_fops = {
+ .owner = THIS_MODULE,
+ .open = fwctl_fops_open,
+ .release = fwctl_fops_release,
+};
+
+static void fwctl_device_release(struct device *device)
+{
+ struct fwctl_device *fwctl =
+ container_of(device, struct fwctl_device, dev);
+
+ if (fwctl->dev.devt)
+ ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
+ kfree(fwctl);
+}
+
+static char *fwctl_devnode(const struct device *dev, umode_t *mode)
+{
+ return kasprintf(GFP_KERNEL, "fwctl/%s", dev_name(dev));
+}
+
+static struct class fwctl_class = {
+ .name = "fwctl",
+ .dev_release = fwctl_device_release,
+ .devnode = fwctl_devnode,
+};
+
+static struct fwctl_device *
+_alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
+{
+ struct fwctl_device *fwctl __free(kfree) = kzalloc(size, GFP_KERNEL);
+
+ if (!fwctl)
+ return NULL;
+ fwctl->dev.class = &fwctl_class;
+ fwctl->dev.parent = parent;
+ device_initialize(&fwctl->dev);
+ return_ptr(fwctl);
+}
+
+/* Drivers use the fwctl_alloc_device() wrapper */
+struct fwctl_device *_fwctl_alloc_device(struct device *parent,
+ const struct fwctl_ops *ops,
+ size_t size)
+{
+ struct fwctl_device *fwctl __free(fwctl) =
+ _alloc_device(parent, ops, size);
+ int devnum;
+
+ devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
+ if (devnum < 0)
+ return NULL;
+ fwctl->dev.devt = fwctl_dev + devnum;
+
+ cdev_init(&fwctl->cdev, &fwctl_fops);
+ fwctl->cdev.owner = THIS_MODULE;
+
+ if (dev_set_name(&fwctl->dev, "fwctl%d", fwctl->dev.devt - fwctl_dev))
+ return NULL;
+
+ fwctl->ops = ops;
+ return_ptr(fwctl);
+}
+EXPORT_SYMBOL_NS_GPL(_fwctl_alloc_device, FWCTL);
+
+/**
+ * fwctl_register - Register a new device to the subsystem
+ * @fwctl: Previously allocated fwctl_device
+ *
+ * On return the device is visible through sysfs and /dev, driver ops may be
+ * called.
+ */
+int fwctl_register(struct fwctl_device *fwctl)
+{
+ int ret;
+
+ ret = cdev_device_add(&fwctl->cdev, &fwctl->dev);
+ if (ret)
+ return ret;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(fwctl_register, FWCTL);
+
+/**
+ * fwctl_unregister - Unregister a device from the subsystem
+ * @fwctl: Previously allocated and registered fwctl_device
+ *
+ * Undoes fwctl_register(). On return no driver ops will be called. The
+ * caller must still call fwctl_put() to free the fwctl.
+ *
+ * Unregister will return even if userspace still has file descriptors open.
+ * This will call ops->close_uctx() on any open FDs and after return no driver
+ * op will be called. The FDs remain open but all fops will return -ENODEV.
+ *
+ * The design of fwctl allows this sort of disassociation of the driver from the
+ * subsystem primarily by keeping memory allocations owned by the core subsystem.
+ * The fwctl_device and fwctl_uctx can both be freed without requiring a driver
+ * callback. This allows the module to remain unlocked while FDs are open.
+ */
+void fwctl_unregister(struct fwctl_device *fwctl)
+{
+ cdev_device_del(&fwctl->cdev, &fwctl->dev);
+
+ /*
+ * The driver module may unload after this returns, the op pointer will
+ * not be valid.
+ */
+ fwctl->ops = NULL;
+}
+EXPORT_SYMBOL_NS_GPL(fwctl_unregister, FWCTL);
+
+static int __init fwctl_init(void)
+{
+ int ret;
+
+ ret = alloc_chrdev_region(&fwctl_dev, 0, FWCTL_MAX_DEVICES, "fwctl");
+ if (ret)
+ return ret;
+
+ ret = class_register(&fwctl_class);
+ if (ret)
+ goto err_chrdev;
+ return 0;
+
+err_chrdev:
+ unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
+ return ret;
+}
+
+static void __exit fwctl_exit(void)
+{
+ class_unregister(&fwctl_class);
+ unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
+}
+
+module_init(fwctl_init);
+module_exit(fwctl_exit);
+MODULE_DESCRIPTION("fwctl device firmware access framework");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
new file mode 100644
index 00000000000000..ef4eaa87c945e4
--- /dev/null
+++ b/include/linux/fwctl.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#ifndef __LINUX_FWCTL_H
+#define __LINUX_FWCTL_H
+#include <linux/device.h>
+#include <linux/cdev.h>
+#include <linux/cleanup.h>
+
+struct fwctl_device;
+struct fwctl_uctx;
+
+struct fwctl_ops {
+};
+
+/**
+ * struct fwctl_device - Per-driver registration struct
+ * @dev: The sysfs (class/fwctl/fwctlXX) device
+ *
+ * Each driver instance will have one of these structs with the driver
+ * private data following immediately after. This struct is refcounted,
+ * it is freed by calling fwctl_put().
+ */
+struct fwctl_device {
+ struct device dev;
+ /* private: */
+ struct cdev cdev;
+ const struct fwctl_ops *ops;
+};
+
+struct fwctl_device *_fwctl_alloc_device(struct device *parent,
+ const struct fwctl_ops *ops,
+ size_t size);
+/**
+ * fwctl_alloc_device - Allocate a fwctl
+ * @parent: Physical device that provides the FW interface
+ * @ops: Driver ops to register
+ * @drv_struct: 'struct driver_fwctl' that holds the struct fwctl_device
+ * @member: Name of the struct fwctl_device in @drv_struct
+ *
+ * This allocates and initializes the fwctl_device embedded in the drv_struct.
+ * Upon success the pointer must be freed via fwctl_put(). Returns a
+ * 'drv_struct *' on success, NULL on error.
+ */
+#define fwctl_alloc_device(parent, ops, drv_struct, member) \
+ container_of(_fwctl_alloc_device( \
+ parent, ops, \
+ sizeof(drv_struct) + \
+ BUILD_BUG_ON_ZERO( \
+ offsetof(drv_struct, member))), \
+ drv_struct, member)
+
+static inline struct fwctl_device *fwctl_get(struct fwctl_device *fwctl)
+{
+ get_device(&fwctl->dev);
+ return fwctl;
+}
+static inline void fwctl_put(struct fwctl_device *fwctl)
+{
+ put_device(&fwctl->dev);
+}
+DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
+
+int fwctl_register(struct fwctl_device *fwctl);
+void fwctl_unregister(struct fwctl_device *fwctl);
+
+#endif
--
2.45.2
* [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subsystem Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
@ 2024-06-03 15:53 ` Jason Gunthorpe
2024-06-04 12:16 ` Zhu Yanjun
2024-06-05 15:42 ` Przemek Kitszel
2024-06-03 15:53 ` [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
` (7 subsequent siblings)
9 siblings, 2 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 15:53 UTC (permalink / raw)
Each file descriptor gets a chunk of per-FD driver specific context to
which the driver can attach a device specific struct. The core code
takes care of the memory lifetime for this structure.
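A minimal userspace model of that layout, assuming the driver embeds the
core struct as the first member of its own context: struct demo_uctx,
core_alloc_uctx(), demo_open_uctx() and demo_container_of() are
hypothetical stand-ins, and the size check is an extra guard added for
the sketch rather than something this patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the core's per-FD context. */
struct fwctl_uctx {
	void *fwctl; /* stand-in for struct fwctl_device * */
};

/* Hypothetical driver context; the core struct must come first. */
struct demo_uctx {
	struct fwctl_uctx uctx;
	unsigned int fw_session_id; /* driver private data */
};

_Static_assert(offsetof(struct demo_uctx, uctx) == 0,
	       "core context must be the first member");

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Models the core at open(): allocate ops->uctx_size zeroed bytes. */
static struct fwctl_uctx *core_alloc_uctx(size_t uctx_size)
{
	if (uctx_size < sizeof(struct fwctl_uctx))
		return NULL; /* extra guard, an assumption of this sketch */
	return calloc(1, uctx_size);
}

/* Models a driver's open_uctx(): reach the private part of the context. */
static int demo_open_uctx(struct fwctl_uctx *uctx)
{
	struct demo_uctx *d = demo_container_of(uctx, struct demo_uctx, uctx);

	d->fw_session_id = 42;
	return 0;
}
```

Because the core owns the allocation, it can free the context without a
driver callback once the driver has been unregistered.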
The ioctl dispatch and design is based on what was built for iommufd. The
ioctls have a struct which has a combined in/out behavior with a typical
'zero pad' scheme for future extension and backwards compatibility.
Like iommufd, some shared logic does most of the ioctl marshalling and
compatibility work, and a table dispatches to a function pointer for
each unique ioctl.
This approach has proven to work quite well in the iommufd and rdma
subsystems.
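The 'zero pad' scheme can be sketched in plain C. This is a userspace
model of the semantics the kernel's copy_struct_from_user() provides,
written only for illustration; model_copy_struct() is a hypothetical name
and operates on ordinary memory rather than user pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a size-prefixed struct from a "user" buffer of usize bytes into a
 * "kernel" struct of ksize bytes.
 */
static int model_copy_struct(void *dst, size_t ksize, const void *src,
			     size_t usize)
{
	size_t copy = usize < ksize ? usize : ksize;
	const unsigned char *s = src;
	size_t i;

	/* Newer userspace, older kernel: the unknown tail must be zero. */
	for (i = ksize; i < usize; i++)
		if (s[i])
			return -E2BIG;

	memcpy(dst, src, copy);
	/* Older userspace, newer kernel: missing fields read as zero. */
	memset((unsigned char *)dst + copy, 0, ksize - copy);
	return 0;
}
```

A shorter struct from old userspace is zero extended, a longer struct from
new userspace is accepted only when the extra bytes are zero, and the
dispatch code separately rejects anything shorter than the ioctl's
minimum size.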
Allocate an ioctl number space for the subsystem.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 1 +
drivers/fwctl/main.c | 124 +++++++++++++++++-
include/linux/fwctl.h | 31 +++++
include/uapi/fwctl/fwctl.h | 41 ++++++
5 files changed, 196 insertions(+), 2 deletions(-)
create mode 100644 include/uapi/fwctl/fwctl.h
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index a141e8e65c5d3a..4d91c5a20b98c8 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -324,6 +324,7 @@ Code Seq# Include File Comments
0x97 00-7F fs/ceph/ioctl.h Ceph file system
0x99 00-0F 537-Addinboard driver
<mailto:buk@buks.ipn.de>
+0x9A 00-0F include/uapi/fwctl/fwctl.h
0xA0 all linux/sdp/sdp.h Industrial Device Project
<mailto:kenji@bitgate.com>
0xA1 0 linux/vtpm_proxy.h TPM Emulator Proxy Driver
diff --git a/MAINTAINERS b/MAINTAINERS
index 833b853808421e..94062161e9c4d7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9084,6 +9084,7 @@ S: Maintained
F: Documentation/userspace-api/fwctl.rst
F: drivers/fwctl/
F: include/linux/fwctl.h
+F: include/uapi/fwctl/
GALAXYCORE GC0308 CAMERA SENSOR DRIVER
M: Sebastian Reichel <sre@kernel.org>
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index ff9b7bad5a2b0d..7ecdabdd9dcb1e 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -9,26 +9,131 @@
#include <linux/container_of.h>
#include <linux/fs.h>
+#include <uapi/fwctl/fwctl.h>
+
enum {
FWCTL_MAX_DEVICES = 256,
};
static dev_t fwctl_dev;
static DEFINE_IDA(fwctl_ida);
+struct fwctl_ucmd {
+ struct fwctl_uctx *uctx;
+ void __user *ubuffer;
+ void *cmd;
+ u32 user_size;
+};
+
+/* On stack memory for the ioctl structs */
+union ucmd_buffer {
+};
+
+struct fwctl_ioctl_op {
+ unsigned int size;
+ unsigned int min_size;
+ unsigned int ioctl_num;
+ int (*execute)(struct fwctl_ucmd *ucmd);
+};
+
+#define IOCTL_OP(_ioctl, _fn, _struct, _last) \
+ [_IOC_NR(_ioctl) - FWCTL_CMD_BASE] = { \
+ .size = sizeof(_struct) + \
+ BUILD_BUG_ON_ZERO(sizeof(union ucmd_buffer) < \
+ sizeof(_struct)), \
+ .min_size = offsetofend(_struct, _last), \
+ .ioctl_num = _ioctl, \
+ .execute = _fn, \
+ }
+static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
+};
+
+static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
+ unsigned long arg)
+{
+ struct fwctl_uctx *uctx = filp->private_data;
+ const struct fwctl_ioctl_op *op;
+ struct fwctl_ucmd ucmd = {};
+ union ucmd_buffer buf;
+ unsigned int nr;
+ int ret;
+
+ nr = _IOC_NR(cmd);
+ if ((nr - FWCTL_CMD_BASE) >= ARRAY_SIZE(fwctl_ioctl_ops))
+ return -ENOIOCTLCMD;
+ op = &fwctl_ioctl_ops[nr - FWCTL_CMD_BASE];
+ if (op->ioctl_num != cmd)
+ return -ENOIOCTLCMD;
+
+ ucmd.uctx = uctx;
+ ucmd.cmd = &buf;
+ ucmd.ubuffer = (void __user *)arg;
+ ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
+ if (ret)
+ return ret;
+
+ if (ucmd.user_size < op->min_size)
+ return -EINVAL;
+
+ ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
+ ucmd.user_size);
+ if (ret)
+ return ret;
+
+ guard(rwsem_read)(&uctx->fwctl->registration_lock);
+ if (!uctx->fwctl->ops)
+ return -ENODEV;
+ return op->execute(&ucmd);
+}
+
static int fwctl_fops_open(struct inode *inode, struct file *filp)
{
struct fwctl_device *fwctl =
container_of(inode->i_cdev, struct fwctl_device, cdev);
+ struct fwctl_uctx *uctx __free(kfree) = NULL;
+ int ret;
+
+ guard(rwsem_read)(&fwctl->registration_lock);
+ if (!fwctl->ops)
+ return -ENODEV;
+
+ uctx = kzalloc(fwctl->ops->uctx_size, GFP_KERNEL_ACCOUNT);
+ if (!uctx)
+ return -ENOMEM;
+
+ uctx->fwctl = fwctl;
+ ret = fwctl->ops->open_uctx(uctx);
+ if (ret)
+ return ret;
+
+ scoped_guard(mutex, &fwctl->uctx_list_lock) {
+ list_add_tail(&uctx->uctx_list_entry, &fwctl->uctx_list);
+ }
get_device(&fwctl->dev);
- filp->private_data = fwctl;
+ filp->private_data = no_free_ptr(uctx);
return 0;
}
+static void fwctl_destroy_uctx(struct fwctl_uctx *uctx)
+{
+ lockdep_assert_held(&uctx->fwctl->uctx_list_lock);
+ list_del(&uctx->uctx_list_entry);
+ uctx->fwctl->ops->close_uctx(uctx);
+}
+
static int fwctl_fops_release(struct inode *inode, struct file *filp)
{
- struct fwctl_device *fwctl = filp->private_data;
+ struct fwctl_uctx *uctx = filp->private_data;
+ struct fwctl_device *fwctl = uctx->fwctl;
+ scoped_guard(rwsem_read, &fwctl->registration_lock) {
+ if (fwctl->ops) {
+ guard(mutex)(&fwctl->uctx_list_lock);
+ fwctl_destroy_uctx(uctx);
+ }
+ }
+
+ kfree(uctx);
fwctl_put(fwctl);
return 0;
}
@@ -37,6 +142,7 @@ static const struct file_operations fwctl_fops = {
.owner = THIS_MODULE,
.open = fwctl_fops_open,
.release = fwctl_fops_release,
+ .unlocked_ioctl = fwctl_fops_ioctl,
};
static void fwctl_device_release(struct device *device)
@@ -46,6 +152,7 @@ static void fwctl_device_release(struct device *device)
if (fwctl->dev.devt)
ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
+ mutex_destroy(&fwctl->uctx_list_lock);
kfree(fwctl);
}
@@ -69,6 +176,9 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
return NULL;
fwctl->dev.class = &fwctl_class;
fwctl->dev.parent = parent;
+ init_rwsem(&fwctl->registration_lock);
+ mutex_init(&fwctl->uctx_list_lock);
+ INIT_LIST_HEAD(&fwctl->uctx_list);
device_initialize(&fwctl->dev);
return_ptr(fwctl);
}
@@ -134,8 +244,18 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, FWCTL);
*/
void fwctl_unregister(struct fwctl_device *fwctl)
{
+ struct fwctl_uctx *uctx;
+
cdev_device_del(&fwctl->cdev, &fwctl->dev);
+ /* Disable and free the driver's resources for any still open FDs. */
+ guard(rwsem_write)(&fwctl->registration_lock);
+ guard(mutex)(&fwctl->uctx_list_lock);
+ while ((uctx = list_first_entry_or_null(&fwctl->uctx_list,
+ struct fwctl_uctx,
+ uctx_list_entry)))
+ fwctl_destroy_uctx(uctx);
+
/*
* The driver module may unload after this returns, the op pointer will
* not be valid.
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
index ef4eaa87c945e4..1d9651de92fc19 100644
--- a/include/linux/fwctl.h
+++ b/include/linux/fwctl.h
@@ -11,7 +11,20 @@
struct fwctl_device;
struct fwctl_uctx;
+/**
+ * struct fwctl_ops - Driver provided operations
+ * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
+ * bytes of this memory will be a fwctl_uctx. The driver can use the
+ * remaining bytes as its private memory.
+ * @open_uctx: Called when a file descriptor is opened before the uctx is ever
+ * used.
+ * @close_uctx: Called when the uctx is destroyed, usually when the FD is
+ * closed.
+ */
struct fwctl_ops {
+ size_t uctx_size;
+ int (*open_uctx)(struct fwctl_uctx *uctx);
+ void (*close_uctx)(struct fwctl_uctx *uctx);
};
/**
@@ -26,6 +39,10 @@ struct fwctl_device {
struct device dev;
/* private: */
struct cdev cdev;
+
+ struct rw_semaphore registration_lock;
+ struct mutex uctx_list_lock;
+ struct list_head uctx_list;
const struct fwctl_ops *ops;
};
@@ -65,4 +82,18 @@ DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
int fwctl_register(struct fwctl_device *fwctl);
void fwctl_unregister(struct fwctl_device *fwctl);
+/**
+ * struct fwctl_uctx - Per user FD context
+ * @fwctl: fwctl instance that owns the context
+ *
+ * Every FD opened by userspace will get a unique context allocation. Any driver
+ * private data will follow immediately after.
+ */
+struct fwctl_uctx {
+ struct fwctl_device *fwctl;
+ /* private: */
+ /* Head at fwctl_device::uctx_list */
+ struct list_head uctx_list_entry;
+};
+
#endif
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
new file mode 100644
index 00000000000000..0bdce95b6d69d9
--- /dev/null
+++ b/include/uapi/fwctl/fwctl.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
+ */
+#ifndef _UAPI_FWCTL_H
+#define _UAPI_FWCTL_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define FWCTL_TYPE 0x9A
+
+/**
+ * DOC: General ioctl format
+ *
+ * The ioctl interface follows a general format to allow for extensibility. Each
+ * ioctl is passed in a structure pointer as the argument providing the size of
+ * the structure in the first u32. The kernel checks that any structure space
+ * beyond what it understands is 0. This allows userspace to use the backward
+ * compatible portion while consistently using the newer, larger, structures.
+ *
+ * ioctls use a standard meaning for common errnos:
+ *
+ * - ENOTTY: The IOCTL number itself is not supported at all
+ * - E2BIG: The IOCTL number is supported, but the provided structure has
+ * non-zero in a part the kernel does not understand.
+ * - EOPNOTSUPP: The IOCTL number is supported, and the structure is
+ * understood, however a known field has a value the kernel does not
+ * understand or support.
+ * - EINVAL: Everything about the IOCTL was understood, but a field is not
+ * correct.
+ * - ENOMEM: Out of memory.
+ * - ENODEV: The underlying device has been hot-unplugged and the FD is
+ * orphaned.
+ *
+ * As well as additional errnos, within specific ioctls.
+ */
+enum {
+ FWCTL_CMD_BASE = 0,
+};
+
+#endif
--
2.45.2
* [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subsystem Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
@ 2024-06-03 15:53 ` Jason Gunthorpe
2024-06-13 23:32 ` Dave Jiang
2024-06-03 15:53 ` [PATCH 4/8] taint: Add TAINT_FWCTL Jason Gunthorpe
` (6 subsequent siblings)
9 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 15:53 UTC (permalink / raw)
Userspace will need to know some details about the fwctl interface being
used to locate the correct userspace code to communicate with the
kernel. Provide a simple device_type enum indicating what the kernel
driver is.
Allow the device to provide a device specific info struct that contains
any additional information that the driver may need to provide to
userspace.
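From userspace, a query might look roughly like the sketch below. The
struct and ioctl number are mirrored locally from the uAPI header added
in this patch, since that header is not assumed to be installed;
fwctl_query_info() is a hypothetical helper name, and the device path is
supplied by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local mirror of the uAPI in this patch, for illustration only. */
#define FWCTL_TYPE 0x9A
#define FWCTL_CMD_INFO 0
#define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)

struct fwctl_info {
	uint32_t size;
	uint32_t flags;
	uint32_t out_device_type;
	uint32_t device_data_len;
	uint64_t out_device_data __attribute__((aligned(8)));
};

/*
 * Open the fwctl character device at path and issue FWCTL_INFO.
 * data/data_len describe an optional buffer for the driver specific info.
 * Returns 0 on success or a negative errno value.
 */
static int fwctl_query_info(const char *path, struct fwctl_info *info,
			    void *data, uint32_t data_len)
{
	int ret = 0;
	int fd;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;

	memset(info, 0, sizeof(*info));
	info->size = sizeof(*info);      /* size goes in the first u32 */
	info->device_data_len = data_len;
	info->out_device_data = (uint64_t)(uintptr_t)data;

	if (ioctl(fd, FWCTL_INFO, info) < 0)
		ret = -errno;
	close(fd);
	return ret;
}
```

On success out_device_type identifies the kernel driver, which a tool
would use to decide how to interpret the device data. With the kernel
code in this patch, passing a device_data_len of 0 skips the driver's
info callback and reports a device_data_len of 0 on output.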
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/fwctl/main.c | 54 ++++++++++++++++++++++++++++++++++++++
include/linux/fwctl.h | 8 ++++++
include/uapi/fwctl/fwctl.h | 29 ++++++++++++++++++++
3 files changed, 91 insertions(+)
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index 7ecdabdd9dcb1e..10e3f504893892 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -17,6 +17,8 @@ enum {
static dev_t fwctl_dev;
static DEFINE_IDA(fwctl_ida);
+DEFINE_FREE(kfree_errptr, void *, if (!IS_ERR_OR_NULL(_T)) kfree(_T));
+
struct fwctl_ucmd {
struct fwctl_uctx *uctx;
void __user *ubuffer;
@@ -24,8 +26,59 @@ struct fwctl_ucmd {
u32 user_size;
};
+static int ucmd_respond(struct fwctl_ucmd *ucmd, size_t cmd_len)
+{
+ if (copy_to_user(ucmd->ubuffer, ucmd->cmd,
+ min_t(size_t, ucmd->user_size, cmd_len)))
+ return -EFAULT;
+ return 0;
+}
+
+static int copy_to_user_zero_pad(void __user *to, const void *from,
+ size_t from_len, size_t user_len)
+{
+ size_t copy_len;
+
+ copy_len = min(from_len, user_len);
+ if (copy_to_user(to, from, copy_len))
+ return -EFAULT;
+ if (copy_len < user_len) {
+ if (clear_user(to + copy_len, user_len - copy_len))
+ return -EFAULT;
+ }
+ return 0;
+}
+
+static int fwctl_cmd_info(struct fwctl_ucmd *ucmd)
+{
+ struct fwctl_device *fwctl = ucmd->uctx->fwctl;
+ struct fwctl_info *cmd = ucmd->cmd;
+ size_t driver_info_len = 0;
+
+ if (cmd->flags)
+ return -EOPNOTSUPP;
+
+ if (cmd->device_data_len) {
+ void *driver_info __free(kfree_errptr) = NULL;
+
+ driver_info = fwctl->ops->info(ucmd->uctx, &driver_info_len);
+ if (IS_ERR(driver_info))
+ return PTR_ERR(driver_info);
+
+ if (copy_to_user_zero_pad(u64_to_user_ptr(cmd->out_device_data),
+ driver_info, driver_info_len,
+ cmd->device_data_len))
+ return -EFAULT;
+ }
+
+ cmd->out_device_type = fwctl->ops->device_type;
+ cmd->device_data_len = driver_info_len;
+ return ucmd_respond(ucmd, sizeof(*cmd));
+}
+
/* On stack memory for the ioctl structs */
union ucmd_buffer {
+ struct fwctl_info info;
};
struct fwctl_ioctl_op {
@@ -45,6 +98,7 @@ struct fwctl_ioctl_op {
.execute = _fn, \
}
static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
+ IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
};
static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
index 1d9651de92fc19..9a906b861acf3a 100644
--- a/include/linux/fwctl.h
+++ b/include/linux/fwctl.h
@@ -7,12 +7,14 @@
#include <linux/device.h>
#include <linux/cdev.h>
#include <linux/cleanup.h>
+#include <uapi/fwctl/fwctl.h>
struct fwctl_device;
struct fwctl_uctx;
/**
* struct fwctl_ops - Driver provided operations
+ * @device_type: The driver's assigned device_type number. This is uABI.
* @uctx_size: The size of the fwctl_uctx struct to allocate. The first
* bytes of this memory will be a fwctl_uctx. The driver can use the
* remaining bytes as its private memory.
@@ -20,11 +22,17 @@ struct fwctl_uctx;
* used.
* @close_uctx: Called when the uctx is destroyed, usually when the FD is
* closed.
+ * @info: Implement FWCTL_INFO. Return kmalloc() memory that is copied to
+ * out_device_data. On input, length indicates the size of the user buffer;
+ * on output it indicates the size of the memory. The driver can ignore
+ * length on input; the core code will handle everything.
*/
struct fwctl_ops {
+ enum fwctl_device_type device_type;
size_t uctx_size;
int (*open_uctx)(struct fwctl_uctx *uctx);
void (*close_uctx)(struct fwctl_uctx *uctx);
+ void *(*info)(struct fwctl_uctx *uctx, size_t *length);
};
/**
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index 0bdce95b6d69d9..39db9f09f8068e 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -36,6 +36,35 @@
*/
enum {
FWCTL_CMD_BASE = 0,
+ FWCTL_CMD_INFO = 0,
+ FWCTL_CMD_RPC = 1,
};
+enum fwctl_device_type {
+ FWCTL_DEVICE_TYPE_ERROR = 0,
+};
+
+/**
+ * struct fwctl_info - ioctl(FWCTL_INFO)
+ * @size: sizeof(struct fwctl_info)
+ * @flags: Must be 0
+ * @out_device_type: Returns the type of the device from enum fwctl_device_type
+ * @device_data_len: On input the length of the out_device_data memory. On
+ * output the size of the kernel's device_data which may be larger or
+ * smaller than the input. May be 0 on input.
+ * @out_device_data: Pointer to a memory of device_data_len bytes. Kernel will
+ * fill the entire memory, zeroing as required.
+ *
+ * Returns basic information about this fwctl instance, particularly what driver
+ * is being used to define the device_data format.
+ */
+struct fwctl_info {
+ __u32 size;
+ __u32 flags;
+ __u32 out_device_type;
+ __u32 device_data_len;
+ __aligned_u64 out_device_data;
+};
+#define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)
+
#endif
--
2.45.2
* [PATCH 4/8] taint: Add TAINT_FWCTL
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
` (2 preceding siblings ...)
2024-06-03 15:53 ` [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
@ 2024-06-03 15:53 ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 5/8] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
` (5 subsequent siblings)
9 siblings, 0 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 15:53 UTC (permalink / raw)
To: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
Requesting a fwctl scope of access that includes mutating device debug
data will cause the kernel to be tainted. Changing the device operation
through things in the debug scope may cause the device to malfunction in
undefined ways. This should be reflected in the TAINT flags to help any
debuggers understand that something has been done.
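
For anyone who wants to check for this taint from user space, here is a
minimal sketch. It assumes the taint mask is read as a decimal number from
/proc/sys/kernel/tainted and that the new bit is 19, as defined by this
patch. The helper names taint_mask_has_fwctl() and read_taint_mask() are
hypothetical and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit number added by this patch (TAINT_FWCTL in include/linux/panic.h). */
#define TAINT_FWCTL_BIT 19

/* Return true if the fwctl taint bit is set in a taint mask. */
bool taint_mask_has_fwctl(unsigned long mask)
{
	return (mask >> TAINT_FWCTL_BIT) & 1UL;
}

/*
 * Read a decimal taint mask from a file such as /proc/sys/kernel/tainted.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f;
	int ok;

	assert(path && mask);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%lu", mask) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A tool would call read_taint_mask("/proc/sys/kernel/tainted", &mask) and
then taint_mask_has_fwctl(mask). The value 524288 from the table in
tainted-kernels.rst corresponds to this bit alone.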
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
Documentation/admin-guide/tainted-kernels.rst | 5 +++++
include/linux/panic.h | 3 ++-
kernel/panic.c | 1 +
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index f92551539e8a66..f91a54966a9719 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -101,6 +101,7 @@ Bit Log Number Reason that got the kernel tainted
16 _/X 65536 auxiliary taint, defined for and used by distros
17 _/T 131072 kernel was built with the struct randomization plugin
18 _/N 262144 an in-kernel test has been run
+ 19 _/J 524288 userspace used a mutating debug operation in fwctl
=== === ====== ========================================================
Note: The character ``_`` is representing a blank in this table to make reading
@@ -182,3 +183,7 @@ More detailed explanation for tainting
produce extremely unusual kernel structure layouts (even performance
pathological ones), which is important to know when debugging. Set at
build time.
+
+ 19) ``J`` if userspace opened /dev/fwctl/* and performed a FWCTL_RPC_DEBUG_WRITE
+ to use the device's debugging features. Device debugging features could
+ cause the device to malfunction in undefined ways.
diff --git a/include/linux/panic.h b/include/linux/panic.h
index 6717b15e798c38..5dfd5295effd40 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -73,7 +73,8 @@ static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
#define TAINT_AUX 16
#define TAINT_RANDSTRUCT 17
#define TAINT_TEST 18
-#define TAINT_FLAGS_COUNT 19
+#define TAINT_FWCTL 19
+#define TAINT_FLAGS_COUNT 20
#define TAINT_FLAGS_MAX ((1UL << TAINT_FLAGS_COUNT) - 1)
struct taint_flag {
diff --git a/kernel/panic.c b/kernel/panic.c
index 8bff183d6180e7..b71f573ec7c5fc 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -494,6 +494,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
[ TAINT_AUX ] = { 'X', ' ', true },
[ TAINT_RANDSTRUCT ] = { 'T', ' ', true },
[ TAINT_TEST ] = { 'N', ' ', true },
+ [ TAINT_FWCTL ] = { 'J', ' ', true },
};
/**
--
2.45.2
* [PATCH 5/8] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
` (3 preceding siblings ...)
2024-06-03 15:53 ` [PATCH 4/8] taint: Add TAINT_FWCTL Jason Gunthorpe
@ 2024-06-03 15:53 ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 6/8] fwctl: Add documentation Jason Gunthorpe
` (4 subsequent siblings)
9 siblings, 0 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 15:53 UTC (permalink / raw)
To: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
Add the FWCTL_RPC ioctl which allows a request/response RPC call to device
firmware. Drivers implementing this call must follow the security
guidelines under Documentation/userspace-api/fwctl.rst.
The core code provides some memory management helpers to get the messages
copied from and back to userspace. The driver is responsible for
allocating the output message memory and delivering the message to the
device.
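
The checks and the response copy in fwctl_cmd_rpc() can be modelled in
plain user space C, which may help when reasoning about the uAPI. This is
only a sketch under stated assumptions: model_rpc_gate(),
model_rpc_copy_out(), the RPC_* names and MODEL_MAX_RPC_LEN are
hypothetical stand-ins for the kernel code in the diff below, and the
has_rawio flag stands in for capable(CAP_SYS_RAWIO).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the values of enum fwctl_rpc_scope from the patch. */
enum rpc_scope {
	RPC_CONFIGURATION = 0,
	RPC_DEBUG_READ_ONLY = 1,
	RPC_DEBUG_WRITE = 2,
	RPC_DEBUG_WRITE_FULL = 3,
};

#define MODEL_MAX_RPC_LEN (2u * 1024 * 1024) /* SZ_2M in the patch */

/*
 * Model of the checks at the top of fwctl_cmd_rpc(). Returns 0 or a
 * negative errno, and sets *taint when the scope would taint the kernel.
 */
int model_rpc_gate(unsigned int scope, bool has_rawio, size_t in_len,
		   size_t out_len, bool *taint)
{
	assert(taint);
	*taint = false;
	if (in_len > MODEL_MAX_RPC_LEN || out_len > MODEL_MAX_RPC_LEN)
		return -EMSGSIZE;

	switch (scope) {
	case RPC_CONFIGURATION:
	case RPC_DEBUG_READ_ONLY:
		return 0;
	case RPC_DEBUG_WRITE_FULL:
		if (!has_rawio)
			return -EPERM;
		/* fall through */
	case RPC_DEBUG_WRITE:
		*taint = true;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}

/*
 * Model of the response copy: at most user_len bytes are copied, and the
 * full firmware response length is returned so the caller can detect
 * truncation by comparing it with the size of its own buffer.
 */
size_t model_rpc_copy_out(void *user_buf, size_t user_len,
			  const void *fw_buf, size_t fw_len)
{
	assert(user_buf && fw_buf);
	memcpy(user_buf, fw_buf, user_len < fw_len ? user_len : fw_len);
	return fw_len;
}
```

One consequence of the copy semantics is that a caller who passes a short
out buffer still learns the real response size from out_len and can retry
with a larger buffer.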
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/fwctl/main.c | 63 ++++++++++++++++++++++++++++++++++++
include/linux/fwctl.h | 5 +++
include/uapi/fwctl/fwctl.h | 66 ++++++++++++++++++++++++++++++++++++++
3 files changed, 134 insertions(+)
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index 10e3f504893892..a9d4b764832bb8 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -8,16 +8,20 @@
#include <linux/slab.h>
#include <linux/container_of.h>
#include <linux/fs.h>
+#include <linux/sizes.h>
#include <uapi/fwctl/fwctl.h>
enum {
FWCTL_MAX_DEVICES = 256,
+ MAX_RPC_LEN = SZ_2M,
};
static dev_t fwctl_dev;
static DEFINE_IDA(fwctl_ida);
+static unsigned long fwctl_tainted;
DEFINE_FREE(kfree_errptr, void *, if (!IS_ERR_OR_NULL(_T)) kfree(_T));
+DEFINE_FREE(kvfree_errptr, void *, if (!IS_ERR_OR_NULL(_T)) kvfree(_T));
struct fwctl_ucmd {
struct fwctl_uctx *uctx;
@@ -76,9 +80,67 @@ static int fwctl_cmd_info(struct fwctl_ucmd *ucmd)
return ucmd_respond(ucmd, sizeof(*cmd));
}
+static int fwctl_cmd_rpc(struct fwctl_ucmd *ucmd)
+{
+ struct fwctl_device *fwctl = ucmd->uctx->fwctl;
+ struct fwctl_rpc *cmd = ucmd->cmd;
+ void *outbuf __free(kvfree_errptr) = NULL;
+ void *inbuf __free(kvfree) = NULL;
+ size_t out_len;
+
+ if (cmd->in_len > MAX_RPC_LEN || cmd->out_len > MAX_RPC_LEN)
+ return -EMSGSIZE;
+
+ switch (cmd->scope) {
+ case FWCTL_RPC_CONFIGURATION:
+ case FWCTL_RPC_DEBUG_READ_ONLY:
+ break;
+
+ case FWCTL_RPC_DEBUG_WRITE_FULL:
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+ fallthrough;
+ case FWCTL_RPC_DEBUG_WRITE:
+ if (!test_and_set_bit(0, &fwctl_tainted)) {
+ dev_warn(
+ &fwctl->dev,
+ "%s(%d): has requested full access to the physical device\n",
+ current->comm, task_pid_nr(current));
+ add_taint(TAINT_FWCTL, LOCKDEP_STILL_OK);
+ }
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ inbuf = kvzalloc(cmd->in_len, GFP_KERNEL_ACCOUNT);
+ if (!inbuf)
+ return -ENOMEM;
+ if (copy_from_user(inbuf, u64_to_user_ptr(cmd->in), cmd->in_len))
+ return -EFAULT;
+
+ out_len = cmd->out_len;
+ outbuf = fwctl->ops->fw_rpc(ucmd->uctx, cmd->scope, inbuf, cmd->in_len,
+ &out_len);
+ if (IS_ERR(outbuf))
+ return PTR_ERR(outbuf);
+ if (outbuf == inbuf) {
+ /* The driver can re-use inbuf as outbuf */
+ inbuf = NULL;
+ }
+
+ if (copy_to_user(u64_to_user_ptr(cmd->out), outbuf,
+ min(cmd->out_len, out_len)))
+ return -EFAULT;
+
+ cmd->out_len = out_len;
+ return ucmd_respond(ucmd, sizeof(*cmd));
+}
+
/* On stack memory for the ioctl structs */
union ucmd_buffer {
struct fwctl_info info;
+ struct fwctl_rpc rpc;
};
struct fwctl_ioctl_op {
@@ -99,6 +161,7 @@ struct fwctl_ioctl_op {
}
static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
+ IOCTL_OP(FWCTL_RPC, fwctl_cmd_rpc, struct fwctl_rpc, out),
};
static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
index 9a906b861acf3a..294cfbf63306a2 100644
--- a/include/linux/fwctl.h
+++ b/include/linux/fwctl.h
@@ -26,6 +26,9 @@ struct fwctl_uctx;
* out_device_data. On input length indicates the size of the user buffer
* on output it indicates the size of the memory. The driver can ignore
* length on input, the core code will handle everything.
+ * @fw_rpc: Implement FWCTL_RPC. Deliver rpc_in/in_len to the FW and return
+ * the response and set out_len. rpc_in can be returned as the response
+ * pointer. Otherwise the returned pointer is freed with kvfree().
*/
struct fwctl_ops {
enum fwctl_device_type device_type;
@@ -33,6 +36,8 @@ struct fwctl_ops {
int (*open_uctx)(struct fwctl_uctx *uctx);
void (*close_uctx)(struct fwctl_uctx *uctx);
void *(*info)(struct fwctl_uctx *uctx, size_t *length);
+ void *(*fw_rpc)(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
+ void *rpc_in, size_t in_len, size_t *out_len);
};
/**
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index 39db9f09f8068e..8bde0d4416fd55 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -67,4 +67,70 @@ struct fwctl_info {
};
#define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)
+/**
+ * enum fwctl_rpc_scope - Scope of access for the RPC
+ */
+enum fwctl_rpc_scope {
+ /**
+ * @FWCTL_RPC_CONFIGURATION: Device configuration access scope
+ *
+ * Read/write access to device configuration. When configuration
+ * is written, the device remains in a fully supported state.
+ */
+ FWCTL_RPC_CONFIGURATION = 0,
+ /**
+ * @FWCTL_RPC_DEBUG_READ_ONLY: Read only access to debug information
+ *
+ * Readable debug information. Debug information is compatible with
+ * kernel lockdown, and does not disclose any sensitive information. For
+ * instance exposing any encryption secrets from this information is
+ * forbidden.
+ */
+ FWCTL_RPC_DEBUG_READ_ONLY = 1,
+ /**
+ * @FWCTL_RPC_DEBUG_WRITE: Writable access to lockdown compatible debug information
+ *
+ * Allows write access to data in the device which may leave a fully
+ * supported state. This is intended to permit intensive and possibly
+ * invasive debugging. This scope will taint the kernel.
+ */
+ FWCTL_RPC_DEBUG_WRITE = 2,
+ /**
+ * @FWCTL_RPC_DEBUG_WRITE_FULL: Writable access to all debug information
+ *
+ * Allows read/write access to everything. Requires CAP_SYS_RAWIO, so
+ * it is not required to follow lockdown principles. If in doubt
+ * debugging should be placed in this scope. This scope will taint the
+ * kernel.
+ */
+ FWCTL_RPC_DEBUG_WRITE_FULL = 3,
+};
+
+/**
+ * struct fwctl_rpc - ioctl(FWCTL_RPC)
+ * @size: sizeof(struct fwctl_rpc)
+ * @scope: One of enum fwctl_rpc_scope, required scope for the RPC
+ * @in_len: Length of the in memory
+ * @out_len: Length of the out memory
+ * @in: Request message in device specific format
+ * @out: Response message in device specific format
+ *
+ * Deliver a Remote Procedure Call to the device FW and return the response. The
+ * call's parameters and return are marshaled into linear buffers of memory. Any
+ * errno indicates that delivery of the RPC to the device failed. Return status
+ * originating in the device during a successful delivery must be encoded into
+ * out.
+ *
+ * The format of the buffers matches the out_device_type from FWCTL_INFO.
+ */
+struct fwctl_rpc {
+ __u32 size;
+ __u32 scope;
+ __u32 in_len;
+ __u32 out_len;
+ __aligned_u64 in;
+ __aligned_u64 out;
+};
+#define FWCTL_RPC _IO(FWCTL_TYPE, FWCTL_CMD_RPC)
+
#endif
--
2.45.2
* [PATCH 6/8] fwctl: Add documentation
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
` (4 preceding siblings ...)
2024-06-03 15:53 ` [PATCH 5/8] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
@ 2024-06-03 15:53 ` Jason Gunthorpe
2024-06-05 2:31 ` Randy Dunlap
2024-06-03 15:53 ` [PATCH 7/8] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
` (3 subsequent siblings)
9 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 15:53 UTC (permalink / raw)
To: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
Document the purpose and rules for the fwctl subsystem.
Link in kdocs to the doc tree.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
Documentation/userspace-api/fwctl.rst | 269 ++++++++++++++++++++++++++
Documentation/userspace-api/index.rst | 1 +
2 files changed, 270 insertions(+)
create mode 100644 Documentation/userspace-api/fwctl.rst
diff --git a/Documentation/userspace-api/fwctl.rst b/Documentation/userspace-api/fwctl.rst
new file mode 100644
index 00000000000000..630e75a91838f0
--- /dev/null
+++ b/Documentation/userspace-api/fwctl.rst
@@ -0,0 +1,269 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+fwctl subsystem
+===============
+
+:Author: Jason Gunthorpe
+
+Overview
+========
+
+Modern devices contain extensive amounts of FW, and in many cases, are largely
+software defined pieces of hardware. The evolution of this approach is largely a
+reaction to Moore's Law where a chip tape out is now highly expensive, and the
+chip design is extremely large. Replacing fixed HW logic with a flexible and
+tightly coupled FW/HW combination is an effective risk mitigation against chip
+respin. Problems in the HW design can be counteracted in device FW. This is
+especially true for devices which present a stable and backwards compatible
+interface to the operating system driver (such as NVMe).
+
+The FW layer in devices has grown to incredible sizes and devices frequently
+integrate clusters of fast processors to run it. For example, mlx5 devices have
+over 30MB of FW code, and big configurations operate with over 1GB of FW managed
+runtime state.
+
+The availability of such a flexible layer has created quite a variety in the
+industry where single pieces of silicon are now configurable software defined
+devices and can operate in substantially different ways depending on the need.
+Further we often see cases where specific sites wish to operate devices in ways
+that are highly specialized and require applications that have been tailored to
+their unique configuration.
+
+Further, devices have become multi-functional and integrated to the point they
+no longer fit neatly into the kernel's division of subsystems. Modern
+multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
+subsystems while sharing the underlying hardware using the auxiliary device
+system.
+
+All together this creates a challenge for the operating system, where devices
+have an expansive FW environment that needs robust device-specific debugging
+support, and FW driven functionality that is not well suited to “generic”
+interfaces. fwctl seeks to allow access to the full device functionality from
+user space in the areas of debuggability, management, and first-boot/nth-boot
+provisioning.
+
+fwctl is aimed at the common device design pattern where the OS and FW
+communicate via an RPC message layer constructed with a queue or mailbox scheme.
+In this case the driver will typically have some layer to deliver RPC messages
+and collect RPC responses from device FW. The in-kernel subsystem drivers that
+operate the device for its primary purposes will use these RPCs to build their
+drivers, but devices also usually have a set of ancillary RPCs that don't really
+fit into any specific subsystem. For example, a HW RAID controller is primarily
+operated by the block layer but also comes with a set of RPCs to administer the
+construction of drives within the HW RAID.
+
+In the past, when devices were more single-function, individual subsystems would
+grow different approaches to solving some of these common problems. For instance,
+monitoring device health, manipulating its FLASH, debugging the FW, and
+provisioning all have various unique interfaces across the kernel.
+
+fwctl's purpose is to define a common set of limited rules, described below,
+that allow user space to securely construct and execute RPCs inside device FW.
+The rules serve as an agreement between the operating system and FW on how to
+correctly design the RPC interface. As a uAPI the subsystem provides a thin
+layer of discovery and a generic uAPI to deliver the RPCs and collect the
+response. It supports a system of user space libraries and tools which will
+use this interface to control the device using the device native protocols.
+
+Scope of Action
+---------------
+
+fwctl drivers are strictly restricted to being a way to operate the device FW.
+It is not an avenue to access random kernel internals, or other operating system
+SW states.
+
+fwctl instances must operate on a well-defined device function, and the device
+should have a well-defined security model for what scope within the physical
+device the function is permitted to access. For instance, the most complex PCIe
+device today may broadly have several function level scopes:
+
+ 1. A privileged function with full access to the on-device global state and
+ configuration
+
+ 2. Multiple hypervisor functions, each with control over itself and its child
+ functions used with VMs
+
+ 3. Multiple VM functions tightly scoped within the VM
+
+The device may create a logical parent/child relationship between these scopes,
+for instance a child VM's FW may be within the scope of the hypervisor FW. It is
+quite common in the VFIO world that the hypervisor environment has a complex
+provisioning/profiling/configuration responsibility for the function VFIO
+assigns to the VM.
+
+Further, within the function, devices often have RPC commands that fall within
+some general scopes of action:
+
+ 1. Access to function & child configuration, flash, etc that becomes live at a
+ function reset.
+
+ 2. Access to function & child runtime configuration that kernel drivers can
+ discover at runtime.
+
+ 3. Read only access to function debug information that may report on FW objects
+ in the function & child, including FW objects owned by other kernel
+ subsystems.
+
+ 4. Write access to function & child debug information strictly compatible with
+ the principles of kernel lockdown and kernel integrity protection. Triggers
+ a kernel Taint.
+
+ 5. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
+
+Userspace will provide a scope label on each RPC and the kernel must enforce the
+capabilities and taints listed above based on that scope. A combination of kernel
+and FW can enforce that RPCs are placed in the correct scope by userspace.
+
+Denied behavior
+---------------
+
+There are many things this interface must not allow user space to do (without a
+Taint or CAP), broadly derived from the principles of kernel lockdown. Some
+examples:
+
+ 1. DMA to/from arbitrary memory, hang the system, run code in the device, or
+ otherwise compromise device or system security and integrity.
+
+ 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
+ objects owned by kernel drivers.
+
+ 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
+ driver can react to the device configuration at function reset/driver load
+ time, but otherwise should not be coupled to fwctl.
+
+ 4. Operate the HW in a way that overlaps with the core purpose of another
+ primary kernel subsystem, such as read/write to LBAs, send/receive of
+ network packets, or operate an accelerator's data plane.
+
+fwctl is not a replacement for device direct access subsystems like uacce or
+VFIO.
+
+fwctl User API
+==============
+
+.. kernel-doc:: include/uapi/fwctl/fwctl.h
+.. kernel-doc:: include/uapi/fwctl/mlx5.h
+
+sysfs Class
+-----------
+
+fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
+(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
+operates the ioctl uAPI described above.
+
+fwctl devices can be related to driver components in other subsystems through
+sysfs::
+
+ $ ls /sys/class/fwctl/fwctl0/device/infiniband/
+ ibp0s10f0
+
+ $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
+ fwctl0/
+
+ $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
+ dev device power subsystem uevent
+
+User space Community
+--------------------
+
+Drawing inspiration from nvme-cli, participating in the kernel side must come
+with a user space in a common TBD git tree, at a minimum to usefully operate the
+kernel driver. Providing such an implementation is a pre-condition to merging a
+kernel driver.
+
+The goal is to build user space community around some of the shared problems
+we all have, and ideally develop some common user space programs with some
+starting themes of:
+
+ - Device in-field debugging
+
+ - HW provisioning
+
+ - VFIO child device profiling before VM boot
+
+ - Confidential Compute topics (attestation, secure provisioning)
+
+These themes stretch across all subsystems in the kernel. fwupd is a great example of
+how an excellent user space experience can emerge out of kernel-side diversity.
+
+fwctl Kernel API
+================
+
+.. kernel-doc:: drivers/fwctl/main.c
+ :export:
+.. kernel-doc:: include/linux/fwctl.h
+
+fwctl Driver design
+-------------------
+
+In many cases a fwctl driver is going to be part of a larger cross-subsystem
+device possibly using the auxiliary_device mechanism. In that case several
+subsystems are going to be sharing the same device and FW interface layer so the
+device design must already provide for isolation and co-operation between kernel
+subsystems. fwctl should fit into that same model.
+
+Part of the driver should include a description of how its scope restrictions
+and security model work. The driver and FW together must ensure that RPCs
+provided by user space are mapped to the appropriate scope. If the validation is
+done in the driver then the validation can read a 'command effects' report from
+the device, or hardwire the enforcement. If the validation is done in the FW,
+then the driver should pass the fwctl_rpc_scope to the FW along with the command.
+
+The driver and FW must co-operate to ensure that either fwctl cannot allocate
+any FW resources, or any resources it does allocate are freed on FD closure. A
+driver primarily constructed around FW RPCs may find that its core PCI function
+and RPC layer belongs under fwctl with auxiliary devices connecting to other
+subsystems.
+
+Each device type must represent a stable FW ABI, such that the userspace
+components have the same general stability we expect from the kernel. FW upgrade
+should not break the userspace tools.
+
+Security Response
+=================
+
+The kernel remains the gatekeeper for this interface. If violations of the
+scopes, security or isolation principles are found, we have options to let
+devices fix them with a FW update, push a kernel patch to parse and block RPC
+commands or push a kernel patch to block entire firmware versions, or devices.
+
+While the kernel can always directly parse and restrict RPCs, it is expected
+that the existing kernel pattern of allowing drivers to delegate validation to
+FW will be a useful design.
+
+Existing Similar Examples
+=========================
+
+The approach described in this document is not a new idea. Direct, or near
+direct device access has been offered by the kernel in different areas for
+decades. With more devices wanting to follow this design pattern it is becoming
+clear that it is not entirely well understood and, more importantly, the
+security considerations are not well defined or agreed upon.
+
+Some examples:
+
+ - HW RAID controllers. This includes RPCs to do things like compose drives into
+ a RAID volume, configure RAID parameters, monitor the HW and more.
+
+ - Baseboard managers. RPCs for configuring settings in the device and more
+
+ - NVMe vendor command capsules. nvme-cli provides access to some monitoring
+ functions that different products have defined, but more exists.
+
+ - CXL also has an NVMe-like vendor command system.
+
+ - DRM allows user space drivers to send commands to the device via kernel
+ mediation
+
+ - RDMA allows user space drivers to directly push commands to the device
+ without kernel involvement
+
+ - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc
+
+The first 4 would be examples of areas that fwctl intends to cover.
+
+Some key lessons learned from these past efforts are the importance of having a
+common user space project to use as a pre-condition for obtaining a kernel
+driver. Developing good community around useful software in user space is key to
+getting companies to fund participation to enable their products.
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 5926115ec0ed86..9685942fc8a21f 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -43,6 +43,7 @@ Devices and I/O
accelerators/ocxl
dma-buf-alloc-exchange
+ fwctl
gpio/index
iommu
iommufd
--
2.45.2
* [PATCH 7/8] fwctl/mlx5: Support for communicating with mlx5 fw
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
` (5 preceding siblings ...)
2024-06-03 15:53 ` [PATCH 6/8] fwctl: Add documentation Jason Gunthorpe
@ 2024-06-03 15:53 ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 8/8] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
` (2 subsequent siblings)
9 siblings, 0 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 15:53 UTC (permalink / raw)
To: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
From: Saeed Mahameed <saeedm@nvidia.com>
mlx5's fw has long provided a User Context concept. This has a long
history in RDMA as part of the devx extended verbs programming
interface. A User Context is a security envelope that contains objects and
controls access. It contains the Protection Domain object from the
InfiniBand Architecture and both together provide the OS with the necessary
tools to bind a security context like a process to the device.
The security context is restricted to not be able to touch the kernel or
other processes. In the RDMA verbs case it is also restricted to not touch
global device resources.
The fwctl_mlx5 takes this approach and builds a User Context per fwctl
file descriptor and uses a FW security capability on the User Context to
enable access to global device resources. This makes the context useful
for provisioning and debugging the global device state.
mlx5 already has a robust infrastructure for delivering RPC messages to
fw. Trivially connect fwctl's RPC mechanism to mlx5_cmd_do(). Enforce the
User Context ID in every RPC header so the FW knows the security context
of the issuing ID.
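
As an illustration of what enforcing the User Context ID in the header
means, here is a minimal user space sketch. It assumes the layout of
mlx5_ifc_mbox_in_hdr_bits in the diff below (16 bits of opcode followed by
16 bits of uid, 0x80 bits in total) and assumes big-endian field order on
the wire. The helpers hdr_stamp_uid(), hdr_get_uid() and hdr_get_opcode()
are hypothetical and do not appear in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets implied by mlx5_ifc_mbox_in_hdr_bits. */
#define HDR_OPCODE_OFF 0 /* opcode[0x10] */
#define HDR_UID_OFF 2    /* uid[0x10] */
#define HDR_MIN_LEN 16   /* 0x80 bits in total */

/* Read a 16-bit big-endian field (assumed wire order for this sketch). */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

uint16_t hdr_get_opcode(const uint8_t *msg)
{
	return get_be16(msg + HDR_OPCODE_OFF);
}

uint16_t hdr_get_uid(const uint8_t *msg)
{
	return get_be16(msg + HDR_UID_OFF);
}

/*
 * Overwrite whatever uid the caller supplied with the uid owned by the
 * file descriptor. Returns 0 on success, -1 if the message is too short
 * to hold a full input header.
 */
int hdr_stamp_uid(uint8_t *msg, size_t len, uint16_t uid)
{
	assert(msg);
	if (len < HDR_MIN_LEN)
		return -1;
	msg[HDR_UID_OFF] = (uint8_t)(uid >> 8);
	msg[HDR_UID_OFF + 1] = (uint8_t)(uid & 0xff);
	return 0;
}
```

Because the uid is overwritten unconditionally, user space cannot borrow
another context's identity by filling in the field itself.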
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
MAINTAINERS | 7 +
drivers/fwctl/Kconfig | 14 ++
drivers/fwctl/Makefile | 1 +
drivers/fwctl/mlx5/Makefile | 4 +
drivers/fwctl/mlx5/main.c | 333 ++++++++++++++++++++++++++++++++++++
include/uapi/fwctl/fwctl.h | 1 +
include/uapi/fwctl/mlx5.h | 36 ++++
7 files changed, 396 insertions(+)
create mode 100644 drivers/fwctl/mlx5/Makefile
create mode 100644 drivers/fwctl/mlx5/main.c
create mode 100644 include/uapi/fwctl/mlx5.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 94062161e9c4d7..3bd74656d73d5d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9086,6 +9086,13 @@ F: drivers/fwctl/
F: include/linux/fwctl.h
F: include/uapi/fwctl/
+FWCTL MLX5 DRIVER
+M: Saeed Mahameed <saeedm@nvidia.com>
+R: Itay Avraham <itayavr@nvidia.com>
+L: linux-kernel@vger.kernel.org
+S: Maintained
+F: drivers/fwctl/mlx5/
+
GALAXYCORE GC0308 CAMERA SENSOR DRIVER
M: Sebastian Reichel <sre@kernel.org>
L: linux-media@vger.kernel.org
diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
index 6ceee3347ae642..d7aa64710146ce 100644
--- a/drivers/fwctl/Kconfig
+++ b/drivers/fwctl/Kconfig
@@ -7,3 +7,17 @@ menuconfig FWCTL
support a wide range of lockdown compatible device behaviors including
manipulating device FLASH, debugging, and other activities that don't
fit neatly into an existing subsystem.
+
+if FWCTL
+config FWCTL_MLX5
+ tristate "mlx5 ConnectX control fwctl driver"
+ depends on MLX5_CORE
+ help
+ MLX5CTL provides an interface for the user process to access the debug and
+ configuration registers of the ConnectX hardware family
+ (NICs, PCI switches and SmartNIC SoCs).
+ This will allow configuration and debug tools to work out of the box on
+ mainstream kernel.
+
+ If you don't know what to do here, say N.
+endif
diff --git a/drivers/fwctl/Makefile b/drivers/fwctl/Makefile
index 1cad210f6ba580..1c535f694d7fe4 100644
--- a/drivers/fwctl/Makefile
+++ b/drivers/fwctl/Makefile
@@ -1,4 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_FWCTL) += fwctl.o
+obj-$(CONFIG_FWCTL_MLX5) += mlx5/
fwctl-y += main.o
diff --git a/drivers/fwctl/mlx5/Makefile b/drivers/fwctl/mlx5/Makefile
new file mode 100644
index 00000000000000..139a23e3c7c517
--- /dev/null
+++ b/drivers/fwctl/mlx5/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FWCTL_MLX5) += mlx5_fwctl.o
+
+mlx5_fwctl-y += main.o
diff --git a/drivers/fwctl/mlx5/main.c b/drivers/fwctl/mlx5/main.c
new file mode 100644
index 00000000000000..d5b751f1e98486
--- /dev/null
+++ b/drivers/fwctl/mlx5/main.c
@@ -0,0 +1,333 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/fwctl.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/mlx5/device.h>
+#include <linux/mlx5/driver.h>
+#include <uapi/fwctl/mlx5.h>
+
+#define mlx5ctl_err(mcdev, format, ...) \
+ dev_err(&mcdev->fwctl.dev, format, ##__VA_ARGS__)
+
+#define mlx5ctl_dbg(mcdev, format, ...) \
+ dev_dbg(&mcdev->fwctl.dev, "PID %u: " format, current->pid, \
+ ##__VA_ARGS__)
+
+struct mlx5ctl_uctx {
+ struct fwctl_uctx uctx;
+ u32 uctx_caps;
+ u32 uctx_uid;
+};
+
+struct mlx5ctl_dev {
+ struct fwctl_device fwctl;
+ struct mlx5_core_dev *mdev;
+};
+DEFINE_FREE(mlx5ctl, struct mlx5ctl_dev *, if (_T) fwctl_put(&_T->fwctl));
+
+struct mlx5_ifc_mbox_in_hdr_bits {
+ u8 opcode[0x10];
+ u8 uid[0x10];
+
+ u8 reserved_at_20[0x10];
+ u8 op_mod[0x10];
+
+ u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_mbox_out_hdr_bits {
+ u8 status[0x8];
+ u8 reserved_at_8[0x18];
+
+ u8 syndrome[0x20];
+
+ u8 reserved_at_40[0x40];
+};
+
+enum {
+ MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES = 0x4,
+};
+
+enum {
+ MLX5_CMD_OP_QUERY_DIAGNOSTIC_PARAMS = 0x819,
+ MLX5_CMD_OP_SET_DIAGNOSTIC_PARAMS = 0x820,
+ MLX5_CMD_OP_QUERY_DIAGNOSTIC_COUNTERS = 0x821,
+};
+
+static int mlx5ctl_alloc_uid(struct mlx5ctl_dev *mcdev, u32 cap)
+{
+ u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(create_uctx_in)] = {};
+ void *uctx;
+ int ret;
+ u16 uid;
+
+ uctx = MLX5_ADDR_OF(create_uctx_in, in, uctx);
+
+ mlx5ctl_dbg(mcdev, "%s: caps 0x%x\n", __func__, cap);
+ MLX5_SET(create_uctx_in, in, opcode, MLX5_CMD_OP_CREATE_UCTX);
+ MLX5_SET(uctx, uctx, cap, cap);
+
+ ret = mlx5_cmd_exec(mcdev->mdev, in, sizeof(in), out, sizeof(out));
+ if (ret)
+ return ret;
+
+ uid = MLX5_GET(create_uctx_out, out, uid);
+ mlx5ctl_dbg(mcdev, "allocated uid %u with caps 0x%x\n", uid, cap);
+ return uid;
+}
+
+static void mlx5ctl_release_uid(struct mlx5ctl_dev *mcdev, u16 uid)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_uctx_in)] = {};
+ struct mlx5_core_dev *mdev = mcdev->mdev;
+ int ret;
+
+ MLX5_SET(destroy_uctx_in, in, opcode, MLX5_CMD_OP_DESTROY_UCTX);
+ MLX5_SET(destroy_uctx_in, in, uid, uid);
+
+ ret = mlx5_cmd_exec_in(mdev, destroy_uctx, in);
+ mlx5ctl_dbg(mcdev, "released uid %u %pe\n", uid, ERR_PTR(ret));
+}
+
+static int mlx5ctl_open_uctx(struct fwctl_uctx *uctx)
+{
+ struct mlx5ctl_uctx *mfd =
+ container_of(uctx, struct mlx5ctl_uctx, uctx);
+ struct mlx5ctl_dev *mcdev =
+ container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
+ int uid;
+
+ /*
+ * New FW supports the TOOLS_RESOURCES uid security label
+ * which allows commands to manipulate the global device state.
+ * Otherwise only basic existing RDMA devx privileges are allowed.
+ */
+ if (MLX5_CAP_GEN(mcdev->mdev, uctx_cap) &
+ MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES)
+ mfd->uctx_caps |= MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES;
+
+ uid = mlx5ctl_alloc_uid(mcdev, mfd->uctx_caps);
+ if (uid < 0)
+ return uid;
+
+ mfd->uctx_uid = uid;
+ return 0;
+}
+
+static void mlx5ctl_close_uctx(struct fwctl_uctx *uctx)
+{
+ struct mlx5ctl_dev *mcdev =
+ container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
+ struct mlx5ctl_uctx *mfd =
+ container_of(uctx, struct mlx5ctl_uctx, uctx);
+
+ mlx5ctl_release_uid(mcdev, mfd->uctx_uid);
+}
+
+static void *mlx5ctl_info(struct fwctl_uctx *uctx, size_t *length)
+{
+ struct mlx5ctl_uctx *mfd =
+ container_of(uctx, struct mlx5ctl_uctx, uctx);
+ struct fwctl_info_mlx5 *info;
+
+ info = kzalloc(sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return ERR_PTR(-ENOMEM);
+
+ info->uid = mfd->uctx_uid;
+ info->uctx_caps = mfd->uctx_caps;
+ return info;
+}
+
+static bool mlx5ctl_validate_rpc(const void *in, enum fwctl_rpc_scope scope)
+{
+ u16 opcode = MLX5_GET(mbox_in_hdr, in, opcode);
+
+ /*
+ * Currently the driver can't keep track of commands that allocate
+ * objects in the FW, these commands are safe from a security
+ * perspective but nothing will free the memory when the FD is closed.
+ * For now permit only query commands. Also the caps for the scope have
+ * not been defined yet, filter commands manually for now.
+ */
+ switch (opcode) {
+ case MLX5_CMD_OP_QUERY_ADAPTER:
+ case MLX5_CMD_OP_QUERY_ESW_FUNCTIONS:
+ case MLX5_CMD_OP_QUERY_HCA_CAP:
+ case MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT:
+ return scope >= FWCTL_RPC_CONFIGURATION;
+
+ case MLX5_CMD_OP_QUERY_CONG_PARAMS:
+ case MLX5_CMD_OP_QUERY_CONG_STATISTICS:
+ case MLX5_CMD_OP_QUERY_CONG_STATUS:
+ case MLX5_CMD_OP_QUERY_CQ:
+ case MLX5_CMD_OP_QUERY_DCT:
+ case MLX5_CMD_OP_QUERY_DIAGNOSTIC_COUNTERS:
+ case MLX5_CMD_OP_QUERY_DIAGNOSTIC_PARAMS:
+ case MLX5_CMD_OP_QUERY_EQ:
+ case MLX5_CMD_OP_QUERY_ESW_VPORT_CONTEXT:
+ case MLX5_CMD_OP_QUERY_FLOW_COUNTER:
+ case MLX5_CMD_OP_QUERY_FLOW_GROUP:
+ case MLX5_CMD_OP_QUERY_FLOW_TABLE_ENTRY:
+ case MLX5_CMD_OP_QUERY_FLOW_TABLE:
+ case MLX5_CMD_OP_QUERY_GENERAL_OBJECT:
+ case MLX5_CMD_OP_QUERY_ISSI:
+ case MLX5_CMD_OP_QUERY_L2_TABLE_ENTRY:
+ case MLX5_CMD_OP_QUERY_LAG:
+ case MLX5_CMD_OP_QUERY_MAD_DEMUX:
+ case MLX5_CMD_OP_QUERY_MKEY:
+ case MLX5_CMD_OP_QUERY_MODIFY_HEADER_CONTEXT:
+ case MLX5_CMD_OP_QUERY_PACKET_REFORMAT_CONTEXT:
+ case MLX5_CMD_OP_QUERY_PAGES:
+ case MLX5_CMD_OP_QUERY_Q_COUNTER:
+ case MLX5_CMD_OP_QUERY_QP:
+ case MLX5_CMD_OP_QUERY_RMP:
+ case MLX5_CMD_OP_QUERY_RQ:
+ case MLX5_CMD_OP_QUERY_RQT:
+ case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
+ case MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS:
+ case MLX5_CMD_OP_QUERY_SQ:
+ case MLX5_CMD_OP_QUERY_SRQ:
+ case MLX5_CMD_OP_QUERY_TIR:
+ case MLX5_CMD_OP_QUERY_TIS:
+ case MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE:
+ case MLX5_CMD_OP_QUERY_VNIC_ENV:
+ case MLX5_CMD_OP_QUERY_VPORT_COUNTER:
+ case MLX5_CMD_OP_QUERY_VPORT_STATE:
+ case MLX5_CMD_OP_QUERY_WOL_ROL:
+ case MLX5_CMD_OP_QUERY_XRC_SRQ:
+ case MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY:
+ case MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS:
+ case MLX5_CMD_OP_QUERY_XRQ:
+ return scope >= FWCTL_RPC_DEBUG_READ_ONLY;
+
+ case MLX5_CMD_OP_SET_DIAGNOSTIC_PARAMS:
+ return scope >= FWCTL_RPC_DEBUG_WRITE;
+
+ case MLX5_CMD_OP_ACCESS_REG:
+ return scope >= FWCTL_RPC_DEBUG_WRITE_FULL;
+ default:
+ return false;
+ }
+}
+
+static void *mlx5ctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
+ void *rpc_in, size_t in_len, size_t *out_len)
+{
+ struct mlx5ctl_dev *mcdev =
+ container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
+ struct mlx5ctl_uctx *mfd =
+ container_of(uctx, struct mlx5ctl_uctx, uctx);
+ void *rpc_alloc __free(kvfree) = NULL;
+ void *rpc_out;
+ int ret;
+
+ if (in_len < MLX5_ST_SZ_BYTES(mbox_in_hdr) ||
+ *out_len < MLX5_ST_SZ_BYTES(mbox_out_hdr))
+ return ERR_PTR(-EMSGSIZE);
+
+ /* FIXME: Requires device support for more scopes */
+ if (scope != FWCTL_RPC_CONFIGURATION &&
+ scope != FWCTL_RPC_DEBUG_READ_ONLY)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ mlx5ctl_dbg(mcdev, "[UID %d] cmdif: opcode 0x%x inlen %zu outlen %zu\n",
+ mfd->uctx_uid, MLX5_GET(mbox_in_hdr, rpc_in, opcode),
+ in_len, *out_len);
+
+ if (!mlx5ctl_validate_rpc(rpc_in, scope))
+ return ERR_PTR(-EBADMSG);
+
+ /*
+ * mlx5_cmd_do() copies the input message to its own buffer before
+ * executing it, so we can reuse the allocation for the output.
+ */
+ if (*out_len <= in_len) {
+ rpc_out = rpc_in;
+ } else {
+ rpc_out = rpc_alloc = kvzalloc(*out_len, GFP_KERNEL);
+ if (!rpc_alloc)
+ return ERR_PTR(-ENOMEM);
+ }
+
+ /* Enforce the user context for the command */
+ MLX5_SET(mbox_in_hdr, rpc_in, uid, mfd->uctx_uid);
+ ret = mlx5_cmd_do(mcdev->mdev, rpc_in, in_len, rpc_out, *out_len);
+
+ mlx5ctl_dbg(mcdev,
+ "[UID %d] cmdif: opcode 0x%x status 0x%x retval %pe\n",
+ mfd->uctx_uid, MLX5_GET(mbox_in_hdr, rpc_in, opcode),
+ MLX5_GET(mbox_out_hdr, rpc_out, status), ERR_PTR(ret));
+
+ /*
+ * -EREMOTEIO means execution succeeded and the out is valid,
+ * but an error code was returned inside out. Everything else
+ * means the RPC did not make it to the device.
+ */
+ if (ret && ret != -EREMOTEIO)
+ return ERR_PTR(ret);
+ if (rpc_out == rpc_in)
+ return rpc_in;
+ return_ptr(rpc_alloc);
+}
+
+static const struct fwctl_ops mlx5ctl_ops = {
+ .device_type = FWCTL_DEVICE_TYPE_MLX5,
+ .uctx_size = sizeof(struct mlx5ctl_uctx),
+ .open_uctx = mlx5ctl_open_uctx,
+ .close_uctx = mlx5ctl_close_uctx,
+ .info = mlx5ctl_info,
+ .fw_rpc = mlx5ctl_fw_rpc,
+};
+
+static int mlx5ctl_probe(struct auxiliary_device *adev,
+ const struct auxiliary_device_id *id)
+
+{
+ struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
+ struct mlx5_core_dev *mdev = madev->mdev;
+ struct mlx5ctl_dev *mcdev __free(mlx5ctl) = fwctl_alloc_device(
+ &mdev->pdev->dev, &mlx5ctl_ops, struct mlx5ctl_dev, fwctl);
+ int ret;
+
+ if (!mcdev)
+ return -ENOMEM;
+
+ mcdev->mdev = mdev;
+
+ ret = fwctl_register(&mcdev->fwctl);
+ if (ret)
+ return ret;
+ auxiliary_set_drvdata(adev, no_free_ptr(mcdev));
+ return 0;
+}
+
+static void mlx5ctl_remove(struct auxiliary_device *adev)
+{
+ struct mlx5ctl_dev *mcdev __free(mlx5ctl) = auxiliary_get_drvdata(adev);
+
+ fwctl_unregister(&mcdev->fwctl);
+}
+
+static const struct auxiliary_device_id mlx5ctl_id_table[] = {
+ {.name = MLX5_ADEV_NAME ".fwctl",},
+ {},
+};
+MODULE_DEVICE_TABLE(auxiliary, mlx5ctl_id_table);
+
+static struct auxiliary_driver mlx5ctl_driver = {
+ .name = "mlx5_fwctl",
+ .probe = mlx5ctl_probe,
+ .remove = mlx5ctl_remove,
+ .id_table = mlx5ctl_id_table,
+};
+
+module_auxiliary_driver(mlx5ctl_driver);
+
+MODULE_IMPORT_NS(FWCTL);
+MODULE_DESCRIPTION("mlx5 ConnectX fwctl driver");
+MODULE_AUTHOR("Saeed Mahameed <saeedm@nvidia.com>");
+MODULE_LICENSE("Dual BSD/GPL");
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index 8bde0d4416fd55..49a357e1bc713f 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -42,6 +42,7 @@ enum {
enum fwctl_device_type {
FWCTL_DEVICE_TYPE_ERROR = 0,
+ FWCTL_DEVICE_TYPE_MLX5 = 1,
};
/**
diff --git a/include/uapi/fwctl/mlx5.h b/include/uapi/fwctl/mlx5.h
new file mode 100644
index 00000000000000..bcb4602ffdeee4
--- /dev/null
+++ b/include/uapi/fwctl/mlx5.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * These are definitions for the command interface for mlx5 HW. mlx5 FW has a
+ * User Context mechanism which allows the FW to understand a security scope.
+ * FWCTL binds each FD to a FW user context and then places the User Context ID
+ * (UID) in each command header. The created User Context has a capability set
+ * that is appropriate for FWCTL's security model.
+ *
+ * Command formation should use a copy of the structs in mlx5_ifc.h following
+ * the Programmers Reference Manual. An open release is available here:
+ *
+ * https://network.nvidia.com/files/doc-2020/ethernet-adapters-programming-manual.pdf
+ *
+ * The device_type for this file is FWCTL_DEVICE_TYPE_MLX5.
+ */
+#ifndef _UAPI_FWCTL_MLX5_H
+#define _UAPI_FWCTL_MLX5_H
+
+#include <linux/types.h>
+
+/**
+ * struct fwctl_info_mlx5 - ioctl(FWCTL_INFO) out_device_data
+ * @uid: The FW UID this FD is bound to. Each command header will force
+ * this value.
+ * @uctx_caps: The FW capabilities that are enabled for the uid.
+ *
+ * Return basic information about the FW interface available.
+ */
+struct fwctl_info_mlx5 {
+ __u32 uid;
+ __u32 uctx_caps;
+};
+
+#endif
--
2.45.2
* [PATCH 8/8] mlx5: Create an auxiliary device for fwctl_mlx5
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
` (6 preceding siblings ...)
2024-06-03 15:53 ` [PATCH 7/8] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
@ 2024-06-03 15:53 ` Jason Gunthorpe
2024-06-03 18:42 ` [PATCH 0/8] Introduce fwctl subystem Jakub Kicinski
2024-06-05 3:11 ` Jakub Kicinski
9 siblings, 0 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 15:53 UTC (permalink / raw)
To: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
From: Saeed Mahameed <saeedm@nvidia.com>
If the device supports User Context then it can support fwctl. Create an
auxiliary device to allow fwctl to bind to it.
This creates a sysfs entry like:
$ ls /sys/devices/pci0000:00/0000:00:0a.0/mlx5_core.fwctl.0/driver -l
lrwxrwxrwx 1 root root 0 Apr 25 19:46 /sys/devices/pci0000:00/0000:00:0a.0/mlx5_core.fwctl.0/driver -> ../../../../bus/auxiliary/drivers/mlx5_fwctl.mlx5_fwctl
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index 47e7c2639774fd..6781ddb090c475 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -228,8 +228,14 @@ enum {
MLX5_INTERFACE_PROTOCOL_VNET,
MLX5_INTERFACE_PROTOCOL_DPLL,
+ MLX5_INTERFACE_PROTOCOL_FWCTL,
};
+static bool is_fwctl_supported(struct mlx5_core_dev *dev)
+{
+ return MLX5_CAP_GEN(dev, uctx_cap);
+}
+
static const struct mlx5_adev_device {
const char *suffix;
bool (*is_supported)(struct mlx5_core_dev *dev);
@@ -252,6 +258,8 @@ static const struct mlx5_adev_device {
.is_supported = &is_mp_supported },
[MLX5_INTERFACE_PROTOCOL_DPLL] = { .suffix = "dpll",
.is_supported = &is_dpll_supported },
+ [MLX5_INTERFACE_PROTOCOL_FWCTL] = { .suffix = "fwctl",
+ .is_supported = &is_fwctl_supported },
};
int mlx5_adev_idx_alloc(void)
--
2.45.2
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
` (7 preceding siblings ...)
2024-06-03 15:53 ` [PATCH 8/8] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
@ 2024-06-03 18:42 ` Jakub Kicinski
2024-06-04 3:01 ` David Ahern
2024-06-05 3:11 ` Jakub Kicinski
9 siblings, 1 reply; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-03 18:42 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jonathan Corbet, Itay Avraham, Leon Romanovsky, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Mon, 3 Jun 2024 12:53:16 -0300 Jason Gunthorpe wrote:
> fwctl is a new subsystem intended to bring some common rules and order to
> the growing pattern of exposing a secure FW interface directly to
> userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
> exposing a device for datapath operations fwctl is focused on debugging,
> configuration and provisioning of the device. It will not have the
> necessary features like interrupt delivery to support a datapath.
If you have debug problems in your subsystem, put the APIs in your
subsystem. Don't force your choices on all the subsystems your device
interacts with:
Nacked-by: Jakub Kicinski <kuba@kernel.org>
Somewhat related, I saw nVidia sells various interesting features in
its DOCA stack. Is that Open Source?
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-03 18:42 ` [PATCH 0/8] Introduce fwctl subystem Jakub Kicinski
@ 2024-06-04 3:01 ` David Ahern
2024-06-04 14:04 ` Jakub Kicinski
0 siblings, 1 reply; 73+ messages in thread
From: David Ahern @ 2024-06-04 3:01 UTC (permalink / raw)
To: Jakub Kicinski, Jason Gunthorpe
Cc: Jonathan Corbet, Itay Avraham, Leon Romanovsky, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Dan Williams, Christoph Hellwig,
Jiri Pirko, Leonid Bloch, Leon Romanovsky, linux-cxl, patches
On 6/3/24 12:42 PM, Jakub Kicinski wrote:
> Somewhat related, I saw nVidia sells various interesting features in its
> DOCA stack. Is that Open Source?
Seriously, Jakub, how is that in any way related to this patch set?
You are basically suggesting that if any vendor ever has an out of tree
option for its hardware every patch it sends should be considered a ruse
to enable or simplify proprietary options.
* Re: [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev
2024-06-03 15:53 ` [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
@ 2024-06-04 9:32 ` Leon Romanovsky
2024-06-04 15:50 ` Jason Gunthorpe
2024-06-04 16:42 ` Randy Dunlap
1 sibling, 1 reply; 73+ messages in thread
From: Leon Romanovsky @ 2024-06-04 9:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jonathan Corbet, Itay Avraham, Jakub Kicinski, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, linux-cxl, patches
On Mon, Jun 03, 2024 at 12:53:17PM -0300, Jason Gunthorpe wrote:
> Create the class, character device and functions for a fwctl driver to
> un/register to the subsystem.
>
> A typical fwctl driver has a sysfs presence like:
>
> $ ls -l /dev/fwctl/fwctl0
> crw------- 1 root root 250, 0 Apr 25 19:16 /dev/fwctl/fwctl0
>
> $ ls /sys/class/fwctl/fwctl0
> dev device power subsystem uevent
>
> $ ls /sys/class/fwctl/fwctl0/device/infiniband/
> ibp0s10f0
>
> $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
> fwctl0/
>
> $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
> dev device power subsystem uevent
>
> Which allows userspace to link all the multi-subsystem driver components
> together and learn the subsystem specific names for the device's
> components.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> MAINTAINERS | 8 ++
> drivers/Kconfig | 2 +
> drivers/Makefile | 1 +
> drivers/fwctl/Kconfig | 9 +++
> drivers/fwctl/Makefile | 4 +
> drivers/fwctl/main.c | 174 +++++++++++++++++++++++++++++++++++++++++
> include/linux/fwctl.h | 68 ++++++++++++++++
> 7 files changed, 266 insertions(+)
> create mode 100644 drivers/fwctl/Kconfig
> create mode 100644 drivers/fwctl/Makefile
> create mode 100644 drivers/fwctl/main.c
> create mode 100644 include/linux/fwctl.h
<...>
> +static struct fwctl_device *
> +_alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> +{
> + struct fwctl_device *fwctl __free(kfree) = kzalloc(size, GFP_KERNEL);
> +
> + if (!fwctl)
> + return NULL;
<...>
> +/* Drivers use the fwctl_alloc_device() wrapper */
> +struct fwctl_device *_fwctl_alloc_device(struct device *parent,
> + const struct fwctl_ops *ops,
> + size_t size)
> +{
> + struct fwctl_device *fwctl __free(fwctl) =
> + _alloc_device(parent, ops, size);
I'm not a big fan of the cleanup.h pattern, as it hides information about
memory object lifetime that is important to me, and by "solving" one class
of problems it creates another one.
You didn't check if fwctl is NULL before using it.
> + int devnum;
> +
> + devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
> + if (devnum < 0)
> + return NULL;
> + fwctl->dev.devt = fwctl_dev + devnum;
> +
> + cdev_init(&fwctl->cdev, &fwctl_fops);
> + fwctl->cdev.owner = THIS_MODULE;
> +
> + if (dev_set_name(&fwctl->dev, "fwctl%d", fwctl->dev.devt - fwctl_dev))
Did you miss ida_free() here?
> + return NULL;
> +
> + fwctl->ops = ops;
> + return_ptr(fwctl);
> +}
> +EXPORT_SYMBOL_NS_GPL(_fwctl_alloc_device, FWCTL);
Thanks
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-03 15:53 ` [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
@ 2024-06-04 12:16 ` Zhu Yanjun
2024-06-04 12:22 ` Leon Romanovsky
2024-06-05 15:42 ` Przemek Kitszel
1 sibling, 1 reply; 73+ messages in thread
From: Zhu Yanjun @ 2024-06-04 12:16 UTC (permalink / raw)
To: Jason Gunthorpe, Jonathan Corbet, Itay Avraham, Jakub Kicinski,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On 03.06.24 17:53, Jason Gunthorpe wrote:
> Each file descriptor gets a chunk of per-FD driver specific context that
> allows the driver to attach a device specific struct to. The core code
> takes care of the memory lifetime for this structure.
>
> The ioctl dispatch and design is based on what was built for iommufd. The
> ioctls have a struct which has a combined in/out behavior with a typical
> 'zero pad' scheme for future extension and backwards compatibility.
>
> Like iommufd, some shared logic does most of the ioctl marshalling and
> compatibility work, and a table dispatches to function pointers for
> each unique ioctl.
>
> This approach has proven to work quite well in the iommufd and rdma
> subsystems.
>
> Allocate an ioctl number space for the subsystem.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> MAINTAINERS | 1 +
> drivers/fwctl/main.c | 124 +++++++++++++++++-
> include/linux/fwctl.h | 31 +++++
> include/uapi/fwctl/fwctl.h | 41 ++++++
> 5 files changed, 196 insertions(+), 2 deletions(-)
> create mode 100644 include/uapi/fwctl/fwctl.h
>
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index a141e8e65c5d3a..4d91c5a20b98c8 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -324,6 +324,7 @@ Code Seq# Include File Comments
> 0x97 00-7F fs/ceph/ioctl.h Ceph file system
> 0x99 00-0F 537-Addinboard driver
> <mailto:buk@buks.ipn.de>
> +0x9A 00-0F include/uapi/fwctl/fwctl.h
> 0xA0 all linux/sdp/sdp.h Industrial Device Project
> <mailto:kenji@bitgate.com>
> 0xA1 0 linux/vtpm_proxy.h TPM Emulator Proxy Driver
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 833b853808421e..94062161e9c4d7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9084,6 +9084,7 @@ S: Maintained
> F: Documentation/userspace-api/fwctl.rst
> F: drivers/fwctl/
> F: include/linux/fwctl.h
> +F: include/uapi/fwctl/
>
> GALAXYCORE GC0308 CAMERA SENSOR DRIVER
> M: Sebastian Reichel <sre@kernel.org>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index ff9b7bad5a2b0d..7ecdabdd9dcb1e 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -9,26 +9,131 @@
> #include <linux/container_of.h>
> #include <linux/fs.h>
>
> +#include <uapi/fwctl/fwctl.h>
> +
> enum {
> FWCTL_MAX_DEVICES = 256,
> };
> static dev_t fwctl_dev;
> static DEFINE_IDA(fwctl_ida);
>
> +struct fwctl_ucmd {
> + struct fwctl_uctx *uctx;
> + void __user *ubuffer;
> + void *cmd;
> + u32 user_size;
> +};
> +
> +/* On stack memory for the ioctl structs */
> +union ucmd_buffer {
> +};
> +
> +struct fwctl_ioctl_op {
> + unsigned int size;
> + unsigned int min_size;
> + unsigned int ioctl_num;
> + int (*execute)(struct fwctl_ucmd *ucmd);
> +};
> +
> +#define IOCTL_OP(_ioctl, _fn, _struct, _last) \
> + [_IOC_NR(_ioctl) - FWCTL_CMD_BASE] = { \
> + .size = sizeof(_struct) + \
> + BUILD_BUG_ON_ZERO(sizeof(union ucmd_buffer) < \
> + sizeof(_struct)), \
> + .min_size = offsetofend(_struct, _last), \
> + .ioctl_num = _ioctl, \
> + .execute = _fn, \
> + }
> +static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
> +};
> +
> +static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
> + unsigned long arg)
> +{
> + struct fwctl_uctx *uctx = filp->private_data;
> + const struct fwctl_ioctl_op *op;
> + struct fwctl_ucmd ucmd = {};
> + union ucmd_buffer buf;
> + unsigned int nr;
> + int ret;
> +
> + nr = _IOC_NR(cmd);
> + if ((nr - FWCTL_CMD_BASE) >= ARRAY_SIZE(fwctl_ioctl_ops))
> + return -ENOIOCTLCMD;
> + op = &fwctl_ioctl_ops[nr - FWCTL_CMD_BASE];
> + if (op->ioctl_num != cmd)
> + return -ENOIOCTLCMD;
> +
> + ucmd.uctx = uctx;
> + ucmd.cmd = &buf;
> + ucmd.ubuffer = (void __user *)arg;
> + ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
> + if (ret)
> + return ret;
> +
> + if (ucmd.user_size < op->min_size)
> + return -EINVAL;
> +
> + ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
> + ucmd.user_size);
> + if (ret)
> + return ret;
> +
> + guard(rwsem_read)(&uctx->fwctl->registration_lock);
> + if (!uctx->fwctl->ops)
> + return -ENODEV;
> + return op->execute(&ucmd);
> +}
> +
> static int fwctl_fops_open(struct inode *inode, struct file *filp)
> {
> struct fwctl_device *fwctl =
> container_of(inode->i_cdev, struct fwctl_device, cdev);
> + struct fwctl_uctx *uctx __free(kfree) = NULL;
> + int ret;
> +
> + guard(rwsem_read)(&fwctl->registration_lock);
> + if (!fwctl->ops)
> + return -ENODEV;
> +
> + uctx = kzalloc(fwctl->ops->uctx_size, GFP_KERNEL | GFP_KERNEL_ACCOUNT);
> + if (!uctx)
> + return -ENOMEM;
> +
> + uctx->fwctl = fwctl;
> + ret = fwctl->ops->open_uctx(uctx);
> + if (ret)
> + return ret;
If open_uctx() fails, is uctx freed inside "fwctl->ops->open_uctx(uctx);"?
If not, the uctx allocation leaks here.
Zhu Yanjun
> +
> + scoped_guard(mutex, &fwctl->uctx_list_lock) {
> + list_add_tail(&uctx->uctx_list_entry, &fwctl->uctx_list);
> + }
>
> get_device(&fwctl->dev);
> - filp->private_data = fwctl;
> + filp->private_data = no_free_ptr(uctx);
> return 0;
> }
>
> +static void fwctl_destroy_uctx(struct fwctl_uctx *uctx)
> +{
> + lockdep_assert_held(&uctx->fwctl->uctx_list_lock);
> + list_del(&uctx->uctx_list_entry);
> + uctx->fwctl->ops->close_uctx(uctx);
> +}
> +
> static int fwctl_fops_release(struct inode *inode, struct file *filp)
> {
> - struct fwctl_device *fwctl = filp->private_data;
> + struct fwctl_uctx *uctx = filp->private_data;
> + struct fwctl_device *fwctl = uctx->fwctl;
>
> + scoped_guard(rwsem_read, &fwctl->registration_lock) {
> + if (fwctl->ops) {
> + guard(mutex)(&fwctl->uctx_list_lock);
> + fwctl_destroy_uctx(uctx);
> + }
> + }
> +
> + kfree(uctx);
> fwctl_put(fwctl);
> return 0;
> }
> @@ -37,6 +142,7 @@ static const struct file_operations fwctl_fops = {
> .owner = THIS_MODULE,
> .open = fwctl_fops_open,
> .release = fwctl_fops_release,
> + .unlocked_ioctl = fwctl_fops_ioctl,
> };
>
> static void fwctl_device_release(struct device *device)
> @@ -46,6 +152,7 @@ static void fwctl_device_release(struct device *device)
>
> if (fwctl->dev.devt)
> ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
> + mutex_destroy(&fwctl->uctx_list_lock);
> kfree(fwctl);
> }
>
> @@ -69,6 +176,9 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> return NULL;
> fwctl->dev.class = &fwctl_class;
> fwctl->dev.parent = parent;
> + init_rwsem(&fwctl->registration_lock);
> + mutex_init(&fwctl->uctx_list_lock);
> + INIT_LIST_HEAD(&fwctl->uctx_list);
> device_initialize(&fwctl->dev);
> return_ptr(fwctl);
> }
> @@ -134,8 +244,18 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, FWCTL);
> */
> void fwctl_unregister(struct fwctl_device *fwctl)
> {
> + struct fwctl_uctx *uctx;
> +
> cdev_device_del(&fwctl->cdev, &fwctl->dev);
>
> + /* Disable and free the driver's resources for any still open FDs. */
> + guard(rwsem_write)(&fwctl->registration_lock);
> + guard(mutex)(&fwctl->uctx_list_lock);
> + while ((uctx = list_first_entry_or_null(&fwctl->uctx_list,
> + struct fwctl_uctx,
> + uctx_list_entry)))
> + fwctl_destroy_uctx(uctx);
> +
> /*
> * The driver module may unload after this returns, the op pointer will
> * not be valid.
> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> index ef4eaa87c945e4..1d9651de92fc19 100644
> --- a/include/linux/fwctl.h
> +++ b/include/linux/fwctl.h
> @@ -11,7 +11,20 @@
> struct fwctl_device;
> struct fwctl_uctx;
>
> +/**
> + * struct fwctl_ops - Driver provided operations
> + * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
> + * bytes of this memory will be a fwctl_uctx. The driver can use the
> + * remaining bytes as its private memory.
> + * @open_uctx: Called when a file descriptor is opened before the uctx is ever
> + * used.
> + * @close_uctx: Called when the uctx is destroyed, usually when the FD is
> + * closed.
> + */
> struct fwctl_ops {
> + size_t uctx_size;
> + int (*open_uctx)(struct fwctl_uctx *uctx);
> + void (*close_uctx)(struct fwctl_uctx *uctx);
> };
>
> /**
> @@ -26,6 +39,10 @@ struct fwctl_device {
> struct device dev;
> /* private: */
> struct cdev cdev;
> +
> + struct rw_semaphore registration_lock;
> + struct mutex uctx_list_lock;
> + struct list_head uctx_list;
> const struct fwctl_ops *ops;
> };
>
> @@ -65,4 +82,18 @@ DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
> int fwctl_register(struct fwctl_device *fwctl);
> void fwctl_unregister(struct fwctl_device *fwctl);
>
> +/**
> + * struct fwctl_uctx - Per user FD context
> + * @fwctl: fwctl instance that owns the context
> + *
> + * Every FD opened by userspace will get a unique context allocation. Any driver
> + * private data will follow immediately after.
> + */
> +struct fwctl_uctx {
> + struct fwctl_device *fwctl;
> + /* private: */
> + /* Head at fwctl_device::uctx_list */
> + struct list_head uctx_list_entry;
> +};
> +
> #endif
> diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> new file mode 100644
> index 00000000000000..0bdce95b6d69d9
> --- /dev/null
> +++ b/include/uapi/fwctl/fwctl.h
> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
> + */
> +#ifndef _UAPI_FWCTL_H
> +#define _UAPI_FWCTL_H
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define FWCTL_TYPE 0x9A
> +
> +/**
> + * DOC: General ioctl format
> + *
> + * The ioctl interface follows a general format to allow for extensibility. Each
> + * ioctl is passed in a structure pointer as the argument providing the size of
> + * the structure in the first u32. The kernel checks that any structure space
> + * beyond what it understands is 0. This allows userspace to use the backward
> + * compatible portion while consistently using the newer, larger, structures.
> + *
> + * ioctls use a standard meaning for common errnos:
> + *
> + * - ENOTTY: The IOCTL number itself is not supported at all
> + * - E2BIG: The IOCTL number is supported, but the provided structure has
> + * non-zero in a part the kernel does not understand.
> + * - EOPNOTSUPP: The IOCTL number is supported, and the structure is
> + * understood, however a known field has a value the kernel does not
> + * understand or support.
> + * - EINVAL: Everything about the IOCTL was understood, but a field is not
> + * correct.
> + * - ENOMEM: Out of memory.
> + * - ENODEV: The underlying device has been hot-unplugged and the FD is
> + * orphaned.
> + *
> + * As well as additional errnos, within specific ioctls.
> + */
> +enum {
> + FWCTL_CMD_BASE = 0,
> +};
> +
> +#endif
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-04 12:16 ` Zhu Yanjun
@ 2024-06-04 12:22 ` Leon Romanovsky
2024-06-04 16:50 ` Jonathan Cameron
0 siblings, 1 reply; 73+ messages in thread
From: Leon Romanovsky @ 2024-06-04 12:22 UTC (permalink / raw)
To: Zhu Yanjun
Cc: Jason Gunthorpe, Jonathan Corbet, Itay Avraham, Jakub Kicinski,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
linux-cxl, patches
On Tue, Jun 04, 2024 at 02:16:12PM +0200, Zhu Yanjun wrote:
> On 03.06.24 17:53, Jason Gunthorpe wrote:
> > Each file descriptor gets a chunk of per-FD driver specific context that
> > allows the driver to attach a device specific struct to. The core code
> > takes care of the memory lifetime for this structure.
> >
> > The ioctl dispatch and design is based on what was built for iommufd. The
> > ioctls have a struct which has a combined in/out behavior with a typical
> > 'zero pad' scheme for future extension and backwards compatibility.
> >
> > Like iommufd, some shared logic does most of the ioctl marshalling and
> > compatibility work, and a table dispatches to function pointers for
> > each unique ioctl.
> >
> > This approach has proven to work quite well in the iommufd and rdma
> > subsystems.
> >
> > Allocate an ioctl number space for the subsystem.
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> > .../userspace-api/ioctl/ioctl-number.rst | 1 +
> > MAINTAINERS | 1 +
> > drivers/fwctl/main.c | 124 +++++++++++++++++-
> > include/linux/fwctl.h | 31 +++++
> > include/uapi/fwctl/fwctl.h | 41 ++++++
> > 5 files changed, 196 insertions(+), 2 deletions(-)
> > create mode 100644 include/uapi/fwctl/fwctl.h
<...>
> > static int fwctl_fops_open(struct inode *inode, struct file *filp)
> > {
> > struct fwctl_device *fwctl =
> > container_of(inode->i_cdev, struct fwctl_device, cdev);
> > + struct fwctl_uctx *uctx __free(kfree) = NULL;
> > + int ret;
> > +
> > + guard(rwsem_read)(&fwctl->registration_lock);
> > + if (!fwctl->ops)
> > + return -ENODEV;
> > +
> > + uctx = kzalloc(fwctl->ops->uctx_size, GFP_KERNEL | GFP_KERNEL_ACCOUNT);
> > + if (!uctx)
> > + return -ENOMEM;
> > +
> > + uctx->fwctl = fwctl;
> > + ret = fwctl->ops->open_uctx(uctx);
> > + if (ret)
> > + return ret;
>
> > If open_uctx() fails, is uctx freed inside "fwctl->ops->open_uctx(uctx);"?
> >
> > If not, the uctx allocation leaks here.
See how uctx is declared:
struct fwctl_uctx *uctx __free(kfree) = NULL;
It will be released automatically.
See include/linux/cleanup.h for more details.
Thanks
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-04 3:01 ` David Ahern
@ 2024-06-04 14:04 ` Jakub Kicinski
2024-06-04 21:28 ` Saeed Mahameed
` (2 more replies)
0 siblings, 3 replies; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-04 14:04 UTC (permalink / raw)
To: David Ahern
Cc: Jason Gunthorpe, Jonathan Corbet, Itay Avraham, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Mon, 3 Jun 2024 21:01:58 -0600 David Ahern wrote:
> On 6/3/24 12:42 PM, Jakub Kicinski wrote:
> > Somewhat related, I saw nVidia sells various interesting features in its
> > DOCA stack. Is that Open Source?
>
> Seriously, Jakub, how is that in any way related to this patch set?
Whether they admit it or not, DOCA is a major reason nVidia wants
this to be standalone rather than part of RDMA.
> You are basically suggesting that if any vendor ever has an out of tree
> option for its hardware every patch it sends should be considered a ruse
> to enable or simplify proprietary options.
Ooo, is that a sore spot?
I don't begrudge anyone building proprietary options, but leave
upstream out of it.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev
2024-06-04 9:32 ` Leon Romanovsky
@ 2024-06-04 15:50 ` Jason Gunthorpe
2024-06-04 17:05 ` Jonathan Cameron
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-04 15:50 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jonathan Corbet, Itay Avraham, Jakub Kicinski, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, linux-cxl, patches
On Tue, Jun 04, 2024 at 12:32:19PM +0300, Leon Romanovsky wrote:
> > +static struct fwctl_device *
> > +_alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> > +{
> > + struct fwctl_device *fwctl __free(kfree) = kzalloc(size, GFP_KERNEL);
> > +
> > + if (!fwctl)
> > + return NULL;
>
> <...>
>
> > +/* Drivers use the fwctl_alloc_device() wrapper */
> > +struct fwctl_device *_fwctl_alloc_device(struct device *parent,
> > + const struct fwctl_ops *ops,
> > + size_t size)
> > +{
> > + struct fwctl_device *fwctl __free(fwctl) =
> > + _alloc_device(parent, ops, size);
>
> I'm not a big fan of cleanup.h pattern as it hides important to me
> information about memory object lifetime and by "solving" one class of
> problems it creates another one.
I'm trying it here. One of the most common bugs I end up fixing is
error unwind and cleanup.h has successfully removed all of it. Let's
find out, others thought it was a good idea to add the infrastructure.
One thing that seems clear in my work here is that you should not use
cleanup.h if you don't have simple memory lifetime, like the above
case where the memory is freed if the function fails.
> You didn't check if fwctl is NULL before using it.
Oops, yes
> > + int devnum;
> > +
> > + devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
> > + if (devnum < 0)
> > + return NULL;
> > + fwctl->dev.devt = fwctl_dev + devnum;
> > +
> > + cdev_init(&fwctl->cdev, &fwctl_fops);
> > + fwctl->cdev.owner = THIS_MODULE;
> > +
> > + if (dev_set_name(&fwctl->dev, "fwctl%d", fwctl->dev.devt - fwctl_dev))
>
> Did you miss ida_free() here?
No, the put_device() does it in the release function. The __free
always calls fwctl_put()/put_device() on failure, and within all
functions except _alloc_device() the put_device() is the correct way
to free this memory.
Thanks,
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev
2024-06-03 15:53 ` [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2024-06-04 9:32 ` Leon Romanovsky
@ 2024-06-04 16:42 ` Randy Dunlap
2024-06-04 16:44 ` Jason Gunthorpe
1 sibling, 1 reply; 73+ messages in thread
From: Randy Dunlap @ 2024-06-04 16:42 UTC (permalink / raw)
To: Jason Gunthorpe, Jonathan Corbet, Itay Avraham, Jakub Kicinski,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On 6/3/24 8:53 AM, Jason Gunthorpe wrote:
> diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
> new file mode 100644
> index 00000000000000..6ceee3347ae642
> --- /dev/null
> +++ b/drivers/fwctl/Kconfig
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menuconfig FWCTL
> + tristate "fwctl device firmware access framework"
Use tab above instead of spaces for indentation, please.
> + help
> + fwctl provides a userspace API for restricted access to communicate
> + with on-device firmware. The communication channel is intended to
> + support a wide range of lockdown compatible device behaviors including
> + manipulating device FLASH, debugging, and other activities that don't
> + fit neatly into an existing subsystem.
--
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev
2024-06-04 16:42 ` Randy Dunlap
@ 2024-06-04 16:44 ` Jason Gunthorpe
0 siblings, 0 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-04 16:44 UTC (permalink / raw)
To: Randy Dunlap
Cc: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Tue, Jun 04, 2024 at 09:42:50AM -0700, Randy Dunlap wrote:
>
>
> On 6/3/24 8:53 AM, Jason Gunthorpe wrote:
> > diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
> > new file mode 100644
> > index 00000000000000..6ceee3347ae642
> > --- /dev/null
> > +++ b/drivers/fwctl/Kconfig
> > @@ -0,0 +1,9 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menuconfig FWCTL
> > + tristate "fwctl device firmware access framework"
>
> Use tab above instead of spaces for indentation, please.
Thanks, done. Bit surprised checkpatch didn't flag it..
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-04 12:22 ` Leon Romanovsky
@ 2024-06-04 16:50 ` Jonathan Cameron
2024-06-04 16:58 ` Jason Gunthorpe
0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Cameron @ 2024-06-04 16:50 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Zhu Yanjun, Jason Gunthorpe, Jonathan Corbet, Itay Avraham,
Jakub Kicinski, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, David Ahern, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, linux-cxl, patches
On Tue, 4 Jun 2024 15:22:21 +0300
Leon Romanovsky <leon@kernel.org> wrote:
> On Tue, Jun 04, 2024 at 02:16:12PM +0200, Zhu Yanjun wrote:
> > On 03.06.24 17:53, Jason Gunthorpe wrote:
> > > Each file descriptor gets a chunk of per-FD driver specific context that
> > > allows the driver to attach a device specific struct to. The core code
> > > takes care of the memory lifetime for this structure.
> > >
> > > The ioctl dispatch and design is based on what was built for iommufd. The
> > > ioctls have a struct which has a combined in/out behavior with a typical
> > > 'zero pad' scheme for future extension and backwards compatibility.
> > >
> > > Like iommufd some shared logic does most of the ioctl marshalling and
> > > compatibility work and table dispatches to some function pointers for
> > > each unique ioctl.
> > >
> > > This approach has proven to work quite well in the iommufd and rdma
> > > subsystems.
> > >
> > > Allocate an ioctl number space for the subsystem.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > ---
> > > .../userspace-api/ioctl/ioctl-number.rst | 1 +
> > > MAINTAINERS | 1 +
> > > drivers/fwctl/main.c | 124 +++++++++++++++++-
> > > include/linux/fwctl.h | 31 +++++
> > > include/uapi/fwctl/fwctl.h | 41 ++++++
> > > 5 files changed, 196 insertions(+), 2 deletions(-)
> > > create mode 100644 include/uapi/fwctl/fwctl.h
>
> <...>
>
> > > static int fwctl_fops_open(struct inode *inode, struct file *filp)
> > > {
> > > struct fwctl_device *fwctl =
> > > container_of(inode->i_cdev, struct fwctl_device, cdev);
> > > + struct fwctl_uctx *uctx __free(kfree) = NULL;
> > > + int ret;
> > > +
> > > + guard(rwsem_read)(&fwctl->registration_lock);
> > > + if (!fwctl->ops)
> > > + return -ENODEV;
> > > +
> > > + uctx = kzalloc(fwctl->ops->uctx_size, GFP_KERNEL | GFP_KERNEL_ACCOUNT);
> > > + if (!uctx)
> > > + return -ENOMEM;
> > > +
> > > + uctx->fwctl = fwctl;
> > > + ret = fwctl->ops->open_uctx(uctx);
> > > + if (ret)
> > > + return ret;
> >
> > When something is wrong, uctx is freed in "fwctl->ops->open_uctx(uctx);"?
> >
> > If not, the allocated memory uctx leaks here.
>
> See how uctx is declared:
> struct fwctl_uctx *uctx __free(kfree) = NULL;
>
> It will be released automatically.
> See include/linux/cleanup.h for more details.
I'm lazy so not finding the discussion now, but Linus has been pretty clear
that he doesn't like this pattern because of the possibility of additional
cleanup magic getting introduced and then the cleanup happening in an order
that causes problems.
The preferred option is to drag the declaration down to where it is
initialized, breaking with our tradition of declarations all at the top:
struct fwctl_uctx *uctx __free(kfree) =
kzalloc(...);
etc
>
> Thanks
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-04 16:50 ` Jonathan Cameron
@ 2024-06-04 16:58 ` Jason Gunthorpe
2024-06-05 11:07 ` Jonathan Cameron
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-04 16:58 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Leon Romanovsky, Zhu Yanjun, Jonathan Corbet, Itay Avraham,
Jakub Kicinski, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, David Ahern, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, linux-cxl, patches
On Tue, Jun 04, 2024 at 05:50:23PM +0100, Jonathan Cameron wrote:
> > > > static int fwctl_fops_open(struct inode *inode, struct file *filp)
> > > > {
> > > > struct fwctl_device *fwctl =
> > > > container_of(inode->i_cdev, struct fwctl_device, cdev);
> > > > + struct fwctl_uctx *uctx __free(kfree) = NULL;
> > > > + int ret;
> > > > +
> > > > + guard(rwsem_read)(&fwctl->registration_lock);
> > > > + if (!fwctl->ops)
> > > > + return -ENODEV;
> > > > +
> > > > + uctx = kzalloc(fwctl->ops->uctx_size, GFP_KERNEL | GFP_KERNEL_ACCOUNT);
> > > > + if (!uctx)
> > > > + return -ENOMEM;
> > > > +
> > > > + uctx->fwctl = fwctl;
> > > > + ret = fwctl->ops->open_uctx(uctx);
> > > > + if (ret)
> > > > + return ret;
> > >
> > > When something is wrong, uctx is freed in "fwctl->ops->open_uctx(uctx);"?
> > >
> > > If not, the allocated memory uctx leaks here.
> >
> > See how uctx is declared:
> > struct fwctl_uctx *uctx __free(kfree) = NULL;
> >
> > It will be released automatically.
> > See include/linux/cleanup.h for more details.
>
> I'm lazy so not finding the discussion now, but Linus has been pretty clear
> that he doesn't like this pattern because of the possibility of additional
> cleanup magic getting introduced and then the cleanup happening in an order
> that causes problems.
I saw that discussion, but I thought it was talking about the macro
behavior - ie guard() creates a variable hidden in the macro.
The point about order is interesting though - notice the above will
free the uctx after unlocking (which is the slightly more preferred
order here), but it is easy to imagine cases where that order would be
wrong.
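To make the ordering point concrete: variables carrying the cleanup attribute
are cleaned up in the reverse order of their declaration. The userspace sketch
below only records that order; fake_free and fake_unlock are hypothetical
stand-ins for kfree() and for dropping the registration lock, not the kernel
helpers.

```c
#include <assert.h>
#include <string.h>

/* Records the order in which cleanup handlers run. */
static char cleanup_log[8];

static void log_event(char c)
{
	size_t n = strlen(cleanup_log);

	if (n + 1 < sizeof(cleanup_log))
		cleanup_log[n] = c;
}

/* 'F' stands in for kfree(uctx). */
static void fake_free(int *unused)
{
	(void)unused;
	log_event('F');
}

/* 'U' stands in for releasing the registration lock. */
static void fake_unlock(int *unused)
{
	(void)unused;
	log_event('U');
}

/* Same declaration order as fwctl_fops_open(): pointer first, guard second. */
static void open_like_patch(void)
{
	__attribute__((cleanup(fake_free))) int uctx = 0;
	__attribute__((cleanup(fake_unlock))) int guard = 0;

	(void)uctx;
	(void)guard;
}

/* Declarations swapped: now the "free" runs while still "locked". */
static void open_swapped(void)
{
	__attribute__((cleanup(fake_unlock))) int guard = 0;
	__attribute__((cleanup(fake_free))) int uctx = 0;

	(void)uctx;
	(void)guard;
}

/* Clears the log, runs fn, and returns the recorded order. */
static const char *run(void (*fn)(void))
{
	memset(cleanup_log, 0, sizeof(cleanup_log));
	fn();
	return cleanup_log;
}
```

With the patch's declaration order the unlock is recorded before the free;
swapping the two declarations silently reverses that, which is the kind of
accident the "declare where you initialize" advice is meant to make visible.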
> The preferred option is to drag the declaration down to where it is
> initialized, breaking with our tradition of declarations all at the top:
>
> struct fwctl_uctx *uctx __free(kfree) =
> kzalloc(...);
I don't recall that dramatic conclusion in the discussion, but it does
make a lot of sense to me.
Thanks,
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev
2024-06-04 15:50 ` Jason Gunthorpe
@ 2024-06-04 17:05 ` Jonathan Cameron
2024-06-04 18:52 ` Jason Gunthorpe
0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Cameron @ 2024-06-04 17:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Jonathan Corbet, Itay Avraham, Jakub Kicinski,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
linux-cxl, patches
On Tue, 4 Jun 2024 12:50:09 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Tue, Jun 04, 2024 at 12:32:19PM +0300, Leon Romanovsky wrote:
> > > +static struct fwctl_device *
> > > +_alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> > > +{
> > > + struct fwctl_device *fwctl __free(kfree) = kzalloc(size, GFP_KERNEL);
> > > +
> > > + if (!fwctl)
> > > + return NULL;
> >
> > <...>
> >
> > > +/* Drivers use the fwctl_alloc_device() wrapper */
> > > +struct fwctl_device *_fwctl_alloc_device(struct device *parent,
> > > + const struct fwctl_ops *ops,
> > > + size_t size)
> > > +{
> > > + struct fwctl_device *fwctl __free(fwctl) =
> > > + _alloc_device(parent, ops, size);
> >
> > I'm not a big fan of cleanup.h pattern as it hides important to me
> > information about memory object lifetime and by "solving" one class of
> > problems it creates another one.
>
> I'm trying it here. One of the most common bugs I end up fixing is
> error unwind and cleanup.h has successfully removed all of it. Let's
> find out, others thought it was a good idea to add the infrastructure.
>
> One thing that seems clear in my work here is that you should not use
> cleanup.h if you don't have simple memory lifetime, like the above
> case where the memory is freed if the function fails.
>
> > You didn't check if fwctl is NULL before using it.
>
> Oops, yes
>
> > > + int devnum;
> > > +
> > > + devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
> > > + if (devnum < 0)
> > > + return NULL;
> > > + fwctl->dev.devt = fwctl_dev + devnum;
> > > +
> > > + cdev_init(&fwctl->cdev, &fwctl_fops);
> > > + fwctl->cdev.owner = THIS_MODULE;
> > > +
> > > + if (dev_set_name(&fwctl->dev, "fwctl%d", fwctl->dev.devt - fwctl_dev))
> >
> > Did you miss ida_free() here?
>
> No, the put_device() does it in the release function. The __free
> always calls fwctl_put()/put_device() on failure, and within all
> functions except _alloc_device() the put_device() is the correct way
> to free this memory.
The conditional handling of the ida having been allocated or not is a bit ugly
as I think it's just papering over this corner case.
Can fwctl_dev and devnum both be zero? In practice no, but is that guaranteed
for all time? Maybe...
We got some pushback from Linus a while back in CXL and the outcome was
a few more helpers rather than too much cleverness in the use of __free.
The trick for this is often to define a small function that allocates both the
ida and the device. Within that micro function, handle the one error path,
or if you only have two things to do, you can use __free() for the allocation.
Something like

static struct fwctl_device *__alloc_device_and_devt(size_t size)
{
	struct fwctl_device *fwctl;
	int devnum;

	/* Allocate the containing structure first */
	fwctl = kzalloc(size, ...);
	if (!fwctl)
		return NULL;

	/* Then the device number; this is the only error path to unwind */
	devnum = ida_alloc_max(&fwctl_ida, ...);
	if (devnum < 0) {
		kfree(fwctl);
		return NULL;
	}
	fwctl->dev.devt = fwctl_dev + devnum;

	return fwctl;
}

Then call device_initialize() on the returned structure's ->dev, as you know
your ida and the containing structure are both in a state where the put_device()
call doesn't need conditions on 'how initialized' it is.
Still, maybe the ugly is fine.
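For illustration, the same helper pattern as a self-contained userspace
sketch. Everything in it is hypothetical: a toy ID table replaces the IDA,
calloc()/free() replace kzalloc()/kfree(), and struct device here is just a
local stand-in. The point is only that a structure which always owns a valid
ID can be released without conditions.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEVICES 4	/* hypothetical stand-in for FWCTL_MAX_DEVICES */

/* Toy ID allocator standing in for the kernel's IDA. */
static unsigned char id_used[MAX_DEVICES];

/* Returns the lowest free ID, or -1 if all IDs are taken. */
static int id_alloc(void)
{
	for (int i = 0; i < MAX_DEVICES; i++) {
		if (!id_used[i]) {
			id_used[i] = 1;
			return i;
		}
	}
	return -1;
}

static void id_free(int id)
{
	id_used[id] = 0;
}

/* Local stand-in, not the kernel's struct device. */
struct device {
	int devnum;
};

/*
 * Allocate the structure and its ID together. There is exactly one
 * error path to unwind, so a returned device always owns a valid ID.
 */
static struct device *alloc_device_and_id(void)
{
	struct device *dev = calloc(1, sizeof(*dev));
	int devnum;

	if (!dev)
		return NULL;
	devnum = id_alloc();
	if (devnum < 0) {
		free(dev);
		return NULL;
	}
	dev->devnum = devnum;
	return dev;
}

/* Release needs no "was the ID ever allocated?" check. */
static void release_device(struct device *dev)
{
	id_free(dev->devnum);
	free(dev);
}
```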
>
> Thanks,
> Jason
>
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev
2024-06-04 17:05 ` Jonathan Cameron
@ 2024-06-04 18:52 ` Jason Gunthorpe
2024-06-05 11:08 ` Jonathan Cameron
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-04 18:52 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Leon Romanovsky, Jonathan Corbet, Itay Avraham, Jakub Kicinski,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
linux-cxl, patches
On Tue, Jun 04, 2024 at 06:05:55PM +0100, Jonathan Cameron wrote:
> The trick for this is often to define a small function that allocates both the
> ida and the device. Within that micro function, handle the one error path,
> or if you only have two things to do, you can use __free() for the allocation.
This style is already followed here, the _alloc_device() is the
function that does everything before starting reference counting (IMHO
it is the best pattern to use). If we move the ida allocation to that
function then the if inside release is not needed.
Like this:
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index d25b5eb3aee73c..a26697326e6ced 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -267,8 +267,7 @@ static void fwctl_device_release(struct device *device)
struct fwctl_device *fwctl =
container_of(device, struct fwctl_device, dev);
- if (fwctl->dev.devt)
- ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
+ ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
mutex_destroy(&fwctl->uctx_list_lock);
kfree(fwctl);
}
@@ -288,6 +287,7 @@ static struct fwctl_device *
_alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
{
struct fwctl_device *fwctl __free(kfree) = kzalloc(size, GFP_KERNEL);
+ int devnum;
if (!fwctl)
return NULL;
@@ -296,6 +296,12 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
init_rwsem(&fwctl->registration_lock);
mutex_init(&fwctl->uctx_list_lock);
INIT_LIST_HEAD(&fwctl->uctx_list);
+
+ devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
+ if (devnum < 0)
+ return NULL;
+ fwctl->dev.devt = fwctl_dev + devnum;
+
device_initialize(&fwctl->dev);
return_ptr(fwctl);
}
@@ -307,16 +313,10 @@ struct fwctl_device *_fwctl_alloc_device(struct device *parent,
{
struct fwctl_device *fwctl __free(fwctl) =
_alloc_device(parent, ops, size);
- int devnum;
if (!fwctl)
return NULL;
- devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
- if (devnum < 0)
- return NULL;
- fwctl->dev.devt = fwctl_dev + devnum;
-
cdev_init(&fwctl->cdev, &fwctl_fops);
fwctl->cdev.owner = THIS_MODULE;
Jason
^ permalink raw reply related [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-04 14:04 ` Jakub Kicinski
@ 2024-06-04 21:28 ` Saeed Mahameed
2024-06-04 22:32 ` Jakub Kicinski
2024-06-04 23:56 ` Dan Williams
2024-06-06 1:58 ` David Ahern
2 siblings, 1 reply; 73+ messages in thread
From: Saeed Mahameed @ 2024-06-04 21:28 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Ahern, Jason Gunthorpe, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On 04 Jun 07:04, Jakub Kicinski wrote:
>On Mon, 3 Jun 2024 21:01:58 -0600 David Ahern wrote:
>> On 6/3/24 12:42 PM, Jakub Kicinski wrote:
>> > Somewhat related, I saw nVidia sells various interesting features in its
>> > DOCA stack. Is that Open Source?
>>
>> Seriously, Jakub, how is that in any way related to this patch set?
>
>Whether they admit it or not, DOCA is a major reason nVidia wants
>this to be standalone rather than part of RDMA.
>
No, DOCA isn't on the agenda for this new interface. But what is the point
in arguing? Apparently the vendor is not credible enough in your opinion.
Which is absolutely outrageous grounds for a NAK.
Anyway I don't see your point in bringing up DOCA here, but obviously once
this interface is accepted, all developers are welcome to use it,
including DOCA developers of course..
That being said, the reason we need this is crystal clear in the
cover letter and previous submission discussions; bringing random SDKs
into this discussion is not objective and is counterproductive to the
technical discussion.
>> You are basically suggesting that if any vendor ever has an out of tree
>> option for its hardware every patch it sends should be considered a ruse
>> to enable or simplify proprietary options.
>
It's apparent that you're attributing sinister agendas to patchsets when
you fail to offer valid technical opinions regarding the NAK nature. Let's
address this outside of this patchset, as this isn't the first occurrence.
Consistency in evaluating patches is crucial; some, like the fbnic and
idpf, seem to go unquestioned, while others face scrutiny.
>Ooo, is that a sore spot?
>
>I don't begrudge anyone building proprietary options, but leave
>upstream out of it.
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-04 21:28 ` Saeed Mahameed
@ 2024-06-04 22:32 ` Jakub Kicinski
2024-06-05 14:50 ` Jason Gunthorpe
0 siblings, 1 reply; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-04 22:32 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David Ahern, Jason Gunthorpe, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Tue, 4 Jun 2024 14:28:05 -0700 Saeed Mahameed wrote:
> On 04 Jun 07:04, Jakub Kicinski wrote:
> >On Mon, 3 Jun 2024 21:01:58 -0600 David Ahern wrote:
> >> Seriously, Jakub, how is that in any way related to this patch set?
> >
> >Whether they admit it or not, DOCA is a major reason nVidia wants
> >this to be standalone rather than part of RDMA.
>
> No, DOCA isn't on the agenda for this new interface. But what is the point
> in arguing?
I'm not arguing any point, we argued enough. But you failed to disclose
that DOCA is a very likely user of this interface. So whoever you're
planning to submit it to should know.
DOCA was top of mind for me because I noticed it has PSP support, and
I wanted to take a look at the implementation.
> Apparently the vendor is not credible enough in your opinion.
You're creating an interface where you depend on a pinky promise from
a black box that the RPC is not a write. I trust you personally not to
write a patch which abuses this interface. But this cannot possibly
extend to all developers, most of whom just want to ship features.
> Which is absolutely outrageous grounds for a NAK.
>
> Anyway I don't see your point in bringing up DOCA here, but obviously once
> this interface is accepted, all developers are welcome to use it,
> including DOCA developers of course..
Of course.
> That being said, the reason we need this is crystal clear in the
> cover letter and previous submission discussions; bringing random SDKs
> into this discussion is not objective and is counterproductive to the
> technical discussion.
>
> >> You are basically suggesting that if any vendor ever has an out of tree
> >> option for its hardware every patch it sends should be considered a ruse
> >> to enable or simplify proprietary options.
>
> It's apparent that you're attributing sinister agendas to patchsets when
> you fail to offer valid technical opinions regarding the NAK nature. Let's
> address this outside of this patchset, as this isn't the first occurrence.
> Consistency in evaluating patches is crucial;
Exactly :| Netdev people, including multiple prominent developers from
Mellanox/nVidia have been nacking SDK interfaces in Linux networking
for 20 years. How are we going to look to all the companies which have
been doing IPUs for over a decade if we change the rules for nVidia?
> some, like the fbnic and idpf, seem to go unquestioned, while others
> face scrutiny.
fbnic got a nack for any core changes or uAPI not used by other drivers.
idpf got a nack for pretending to be a standard.
You keep saying that I'm nacking your interface because I have some
hatred and distrust for you or nVidia. I really, really don't.
Any vendor posting this would get exactly the same nack from me.
If by "let's address this outside of this patchset" you mean that we
should have a discussion about maintainer favoritism, and subsystem
capture by vendors - you have my full support!
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-04 14:04 ` Jakub Kicinski
2024-06-04 21:28 ` Saeed Mahameed
@ 2024-06-04 23:56 ` Dan Williams
2024-06-05 3:05 ` Jakub Kicinski
` (2 more replies)
2024-06-06 1:58 ` David Ahern
2 siblings, 3 replies; 73+ messages in thread
From: Dan Williams @ 2024-06-04 23:56 UTC (permalink / raw)
To: Jakub Kicinski, David Ahern
Cc: Jason Gunthorpe, Jonathan Corbet, Itay Avraham, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
Jakub Kicinski wrote:
[..]
> I don't begrudge anyone building proprietary options, but leave
> upstream out of it.
So I am of 2 minds here. In general, how is upstream benefited by
requiring every vendor command to be wrapped by a Linux command?
Mind you, I am coming at this from the perspective of being a maintainer
of a subsystem that does *not* allow unrestricted vendor commands. Since
day one, the CXL subsystem has matched netdev's general sentiment and
been more restrictive than NVMe. It places all vendor commands and even
all yet-to-be-Linux-wrapped-standard-commands behind a
CONFIG_CXL_MEM_RAW_COMMANDS option. That default-off option, when
enabled, allows any command to be sent but it taints the kernel with a
WARN(). CXL devices theoretically allow direct manipulation of system
memory without IOMMU protection which is in contrast to NVMe which would
need to work harder to violate kernel-lockdown protections.
The expectation that I laid out here [1] is based on the observation
that a significant portion of the vendor commands these devices support
are for pre-release hardware qualification and debug flows. The
recommendation to device vendors was "if you need wide distribution of
kernels that allow unrestricted vendor passthrough, work with Linux
distributions to enable this option in debug kernels, run those debug
kernels for your pre-release hardware flows, ignore the warnings".
3 years on from that recommendation it seems no vendor has even needed
that level of distribution help. I.e. checking a few distro kernels
(Fedora, openSUSE) shows no uptake for CONFIG_CXL_MEM_RAW_COMMANDS=y in
their debug builds. I can only assume that locally compiled custom
kernel binaries are filling the need.
So all seems quiet with current restriction for CXL endpoint vendor
commands, but this stance was recently challenged in this thread [2] by
CXL switch vendors with an assertion that fabric switch configuration
has need for more and varied vendor flows than endpoint configuration.
While I am not clear on the veracity of that claim, it at least
challenged me to do the thought experiment of "what would it look like
to relax the CXL command restriction?". Maybe we can come up with a
community answer to the "so you want to build a
userspace-to-device-firmware tunnel?" to at least get all the various
concerns documented in one place, and provide guidance for how device
vendors should navigate this space across subsystems. Between NVMe
"allow all the things", CXL "allow all the things only after tainting
the kernel", and the "never allow vendor passthrough" position (I am
sure there are other nuanced positions) it at least seems useful to
document the concerns. Here is a start for that guidance from the CXL
perspective:
* Integrity: Subsystem has a responsibility to meet kernel-lockdown
expectations:
Distros and system owners need to be assured that root's ability to
modify the running kernel image is mitigated. For CXL there are 2 ways
to do this: require Linux wrapper commands for all the low level
commands (status quo), or a new approach of trusting the device to publish
which commands have user data effects in something CXL calls the "Command
Effects Log". In that "trust Command Effects" scenario the kernel still
has no idea what the command is actually doing, but it can at least
assert that the device does not claim that the command changes the
contents of system-memory. Now, you might say, "the device can just
lie", but that betrays a conceit of the kernel restriction. A device
could lie that a Linux wrapped command when passed certain payloads does
not in turn proxy to a restricted command. So at some point there is
almost always an out-of-tree way to get around the kernel restriction,
so the question is: are we better off giving a blessed path or forcing
vendors into ugly out-of-tree workarounds?
* Introspection / validation: Subsystem community needs to be able to
audit behavior after the fact.
To me this means even if the kernel is letting a command through based
on the stated Command Effect of "Configuration Change after Cold Reset",
the upstream community needs to be able to read the vendor
specification for that command. I.e. commands might be vendor-specific,
but never vendor-private. I see this as similar to the requirement for
open source userspace for sophisticated accelerators.
* Collaboration: open standards support open driver maintenance.
Without standards we end up with awkward situations like Confidential
Computing where every vendor races to implement the same functionality
in arbitrarily different and vendor specific ways.
For CXL devices, and I believe the devices fwctl is targeting, there
is a whole class of commands for vendor specific configuration and
debug. Commands that the kernel really need not worry about.
Some subsystems may want to allow high-performance science experiments
like what NVMe allows, but it seems worth asking the question if
standardizing device configuration and debug is really the best use of
upstream's limited time?
One of the release valves in the CXL space is openly specified
commands with opaque payloads, like "Read Vendor Debug Log". It is
clear what that command does, its payload is likely one the kernel need
never worry about, and the "Command Effects" is empty. However, going forward there
is a new class of commands called "Set/Get Feature" that allow a wide
range of vendor toggles to be deployed which will need an upstream
response for the driver policy to vendor-specific "Features".
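As a rough illustration of the "trust Command Effects" policy from the
Integrity item above: the kernel does not interpret the command, it only
checks what the device claims about it. The sketch below is a userspace toy
under loud assumptions. The effect bit values, struct cmd_effect and
cmd_allowed() are all hypothetical and are NOT the encoding or API of the CXL
specification or the CXL driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical effect bits; NOT the encoding used by the CXL specification. */
#define EFFECT_CONFIG_CHANGE_COLD_RESET	(1u << 0)
#define EFFECT_CHANGES_USER_DATA	(1u << 1)

/* Effects that are refused passthrough under this toy policy. */
#define EFFECT_DENY_MASK		EFFECT_CHANGES_USER_DATA

/* One entry of a device-published effects log (hypothetical layout). */
struct cmd_effect {
	uint16_t opcode;
	uint32_t effects;
};

/*
 * Return true only if the device lists the opcode and claims no denied
 * effect for it. Opcodes missing from the log are rejected.
 */
static bool cmd_allowed(const struct cmd_effect *log, size_t n, uint16_t opcode)
{
	for (size_t i = 0; i < n; i++) {
		if (log[i].opcode != opcode)
			continue;
		return !(log[i].effects & EFFECT_DENY_MASK);
	}
	return false;
}
```

Note that the check can only ever be as honest as the log the device
publishes, which is exactly the "the device can just lie" caveat above.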
So if fwctl, or something like it, can strike a balance of enforcing
integrity and introspection while encouraging collaboration on the
aspects that are worth upstream collaboration, I think that is a
conversation worth having.
[1]: http://lore.kernel.org/r/CAPcyv4gDShAYih5iWabKg_eTHhuHm54vEAei8ZkcmHnPp3B0cw@mail.gmail.com/
[2]: http://lore.kernel.org/r/20240321174423.00007e0d@Huawei.com
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 6/8] fwctl: Add documentation
2024-06-03 15:53 ` [PATCH 6/8] fwctl: Add documentation Jason Gunthorpe
@ 2024-06-05 2:31 ` Randy Dunlap
2024-06-05 16:03 ` Jason Gunthorpe
0 siblings, 1 reply; 73+ messages in thread
From: Randy Dunlap @ 2024-06-05 2:31 UTC (permalink / raw)
To: Jason Gunthorpe, Jonathan Corbet, Itay Avraham, Jakub Kicinski,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On 6/3/24 8:53 AM, Jason Gunthorpe wrote:
> Document the purpose and rules for the fwctl subsystem.
>
> Link in kdocs to the doc tree.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> Documentation/userspace-api/fwctl.rst | 269 ++++++++++++++++++++++++++
> Documentation/userspace-api/index.rst | 1 +
> 2 files changed, 270 insertions(+)
> create mode 100644 Documentation/userspace-api/fwctl.rst
>
> diff --git a/Documentation/userspace-api/fwctl.rst b/Documentation/userspace-api/fwctl.rst
> new file mode 100644
> index 00000000000000..630e75a91838f0
> --- /dev/null
> +++ b/Documentation/userspace-api/fwctl.rst
> @@ -0,0 +1,269 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============
> +fwctl subsystem
> +===============
> +
> +:Author: Jason Gunthorpe
> +
> +Overview
> +========
> +
> +Modern devices contain extensive amounts of FW, and in many cases, are largely
> +software defined pieces of hardware. The evolution of this approach is largely a
software-defined
> +reaction to Moore's Law where a chip tape out is now highly expensive, and the
> +chip design is extremely large. Replacing fixed HW logic with a flexible and
> +tightly coupled FW/HW combination is an effective risk mitigation against chip
> +respin. Problems in the HW design can be counteracted in device FW. This is
> +especially true for devices which present a stable and backwards compatible
> +interface to the operating system driver (such as NVMe).
> +
> +The FW layer in devices has grown to incredible sizes and devices frequently
> +integrate clusters of fast processors to run it. For example, mlx5 devices have
> +over 30MB of FW code, and big configurations operate with over 1GB of FW managed
> +runtime state.
> +
> +The availability of such a flexible layer has created quite a variety in the
> +industry where single pieces of silicon are now configurable software defined
software-defined
> +devices and can operate in substantially different ways depending on the need.
> +Further we often see cases where specific sites wish to operate devices in ways
Further,
like in the next paragraph.
> +that are highly specialized and require applications that have been tailored to
> +their unique configuration.
> +
> +Further, devices have become multi-functional and integrated to the point they
> +no longer fit neatly into the kernel's division of subsystems. Modern
> +multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
> +subsystems while sharing the underlying hardware using the auxiliary device
> +system.
> +
> +All together this creates a challenge for the operating system, where devices
> +have an expansive FW environment that needs robust device-specific debugging
> +support, and FW driven functionality that is not well suited to “generic”
FW-driven
> +interfaces. fwctl seeks to allow access to the full device functionality from
> +user space in the areas of debuggability, management, and first-boot/nth-boot
> +provisioning.
> +
> +fwctl is aimed at the common device design pattern where the OS and FW
> +communicate via an RPC message layer constructed with a queue or mailbox scheme.
> +In this case the driver will typically have some layer to deliver RPC messages
> +and collect RPC responses from device FW. The in-kernel subsystem drivers that
> +operate the device for its primary purposes will use these RPCs to build their
> +drivers, but devices also usually have a set of ancillary RPCs that don't really
> +fit into any specific subsystem. For example, a HW RAID controller is primarily
> +operated by the block layer but also comes with a set of RPCs to administer the
> +construction of drives within the HW RAID.
> +
> +In the past when devices were more single function individual subsystems would
function,
> +grow different approaches to solving some of these common problems, for instance
problems. For instance
> +monitoring device health, manipulating its FLASH, debugging the FW,
> +provisioning, all have various unique interfaces across the kernel.
> +
> +fwctl's purpose is to define a common set of limited rules, described below,
> +that allow user space to securely construct and execute RPCs inside device FW.
> +The rules serve as an agreement between the operating system and FW on how to
> +correctly design the RPC interface. As a uAPI the subsystem provides a thin
> +layer of discovery and a generic uAPI to deliver the RPCs and collect the
> +response. It supports a system of user space libraries and tools which will
> +use this interface to control the device using the device native protocols.
> +
> +Scope of Action
> +---------------
> +
> +fwctl drivers are strictly restricted to being a way to operate the device FW.
> +It is not an avenue to access random kernel internals, or other operating system
> +SW states.
> +
> +fwctl instances must operate on a well-defined device function, and the device
> +should have a well-defined security model for what scope within the physical
> +device the function is permitted to access. For instance, the most complex PCIe
> +device today may broadly have several function level scopes:
function-level
> +
> + 1. A privileged function with full access to the on-device global state and
> + configuration
> +
> + 2. Multiple hypervisor functions with control over itself and child functions
> + used with VMs
> +
> + 3. Multiple VM functions tightly scoped within the VM
> +
> +The device may create a logical parent/child relationship between these scopes,
scopes;
or end that line with a period and begin the next one with "For".
> +for instance a child VM's FW may be within the scope of the hypervisor FW. It is
> +quite common in the VFIO world that the hypervisor environment has a complex
> +provisioning/profiling/configuration responsibility for the function VFIO
> +assigns to the VM.
> +
> +Further, within the function, devices often have RPC commands that fall within
> +some general scopes of action:
> +
> + 1. Access to function & child configuration, flash, etc that becomes live at a
etc.
Use FLASH as above or change above FLASH to "flash".
> + function reset.
> +
> + 2. Access to function & child runtime configuration that kernel drivers can
> + discover at runtime.
> +
> + 3. Read only access to function debug information that may report on FW objects
Read-only
> + in the function & child, including FW objects owned by other kernel
> + subsystems.
> +
> + 4. Write access to function & child debug information strictly compatible with
> + the principles of kernel lockdown and kernel integrity protection. Triggers
> + a kernel Taint.
> +
> + 5. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
> +
> +Userspace will provide a scope label on each RPC and the kernel must enforce the
> +above CAP's and taints based on that scope. A combination of kernel and FW can
> +enforce that RPCs are placed in the correct scope by userspace.
> +
> +Denied behavior
> +---------------
> +
> +There are many things this interface must not allow user space to do (without a
> +Taint or CAP), broadly derived from the principles of kernel lockdown. Some
> +examples:
> +
> + 1. DMA to/from arbitrary memory, hang the system, run code in the device, or
> + otherwise compromise device or system security and integrity.
> +
> + 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
> + objects owned by kernel drivers.
> +
> + 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
> + driver can react to the device configuration at function reset/driver load
> + time, but otherwise should not be coupled to fwctl.
> +
> + 4. Operate the HW in a way that overlaps with the core purpose of another
> + primary kernel subsystem, such as read/write to LBAs, send/receive of
> + network packets, or operate an accelerator's data plane.
> +
> +fwctl is not a replacement for device direct access subsystems like uacce or
> +VFIO.
> +
> +fwctl User API
> +==============
> +
> +.. kernel-doc:: include/uapi/fwctl/fwctl.h
> +.. kernel-doc:: include/uapi/fwctl/mlx5.h
> +
> +sysfs Class
> +-----------
> +
> +fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
> +(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
> +operates the ioctl uAPI described above.
> +
> +fwctl devices can be related to driver components in other subsystems through
> +sysfs::
> +
> + $ ls /sys/class/fwctl/fwctl0/device/infiniband/
> + ibp0s10f0
> +
> + $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
> + fwctl0/
> +
> + $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
> + dev device power subsystem uevent
> +
> +User space Community
> +--------------------
> +
> +Drawing inspiration from nvme-cli, participating in the kernel side must come
> +with a user space in a common TBD git tree, at a minimum to usefully operate the
> +kernel driver. Providing such an implementation is a pre-condition to merging a
> +kernel driver.
> +
> +The goal is to build user space community around some of the shared problems
> +we all have, and ideally develop some common user space programs with some
> +starting themes of:
> +
> + - Device in-field debugging
> +
> + - HW provisioning
> +
> + - VFIO child device profiling before VM boot
> +
> + - Confidential Compute topics (attestation, secure provisioning)
> +
> +That stretches across all subsystems in the kernel. fwupd is a great example of
that stretch across
> +how an excellent user space experience can emerge out of kernel-side diversity.
> +
> +fwctl Kernel API
> +================
> +
> +.. kernel-doc:: drivers/fwctl/main.c
> + :export:
> +.. kernel-doc:: include/linux/fwctl.h
> +
> +fwctl Driver design
> +-------------------
> +
> +In many cases a fwctl driver is going to be part of a larger cross-subsystem
> +device possibly using the auxiliary_device mechanism. In that case several
> +subsystems are going to be sharing the same device and FW interface layer so the
> +device design must already provide for isolation and co-operation between kernel
cooperation
> +subsystems. fwctl should fit into that same model.
> +
> +Part of the driver should include a description of how its scope restrictions
> +and security model work. The driver and FW together must ensure that RPCs
> +provided by user space are mapped to the appropriate scope. If the validation is
> +done in the driver then the validation can read a 'command effects' report from
> +the device, or hardwire the enforcement. If the validation is done in the FW,
> +then the driver should pass the fwctl_rpc_scope to the FW along with the command.
> +
> +The driver and FW must co-operate to ensure that either fwctl cannot allocate
cooperate
> +any FW resources, or any resources it does allocate are freed on FD closure. A
> +driver primarily constructed around FW RPCs may find that its core PCI function
> +and RPC layer belongs under fwctl with auxiliary devices connecting to other
> +subsystems.
> +
> +Each device type must represent a stable FW ABI, such that the userspace
> +components have the same general stability we expect from the kernel. FW upgrade
> +should not break the userspace tools.
> +
> +Security Response
> +=================
> +
> +The kernel remains the gatekeeper for this interface. If violations of the
> +scopes, security or isolation principles are found, we have options to let
> +devices fix them with a FW update, push a kernel patch to parse and block RPC
> +commands or push a kernel patch to block entire firmware versions, or devices.
no comma needed ^
> +
> +While the kernel can always directly parse and restrict RPCs, it is expected
> +that the existing kernel pattern of allowing drivers to delegate validation to
> +FW to be a useful design.
(and one that can be abused...)
> +
> +Existing Similar Examples
> +=========================
> +
> +The approach described in this document is not a new idea. Direct, or near
> +direct device access has been offered by the kernel in different areas for
> +decades. With more devices wanting to follow this design pattern it is becoming
> +clear that it is not entirely well understood and, more importantly, the
> +security considerations are not well defined or agreed upon.
> +
> +Some examples:
> +
> + - HW RAID controllers. This includes RPCs to do things like compose drives into
> + a RAID volume, configure RAID parameters, monitor the HW and more.
> +
> + - Baseboard managers. RPCs for configuring settings in the device and more
> +
> + - NVMe vendor command capsules. nvme-cli provides access to some monitoring
> + functions that different products have defined, but more exists.
> +
> + - CXL also has a NVMe like vendor command system.
NVMe-like
> +
> + - DRM allows user space drivers to send commands to the device via kernel
> + mediation
> +
> + - RDMA allows user space drivers to directly push commands to the device
> + without kernel involvement
> +
> + - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc
> +
> +The first 4 would be examples of areas that fwctl intends to cover.
I would s/would be/are/ fwiw.
> +
> +Some key lessons learned from these past efforts are the importance of having a
> +common user space project to use as a pre-condition for obtaining a kernel
> +driver. Developing good community around useful software in user space is key to
> +getting companies to fund participation to enable their products.
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 5926115ec0ed86..9685942fc8a21f 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -43,6 +43,7 @@ Devices and I/O
>
> accelerators/ocxl
> dma-buf-alloc-exchange
> + fwctl
> gpio/index
> iommu
> iommufd
--
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-04 23:56 ` Dan Williams
@ 2024-06-05 3:05 ` Jakub Kicinski
2024-06-05 11:19 ` Jonathan Cameron
2024-06-05 13:59 ` Jason Gunthorpe
2 siblings, 0 replies; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-05 3:05 UTC (permalink / raw)
To: Dan Williams
Cc: David Ahern, Jason Gunthorpe, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Tue, 4 Jun 2024 16:56:57 -0700 Dan Williams wrote:
> Jakub Kicinski wrote:
> [..]
> > I don't begrudge anyone building proprietary options, but leave
> > upstream out of it.
>
> So I am of 2 minds here. In general, how is upstream benefited by
> requiring every vendor command to be wrapped by a Linux command?
> [...]
Thanks for sharing the CXL experience and your perspective.
Also for trying to frame the discussion in a useful way,
although I have little faith that it will help :( Fingers crossed?
> * Integrity: Subsystem has a responsibility to meet kernel-lockdown
> expectations:
>
> Distros and system owners need to be assured that root's ability to
> modify the running kernel image are mitigated. For CXL there are 2 ways
> to do this, require Linux wrapper commands for all the low level
> commands (status quo), or a new trust the device to publish which
> commands have user data effects in something CXL calls the "Command
> Effects Log". In that "trust Command Effects" scenario the kernel still
> has no idea what the command is actually doing, but it can at least
> assert that the device does not claim that the command changes the
> contents of system-memory. Now, you might say, "the device can just
> lie", but that betrays a conceit of the kernel restriction. A device
> could lie that a Linux wrapped command when passed certain payloads does
> not in turn proxy to a restricted command. So at some point there is
> almost always an out-of-tree way to get around the kernel restriction,
> so the question is are we better off giving a blessed path or force
> vendors into ugly out-of-tree workarounds?
The integrity thing is a double-edged sword, so I don't have much to say
here. If we take a few wrong turns we'll wrap the vendor commands with
crypto and then the vendor can control which commands you get to run ;)
Obviously I'm joking, and not saying that is the intent of the current
series! But it's about as realistic as "this will only be used for truly
vendor specific things".
> * Introspection / validation: Subsystem community needs to be able to
> audit behavior after the fact.
>
> To me this means even if the kernel is letting a command through based
> on the stated Command Effect of "Configuration Change after Cold Reset"
> upstream community has a need to be able to read the vendor
> specification for that command. I.e. commands might be vendor-specific,
> but never vendor-private. I see this as similar to the requirement for
> open source userspace for sophisticated accelerators.
That sounds pretty CXL specific, and IIUC unrealistic.
You assume you have some specification to consult, while this discussion
has been going for over a year now, and I can't get the vendors to share
what those tunables they so desperately need to tweak are.
> * Collaboration: open standards support open driver maintenance.
>
> Without standards we end up with awkward situations like Confidential
> Computing where every vendor races to implement the same functionality
> in arbitrarily different and vendor specific ways.
>
> For CXL devices, and I believe the devices fwctl is targeting, there
> are a whole class of commands for vendor specific configuration and
> debug. Commands that the kernel really need not worry about.
>
> Some subsystems may want to allow high-performance science experiments
> like what NVMe allows, but it seems worth asking the question if
> standardizing device configuration and debug is really the best use of
> upstream's limited time?
No, but it's not about science experiments, really. It's about
production features. The effort of implementing something properly
upstream is high. It costs time and money to get the right caliber of
people and let them go thru the revisions. I lack confidence that
merging fwctl will not negatively impact motivation for companies to
pay off our accrued technical debt. While all they need is "this simple
little feature". And before competition wins the customer. It's a race
to the bottom.
> One of the release valves in the CXL space is openly specified
> commands with opaque payloads, like "Read Vendor Debug Log". That is
> clear what it does, likely a payload the kernel need never worry
> about, and the "Command Effects" is empty. However, going forward there
> is a new class of commands called "Set/Get Feature" that allow a wide
> range of vendor toggles to be deployed which will need an upstream
> response for the driver policy to vendor-specific "Features".
>
> So if fwctl, or something like it, can strike a balance of enforcing
> integrity and introspection while encouraging collaboration on the
> aspects that are worth upstream collaboration, I think that is a
> conversation worth having.
I presume you were trying to underscore that the decision is unavoidably
a trade off, which is true. But I don't follow the exact formulation.
Is fwctl helping integrity or collaboration? If we assume use of vendor
tools is unavoidable, then I guess integrity? I really can't see how it
helps collaboration when everyone ships their custom tool set.
Back to the tradeoff. For networking, which is a _very_ mature subsystem
with a ton of standards, the need to do "vendor specific things" is
marginal. The downside of the loss of an "upstream advantage" is obvious.
We need to take such decisions on a subsystem-by-subsystem basis.
You should be able to draw the lines differently for CXL than how we
draw them for TCP/IP.
On the technical level the discussion can't go very far, because I'd
like to hear actual user problems. But I can't even get a list of those
infamous thousands of knobs :|
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
` (8 preceding siblings ...)
2024-06-03 18:42 ` [PATCH 0/8] Introduce fwctl subystem Jakub Kicinski
@ 2024-06-05 3:11 ` Jakub Kicinski
2024-06-05 12:06 ` Jason Gunthorpe
9 siblings, 1 reply; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-05 3:11 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jonathan Corbet, Itay Avraham, Leon Romanovsky, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Mon, 3 Jun 2024 12:53:16 -0300 Jason Gunthorpe wrote:
> Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
Please double check with Broadcom if they are still supportive,
in the current form.
Please include lore links to previous postings.
Please carry my nack on future versions. At least as long as
the write access checks are.. good-faith-based.
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-04 16:58 ` Jason Gunthorpe
@ 2024-06-05 11:07 ` Jonathan Cameron
2024-06-05 18:27 ` Jason Gunthorpe
0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Cameron @ 2024-06-05 11:07 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Zhu Yanjun, Jonathan Corbet, Itay Avraham,
Jakub Kicinski, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, David Ahern, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, linux-cxl, patches
On Tue, 4 Jun 2024 13:58:44 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Tue, Jun 04, 2024 at 05:50:23PM +0100, Jonathan Cameron wrote:
>
> > > > > static int fwctl_fops_open(struct inode *inode, struct file *filp)
> > > > > {
> > > > > struct fwctl_device *fwctl =
> > > > > container_of(inode->i_cdev, struct fwctl_device, cdev);
> > > > > + struct fwctl_uctx *uctx __free(kfree) = NULL;
> > > > > + int ret;
> > > > > +
> > > > > + guard(rwsem_read)(&fwctl->registration_lock);
> > > > > + if (!fwctl->ops)
> > > > > + return -ENODEV;
> > > > > +
> > > > > + uctx = kzalloc(fwctl->ops->uctx_size, GFP_KERNEL | GFP_KERNEL_ACCOUNT);
> > > > > + if (!uctx)
> > > > > + return -ENOMEM;
> > > > > +
> > > > > + uctx->fwctl = fwctl;
> > > > > + ret = fwctl->ops->open_uctx(uctx);
> > > > > + if (ret)
> > > > > + return ret;
> > > >
> > > > When something is wrong, uctx is freed in "fwctl->ops->open_uctx(uctx);"?
> > > >
> > > > If not, the allocated memory uctx leaks here.
> > >
> > > See how uctx is declared:
> > > struct fwctl_uctx *uctx __free(kfree) = NULL;
> > >
> > > It will be released automatically.
> > > See include/linux/cleanup.h for more details.
> >
> > I'm lazy so not finding the discussion now, but Linus has been pretty clear
> > that he doesn't like this pattern because of possibility of additional cleanup
> > magic getting introduced and then the cleanup happening in an order that
> > causes problems.
>
> I saw that discussion, but I thought it was talking about the macro
> behavior - ie guard() creates a variable hidden in the macro.
>
> The point about order is interesting though - notice the above will
> free the uctx after unlocking (which is the slightly more preferred
> order here), but it is easy to imagine cases where that order would be
> wrong.
>
> > Preferred option is drag the declaration to where is initialized so break
> > with our tradition of declarations all at the top
> >
> > struct fwctl_uctx *uctx __free(kfree) =
> > kzalloc(...);
>
> I don't recall that dramatic conclusion in the discussion, but it does
> make a lot of sense to me.
I'll be less lazy (and today found the search foo to track it down).
https://lore.kernel.org/all/CAHk-=wicfvWPuRVDG5R1mZSxD8Xg=-0nLOiHay2T_UJ0yDX42g@mail.gmail.com/
Linus:
> IOW, my current thinking is "let's always have the constructor and
> destructor together", and see how it ends up going.
Not set in stone but I've not yet seen a suggestion of the opposite.
The example from Bartosz that got that response was
Bartosz:
> void foo(void)
> {
> char *s __free(kfree) = NULL;
>
> do_stuff();
> s = kmalloc(42, GFP_KERNEL);
> }
>
> Or does it always have to be:
>
> void foo(void)
> {
> do_stuff();
> char *s __free(kfree) = kmalloc(42, GFP_KERNEL);
> }
So option 2.
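To make the ordering point concrete, here is a minimal userspace sketch, not
the fwctl code. It assumes GCC or Clang, whose cleanup attribute is what the
kernel's __free() and guard() are built on. free_buf(), unlock(), auto_free,
auto_unlock and the cleanup_log buffer are hypothetical stand-ins invented for
illustration. Cleanups run in reverse order of declaration, so where the
declaration sits decides whether the free happens before or after the unlock:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event log so the cleanup order can be inspected. */
static char cleanup_log[16];

static void log_event(char c)
{
	size_t n = strlen(cleanup_log);

	if (n + 1 < sizeof(cleanup_log))
		cleanup_log[n] = c;
}

/* Stand-in for the kernel's __free(kfree): frees the pointer at scope exit. */
static void free_buf(char **p)
{
	free(*p);
	log_event('F');
}

/* Stand-in for guard(): drops the pretend lock at scope exit. */
static void unlock(int *lock)
{
	*lock = 0;
	log_event('U');
}

#define auto_free   __attribute__((cleanup(free_buf)))
#define auto_unlock __attribute__((cleanup(unlock)))

/* Option 1: destructor declared at the top, constructor runs later. */
static void decl_at_top(void)
{
	char *s auto_free = NULL;
	int lock auto_unlock = 1;

	s = malloc(42);
}

/* Option 2: constructor and destructor together, after taking the lock. */
static void decl_at_init(void)
{
	int lock auto_unlock = 1;
	char *s auto_free = malloc(42);

	(void)s;
}

/* Run one variant and return the recorded cleanup order. */
static const char *run(void (*fn)(void))
{
	assert(fn != NULL);
	memset(cleanup_log, 0, sizeof(cleanup_log));
	fn();
	return cleanup_log;
}
```

With the declaration at the top, run(decl_at_top) records the unlock first and
then the free. With the declaration at the point of initialization,
run(decl_at_init) records the free first and then the unlock. Neither order is
right in general, which is why keeping the constructor next to the destructor
makes the resulting order easier to see when reading the function.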
Jonathan
>
> Thanks,
> Jason
* Re: [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev
2024-06-04 18:52 ` Jason Gunthorpe
@ 2024-06-05 11:08 ` Jonathan Cameron
0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Cameron @ 2024-06-05 11:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Jonathan Corbet, Itay Avraham, Jakub Kicinski,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
linux-cxl, patches
On Tue, 4 Jun 2024 15:52:00 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Tue, Jun 04, 2024 at 06:05:55PM +0100, Jonathan Cameron wrote:
>
> > Trick for this is often to define a small function that allocates both the
> > ida and the device. With in that micro function handle the one error path
> > or if you only have two things to do, you can use __free() for the allocation.
>
> This style is already followed here, the _alloc_device() is the
> function that does everything before starting reference counting (IMHO
> it is the best pattern to use). If we move the ida allocation to that
> function then the if inside release is not needed.
>
> Like this:
LGTM (this specific code, not commenting on fwctl in general yet as needs
more thinking time!)
>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index d25b5eb3aee73c..a26697326e6ced 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -267,8 +267,7 @@ static void fwctl_device_release(struct device *device)
> struct fwctl_device *fwctl =
> container_of(device, struct fwctl_device, dev);
>
> - if (fwctl->dev.devt)
> - ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
> + ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
> mutex_destroy(&fwctl->uctx_list_lock);
> kfree(fwctl);
> }
> @@ -288,6 +287,7 @@ static struct fwctl_device *
> _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> {
> struct fwctl_device *fwctl __free(kfree) = kzalloc(size, GFP_KERNEL);
> + int devnum;
>
> if (!fwctl)
> return NULL;
> @@ -296,6 +296,12 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> init_rwsem(&fwctl->registration_lock);
> mutex_init(&fwctl->uctx_list_lock);
> INIT_LIST_HEAD(&fwctl->uctx_list);
> +
> + devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
> + if (devnum < 0)
> + return NULL;
> + fwctl->dev.devt = fwctl_dev + devnum;
> +
> device_initialize(&fwctl->dev);
> return_ptr(fwctl);
> }
> @@ -307,16 +313,10 @@ struct fwctl_device *_fwctl_alloc_device(struct device *parent,
> {
> struct fwctl_device *fwctl __free(fwctl) =
> _alloc_device(parent, ops, size);
> - int devnum;
>
> if (!fwctl)
> return NULL;
>
> - devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
> - if (devnum < 0)
> - return NULL;
> - fwctl->dev.devt = fwctl_dev + devnum;
> -
> cdev_init(&fwctl->cdev, &fwctl_fops);
> fwctl->cdev.owner = THIS_MODULE;
>
> Jason
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-04 23:56 ` Dan Williams
2024-06-05 3:05 ` Jakub Kicinski
@ 2024-06-05 11:19 ` Jonathan Cameron
2024-06-05 13:59 ` Jason Gunthorpe
2 siblings, 0 replies; 73+ messages in thread
From: Jonathan Cameron @ 2024-06-05 11:19 UTC (permalink / raw)
To: Dan Williams
Cc: Jakub Kicinski, David Ahern, Jason Gunthorpe, Jonathan Corbet,
Itay Avraham, Leon Romanovsky, linux-doc, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan, Andy Gospodarek,
Aron Silverton, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
> One of the release valves in the CXL space is openly specified
> commands with opaque payloads, like "Read Vendor Debug Log". That is
> clear what it does, likely a payload the kernel need never worry
> about, and the "Command Effects" is empty. However, going forward there
> is a new class of commands called "Set/Get Feature" that allow a wide
> range of vendor toggles to be deployed which will need an upstream
> response for the driver policy to vendor-specific "Features".
Irrelevant rat hole time ;)
I don't see those Set/Get Feature commands as any different from other commands.
I see them as a convenience mostly there to cut down on spec duplication
and enforce some consistency across multiple similar commands, but they
are just commands like any other; validation is just one step further
into the payload.
There are already a bunch of them in the main CXL spec and like you mention
above if someone brings a well documented vendor feature (or feature from
another standard etc), then if appropriate we could let that through the
filter as well.
Same will be true of tunneled commands (I think we can ignore the cross
host security aspect of those). Ultimately we can sanity check the payload
much like a top level command.
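As a rough illustration of "validation is just one step further into the
payload", here is a sketch of such a filter. Everything in it is an assumption
made for illustration: the opcode values, the allowed_features table and the
idea that the payload starts with a 16-byte feature identifier are hypothetical
placeholders, not the CXL driver's actual code or the spec's exact layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes, chosen only for this sketch. */
enum { OP_READ_DEBUG_LOG = 1, OP_SET_FEATURE = 2 };

#define FEAT_UUID_LEN 16

/* Hypothetical allowlist of documented vendor features. */
static const uint8_t allowed_features[][FEAT_UUID_LEN] = {
	{ 0xaa, 0x01 },
};

/* Check the feature identifier against the allowlist. */
static bool feature_allowed(const uint8_t *uuid)
{
	size_t n = sizeof(allowed_features) / sizeof(allowed_features[0]);

	for (size_t i = 0; i < n; i++)
		if (!memcmp(uuid, allowed_features[i], FEAT_UUID_LEN))
			return true;
	return false;
}

/* Top-level filter: opcode first, then one step into the payload. */
static bool cmd_allowed(uint16_t opcode, const uint8_t *payload, size_t len)
{
	switch (opcode) {
	case OP_READ_DEBUG_LOG:
		/* Opaque payload with no declared effects: let it through. */
		return true;
	case OP_SET_FEATURE:
		/* Too short to even carry a feature identifier: reject. */
		if (len < FEAT_UUID_LEN)
			return false;
		return feature_allowed(payload);
	default:
		/* Anything unknown is denied by default. */
		return false;
	}
}
```

The only difference between the two allowed cases is that the feature command
needs a second lookup keyed on the start of its payload. A tunneled command
could be handled the same way by applying the filter again to the inner
command.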
So I mostly agree with the rest of what you've said, but think this detail
doesn't matter.
>
> So if fwctl, or something like it, can strike a balance of enforcing
> integrity and introspection while encouraging collaboration on the
> aspects that are worth upstream collaboration, I think that is a
> conversation worth having.
>
> [1]: http://lore.kernel.org/r/CAPcyv4gDShAYih5iWabKg_eTHhuHm54vEAei8ZkcmHnPp3B0cw@mail.gmail.com/
> [2]: http://lore.kernel.org/r/20240321174423.00007e0d@Huawei.com
>
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-05 3:11 ` Jakub Kicinski
@ 2024-06-05 12:06 ` Jason Gunthorpe
0 siblings, 0 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-05 12:06 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jonathan Corbet, Itay Avraham, Leon Romanovsky, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Tue, Jun 04, 2024 at 08:11:03PM -0700, Jakub Kicinski wrote:
> On Mon, 3 Jun 2024 12:53:16 -0300 Jason Gunthorpe wrote:
> > Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
>
> Please double check with Broadcom if they are still supportive,
> in the current form.
They are free to comment.
> Please include lore links to previous postings.
The link to mlx5ctl is already in the cover letter and Saeed linked
from there to enough of the prior stuff.
> Please carry my nack on future version. At least as long as
> the write access checks are.. good-faith-based.
I will include the acks and nacks related to the general concept on
the documentation patch 6 along with links and a mention in the PR
when we get there.
Jason
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-04 23:56 ` Dan Williams
2024-06-05 3:05 ` Jakub Kicinski
2024-06-05 11:19 ` Jonathan Cameron
@ 2024-06-05 13:59 ` Jason Gunthorpe
2024-06-06 2:35 ` David Ahern
` (2 more replies)
2 siblings, 3 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-05 13:59 UTC (permalink / raw)
To: Dan Williams
Cc: Jakub Kicinski, David Ahern, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Tue, Jun 04, 2024 at 04:56:57PM -0700, Dan Williams wrote:
> Jakub Kicinski wrote:
> [..]
> > I don't begrudge anyone building proprietary options, but leave
> > upstream out of it.
>
> So I am of 2 minds here. In general, how is upstream benefited by
> requiring every vendor command to be wrapped by a Linux command?
People actually can use upstream :)
Amazingly there is inherent benefit to people being able to use the
software we produce.
> 3 years on from that recommendation it seems no vendor has even needed
> that level of distribution help. I.e. checking a few distro kernels
> (Fedora, openSUSE) shows no uptake for CONFIG_CXL_MEM_RAW_COMMANDS=y in
> their debug builds. I can only assume that locally compiled custom
> kernel binaries are filling the need.
My strong advice would be to be careful about this. Android-ism where
nobody runs the upstream kernel is a real thing. For something
emerging like CXL there is a real risk that the hyperscale folks will
go off and do their own OOT stuff and in-tree CXL will be something
usable but inferior. I've seen this happen enough times..
If people come and say we need X and the maintainer says no, they
don't just give up and stop doing X, they go and do X anyhow out of
tree. This has become especially true now that the center of business
activity in server-Linux is driven by the hyperscale crowd that don't
care much about upstream. Linux maintainers don't actually have the
power to force the industry to do things, though people do keep
trying.. Maintainers can only lead, and productive leading is not done
with a NO.
You will start to see this pain in maybe 5-10 years if CXL starts to
be something deployed in an enterprise RedHat/Dell/etc sort of
environment. Then that missing X becomes a critical issue because it
turns out the hyperscale folks long since figured out it is really
important but didn't do anything to enable it upstream.
There is merit in upstream being something people can and do actually
use, not just an ivory tower of architectural perfection. There is
merit in bringing code into the community instead of forcing things to
be OOT.
For instance the thread you linked where there was talk of needing the
signal integrity data is a great example. Sure some of that is
manufacturing time, but also if you deploy a million interfaces in a
datacenter, then yes, there will be need to collect SI information
from live systems and do some analysis on it. You wouldn't believe how
much physically broken HW leaks out into data centers and needs
manufacturing level debugging techniques to properly root cause :(
> userspace-to-device-firmware tunnel?" to at least get all the various
> concerns documented in one place, and provide guidance for how device
> vendors should navigate this space across subsystems.
This is my effort here. If we document the expectations there is a
much better chance that a standard body or device manufacturer can
implement their interfaces in a way that works with the OS. There is a
much higher chance they will attract CVEs and be forced to fix it if
the security expectations are clearly laid out. You had a good
observation in one of those links about how they are not OS
people. Let's help them do better.
Shunt the less robust stuff to fwctl and then people can also make
their own security choices, don't enable or load the fwctl modules and
you get more protection. It is closer to your
CONFIG_CXL_MEM_RAW_COMMANDS=y but at runtime.
I think I captured most of your commentary below here in patch 6.
> Effects Log". In that "trust Command Effects" scenario the kernel still
> has no idea what the command is actually doing, but it can at least
> assert that the device does not claim that the command changes the
> contents of system-memory. Now, you might say, "the device can just
> lie", but that betrays a conceit of the kernel restriction. A device
> could lie that a Linux wrapped command when passed certain payloads does
> not in turn proxy to a restricted command.
Yeah, we have to trust the device. If the device is hostile toward the
OS then there are already big problems. We need to allow for
unintentional defects in the devices, but we don't need to be
paranoid.
IMHO a command effects report, in conjunction with a robust OS centric
definition is something we can trust in.
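As a rough sketch of what gating on a command effects report could look
like (every name below is hypothetical; this is not the CXL Command
Effects Log layout and not code from this series), the OS-side policy is
just a mask check per access level:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical effect bits a device might report for one command. */
enum cmd_effect {
	EFFECT_CONFIG_AFTER_COLD_RESET = 1 << 0,
	EFFECT_IMMEDIATE_CONFIG_CHANGE = 1 << 1,
	EFFECT_IMMEDIATE_DATA_CHANGE   = 1 << 2, /* alters memory contents */
};

/* Hypothetical access levels, loosely "configuration" vs "full debug". */
enum access_level {
	ACCESS_CONFIG,
	ACCESS_DEBUG_WRITE_FULL,
	ACCESS_LEVEL_COUNT,
};

/* OS-defined policy: which reported effects each level tolerates. */
static const uint32_t allowed_effects[ACCESS_LEVEL_COUNT] = {
	[ACCESS_CONFIG] = EFFECT_CONFIG_AFTER_COLD_RESET,
	[ACCESS_DEBUG_WRITE_FULL] = EFFECT_CONFIG_AFTER_COLD_RESET |
				    EFFECT_IMMEDIATE_CONFIG_CHANGE |
				    EFFECT_IMMEDIATE_DATA_CHANGE,
};

/* Allow the command only if the device reports no effect outside the set. */
static bool cmd_allowed(uint32_t reported_effects, enum access_level level)
{
	assert((unsigned int)level < ACCESS_LEVEL_COUNT);
	return (reported_effects & ~allowed_effects[level]) == 0;
}
```

The kernel still has no idea what the command does; it only refuses
commands whose self-reported effects exceed what the caller's level is
allowed to cause.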
> * Introspection / validation: Subsystem community needs to be able to
> audit behavior after the fact.
>
> To me this means even if the kernel is letting a command through based
> on the stated Command Effect of "Configuration Change after Cold Reset"
> upstream community has a need to be able to read the vendor
> specification for that command. I.e. commands might be vendor-specific,
> but never vendor-private. I see this as similar to the requirement for
> open source userspace for sophisticated accelerators.
I'm less hard on this. As long as reasonable open userspace exists I
think it is fine to let other stuff through too. I can appreciate the
DRM stance on this, but IMHO, there is meaningfully more value for open
source in trying to get an open Vulkan implementation vs blocking users
from reading their vendor'd diagnostic SI values.
I don't think we should get into some kind of extremism and insist
that every single bit must be documented/standardized or Linux won't
support it.
This is why I envision fwctl as not being suitable for actual
datapath/performance stuff.
> * Collaboration: open standards support open driver maintenance.
>
> Without standards we end up with awkward situations like Confidential
> Computing where every vendor races to implement the same functionality
> in arbitrarily different and vendor specific ways.
Standards are important. Linux is not a standards body. Linux
maintainers can only advise, not force, the industry to make
standards. At a certain point Linux's job is to implement software to
support what people have built. CC is a sad example where the industry
did not get together enough, but still Linux will support the CC mess.
> For CXL devices, and I believe the devices fwctl is targeting, there
> are a whole class of commands for vendor specific configuration and
> debug. Commands that the kernel really need not worry about.
Right.
> Some subsystems may want to allow high-performance science experiments
> like what NVMe allows, but it seems worth asking the question if
> standardizing device configuration and debug is really the best use of
> upstream's limited time?
From what I've been seeing it looks like a significant waste of
time. For example there is minimal industry value in standardizing
values stored in a device's boot time flash configuration. If some
common software wants to access really generic configuration (like
SRIOV enable) then sure there is merit, but that is really the
minority.
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-04 22:32 ` Jakub Kicinski
@ 2024-06-05 14:50 ` Jason Gunthorpe
2024-06-05 15:41 ` Jakub Kicinski
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-05 14:50 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Saeed Mahameed, David Ahern, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Tue, Jun 04, 2024 at 03:32:16PM -0700, Jakub Kicinski wrote:
> On Tue, 4 Jun 2024 14:28:05 -0700 Saeed Mahameed wrote:
> > On 04 Jun 07:04, Jakub Kicinski wrote:
> > >On Mon, 3 Jun 2024 21:01:58 -0600 David Ahern wrote:
> > >> Seriously, Jakub, how is that in any way related to this patch set?
> > >
> > >Whether they admit it or not, DOCA is a major reason nVidia wants
> > >this to be standalone rather than part of RDMA.
> >
> > No, DOCA isn't on the agenda for this new interface. But what is the point
> > in arguing?
>
> I'm not arguing any point, we argued enough. But you failed to disclose
> that DOCA is a very likely user of this interface. So whoever you're
> planning to submit it to should know.
This is getting ridiculous. Did you disclose in your PSP cover letter
that all that work and new kernel uAPI is to support Meta's proprietary
user space, even to the point that NO open source implementation even
exists yet? Let me check. Nope.
So why this made up double standard for Saeed? Especially after he
already said DOCA isn't on the agenda for mlx5's fwctl?
> > >> You are basically suggesting that if any vendor ever has an out of tree
> > >> option for its hardware every patch it sends should be considered a ruse
> > >> to enable or simplify proprietary options.
> >
> > It's apparent that you're attributing sinister agendas to patchsets when
> > you fail to offer valid technical opinions regarding the NAK nature. Let's
> > address this outside of this patchset, as this isn't the first occurrence.
> > Consistency in evaluating patches is crucial;
>
> Exactly :| Netdev people, including multiple prominent developers from
> Mellanox/nVidia have been nacking SDK interfaces in Linux networking
> for 20 years. How are we going to look to all the companies which have
> been doing IPUs for over a decade if we change the rules for nVidia?
That is a bleak way of painting things. fwctl is a developing
consensus on how to solve this class of problems. We get to have a
consensus that is different than the past because Linux does actually
evolve. All your long suffering IPU companies are welcome to use fwctl
with their products going forward just as equally to nvidia/etc.
Amazingly, "rules" are not set in stone in Linux!
> If by "let's address this outside of this patchset" you mean that we
> should have a discussion about maintainer favoritism, and subsystem
> capture by vendors - you have my full support!
This vendor bashing needs to stop. You could have easily used the
word companies and been much more accurate. At this point the
hyperscale companies - your so-called "users" - are much more guilty
of "subsystem capture" than any vendor is, and it certainly has changed
the culture of Linux.
There are many legitimate complaints all around of maintainers being
capricious - it doesn't matter who employs them.
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-05 14:50 ` Jason Gunthorpe
@ 2024-06-05 15:41 ` Jakub Kicinski
0 siblings, 0 replies; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-05 15:41 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Saeed Mahameed, David Ahern, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Wed, 5 Jun 2024 11:50:39 -0300 Jason Gunthorpe wrote:
> On Tue, Jun 04, 2024 at 03:32:16PM -0700, Jakub Kicinski wrote:
> > On Tue, 4 Jun 2024 14:28:05 -0700 Saeed Mahameed wrote:
> > > No, DOCA isn't on the agenda for this new interface. But what is the point
> > > in arguing?
> >
> > I'm not arguing any point, we argued enough. But you failed to disclose
> > that DOCA is a very likely user of this interface. So whoever you're
> > planning to submit it to should know.
>
> This is getting ridiculous. Did you disclose in your PSP cover letter
> that all that work and new kernel uAPI is to support Meta's proprietary
> user space, even to the point that NO open source implementation even
> exists yet? Let me check. Nope.
There is no Meta proprietary implementation. Some Meta folks who are on
the CC of the submission are working on extending Fizz, but it's not
ready. Fizz is here: https://github.com/facebookincubator/fizz
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-03 15:53 ` [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
2024-06-04 12:16 ` Zhu Yanjun
@ 2024-06-05 15:42 ` Przemek Kitszel
2024-06-05 15:49 ` Jason Gunthorpe
1 sibling, 1 reply; 73+ messages in thread
From: Przemek Kitszel @ 2024-06-05 15:42 UTC (permalink / raw)
To: Jason Gunthorpe, Jonathan Corbet, Itay Avraham, Jakub Kicinski,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On 6/3/24 17:53, Jason Gunthorpe wrote:
> Each file descriptor gets a chunk of per-FD driver specific context that
> allows the driver to attach a device specific struct to. The core code
> takes care of the memory lifetime for this structure.
>
> The ioctl dispatch and design is based on what was built for iommufd. The
> ioctls have a struct which has a combined in/out behavior with a typical
> 'zero pad' scheme for future extension and backwards compatibility.
I would go one step further and introduce a new syscall, that would
smooth out typical problems of ioctl, and base it on some TLV scheme
(similar to netlink, in some kind a way smaller brother of protobuf).
Perhaps with the name more broad than fw-knob-tuning.
Then I would go two steps back and add a driver layer to interpret those
syscalls to have at least some sort of openness.
>
> Like iommufd some shared logic does most of the ioctl marshalling and
> compatibility work and table dispatches to some function pointers for
> each unique ioctl.
>
> This approach has proven to work quite well in the iommufd and rdma
> subsystems.
>
> Allocate an ioctl number space for the subsystem.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> MAINTAINERS | 1 +
> drivers/fwctl/main.c | 124 +++++++++++++++++-
> include/linux/fwctl.h | 31 +++++
> include/uapi/fwctl/fwctl.h | 41 ++++++
> 5 files changed, 196 insertions(+), 2 deletions(-)
> create mode 100644 include/uapi/fwctl/fwctl.h
>
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index a141e8e65c5d3a..4d91c5a20b98c8 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -324,6 +324,7 @@ Code Seq# Include File Comments
> 0x97 00-7F fs/ceph/ioctl.h Ceph file system
> 0x99 00-0F 537-Addinboard driver
> <mailto:buk@buks.ipn.de>
> +0x9A 00-0F include/uapi/fwctl/fwctl.h
> 0xA0 all linux/sdp/sdp.h Industrial Device Project
> <mailto:kenji@bitgate.com>
> 0xA1 0 linux/vtpm_proxy.h TPM Emulator Proxy Driver
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 833b853808421e..94062161e9c4d7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9084,6 +9084,7 @@ S: Maintained
> F: Documentation/userspace-api/fwctl.rst
> F: drivers/fwctl/
> F: include/linux/fwctl.h
> +F: include/uapi/fwctl/
>
> GALAXYCORE GC0308 CAMERA SENSOR DRIVER
> M: Sebastian Reichel <sre@kernel.org>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index ff9b7bad5a2b0d..7ecdabdd9dcb1e 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -9,26 +9,131 @@
> #include <linux/container_of.h>
> #include <linux/fs.h>
>
> +#include <uapi/fwctl/fwctl.h>
> +
> enum {
> FWCTL_MAX_DEVICES = 256,
> };
> static dev_t fwctl_dev;
> static DEFINE_IDA(fwctl_ida);
>
> +struct fwctl_ucmd {
> + struct fwctl_uctx *uctx;
> + void __user *ubuffer;
> + void *cmd;
> + u32 user_size;
> +};
> +
> +/* On stack memory for the ioctl structs */
> +union ucmd_buffer {
> +};
> +
> +struct fwctl_ioctl_op {
> + unsigned int size;
> + unsigned int min_size;
> + unsigned int ioctl_num;
> + int (*execute)(struct fwctl_ucmd *ucmd);
> +};
> +
> +#define IOCTL_OP(_ioctl, _fn, _struct, _last) \
> + [_IOC_NR(_ioctl) - FWCTL_CMD_BASE] = { \
> + .size = sizeof(_struct) + \
> + BUILD_BUG_ON_ZERO(sizeof(union ucmd_buffer) < \
> + sizeof(_struct)), \
> + .min_size = offsetofend(_struct, _last), \
> + .ioctl_num = _ioctl, \
> + .execute = _fn, \
> + }
> +static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
> +};
> +
> +static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
> + unsigned long arg)
> +{
> + struct fwctl_uctx *uctx = filp->private_data;
> + const struct fwctl_ioctl_op *op;
> + struct fwctl_ucmd ucmd = {};
> + union ucmd_buffer buf;
> + unsigned int nr;
> + int ret;
> +
> + nr = _IOC_NR(cmd);
> + if ((nr - FWCTL_CMD_BASE) >= ARRAY_SIZE(fwctl_ioctl_ops))
> + return -ENOIOCTLCMD;
> + op = &fwctl_ioctl_ops[nr - FWCTL_CMD_BASE];
> + if (op->ioctl_num != cmd)
> + return -ENOIOCTLCMD;
> +
> + ucmd.uctx = uctx;
> + ucmd.cmd = &buf;
> + ucmd.ubuffer = (void __user *)arg;
> + ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
> + if (ret)
> + return ret;
> +
> + if (ucmd.user_size < op->min_size)
> + return -EINVAL;
> +
> + ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
> + ucmd.user_size);
> + if (ret)
> + return ret;
> +
> + guard(rwsem_read)(&uctx->fwctl->registration_lock);
> + if (!uctx->fwctl->ops)
> + return -ENODEV;
> + return op->execute(&ucmd);
> +}
> +
> static int fwctl_fops_open(struct inode *inode, struct file *filp)
> {
> struct fwctl_device *fwctl =
> container_of(inode->i_cdev, struct fwctl_device, cdev);
> + struct fwctl_uctx *uctx __free(kfree) = NULL;
> + int ret;
> +
> + guard(rwsem_read)(&fwctl->registration_lock);
> + if (!fwctl->ops)
> + return -ENODEV;
> +
> + uctx = kzalloc(fwctl->ops->uctx_size, GFP_KERNEL | GFP_KERNEL_ACCOUNT);
> + if (!uctx)
> + return -ENOMEM;
> +
> + uctx->fwctl = fwctl;
> + ret = fwctl->ops->open_uctx(uctx);
> + if (ret)
> + return ret;
> +
> + scoped_guard(mutex, &fwctl->uctx_list_lock) {
> + list_add_tail(&uctx->uctx_list_entry, &fwctl->uctx_list);
> + }
>
> get_device(&fwctl->dev);
> - filp->private_data = fwctl;
> + filp->private_data = no_free_ptr(uctx);
> return 0;
> }
>
> +static void fwctl_destroy_uctx(struct fwctl_uctx *uctx)
> +{
> + lockdep_assert_held(&uctx->fwctl->uctx_list_lock);
> + list_del(&uctx->uctx_list_entry);
> + uctx->fwctl->ops->close_uctx(uctx);
> +}
> +
> static int fwctl_fops_release(struct inode *inode, struct file *filp)
> {
> - struct fwctl_device *fwctl = filp->private_data;
> + struct fwctl_uctx *uctx = filp->private_data;
> + struct fwctl_device *fwctl = uctx->fwctl;
>
> + scoped_guard(rwsem_read, &fwctl->registration_lock) {
> + if (fwctl->ops) {
> + guard(mutex)(&fwctl->uctx_list_lock);
> + fwctl_destroy_uctx(uctx);
> + }
> + }
> +
> + kfree(uctx);
> fwctl_put(fwctl);
> return 0;
> }
> @@ -37,6 +142,7 @@ static const struct file_operations fwctl_fops = {
> .owner = THIS_MODULE,
> .open = fwctl_fops_open,
> .release = fwctl_fops_release,
> + .unlocked_ioctl = fwctl_fops_ioctl,
> };
>
> static void fwctl_device_release(struct device *device)
> @@ -46,6 +152,7 @@ static void fwctl_device_release(struct device *device)
>
> if (fwctl->dev.devt)
> ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
> + mutex_destroy(&fwctl->uctx_list_lock);
> kfree(fwctl);
> }
>
> @@ -69,6 +176,9 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> return NULL;
> fwctl->dev.class = &fwctl_class;
> fwctl->dev.parent = parent;
> + init_rwsem(&fwctl->registration_lock);
> + mutex_init(&fwctl->uctx_list_lock);
> + INIT_LIST_HEAD(&fwctl->uctx_list);
> device_initialize(&fwctl->dev);
> return_ptr(fwctl);
> }
> @@ -134,8 +244,18 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, FWCTL);
> */
> void fwctl_unregister(struct fwctl_device *fwctl)
> {
> + struct fwctl_uctx *uctx;
> +
> cdev_device_del(&fwctl->cdev, &fwctl->dev);
>
> + /* Disable and free the driver's resources for any still open FDs. */
> + guard(rwsem_write)(&fwctl->registration_lock);
> + guard(mutex)(&fwctl->uctx_list_lock);
> + while ((uctx = list_first_entry_or_null(&fwctl->uctx_list,
> + struct fwctl_uctx,
> + uctx_list_entry)))
> + fwctl_destroy_uctx(uctx);
> +
> /*
> * The driver module may unload after this returns, the op pointer will
> * not be valid.
> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> index ef4eaa87c945e4..1d9651de92fc19 100644
> --- a/include/linux/fwctl.h
> +++ b/include/linux/fwctl.h
> @@ -11,7 +11,20 @@
> struct fwctl_device;
> struct fwctl_uctx;
>
> +/**
> + * struct fwctl_ops - Driver provided operations
> + * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
> + * bytes of this memory will be a fwctl_uctx. The driver can use the
> + * remaining bytes as its private memory.
> + * @open_uctx: Called when a file descriptor is opened before the uctx is ever
> + * used.
> + * @close_uctx: Called when the uctx is destroyed, usually when the FD is
> + * closed.
> + */
> struct fwctl_ops {
> + size_t uctx_size;
> + int (*open_uctx)(struct fwctl_uctx *uctx);
> + void (*close_uctx)(struct fwctl_uctx *uctx);
> };
>
> /**
> @@ -26,6 +39,10 @@ struct fwctl_device {
> struct device dev;
> /* private: */
> struct cdev cdev;
> +
> + struct rw_semaphore registration_lock;
> + struct mutex uctx_list_lock;
> + struct list_head uctx_list;
> const struct fwctl_ops *ops;
> };
>
> @@ -65,4 +82,18 @@ DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
> int fwctl_register(struct fwctl_device *fwctl);
> void fwctl_unregister(struct fwctl_device *fwctl);
>
> +/**
> + * struct fwctl_uctx - Per user FD context
> + * @fwctl: fwctl instance that owns the context
> + *
> + * Every FD opened by userspace will get a unique context allocation. Any driver
> + * private data will follow immediately after.
> + */
> +struct fwctl_uctx {
> + struct fwctl_device *fwctl;
> + /* private: */
> + /* Head at fwctl_device::uctx_list */
> + struct list_head uctx_list_entry;
> +};
> +
> #endif
> diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> new file mode 100644
> index 00000000000000..0bdce95b6d69d9
> --- /dev/null
> +++ b/include/uapi/fwctl/fwctl.h
> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
> + */
> +#ifndef _UAPI_FWCTL_H
> +#define _UAPI_FWCTL_H
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define FWCTL_TYPE 0x9A
> +
> +/**
> + * DOC: General ioctl format
> + *
> + * The ioctl interface follows a general format to allow for extensibility. Each
> + * ioctl is passed in a structure pointer as the argument providing the size of
> + * the structure in the first u32. The kernel checks that any structure space
> + * beyond what it understands is 0. This allows userspace to use the backward
> + * compatible portion while consistently using the newer, larger, structures.
> + *
> + * ioctls use a standard meaning for common errnos:
> + *
> + * - ENOTTY: The IOCTL number itself is not supported at all
> + * - E2BIG: The IOCTL number is supported, but the provided structure has
> + * non-zero in a part the kernel does not understand.
> + * - EOPNOTSUPP: The IOCTL number is supported, and the structure is
> + * understood, however a known field has a value the kernel does not
> + * understand or support.
> + * - EINVAL: Everything about the IOCTL was understood, but a field is not
> + * correct.
> + * - ENOMEM: Out of memory.
> + * - ENODEV: The underlying device has been hot-unplugged and the FD is
> + * orphaned.
> + *
> + * As well as additional errnos, within specific ioctls.
> + */
> +enum {
> + FWCTL_CMD_BASE = 0,
> +};
> +
> +#endif
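For readers skimming the quoted patch, the per-FD context layout
described in the fwctl_ops kernel-doc can be modelled in plain userspace
C. This is only a sketch under assumptions: struct uctx and struct ops
are cut-down stand-ins for fwctl_uctx and fwctl_ops, and the mydrv_*
names are invented. The core allocates uctx_size zeroed bytes, the base
struct sits first, and the driver reaches its private part with
container_of():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct uctx {				/* stand-in for struct fwctl_uctx */
	int owner_id;
};

struct ops {				/* stand-in for struct fwctl_ops */
	size_t uctx_size;
	int (*open_uctx)(struct uctx *uctx);
};

/* Hypothetical driver context: base struct first, private data after. */
struct mydrv_uctx {
	struct uctx base;
	unsigned int fw_session;
};

static int mydrv_open_uctx(struct uctx *uctx)
{
	struct mydrv_uctx *priv = container_of(uctx, struct mydrv_uctx, base);

	priv->fw_session = 42;		/* driver-private state */
	return 0;
}

static const struct ops mydrv_ops = {
	.uctx_size = sizeof(struct mydrv_uctx),
	.open_uctx = mydrv_open_uctx,
};

/* Model of the core's open path: allocate, fill the base, call the driver. */
static struct uctx *core_open(const struct ops *ops, int owner_id)
{
	struct uctx *uctx;

	assert(ops->uctx_size >= sizeof(*uctx));
	uctx = calloc(1, ops->uctx_size);
	if (!uctx)
		return NULL;
	uctx->owner_id = owner_id;
	if (ops->open_uctx(uctx)) {
		free(uctx);
		return NULL;
	}
	return uctx;
}
```

Because the core owns the allocation, the driver never frees the context
itself; it only fills in and tears down its private fields.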
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-05 15:42 ` Przemek Kitszel
@ 2024-06-05 15:49 ` Jason Gunthorpe
0 siblings, 0 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-05 15:49 UTC (permalink / raw)
To: Przemek Kitszel
Cc: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Wed, Jun 05, 2024 at 05:42:51PM +0200, Przemek Kitszel wrote:
> On 6/3/24 17:53, Jason Gunthorpe wrote:
> > Each file descriptor gets a chunk of per-FD driver specific context that
> > allows the driver to attach a device specific struct to. The core code
> > takes care of the memory lifetime for this structure.
> >
> > The ioctl dispatch and design is based on what was built for iommufd. The
> > ioctls have a struct which has a combined in/out behavior with a typical
> > 'zero pad' scheme for future extension and backwards compatibility.
>
> I would go one step further and introduce a new syscall, that would
> smooth out typical problems of ioctl, and base it on some TLV scheme
> (similar to netlink, in some kind a way smaller brother of protobuf).
> Perhaps with the name more broad than fw-knob-tuning.
We did a TLV scheme like netlink for RDMA. It is very complex and
frankly I think it is overkill for what this wants to do. It suited
RDMA because the system call interface is so vast there.
If the kernel had a general TLV path as an alternative to ioctl it
could be very interesting. I thought about generalizing the RDMA stuff
once, and even gave a small talk at LPC on some of the ideas, but
didn't have the bravery or justification to actually try to do it.
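To give a feel for what even the smallest TLV path involves, here is a
minimal userspace sketch. It is not the RDMA or netlink format: the
header layout is made up, and it skips the alignment padding and nesting
that the real schemes need.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct tlv_hdr {
	uint16_t type;
	uint16_t len;	/* payload length in bytes, header excluded */
};

/* Append one attribute at @off; returns the offset just past it. */
static size_t tlv_put(uint8_t *buf, size_t cap, size_t off, uint16_t type,
		      const void *data, uint16_t len)
{
	struct tlv_hdr hdr = { .type = type, .len = len };

	assert(off + sizeof(hdr) + len <= cap);
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), data, len);
	return off + sizeof(hdr) + len;
}

/* Find the first attribute of @type; NULL if absent or the buffer is cut short. */
static const void *tlv_find(const uint8_t *buf, size_t size, uint16_t type,
			    uint16_t *len_out)
{
	size_t off = 0;

	while (size - off >= sizeof(struct tlv_hdr)) {
		struct tlv_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);
		if (hdr.len > size - off)
			return NULL;	/* truncated payload */
		if (hdr.type == type) {
			*len_out = hdr.len;
			return buf + off;
		}
		off += hdr.len;
	}
	return NULL;
}
```

Every attribute then needs its own policy for presence, length and
ordering, which is where the complexity comes from.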
> Then I would go two steps back and add a driver layer to interpret those
> syscalls to have at least some sort of openness.
I don't envision having thick drivers marshaling and unmarshaling FW
messages to obfuscate the data flow. The purpose here is what it says
on the label, to be a thin and simple path to send native commands
with a security apparatus.
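By comparison, the 'zero pad' scheme quoted at the top is small enough
to model in a few lines of userspace C. This is a sketch of the
semantics the patch gets from the min_size check plus
copy_struct_from_user(), not the kernel code itself; cmd_v1, cmd_v2 and
copy_cmd() are invented names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* A hypothetical command as an older userspace knows it. */
struct cmd_v1 {
	uint32_t size;
	uint32_t flags;
};

/* The same command after a later kernel grew it. */
struct cmd_v2 {
	uint32_t size;
	uint32_t flags;
	uint32_t new_field;
	uint32_t reserved;
};

/*
 * Copy a user struct of @usize bytes into a kernel struct of @ksize bytes.
 * Shorter than @min_size: -EINVAL. Longer than the kernel understands: the
 * extra bytes must all be zero, else -E2BIG. Shorter than @ksize: the tail
 * of @dst is zero filled.
 */
static int copy_cmd(void *dst, size_t ksize, size_t min_size,
		    const void *src, size_t usize)
{
	const uint8_t *s = src;
	size_t n = usize < ksize ? usize : ksize;
	size_t i;

	assert(min_size <= ksize);
	if (usize < min_size)
		return -EINVAL;
	for (i = ksize; i < usize; i++)
		if (s[i])
			return -E2BIG;
	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}
```

Old userspace on a new kernel sees the new fields read as zero, and new
userspace on an old kernel works as long as it leaves the fields the
kernel does not know about at zero.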
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 6/8] fwctl: Add documentation
2024-06-05 2:31 ` Randy Dunlap
@ 2024-06-05 16:03 ` Jason Gunthorpe
2024-06-05 20:14 ` Randy Dunlap
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-05 16:03 UTC (permalink / raw)
To: Randy Dunlap
Cc: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Tue, Jun 04, 2024 at 07:31:10PM -0700, Randy Dunlap wrote:
> > +Modern devices contain extensive amounts of FW, and in many cases, are largely
> > +software defined pieces of hardware. The evolution of this approach is largely a
>
> software-defined
Thanks a lot Randy, I picked up all your notes.
> > +While the kernel can always directly parse and restrict RPCs, it is expected
> > +that the existing kernel pattern of allowing drivers to delegate validation to
> > +FW to be a useful design.
>
> (and one that can be abused...)
I would really like to write a paragraph about this "abuse", Dan has
some good thoughts on this as well. Did you have a specific "abuse"
in your mind?
Thanks,
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-05 11:07 ` Jonathan Cameron
@ 2024-06-05 18:27 ` Jason Gunthorpe
2024-06-06 13:34 ` Jonathan Cameron
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-05 18:27 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Leon Romanovsky, Zhu Yanjun, Jonathan Corbet, Itay Avraham,
Jakub Kicinski, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, David Ahern, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, linux-cxl, patches
On Wed, Jun 05, 2024 at 12:07:37PM +0100, Jonathan Cameron wrote:
> > I don't recall that dramatic conclusion in the discussion, but it does
> > make a lot of sense to me.
>
> I'll be less lazy (and today found the search foo to track it down).
>
> https://lore.kernel.org/all/CAHk-=wicfvWPuRVDG5R1mZSxD8Xg=-0nLOiHay2T_UJ0yDX42g@mail.gmail.com/
Oh that is a bit different discussion than I was thinking of.. I fixed
all the cases to follow this advice and checked that all the free
functions are proper pairs of whatever is being allocated.
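For anyone following along, the scope-based cleanup pattern the patch
uses via __free(kfree) and no_free_ptr() looks roughly like this in a
userspace sketch. It assumes the GCC/Clang cleanup attribute, which is
what the kernel's __free() builds on; ctx_alloc(), ctx_free() and
ctx_take() are made-up stand-ins, and live_allocs exists only to make
the pairing visible.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;	/* outstanding allocations, for illustration only */

struct ctx {
	int id;
};

static struct ctx *ctx_alloc(int id)
{
	struct ctx *c = calloc(1, sizeof(*c));

	if (c) {
		c->id = id;
		live_allocs++;
	}
	return c;
}

/* The free function must be the exact pair of ctx_alloc(). */
static void ctx_free(struct ctx *c)
{
	if (!c)
		return;
	assert(live_allocs > 0);
	live_allocs--;
	free(c);
}

/* The cleanup handler receives a pointer to the annotated variable. */
static void ctx_cleanup(struct ctx **p)
{
	ctx_free(*p);
}
#define auto_ctx_free __attribute__((cleanup(ctx_cleanup)))

/* Hand ownership to the caller and disarm the cleanup, like no_free_ptr(). */
static struct ctx *ctx_take(struct ctx **p)
{
	struct ctx *t = *p;

	*p = NULL;
	return t;
}

static struct ctx *ctx_open(int id, int fail)
{
	/* Declared and initialized together, so alloc and free stay paired. */
	struct ctx *c auto_ctx_free = ctx_alloc(id);

	if (!c)
		return NULL;
	if (fail)
		return NULL;		/* c is freed automatically here */
	return ctx_take(&c);		/* success: the caller now owns c */
}
```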
Thanks,
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 6/8] fwctl: Add documentation
2024-06-05 16:03 ` Jason Gunthorpe
@ 2024-06-05 20:14 ` Randy Dunlap
0 siblings, 0 replies; 73+ messages in thread
From: Randy Dunlap @ 2024-06-05 20:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On 6/5/24 9:03 AM, Jason Gunthorpe wrote:
> On Tue, Jun 04, 2024 at 07:31:10PM -0700, Randy Dunlap wrote:
>
>>> +Modern devices contain extensive amounts of FW, and in many cases, are largely
>>> +software defined pieces of hardware. The evolution of this approach is largely a
>>
>> software-defined
>
> Thanks a lot Randy, I picked up all your notes.
>
>>> +While the kernel can always directly parse and restrict RPCs, it is expected
>>> +that the existing kernel pattern of allowing drivers to delegate validation to
>>> +FW to be a useful design.
>>
>> (and one that can be abused...)
>
> I would really like to write a paragraph about this "abuse", Dan has
> some good thoughts on this as well. Did you have a specific "abuse"
> in your mind?
No, I don't. It just seems very open (but ioctls are just as open).
--
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-04 14:04 ` Jakub Kicinski
2024-06-04 21:28 ` Saeed Mahameed
2024-06-04 23:56 ` Dan Williams
@ 2024-06-06 1:58 ` David Ahern
2 siblings, 0 replies; 73+ messages in thread
From: David Ahern @ 2024-06-06 1:58 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jason Gunthorpe, Jonathan Corbet, Itay Avraham, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On 6/4/24 8:04 AM, Jakub Kicinski wrote:
> Ooo, is that a sore spot?
Maintainer overreach? Absolutely.
The sky is not falling with this proposed subsystem; engineers are
merely trying to solve real, customer problems.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-05 13:59 ` Jason Gunthorpe
@ 2024-06-06 2:35 ` David Ahern
2024-06-06 14:18 ` Jakub Kicinski
2024-06-06 4:56 ` Dan Williams
2024-06-11 15:36 ` Daniel Vetter
2 siblings, 1 reply; 73+ messages in thread
From: David Ahern @ 2024-06-06 2:35 UTC (permalink / raw)
To: Jason Gunthorpe, Dan Williams
Cc: Jakub Kicinski, Jonathan Corbet, Itay Avraham, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Christoph Hellwig,
Jiri Pirko, Leonid Bloch, Leon Romanovsky, linux-cxl, patches
On 6/5/24 7:59 AM, Jason Gunthorpe wrote:
> On Tue, Jun 04, 2024 at 04:56:57PM -0700, Dan Williams wrote:
>> Jakub Kicinski wrote:
>> [..]
>>> I don't begrudge anyone building proprietary options, but leave
>>> upstream out of it.
>>
>> So I am of 2 minds here. In general, how is upstream benefited by
>> requiring every vendor command to be wrapped by a Linux command?
>
> People actually can use upstream :)
>
> Amazingly there is inherent benefit to people being able to use the
> software we produce.
There is. There is a clear preference for open source kernels and drivers.
Until a feature is standardized and/or commoditized, it does not make
sense to create a uapi for every H/W vendor whim. All of them are
attempting to solve real problems; some of them will stick. We know
which features are valuable when customers use them, ask for them and
other vendors copy them. Until then it is a 1-off by a vendor basically
proposing a solution. Not all ideas are good ideas, and we do not need
the burden of a uapi or the burden of out of tree drivers.
>
>> 3 years on from that recommendation it seems no vendor has even needed
>> that level of distribution help. I.e. checking a few distro kernels
>> (Fedora, openSUSE) shows no uptake for CONFIG_CXL_MEM_RAW_COMMANDS=y in
>> their debug builds. I can only assume that locally compiled custom
>> kernel binaries are filling the need.
>
> My strong advice would be to be careful about this. Android-ism where
> nobody runs the upstream kernel is a real thing. For something
> emerging like CXL there is a real risk that the hyperscale folks will
> go off and do their own OOT stuff and in-tree CXL will be something
> usable but inferior. I've seen this happen enough times..
>
> If people come and say we need X and the maintainer says no, they
> don't just give up and stop doing X, they go and do X anyhow out of
> tree. This has become especially true now that the center of business
> activity in server-Linux is driven by the hyperscale crowd that don't
> care much about upstream. Linux maintainers don't actually have the
> power to force the industry to do things, though people do keep
> trying.. Maintainers can only lead, and productive leading is not done
> with a NO.
+1
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-05 13:59 ` Jason Gunthorpe
2024-06-06 2:35 ` David Ahern
@ 2024-06-06 4:56 ` Dan Williams
2024-06-06 8:50 ` Leon Romanovsky
2024-06-06 14:41 ` Jason Gunthorpe
2024-06-11 15:36 ` Daniel Vetter
2 siblings, 2 replies; 73+ messages in thread
From: Dan Williams @ 2024-06-06 4:56 UTC (permalink / raw)
To: Jason Gunthorpe, Dan Williams
Cc: Jakub Kicinski, David Ahern, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
Jason Gunthorpe wrote:
[..]
> > 3 years on from that recommendation it seems no vendor has even needed
> > that level of distribution help. I.e. checking a few distro kernels
> > (Fedora, openSUSE) shows no uptake for CONFIG_CXL_MEM_RAW_COMMANDS=y in
> > their debug builds. I can only assume that locally compiled custom
> > kernel binaries are filling the need.
>
> My strong advice would be to be careful about this. Android-ism where
> nobody runs the upstream kernel is a real thing. For something
> emerging like CXL there is a real risk that the hyperscale folks will
> go off and do their own OOT stuff and in-tree CXL will be something
> usable but inferior. I've seen this happen enough times..
Hence my openness to considering fwctl...
> If people come and say we need X and the maintainer says no, they
> don't just give up and stop doing X, they go and do X anyhow out of
> tree. This has become especially true now that the center of business
> activity in server-Linux is driven by the hyperscale crowd that don't
> care much about upstream.
"...don't care much about upstream...". This could be a whole separate
thread unto itself.
> Linux maintainers don't actually have the power to force the industry
> to do things, though people do keep trying.. Maintainers can only
> lead, and productive leading is not done with a NO.
>
> You will start to see this pain in maybe 5-10 years if CXL starts to
> be something deployed in an enterprise RedHat/Dell/etc sort of
> environment. Then that missing X becomes a critical issue because it
> turns out the hyperscale folks long since figured out it is really
> important but didn't do anything to enable it upstream.
This matches other feedback I have heard recently. Yes, distros hate
contending with every vendor's userspace toolkit, that was the original
distro feedback motivating CONFIG_CXL_MEM_RAW_COMMANDS to have a poison
pill of WARN() on use. However, allowing more vendor commands is
preferable to contending with vendor out-of-tree drivers that likely
help keep the enterprise-distro-kernel stable-ABI train rolling. In
other words, legalize it in order to centrally regulate it.
[..]
> This is my effort here. If we document the expectations there is a
> much better chance that a standard body or device manufacturer can
> implement their interfaces in a way that works with the OS. There is a
> much higher chance they will attract CVEs and be forced to fix it if
> the security expectations are clearly laid out. You had a good
> observation in one of those links about how they are not OS
> people. Let's help them do better.
>
> Shunt the less robust stuff to fwctl and then people can also make
> their own security choices, don't enable or load the fwctl modules and
> you get more protection. It is closer to your
> CONFIG_CXL_MEM_RAW_COMMANDS=y but at runtime.
>
> I think I captured most of your commentary below here in patch 6.
I will take a look...
> > Effects Log". In that "trust Command Effects" scenario the kernel still
> > has no idea what the command is actually doing, but it can at least
> > assert that the device does not claim that the command changes the
> > contents of system-memory. Now, you might say, "the device can just
> > lie", but that betrays a conceit of the kernel restriction. A device
> > could lie that a Linux wrapped command when passed certain payloads does
> > not in turn proxy to a restricted command.
>
> Yeah, we have to trust the device. If the device is hostile toward the
> OS then there are already big problems. We need to allow for
> unintentional defects in the devices, but we don't need to be
> paranoid.
>
> IMHO a command effects report, in conjunction with a robust OS centric
> definition is something we can trust in.
So this is where I want to start and see if we can bridge the trust gap.
I am warming to your assertion that there is a wide array of
vendor-specific configuration and debug that are not an efficient use of
upstream's time to wrap in a shared Linux ABI. I want to explore fwctl
for CXL for that use case, I personally don't want to marshal a Linux
command to each vendor's slightly different backend CXL toggles.
At the same time, I also agree with the contention that a "do anything
you want and get away with it" tunnel invites shenanigans from folks
that may not care about the long term health of the Linux kernel vs
their short term interests, and that it is difficult to unring the bell
once a tunnel is in place. While subsystems will rightly take different
stances to fwctl policy, that lack of a one-size-fits-all policy does not
seem sufficient reason to keep the concept out of the kernel entirely.
I appreciate that you crafted this interface with an eye towards making
it unsuitable for data-path operations.
So my questions to try to understand the specific sticking points more
are:
1/ Can you think of a Command Effect that the device could enumerate to
address the specific shenanigans that netdev is worried about? In other
words, if every command a device enables has the stated effect of
"Configuration Change after Reset", does that cut out a significant
portion of the concern? That would make this a debate about finer-grained
effects, not a coarse-grained binary decision on whether fwctl should
move forward at all.
2/ About the "what if the device lies?" question. We can't revert code
that used to work, but we can definitely work with enterprise distros to
turn off fwctl where there is concern it may lead or is leading to
shenanigans. So, document what each subsystem's stance towards fwctl is,
like maybe a distro only wants fwctl to front publicly documented vendor
commands, or maybe private vendor commands are ok, but only with a
constrained set of Command Effects (I potentially see CXL here). A
distro should know what they are opting into for each fwctl instance; it
will likely always need to be a subsystem-specific policy. A distro can
also decide lockdown policy based on Command Effects above and beyond
the ones that clearly state they allow the device to modify the running
kernel.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 4:56 ` Dan Williams
@ 2024-06-06 8:50 ` Leon Romanovsky
2024-06-06 22:11 ` Dan Williams
2024-06-06 14:41 ` Jason Gunthorpe
1 sibling, 1 reply; 73+ messages in thread
From: Leon Romanovsky @ 2024-06-06 8:50 UTC (permalink / raw)
To: Dan Williams
Cc: Jason Gunthorpe, Jakub Kicinski, David Ahern, Jonathan Corbet,
Itay Avraham, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, linux-cxl, patches
On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote:
> Jason Gunthorpe wrote:
<...>
> So my questions to try to understand the specific sticking points more
> are:
>
> 1/ Can you think of a Command Effect that the device could enumerate to
> address the specific shenanigans that netdev is worried about? In other
> words if every command a device enables has the stated effect of
> "Configuration Change after Reset" does that cut out a significant
> portion of the concern?
It will prevent SR-IOV devices (or, more accurately, their VFs) from
being configured through fwctl, as they are destroyed in HW during
reboot.
Thanks
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-05 18:27 ` Jason Gunthorpe
@ 2024-06-06 13:34 ` Jonathan Cameron
2024-06-06 15:37 ` Randy Dunlap
0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Cameron @ 2024-06-06 13:34 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Zhu Yanjun, Jonathan Corbet, Itay Avraham,
Jakub Kicinski, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, David Ahern, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, linux-cxl, patches, Peter Zijlstra, Julia Lawall
On Wed, 5 Jun 2024 15:27:26 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Wed, Jun 05, 2024 at 12:07:37PM +0100, Jonathan Cameron wrote:
>
> > > I don't recall that dramatic conclusion in the discussion, but it does
> > > make a lot of sense to me.
> >
> > I'll be less lazy (and today found the search foo to track it down).
> >
> > https://lore.kernel.org/all/CAHk-=wicfvWPuRVDG5R1mZSxD8Xg=-0nLOiHay2T_UJ0yDX42g@mail.gmail.com/
>
> Oh that is a bit different discussion than I was thinking of.. I fixed
> all the cases to follow this advice and checked that all the free
> functions are proper pairs of whatever is being allocated.
Yes. I think we are approaching the point where maybe we need
a 'best practice guide' somewhere. It is sort of coding style, but
it is perhaps rather too complex to put in that doc.
I'm happy to help review such changes, but it would be too far down
my todo list if I took on writing one.
Maybe there is one I've missed?
Jonathan
>
> Thanks,
> Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 2:35 ` David Ahern
@ 2024-06-06 14:18 ` Jakub Kicinski
2024-06-06 14:48 ` Jason Gunthorpe
2024-06-07 7:34 ` Jiri Pirko
0 siblings, 2 replies; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-06 14:18 UTC (permalink / raw)
To: David Ahern
Cc: Jason Gunthorpe, Dan Williams, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Wed, 5 Jun 2024 20:35:49 -0600 David Ahern wrote:
> Until a feature is standardized and/or commoditized, it does not make
> sense to create a uapi for every H/W vendor whim.
This is not about non-standard features. I work with multiple vendors
as my day job. I ask them how to set basic link configuration and the
support person gives me a link to the vendor tools! I wish I could show
you the emails.
> All of them are attempting to solve real problems; some of them will
> stick. We know which features are valuable when customers use them,
Yes, once customers deploy a feature implemented via a vendor API
they will definitely migrate to a different API. Customers like risk
and wasting their engineering resources reimplementing and redeploying
things? And we have so much success moving users to new APIs in Linux!
> ask for them and other vendors copy them. Until then it is a 1-off by
> a vendor basically proposing a solution.
Certainly. Because... who exactly will ask the second vendor to
implement the common API?
And the second vendor will most certainly not mind the extra delay and
inconvenience of having their product shipped via the publicly reviewed,
and slow to deploy kernel, while the first one is happily selling
the same feature already.
> Not all ideas are good ideas, and we do not need the burden of a uapi
> or the burden of out of tree drivers.
This API gives user space SDKs a trivial way of implementing all
switching, routing, filtering, QoS offloads etc.
An argument can be made that given somewhat mixed switchdev experience
we should just stay out of the way and let that happen. But just make
that argument then, instead of pretending the use of this API will be
limited to custom very vendor specific things.
Again, if someone needs this to ship their custom CXL/Infiniband
AI fabric magic, which is un-interoperable by design -- none of
my concern. But keep TCP/IP networking out of this :|
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 4:56 ` Dan Williams
2024-06-06 8:50 ` Leon Romanovsky
@ 2024-06-06 14:41 ` Jason Gunthorpe
2024-06-06 14:58 ` Jakub Kicinski
2024-06-06 17:24 ` Dan Williams
1 sibling, 2 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-06 14:41 UTC (permalink / raw)
To: Dan Williams
Cc: Jakub Kicinski, David Ahern, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote:
> > If people come and say we need X and the maintainer says no, they
> > don't just give up and stop doing X, they go and do X anyhow out of
> > tree. This has become especially true now that the center of business
> > activity in server-Linux is driven by the hyperscale crowd that don't
> > care much about upstream.
>
> "...don't care much about upstream...". This could be a whole separate
> thread unto itself.
Heh, it is a topic, but perhaps not one for polite company :)
> > Linux maintainers don't actually have the power to force the industry
> > to do things, though people do keep trying.. Maintainers can only
> > lead, and productive leading is not done with a NO.
> >
> > You will start to see this pain in maybe 5-10 years if CXL starts to
> > be something deployed in an enterprise RedHat/Dell/etc sort of
> > environment. Then that missing X becomes a critical issue because it
> > turns out the hyperscale folks long since figured out it is really
> > important but didn't do anything to enable it upstream.
>
> This matches other feedback I have heard recently. Yes, distros hate
> contending with every vendor's userspace toolkit, that was the
> original
I'm not sure that is 100% true. Sure nobody likes that you have to
type 'abc X' and 'def Y' to do a similar thing, but from a distro
perspective if abc and def are both open sourced and packaged in the
distro it is still a far better outcome than users doing OOT drivers
and binary-only tools.
eg one of the long standing main Mellanox tools that is being ported
to fwctl is open source and in all distros:
https://rpmfind.net/linux/rpm2html/search.php?query=mstflint
Projects have already experimented with building tooling on top of it to
make a more cross-vendor experience in some areas.
In my view it is wrong to think the kernel is the only place we can
make generic things or that allowing userspace to see the raw device
interface immediately means fragmentation and chaos. The industry is
more robust than that. Giving people working in userspace room to
invent their own solutions is actually helpful to driving some
commonality. There are already soft targets in the K8S that people
need to fit into, if the first few steps are with abc/def tools and
that brings us to an eventual true commonality, then great.
> distro feedback motivating CONFIG_CXL_MEM_RAW_COMMANDS to have a poison
> pill of WARN() on use. However, allowing more vendor commands is
> preferable to contending with vendor out-of-tree drivers that likely
> help keep the enterprise-distro-kernel stable-ABI train rolling. In
> other words, legalize it in order to centrally regulate it.
I also liked Jakub's idea of putting a taint in for things that were
likely to have an impact on support and debug; I included that concept
in fwctl.
> > > Effects Log". In that "trust Command Effects" scenario the kernel still
> > > has no idea what the command is actually doing, but it can at least
> > > assert that the device does not claim that the command changes the
> > > contents of system-memory. Now, you might say, "the device can just
> > > lie", but that betrays a conceit of the kernel restriction. A device
> > > could lie that a Linux wrapped command when passed certain payloads does
> > > not in turn proxy to a restricted command.
> >
> > Yeah, we have to trust the device. If the device is hostile toward the
> > OS then there are already big problems. We need to allow for
> > unintentional defects in the devices, but we don't need to be
> > paranoid.
> >
> > IMHO a command effects report, in conjunction with a robust OS centric
> > definition is something we can trust in.
>
> So this is where I want to start and see if we can bridge the trust gap.
>
> I am warming to your assertion that there is a wide array of
> vendor-specific configuration and debug that are not an efficient use of
> upstream's time to wrap in a shared Linux ABI. I want to explore fwctl
> for CXL for that use case, I personally don't want to marshal a Linux
> command to each vendor's slightly different backend CXL toggles.
Personally I think this idea to marshal/unmarshal everything in the
kernel is often misguided. If it is truly an obvious and actually shared
multi-vendor capability then by all means go and do it.
But if you are spending weeks/months fighting about uAPI because all
the vendors are so different that it isn't obvious what is "generic",
then you've probably already lost. The very worst outcome is a per-device
uAPI masquerading as an obfuscated "generic" uAPI that wasted ages of
people's time to argue out.
> At the same time, I also agree with the contention that a "do anything
> you want and get away with it" tunnel invites shenanigans from folks
> that may not care about the long term health of the Linux kernel vs
> their short term interests.
IMHO this is disproven by history. The above mstflint I linked to is
as old as mlx5 HW, it runs today over PCI config space and an OOT
driver. Where is the real damage to the long term health of Linux or
the ecosystem?
Like I said before, in my view there is a difference between DRM wanting a
Vulkan stack and doing some device specific
configuration/debugging. One has vastly more open source value than
the other.
> So my questions to try to understand the specific sticking points more
> are:
>
> 1/ Can you think of a Command Effect that the device could enumerate to
> address the specific shenanigans that netdev is worried about?
Nothing comes to mind..
> In other words if every command a device enables has the stated
> effect of "Configuration Change after Reset" does that cut out a
> significant portion of the concern?
Related to configuration - one of Saeed's original ideas was to
implement a devlink command to set the configurables in the flash in a
way that mlx5 could implement all of its options, ideally with
configurables discovered dynamically from the running device. This LPC
presentation was so aggressively rejected by Jakub that Saeed abandoned
it. In the discussion it was clear Jakub is requesting to review and
possibly reject every configurable.
On this topic, unfortunately, I don't see any technical middle ground
between "netdev is the gatekeeper for all FLASH configurables" and
"devices can be fully configured regardless of their design".
> 2/ About the "what if the device lies?" question. We can't revert code
> that used to work, but we can definitely work with enterprise distros to
> turn off fwctl where there is concern it may lead or is leading to
> shenanigans.
Security is the one place where Linus has tolerated userspace
regressions. In this specific case I documented (or at least that was
the intent) there would be regression consequences to breaking the
security rules. Commands can be retroactively restricted to higher CAP
levels and rejected from lockdown if the device attracts a CVE.
IMHO the ecosystem is strongly motivated to take security seriously these
days, so I am not so worried.
> So, document what each subsystem's stance towards fwctl is,
> like maybe a distro only wants fwctl to front publicly documented vendor
> commands, or maybe private vendor commands ok, but only with a
> constrained set of Command Effects (I potentially see CXL here).
I wouldn't say subsystem here, but technology. I think it is
reasonable that a CXL fwctl driver have some kconfig tunables like you
already have. This idea works a lot better if the underlying thing is
already standards based.
Linux subsystem isn't a meaningful concept for a multi-function device
like mlx5 and others.
Thanks,
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 14:18 ` Jakub Kicinski
@ 2024-06-06 14:48 ` Jason Gunthorpe
2024-06-06 15:05 ` Jakub Kicinski
2024-06-07 7:34 ` Jiri Pirko
1 sibling, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-06 14:48 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Ahern, Dan Williams, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Thu, Jun 06, 2024 at 07:18:11AM -0700, Jakub Kicinski wrote:
> An argument can be made that given somewhat mixed switchdev experience
> we should just stay out of the way and let that happen. But just make
> that argument then, instead of pretending the use of this API will be
> limited to custom very vendor specific things.
Huh?
At least mlx5 already has a very robust userspace competition to
switchdev using RDMA APIs, available in DPDK. This has long since been
done and is widely deployed.
I have no idea where you get this made up idea that fwctl is somehow
about dataplane SDKs. The accelerated networking industry long ago
moved past netdev in upstream, it is well known to everyone. There
is no trick here.
fwctl is not some scheme to sneak dataplane SDKs into the kernel, you
are just making stuff up.
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 14:41 ` Jason Gunthorpe
@ 2024-06-06 14:58 ` Jakub Kicinski
2024-06-06 17:24 ` Dan Williams
1 sibling, 0 replies; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-06 14:58 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Dan Williams, David Ahern, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Thu, 6 Jun 2024 11:41:02 -0300 Jason Gunthorpe wrote:
> In my view it is wrong to think the kernel is the only place we can
> make generic things or that allowing userspace to see the raw device
> interface immediately means fragmentation and chaos. The industry is
> more robust than that. Giving people working in userspace room to
> invent their own solutions is actually helpful to driving some
> commonality. There are already soft targets in the K8S that people
> need to fit into, if the first few steps are with abc/def tools and
> that brings us to an eventual true commonality, then great.
Yes, this is the core of our disagreement. And one which is quite hard
to resolve with technical arguments.
I believe the kernel may not be a great place to keep all the controls,
but it is in my opinion the most healthy open source project among
the available options. You mention K8S, but I'd give SONiC (the NOS)
as a more relevant example. A hyperscaler or another trillion dollar
company can certainly have a swing at creating other open layers of
commonality. Together with its other trillion dollar friends.
Removing the minor inconvenience of having to ship an out of tree
module for out of tree tools is not worth the loss.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 14:48 ` Jason Gunthorpe
@ 2024-06-06 15:05 ` Jakub Kicinski
2024-06-06 17:47 ` David Ahern
0 siblings, 1 reply; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-06 15:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: David Ahern, Dan Williams, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Thu, 6 Jun 2024 11:48:18 -0300 Jason Gunthorpe wrote:
> > An argument can be made that given somewhat mixed switchdev experience
> > we should just stay out of the way and let that happen. But just make
> > that argument then, instead of pretending the use of this API will be
> > limited to custom very vendor specific things.
>
> Huh?
I'm sorry, David has been working in netdev for a long time.
I have a tendency to address the person I'm replying to,
assuming their level of understanding of the problem space,
which makes it harder to understand for bystanders.
> At least mlx5 already has a very robust userspace competition to
> switchdev using RDMA APIs, available in DPDK. This has long since been
> done and is widely deployed.
Yeah, we had this discussion multiple times.
> I have no idea where you get this made up idea that fwctl is somehow
> about dataplane SDKs. The accelerated networking industry long ago
> moved past netdev in upstream, it is well known to everyone. There
> is no trick here.
>
> fwctl is not some scheme to sneak dataplane SDKs into the kernel, you
> are just making stuff up.
By dataplane SDK you mean DOCA? I don't even want to go there.
I just meant forwarding offload _which I said_. You didn't understand
and now you're accusing me of "making stuff up".
This whole conversation is such a damn waste of time.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device
2024-06-06 13:34 ` Jonathan Cameron
@ 2024-06-06 15:37 ` Randy Dunlap
0 siblings, 0 replies; 73+ messages in thread
From: Randy Dunlap @ 2024-06-06 15:37 UTC (permalink / raw)
To: Jonathan Cameron, Jason Gunthorpe
Cc: Leon Romanovsky, Zhu Yanjun, Jonathan Corbet, Itay Avraham,
Jakub Kicinski, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Dan Williams, David Ahern, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, linux-cxl, patches, Peter Zijlstra, Julia Lawall
On 6/6/24 6:34 AM, Jonathan Cameron wrote:
> On Wed, 5 Jun 2024 15:27:26 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
>> On Wed, Jun 05, 2024 at 12:07:37PM +0100, Jonathan Cameron wrote:
>>
>>>> I don't recall that dramatic conclusion in the discussion, but it does
>>>> make a lot of sense to me.
>>>
>>> I'll be less lazy (and today found the search foo to track it down).
>>>
>>> https://lore.kernel.org/all/CAHk-=wicfvWPuRVDG5R1mZSxD8Xg=-0nLOiHay2T_UJ0yDX42g@mail.gmail.com/
>>
>> Oh that is a bit different discussion than I was thinking of.. I fixed
>> all the cases to follow this advice and checked that all the free
>> functions are proper pairs of whatever is being allocated.
>
> Yes. I think we are approaching the point where maybe we need
> a 'best practice guide' somewhere. It is sort of coding style, but
> it is perhaps rather too complex to put in that doc.
>
> I'm happy to help review such changes, but it would be too far down
> my todo list if I took on writing one.
>
> Maybe there is one I've missed?
There is not a published one that I know of, other than one that I pasted into
an email in Dec-2023, in this post:
https://lore.kernel.org/lkml/34e27c57-fc18-4918-bf44-4f8a53825361@infradead.org/
--
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 14:41 ` Jason Gunthorpe
2024-06-06 14:58 ` Jakub Kicinski
@ 2024-06-06 17:24 ` Dan Williams
2024-06-07 0:25 ` Jason Gunthorpe
1 sibling, 1 reply; 73+ messages in thread
From: Dan Williams @ 2024-06-06 17:24 UTC (permalink / raw)
To: Jason Gunthorpe, Dan Williams
Cc: Jakub Kicinski, David Ahern, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
Jason Gunthorpe wrote:
[..]
> > I am warming to your assertion that there is a wide array of
> > vendor-specific configuration and debug that are not an efficient use of
> > upstream's time to wrap in a shared Linux ABI. I want to explore fwctl
> > for CXL for that use case, I personally don't want to marshal a Linux
> > command to each vendor's slightly different backend CXL toggles.
>
> Personally I think this idea to marshal/unmarshal everything in the
> kernel is often misguided. If it is truly an obvious and actually shared
> multi-vendor capability then by all means go and do it.
>
> But if you are spending weeks/months fighting about uAPI because all
> the vendors are so different that it isn't obvious what is "generic",
> then you've probably already lost. The very worst outcome is a per-device
> uAPI masquerading as an obfuscated "generic" uAPI that wasted ages of
> people's time to argue out.
Certainly once you have gotten to the "months of arguing" point it begs the
question "was there really any generic benefit to reap in the first
place?"
That said, *some* grappling, especially when multiple vendors hit the
list with a similar feature at the same time, has yielded
collaboration in the past. So I might be a few rungs back on the
spectrum from where you are, but I concede that yes, there is a point of
diminishing to negative returns.
> > At the same time, I also agree with the contention that a "do anything
> > you want and get away with it" tunnel invites shenanigans from folks
> > that may not care about the long term health of the Linux kernel vs
> > their short term interests.
>
> IMHO this is disproven by history. The above mstflint I linked to is
> as old as mlx5 HW, it runs today over PCI config space and an OOT
> driver. Where is the real damage to the long term health of Linux or
> the ecosystem?
>
> Like I said before I view there is a difference between DRM wanting a
> Vulkan stack and doing some device specific
> configuration/debugging. One has vastly more open source value than
> the other.
Fair.
> > So my questions to try to understand the specific sticking points more
> > are:
> >
> > 1/ Can you think of a Command Effect that the device could enumerate to
> > address the specific shenanigans that netdev is worried about?
>
> Nothing comes to mind..
Ugh, that indeed seems too severe.
> > In other words if every command a device enables has the stated
> > effect of "Configuration Change after Reset" does that cut out a
> > significant portion of the concern?
>
> Related to configuration - one of Saeed's original ideas was to
> way that mlx5 could implement all of its options, ideally with
> configurables discovered dynamically from the running device. This LPC
> presentation was so aggressively rejected by Jakub that Saeed abandoned
> it. In the discussion it was clear Jakub is requesting to review and
> possibly reject every configurable.
> between "netdev is the gatekeeper for all FLASH configurables" and
> "devices can be fully configured regardless of their design".
This gets back to the unspoken conceit of the kernel restriction that I
mentioned earlier. At some point the kernel restriction begets a cynical
in-tree workaround or an out-of-tree workaround which either way means
upstream Linux loses.
> > 2/ About the "what if the device lies?" question. We can't revert code
> > that used to work, but we can definitely work with enterprise distros to
> > turn off fwctl where there is concern it may lead or is leading to
> > shenanigans.
>
> Security is the one place where Linus has tolerated userspace
> regressions. In this specific case I documented (or at least that was
> the intent) there would be regression consequences to breaking the
> security rules. Commands can be retroactively restricted to higher CAP
> levels and rejected from lockdown if the device attracts a CVE.
>
> IMHO the ecosystem is strongly motivated to do security seriously these
> days, I am not so worried.
That is a good point, if a Command Effect gets tied to a CVE, or a
cynical workaround gets tied to a CVE, both of those demand an upstream
and distro response.
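The retroactive restriction discussed above can be sketched as a small user-space model in C. This is only an illustration under stated assumptions: the four scope names come from the cover letter, but their numeric ordering, the `fwctl_cmd_policy` structure, its `opcode` and `min_scope` fields, and all three function names are hypothetical and are not taken from the patch series.

```c
#include <stdbool.h>

/* The four RPC scopes from the cover letter; the ordering is an assumption. */
enum fwctl_rpc_scope {
	FWCTL_RPC_CONFIGURATION = 0,
	FWCTL_RPC_DEBUG_READ_ONLY = 1,
	FWCTL_RPC_DEBUG_WRITE = 2,
	FWCTL_RPC_DEBUG_WRITE_FULL = 3,
};

/* Hypothetical per-command policy entry a driver might keep. */
struct fwctl_cmd_policy {
	unsigned int opcode;            /* made-up device command identifier */
	enum fwctl_rpc_scope min_scope; /* lowest scope allowed to issue it */
};

/* Per the cover letter, the two write scopes taint the kernel. */
static bool fwctl_scope_taints(enum fwctl_rpc_scope scope)
{
	return scope >= FWCTL_RPC_DEBUG_WRITE;
}

/*
 * Decide whether a caller requesting 'scope' may issue the command.
 * WRITE_FULL needs CAP_SYS_RAWIO and is never available under lockdown.
 */
static bool fwctl_cmd_allowed(const struct fwctl_cmd_policy *cmd,
			      enum fwctl_rpc_scope scope,
			      bool has_cap_sys_rawio, bool lockdown)
{
	if (scope == FWCTL_RPC_DEBUG_WRITE_FULL &&
	    (!has_cap_sys_rawio || lockdown))
		return false;
	return scope >= cmd->min_scope;
}

/* After a CVE, raise the floor so that lower scopes are rejected. */
static void fwctl_cmd_restrict(struct fwctl_cmd_policy *cmd,
			       enum fwctl_rpc_scope new_min)
{
	if (new_min > cmd->min_scope)
		cmd->min_scope = new_min;
}
```

In this model, a command that was reachable as FWCTL_RPC_CONFIGURATION stops working for lockdown users once fwctl_cmd_restrict() raises it to FWCTL_RPC_DEBUG_WRITE_FULL, which is the kind of security-driven regression the thread says would be tolerated.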
> > So, document what each subsystem's stance towards fwctl is,
> > like maybe a distro only wants fwctl to front publicly documented vendor
> > commands, or maybe private vendor commands ok, but only with a
> > constrained set of Command Effects (I potentially see CXL here).
>
> I wouldn't say subsystem here, but technology. I think it is
> reasonable that a CXL fwctl driver have some kconfig tunables like you
> already have. This idea works a lot better if the underlying thing is
> already standards based.
True, I worry about these technologies that cross upstream maintainer
boundaries. When you have a composable switch that enables net, block,
and/or mem use cases, which upstream maintainer policy applies to the
fwctl posture of that thing?
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 15:05 ` Jakub Kicinski
@ 2024-06-06 17:47 ` David Ahern
2024-06-07 6:48 ` Jiri Pirko
0 siblings, 1 reply; 73+ messages in thread
From: David Ahern @ 2024-06-06 17:47 UTC (permalink / raw)
To: Jakub Kicinski, Jason Gunthorpe
Cc: Dan Williams, Jonathan Corbet, Itay Avraham, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Christoph Hellwig,
Jiri Pirko, Leonid Bloch, Leon Romanovsky, linux-cxl, patches
On 6/6/24 9:05 AM, Jakub Kicinski wrote:
> On Thu, 6 Jun 2024 11:48:18 -0300 Jason Gunthorpe wrote:
>>> An argument can be made that given somewhat mixed switchdev experience
>>> we should just stay out of the way and let that happen. But just make
>>> that argument then, instead of pretending the use of this API will be
>>> limited to custom very vendor specific things.
>>
>> Huh?
>
> I'm sorry, David has been working in netdev for a long time.
And I will continue working on the Linux networking stack (netdev) while I
also work with the IB S/W stack, fwctl, and any other part of Linux
relevant to my job. I am not going to pick a silo (and should not be
required to).
> I have a tendency to address the person I'm replying to,
> assuming their level of understanding of the problem space.
> Which makes it harder to understand for bystanders.
>
>> At least mlx5 already has a very robust userspace competition to
>> switchdev using RDMA APIs, available in DPDK. This has long since been
>> done and is widely deployed.
>
> Yeah, we had this discussion multiple times
The switchdev / sonic comparison came to mind as well during this
thread. The existence of a kernel way (switchdev) has not stopped sonic
(userspace SDK) from gaining traction. In some cases the SDK is required
for device features that do not have a kernel uapi or vendors refuse to
offer a kernel way, so it is the only option.
The bottom line to me is that these hardline, dogmatic approaches -
resisting the recognition of reality - are only harming users. There is a
middle ground, open source drivers and tools that offer more flexibility.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 8:50 ` Leon Romanovsky
@ 2024-06-06 22:11 ` Dan Williams
2024-06-07 0:02 ` Jason Gunthorpe
2024-06-07 13:12 ` Leon Romanovsky
0 siblings, 2 replies; 73+ messages in thread
From: Dan Williams @ 2024-06-06 22:11 UTC (permalink / raw)
To: Leon Romanovsky, Dan Williams
Cc: Jason Gunthorpe, Jakub Kicinski, David Ahern, Jonathan Corbet,
Itay Avraham, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, linux-cxl, patches
Leon Romanovsky wrote:
> On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote:
> > Jason Gunthorpe wrote:
>
> <...>
>
> > So my questions to try to understand the specific sticking points more
> > are:
> >
> > 1/ Can you think of a Command Effect that the device could enumerate to
> > address the specific shenanigans that netdev is worried about? In other
> > words if every command a device enables has the stated effect of
> > "Configuration Change after Reset" does that cut out a significant
> > portion of the concern?
>
> It will prevent SR-IOV devices (or more accurately their VFs)
> from being configured through fwctl, as they are destroyed in HW
> during reboot.
Right, but between zero configurability and losing live SR-IOV
configurability is there still value? Note, this is just a thought
experiment on what if any Command Effects Linux can comfortably tolerate
vs those that start to be more spicy and dip into removing stimulus /
focus on the commons, or otherwise injuring collaboration.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 22:11 ` Dan Williams
@ 2024-06-07 0:02 ` Jason Gunthorpe
2024-06-07 13:12 ` Leon Romanovsky
1 sibling, 0 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-07 0:02 UTC (permalink / raw)
To: Dan Williams
Cc: Leon Romanovsky, Jakub Kicinski, David Ahern, Jonathan Corbet,
Itay Avraham, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, linux-cxl, patches
On Thu, Jun 06, 2024 at 03:11:21PM -0700, Dan Williams wrote:
> Leon Romanovsky wrote:
> > On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote:
> > > Jason Gunthorpe wrote:
> >
> > <...>
> >
> > > So my questions to try to understand the specific sticking points more
> > > are:
> > >
> > > 1/ Can you think of a Command Effect that the device could enumerate to
> > > address the specific shenanigans that netdev is worried about? In other
> > > words if every command a device enables has the stated effect of
> > > "Configuration Change after Reset" does that cut out a significant
> > > portion of the concern?
> >
> > It will prevent SR-IOV devices (or more accurately their VFs)
> > from being configured through fwctl, as they are destroyed in HW
> > during reboot.
>
> Right, but between zero configurability and losing live SR-IOV
> configurability is there still value? Note, this is just a thought
> experiment on what if any Command Effects Linux can comfortably tolerate
> vs those that start to be more spicy and dip into removing stimulus /
> focus on the commons, or otherwise injuring collaboration.
I like the idea of "takes effect on _function_ reset". VFs and PFs
both often have configuration that can become current once the function
is reset. A VF is usually reset by something like VFIO while a PF is
usually reset by a power cycle.
The fact configuration doesn't change until reset is, IMHO, a very
strong barrier from making some backdoor into a subsystem driver.
Jason
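The "takes effect on function reset" rule can be sketched as a simple filter over the effects a device declares for each command. This is a minimal illustration, not code from the series: the effect bit names and positions below are hypothetical (they are not the CXL Command Effects Log layout), and `effects_deferred_only()` is an invented helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical effect bits, made up for this illustration. */
#define EFFECT_CONFIG_AFTER_FUNCTION_RESET (1u << 0)
#define EFFECT_IMMEDIATE_CONFIG_CHANGE     (1u << 1)
#define EFFECT_IMMEDIATE_DATA_CHANGE       (1u << 2)

/* Every effect that would alter the running function right away. */
#define EFFECT_IMMEDIATE_MASK \
	(EFFECT_IMMEDIATE_CONFIG_CHANGE | EFFECT_IMMEDIATE_DATA_CHANGE)

/*
 * Allow a command only if none of its declared effects is immediate.
 * A command with no declared effects (for example a pure query) passes.
 */
static bool effects_deferred_only(uint32_t effects)
{
	return (effects & EFFECT_IMMEDIATE_MASK) == 0;
}
```

Under such a filter a command that only stages configuration for the next function reset is accepted, while one that also changes live state is refused, which matches the barrier argument made above.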
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 17:24 ` Dan Williams
@ 2024-06-07 0:25 ` Jason Gunthorpe
2024-06-07 10:47 ` Przemek Kitszel
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-07 0:25 UTC (permalink / raw)
To: Dan Williams
Cc: Jakub Kicinski, David Ahern, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On Thu, Jun 06, 2024 at 10:24:46AM -0700, Dan Williams wrote:
> Jason Gunthorpe wrote:
> [..]
> > > I am warming to your assertion that there is a wide array of
> > > vendor-specific configuration and debug that are not an efficient use of
> > > upstream's time to wrap in a shared Linux ABI. I want to explore fwctl
> > > for CXL for that use case, I personally don't want to marshal a Linux
> > > command to each vendor's slightly different backend CXL toggles.
> >
> > Personally I think this idea to marshal/unmarshal everything in the
> > kernel is often misguided. If it is truly obvious and actually shared
> > multi-vendor capability then by all means go and do it.
> >
> > But if you are spending weeks/months fighting about uAPI because all
> > the vendors are so different, it isn't obvious what is "generic" then
> > you've probably already lost. The very worst outcome is a per-device
> > uAPI masquerading as an obfuscated "generic" uAPI that wasted ages of
> > people's time to argue out.
>
> Certainly once you have gotten to the "months of arguing" point it begs the
> question "was there really any generic benefit to reap in the first
> place?"
Indeed, but I've seen, and participated in, these things many times :)
> That said, *some* grappling, especially when multiple vendors hit the
> list with a similar feature at the same time, has yielded
> collaboration in the past.
Absolutely! But we have also frequently done that retroactively, like
see three examples and then consolidate the common APIs. The challenge
is uAPI. Since we can't change uAPI people like to rush to make it
future proof without examples. Broadly I lean towards waiting until we
have several examples to build a standard uAPI and let the examples
evolve on their own.
If there is value in the commonality then people will change over.
> This gets back to the unspoken conceit of the kernel restriction that I
> mentioned earlier. At some point the kernel restriction begets a cynical
> in-tree workaround or an out-of-tree workaround which either way means
> upstream Linux loses.
Right.. The kernel just doesn't have the power to say no to the
industry. Things will just go OOT and it is really our community that
suffers in the long run. As I said, you can't lead with NO.
IMHO there has to be a really high quality reason to keep support for
HW that people have built out of the kernel. Especially startups and
other more vulnerable companies. I don't think Linux maintainers
should be choosing industry winners and losers. I sometimes feel I
have a minority opinion here though :(
> > > So, document what each subsystem's stance towards fwctl is,
> > > like maybe a distro only wants fwctl to front publicly documented vendor
> > > commands, or maybe private vendor commands ok, but only with a
> > > constrained set of Command Effects (I potentially see CXL here).
> >
> > I wouldn't say subsystem here, but technology. I think it is
> > reasonable that a CXL fwctl driver have some kconfig tunables like you
> > already have. This idea works a lot better if the underlying thing is
> > already standards based.
>
> True, I worry about these technologies that cross upstream maintainer
> boundaries. When you have a composable switch that enables net, block,
> and/or mem use cases, which upstream maintainer policy applies to the
> fwctl posture of that thing?
fwctl is intended to sit on its own. I think it is even a bad
architecture direction that Linux has N different ways to flash FW on
devices, N different ways to read diagnostics, etc all because each
subsystem went on its own. With fwctl I'd like to see a greater
consolidation of not re-inventing the low level of fw interaction
differently in each and every subsystem.
Like you mentioned CXL has its own way to program flash. How many ways
does Linux have to update device flash now? :(
So, if you have a real multi-function device fwctl should be the
central place to operate the shared PCI function and the FW
interface. There may be some duplication in subsystems, but that is a
side effect of our sub-system siloed development model (software
architecture tends to follow org chart, after all)
Jason
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 17:47 ` David Ahern
@ 2024-06-07 6:48 ` Jiri Pirko
2024-06-07 14:50 ` David Ahern
0 siblings, 1 reply; 73+ messages in thread
From: Jiri Pirko @ 2024-06-07 6:48 UTC (permalink / raw)
To: David Ahern
Cc: Jakub Kicinski, Jason Gunthorpe, Dan Williams, Jonathan Corbet,
Itay Avraham, Leon Romanovsky, linux-doc, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan, Andy Gospodarek,
Aron Silverton, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
Thu, Jun 06, 2024 at 07:47:20PM CEST, dsahern@kernel.org wrote:
>On 6/6/24 9:05 AM, Jakub Kicinski wrote:
>> On Thu, 6 Jun 2024 11:48:18 -0300 Jason Gunthorpe wrote:
>>>> An argument can be made that given somewhat mixed switchdev experience
>>>> we should just stay out of the way and let that happen. But just make
>>>> that argument then, instead of pretending the use of this API will be
>>>> limited to custom very vendor specific things.
>>>
>>> Huh?
>>
>> I'm sorry, David has been working in netdev for a long time.
>
>And I will continue working on the Linux networking stack (netdev) while I
>also work with the IB S/W stack, fwctl, and any other part of Linux
>relevant to my job. I am not going to pick a silo (and should not be
>required to).
>
>> I have a tendency to address the person I'm replying to,
>> assuming their level of understanding of the problem space.
>> Which makes it harder to understand for bystanders.
>>
>>> At least mlx5 already has a very robust userspace competition to
>>> switchdev using RDMA APIs, available in DPDK. This has long since been
>>> done and is widely deployed.
>>
>> Yeah, we had this discussion multiple times
>
>The switchdev / sonic comparison came to mind as well during this
>thread. The existence of a kernel way (switchdev) has not stopped sonic
>(userspace SDK) from gaining traction. In some cases the SDK is required
Is this discussion technical or political? I'm asking because it makes
a huge difference. There is no technical reason why sonic does not use
a proper in-kernel solution, from what I see.
Yes, they chose technically the wrong way, a shortcut, requiring kernel
bypass. Honestly for reasons that are beyond my understanding :/
>for device features that do not have a kernel uapi or vendors refuse to
>offer a kernel way, so it is the only option.
Political reasons.
>
>The bottom line to me is that these hardline, dogmatic approaches -
>resisting the recognition of reality - are only harming users. There is a
>middle ground, open source drivers and tools that offer more flexibility.
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 14:18 ` Jakub Kicinski
2024-06-06 14:48 ` Jason Gunthorpe
@ 2024-06-07 7:34 ` Jiri Pirko
2024-06-07 12:49 ` Andrew Lunn
1 sibling, 1 reply; 73+ messages in thread
From: Jiri Pirko @ 2024-06-07 7:34 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Ahern, Jason Gunthorpe, Dan Williams, Jonathan Corbet,
Itay Avraham, Leon Romanovsky, linux-doc, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan, Andy Gospodarek,
Aron Silverton, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
Thu, Jun 06, 2024 at 04:18:11PM CEST, kuba@kernel.org wrote:
>On Wed, 5 Jun 2024 20:35:49 -0600 David Ahern wrote:
>> Until a feature is standardized and/or commoditized, it does not make
>> sense to create a uapi for every H/W vendor whim.
>
>This is not about non-standard features. I work with multiple vendors
>as my day job. I ask them how to set basic link configuration and the
>support person gives me a link to the vendor tools! I wish I could show
>you the emails.
Even without seeing the emails, I believe you. Well, isn't it just natural? I
mean, it always takes a bigger (sometimes much bigger) effort to
implement things properly by introducing/extending APIs/uAPIs.
Implementing things in a vendor tool is easy, low-hanging fruit, and people
naturally pick it.
I've been around in netdev for the better part of a second decade.
I think, for the sake of discussion, it is worth mentioning that
a big part of netdev's success despite its complexity is that in the
past, any attempt at kernel bypass (I recall a few) was promptly rejected.
There was always a big push for a proper abstracted solution. And I believe
it helped a lot all over the place. Is this approach depleted?
I don't know, maybe. (And yes, I'm aware not everything could be done
this way).
I understand the reason and motivation for this patchset and what it
will solve, don't get me wrong. I kind of like it, it will help to
remove all the painful detours we currently have.
My concern is, it opens a Pandora's box for netdev *for sure*.
Is that desired and anticipated?
Do the gains outweigh the potential losses? Will it help the
ecosystem?
What is the motivation for a vendor to take the hard way of using a proper API
(even existing ones) afterwards?
Moreover, wouldn't this let vendors slip the leash and start
to introduce even more H/W vendor whims?
I think these are serious questions we need to ask before this is merged.
>
>> All of them are attempting to solve real problems; some of them will
>> stick. We know which features are valuable when customers use them,
>
>Yes, once customers deploy a feature implemented via a vendor API
>they will definitely migrate to a different API. Customers like risk
>and wasting their engineering resources reimplementing and redeploying
>things? And we have so much success moving users to new APIs in Linux!
>
>> ask for them and other vendors copy them. Until then it is a 1-off by
>> a vendor basically proposing a solution.
>
>Certainly. Because... who exactly will ask the second vendor to
>implement the common API?
>
>And the second vendor will most certainly not mind the extra delay and
>inconvenience having their product shipped via the publicly reviewed,
>and slow to deploy kernel, while the first one is happily selling
>the same feature already.
>
>> Not all ideas are good ideas, and we do not need the burden of a uapi
>> or the burden of out of tree drivers.
>
>This API gives user space SDKs a trivial way of implementing all
>switching, routing, filtering, QoS offloads etc.
>An argument can be made that given somewhat mixed switchdev experience
Can you elaborate a bit more on what you mean by "mixed switchdev
experience" please?
>we should just stay out of the way and let that happen. But just make
>that argument then, instead of pretending the use of this API will be
>limited to custom very vendor specific things.
>
>Again, if someone needs this to ship their custom CXL/Infiniband
>AI fabric magic, which is un-interoperable by design -- none of
>my concern. But keep TCP/IP networking out of this :|
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-07 0:25 ` Jason Gunthorpe
@ 2024-06-07 10:47 ` Przemek Kitszel
0 siblings, 0 replies; 73+ messages in thread
From: Przemek Kitszel @ 2024-06-07 10:47 UTC (permalink / raw)
To: Jason Gunthorpe, Dan Williams
Cc: Jakub Kicinski, David Ahern, Jonathan Corbet, Itay Avraham,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On 6/7/24 02:25, Jason Gunthorpe wrote:
> On Thu, Jun 06, 2024 at 10:24:46AM -0700, Dan Williams wrote:
>> Jason Gunthorpe wrote:
>> [..]
>>>> I am warming to your assertion that there is a wide array of
>>>> vendor-specific configuration and debug that are not an efficient use of
>>>> upstream's time to wrap in a shared Linux ABI. I want to explore fwctl
>>>> for CXL for that use case, I personally don't want to marshal a Linux
>>>> command to each vendor's slightly different backend CXL toggles.
>>>
>>> Personally I think this idea to marshal/unmarshal everything in the
>>> kernel is often misguided. If it is truly obvious and actually shared
>>> multi-vendor capability then by all means go and do it.
>>>
>>> But if you are spending weeks/months fighting about uAPI because all
>>> the vendors are so different, it isn't obvious what is "generic" then
>>> you've probably already lost. The very worst outcome is a per-device
>>> uAPI masquerading as an obfuscated "generic" uAPI that wasted ages of
>>> people's time to argue out.
>>
>> Certainly once you have gotten to the "months of arguing" point it begs the
>> question "was there really any generic benefit to reap in the first
>> place?"
>
> Indeed, but I've seen, and participated in, these things many times :)
>
>> That said, *some* grappling, especially when multiple vendors hit the
>> list with a similar feature at the same time, has yielded
>> collaboration in the past.
>
> Absolutely! But we have also frequently done that retroactively, like
> see three examples and then consolidate the common APIs. The challenge
> is uAPI. Since we can't change uAPI people like to rush to make it
> future proof without examples. Broadly I lean towards waiting until we
> have several examples to build a standard uAPI and let the examples
> evolve on their own.
>
> If there is value in the commonality then people will change over.
What has changed over the decades is that Linux now has many more users than
implementations of a given tool.
I would love to see the uAPI barrier move closer to the user, so that
we would be free to refactor kernel APIs, given that "the system tool" would be
updated at the same time.
Obviously for a new uAPI that would (re)move the promise from the very
beginning.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-07 7:34 ` Jiri Pirko
@ 2024-06-07 12:49 ` Andrew Lunn
2024-06-07 13:34 ` Jiri Pirko
0 siblings, 1 reply; 73+ messages in thread
From: Andrew Lunn @ 2024-06-07 12:49 UTC (permalink / raw)
To: Jiri Pirko
Cc: Jakub Kicinski, David Ahern, Jason Gunthorpe, Dan Williams,
Jonathan Corbet, Itay Avraham, Leon Romanovsky, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, Leon Romanovsky, linux-cxl, patches
> >This API gives user space SDKs a trivial way of implementing all
> >switching, routing, filtering, QoS offloads etc.
> >An argument can be made that given somewhat mixed switchdev experience
>
> Can you elaborate a bit more on what you mean by "mixed switchdev
> experience" please?
I don't want to put words in Jakub's mouth but, in my opinion,
switchdev has been great for SoHo switches. We have over 100
supported, mostly implemented by the community, but some vendors also
supporting their own hardware.
We have two enterprise switch families supported, each by its own
vendor. And we have one TOR switch family supported by the vendor.
So I would say switchdev has worked out great for SoHo, but kernel
bypass is still the norm for most things bigger than SoHo.
Why? My guess is, a product with a SoHo switch is not actually a
switch. It is a wifi box, with a switch. It is a cable modem, with a
switch. It is an inflight entertainment system, with a switch, etc.
It is much easier to build such multi-purpose systems when everything
is nicely integrated into the kernel, you don't have to fight with
multiple vendors supplying SDKs which only work on a disjoint set of
kernels, etc.
For bigger, single purpose devices, where it is just a switch, there is less
inconvenience in using just one vendor SDK, on top of the vendor
prescribed kernel.
Andrew
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-06 22:11 ` Dan Williams
2024-06-07 0:02 ` Jason Gunthorpe
@ 2024-06-07 13:12 ` Leon Romanovsky
1 sibling, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2024-06-07 13:12 UTC (permalink / raw)
To: Dan Williams
Cc: Jason Gunthorpe, Jakub Kicinski, David Ahern, Jonathan Corbet,
Itay Avraham, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan, Andy Gospodarek, Aron Silverton,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, linux-cxl, patches
On Thu, Jun 06, 2024 at 03:11:21PM -0700, Dan Williams wrote:
> Leon Romanovsky wrote:
> > On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote:
> > > Jason Gunthorpe wrote:
> >
> > <...>
> >
> > > So my questions to try to understand the specific sticking points more
> > > are:
> > >
> > > 1/ Can you think of a Command Effect that the device could enumerate to
> > > address the specific shenanigans that netdev is worried about? In other
> > > words if every command a device enables has the stated effect of
> > > "Configuration Change after Reset" does that cut out a significant
> > > portion of the concern?
> >
> > It will prevent SR-IOV devices (or more accurately their VFs)
> > from being configured through fwctl, as they are destroyed in HW
> > during reboot.
>
> Right, but between zero configurability and losing live SR-IOV
> configurability is there still value?
For the users that are using SR-IOV, it is a big loss. It will require
them to use two tools now instead of one.
My point is that we need to try and find the best solution for the users
and not a "compromise variant" that will make everyone unhappy.
Thanks
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-07 12:49 ` Andrew Lunn
@ 2024-06-07 13:34 ` Jiri Pirko
2024-06-08 1:43 ` Jakub Kicinski
0 siblings, 1 reply; 73+ messages in thread
From: Jiri Pirko @ 2024-06-07 13:34 UTC (permalink / raw)
To: Andrew Lunn
Cc: Jakub Kicinski, David Ahern, Jason Gunthorpe, Dan Williams,
Jonathan Corbet, Itay Avraham, Leon Romanovsky, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, Leon Romanovsky, linux-cxl, patches
Fri, Jun 07, 2024 at 02:49:19PM CEST, andrew@lunn.ch wrote:
>> >This API gives user space SDKs a trivial way of implementing all
>> >switching, routing, filtering, QoS offloads etc.
>> >An argument can be made that given somewhat mixed switchdev experience
>>
>> Can you elaborate a bit more on what you mean by "mixed switchdev
>> experience" please?
>
>I don't want to put words in Jakub's mouth but, in my opinion,
>switchdev has been great for SoHo switches. We have over 100
>supported, mostly implemented by the community, but some vendors also
>supporting their own hardware.
>
>We have two enterprise switch families supported, each by its own
>vendor. And we have one TOR switch family supported by the vendor.
>
>So I would say switchdev has worked out great for SoHo, but kernel
>bypass is still the norm for most things bigger than SoHo.
>
>Why? My guess is, a product with a SoHo switch is not actually a
>switch. It is a wifi box, with a switch. It is a cable modem, with a
>switch. It is an inflight entertainment system, with a switch, etc.
>It is much easier to build such multi-purpose systems when everything
>is nicely integrated into the kernel, you don't have to fight with
>multiple vendors supplying SDKs which only work on a disjoint set of
>kernels, etc.
>
>For bigger, single purpose devices, where it is just a switch, there is less
>inconvenience in using just one vendor SDK, on top of the vendor
>prescribed kernel.
I'm aware of what you wrote and understand it. I just thought Jakub's
mixed experience is about the APIs more than the politics behind the vendors'
adoption process.
>
> Andrew
>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-07 6:48 ` Jiri Pirko
@ 2024-06-07 14:50 ` David Ahern
2024-06-07 15:14 ` Jason Gunthorpe
0 siblings, 1 reply; 73+ messages in thread
From: David Ahern @ 2024-06-07 14:50 UTC (permalink / raw)
To: Jiri Pirko
Cc: Jakub Kicinski, Jason Gunthorpe, Dan Williams, Jonathan Corbet,
Itay Avraham, Leon Romanovsky, linux-doc, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan, Andy Gospodarek,
Aron Silverton, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On 6/7/24 12:48 AM, Jiri Pirko wrote:
>> The switchdev / sonic comparison came to mind as well during this
>> thread. The existence of a kernel way (switchdev) has not stopped sonic
>> (userspace SDK) from gaining traction. In some cases the SDK is required
>
> Is this discussion technical or political? I'm asking because it makes
> a huge difference. There is no technical reason why sonic does not use
> a proper in-kernel solution, from what I see.
> Yes, they chose technically the wrong way, a shortcut, requiring kernel
> bypass. Honestly for reasons that are beyond my understanding :/
>
>
>> for device features that do not have a kernel uapi or vendors refuse to
>> offer a kernel way, so it is the only option.
>
> Political reasons.
>
You meant financial reasons, not political. The dominant player in
switches has zero interest in switchdev, zero interest in open sourcing
their SDK. Nothing has changed on that front in the 9 years of
switchdev's existence and no amount of 'NO' by maintainers is ever going
to pressure said vendor to do that.
Mellanox offers both with the Spectrum line and should have a pretty
good understanding of how many customers deploy with the SDK vs
switchdev. Why is that? There are those who think in logical, simple
designs (switchdev), and those who prefer complex, all userspace designs
with ping-ponging messages across processes (sonic). The latter uses all
kinds of what I call silly rationalizations, from "userspace allows more
flexibility", to "dealing with the kernel is too rigid", or "getting
changes in is too hard", or my favorite - "Linux does not scale".
The bottom line is that the SDK model is not going away. Period.
The networking stack has accepted kernel bypass compromises (xdp, xdp
sockets, OVS, a lot of the ebpf hooks, ... just examples) with the
rationale that more is brought into the Linux way. fwctl is a similar
effort - an attempt at bringing more into an open source driver and tooling.
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-07 14:50 ` David Ahern
@ 2024-06-07 15:14 ` Jason Gunthorpe
2024-06-07 15:50 ` Jiri Pirko
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-07 15:14 UTC (permalink / raw)
To: David Ahern
Cc: Jiri Pirko, Jakub Kicinski, Dan Williams, Jonathan Corbet,
Itay Avraham, Leon Romanovsky, linux-doc, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan, Andy Gospodarek,
Aron Silverton, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Fri, Jun 07, 2024 at 08:50:17AM -0600, David Ahern wrote:
> Mellanox offers both with the Spectrum line and should have a pretty
> good understanding of how many customers deploy with the SDK vs
> switchdev. Why is that?
We offer lots of options with mlx5 switching too, and switchdev is not
being selected by customers principally for performance reasons, in my
view.
The OVS space wants to operate the switch much like a firewall and
this creates a high rate of database updates and exception
packets. DPDK can operate all the same offload HW from userspace and
avoid all the system call and other kernel overhead. It is much more
purpose built to what OVS wants to do. In the >50Gbps space this
matters a lot and overall DPDK performance notably wins over switchdev
for many OVS workloads - even though the high speed path is
near-identical.
In this role DPDK is effectively a switch SDK, an open source one at
least.
Sadly I'm seeing signs that proprietary OVS focused SDKs (think
various P4 offerings and others) are out competing open DPDK on
merit :(
For whatever reason the market for switching is not strongly motivated
toward open SDKs, and the available open solutions are struggling a
bit to compete.
But to repeat again, fwctl is not for dataplane, it is not for
implementing a switch SDK (go use RDMA if you want to do that). I will
write here a commitment to accept patches blocking such usages if
drivers try to abuse the purpose of the subsystem.
Jason
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-07 15:14 ` Jason Gunthorpe
@ 2024-06-07 15:50 ` Jiri Pirko
2024-06-07 17:24 ` Jason Gunthorpe
0 siblings, 1 reply; 73+ messages in thread
From: Jiri Pirko @ 2024-06-07 15:50 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: David Ahern, Jakub Kicinski, Dan Williams, Jonathan Corbet,
Itay Avraham, Leon Romanovsky, linux-doc, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan, Andy Gospodarek,
Aron Silverton, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
Fri, Jun 07, 2024 at 05:14:51PM CEST, jgg@nvidia.com wrote:
>On Fri, Jun 07, 2024 at 08:50:17AM -0600, David Ahern wrote:
>
>> Mellanox offers both with the Spectrum line and should have a pretty
>> good understanding of how many customers deploy with the SDK vs
>> switchdev. Why is that?
>
>We offer lots of options with mlx5 switching too, and switchdev is not
>being selected by customers principally for performance reasons, in my
>view.
>
>The OVS space wants to operate the switch much like a firewall and
>this creates a high rate of database updates and exception
>packets. DPDK can operate all the same offload HW from userspace and
>avoid all the system call and other kernel overhead. It is much more
>purpose built to what OVS wants to do. In the >50Gbps space this
>matters a lot and overall DPDK performance notably wins over switchdev
>for many OVS workloads - even though the high speed path is
>near-identical.
>
>In this role DPDK is effectively a switch SDK, an open source one at
>least.
>
>Sadly I'm seeing signs that proprietary OVS focused SDKs (think
>various P4 offerings and others) are out competing open DPDK on
>merit :(
>
>For whatever reason the market for switching is not strongly motivated
>toward open SDKs, and the available open solutions are struggling a
>bit to compete.
>
>But to repeat again, fwctl is not for dataplane, it is not for
>implementing a switch SDK (go use RDMA if you want to do that). I will
switch sdk is all about control plane.
>write here a commitment to accept patches blocking such usages if
>drivers try to abuse the purpose of the subsystem.
>
>Jason
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-07 15:50 ` Jiri Pirko
@ 2024-06-07 17:24 ` Jason Gunthorpe
0 siblings, 0 replies; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-07 17:24 UTC (permalink / raw)
To: Jiri Pirko
Cc: David Ahern, Jakub Kicinski, Dan Williams, Jonathan Corbet,
Itay Avraham, Leon Romanovsky, linux-doc, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan, Andy Gospodarek,
Aron Silverton, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Fri, Jun 07, 2024 at 05:50:41PM +0200, Jiri Pirko wrote:
> >But to repeat again, fwctl is not for dataplane, it is not for
> >implementing a switch SDK (go use RDMA if you want to do that). I will
>
> switch sdk is all about control plane.
Ah, a poor term. I meant any involvement in the data flow of the
device including reaching into the so-called control plane of a switch
to manipulate data flow.
Jason
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-07 13:34 ` Jiri Pirko
@ 2024-06-08 1:43 ` Jakub Kicinski
0 siblings, 0 replies; 73+ messages in thread
From: Jakub Kicinski @ 2024-06-08 1:43 UTC (permalink / raw)
To: Jiri Pirko
Cc: Andrew Lunn, David Ahern, Jason Gunthorpe, Dan Williams,
Jonathan Corbet, Itay Avraham, Leon Romanovsky, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, Leon Romanovsky, linux-cxl, patches
On Fri, 7 Jun 2024 15:34:48 +0200 Jiri Pirko wrote:
> >For bigger, single purpose devices, it is just a switch, there is less
> >inconvenience of using just one vendor SDK, on top of the vendor
> >prescribed kernel.
>
> I'm aware of what you wrote and understand it. I just thought Jakub's
> mixed experience is about the APIs more than the politics behind vendors'
> adoption process.
Not the API / implementation, just that the adoption is limited.
The benefits of using a standard Linux approach are outweighed by
the large pool of talent with experience programming using the SDK
of *the* vendor.
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-05 13:59 ` Jason Gunthorpe
2024-06-06 2:35 ` David Ahern
2024-06-06 4:56 ` Dan Williams
@ 2024-06-11 15:36 ` Daniel Vetter
2024-06-11 16:17 ` Jason Gunthorpe
2 siblings, 1 reply; 73+ messages in thread
From: Daniel Vetter @ 2024-06-11 15:36 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Dan Williams, Jakub Kicinski, David Ahern, Jonathan Corbet,
Itay Avraham, Leon Romanovsky, linux-doc, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan, Andy Gospodarek,
Aron Silverton, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Wed, Jun 05, 2024 at 10:59:11AM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 04, 2024 at 04:56:57PM -0700, Dan Williams wrote:
> > * Introspection / validation: Subsystem community needs to be able to
> > audit behavior after the fact.
> >
> > To me this means even if the kernel is letting a command through based
> > on the stated Command Effect of "Configuration Change after Cold Reset"
> > upstream community has a need to be able to read the vendor
> > specification for that command. I.e. commands might be vendor-specific,
> > but never vendor-private. I see this as similar to the requirement for
> > open source userspace for sophisticated accelerators.
>
> I'm less hard on this. As long as reasonable open userspace exists I
> think it is fine to let other stuff through too. I can appreciate the
> DRM stance on this, but IMHO, there is meaningfully more value for open
> source in trying to get an open Vulkan implementation vs blocking users
> from reading their vendor'd diagnostic SI values.
>
> I don't think we should get into some kind of extremism and insist
> that every single bit must be documented/standardized or Linux won't
> support it.
I figured it might be useful to paint what we do in DRM with a bit more
nuance. In the principles, we're indeed fairly radical in what we require,
but in practice we aim for a much more pragmatic approach in what we
merge. There are two major axes here:
1. One is ecosystem maturity. One end is 3d, with vulkan as the clear
industry standard, and an upstream full-featured userspace driver in
mesa3d is the only technically reasonable choice. And all gpu vendors
agree and by this year even nvidia started hiring an upstream team. But
this didn't happen magically overnight, it took 1-2 decades of background
discussions and tactical push&pulling to get there.
The other end is currently AI accelerators. It's a complete mess, where
across the platform (client, edge, cloud), customer and vendor dimension
every point has a different stack. And the problem is so obvious that
everyone is working to fix this, which means currently
https://xkcd.com/927/ is happening in parallel. Just to get things going
we're accepting pretty much anything that's a notch above total garbage
for userspace and for merging into the kernel.
2. The other part is how much it impacts applications. If you can't run
the same application across different vendors, the case for an upstream
stack becomes a lot weaker. At the other end is infrastructure enabling
like device configuration, error handling and recovery, hw debugging and
reliability/health reporting. That's a lot more vendor specific in nature
and needs to be customized anyway per deployment. And only much higher in
the stack, maybe in k8s, can a technically reasonable unification even
happen. So again we're much more lenient about infrastructure enabling
and uapi than stuff applications will use directly.
Currently that's enough of a mess in drm that I feel like enforcing
something like fwctl is still too much. But maybe once fwctl is
established with other subsystems/devices we can start the conversations
with vendors to get this going a few years down the road.
Both together mean we land a lot of code that's questionable at best,
clear garbage at worst. But since we've been in the merging garbage
business just to get things going for decades, we've become pretty good at
dealing with the kernel internal and uapi fallout, some say too good. But
personally I don't think there's a path to where we are with 3d/vulkan
that doesn't go through years of this kind of suck, and very much merged
into upstream kind of suck.
For all the concerns about trusting vendors/devices to not abuse very broad
uapi interfaces: Modern accelerator command submission boils down to "run
this context at this $addr", and the kernel never ever directly sees
anything more fly by. That's the same interface you need for a no-op job
as a full blown AI workload, so in theory maximal abuse potential.
In practice, it doesn't seem to be an issue, at least not beyond the
intentionally pragmatic choices where we merge kernel code with known
sub-par/incomplete userspace. I'm not sure why, but to my knowledge all
attempts to break the spirit of our userspace rules while following the
letter die in vendor-internal discussions, at least for all the
established upstream driver teams.
And for new ones it takes years of private chats to get them going and
fully established in upstream anyway.
Maybe one reason we have a bit of an extremist reputation is that all the
public takes are the radical principled requirements, while the actual
pragmatic discussions mostly happen in private.
tldr; fwctl as I understand it feels like a bridge too far for drm today,
but I'd very much like someone else to make this happen so we could
eventually push towards adoption too.
Cheers, Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-11 15:36 ` Daniel Vetter
@ 2024-06-11 16:17 ` Jason Gunthorpe
2024-06-11 16:54 ` Daniel Vetter
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 16:17 UTC (permalink / raw)
To: Daniel Vetter
Cc: Dan Williams, Jakub Kicinski, David Ahern, Jonathan Corbet,
Itay Avraham, Leon Romanovsky, linux-doc, linux-rdma, netdev,
Paolo Abeni, Saeed Mahameed, Tariq Toukan, Andy Gospodarek,
Aron Silverton, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Tue, Jun 11, 2024 at 05:36:17PM +0200, Daniel Vetter wrote:
> reliability/health reporting. That's a lot more vendor specific in nature
> and needs to be customized anyway per deployment. And only much higher in
> the stack, maybe in k8s, can a technically reasonable unification even
> happen. So again we're much more lenient about infrastructure enabling
> and uapi than stuff applications will use directly.
To be clear, this is the specific niche fwctl is for. It is not for
GPU command submission or something like that, and as I said to Jiri I
would agree to aggressively block such abuses.
> Currently that's enough of a mess in drm that I feel like enforcing
> something like fwctl is still too much. But maybe once fwctl is
> established with other subsystems/devices we can start the conversations
> with vendors to get this going a few years down the road.
I wouldn't say enforcing, but instead of having every GPU driver build
their own weird vendor'd way to access their debug/diagnostic stuff
steer them into fwctl. These data center GPUs with FW at least have
lots of appropriate stuff and all the vendor OOT stuff has tooling to
inspect the GPUs far more than DRM has code for (i.e.
rocm-smi/nvidia-smi have some features that are potentially good
candidates for fwctl).
> In practice, it doesn't seem to be an issue, at least not beyond the
> intentionally pragmatic choices where we merge kernel code with known
> sub-par/incomplete userspace. I'm not sure why, but to my knowledge all
> attempts to break the spirit of our userspace rules while following the
> letter die in vendor-internal discussions, at least for all the
> established upstream driver teams.
I think the same is broadly true of RDMA as well, except we don't
bother with the kernel trying to police the command stream - direct
submission from userspace. I can't say it has been much of an issue.
> tldr; fwctl as I understand it feels like a bridge too far for drm today,
> but I'd very much like someone else to make this happen so we could
> eventually push towards adoption too.
Hahah, okay, well, I'm pushing :)
Jason
* Re: [PATCH 0/8] Introduce fwctl subystem
2024-06-11 16:17 ` Jason Gunthorpe
@ 2024-06-11 16:54 ` Daniel Vetter
0 siblings, 0 replies; 73+ messages in thread
From: Daniel Vetter @ 2024-06-11 16:54 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Daniel Vetter, Dan Williams, Jakub Kicinski, David Ahern,
Jonathan Corbet, Itay Avraham, Leon Romanovsky, linux-doc,
linux-rdma, netdev, Paolo Abeni, Saeed Mahameed, Tariq Toukan,
Andy Gospodarek, Aron Silverton, Christoph Hellwig, Jiri Pirko,
Leonid Bloch, Leon Romanovsky, linux-cxl, patches
On Tue, Jun 11, 2024 at 01:17:02PM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 11, 2024 at 05:36:17PM +0200, Daniel Vetter wrote:
> > reliability/health reporting. That's a lot more vendor specific in nature
> > and needs to be customized anyway per deployment. And only much higher in
> > the stack, maybe in k8s, can a technically reasonable unification even
> > happen. So again we're much more lenient about infrastructure enabling
> > and uapi than stuff applications will use directly.
>
> To be clear, this is the specific niche fwctl is for. It is not for
> GPU command submission or something like that, and as I said to Jiri I
> would agree to aggressively block such abuses.
>
> > Currently that's enough of a mess in drm that I feel like enforcing
> > something like fwctl is still too much. But maybe once fwctl is
> > established with other subsystems/devices we can start the conversations
> > with vendors to get this going a few years down the road.
>
> I wouldn't say enforcing, but instead of having every GPU driver build
> their own weird vendor'd way to access their debug/diagnostic stuff
> steer them into fwctl. These data center GPUs with FW at least have
> lots of appropriate stuff and all the vendor OOT stuff has tooling to
> inspect the GPUs far more than DRM has code for (i.e.
> rocm-smi/nvidia-smi have some features that are potentially good
> candidates for fwctl).
Yeah "enforcing" to the level we do with 3d/vulkan would be years down the
road, if ever. Very unlikely imo for debug/diagnostics/tuning stuff.
> > In practice, it doesn't seem to be an issue, at least not beyond the
> > intentionally pragmatic choices where we merge kernel code with known
> > sub-par/incomplete userspace. I'm not sure why, but to my knowledge all
> > attempts to break the spirit of our userspace rules while following the
> > letter die in vendor-internal discussions, at least for all the
> > established upstream driver teams.
>
> I think the same is broadly true of RDMA as well, except we don't
> bother with the kernel trying to police the command stream - direct
> submission from userspace. I can't say it has been much of an issue.
Maybe just a bit of confusion, but all modern-ish drm drivers stopped parsing
the command stream a while ago. We only ever did that to fill security
gaps, never to enforce any rules about what userspace is allowed to do
beyond that.
The rule that the open userspace needs to be complete, for some reasonably
pragmatic definition of "complete", is entirely a social contract. And I'm
not aware of any real issues with enforcing that beyond just trusting the
established vendor teams. So yeah no real issues with uabi that allows
maximal abuse because it's entirely unchecked by the kernel code.
Or put differently, I think we're trying to make the same point.
> > tldr; fwctl as I understand it feels like a bridge too far for drm today,
> > but I'd very much like someone else to make this happen so we could
> > eventually push towards adoption too.
>
> Hahah, okay, well, I'm pushing :)
Thanks :-)
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device
2024-06-03 15:53 ` [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
@ 2024-06-13 23:32 ` Dave Jiang
2024-06-13 23:40 ` Jason Gunthorpe
0 siblings, 1 reply; 73+ messages in thread
From: Dave Jiang @ 2024-06-13 23:32 UTC (permalink / raw)
To: Jason Gunthorpe, Jonathan Corbet, Itay Avraham, Jakub Kicinski,
Leon Romanovsky, linux-doc, linux-rdma, netdev, Paolo Abeni,
Saeed Mahameed, Tariq Toukan
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, David Ahern,
Christoph Hellwig, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
linux-cxl, patches
On 6/3/24 8:53 AM, Jason Gunthorpe wrote:
> Userspace will need to know some details about the fwctl interface being
> used to locate the correct userspace code to communicate with the
> kernel. Provide a simple device_type enum indicating what the kernel
> driver is.
>
> Allow the device to provide a device specific info struct that contains
> any additional information that the driver may need to provide to
> userspace.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/fwctl/main.c | 54 ++++++++++++++++++++++++++++++++++++++
> include/linux/fwctl.h | 8 ++++++
> include/uapi/fwctl/fwctl.h | 29 ++++++++++++++++++++
> 3 files changed, 91 insertions(+)
>
> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index 7ecdabdd9dcb1e..10e3f504893892 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c
> @@ -17,6 +17,8 @@ enum {
> static dev_t fwctl_dev;
> static DEFINE_IDA(fwctl_ida);
>
> +DEFINE_FREE(kfree_errptr, void *, if (!IS_ERR_OR_NULL(_T)) kfree(_T));
> +
> struct fwctl_ucmd {
> struct fwctl_uctx *uctx;
> void __user *ubuffer;
> @@ -24,8 +26,59 @@ struct fwctl_ucmd {
> u32 user_size;
> };
>
> +static int ucmd_respond(struct fwctl_ucmd *ucmd, size_t cmd_len)
> +{
> + if (copy_to_user(ucmd->ubuffer, ucmd->cmd,
> + min_t(size_t, ucmd->user_size, cmd_len)))
> + return -EFAULT;
> + return 0;
> +}
> +
> +static int copy_to_user_zero_pad(void __user *to, const void *from,
> + size_t from_len, size_t user_len)
> +{
> + size_t copy_len;
> +
> + copy_len = min(from_len, user_len);
> + if (copy_to_user(to, from, copy_len))
> + return -EFAULT;
> + if (copy_len < user_len) {
> + if (clear_user(to + copy_len, user_len - copy_len))
> + return -EFAULT;
> + }
> + return 0;
> +}
> +
> +static int fwctl_cmd_info(struct fwctl_ucmd *ucmd)
> +{
> + struct fwctl_device *fwctl = ucmd->uctx->fwctl;
> + struct fwctl_info *cmd = ucmd->cmd;
> + size_t driver_info_len = 0;
> +
> + if (cmd->flags)
> + return -EOPNOTSUPP;
> +
> + if (cmd->device_data_len) {
> + void *driver_info __free(kfree_errptr) = NULL;
> +
> + driver_info = fwctl->ops->info(ucmd->uctx, &driver_info_len);
Hi Jason,
Are you open to passing potential user input in for the info query? I'm
working on plumbing fwctl for CXL. The current CXL query command [1]
takes a number of commands as input for its ioctl. With the current
implementation of fwctl_cmd_info(), ->info() is called with no
information about the user buffer length and no input buffer. To make
things work I can just return everything on each ioctl call and let the
user sort it out by calling the ioctl twice: first with a u32-sized
buffer to find out the total number of commands, then with a larger
buffer for all the command info. I'm just trying to see if you are open
to something a bit cleaner than depending on a side effect of the ioctl
to retrieve all the information.
[1] https://elixir.bootlin.com/linux/v6.10-rc3/source/drivers/cxl/core/mbox.c#L526
DJ
> + if (IS_ERR(driver_info))
> + return PTR_ERR(driver_info);
> +
> + if (copy_to_user_zero_pad(u64_to_user_ptr(cmd->out_device_data),
> + driver_info, driver_info_len,
> + cmd->device_data_len))
> + return -EFAULT;
> + }
> +
> + cmd->out_device_type = fwctl->ops->device_type;
> + cmd->device_data_len = driver_info_len;
> + return ucmd_respond(ucmd, sizeof(*cmd));
> +}
> +
> /* On stack memory for the ioctl structs */
> union ucmd_buffer {
> + struct fwctl_info info;
> };
>
> struct fwctl_ioctl_op {
> @@ -45,6 +98,7 @@ struct fwctl_ioctl_op {
> .execute = _fn, \
> }
> static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
> + IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
> };
>
> static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> index 1d9651de92fc19..9a906b861acf3a 100644
> --- a/include/linux/fwctl.h
> +++ b/include/linux/fwctl.h
> @@ -7,12 +7,14 @@
> #include <linux/device.h>
> #include <linux/cdev.h>
> #include <linux/cleanup.h>
> +#include <uapi/fwctl/fwctl.h>
>
> struct fwctl_device;
> struct fwctl_uctx;
>
> /**
> * struct fwctl_ops - Driver provided operations
> + * @device_type: The drivers assigned device_type number. This is uABI
> * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
> * bytes of this memory will be a fwctl_uctx. The driver can use the
> * remaining bytes as its private memory.
> @@ -20,11 +22,17 @@ struct fwctl_uctx;
> * used.
> * @close_uctx: Called when the uctx is destroyed, usually when the FD is
> * closed.
> + * @info: Implement FWCTL_INFO. Return a kmalloc() memory that is copied to
> + * out_device_data. On input length indicates the size of the user buffer
> + * on output it indicates the size of the memory. The driver can ignore
> + * length on input, the core code will handle everything.
> */
> struct fwctl_ops {
> + enum fwctl_device_type device_type;
> size_t uctx_size;
> int (*open_uctx)(struct fwctl_uctx *uctx);
> void (*close_uctx)(struct fwctl_uctx *uctx);
> + void *(*info)(struct fwctl_uctx *uctx, size_t *length);
> };
>
> /**
> diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> index 0bdce95b6d69d9..39db9f09f8068e 100644
> --- a/include/uapi/fwctl/fwctl.h
> +++ b/include/uapi/fwctl/fwctl.h
> @@ -36,6 +36,35 @@
> */
> enum {
> FWCTL_CMD_BASE = 0,
> + FWCTL_CMD_INFO = 0,
> + FWCTL_CMD_RPC = 1,
> };
>
> +enum fwctl_device_type {
> + FWCTL_DEVICE_TYPE_ERROR = 0,
> +};
> +
> +/**
> + * struct fwctl_info - ioctl(FWCTL_INFO)
> + * @size: sizeof(struct fwctl_info)
> + * @flags: Must be 0
> + * @out_device_type: Returns the type of the device from enum fwctl_device_type
> + * @device_data_len: On input the length of the out_device_data memory. On
> + * output the size of the kernel's device_data which may be larger or
> + * smaller than the input. May be 0 on input.
> + * @out_device_data: Pointer to a memory of device_data_len bytes. Kernel will
> + * fill the entire memory, zeroing as required.
> + *
> + * Returns basic information about this fwctl instance, particularly what driver
> + * is being used to define the device_data format.
> + */
> +struct fwctl_info {
> + __u32 size;
> + __u32 flags;
> + __u32 out_device_type;
> + __u32 device_data_len;
> + __aligned_u64 out_device_data;
> +};
> +#define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)
> +
> #endif
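The length handling in the quoted patch can be modelled in userspace as a
rough sketch. This is not the kernel code: memcpy()/memset() stand in for
copy_to_user()/clear_user(), the model_* names and the 6-byte payload are
made up for illustration, and the error paths are left out. One detail it
mirrors is that, as the patch is written, ->info() is only called when
device_data_len is non-zero, so a zero-length probe reports a length of 0
and the two-call pattern described above needs a small non-zero buffer on
the first call.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for copy_to_user_zero_pad(): copy at most user_len
 * bytes and zero whatever remains of the caller's buffer.
 */
static void model_copy_zero_pad(void *to, const void *from,
				size_t from_len, size_t user_len)
{
	size_t copy_len = from_len < user_len ? from_len : user_len;

	memcpy(to, from, copy_len);
	if (copy_len < user_len)
		memset((char *)to + copy_len, 0, user_len - copy_len);
}

/* Hypothetical driver payload; a real driver defines its own layout. */
static const unsigned char model_driver_info[6] = { 1, 2, 3, 4, 5, 6 };

/*
 * Model of the FWCTL_INFO length contract: user_len is the input
 * device_data_len, the return value is the output device_data_len.
 * With user_len == 0 the driver is never asked, so 0 comes back.
 */
static uint32_t model_fwctl_info(void *buf, uint32_t user_len)
{
	uint32_t driver_len = 0;

	if (user_len) {
		driver_len = (uint32_t)sizeof(model_driver_info);
		model_copy_zero_pad(buf, model_driver_info, driver_len,
				    user_len);
	}
	return driver_len;
}

/*
 * Two-call pattern: probe with a small non-zero buffer to learn the
 * full length, then fetch everything. The caller frees the result.
 */
static void *model_query_all(uint32_t *len_out)
{
	uint32_t probe = 0;
	uint32_t need = model_fwctl_info(&probe, sizeof(probe));
	void *buf;

	*len_out = 0;
	if (!need)
		return NULL;
	buf = malloc(need);
	if (!buf)
		return NULL;
	*len_out = model_fwctl_info(buf, need);
	return buf;
}
```

A caller would use model_query_all(&len) and free() the returned buffer;
an oversized buffer passed to model_fwctl_info() comes back zero-padded
past the driver data.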
* Re: [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device
2024-06-13 23:32 ` Dave Jiang
@ 2024-06-13 23:40 ` Jason Gunthorpe
2024-06-14 16:37 ` Dave Jiang
0 siblings, 1 reply; 73+ messages in thread
From: Jason Gunthorpe @ 2024-06-13 23:40 UTC (permalink / raw)
To: Dave Jiang
Cc: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On Thu, Jun 13, 2024 at 04:32:44PM -0700, Dave Jiang wrote:
> Are you open to pass in potential user input for the info query? I'm
> working on plumbing fwctl for CXL.
Neat!
> The current CXL query command [1] takes a number of commands as
> input for its ioctl. For fwctl_cmd_info(), the current
> implementation is when ->info() is called no information about the
> user buffer length or an input buffer is provided.
Right, the purpose of info is to report information about the fwctl
driver. It is to allow the userspace to connect to the correct
userspace driver. It shouldn't be doing much with the device.
If you want to execute a info command *to the fw* then I'd expect
you'd execute the command through the normal RPC channel? Does
something prevent this?
This is how the mlx5 driver is working where there are many info
(called CAP) commands that return data, and they all run over the rpc
channel.
Thanks,
Jason
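Since FWCTL_INFO exists to let userspace pick the matching userspace
driver, the selection step might look roughly like the sketch below.
Everything named example_* is hypothetical, including the numeric device
types and tool names; the quoted patch only defines
FWCTL_DEVICE_TYPE_ERROR = 0. Device queries such as capability reads would
then go over the RPC channel rather than through this lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device_type values, for illustration only. */
enum example_device_type {
	EXAMPLE_DEVICE_TYPE_ERROR = 0,
	EXAMPLE_DEVICE_TYPE_VENDOR_A = 1,
	EXAMPLE_DEVICE_TYPE_VENDOR_B = 2,
};

/* One entry per userspace driver that knows a device_data format. */
struct example_user_driver {
	uint32_t device_type;
	const char *name;
};

static const struct example_user_driver example_drivers[] = {
	{ EXAMPLE_DEVICE_TYPE_VENDOR_A, "vendor_a_tool" },
	{ EXAMPLE_DEVICE_TYPE_VENDOR_B, "vendor_b_tool" },
};

/*
 * Map the out_device_type reported by FWCTL_INFO to a userspace driver.
 * Returns NULL for the error type and for types this build does not know.
 */
static const struct example_user_driver *
example_pick_driver(uint32_t out_device_type)
{
	size_t i;

	for (i = 0; i < sizeof(example_drivers) / sizeof(example_drivers[0]); i++) {
		if (example_drivers[i].device_type == out_device_type)
			return &example_drivers[i];
	}
	return NULL;
}
```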
* Re: [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device
2024-06-13 23:40 ` Jason Gunthorpe
@ 2024-06-14 16:37 ` Dave Jiang
0 siblings, 0 replies; 73+ messages in thread
From: Dave Jiang @ 2024-06-14 16:37 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jonathan Corbet, Itay Avraham, Jakub Kicinski, Leon Romanovsky,
linux-doc, linux-rdma, netdev, Paolo Abeni, Saeed Mahameed,
Tariq Toukan, Andy Gospodarek, Aron Silverton, Dan Williams,
David Ahern, Christoph Hellwig, Jiri Pirko, Leonid Bloch,
Leon Romanovsky, linux-cxl, patches
On 6/13/24 4:40 PM, Jason Gunthorpe wrote:
> On Thu, Jun 13, 2024 at 04:32:44PM -0700, Dave Jiang wrote:
>
>> Are you open to pass in potential user input for the info query? I'm
>> working on plumbing fwctl for CXL.
>
> Neat!
>
>> The current CXL query command [1] takes a number of commands as
>> input for its ioctl. For fwctl_cmd_info(), the current
>> implementation is when ->info() is called no information about the
>> user buffer length or an input buffer is provided.
>
> Right, the purpose of info is to report information about the fwctl
> driver. It is to allow the userspace to connect to the correct
> userspace driver. It shouldn't be doing much with the device.
>
> If you want to execute a info command *to the fw* then I'd expect
> you'd execute the command through the normal RPC channel? Does
> something prevent this?
Ok that makes sense. I should be able to do it through RPC with some tweaks.
>
> This is how the mlx5 driver is working where there are many info
> (called CAP) commands that return data, and they all run over the rpc
> channel.
>
> Thanks,
> Jason
end of thread, other threads:[~2024-06-14 16:42 UTC | newest]
Thread overview: 73+ messages
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2024-06-04 9:32 ` Leon Romanovsky
2024-06-04 15:50 ` Jason Gunthorpe
2024-06-04 17:05 ` Jonathan Cameron
2024-06-04 18:52 ` Jason Gunthorpe
2024-06-05 11:08 ` Jonathan Cameron
2024-06-04 16:42 ` Randy Dunlap
2024-06-04 16:44 ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
2024-06-04 12:16 ` Zhu Yanjun
2024-06-04 12:22 ` Leon Romanovsky
2024-06-04 16:50 ` Jonathan Cameron
2024-06-04 16:58 ` Jason Gunthorpe
2024-06-05 11:07 ` Jonathan Cameron
2024-06-05 18:27 ` Jason Gunthorpe
2024-06-06 13:34 ` Jonathan Cameron
2024-06-06 15:37 ` Randy Dunlap
2024-06-05 15:42 ` Przemek Kitszel
2024-06-05 15:49 ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
2024-06-13 23:32 ` Dave Jiang
2024-06-13 23:40 ` Jason Gunthorpe
2024-06-14 16:37 ` Dave Jiang
2024-06-03 15:53 ` [PATCH 4/8] taint: Add TAINT_FWCTL Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 5/8] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 6/8] fwctl: Add documentation Jason Gunthorpe
2024-06-05 2:31 ` Randy Dunlap
2024-06-05 16:03 ` Jason Gunthorpe
2024-06-05 20:14 ` Randy Dunlap
2024-06-03 15:53 ` [PATCH 7/8] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 8/8] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
2024-06-03 18:42 ` [PATCH 0/8] Introduce fwctl subystem Jakub Kicinski
2024-06-04 3:01 ` David Ahern
2024-06-04 14:04 ` Jakub Kicinski
2024-06-04 21:28 ` Saeed Mahameed
2024-06-04 22:32 ` Jakub Kicinski
2024-06-05 14:50 ` Jason Gunthorpe
2024-06-05 15:41 ` Jakub Kicinski
2024-06-04 23:56 ` Dan Williams
2024-06-05 3:05 ` Jakub Kicinski
2024-06-05 11:19 ` Jonathan Cameron
2024-06-05 13:59 ` Jason Gunthorpe
2024-06-06 2:35 ` David Ahern
2024-06-06 14:18 ` Jakub Kicinski
2024-06-06 14:48 ` Jason Gunthorpe
2024-06-06 15:05 ` Jakub Kicinski
2024-06-06 17:47 ` David Ahern
2024-06-07 6:48 ` Jiri Pirko
2024-06-07 14:50 ` David Ahern
2024-06-07 15:14 ` Jason Gunthorpe
2024-06-07 15:50 ` Jiri Pirko
2024-06-07 17:24 ` Jason Gunthorpe
2024-06-07 7:34 ` Jiri Pirko
2024-06-07 12:49 ` Andrew Lunn
2024-06-07 13:34 ` Jiri Pirko
2024-06-08 1:43 ` Jakub Kicinski
2024-06-06 4:56 ` Dan Williams
2024-06-06 8:50 ` Leon Romanovsky
2024-06-06 22:11 ` Dan Williams
2024-06-07 0:02 ` Jason Gunthorpe
2024-06-07 13:12 ` Leon Romanovsky
2024-06-06 14:41 ` Jason Gunthorpe
2024-06-06 14:58 ` Jakub Kicinski
2024-06-06 17:24 ` Dan Williams
2024-06-07 0:25 ` Jason Gunthorpe
2024-06-07 10:47 ` Przemek Kitszel
2024-06-11 15:36 ` Daniel Vetter
2024-06-11 16:17 ` Jason Gunthorpe
2024-06-11 16:54 ` Daniel Vetter
2024-06-06 1:58 ` David Ahern
2024-06-05 3:11 ` Jakub Kicinski
2024-06-05 12:06 ` Jason Gunthorpe