linux-rdma.vger.kernel.org archive mirror
* [PATCH v3 00/10] Introduce fwctl subsystem
@ 2024-08-21 18:10 Jason Gunthorpe
  2024-08-21 18:10 ` [PATCH v3 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
                   ` (11 more replies)
  0 siblings, 12 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:10 UTC
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

fwctl is a new subsystem intended to bring some common rules and order to
the growing pattern of exposing a secure FW interface directly to
userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that expose a
device for datapath operations, fwctl is focused on debugging,
configuration and provisioning of the device. It will not have the
features, such as interrupt delivery, that are necessary to support a
datapath.

This concept is similar to the long-standing practice in the "HW" RAID
space of having a device-specific misc device to manage the RAID
controller FW. fwctl generalizes this notion of a companion debug and
management interface that goes along with a dataplane implemented in an
appropriate subsystem.

The need for this has reached a critical point as many users are moving to
run lockdown enabled kernels. Several existing devices have had long
standing tooling for management that relied on /sys/../resource0 or PCI
config space access which is not permitted in lockdown. A major point of
fwctl is to define and document the rules that a device must follow to
expose a lockdown compatible RPC.

Based on some discussion, fwctl splits the RPCs into four categories:

	FWCTL_RPC_CONFIGURATION
	FWCTL_RPC_DEBUG_READ_ONLY
	FWCTL_RPC_DEBUG_WRITE
	FWCTL_RPC_DEBUG_WRITE_FULL

Where the latter two trigger a new TAINT_FWCTL, and the final one requires
CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW
are responsible for restricting RPCs to the requested security scope,
while the core code handles the tainting and CAP checks.
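
The split of responsibilities described above can be modelled in a few
lines of plain C. This is only a userspace sketch, not code from the
series: the scope names come from the list above, but their numeric values,
struct scope_policy, scope_policy_for(), core_check_scope() and the choice
of -EPERM for a missing capability are all assumptions made for
illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Scope names from the cover letter; the numeric values are assumed. */
enum fwctl_rpc_scope {
	FWCTL_RPC_CONFIGURATION,
	FWCTL_RPC_DEBUG_READ_ONLY,
	FWCTL_RPC_DEBUG_WRITE,
	FWCTL_RPC_DEBUG_WRITE_FULL,
};

/* Hypothetical summary of what the core would enforce for one scope. */
struct scope_policy {
	bool taints_kernel;	/* would set TAINT_FWCTL */
	bool needs_rawio;	/* would require CAP_SYS_RAWIO */
};

static int scope_policy_for(enum fwctl_rpc_scope scope,
			    struct scope_policy *out)
{
	switch (scope) {
	case FWCTL_RPC_CONFIGURATION:
	case FWCTL_RPC_DEBUG_READ_ONLY:
		*out = (struct scope_policy){ false, false };
		return 0;
	case FWCTL_RPC_DEBUG_WRITE:
		*out = (struct scope_policy){ true, false };
		return 0;
	case FWCTL_RPC_DEBUG_WRITE_FULL:
		*out = (struct scope_policy){ true, true };
		return 0;
	}
	/* Unknown scope value */
	return -EOPNOTSUPP;
}

/*
 * Model of the core-side check: reject the RPC if the capability is
 * missing, otherwise report whether the kernel would be tainted. The
 * driver and FW would still have to restrict what the RPC can do.
 */
static int core_check_scope(enum fwctl_rpc_scope scope, bool has_rawio,
			    bool *taint)
{
	struct scope_policy p;
	int ret = scope_policy_for(scope, &p);

	if (ret)
		return ret;
	if (p.needs_rawio && !has_rawio)
		return -EPERM;
	*taint = p.taints_kernel;
	return 0;
}
```

Keeping the policy in one table-like function means a driver never has to
reimplement the taint or capability logic; it only sees RPCs that already
passed the core check.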

For details see the final patch which introduces the documentation.

This series incorporates a version of the mlx5ctl interface previously
proposed:
  https://lore.kernel.org/r/20240207072435.14182-1-saeed@kernel.org/

For this series the memory registration mechanism was removed, but I
expect it will come back.

It also includes the CXL fwctl driver series from Dave:
  https://lore.kernel.org/all/20240718213446.1750135-1-dave.jiang@intel.com/


This is still waiting on a third fwctl driver and on the CXL side to finish
some of its development. The GitHub branch has the necessary CXL precursor
patches.

There have been two LWN articles written discussing various aspects of
this proposal:

 https://lwn.net/Articles/955001/
 https://lwn.net/Articles/969383/

And a really giant ksummit thread:

 https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/

Several have expressed general support for this concept:

 Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
 Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org/
 Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
 Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org/
 NVIDIA Networking
 Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
 Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
 SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com

Work is ongoing for a robust multi-device open source userspace. Currently
the mlx5ctl_user that was posted by Saeed has been updated to use fwctl.

  https://github.com/saeedtx/mlx5ctl.git
  https://github.com/jgunthorpe/mlx5ctl.git

This is on github: https://github.com/jgunthorpe/linux/commits/fwctl

v3:
 - Rebase to v6.11-rc4
 - Add a squashed version of David's CXL series as the 2nd driver
 - Add missing includes
 - Improve comments based on feedback
 - Use the kdoc format that puts the member docs inside the struct
 - Rewrite fwctl_alloc_device() to be clearer
 - Incorporate all remarks for the documentation
v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com
 - Rebase to v6.10-rc5
 - Minor style changes
 - Follow the style consensus for the guard stuff
 - Documentation grammar/spelling
 - Add missed length output for mlx5 get_info
 - Add two more missed MLX5 CMD's
 - Collect tags
v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com

Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Cc: Aron Silverton <aron.silverton@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Itay Avraham <itayavr@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Leonid Bloch <lbloch@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Dave Jiang (1):
  fwctl/cxl: Add driver for CXL mailbox for handling CXL features
    commands (RFC)

Jason Gunthorpe (7):
  fwctl: Add basic structure for a class subsystem with a cdev
  fwctl: Basic ioctl dispatch for the character device
  fwctl: FWCTL_INFO to return basic information about the device
  taint: Add TAINT_FWCTL
  fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  fwctl: Add documentation
  cxl: Create an auxiliary device for fwctl_cxl

Saeed Mahameed (2):
  fwctl/mlx5: Support for communicating with mlx5 fw
  mlx5: Create an auxiliary device for fwctl_mlx5

 Documentation/admin-guide/tainted-kernels.rst |   5 +
 Documentation/userspace-api/fwctl.rst         | 285 ++++++++++++
 Documentation/userspace-api/index.rst         |   1 +
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 MAINTAINERS                                   |  23 +
 drivers/Kconfig                               |   2 +
 drivers/Makefile                              |   1 +
 drivers/cxl/core/memdev.c                     |  19 +
 drivers/fwctl/Kconfig                         |  32 ++
 drivers/fwctl/Makefile                        |   6 +
 drivers/fwctl/cxl/Makefile                    |   4 +
 drivers/fwctl/cxl/cxl.c                       | 274 ++++++++++++
 drivers/fwctl/main.c                          | 414 ++++++++++++++++++
 drivers/fwctl/mlx5/Makefile                   |   4 +
 drivers/fwctl/mlx5/main.c                     | 337 ++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/dev.c |   8 +
 include/linux/cxl/mailbox.h                   | 104 +++++
 include/linux/fwctl.h                         | 135 ++++++
 include/linux/panic.h                         |   3 +-
 include/uapi/fwctl/cxl.h                      |  94 ++++
 include/uapi/fwctl/fwctl.h                    | 140 ++++++
 include/uapi/fwctl/mlx5.h                     |  36 ++
 kernel/panic.c                                |   1 +
 tools/debugging/kernel-chktaint               |   8 +
 24 files changed, 1936 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/userspace-api/fwctl.rst
 create mode 100644 drivers/fwctl/Kconfig
 create mode 100644 drivers/fwctl/Makefile
 create mode 100644 drivers/fwctl/cxl/Makefile
 create mode 100644 drivers/fwctl/cxl/cxl.c
 create mode 100644 drivers/fwctl/main.c
 create mode 100644 drivers/fwctl/mlx5/Makefile
 create mode 100644 drivers/fwctl/mlx5/main.c
 create mode 100644 include/linux/fwctl.h
 create mode 100644 include/uapi/fwctl/cxl.h
 create mode 100644 include/uapi/fwctl/fwctl.h
 create mode 100644 include/uapi/fwctl/mlx5.h


base-commit: cd0c76bee95e9c2092418523599439d2c8dbff7e
-- 
2.46.0



* [PATCH v3 01/10] fwctl: Add basic structure for a class subsystem with a cdev
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subsystem Jason Gunthorpe
@ 2024-08-21 18:10 ` Jason Gunthorpe
  2024-08-23 13:48   ` Jonathan Cameron
  2024-08-21 18:10 ` [PATCH v3 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:10 UTC
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

Create the class, character device and functions for a fwctl driver to
un/register to the subsystem.

A typical fwctl driver has a sysfs presence like:

$ ls -l /dev/fwctl/fwctl0
crw------- 1 root root 250, 0 Apr 25 19:16 /dev/fwctl/fwctl0

$ ls /sys/class/fwctl/fwctl0
dev  device  power  subsystem  uevent

$ ls /sys/class/fwctl/fwctl0/device/infiniband/
ibp0s10f0

$ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
fwctl0/

$ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
dev  device  power  subsystem  uevent

Which allows userspace to link all the multi-subsystem driver components
together and learn the subsystem specific names for the device's
components.
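
As a small illustration of the naming shown above, the sketch below builds
the /dev and sysfs class paths for a given fwctl index in userspace C. The
path layout follows the listing above; the helper names fwctl_dev_path()
and fwctl_sysfs_path() are hypothetical and not part of the series.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "/dev/fwctl/fwctlN". Returns 0, or -1 if buf is too small. */
static int fwctl_dev_path(unsigned int idx, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/dev/fwctl/fwctl%u", idx);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build "/sys/class/fwctl/fwctlN". Returns 0, or -1 if buf is too small. */
static int fwctl_sysfs_path(unsigned int idx, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/class/fwctl/fwctl%u", idx);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

From the sysfs path a tool could follow the "device" link to reach the
sibling subsystem directories (infiniband/ in the listing above).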

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 MAINTAINERS            |   8 ++
 drivers/Kconfig        |   2 +
 drivers/Makefile       |   1 +
 drivers/fwctl/Kconfig  |   9 +++
 drivers/fwctl/Makefile |   4 +
 drivers/fwctl/main.c   | 168 +++++++++++++++++++++++++++++++++++++++++
 include/linux/fwctl.h  |  69 +++++++++++++++++
 7 files changed, 261 insertions(+)
 create mode 100644 drivers/fwctl/Kconfig
 create mode 100644 drivers/fwctl/Makefile
 create mode 100644 drivers/fwctl/main.c
 create mode 100644 include/linux/fwctl.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0ff21a07589b51..2efd8d14495431 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9241,6 +9241,14 @@ F:	kernel/futex/*
 F:	tools/perf/bench/futex*
 F:	tools/testing/selftests/futex/
 
+FWCTL SUBSYSTEM
+M:	Jason Gunthorpe <jgg@nvidia.com>
+M:	Saeed Mahameed <saeedm@nvidia.com>
+S:	Maintained
+F:	Documentation/userspace-api/fwctl.rst
+F:	drivers/fwctl/
+F:	include/linux/fwctl.h
+
 GALAXYCORE GC0308 CAMERA SENSOR DRIVER
 M:	Sebastian Reichel <sre@kernel.org>
 L:	linux-media@vger.kernel.org
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 7bdad836fc6207..7c556c5ac4fddc 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -21,6 +21,8 @@ source "drivers/connector/Kconfig"
 
 source "drivers/firmware/Kconfig"
 
+source "drivers/fwctl/Kconfig"
+
 source "drivers/gnss/Kconfig"
 
 source "drivers/mtd/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index fe9ceb0d2288ad..f6a241b747b29c 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -133,6 +133,7 @@ obj-$(CONFIG_MEMSTICK)		+= memstick/
 obj-y				+= leds/
 obj-$(CONFIG_INFINIBAND)	+= infiniband/
 obj-y				+= firmware/
+obj-$(CONFIG_FWCTL)		+= fwctl/
 obj-$(CONFIG_CRYPTO)		+= crypto/
 obj-$(CONFIG_SUPERH)		+= sh/
 obj-y				+= clocksource/
diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
new file mode 100644
index 00000000000000..37147a695add9a
--- /dev/null
+++ b/drivers/fwctl/Kconfig
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menuconfig FWCTL
+	tristate "fwctl device firmware access framework"
+	help
+	  fwctl provides a userspace API for restricted access to communicate
+	  with on-device firmware. The communication channel is intended to
+	  support a wide range of lockdown compatible device behaviors including
+	  manipulating device FLASH, debugging, and other activities that don't
+	  fit neatly into an existing subsystem.
diff --git a/drivers/fwctl/Makefile b/drivers/fwctl/Makefile
new file mode 100644
index 00000000000000..1cad210f6ba580
--- /dev/null
+++ b/drivers/fwctl/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FWCTL) += fwctl.o
+
+fwctl-y += main.o
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
new file mode 100644
index 00000000000000..7f3e7713d0e6e9
--- /dev/null
+++ b/drivers/fwctl/main.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#define pr_fmt(fmt) "fwctl: " fmt
+#include <linux/fwctl.h>
+
+#include <linux/container_of.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+enum {
+	FWCTL_MAX_DEVICES = 256,
+};
+static dev_t fwctl_dev;
+static DEFINE_IDA(fwctl_ida);
+
+static int fwctl_fops_open(struct inode *inode, struct file *filp)
+{
+	struct fwctl_device *fwctl =
+		container_of(inode->i_cdev, struct fwctl_device, cdev);
+
+	get_device(&fwctl->dev);
+	filp->private_data = fwctl;
+	return 0;
+}
+
+static int fwctl_fops_release(struct inode *inode, struct file *filp)
+{
+	struct fwctl_device *fwctl = filp->private_data;
+
+	fwctl_put(fwctl);
+	return 0;
+}
+
+static const struct file_operations fwctl_fops = {
+	.owner = THIS_MODULE,
+	.open = fwctl_fops_open,
+	.release = fwctl_fops_release,
+};
+
+static void fwctl_device_release(struct device *device)
+{
+	struct fwctl_device *fwctl =
+		container_of(device, struct fwctl_device, dev);
+
+	ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
+	kfree(fwctl);
+}
+
+static char *fwctl_devnode(const struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "fwctl/%s", dev_name(dev));
+}
+
+static struct class fwctl_class = {
+	.name = "fwctl",
+	.dev_release = fwctl_device_release,
+	.devnode = fwctl_devnode,
+};
+
+static struct fwctl_device *
+_alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
+{
+	struct fwctl_device *fwctl __free(kfree) = kzalloc(size, GFP_KERNEL);
+	int devnum;
+
+	if (!fwctl)
+		return NULL;
+
+	fwctl->dev.class = &fwctl_class;
+	fwctl->dev.parent = parent;
+
+	devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
+	if (devnum < 0)
+		return NULL;
+	fwctl->dev.devt = fwctl_dev + devnum;
+
+	device_initialize(&fwctl->dev);
+	return_ptr(fwctl);
+}
+
+/* Drivers use the fwctl_alloc_device() wrapper */
+struct fwctl_device *_fwctl_alloc_device(struct device *parent,
+					 const struct fwctl_ops *ops,
+					 size_t size)
+{
+	struct fwctl_device *fwctl __free(fwctl) =
+		_alloc_device(parent, ops, size);
+
+	if (!fwctl)
+		return NULL;
+
+	cdev_init(&fwctl->cdev, &fwctl_fops);
+	/*
+	 * The driver module is protected by fwctl_register/unregister(),
+	 * unregister won't complete until we are done with the driver's module.
+	 */
+	fwctl->cdev.owner = THIS_MODULE;
+
+	if (dev_set_name(&fwctl->dev, "fwctl%d", fwctl->dev.devt - fwctl_dev))
+		return NULL;
+
+	fwctl->ops = ops;
+	return_ptr(fwctl);
+}
+EXPORT_SYMBOL_NS_GPL(_fwctl_alloc_device, FWCTL);
+
+/**
+ * fwctl_register - Register a new device to the subsystem
+ * @fwctl: Previously allocated fwctl_device
+ *
+ * On return the device is visible through sysfs and /dev, driver ops may be
+ * called.
+ */
+int fwctl_register(struct fwctl_device *fwctl)
+{
+	return cdev_device_add(&fwctl->cdev, &fwctl->dev);
+}
+EXPORT_SYMBOL_NS_GPL(fwctl_register, FWCTL);
+
+/**
+ * fwctl_unregister - Unregister a device from the subsystem
+ * @fwctl: Previously allocated and registered fwctl_device
+ *
+ * Undoes fwctl_register(). On return no driver ops will be called. The
+ * caller must still call fwctl_put() to free the fwctl.
+ *
+ * The design of fwctl allows this sort of disassociation of the driver from the
+ * subsystem primarily by keeping memory allocations owned by the core subsystem.
+ * The fwctl_device and fwctl_uctx can both be freed without requiring a driver
+ * callback. This allows the module to remain unlocked while FDs are open.
+ */
+void fwctl_unregister(struct fwctl_device *fwctl)
+{
+	cdev_device_del(&fwctl->cdev, &fwctl->dev);
+}
+EXPORT_SYMBOL_NS_GPL(fwctl_unregister, FWCTL);
+
+static int __init fwctl_init(void)
+{
+	int ret;
+
+	ret = alloc_chrdev_region(&fwctl_dev, 0, FWCTL_MAX_DEVICES, "fwctl");
+	if (ret)
+		return ret;
+
+	ret = class_register(&fwctl_class);
+	if (ret)
+		goto err_chrdev;
+	return 0;
+
+err_chrdev:
+	unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
+	return ret;
+}
+
+static void __exit fwctl_exit(void)
+{
+	class_unregister(&fwctl_class);
+	unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
+}
+
+module_init(fwctl_init);
+module_exit(fwctl_exit);
+MODULE_DESCRIPTION("fwctl device firmware access framework");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
new file mode 100644
index 00000000000000..68ac2d5ab87481
--- /dev/null
+++ b/include/linux/fwctl.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#ifndef __LINUX_FWCTL_H
+#define __LINUX_FWCTL_H
+#include <linux/device.h>
+#include <linux/cdev.h>
+#include <linux/cleanup.h>
+
+struct fwctl_device;
+struct fwctl_uctx;
+
+struct fwctl_ops {
+};
+
+/**
+ * struct fwctl_device - Per-driver registration struct
+ * @dev: The sysfs (class/fwctl/fwctlXX) device
+ *
+ * Each driver instance will have one of these structs with the driver private
+ * data following immediately after. This struct is refcounted, it is freed by
+ * calling fwctl_put().
+ */
+struct fwctl_device {
+	struct device dev;
+	/* private: */
+	struct cdev cdev;
+	const struct fwctl_ops *ops;
+};
+
+struct fwctl_device *_fwctl_alloc_device(struct device *parent,
+					 const struct fwctl_ops *ops,
+					 size_t size);
+/**
+ * fwctl_alloc_device - Allocate a fwctl
+ * @parent: Physical device that provides the FW interface
+ * @ops: Driver ops to register
+ * @drv_struct: 'struct driver_fwctl' that holds the struct fwctl_device
+ * @member: Name of the struct fwctl_device in @drv_struct
+ *
+ * This allocates and initializes the fwctl_device embedded in the drv_struct.
+ * Upon success the pointer must be freed via fwctl_put(). Returns a 'drv_struct
+ * \*' on success, NULL on error.
+ */
+#define fwctl_alloc_device(parent, ops, drv_struct, member)               \
+	({                                                                \
+		static_assert(__same_type(struct fwctl_device,            \
+					  ((drv_struct *)NULL)->member)); \
+		static_assert(offsetof(drv_struct, member) == 0);         \
+		(drv_struct *)_fwctl_alloc_device(parent, ops,            \
+						  sizeof(drv_struct));    \
+	})
+
+static inline struct fwctl_device *fwctl_get(struct fwctl_device *fwctl)
+{
+	get_device(&fwctl->dev);
+	return fwctl;
+}
+static inline void fwctl_put(struct fwctl_device *fwctl)
+{
+	put_device(&fwctl->dev);
+}
+DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
+
+int fwctl_register(struct fwctl_device *fwctl);
+void fwctl_unregister(struct fwctl_device *fwctl);
+
+#endif
-- 
2.46.0



* [PATCH v3 02/10] fwctl: Basic ioctl dispatch for the character device
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subsystem Jason Gunthorpe
  2024-08-21 18:10 ` [PATCH v3 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
@ 2024-08-21 18:10 ` Jason Gunthorpe
  2024-08-23 14:02   ` Jonathan Cameron
  2024-08-21 18:10 ` [PATCH v3 03/10] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:10 UTC
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

Each file descriptor gets a chunk of per-FD driver specific context to
which the driver can attach a device specific struct. The core code
takes care of the memory lifetime for this structure.
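
The layout this implies can be sketched in userspace C. The core allocates
uctx_size zeroed bytes (the ops field added by this patch), the first bytes
are the core's struct fwctl_uctx, and the driver's private members follow.
The struct below is a reduced stand-in for the real fwctl_uctx, and
demo_uctx, core_alloc_uctx() and to_demo_uctx() are hypothetical names used
only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Reduced stand-in for the core's per-FD struct. */
struct fwctl_uctx {
	void *fwctl;
};

/* Hypothetical driver context: the core struct must be the first member. */
struct demo_uctx {
	struct fwctl_uctx uctx;
	unsigned int rpc_count;	/* driver-private state */
};

/* Core side: allocate uctx_size zeroed bytes, as kzalloc() would. */
static struct fwctl_uctx *core_alloc_uctx(size_t uctx_size)
{
	if (uctx_size < sizeof(struct fwctl_uctx))
		return NULL;
	return calloc(1, uctx_size);
}

/* Driver side: recover the private struct from the core pointer. */
static struct demo_uctx *to_demo_uctx(struct fwctl_uctx *uctx)
{
	return (struct demo_uctx *)((char *)uctx -
				    offsetof(struct demo_uctx, uctx));
}
```

Because the core owns the allocation, it can free the context without
calling back into the driver, which is what lets a driver go away while
FDs are still open.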

The ioctl dispatch and design is based on what was built for iommufd. The
ioctls have a struct which has a combined in/out behavior with a typical
'zero pad' scheme for future extension and backwards compatibility.
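
The 'zero pad' rule can be modelled in userspace C as below. This is a
sketch of the semantics the patch obtains from copy_struct_from_user(), not
the kernel implementation; copy_struct_model() and the two demo_cmd structs
are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical old and new layouts of the same ioctl struct. */
struct demo_cmd_v1 {
	unsigned int size;
	unsigned int a;
};

struct demo_cmd_v2 {
	unsigned int size;
	unsigned int a;
	unsigned int b;	/* field added in a later version */
};

/*
 * ksize is the struct size this "kernel" understands, usize is the size
 * the caller supplied in the leading size field.
 */
static int copy_struct_model(void *dst, size_t ksize, const void *src,
			     size_t usize)
{
	size_t n = usize < ksize ? usize : ksize;
	const unsigned char *tail = (const unsigned char *)src + n;
	size_t i;

	/* Newer userspace: bytes beyond what we understand must be zero. */
	for (i = 0; i < usize - n; i++)
		if (tail[i])
			return -E2BIG;

	memcpy(dst, src, n);
	/* Older userspace: fields it did not supply read as zero. */
	memset((unsigned char *)dst + n, 0, ksize - n);
	return 0;
}
```

A newer program can therefore always pass its larger struct to an older
kernel as long as it leaves the new fields zeroed, and an older program
keeps working on a newer kernel.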

Like iommufd, some shared logic does most of the ioctl marshalling and
compatibility work, and a table dispatches to function pointers for
each unique ioctl.
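
A table-driven dispatcher of this kind can be sketched in userspace C as
follows. The command values, the demo_* names and the 0xff mask standing in
for _IOC_NR() are assumptions for illustration; the model returns -ENOTTY
directly, which is what userspace sees when a kernel handler returns
-ENOIOCTLCMD.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_CMD_BASE 0	/* plays the role of FWCTL_CMD_BASE */

/* Hypothetical full command values; the low byte is the command number. */
enum {
	DEMO_INFO = 0x9A00,
	DEMO_RPC = 0x9A01,
};

struct demo_op {
	unsigned int ioctl_num;	/* full command value expected in this slot */
	int (*execute)(void *cmd);
};

static int demo_info(void *cmd) { (void)cmd; return 1; }
static int demo_rpc(void *cmd) { (void)cmd; return 2; }

/* One slot per command number, indexed relative to DEMO_CMD_BASE. */
static const struct demo_op demo_ops[] = {
	[0] = { DEMO_INFO, demo_info },
	[1] = { DEMO_RPC, demo_rpc },
};

static int demo_dispatch(unsigned int cmd, void *arg)
{
	unsigned int nr = cmd & 0xff;	/* models _IOC_NR() */
	const struct demo_op *op;

	/* Reject command numbers outside the table. */
	if (nr - DEMO_CMD_BASE >= sizeof(demo_ops) / sizeof(demo_ops[0]))
		return -ENOTTY;

	/* Reject commands whose type/size bits do not match the slot. */
	op = &demo_ops[nr - DEMO_CMD_BASE];
	if (op->ioctl_num != cmd)
		return -ENOTTY;

	return op->execute(arg);
}
```

Comparing the full command value after indexing by number catches callers
that use the right number with the wrong type or size encoding.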

This approach has proven to work quite well in the iommufd and rdma
subsystems.

Allocate an ioctl number space for the subsystem.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 MAINTAINERS                                   |   1 +
 drivers/fwctl/main.c                          | 139 +++++++++++++++++-
 include/linux/fwctl.h                         |  46 ++++++
 include/uapi/fwctl/fwctl.h                    |  38 +++++
 5 files changed, 223 insertions(+), 2 deletions(-)
 create mode 100644 include/uapi/fwctl/fwctl.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index e91c0376ee5934..c581686451fb1e 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -327,6 +327,7 @@ Code  Seq#    Include File                                           Comments
 0x97  00-7F  fs/ceph/ioctl.h                                         Ceph file system
 0x99  00-0F                                                          537-Addinboard driver
                                                                      <mailto:buk@buks.ipn.de>
+0x9A  00-0F  include/uapi/fwctl/fwctl.h
 0xA0  all    linux/sdp/sdp.h                                         Industrial Device Project
                                                                      <mailto:kenji@bitgate.com>
 0xA1  0      linux/vtpm_proxy.h                                      TPM Emulator Proxy Driver
diff --git a/MAINTAINERS b/MAINTAINERS
index 2efd8d14495431..97945ca04b1108 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9248,6 +9248,7 @@ S:	Maintained
 F:	Documentation/userspace-api/fwctl.rst
 F:	drivers/fwctl/
 F:	include/linux/fwctl.h
+F:	include/uapi/fwctl/
 
 GALAXYCORE GC0308 CAMERA SENSOR DRIVER
 M:	Sebastian Reichel <sre@kernel.org>
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index 7f3e7713d0e6e9..f2e30ffc1e0cb5 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -10,26 +10,136 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 
+#include <uapi/fwctl/fwctl.h>
+
 enum {
 	FWCTL_MAX_DEVICES = 256,
 };
 static dev_t fwctl_dev;
 static DEFINE_IDA(fwctl_ida);
 
+struct fwctl_ucmd {
+	struct fwctl_uctx *uctx;
+	void __user *ubuffer;
+	void *cmd;
+	u32 user_size;
+};
+
+/* On stack memory for the ioctl structs */
+union ucmd_buffer {
+};
+
+struct fwctl_ioctl_op {
+	unsigned int size;
+	unsigned int min_size;
+	unsigned int ioctl_num;
+	int (*execute)(struct fwctl_ucmd *ucmd);
+};
+
+#define IOCTL_OP(_ioctl, _fn, _struct, _last)                         \
+	[_IOC_NR(_ioctl) - FWCTL_CMD_BASE] = {                        \
+		.size = sizeof(_struct) +                             \
+			BUILD_BUG_ON_ZERO(sizeof(union ucmd_buffer) < \
+					  sizeof(_struct)),           \
+		.min_size = offsetofend(_struct, _last),              \
+		.ioctl_num = _ioctl,                                  \
+		.execute = _fn,                                       \
+	}
+static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
+};
+
+static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
+			       unsigned long arg)
+{
+	struct fwctl_uctx *uctx = filp->private_data;
+	const struct fwctl_ioctl_op *op;
+	struct fwctl_ucmd ucmd = {};
+	union ucmd_buffer buf;
+	unsigned int nr;
+	int ret;
+
+	nr = _IOC_NR(cmd);
+	if ((nr - FWCTL_CMD_BASE) >= ARRAY_SIZE(fwctl_ioctl_ops))
+		return -ENOIOCTLCMD;
+
+	op = &fwctl_ioctl_ops[nr - FWCTL_CMD_BASE];
+	if (op->ioctl_num != cmd)
+		return -ENOIOCTLCMD;
+
+	ucmd.uctx = uctx;
+	ucmd.cmd = &buf;
+	ucmd.ubuffer = (void __user *)arg;
+	ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
+	if (ret)
+		return ret;
+
+	if (ucmd.user_size < op->min_size)
+		return -EINVAL;
+
+	ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
+				    ucmd.user_size);
+	if (ret)
+		return ret;
+
+	guard(rwsem_read)(&uctx->fwctl->registration_lock);
+	if (!uctx->fwctl->ops)
+		return -ENODEV;
+	return op->execute(&ucmd);
+}
+
 static int fwctl_fops_open(struct inode *inode, struct file *filp)
 {
 	struct fwctl_device *fwctl =
 		container_of(inode->i_cdev, struct fwctl_device, cdev);
+	int ret;
+
+	guard(rwsem_read)(&fwctl->registration_lock);
+	if (!fwctl->ops)
+		return -ENODEV;
+
+	struct fwctl_uctx *uctx __free(kfree) =
+		kzalloc(fwctl->ops->uctx_size, GFP_KERNEL_ACCOUNT);
+	if (!uctx)
+		return -ENOMEM;
+
+	uctx->fwctl = fwctl;
+	ret = fwctl->ops->open_uctx(uctx);
+	if (ret)
+		return ret;
+
+	scoped_guard(mutex, &fwctl->uctx_list_lock) {
+		list_add_tail(&uctx->uctx_list_entry, &fwctl->uctx_list);
+	}
 
 	get_device(&fwctl->dev);
-	filp->private_data = fwctl;
+	filp->private_data = no_free_ptr(uctx);
 	return 0;
 }
 
+static void fwctl_destroy_uctx(struct fwctl_uctx *uctx)
+{
+	lockdep_assert_held(&uctx->fwctl->uctx_list_lock);
+	list_del(&uctx->uctx_list_entry);
+	uctx->fwctl->ops->close_uctx(uctx);
+}
+
 static int fwctl_fops_release(struct inode *inode, struct file *filp)
 {
-	struct fwctl_device *fwctl = filp->private_data;
+	struct fwctl_uctx *uctx = filp->private_data;
+	struct fwctl_device *fwctl = uctx->fwctl;
 
+	scoped_guard(rwsem_read, &fwctl->registration_lock) {
+		/*
+		 * fwctl_unregister() has already removed the driver and
+		 * destroyed the uctx.
+		 */
+		if (fwctl->ops) {
+			guard(mutex)(&fwctl->uctx_list_lock);
+			fwctl_destroy_uctx(uctx);
+		}
+	}
+
+	kfree(uctx);
 	fwctl_put(fwctl);
 	return 0;
 }
@@ -38,6 +148,7 @@ static const struct file_operations fwctl_fops = {
 	.owner = THIS_MODULE,
 	.open = fwctl_fops_open,
 	.release = fwctl_fops_release,
+	.unlocked_ioctl = fwctl_fops_ioctl,
 };
 
 static void fwctl_device_release(struct device *device)
@@ -46,6 +157,7 @@ static void fwctl_device_release(struct device *device)
 		container_of(device, struct fwctl_device, dev);
 
 	ida_free(&fwctl_ida, fwctl->dev.devt - fwctl_dev);
+	mutex_destroy(&fwctl->uctx_list_lock);
 	kfree(fwctl);
 }
 
@@ -71,6 +183,9 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
 
 	fwctl->dev.class = &fwctl_class;
 	fwctl->dev.parent = parent;
+	init_rwsem(&fwctl->registration_lock);
+	mutex_init(&fwctl->uctx_list_lock);
+	INIT_LIST_HEAD(&fwctl->uctx_list);
 
 	devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
 	if (devnum < 0)
@@ -127,6 +242,10 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, FWCTL);
  * Undoes fwctl_register(). On return no driver ops will be called. The
  * caller must still call fwctl_put() to free the fwctl.
  *
+ * Unregister will return even if userspace still has file descriptors open.
+ * This will call ops->close_uctx() on any open FDs and after return no driver
+ * op will be called. The FDs remain open but all fops will return -ENODEV.
+ *
  * The design of fwctl allows this sort of disassociation of the driver from the
  * subsystem primarily by keeping memory allocations owned by the core subsystem.
  * The fwctl_device and fwctl_uctx can both be freed without requiring a driver
@@ -134,7 +253,23 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, FWCTL);
  */
 void fwctl_unregister(struct fwctl_device *fwctl)
 {
+	struct fwctl_uctx *uctx;
+
 	cdev_device_del(&fwctl->cdev, &fwctl->dev);
+
+	/* Disable and free the driver's resources for any still open FDs. */
+	guard(rwsem_write)(&fwctl->registration_lock);
+	guard(mutex)(&fwctl->uctx_list_lock);
+	while ((uctx = list_first_entry_or_null(&fwctl->uctx_list,
+						struct fwctl_uctx,
+						uctx_list_entry)))
+		fwctl_destroy_uctx(uctx);
+
+	/*
+	 * The driver module may unload after this returns, the op pointer will
+	 * not be valid.
+	 */
+	fwctl->ops = NULL;
 }
 EXPORT_SYMBOL_NS_GPL(fwctl_unregister, FWCTL);
 
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
index 68ac2d5ab87481..ca4245825e91bf 100644
--- a/include/linux/fwctl.h
+++ b/include/linux/fwctl.h
@@ -11,7 +11,30 @@
 struct fwctl_device;
 struct fwctl_uctx;
 
+/**
+ * struct fwctl_ops - Driver provided operations
+ *
+ * fwctl_unregister() will wait until all executing ops are completed before it
+ * returns. Drivers should be mindful to not let their ops run for too long as
+ * it will block device hot unplug and module unloading.
+ */
 struct fwctl_ops {
+	/**
+	 * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
+	 * bytes of this memory will be a fwctl_uctx. The driver can use the
+	 * remaining bytes as its private memory.
+	 */
+	size_t uctx_size;
+	/**
+	 * @open_uctx: Called when a file descriptor is opened before the uctx
+	 * is ever used.
+	 */
+	int (*open_uctx)(struct fwctl_uctx *uctx);
+	/**
+	 * @close_uctx: Called when the uctx is destroyed, usually when the FD
+	 * is closed.
+	 */
+	void (*close_uctx)(struct fwctl_uctx *uctx);
 };
 
 /**
@@ -26,6 +49,15 @@ struct fwctl_device {
 	struct device dev;
 	/* private: */
 	struct cdev cdev;
+
+	/*
+	 * Protect ops, held for write when ops becomes NULL during unregister,
+	 * held for read whenever ops is loaded or an ops function is running.
+	 */
+	struct rw_semaphore registration_lock;
+	/* Protect uctx_list */
+	struct mutex uctx_list_lock;
+	struct list_head uctx_list;
 	const struct fwctl_ops *ops;
 };
 
@@ -66,4 +98,18 @@ DEFINE_FREE(fwctl, struct fwctl_device *, if (_T) fwctl_put(_T));
 int fwctl_register(struct fwctl_device *fwctl);
 void fwctl_unregister(struct fwctl_device *fwctl);
 
+/**
+ * struct fwctl_uctx - Per user FD context
+ * @fwctl: fwctl instance that owns the context
+ *
+ * Every FD opened by userspace will get a unique context allocation. Any driver
+ * private data will follow immediately after.
+ */
+struct fwctl_uctx {
+	struct fwctl_device *fwctl;
+	/* private: */
+	/* Head at fwctl_device::uctx_list */
+	struct list_head uctx_list_entry;
+};
+
 #endif
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
new file mode 100644
index 00000000000000..22fa750d7e8184
--- /dev/null
+++ b/include/uapi/fwctl/fwctl.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
+ */
+#ifndef _UAPI_FWCTL_H
+#define _UAPI_FWCTL_H
+
+#define FWCTL_TYPE 0x9A
+
+/**
+ * DOC: General ioctl format
+ *
+ * The ioctl interface follows a general format to allow for extensibility. Each
+ * ioctl is passed in a structure pointer as the argument providing the size of
+ * the structure in the first u32. The kernel checks that any structure space
+ * beyond what it understands is 0. This allows userspace to use the backward
+ * compatible portion while consistently using the newer, larger, structures.
+ *
+ * ioctls use a standard meaning for common errnos:
+ *
+ *  - ENOTTY: The IOCTL number itself is not supported at all
+ *  - E2BIG: The IOCTL number is supported, but the provided structure has
+ *    non-zero in a part the kernel does not understand.
+ *  - EOPNOTSUPP: The IOCTL number is supported, and the structure is
+ *    understood, however a known field has a value the kernel does not
+ *    understand or support.
+ *  - EINVAL: Everything about the IOCTL was understood, but a field is not
+ *    correct.
+ *  - ENOMEM: Out of memory.
+ *  - ENODEV: The underlying device has been hot-unplugged and the FD is
+ *            orphaned.
+ *
+ * As well as additional errnos, within specific ioctls.
+ */
+enum {
+	FWCTL_CMD_BASE = 0,
+};
+
+#endif
-- 
2.46.0



* [PATCH v3 03/10] fwctl: FWCTL_INFO to return basic information about the device
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
  2024-08-21 18:10 ` [PATCH v3 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
  2024-08-21 18:10 ` [PATCH v3 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
@ 2024-08-21 18:10 ` Jason Gunthorpe
  2024-08-23 14:14   ` Jonathan Cameron
  2024-08-21 18:10 ` [PATCH v3 04/10] taint: Add TAINT_FWCTL Jason Gunthorpe
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:10 UTC (permalink / raw)
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

Userspace will need to know some details about the fwctl interface being
used to locate the correct userspace code to communicate with the
kernel. Provide a simple device_type enum indicating what the kernel
driver is.

Allow the device to provide a device specific info struct that contains
any additional information that the driver may need to provide to
userspace.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/fwctl/main.c       | 51 ++++++++++++++++++++++++++++++++++++++
 include/linux/fwctl.h      | 12 +++++++++
 include/uapi/fwctl/fwctl.h | 32 ++++++++++++++++++++++++
 3 files changed, 95 insertions(+)

diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index f2e30ffc1e0cb5..b281ccc52b4e57 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -25,8 +25,58 @@ struct fwctl_ucmd {
 	u32 user_size;
 };
 
+static int ucmd_respond(struct fwctl_ucmd *ucmd, size_t cmd_len)
+{
+	if (copy_to_user(ucmd->ubuffer, ucmd->cmd,
+			 min_t(size_t, ucmd->user_size, cmd_len)))
+		return -EFAULT;
+	return 0;
+}
+
+static int copy_to_user_zero_pad(void __user *to, const void *from,
+				 size_t from_len, size_t user_len)
+{
+	size_t copy_len;
+
+	copy_len = min(from_len, user_len);
+	if (copy_to_user(to, from, copy_len))
+		return -EFAULT;
+	if (copy_len < user_len) {
+		if (clear_user(to + copy_len, user_len - copy_len))
+			return -EFAULT;
+	}
+	return 0;
+}
+
+static int fwctl_cmd_info(struct fwctl_ucmd *ucmd)
+{
+	struct fwctl_device *fwctl = ucmd->uctx->fwctl;
+	struct fwctl_info *cmd = ucmd->cmd;
+	size_t driver_info_len = 0;
+
+	if (cmd->flags)
+		return -EOPNOTSUPP;
+
+	if (cmd->device_data_len) {
+		void *driver_info __free(kfree) =
+			fwctl->ops->info(ucmd->uctx, &driver_info_len);
+		if (IS_ERR(driver_info))
+			return PTR_ERR(driver_info);
+
+		if (copy_to_user_zero_pad(u64_to_user_ptr(cmd->out_device_data),
+					  driver_info, driver_info_len,
+					  cmd->device_data_len))
+			return -EFAULT;
+	}
+
+	cmd->out_device_type = fwctl->ops->device_type;
+	cmd->device_data_len = driver_info_len;
+	return ucmd_respond(ucmd, sizeof(*cmd));
+}
+
 /* On stack memory for the ioctl structs */
 union ucmd_buffer {
+	struct fwctl_info info;
 };
 
 struct fwctl_ioctl_op {
@@ -46,6 +96,7 @@ struct fwctl_ioctl_op {
 		.execute = _fn,                                       \
 	}
 static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
+	IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
 };
 
 static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
index ca4245825e91bf..6b596931a55169 100644
--- a/include/linux/fwctl.h
+++ b/include/linux/fwctl.h
@@ -7,6 +7,7 @@
 #include <linux/device.h>
 #include <linux/cdev.h>
 #include <linux/cleanup.h>
+#include <uapi/fwctl/fwctl.h>
 
 struct fwctl_device;
 struct fwctl_uctx;
@@ -19,6 +20,10 @@ struct fwctl_uctx;
  * it will block device hot unplug and module unloading.
  */
 struct fwctl_ops {
+	/**
+	 * @device_type: The driver's assigned device_type number. This is uABI.
+	 */
+	enum fwctl_device_type device_type;
 	/**
 	 * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
 	 * bytes of this memory will be a fwctl_uctx. The driver can use the
@@ -35,6 +40,13 @@ struct fwctl_ops {
 	 * is closed.
 	 */
 	void (*close_uctx)(struct fwctl_uctx *uctx);
+	/**
+	 * @info: Implement FWCTL_INFO. Return kmalloc() memory that is copied
+	 * to out_device_data. On input, length indicates the size of the user
+	 * buffer; on output, it indicates the size of the memory. The driver
+	 * can ignore length on input, the core code will handle everything.
+	 */
+	void *(*info)(struct fwctl_uctx *uctx, size_t *length);
 };
 
 /**
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index 22fa750d7e8184..39db9f09f8068e 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -4,6 +4,9 @@
 #ifndef _UAPI_FWCTL_H
 #define _UAPI_FWCTL_H
 
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
 #define FWCTL_TYPE 0x9A
 
 /**
@@ -33,6 +36,35 @@
  */
 enum {
 	FWCTL_CMD_BASE = 0,
+	FWCTL_CMD_INFO = 0,
+	FWCTL_CMD_RPC = 1,
 };
 
+enum fwctl_device_type {
+	FWCTL_DEVICE_TYPE_ERROR = 0,
+};
+
+/**
+ * struct fwctl_info - ioctl(FWCTL_INFO)
+ * @size: sizeof(struct fwctl_info)
+ * @flags: Must be 0
+ * @out_device_type: Returns the type of the device from enum fwctl_device_type
+ * @device_data_len: On input the length of the out_device_data memory. On
+ *	output the size of the kernel's device_data which may be larger or
+ *	smaller than the input. May be 0 on input.
+ * @out_device_data: Pointer to a memory of device_data_len bytes. Kernel will
+ *	fill the entire memory, zeroing as required.
+ *
+ * Returns basic information about this fwctl instance, particularly what driver
+ * is being used to define the device_data format.
+ */
+struct fwctl_info {
+	__u32 size;
+	__u32 flags;
+	__u32 out_device_type;
+	__u32 device_data_len;
+	__aligned_u64 out_device_data;
+};
+#define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)
+
 #endif
-- 
2.46.0
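
The truncate-or-pad behaviour of copy_to_user_zero_pad() in this patch can be
illustrated with a user space model. This is a sketch only:
demo_copy_zero_pad() is a hypothetical name, and memcpy()/memset() stand in for
copy_to_user()/clear_user(), so the -EFAULT paths are not modeled.

```c
#include <stddef.h>
#include <string.h>

/*
 * User space model of the helper: copy min(from_len, user_len) bytes, then
 * clear the remainder of the destination so that the caller never sees stale
 * bytes when the driver's info is shorter than the user buffer.
 */
static void demo_copy_zero_pad(void *to, const void *from, size_t from_len,
			       size_t user_len)
{
	size_t copy_len = from_len < user_len ? from_len : user_len;

	memcpy(to, from, copy_len);
	if (copy_len < user_len)
		memset((char *)to + copy_len, 0, user_len - copy_len);
}
```

Because FWCTL_INFO writes the kernel's own length back into device_data_len,
a caller can compare it with the buffer size it passed to tell whether the
data was truncated or padded.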



* [PATCH v3 04/10] taint: Add TAINT_FWCTL
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2024-08-21 18:10 ` [PATCH v3 03/10] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
@ 2024-08-21 18:10 ` Jason Gunthorpe
  2024-08-21 23:35   ` Greg Kroah-Hartman
  2024-08-21 18:10 ` [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:10 UTC (permalink / raw)
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

Requesting a fwctl scope of access that includes mutating device debug
data will cause the kernel to be tainted. Changing the device operation
through things in the debug scope may cause the device to malfunction in
undefined ways. This should be reflected in the TAINT flags to help any
debuggers understand that something has been done.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 Documentation/admin-guide/tainted-kernels.rst | 5 +++++
 include/linux/panic.h                         | 3 ++-
 kernel/panic.c                                | 1 +
 tools/debugging/kernel-chktaint               | 8 ++++++++
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index f92551539e8a66..f91a54966a9719 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -101,6 +101,7 @@ Bit  Log  Number  Reason that got the kernel tainted
  16  _/X   65536  auxiliary taint, defined for and used by distros
  17  _/T  131072  kernel was built with the struct randomization plugin
  18  _/N  262144  an in-kernel test has been run
+ 19  _/J  524288  userspace used a mutating debug operation in fwctl
 ===  ===  ======  ========================================================
 
 Note: The character ``_`` is representing a blank in this table to make reading
@@ -182,3 +183,7 @@ More detailed explanation for tainting
      produce extremely unusual kernel structure layouts (even performance
      pathological ones), which is important to know when debugging. Set at
      build time.
+
+ 19) ``J`` if userspace opened /dev/fwctl/* and performed a FWCTL_RPC_DEBUG_WRITE
+     to use the device's debugging features. Device debugging features could
+     cause the device to malfunction in undefined ways.
diff --git a/include/linux/panic.h b/include/linux/panic.h
index 3130e0b5116b03..bf4f276be661b7 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -73,7 +73,8 @@ static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
 #define TAINT_AUX			16
 #define TAINT_RANDSTRUCT		17
 #define TAINT_TEST			18
-#define TAINT_FLAGS_COUNT		19
+#define TAINT_FWCTL			19
+#define TAINT_FLAGS_COUNT		20
 #define TAINT_FLAGS_MAX			((1UL << TAINT_FLAGS_COUNT) - 1)
 
 struct taint_flag {
diff --git a/kernel/panic.c b/kernel/panic.c
index f861bedc1925e7..997b18c7455cd4 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -502,6 +502,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
 	TAINT_FLAG(AUX,				'X', ' ', true),
 	TAINT_FLAG(RANDSTRUCT,			'T', ' ', true),
 	TAINT_FLAG(TEST,			'N', ' ', true),
+	TAINT_FLAG(FWCTL,			'J', ' ', true),
 };
 
 #undef TAINT_FLAG
diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
index 279be06332be99..e7da0909d09707 100755
--- a/tools/debugging/kernel-chktaint
+++ b/tools/debugging/kernel-chktaint
@@ -204,6 +204,14 @@ else
 	echo " * an in-kernel test (such as a KUnit test) has been run (#18)"
 fi
 
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "J"
+	echo " * fwctl's mutating debug interface was used (#19)"
+fi
+
 echo "For a more detailed explanation of the various taint flags see"
 echo " Documentation/admin-guide/tainted-kernels.rst in the Linux kernel sources"
 echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html"
-- 
2.46.0
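
For tools that prefer not to shell out to kernel-chktaint, the same check is a
single bit test on the value read from /proc/sys/kernel/tainted. A minimal
sketch follows; DEMO_TAINT_FWCTL and demo_fwctl_tainted() are hypothetical
names, and the bit number is taken from the TAINT_FWCTL definition proposed in
this patch, so it would change if the patch is renumbered.

```c
#include <stdbool.h>

/* Bit number copied from TAINT_FWCTL as proposed in this patch. */
#define DEMO_TAINT_FWCTL 19

/* Returns true when the fwctl bit is set in a taint mask. */
static bool demo_fwctl_tainted(unsigned long mask)
{
	return (mask >> DEMO_TAINT_FWCTL) & 1UL;
}
```

Bit 19 corresponds to the decimal value 524288 listed in the documentation
table of this patch.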



* [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2024-08-21 18:10 ` [PATCH v3 04/10] taint: Add TAINT_FWCTL Jason Gunthorpe
@ 2024-08-21 18:10 ` Jason Gunthorpe
  2024-08-21 23:49   ` Jakub Kicinski
  2024-08-23 14:23   ` Jonathan Cameron
  2024-08-21 18:10 ` [PATCH v3 06/10] fwctl: Add documentation Jason Gunthorpe
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:10 UTC (permalink / raw)
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

Add the FWCTL_RPC ioctl which allows a request/response RPC call to device
firmware. Drivers implementing this call must follow the security
guidelines under Documentation/userspace-api/fwctl.rst

The core code provides some memory management helpers to get the messages
copied from and back to userspace. The driver is responsible for
allocating the output message memory and delivering the message to the
device.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/fwctl/main.c       | 60 +++++++++++++++++++++++++++++++++
 include/linux/fwctl.h      |  8 +++++
 include/uapi/fwctl/fwctl.h | 68 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 136 insertions(+)

diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index b281ccc52b4e57..54a7356e586cda 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -8,15 +8,18 @@
 #include <linux/container_of.h>
 #include <linux/fs.h>
 #include <linux/module.h>
+#include <linux/sizes.h>
 #include <linux/slab.h>
 
 #include <uapi/fwctl/fwctl.h>
 
 enum {
 	FWCTL_MAX_DEVICES = 256,
+	MAX_RPC_LEN = SZ_2M,
 };
 static dev_t fwctl_dev;
 static DEFINE_IDA(fwctl_ida);
+static unsigned long fwctl_tainted;
 
 struct fwctl_ucmd {
 	struct fwctl_uctx *uctx;
@@ -74,9 +77,65 @@ static int fwctl_cmd_info(struct fwctl_ucmd *ucmd)
 	return ucmd_respond(ucmd, sizeof(*cmd));
 }
 
+static int fwctl_cmd_rpc(struct fwctl_ucmd *ucmd)
+{
+	struct fwctl_device *fwctl = ucmd->uctx->fwctl;
+	struct fwctl_rpc *cmd = ucmd->cmd;
+	size_t out_len;
+
+	if (cmd->in_len > MAX_RPC_LEN || cmd->out_len > MAX_RPC_LEN)
+		return -EMSGSIZE;
+
+	switch (cmd->scope) {
+	case FWCTL_RPC_CONFIGURATION:
+	case FWCTL_RPC_DEBUG_READ_ONLY:
+		break;
+
+	case FWCTL_RPC_DEBUG_WRITE_FULL:
+		if (!capable(CAP_SYS_RAWIO))
+			return -EPERM;
+		fallthrough;
+	case FWCTL_RPC_DEBUG_WRITE:
+		if (!test_and_set_bit(0, &fwctl_tainted)) {
+			dev_warn(
+				&fwctl->dev,
+				"%s(%d): has requested full access to the physical device",
+				current->comm, task_pid_nr(current));
+			add_taint(TAINT_FWCTL, LOCKDEP_STILL_OK);
+		}
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	void *inbuf __free(kvfree) = kvzalloc(cmd->in_len, GFP_KERNEL_ACCOUNT);
+	if (!inbuf)
+		return -ENOMEM;
+	if (copy_from_user(inbuf, u64_to_user_ptr(cmd->in), cmd->in_len))
+		return -EFAULT;
+
+	out_len = cmd->out_len;
+	void *outbuf __free(kvfree) = fwctl->ops->fw_rpc(
+		ucmd->uctx, cmd->scope, inbuf, cmd->in_len, &out_len);
+	if (IS_ERR(outbuf))
+		return PTR_ERR(outbuf);
+	if (outbuf == inbuf) {
+		/* The driver can re-use inbuf as outbuf */
+		inbuf = NULL;
+	}
+
+	if (copy_to_user(u64_to_user_ptr(cmd->out), outbuf,
+			 min(cmd->out_len, out_len)))
+		return -EFAULT;
+
+	cmd->out_len = out_len;
+	return ucmd_respond(ucmd, sizeof(*cmd));
+}
+
 /* On stack memory for the ioctl structs */
 union ucmd_buffer {
 	struct fwctl_info info;
+	struct fwctl_rpc rpc;
 };
 
 struct fwctl_ioctl_op {
@@ -97,6 +156,7 @@ struct fwctl_ioctl_op {
 	}
 static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
 	IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
+	IOCTL_OP(FWCTL_RPC, fwctl_cmd_rpc, struct fwctl_rpc, out),
 };
 
 static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
index 6b596931a55169..6eac9497ff1afc 100644
--- a/include/linux/fwctl.h
+++ b/include/linux/fwctl.h
@@ -47,6 +47,14 @@ struct fwctl_ops {
 	 * ignore length on input, the core code will handle everything.
 	 */
 	void *(*info)(struct fwctl_uctx *uctx, size_t *length);
+	/**
+	 * @fw_rpc: Implement FWCTL_RPC. Deliver rpc_in/in_len to the FW and
+	 * return the response and set out_len. rpc_in can be returned as the
+	 * response pointer. Otherwise the returned pointer is freed with
+	 * kvfree().
+	 */
+	void *(*fw_rpc)(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
+			void *rpc_in, size_t in_len, size_t *out_len);
 };
 
 /**
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index 39db9f09f8068e..3af9f9eb9b1878 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -67,4 +67,72 @@ struct fwctl_info {
 };
 #define FWCTL_INFO _IO(FWCTL_TYPE, FWCTL_CMD_INFO)
 
+/**
+ * enum fwctl_rpc_scope - Scope of access for the RPC
+ *
+ * Refer to fwctl.rst for a more detailed discussion of these scopes.
+ */
+enum fwctl_rpc_scope {
+	/**
+	 * @FWCTL_RPC_CONFIGURATION: Device configuration access scope
+	 *
+	 * Read/write access to device configuration. When configuration
+	 * is written to the device it remains in a fully supported state.
+	 */
+	FWCTL_RPC_CONFIGURATION = 0,
+	/**
+	 * @FWCTL_RPC_DEBUG_READ_ONLY: Read only access to debug information
+	 *
+	 * Readable debug information. Debug information is compatible with
+	 * kernel lockdown, and does not disclose any sensitive information. For
+	 * instance exposing any encryption secrets from this information is
+	 * forbidden.
+	 */
+	FWCTL_RPC_DEBUG_READ_ONLY = 1,
+	/**
+	 * @FWCTL_RPC_DEBUG_WRITE: Writable access to lockdown compatible debug information
+	 *
+	 * Allows write access to data in the device which may leave a fully
+	 * supported state. This is intended to permit intensive and possibly
+	 * invasive debugging. This scope will taint the kernel.
+	 */
+	FWCTL_RPC_DEBUG_WRITE = 2,
+	/**
+	 * @FWCTL_RPC_DEBUG_WRITE_FULL: Write access to all debug information
+	 *
+	 * Allows read/write access to everything. Requires CAP_SYS_RAWIO, so
+	 * it is not required to follow lockdown principles. If in doubt
+	 * debugging should be placed in this scope. This scope will taint the
+	 * kernel.
+	 */
+	FWCTL_RPC_DEBUG_WRITE_FULL = 3,
+};
+
+/**
+ * struct fwctl_rpc - ioctl(FWCTL_RPC)
+ * @size: sizeof(struct fwctl_rpc)
+ * @scope: One of enum fwctl_rpc_scope, required scope for the RPC
+ * @in_len: Length of the in memory
+ * @out_len: Length of the out memory
+ * @in: Request message in device specific format
+ * @out: Response message in device specific format
+ *
+ * Deliver a Remote Procedure Call to the device FW and return the response. The
+ * call's parameters and return are marshaled into linear buffers of memory. Any
+ * errno indicates that delivery of the RPC to the device failed. Return status
+ * originating in the device during a successful delivery must be encoded into
+ * out.
+ *
+ * The format of the buffers matches the out_device_type from FWCTL_INFO.
+ */
+struct fwctl_rpc {
+	__u32 size;
+	__u32 scope;
+	__u32 in_len;
+	__u32 out_len;
+	__aligned_u64 in;
+	__aligned_u64 out;
+};
+#define FWCTL_RPC _IO(FWCTL_TYPE, FWCTL_CMD_RPC)
+
 #endif
-- 
2.46.0
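
The scope handling in fwctl_cmd_rpc() reduces to a small decision table, which
may be easier to review in isolation. The sketch below is a user space model
and not the patch's code: demo_scope and demo_check_scope() are hypothetical
names, the has_rawio flag stands in for capable(CAP_SYS_RAWIO), and the taint
is only reported to the caller rather than applied.

```c
#include <errno.h>
#include <stdbool.h>

/* Values copied from enum fwctl_rpc_scope in this patch. */
enum demo_scope {
	DEMO_CONFIGURATION = 0,
	DEMO_DEBUG_READ_ONLY = 1,
	DEMO_DEBUG_WRITE = 2,
	DEMO_DEBUG_WRITE_FULL = 3,
};

/*
 * Returns 0 if the scope is permitted or a negative errno otherwise. *taints
 * is set to true when a permitted scope would taint the kernel.
 */
static int demo_check_scope(unsigned int scope, bool has_rawio, bool *taints)
{
	*taints = false;
	switch (scope) {
	case DEMO_CONFIGURATION:
	case DEMO_DEBUG_READ_ONLY:
		return 0;
	case DEMO_DEBUG_WRITE_FULL:
		if (!has_rawio)
			return -EPERM;
		/* fall through */
	case DEMO_DEBUG_WRITE:
		*taints = true;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```

Note that the write scope taints without any capability check, while the full
write scope is refused before tainting when the capability is missing.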



* [PATCH v3 06/10] fwctl: Add documentation
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2024-08-21 18:10 ` [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
@ 2024-08-21 18:10 ` Jason Gunthorpe
  2024-08-23 14:35   ` Jonathan Cameron
  2024-08-21 18:10 ` [PATCH v3 07/10] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:10 UTC (permalink / raw)
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

Document the purpose and rules for the fwctl subsystem.

Link in kdocs to the doc tree.

Nacked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 Documentation/userspace-api/fwctl.rst | 285 ++++++++++++++++++++++++++
 Documentation/userspace-api/index.rst |   1 +
 2 files changed, 286 insertions(+)
 create mode 100644 Documentation/userspace-api/fwctl.rst

diff --git a/Documentation/userspace-api/fwctl.rst b/Documentation/userspace-api/fwctl.rst
new file mode 100644
index 00000000000000..8f3da30ee7c91b
--- /dev/null
+++ b/Documentation/userspace-api/fwctl.rst
@@ -0,0 +1,285 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+fwctl subsystem
+===============
+
+:Author: Jason Gunthorpe
+
+Overview
+========
+
+Modern devices contain extensive amounts of FW, and in many cases, are largely
+software-defined pieces of hardware. The evolution of this approach is largely a
+reaction to Moore's Law where a chip tape out is now highly expensive, and the
+chip design is extremely large. Replacing fixed HW logic with a flexible and
+tightly coupled FW/HW combination is an effective risk mitigation against chip
+respin. Problems in the HW design can be counteracted in device FW. This is
+especially true for devices which present a stable and backwards compatible
+interface to the operating system driver (such as NVMe).
+
+The FW layer in devices has grown to incredible sizes and devices frequently
+integrate clusters of fast processors to run it. For example, mlx5 devices have
+over 30MB of FW code, and big configurations operate with over 1GB of FW managed
+runtime state.
+
+The availability of such a flexible layer has created quite a variety in the
+industry where single pieces of silicon are now configurable software-defined
+devices and can operate in substantially different ways depending on the need.
+Further, we often see cases where specific sites wish to operate devices in ways
+that are highly specialized and require applications that have been tailored to
+their unique configuration.
+
+Further, devices have become multi-functional and integrated to the point they
+no longer fit neatly into the kernel's division of subsystems. Modern
+multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
+subsystems while sharing the underlying hardware using the auxiliary device
+system.
+
+All together this creates a challenge for the operating system, where devices
+have an expansive FW environment that needs robust device-specific debugging
+support, and FW-driven functionality that is not well suited to “generic”
+interfaces. fwctl seeks to allow access to the full device functionality from
+user space in the areas of debuggability, management, and first-boot/nth-boot
+provisioning.
+
+fwctl is aimed at the common device design pattern where the OS and FW
+communicate via an RPC message layer constructed with a queue or mailbox scheme.
+In this case the driver will typically have some layer to deliver RPC messages
+and collect RPC responses from device FW. The in-kernel subsystem drivers that
+operate the device for its primary purposes will use these RPCs to build their
+drivers, but devices also usually have a set of ancillary RPCs that don't really
+fit into any specific subsystem. For example, a HW RAID controller is primarily
+operated by the block layer but also comes with a set of RPCs to administer the
+construction of drives within the HW RAID.
+
+In the past when devices were more single function, individual subsystems would
+grow different approaches to solving some of these common problems. For instance
+monitoring device health, manipulating its FLASH, debugging the FW,
+provisioning, all have various unique interfaces across the kernel.
+
+fwctl's purpose is to define a common set of limited rules, described below,
+that allow user space to securely construct and execute RPCs inside device FW.
+The rules serve as an agreement between the operating system and FW on how to
+correctly design the RPC interface. As a uAPI the subsystem provides a thin
+layer of discovery and a generic uAPI to deliver the RPCs and collect the
+response. It supports a system of user space libraries and tools which will
+use this interface to control the device using the device native protocols.
+
+Scope of Action
+---------------
+
+fwctl drivers are strictly restricted to being a way to operate the device FW.
+It is not an avenue to access random kernel internals, or other operating system
+SW states.
+
+fwctl instances must operate on a well-defined device function, and the device
+should have a well-defined security model for what scope within the physical
+device the function is permitted to access. For instance, the most complex PCIe
+device today may broadly have several function-level scopes:
+
+ 1. A privileged function with full access to the on-device global state and
+    configuration
+
+ 2. Multiple hypervisor functions, each with control over itself and its child
+    functions used with VMs
+
+ 3. Multiple VM functions tightly scoped within the VM
+
+The device may create a logical parent/child relationship between these scopes.
+For instance a child VM's FW may be within the scope of the hypervisor FW. It is
+quite common in the VFIO world that the hypervisor environment has a complex
+provisioning/profiling/configuration responsibility for the function VFIO
+assigns to the VM.
+
+Further, within the function, devices often have RPC commands that fall within
+some general scopes of action (see enum fwctl_rpc_scope):
+
+ 1. Access to function & child configuration, FLASH, etc. that becomes live at a
+    function reset. Access to function & child runtime configuration that is
+    transparent or non-disruptive to any driver or VM.
+
+ 2. Read-only access to function debug information that may report on FW objects
+    in the function & child, including FW objects owned by other kernel
+    subsystems.
+
+ 3. Write access to function & child debug information strictly compatible with
+    the principles of kernel lockdown and kernel integrity protection. Triggers
+    a kernel Taint.
+
+ 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
+
+User space will provide a scope label on each RPC and the kernel must enforce the
+above CAPs and taints based on that scope. A combination of kernel and FW can
+enforce that RPCs are placed in the correct scope by user space.
+
+Denied behavior
+---------------
+
+There are many things this interface must not allow user space to do (without a
+Taint or CAP), broadly derived from the principles of kernel lockdown. Some
+examples:
+
+ 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
+    untrusted code, or otherwise compromise device or system security and
+    integrity.
+
+ 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
+    objects owned by kernel drivers.
+
+ 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
+    driver can react to the device configuration at function reset/driver load
+    time, but otherwise must not be coupled to fwctl.
+
+ 4. Operate the HW in a way that overlaps with the core purpose of another
+    primary kernel subsystem, such as read/write to LBAs, send/receive of
+    network packets, or operate an accelerator's data plane.
+
+fwctl is not a replacement for device direct access subsystems like uacce or
+VFIO.
+
+Operations exposed through fwctl's non-tainting interfaces should be fully
+sharable with other users of the device. For instance exposing a RPC through
+fwctl should never prevent a kernel subsystem from also concurrently using that
+same RPC or hardware unit down the road. In such cases fwctl will be less
+important than proper kernel subsystems that eventually emerge. Mistakes in this
+area resulting in clashes will be resolved in favour of a kernel implementation.
+
+fwctl User API
+==============
+
+.. kernel-doc:: include/uapi/fwctl/fwctl.h
+.. kernel-doc:: include/uapi/fwctl/mlx5.h
+
+sysfs Class
+-----------
+
+fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
+(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
+operates the ioctl uAPI described above.
+
+fwctl devices can be related to driver components in other subsystems through
+sysfs::
+
+    $ ls /sys/class/fwctl/fwctl0/device/infiniband/
+    ibp0s10f0
+
+    $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
+    fwctl0/
+
+    $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
+    dev  device  power  subsystem  uevent
+
+User space Community
+--------------------
+
+Drawing inspiration from nvme-cli, participating in the kernel side must come
+with a user space in a common TBD git tree, at a minimum to usefully operate the
+kernel driver. Providing such an implementation is a pre-condition to merging a
+kernel driver.
+
+The goal is to build a user space community around some of the shared problems
+we all have, and ideally develop some common user space programs with some
+starting themes of:
+
+ - Device in-field debugging
+
+ - HW provisioning
+
+ - VFIO child device profiling before VM boot
+
+ - Confidential Compute topics (attestation, secure provisioning)
+
+that stretch across all subsystems in the kernel. fwupd is a great example of
+how an excellent user space experience can emerge out of kernel-side diversity.
+
+fwctl Kernel API
+================
+
+.. kernel-doc:: drivers/fwctl/main.c
+   :export:
+.. kernel-doc:: include/linux/fwctl.h
+
+fwctl Driver design
+-------------------
+
+In many cases a fwctl driver is going to be part of a larger cross-subsystem
+device possibly using the auxiliary_device mechanism. In that case several
+subsystems are going to be sharing the same device and FW interface layer so the
+device design must already provide for isolation and cooperation between kernel
+subsystems. fwctl should fit into that same model.
+
+Part of the driver should include a description of how its scope restrictions
+and security model work. The driver and FW together must ensure that RPCs
+provided by user space are mapped to the appropriate scope. If the validation is
+done in the driver then the validation can read a 'command effects' report from
+the device, or hardwire the enforcement. If the validation is done in the FW,
+then the driver should pass the fwctl_rpc_scope to the FW along with the command.
+
+The driver and FW must cooperate to ensure that either fwctl cannot allocate
+any FW resources, or any resources it does allocate are freed on FD closure.  A
+driver primarily constructed around FW RPCs may find that its core PCI function
+and RPC layer belongs under fwctl with auxiliary devices connecting to other
+subsystems.
+
+Each device type must be mindful of Linux's philosophy for stable ABI. The FW
+RPC interface does not have to meet a strictly stable ABI, but it does need to
+meet an expectation that userspace tools that are deployed and in significant
+use don't needlessly break. FW upgrade and kernel upgrade should keep widely
+deployed tooling working.
+
+Development and debugging focused RPCs under more permissive scopes can have
+less stability if the tools using them are only run under exceptional
+circumstances and not for every day use of the device. Debugging tools may even
+require exact version matching as they may require something similar to DWARF
+debug information from the FW binary.
+
+Security Response
+=================
+
+The kernel remains the gatekeeper for this interface. If violations of the
+scopes, security or isolation principles are found, we have options to let
+devices fix them with a FW update, push a kernel patch to parse and block RPC
+commands or push a kernel patch to block entire firmware versions/devices.
+
+While the kernel can always directly parse and restrict RPCs, it is expected
+that the existing kernel pattern of allowing drivers to delegate validation to
+FW will be a useful design.
+
+Existing Similar Examples
+=========================
+
+The approach described in this document is not a new idea. Direct, or near
+direct device access has been offered by the kernel in different areas for
+decades. With more devices wanting to follow this design pattern it is becoming
+clear that it is not entirely well understood and, more importantly, the
+security considerations are not well defined or agreed upon.
+
+Some examples:
+
+ - HW RAID controllers. This includes RPCs to do things like compose drives into
+   a RAID volume, configure RAID parameters, monitor the HW and more.
+
+ - Baseboard managers. RPCs for configuring settings in the device and more
+
+ - NVMe vendor command capsules. nvme-cli provides access to some monitoring
+   functions that different products have defined, but more exist.
+
+ - CXL also has a NVMe-like vendor command system.
+
+ - DRM allows user space drivers to send commands to the device via kernel
+   mediation
+
+ - RDMA allows user space drivers to directly push commands to the device
+   without kernel involvement
+
+ - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.
+
+The first 4 are examples of areas that fwctl intends to cover. The latter three
+are examples of denied behavior as they fully overlap with the primary purpose
+of a kernel subsystem.
+
+A key lesson learned from these past efforts is the importance of having a
+common user space project to use as a pre-condition for obtaining a kernel
+driver. Developing a good community around useful software in user space is key
+to getting companies to fund participation to enable their products.
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 274cc7546efc2a..2bc43a65807486 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -44,6 +44,7 @@ Devices and I/O
 
    accelerators/ocxl
    dma-buf-alloc-exchange
+   fwctl
    gpio/index
    iommufd
    media/index
-- 
2.46.0



* [PATCH v3 07/10] fwctl/mlx5: Support for communicating with mlx5 fw
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2024-08-21 18:10 ` [PATCH v3 06/10] fwctl: Add documentation Jason Gunthorpe
@ 2024-08-21 18:10 ` Jason Gunthorpe
  2024-08-23 14:48   ` Jonathan Cameron
  2024-08-21 18:11 ` [PATCH v3 08/10] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:10 UTC (permalink / raw)
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

From: Saeed Mahameed <saeedm@nvidia.com>

mlx5's fw has long provided a User Context concept. This has a long
history in RDMA as part of the devx extended verbs programming
interface. A User Context is a security envelope that contains objects and
controls access. It contains the Protection Domain object from the
InfiniBand Architecture and both together provide the OS with the necessary
tools to bind a security context like a process to the device.

The security context is restricted to not be able to touch the kernel or
other processes. In the RDMA verbs case it is also restricted to not touch
global device resources.

The fwctl_mlx5 takes this approach and builds a User Context per fwctl
file descriptor and uses a FW security capability on the User Context to
enable access to global device resources. This makes the context useful
for provisioning and debugging the global device state.

mlx5 already has a robust infrastructure for delivering RPC messages to
fw. Trivially connect fwctl's RPC mechanism to mlx5_cmd_do(). Enforce the
User Context ID in every RPC header so the FW knows the security context
of the issuing ID.
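
As a rough illustration of the last point (not part of the patch), here is a
minimal standalone sketch of forcing the UID into a command header. It
assumes the 16-byte header is laid out as big-endian dwords with the opcode
in bytes 0-1 and the uid in bytes 2-3, mirroring the mbox_in_hdr bit layout
in this patch. hdr_force_uid() and the other helper names are hypothetical;
the driver itself uses MLX5_SET().

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header size: 0x80 bits, matching mlx5_ifc_mbox_in_hdr_bits. */
enum { HDR_MIN_LEN = 16 };

/* Read the 16-bit big-endian opcode from bytes 0-1 (assumed layout). */
static uint16_t hdr_get_opcode(const uint8_t *hdr)
{
	return (uint16_t)((hdr[0] << 8) | hdr[1]);
}

/* Read the 16-bit big-endian uid from bytes 2-3 (assumed layout). */
static uint16_t hdr_get_uid(const uint8_t *hdr)
{
	return (uint16_t)((hdr[2] << 8) | hdr[3]);
}

/*
 * Hypothetical helper: overwrite whatever uid userspace supplied with the
 * uid bound to the file descriptor. Returns 0 on success, -1 if the
 * message is too short to hold a header.
 */
static int hdr_force_uid(uint8_t *msg, size_t len, uint16_t uid)
{
	if (len < HDR_MIN_LEN)
		return -1;
	msg[2] = (uint8_t)(uid >> 8);
	msg[3] = (uint8_t)(uid & 0xff);
	return 0;
}
```

Whatever uid value userspace wrote is overwritten before the message is
handed to the FW, so the FW only ever sees the UID that belongs to the
issuing file descriptor.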

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 MAINTAINERS                 |   7 +
 drivers/fwctl/Kconfig       |  14 ++
 drivers/fwctl/Makefile      |   1 +
 drivers/fwctl/mlx5/Makefile |   4 +
 drivers/fwctl/mlx5/main.c   | 337 ++++++++++++++++++++++++++++++++++++
 include/uapi/fwctl/fwctl.h  |   1 +
 include/uapi/fwctl/mlx5.h   |  36 ++++
 7 files changed, 400 insertions(+)
 create mode 100644 drivers/fwctl/mlx5/Makefile
 create mode 100644 drivers/fwctl/mlx5/main.c
 create mode 100644 include/uapi/fwctl/mlx5.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 97945ca04b1108..d7d12adc521be1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9250,6 +9250,13 @@ F:	drivers/fwctl/
 F:	include/linux/fwctl.h
 F:	include/uapi/fwctl/
 
+FWCTL MLX5 DRIVER
+M:	Saeed Mahameed <saeedm@nvidia.com>
+R:	Itay Avraham <itayavr@nvidia.com>
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+F:	drivers/fwctl/mlx5/
+
 GALAXYCORE GC0308 CAMERA SENSOR DRIVER
 M:	Sebastian Reichel <sre@kernel.org>
 L:	linux-media@vger.kernel.org
diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
index 37147a695add9a..e5ee2d46d43126 100644
--- a/drivers/fwctl/Kconfig
+++ b/drivers/fwctl/Kconfig
@@ -7,3 +7,17 @@ menuconfig FWCTL
 	  support a wide range of lockdown compatible device behaviors including
 	  manipulating device FLASH, debugging, and other activities that don't
 	  fit neatly into an existing subsystem.
+
+if FWCTL
+config FWCTL_MLX5
+	tristate "mlx5 ConnectX control fwctl driver"
+	depends on MLX5_CORE
+	help
+	  MLX5CTL provides an interface for the user process to access the debug and
+	  configuration registers of the ConnectX hardware family
+	  (NICs, PCI switches and SmartNIC SoCs).
+	  This will allow configuration and debug tools to work out of the box on
+	  mainstream kernel.
+
+	  If you don't know what to do here, say N.
+endif
diff --git a/drivers/fwctl/Makefile b/drivers/fwctl/Makefile
index 1cad210f6ba580..1c535f694d7fe4 100644
--- a/drivers/fwctl/Makefile
+++ b/drivers/fwctl/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_FWCTL) += fwctl.o
+obj-$(CONFIG_FWCTL_MLX5) += mlx5/
 
 fwctl-y += main.o
diff --git a/drivers/fwctl/mlx5/Makefile b/drivers/fwctl/mlx5/Makefile
new file mode 100644
index 00000000000000..139a23e3c7c517
--- /dev/null
+++ b/drivers/fwctl/mlx5/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FWCTL_MLX5) += mlx5_fwctl.o
+
+mlx5_fwctl-y += main.o
diff --git a/drivers/fwctl/mlx5/main.c b/drivers/fwctl/mlx5/main.c
new file mode 100644
index 00000000000000..8839770fbe7ba5
--- /dev/null
+++ b/drivers/fwctl/mlx5/main.c
@@ -0,0 +1,337 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/fwctl.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/mlx5/device.h>
+#include <linux/mlx5/driver.h>
+#include <uapi/fwctl/mlx5.h>
+
+#define mlx5ctl_err(mcdev, format, ...) \
+	dev_err(&mcdev->fwctl.dev, format, ##__VA_ARGS__)
+
+#define mlx5ctl_dbg(mcdev, format, ...)                             \
+	dev_dbg(&mcdev->fwctl.dev, "PID %u: " format, current->pid, \
+		##__VA_ARGS__)
+
+struct mlx5ctl_uctx {
+	struct fwctl_uctx uctx;
+	u32 uctx_caps;
+	u32 uctx_uid;
+};
+
+struct mlx5ctl_dev {
+	struct fwctl_device fwctl;
+	struct mlx5_core_dev *mdev;
+};
+DEFINE_FREE(mlx5ctl, struct mlx5ctl_dev *, if (_T) fwctl_put(&_T->fwctl));
+
+struct mlx5_ifc_mbox_in_hdr_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_mbox_out_hdr_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+
+	u8 syndrome[0x20];
+
+	u8 reserved_at_40[0x40];
+};
+
+enum {
+	MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES = 0x4,
+};
+
+enum {
+	MLX5_CMD_OP_QUERY_DIAGNOSTIC_PARAMS = 0x819,
+	MLX5_CMD_OP_SET_DIAGNOSTIC_PARAMS = 0x820,
+	MLX5_CMD_OP_QUERY_DIAGNOSTIC_COUNTERS = 0x821,
+	MLX5_CMD_OP_POSTPONE_CONNECTED_QP_TIMEOUT = 0xb2e,
+};
+
+static int mlx5ctl_alloc_uid(struct mlx5ctl_dev *mcdev, u32 cap)
+{
+	u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(create_uctx_in)] = {};
+	void *uctx;
+	int ret;
+	u16 uid;
+
+	uctx = MLX5_ADDR_OF(create_uctx_in, in, uctx);
+
+	mlx5ctl_dbg(mcdev, "%s: caps 0x%x\n", __func__, cap);
+	MLX5_SET(create_uctx_in, in, opcode, MLX5_CMD_OP_CREATE_UCTX);
+	MLX5_SET(uctx, uctx, cap, cap);
+
+	ret = mlx5_cmd_exec(mcdev->mdev, in, sizeof(in), out, sizeof(out));
+	if (ret)
+		return ret;
+
+	uid = MLX5_GET(create_uctx_out, out, uid);
+	mlx5ctl_dbg(mcdev, "allocated uid %u with caps 0x%x\n", uid, cap);
+	return uid;
+}
+
+static void mlx5ctl_release_uid(struct mlx5ctl_dev *mcdev, u16 uid)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_uctx_in)] = {};
+	struct mlx5_core_dev *mdev = mcdev->mdev;
+	int ret;
+
+	MLX5_SET(destroy_uctx_in, in, opcode, MLX5_CMD_OP_DESTROY_UCTX);
+	MLX5_SET(destroy_uctx_in, in, uid, uid);
+
+	ret = mlx5_cmd_exec_in(mdev, destroy_uctx, in);
+	mlx5ctl_dbg(mcdev, "released uid %u %pe\n", uid, ERR_PTR(ret));
+}
+
+static int mlx5ctl_open_uctx(struct fwctl_uctx *uctx)
+{
+	struct mlx5ctl_uctx *mfd =
+		container_of(uctx, struct mlx5ctl_uctx, uctx);
+	struct mlx5ctl_dev *mcdev =
+		container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
+	int uid;
+
+	/*
+	 * New FW supports the TOOLS_RESOURCES uid security label
+	 * which allows commands to manipulate the global device state.
+	 * Otherwise only basic existing RDMA devx privileges are allowed.
+	 */
+	if (MLX5_CAP_GEN(mcdev->mdev, uctx_cap) &
+	    MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES)
+		mfd->uctx_caps |= MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES;
+
+	uid = mlx5ctl_alloc_uid(mcdev, mfd->uctx_caps);
+	if (uid < 0)
+		return uid;
+
+	mfd->uctx_uid = uid;
+	return 0;
+}
+
+static void mlx5ctl_close_uctx(struct fwctl_uctx *uctx)
+{
+	struct mlx5ctl_dev *mcdev =
+		container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
+	struct mlx5ctl_uctx *mfd =
+		container_of(uctx, struct mlx5ctl_uctx, uctx);
+
+	mlx5ctl_release_uid(mcdev, mfd->uctx_uid);
+}
+
+static void *mlx5ctl_info(struct fwctl_uctx *uctx, size_t *length)
+{
+	struct mlx5ctl_uctx *mfd =
+		container_of(uctx, struct mlx5ctl_uctx, uctx);
+	struct fwctl_info_mlx5 *info;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	info->uid = mfd->uctx_uid;
+	info->uctx_caps = mfd->uctx_caps;
+	*length = sizeof(*info);
+	return info;
+}
+
+static bool mlx5ctl_validate_rpc(const void *in, enum fwctl_rpc_scope scope)
+{
+	u16 opcode = MLX5_GET(mbox_in_hdr, in, opcode);
+
+	/*
+	 * Currently the driver can't keep track of commands that allocate
+	 * objects in the FW, these commands are safe from a security
+	 * perspective but nothing will free the memory when the FD is closed.
+	 * For now permit only query commands. Also the caps for the scope have
+	 * not been defined yet, filter commands manually for now.
+	 */
+	switch (opcode) {
+	case MLX5_CMD_OP_POSTPONE_CONNECTED_QP_TIMEOUT:
+	case MLX5_CMD_OP_QUERY_ADAPTER:
+	case MLX5_CMD_OP_QUERY_ESW_FUNCTIONS:
+	case MLX5_CMD_OP_QUERY_HCA_CAP:
+	case MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT:
+	case MLX5_CMD_OP_QUERY_ROCE_ADDRESS:
+		return scope <= FWCTL_RPC_CONFIGURATION;
+
+	case MLX5_CMD_OP_QUERY_CONG_PARAMS:
+	case MLX5_CMD_OP_QUERY_CONG_STATISTICS:
+	case MLX5_CMD_OP_QUERY_CONG_STATUS:
+	case MLX5_CMD_OP_QUERY_CQ:
+	case MLX5_CMD_OP_QUERY_DCT:
+	case MLX5_CMD_OP_QUERY_DIAGNOSTIC_COUNTERS:
+	case MLX5_CMD_OP_QUERY_DIAGNOSTIC_PARAMS:
+	case MLX5_CMD_OP_QUERY_EQ:
+	case MLX5_CMD_OP_QUERY_ESW_VPORT_CONTEXT:
+	case MLX5_CMD_OP_QUERY_FLOW_COUNTER:
+	case MLX5_CMD_OP_QUERY_FLOW_GROUP:
+	case MLX5_CMD_OP_QUERY_FLOW_TABLE_ENTRY:
+	case MLX5_CMD_OP_QUERY_FLOW_TABLE:
+	case MLX5_CMD_OP_QUERY_GENERAL_OBJECT:
+	case MLX5_CMD_OP_QUERY_ISSI:
+	case MLX5_CMD_OP_QUERY_L2_TABLE_ENTRY:
+	case MLX5_CMD_OP_QUERY_LAG:
+	case MLX5_CMD_OP_QUERY_MAD_DEMUX:
+	case MLX5_CMD_OP_QUERY_MKEY:
+	case MLX5_CMD_OP_QUERY_MODIFY_HEADER_CONTEXT:
+	case MLX5_CMD_OP_QUERY_PACKET_REFORMAT_CONTEXT:
+	case MLX5_CMD_OP_QUERY_PAGES:
+	case MLX5_CMD_OP_QUERY_Q_COUNTER:
+	case MLX5_CMD_OP_QUERY_QP:
+	case MLX5_CMD_OP_QUERY_RMP:
+	case MLX5_CMD_OP_QUERY_RQ:
+	case MLX5_CMD_OP_QUERY_RQT:
+	case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS:
+	case MLX5_CMD_OP_QUERY_SQ:
+	case MLX5_CMD_OP_QUERY_SRQ:
+	case MLX5_CMD_OP_QUERY_TIR:
+	case MLX5_CMD_OP_QUERY_TIS:
+	case MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE:
+	case MLX5_CMD_OP_QUERY_VNIC_ENV:
+	case MLX5_CMD_OP_QUERY_VPORT_COUNTER:
+	case MLX5_CMD_OP_QUERY_VPORT_STATE:
+	case MLX5_CMD_OP_QUERY_WOL_ROL:
+	case MLX5_CMD_OP_QUERY_XRC_SRQ:
+	case MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY:
+	case MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS:
+	case MLX5_CMD_OP_QUERY_XRQ:
+		return scope <= FWCTL_RPC_DEBUG_READ_ONLY;
+
+	case MLX5_CMD_OP_SET_DIAGNOSTIC_PARAMS:
+		return scope <= FWCTL_RPC_DEBUG_WRITE;
+
+	case MLX5_CMD_OP_ACCESS_REG:
+		return scope <= FWCTL_RPC_DEBUG_WRITE_FULL;
+	default:
+		return false;
+	}
+}
+
+static void *mlx5ctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
+			    void *rpc_in, size_t in_len, size_t *out_len)
+{
+	struct mlx5ctl_dev *mcdev =
+		container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
+	struct mlx5ctl_uctx *mfd =
+		container_of(uctx, struct mlx5ctl_uctx, uctx);
+	void *rpc_alloc __free(kvfree) = NULL;
+	void *rpc_out;
+	int ret;
+
+	if (in_len < MLX5_ST_SZ_BYTES(mbox_in_hdr) ||
+	    *out_len < MLX5_ST_SZ_BYTES(mbox_out_hdr))
+		return ERR_PTR(-EMSGSIZE);
+
+	/* FIXME: Requires device support for more scopes */
+	if (scope != FWCTL_RPC_CONFIGURATION &&
+	    scope != FWCTL_RPC_DEBUG_READ_ONLY)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mlx5ctl_dbg(mcdev, "[UID %d] cmdif: opcode 0x%x inlen %zu outlen %zu\n",
+		    mfd->uctx_uid, MLX5_GET(mbox_in_hdr, rpc_in, opcode),
+		    in_len, *out_len);
+
+	if (!mlx5ctl_validate_rpc(rpc_in, scope))
+		return ERR_PTR(-EBADMSG);
+
+	/*
+	 * mlx5_cmd_do() copies the input message to its own buffer before
+	 * executing it, so we can reuse the allocation for the output.
+	 */
+	if (*out_len <= in_len) {
+		rpc_out = rpc_in;
+	} else {
+		rpc_out = rpc_alloc = kvzalloc(*out_len, GFP_KERNEL);
+		if (!rpc_alloc)
+			return ERR_PTR(-ENOMEM);
+	}
+
+	/* Enforce the user context for the command */
+	MLX5_SET(mbox_in_hdr, rpc_in, uid, mfd->uctx_uid);
+	ret = mlx5_cmd_do(mcdev->mdev, rpc_in, in_len, rpc_out, *out_len);
+
+	mlx5ctl_dbg(mcdev,
+		    "[UID %d] cmdif: opcode 0x%x status 0x%x retval %pe\n",
+		    mfd->uctx_uid, MLX5_GET(mbox_in_hdr, rpc_in, opcode),
+		    MLX5_GET(mbox_out_hdr, rpc_out, status), ERR_PTR(ret));
+
+	/*
+	 * -EREMOTEIO means execution succeeded and the out is valid,
+	 * but an error code was returned inside out. Everything else
+	 * means the RPC did not make it to the device.
+	 */
+	if (ret && ret != -EREMOTEIO)
+		return ERR_PTR(ret);
+	if (rpc_out == rpc_in)
+		return rpc_in;
+	return_ptr(rpc_alloc);
+}
+
+static const struct fwctl_ops mlx5ctl_ops = {
+	.device_type = FWCTL_DEVICE_TYPE_MLX5,
+	.uctx_size = sizeof(struct mlx5ctl_uctx),
+	.open_uctx = mlx5ctl_open_uctx,
+	.close_uctx = mlx5ctl_close_uctx,
+	.info = mlx5ctl_info,
+	.fw_rpc = mlx5ctl_fw_rpc,
+};
+
+static int mlx5ctl_probe(struct auxiliary_device *adev,
+			 const struct auxiliary_device_id *id)
+
+{
+	struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
+	struct mlx5_core_dev *mdev = madev->mdev;
+	struct mlx5ctl_dev *mcdev __free(mlx5ctl) = fwctl_alloc_device(
+		&mdev->pdev->dev, &mlx5ctl_ops, struct mlx5ctl_dev, fwctl);
+	int ret;
+
+	if (!mcdev)
+		return -ENOMEM;
+
+	mcdev->mdev = mdev;
+
+	ret = fwctl_register(&mcdev->fwctl);
+	if (ret)
+		return ret;
+	auxiliary_set_drvdata(adev, no_free_ptr(mcdev));
+	return 0;
+}
+
+static void mlx5ctl_remove(struct auxiliary_device *adev)
+{
+	struct mlx5ctl_dev *mcdev __free(mlx5ctl) = auxiliary_get_drvdata(adev);
+
+	fwctl_unregister(&mcdev->fwctl);
+}
+
+static const struct auxiliary_device_id mlx5ctl_id_table[] = {
+	{.name = MLX5_ADEV_NAME ".fwctl",},
+	{}
+};
+MODULE_DEVICE_TABLE(auxiliary, mlx5ctl_id_table);
+
+static struct auxiliary_driver mlx5ctl_driver = {
+	.name = "mlx5_fwctl",
+	.probe = mlx5ctl_probe,
+	.remove = mlx5ctl_remove,
+	.id_table = mlx5ctl_id_table,
+};
+
+module_auxiliary_driver(mlx5ctl_driver);
+
+MODULE_IMPORT_NS(FWCTL);
+MODULE_DESCRIPTION("mlx5 ConnectX fwctl driver");
+MODULE_AUTHOR("Saeed Mahameed <saeedm@nvidia.com>");
+MODULE_LICENSE("Dual BSD/GPL");
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index 3af9f9eb9b1878..f9b27fb5c1618c 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -42,6 +42,7 @@ enum {
 
 enum fwctl_device_type {
 	FWCTL_DEVICE_TYPE_ERROR = 0,
+	FWCTL_DEVICE_TYPE_MLX5 = 1,
 };
 
 /**
diff --git a/include/uapi/fwctl/mlx5.h b/include/uapi/fwctl/mlx5.h
new file mode 100644
index 00000000000000..bcb4602ffdeee4
--- /dev/null
+++ b/include/uapi/fwctl/mlx5.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * These are definitions for the command interface for mlx5 HW. mlx5 FW has a
+ * User Context mechanism which allows the FW to understand a security scope.
+ * FWCTL binds each FD to a FW user context and then places the User Context ID
+ * (UID) in each command header. The created User Context has a capability set
+ * that is appropriate for FWCTL's security model.
+ *
+ * Command formation should use a copy of the structs in mlx5_ifc.h following
+ * the Programmers Reference Manual. An open release is available here:
+ *
+ *  https://network.nvidia.com/files/doc-2020/ethernet-adapters-programming-manual.pdf
+ *
+ * The device_type for this file is FWCTL_DEVICE_TYPE_MLX5.
+ */
+#ifndef _UAPI_FWCTL_MLX5_H
+#define _UAPI_FWCTL_MLX5_H
+
+#include <linux/types.h>
+
+/**
+ * struct fwctl_info_mlx5 - ioctl(FWCTL_INFO) out_device_data
+ * @uid: The FW UID this FD is bound to. Each command header will force
+ *	this value.
+ * @uctx_caps: The FW capabilities that are enabled for the uid.
+ *
+ * Return basic information about the FW interface available.
+ */
+struct fwctl_info_mlx5 {
+	__u32 uid;
+	__u32 uctx_caps;
+};
+
+#endif
-- 
2.46.0



* [PATCH v3 08/10] mlx5: Create an auxiliary device for fwctl_mlx5
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
                   ` (6 preceding siblings ...)
  2024-08-21 18:10 ` [PATCH v3 07/10] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
@ 2024-08-21 18:11 ` Jason Gunthorpe
  2024-08-21 18:11 ` [PATCH v3 09/10] fwctl/cxl: Add driver for CXL mailbox for handling CXL features commands (RFC) Jason Gunthorpe
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:11 UTC (permalink / raw)
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

From: Saeed Mahameed <saeedm@nvidia.com>

If the device supports User Context then it can support fwctl. Create an
auxiliary device to allow fwctl to bind to it.

Create a sysfs like:

$ ls /sys/devices/pci0000:00/0000:00:0a.0/mlx5_core.fwctl.0/driver -l
lrwxrwxrwx 1 root root 0 Apr 25 19:46 /sys/devices/pci0000:00/0000:00:0a.0/mlx5_core.fwctl.0/driver -> ../../../../bus/auxiliary/drivers/mlx5_fwctl.mlx5_fwctl

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/dev.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index 9a79674d27f15a..0786b011a8bc29 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -228,8 +228,14 @@ enum {
 	MLX5_INTERFACE_PROTOCOL_VNET,
 
 	MLX5_INTERFACE_PROTOCOL_DPLL,
+	MLX5_INTERFACE_PROTOCOL_FWCTL,
 };
 
+static bool is_fwctl_supported(struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN(dev, uctx_cap);
+}
+
 static const struct mlx5_adev_device {
 	const char *suffix;
 	bool (*is_supported)(struct mlx5_core_dev *dev);
@@ -252,6 +258,8 @@ static const struct mlx5_adev_device {
 					   .is_supported = &is_mp_supported },
 	[MLX5_INTERFACE_PROTOCOL_DPLL] = { .suffix = "dpll",
 					   .is_supported = &is_dpll_supported },
+	[MLX5_INTERFACE_PROTOCOL_FWCTL] = { .suffix = "fwctl",
+					    .is_supported = &is_fwctl_supported },
 };
 
 int mlx5_adev_idx_alloc(void)
-- 
2.46.0



* [PATCH v3 09/10] fwctl/cxl: Add driver for CXL mailbox for handling CXL features commands (RFC)
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
                   ` (7 preceding siblings ...)
  2024-08-21 18:11 ` [PATCH v3 08/10] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
@ 2024-08-21 18:11 ` Jason Gunthorpe
  2024-08-21 18:11 ` [PATCH v3 10/10] cxl: Create an auxiliary device for fwctl_cxl Jason Gunthorpe
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:11 UTC (permalink / raw)
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

From: Dave Jiang <dave.jiang@intel.com>

Add an fwctl (auxiliary bus) driver to allow CXL feature commands to be
sent from userspace through ioctls. Create a driver skeleton for
initial setup.

FWCTL_INFO will return the commands supported by the fwctl char device as
a bitmap of enabled commands.

fwctl provides a fwctl_ops->fw_rpc() callback in order to issue ioctls to
a device.

FWCTL_RPC will start by supporting the CXL feature commands: Get Supported
Features, Get Feature, and Set Feature.

FWCTL_RPC provides an 'enum fwctl_rpc_scope' parameter that indicates the
security scope of the call. The Get Supported Features and Get Feature
calls can be executed with the scope of FWCTL_RPC_DEBUG_READ_ONLY. The Set
Feature call is gated by the effects that the Get Supported Features call
reports for the specific feature.
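
To make the gating concrete, here is a simplified standalone restatement of
the effects check (a sketch, not the driver code). The EFF_* bit positions
follow the CXL_CMD_* defines added to mailbox.h in this patch. The numeric
scope values are an assumption based on the order in which the scopes are
listed in the cover letter, and set_feature_allowed() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ordering: a larger value is a more permissive scope. */
enum rpc_scope {
	SCOPE_CONFIGURATION = 0,
	SCOPE_DEBUG_READ_ONLY = 1,
	SCOPE_DEBUG_WRITE = 2,
	SCOPE_DEBUG_WRITE_FULL = 3,
};

/* Effect bits, same positions as the CXL_CMD_* defines in this patch. */
#define EFF_CONFIG_CHANGE_COLD_RESET	(1u << 0)
#define EFF_CONFIG_CHANGE_IMMEDIATE	(1u << 1)
#define EFF_DATA_CHANGE_IMMEDIATE	(1u << 2)
#define EFF_POLICY_CHANGE_IMMEDIATE	(1u << 3)
#define EFF_LOG_CHANGE_IMMEDIATE	(1u << 4)
#define EFF_BACKGROUND			(1u << 6)

/* Hypothetical helper: may Set Feature run for these effects and scope? */
static bool set_feature_allowed(uint16_t effects, enum rpc_scope scope)
{
	const uint16_t immediate = EFF_CONFIG_CHANGE_IMMEDIATE |
				   EFF_DATA_CHANGE_IMMEDIATE |
				   EFF_POLICY_CHANGE_IMMEDIATE |
				   EFF_LOG_CHANGE_IMMEDIATE;

	/* Background commands are rejected regardless of scope. */
	if (effects & EFF_BACKGROUND)
		return false;

	/* Immediate changes need at least the debug-write scope. */
	if ((effects & immediate) && scope >= SCOPE_DEBUG_WRITE)
		return true;

	/* Changes that only apply after a cold reset need less. */
	if ((effects & EFF_CONFIG_CHANGE_COLD_RESET) &&
	    scope >= SCOPE_DEBUG_READ_ONLY)
		return true;

	return false;
}
```

For example, a feature whose only effect is an immediate configuration
change is refused at the read-only debug scope and accepted at the
debug-write scope.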

Add a software command through FWCTL_RPC in order for the user to retrieve
information about the commands that are supported. In this instance only 3
commands are supported: Get Supported Features, Get Feature, and Set
Feature.

The expected flow of operation is to first send the call with the
n_commands parameter set to 0 to query the total number of commands
available. A second call then provides the number of commands to retrieve,
along with enough memory allocated to store information about the
commands.
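
The two-call pattern can be sketched in plain C as below. This is a
standalone simulation, not the fwctl uAPI: struct cmd_info, query_commands()
and fetch_all_commands() are hypothetical stand-ins, and the table simply
holds the three feature opcodes named above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-command information record. */
struct cmd_info {
	uint32_t id;
};

/* Get Supported Features, Get Feature, Set Feature opcodes. */
static const struct cmd_info supported[] = {
	{ 0x0500 }, { 0x0501 }, { 0x0502 },
};

/*
 * Hypothetical query: with n_commands == 0 only the total is reported,
 * otherwise up to n_commands entries are copied into out.
 * Returns the total (first form) or the number of entries copied.
 */
static size_t query_commands(struct cmd_info *out, size_t n_commands)
{
	size_t total = sizeof(supported) / sizeof(supported[0]);
	size_t n, i;

	if (n_commands == 0)
		return total;

	n = n_commands < total ? n_commands : total;
	for (i = 0; i < n; i++)
		out[i] = supported[i];
	return n;
}

/* First call sizes the buffer, second call fills it. Caller frees. */
static struct cmd_info *fetch_all_commands(size_t *count)
{
	size_t total = query_commands(NULL, 0);
	struct cmd_info *buf = calloc(total, sizeof(*buf));

	if (!buf)
		return NULL;
	*count = query_commands(buf, total);
	return buf;
}
```

Sizing the buffer from the first call keeps the caller from having to guess
an upper bound on the number of commands.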

Link: https://patch.msgid.link/r/20240718213446.1750135-9-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 MAINTAINERS                 |   7 +
 drivers/fwctl/Kconfig       |   9 ++
 drivers/fwctl/Makefile      |   1 +
 drivers/fwctl/cxl/Makefile  |   4 +
 drivers/fwctl/cxl/cxl.c     | 274 ++++++++++++++++++++++++++++++++++++
 include/linux/cxl/mailbox.h | 104 ++++++++++++++
 include/uapi/fwctl/cxl.h    |  94 +++++++++++++
 include/uapi/fwctl/fwctl.h  |   1 +
 8 files changed, 494 insertions(+)
 create mode 100644 drivers/fwctl/cxl/Makefile
 create mode 100644 drivers/fwctl/cxl/cxl.c
 create mode 100644 include/uapi/fwctl/cxl.h

diff --git a/MAINTAINERS b/MAINTAINERS
index d7d12adc521be1..9933c67303f0ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9257,6 +9257,13 @@ L:	linux-kernel@vger.kernel.org
 S:	Maintained
 F:	drivers/fwctl/mlx5/
 
+FWCTL CXL DRIVER
+M:	Dave Jiang <dave.jiang@intel.com>
+R:	Dan Williams <dan.j.williams@intel.com>
+L:	linux-cxl@vger.kernel.org
+S:	Maintained
+F:	drivers/fwctl/cxl/
+
 GALAXYCORE GC0308 CAMERA SENSOR DRIVER
 M:	Sebastian Reichel <sre@kernel.org>
 L:	linux-media@vger.kernel.org
diff --git a/drivers/fwctl/Kconfig b/drivers/fwctl/Kconfig
index e5ee2d46d43126..e49903a9d0d34f 100644
--- a/drivers/fwctl/Kconfig
+++ b/drivers/fwctl/Kconfig
@@ -19,5 +19,14 @@ config FWCTL_MLX5
 	  This will allow configuration and debug tools to work out of the box on
 	  mainstream kernel.
 
+	  If you don't know what to do here, say N.
+
+config FWCTL_CXL
+	tristate "CXL fwctl driver"
+	depends on CXL_BUS
+	help
+	  CXLCTL provides an interface for the user process to access user allowed
+	  mailbox commands for a CXL device.
+
 	  If you don't know what to do here, say N.
 endif
diff --git a/drivers/fwctl/Makefile b/drivers/fwctl/Makefile
index 1c535f694d7fe4..bd356e6f2e5af1 100644
--- a/drivers/fwctl/Makefile
+++ b/drivers/fwctl/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_FWCTL) += fwctl.o
 obj-$(CONFIG_FWCTL_MLX5) += mlx5/
+obj-$(CONFIG_FWCTL_CXL) += cxl/
 
 fwctl-y += main.o
diff --git a/drivers/fwctl/cxl/Makefile b/drivers/fwctl/cxl/Makefile
new file mode 100644
index 00000000000000..62319452157272
--- /dev/null
+++ b/drivers/fwctl/cxl/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FWCTL_CXL) += cxl_fwctl.o
+
+cxl_fwctl-y += cxl.o
diff --git a/drivers/fwctl/cxl/cxl.c b/drivers/fwctl/cxl/cxl.c
new file mode 100644
index 00000000000000..8836a806763f54
--- /dev/null
+++ b/drivers/fwctl/cxl/cxl.c
@@ -0,0 +1,274 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, Intel Corporation
+ */
+#include <linux/fwctl.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/cxl/mailbox.h>
+#include <linux/auxiliary_bus.h>
+#include <uapi/fwctl/cxl.h>
+
+struct cxlctl_uctx {
+	struct fwctl_uctx uctx;
+	u32 uctx_caps;
+	u32 uctx_uid;
+};
+
+struct cxlctl_dev {
+	struct fwctl_device fwctl;
+	struct cxl_mailbox *mbox;
+};
+
+DEFINE_FREE(cxlctl, struct cxlctl_dev *, if (_T) fwctl_put(&_T->fwctl))
+
+static int cxlctl_open_uctx(struct fwctl_uctx *uctx)
+{
+	struct cxlctl_uctx *cxlctl_uctx =
+		container_of(uctx, struct cxlctl_uctx, uctx);
+
+	cxlctl_uctx->uctx_caps = BIT(FWCTL_CXL_QUERY_COMMANDS) |
+				 BIT(FWCTL_CXL_SEND_COMMAND);
+
+	return 0;
+}
+
+static void cxlctl_close_uctx(struct fwctl_uctx *uctx)
+{
+}
+
+static void *cxlctl_info(struct fwctl_uctx *uctx, size_t *length)
+{
+	struct cxlctl_uctx *cxlctl_uctx =
+		container_of(uctx, struct cxlctl_uctx, uctx);
+	struct fwctl_info_cxl *info;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	info->uctx_caps = cxlctl_uctx->uctx_caps;
+
+	return info;
+}
+
+static bool cxlctl_validate_set_features(struct cxl_mailbox *cxl_mbox,
+					 const struct fwctl_cxl_command *send_cmd,
+					 enum fwctl_rpc_scope scope)
+{
+	struct cxl_feat_entry *feat;
+	bool found = false;
+	uuid_t uuid;
+	u16 mask;
+
+	if (send_cmd->in.size < sizeof(struct set_feature_input))
+		return false;
+
+	if (copy_from_user(&uuid, u64_to_user_ptr(send_cmd->in.payload),
+			   sizeof(uuid)))
+		return false;
+
+	for (int i = 0; i < cxl_mbox->num_features; i++) {
+		feat = &cxl_mbox->entries[i];
+		if (uuid_equal(&uuid, &feat->uuid)) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found)
+		return false;
+
+	/* Currently no user background command support */
+	if (feat->effects & CXL_CMD_BACKGROUND)
+		return false;
+
+	mask = CXL_CMD_CONFIG_CHANGE_IMMEDIATE |
+	       CXL_CMD_DATA_CHANGE_IMMEDIATE |
+	       CXL_CMD_POLICY_CHANGE_IMMEDIATE |
+	       CXL_CMD_LOG_CHANGE_IMMEDIATE;
+	if (feat->effects & mask && scope >= FWCTL_RPC_DEBUG_WRITE)
+		return true;
+
+	/* These effects supported for all scope */
+	if ((feat->effects & CXL_CMD_CONFIG_CHANGE_COLD_RESET ||
+	     feat->effects & CXL_CMD_CONFIG_CHANGE_CONV_RESET) &&
+	    scope >= FWCTL_RPC_DEBUG_READ_ONLY)
+		return true;
+
+	return false;
+}
+
+static bool cxlctl_validate_hw_cmds(struct cxl_mailbox *cxl_mbox,
+				    const struct fwctl_cxl_command *send_cmd,
+				    enum fwctl_rpc_scope scope)
+{
+	struct cxl_mem_command *cmd;
+
+	/*
+	 * Only supporting feature commands.
+	 */
+	if (!cxl_mbox->num_features)
+		return false;
+
+	cmd = cxl_get_mem_command(send_cmd->id);
+	if (!cmd)
+		return false;
+
+	if (test_bit(cmd->info.id, cxl_mbox->enabled_cmds))
+		return false;
+
+	if (test_bit(cmd->info.id, cxl_mbox->exclusive_cmds))
+		return false;
+
+	switch (cmd->opcode) {
+	case CXL_MBOX_OP_GET_SUPPORTED_FEATURES:
+	case CXL_MBOX_OP_GET_FEATURE:
+		if (scope >= FWCTL_RPC_DEBUG_READ_ONLY)
+			return true;
+		break;
+	case CXL_MBOX_OP_SET_FEATURE:
+		return cxlctl_validate_set_features(cxl_mbox, send_cmd, scope);
+	default:
+		return false;
+	};
+
+	return false;
+}
+
+static bool cxlctl_validate_query_commands(struct fwctl_rpc_cxl *rpc_in)
+{
+	int cmds;
+
+	if (rpc_in->payload_size < sizeof(rpc_in->query))
+		return false;
+
+	cmds = rpc_in->query.n_commands;
+	if (cmds) {
+		int cmds_size = rpc_in->payload_size - sizeof(rpc_in->query);
+
+		if (cmds != cmds_size / sizeof(struct cxl_command_info))
+			return false;
+	}
+
+	return true;
+}
+
+static bool cxlctl_validate_rpc(struct fwctl_uctx *uctx,
+				struct fwctl_rpc_cxl *rpc_in,
+				enum fwctl_rpc_scope scope)
+{
+	struct cxlctl_dev *cxlctl =
+		container_of(uctx->fwctl, struct cxlctl_dev, fwctl);
+
+	switch (rpc_in->rpc_cmd) {
+	case FWCTL_CXL_QUERY_COMMANDS:
+		return cxlctl_validate_query_commands(rpc_in);
+
+	case FWCTL_CXL_SEND_COMMAND:
+		return cxlctl_validate_hw_cmds(cxlctl->mbox,
+					       &rpc_in->send_cmd, scope);
+
+	default:
+		return false;
+	}
+}
+
+static void *send_cxl_command(struct cxl_mailbox *cxl_mbox,
+			      struct fwctl_cxl_command *send_cmd,
+			      size_t *out_len)
+{
+	struct cxl_mbox_cmd mbox_cmd;
+	int rc;
+
+	rc = cxl_fwctl_send_cmd(cxl_mbox, send_cmd, &mbox_cmd, out_len);
+	if (rc)
+		return ERR_PTR(rc);
+
+	*out_len = mbox_cmd.size_out;
+
+	return mbox_cmd.payload_out;
+}
+
+static void *cxlctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
+			   void *in, size_t in_len, size_t *out_len)
+{
+	struct cxlctl_dev *cxlctl =
+		container_of(uctx->fwctl, struct cxlctl_dev, fwctl);
+	struct cxl_mailbox *cxl_mbox = cxlctl->mbox;
+	struct fwctl_rpc_cxl *rpc_in = in;
+
+	if (!cxlctl_validate_rpc(uctx, rpc_in, scope))
+		return ERR_PTR(-EPERM);
+
+	switch (rpc_in->rpc_cmd) {
+	case FWCTL_CXL_QUERY_COMMANDS:
+		return cxl_query_cmd_from_fwctl(cxl_mbox, &rpc_in->query,
+						out_len);
+
+	case FWCTL_CXL_SEND_COMMAND:
+		return send_cxl_command(cxl_mbox, &rpc_in->send_cmd, out_len);
+
+	default:
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+}
+
+static const struct fwctl_ops cxlctl_ops = {
+	.device_type = FWCTL_DEVICE_TYPE_CXL,
+	.uctx_size = sizeof(struct cxlctl_uctx),
+	.open_uctx = cxlctl_open_uctx,
+	.close_uctx = cxlctl_close_uctx,
+	.info = cxlctl_info,
+	.fw_rpc = cxlctl_fw_rpc,
+};
+
+static int cxlctl_probe(struct auxiliary_device *adev,
+			const struct auxiliary_device_id *id)
+{
+	struct cxl_mailbox *mbox = container_of(adev, struct cxl_mailbox, adev);
+	struct cxlctl_dev *cxlctl __free(cxlctl) =
+		fwctl_alloc_device(mbox->host, &cxlctl_ops,
+				   struct cxlctl_dev, fwctl);
+	int rc;
+
+	if (!cxlctl)
+		return -ENOMEM;
+
+	cxlctl->mbox = mbox;
+
+	rc = fwctl_register(&cxlctl->fwctl);
+	if (rc)
+		return rc;
+
+	auxiliary_set_drvdata(adev, no_free_ptr(cxlctl));
+
+	return 0;
+}
+
+static void cxlctl_remove(struct auxiliary_device *adev)
+{
+	struct cxlctl_dev *ctldev __free(cxlctl) = auxiliary_get_drvdata(adev);
+
+	fwctl_unregister(&ctldev->fwctl);
+}
+
+static const struct auxiliary_device_id cxlctl_id_table[] = {
+	{ .name = "CXL.fwctl", },
+	{},
+};
+MODULE_DEVICE_TABLE(auxiliary, cxlctl_id_table);
+
+static struct auxiliary_driver cxlctl_driver = {
+	.name = "cxl_fwctl",
+	.probe = cxlctl_probe,
+	.remove = cxlctl_remove,
+	.id_table = cxlctl_id_table,
+};
+
+module_auxiliary_driver(cxlctl_driver);
+
+MODULE_IMPORT_NS(CXL);
+MODULE_IMPORT_NS(FWCTL);
+MODULE_DESCRIPTION("CXL fwctl driver");
+MODULE_AUTHOR("Intel Corporation");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/cxl/mailbox.h b/include/linux/cxl/mailbox.h
index 570864239b8f14..13b5bb6e5bc310 100644
--- a/include/linux/cxl/mailbox.h
+++ b/include/linux/cxl/mailbox.h
@@ -4,6 +4,7 @@
 #define __CXL_MBOX_H__
 
 #include <uapi/linux/cxl_mem.h>
+#include <uapi/fwctl/cxl.h>
 #include <linux/auxiliary_bus.h>
 
 /**
@@ -68,4 +69,107 @@ struct cxl_mailbox {
 	struct cxl_feat_entry *entries;
 };
 
+enum cxl_opcode {
+	CXL_MBOX_OP_INVALID		= 0x0000,
+	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
+	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
+	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
+	CXL_MBOX_OP_GET_EVT_INT_POLICY	= 0x0102,
+	CXL_MBOX_OP_SET_EVT_INT_POLICY	= 0x0103,
+	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
+	CXL_MBOX_OP_TRANSFER_FW		= 0x0201,
+	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
+	CXL_MBOX_OP_GET_TIMESTAMP	= 0x0300,
+	CXL_MBOX_OP_SET_TIMESTAMP	= 0x0301,
+	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
+	CXL_MBOX_OP_GET_LOG		= 0x0401,
+	CXL_MBOX_OP_GET_LOG_CAPS	= 0x0402,
+	CXL_MBOX_OP_CLEAR_LOG           = 0x0403,
+	CXL_MBOX_OP_GET_SUP_LOG_SUBLIST = 0x0405,
+	CXL_MBOX_OP_GET_SUPPORTED_FEATURES	= 0x0500,
+	CXL_MBOX_OP_GET_FEATURE		= 0x0501,
+	CXL_MBOX_OP_SET_FEATURE		= 0x0502,
+	CXL_MBOX_OP_IDENTIFY		= 0x4000,
+	CXL_MBOX_OP_GET_PARTITION_INFO	= 0x4100,
+	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,
+	CXL_MBOX_OP_GET_LSA		= 0x4102,
+	CXL_MBOX_OP_SET_LSA		= 0x4103,
+	CXL_MBOX_OP_GET_HEALTH_INFO	= 0x4200,
+	CXL_MBOX_OP_GET_ALERT_CONFIG	= 0x4201,
+	CXL_MBOX_OP_SET_ALERT_CONFIG	= 0x4202,
+	CXL_MBOX_OP_GET_SHUTDOWN_STATE	= 0x4203,
+	CXL_MBOX_OP_SET_SHUTDOWN_STATE	= 0x4204,
+	CXL_MBOX_OP_GET_POISON		= 0x4300,
+	CXL_MBOX_OP_INJECT_POISON	= 0x4301,
+	CXL_MBOX_OP_CLEAR_POISON	= 0x4302,
+	CXL_MBOX_OP_GET_SCAN_MEDIA_CAPS	= 0x4303,
+	CXL_MBOX_OP_SCAN_MEDIA		= 0x4304,
+	CXL_MBOX_OP_GET_SCAN_MEDIA	= 0x4305,
+	CXL_MBOX_OP_SANITIZE		= 0x4400,
+	CXL_MBOX_OP_SECURE_ERASE	= 0x4401,
+	CXL_MBOX_OP_GET_SECURITY_STATE	= 0x4500,
+	CXL_MBOX_OP_SET_PASSPHRASE	= 0x4501,
+	CXL_MBOX_OP_DISABLE_PASSPHRASE	= 0x4502,
+	CXL_MBOX_OP_UNLOCK		= 0x4503,
+	CXL_MBOX_OP_FREEZE_SECURITY	= 0x4504,
+	CXL_MBOX_OP_PASSPHRASE_SECURE_ERASE	= 0x4505,
+	CXL_MBOX_OP_MAX			= 0x10000
+};
+
+#define CXL_CMD_CONFIG_CHANGE_COLD_RESET	BIT(0)
+#define CXL_CMD_CONFIG_CHANGE_IMMEDIATE		BIT(1)
+#define CXL_CMD_DATA_CHANGE_IMMEDIATE		BIT(2)
+#define CXL_CMD_POLICY_CHANGE_IMMEDIATE		BIT(3)
+#define CXL_CMD_LOG_CHANGE_IMMEDIATE		BIT(4)
+#define CXL_CMD_SECURITY_STATE_CHANGE		BIT(5)
+#define CXL_CMD_BACKGROUND			BIT(6)
+#define CXL_CMD_BGCMD_ABORT_SUPPORTED		BIT(7)
+#define CXL_CMD_CONFIG_CHANGE_CONV_RESET	(BIT(9) | BIT(10))
+#define CXL_CMD_CONFIG_CHANGE_CXL_RESET		(BIT(9) | BIT(11))
+
+struct cxl_feat_entry {
+	uuid_t uuid;
+	__le16 id;
+	__le16 get_feat_size;
+	__le16 set_feat_size;
+	__le32 flags;
+	u8 get_feat_ver;
+	u8 set_feat_ver;
+	__le16 effects;
+	u8 reserved[18];
+} __packed;
+
+/**
+ * struct cxl_mem_command - Driver representation of a memory device command
+ * @info: Command information as it exists for the UAPI
+ * @opcode: The actual bits used for the mailbox protocol
+ * @flags: Set of flags affecting driver behavior.
+ *
+ *  * %CXL_CMD_FLAG_FORCE_ENABLE: In cases of error, commands with this flag
+ *    will be enabled by the driver regardless of what hardware may have
+ *    advertised.
+ *
+ * The cxl_mem_command is the driver's internal representation of commands that
+ * are supported by the driver. Some of these commands may not be supported by
+ * the hardware. The driver will use @info to validate the fields passed in by
+ * the user then submit the @opcode to the hardware.
+ *
+ * See struct cxl_command_info.
+ */
+struct cxl_mem_command {
+	struct cxl_command_info info;
+	enum cxl_opcode opcode;
+	u32 flags;
+#define CXL_CMD_FLAG_FORCE_ENABLE BIT(0)
+};
+
+struct cxl_mem_command *cxl_get_mem_command(u32 id);
+int cxl_fwctl_send_cmd(struct cxl_mailbox *cxl_mbox,
+		       struct fwctl_cxl_command *fwctl_cmd,
+		       struct cxl_mbox_cmd *mbox_cmd,
+		       size_t *out_len);
+void *cxl_query_cmd_from_fwctl(struct cxl_mailbox *cxl_mbox,
+			       struct cxl_mem_query_commands *q,
+			       size_t *out_len);
+
 #endif
diff --git a/include/uapi/fwctl/cxl.h b/include/uapi/fwctl/cxl.h
new file mode 100644
index 00000000000000..a3a96195f6da4c
--- /dev/null
+++ b/include/uapi/fwctl/cxl.h
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2024, Intel Corporation
+ *
+ * These are definitions for the mailbox command interface of CXL subsystem.
+ */
+#ifndef _UAPI_FWCTL_CXL_H_
+#define _UAPI_FWCTL_CXL_H_
+
+#include <linux/types.h>
+
+enum fwctl_cxl_commands {
+	FWCTL_CXL_QUERY_COMMANDS = 0,
+	FWCTL_CXL_SEND_COMMAND,
+};
+
+/**
+ * struct fwctl_info_cxl - ioctl(FWCTL_INFO) out_device_data
+ * @uctx_caps: The command capabilities driver accepts.
+ *
+ * Return basic information about the FW interface available.
+ */
+struct fwctl_info_cxl {
+	__u32 uctx_caps;
+};
+
+/*
+ * CXL spec r3.1 Table 8-101 Set Feature Input Payload
+ */
+struct set_feature_input {
+	__u8 uuid[16];
+	__u32 flags;
+	__u16 offset;
+	__u8 version;
+	__u8 reserved[9];
+	__u8 data[];
+} __packed;
+
+/**
+ * struct fwctl_cxl_command - Send a command to a memory device.
+ * @id: The command to send to the memory device. This must be one of the
+ *	commands returned by the query command.
+ * @flags: Flags for the command (input).
+ * @raw: Special fields for raw commands
+ * @raw.opcode: Opcode passed to hardware when using the RAW command.
+ * @raw.rsvd: Must be zero.
+ * @rsvd: Must be zero.
+ * @retval: Return value from the memory device (output).
+ * @in: Parameters associated with input payload.
+ * @in.size: Size of the payload to provide to the device (input).
+ * @in.rsvd: Must be zero.
+ * @in.payload: Pointer to memory for payload input, payload is little endian.
+ *
+ * Output payload is defined with 'struct fwctl_rpc' and is the hardware output
+ */
+struct fwctl_cxl_command {
+	__u32 id;
+	__u32 flags;
+	union {
+		struct {
+			__u16 opcode;
+			__u16 rsvd;
+		} raw;
+		__u32 rsvd;
+	};
+
+	struct {
+		__u32 size;
+		__u32 rsvd;
+		__u64 payload;
+	} in;
+};
+
+/**
+ * struct fwctl_rpc_cxl - ioctl(FWCTL_RPC) input
+ */
+struct fwctl_rpc_cxl {
+	__u32 rpc_cmd;
+	__u32 payload_size;
+	__u32 version;
+	__u32 rsvd;
+	union {
+		struct cxl_mem_query_commands query;
+		struct fwctl_cxl_command send_cmd;
+	};
+};
+
+struct fwctl_rpc_cxl_out {
+	__u32 retval;
+	__u32 rsvd;
+	__u8 payload[];
+};
+
+#endif
diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
index f9b27fb5c1618c..4e4d30104667c7 100644
--- a/include/uapi/fwctl/fwctl.h
+++ b/include/uapi/fwctl/fwctl.h
@@ -43,6 +43,7 @@ enum {
 enum fwctl_device_type {
 	FWCTL_DEVICE_TYPE_ERROR = 0,
 	FWCTL_DEVICE_TYPE_MLX5 = 1,
+	FWCTL_DEVICE_TYPE_CXL = 2,
 };
 
 /**
-- 
2.46.0
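
As a reading aid for the uAPI above, here is a minimal userspace-side sketch of
filling in a FWCTL_CXL_SEND_COMMAND request. The struct is a hand-written mirror
of struct fwctl_cxl_command using <stdint.h> types, not the real header, and
build_send_cmd() is a hypothetical helper. How the result is wrapped in struct
fwctl_rpc_cxl and handed to ioctl(FWCTL_RPC) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of struct fwctl_cxl_command from this patch (assumption:
 * kept in sync by hand; real code would include the uAPI header instead).
 */
struct fwctl_cxl_command_mirror {
	uint32_t id;
	uint32_t flags;
	union {
		struct {
			uint16_t opcode;
			uint16_t rsvd;
		} raw;
		uint32_t rsvd;
	};
	struct {
		uint32_t size;
		uint32_t rsvd;
		uint64_t payload;
	} in;
};

/* Hypothetical helper: fill a request so that every reserved field is zero. */
static void build_send_cmd(struct fwctl_cxl_command_mirror *cmd, uint32_t id,
			   const void *payload, uint32_t payload_size)
{
	assert(cmd != NULL);

	/* memset clears the reserved fields and any compiler padding too */
	memset(cmd, 0, sizeof(*cmd));
	cmd->id = id;
	cmd->in.size = payload_size;
	/* user pointers travel as a 64-bit integer in the uAPI */
	cmd->in.payload = (uint64_t)(uintptr_t)payload;
}
```

Zeroing the whole struct first, rather than assigning the rsvd members one by
one, keeps the "must be zero" rules from the kdoc satisfied even if the layout
grows later.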



* [PATCH v3 10/10] cxl: Create an auxiliary device for fwctl_cxl
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
                   ` (8 preceding siblings ...)
  2024-08-21 18:11 ` [PATCH v3 09/10] fwctl/cxl: Add driver for CXL mailbox for handling CXL features commands (RFC) Jason Gunthorpe
@ 2024-08-21 18:11 ` Jason Gunthorpe
  2024-09-13 22:39 ` [PATCH v3 00/10] Introduce fwctl subystem Dave Jiang
  2024-12-05 22:28 ` Shannon Nelson
  11 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 18:11 UTC (permalink / raw)
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

This will link the fwctl subsystem to CXL devices.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/cxl/core/memdev.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 15c36971fe43ea..f6f33f0f733741 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -13,6 +13,8 @@
 
 DECLARE_RWSEM(cxl_memdev_rwsem);
 
+#define CXL_ADEV_NAME "fwctl-cxl"
+
 /*
  * An entire PCI topology full of devices should be enough for any
  * config
@@ -1030,6 +1032,7 @@ static const struct file_operations cxl_memdev_fops = {
 struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
 				       struct cxl_dev_state *cxlds)
 {
+	struct auxiliary_device *adev;
 	struct cxl_memdev *cxlmd;
 	struct device *dev;
 	struct cdev *cdev;
@@ -1056,11 +1059,27 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
 	if (rc)
 		goto err;
 
+	adev = &cxlds->cxl_mbox.adev;
+	adev->id = cxlmd->id;
+	adev->name = CXL_ADEV_NAME;
+	adev->dev.parent = dev;
+
+	rc = auxiliary_device_init(adev);
+	if (rc)
+		goto err;
+
+	rc = auxiliary_device_add(adev);
+	if (rc)
+		goto aux_err;
+
 	rc = devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd);
 	if (rc)
 		return ERR_PTR(rc);
 	return cxlmd;
 
+aux_err:
+	auxiliary_device_uninit(adev);
+
 err:
 	/*
 	 * The cdev was briefly live, shutdown any ioctl operations that
-- 
2.46.0
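
The error handling in the hunk above follows the usual init/add/uninit ladder
of the auxiliary bus: a failed init needs no cleanup, while a failed add must
be undone with uninit. A small userspace model of that ordering, with
hypothetical fake_*() stand-ins instead of the real auxiliary_device_*()
calls, could look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an auxiliary device; each step can be told to fail. */
struct fake_adev {
	bool fail_init, fail_add;
	bool initialized, added;
};

static int fake_init(struct fake_adev *a)
{
	if (a->fail_init)
		return -EINVAL;
	a->initialized = true;
	return 0;
}

static int fake_add(struct fake_adev *a)
{
	if (a->fail_add)
		return -ENOMEM;
	a->added = true;
	return 0;
}

static void fake_uninit(struct fake_adev *a)
{
	/* uninit is only legal after a successful init */
	assert(a->initialized);
	a->initialized = false;
}

/* Same ladder as the hunk: init failure skips uninit, add failure runs it. */
static int setup_adev(struct fake_adev *a)
{
	int rc;

	rc = fake_init(a);
	if (rc)
		goto err;

	rc = fake_add(a);
	if (rc)
		goto aux_err;

	return 0;

aux_err:
	fake_uninit(a);
err:
	return rc;
}
```

The labels are ordered so that each one undoes exactly the steps that had
succeeded before the jump.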



* Re: [PATCH v3 04/10] taint: Add TAINT_FWCTL
  2024-08-21 18:10 ` [PATCH v3 04/10] taint: Add TAINT_FWCTL Jason Gunthorpe
@ 2024-08-21 23:35   ` Greg Kroah-Hartman
  2024-08-22 15:34     ` Jason Gunthorpe
  0 siblings, 1 reply; 33+ messages in thread
From: Greg Kroah-Hartman @ 2024-08-21 23:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Christoph Hellwig, Itay Avraham,
	Jiri Pirko, Jakub Kicinski, Leonid Bloch, Leon Romanovsky,
	linux-cxl, linux-rdma, Saeed Mahameed

On Wed, Aug 21, 2024 at 03:10:56PM -0300, Jason Gunthorpe wrote:
> Requesting a fwctl scope of access that includes mutating device debug
> data will cause the kernel to be tainted. Changing the device operation
> through things in the debug scope may cause the device to malfunction in
> undefined ways. This should be reflected in the TAINT flags to help any
> debuggers understand that something has been done.

I know naming is hard, but the word "fwctl" is rough, don't you think?
It's become much more than just a random driver in the kernel tree, it's
now a taint flag and is exposed to userspace.  So I think you need to
rename it to something that is at least pronounceable when talking about
it (i.e. something with vowels...)

There's no need to keep it in 8 or less characters, this isn't the
1970's anymore, we have enough room for everyone to spell things out
now. :)

thanks,

greg k-h


* Re: [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  2024-08-21 18:10 ` [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
@ 2024-08-21 23:49   ` Jakub Kicinski
  2024-08-22  0:14     ` Jason Gunthorpe
  2024-08-23 14:23   ` Jonathan Cameron
  1 sibling, 1 reply; 33+ messages in thread
From: Jakub Kicinski @ 2024-08-21 23:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
	linux-cxl, linux-rdma, Saeed Mahameed

On Wed, 21 Aug 2024 15:10:57 -0300 Jason Gunthorpe wrote:
> +	case FWCTL_RPC_DEBUG_WRITE_FULL:
> +		if (!capable(CAP_SYS_RAWIO))
> +			return -EPERM;
> +		fallthrough;
> +	case FWCTL_RPC_DEBUG_WRITE:

Nacked-by: Jakub Kicinski <kuba@kernel.org> # RFC 3514

How many times do I have to ask you to keep my tags, and to CC netdev?


* Re: [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  2024-08-21 23:49   ` Jakub Kicinski
@ 2024-08-22  0:14     ` Jason Gunthorpe
  2024-08-22  0:30       ` Jakub Kicinski
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-22  0:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
	linux-cxl, linux-rdma, Saeed Mahameed

On Wed, Aug 21, 2024 at 04:49:50PM -0700, Jakub Kicinski wrote:
> On Wed, 21 Aug 2024 15:10:57 -0300 Jason Gunthorpe wrote:
> > +	case FWCTL_RPC_DEBUG_WRITE_FULL:
> > +		if (!capable(CAP_SYS_RAWIO))
> > +			return -EPERM;
> > +		fallthrough;
> > +	case FWCTL_RPC_DEBUG_WRITE:
> 
> Nacked-by: Jakub Kicinski <kuba@kernel.org> # RFC 3514

Your "evil bit" thing has been responded to already and that isn't how
it works.

> How many times do I have to ask you to keep my tags, and to CC netdev?

It is on patch 6 which is where I said I'd put it:

https://lore.kernel.org/all/20240605120634.GS19897@nvidia.com/

You never asked me for netdev ccs.

Jason


* Re: [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  2024-08-22  0:14     ` Jason Gunthorpe
@ 2024-08-22  0:30       ` Jakub Kicinski
  2024-08-27 15:27         ` Jason Gunthorpe
  0 siblings, 1 reply; 33+ messages in thread
From: Jakub Kicinski @ 2024-08-22  0:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
	linux-cxl, linux-rdma, Saeed Mahameed

On Wed, 21 Aug 2024 21:14:34 -0300 Jason Gunthorpe wrote:
> > Nacked-by: Jakub Kicinski <kuba@kernel.org> # RFC 3514  
> 
> Your "evil bit" thing has been responded to already and that isn't how
> it works.

"Isn't how it works"? Please just carry the tag and don't waste my time.

> You never asked me for netdev ccs.

I definitely have. Either you or Saeed who was posting the earlier
revisions.


* Re: [PATCH v3 04/10] taint: Add TAINT_FWCTL
  2024-08-21 23:35   ` Greg Kroah-Hartman
@ 2024-08-22 15:34     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-22 15:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Christoph Hellwig, Itay Avraham,
	Jiri Pirko, Jakub Kicinski, Leonid Bloch, Leon Romanovsky,
	linux-cxl, linux-rdma, Saeed Mahameed

On Thu, Aug 22, 2024 at 07:35:20AM +0800, Greg Kroah-Hartman wrote:
> On Wed, Aug 21, 2024 at 03:10:56PM -0300, Jason Gunthorpe wrote:
> > Requesting a fwctl scope of access that includes mutating device debug
> > data will cause the kernel to be tainted. Changing the device operation
> > through things in the debug scope may cause the device to malfunction in
> > undefined ways. This should be reflected in the TAINT flags to help any
> > debuggers understand that something has been done.
> 
> I know naming is hard, but the word "fwctl" is rough, don't you think?
> It's become much more than just a random driver in the kernel tree, it's
> now a taint flag and is exposed to userspace.  So I think you need to
> rename it to something that is at least pronounceable when talking about
> it (i.e. something with vowels...)

Okay, that makes sense to me. We could also choose a different name
for the taint flag.

Let's see if some people have some ideas, I don't have a ready
alternative..

We've just been calling it "firmware control" in conversation.

Thanks,
Jason


* Re: [PATCH v3 01/10] fwctl: Add basic structure for a class subsystem with a cdev
  2024-08-21 18:10 ` [PATCH v3 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
@ 2024-08-23 13:48   ` Jonathan Cameron
  0 siblings, 0 replies; 33+ messages in thread
From: Jonathan Cameron @ 2024-08-23 13:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Wed, 21 Aug 2024 15:10:53 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> Create the class, character device and functions for a fwctl driver to
> un/register to the subsystem.
> 
> A typical fwctl driver has a sysfs presence like:
> 
> $ ls -l /dev/fwctl/fwctl0
> crw------- 1 root root 250, 0 Apr 25 19:16 /dev/fwctl/fwctl0
> 
> $ ls /sys/class/fwctl/fwctl0
> dev  device  power  subsystem  uevent
> 
> $ ls /sys/class/fwctl/fwctl0/device/infiniband/
> ibp0s10f0
> 
> $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
> fwctl0/
> 
> $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
> dev  device  power  subsystem  uevent
> 
> Which allows userspace to link all the multi-subsystem driver components
> together and learn the subsystem specific names for the device's
> components.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Hi Jason,

Tags I'm giving for this are still on a purely technical basis.
I'm not taking a position (yet anyway) on whether this is
a good idea in general. I would like that discussion to not risk
being distracted by the code state etc though, so FWIW this is
nice and clean and looks good to me, so that shouldn't be an
issue!

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



* Re: [PATCH v3 02/10] fwctl: Basic ioctl dispatch for the character device
  2024-08-21 18:10 ` [PATCH v3 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
@ 2024-08-23 14:02   ` Jonathan Cameron
  2024-08-27 14:56     ` Jason Gunthorpe
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Cameron @ 2024-08-23 14:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Wed, 21 Aug 2024 15:10:54 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> Each file descriptor gets a chunk of per-FD driver specific context that
> allows the driver to attach a device specific struct to. The core code
> takes care of the memory lifetime for this structure.
> 
> The ioctl dispatch and design is based on what was built for iommufd. The
> ioctls have a struct which has a combined in/out behavior with a typical
> 'zero pad' scheme for future extension and backwards compatibility.
> 
> Like iommufd some shared logic does most of the ioctl marshalling and
> compatibility work and table dispatches to some function pointers for
> each unique ioctl.
> 
> This approach has proven to work quite well in the iommufd and rdma
> subsystems.
> 
> Allocate an ioctl number space for the subsystem.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Hi Jason,

A few minor things inline, but all trivial so
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
> index 7f3e7713d0e6e9..f2e30ffc1e0cb5 100644
> --- a/drivers/fwctl/main.c
> +++ b/drivers/fwctl/main.c



>  
> @@ -71,6 +183,9 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
>  
>  	fwctl->dev.class = &fwctl_class;
>  	fwctl->dev.parent = parent;
> +	init_rwsem(&fwctl->registration_lock);
> +	mutex_init(&fwctl->uctx_list_lock);

If ida_alloc_max() fails, I don't think you destroy the mutex, as the
device isn't yet initialized / put in the error path.

Whilst I find it hard to care, it's nice to either always destroy a mutex or
never do it at all. It feels odd to only do it if things go well.

> +	INIT_LIST_HEAD(&fwctl->uctx_list);
>  
>  	devnum = ida_alloc_max(&fwctl_ida, FWCTL_MAX_DEVICES - 1, GFP_KERNEL);
>  	if (devnum < 0)
> @@ -127,6 +242,10 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, FWCTL);
>   * Undoes fwctl_register(). On return no driver ops will be called. The
>   * caller must still call fwctl_put() to free the fwctl.
>   *
> + * Unregister will return even if userspace still has file descriptors open.
> + * This will call ops->close_uctx() on any open FDs and after return no driver
> + * op will be called. The FDs remain open but all fops will return -ENODEV.
> + *
>   * The design of fwctl allows this sort of disassociation of the driver from the
>   * subsystem primarily by keeping memory allocations owned by the core subsystem.
>   * The fwctl_device and fwctl_uctx can both be freed without requiring a driver
> @@ -134,7 +253,23 @@ EXPORT_SYMBOL_NS_GPL(fwctl_register, FWCTL);
>   */
>  void fwctl_unregister(struct fwctl_device *fwctl)
>  {
> +	struct fwctl_uctx *uctx;
> +
>  	cdev_device_del(&fwctl->cdev, &fwctl->dev);
> +
> +	/* Disable and free the driver's resources for any still open FDs. */
> +	guard(rwsem_write)(&fwctl->registration_lock);
> +	guard(mutex)(&fwctl->uctx_list_lock);
> +	while ((uctx = list_first_entry_or_null(&fwctl->uctx_list,
> +						struct fwctl_uctx,
> +						uctx_list_entry)))
> +		fwctl_destroy_uctx(uctx);
> +
> +	/*
> +	 * The driver module may unload after this returns, the op pointer will
> +	 * not be valid.
> +	 */
> +	fwctl->ops = NULL;
>  }
>  EXPORT_SYMBOL_NS_GPL(fwctl_unregister, FWCTL);
>  
> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> index 68ac2d5ab87481..ca4245825e91bf 100644
> --- a/include/linux/fwctl.h
> +++ b/include/linux/fwctl.h

>  
>  /**
> @@ -26,6 +49,15 @@ struct fwctl_device {
>  	struct device dev;
>  	/* private: */
>  	struct cdev cdev;
> +
> +	/*
> +	 * Protect ops, held for write when ops becomes NULL during unregister,
> +	 * held for read whenever ops is loaded or an ops function is running.
> +	 */
> +	struct rw_semaphore registration_lock;

Maybe move down to just above ops?

> +	/* Protect uctx_list */
> +	struct mutex uctx_list_lock;
> +	struct list_head uctx_list;
>  	const struct fwctl_ops *ops;
>  };
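
The observable behaviour described in the comments quoted above (unregister
closes every open context, then clears ops, and later file operations return
-ENODEV) can be modelled in a few lines of userspace C. This is a
single-threaded sketch with hypothetical model_*() names. The real code holds
registration_lock and uctx_list_lock around these steps, which the model only
marks in comments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not the kernel structures. */
struct model_ops {
	int (*rpc)(void);
};

struct model_device {
	const struct model_ops *ops;	/* NULL once unregistered */
	int open_ctxs;			/* contexts the driver still knows about */
	int close_calls;		/* how often "close_uctx" was invoked */
};

/* What a file operation does: load ops, bail out if the driver is gone. */
static int model_fop_rpc(struct model_device *dev)
{
	assert(dev != NULL);

	/* kernel: registration_lock is held for read around this */
	if (!dev->ops)
		return -ENODEV;
	return dev->ops->rpc();
}

/* What unregister does: close every context, then drop ops. */
static void model_unregister(struct model_device *dev)
{
	assert(dev != NULL);

	/* kernel: registration_lock held for write, uctx_list_lock held */
	while (dev->open_ctxs > 0) {
		dev->open_ctxs--;
		dev->close_calls++;	/* stands in for ops->close_uctx() */
	}
	dev->ops = NULL;
}

/* Trivial driver op used to exercise the model. */
static int model_rpc_ok(void)
{
	return 0;
}
```

Since the file descriptor outlives the driver in this scheme, every entry
point has to tolerate ops being NULL, which is what the read side of the lock
protects in the real code.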

> diff --git a/include/uapi/fwctl/fwctl.h b/include/uapi/fwctl/fwctl.h
> new file mode 100644
> index 00000000000000..22fa750d7e8184
> --- /dev/null
> +++ b/include/uapi/fwctl/fwctl.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
> + */
> +#ifndef _UAPI_FWCTL_H
> +#define _UAPI_FWCTL_H
> +
> +#define FWCTL_TYPE 0x9A
> +
> +/**
> + * DOC: General ioctl format
> + *
> + * The ioctl interface follows a general format to allow for extensibility. Each
> + * ioctl is passed in a structure pointer as the argument providing the size of
Pedantic Englishman time:
passed a structure pointer

(otherwise I read that as passing an ioctl in a pointer which is weird).

> + * the structure in the first u32. The kernel checks that any structure space 
...
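
The quoted DOC text is cut off here, but the commit message says the scheme is
modelled on iommufd, where a caller's struct may be shorter or longer than the
one the kernel knows. Assuming fwctl follows the same convention (in the kernel
this is what copy_struct_from_user() provides), the rule can be sketched in
userspace as below. model_copy_struct() is a hypothetical name for
illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of a size-prefixed, zero-padded struct copy.
 * dst/ksize: the struct this "kernel" knows; src/usize: what the caller sent.
 */
static int model_copy_struct(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	const unsigned char *s = src;
	size_t n = usize < ksize ? usize : ksize;
	size_t i;

	assert(dst != NULL && src != NULL);

	/* Newer caller: bytes this side does not understand must be zero. */
	for (i = ksize; i < usize; i++)
		if (s[i])
			return -E2BIG;

	/* Older caller: fields it did not send read back as zero. */
	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}
```

Both directions of compatibility fall out of the same two steps: an old
program on a new kernel gets zero-filled fields, and a new program on an old
kernel is accepted as long as it leaves the unknown fields at zero.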




* Re: [PATCH v3 03/10] fwctl: FWCTL_INFO to return basic information about the device
  2024-08-21 18:10 ` [PATCH v3 03/10] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
@ 2024-08-23 14:14   ` Jonathan Cameron
  2024-08-27 14:47     ` Jason Gunthorpe
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Cameron @ 2024-08-23 14:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Wed, 21 Aug 2024 15:10:55 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> Userspace will need to know some details about the fwctl interface being
> used to locate the correct userspace code to communicate with the
> kernel. Provide a simple device_type enum indicating what the kernel
> driver is.
> 
> Allow the device to provide a device specific info struct that contains
> any additional information that the driver may need to provide to
> userspace.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Just one minor question: How likely is the data being passed back
from the driver to be const?  Feels like it might be and should
be easy enough to support either const or not.

Either way, LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> diff --git a/include/linux/fwctl.h b/include/linux/fwctl.h
> index ca4245825e91bf..6b596931a55169 100644
> --- a/include/linux/fwctl.h
> +++ b/include/linux/fwctl.h
> @@ -7,6 +7,7 @@
>  #include <linux/device.h>
>  #include <linux/cdev.h>
>  #include <linux/cleanup.h>
> +#include <uapi/fwctl/fwctl.h>
>  
>  struct fwctl_device;
>  struct fwctl_uctx;
> @@ -19,6 +20,10 @@ struct fwctl_uctx;
>   * it will block device hot unplug and module unloading.
>   */
>  struct fwctl_ops {
> +	/**
> +	 * @device_type: The driver's assigned device_type number. This is uABI.
> +	 */
> +	enum fwctl_device_type device_type;
>  	/**
>  	 * @uctx_size: The size of the fwctl_uctx struct to allocate. The first
>  	 * bytes of this memory will be a fwctl_uctx. The driver can use the
> @@ -35,6 +40,13 @@ struct fwctl_ops {
>  	 * is closed.
>  	 */
>  	void (*close_uctx)(struct fwctl_uctx *uctx);
> +	/**
> +	 * @info: Implement FWCTL_INFO. Return kmalloc() memory that is copied
> +	 * to out_device_data. On input, length indicates the size of the user
> +	 * buffer; on output, it indicates the size of the memory. The driver can
> +	 * ignore length on input, the core code will handle everything.
> +	 */

Maybe it's worth supporting const data as well?

> +	void *(*info)(struct fwctl_uctx *uctx, size_t *length);
>  };
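
To make the @info contract above concrete, here is a userspace model of the
two halves. It assumes, based only on the kdoc, that the core copies at most
the caller's buffer size and then frees the driver's allocation. All model_*()
names and the capability value are made up, and malloc/free stand in for
kmalloc/kfree. The real op presumably reports failure with ERR_PTR() rather
than NULL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device info blob, standing in for e.g. struct fwctl_info_cxl. */
struct model_info {
	uint32_t uctx_caps;
};

/* Driver side: allocate the blob, report its size, ignore the input length. */
static void *model_info_op(size_t *length)
{
	struct model_info *info;

	assert(length != NULL);

	info = calloc(1, sizeof(*info));
	if (!info)
		return NULL;
	info->uctx_caps = 0x3;	/* made-up capability bits */
	*length = sizeof(*info);
	return info;
}

/* Core side (assumed behaviour): copy what fits, then free the allocation. */
static size_t model_core_info(void *user_buf, size_t user_len)
{
	size_t len = user_len;
	void *data = model_info_op(&len);
	size_t n;

	if (!data)
		return 0;
	n = len < user_len ? len : user_len;
	memcpy(user_buf, data, n);
	free(data);
	return n;
}
```

In this shape the core always owns and frees what the driver returns, so a
driver whose info is constant would still have to duplicate it on every call.
That appears to be the cost the const question is pointing at.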
>  
>  /**




* Re: [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  2024-08-21 18:10 ` [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
  2024-08-21 23:49   ` Jakub Kicinski
@ 2024-08-23 14:23   ` Jonathan Cameron
  1 sibling, 0 replies; 33+ messages in thread
From: Jonathan Cameron @ 2024-08-23 14:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Wed, 21 Aug 2024 15:10:57 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> Add the FWCTL_RPC ioctl which allows a request/response RPC call to device
> firmware. Drivers implementing this call must follow the security
> guidelines under Documentation/userspace-api/fwctl.rst
> 
> The core code provides some memory management helpers to get the messages
> copied from and back to userspace. The driver is responsible for
> allocating the output message memory and delivering the message to the
> device.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>




* Re: [PATCH v3 06/10] fwctl: Add documentation
  2024-08-21 18:10 ` [PATCH v3 06/10] fwctl: Add documentation Jason Gunthorpe
@ 2024-08-23 14:35   ` Jonathan Cameron
  2024-08-27 14:58     ` Jason Gunthorpe
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Cameron @ 2024-08-23 14:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Wed, 21 Aug 2024 15:10:58 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> Document the purpose and rules for the fwctl subsystem.
> 
> Link in kdocs to the doc tree.
> 
> Nacked-by: Jakub Kicinski <kuba@kernel.org>
> Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Just one trivial plural / singular comment.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Documentation/userspace-api/fwctl.rst | 285 ++++++++++++++++++++++++++
>  Documentation/userspace-api/index.rst |   1 +
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/userspace-api/fwctl.rst
> 
> diff --git a/Documentation/userspace-api/fwctl.rst b/Documentation/userspace-api/fwctl.rst
> new file mode 100644
> index 00000000000000..8f3da30ee7c91b
> --- /dev/null
> +++ b/Documentation/userspace-api/fwctl.rst
> @@ -0,0 +1,285 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============
> +fwctl subsystem
> +===============
> +
> +:Author: Jason Gunthorpe
> +
> +Overview
> +========
> +
> +Modern devices contain extensive amounts of FW, and in many cases, are largely
> +software-defined pieces of hardware. The evolution of this approach is largely a
> +reaction to Moore's Law where a chip tape out is now highly expensive, and the
> +chip design is extremely large. Replacing fixed HW logic with a flexible and
> +tightly coupled FW/HW combination is an effective risk mitigation against chip
> +respin. Problems in the HW design can be counteracted in device FW. This is
> +especially true for devices which present a stable and backwards compatible
> +interface to the operating system driver (such as NVMe).
> +
> +The FW layer in devices has grown to incredible sizes and devices frequently
incredible size
(tricky to get the plurals right in this sentence, but currently it's a mixture)
> +integrate clusters of fast processors to run it. For example, mlx5 devices have
> +over 30MB of FW code, and big configurations operate with over 1GB of FW managed
> +runtime state.
...



* Re: [PATCH v3 07/10] fwctl/mlx5: Support for communicating with mlx5 fw
  2024-08-21 18:10 ` [PATCH v3 07/10] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
@ 2024-08-23 14:48   ` Jonathan Cameron
  2024-08-27 15:07     ` Jason Gunthorpe
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Cameron @ 2024-08-23 14:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Wed, 21 Aug 2024 15:10:59 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> From: Saeed Mahameed <saeedm@nvidia.com>
> 
> mlx5's fw has long provided a User Context concept. This has a long
> history in RDMA as part of the devx extended verbs programming
> interface. A User Context is a security envelope that contains objects and
> controls access. It contains the Protection Domain object from the
> InfiniBand Architecture and both together provide the OS with the necessary
> tools to bind a security context like a process to the device.
> 
> The security context is restricted to not be able to touch the kernel or
> other processes. In the RDMA verbs case it is also restricted to not touch
> global device resources.
> 
> The fwctl_mlx5 takes this approach and builds a User Context per fwctl
> file descriptor and uses a FW security capability on the User Context to
> enable access to global device resources. This makes the context useful
> for provisioning and debugging the global device state.
> 
> mlx5 already has a robust infrastructure for delivering RPC messages to
> fw. Trivially connect fwctl's RPC mechanism to mlx5_cmd_do(). Enforce the
> User Context ID in every RPC header so the FW knows the security context
> of the issuing ID.
> 
> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Trivial stuff only. Feel free to ignore if you really like the code
the way it is.  I don't know the MLX5 parts, but based on just what
is visible here and in this series.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> diff --git a/drivers/fwctl/mlx5/main.c b/drivers/fwctl/mlx5/main.c
> new file mode 100644
> index 00000000000000..8839770fbe7ba5
> --- /dev/null
> +++ b/drivers/fwctl/mlx5/main.c


> +
> +static void *mlx5ctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
> +			    void *rpc_in, size_t in_len, size_t *out_len)
> +{
> +	struct mlx5ctl_dev *mcdev =
> +		container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
> +	struct mlx5ctl_uctx *mfd =
> +		container_of(uctx, struct mlx5ctl_uctx, uctx);
> +	void *rpc_alloc __free(kvfree) = NULL;

Whilst you can't completely pair this with destructor, I'd still
move this as locally as possible.

> +	void *rpc_out;
> +	int ret;
> +
> +	if (in_len < MLX5_ST_SZ_BYTES(mbox_in_hdr) ||
> +	    *out_len < MLX5_ST_SZ_BYTES(mbox_out_hdr))
> +		return ERR_PTR(-EMSGSIZE);
> +
> +	/* FIXME: Requires device support for more scopes */
> +	if (scope != FWCTL_RPC_CONFIGURATION &&
> +	    scope != FWCTL_RPC_DEBUG_READ_ONLY)
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	mlx5ctl_dbg(mcdev, "[UID %d] cmdif: opcode 0x%x inlen %zu outlen %zu\n",
> +		    mfd->uctx_uid, MLX5_GET(mbox_in_hdr, rpc_in, opcode),
> +		    in_len, *out_len);
> +
> +	if (!mlx5ctl_validate_rpc(rpc_in, scope))
> +		return ERR_PTR(-EBADMSG);
> +
> +	/*
> +	 * mlx5_cmd_do() copies the input message to its own buffer before
> +	 * executing it, so we can reuse the allocation for the output.
> +	 */
> +	if (*out_len <= in_len) {
> +		rpc_out = rpc_in;
> +	} else {
> +		rpc_out = rpc_alloc = kvzalloc(*out_len, GFP_KERNEL);
> +		if (!rpc_alloc)
> +			return ERR_PTR(-ENOMEM);
> +	}
> +
> +	/* Enforce the user context for the command */
> +	MLX5_SET(mbox_in_hdr, rpc_in, uid, mfd->uctx_uid);
> +	ret = mlx5_cmd_do(mcdev->mdev, rpc_in, in_len, rpc_out, *out_len);
> +
> +	mlx5ctl_dbg(mcdev,
> +		    "[UID %d] cmdif: opcode 0x%x status 0x%x retval %pe\n",
> +		    mfd->uctx_uid, MLX5_GET(mbox_in_hdr, rpc_in, opcode),
> +		    MLX5_GET(mbox_out_hdr, rpc_out, status), ERR_PTR(ret));
> +
> +	/*
> +	 * -EREMOTEIO means execution succeeded and the out is valid,
> +	 * but an error code was returned inside out. Everything else
> +	 * means the RPC did not make it to the device.
> +	 */
> +	if (ret && ret != -EREMOTEIO)
> +		return ERR_PTR(ret);
> +	if (rpc_out == rpc_in)
> +		return rpc_in;
> +	return_ptr(rpc_alloc);
> +}
> +

> +static void mlx5ctl_remove(struct auxiliary_device *adev)
> +{
> +	struct mlx5ctl_dev *mcdev __free(mlx5ctl) = auxiliary_get_drvdata(adev);

I'm not keen on the non constructor being paired with destructor
but it's your code so you get to keep the confusion if you really
like it.

I'd just have an explicit put.

> +
> +	fwctl_unregister(&mcdev->fwctl);
> +}


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 03/10] fwctl: FWCTL_INFO to return basic information about the device
  2024-08-23 14:14   ` Jonathan Cameron
@ 2024-08-27 14:47     ` Jason Gunthorpe
  2024-08-27 14:55       ` Andy Gospodarek
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-27 14:47 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Fri, Aug 23, 2024 at 03:14:11PM +0100, Jonathan Cameron wrote:
> On Wed, 21 Aug 2024 15:10:55 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > Userspace will need to know some details about the fwctl interface being
> > used to locate the correct userspace code to communicate with the
> > kernel. Provide a simple device_type enum indicating what the kernel
> > driver is.
> > 
> > Allow the device to provide a device specific info struct that contains
> > any additional information that the driver may need to provide to
> > userspace.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Just one minor question: How likely is the data being passed back
> from the driver to be const?  

I'm guessing not very? I expect a lot of drivers will want to include
dynamic information about their FW.

> Feels like it might be and should
> be easy enough to support either const or not.

It would be easy, let's wait and see, adding another op is trivial.
Allocating memory is not the end of the world on this path anyhow.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 03/10] fwctl: FWCTL_INFO to return basic information about the device
  2024-08-27 14:47     ` Jason Gunthorpe
@ 2024-08-27 14:55       ` Andy Gospodarek
  0 siblings, 0 replies; 33+ messages in thread
From: Andy Gospodarek @ 2024-08-27 14:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jonathan Cameron, Andy Gospodarek, Aron Silverton, Dan Williams,
	Daniel Vetter, Dave Jiang, David Ahern, Greg Kroah-Hartman,
	Christoph Hellwig, Itay Avraham, Jiri Pirko, Jakub Kicinski,
	Leonid Bloch, Leon Romanovsky, linux-cxl, linux-rdma,
	Saeed Mahameed

On Tue, Aug 27, 2024 at 11:47:23AM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 23, 2024 at 03:14:11PM +0100, Jonathan Cameron wrote:
> > On Wed, 21 Aug 2024 15:10:55 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > 
> > > Userspace will need to know some details about the fwctl interface being
> > > used to locate the correct userspace code to communicate with the
> > > kernel. Provide a simple device_type enum indicating what the kernel
> > > driver is.
> > > 
> > > Allow the device to provide a device specific info struct that contains
> > > any additional information that the driver may need to provide to
> > > userspace.
> > > 
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > Just one minor question: How likely is the data being passed back
> > from the driver to be const?  
> 
> I'm guessing not very? I expect a lot of drivers will want to include
> dynamic information about their FW.
> 

Agreed.  The presumption is that this will be used to query information from
FW that has no existing API to discover the values.

> > Feels like it might be and should
> > be easy enough to support either const or not.
> 
> It would be easy, let's wait and see, adding another op is trivial.
> Allocating memory is not the end of the world on this path anyhow.

+1 

> 
> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 02/10] fwctl: Basic ioctl dispatch for the character device
  2024-08-23 14:02   ` Jonathan Cameron
@ 2024-08-27 14:56     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-27 14:56 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Fri, Aug 23, 2024 at 03:02:07PM +0100, Jonathan Cameron wrote:

> > @@ -71,6 +183,9 @@ _alloc_device(struct device *parent, const struct fwctl_ops *ops, size_t size)
> >  
> >  	fwctl->dev.class = &fwctl_class;
> >  	fwctl->dev.parent = parent;
> > +	init_rwsem(&fwctl->registration_lock);
> > +	mutex_init(&fwctl->uctx_list_lock);
> 
> If the ida_alloc_max() fails, I don't think you destroy the mutex as the
> device isn't yet initialized / put in the error path.

Right
 
> Whilst I find it hard to care, it's nice to always destroy mutex, or not do it at all.
> Feels odd to only do it if things go well.

Indeed, mutex_destroy is just a sanity check. Still, let's just change
the order then and move the ida up.
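
To illustrate the reordering, here is a rough userspace sketch, not the
actual fwctl code. Every demo_* name is hypothetical: demo_id_alloc()
stands in for ida_alloc_max() and the lock helpers only count live locks.
The step that can fail runs before any lock is initialized, so the error
path has nothing to destroy:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical lock type; the counter tracks init/destroy pairing. */
struct demo_lock { int initialized; };
static int demo_live_locks;

static void demo_lock_init(struct demo_lock *l)
{
	l->initialized = 1;
	demo_live_locks++;
}

static void demo_lock_destroy(struct demo_lock *l)
{
	l->initialized = 0;
	demo_live_locks--;
}

struct demo_dev {
	int id;
	struct demo_lock lock;
};

/* Hypothetical stand-in for ida_alloc_max(): fails once IDs run out. */
static int demo_next_id;

static int demo_id_alloc(int max)
{
	if (demo_next_id > max)
		return -ENOSPC;
	return demo_next_id++;
}

static struct demo_dev *demo_alloc(int max_id)
{
	struct demo_dev *dev = calloc(1, sizeof(*dev));
	int id;

	if (!dev)
		return NULL;

	/* The step that can fail goes first: only the memory to undo. */
	id = demo_id_alloc(max_id);
	if (id < 0) {
		free(dev);
		return NULL;
	}
	dev->id = id;

	/* Locks are set up only once nothing else can fail. */
	demo_lock_init(&dev->lock);
	return dev;
}

static void demo_free(struct demo_dev *dev)
{
	demo_lock_destroy(&dev->lock);
	free(dev);
}
```

With this order every initialized lock is destroyed exactly once, in the
normal release path, and a failed allocation never touches a lock.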

> > @@ -26,6 +49,15 @@ struct fwctl_device {
> >  	struct device dev;
> >  	/* private: */
> >  	struct cdev cdev;
> > +
> > +	/*
> > +	 * Protect ops, held for write when ops becomes NULL during unregister,
> > +	 * held for read whenever ops is loaded or an ops function is running.
> > +	 */
> > +	struct rw_semaphore registration_lock;
> 
> Maybe move down to just above ops?

Yeah

> > +/**
> > + * DOC: General ioctl format
> > + *
> > + * The ioctl interface follows a general format to allow for extensibility. Each
> > + * ioctl is passed in a structure pointer as the argument providing the size of
> Pedantic Englishman time:
> passed a structure pointer
> 
> (otherwise I read that as passing an ioctl in a pointer which is weird).

Done

Thanks,
Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 06/10] fwctl: Add documentation
  2024-08-23 14:35   ` Jonathan Cameron
@ 2024-08-27 14:58     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-27 14:58 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Fri, Aug 23, 2024 at 03:35:13PM +0100, Jonathan Cameron wrote:

> > +Modern devices contain extensive amounts of FW, and in many cases, are largely
> > +software-defined pieces of hardware. The evolution of this approach is largely a
> > +reaction to Moore's Law where a chip tape out is now highly expensive, and the
> > +chip design is extremely large. Replacing fixed HW logic with a flexible and
> > +tightly coupled FW/HW combination is an effective risk mitigation against chip
> > +respin. Problems in the HW design can be counteracted in device FW. This is
> > +especially true for devices which present a stable and backwards compatible
> > +interface to the operating system driver (such as NVMe).
> > +
> > +The FW layer in devices has grown to incredible sizes and devices frequently
> incredible size
> (tricky to get the plurals right in this sentence, but currently it's a mixture)

Okay I trust your English!

Thanks,
Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 07/10] fwctl/mlx5: Support for communicating with mlx5 fw
  2024-08-23 14:48   ` Jonathan Cameron
@ 2024-08-27 15:07     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-27 15:07 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch,
	Leon Romanovsky, linux-cxl, linux-rdma, Saeed Mahameed

On Fri, Aug 23, 2024 at 03:48:30PM +0100, Jonathan Cameron wrote:
> On Wed, 21 Aug 2024 15:10:59 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > From: Saeed Mahameed <saeedm@nvidia.com>
> > 
> > mlx5's fw has long provided a User Context concept. This has a long
> > history in RDMA as part of the devx extended verbs programming
> > interface. A User Context is a security envelope that contains objects and
> > controls access. It contains the Protection Domain object from the
> > InfiniBand Architecture and both together provide the OS with the necessary
> > tools to bind a security context like a process to the device.
> > 
> > The security context is restricted to not be able to touch the kernel or
> > other processes. In the RDMA verbs case it is also restricted to not touch
> > global device resources.
> > 
> > The fwctl_mlx5 takes this approach and builds a User Context per fwctl
> > file descriptor and uses a FW security capability on the User Context to
> > enable access to global device resources. This makes the context useful
> > for provisioning and debugging the global device state.
> > 
> > mlx5 already has a robust infrastructure for delivering RPC messages to
> > fw. Trivially connect fwctl's RPC mechanism to mlx5_cmd_do(). Enforce the
> > User Context ID in every RPC header so the FW knows the security context
> > of the issuing ID.
> > 
> > Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Trivial stuff only. Feel free to ignore if you really like the code
> the way it is.  I don't know the MLX5 parts, but based on just what
> is visible here and in this series.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > diff --git a/drivers/fwctl/mlx5/main.c b/drivers/fwctl/mlx5/main.c
> > new file mode 100644
> > index 00000000000000..8839770fbe7ba5
> > --- /dev/null
> > +++ b/drivers/fwctl/mlx5/main.c
> 
> 
> > +
> > +static void *mlx5ctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
> > +			    void *rpc_in, size_t in_len, size_t *out_len)
> > +{
> > +	struct mlx5ctl_dev *mcdev =
> > +		container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
> > +	struct mlx5ctl_uctx *mfd =
> > +		container_of(uctx, struct mlx5ctl_uctx, uctx);
> > +	void *rpc_alloc __free(kvfree) = NULL;
> 
> Whilst you can't completely pair this with destructor, I'd still
> move this as locally as possible.

Yeah, this is a troubling area for cleanup.h here.

I can't really move it as locally as possible because the assignment
is in a scope:

	} else {
		rpc_out = rpc_alloc = kvzalloc(*out_len, GFP_KERNEL);
		if (!rpc_alloc)
			return ERR_PTR(-ENOMEM);
	}

So given the choice of putting it at the top or putting a NULL-initialized
variable above the if, I'm feeling the top is more kernely?

Or this is just the wrong place to use a cleanup.h technique??

--- a/drivers/fwctl/mlx5/main.c
+++ b/drivers/fwctl/mlx5/main.c
@@ -226,7 +226,6 @@ static void *mlx5ctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
                container_of(uctx->fwctl, struct mlx5ctl_dev, fwctl);
        struct mlx5ctl_uctx *mfd =
                container_of(uctx, struct mlx5ctl_uctx, uctx);
-       void *rpc_alloc __free(kvfree) = NULL;
        void *rpc_out;
        int ret;
 
@@ -253,8 +252,8 @@ static void *mlx5ctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
        if (*out_len <= in_len) {
                rpc_out = rpc_in;
        } else {
-               rpc_out = rpc_alloc = kvzalloc(*out_len, GFP_KERNEL);
-               if (!rpc_alloc)
+               rpc_out = kvzalloc(*out_len, GFP_KERNEL);
+               if (!rpc_out)
                        return ERR_PTR(-ENOMEM);
        }
 
@@ -272,11 +271,12 @@ static void *mlx5ctl_fw_rpc(struct fwctl_uctx *uctx, enum fwctl_rpc_scope scope,
         * but an error code was returned inside out. Everything else
         * means the RPC did not make it to the device.
         */
-       if (ret && ret != -EREMOTEIO)
+       if (ret && ret != -EREMOTEIO) {
+               if (rpc_out != rpc_in)
+                       kvfree(rpc_out);
                return ERR_PTR(ret);
-       if (rpc_out == rpc_in)
-               return rpc_in;
-       return_ptr(rpc_alloc);
+       }
+       return rpc_out;
 }

Arguably it is clearer like above. Let's go with the above, I think
this was too clever a use of cleanup.h, it seems to work a lot better
with simpler cases.
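
For readers following along, the buffer ownership rule in that version
can be sketched in plain userspace C. This is only an illustration under
assumptions: demo_exec() is a made-up stand-in for mlx5_cmd_do(), calloc()
and free() stand in for kvzalloc() and kvfree(), and the error is returned
through an int pointer instead of ERR_PTR():

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value, for platforms that lack it */
#endif

/*
 * Hypothetical command executor. Like the real one is described to do,
 * it consumes the input before writing the output, so in and out may
 * alias. A first input byte of 0xff simulates a transport failure.
 */
static int demo_exec(const void *in, size_t in_len, void *out, size_t out_len)
{
	unsigned char first = in_len ? *(const unsigned char *)in : 0;

	memset(out, 0, out_len);
	return first == 0xff ? -EIO : 0;
}

/*
 * Returns the output buffer, which is either rpc_in itself (reused) or a
 * fresh allocation the caller must free. Returns NULL and sets *err when
 * the command did not reach the device.
 */
static void *demo_rpc(void *rpc_in, size_t in_len, size_t out_len, int *err)
{
	void *rpc_out;
	int ret;

	if (out_len <= in_len) {
		rpc_out = rpc_in;		/* reuse the caller's buffer */
	} else {
		rpc_out = calloc(1, out_len);
		if (!rpc_out) {
			*err = -ENOMEM;
			return NULL;
		}
	}

	ret = demo_exec(rpc_in, in_len, rpc_out, out_len);
	*err = ret;

	/* -EREMOTEIO means out is valid but carries a device error code. */
	if (ret && ret != -EREMOTEIO) {
		if (rpc_out != rpc_in)
			free(rpc_out);		/* free only what we allocated */
		return NULL;
	}
	return rpc_out;
}
```

The single "rpc_out != rpc_in" check on the error path is what replaces
the scoped cleanup variable; on success the caller applies the same
comparison to decide whether the result needs freeing.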

> > +static void mlx5ctl_remove(struct auxiliary_device *adev)
> > +{
> > +	struct mlx5ctl_dev *mcdev __free(mlx5ctl) = auxiliary_get_drvdata(adev);
> 
> I'm not keen on the non constructor being paired with destructor
> but it's your code so you get to keep the confusion if you really
> like it.
> 
> I'd just have an explicit put.

Yes, I thought I did that already. Hmm, I must have only thought about it.
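
As a rough picture of the explicit-put shape, here is a userspace sketch
with hypothetical demo_* names and a plain integer refcount; it is not
the mlx5ctl code. The remove path unregisters first and then drops the
reference by hand, the same order the scoped cleanup would have produced:

```c
#include <stdlib.h>

/* Hypothetical refcounted device; refs starts at 1 from "probe". */
struct demo_dev {
	int refs;
	int registered;
};

static int demo_freed;	/* counts releases, for illustration only */

static void demo_put(struct demo_dev *dev)
{
	if (--dev->refs == 0) {
		demo_freed++;
		free(dev);
	}
}

static void demo_unregister(struct demo_dev *dev)
{
	dev->registered = 0;
}

static void demo_remove(struct demo_dev *dev)
{
	demo_unregister(dev);
	demo_put(dev);	/* explicit drop of the reference held since probe */
}
```

The put is visible at the point where the reference is given up, rather
than being implied by a variable declaration at the top of the function.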

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  2024-08-22  0:30       ` Jakub Kicinski
@ 2024-08-27 15:27         ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-08-27 15:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	Dave Jiang, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Leonid Bloch, Leon Romanovsky,
	linux-cxl, linux-rdma, Saeed Mahameed

On Wed, Aug 21, 2024 at 05:30:51PM -0700, Jakub Kicinski wrote:
> On Wed, 21 Aug 2024 21:14:34 -0300 Jason Gunthorpe wrote:
> > > Nacked-by: Jakub Kicinski <kuba@kernel.org> # RFC 3514  
> > 
> > Your "evil bit" thing has been responded to already and that isn't how
> > it works.
> 
> "Isn't how it works"? Please just carry the tag and don't waste my time.

You raised this before, it was answered and explained. You didn't
continue that discussion.

It is standard community practice to have reasonable technical
discussion before breaking out NAK tags. I'm definitely not carrying
tags on technical patches that don't meet that threshold.

If you have a question to ask then ask it in a normal polite way
please.

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/10] Introduce fwctl subystem
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
                   ` (9 preceding siblings ...)
  2024-08-21 18:11 ` [PATCH v3 10/10] cxl: Create an auxiliary device for fwctl_cxl Jason Gunthorpe
@ 2024-09-13 22:39 ` Dave Jiang
  2024-09-16  7:54   ` Leon Romanovsky
  2024-09-17 20:59   ` Dave Jiang
  2024-12-05 22:28 ` Shannon Nelson
  11 siblings, 2 replies; 33+ messages in thread
From: Dave Jiang @ 2024-09-13 22:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	David Ahern, Greg Kroah-Hartman, Christoph Hellwig, Itay Avraham,
	Jiri Pirko, Jakub Kicinski, Leonid Bloch, Leon Romanovsky,
	linux-cxl, linux-rdma, Saeed Mahameed



On 8/21/24 11:10 AM, Jason Gunthorpe wrote:
> fwctl is a new subsystem intended to bring some common rules and order to
> the growing pattern of exposing a secure FW interface directly to
> userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
> exposing a device for datapath operations fwctl is focused on debugging,
> configuration and provisioning of the device. It will not have the
> necessary features like interrupt delivery to support a datapath.
> 
> This concept is similar to the long standing practice in the "HW" RAID
> space of having a device specific misc device to manage the RAID
> controller FW. fwctl generalizes this notion of a companion debug and
> management interface that goes along with a dataplane implemented in an
> appropriate subsystem.
> 
> The need for this has reached a critical point as many users are moving to
> run lockdown enabled kernels. Several existing devices have had long
> standing tooling for management that relied on /sys/../resource0 or PCI
> config space access which is not permitted in lockdown. A major point of
> fwctl is to define and document the rules that a device must follow to
> expose a lockdown compatible RPC.
> 
> Based on some discussion fwctl splits the RPCs into four categories
> 
> 	FWCTL_RPC_CONFIGURATION
> 	FWCTL_RPC_DEBUG_READ_ONLY
> 	FWCTL_RPC_DEBUG_WRITE
> 	FWCTL_RPC_DEBUG_WRITE_FULL
> 
> Where the latter two trigger a new TAINT_FWCTL, and the final one requires
> CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW
> would be responsible to restrict RPCs to the requested security scope,
> while the core code handles the tainting and CAP checks.
> 
> For details see the final patch which introduces the documentation.
> 
> This series incorporates a version of the mlx5ctl interface previously
> proposed:
>   https://lore.kernel.org/r/20240207072435.14182-1-saeed@kernel.org/
> 
> For this series the memory registration mechanism was removed, but I
> expect it will come back.
> 
> It also includes the FWCL driver series from David:
>   https://lore.kernel.org/all/20240718213446.1750135-1-dave.jiang@intel.com/
> 
> 
> This is still waiting a 3rd fwctl driver and the CXL side to finish some
> of its development. The github has the necessary CXL precursor patches.
> 
> There have been two LWN articles written discussing various aspects of
> this proposal:
> 
>  https://lwn.net/Articles/955001/
>  https://lwn.net/Articles/969383/
> 
> And a really giant ksummit thread:
> 
>  https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/
> 
> Several have expressed general support for this concept:
> 
>  Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
>  Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org/
>  Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
>  Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org/
>  NVIDIA Networking
>  Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
>  Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
>  SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com
> 
> Work is ongoing for a robust multi-device open source userspace, currently
> the mlx5ctl_user that was posted by Saeed has been updated to use fwctl.
> 
>   https://github.com/saeedtx/mlx5ctl.git
>   https://github.com/jgunthorpe/mlx5ctl.git
> 
> This is on github: https://github.com/jgunthorpe/linux/commits/fwctl
> 
> v3:
>  - Rebase to v6.11-rc4
>  - Add a squashed version of David's CXL series as the 2nd driver
>  - Add missing includes
>  - Improve comments based on feedback
>  - Use the kdoc format that puts the member docs inside the struct
>  - Rewrite fwctl_alloc_device() to be clearer
>  - Incorporate all remarks for the documentation
> v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com
>  - Rebase to v6.10-rc5
>  - Minor style changes
>  - Follow the style consensus for the guard stuff
>  - Documentation grammar/spelling
>  - Add missed length output for mlx5 get_info
>  - Add two more missed MLX5 CMD's
>  - Collect tags
> v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com
> 
> Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Cc: Aron Silverton <aron.silverton@oracle.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: Itay Avraham <itayavr@nvidia.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Jiri Pirko <jiri@nvidia.com>
> Cc: Leon Romanovsky <leonro@nvidia.com>
> Cc: Leonid Bloch <lbloch@nvidia.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-cxl@vger.kernel.org
> Cc: linux-rdma@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Dave Jiang (1):
>   fwctl/cxl: Add driver for CXL mailbox for handling CXL features
>     commands (RFC)
> 
> Jason Gunthorpe (7):
>   fwctl: Add basic structure for a class subsystem with a cdev
>   fwctl: Basic ioctl dispatch for the character device
>   fwctl: FWCTL_INFO to return basic information about the device
>   taint: Add TAINT_FWCTL
>   fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
>   fwctl: Add documentation
>   cxl: Create an auxiliary device for fwctl_cxl
> 
> Saeed Mahameed (2):
>   fwctl/mlx5: Support for communicating with mlx5 fw
>   mlx5: Create an auxiliary device for fwctl_mlx5
> 
>  Documentation/admin-guide/tainted-kernels.rst |   5 +
>  Documentation/userspace-api/fwctl.rst         | 285 ++++++++++++
>  Documentation/userspace-api/index.rst         |   1 +
>  .../userspace-api/ioctl/ioctl-number.rst      |   1 +
>  MAINTAINERS                                   |  23 +
>  drivers/Kconfig                               |   2 +
>  drivers/Makefile                              |   1 +
>  drivers/cxl/core/memdev.c                     |  19 +
>  drivers/fwctl/Kconfig                         |  32 ++
>  drivers/fwctl/Makefile                        |   6 +
>  drivers/fwctl/cxl/Makefile                    |   4 +
>  drivers/fwctl/cxl/cxl.c                       | 274 ++++++++++++
>  drivers/fwctl/main.c                          | 414 ++++++++++++++++++
>  drivers/fwctl/mlx5/Makefile                   |   4 +
>  drivers/fwctl/mlx5/main.c                     | 337 ++++++++++++++
>  drivers/net/ethernet/mellanox/mlx5/core/dev.c |   8 +
>  include/linux/cxl/mailbox.h                   | 104 +++++
>  include/linux/fwctl.h                         | 135 ++++++
>  include/linux/panic.h                         |   3 +-
>  include/uapi/fwctl/cxl.h                      |  94 ++++
>  include/uapi/fwctl/fwctl.h                    | 140 ++++++
>  include/uapi/fwctl/mlx5.h                     |  36 ++
>  kernel/panic.c                                |   1 +
>  tools/debugging/kernel-chktaint               |   8 +
>  24 files changed, 1936 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/userspace-api/fwctl.rst
>  create mode 100644 drivers/fwctl/Kconfig
>  create mode 100644 drivers/fwctl/Makefile
>  create mode 100644 drivers/fwctl/cxl/Makefile
>  create mode 100644 drivers/fwctl/cxl/cxl.c
>  create mode 100644 drivers/fwctl/main.c
>  create mode 100644 drivers/fwctl/mlx5/Makefile
>  create mode 100644 drivers/fwctl/mlx5/main.c
>  create mode 100644 include/linux/fwctl.h
>  create mode 100644 include/uapi/fwctl/cxl.h
>  create mode 100644 include/uapi/fwctl/fwctl.h
>  create mode 100644 include/uapi/fwctl/mlx5.h
> 
> 
> base-commit: cd0c76bee95e9c2092418523599439d2c8dbff7e

Hi Jason,
Which base-commit is this? I'm not finding the hash in the upstream tree. I'm having trouble applying the series against 6.10 or 6.11-rc7 via b4. 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/10] Introduce fwctl subystem
  2024-09-13 22:39 ` [PATCH v3 00/10] Introduce fwctl subystem Dave Jiang
@ 2024-09-16  7:54   ` Leon Romanovsky
  2024-09-17 20:59   ` Dave Jiang
  1 sibling, 0 replies; 33+ messages in thread
From: Leon Romanovsky @ 2024-09-16  7:54 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Jason Gunthorpe, Andy Gospodarek, Aron Silverton, Dan Williams,
	Daniel Vetter, David Ahern, Greg Kroah-Hartman, Christoph Hellwig,
	Itay Avraham, Jiri Pirko, Jakub Kicinski, Leonid Bloch, linux-cxl,
	linux-rdma, Saeed Mahameed

On Fri, Sep 13, 2024 at 03:39:57PM -0700, Dave Jiang wrote:
> 
> 
> On 8/21/24 11:10 AM, Jason Gunthorpe wrote:
> > fwctl is a new subsystem intended to bring some common rules and order to
> > the growing pattern of exposing a secure FW interface directly to
> > userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
> > exposing a device for datapath operations fwctl is focused on debugging,
> > configuration and provisioning of the device. It will not have the
> > necessary features like interrupt delivery to support a datapath.
> > 
> > This concept is similar to the long standing practice in the "HW" RAID
> > space of having a device specific misc device to manage the RAID
> > controller FW. fwctl generalizes this notion of a companion debug and
> > management interface that goes along with a dataplane implemented in an
> > appropriate subsystem.
> > 
> > The need for this has reached a critical point as many users are moving to
> > run lockdown enabled kernels. Several existing devices have had long
> > standing tooling for management that relied on /sys/../resource0 or PCI
> > config space access which is not permitted in lockdown. A major point of
> > fwctl is to define and document the rules that a device must follow to
> > expose a lockdown compatible RPC.
> > 
> > Based on some discussion fwctl splits the RPCs into four categories
> > 
> > 	FWCTL_RPC_CONFIGURATION
> > 	FWCTL_RPC_DEBUG_READ_ONLY
> > 	FWCTL_RPC_DEBUG_WRITE
> > 	FWCTL_RPC_DEBUG_WRITE_FULL
> > 
> > Where the latter two trigger a new TAINT_FWCTL, and the final one requires
> > CAP_SYS_RAWIO - excluding it from lockdown. The device driver and its FW
> > would be responsible to restrict RPCs to the requested security scope,
> > while the core code handles the tainting and CAP checks.
> > 
> > For details see the final patch which introduces the documentation.
> > 
> > This series incorporates a version of the mlx5ctl interface previously
> > proposed:
> >   https://lore.kernel.org/r/20240207072435.14182-1-saeed@kernel.org/
> > 
> > For this series the memory registration mechanism was removed, but I
> > expect it will come back.
> > 
> > It also includes the FWCL driver series from David:
> >   https://lore.kernel.org/all/20240718213446.1750135-1-dave.jiang@intel.com/
> > 
> > 
> > This is still waiting a 3rd fwctl driver and the CXL side to finish some
> > of its development. The github has the necessary CXL precursor patches.
> > 
> > There have been two LWN articles written discussing various aspects of
> > this proposal:
> > 
> >  https://lwn.net/Articles/955001/
> >  https://lwn.net/Articles/969383/
> > 
> > And a really giant ksummit thread:
> > 
> >  https://lore.kernel.org/ksummit/668c67a324609_ed99294c0@dwillia2-xfh.jf.intel.com.notmuch/
> > 
> > Several have expressed general support for this concept:
> > 
> >  Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
> >  Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org/
> >  Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> >  Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org/
> >  NVIDIA Networking
> >  Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
> >  Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
> >  SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com
> > 
> > Work is ongoing for a robust multi-device open source userspace, currently
> > the mlx5ctl_user that was posted by Saeed has been updated to use fwctl.
> > 
> >   https://github.com/saeedtx/mlx5ctl.git
> >   https://github.com/jgunthorpe/mlx5ctl.git
> > 
> > This is on github: https://github.com/jgunthorpe/linux/commits/fwctl
> > 
> > v3:
> >  - Rebase to v6.11-rc4
> >  - Add a squashed version of David's CXL series as the 2nd driver
> >  - Add missing includes
> >  - Improve comments based on feedback
> >  - Use the kdoc format that puts the member docs inside the struct
> >  - Rewrite fwctl_alloc_device() to be clearer
> >  - Incorporate all remarks for the documentation
> > v2: https://lore.kernel.org/r/0-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com
> >  - Rebase to v6.10-rc5
> >  - Minor style changes
> >  - Follow the style consensus for the guard stuff
> >  - Documentation grammar/spelling
> >  - Add missed length output for mlx5 get_info
> >  - Add two more missed MLX5 CMD's
> >  - Collect tags
> > v1: https://lore.kernel.org/r/0-v1-9912f1a11620+2a-fwctl_jgg@nvidia.com
> > 
> > Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > Cc: Aron Silverton <aron.silverton@oracle.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: David Ahern <dsahern@kernel.org>
> > Cc: Itay Avraham <itayavr@nvidia.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Jiri Pirko <jiri@nvidia.com>
> > Cc: Leon Romanovsky <leonro@nvidia.com>
> > Cc: Leonid Bloch <lbloch@nvidia.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-cxl@vger.kernel.org
> > Cc: linux-rdma@vger.kernel.org
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > Dave Jiang (1):
> >   fwctl/cxl: Add driver for CXL mailbox for handling CXL features
> >     commands (RFC)
> > 
> > Jason Gunthorpe (7):
> >   fwctl: Add basic structure for a class subsystem with a cdev
> >   fwctl: Basic ioctl dispatch for the character device
> >   fwctl: FWCTL_INFO to return basic information about the device
> >   taint: Add TAINT_FWCTL
> >   fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
> >   fwctl: Add documentation
> >   cxl: Create an auxiliary device for fwctl_cxl
> > 
> > Saeed Mahameed (2):
> >   fwctl/mlx5: Support for communicating with mlx5 fw
> >   mlx5: Create an auxiliary device for fwctl_mlx5
> > 
> >  Documentation/admin-guide/tainted-kernels.rst |   5 +
> >  Documentation/userspace-api/fwctl.rst         | 285 ++++++++++++
> >  Documentation/userspace-api/index.rst         |   1 +
> >  .../userspace-api/ioctl/ioctl-number.rst      |   1 +
> >  MAINTAINERS                                   |  23 +
> >  drivers/Kconfig                               |   2 +
> >  drivers/Makefile                              |   1 +
> >  drivers/cxl/core/memdev.c                     |  19 +
> >  drivers/fwctl/Kconfig                         |  32 ++
> >  drivers/fwctl/Makefile                        |   6 +
> >  drivers/fwctl/cxl/Makefile                    |   4 +
> >  drivers/fwctl/cxl/cxl.c                       | 274 ++++++++++++
> >  drivers/fwctl/main.c                          | 414 ++++++++++++++++++
> >  drivers/fwctl/mlx5/Makefile                   |   4 +
> >  drivers/fwctl/mlx5/main.c                     | 337 ++++++++++++++
> >  drivers/net/ethernet/mellanox/mlx5/core/dev.c |   8 +
> >  include/linux/cxl/mailbox.h                   | 104 +++++
> >  include/linux/fwctl.h                         | 135 ++++++
> >  include/linux/panic.h                         |   3 +-
> >  include/uapi/fwctl/cxl.h                      |  94 ++++
> >  include/uapi/fwctl/fwctl.h                    | 140 ++++++
> >  include/uapi/fwctl/mlx5.h                     |  36 ++
> >  kernel/panic.c                                |   1 +
> >  tools/debugging/kernel-chktaint               |   8 +
> >  24 files changed, 1936 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/userspace-api/fwctl.rst
> >  create mode 100644 drivers/fwctl/Kconfig
> >  create mode 100644 drivers/fwctl/Makefile
> >  create mode 100644 drivers/fwctl/cxl/Makefile
> >  create mode 100644 drivers/fwctl/cxl/cxl.c
> >  create mode 100644 drivers/fwctl/main.c
> >  create mode 100644 drivers/fwctl/mlx5/Makefile
> >  create mode 100644 drivers/fwctl/mlx5/main.c
> >  create mode 100644 include/linux/fwctl.h
> >  create mode 100644 include/uapi/fwctl/cxl.h
> >  create mode 100644 include/uapi/fwctl/fwctl.h
> >  create mode 100644 include/uapi/fwctl/mlx5.h
> > 
> > 
> > base-commit: cd0c76bee95e9c2092418523599439d2c8dbff7e
> 
> Hi Jason,
> Which base-commit is this? I'm not finding the hash in the upstream tree. I'm having trouble applying the series against 6.10 or 6.11-rc7 via b4. 

The base-commit depends very much on where the cover letter was generated
and stored while creating the series. In this specific case, Jason placed
the cover letter as the last commit in the series, so the base-commit
points to the last patch, cd0c76bee95e ("cxl: Infrastructure for fwctl").

Thanks

* Re: [PATCH v3 00/10] Introduce fwctl subystem
  2024-09-13 22:39 ` [PATCH v3 00/10] Introduce fwctl subystem Dave Jiang
  2024-09-16  7:54   ` Leon Romanovsky
@ 2024-09-17 20:59   ` Dave Jiang
  1 sibling, 0 replies; 33+ messages in thread
From: Dave Jiang @ 2024-09-17 20:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter,
	David Ahern, Greg Kroah-Hartman, Christoph Hellwig, Itay Avraham,
	Jiri Pirko, Jakub Kicinski, Leonid Bloch, Leon Romanovsky,
	linux-cxl, linux-rdma, Saeed Mahameed



On 9/13/24 3:39 PM, Dave Jiang wrote:
> 
> 
> On 8/21/24 11:10 AM, Jason Gunthorpe wrote:
>> [snip]
> 
> Hi Jason,
> Which base-commit is this? I'm not finding the hash in the upstream tree. I'm having trouble applying the series against 6.10 or 6.11-rc7 via b4. 

Got it to apply against 6.11. So ignore this. :) 


* Re: [PATCH v3 00/10] Introduce fwctl subystem
  2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
                   ` (10 preceding siblings ...)
  2024-09-13 22:39 ` [PATCH v3 00/10] Introduce fwctl subystem Dave Jiang
@ 2024-12-05 22:28 ` Shannon Nelson
  2024-12-05 23:58   ` Jason Gunthorpe
  11 siblings, 1 reply; 33+ messages in thread
From: Shannon Nelson @ 2024-12-05 22:28 UTC (permalink / raw)
  To: jgg
  Cc: andrew.gospodarek, aron.silverton, dan.j.williams, daniel.vetter,
	dave.jiang, dsahern, gregkh, hch, itayavr, jiri, kuba, lbloch,
	leonro, linux-cxl, linux-rdma, saeedm

On 08/21/2024 3:10 PM, Jason Gunthorpe wrote:
> 
> fwctl is a new subsystem intended to bring some common rules and order to
> the growing pattern of exposing a secure FW interface directly to
> userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
> exposing a device for datapath operations fwctl is focused on debugging,
> configuration and provisioning of the device. It will not have the
> necessary features like interrupt delivery to support a datapath.

[snip]

> 
> Several have expressed general support for this concept:
> 
>  Broadcom Networking - https://lore.kernel.org/r/Zf2n02q0GevGdS-Z@C02YVCJELVCG
>  Christoph Hellwig - https://lore.kernel.org/r/Zcx53N8lQjkpEu94@infradead.org/
>  Daniel Vetter - https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
>  Enfabrica - https://lore.kernel.org/r/9cc7127f-8674-43bc-b4d7-b1c4c2d96fed@kernel.org/
>  NVIDIA Networking
>  Oded Gabbay/Habana - https://lore.kernel.org/r/ZrMl1bkPP-3G9B4N@T14sgabbay.
>  Oracle Linux - https://lore.kernel.org/r/6lakj6lxlxhdgrewodvj3xh6sxn3d36t5dab6najzyti2navx3@wrge7cyfk6nq
>  SuSE/Hannes - https://lore.kernel.org/r/2fd48f87-2521-4c34-8589-dbb7e91bb1c8@suse.com
> 

Hi Jason,

To add to the support, I can say that we're building an fwctl driver
for our Pensando DSC device and will likely be able to post our first
RFC after the winter holidays.  This will include a couple of updates
in pds_core to support a new auxiliary device, and a new pds_fwctl
driver to link that to the fwctl subsystem.

sln


* Re: [PATCH v3 00/10] Introduce fwctl subystem
  2024-12-05 22:28 ` Shannon Nelson
@ 2024-12-05 23:58   ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2024-12-05 23:58 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: andrew.gospodarek, aron.silverton, dan.j.williams, daniel.vetter,
	dave.jiang, dsahern, gregkh, hch, itayavr, jiri, kuba, lbloch,
	leonro, linux-cxl, linux-rdma, saeedm

On Thu, Dec 05, 2024 at 02:28:18PM -0800, Shannon Nelson wrote:

> To add to the support, I can say that we're building an fwctl driver
> for our Pensando DSC device and likely will be able to post our first
> RFC after the winter holidays.  This will include a couple of updates
> in pds_core for support of a new auxiliary device, and a new pds_fwctl
> driver to link that to fwctl subsystem.

That's great, Shannon. I look forward to seeing it in the new year. I
think the CXL driver is getting into good shape, so I imagine it may be
possible to start moving this into linux-next around February.

Thanks,
Jason

Thread overview: 33+ messages
2024-08-21 18:10 [PATCH v3 00/10] Introduce fwctl subystem Jason Gunthorpe
2024-08-21 18:10 ` [PATCH v3 01/10] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2024-08-23 13:48   ` Jonathan Cameron
2024-08-21 18:10 ` [PATCH v3 02/10] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
2024-08-23 14:02   ` Jonathan Cameron
2024-08-27 14:56     ` Jason Gunthorpe
2024-08-21 18:10 ` [PATCH v3 03/10] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
2024-08-23 14:14   ` Jonathan Cameron
2024-08-27 14:47     ` Jason Gunthorpe
2024-08-27 14:55       ` Andy Gospodarek
2024-08-21 18:10 ` [PATCH v3 04/10] taint: Add TAINT_FWCTL Jason Gunthorpe
2024-08-21 23:35   ` Greg Kroah-Hartman
2024-08-22 15:34     ` Jason Gunthorpe
2024-08-21 18:10 ` [PATCH v3 05/10] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
2024-08-21 23:49   ` Jakub Kicinski
2024-08-22  0:14     ` Jason Gunthorpe
2024-08-22  0:30       ` Jakub Kicinski
2024-08-27 15:27         ` Jason Gunthorpe
2024-08-23 14:23   ` Jonathan Cameron
2024-08-21 18:10 ` [PATCH v3 06/10] fwctl: Add documentation Jason Gunthorpe
2024-08-23 14:35   ` Jonathan Cameron
2024-08-27 14:58     ` Jason Gunthorpe
2024-08-21 18:10 ` [PATCH v3 07/10] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
2024-08-23 14:48   ` Jonathan Cameron
2024-08-27 15:07     ` Jason Gunthorpe
2024-08-21 18:11 ` [PATCH v3 08/10] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
2024-08-21 18:11 ` [PATCH v3 09/10] fwctl/cxl: Add driver for CXL mailbox for handling CXL features commands (RFC) Jason Gunthorpe
2024-08-21 18:11 ` [PATCH v3 10/10] cxl: Create an auxiliary device for fwctl_cxl Jason Gunthorpe
2024-09-13 22:39 ` [PATCH v3 00/10] Introduce fwctl subystem Dave Jiang
2024-09-16  7:54   ` Leon Romanovsky
2024-09-17 20:59   ` Dave Jiang
2024-12-05 22:28 ` Shannon Nelson
2024-12-05 23:58   ` Jason Gunthorpe
