From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@nvidia.com>
Cc: Chiara Meiohas <cmeiohas@nvidia.com>,
linux-rdma@vger.kernel.org, Yishai Hadas <yishaih@nvidia.com>
Subject: Re: [PATCH rdma-next v1 1/6] RDMA/uverbs: Introduce UCAP (User CAPabilities) API
Date: Sat, 8 Mar 2025 08:25:55 +0100
Message-ID: <42bdf595-4b4d-475f-9dcf-13176bdaa1ea@linux.dev>
In-Reply-To: <5a1379187cd21178e8554afc81a3c941f21af22f.1741261611.git.leon@kernel.org>
On 2025/3/6 12:51, Leon Romanovsky wrote:
> From: Chiara Meiohas <cmeiohas@nvidia.com>
>
> Implement a new User CAPabilities (UCAP) API to provide fine-grained
> control over specific firmware features.
>
> This approach offers more granular capabilities than the existing Linux
> capabilities, which may be too generic for certain FW features.
>
> This mechanism represents each capability as a character device with
> root read-write access. Root processes can grant users special
> privileges by allowing access to these character devices (e.g., using
> chown).
Hi Chiara,

I read this patch set carefully. If I understand it correctly, it
introduces a new User CAPabilities (UCAP) API to control specific
firmware features.

Is there a user guide for UCAP? For example, if we suspect a firmware
problem in a production environment, how can we use UCAP to debug it?
Thanks.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
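The mechanism described in the quoted commit message (a character device
per capability, root read-write by default, delegated with e.g. chown)
can be sketched from the userspace side as below. This is only an
illustration under stated assumptions: ucap_open() and its dev_dir
parameter are hypothetical names, not part of the patch, and how the
resulting fd reaches the kernel is the subject of patch 3/6 and is not
shown here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: open a UCAP character device read-write and
 * return its fd, or a negative errno value on failure. dev_dir is a
 * parameter only so the helper can be exercised without the hardware;
 * the patch places the nodes under /dev/infiniband.
 */
static int ucap_open(const char *dev_dir, const char *ucap_name)
{
	char path[256];
	int len, fd;

	assert(dev_dir && ucap_name);

	len = snprintf(path, sizeof(path), "%s/%s", dev_dir, ucap_name);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -ENAMETOOLONG;

	/*
	 * The nodes are created with mode 0600, so an unprivileged caller
	 * gets EACCES here unless root has granted access to the node.
	 */
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;

	return fd;
}
```

A caller would use something like ucap_open("/dev/infiniband",
"mlx5_perm_ctrl_local"), where the node name comes from the ucap_names
table in the quoted patch. A non-negative return value means the process
holds that capability's device open.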
>
> UCAP character devices are located in /dev/infiniband and the class path
> is /sys/class/infiniband_ucaps.
>
> Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
> Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
> Signed-off-by: Leon Romanovsky <leon@kernel.org>
> ---
> drivers/infiniband/core/Makefile | 3 +-
> drivers/infiniband/core/ucaps.c | 267 ++++++++++++++++++++++++++
> drivers/infiniband/core/uverbs_main.c | 2 +
> include/rdma/ib_ucaps.h | 25 +++
> 4 files changed, 296 insertions(+), 1 deletion(-)
> create mode 100644 drivers/infiniband/core/ucaps.c
> create mode 100644 include/rdma/ib_ucaps.h
>
> diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> index 8ab4eea5a0a5..d49ded7e95f0 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -39,6 +39,7 @@ ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
> uverbs_std_types_async_fd.o \
> uverbs_std_types_srq.o \
> uverbs_std_types_wq.o \
> - uverbs_std_types_qp.o
> + uverbs_std_types_qp.o \
> + ucaps.o
> ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
> ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
> diff --git a/drivers/infiniband/core/ucaps.c b/drivers/infiniband/core/ucaps.c
> new file mode 100644
> index 000000000000..6853c6d078f9
> --- /dev/null
> +++ b/drivers/infiniband/core/ucaps.c
> @@ -0,0 +1,267 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/kref.h>
> +#include <linux/cdev.h>
> +#include <linux/mutex.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <rdma/ib_ucaps.h>
> +
> +#define RDMA_UCAP_FIRST RDMA_UCAP_MLX5_CTRL_LOCAL
> +
> +static DEFINE_MUTEX(ucaps_mutex);
> +static struct ib_ucap *ucaps_list[RDMA_UCAP_MAX];
> +static bool ucaps_class_is_registered;
> +static dev_t ucaps_base_dev;
> +
> +struct ib_ucap {
> + struct cdev cdev;
> + struct device dev;
> + struct kref ref;
> +};
> +
> +static const char *ucap_names[RDMA_UCAP_MAX] = {
> + [RDMA_UCAP_MLX5_CTRL_LOCAL] = "mlx5_perm_ctrl_local",
> + [RDMA_UCAP_MLX5_CTRL_OTHER_VHCA] = "mlx5_perm_ctrl_other_vhca"
> +};
> +
> +static char *ucaps_devnode(const struct device *dev, umode_t *mode)
> +{
> + if (mode)
> + *mode = 0600;
> +
> + return kasprintf(GFP_KERNEL, "infiniband/%s", dev_name(dev));
> +}
> +
> +static const struct class ucaps_class = {
> + .name = "infiniband_ucaps",
> + .devnode = ucaps_devnode,
> +};
> +
> +static const struct file_operations ucaps_cdev_fops = {
> + .owner = THIS_MODULE,
> + .open = simple_open,
> +};
> +
> +/**
> + * ib_cleanup_ucaps - cleanup all API resources and class.
> + *
> + * This is called once, when removing the ib_uverbs module.
> + */
> +void ib_cleanup_ucaps(void)
> +{
> + mutex_lock(&ucaps_mutex);
> + if (!ucaps_class_is_registered) {
> + mutex_unlock(&ucaps_mutex);
> + return;
> + }
> +
> + for (int i = RDMA_UCAP_FIRST; i < RDMA_UCAP_MAX; i++)
> + WARN_ON(ucaps_list[i]);
> +
> + class_unregister(&ucaps_class);
> + ucaps_class_is_registered = false;
> + unregister_chrdev_region(ucaps_base_dev, RDMA_UCAP_MAX);
> + mutex_unlock(&ucaps_mutex);
> +}
> +
> +static int get_ucap_from_devt(dev_t devt, u64 *idx_mask)
> +{
> + for (int type = RDMA_UCAP_FIRST; type < RDMA_UCAP_MAX; type++) {
> + if (ucaps_list[type] && ucaps_list[type]->dev.devt == devt) {
> + *idx_mask |= 1 << type;
> + return 0;
> + }
> + }
> +
> + return -EINVAL;
> +}
> +
> +static int get_devt_from_fd(unsigned int fd, dev_t *ret_dev)
> +{
> + struct file *file;
> +
> + file = fget(fd);
> + if (!file)
> + return -EBADF;
> +
> + *ret_dev = file_inode(file)->i_rdev;
> + fput(file);
> + return 0;
> +}
> +
> +/**
> + * ib_ucaps_init - Initialization required before ucap creation.
> + *
> + * Return: 0 on success, or a negative errno value on failure
> + */
> +static int ib_ucaps_init(void)
> +{
> + int ret = 0;
> +
> + if (ucaps_class_is_registered)
> + return ret;
> +
> + ret = class_register(&ucaps_class);
> + if (ret)
> + return ret;
> +
> + ret = alloc_chrdev_region(&ucaps_base_dev, 0, RDMA_UCAP_MAX,
> + ucaps_class.name);
> + if (ret < 0) {
> + class_unregister(&ucaps_class);
> + return ret;
> + }
> +
> + ucaps_class_is_registered = true;
> +
> + return 0;
> +}
> +
> +static void ucap_dev_release(struct device *device)
> +{
> + struct ib_ucap *ucap = container_of(device, struct ib_ucap, dev);
> +
> + kfree(ucap);
> +}
> +
> +/**
> + * ib_create_ucap - Add a ucap character device
> + * @type: UCAP type
> + *
> + * Creates a ucap character device in the /dev/infiniband directory. By default,
> + * the device has root-only read-write access.
> + *
> + * A driver may call this multiple times with the same UCAP type. A reference
> + * count tracks creations and deletions.
> + *
> + * Return: 0 on success, or a negative errno value on failure
> + */
> +int ib_create_ucap(enum rdma_user_cap type)
> +{
> + struct ib_ucap *ucap;
> + int ret;
> +
> + if (type >= RDMA_UCAP_MAX)
> + return -EINVAL;
> +
> + mutex_lock(&ucaps_mutex);
> + ret = ib_ucaps_init();
> + if (ret)
> + goto unlock;
> +
> + ucap = ucaps_list[type];
> + if (ucap) {
> + kref_get(&ucap->ref);
> + mutex_unlock(&ucaps_mutex);
> + return 0;
> + }
> +
> + ucap = kzalloc(sizeof(*ucap), GFP_KERNEL);
> + if (!ucap) {
> + ret = -ENOMEM;
> + goto unlock;
> + }
> +
> + device_initialize(&ucap->dev);
> + ucap->dev.class = &ucaps_class;
> + ucap->dev.devt = MKDEV(MAJOR(ucaps_base_dev), type);
> + ucap->dev.release = ucap_dev_release;
> + ret = dev_set_name(&ucap->dev, ucap_names[type]);
> + if (ret)
> + goto err_device;
> +
> + cdev_init(&ucap->cdev, &ucaps_cdev_fops);
> + ucap->cdev.owner = THIS_MODULE;
> +
> + ret = cdev_device_add(&ucap->cdev, &ucap->dev);
> + if (ret)
> + goto err_device;
> +
> + kref_init(&ucap->ref);
> + ucaps_list[type] = ucap;
> + mutex_unlock(&ucaps_mutex);
> +
> + return 0;
> +
> +err_device:
> + put_device(&ucap->dev);
> +unlock:
> + mutex_unlock(&ucaps_mutex);
> + return ret;
> +}
> +EXPORT_SYMBOL(ib_create_ucap);
> +
> +static void ib_release_ucap(struct kref *ref)
> +{
> + struct ib_ucap *ucap = container_of(ref, struct ib_ucap, ref);
> + enum rdma_user_cap type;
> +
> + for (type = RDMA_UCAP_FIRST; type < RDMA_UCAP_MAX; type++) {
> + if (ucaps_list[type] == ucap)
> + break;
> + }
> + WARN_ON(type == RDMA_UCAP_MAX);
> +
> + ucaps_list[type] = NULL;
> + cdev_device_del(&ucap->cdev, &ucap->dev);
> + put_device(&ucap->dev);
> +}
> +
> +/**
> + * ib_remove_ucap - Remove a ucap character device
> + * @type: User cap type
> + *
> + * Removes the ucap character device according to type. The device is completely
> + * removed from the filesystem when its reference count reaches 0.
> + */
> +void ib_remove_ucap(enum rdma_user_cap type)
> +{
> + struct ib_ucap *ucap;
> +
> + mutex_lock(&ucaps_mutex);
> + ucap = ucaps_list[type];
> + if (WARN_ON(!ucap))
> + goto end;
> +
> + kref_put(&ucap->ref, ib_release_ucap);
> +end:
> + mutex_unlock(&ucaps_mutex);
> +}
> +EXPORT_SYMBOL(ib_remove_ucap);
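The create/remove pairing documented above (a driver may create the same
UCAP type several times, and the node disappears only when the last
reference is dropped) can be modelled in plain userspace C. This is a
simplified sketch, not the kernel code: the model_* names are invented
for illustration, and a plain counter stands in for the kref, mutex and
cdev handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two UCAP types in the patch. */
enum { MODEL_UCAP_A, MODEL_UCAP_B, MODEL_UCAP_MAX };

/* 0 means "no device node"; n > 0 means n outstanding creates. */
static unsigned int model_refs[MODEL_UCAP_MAX];

/* The first call for a type "creates" the node, later calls take a reference. */
static int model_create_ucap(unsigned int type)
{
	if (type >= MODEL_UCAP_MAX)
		return -EINVAL;

	model_refs[type]++;
	return 0;
}

/* Drop one reference; the node is gone once the count reaches zero. */
static void model_remove_ucap(unsigned int type)
{
	assert(type < MODEL_UCAP_MAX);

	if (model_refs[type] == 0)
		return;		/* the quoted kernel code WARNs in this case */

	model_refs[type]--;
}

static bool model_ucap_exists(unsigned int type)
{
	return type < MODEL_UCAP_MAX && model_refs[type] > 0;
}
```

With this model, two creates followed by one remove leave the node in
place, and the second remove makes it disappear, which matches the
behaviour the kernel-doc comments describe.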
> +
> +/**
> + * ib_get_ucaps - Get bitmask of ucap types from file descriptors
> + * @fds: Array of file descriptors
> + * @fd_count: Number of file descriptors in the array
> + * @idx_mask: Bitmask to be updated based on the ucaps in the fd list
> + *
> + * Given an array of file descriptors, this function returns a bitmask of
> + * the ucaps where a bit is set if an FD for that ucap type was in the array.
> + *
> + * Return: 0 on success, or a negative errno value on failure
> + */
> +int ib_get_ucaps(int *fds, int fd_count, uint64_t *idx_mask)
> +{
> + int ret = 0;
> + dev_t dev;
> +
> + *idx_mask = 0;
> + mutex_lock(&ucaps_mutex);
> + for (int i = 0; i < fd_count; i++) {
> + ret = get_devt_from_fd(fds[i], &dev);
> + if (ret)
> + goto end;
> +
> + ret = get_ucap_from_devt(dev, idx_mask);
> + if (ret)
> + goto end;
> + }
> +
> +end:
> + mutex_unlock(&ucaps_mutex);
> + return ret;
> +}
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 85cfc790a7bb..973fe2c7ef53 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -52,6 +52,7 @@
> #include <rdma/ib.h>
> #include <rdma/uverbs_std_types.h>
> #include <rdma/rdma_netlink.h>
> +#include <rdma/ib_ucaps.h>
>
> #include "uverbs.h"
> #include "core_priv.h"
> @@ -1345,6 +1346,7 @@ static void __exit ib_uverbs_cleanup(void)
> IB_UVERBS_NUM_FIXED_MINOR);
> unregister_chrdev_region(dynamic_uverbs_dev,
> IB_UVERBS_NUM_DYNAMIC_MINOR);
> + ib_cleanup_ucaps();
> mmu_notifier_synchronize();
> }
>
> diff --git a/include/rdma/ib_ucaps.h b/include/rdma/ib_ucaps.h
> new file mode 100644
> index 000000000000..8f0552a2b2b0
> --- /dev/null
> +++ b/include/rdma/ib_ucaps.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
> +/*
> + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#ifndef _IB_UCAPS_H_
> +#define _IB_UCAPS_H_
> +
> +#define UCAP_ENABLED(ucaps, type) (!!((ucaps) & (1U << (type))))
> +
> +enum rdma_user_cap {
> + RDMA_UCAP_MLX5_CTRL_LOCAL,
> + RDMA_UCAP_MLX5_CTRL_OTHER_VHCA,
> + RDMA_UCAP_MAX
> +};
> +
> +void ib_cleanup_ucaps(void);
> +
> +int ib_create_ucap(enum rdma_user_cap type);
> +
> +void ib_remove_ucap(enum rdma_user_cap type);
> +
> +int ib_get_ucaps(int *fds, int fd_count, uint64_t *idx_mask);
> +
> +#endif /* _IB_UCAPS_H_ */
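To make the fd-to-bitmask step in ib_get_ucaps() and the UCAP_ENABLED()
macro easier to follow, here is a userspace model of the same lookup. It
is a sketch under assumptions: the model_* names and the device numbers
are made up, plain integers stand in for the dev_t obtained from each fd,
and there is no locking. The model shifts a 64-bit one because idx_mask
is 64 bits wide; the quoted code writes "1 << type", which gives the same
result for the two types defined so far.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Copied from the quoted ib_ucaps.h. */
#define UCAP_ENABLED(ucaps, type) (!!((ucaps) & (1U << (type))))

/* Hypothetical stand-ins for enum rdma_user_cap. */
enum { MODEL_CTRL_LOCAL, MODEL_CTRL_OTHER_VHCA, MODEL_UCAP_MAX };

/* One slot per UCAP type, mirroring ucaps_list[] in the patch. */
struct model_ucap {
	int present;		/* non-zero if the device currently exists */
	unsigned int devt;	/* stand-in for ucap->dev.devt */
};

/*
 * For every device number in devts[], set the bit of the matching UCAP
 * type in *idx_mask. Like the quoted code, fail with -EINVAL as soon as
 * one entry does not belong to any live UCAP device.
 */
static int model_get_ucaps(const struct model_ucap *list,
			   const unsigned int *devts, int count,
			   uint64_t *idx_mask)
{
	*idx_mask = 0;

	for (int i = 0; i < count; i++) {
		int type;

		for (type = 0; type < MODEL_UCAP_MAX; type++) {
			if (list[type].present &&
			    list[type].devt == devts[i]) {
				*idx_mask |= UINT64_C(1) << type;
				break;
			}
		}
		if (type == MODEL_UCAP_MAX)
			return -EINVAL;
	}

	return 0;
}
```

A consumer would then test individual capabilities with
UCAP_ENABLED(mask, MODEL_CTRL_LOCAL) and so on, in the same way the
header's macro is meant to be used on the mask returned by
ib_get_ucaps().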
Thread overview: 10+ messages
2025-03-06 11:51 [PATCH rdma-next v1 0/6] Introduce UCAP API and usage in mlx5 Leon Romanovsky
2025-03-06 11:51 ` [PATCH rdma-next v1 1/6] RDMA/uverbs: Introduce UCAP (User CAPabilities) API Leon Romanovsky
2025-03-08 7:25 ` Zhu Yanjun [this message]
2025-03-08 19:21 ` Leon Romanovsky
2025-03-06 11:51 ` [PATCH rdma-next v1 2/6] RDMA/mlx5: Create UCAP char devices for supported device capabilities Leon Romanovsky
2025-03-06 11:51 ` [PATCH rdma-next v1 3/6] RDMA/uverbs: Add support for UCAPs in context creation Leon Romanovsky
2025-03-06 11:51 ` [PATCH rdma-next v1 4/6] RDMA/mlx5: Check enabled UCAPs when creating ucontext Leon Romanovsky
2025-03-06 11:51 ` [PATCH rdma-next v1 5/6] RDMA/mlx5: Expose RDMA TRANSPORT flow table types to userspace Leon Romanovsky
2025-03-06 11:51 ` [PATCH rdma-next v1 6/6] docs: infiniband: document the UCAP API Leon Romanovsky
2025-03-08 19:23 ` [PATCH rdma-next v1 0/6] Introduce UCAP API and usage in mlx5 Leon Romanovsky