From: Alex Williamson <alex.williamson@redhat.com>
To: Zhi Wang <zhiw@nvidia.com>
Cc: <kvm@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
<kevin.tian@intel.com>, <jgg@nvidia.com>,
<alison.schofield@intel.com>, <dan.j.williams@intel.com>,
<dave.jiang@intel.com>, <dave@stgolabs.net>,
<jonathan.cameron@huawei.com>, <ira.weiny@intel.com>,
<vishal.l.verma@intel.com>, <alucerop@amd.com>,
<acurrid@nvidia.com>, <cjia@nvidia.com>, <smitra@nvidia.com>,
<ankita@nvidia.com>, <aniketa@nvidia.com>, <kwankhede@nvidia.com>,
<targupta@nvidia.com>, <zhiwang@kernel.org>
Subject: Re: [RFC 04/13] vfio: introduce vfio-cxl core preludes
Date: Fri, 11 Oct 2024 12:33:51 -0600 [thread overview]
Message-ID: <20241011123351.27474f2b.alex.williamson@redhat.com> (raw)
In-Reply-To: <20240920223446.1908673-5-zhiw@nvidia.com>
On Fri, 20 Sep 2024 15:34:37 -0700
Zhi Wang <zhiw@nvidia.com> wrote:
> In VFIO, common functions that used by VFIO variant drivers are managed
> in a set of "core" functions. E.g. the vfio-pci-core provides the common
> functions used by VFIO variant drviers to support PCI device
> passhthrough.
>
> Although the CXL type-2 device has a PCI-compatible interface for device
> configuration and programming, they still needs special handlings when
> initialize the device:
>
> - Probing the CXL DVSECs in the configuration.
> - Probing the CXL register groups implemented by the device.
> - Configuring the CXL device state required by the kernel CXL core.
> - Create the CXL region.
> - Special handlings of the CXL MMIO BAR.
>
> Introduce vfio-cxl core predules to hold all the common functions used
s/predules/preludes/
> by VFIO variant drivers to support CXL device passthrough.
>
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> ---
> drivers/vfio/pci/Kconfig | 4 +
> drivers/vfio/pci/Makefile | 3 +
> drivers/vfio/pci/vfio_cxl_core.c | 264 +++++++++++++++++++++++++++++++
> include/linux/vfio_pci_core.h | 37 +++++
> 4 files changed, 308 insertions(+)
> create mode 100644 drivers/vfio/pci/vfio_cxl_core.c
>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index bf50ffa10bde..2196e79b132b 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -7,6 +7,10 @@ config VFIO_PCI_CORE
> select VFIO_VIRQFD
> select IRQ_BYPASS_MANAGER
>
> +config VFIO_CXL_CORE
> + tristate
> + select VFIO_PCI_CORE
I don't see anything in this series that depends on the CXL Kconfig
options, so it seems this will break in randconfig when the resulting
vfio-cxl variant driver is enabled without core CXL support.
> +
> config VFIO_PCI_MMAP
> def_bool y if !S390
> depends on VFIO_PCI_CORE
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index cf00c0a7e55c..b51221b94b0b 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -8,6 +8,9 @@ vfio-pci-y := vfio_pci.o
> vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
> obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
>
> +vfio-cxl-core-y := vfio_cxl_core.o
> +obj-$(CONFIG_VFIO_CXL_CORE) += vfio-cxl-core.o
> +
> obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/
>
> obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> diff --git a/drivers/vfio/pci/vfio_cxl_core.c b/drivers/vfio/pci/vfio_cxl_core.c
> new file mode 100644
> index 000000000000..6a7859333f67
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_cxl_core.c
> @@ -0,0 +1,264 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/device.h>
> +#include <linux/eventfd.h>
> +#include <linux/file.h>
> +#include <linux/interrupt.h>
> +#include <linux/iommu.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/notifier.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +
> +#include "vfio_pci_priv.h"
> +
> +#define DRIVER_AUTHOR "Zhi Wang <zhiw@nvidia.com>"
> +#define DRIVER_DESC "core driver for VFIO based CXL devices"
> +
> +static int get_hpa_and_request_dpa(struct vfio_pci_core_device *core_dev)
> +{
> + struct vfio_cxl *cxl = &core_dev->cxl;
> + struct pci_dev *pdev = core_dev->pdev;
> + u64 max;
> +
> + cxl->cxlrd = cxl_get_hpa_freespace(cxl->endpoint, 1,
> + CXL_DECODER_F_RAM |
> + CXL_DECODER_F_TYPE2,
> + &max);
I don't see that this adheres to the comment in cxl_get_hpa_freespace()
that the caller needs to deal with the elevated ref count on the root
decoder.  There's no put_device() call in either the error path or
disable path.
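A sketch of what I'd expect, assuming the reference is held on the root
decoder's embedded device (the exact field chain may differ):

```c
/*
 * Sketch: balance the root decoder reference taken by
 * cxl_get_hpa_freespace().  Call from the error paths in
 * get_hpa_and_request_dpa() and from disable_cxl().
 */
static void vfio_cxl_put_root_decoder(struct vfio_cxl *cxl)
{
	if (!IS_ERR_OR_NULL(cxl->cxlrd))
		put_device(&cxl->cxlrd->cxlsd.cxld.dev);
}
```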
Also, maybe this is inherent in the cxl code, but cxl->cxlrd seems
redundant to me, couldn't we refer to this as cxl->root_decoder? (or
some variant more descriptive than "rd")
Is this exclusively a type2 extension, or how do you envision type1/3
devices with vfio?
> + if (IS_ERR(cxl->cxlrd)) {
> + pci_err(pdev, "Fail to get HPA space.\n");
> + return PTR_ERR(cxl->cxlrd);
> + }
> +
> + if (max < cxl->region.size) {
> + pci_err(pdev, "No enough free HPA space %llu < %llu\n",
> + max, cxl->region.size);
> + return -ENOSPC;
> + }
> +
> + cxl->cxled = cxl_request_dpa(cxl->endpoint, true, cxl->region.size,
> + cxl->region.size);
cxl->endpoint_decoder? cxl->endp_dec?
> + if (IS_ERR(cxl->cxled)) {
> + pci_err(pdev, "Fail to request DPA\n");
> + return PTR_ERR(cxl->cxled);
> + }
> +
> + return 0;
> +}
> +
> +static int create_cxl_region(struct vfio_pci_core_device *core_dev)
> +{
> + struct vfio_cxl *cxl = &core_dev->cxl;
> + struct pci_dev *pdev = core_dev->pdev;
> + resource_size_t start, end;
> + int ret;
> +
> + ret = cxl_accel_request_resource(cxl->cxlds, true);
> + if (ret) {
> + pci_err(pdev, "Fail to request CXL resource\n");
> + return ret;
> + }
Where is the corresponding release_resource()?
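i.e. I'd expect disable_cxl() and the error paths below to undo this,
something like (helper name assumed, it's not in this series):

```c
/*
 * Sketch: hypothetical counterpart to cxl_accel_request_resource(),
 * to be called from disable_cxl() and on the error paths here.
 */
cxl_accel_release_resource(cxl->cxlds);
```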
> +
> + if (!cxl_await_media_ready(cxl->cxlds)) {
> + cxl_accel_set_media_ready(cxl->cxlds);
> + } else {
> + pci_err(pdev, "CXL media is not active\n");
> + return ret;
> + }
We're not capturing the media ready error for this return.  I think
Jason would typically suggest a success oriented flow here:

	ret = cxl_await_media_ready(cxl->cxlds);
	if (ret) {
		pci_err(...);
		return ret;
	}

	cxl_accel_set_media_ready(cxl->cxlds);
> +
> + cxl->cxlmd = devm_cxl_add_memdev(&pdev->dev, cxl->cxlds);
> + if (IS_ERR(cxl->cxlmd)) {
> + pci_err(pdev, "Fail to create CXL memdev\n");
> + return PTR_ERR(cxl->cxlmd);
> + }
> +
> + cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
> + if (IS_ERR(cxl->endpoint)) {
> + pci_err(pdev, "Fail to acquire CXL endpoint\n");
> + return PTR_ERR(cxl->endpoint);
> + }
> +
> + ret = get_hpa_and_request_dpa(core_dev);
> + if (ret)
> + goto out;
> +
> + cxl->region.region = cxl_create_region(cxl->cxlrd, &cxl->cxled, 1);
> + if (IS_ERR(cxl->region.region)) {
> + ret = PTR_ERR(cxl->region.region);
> + pci_err(pdev, "Fail to create CXL region\n");
> + cxl_dpa_free(cxl->cxled);
> + goto out;
> + }
> +
> + cxl_accel_get_region_params(cxl->region.region, &start, &end);
> +
> + cxl->region.addr = start;
> +out:
> + cxl_release_endpoint(cxl->cxlmd, cxl->endpoint);
> + return ret;
> +}
> +
> +/* Standard CXL-type 2 driver initialization sequence */
> +static int enable_cxl(struct vfio_pci_core_device *core_dev, u16 dvsec)
> +{
> + struct vfio_cxl *cxl = &core_dev->cxl;
> + struct pci_dev *pdev = core_dev->pdev;
> + u32 count;
> + u64 offset, size;
> + int ret;
> +
> + cxl->cxlds = cxl_accel_state_create(&pdev->dev, cxl->caps);
> + if (IS_ERR(cxl->cxlds))
> + return PTR_ERR(cxl->cxlds);
> +
> + cxl_accel_set_dvsec(cxl->cxlds, dvsec);
> + cxl_accel_set_serial(cxl->cxlds, pdev->dev.id);
This doesn't seem to match the description, where
cxl_device_state.serial is described as the PCIe device serial number,
not a struct device instance number.
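If the intent is the PCIe Device Serial Number, pci_get_dsn() already
exists for that, e.g. (sketch):

```c
/* Sketch: use the PCIe Device Serial Number capability instead */
u64 dsn = pci_get_dsn(pdev);	/* returns 0 if the capability is absent */

cxl_accel_set_serial(cxl->cxlds, dsn);
```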
> +
> + cxl_accel_set_resource(cxl->cxlds, cxl->dpa_res, CXL_ACCEL_RES_DPA);
> + cxl_accel_set_resource(cxl->cxlds, cxl->ram_res, CXL_ACCEL_RES_RAM);
> +
> + ret = cxl_pci_accel_setup_regs(pdev, cxl->cxlds);
> + if (ret) {
> + pci_err(pdev, "Fail to setup CXL accel regs\n");
> + return ret;
> + }
> +
> + ret = cxl_get_hdm_info(cxl->cxlds, &count, &offset, &size);
> + if (ret)
> + return ret;
> +
> + if (!count || !size) {
> + pci_err(pdev, "Fail to find CXL HDM reg offset\n");
> + return -ENODEV;
> + }
> +
> + cxl->hdm_count = count;
> + cxl->hdm_reg_offset = offset;
> + cxl->hdm_reg_size = size;
> +
> + return create_cxl_region(core_dev);
> +}
> +
> +static void disable_cxl(struct vfio_pci_core_device *core_dev)
> +{
> + struct vfio_cxl *cxl = &core_dev->cxl;
> +
> + if (cxl->region.region)
> + cxl_region_detach(cxl->cxled);
> +
> + if (cxl->cxled)
> + cxl_dpa_free(cxl->cxled);
> +}
> +
> +int vfio_cxl_core_enable(struct vfio_pci_core_device *core_dev)
> +{
> + struct vfio_cxl *cxl = &core_dev->cxl;
> + struct pci_dev *pdev = core_dev->pdev;
> + u16 dvsec;
> + int ret;
> +
> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (!dvsec)
> + return -ENODEV;
> +
> + if (!cxl->region.size)
> + return -EINVAL;
> +
> + ret = vfio_pci_core_enable(core_dev);
> + if (ret)
> + return ret;
> +
> + ret = enable_cxl(core_dev, dvsec);
> + if (ret)
> + goto err_enable_cxl_device;
> +
> + return 0;
> +
> +err_enable_cxl_device:
> + vfio_pci_core_disable(core_dev);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_enable);
These should all be _GPL symbols by default, right?
> +
> +void vfio_cxl_core_finish_enable(struct vfio_pci_core_device *core_dev)
> +{
> + vfio_pci_core_finish_enable(core_dev);
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_finish_enable);
> +
> +void vfio_cxl_core_close_device(struct vfio_device *vdev)
> +{
> + struct vfio_pci_core_device *core_dev =
> + container_of(vdev, struct vfio_pci_core_device, vdev);
> +
> + disable_cxl(core_dev);
> + vfio_pci_core_close_device(vdev);
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_close_device);
> +
> +/*
> + * Configure the resource required by the kernel CXL core:
> + * device DPA and device RAM size
> + */
> +void vfio_cxl_core_set_resource(struct vfio_pci_core_device *core_dev,
> + struct resource res,
> + enum accel_resource type)
> +{
> + struct vfio_cxl *cxl = &core_dev->cxl;
> +
> + switch (type) {
> + case CXL_ACCEL_RES_DPA:
> + cxl->dpa_size = res.end - res.start + 1;
> + cxl->dpa_res = res;
> + break;
> +
> + case CXL_ACCEL_RES_RAM:
> + cxl->ram_res = res;
> + break;
> +
> + default:
> + WARN(1, "invalid resource type: %d\n", type);
> + break;
> + }
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_set_resource);
It's not obvious to me why we want to multiplex these through one
function rather than have separate functions to set the dpa and ram.
The usage in patch 12 doesn't really dictate a multiplexed function.
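For example (sketch, signatures assumed from the existing helper):

```c
/* Sketch: split setters rather than multiplexing on a type enum */
void vfio_cxl_core_set_dpa_resource(struct vfio_pci_core_device *core_dev,
				    struct resource res);
void vfio_cxl_core_set_ram_resource(struct vfio_pci_core_device *core_dev,
				    struct resource res);
```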
> +
> +/* Configure the expected CXL region size to be created */
> +void vfio_cxl_core_set_region_size(struct vfio_pci_core_device *core_dev,
> + u64 size)
> +{
> + struct vfio_cxl *cxl = &core_dev->cxl;
> +
> + if (WARN_ON(size > cxl->dpa_size))
> + return;
> +
> + if (WARN_ON(cxl->region.region))
> + return;
> +
> + cxl->region.size = size;
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_set_region_size);
> +
> +/* Configure the driver cap required by the kernel CXL core */
> +void vfio_cxl_core_set_driver_hdm_cap(struct vfio_pci_core_device *core_dev)
> +{
> + struct vfio_cxl *cxl = &core_dev->cxl;
> +
> + cxl->caps |= CXL_ACCEL_DRIVER_CAP_HDM;
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_set_driver_hdm_cap);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> +MODULE_IMPORT_NS(CXL);
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index fbb472dd99b3..7762d4a3e825 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -15,6 +15,8 @@
> #include <linux/types.h>
> #include <linux/uuid.h>
> #include <linux/notifier.h>
> +#include <linux/cxl_accel_mem.h>
> +#include <linux/cxl_accel_pci.h>
>
> #ifndef VFIO_PCI_CORE_H
> #define VFIO_PCI_CORE_H
> @@ -49,6 +51,31 @@ struct vfio_pci_region {
> u32 flags;
> };
>
> +struct vfio_cxl_region {
> + u64 size;
> + u64 addr;
> + struct cxl_region *region;
> +};
> +
> +struct vfio_cxl {
> + u8 caps;
> + u64 dpa_size;
> +
> + u32 hdm_count;
Poor packing; caps and hdm_count should at least be adjacent to leave
only a single 24-bit gap.
> + u64 hdm_reg_offset;
> + u64 hdm_reg_size;
> +
> + struct cxl_dev_state *cxlds;
> + struct cxl_memdev *cxlmd;
> + struct cxl_root_decoder *cxlrd;
> + struct cxl_port *endpoint;
> + struct cxl_endpoint_decoder *cxled;
> + struct resource dpa_res;
> + struct resource ram_res;
> +
> + struct vfio_cxl_region region;
> +};
> +
> struct vfio_pci_core_device {
> struct vfio_device vdev;
> struct pci_dev *pdev;
> @@ -94,6 +121,7 @@ struct vfio_pci_core_device {
> struct vfio_pci_core_device *sriov_pf_core_dev;
> struct notifier_block nb;
> struct rw_semaphore memory_lock;
> + struct vfio_cxl cxl;
I'd prefer we not embed a structure here that's unused for 100% of
current use cases.  Why can't we have:

	struct vfio_cxl_core_device {
		struct vfio_pci_core_device pci_core;
		struct vfio_cxl cxl;
	};
Thanks,
Alex
> };
>
> /* Will be exported for vfio pci drivers usage */
> @@ -159,4 +187,13 @@ VFIO_IOREAD_DECLARATION(32)
> VFIO_IOREAD_DECLARATION(64)
> #endif
>
> +int vfio_cxl_core_enable(struct vfio_pci_core_device *core_dev);
> +void vfio_cxl_core_finish_enable(struct vfio_pci_core_device *core_dev);
> +void vfio_cxl_core_close_device(struct vfio_device *vdev);
> +void vfio_cxl_core_set_resource(struct vfio_pci_core_device *core_dev,
> + struct resource res,
> + enum accel_resource type);
> +void vfio_cxl_core_set_region_size(struct vfio_pci_core_device *core_dev,
> + u64 size);
> +void vfio_cxl_core_set_driver_hdm_cap(struct vfio_pci_core_device *core_dev);
> #endif /* VFIO_PCI_CORE_H */