From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 11 Oct 2024 12:33:51 -0600
From: Alex Williamson
To: Zhi Wang
Subject: Re: [RFC 04/13] vfio: introduce vfio-cxl core preludes
Message-ID: <20241011123351.27474f2b.alex.williamson@redhat.com>
In-Reply-To: <20240920223446.1908673-5-zhiw@nvidia.com>
References: <20240920223446.1908673-1-zhiw@nvidia.com>
	<20240920223446.1908673-5-zhiw@nvidia.com>
Organization: Red Hat
Precedence: bulk
X-Mailing-List: linux-cxl@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Fri, 20 Sep 2024 15:34:37 -0700
Zhi Wang wrote:

> In VFIO, common functions that used by VFIO variant drivers are managed
> in a set of "core" functions. E.g. the vfio-pci-core provides the common
> functions used by VFIO variant drviers to support PCI device
> passhthrough.
> 
> Although the CXL type-2 device has a PCI-compatible interface for device
> configuration and programming, they still needs special handlings when
> initialize the device:
> 
> - Probing the CXL DVSECs in the configuration.
> - Probing the CXL register groups implemented by the device.
> - Configuring the CXL device state required by the kernel CXL core.
> - Create the CXL region.
> - Special handlings of the CXL MMIO BAR.
> 
> Introduce vfio-cxl core predules to hold all the common functions used

s/predules/preludes/

> by VFIO variant drivers to support CXL device passthrough.
> 
> Signed-off-by: Zhi Wang
> ---
>  drivers/vfio/pci/Kconfig         |   4 +
>  drivers/vfio/pci/Makefile        |   3 +
>  drivers/vfio/pci/vfio_cxl_core.c | 264 +++++++++++++++++++++++++++++++
>  include/linux/vfio_pci_core.h    |  37 +++++
>  4 files changed, 308 insertions(+)
>  create mode 100644 drivers/vfio/pci/vfio_cxl_core.c
> 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index bf50ffa10bde..2196e79b132b 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -7,6 +7,10 @@ config VFIO_PCI_CORE
>  	select VFIO_VIRQFD
>  	select IRQ_BYPASS_MANAGER
>  
> +config VFIO_CXL_CORE
> +	tristate
> +	select VFIO_PCI_CORE

I don't see anything in this series that depends on CXL Kconfigs, so it
seems this will break in randconfig when the resulting vfio-cxl variant
driver is enabled without core CXL support.

> +
>  config VFIO_PCI_MMAP
>  	def_bool y if !S390
>  	depends on VFIO_PCI_CORE
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index cf00c0a7e55c..b51221b94b0b 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -8,6 +8,9 @@ vfio-pci-y := vfio_pci.o
>  vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
>  obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
>  
> +vfio-cxl-core-y := vfio_cxl_core.o
> +obj-$(CONFIG_VFIO_CXL_CORE) += vfio-cxl-core.o
> +
>  obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/
>  
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> diff --git a/drivers/vfio/pci/vfio_cxl_core.c b/drivers/vfio/pci/vfio_cxl_core.c
> new file mode 100644
> index 000000000000..6a7859333f67
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_cxl_core.c
> @@ -0,0 +1,264 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "vfio_pci_priv.h"
> +
> +#define DRIVER_AUTHOR "Zhi Wang "
> +#define DRIVER_DESC "core driver for VFIO based CXL devices"
> +
> +static int get_hpa_and_request_dpa(struct vfio_pci_core_device *core_dev)
> +{
> +	struct vfio_cxl *cxl = &core_dev->cxl;
> +	struct pci_dev *pdev = core_dev->pdev;
> +	u64 max;
> +
> +	cxl->cxlrd = cxl_get_hpa_freespace(cxl->endpoint, 1,
> +					   CXL_DECODER_F_RAM |
> +					   CXL_DECODER_F_TYPE2,
> +					   &max);

I don't see that this adheres to the comment in cxl_get_hpa_freespace()
that the caller needs to deal with the elevated ref count on the root
decoder.  There's no put_device() call in either the error path or
disable path.

Also, maybe this is inherent in the cxl code, but cxl->cxlrd seems
redundant to me, couldn't we refer to this as cxl->root_decoder? (or
some variant more descriptive than "rd")

Is this exclusively a type2 extension or how do you envision type1/3
devices with vfio?

> +	if (IS_ERR(cxl->cxlrd)) {
> +		pci_err(pdev, "Fail to get HPA space.\n");
> +		return PTR_ERR(cxl->cxlrd);
> +	}
> +
> +	if (max < cxl->region.size) {
> +		pci_err(pdev, "No enough free HPA space %llu < %llu\n",
> +			max, cxl->region.size);
> +		return -ENOSPC;
> +	}
> +
> +	cxl->cxled = cxl_request_dpa(cxl->endpoint, true, cxl->region.size,
> +				     cxl->region.size);

cxl->endpoint_decoder?  cxl->endp_dec?
> +	if (IS_ERR(cxl->cxled)) {
> +		pci_err(pdev, "Fail to request DPA\n");
> +		return PTR_ERR(cxl->cxled);
> +	}
> +
> +	return 0;
> +}
> +
> +static int create_cxl_region(struct vfio_pci_core_device *core_dev)
> +{
> +	struct vfio_cxl *cxl = &core_dev->cxl;
> +	struct pci_dev *pdev = core_dev->pdev;
> +	resource_size_t start, end;
> +	int ret;
> +
> +	ret = cxl_accel_request_resource(cxl->cxlds, true);
> +	if (ret) {
> +		pci_err(pdev, "Fail to request CXL resource\n");
> +		return ret;
> +	}

Where is the corresponding release_resource()?

> +
> +	if (!cxl_await_media_ready(cxl->cxlds)) {
> +		cxl_accel_set_media_ready(cxl->cxlds);
> +	} else {
> +		pci_err(pdev, "CXL media is not active\n");
> +		return ret;
> +	}

We're not capturing the media ready error for this return.  I think
Jason would typically suggest a success oriented flow as:

	ret = cxl_await_media_ready(cxl->cxlds);
	if (ret) {
		pci_err(...);
		return ret;
	}

	cxl_accel_set_media_ready(cxl->cxlds);

> +
> +	cxl->cxlmd = devm_cxl_add_memdev(&pdev->dev, cxl->cxlds);
> +	if (IS_ERR(cxl->cxlmd)) {
> +		pci_err(pdev, "Fail to create CXL memdev\n");
> +		return PTR_ERR(cxl->cxlmd);
> +	}
> +
> +	cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
> +	if (IS_ERR(cxl->endpoint)) {
> +		pci_err(pdev, "Fail to acquire CXL endpoint\n");
> +		return PTR_ERR(cxl->endpoint);
> +	}
> +
> +	ret = get_hpa_and_request_dpa(core_dev);
> +	if (ret)
> +		goto out;
> +
> +	cxl->region.region = cxl_create_region(cxl->cxlrd, &cxl->cxled, 1);
> +	if (IS_ERR(cxl->region.region)) {
> +		ret = PTR_ERR(cxl->region.region);
> +		pci_err(pdev, "Fail to create CXL region\n");
> +		cxl_dpa_free(cxl->cxled);
> +		goto out;
> +	}
> +
> +	cxl_accel_get_region_params(cxl->region.region, &start, &end);
> +
> +	cxl->region.addr = start;
> +out:
> +	cxl_release_endpoint(cxl->cxlmd, cxl->endpoint);
> +	return ret;
> +}
> +
> +/* Standard CXL-type 2 driver initialization sequence */
> +static int enable_cxl(struct vfio_pci_core_device *core_dev, u16 dvsec)
> +{
> +	struct vfio_cxl *cxl = &core_dev->cxl;
> +	struct pci_dev *pdev = core_dev->pdev;
> +	u32 count;
> +	u64 offset, size;
> +	int ret;
> +
> +	cxl->cxlds = cxl_accel_state_create(&pdev->dev, cxl->caps);
> +	if (IS_ERR(cxl->cxlds))
> +		return PTR_ERR(cxl->cxlds);
> +
> +	cxl_accel_set_dvsec(cxl->cxlds, dvsec);
> +	cxl_accel_set_serial(cxl->cxlds, pdev->dev.id);

Doesn't seem to meet the description where cxl_device_state.serial is
described as the PCIe device serial number, not a struct device
instance number.

> +
> +	cxl_accel_set_resource(cxl->cxlds, cxl->dpa_res, CXL_ACCEL_RES_DPA);
> +	cxl_accel_set_resource(cxl->cxlds, cxl->ram_res, CXL_ACCEL_RES_RAM);
> +
> +	ret = cxl_pci_accel_setup_regs(pdev, cxl->cxlds);
> +	if (ret) {
> +		pci_err(pdev, "Fail to setup CXL accel regs\n");
> +		return ret;
> +	}
> +
> +	ret = cxl_get_hdm_info(cxl->cxlds, &count, &offset, &size);
> +	if (ret)
> +		return ret;
> +
> +	if (!count || !size) {
> +		pci_err(pdev, "Fail to find CXL HDM reg offset\n");
> +		return -ENODEV;
> +	}
> +
> +	cxl->hdm_count = count;
> +	cxl->hdm_reg_offset = offset;
> +	cxl->hdm_reg_size = size;
> +
> +	return create_cxl_region(core_dev);
> +}
> +
> +static void disable_cxl(struct vfio_pci_core_device *core_dev)
> +{
> +	struct vfio_cxl *cxl = &core_dev->cxl;
> +
> +	if (cxl->region.region)
> +		cxl_region_detach(cxl->cxled);
> +
> +	if (cxl->cxled)
> +		cxl_dpa_free(cxl->cxled);
> +}
> +
> +int vfio_cxl_core_enable(struct vfio_pci_core_device *core_dev)
> +{
> +	struct vfio_cxl *cxl = &core_dev->cxl;
> +	struct pci_dev *pdev = core_dev->pdev;
> +	u16 dvsec;
> +	int ret;
> +
> +	dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> +					  CXL_DVSEC_PCIE_DEVICE);
> +	if (!dvsec)
> +		return -ENODEV;
> +
> +	if (!cxl->region.size)
> +		return -EINVAL;
> +
> +	ret = vfio_pci_core_enable(core_dev);
> +	if (ret)
> +		return ret;
> +
> +	ret = enable_cxl(core_dev, dvsec);
> +	if (ret)
> +		goto err_enable_cxl_device;
> +
> +	return 0;
> +
> +err_enable_cxl_device:
> +	vfio_pci_core_disable(core_dev);
> +	return ret;
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_enable);

These should all be _GPL symbols by default, right?

> +
> +void vfio_cxl_core_finish_enable(struct vfio_pci_core_device *core_dev)
> +{
> +	vfio_pci_core_finish_enable(core_dev);
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_finish_enable);
> +
> +void vfio_cxl_core_close_device(struct vfio_device *vdev)
> +{
> +	struct vfio_pci_core_device *core_dev =
> +		container_of(vdev, struct vfio_pci_core_device, vdev);
> +
> +	disable_cxl(core_dev);
> +	vfio_pci_core_close_device(vdev);
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_close_device);
> +
> +/*
> + * Configure the resource required by the kernel CXL core:
> + * device DPA and device RAM size
> + */
> +void vfio_cxl_core_set_resource(struct vfio_pci_core_device *core_dev,
> +				struct resource res,
> +				enum accel_resource type)
> +{
> +	struct vfio_cxl *cxl = &core_dev->cxl;
> +
> +	switch (type) {
> +	case CXL_ACCEL_RES_DPA:
> +		cxl->dpa_size = res.end - res.start + 1;
> +		cxl->dpa_res = res;
> +		break;
> +
> +	case CXL_ACCEL_RES_RAM:
> +		cxl->ram_res = res;
> +		break;
> +
> +	default:
> +		WARN(1, "invalid resource type: %d\n", type);
> +		break;
> +	}
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_set_resource);

It's not obvious to me why we want to multiplex these through one
function rather than have separate functions to set the dpa and ram.
The usage in patch 12/ doesn't really dictate a multiplexed function.
> +
> +/* Configure the expected CXL region size to be created */
> +void vfio_cxl_core_set_region_size(struct vfio_pci_core_device *core_dev,
> +				   u64 size)
> +{
> +	struct vfio_cxl *cxl = &core_dev->cxl;
> +
> +	if (WARN_ON(size > cxl->dpa_size))
> +		return;
> +
> +	if (WARN_ON(cxl->region.region))
> +		return;
> +
> +	cxl->region.size = size;
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_set_region_size);
> +
> +/* Configure the driver cap required by the kernel CXL core */
> +void vfio_cxl_core_set_driver_hdm_cap(struct vfio_pci_core_device *core_dev)
> +{
> +	struct vfio_cxl *cxl = &core_dev->cxl;
> +
> +	cxl->caps |= CXL_ACCEL_DRIVER_CAP_HDM;
> +}
> +EXPORT_SYMBOL(vfio_cxl_core_set_driver_hdm_cap);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> +MODULE_IMPORT_NS(CXL);
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index fbb472dd99b3..7762d4a3e825 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -15,6 +15,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>  
>  #ifndef VFIO_PCI_CORE_H
>  #define VFIO_PCI_CORE_H
> @@ -49,6 +51,31 @@ struct vfio_pci_region {
>  	u32 flags;
>  };
>  
> +struct vfio_cxl_region {
> +	u64 size;
> +	u64 addr;
> +	struct cxl_region *region;
> +};
> +
> +struct vfio_cxl {
> +	u8 caps;
> +	u64 dpa_size;
> +
> +	u32 hdm_count;

Poor packing, caps and hdm_count should at least be adjacent to leave
only a single 24-bit gap.
> +	u64 hdm_reg_offset;
> +	u64 hdm_reg_size;
> +
> +	struct cxl_dev_state *cxlds;
> +	struct cxl_memdev *cxlmd;
> +	struct cxl_root_decoder *cxlrd;
> +	struct cxl_port *endpoint;
> +	struct cxl_endpoint_decoder *cxled;
> +	struct resource dpa_res;
> +	struct resource ram_res;
> +
> +	struct vfio_cxl_region region;
> +};
> +
>  struct vfio_pci_core_device {
>  	struct vfio_device vdev;
>  	struct pci_dev *pdev;
> @@ -94,6 +121,7 @@ struct vfio_pci_core_device {
>  	struct vfio_pci_core_device *sriov_pf_core_dev;
>  	struct notifier_block nb;
>  	struct rw_semaphore memory_lock;
> +	struct vfio_cxl cxl;

I'd prefer we not embed a structure here that's unused for 100% of
current use cases.  Why can't we have:

struct vfio_cxl_core_device {
	struct vfio_pci_core_device pci_core;
	struct vfio_cxl cxl;
};

Thanks,
Alex

> };
> 
> /* Will be exported for vfio pci drivers usage */
> @@ -159,4 +187,13 @@ VFIO_IOREAD_DECLARATION(32)
>  VFIO_IOREAD_DECLARATION(64)
>  #endif
>  
> +int vfio_cxl_core_enable(struct vfio_pci_core_device *core_dev);
> +void vfio_cxl_core_finish_enable(struct vfio_pci_core_device *core_dev);
> +void vfio_cxl_core_close_device(struct vfio_device *vdev);
> +void vfio_cxl_core_set_resource(struct vfio_pci_core_device *core_dev,
> +				struct resource res,
> +				enum accel_resource type);
> +void vfio_cxl_core_set_region_size(struct vfio_pci_core_device *core_dev,
> +				   u64 size);
> +void vfio_cxl_core_set_driver_hdm_cap(struct vfio_pci_core_device *core_dev);
>  #endif /* VFIO_PCI_CORE_H */