Intel-XE Archive on lore.kernel.org
* [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM
@ 2024-10-21  5:21 Vivek Kasireddy
  2024-10-21  5:21 ` [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device Vivek Kasireddy
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Vivek Kasireddy @ 2024-10-21  5:21 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: Vivek Kasireddy, Michal Wajdeczko, Michał Winiarski,
	Simona Vetter, Matthew Auld, Matthew Brost, Thomas Hellström,
	Dongwon Kim

While testing [1] and [2] with an SRIOV-enabled dGPU, it was noticed
that migrating a BO to System RAM before exporting it as a dmabuf
results in considerable performance degradation while running in a
Guest VM. For example, running a simple 3D app such as weston-simple-egl
would yield ~50 FPS instead of ~59 FPS, assuming a mode of 1920x1080@60.

One fix for this problem is to not migrate the BO and keep it in LMEM
during export. However, given that the GPU running in PF mode on the
Host cannot effectively access the PCI BAR addresses backing the
imported dmabuf BO, those addresses need to be translated into LMEM
addresses (DPAs) for this use-case to work properly.

With this patch series applied, it becomes possible to display
(via the Qemu GTK UI) a Guest VM compositor's framebuffer (created in
its LMEM) on the Host without making any copies of it or a roundtrip
to System RAM. And weston-simple-egl can now achieve ~59 FPS while
running with Gnome Wayland in the Guest VM.

Changelog:

v1 -> v2:
- Use a dma_addr array instead of SG table to store translated DMA
  addresses (Matt)
- Use a cursor to iterate over the entries in the dma_addr array
  instead of relying on SG iterator (Matt)
- Rebased and tested this series on top of [3], which introduces the
  drm_pagemap_dma_addr and xe_res_first_dma/__xe_res_dma_next helpers
  that this version relies on

[1] https://patchwork.freedesktop.org/series/131746/
[2] https://patchwork.freedesktop.org/series/135273/
[3] https://patchwork.freedesktop.org/series/137870/

Patchset overview:

Patch 1: PCI driver patch to unblock P2P DMA between VF and PF
Patch 2: Prevent BO migration to System RAM while running in VM
Patch 3: Helper function to get VF's backing object in LMEM
Patch 4-5: Create a new dma_addr array for LMEM based dmabuf BOs
	   to store translated addresses (DPAs)

This series is tested using the following method:
- Run Qemu with the following relevant options:
  qemu-system-x86_64 -m 4096m ....
  -device vfio-pci,host=0000:03:00.1
  -device virtio-vga,max_outputs=1,blob=true,xres=1920,yres=1080
  -display gtk,gl=on
  -object memory-backend-memfd,id=mem1,size=4096M
  -machine memory-backend=mem1 ...
- Run Gnome Wayland with the following options in the Guest VM:
  # cat /usr/lib/udev/rules.d/61-mutter-primary-gpu.rules
  ENV{DEVNAME}=="/dev/dri/card1", TAG+="mutter-device-preferred-primary", TAG+="mutter-device-disable-kms-modifiers"
  # XDG_SESSION_TYPE=wayland dbus-run-session -- /usr/bin/gnome-shell --wayland --no-x11 &

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>

Vivek Kasireddy (5):
  PCI/P2PDMA: Don't enforce ACS check for functions of same device
  drm/xe/dmabuf: Don't migrate BO to System RAM while running in VF mode
  drm/xe/pf: Add a helper function to get a VF's backing object in LMEM
  drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with
    VFs
  drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs

 drivers/gpu/drm/xe/xe_bo.c                 | 116 +++++++++++++++++++--
 drivers/gpu/drm/xe/xe_bo_types.h           |  11 +-
 drivers/gpu/drm/xe/xe_dma_buf.c            |   9 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c |  23 ++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h |   1 +
 drivers/gpu/drm/xe/xe_pt.c                 |   8 +-
 drivers/pci/p2pdma.c                       |  17 ++-
 7 files changed, 172 insertions(+), 13 deletions(-)

-- 
2.45.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-21  5:21 [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
@ 2024-10-21  5:21 ` Vivek Kasireddy
  2024-10-22 15:16   ` Bjorn Helgaas
  2024-10-21  5:21 ` [PATCH v2 2/5] drm/xe/dmabuf: Don't migrate BO to System RAM while running in VF mode Vivek Kasireddy
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Vivek Kasireddy @ 2024-10-21  5:21 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: Vivek Kasireddy, Bjorn Helgaas, Logan Gunthorpe, linux-pci

Functions of the same PCI device (such as a PF and a VF) share the
same bus and have a common root port, and typically the PF provisions
resources for the VF. Therefore, they can be considered compatible
as far as P2P access is concerned.

Currently, although the distance (2) is correctly calculated for
functions of the same device, an ACS check failure prevents P2P DMA
access between them. Therefore, introduce a small function named
pci_devs_are_p2pdma_compatible() to determine if the provider and
client belong to the same device and facilitate P2P DMA between
them by not enforcing the ACS check.

v2:
- Relax the enforcement of the ACS check only for Intel GPU functions,
  as they are P2PDMA compatible given the way the PF provisions
  the resources among multiple VFs.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: <linux-pci@vger.kernel.org>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/pci/p2pdma.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 4f47a13cb500..a230e661f939 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -535,6 +535,17 @@ static unsigned long map_types_idx(struct pci_dev *client)
 	return (pci_domain_nr(client->bus) << 16) | pci_dev_id(client);
 }
 
+static bool pci_devs_are_p2pdma_compatible(struct pci_dev *provider,
+					   struct pci_dev *client)
+{
+	if (provider->vendor == PCI_VENDOR_ID_INTEL) {
+		if (pci_is_vga(provider) && pci_is_vga(client))
+			return pci_physfn(provider) == pci_physfn(client);
+	}
+
+	return false;
+}
+
 /*
  * Calculate the P2PDMA mapping type and distance between two PCI devices.
  *
@@ -634,7 +645,7 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
 
 	*dist = dist_a + dist_b;
 
-	if (!acs_cnt) {
+	if (!acs_cnt || pci_devs_are_p2pdma_compatible(provider, client)) {
 		map_type = PCI_P2PDMA_MAP_BUS_ADDR;
 		goto done;
 	}
@@ -696,7 +707,9 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
 		return -1;
 
 	for (i = 0; i < num_clients; i++) {
-		pci_client = find_parent_pci_dev(clients[i]);
+		pci_client = dev_is_pf(clients[i]) ?
+				pci_dev_get(to_pci_dev(clients[i])) :
+				find_parent_pci_dev(clients[i]);
 		if (!pci_client) {
 			if (verbose)
 				dev_warn(clients[i],
-- 
2.45.1



* [PATCH v2 2/5] drm/xe/dmabuf: Don't migrate BO to System RAM while running in VF mode
  2024-10-21  5:21 [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
  2024-10-21  5:21 ` [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device Vivek Kasireddy
@ 2024-10-21  5:21 ` Vivek Kasireddy
  2024-10-21  5:21 ` [PATCH v2 3/5] drm/xe/pf: Add a helper function to get a VF's backing object in LMEM Vivek Kasireddy
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Vivek Kasireddy @ 2024-10-21  5:21 UTC (permalink / raw)
  To: dri-devel, intel-xe; +Cc: Vivek Kasireddy

If the importer has allow_peer2peer set to true, then we can expect it
to be able to handle VRAM addresses. Therefore, in this specific case,
and only while running in VF mode, do not migrate the BO to System RAM
before exporting it.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/gpu/drm/xe/xe_dma_buf.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 68f309f5e981..a90c9368d265 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -17,6 +17,7 @@
 #include "xe_bo.h"
 #include "xe_device.h"
 #include "xe_pm.h"
+#include "xe_sriov.h"
 #include "xe_ttm_vram_mgr.h"
 #include "xe_vm.h"
 
@@ -26,8 +27,11 @@ static int xe_dma_buf_attach(struct dma_buf *dmabuf,
 			     struct dma_buf_attachment *attach)
 {
 	struct drm_gem_object *obj = attach->dmabuf->priv;
+	struct xe_bo *bo = gem_to_xe_bo(obj);
+	struct xe_device *xe = xe_bo_device(bo);
 
 	if (attach->peer2peer &&
+	    !IS_SRIOV_VF(xe) &&
 	    pci_p2pdma_distance(to_pci_dev(obj->dev->dev), attach->dev, false) < 0)
 		attach->peer2peer = false;
 
@@ -51,7 +55,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 	struct drm_gem_object *obj = attach->dmabuf->priv;
 	struct xe_bo *bo = gem_to_xe_bo(obj);
 	struct xe_device *xe = xe_bo_device(bo);
-	int ret;
+	int ret = 0;
 
 	/*
 	 * For now only support pinning in TT memory, for two reasons:
@@ -63,7 +67,8 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 		return -EINVAL;
 	}
 
-	ret = xe_bo_migrate(bo, XE_PL_TT);
+	if (!IS_SRIOV_VF(xe) || !attach->peer2peer)
+		ret = xe_bo_migrate(bo, XE_PL_TT);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			drm_dbg(&xe->drm,
-- 
2.45.1



* [PATCH v2 3/5] drm/xe/pf: Add a helper function to get a VF's backing object in LMEM
  2024-10-21  5:21 [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
  2024-10-21  5:21 ` [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device Vivek Kasireddy
  2024-10-21  5:21 ` [PATCH v2 2/5] drm/xe/dmabuf: Don't migrate BO to System RAM while running in VF mode Vivek Kasireddy
@ 2024-10-21  5:21 ` Vivek Kasireddy
  2024-10-21  5:21 ` [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs Vivek Kasireddy
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Vivek Kasireddy @ 2024-10-21  5:21 UTC (permalink / raw)
  To: dri-devel, intel-xe; +Cc: Vivek Kasireddy

To properly import a dmabuf that is associated with a VF (or that
originates in a Guest VM that includes a VF), we need to know where
in LMEM the VF's allocated regions exist. Therefore, introduce a
new helper to return the object that backs the VF's regions in LMEM.

v2:
- Make the helper return the LMEM object instead of the start address.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 23 ++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
index a863e50b756e..8a5df3d76533 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
@@ -1455,6 +1455,29 @@ u64 xe_gt_sriov_pf_config_get_lmem(struct xe_gt *gt, unsigned int vfid)
 	return size;
 }
 
+/**
+ * xe_gt_sriov_pf_config_get_lmem_obj - Get VF's LMEM BO.
+ * @gt: the &xe_gt
+ * @vfid: the VF identifier
+ *
+ * This function can only be called on PF.
+ *
+ * Return: BO that is backing VF's quota in LMEM.
+ */
+struct xe_bo *xe_gt_sriov_pf_config_get_lmem_obj(struct xe_gt *gt,
+						 unsigned int vfid)
+{
+	struct xe_gt_sriov_config *config;
+	struct xe_bo *lmem_obj;
+
+	mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
+	config = pf_pick_vf_config(gt, vfid);
+	lmem_obj = config->lmem_obj;
+	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
+
+	return lmem_obj;
+}
+
 /**
  * xe_gt_sriov_pf_config_set_lmem - Provision VF with LMEM.
  * @gt: the &xe_gt (can't be media)
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
index b74ec38baa18..7779dc9f9c8d 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
@@ -31,6 +31,7 @@ int xe_gt_sriov_pf_config_set_fair_dbs(struct xe_gt *gt, unsigned int vfid, unsi
 int xe_gt_sriov_pf_config_bulk_set_dbs(struct xe_gt *gt, unsigned int vfid, unsigned int num_vfs,
 				       u32 num_dbs);
 
+struct xe_bo *xe_gt_sriov_pf_config_get_lmem_obj(struct xe_gt *gt, unsigned int vfid);
 u64 xe_gt_sriov_pf_config_get_lmem(struct xe_gt *gt, unsigned int vfid);
 int xe_gt_sriov_pf_config_set_lmem(struct xe_gt *gt, unsigned int vfid, u64 size);
 int xe_gt_sriov_pf_config_set_fair_lmem(struct xe_gt *gt, unsigned int vfid, unsigned int num_vfs);
-- 
2.45.1



* [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs
  2024-10-21  5:21 [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
                   ` (2 preceding siblings ...)
  2024-10-21  5:21 ` [PATCH v2 3/5] drm/xe/pf: Add a helper function to get a VF's backing object in LMEM Vivek Kasireddy
@ 2024-10-21  5:21 ` Vivek Kasireddy
  2024-10-22 10:12   ` kernel test robot
  2024-10-22 10:54   ` kernel test robot
  2024-10-21  5:21 ` [PATCH v2 5/5] drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs Vivek Kasireddy
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 24+ messages in thread
From: Vivek Kasireddy @ 2024-10-21  5:21 UTC (permalink / raw)
  To: dri-devel, intel-xe; +Cc: Vivek Kasireddy, Matthew Brost, Thomas Hellström

For BOs of type ttm_bo_type_sg that are backed by PCI BAR addresses
associated with a VF, we need to adjust and translate these addresses
into LMEM addresses to make the BOs usable by the PF. Otherwise, the BOs
(i.e., PCI BAR addresses) are only accessible by the CPU and not by
the GPU.

In order to do the above, we first need to identify whether the DMA
addresses associated with an imported BO (type ttm_bo_type_sg) belong
to System RAM, a VF, or other PCI devices. Once we confirm that they
belong to a VF, we convert the DMA addresses (IOVAs in this case) to
DPAs, create a new dma_addr array (of type drm_pagemap_dma_addr), and
populate it with the new addresses along with the segment sizes.

v2:
- Use dma_addr array instead of sg table to store translated addresses
  (Matt)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c       | 116 +++++++++++++++++++++++++++++--
 drivers/gpu/drm/xe/xe_bo_types.h |  11 ++-
 2 files changed, 120 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 5b232f2951b1..81a2f8c8031a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -6,6 +6,7 @@
 #include "xe_bo.h"
 
 #include <linux/dma-buf.h>
+#include <linux/iommu.h>
 
 #include <drm/drm_drv.h>
 #include <drm/drm_gem_ttm_helper.h>
@@ -15,16 +16,19 @@
 #include <drm/ttm/ttm_tt.h>
 #include <uapi/drm/xe_drm.h>
 
+#include "regs/xe_bars.h"
 #include "xe_device.h"
 #include "xe_dma_buf.h"
 #include "xe_drm_client.h"
 #include "xe_ggtt.h"
 #include "xe_gt.h"
+#include "xe_gt_sriov_pf_config.h"
 #include "xe_map.h"
 #include "xe_migrate.h"
 #include "xe_pm.h"
 #include "xe_preempt_fence.h"
 #include "xe_res_cursor.h"
+#include "xe_sriov_pf_helpers.h"
 #include "xe_trace_bo.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_vm.h"
@@ -543,6 +547,94 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
 	return ret;
 }
 
+static struct pci_dev *xe_find_vf_dev(struct xe_device *xe,
+				      phys_addr_t phys)
+{
+	struct pci_dev *pdev, *pf_pdev = to_pci_dev(xe->drm.dev);
+	resource_size_t io_start, io_size;
+
+	list_for_each_entry(pdev, &pf_pdev->bus->devices, bus_list) {
+		if (pdev->is_physfn)
+			continue;
+
+		io_start = pci_resource_start(pdev, LMEM_BAR);
+		io_size = pci_resource_len(pdev, LMEM_BAR);
+
+		if (phys >= io_start &&
+		    phys < (io_start + io_size - PAGE_SIZE))
+			return pdev;
+	}
+
+	return NULL;
+}
+
+
+static void xe_bo_translate_iova_to_dpa(struct iommu_domain *domain,
+					struct xe_bo *bo, struct sg_table *sg,
+					resource_size_t io_start, int vfid)
+{
+	struct xe_device *xe = xe_bo_device(bo);
+	struct xe_gt *gt = xe_root_mmio_gt(xe);
+	struct scatterlist *sgl;
+	struct xe_bo *lmem_bo;
+	phys_addr_t phys;
+	dma_addr_t addr;
+	u64 offset, i;
+
+	lmem_bo = xe_gt_sriov_pf_config_get_lmem_obj(gt, ++vfid);
+
+	for_each_sgtable_dma_sg(sg, sgl, i) {
+		phys = iommu_iova_to_phys(domain, sg_dma_address(sgl));
+		offset = phys - io_start;
+		addr = xe_bo_addr(lmem_bo, offset, sg_dma_len(sgl));
+
+		bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
+						DRM_INTERCONNECT_DRIVER,
+						get_order(sg_dma_len(sgl)),
+						DMA_BIDIRECTIONAL);
+	}
+}
+
+static int xe_bo_sg_to_dma_addr_array(struct sg_table *sg, struct xe_bo *bo)
+{
+	struct xe_device *xe = xe_bo_device(bo);
+	struct iommu_domain *domain;
+	resource_size_t io_start;
+	struct pci_dev *pdev;
+	phys_addr_t phys;
+	int vfid;
+
+	if (!IS_SRIOV_PF(xe))
+		return 0;
+
+	domain = iommu_get_domain_for_dev(xe->drm.dev);
+	if (!domain)
+		return 0;
+
+	phys = iommu_iova_to_phys(domain, sg_dma_address(sg->sgl));
+	if (page_is_ram(PFN_DOWN(phys)))
+		return 0;
+
+	pdev = xe_find_vf_dev(xe, phys);
+	if (!pdev)
+		return 0;
+
+	vfid = pci_iov_vf_id(pdev);
+	if (vfid < 0)
+		return 0;
+
+	bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
+				     GFP_KERNEL);
+	if (!bo->dma_addr)
+		return -ENOMEM;
+
+	bo->is_devmem_external = true;
+	io_start = pci_resource_start(pdev, LMEM_BAR);
+	xe_bo_translate_iova_to_dpa(domain, bo, sg, io_start, vfid);
+
+	return 0;
+}
+
 /*
  * The dma-buf map_attachment() / unmap_attachment() is hooked up here.
  * Note that unmapping the attachment is deferred to the next
@@ -560,12 +652,15 @@ static int xe_bo_move_dmabuf(struct ttm_buffer_object *ttm_bo,
 					       ttm);
 	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
 	struct sg_table *sg;
+	int ret;
 
 	xe_assert(xe, attach);
 	xe_assert(xe, ttm_bo->ttm);
 
-	if (new_res->mem_type == XE_PL_SYSTEM)
-		goto out;
+	if (new_res->mem_type == XE_PL_SYSTEM) {
+		ttm_bo_move_null(ttm_bo, new_res);
+		return 0;
+	}
 
 	if (ttm_bo->sg) {
 		dma_buf_unmap_attachment(attach, ttm_bo->sg, DMA_BIDIRECTIONAL);
@@ -576,13 +671,16 @@ static int xe_bo_move_dmabuf(struct ttm_buffer_object *ttm_bo,
 	if (IS_ERR(sg))
 		return PTR_ERR(sg);
 
+	ret = xe_bo_sg_to_dma_addr_array(sg, ttm_to_xe_bo(ttm_bo));
+	if (ret < 0) {
+		dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
+		return ret;
+	}
+
 	ttm_bo->sg = sg;
 	xe_tt->sg = sg;
 
-out:
-	ttm_bo_move_null(ttm_bo, new_res);
-
-	return 0;
+	return ret;
 }
 
 /**
@@ -1066,6 +1164,8 @@ static void xe_ttm_bo_release_notify(struct ttm_buffer_object *ttm_bo)
 
 static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
 {
+	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
+
 	if (!xe_bo_is_xe_bo(ttm_bo))
 		return;
 
@@ -1079,6 +1179,10 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
 
 		dma_buf_unmap_attachment(ttm_bo->base.import_attach, ttm_bo->sg,
 					 DMA_BIDIRECTIONAL);
+
+		if (bo->is_devmem_external) {
+			kfree(bo->dma_addr);
+		}
 		ttm_bo->sg = NULL;
 		xe_tt->sg = NULL;
 	}
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 13c6d8a69e91..f74876be3f8d 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -66,7 +66,16 @@ struct xe_bo {
 
 	/** @ccs_cleared */
 	bool ccs_cleared;
-
+	/**
+	 * @is_devmem_external: Whether this BO is an imported dma-buf that
+	 * is LMEM based.
+	 */
+	bool is_devmem_external;
+	/**
+	 * @dma_addr: An array to store DMA addresses (DPAs) for imported
+	 * dmabuf BOs that are LMEM based.
+	 */
+	struct drm_pagemap_dma_addr *dma_addr;
 	/**
 	 * @cpu_caching: CPU caching mode. Currently only used for userspace
 	 * objects. Exceptions are system memory on DGFX, which is always
-- 
2.45.1



* [PATCH v2 5/5] drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs
  2024-10-21  5:21 [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
                   ` (3 preceding siblings ...)
  2024-10-21  5:21 ` [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs Vivek Kasireddy
@ 2024-10-21  5:21 ` Vivek Kasireddy
  2024-10-22 12:58   ` kernel test robot
  2024-10-21  5:52 ` ✓ CI.Patch_applied: success for drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM (rev2) Patchwork
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Vivek Kasireddy @ 2024-10-21  5:21 UTC (permalink / raw)
  To: dri-devel, intel-xe; +Cc: Vivek Kasireddy

If a BO's is_devmem_external flag is set, it means that it is an
imported dmabuf BO that has a backing store in VRAM. Therefore, we
need to add XE_PPGTT_PTE_DM to the PTE flags as part of vm_bind.

v2:
- Use a cursor to iterate over the entries in the dma_addr array
  instead of relying on SG iterator (Matt)

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index f27f579f4d85..cfa11acbf525 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -660,10 +660,11 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 			xe_walk.default_pte &= ~XE_USM_PPGTT_PTE_AE;
 	}
 
-	if (is_devmem) {
+	if (is_devmem || bo->is_devmem_external)
 		xe_walk.default_pte |= XE_PPGTT_PTE_DM;
+
+	if (is_devmem)
 		xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
-	}
 
 	if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
 		xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
@@ -677,6 +678,9 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 		else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
 			xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
 				     xe_vma_size(vma), &curs);
+		else if (bo->is_devmem_external)
+			xe_res_first_dma(bo->dma_addr, xe_vma_bo_offset(vma),
+					xe_vma_size(vma), &curs);
 		else
 			xe_res_first_sg(xe_bo_sg(bo), xe_vma_bo_offset(vma),
 					xe_vma_size(vma), &curs);
-- 
2.45.1



* ✓ CI.Patch_applied: success for drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM (rev2)
  2024-10-21  5:21 [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
                   ` (4 preceding siblings ...)
  2024-10-21  5:21 ` [PATCH v2 5/5] drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs Vivek Kasireddy
@ 2024-10-21  5:52 ` Patchwork
  2024-10-21  5:52 ` ✗ CI.checkpatch: warning " Patchwork
  2024-10-21  5:52 ` ✗ CI.KUnit: failure " Patchwork
  7 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2024-10-21  5:52 UTC (permalink / raw)
  To: Vivek Kasireddy; +Cc: intel-xe

== Series Details ==

Series: drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM (rev2)
URL   : https://patchwork.freedesktop.org/series/139920/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: f6a9ad8c8480 drm-tip: 2024y-10m-21d-01h-17m-57s UTC integration manifest
=== git am output follows ===
Applying: PCI/P2PDMA: Don't enforce ACS check for functions of same device
Applying: drm/xe/dmabuf: Don't migrate BO to System RAM while running in VF mode
Applying: drm/xe/pf: Add a helper function to get a VF's backing object in LMEM
Applying: drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs
Applying: drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs




* ✗ CI.checkpatch: warning for drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM (rev2)
  2024-10-21  5:21 [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
                   ` (5 preceding siblings ...)
  2024-10-21  5:52 ` ✓ CI.Patch_applied: success for drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM (rev2) Patchwork
@ 2024-10-21  5:52 ` Patchwork
  2024-10-21  5:52 ` ✗ CI.KUnit: failure " Patchwork
  7 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2024-10-21  5:52 UTC (permalink / raw)
  To: Vivek Kasireddy; +Cc: intel-xe

== Series Details ==

Series: drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM (rev2)
URL   : https://patchwork.freedesktop.org/series/139920/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
30ab6715fc09baee6cc14cb3c89ad8858688d474
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 4346ea59037d66745c16357b2d8da2e479c3c631
Author: Vivek Kasireddy <vivek.kasireddy@intel.com>
Date:   Sun Oct 20 22:21:33 2024 -0700

    drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs
    
    If a BO's is_devmem_external flag is set, it means that it is an
    imported dmabuf BO that has a backing store in VRAM. Therefore, we
    need to add XE_PPGTT_PTE_DM to the PTE flags as part of vm_bind.
    
    v2:
    - Use a cursor to iterate over the entries in the dma_addr array
      instead of relying on SG iterator (Matt)
    
    Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
+ /mt/dim checkpatch f6a9ad8c848019c9c85d4adff59359427f794f8a drm-intel
bd31abae4502 PCI/P2PDMA: Don't enforce ACS check for functions of same device
b4556b1dbfc9 drm/xe/dmabuf: Don't migrate BO to System RAM while running in VF mode
ba8212b14cd8 drm/xe/pf: Add a helper function to get a VF's backing object in LMEM
d9b92a507fd8 drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs
-:88: CHECK:LINE_SPACING: Please don't use multiple blank lines
#88: FILE: drivers/gpu/drm/xe/xe_bo.c:571:
+
+

-:109: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#109: FILE: drivers/gpu/drm/xe/xe_bo.c:592:
+		bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
+						DRM_INTERCONNECT_DRIVER,

-:211: WARNING:BRACES: braces {} are not necessary for single statement blocks
#211: FILE: drivers/gpu/drm/xe/xe_bo.c:1183:
+		if (bo->is_devmem_external) {
+			kfree(bo->dma_addr);
+		}

total: 0 errors, 1 warnings, 2 checks, 192 lines checked
4346ea59037d drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs
-:41: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#41: FILE: drivers/gpu/drm/xe/xe_pt.c:683:
+			xe_res_first_dma(bo->dma_addr, xe_vma_bo_offset(vma),
+					xe_vma_size(vma), &curs);

total: 0 errors, 0 warnings, 1 checks, 22 lines checked




* ✗ CI.KUnit: failure for drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM (rev2)
  2024-10-21  5:21 [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
                   ` (6 preceding siblings ...)
  2024-10-21  5:52 ` ✗ CI.checkpatch: warning " Patchwork
@ 2024-10-21  5:52 ` Patchwork
  7 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2024-10-21  5:52 UTC (permalink / raw)
  To: Vivek Kasireddy; +Cc: intel-xe

== Series Details ==

Series: drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM (rev2)
URL   : https://patchwork.freedesktop.org/series/139920/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../drivers/gpu/drm/xe/xe_bo.c: In function ‘xe_bo_translate_iova_to_dpa’:
../drivers/gpu/drm/xe/xe_bo.c:591:29: error: invalid use of undefined type ‘struct drm_pagemap_dma_addr’
  591 |                 bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
      |                             ^
../drivers/gpu/drm/xe/xe_bo.c:591:35: error: implicit declaration of function ‘drm_pagemap_dma_addr_encode’ [-Werror=implicit-function-declaration]
  591 |                 bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_bo.c:592:49: error: ‘DRM_INTERCONNECT_DRIVER’ undeclared (first use in this function)
  592 |                                                 DRM_INTERCONNECT_DRIVER,
      |                                                 ^~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_bo.c:592:49: note: each undeclared identifier is reported only once for each function it appears in
../drivers/gpu/drm/xe/xe_bo.c:591:33: error: invalid use of undefined type ‘struct drm_pagemap_dma_addr’
  591 |                 bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
      |                                 ^
In file included from ../include/linux/percpu.h:5,
                 from ../include/linux/percpu_counter.h:14,
                 from ../include/linux/mm_types.h:21,
                 from ../include/linux/mmzone.h:22,
                 from ../include/linux/gfp.h:7,
                 from ../include/linux/mm.h:7,
                 from ../include/linux/pagemap.h:8,
                 from ../include/drm/ttm/ttm_tt.h:30,
                 from ../drivers/gpu/drm/xe/xe_bo.h:9,
                 from ../drivers/gpu/drm/xe/xe_bo.c:6:
../drivers/gpu/drm/xe/xe_bo.c: In function ‘xe_bo_sg_to_dma_addr_array’:
../drivers/gpu/drm/xe/xe_bo.c:626:55: error: invalid application of ‘sizeof’ to incomplete type ‘struct drm_pagemap_dma_addr’
  626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
      |                                                       ^
../include/linux/alloc_tag.h:202:16: note: in definition of macro ‘alloc_hooks_tag’
  202 |         typeof(_do_alloc) _res = _do_alloc;                             \
      |                ^~~~~~~~~
../include/linux/slab.h:925:49: note: in expansion of macro ‘alloc_hooks’
  925 | #define kmalloc_array(...)                      alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
      |                                                 ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_bo.c:626:24: note: in expansion of macro ‘kmalloc_array’
  626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
      |                        ^~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_bo.c:626:55: error: invalid application of ‘sizeof’ to incomplete type ‘struct drm_pagemap_dma_addr’
  626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
      |                                                       ^
../include/linux/alloc_tag.h:202:34: note: in definition of macro ‘alloc_hooks_tag’
  202 |         typeof(_do_alloc) _res = _do_alloc;                             \
      |                                  ^~~~~~~~~
../include/linux/slab.h:925:49: note: in expansion of macro ‘alloc_hooks’
  925 | #define kmalloc_array(...)                      alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
      |                                                 ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_bo.c:626:24: note: in expansion of macro ‘kmalloc_array’
  626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
      |                        ^~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_bo.c:626:22: warning: assignment to ‘struct drm_pagemap_dma_addr *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
  626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
      |                      ^
cc1: some warnings being treated as errors
make[7]: *** [../scripts/Makefile.build:229: drivers/gpu/drm/xe/xe_bo.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:478: drivers/gpu/drm/xe] Error 2
make[6]: *** Waiting for unfinished jobs....
make[5]: *** [../scripts/Makefile.build:478: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:478: drivers/gpu] Error 2
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../scripts/Makefile.build:478: drivers] Error 2
make[3]: *** Waiting for unfinished jobs....
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
make[2]: *** [/kernel/Makefile:1936: .] Error 2
make[1]: *** [/kernel/Makefile:224: __sub-make] Error 2
make: *** [Makefile:224: __sub-make] Error 2

[05:52:22] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[05:52:26] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs
  2024-10-21  5:21 ` [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs Vivek Kasireddy
@ 2024-10-22 10:12   ` kernel test robot
  2024-10-22 10:54   ` kernel test robot
  1 sibling, 0 replies; 24+ messages in thread
From: kernel test robot @ 2024-10-22 10:12 UTC (permalink / raw)
  To: Vivek Kasireddy, dri-devel, intel-xe
  Cc: oe-kbuild-all, Vivek Kasireddy, Matthew Brost,
	Thomas Hellström

Hi Vivek,

kernel test robot noticed the following build errors:

[auto build test ERROR on drm-xe/drm-xe-next]
[also build test ERROR on drm/drm-next drm-exynos/exynos-drm-next drm-intel/for-linux-next drm-intel/for-linux-next-fixes drm-misc/drm-misc-next drm-tip/drm-tip linus/master v6.12-rc4 next-20241021]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Vivek-Kasireddy/PCI-P2PDMA-Don-t-enforce-ACS-check-for-functions-of-same-device/20241021-134804
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20241021052236.1820329-5-vivek.kasireddy%40intel.com
patch subject: [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs
config: i386-buildonly-randconfig-003-20241022 (https://download.01.org/0day-ci/archive/20241022/202410221702.FLgKnDgM-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241022/202410221702.FLgKnDgM-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410221702.FLgKnDgM-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/gpu/drm/xe/xe_bo.c: In function 'xe_bo_translate_iova_to_dpa':
>> drivers/gpu/drm/xe/xe_bo.c:591:29: error: invalid use of undefined type 'struct drm_pagemap_dma_addr'
     591 |                 bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
         |                             ^
>> drivers/gpu/drm/xe/xe_bo.c:591:35: error: implicit declaration of function 'drm_pagemap_dma_addr_encode' [-Werror=implicit-function-declaration]
     591 |                 bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
         |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/gpu/drm/xe/xe_bo.c:592:49: error: 'DRM_INTERCONNECT_DRIVER' undeclared (first use in this function)
     592 |                                                 DRM_INTERCONNECT_DRIVER,
         |                                                 ^~~~~~~~~~~~~~~~~~~~~~~
   drivers/gpu/drm/xe/xe_bo.c:592:49: note: each undeclared identifier is reported only once for each function it appears in
   drivers/gpu/drm/xe/xe_bo.c:591:33: error: invalid use of undefined type 'struct drm_pagemap_dma_addr'
     591 |                 bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
         |                                 ^
   In file included from include/linux/percpu.h:5,
                    from arch/x86/include/asm/msr.h:15,
                    from arch/x86/include/asm/tsc.h:10,
                    from arch/x86/include/asm/timex.h:6,
                    from include/linux/timex.h:67,
                    from include/linux/time32.h:13,
                    from include/linux/time.h:60,
                    from include/linux/jiffies.h:10,
                    from include/linux/ktime.h:25,
                    from include/linux/timer.h:6,
                    from include/linux/workqueue.h:9,
                    from include/linux/mm_types.h:19,
                    from include/linux/mmzone.h:22,
                    from include/linux/gfp.h:7,
                    from include/linux/mm.h:7,
                    from include/linux/pagemap.h:8,
                    from include/drm/ttm/ttm_tt.h:30,
                    from drivers/gpu/drm/xe/xe_bo.h:9,
                    from drivers/gpu/drm/xe/xe_bo.c:6:
   drivers/gpu/drm/xe/xe_bo.c: In function 'xe_bo_sg_to_dma_addr_array':
>> drivers/gpu/drm/xe/xe_bo.c:626:55: error: invalid application of 'sizeof' to incomplete type 'struct drm_pagemap_dma_addr'
     626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
         |                                                       ^
   include/linux/alloc_tag.h:202:16: note: in definition of macro 'alloc_hooks_tag'
     202 |         typeof(_do_alloc) _res = _do_alloc;                             \
         |                ^~~~~~~~~
   include/linux/slab.h:925:49: note: in expansion of macro 'alloc_hooks'
     925 | #define kmalloc_array(...)                      alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
         |                                                 ^~~~~~~~~~~
   drivers/gpu/drm/xe/xe_bo.c:626:24: note: in expansion of macro 'kmalloc_array'
     626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
         |                        ^~~~~~~~~~~~~
>> drivers/gpu/drm/xe/xe_bo.c:626:55: error: invalid application of 'sizeof' to incomplete type 'struct drm_pagemap_dma_addr'
     626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
         |                                                       ^
   include/linux/alloc_tag.h:202:34: note: in definition of macro 'alloc_hooks_tag'
     202 |         typeof(_do_alloc) _res = _do_alloc;                             \
         |                                  ^~~~~~~~~
   include/linux/slab.h:925:49: note: in expansion of macro 'alloc_hooks'
     925 | #define kmalloc_array(...)                      alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
         |                                                 ^~~~~~~~~~~
   drivers/gpu/drm/xe/xe_bo.c:626:24: note: in expansion of macro 'kmalloc_array'
     626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
         |                        ^~~~~~~~~~~~~
   drivers/gpu/drm/xe/xe_bo.c:626:22: warning: assignment to 'struct drm_pagemap_dma_addr *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
         |                      ^
   cc1: some warnings being treated as errors


vim +591 drivers/gpu/drm/xe/xe_bo.c

   570	
   571	
   572	static void xe_bo_translate_iova_to_dpa(struct iommu_domain *domain,
   573						struct xe_bo *bo, struct sg_table *sg,
   574						resource_size_t io_start, int vfid)
   575	{
   576		struct xe_device *xe = xe_bo_device(bo);
   577		struct xe_gt *gt = xe_root_mmio_gt(xe);
   578		struct scatterlist *sgl;
   579		struct xe_bo *lmem_bo;
   580		phys_addr_t phys;
   581		dma_addr_t addr;
   582		u64 offset, i;
   583	
   584		lmem_bo = xe_gt_sriov_pf_config_get_lmem_obj(gt, ++vfid);
   585	
   586		for_each_sgtable_dma_sg(sg, sgl, i) {
   587			phys = iommu_iova_to_phys(domain, sg_dma_address(sgl));
   588			offset = phys - io_start;
   589			addr = xe_bo_addr(lmem_bo, offset, sg_dma_len(sgl));
   590	
 > 591			bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
 > 592							DRM_INTERCONNECT_DRIVER,
   593							get_order(sg_dma_len(sgl)),
   594							DMA_BIDIRECTIONAL);
   595		}
   596	}
   597	
   598	static int xe_bo_sg_to_dma_addr_array(struct sg_table *sg, struct xe_bo *bo)
   599	{
   600		struct xe_device *xe = xe_bo_device(bo);
   601		struct iommu_domain *domain;
   602		resource_size_t io_start;
   603		struct pci_dev *pdev;
   604		phys_addr_t phys;
   605		int vfid;
   606	
   607		if (!IS_SRIOV_PF(xe))
   608			return 0;
   609	
   610		domain = iommu_get_domain_for_dev(xe->drm.dev);
   611		if (!domain)
   612			return 0;
   613	
   614		phys = iommu_iova_to_phys(domain, sg_dma_address(sg->sgl));
   615		if (page_is_ram(PFN_DOWN(phys)))
   616			return 0;
   617	
   618		pdev = xe_find_vf_dev(xe, phys);
   619		if (!pdev)
   620			return 0;
   621	
   622		vfid = pci_iov_vf_id(pdev);
   623		if (vfid < 0)
   624			return 0;
   625	
 > 626		bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
   627					     GFP_KERNEL);
   628		if (!bo->dma_addr)
   629			return -ENOMEM;
   630	
   631		bo->is_devmem_external = true;
   632		io_start = pci_resource_start(pdev, LMEM_BAR);
   633		xe_bo_translate_iova_to_dpa(domain, bo, sg, io_start, vfid);
   634	
   635		return 0;
   636	}
   637	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs
  2024-10-21  5:21 ` [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs Vivek Kasireddy
  2024-10-22 10:12   ` kernel test robot
@ 2024-10-22 10:54   ` kernel test robot
  1 sibling, 0 replies; 24+ messages in thread
From: kernel test robot @ 2024-10-22 10:54 UTC (permalink / raw)
  To: Vivek Kasireddy, dri-devel, intel-xe
  Cc: llvm, oe-kbuild-all, Vivek Kasireddy, Matthew Brost,
	Thomas Hellström

Hi Vivek,

kernel test robot noticed the following build errors:

[auto build test ERROR on drm-xe/drm-xe-next]
[also build test ERROR on drm/drm-next drm-exynos/exynos-drm-next drm-intel/for-linux-next drm-intel/for-linux-next-fixes drm-misc/drm-misc-next drm-tip/drm-tip linus/master v6.12-rc4 next-20241021]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Vivek-Kasireddy/PCI-P2PDMA-Don-t-enforce-ACS-check-for-functions-of-same-device/20241021-134804
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20241021052236.1820329-5-vivek.kasireddy%40intel.com
patch subject: [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20241022/202410221832.R04DR21j-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241022/202410221832.R04DR21j-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410221832.R04DR21j-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/gpu/drm/xe/xe_bo.c:591:15: error: subscript of pointer to incomplete type 'struct drm_pagemap_dma_addr'
     591 |                 bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
         |                 ~~~~~~~~~~~~^
   drivers/gpu/drm/xe/xe_bo_types.h:78:9: note: forward declaration of 'struct drm_pagemap_dma_addr'
      78 |         struct drm_pagemap_dma_addr *dma_addr;
         |                ^
>> drivers/gpu/drm/xe/xe_bo.c:591:21: error: call to undeclared function 'drm_pagemap_dma_addr_encode'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     591 |                 bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
         |                                   ^
>> drivers/gpu/drm/xe/xe_bo.c:592:7: error: use of undeclared identifier 'DRM_INTERCONNECT_DRIVER'
     592 |                                                 DRM_INTERCONNECT_DRIVER,
         |                                                 ^
>> drivers/gpu/drm/xe/xe_bo.c:626:48: error: invalid application of 'sizeof' to an incomplete type 'struct drm_pagemap_dma_addr'
     626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
         |                                                       ^~~~~~~~~~~~~~~
   include/linux/slab.h:925:63: note: expanded from macro 'kmalloc_array'
     925 | #define kmalloc_array(...)                      alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
         |                                                                                  ^~~~~~~~~~~
   include/linux/alloc_tag.h:210:31: note: expanded from macro 'alloc_hooks'
     210 |         alloc_hooks_tag(&_alloc_tag, _do_alloc);                        \
         |                                      ^~~~~~~~~
   include/linux/alloc_tag.h:202:9: note: expanded from macro 'alloc_hooks_tag'
     202 |         typeof(_do_alloc) _res = _do_alloc;                             \
         |                ^~~~~~~~~
   drivers/gpu/drm/xe/xe_bo_types.h:78:9: note: forward declaration of 'struct drm_pagemap_dma_addr'
      78 |         struct drm_pagemap_dma_addr *dma_addr;
         |                ^
>> drivers/gpu/drm/xe/xe_bo.c:626:48: error: invalid application of 'sizeof' to an incomplete type 'struct drm_pagemap_dma_addr'
     626 |         bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
         |                                                       ^~~~~~~~~~~~~~~
   include/linux/slab.h:925:63: note: expanded from macro 'kmalloc_array'
     925 | #define kmalloc_array(...)                      alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
         |                                                                                  ^~~~~~~~~~~
   include/linux/alloc_tag.h:210:31: note: expanded from macro 'alloc_hooks'
     210 |         alloc_hooks_tag(&_alloc_tag, _do_alloc);                        \
         |                                      ^~~~~~~~~
   include/linux/alloc_tag.h:202:27: note: expanded from macro 'alloc_hooks_tag'
     202 |         typeof(_do_alloc) _res = _do_alloc;                             \
         |                                  ^~~~~~~~~
   drivers/gpu/drm/xe/xe_bo_types.h:78:9: note: forward declaration of 'struct drm_pagemap_dma_addr'
      78 |         struct drm_pagemap_dma_addr *dma_addr;
         |                ^
   5 errors generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for MODVERSIONS
   Depends on [n]: MODULES [=y] && !COMPILE_TEST [=y]
   Selected by [y]:
   - RANDSTRUCT_FULL [=y] && (CC_HAS_RANDSTRUCT [=y] || GCC_PLUGINS [=n]) && MODULES [=y]


vim +591 drivers/gpu/drm/xe/xe_bo.c

   570	
   571	
   572	static void xe_bo_translate_iova_to_dpa(struct iommu_domain *domain,
   573						struct xe_bo *bo, struct sg_table *sg,
   574						resource_size_t io_start, int vfid)
   575	{
   576		struct xe_device *xe = xe_bo_device(bo);
   577		struct xe_gt *gt = xe_root_mmio_gt(xe);
   578		struct scatterlist *sgl;
   579		struct xe_bo *lmem_bo;
   580		phys_addr_t phys;
   581		dma_addr_t addr;
   582		u64 offset, i;
   583	
   584		lmem_bo = xe_gt_sriov_pf_config_get_lmem_obj(gt, ++vfid);
   585	
   586		for_each_sgtable_dma_sg(sg, sgl, i) {
   587			phys = iommu_iova_to_phys(domain, sg_dma_address(sgl));
   588			offset = phys - io_start;
   589			addr = xe_bo_addr(lmem_bo, offset, sg_dma_len(sgl));
   590	
 > 591			bo->dma_addr[i] = drm_pagemap_dma_addr_encode(addr,
 > 592							DRM_INTERCONNECT_DRIVER,
   593							get_order(sg_dma_len(sgl)),
   594							DMA_BIDIRECTIONAL);
   595		}
   596	}
   597	
   598	static int xe_bo_sg_to_dma_addr_array(struct sg_table *sg, struct xe_bo *bo)
   599	{
   600		struct xe_device *xe = xe_bo_device(bo);
   601		struct iommu_domain *domain;
   602		resource_size_t io_start;
   603		struct pci_dev *pdev;
   604		phys_addr_t phys;
   605		int vfid;
   606	
   607		if (!IS_SRIOV_PF(xe))
   608			return 0;
   609	
   610		domain = iommu_get_domain_for_dev(xe->drm.dev);
   611		if (!domain)
   612			return 0;
   613	
   614		phys = iommu_iova_to_phys(domain, sg_dma_address(sg->sgl));
   615		if (page_is_ram(PFN_DOWN(phys)))
   616			return 0;
   617	
   618		pdev = xe_find_vf_dev(xe, phys);
   619		if (!pdev)
   620			return 0;
   621	
   622		vfid = pci_iov_vf_id(pdev);
   623		if (vfid < 0)
   624			return 0;
   625	
 > 626		bo->dma_addr = kmalloc_array(sg->nents, sizeof(*bo->dma_addr),
   627					     GFP_KERNEL);
   628		if (!bo->dma_addr)
   629			return -ENOMEM;
   630	
   631		bo->is_devmem_external = true;
   632		io_start = pci_resource_start(pdev, LMEM_BAR);
   633		xe_bo_translate_iova_to_dpa(domain, bo, sg, io_start, vfid);
   634	
   635		return 0;
   636	}
   637	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 5/5] drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs
  2024-10-21  5:21 ` [PATCH v2 5/5] drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs Vivek Kasireddy
@ 2024-10-22 12:58   ` kernel test robot
  0 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2024-10-22 12:58 UTC (permalink / raw)
  To: Vivek Kasireddy, dri-devel, intel-xe; +Cc: oe-kbuild-all, Vivek Kasireddy

Hi Vivek,

kernel test robot noticed the following build errors:

[auto build test ERROR on drm-xe/drm-xe-next]
[also build test ERROR on drm/drm-next drm-exynos/exynos-drm-next drm-intel/for-linux-next drm-intel/for-linux-next-fixes drm-misc/drm-misc-next drm-tip/drm-tip linus/master v6.12-rc4 next-20241022]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Vivek-Kasireddy/PCI-P2PDMA-Don-t-enforce-ACS-check-for-functions-of-same-device/20241021-134804
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20241021052236.1820329-6-vivek.kasireddy%40intel.com
patch subject: [PATCH v2 5/5] drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs
config: i386-buildonly-randconfig-003-20241022 (https://download.01.org/0day-ci/archive/20241022/202410222048.8IhSS8iE-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241022/202410222048.8IhSS8iE-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410222048.8IhSS8iE-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/gpu/drm/xe/xe_pt.c: In function 'xe_pt_stage_bind':
>> drivers/gpu/drm/xe/xe_pt.c:682:25: error: implicit declaration of function 'xe_res_first_dma'; did you mean 'xe_res_first_sg'? [-Werror=implicit-function-declaration]
     682 |                         xe_res_first_dma(bo->dma_addr, xe_vma_bo_offset(vma),
         |                         ^~~~~~~~~~~~~~~~
         |                         xe_res_first_sg
   cc1: some warnings being treated as errors


vim +682 drivers/gpu/drm/xe/xe_pt.c

   583	
   584	/**
   585	 * xe_pt_stage_bind() - Build a disconnected page-table tree for a given address
   586	 * range.
   587	 * @tile: The tile we're building for.
   588	 * @vma: The vma indicating the address range.
   589	 * @entries: Storage for the update entries used for connecting the tree to
   590	 * the main tree at commit time.
   591	 * @num_entries: On output contains the number of @entries used.
   592	 *
   593	 * This function builds a disconnected page-table tree for a given address
   594	 * range. The tree is connected to the main vm tree for the gpu using
   595	 * xe_migrate_update_pgtables() and for the cpu using xe_pt_commit_bind().
   596	 * The function builds xe_vm_pgtable_update structures for already existing
   597	 * shared page-tables, and non-existing shared and non-shared page-tables
   598	 * are built and populated directly.
   599	 *
   600	 * Return 0 on success, negative error code on error.
   601	 */
   602	static int
   603	xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
   604			 struct xe_vm_pgtable_update *entries, u32 *num_entries)
   605	{
   606		struct xe_device *xe = tile_to_xe(tile);
   607		struct xe_bo *bo = xe_vma_bo(vma);
   608		bool is_devmem = !xe_vma_is_userptr(vma) && bo &&
   609			(xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo));
   610		struct xe_res_cursor curs;
   611		struct xe_pt_stage_bind_walk xe_walk = {
   612			.base = {
   613				.ops = &xe_pt_stage_bind_ops,
   614				.shifts = xe_normal_pt_shifts,
   615				.max_level = XE_PT_HIGHEST_LEVEL,
   616			},
   617			.vm = xe_vma_vm(vma),
   618			.tile = tile,
   619			.curs = &curs,
   620			.va_curs_start = xe_vma_start(vma),
   621			.vma = vma,
   622			.wupd.entries = entries,
   623			.needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAG_64K) && is_devmem,
   624		};
   625		struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
   626		int ret;
   627	
   628		/**
   629		 * Default atomic expectations for different allocation scenarios are as follows:
   630		 *
   631		 * 1. Traditional API: When the VM is not in LR mode:
   632		 *    - Device atomics are expected to function with all allocations.
   633		 *
   634		 * 2. Compute/SVM API: When the VM is in LR mode:
   635		 *    - Device atomics are the default behavior when the bo is placed in a single region.
   636		 *    - In all other cases device atomics will be disabled with AE=0 until an application
   637		 *      request differently using a ioctl like madvise.
   638		 */
   639		if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) {
   640			if (xe_vm_in_lr_mode(xe_vma_vm(vma))) {
   641				if (bo && xe_bo_has_single_placement(bo))
   642					xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
   643				/**
   644				 * If a SMEM+LMEM allocation is backed by SMEM, a device
   645				 * atomics will cause a gpu page fault and which then
   646				 * gets migrated to LMEM, bind such allocations with
   647				 * device atomics enabled.
   648				 */
   649				else if (is_devmem && !xe_bo_has_single_placement(bo))
   650					xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
   651			} else {
   652				xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
   653			}
   654	
   655			/**
   656			 * Unset AE if the platform(PVC) doesn't support it on an
   657			 * allocation
   658			 */
   659			if (!xe->info.has_device_atomics_on_smem && !is_devmem)
   660				xe_walk.default_pte &= ~XE_USM_PPGTT_PTE_AE;
   661		}
   662	
   663		if (is_devmem || bo->is_devmem_external)
   664			xe_walk.default_pte |= XE_PPGTT_PTE_DM;
   665	
   666		if (is_devmem)
   667			xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
   668	
   669		if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
   670			xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
   671	
   672		xe_bo_assert_held(bo);
   673	
   674		if (!xe_vma_is_null(vma)) {
   675			if (xe_vma_is_userptr(vma))
   676				xe_res_first_sg(to_userptr_vma(vma)->userptr.sg, 0,
   677						xe_vma_size(vma), &curs);
   678			else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
   679				xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
   680					     xe_vma_size(vma), &curs);
   681			else if (bo->is_devmem_external)
 > 682				xe_res_first_dma(bo->dma_addr, xe_vma_bo_offset(vma),
   683						xe_vma_size(vma), &curs);
   684			else
   685				xe_res_first_sg(xe_bo_sg(bo), xe_vma_bo_offset(vma),
   686						xe_vma_size(vma), &curs);
   687		} else {
   688			curs.size = xe_vma_size(vma);
   689		}
   690	
   691		ret = xe_pt_walk_range(&pt->base, pt->level, xe_vma_start(vma),
   692				       xe_vma_end(vma), &xe_walk.base);
   693	
   694		*num_entries = xe_walk.wupd.num_used_entries;
   695		return ret;
   696	}
   697	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-21  5:21 ` [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device Vivek Kasireddy
@ 2024-10-22 15:16   ` Bjorn Helgaas
  2024-10-22 21:15     ` Logan Gunthorpe
  2024-10-24  5:58     ` Kasireddy, Vivek
  0 siblings, 2 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2024-10-22 15:16 UTC (permalink / raw)
  To: Vivek Kasireddy
  Cc: dri-devel, intel-xe, Bjorn Helgaas, Logan Gunthorpe, linux-pci

On Sun, Oct 20, 2024 at 10:21:29PM -0700, Vivek Kasireddy wrote:
> Functions of the same PCI device (such as a PF and a VF) share the
> same bus and have a common root port, and typically the PF provisions
> resources for the VF. Therefore, they can be considered compatible
> as far as P2P access is concerned.
> 
> Currently, although the distance (2) is correctly calculated for
> functions of the same device, an ACS check failure prevents P2P DMA
> access between them. Therefore, introduce a small function named
> pci_devs_are_p2pdma_compatible() to determine if the provider and
> client belong to the same device and facilitate P2P DMA between
> them by not enforcing the ACS check.
> 
> v2:
> - Relax the enforcement of the ACS check only for Intel GPU functions
>   as they are P2PDMA compatible given the way the PF provisions
>   the resources among multiple VFs.

I don't want version history in the commit log.  If the content is
useful, just incorporate it here directly (without the version info),
and put the version-to-version changelog below the "---".

> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: <linux-pci@vger.kernel.org>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---
>  drivers/pci/p2pdma.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 4f47a13cb500..a230e661f939 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -535,6 +535,17 @@ static unsigned long map_types_idx(struct pci_dev *client)
>  	return (pci_domain_nr(client->bus) << 16) | pci_dev_id(client);
>  }
>  
> +static bool pci_devs_are_p2pdma_compatible(struct pci_dev *provider,
> +					   struct pci_dev *client)
> +{
> +	if (provider->vendor == PCI_VENDOR_ID_INTEL) {
> +		if (pci_is_vga(provider) && pci_is_vga(client))
> +			return pci_physfn(provider) == pci_physfn(client);
> +	}

This doesn't explain why this should be specific to Intel or VGA.  As
far as I can tell, everything mentioned in the commit log is generic.

I see the previous comments
(https://lore.kernel.org/all/eddb423c-945f-40c9-b904-43ea8371f1c4@deltatee.com/),
but none of that context was captured here.

I'm not sure what you refer to by "PF provisions resources for the
VF".  Isn't it *always* the case that the architected PCI resources
(BARs) are configured by the PF?  It sounds like you're referring to
something Intel GPU-specific beyond that?

> +	return false;
> +}
> +
>  /*
>   * Calculate the P2PDMA mapping type and distance between two PCI devices.
>   *
> @@ -634,7 +645,7 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
>  
>  	*dist = dist_a + dist_b;
>  
> -	if (!acs_cnt) {
> +	if (!acs_cnt || pci_devs_are_p2pdma_compatible(provider, client)) {
>  		map_type = PCI_P2PDMA_MAP_BUS_ADDR;
>  		goto done;
>  	}
> @@ -696,7 +707,9 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
>  		return -1;
>  
>  	for (i = 0; i < num_clients; i++) {
> -		pci_client = find_parent_pci_dev(clients[i]);
> +		pci_client = dev_is_pf(clients[i]) ?
> +				pci_dev_get(to_pci_dev(clients[i])) :
> +				find_parent_pci_dev(clients[i]);
>  		if (!pci_client) {
>  			if (verbose)
>  				dev_warn(clients[i],
> -- 
> 2.45.1
> 


* Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-22 15:16   ` Bjorn Helgaas
@ 2024-10-22 21:15     ` Logan Gunthorpe
  2024-10-24  5:50       ` Kasireddy, Vivek
  2024-10-24  5:58     ` Kasireddy, Vivek
  1 sibling, 1 reply; 24+ messages in thread
From: Logan Gunthorpe @ 2024-10-22 21:15 UTC (permalink / raw)
  To: Bjorn Helgaas, Vivek Kasireddy
  Cc: dri-devel, intel-xe, Bjorn Helgaas, linux-pci



On 2024-10-22 09:16, Bjorn Helgaas wrote:
> On Sun, Oct 20, 2024 at 10:21:29PM -0700, Vivek Kasireddy wrote:
>> Functions of the same PCI device (such as a PF and a VF) share the
>> same bus and have a common root port and typically, the PF provisions
>> resources for the VF. Therefore, they can be considered compatible
>> as far as P2P access is concerned.
>>
>> Currently, although the distance (2) is correctly calculated for
>> functions of the same device, an ACS check failure prevents P2P DMA
>> access between them. Therefore, introduce a small function named
>> pci_devs_are_p2pdma_compatible() to determine if the provider and
>> client belong to the same device and facilitate P2P DMA between
>> them by not enforcing the ACS check.
>>
>> v2:
>> - Relax the enforcement of the ACS check only for Intel GPU functions
>>   as they are P2PDMA compatible given the way the PF provisions
>>   the resources among multiple VFs.
> 
> I don't want version history in the commit log.  If the content is
> useful, just incorporate it here directly (without the version info),
> and put the version-to-version changelog below the "---".
> 
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Logan Gunthorpe <logang@deltatee.com>
>> Cc: <linux-pci@vger.kernel.org>
>> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>> ---
>>  drivers/pci/p2pdma.c | 17 +++++++++++++++--
>>  1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index 4f47a13cb500..a230e661f939 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -535,6 +535,17 @@ static unsigned long map_types_idx(struct pci_dev *client)
>>  	return (pci_domain_nr(client->bus) << 16) | pci_dev_id(client);
>>  }
>>  
>> +static bool pci_devs_are_p2pdma_compatible(struct pci_dev *provider,
>> +					   struct pci_dev *client)
>> +{
>> +	if (provider->vendor == PCI_VENDOR_ID_INTEL) {
>> +		if (pci_is_vga(provider) && pci_is_vga(client))
>> +			return pci_physfn(provider) == pci_physfn(client);
>> +	}

I'd echo many of Bjorn's concerns. In addition, I think the name of the
pci_devs_are_p2pdma_compatible() isn't quite right. Specifically this is
dealing with PCI functions within a single device that are known to
allow P2P traffic. So I think the name should probably reflect that.

Thanks,

Logan


* RE: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-22 21:15     ` Logan Gunthorpe
@ 2024-10-24  5:50       ` Kasireddy, Vivek
  2024-10-24 16:21         ` Logan Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Kasireddy, Vivek @ 2024-10-24  5:50 UTC (permalink / raw)
  To: Logan Gunthorpe, Bjorn Helgaas
  Cc: dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Bjorn Helgaas, linux-pci@vger.kernel.org

Hi Logan,

> Subject: Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for
> functions of same device
> 
> 
> 
> On 2024-10-22 09:16, Bjorn Helgaas wrote:
> > On Sun, Oct 20, 2024 at 10:21:29PM -0700, Vivek Kasireddy wrote:
> >> Functions of the same PCI device (such as a PF and a VF) share the
> >> same bus and have a common root port and typically, the PF provisions
> >> resources for the VF. Therefore, they can be considered compatible
> >> as far as P2P access is concerned.
> >>
> >> Currently, although the distance (2) is correctly calculated for
> >> functions of the same device, an ACS check failure prevents P2P DMA
> >> access between them. Therefore, introduce a small function named
> >> pci_devs_are_p2pdma_compatible() to determine if the provider and
> >> client belong to the same device and facilitate P2P DMA between
> >> them by not enforcing the ACS check.
> >>
> >> v2:
> >> - Relax the enforcement of the ACS check only for Intel GPU functions
> >>   as they are P2PDMA compatible given the way the PF provisions
> >>   the resources among multiple VFs.
> >
> > I don't want version history in the commit log.  If the content is
> > useful, just incorporate it here directly (without the version info),
> > and put the version-to-version changelog below the "---".
> >
> >> Cc: Bjorn Helgaas <bhelgaas@google.com>
> >> Cc: Logan Gunthorpe <logang@deltatee.com>
> >> Cc: <linux-pci@vger.kernel.org>
> >> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> >> ---
> >>  drivers/pci/p2pdma.c | 17 +++++++++++++++--
> >>  1 file changed, 15 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> >> index 4f47a13cb500..a230e661f939 100644
> >> --- a/drivers/pci/p2pdma.c
> >> +++ b/drivers/pci/p2pdma.c
> >> @@ -535,6 +535,17 @@ static unsigned long map_types_idx(struct
> pci_dev *client)
> >>  	return (pci_domain_nr(client->bus) << 16) | pci_dev_id(client);
> >>  }
> >>
> >> +static bool pci_devs_are_p2pdma_compatible(struct pci_dev *provider,
> >> +					   struct pci_dev *client)
> >> +{
> >> +	if (provider->vendor == PCI_VENDOR_ID_INTEL) {
> >> +		if (pci_is_vga(provider) && pci_is_vga(client))
> >> +			return pci_physfn(provider) == pci_physfn(client);
> >> +	}
> 
> I'd echo many of Bjorn's concerns. In addition, I think the name of the
> pci_devs_are_p2pdma_compatible() isn't quite right. Specifically this is
> dealing with PCI functions within a single device that are known to
> allow P2P traffic. So I think the name should probably reflect that.
Would pci_devfns_support_p2pdma() be a more appropriate name?

Thanks,
Vivek

> 
> Thanks,
> 
> Logan


* RE: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-22 15:16   ` Bjorn Helgaas
  2024-10-22 21:15     ` Logan Gunthorpe
@ 2024-10-24  5:58     ` Kasireddy, Vivek
  2024-10-24 17:59       ` Bjorn Helgaas
  1 sibling, 1 reply; 24+ messages in thread
From: Kasireddy, Vivek @ 2024-10-24  5:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Bjorn Helgaas, Logan Gunthorpe, linux-pci@vger.kernel.org

Hi Bjorn,

> Subject: Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for
> functions of same device
> 
> On Sun, Oct 20, 2024 at 10:21:29PM -0700, Vivek Kasireddy wrote:
> > Functions of the same PCI device (such as a PF and a VF) share the
> > same bus and have a common root port and typically, the PF provisions
> > resources for the VF. Therefore, they can be considered compatible
> > as far as P2P access is concerned.
> >
> > Currently, although the distance (2) is correctly calculated for
> > functions of the same device, an ACS check failure prevents P2P DMA
> > access between them. Therefore, introduce a small function named
> > pci_devs_are_p2pdma_compatible() to determine if the provider and
> > client belong to the same device and facilitate P2P DMA between
> > them by not enforcing the ACS check.
> >
> > v2:
> > - Relax the enforcement of the ACS check only for Intel GPU functions
> >   as they are P2PDMA compatible given the way the PF provisions
> >   the resources among multiple VFs.
> 
> I don't want version history in the commit log.  If the content is
> useful, just incorporate it here directly (without the version info),
> and put the version-to-version changelog below the "---".
Ok, noted; will follow your suggestion for the next versions.

> 
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Logan Gunthorpe <logang@deltatee.com>
> > Cc: <linux-pci@vger.kernel.org>
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > ---
> >  drivers/pci/p2pdma.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> > index 4f47a13cb500..a230e661f939 100644
> > --- a/drivers/pci/p2pdma.c
> > +++ b/drivers/pci/p2pdma.c
> > @@ -535,6 +535,17 @@ static unsigned long map_types_idx(struct pci_dev
> *client)
> >  	return (pci_domain_nr(client->bus) << 16) | pci_dev_id(client);
> >  }
> >
> > +static bool pci_devs_are_p2pdma_compatible(struct pci_dev *provider,
> > +					   struct pci_dev *client)
> > +{
> > +	if (provider->vendor == PCI_VENDOR_ID_INTEL) {
> > +		if (pci_is_vga(provider) && pci_is_vga(client))
> > +			return pci_physfn(provider) == pci_physfn(client);
> > +	}
> 
> This doesn't explain why this should be specific to Intel or VGA.  As
> far as I can tell, everything mentioned in the commit log is generic.
> 
> I see the previous comments
> (https://lore.kernel.org/all/eddb423c-945f-40c9-b904-
> 43ea8371f1c4@deltatee.com/),
> but none of that context was captured here.
Ok, I'll augment the commit message to include this context.

> 
> I'm not sure what you refer to by "PF provisions resources for the
> VF".  Isn't it *always* the case that the architected PCI resources
> (BARs) are configured by the PF?  It sounds like you're referring to
> something Intel GPU-specific beyond that?
What I meant to say is that since PF provisions the resources for the VF
in a typical scenario, they should be automatically P2PDMA compatible
particularly when the provider is the VF and PF is the client. However,
since this cannot be guaranteed on all the PCI devices out there for various
reasons, my objective is to start including the ones that can be tested and
are known to be compatible (Intel GPUs).

I'll capture these additional details in the next version.

Thanks,
Vivek

> 
> > +	return false;
> > +}
> > +
> >  /*
> >   * Calculate the P2PDMA mapping type and distance between two PCI
> devices.
> >   *
> > @@ -634,7 +645,7 @@ calc_map_type_and_dist(struct pci_dev *provider,
> struct pci_dev *client,
> >
> >  	*dist = dist_a + dist_b;
> >
> > -	if (!acs_cnt) {
> > +	if (!acs_cnt || pci_devs_are_p2pdma_compatible(provider, client)) {
> >  		map_type = PCI_P2PDMA_MAP_BUS_ADDR;
> >  		goto done;
> >  	}
> > @@ -696,7 +707,9 @@ int pci_p2pdma_distance_many(struct pci_dev
> *provider, struct device **clients,
> >  		return -1;
> >
> >  	for (i = 0; i < num_clients; i++) {
> > -		pci_client = find_parent_pci_dev(clients[i]);
> > +		pci_client = dev_is_pf(clients[i]) ?
> > +				pci_dev_get(to_pci_dev(clients[i])) :
> > +				find_parent_pci_dev(clients[i]);
> >  		if (!pci_client) {
> >  			if (verbose)
> >  				dev_warn(clients[i],
> > --
> > 2.45.1
> >


* Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-24  5:50       ` Kasireddy, Vivek
@ 2024-10-24 16:21         ` Logan Gunthorpe
  2024-10-24 18:01           ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Logan Gunthorpe @ 2024-10-24 16:21 UTC (permalink / raw)
  To: Kasireddy, Vivek, Bjorn Helgaas
  Cc: dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Bjorn Helgaas, linux-pci@vger.kernel.org



On 2024-10-23 23:50, Kasireddy, Vivek wrote:
>> I'd echo many of Bjorn's concerns. In addition, I think the name of the
>> pci_devs_are_p2pdma_compatible() isn't quite right. Specifically this is
>> dealing with PCI functions within a single device that are known to
>> allow P2P traffic. So I think the name should probably reflect that.
> Would pci_devfns_support_p2pdma() be a more appropriate name?

That sounds better to me, thanks.

Logan


* Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-24  5:58     ` Kasireddy, Vivek
@ 2024-10-24 17:59       ` Bjorn Helgaas
  2024-10-25  6:57         ` Kasireddy, Vivek
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2024-10-24 17:59 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Bjorn Helgaas, Logan Gunthorpe, linux-pci@vger.kernel.org

On Thu, Oct 24, 2024 at 05:58:48AM +0000, Kasireddy, Vivek wrote:
> > Subject: Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for
> > functions of same device
> > 
> > On Sun, Oct 20, 2024 at 10:21:29PM -0700, Vivek Kasireddy wrote:
> > > Functions of the same PCI device (such as a PF and a VF) share the
> > > same bus and have a common root port and typically, the PF provisions
> > > resources for the VF. Therefore, they can be considered compatible
> > > as far as P2P access is concerned.

I don't understand the "therefore they can be considered compatible"
conclusion.  The spec quote below seems like it addresses exactly this
situation: it says ACS can control peer-to-peer requests between VFs.

> ...
> > I'm not sure what you refer to by "PF provisions resources for the
> > VF".  Isn't it *always* the case that the architected PCI
> > resources (BARs) are configured by the PF?  It sounds like you're
> > referring to something Intel GPU-specific beyond that?
>
> What I meant to say is that since PF provisions the resources for
> the VF in a typical scenario,

Are you talking about BARs?  As far as I know, the PF BAR assignments
always (not just in typical scenarios) determine the VF BAR
assignments.  

Or are you referring to some other non-BAR resources?

> they should be automatically P2PDMA compatible particularly when the
> provider is the VF and PF is the client. However, since this cannot
> be guaranteed on all the PCI devices out there for various reasons,
> my objective is to start including the ones that can be tested and
> are known to be compatible (Intel GPUs).

Regardless of BAR or other VF resources, I don't think VFs are
automatically P2PDMA compatible.  For example, PCIe r6.0, sec 6.12.1.2
says:

  For ACS requirements, single-Function devices that are SR-IOV
  capable must be handled as if they were Multi-Function Devices.

  ...

  - ACS P2P Request Redirect: must be implemented by Functions that
    support peer-to-peer traffic with other Functions. This includes
    SR-IOV Virtual Functions (VFs).  ACS P2P Request Redirect is
    subject to interaction with the ACS P2P Egress Control and ACS
    Direct Translated P2P mechanisms (if implemented). Refer to
    Section 6.12.3 for more information.  When ACS P2P Request
    Redirect is enabled in a Multi-Function Device that is not an
    RCiEP, peer-to-peer Requests (between Functions of the device)
    must be redirected Upstream towards the RC.

Or do you mean something else by "P2PDMA compatible"?

Bjorn


* Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-24 16:21         ` Logan Gunthorpe
@ 2024-10-24 18:01           ` Bjorn Helgaas
  0 siblings, 0 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2024-10-24 18:01 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kasireddy, Vivek, dri-devel@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Bjorn Helgaas,
	linux-pci@vger.kernel.org

On Thu, Oct 24, 2024 at 10:21:17AM -0600, Logan Gunthorpe wrote:
> On 2024-10-23 23:50, Kasireddy, Vivek wrote:
> >> I'd echo many of Bjorn's concerns. In addition, I think the name of the
> >> pci_devs_are_p2pdma_compatible() isn't quite right. Specifically this is
> >> dealing with PCI functions within a single device that are known to
> >> allow P2P traffic. So I think the name should probably reflect that.
> >
> > Would pci_devfns_support_p2pdma() be a more appropriate name?
> 
> That sounds better to me, thanks.

This sounds similar to what's done in pci_dev_specific_acs_enabled().
Could this problem be solved by adding more device-specific quirks
there?

Bjorn


* RE: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-24 17:59       ` Bjorn Helgaas
@ 2024-10-25  6:57         ` Kasireddy, Vivek
  2024-10-30 18:46           ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Kasireddy, Vivek @ 2024-10-25  6:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Bjorn Helgaas, Logan Gunthorpe, linux-pci@vger.kernel.org

Hi Bjorn,

> Subject: Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for
> functions of same device
> 
> On Thu, Oct 24, 2024 at 05:58:48AM +0000, Kasireddy, Vivek wrote:
> > > Subject: Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for
> > > functions of same device
> > >
> > > On Sun, Oct 20, 2024 at 10:21:29PM -0700, Vivek Kasireddy wrote:
> > > > Functions of the same PCI device (such as a PF and a VF) share the
> > > > same bus and have a common root port and typically, the PF provisions
> > > > resources for the VF. Therefore, they can be considered compatible
> > > > as far as P2P access is considered.
> 
> I don't understand the "therefore they can be considered compatible"
> conclusion.  The spec quote below seems like it addresses exactly this
> situation: it says ACS can control peer-to-peer requests between VFs.
I am only referring to the specific case where the PF is trying to access (P2P)
a VF's resource that the PF itself has provisioned. Shouldn't this be considered
a valid access?

> 
> > ...
> > > I'm not sure what you refer to by "PF provisions resources for the
> > > VF".  Isn't it *always* the case that the architected PCI
> > > resources (BARs) are configured by the PF?  It sounds like you're
> > > referring to something Intel GPU-specific beyond that?
> >
> > What I meant to say is that since PF provisions the resources for
> > the VF in a typical scenario,
> 
> Are you talking about BARs?  As far as I know, the PF BAR assignments
> always (not just in typical scenarios) determine the VF BAR
> assignments.
Right, I am indeed talking about BARs.

> 
> Or are you referring to some other non-BAR resources?
> 
> > they should be automatically P2PDMA compatible particularly when the
> > provider is the VF and PF is the client. However, since this cannot
> > be guaranteed on all the PCI devices out there for various reasons,
> > my objective is to start including the ones that can be tested and
> > are known to be compatible (Intel GPUs).
> 
> Regardless of BAR or other VF resources, I don't think VFs are
> automatically P2PDMA compatible.
I agree that VFs in general are not automatically P2PDMA compatible
but a PF and a VF should be considered compatible particularly when the
provider is a VF and PF is the client.

> For example, PCIe r6.0, sec 6.12.1.2  says:
> 
>   For ACS requirements, single-Function devices that are SR-IOV
>   capable must be handled as if they were Multi-Function Devices.
> 
>   ...
> 
>   - ACS P2P Request Redirect: must be implemented by Functions that
>     support peer-to-peer traffic with other Functions. This includes
>     SR-IOV Virtual Functions (VFs).  ACS P2P Request Redirect is
>     subject to interaction with the ACS P2P Egress Control and ACS
>     Direct Translated P2P mechanisms (if implemented). Refer to
>     Section 6.12.3 for more information.  When ACS P2P Request
>     Redirect is enabled in a Multi-Function Device that is not an
>     RCiEP, peer-to-peer Requests (between Functions of the device)
>     must be redirected Upstream towards the RC.
> 
> Or do you mean something else by "P2PDMA compatible"?
I am no longer making any generic claims about devices' P2PDMA
compatibility. Instead, as mentioned above, I am only focused on the
interactions between a PF (client) and a VF (provider), particularly with
Intel GPUs. 

More specifically, I am trying to address a use-case where the VF needs to
share a buffer with the PF but is unsuccessful because
pci_p2pdma_distance_many(provider, client, 1, true) fails (due to ACS
redirect being set), although the buffer is located within a BAR resource
that the PF has provisioned and has full access to. Shouldn't this be
allowed?

Thanks,
Vivek

> 
> Bjorn


* Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-25  6:57         ` Kasireddy, Vivek
@ 2024-10-30 18:46           ` Bjorn Helgaas
  2024-10-30 21:20             ` Logan Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2024-10-30 18:46 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Bjorn Helgaas, Logan Gunthorpe, linux-pci@vger.kernel.org

On Fri, Oct 25, 2024 at 06:57:37AM +0000, Kasireddy, Vivek wrote:
> > Subject: Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for
> > functions of same device
> > 
> > On Thu, Oct 24, 2024 at 05:58:48AM +0000, Kasireddy, Vivek wrote:
> > > > Subject: Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for
> > > > functions of same device
> > > >
> > > > On Sun, Oct 20, 2024 at 10:21:29PM -0700, Vivek Kasireddy wrote:
> > > > > Functions of the same PCI device (such as a PF and a VF) share the
> > > > > same bus and have a common root port and typically, the PF provisions
> > > > > resources for the VF. Therefore, they can be considered compatible
> > > > > as far as P2P access is concerned.
> > 
> > I don't understand the "therefore they can be considered compatible"
> > conclusion.  The spec quote below seems like it addresses exactly this
> > situation: it says ACS can control peer-to-peer requests between VFs.
>
> I am only referring to the specific case where the PF is trying to
> access (P2P) a VF's resource that the PF itself has provisioned.
> Shouldn't this be considered a valid access?
> 
> > > ...
> > > > I'm not sure what you refer to by "PF provisions resources for the
> > > > VF".  Isn't it *always* the case that the architected PCI
> > > > resources (BARs) are configured by the PF?  It sounds like you're
> > > > referring to something Intel GPU-specific beyond that?
> > >
> > > What I meant to say is that since PF provisions the resources for
> > > the VF in a typical scenario,
> > 
> > Are you talking about BARs?  As far as I know, the PF BAR assignments
> > always (not just in typical scenarios) determine the VF BAR
> > assignments.
>
> Right, I am indeed talking about BARs.
> 
> > Or are you referring to some other non-BAR resources?
> > 
> > > they should be automatically P2PDMA compatible particularly when the
> > > provider is the VF and PF is the client. However, since this cannot
> > > be guaranteed on all the PCI devices out there for various reasons,
> > > my objective is to start including the ones that can be tested and
> > > are known to be compatible (Intel GPUs).
> > 
> > Regardless of BAR or other VF resources, I don't think VFs are
> > automatically P2PDMA compatible.
>
> I agree that VFs in general are not automatically P2PDMA compatible
> but a PF and a VF should be considered compatible particularly when the
> provider is a VF and PF is the client.
> 
> > For example, PCIe r6.0, sec 6.12.1.2  says:
> > 
> >   For ACS requirements, single-Function devices that are SR-IOV
> >   capable must be handled as if they were Multi-Function Devices.
> > 
> >   ...
> > 
> >   - ACS P2P Request Redirect: must be implemented by Functions that
> >     support peer-to-peer traffic with other Functions. This includes
> >     SR-IOV Virtual Functions (VFs).  ACS P2P Request Redirect is
> >     subject to interaction with the ACS P2P Egress Control and ACS
> >     Direct Translated P2P mechanisms (if implemented). Refer to
> >     Section 6.12.3 for more information.  When ACS P2P Request
> >     Redirect is enabled in a Multi-Function Device that is not an
> >     RCiEP, peer-to-peer Requests (between Functions of the device)
> >     must be redirected Upstream towards the RC.
> > 
> > Or do you mean something else by "P2PDMA compatible"?
>
> I am no longer making any generic claims about devices' P2PDMA
> compatibility. Instead, as mentioned above, I am only focused on the
> interactions between a PF (client) and a VF (provider), particularly
> with Intel GPUs. 
> 
> More specifically, I am trying to address a use-case where the VF
> needs to share a buffer with the PF but is unsuccessful because
> pci_p2pdma_distance_many(provider, client, 1, true) fails (due to
> ACS redirect being set) although the buffer is located within a BAR
> resource that the PF has provisioned and has full access to it.
> Shouldn't this be allowed?

IIUC you want the PF to be able to initiate a transaction on the PCIe
link to access a VF BAR.  The address in that TLP will be inside the
VF BAR (and also inside the space defined by the VF BAR<n> and the
NumVFs value in the PF's SR-IOV Capability).

In the PCIe world, I don't think a TLP can "loop back" to another
function on the same device.  I think it has to go upstream at least
to the Port above the originating function.  The Port works like a
PCI-PCI bridge.  The TLP address will be inside a Port memory window,
so in the absence of ACS, the Port would reflect the TLP back down the
same link it came from.  I'm pretty sure an analyzer on the link would
see two distinct TLPs.

But as far as I can tell, when ACS P2P Request Redirect is enabled,
the spec requires that the Port forward the TLP upstream (regardless
of the TLP address) instead of reflecting it back to the downstream
link.

Do you read the spec differently?

Bjorn
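
The routing behavior described above can be sketched as a toy model in user-space C. This is an illustration of the spec reading under discussion, not real hardware or kernel code; the struct and field names are made up, and real Port routing involves far more state.

```c
/* A peer-to-peer TLP arriving at a Downstream Port whose address falls
 * inside the Port's memory window is reflected back down the link,
 * unless ACS P2P Request Redirect is enabled, in which case it must be
 * forwarded upstream toward the Root Complex.
 */
#include <stdbool.h>
#include <stdint.h>

enum tlp_route { ROUTE_REFLECT_DOWNSTREAM, ROUTE_FORWARD_UPSTREAM };

struct dsp {				/* Downstream Port */
	uint64_t window_start, window_end;	/* memory window */
	bool acs_p2p_request_redirect;
};

static enum tlp_route route_p2p_tlp(const struct dsp *port, uint64_t addr)
{
	/* An address outside the window always routes upstream. */
	if (addr < port->window_start || addr >= port->window_end)
		return ROUTE_FORWARD_UPSTREAM;
	/* Inside the window: ACS P2P RR forces the TLP upstream too. */
	return port->acs_p2p_request_redirect ? ROUTE_FORWARD_UPSTREAM
					      : ROUTE_REFLECT_DOWNSTREAM;
}
```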


* Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-30 18:46           ` Bjorn Helgaas
@ 2024-10-30 21:20             ` Logan Gunthorpe
  2024-10-30 22:07               ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Logan Gunthorpe @ 2024-10-30 21:20 UTC (permalink / raw)
  To: Bjorn Helgaas, Kasireddy, Vivek
  Cc: dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Bjorn Helgaas, linux-pci@vger.kernel.org



On 2024-10-30 12:46, Bjorn Helgaas wrote:
> On Fri, Oct 25, 2024 at 06:57:37AM +0000, Kasireddy, Vivek wrote:
> In the PCIe world, I don't think a TLP can "loop back" to another
> function on the same device.

I'm not sure if the spec says anything that specifically denies this.
But it seems to me that it would be possible for a multifunction device
to handle a transfer to a neighbouring function internally and not
actually involve the PCIe fabric. This seems like something we'd want to
support if and when such a device were to be created.

Logan


* Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-30 21:20             ` Logan Gunthorpe
@ 2024-10-30 22:07               ` Bjorn Helgaas
  2024-10-31  6:59                 ` Kasireddy, Vivek
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2024-10-30 22:07 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kasireddy, Vivek, dri-devel@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Bjorn Helgaas,
	linux-pci@vger.kernel.org

On Wed, Oct 30, 2024 at 03:20:02PM -0600, Logan Gunthorpe wrote:
> On 2024-10-30 12:46, Bjorn Helgaas wrote:
> > On Fri, Oct 25, 2024 at 06:57:37AM +0000, Kasireddy, Vivek wrote:
> > In the PCIe world, I don't think a TLP can "loop back" to another
> > function on the same device.
> 
> I'm not sure if the spec says anything that specifically denies this.

I'm not a hardware guy and I don't know if there's a direct statement
about it, but if a Downstream Port supports ACS, it must support ACS
P2P Request Redirect (PCIe r6.0, sec 6.12.1.1), which specifically
applies to peer-to-peer TLPs.

If peer-to-peer TLPs appear on the link, the Downstream Port will see
them and act on them, e.g., either route them upstream (if P2P Request
Redirect is enabled) or back downstream.  I don't think the VF could
act on them directly via a loopback path because that would lead to
duplicate writes and duplicate Completions for reads.

> But it seems to me that it would be possible for a multifunction device
> to handle a transfer to a neighbouring function internally and not
> actually involve the PCIe fabric. This seems like something we'd want to
> support if and when such a device were to be created.

If peer-to-peer transactions are handled internally, an SR-IOV device
other than an RCiEP is required to support ACS with P2P Egress Control
(sec 7.7.11) and P2P Request Redirect (sec 7.7.11.2).

Bjorn


* RE: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device
  2024-10-30 22:07               ` Bjorn Helgaas
@ 2024-10-31  6:59                 ` Kasireddy, Vivek
  0 siblings, 0 replies; 24+ messages in thread
From: Kasireddy, Vivek @ 2024-10-31  6:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Logan Gunthorpe
  Cc: dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Bjorn Helgaas, linux-pci@vger.kernel.org

Hi Bjorn,

> Subject: Re: [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for
> functions of same device
> 
> On Wed, Oct 30, 2024 at 03:20:02PM -0600, Logan Gunthorpe wrote:
> > On 2024-10-30 12:46, Bjorn Helgaas wrote:
> > > On Fri, Oct 25, 2024 at 06:57:37AM +0000, Kasireddy, Vivek wrote:
> > > In the PCIe world, I don't think a TLP can "loop back" to another
> > > function on the same device.
> >
> > I'm not sure if the spec says anything that specifically denies this.
> 
> I'm not a hardware guy and I don't know if there's a direct statement
> about it, but if a Downstream Port supports ACS, it must support ACS
> P2P Request Redirect (PCIe r6.0, sec 6.12.1.1), which specifically
> applies to peer-to-peer TLPs.
> 
> If peer-to-peer TLPs appear on the link, the Downstream Port will see
> them and act on them, e.g., either route them upstream (if P2P Request
> Redirect is enabled) or back downstream.  I don't think the VF could
> act on them directly via a loopback path because that would lead to
> duplicate writes and duplicate Completions for reads.
> 
> > But it seems to me that it would be possible for a multifunction device
> > to handle a transfer to a neighbouring function internally and not
> > actually involve the PCIe fabric. This seems like something we'd want to
> > support if and when such a device were to be created.
> 
> If peer-to-peer transactions are handled internally, an SR-IOV device
> other than an RCiEP is required to support ACS with P2P Egress Control
> (sec 7.7.11) and P2P Request Redirect (sec 7.7.11.2).
As Logan suggests, my use-case does not involve using the PCIe fabric to
accomplish the DMA access between PF and VF. Instead, the PF's (GPU) driver
handles the VF's BAR addresses (associated with the buffer) by translating
them into a different (internal and local) address space before facilitating
access.
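
The translation in question can be sketched as a toy model. The vf_lmem struct, the field names, and all values below are hypothetical placeholders for illustration, not actual xe driver code.

```c
/* The PF driver maps an address inside a VF BAR back to the
 * device-physical address (DPA) of the LMEM slice it provisioned for
 * that VF, so the GPU can access the buffer locally instead of going
 * through the PCIe fabric.
 */
#include <stdbool.h>
#include <stdint.h>

struct vf_lmem {
	uint64_t bar_base;	/* VF BAR address as seen on the bus */
	uint64_t lmem_base;	/* DPA of the LMEM slice backing the BAR */
	uint64_t size;
};

/* Returns true and fills *dpa when addr lies inside the VF's BAR. */
static bool vf_bar_to_dpa(const struct vf_lmem *vf, uint64_t addr,
			  uint64_t *dpa)
{
	if (addr < vf->bar_base || addr - vf->bar_base >= vf->size)
		return false;
	*dpa = vf->lmem_base + (addr - vf->bar_base);
	return true;
}
```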

To articulate further, in my use-case, there is a driver A (that is bound to VF)
that needs to share a buffer with driver B (associated with the PF). However,
driver A would like to know if B can access its buffer or not, so it calls
pci_p2pdma_distance(A, B, ...) to check, as both PF and VF are PCI devices.
But given that pci_p2pdma_distance_many() uses ACS as the main criterion
(in addition to the bridge whitelist and a specific CPU type), it determines
that the access is invalid (as ACS redirect is set).

IIUC, it appears that pci_p2pdma_distance_many() is not the right tool for
checking access validity in my use-case, as it assumes the provider and
client would always use the PCIe fabric for DMA. I think it either needs
to be augmented to handle various situations or a new helper is needed.
What is your recommended solution for this issue?

Thanks,
Vivek

> 
> Bjorn


end of thread, other threads:[~2024-10-31  6:59 UTC | newest]

Thread overview: 24+ messages
2024-10-21  5:21 [PATCH v2 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
2024-10-21  5:21 ` [PATCH v2 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device Vivek Kasireddy
2024-10-22 15:16   ` Bjorn Helgaas
2024-10-22 21:15     ` Logan Gunthorpe
2024-10-24  5:50       ` Kasireddy, Vivek
2024-10-24 16:21         ` Logan Gunthorpe
2024-10-24 18:01           ` Bjorn Helgaas
2024-10-24  5:58     ` Kasireddy, Vivek
2024-10-24 17:59       ` Bjorn Helgaas
2024-10-25  6:57         ` Kasireddy, Vivek
2024-10-30 18:46           ` Bjorn Helgaas
2024-10-30 21:20             ` Logan Gunthorpe
2024-10-30 22:07               ` Bjorn Helgaas
2024-10-31  6:59                 ` Kasireddy, Vivek
2024-10-21  5:21 ` [PATCH v2 2/5] drm/xe/dmabuf: Don't migrate BO to System RAM while running in VF mode Vivek Kasireddy
2024-10-21  5:21 ` [PATCH v2 3/5] drm/xe/pf: Add a helper function to get a VF's backing object in LMEM Vivek Kasireddy
2024-10-21  5:21 ` [PATCH v2 4/5] drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with VFs Vivek Kasireddy
2024-10-22 10:12   ` kernel test robot
2024-10-22 10:54   ` kernel test robot
2024-10-21  5:21 ` [PATCH v2 5/5] drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs Vivek Kasireddy
2024-10-22 12:58   ` kernel test robot
2024-10-21  5:52 ` ✓ CI.Patch_applied: success for drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM (rev2) Patchwork
2024-10-21  5:52 ` ✗ CI.checkpatch: warning " Patchwork
2024-10-21  5:52 ` ✗ CI.KUnit: failure " Patchwork
