Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
Cc: "dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v1 4/5] drm/xe/bo: Create a new sg for dmabuf BOs that are associated with a VF
Date: Sun, 20 Oct 2024 18:06:12 +0000	[thread overview]
Message-ID: <ZxVGlKBvC3eAzqRy@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <IA0PR11MB71853C7F2EB0A6677892FC3FF8462@IA0PR11MB7185.namprd11.prod.outlook.com>

On Tue, Oct 15, 2024 at 11:41:56PM -0600, Kasireddy, Vivek wrote:
> Hi Matt,
> 
> > 
> > On Fri, Oct 11, 2024 at 07:40:26PM -0700, Vivek Kasireddy wrote:
> > > For BOs of type ttm_bo_type_sg, that are backed by PCI BAR addresses
> > > associated with a VF, we need to adjust and translate these addresses
> > > to LMEM addresses to make the BOs usable by the PF. Otherwise, the BOs
> > > (i.e, PCI BAR addresses) are only accessible by the CPU and not by
> > > the GPU.
> > >
> > > In order to do the above, we first need to identify if the DMA addresses
> > > associated with an imported BO (type ttm_bo_type_sg) belong to System
> > > RAM or a VF or other PCI device. After we confirm that they belong to
> > > a VF, we convert the DMA addresses (IOVAs in this case) to DPAs and
> > > create a new sg and populate it with the new addresses.
> > 
> > I think using a SG list is a no-go. We have received pushback before [1]
> > about using a SG list as a structure to hold DPAs rather than DMA
> > addresses. The consensus was that a SG list is not a generic structure
> > to hold arbitrary addresses [2], but rather one specific to DMA addresses.
> AFAIU, that would make sense if the SG list were exposed outside of the Xe
> driver, but the SG list created by this patch is only used internally by
> the Xe driver. Would this still not be acceptable?
> 

I'd check with Thomas on this. I'm not hugely opposed, but if I remember
correctly, Thomas agrees with Jason in the links below that SG lists
shouldn't be used like this.

Certainly, making this internal to Xe makes it safer, though.

> > 
> > I'm pretty sure we will have to define a new BO type (ttm_bo_type_devmem?)
> > and a structure that can be iterated on if we want to do something like
> > this, unless we want to ignore the above feedback.
> Right, if using SG list is a no-go, then using some form of xe_res_cursor to
> iterate seems like the only other option.
>

Ah, yes. Since this is internal to Xe, it may be easy enough to use a
different cursor type here rather than touching TTM.
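To illustrate the shape such an Xe-internal cursor could take, here is a minimal userspace sketch (all names hypothetical, modeled loosely on the xe_res_cursor start/next pattern; this is not driver code, just an assumption about what the interface might look like):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment: one contiguous DPA range, analogous to a buddy block. */
struct dpa_seg {
	uint64_t dpa;
	uint64_t len;
};

/*
 * Minimal cursor modeled on the xe_res_cursor start/next pattern:
 * walk a list of segments, honoring a byte offset into the range.
 */
struct dpa_cursor {
	const struct dpa_seg *segs;
	size_t nsegs, i;
	uint64_t start;		/* offset within current segment */
	uint64_t remaining;	/* bytes left to walk */
};

static void dpa_cursor_init(struct dpa_cursor *cur, const struct dpa_seg *segs,
			    size_t nsegs, uint64_t offset, uint64_t size)
{
	cur->segs = segs;
	cur->nsegs = nsegs;
	cur->i = 0;
	/* Skip whole segments covered by the initial offset. */
	while (cur->i < nsegs && offset >= segs[cur->i].len)
		offset -= segs[cur->i++].len;
	cur->start = offset;
	cur->remaining = size;
}

static uint64_t dpa_cursor_addr(const struct dpa_cursor *cur)
{
	return cur->segs[cur->i].dpa + cur->start;
}

static uint64_t dpa_cursor_chunk(const struct dpa_cursor *cur)
{
	uint64_t in_seg = cur->segs[cur->i].len - cur->start;

	return cur->remaining < in_seg ? cur->remaining : in_seg;
}

static void dpa_cursor_next(struct dpa_cursor *cur, uint64_t consumed)
{
	cur->remaining -= consumed;
	cur->start += consumed;
	if (cur->start == cur->segs[cur->i].len) {
		cur->i++;
		cur->start = 0;
	}
}
```

A caller would loop while cur.remaining is non-zero, reading dpa_cursor_addr()/dpa_cursor_chunk() and advancing with dpa_cursor_next(), much like code that iterates an xe_res_cursor today.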
 
> > 
> > [1] https://patchwork.freedesktop.org/patch/574894/?series=128910&rev=1
> > [2] https://patchwork.freedesktop.org/patch/574894/?series=128910&rev=1#comment_1070889
> > 
> > >
> > > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_bo.c       | 108 ++++++++++++++++++++++++++++++-
> > >  drivers/gpu/drm/xe/xe_bo_types.h |   6 ++
> > >  2 files changed, 113 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > index c74c121ea7bb..64efe1b21f19 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -6,6 +6,7 @@
> > >  #include "xe_bo.h"
> > >
> > >  #include <linux/dma-buf.h>
> > > +#include <linux/iommu.h>
> > >
> > >  #include <drm/drm_drv.h>
> > >  #include <drm/drm_gem_ttm_helper.h>
> > > @@ -15,16 +16,19 @@
> > >  #include <drm/ttm/ttm_tt.h>
> > >  #include <uapi/drm/xe_drm.h>
> > >
> > > +#include "regs/xe_bars.h"
> > >  #include "xe_device.h"
> > >  #include "xe_dma_buf.h"
> > >  #include "xe_drm_client.h"
> > >  #include "xe_ggtt.h"
> > >  #include "xe_gt.h"
> > > +#include "xe_gt_sriov_pf_config.h"
> > >  #include "xe_map.h"
> > >  #include "xe_migrate.h"
> > >  #include "xe_pm.h"
> > >  #include "xe_preempt_fence.h"
> > >  #include "xe_res_cursor.h"
> > > +#include "xe_sriov_pf_helpers.h"
> > >  #include "xe_trace_bo.h"
> > >  #include "xe_ttm_stolen_mgr.h"
> > >  #include "xe_vm.h"
> > > @@ -543,6 +547,102 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
> > >  	return ret;
> > >  }
> > >
> > > +static struct pci_dev *xe_find_vf_dev(struct xe_device *xe,
> > > +				      phys_addr_t phys)
> > > +{
> > > +	struct pci_dev *pdev, *pf_pdev = to_pci_dev(xe->drm.dev);
> > > +	resource_size_t io_start, io_size;
> > > +
> > > +	list_for_each_entry(pdev, &pf_pdev->bus->devices, bus_list) {
> > > +		if (pdev->is_physfn)
> > > +			continue;
> > > +
> > > +		io_start = pci_resource_start(pdev, LMEM_BAR);
> > > +		io_size = pci_resource_len(pdev, LMEM_BAR);
> > > +
> > > +		if (phys >= io_start &&
> > > +		    phys < (io_start + io_size - PAGE_SIZE))
> > > +			return pdev;
> > > +	}
> > > +	return NULL;
> > > +}
> > > +
> > > +
> > > +static void xe_bo_translate_iova_to_dpa(struct xe_device *xe,
> > > +					struct sg_table *sg,
> > > +					struct sg_table *new_sg,
> > > +					struct pci_dev *pdev)
> > > +{
> > > +	resource_size_t io_start = pci_resource_start(pdev, LMEM_BAR);
> > > +	struct xe_gt *gt = xe_root_mmio_gt(xe);
> > > +	struct scatterlist *sgl, *new_sgl;
> > > +	int i, vfid = pci_iov_vf_id(pdev);
> > > +	dma_addr_t new_addr, bo_addr;
> > > +	struct iommu_domain *domain;
> > > +	phys_addr_t phys;
> > > +	u64 offset;
> > > +
> > > +	bo_addr = xe_gt_sriov_pf_config_get_lmem_addr(gt, ++vfid);
> > > +	domain = iommu_get_domain_for_dev(xe->drm.dev);
> > > +
> > > +	new_sgl = new_sg->sgl;
> > > +	for_each_sgtable_dma_sg(sg, sgl, i) {
> > 
> > I'm pretty sure this doesn't work if a single dma address of the input
> > 'sg' resolves to a non-contiguous physical DPA. In most cases this is
> > going to be contiguous though unless there is memory pressure or
> > unaligned allocation sizes. Assuming your testing didn't blow up, you
> > may not have hit a memory pressure situation where VRAM gets
> > fragmented.
> Yeah, I did not test exhaustively and also did not realize that the BO
> (config->lmem_obj) backing the VF's lmem quota may not be contiguous.
> I'll try to figure out a way to test this scenario. However, for each SG entry,
> if I do something like:
> 
> offset = phys - io_start;
> new_addr = xe_bo_addr(config->lmem_obj, offset, sg_dma_len(sgl));
> 
> instead of
> 
> bo_addr = xe_bo_addr(config->lmem_obj, 0, PAGE_SIZE);
> offset = phys - io_start;
> new_addr = bo_addr + offset;
> 
> I believe this would probably work as expected even when lmem_obj is not
> contiguous (as it probes the buddy blocks) or if input SG has non-contiguous
> physical (DPA) ranges.
> 
> > 
> > I think iommu_iova_to_phys is only accurate for exactly one page,
> > unless I'm missing something. See [3].
> IIUC, it seems to be valid for the segment of size sg_dma_len(sgl), for a given
> SG entry.
> 

So the SG list you are remapping here is set up in xe_dma_buf_map,
specifically for the VRAM case of xe_ttm_vram_mgr_alloc_sgt, right?

Yes, it appears each segment (sg_dma_len(sgl)) is a buddy block, so it
is contiguous. This does work, but it makes a lot of assumptions about
the lower layers, which is not ideal.

Each segment is still a DMA address, so it’s possible that
iommu_iova_to_phys is only accurate for exactly one page within the
segment. I'd prefer to code for that behavior, which will always work,
rather than making assumptions about a lower layer.
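To make that concrete, here is a small userspace model of the failure mode (mock translator and hypothetical names, not the kernel API): one IOVA segment whose backing pages are not physically contiguous, where translating only the segment start and adding an offset goes wrong, while a per-page lookup stays correct:

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 0x1000u

/*
 * Mock stand-in for iommu_iova_to_phys(): a 3-page IOVA segment whose
 * backing pages are NOT all physically contiguous (page 2 jumps away).
 */
static uint64_t mock_iova_to_phys(uint64_t iova)
{
	static const uint64_t page_phys[] = { 0x100000, 0x101000, 0x250000 };

	return page_phys[iova / MOCK_PAGE_SIZE] + (iova % MOCK_PAGE_SIZE);
}

/* Segment-level shortcut: translate the first page once, assume contiguity. */
static uint64_t segment_offset_translate(uint64_t seg_iova, uint64_t offset)
{
	return mock_iova_to_phys(seg_iova) + offset;
}

/* Per-page translation: look up every page; correct regardless of layout. */
static uint64_t per_page_translate(uint64_t seg_iova, uint64_t offset)
{
	return mock_iova_to_phys(seg_iova + offset);
}
```

Within the first two (contiguous) pages both methods agree; at the third page the shortcut silently returns the wrong physical address, which is exactly the hazard of assuming anything about the lower layers.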

With all of the above, can we add a page-by-page DPA cursor? Thomas has
written one for SVM [4] [5]. That code will take a while to land, but I
think it can give you a template to build a cursor like this. Once that
lands, maybe we can combine these cursors.
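As a rough sketch of what that page-by-page walk might produce, here is a userspace model (mock translator, hypothetical names; the real version would build on xe_res_cursor-style plumbing) that translates one DMA segment page by page and coalesces contiguous DPAs into runs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG 0x1000u

struct run {
	uint64_t dpa;
	uint64_t len;
};

/* Mock IOVA->phys lookup; stands in for iommu_iova_to_phys(). */
typedef uint64_t (*iova_to_phys_fn)(uint64_t iova);

/*
 * Walk one DMA segment page by page, translate each page to a DPA
 * (phys - io_start + vf_lmem_base), and coalesce physically contiguous
 * pages into runs. Returns the number of runs written.
 */
static size_t segment_to_dpa_runs(uint64_t iova, uint64_t len,
				  iova_to_phys_fn xlate,
				  uint64_t io_start, uint64_t lmem_base,
				  struct run *runs, size_t max_runs)
{
	size_t n = 0;

	for (uint64_t off = 0; off < len; off += PG) {
		uint64_t dpa = xlate(iova + off) - io_start + lmem_base;

		if (n && runs[n - 1].dpa + runs[n - 1].len == dpa) {
			runs[n - 1].len += PG;	/* extend contiguous run */
		} else if (n < max_runs) {
			runs[n].dpa = dpa;
			runs[n].len = PG;
			n++;
		}
	}
	return n;
}
```

The output runs could then feed whatever internal structure replaces the SG list, with no contiguity assumption about either the input segment or the VF's lmem_obj.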

Matt

[4] https://patchwork.freedesktop.org/patch/619812/?series=137870&rev=2
[5] https://patchwork.freedesktop.org/patch/619845/?series=137870&rev=2

> Thanks,
> Vivek
> 
> > 
> > [3] https://elixir.bootlin.com/linux/v6.11.3/source/drivers/iommu/iommu.c#L2376
> > 
> > Matt
> > 
> > > +		phys = domain ? iommu_iova_to_phys(domain, sg_dma_address(sgl)) :
> > > +			sg_dma_address(sgl);
> > > +		offset = phys - io_start;
> > > +		new_addr = bo_addr + offset;
> > > +
> > > +		sg_set_page(new_sgl, NULL, sg_dma_len(sgl), 0);
> > > +		sg_dma_address(new_sgl) = new_addr;
> > > +		sg_dma_len(new_sgl) = sg_dma_len(sgl);
> > > +
> > > +		new_sgl = sg_next(new_sgl);
> > > +	}
> > > +}
> > > +
> > > +static struct sg_table *xe_bo_create_new_sg(struct sg_table *sg,
> > > +					    struct xe_bo *bo)
> > > +{
> > > +	struct xe_device *xe = xe_bo_device(bo);
> > > +	struct iommu_domain *domain;
> > > +	struct sg_table *new_sg;
> > > +	struct pci_dev *pdev;
> > > +	phys_addr_t phys;
> > > +	int vfid;
> > > +
> > > +	if (!IS_SRIOV_PF(xe))
> > > +		return sg;
> > > +
> > > +	domain = iommu_get_domain_for_dev(xe->drm.dev);
> > > +	phys = domain ? iommu_iova_to_phys(domain, sg_dma_address(sg->sgl)) :
> > > +		sg_dma_address(sg->sgl);
> > > +
> > > +	if (page_is_ram(PFN_DOWN(phys)))
> > > +		return sg;
> > > +
> > > +	pdev = xe_find_vf_dev(xe, phys);
> > > +	if (!pdev)
> > > +		return sg;
> > > +
> > > +	vfid = pci_iov_vf_id(pdev);
> > > +	if (vfid < 0)
> > > +		return sg;
> > > +
> > > +	new_sg = kzalloc(sizeof(*new_sg), GFP_KERNEL);
> > > +	if (!new_sg)
> > > +		return sg;
> > > +
> > > +	if (sg_alloc_table(new_sg, sg->nents, GFP_KERNEL)) {
> > > +		kfree(new_sg);
> > > +		return sg;
> > > +	}
> > > +
> > > +	bo->is_devmem_external = true;
> > > +	xe_bo_translate_iova_to_dpa(xe, sg, new_sg, pdev);
> > > +
> > > +	return new_sg;
> > > +}
> > > +
> > >  /*
> > >   * The dma-buf map_attachment() / unmap_attachment() is hooked up here.
> > >   * Note that unmapping the attachment is deferred to the next
> > > @@ -577,7 +677,7 @@ static int xe_bo_move_dmabuf(struct ttm_buffer_object *ttm_bo,
> > >  		return PTR_ERR(sg);
> > >
> > >  	ttm_bo->sg = sg;
> > > -	xe_tt->sg = sg;
> > > +	xe_tt->sg = xe_bo_create_new_sg(sg, ttm_to_xe_bo(ttm_bo));
> > >
> > >  out:
> > >  	ttm_bo_move_null(ttm_bo, new_res);
> > > @@ -1066,6 +1166,8 @@ static void xe_ttm_bo_release_notify(struct ttm_buffer_object *ttm_bo)
> > >
> > >  static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
> > >  {
> > > +	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
> > > +
> > >  	if (!xe_bo_is_xe_bo(ttm_bo))
> > >  		return;
> > >
> > > @@ -1079,6 +1181,10 @@ static void xe_ttm_bo_delete_mem_notify(struct ttm_buffer_object *ttm_bo)
> > >
> > >  		dma_buf_unmap_attachment(ttm_bo->base.import_attach, ttm_bo->sg,
> > >  					 DMA_BIDIRECTIONAL);
> > > +		if (bo->is_devmem_external && xe_tt->sg != ttm_bo->sg) {
> > > +			sg_free_table(xe_tt->sg);
> > > +			kfree(xe_tt->sg);
> > > +		}
> > >  		ttm_bo->sg = NULL;
> > >  		xe_tt->sg = NULL;
> > >  	}
> > > diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> > > index 8b9201775081..0fe619bc436d 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> > > @@ -67,6 +67,12 @@ struct xe_bo {
> > >  	/** @ccs_cleared */
> > >  	bool ccs_cleared;
> > >
> > > +	/**
> > > +	 * @is_devmem_external: Whether this BO is an imported dma-buf that
> > > +	 * has a backing store in VRAM.
> > > +	 */
> > > +	bool is_devmem_external;
> > > +
> > >  	/**
> > >  	 * @cpu_caching: CPU caching mode. Currently only used for userspace
> > >  	 * objects. Exceptions are system memory on DGFX, which is always
> > > --
> > > 2.45.1
> > >

Thread overview: 21+ messages
2024-10-12  2:40 [PATCH v1 0/5] drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Vivek Kasireddy
2024-10-12  2:40 ` [PATCH v1 1/5] PCI/P2PDMA: Don't enforce ACS check for functions of same device Vivek Kasireddy
2024-10-15 17:45   ` Logan Gunthorpe
2024-10-16  5:29     ` Kasireddy, Vivek
2024-10-16 19:29       ` Logan Gunthorpe
2024-10-12  2:40 ` [PATCH v1 2/5] drm/xe/dmabuf: Don't migrate BO to System RAM while running in VF mode Vivek Kasireddy
2024-10-12  2:40 ` [PATCH v1 3/5] drm/xe/pf: Add a helper function to get a VF's starting address in LMEM Vivek Kasireddy
2024-10-12  2:40 ` [PATCH v1 4/5] drm/xe/bo: Create a new sg for dmabuf BOs that are associated with a VF Vivek Kasireddy
2024-10-12  4:17   ` Matthew Brost
2024-10-16  5:41     ` Kasireddy, Vivek
2024-10-20 18:06       ` Matthew Brost [this message]
2024-10-12  2:40 ` [PATCH v1 5/5] drm/xe/pt: Add an additional check for dmabuf BOs while updating PTEs Vivek Kasireddy
2024-10-20 18:25   ` Matthew Brost
2024-10-12  3:16 ` ✓ CI.Patch_applied: success for drm/xe/sriov: Don't migrate dmabuf BO to System RAM while running in VM Patchwork
2024-10-12  3:16 ` ✗ CI.checkpatch: warning " Patchwork
2024-10-12  3:18 ` ✓ CI.KUnit: success " Patchwork
2024-10-12  3:34 ` ✓ CI.Build: " Patchwork
2024-10-12  3:36 ` ✓ CI.Hooks: " Patchwork
2024-10-12  3:38 ` ✗ CI.checksparse: warning " Patchwork
2024-10-12  4:17 ` ✗ CI.BAT: failure " Patchwork
2024-10-12  9:27 ` ✗ CI.FULL: " Patchwork
