From: Matthew Brost <matthew.brost@intel.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: <intel-xe@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>, <leonro@nvidia.com>,
<francois.dugast@intel.com>, <thomas.hellstrom@linux.intel.com>,
<himal.prasad.ghimiray@intel.com>
Subject: Re: [RFC PATCH v3 06/11] drm/pagemap: Add IOVA interface to DRM pagemap
Date: Wed, 28 Jan 2026 10:42:53 -0800
Message-ID: <aXpYrfUmEaaOsse8@lstrano-desk.jf.intel.com>
In-Reply-To: <20260128151458.GJ1641016@ziepe.ca>

On Wed, Jan 28, 2026 at 11:14:58AM -0400, Jason Gunthorpe wrote:
> On Tue, Jan 27, 2026 at 04:48:36PM -0800, Matthew Brost wrote:
> > Add an IOVA interface to the DRM pagemap layer. This provides a semantic
> > wrapper around the dma-map IOVA alloc/link/sync/unlink/free API while
> > remaining flexible enough to support future high-speed interconnects
> > between devices.
>
> I don't think this is a very clear justification.
>
> "IOVA" and dma_addr_t should be strictly reserved for communication
> that flows through the interconnect that Linux struct device is aware
> of (ie the PCIe fabric). It should not ever be used for "high speed
> interconnects" implying some private and hidden things like
> xgmi/nvlink/ualink type stuff.
>
Yes, the future here is xgmi/nvlink/ualink type stuff. I agree we (DRM
pagemap, GPU SVM, Xe) will need a refactor to avoid using dma_addr_t in
these interfaces once we unify around xgmi/nvlink/ualink, as dma_addr_t
doesn't make much sense there. This series is a PoC of the code
structure. s/IOVA/something else/ in the interface names may make sense
too.
> I can't think of any reason why you'd want to delegate constructing
> the IOVA to some other code. I can imagine you'd want to get a pfn
> list from someplace else and turn that into a mapping.
>
Yes, this is exactly what I envision here. First, let me explain the
possible addressing modes on the UAL fabric:
- Physical (akin to IOMMU passthrough)
- Virtual (akin to IOMMU enabled)
Physical mode is straightforward — resolve the PFN to a cross-device
physical address, then install it into the initiator’s page tables along
with a bit indicating routing over the network. In this mode, the vfuncs
here are basically NOPs.
Virtual mode is the tricky one. There are addressing modes where a
virtual address must be allocated at the target device (i.e., the
address on the wire is translated at the target via a page-table walk).
This is why the code is structured the way it is, and why I envision a
UAL API that mirrors dma-map. At the initiator, the target virtual
address is installed in the page tables along with a bit indicating
routing over the network.
Let me give some examples of what this would look like in a few of the
vfuncs — see [1] for the dma-map implementation. Also, ignore the
dma_addr_t abuse for now.
[1] https://patchwork.freedesktop.org/patch/701149/?series=160587&rev=3
struct xe_svm_iova_cookie {
	struct dma_iova_state state;
	struct ual_iova_state ual_state;
};

static void *xe_drm_pagemap_device_iova_alloc(struct drm_pagemap *dpagemap,
					      struct device *dev, size_t length,
					      enum dma_data_direction dir)
{
	struct device *pgmap_dev = dpagemap->drm->dev;
	struct xe_svm_iova_cookie *cookie;
	static bool locking_proved = false;
	int err;

	xe_drm_pagemap_device_iova_prove_locking(&locking_proved);

	if (pgmap_dev == dev)
		return NULL;

	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
	if (!cookie)
		return NULL;

	if (ual_distance(pgmap_dev, dev) < 0) {
		dma_iova_try_alloc(dev, &cookie->state, length >= SZ_2M ? SZ_2M : 0,
				   length);
		if (dma_use_iova(&cookie->state))
			return cookie;
	} else {
		err = ual_iova_try_alloc(pgmap_dev, &cookie->ual_state,
					 length >= SZ_2M ? SZ_2M : 0,
					 length);
		if (err)
			return ERR_PTR(err);

		if (ual_use_iova(&cookie->ual_state))
			return cookie;
	}

	kfree(cookie);
	return NULL;
}
So here, 'ual_use_iova' would return false in physical mode and true in
virtual mode. This function is also interesting because, in virtual
mode, ual_iova_try_alloc can allocate memory for PTEs on the target
device. This is why the kernel-doc "Context" explanation, along with
xe_drm_pagemap_device_iova_prove_locking, is important to ensure that
all the locking is correct.
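
As an aside, the prove-locking helper could be as simple as a one-time
might_alloc() annotation. This is purely a hypothetical sketch (the
helper name is from this series, but the body is my guess at what it
needs to assert):

static void xe_drm_pagemap_device_iova_prove_locking(bool *proved)
{
	if (*proved)
		return;

	/*
	 * Prove once that this context may perform GFP_KERNEL
	 * allocations, since ual_iova_try_alloc in virtual mode may
	 * allocate PTE memory on the target device.
	 */
	might_alloc(GFP_KERNEL);
	*proved = true;
}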
Now this function:
static struct drm_pagemap_addr
xe_drm_pagemap_device_iova_link(struct drm_pagemap *dpagemap,
				struct device *dev, struct page *page,
				size_t length, size_t offset, void *cookie,
				enum dma_data_direction dir)
{
	struct device *pgmap_dev = dpagemap->drm->dev;
	struct xe_svm_iova_cookie *__cookie = cookie;
	struct xe_device *xe = to_xe_device(dpagemap->drm);
	enum drm_interconnect_protocol proto;
	dma_addr_t addr;
	int err;

	if (dma_use_iova(&__cookie->state)) {
		addr = __cookie->state.addr + offset;
		proto = XE_INTERCONNECT_P2P;
		err = dma_iova_link(dev, &__cookie->state, xe_page_to_pcie(page),
				    offset, length, dir, DMA_ATTR_SKIP_CPU_SYNC |
				    DMA_ATTR_MMIO);
	} else {
		addr = __cookie->ual_state.addr + offset;
		proto = XE_INTERCONNECT_VRAM;	/* Also means over fabric */
		err = ual_iova_link(dev, &__cookie->ual_state, xe_page_to_pcie(page),
				    offset, length, dir);
	}

	if (err)
		addr = DMA_MAPPING_ERROR;

	return drm_pagemap_addr_encode(addr, proto, ilog2(length), dir);
}
Note that the above function can only be called in virtual mode (i.e.,
the first function returns an IOVA cookie). Here we’d jam the target’s
PTEs with physical page addresses (reclaim-safe) and return the network
virtual address.
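
For completeness, teardown would be symmetric. Again a hypothetical
sketch (ual_iova_unlink / ual_iova_free are assumed mirrors of
dma_iova_unlink / dma_iova_free; nothing here is in the series):

static void xe_drm_pagemap_device_iova_free(struct drm_pagemap *dpagemap,
					    struct device *dev, size_t length,
					    void *cookie,
					    enum dma_data_direction dir)
{
	struct xe_svm_iova_cookie *__cookie = cookie;

	if (!__cookie)
		return;	/* Physical mode, nothing was allocated */

	if (dma_use_iova(&__cookie->state)) {
		dma_iova_unlink(dev, &__cookie->state, 0, length, dir,
				DMA_ATTR_SKIP_CPU_SYNC);
		dma_iova_free(dev, &__cookie->state);
	} else {
		/* Tears down the target-side PTEs installed at link time */
		ual_iova_unlink(dev, &__cookie->ual_state, 0, length, dir);
		ual_iova_free(dpagemap->drm->dev, &__cookie->ual_state);
	}

	kfree(__cookie);
}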
Lastly, a physical UAL example (i.e., the first function returns NULL):
static struct drm_pagemap_addr
xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
			  struct device *dev,
			  struct page *page,
			  unsigned int order,
			  enum dma_data_direction dir)
{
	struct device *pgmap_dev = dpagemap->drm->dev;
	enum drm_interconnect_protocol prot;
	dma_addr_t addr;

	if (pgmap_dev == dev || ual_distance(pgmap_dev, dev) >= 0) {
		addr = xe_page_to_dpa(page);
		prot = XE_INTERCONNECT_VRAM;
	} else {
		addr = dma_map_resource(dev,
					xe_page_to_pcie(page),
					PAGE_SIZE << order, dir,
					DMA_ATTR_SKIP_CPU_SYNC);
		prot = XE_INTERCONNECT_P2P;
	}

	return drm_pagemap_addr_encode(addr, prot, order, dir);
}
So, if it isn’t clear — these vfuncs hide from the DRM common layer
whether PCIe P2P is being used (IOMMU in passthrough or enabled) or UAL
is being used (physical or virtual mode). They manage the resources for
the connection and provide the information needed to program the
initiator PTEs (address plus a "use interconnect" vs. "use PCIe P2P"
bit).

This reasoning is also why it would be nice if drivers were allowed to
use the dma-map IOVA alloc/link/sync/unlink/free API for PCIe P2P
directly.
> My understanding of all the private interconnects is you get an
> interconnect address and program it directly into the device HW,
> possibly with a "use interconnect" bit, and the device never touches
> the PCIe fabric at all.
>
Yes, but see the physical vs. virtual explanation above. The "use
interconnect" bit is just one part of this.
Matt
> Jason