Re: [RFC PATCH] drm/xe/dma-buf: Allow pinning of p2p dma-buf

Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Maarten Lankhorst <maarten.lankhorst@intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Matthew Brost" <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>,
	intel-xe@lists.freedesktop.org, Dave Airlie <airlied@gmail.com>,
	Simona Vetter <simona.vetter@ffwll.ch>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Lucas De Marchi <lucas.demarchi@intel.com>
Subject: Re: [RFC PATCH] drm/xe/dma-buf: Allow pinning of p2p dma-buf
Date: Wed, 17 Sep 2025 10:48:05 +0200	[thread overview]
Message-ID: <27729672-509e-4f95-8aef-4dbf86858715@intel.com> (raw)
In-Reply-To: <dafff5f2281d7d3479c043c1d3d9f5ad7645f7d6.camel@linux.intel.com>

Hey,

On 2025-09-16 20:54, Thomas Hellström wrote:
> On Tue, 2025-09-16 at 11:02 -0700, Matthew Brost wrote:
>> On Tue, Sep 16, 2025 at 03:06:48PM +0200, Thomas Hellström wrote:
>>> On Tue, 2025-09-16 at 14:03 +0100, Matthew Auld wrote:
>>>> On 16/09/2025 12:53, Thomas Hellström wrote:
>>>>> RDMA NICs typically requires the VRAM dma-bufs to be pinned in
>>>>> VRAM for pcie-p2p communication, since they don't fully support
>>>>> the move_notify() scheme. We would like to support that.
>>>>>
>>>>> However allowing unaccounted pinning of VRAM creates a DOS
>>>>> vector
>>>>> so up until now we haven't allowed it.
>>>>>
>>>>> However with cgroups support in TTM, the amount of VRAM
>>>>> allocated
>>>>> to a cgroup can be limited, and since also the pinned memory is
>>>>> accounted as allocated VRAM we should be safe.
>>>>>
>>>>> An analogy with system memory can be made if we observe the
>>>>> similarity with kernel system memory that is allocated as the
>>>>> result of user-space action and that is accounted using
>>>>> __GFP_ACCOUNT.
>>>>>
>>>>> Ideally, to be more flexible, we would add a "pinned_memory",
>>>>> or possibly "kernel_memory" limit to the dmem cgroups
>>>>> controller,
>>>>> that would additionally limit the memory that is pinned in this
>>>>> way.
>>>>> If we let that limit default to the dmem::max limit we can
>>>>> introduce that without needing to care about regressions.
>>>>>
>>>>> Considering that we already pin VRAM in this way for at least
>>>>> page-table memory and LRC memory, and the above path to greater
>>>>> flexibility, allow this also for dma-bufs.
>>>>>
>>>>> Cc: Dave Airlie <airlied@gmail.com>
>>>>> Cc: Simona Vetter <simona.vetter@ffwll.ch>
>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@intel.com>
>>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>>> Signed-off-by: Thomas Hellström
>>>>> <thomas.hellstrom@linux.intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/xe/tests/xe_dma_buf.c | 13 +++++++++
>>>>>   drivers/gpu/drm/xe/xe_dma_buf.c       | 41 +++++++++++++++++-
>>>>> ----
>>>>> -----
>>>>>   2 files changed, 39 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
>>>>> b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
>>>>> index a7e548a2bdfb..1f88ca71820c 100644
>>>>> --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
>>>>> +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
>>>>> @@ -31,6 +31,7 @@ static void check_residency(struct kunit
>>>>> *test,
>>>>> struct xe_bo *exported,
>>>>>   			    struct drm_exec *exec)
>>>>>   {
>>>>>   	struct dma_buf_test_params *params =
>>>>> to_dma_buf_test_params(test->priv);
>>>>> +	struct dma_buf_attachment *attach;
>>>>>   	u32 mem_type;
>>>>>   	int ret;
>>>>>   
>>>>> @@ -88,6 +89,18 @@ static void check_residency(struct kunit
>>>>> *test,
>>>>> struct xe_bo *exported,
>>>>>   
>>>>>   	KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported,
>>>>> mem_type));
>>>>>   
>>>>> +	/* Check that we can pin without migrating. */
>>>>> +	attach = list_first_entry_or_null(&dmabuf-
>>>>>> attachments,
>>>>> typeof(*attach), node);
>>>>> +	if (attach) {
>>>>> +		int err = dma_buf_pin(attach);
>>>>> +
>>>>> +		if (!err) {
>>>>> +			KUNIT_EXPECT_TRUE(test,
>>>>> xe_bo_is_mem_type(exported, mem_type));
>>>>> +			dma_buf_unpin(attach);
>>>>> +		}
>>>>> +		KUNIT_EXPECT_EQ(test, err, 0);
>>>>> +	}
>>>>> +
>>>>>   	if (params->force_different_devices)
>>>>>   		KUNIT_EXPECT_TRUE(test,
>>>>> xe_bo_is_mem_type(imported, XE_PL_TT));
>>>>>   	else
>>>>> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c
>>>>> b/drivers/gpu/drm/xe/xe_dma_buf.c
>>>>> index a7d67725c3ee..54e42960daad 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
>>>>> @@ -48,32 +48,43 @@ static void xe_dma_buf_detach(struct
>>>>> dma_buf
>>>>> *dmabuf,
>>>>>   
>>>>>   static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
>>>>>   {
>>>>> -	struct drm_gem_object *obj = attach->dmabuf->priv;
>>>>> +	struct dma_buf *dmabuf = attach->dmabuf;
>>>>> +	struct drm_gem_object *obj = dmabuf->priv;
>>>>>   	struct xe_bo *bo = gem_to_xe_bo(obj);
>>>>>   	struct xe_device *xe = xe_bo_device(bo);
>>>>>   	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
>>>>> +	bool allow_vram = true;
>>>>>   	int ret;
>>>>>   
>>>>> -	/*
>>>>> -	 * For now only support pinning in TT memory, for two
>>>>> reasons:
>>>>> -	 * 1) Avoid pinning in a placement not accessible to
>>>>> some
>>>>> importers.
>>>>> -	 * 2) Pinning in VRAM requires PIN accounting which is
>>>>> a
>>>>> to-do.
>>>>> -	 */
>>>>> -	if (xe_bo_is_pinned(bo) && !xe_bo_is_mem_type(bo,
>>>>> XE_PL_TT)) {
>>>>> +	if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) {
>>>>> +		allow_vram = false;
>>>>> +	} else {
>>>>> +		list_for_each_entry(attach, &dmabuf-
>>>>>> attachments,
>>>>> node) {
>>>>> +			if (!attach->peer2peer) {
>>>>> +				allow_vram = false;
>>>>> +				break;
>>>>> +			}
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +	if (xe_bo_is_pinned(bo) && !xe_bo_is_mem_type(bo,
>>>>> XE_PL_TT) &&
>>>>> +	    !(xe_bo_is_vram(bo) && allow_vram)) {
>>>>>   		drm_dbg(&xe->drm, "Can't migrate pinned bo for
>>>>> dma-buf pin.\n");
>>>>>   		return -EINVAL;
>>>>>   	}
>>>>>   
>>>>> -	ret = xe_bo_migrate(bo, XE_PL_TT, NULL, exec);
>>>>> -	if (ret) {
>>>>> -		if (ret != -EINTR && ret != -ERESTARTSYS)
>>>>> -			drm_dbg(&xe->drm,
>>>>> -				"Failed migrating dma-buf to
>>>>> TT
>>>>> memory: %pe\n",
>>>>> -				ERR_PTR(ret));
>>>>> -		return ret;
>>>>> +	if (!allow_vram) {
>>>>> +		ret = xe_bo_migrate(bo, XE_PL_TT, NULL, exec);
>>>>> +		if (ret) {
>>>>> +			if (ret != -EINTR && ret != -
>>>>> ERESTARTSYS)
>>>>> +				drm_dbg(&xe->drm,
>>>>> +					"Failed migrating dma-
>>>>> buf
>>>>> to TT memory: %pe\n",
>>>>> +					ERR_PTR(ret));
>>>>> +			return ret;
>>>>> +		}
>>>>>   	}
>>>>>   
>>>>> -	ret = xe_bo_pin_external(bo, true, exec);
>>>>> +	ret = xe_bo_pin_external(bo, !allow_vram, exec);
>>>> Are we also missing save/restore support for such objects? Or at
>>>> least I 
>>>> can't see where the save flow is happening for externally pinned
>>>> VRAM?
>>> Good point. I forgot about that. IIRC we once made a deliberate
>>> decision to leave that out since we didn't support it.
>>>
>>> I'll have a look at that as well depending if we decide to go
>>> ahead with this.
>>>
>> Don't we take a PM ref exporting device in xe_dma_buf_attach? So when
>> memory attached to (or pinned in this case) we can't go through save
>> /
>> restore flows as the device should be awake?
> Yes, rpm save/restore should not happen, however system suspend or
> hibernate may.

Just a quick review: A call to ttm_bo_validate() is missing to ensure we don't pin while in wrong placement or swapped out.

By default the dmem cgroup controller is not yet used by most distributions, so introducing and using separate max_pinned + actually pinned accounting would work. It would make sense to make it first come first serve, which makes accounting simple. This would make the semantics the same as the first time hitting 'max' limit, but without the possibility for eviction.

This will likely work better than re-using the 'min' semantics, since they serve different purposes. One might want to allow pinning, but not want to pin memory by default unless needed.

Kind regards,
~Maarten

next prev parent reply	other threads:[~2025-09-17  8:48 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-16 11:53 [RFC PATCH] drm/xe/dma-buf: Allow pinning of p2p dma-buf Thomas Hellström
2025-09-16 12:10 ` ✓ CI.KUnit: success for " Patchwork
2025-09-16 12:43 ` ✓ Xe.CI.BAT: " Patchwork
2025-09-16 13:03 ` [RFC PATCH] " Matthew Auld
2025-09-16 13:06   ` Thomas Hellström
2025-09-16 18:02     ` Matthew Brost
2025-09-16 18:54       ` Thomas Hellström
2025-09-17  8:48         ` Maarten Lankhorst [this message]
2025-09-19  7:11           ` Thomas Hellström
2025-09-16 19:53     ` Rodrigo Vivi
2025-09-16 20:35       ` Thomas Hellström
2025-09-16 21:05         ` Thomas Hellström
2025-09-16 14:44 ` ✓ Xe.CI.Full: success for " Patchwork
2025-09-17  5:43 ` [RFC PATCH] " Niranjana Vishwanathapura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=27729672-509e-4f95-8aef-4dbf86858715@intel.com \
    --to=maarten.lankhorst@intel.com \
    --cc=airlied@gmail.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=joonas.lahtinen@linux.intel.com \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.auld@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=simona.vetter@ffwll.ch \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox