From: Daniel Almeida <daniel.almeida@collabora.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: "Danilo Krummrich" <dakr@kernel.org>,
"Miguel Ojeda" <ojeda@kernel.org>,
"Alex Gaynor" <alex.gaynor@gmail.com>,
"Boqun Feng" <boqun.feng@gmail.com>,
"Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Benno Lossin" <benno.lossin@proton.me>,
"Andreas Hindborg" <a.hindborg@kernel.org>,
"Alice Ryhl" <aliceryhl@google.com>,
"Trevor Gross" <tmgross@umich.edu>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Christian König" <christian.koenig@amd.com>,
"Alyssa Rosenzweig" <alyssa@rosenzweig.io>,
"Lyude Paul" <lyude@redhat.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
dri-devel@lists.freedesktop.org,
"Asahi Lina" <lina@asahilina.net>
Subject: Re: [PATCH v2 2/2] rust: drm: Add GPUVM abstraction
Date: Thu, 19 Jun 2025 10:49:34 -0300 [thread overview]
Message-ID: <477186AF-A8D4-4D4E-9F58-86958C654333@collabora.com> (raw)
In-Reply-To: <20250619135709.634625e0@collabora.com>
Hi Boris,
> On 19 Jun 2025, at 08:57, Boris Brezillon <boris.brezillon@collabora.com> wrote:
>
> Hi,
>
> On Fri, 13 Jun 2025 13:42:59 -0300
> Daniel Almeida <daniel.almeida@collabora.com> wrote:
>
>> Danilo,
>>
>>
>>> <snip>
>>>
>>>>>> +// SAFETY: DRM GpuVmBo objects are always reference counted and the get/put functions
>>>>>> +// satisfy the requirements.
>>>>>> +unsafe impl<T: DriverGpuVm> AlwaysRefCounted for GpuVmBo<T> {
>>>>>> + fn inc_ref(&self) {
>>>>>> + // SAFETY: The drm_gpuvm_get function satisfies the requirements for inc_ref().
>>>>>> + unsafe { bindings::drm_gpuvm_bo_get(&self.bo as *const _ as *mut _) };
>>>>>> + }
>>>>>> +
>>>>>> + unsafe fn dec_ref(mut obj: NonNull<Self>) {
>>>>>> + // SAFETY: drm_gpuvm_bo_put() requires holding the gpuva lock, which is
>>>>>> + // the dma_resv lock by default.
>>>>>> + // The drm_gpuvm_put function satisfies the requirements for dec_ref().
>>>>>> + // (We do not support custom locks yet.)
>>>>>> + unsafe {
>>>>>> + let resv = (*obj.as_mut().bo.obj).resv;
>>>>>> + bindings::dma_resv_lock(resv, core::ptr::null_mut());
>>>>>> + bindings::drm_gpuvm_bo_put(&mut obj.as_mut().bo);
>>>>>> + bindings::dma_resv_unlock(resv);
>>>>>
>>>>> What if the resv_lock is held already? Please also make sure to put multiple
>>>>> unsafe calls each in a separate unsafe block.
>>>>
>>>> By whom?
>>>
>>> The lock might be held already by the driver or by TTM when things are called
>>> from TTM callbacks.
>>>
>>> This is why GPUVM never takes locks by itself, but asserts that the correct lock
>>> is held.
>>>
>>> I think we really want to get proof by the driver by providing lock guard
>>> references.
>>>
>>
>> There doesn’t seem to be a solution that fits all the boxes here.
>>
>> As you said, at this point the current status of the resv is unknown. If we
>> simply assume that it is not taken, we run into the problem you pointed out:
>> i.e.: recursive locking where ttm or some other layer has the lock already.
>>
>> Alternatively, if we assume that the resv must be locked in dec_ref(), then we
>> may build a lock::Guard from it and assert that it is held, but in any case
>> it's very confusing to expect the reservation to be locked on a dec_ref() call.
>>
>> The fact that dec_ref() is placed automatically on drop will massively
>> complicate the call sites:
>
> I'm digressing, but there's an aspect I found very annoying in the C
> version of the API: the fact that we have to take a BO ref, then lock,
> then release the vm_bo [1], because otherwise the vm_bo might be the
> last owner of a BO ref leading to a UAF on the lock itself. This to me,
> denotes a lifetime issue that I think would be good to address in the
> rust version of the API.
>
> It's not exactly the same problem, but I think it comes from the same
> root issue: lax ownership definition. By that I mean it's not clear
> who's the owner and who's the owned. gem_object::gpuva::list has
> weak refs on the vm_bos contained in this list, which kinda makes sense
> because vm_bos themselves have a ref on the gem_object, and if we were
> to make this weak ref a strong ref we'd never free any of these two
> objects. The lock is also part of the BO (either the BO resv lock, or a
> custom lock), and since it's the very same lock we use to insert/remove
> vm_bos, that's problematic.
>
> If we were making the gpuvm_bo_list a separate object that's originally
> created by the BO, and then let the GPUVM layer manipulate only this
> list, it could work. Of course that means the resv lock/driver custom
> lock should come from this object too, and I'm not too sure that's an
> option when dma_buf imports are involved.
This would require changes to the C side, because the C helpers expect the gpuva
list to be inside drm_gem_object, and these helpers are used in the Rust
bindings.
Don't get me wrong, IMHO we can definitely pursue this if needed, or if it's an
improvement. I am merely stating the fact that it's a more elaborate solution
that what I had originally proposed so that we are all aware.
>
>>
>> We will have to ensure that the resv is locked at all times where we interface
>> with a GpuVmBo, because each of these points could possibly be the last active
>> ref. If we don't, then we've introduced a race where the list is modified but
>> no lock is taken, which will be a pretty easy mistake to make. This seems to
>> also be the case in C, which we should try to improve upon.
>
> Yep, with auto-unref thrown into the mix you have to be very careful on
> which paths might release the last vm_bo ref, and make sure an extra
> ref is taken on the BO, and the resv/custom lock is held when that
> happens.
>
>>
>> My suggestion is to introduce a separate GPU-VA lock here:
>>
>> /// A base GEM object.
>> #[repr(C)]
>> #[pin_data]
>> pub struct Object<T: DriverObject> {
>> obj: bindings::drm_gem_object,
>> // The DRM core ensures the Device exists as long as its objects exist, so we don't need to
>> // manage the reference count here.
>> dev: *const bindings::drm_device,
>> #[pin]
>> inner: T,
>> #[pin]
>> _p: PhantomPinned,
>> // Add a GPU-VA lock here <--------
>> }
>>
>> And only support custom locks in Rust, to the detriment of the optimization
>> where the resv is used and to the detriment of any perf improvements that
>> reusing the reservation might bring to the table.
>
> Yes, if it was only about perf optimizations, then I'd like to see
> numbers that prove taking an extra lock that's always going to be free
> in a path where you already took the BO resv lock actually makes a
> difference, and honestly, I doubt it. But my fear is that it's not so
> much about avoiding an extra lock to be taken, and more about making
> sure this list insertion/deletion doesn't race with other paths that
> are assuming that taking the resv lock is enough to guarantee exclusive
> access to this vm_bo list (I mean places outside gpuvm, in the drivers
> directly). I guess the is fixable.
>
>>
>> Notice that this would sidestep this entire discussion: nobody else would be
>> aware of this new lock so we could safely lock it in dec_ref(). We would also
>> be transparently managing the locking on behalf of drivers in all the other
>> calls where the VA list is accessed, which is another plus as I said above.
>
> If the lock is part of the gem_object, it's not solving the problem I
> described above, because you might be taking a lock that disappears if
> you don't take a BO ref before taking the lock. In the end, that's
> still a risky business.
I must confess I was not aware of the issue you're discussing. We can add an
instance of gem::ObjectRef in gpuvm::GpuVmBo if you think this will solve it.
This will make sure that the vmbo owns the gem object (or rather that it has a
+1 on the refcount) for its lifetime. That seems to be the semantics you're
looking for?
I thought about adding this lock to the Rust wrapper for the vmbo, but then
we'd run into the issue of shared BOs, where each would be connected to a
different vmbo and would thus take a different lock when attempting to modify
the same underlying gpuva list.
Just to be clear, by shared BO here I specifically mean a BO that's connected
to more than one VM, which is possible, IIUC.
— Daniel
next prev parent reply other threads:[~2025-06-19 13:50 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-22 16:41 [PATCH v2 0/2] Add a Rust GPUVM abstraction Daniel Almeida
2025-04-22 16:41 ` [PATCH v2 1/2] rust: helpers: Add bindings/wrappers for dma_resv Daniel Almeida
2025-04-22 16:52 ` Alyssa Rosenzweig
2025-04-22 16:41 ` [PATCH v2 2/2] rust: drm: Add GPUVM abstraction Daniel Almeida
2025-04-22 21:16 ` Danilo Krummrich
2025-04-23 13:29 ` Daniel Almeida
2025-04-23 14:38 ` Danilo Krummrich
2025-05-14 20:11 ` Daniel Almeida
2025-06-10 21:04 ` Daniel Almeida
2025-06-13 16:42 ` Daniel Almeida
2025-06-19 11:57 ` Boris Brezillon
2025-06-19 13:49 ` Daniel Almeida [this message]
2025-06-19 17:32 ` Danilo Krummrich
2025-06-10 21:06 ` Daniel Almeida
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=477186AF-A8D4-4D4E-9F58-86958C654333@collabora.com \
--to=daniel.almeida@collabora.com \
--cc=a.hindborg@kernel.org \
--cc=airlied@gmail.com \
--cc=alex.gaynor@gmail.com \
--cc=aliceryhl@google.com \
--cc=alyssa@rosenzweig.io \
--cc=benno.lossin@proton.me \
--cc=bjorn3_gh@protonmail.com \
--cc=boqun.feng@gmail.com \
--cc=boris.brezillon@collabora.com \
--cc=christian.koenig@amd.com \
--cc=dakr@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=gary@garyguo.net \
--cc=lina@asahilina.net \
--cc=linux-kernel@vger.kernel.org \
--cc=lyude@redhat.com \
--cc=maarten.lankhorst@linux.intel.com \
--cc=mripard@kernel.org \
--cc=ojeda@kernel.org \
--cc=rust-for-linux@vger.kernel.org \
--cc=simona@ffwll.ch \
--cc=sumit.semwal@linaro.org \
--cc=tmgross@umich.edu \
--cc=tzimmermann@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).