From: Joel Fernandes <joelagnelf@nvidia.com>
To: "Christian König" <christian.koenig@amd.com>,
"Dave Airlie" <airlied@gmail.com>
Cc: John Hubbard <jhubbard@nvidia.com>,
Danilo Krummrich <dakr@kernel.org>, Zhi Wang <zhiw@nvidia.com>,
linux-kernel@vger.kernel.org,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Maxime Ripard <mripard@kernel.org>,
Thomas Zimmermann <tzimmermann@suse.de>,
Simona Vetter <simona@ffwll.ch>, Jonathan Corbet <corbet@lwn.net>,
Alex Deucher <alexander.deucher@amd.com>,
Jani Nikula <jani.nikula@linux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Tvrtko Ursulin <tursulin@ursulin.net>,
Rui Huang <ray.huang@amd.com>,
Matthew Auld <matthew.auld@intel.com>,
Matthew Brost <matthew.brost@intel.com>,
Lucas De Marchi <lucas.demarchi@intel.com>,
Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
Helge Deller <deller@gmx.de>, Alice Ryhl <aliceryhl@google.com>,
Miguel Ojeda <ojeda@kernel.org>,
Alex Gaynor <alex.gaynor@gmail.com>,
Boqun Feng <boqun.feng@gmail.com>, Gary Guo <gary@garyguo.net>,
Bjorn Roy Baron <bjorn3_gh@protonmail.com>,
Benno Lossin <lossin@kernel.org>,
Andreas Hindborg <a.hindborg@kernel.org>,
Trevor Gross <tmgross@umich.edu>,
Alistair Popple <apopple@nvidia.com>,
Timur Tabi <ttabi@nvidia.com>, Edwin Peer <epeer@nvidia.com>,
Alexandre Courbot <acourbot@nvidia.com>,
Andrea Righi <arighi@nvidia.com>,
Andy Ritger <aritger@nvidia.com>,
Alexey Ivanov <alexeyi@nvidia.com>,
Balbir Singh <balbirs@nvidia.com>,
Philipp Stanner <phasta@kernel.org>,
Elle Rhumsaa <elle@weathered-steel.dev>,
Daniel Almeida <daniel.almeida@collabora.com>,
nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
rust-for-linux@vger.kernel.org, linux-doc@vger.kernel.org,
amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, linux-fbdev@vger.kernel.org
Subject: Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Date: Wed, 4 Feb 2026 18:42:04 -0500
Message-ID: <e5938ebf-c888-4af0-ad0b-66f983ed0dfe@nvidia.com>
In-Reply-To: <a50c9e31-a182-4ed7-837c-4a12d220c022@amd.com>
On 2/2/2026 4:12 AM, Christian König wrote:
> On 1/31/26 04:00, Dave Airlie wrote:
>> On Sat, 31 Jan 2026 at 07:14, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>> On 1/29/2026 10:38 PM, John Hubbard wrote:
[...]
>>>> For the deadlock above, I think a good way to break that deadlock is
>>>> to not allow taking that lock in a fence-signaling call path.
>>>>
>>>> So during an unmap, instead of "lock, unmap/free, unlock" it should
>>>> move the item to a deferred-free list, which is processed separately.
>>>> Of course, this is a little complex, because the allocation and reclaim
>>>> paths have to be aware of such lists if they get large.
>>> Yes, also avoiding GFP_KERNEL allocations while holding any of these mm locks
>>> (whichever we take during map). The GPU buddy actually does GFP_KERNEL
>>> allocations internally, which is problematic.
>>>
>>> Some solutions / next steps:
>>>
>>> 1. Allocating (VRAM and system memory) outside the mm locks, just before
>>> acquiring them.
>>>
>>> 2. Pre-allocating both the VRAM and system memory needed before the DMA fence
>>> critical paths (the issue is also figuring out how much memory to pre-allocate
>>> for the page-table pages based on the VM_BIND request; I think we can analyze
>>> the page tables in the submit stage to make an estimate).
>>>
>>> 3. Unfortunately, I am using gpu-buddy when allocating a VA range in the Vmm
>>> (called virt_buddy), which itself does GFP_KERNEL memory allocations in the
>>> allocate path. I am not sure what to do about this yet. ISTR the maple tree
>>> also has similar issues.
>>>
>>> 4. Using non-reclaimable memory allocations where pre-allocation or
>>> pre-allocated memory pools are not possible (I'd like to avoid #4 so we
>>> don't fail allocations when memory is scarce).
>>>
>>> Will work on these issues for the v7. Thanks,
>>
>> The way this works on nouveau at least (and I haven't yet read the
>> nova code in depth) is that we have 4 stages of vmm page table mgmt:
>>
>> ref - locked with a ref lock - can allocate/free memory - just makes
>> sure the page tables exist and are reference counted
>> map - locked with a map lock - cannot allocate memory - fills in the
>> PTEs in the page table
>> unmap - locked with a map lock - cannot allocate memory - removes
>> entries from the PTEs
>> unref - locked with a ref lock - can allocate/free memory - just drops
>> references and frees (not sure if it ever merges)
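(To make Dave's four-stage split above concrete, here is a hedged userspace Rust model, not nova-core or nouveau code; all names and the 512-entry table size are illustrative. A single RwLock stands in for the two locks: the write side plays the "ref lock" under which structural allocation/freeing is allowed, and the read side plays the "map lock" under which only PTE stores into pre-existing tables happen, so no allocation can occur there.)

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

const PTES_PER_TABLE: usize = 512; // assumption for the sketch

struct Vmm {
    // write lock = "ref lock" (may allocate/free);
    // read lock  = "map lock" (stores only, no allocation).
    tables: RwLock<Vec<Box<[AtomicU64; PTES_PER_TABLE]>>>,
}

impl Vmm {
    fn new() -> Self {
        Self { tables: RwLock::new(Vec::new()) }
    }

    /// Stage 1, "ref": make sure table `t` exists; may allocate.
    fn ref_table(&self, t: usize) {
        let mut ts = self.tables.write().unwrap();
        while ts.len() <= t {
            ts.push(Box::new(std::array::from_fn(|_| AtomicU64::new(0))));
        }
    }

    /// Stage 2, "map": fill a PTE; no allocation under the read lock.
    fn map(&self, t: usize, i: usize, pte: u64) {
        self.tables.read().unwrap()[t][i].store(pte, Ordering::Relaxed);
    }

    /// Stage 3, "unmap": clear a PTE; still no allocation.
    fn unmap(&self, t: usize, i: usize) {
        self.tables.read().unwrap()[t][i].store(0, Ordering::Relaxed);
    }

    fn pte(&self, t: usize, i: usize) -> u64 {
        self.tables.read().unwrap()[t][i].load(Ordering::Relaxed)
    }

    /// Stage 4, "unref": drop and free tables under the write lock.
    fn unref_all(&self) {
        self.tables.write().unwrap().clear();
    }
}

fn main() {
    let vmm = Vmm::new();
    vmm.ref_table(0);           // ref stage: allocation happens here
    vmm.map(0, 3, 0xdead_beef); // map stage: store only
    assert_eq!(vmm.pte(0, 3), 0xdead_beef);
    vmm.unmap(0, 3);
    vmm.unref_all();            // unref stage: freeing happens here
    println!("four-stage model OK");
}
```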
>
> On amdgpu VM page tables are allocated and PTEs filled outside of the fence critical path.
Does that really work for async VM_BIND? If we're missing anything in nova-core
related to the timing of when we allocate and update the page tables, it would
be good to know.

My understanding is that you have to write the PTEs at the run stage of the job
in question, otherwise you may not know how to map. Are you saying amdgpu
writes them during the run stage, but somehow before fence signaling?
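(As a concrete form of pre-allocation idea #2 above: a userspace Rust sketch, not nova-core code, of estimating the worst-case number of page-table pages a VM_BIND range needs and filling a pool before the fence-critical run stage, so the critical section itself never calls the allocator. The 4 KiB page size, 512-entry tables, and all names are assumptions for illustration.)

```rust
const PAGE_SHIFT: u32 = 12; // 4 KiB leaf pages (assumption)
const LEVEL_BITS: u32 = 9;  // 512 entries per table (assumption)

/// Worst-case page-table pages needed to map [va, va + len) with
/// `levels` levels of a radix page table: at each level, count the
/// distinct tables the range touches.
fn estimate_pt_pages(va: u64, len: u64, levels: u32) -> u64 {
    assert!(len > 0);
    let end = va + len - 1;
    (1..=levels)
        .map(|l| {
            let shift = PAGE_SHIFT + LEVEL_BITS * l;
            (end >> shift) - (va >> shift) + 1
        })
        .sum()
}

/// A pool filled outside the critical path; `take()` never allocates.
struct PtPagePool {
    pages: Vec<Box<[u8; 4096]>>,
}

impl PtPagePool {
    fn prealloc(n: u64) -> Self {
        Self {
            pages: (0..n).map(|_| Box::new([0u8; 4096])).collect(),
        }
    }

    fn take(&mut self) -> Option<Box<[u8; 4096]>> {
        self.pages.pop()
    }
}

fn main() {
    // Map 2 MiB + 4 KiB at VA 0 with a 3-level table.
    let n = estimate_pt_pages(0, (2u64 << 20) + 4096, 3);
    let mut pool = PtPagePool::prealloc(n);
    // Inside the (simulated) fence-critical path: only pops, no allocs.
    while let Some(_page) = pool.take() {}
    println!("pre-allocated {n} page-table pages");
}
```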
>
> Only invalidating PTEs, to signal that a shader needs to be taken off the HW, is inside the fence critical path, and no memory allocation is needed there.
>
> Keep in mind that you not only need to avoid having memory allocations inside the critical path, but also not take locks under which memory is allocated.
Yes, I was clear on this part from Danilo's email and aware of the various
deadlocks. See my analysis, which covers the cases you mention:
https://lore.kernel.org/all/20e04a3e-8d7d-47bc-9299-deadf8b9e992@nvidia.com/
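(For the deferred-free idea John raised earlier in the thread, a hedged userspace Rust sketch with made-up names: unmap, which may run in the fence-signaling path, only queues objects, and a worker outside the signaling path drops them, so nothing is freed while the mm lock is held. In a kernel version the list node would be embedded in the object so queuing itself never allocates; here we only reserve capacity up front to mimic that.)

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

struct DeferredFree<T> {
    queue: Mutex<VecDeque<T>>,
}

impl<T> DeferredFree<T> {
    fn with_capacity(cap: usize) -> Self {
        Self { queue: Mutex::new(VecDeque::with_capacity(cap)) }
    }

    /// Fence-signaling path: O(1) queueing, no freeing here.
    fn defer(&self, item: T) {
        self.queue.lock().unwrap().push_back(item);
    }

    /// Worker context (may block, may take other locks): dropping the
    /// drained items performs the actual free, outside the queue lock.
    fn drain(&self) -> usize {
        // The lock guard is released at the end of this statement;
        // `drained` is dropped afterwards, freeing items unlocked.
        let drained: Vec<T> = self.queue.lock().unwrap().drain(..).collect();
        drained.len()
    }
}

fn main() {
    let df = DeferredFree::with_capacity(16);
    for i in 0..3 {
        df.defer(vec![i; 8]); // stand-in for a page-table page
    }
    println!("freed {} deferred objects", df.drain());
}
```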
> Simona added some dma_fence_begin_signalling() and dma_fence_end_signalling() helpers to add lockdep annotations to the fence signaling path. Those have proven to be extremely useful since they allow lockdep to point out mistakes immediately and not just after hours of running on a test system.
>
Yeah, I looked. Nice! Thanks,
--
Joel Fernandes