Linux Framebuffer Layer development
 help / color / mirror / Atom feed
* Re: [PATCH] staging: fbtft: fix macro whitespace errors
From: Dhyan K Prajapati @ 2026-02-27  4:51 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Andy Shevchenko, Greg Kroah-Hartman, Jason Donenfeld, dri-devel,
	linux-fbdev, linux-staging, linux-kernel
In-Reply-To: <aaCIqUXPB75vR6rK@smile.fi.intel.com>

On Thu, Feb 26, 2026 at 07:53:45PM +0200, Andy Shevchenko wrote:
>On Thu, Feb 26, 2026 at 10:55:31PM +0530, dhyan19022009@gmail.com wrote:
>>
>> Remove prohibited spaces before closing parentheses in macro calls in
>> fbtft-bus.c, reported by checkpatch.pl
>
>You haven't compiled this, have you?
>
>
>-- 
>With Best Regards,
>Andy Shevchenko
>
>

Hey, sorry I atleast should've checked the definition, i'll keep in mind
not to trust checkpatch.pl blindly

^ permalink raw reply

* Re: [PATCH] staging: fbtft: fix macro whitespace errors
From: kernel test robot @ 2026-02-26 23:35 UTC (permalink / raw)
  To: dhyan19022009, Andy Shevchenko, Greg Kroah-Hartman
  Cc: llvm, oe-kbuild-all, Jason Donenfeld, dri-devel, linux-fbdev,
	linux-staging, linux-kernel, Dhyan K Prajapati
In-Reply-To: <20260226172531.13714-1-dhyan19022009@gmail.com>

Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on staging/staging-testing]

url:    https://github.com/intel-lab-lkp/linux/commits/dhyan19022009-gmail-com/staging-fbtft-fix-macro-whitespace-errors/20260227-021646
base:   staging/staging-testing
patch link:    https://lore.kernel.org/r/20260226172531.13714-1-dhyan19022009%40gmail.com
patch subject: [PATCH] staging: fbtft: fix macro whitespace errors
config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20260227/202602270720.2J4j3gHF-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260227/202602270720.2J4j3gHF-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602270720.2J4j3gHF-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/staging/fbtft/fbtft-bus.c:65:53: error: too few arguments provided to function-like macro invocation
      65 | define_fbtft_write_reg(fbtft_write_reg8_bus8, u8, u8)
         |                                                     ^
   drivers/staging/fbtft/fbtft-bus.c:14:9: note: macro 'define_fbtft_write_reg' defined here
      14 | #define define_fbtft_write_reg(func, buffer_type, data_type, modifier)        \
         |         ^
>> drivers/staging/fbtft/fbtft-bus.c:65:1: error: unknown type name 'define_fbtft_write_reg'
      65 | define_fbtft_write_reg(fbtft_write_reg8_bus8, u8, u8)
         | ^
   drivers/staging/fbtft/fbtft-bus.c:67:57: error: too few arguments provided to function-like macro invocation
      67 | define_fbtft_write_reg(fbtft_write_reg16_bus16, u16, u16)
         |                                                         ^
   drivers/staging/fbtft/fbtft-bus.c:14:9: note: macro 'define_fbtft_write_reg' defined here
      14 | #define define_fbtft_write_reg(func, buffer_type, data_type, modifier)        \
         |         ^
   drivers/staging/fbtft/fbtft-bus.c:67:1: error: unknown type name 'define_fbtft_write_reg'
      67 | define_fbtft_write_reg(fbtft_write_reg16_bus16, u16, u16)
         | ^
   4 errors generated.


vim +65 drivers/staging/fbtft/fbtft-bus.c

    64	
  > 65	define_fbtft_write_reg(fbtft_write_reg8_bus8, u8, u8)
    66	define_fbtft_write_reg(fbtft_write_reg16_bus8, __be16, u16, cpu_to_be16)
    67	define_fbtft_write_reg(fbtft_write_reg16_bus16, u16, u16)
    68	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH v11 4/4] rust: gpu: Add GPU buddy allocator bindings
From: Joel Fernandes @ 2026-02-26 21:42 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, nouveau, rust-for-linux, Nikola Djukic,
	Maarten Lankhorst, Maxime Ripard, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, Alistair Popple, Andrea Righi, Zhi Wang,
	Philipp Stanner, Elle Rhumsaa, alexeyi, Eliot Courtney, linux-doc,
	amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <DGOJDXWDOJD0.2J6NENL44SQJJ@nvidia.com>



On 2/25/2026 9:26 PM, Alexandre Courbot wrote:
> On Thu Feb 26, 2026 at 5:41 AM JST, Joel Fernandes wrote:
>>> This structure doesn't seem to be useful. I would understand using one
>>> if `GpuBuddyParams` had lots of members, some of which have a sensible
>>> default value - then we could implement `Default` and let users fill in
>>> the parameters they need.
>>>
>>> But this structure has no constructor of any sort, requiring users to
>>> fill its 3 members manually - which is actually heavier than having 3
>>> parameters to `GpuBuddy::new`. It is even deconstructed in
>>> `GpuBuddyInner` to store its members as 3 different fields! So let's
>>> skip it.
>>
>> I'd prefer to keep the struct -- all three parameters are `u64`, so
>> positional arguments would be easy to silently misorder. The struct
>> also makes call sites more readable since Rust has no named function call
>> parameters.
> 
> Fair point about the 3 parameters being easily confused. If you keep it,
> can you also store it in `GpuBuddyInner` instead of deconstructing it
> into 3 members?

Done, good idea.

> 
>>
>>>> +pub struct GpuBuddyAllocParams {
>>>
>>> This one also feels like it could be rustified some more.
>>>
>>> By this I mean that it e.g. allows the user to specify a range even if
>>> `RANGE_ALLOCATION` is not set. A C API rejects invalid combinations at
>>> runtime. A Rust API should make it impossible to even express them.
>>>
>>> [...]
>>>
>>> That would turn `alloc_blocks` into something like:
>>>
>>>   `fn alloc_blocks(&self, alloc: AllocType, size: u64, min_block_size: Alignment, flags: AllocBlocksFlags)`
>>
>> The C API supports combining allocation modes with modifiers (e.g.
>> RANGE+CLEAR, TOPDOWN+CLEAR), so modeling the mode as a
>> mutually-exclusive enum would lose valid combinations. More importantly,
> 
> What I suggested does allow you to combine allocation modes with
> modifiers. I should have pasted a bit of code for clarity, so here goes:
> 
>     #[derive(Copy, Clone, Debug, PartialEq, Eq)]
>     pub enum GpuBuddyAllocMode {
>         Simple,
>         Range { start: u64, end: u64 },
>         TopDown,
>     }
> 
>     impl GpuBuddyAllocMode {
>         // Returns the flag corresponding to the allocation mode.
>         //
>         // Intentionally private - for internal use.
>         fn into_flags(self) -> usize {
>             match self {
>                 Self::Simple => 0,
>                 Self::Range { .. } => bindings::GPU_BUDDY_RANGE_ALLOCATION,
>                 Self::TopDown => bindings::GPU_BUDDY_TOPDOWN_ALLOCATION,
>             }
>         }
>     }

I took this bit from  yours(more comments below).
> 
>     impl_flags!(
>         #[derive(Copy, Clone, PartialEq, Eq, Default)]
>         pub struct GpuBuddyAllocFlags(u32);
> 
>         #[derive(Copy, Clone, PartialEq, Eq)]
>         pub enum GpuBuddyAllocFlag {
>             Contiguous = bindings::GPU_BUDDY_CONTIGUOUS_ALLOCATION as u32,
>             Clear = bindings::GPU_BUDDY_CLEAR_ALLOCATION as u32,
>             TrimDisable = bindings::GPU_BUDDY_TRIM_DISABLE as u32,
>         }
>     );
> 
>     pub struct GpuBuddyAllocParams {
>         mode: GpuBuddyAllocMode,
>         size: u64,
>         min_block_size: u64,
>         flags: GpuBuddyAllocFlags,
>     }
> 
I took this bit from  yours(more comments below).

> Now instead of doing something like:
> 
>     let params = GpuBuddyAllocParams {
>         start_range_address: 0,
>         end_range_address: 0,
>         size: SZ_16M as u64,
>         min_block_size: SZ_16M as u64,
>         buddy_flags: BuddyFlag::TopdownAllocation.into(),
>     };
> 
> You would have:
> 
>     let params = GpuBuddyAllocParams {
>         // No unneeded `start_range` and `end_range`!
>         mode: GpuBuddyAllocMode::TopDown,
>         size: SZ_16M as u64,
>         min_block_size: SZ_16M as u64,
>         flags: Default::default(),
>     };
> 
I took this bit from  yours(more comments below).

> And for a cleared range allocation:
> 
>     let params = GpuBuddyAllocParams {
>         mode: GpuBuddyAllocMode::Range {
>             start: 0,
>             end: SZ_16M as u64,
>         },
>         size: SZ_16M as u64,
>         min_block_size: SZ_16M as u64,
>         flags: GpuBuddyAllocFlag::Clear,
>     };
> 
> Actually the parameters are now distinct enough that you don't need a
> type to prevent confusion. A block allocation now just reads like a nice
> sentence:
> 
>     buddy.alloc_blocks(
>         GpuBuddyAllocMode::Range {
>             start: 0,
>             end: SZ_16M,
>         },
>         SZ_16M,
>         // `min_block_size` should be an `Alignment`, the C API even
>         // returns an error if it is not a power of 2.
>         Alignment::new::<SZ_16M>(),
>         GpuBuddyAllocFlag::Clear,
>     )?;

Makes sense, this is indeed better, I'll do it this way.

> 
> And the job of `alloc_blocks` is also simplified:
> 
>     let (start, end) = match mode {
>         GpuBuddyAllocMode::Range { start, end } => (start, end),
>         _ => (0, 0),
>     };
>     let flags = mode.into_flags() | u32::from(flags) as usize;
>     // ... and just invoke the C API with these parameters.
> 
>> if the C allocator evolves its flag semantics (new combinations become
>> valid, or existing constraints change), an enum on the Rust side would
>> break. It's simpler and more maintainable to pass combinable flags and
>> let the C allocator validate -- which it already does. The switch to
>> `impl_flags!` will work for us without over-constraining.
> 
> The evolution you describe is speculative and unlikely to happen as it
> would break all C users just the same. And if the C API adds new flags
> or allocation modes, we will have to update the Rust abstraction either
> way.

How/why would it break C users? Currently top down + range is silently ignored,
implementing it is unlikely to break them.

I also wouldn't call it speculative: top-down within a range is a natural
feature the C allocator could add right? By modeling modes as a mutually
exclusive enum, we're disallowing a flag combination that could become
valid in the future. That's fine for now, but something to keep in mind as we
choose this design. We could add a new RangeTopDown mode variant in the future,
though. That said, I've made the switch to the enum as
you suggested since it is cleaner code! And is more Rust-like as you pointed.

> 
> Rust abstractions should model the C API correctly. By hardening the way
> the C API can be used and stripping out invalid uses, we save headaches
> to users of the API who don't need to worry about whether the flag they
> pass will result in an error or simply be ignored, and we also save
> maintainer time who don't have to explain the intricacies of their APIs
> to confused users. :)
> 

Sure, no argument on that one. ;-)

[...]
>>>> +    base_offset: u64,
>>>
>>> This does not appear to be used in the C API - does it belong here? It
>>> looks like an additional convenience, but I'm not convinced that's the
>>> role of this type to provide this. But if it really is needed by all
>>> users (guess I'll find out after looking the Nova code :)), then keeping
>>> it is fair I guess.
>>
>> Yes, `base_offset` is needed by nova-core. The GPU's usable VRAM
>> starts at `usable_vram_start` from the GSP firmware parameters:
>>
>>     GpuBuddyParams {
>>         base_offset: params.usable_vram_start,
>>         physical_memory_size: params.usable_vram_size,
>>         chunk_size: SZ_4K.into_safe_cast(),
>>     }
>>
>> `AllocatedBlock::offset()` then adds `base_offset` to return absolute
>> VRAM addresses, so callers don't need to track the offset themselves.
> 
> Sounds fair, I'll check how this is used in Nova.
> 
> Ah, another thing I've noticed while writing the example above:
> 
>> +#[pinned_drop]
>> +impl PinnedDrop for AllocatedBlocks {
>> +    fn drop(self: Pin<&mut Self>) {
>> +        let guard = self.buddy.lock();
>> +
>> +        // SAFETY:
>> +        // - list is valid per the type's invariants.
>> +        // - guard provides exclusive access to the allocator.
>> +        // CAST: BuddyFlags were validated to fit in u32 at construction.
>> +        unsafe {
>> +            bindings::gpu_buddy_free_list(
>> +                guard.as_raw(),
>> +                self.list.as_raw(),
>> +                self.flags.as_raw() as u32,
> 
> `gpu_buddy_free_list` only expects the `CLEARED` flag - actually it
> silently masks other flags. So you probably want to just pass `0` here -
> adding a `Cleared` field to `GpuBuddyAllocFlag` would also do the trick,
> but it looks risky to me as it relies on the promise that the user has
> cleared the buffer, which is not something we can guarantee. So I don't
> think we can support this safely.
> 
> If you just pass `0`, then the `flags` member of `AllocatedBlocks`
> becomes unused and you can just drop it.

Good catch, done!

> 
> And another small one - some methods of `Block` are `pub(crate)` - I
> believe they should either be `pub` or kept private.

Changed to pub. thanks,

-- 
Joel Fernandes


^ permalink raw reply

* Re: [PATCH] staging: fbtft: fix macro whitespace errors
From: kernel test robot @ 2026-02-26 21:36 UTC (permalink / raw)
  To: dhyan19022009, Andy Shevchenko, Greg Kroah-Hartman
  Cc: oe-kbuild-all, Jason Donenfeld, dri-devel, linux-fbdev,
	linux-staging, linux-kernel, Dhyan K Prajapati
In-Reply-To: <20260226172531.13714-1-dhyan19022009@gmail.com>

Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on staging/staging-testing]

url:    https://github.com/intel-lab-lkp/linux/commits/dhyan19022009-gmail-com/staging-fbtft-fix-macro-whitespace-errors/20260227-021646
base:   staging/staging-testing
patch link:    https://lore.kernel.org/r/20260226172531.13714-1-dhyan19022009%40gmail.com
patch subject: [PATCH] staging: fbtft: fix macro whitespace errors
config: parisc-randconfig-001-20260227 (https://download.01.org/0day-ci/archive/20260227/202602270527.UlXqo4xH-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260227/202602270527.UlXqo4xH-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602270527.UlXqo4xH-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/staging/fbtft/fbtft-bus.c:65:53: error: macro "define_fbtft_write_reg" requires 4 arguments, but only 3 given
    define_fbtft_write_reg(fbtft_write_reg8_bus8, u8, u8)
                                                        ^
>> drivers/staging/fbtft/fbtft-bus.c:65:23: error: expected ';' before 'void'
    define_fbtft_write_reg(fbtft_write_reg8_bus8, u8, u8)
                          ^
                          ;
   drivers/staging/fbtft/fbtft-bus.c:67:57: error: macro "define_fbtft_write_reg" requires 4 arguments, but only 3 given
    define_fbtft_write_reg(fbtft_write_reg16_bus16, u16, u16)
                                                            ^
   drivers/staging/fbtft/fbtft-bus.c:67:23: error: expected ';' before 'void'
    define_fbtft_write_reg(fbtft_write_reg16_bus16, u16, u16)
                          ^
                          ;
   drivers/staging/fbtft/fbtft-bus.c:69:1:
    void fbtft_write_reg8_bus9(struct fbtft_par *par, int len, ...)
    ~~~~                   


vim +/define_fbtft_write_reg +65 drivers/staging/fbtft/fbtft-bus.c

    64	
  > 65	define_fbtft_write_reg(fbtft_write_reg8_bus8, u8, u8)
    66	define_fbtft_write_reg(fbtft_write_reg16_bus8, __be16, u16, cpu_to_be16)
    67	define_fbtft_write_reg(fbtft_write_reg16_bus16, u16, u16)
    68	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH v11 2/2] rust: clist: Add support to interface with C linked lists
From: Joel Fernandes @ 2026-02-26 19:34 UTC (permalink / raw)
  To: Alvin Sun
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Alex Gaynor, Danilo Krummrich, Dave Airlie, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Daniel Almeida, Koen Koning, Nikola Djukic,
	Alexandre Courbot, Philipp Stanner, Elle Rhumsaa, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Andrea Righi, Andy Ritger, Zhi Wang, Balbir Singh, alexeyi,
	Eliot Courtney, dri-devel, nouveau, rust-for-linux, linux-doc,
	amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <dbbb1a93-93fc-4ea6-bd6f-6f7fbfcc4710@linux.dev>

On Fri, 27 Feb 2026, Alvin Sun wrote:
> Thanks for the clist abstraction. The Tyr debugfs [1] I'm implementing
> needs to iterate over a GpuVm's VA list, and I'd like to switch that to
> a CList-based implementation.

Thanks for looking into using CList for this!

> Could you make CListHeadIter public and expose a public constructor?
> Or do you have a better suggestion?

I think this can be handled without exposing CListHeadIter. See below.

> The VA list mixes two node types in one list — GpuVa (with driver-specific
> data) and KernelGpuVa — so we have to filter/skip nodes and can't use
> CList as-is. With a public CListHeadIter and new(), we can implement a
> custom iterator (like our current GpuVaIter) on top of CListHeadIter and
> then migrate that code to clist instead of hand-rolled list traversal.

Looking at the Tyr code, both GpuVa and KernelGpuVa are
#[repr(transparent)] wrappers over the same C struct (drm_gpuva), linked
through the same list_head field at the same offset. The "two types" are
a Rust-level modeling choice for safety, not a structural difference in
the list — every node in that list is a drm_gpuva.

So CList's typed iteration already works here. You can iterate over all
nodes using a common Rust wrapper type (like a #[repr(transparent)]
wrapper over drm_gpuva), and then skip the kernel-reserved node by
pointer identity — since drm_gpuvm has its kernel_alloc_node as a named
field, its address is known. Something like:

    // Iterate all nodes as a common base type.
    let list = clist_create!(unsafe { head, RawGpuVa, drm_gpuva, rb.entry });
    let kernel_ptr = unsafe { &raw mut (*gpuvm_raw).kernel_alloc_node };

    for va in list.iter() {
        if va.as_raw() == kernel_ptr {
            continue;  // skip
        }

        // Cast to &GpuVa
        let gpu_va = unsafe { GpuVa::from_raw(va.as_raw()) };
        ...
    }

If you need a named iterator type (e.g. for returning from a method),
you can wrap CListIter in your own GpuVaIter struct that stores the
kernel node pointer and filters in its Iterator::next() impl. That would
probably also be cleaner.

OTOH, with CListHeadIter you'd need to do container_of manually on each node,
which might be more erroneous code, whereas CListIter handles that for you.
And anyway, the pointer comparison needed to skip the kernel node is the same
in both approaches.

Would this work for the Tyr debugfs use case?

--
Joel Fernandes


^ permalink raw reply

* Re: [PATCH] staging: fbtft: fix macro whitespace errors
From: Andy Shevchenko @ 2026-02-26 17:53 UTC (permalink / raw)
  To: dhyan19022009
  Cc: Andy Shevchenko, Greg Kroah-Hartman, Jason Donenfeld, dri-devel,
	linux-fbdev, linux-staging, linux-kernel
In-Reply-To: <20260226172531.13714-1-dhyan19022009@gmail.com>

On Thu, Feb 26, 2026 at 10:55:31PM +0530, dhyan19022009@gmail.com wrote:
> 
> Remove prohibited spaces before closing parentheses in macro calls in
> fbtft-bus.c, reported by checkpatch.pl

You haven't compiled this, have you?


-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* [PATCH] staging: fbtft: fix macro whitespace errors
From: dhyan19022009 @ 2026-02-26 17:25 UTC (permalink / raw)
  To: Andy Shevchenko, Greg Kroah-Hartman
  Cc: Jason Donenfeld, dri-devel, linux-fbdev, linux-staging,
	linux-kernel, Dhyan K Prajapati

From: Dhyan K Prajapati <dhyan19022009@gmail.com>

Remove prohibited spaces before closing parentheses in macro calls in
fbtft-bus.c, reported by checkpatch.pl

Signed-off-by: Dhyan K Prajapati <dhyan19022009@gmail.com>
---
 drivers/staging/fbtft/fbtft-bus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/fbtft/fbtft-bus.c b/drivers/staging/fbtft/fbtft-bus.c
index 30e436ff1..409770891 100644
--- a/drivers/staging/fbtft/fbtft-bus.c
+++ b/drivers/staging/fbtft/fbtft-bus.c
@@ -62,9 +62,9 @@ out:									      \
 }                                                                             \
 EXPORT_SYMBOL(func);
 
-define_fbtft_write_reg(fbtft_write_reg8_bus8, u8, u8, )
+define_fbtft_write_reg(fbtft_write_reg8_bus8, u8, u8)
 define_fbtft_write_reg(fbtft_write_reg16_bus8, __be16, u16, cpu_to_be16)
-define_fbtft_write_reg(fbtft_write_reg16_bus16, u16, u16, )
+define_fbtft_write_reg(fbtft_write_reg16_bus16, u16, u16)
 
 void fbtft_write_reg8_bus9(struct fbtft_par *par, int len, ...)
 {
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v11 2/2] rust: clist: Add support to interface with C linked lists
From: Alvin Sun @ 2026-02-26 16:23 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Alex Gaynor, Danilo Krummrich, Dave Airlie, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Daniel Almeida, Koen Koning, Nikola Djukic,
	Alexandre Courbot, Philipp Stanner, Elle Rhumsaa, Jonathan Corbet,
	Alex Deucher, Christian König, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellström,
	Helge Deller, John Hubbard, Alistair Popple, Timur Tabi,
	Edwin Peer, Andrea Righi, Andy Ritger, Zhi Wang, Balbir Singh,
	alexeyi, Eliot Courtney, dri-devel, nouveau, rust-for-linux,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <20260224222734.3153931-3-joelagnelf@nvidia.com>


On 2/25/26 06:27, Joel Fernandes wrote:
> Add a new module `clist` for working with C's doubly circular linked
> lists. Provide low-level iteration over list nodes.
>
> Typed iteration over actual items is provided with a `clist_create`
> macro to assist in creation of the `CList` type.
>
> Cc: Nikola Djukic <ndjukic@nvidia.com>
> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
> Acked-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>   MAINTAINERS              |   8 +
>   rust/helpers/helpers.c   |   1 +
>   rust/helpers/list.c      |  17 ++
>   rust/kernel/ffi/clist.rs | 338 +++++++++++++++++++++++++++++++++++++++
>   rust/kernel/ffi/mod.rs   |   2 +
>   rust/kernel/lib.rs       |   1 +
>   6 files changed, 367 insertions(+)
>   create mode 100644 rust/helpers/list.c
>   create mode 100644 rust/kernel/ffi/clist.rs
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dc82a6bd1a61..0dc318c94c99 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -23181,6 +23181,14 @@ T:	git https://github.com/Rust-for-Linux/linux.git alloc-next
>   F:	rust/kernel/alloc.rs
>   F:	rust/kernel/alloc/
>   
> +RUST [FFI HELPER]
> +M:	Joel Fernandes <joelagnelf@nvidia.com> (CLIST)
> +M:	Alexandre Courbot <acourbot@nvidia.com> (CLIST)
> +L:	rust-for-linux@vger.kernel.org
> +S:	Maintained
> +T:	git https://github.com/Rust-for-Linux/linux.git ffi-next
> +F:	rust/kernel/ffi/
> +
>   RUST [NUM]
>   M:	Alexandre Courbot <acourbot@nvidia.com>
>   R:	Yury Norov <yury.norov@gmail.com>
> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index a3c42e51f00a..724fcb8240ac 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -35,6 +35,7 @@
>   #include "io.c"
>   #include "jump_label.c"
>   #include "kunit.c"
> +#include "list.c"
>   #include "maple_tree.c"
>   #include "mm.c"
>   #include "mutex.c"
> diff --git a/rust/helpers/list.c b/rust/helpers/list.c
> new file mode 100644
> index 000000000000..18095a5593c5
> --- /dev/null
> +++ b/rust/helpers/list.c
> @@ -0,0 +1,17 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Helpers for C circular doubly linked list implementation.
> + */
> +
> +#include <linux/list.h>
> +
> +__rust_helper void rust_helper_INIT_LIST_HEAD(struct list_head *list)
> +{
> +	INIT_LIST_HEAD(list);
> +}
> +
> +__rust_helper void rust_helper_list_add_tail(struct list_head *new, struct list_head *head)
> +{
> +	list_add_tail(new, head);
> +}
> diff --git a/rust/kernel/ffi/clist.rs b/rust/kernel/ffi/clist.rs
> new file mode 100644
> index 000000000000..a645358eee58
> --- /dev/null
> +++ b/rust/kernel/ffi/clist.rs
> @@ -0,0 +1,338 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! FFI interface for C doubly circular intrusive linked lists.
> +//!
> +//! This module provides Rust abstractions for iterating over C `list_head`-based
> +//! linked lists. It should only be used for cases where C and Rust code share
> +//! direct access to the same linked list through an FFI interface.
> +//!
> +//! Note: This *must not* be used by Rust components that just need a linked list
> +//! primitive. Use [`kernel::list::List`] instead.
> +//!
> +//! # Examples
> +//!
> +//! ```
> +//! use kernel::{
> +//!     bindings,
> +//!     clist_create,
> +//!     types::Opaque,
> +//! };
> +//! # // Create test list with values (0, 10, 20) - normally done by C code but it is
> +//! # // emulated here for doctests using the C bindings.
> +//! # use core::mem::MaybeUninit;
> +//! #
> +//! # /// C struct with embedded `list_head` (typically will be allocated by C code).
> +//! # #[repr(C)]
> +//! # pub struct SampleItemC {
> +//! #     pub value: i32,
> +//! #     pub link: bindings::list_head,
> +//! # }
> +//! #
> +//! # let mut head = MaybeUninit::<bindings::list_head>::uninit();
> +//! #
> +//! # let head = head.as_mut_ptr();
> +//! # // SAFETY: head and all the items are test objects allocated in this scope.
> +//! # unsafe { bindings::INIT_LIST_HEAD(head) };
> +//! #
> +//! # let mut items = [
> +//! #     MaybeUninit::<SampleItemC>::uninit(),
> +//! #     MaybeUninit::<SampleItemC>::uninit(),
> +//! #     MaybeUninit::<SampleItemC>::uninit(),
> +//! # ];
> +//! #
> +//! # for (i, item) in items.iter_mut().enumerate() {
> +//! #     let ptr = item.as_mut_ptr();
> +//! #     // SAFETY: `ptr` points to a valid `MaybeUninit<SampleItemC>`.
> +//! #     unsafe { (*ptr).value = i as i32 * 10 };
> +//! #     // SAFETY: `&raw mut` creates a pointer valid for `INIT_LIST_HEAD`.
> +//! #     unsafe { bindings::INIT_LIST_HEAD(&raw mut (*ptr).link) };
> +//! #     // SAFETY: `link` was just initialized and `head` is a valid list head.
> +//! #     unsafe { bindings::list_add_tail(&mut (*ptr).link, head) };
> +//! # }
> +//!
> +//! // Rust wrapper for the C struct.
> +//! // The list item struct in this example is defined in C code as:
> +//! //   struct SampleItemC {
> +//! //       int value;
> +//! //       struct list_head link;
> +//! //   };
> +//! //
> +//! #[repr(transparent)]
> +//! pub struct Item(Opaque<SampleItemC>);
> +//!
> +//! impl Item {
> +//!     pub fn value(&self) -> i32 {
> +//!         // SAFETY: [`Item`] has same layout as [`SampleItemC`].
> +//!         unsafe { (*self.0.get()).value }
> +//!     }
> +//! }
> +//!
> +//! // Create typed [`CList`] from sentinel head.
> +//! // SAFETY: head is valid and initialized, items are `SampleItemC` with
> +//! // embedded `link` field, and `Item` is `#[repr(transparent)]` over `SampleItemC`.
> +//! let list = clist_create!(unsafe { head, Item, SampleItemC, link });
> +//!
> +//! // Iterate directly over typed items.
> +//! let mut found_0 = false;
> +//! let mut found_10 = false;
> +//! let mut found_20 = false;
> +//!
> +//! for item in list.iter() {
> +//!     let val = item.value();
> +//!     if val == 0 { found_0 = true; }
> +//!     if val == 10 { found_10 = true; }
> +//!     if val == 20 { found_20 = true; }
> +//! }
> +//!
> +//! assert!(found_0 && found_10 && found_20);
> +//! ```
> +
> +use core::{
> +    iter::FusedIterator,
> +    marker::PhantomData, //
> +};
> +
> +use crate::{
> +    bindings,
> +    types::Opaque, //
> +};
> +
> +use pin_init::{
> +    pin_data,
> +    pin_init,
> +    PinInit, //
> +};
> +
> +/// FFI wrapper for a C `list_head` object used in intrusive linked lists.
> +///
> +/// # Invariants
> +///
> +/// - The underlying `list_head` has been initialized (e.g. via `INIT_LIST_HEAD()`) and its
> +///   `next`/`prev` pointers are valid and non-NULL.
> +#[pin_data]
> +#[repr(transparent)]
> +pub struct CListHead {
> +    #[pin]
> +    inner: Opaque<bindings::list_head>,
> +}
> +
> +impl CListHead {
> +    /// Create a `&CListHead` reference from a raw `list_head` pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `ptr` must be a valid pointer to an initialized `list_head` (e.g. via
> +    ///   `INIT_LIST_HEAD()`), with valid non-NULL `next`/`prev` pointers.
> +    /// - `ptr` must remain valid for the lifetime `'a`.
> +    /// - The list and all linked `list_head` nodes must not be modified from
> +    ///   anywhere for the lifetime `'a`, unless done so via any [`CListHead`] APIs.
> +    #[inline]
> +    pub unsafe fn from_raw<'a>(ptr: *mut bindings::list_head) -> &'a Self {
> +        // SAFETY:
> +        // - [`CListHead`] has same layout as `list_head`.
> +        // - `ptr` is valid and unmodified for 'a per caller guarantees.
> +        unsafe { &*ptr.cast() }
> +    }
> +
> +    /// Get the raw `list_head` pointer.
> +    #[inline]
> +    pub fn as_raw(&self) -> *mut bindings::list_head {
> +        self.inner.get()
> +    }
> +
> +    /// Get the next [`CListHead`] in the list.
> +    #[inline]
> +    pub fn next(&self) -> &Self {
> +        let raw = self.as_raw();
> +        // SAFETY:
> +        // - `self.as_raw()` is valid and initialized per type invariants.
> +        // - The `next` pointer is valid and non-NULL per type invariants
> +        //   (initialized via `INIT_LIST_HEAD()` or equivalent).
> +        unsafe { Self::from_raw((*raw).next) }
> +    }
> +
> +    /// Check if this node is linked in a list (not isolated).
> +    #[inline]
> +    pub fn is_linked(&self) -> bool {
> +        let raw = self.as_raw();
> +        // SAFETY: self.as_raw() is valid per type invariants.
> +        unsafe { (*raw).next != raw && (*raw).prev != raw }
> +    }
> +
> +    /// Pin-initializer that initializes the list head.
> +    pub fn new() -> impl PinInit<Self> {
> +        pin_init!(Self {
> +            // SAFETY: `INIT_LIST_HEAD` initializes `slot` to a valid empty list.
> +            inner <- Opaque::ffi_init(|slot| unsafe { bindings::INIT_LIST_HEAD(slot) }),
> +        })
> +    }
> +}
> +
> +// SAFETY: `list_head` contains no thread-bound state; it only holds
> +// `next`/`prev` pointers.
> +unsafe impl Send for CListHead {}
> +
> +// SAFETY: `CListHead` can be shared among threads as modifications are
> +// not allowed at the moment.
> +unsafe impl Sync for CListHead {}
> +
> +impl PartialEq for CListHead {
> +    #[inline]
> +    fn eq(&self, other: &Self) -> bool {
> +        core::ptr::eq(self, other)
> +    }
> +}
> +
> +impl Eq for CListHead {}
> +
> +/// Low-level iterator over `list_head` nodes.
> +///
> +/// An iterator used to iterate over a C intrusive linked list (`list_head`). Caller has to
> +/// perform conversion of returned [`CListHead`] to an item (using `container_of` macro or similar).
> +///
> +/// # Invariants
> +///
> +/// [`CListHeadIter`] is iterating over an initialized and valid list.
> +struct CListHeadIter<'a> {
Hi,
Joel

Thanks for the clist abstraction. The Tyr debugfs [1] I'm implementing
needs to iterate over a GpuVm's VA list, and I'd like to switch that to
a CList-based implementation.

Could you make CListHeadIter public and expose a public constructor?
Or do you have a better suggestion?

The VA list mixes two node types in one list — GpuVa (with driver-specific
data) and KernelGpuVa — so we have to filter/skip nodes and can't use
CList as-is. With a public CListHeadIter and new(), we can implement a
custom iterator (like our current GpuVaIter) on top of CListHeadIter and 
then
migrate that code to clist instead of hand-rolled list traversal.

[1] 
https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/59/diffs?commit_id=0853a7b69a42b32832c8e57da63272a8585cb76b#23581e10c8b583e85ebde61a1675ac3a70e37c14_84_148

Thanks,
Alvin Sun


> +    /// Current position in the list.
> +    current: &'a CListHead,
> +    /// The sentinel head (used to detect end of iteration).
> +    sentinel: &'a CListHead,
> +}
> +
> +impl<'a> Iterator for CListHeadIter<'a> {
> +    type Item = &'a CListHead;
> +
> +    #[inline]
> +    fn next(&mut self) -> Option<Self::Item> {
> +        // Check if we've reached the sentinel (end of list).
> +        if self.current == self.sentinel {
> +            return None;
> +        }
> +
> +        let item = self.current;
> +        self.current = item.next();
> +        Some(item)
> +    }
> +}
> +
> +impl<'a> FusedIterator for CListHeadIter<'a> {}
> +
> +/// A typed C linked list with a sentinel head intended for FFI use-cases where
> +/// C subsystem manages a linked list that Rust code needs to read. Generally
> +/// required only for special cases.
> +///
> +/// A sentinel head [`CListHead`] represents the entire linked list and can be used
> +/// for iteration over items of type `T`, it is not associated with a specific item.
> +///
> +/// The const generic `OFFSET` specifies the byte offset of the `list_head` field within
> +/// the struct that `T` wraps.
> +///
> +/// # Invariants
> +///
> +/// - The sentinel [`CListHead`] has been initialized (e.g. via `INIT_LIST_HEAD()`) with valid
> +///   non-NULL `next`/`prev` pointers.
> +/// - `OFFSET` is the byte offset of the `list_head` field within the struct that `T` wraps.
> +/// - All the list's `list_head` nodes have been initialized with valid non-NULL `next`/`prev`
> +///   pointers.
> +#[repr(transparent)]
> +pub struct CList<T, const OFFSET: usize>(CListHead, PhantomData<T>);
> +
> +impl<T, const OFFSET: usize> CList<T, OFFSET> {
> +    /// Create a typed [`CList`] reference from a raw sentinel `list_head` pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `ptr` must be a valid pointer to an initialized sentinel `list_head` (e.g. via
> +    ///   `INIT_LIST_HEAD()`), with valid non-NULL `next`/`prev` pointers.
> +    /// - `ptr` must remain valid for the lifetime `'a`.
> +    /// - The list and all linked nodes must not be concurrently modified for the lifetime `'a`.
> +    /// - The list must contain items where the `list_head` field is at byte offset `OFFSET`.
> +    /// - `T` must be `#[repr(transparent)]` over the C struct.
> +    #[inline]
> +    pub unsafe fn from_raw<'a>(ptr: *mut bindings::list_head) -> &'a Self {
> +        // SAFETY:
> +        // - [`CList`] has same layout as [`CListHead`] due to repr(transparent).
> +        // - Caller guarantees `ptr` is a valid, sentinel `list_head` object.
> +        unsafe { &*ptr.cast() }
> +    }
> +
> +    /// Check if the list is empty.
> +    #[inline]
> +    pub fn is_empty(&self) -> bool {
> +        !self.0.is_linked()
> +    }
> +
> +    /// Create an iterator over typed items.
> +    #[inline]
> +    pub fn iter(&self) -> CListIter<'_, T, OFFSET> {
> +        let head = &self.0;
> +        CListIter {
> +            head_iter: CListHeadIter {
> +                current: head.next(),
> +                sentinel: head,
> +            },
> +            _phantom: PhantomData,
> +        }
> +    }
> +}
> +
> +/// High-level iterator over typed list items.
> +pub struct CListIter<'a, T, const OFFSET: usize> {
> +    head_iter: CListHeadIter<'a>,
> +    _phantom: PhantomData<&'a T>,
> +}
> +
> +impl<'a, T, const OFFSET: usize> Iterator for CListIter<'a, T, OFFSET> {
> +    type Item = &'a T;
> +
> +    #[inline]
> +    fn next(&mut self) -> Option<Self::Item> {
> +        let head = self.head_iter.next()?;
> +
> +        // Convert to item using OFFSET.
> +        // SAFETY: `item_ptr` calculation from `OFFSET` (calculated using offset_of!)
> +        // is valid per invariants.
> +        Some(unsafe { &*head.as_raw().byte_sub(OFFSET).cast::<T>() })
> +    }
> +}
> +
> +impl<'a, T, const OFFSET: usize> FusedIterator for CListIter<'a, T, OFFSET> {}
> +
> +/// Create a C doubly-circular linked list interface `CList` from a raw `list_head` pointer.
> +///
> +/// This macro creates a `CList<T, OFFSET>` that can iterate over items of type `$rust_type`
> +/// linked via the `$field` field in the underlying C struct `$c_type`.
> +///
> +/// # Arguments
> +///
> +/// - `$head`: Raw pointer to the sentinel `list_head` object (`*mut bindings::list_head`).
> +/// - `$rust_type`: Each item's rust wrapper type.
> +/// - `$c_type`: Each item's C struct type that contains the embedded `list_head`.
> +/// - `$field`: The name of the `list_head` field within the C struct.
> +///
> +/// # Safety
> +///
> +/// The caller must ensure:
> +///
> +/// - `$head` is a valid, initialized sentinel `list_head` (e.g. via `INIT_LIST_HEAD()`)
> +///   pointing to a list that is not concurrently modified for the lifetime of the `CList`.
> +/// - The list contains items of type `$c_type` linked via an embedded `$field`.
> +/// - `$rust_type` is `#[repr(transparent)]` over `$c_type` or has compatible layout.
> +///
> +/// # Examples
> +///
> +/// Refer to the examples in this module's documentation.
> +#[macro_export]
> +macro_rules! clist_create {
> +    (unsafe { $head:expr, $rust_type:ty, $c_type:ty, $($field:tt).+ }) => {{
> +        // Compile-time check that field path is a list_head.
> +        let _: fn(*const $c_type) -> *const $crate::bindings::list_head =
> +            |p| unsafe { &raw const (*p).$($field).+ };
> +
> +        // Calculate offset and create `CList`.
> +        const OFFSET: usize = ::core::mem::offset_of!($c_type, $($field).+);
> +        // SAFETY: The caller of this macro is responsible for ensuring safety.
> +        unsafe { $crate::ffi::clist::CList::<$rust_type, OFFSET>::from_raw($head) }
> +    }};
> +}
> diff --git a/rust/kernel/ffi/mod.rs b/rust/kernel/ffi/mod.rs
> index 7d844e9cb339..8c235ca0d1e3 100644
> --- a/rust/kernel/ffi/mod.rs
> +++ b/rust/kernel/ffi/mod.rs
> @@ -5,3 +5,5 @@
>   // Re-export C type definitions from the `ffi` crate so that existing
>   // `kernel::ffi::c_int` etc. paths continue to work.
>   pub use ::ffi::*;
> +
> +pub mod clist;
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 0a77b4c0ffeb..58dbb02c5197 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -28,6 +28,7 @@
>   #![feature(lint_reasons)]
>   //
>   // Stable since Rust 1.82.0.
> +#![feature(offset_of_nested)]
>   #![feature(raw_ref_op)]
>   //
>   // Stable since Rust 1.83.0.

^ permalink raw reply

* Re: [PATCH v2] staging: fbtft: fb_ra8875: replace udelays with fsleep
From: Jose A. Perez de Azpillaga @ 2026-02-26  8:31 UTC (permalink / raw)
  To: Andriy Shevchenko
  Cc: Andy Shevchenko, Jose A . Perez de Azpillaga, dri-devel,
	Greg Kroah-Hartman, linux-fbdev, linux-kernel, linux-staging
In-Reply-To: <aZ_6QvJPZplh6xtd@smile.fi.intel.com>

Hi Andy,

write_reg8_bus8() calls par->fbtftops.write(), which in this driver
resolves to write_spi(), which calls spi_sync(). Since spi_sync()
requires sleepable context, write_reg8_bus8() is transitively guaranteed
to run in non-atomic context, making fsleep() safe to use.

Best regards, Jose

^ permalink raw reply

* [PATCH v1 1/1] fbtft: Update REAMDE to slow down the stream of undesired cleanups
From: Andy Shevchenko @ 2026-02-26  8:08 UTC (permalink / raw)
  To: Andy Shevchenko, dri-devel, linux-fbdev, linux-staging,
	linux-kernel
  Cc: Andy Shevchenko, Greg Kroah-Hartman, Randy Dunlap, Helge Deller,
	Dan Carpenter

Lately the enormous amount of some untested cleanups started coming
to a mailing list. This adds an unneeded and undesired burden on
the reviewers and maintainers. Try to stop that by clearly state
what we accept and on what conditions in the README file.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/staging/fbtft/README | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/staging/fbtft/README b/drivers/staging/fbtft/README
index ba4c74c92e4c..91f152d622bd 100644
--- a/drivers/staging/fbtft/README
+++ b/drivers/staging/fbtft/README
@@ -6,27 +6,12 @@ The module 'fbtft' makes writing drivers for some of these displays very easy.
 
 Development is done on a Raspberry Pi running the Raspbian "wheezy" distribution.
 
-INSTALLATION
-  Download kernel sources
+For new hardware support consider using DRM subsystem (see TODO).
 
-  From Linux 3.15
-    cd drivers/video/fbdev/fbtft
-    git clone https://github.com/notro/fbtft.git
+NOTE:
 
-    Add to drivers/video/fbdev/Kconfig:   source "drivers/video/fbdev/fbtft/Kconfig"
-    Add to drivers/video/fbdev/Makefile:  obj-y += fbtft/
-
-  Before Linux 3.15
-    cd drivers/video
-    git clone https://github.com/notro/fbtft.git
-
-    Add to drivers/video/Kconfig:   source "drivers/video/fbtft/Kconfig"
-    Add to drivers/video/Makefile:  obj-y += fbtft/
-
-  Enable driver(s) in menuconfig and build the kernel
-
-
-See wiki for more information: https://github.com/notro/fbtft/wiki
-
-
-Source: https://github.com/notro/fbtft/
+The driver is in maintenance mode, only performance issue or bug fixes
+are accepted, which effectively means the patches must be tested on
+the real hardware (the patch must be accompanied with the information
+what hardware is that). The treewide changes may also be accepted as
+an exception.
-- 
2.50.1


^ permalink raw reply related

* Re: [PATCH v2] staging: fbtft: fb_ra8875: replace udelays with fsleep
From: Andy Shevchenko @ 2026-02-26  7:46 UTC (permalink / raw)
  To: Jose A. Perez de Azpillaga
  Cc: Greg Kroah-Hartman, Andy Shevchenko, dri-devel, linux-fbdev,
	linux-kernel, linux-staging
In-Reply-To: <20260225234100.152320-1-azpijr@gmail.com>

On Thu, Feb 26, 2026 at 12:40:36AM +0100, Jose A. Perez de Azpillaga wrote:
> The write_reg8_bus8 function uses udelay(100) twice to wait for the

write_reg8_bus8()

> display controller. In non-atomic contexts, fsleep() is the preferred
> API as it automatically selects the most efficient sleeping mechanism
> and avoids busy-waiting.

But how do you know this is the non-atomic context?

> Update both instances to use fsleep(100).

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* Re: [PATCH v11 4/4] rust: gpu: Add GPU buddy allocator bindings
From: Alexandre Courbot @ 2026-02-26  2:26 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, nouveau, rust-for-linux, Nikola Djukic,
	Maarten Lankhorst, Maxime Ripard, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, Alistair Popple, Andrea Righi, Zhi Wang,
	Philipp Stanner, Elle Rhumsaa, alexeyi, Eliot Courtney, linux-doc,
	amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <eff888d1-2caa-45bd-a611-e5772ee94e1b@nvidia.com>

On Thu Feb 26, 2026 at 5:41 AM JST, Joel Fernandes wrote:
>> This structure doesn't seem to be useful. I would understand using one
>> if `GpuBuddyParams` had lots of members, some of which have a sensible
>> default value - then we could implement `Default` and let users fill in
>> the parameters they need.
>>
>> But this structure has no constructor of any sort, requiring users to
>> fill its 3 members manually - which is actually heavier than having 3
>> parameters to `GpuBuddy::new`. It is even deconstructed in
>> `GpuBuddyInner` to store its members as 3 different fields! So let's
>> skip it.
>
> I'd prefer to keep the struct -- all three parameters are `u64`, so
> positional arguments would be easy to silently misorder. The struct
> also makes call sites more readable since Rust has no named function call
> parameters.

Fair point about the 3 parameters being easily confused. If you keep it,
can you also store it in `GpuBuddyInner` instead of deconstructing it
into 3 members?

>
>>> +pub struct GpuBuddyAllocParams {
>>
>> This one also feels like it could be rustified some more.
>>
>> By this I mean that it e.g. allows the user to specify a range even if
>> `RANGE_ALLOCATION` is not set. A C API rejects invalid combinations at
>> runtime. A Rust API should make it impossible to even express them.
>>
>> [...]
>>
>> That would turn `alloc_blocks` into something like:
>>
>>   `fn alloc_blocks(&self, alloc: AllocType, size: u64, min_block_size: Alignment, flags: AllocBlocksFlags)`
>
> The C API supports combining allocation modes with modifiers (e.g.
> RANGE+CLEAR, TOPDOWN+CLEAR), so modeling the mode as a
> mutually-exclusive enum would lose valid combinations. More importantly,

What I suggested does allow you to combine allocation modes with
modifiers. I should have pasted a bit of code for clarity, so here goes:

    #[derive(Copy, Clone, Debug, PartialEq, Eq)]
    pub enum GpuBuddyAllocMode {
        Simple,
        Range { start: u64, end: u64 },
        TopDown,
    }

    impl GpuBuddyAllocMode {
        // Returns the flag corresponding to the allocation mode.
        //
        // Intentionally private - for internal use.
        fn into_flags(self) -> usize {
            match self {
                Self::Simple => 0,
                Self::Range { .. } => bindings::GPU_BUDDY_RANGE_ALLOCATION,
                Self::TopDown => bindings::GPU_BUDDY_TOPDOWN_ALLOCATION,
            }
        }
    }

    impl_flags!(
        #[derive(Copy, Clone, PartialEq, Eq, Default)]
        pub struct GpuBuddyAllocFlags(u32);

        #[derive(Copy, Clone, PartialEq, Eq)]
        pub enum GpuBuddyAllocFlag {
            Contiguous = bindings::GPU_BUDDY_CONTIGUOUS_ALLOCATION as u32,
            Clear = bindings::GPU_BUDDY_CLEAR_ALLOCATION as u32,
            TrimDisable = bindings::GPU_BUDDY_TRIM_DISABLE as u32,
        }
    );

    pub struct GpuBuddyAllocParams {
        mode: GpuBuddyAllocMode,
        size: u64,
        min_block_size: u64,
        flags: GpuBuddyAllocFlags,
    }

Now instead of doing something like:

    let params = GpuBuddyAllocParams {
        start_range_address: 0,
        end_range_address: 0,
        size: SZ_16M as u64,
        min_block_size: SZ_16M as u64,
        buddy_flags: BuddyFlag::TopdownAllocation.into(),
    };

You would have:

    let params = GpuBuddyAllocParams {
        // No unneeded `start_range` and `end_range`!
        mode: GpuBuddyAllocMode::TopDown,
        size: SZ_16M as u64,
        min_block_size: SZ_16M as u64,
        flags: Default::default(),
    };

And for a cleared range allocation:

    let params = GpuBuddyAllocParams {
        mode: GpuBuddyAllocMode::Range {
            start: 0,
            end: SZ_16M as u64,
        },
        size: SZ_16M as u64,
        min_block_size: SZ_16M as u64,
        flags: GpuBuddyAllocFlag::Clear,
    };

Actually the parameters are now distinct enough that you don't need a
type to prevent confusion. A block allocation now just reads like a nice
sentence:

    buddy.alloc_blocks(
        GpuBuddyAllocMode::Range {
            start: 0,
            end: SZ_16M,
        },
        SZ_16M,
        // `min_block_size` should be an `Alignment`, the C API even
        // returns an error if it is not a power of 2.
        Alignment::new::<SZ_16M>(),
        GpuBuddyAllocFlag::Clear,
    )?;

And the job of `alloc_blocks` is also simplified:

    let (start, end) = match mode {
        GpuBuddyAllocMode::Range { start, end } => (start, end),
        _ => (0, 0),
    };
    let flags = mode.into_flags() | u32::from(flags) as usize;
    // ... and just invoke the C API with these parameters.

> if the C allocator evolves its flag semantics (new combinations become
> valid, or existing constraints change), an enum on the Rust side would
> break. It's simpler and more maintainable to pass combinable flags and
> let the C allocator validate -- which it already does. The switch to
> `impl_flags!` will work for us without over-constraining.

The evolution you describe is speculative and unlikely to happen as it
would break all C users just the same. And if the C API adds new flags
or allocation modes, we will have to update the Rust abstraction either
way.

Rust abstractions should model the C API correctly. By hardening the way
the C API can be used and stripping out invalid uses, we save headaches
to users of the API who don't need to worry about whether the flag they
pass will result in an error or simply be ignored, and we also save
maintainer time who don't have to explain the intricacies of their APIs
to confused users. :)

>
>>> +/// The internal [`GpuBuddyGuard`] ensures that the lock is held for all
>>
>> `GpuBuddyGuard` is exported and not internal though.
>
> Fixed, removed "internal" wording.
>
>>> +    base_offset: u64,
>>
>> This does not appear to be used in the C API - does it belong here? It
>> looks like an additional convenience, but I'm not convinced that's the
>> role of this type to provide this. But if it really is needed by all
>> users (guess I'll find out after looking the Nova code :)), then keeping
>> it is fair I guess.
>
> Yes, `base_offset` is needed by nova-core. The GPU's usable VRAM
> starts at `usable_vram_start` from the GSP firmware parameters:
>
>     GpuBuddyParams {
>         base_offset: params.usable_vram_start,
>         physical_memory_size: params.usable_vram_size,
>         chunk_size: SZ_4K.into_safe_cast(),
>     }
>
> `AllocatedBlock::offset()` then adds `base_offset` to return absolute
> VRAM addresses, so callers don't need to track the offset themselves.

Sounds fair, I'll check how this is used in Nova.

Ah, another thing I've noticed while writing the example above:

> +#[pinned_drop]
> +impl PinnedDrop for AllocatedBlocks {
> +    fn drop(self: Pin<&mut Self>) {
> +        let guard = self.buddy.lock();
> +
> +        // SAFETY:
> +        // - list is valid per the type's invariants.
> +        // - guard provides exclusive access to the allocator.
> +        // CAST: BuddyFlags were validated to fit in u32 at construction.
> +        unsafe {
> +            bindings::gpu_buddy_free_list(
> +                guard.as_raw(),
> +                self.list.as_raw(),
> +                self.flags.as_raw() as u32,

`gpu_buddy_free_list` only expects the `CLEARED` flag - actually it
silently masks other flags. So you probably want to just pass `0` here -
adding a `Cleared` field to `GpuBuddyAllocFlag` would also do the trick,
but it looks risky to me as it relies on the promise that the user has
cleared the buffer, which is not something we can guarantee. So I don't
think we can support this safely.

If you just pass `0`, then the `flags` member of `AllocatedBlocks`
becomes unused and you can just drop it.

And another small one - some methods of `Block` are `pub(crate)` - I
believe they should either be `pub` or kept private.

^ permalink raw reply

* Re: [PATCH v11 2/2] rust: clist: Add support to interface with C linked lists
From: Joel Fernandes @ 2026-02-26  0:24 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Alex Gaynor, Danilo Krummrich, Dave Airlie, Maarten Lankhorst,
	Maxime Ripard, Simona Vetter, Daniel Almeida, Koen Koning,
	Nikola Djukic, Philipp Stanner, Elle Rhumsaa, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alistair Popple, Andrea Righi, Zhi Wang, alexeyi, Eliot Courtney,
	dri-devel, nouveau, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev
In-Reply-To: <DGNW1KH6TCE1.3DIVLKYG6OURI@nvidia.com>

On Wed, 25 Feb 2026 17:09:22 +0900, Alexandre Courbot wrote:
> On Wed Feb 25, 2026 at 7:27 AM JST, Joel Fernandes wrote:
>> Add a new module `clist` for working with C's doubly circular linked
>> lists. Provide low-level iteration over list nodes.

<snip>

> +        // SAFETY: `item_ptr` calculation from `OFFSET` (calculated using offset_of!)
>
> `item_ptr` does not exist.

Good catch, thanks! Fixed.

-- 
Joel Fernandes


^ permalink raw reply

* [PATCH v2] staging: fbtft: fb_ra8875: replace udelays with fsleep
From: Jose A. Perez de Azpillaga @ 2026-02-25 23:40 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Andy Shevchenko
  Cc: Jose A . Perez de Azpillaga, dri-devel, linux-fbdev, linux-kernel,
	linux-staging
In-Reply-To: <20260225204602.134218-1-azpijr@gmail.com>

The write_reg8_bus8 function uses udelay(100) twice to wait for the
display controller. In non-atomic contexts, fsleep() is the preferred
API as it automatically selects the most efficient sleeping mechanism
and avoids busy-waiting.

Update both instances to use fsleep(100).

Signed-off-by: Jose A. Perez de Azpillaga <azpijr@gmail.com>
---
v2: Switch from usleep_range() to fsleep() for modernizing delay calls.
---
 drivers/staging/fbtft/fb_ra8875.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/fbtft/fb_ra8875.c b/drivers/staging/fbtft/fb_ra8875.c
index 0ab1de6647d0..fe95ec6da064 100644
--- a/drivers/staging/fbtft/fb_ra8875.c
+++ b/drivers/staging/fbtft/fb_ra8875.c
@@ -210,7 +210,7 @@ static void write_reg8_bus8(struct fbtft_par *par, int len, ...)
 	}
 	len--;

-	udelay(100);
+	fsleep(100);

 	if (len) {
 		buf = (u8 *)par->buf;
@@ -231,7 +231,7 @@ static void write_reg8_bus8(struct fbtft_par *par, int len, ...)

 	/* restore user spi-speed */
 	par->fbtftops.write = fbtft_write_spi;
-	udelay(100);
+	fsleep(100);
 }

 static int write_vmem16_bus8(struct fbtft_par *par, size_t offset, size_t len)
--
2.53.0


^ permalink raw reply related

* Re: [PATCH] staging: fbtft: fb_ra8875: replace udelays with usleep_range
From: Greg KH @ 2026-02-25 21:14 UTC (permalink / raw)
  To: Jose A. Perez de Azpillaga
  Cc: andy, dri-devel, linux-fbdev, linux-staging, linux-kernel
In-Reply-To: <20260225204602.134218-1-azpijr@gmail.com>

On Wed, Feb 25, 2026 at 09:45:59PM +0100, Jose A. Perez de Azpillaga wrote:
> The write_reg8_bus8 function uses udelay(100) twice to wait for the
> display controller. For delays of this duration in non-atomic context,
> usleep_range() is preferred as it avoids busy-waiting.
> 
> Update both instances to use usleep_range(100, 120), allowing the
> scheduler to utilize the CPU during these wait periods.
> 
> Signed-off-by: Jose A. Perez de Azpillaga <azpijr@gmail.com>
> ---
>  drivers/staging/fbtft/fb_ra8875.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/fbtft/fb_ra8875.c b/drivers/staging/fbtft/fb_ra8875.c
> index 0ab1de6647d0..6058934e2ca2 100644
> --- a/drivers/staging/fbtft/fb_ra8875.c
> +++ b/drivers/staging/fbtft/fb_ra8875.c
> @@ -210,7 +210,7 @@ static void write_reg8_bus8(struct fbtft_par *par, int len, ...)
>  	}
>  	len--;
>  
> -	udelay(100);
> +	usleep_range(100, 120);
>  
>  	if (len) {
>  		buf = (u8 *)par->buf;
> @@ -231,7 +231,7 @@ static void write_reg8_bus8(struct fbtft_par *par, int len, ...)
>  
>  	/* restore user spi-speed */
>  	par->fbtftops.write = fbtft_write_spi;
> -	udelay(100);
> +	usleep_range(100, 120);
>  }
>  
>  static int write_vmem16_bus8(struct fbtft_par *par, size_t offset, size_t len)
> -- 
> 2.53.0
> 
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You sent a patch that has been sent multiple times in the past few
  days, and is identical to ones that has been recently rejected.
  Please always look at the mailing list traffic to determine if you are
  duplicating other people's work.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

^ permalink raw reply

* [PATCH] staging: fbtft: fb_ra8875: replace udelays with usleep_range
From: Jose A. Perez de Azpillaga @ 2026-02-25 20:45 UTC (permalink / raw)
  To: andy, gregkh; +Cc: dri-devel, linux-fbdev, linux-staging, linux-kernel, azpijr

The write_reg8_bus8 function uses udelay(100) twice to wait for the
display controller. For delays of this duration in non-atomic context,
usleep_range() is preferred as it avoids busy-waiting.

Update both instances to use usleep_range(100, 120), allowing the
scheduler to utilize the CPU during these wait periods.

Signed-off-by: Jose A. Perez de Azpillaga <azpijr@gmail.com>
---
 drivers/staging/fbtft/fb_ra8875.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/fbtft/fb_ra8875.c b/drivers/staging/fbtft/fb_ra8875.c
index 0ab1de6647d0..6058934e2ca2 100644
--- a/drivers/staging/fbtft/fb_ra8875.c
+++ b/drivers/staging/fbtft/fb_ra8875.c
@@ -210,7 +210,7 @@ static void write_reg8_bus8(struct fbtft_par *par, int len, ...)
 	}
 	len--;
 
-	udelay(100);
+	usleep_range(100, 120);
 
 	if (len) {
 		buf = (u8 *)par->buf;
@@ -231,7 +231,7 @@ static void write_reg8_bus8(struct fbtft_par *par, int len, ...)
 
 	/* restore user spi-speed */
 	par->fbtftops.write = fbtft_write_spi;
-	udelay(100);
+	usleep_range(100, 120);
 }
 
 static int write_vmem16_bus8(struct fbtft_par *par, size_t offset, size_t len)
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v11 4/4] rust: gpu: Add GPU buddy allocator bindings
From: Joel Fernandes @ 2026-02-25 20:41 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, nouveau, rust-for-linux, Nikola Djukic,
	Maarten Lankhorst, Maxime Ripard, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, Alistair Popple, Andrea Righi, Zhi Wang,
	Philipp Stanner, Elle Rhumsaa, alexeyi, Eliot Courtney, linux-doc,
	amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <DGO4BIQ6MQ9Y.KB3JJQI71ETU@nvidia.com>

Hi Alex,

On Wed, Feb 25, 2026 at 11:38:31PM +0900, Alexandre Courbot wrote:
>> +//! # Examples
>> +//!
>> +//! ``` [..]
>
> This is a very long example illustrating many use-cases. It is long
> enough that it is difficult to grasp where each example start. Can I
> suggest to aerate it a bit by splitting it into several examples, with a
> bit of regular text explaining what each example does, similarly to the
> documentation of the `Bounded` type?
>
> You can hide the creation of the `GpuBuddy` after the first example to
> keep things concise.

Done. Split into four separate examples with descriptive text between
them. Subsequent examples hide imports and buddy creation with # lines,
and adjust based on your other suggestions.

>> +//!     buddy_flags: BuddyFlags::try_new(BuddyFlags::RANGE_ALLOCATION)?,
>
> Why can a `BuddyFlags` creation fail if we give it a valid value? It
> looks like its consts should be of the type `BuddyFlags` themselves so
> we can use them directly. Actually, we should probably use `impl_flags!`
> for it.

Good point. Switched to `impl_flags!`. I made it wrap `u32`
and individual flags are `BuddyFlag` enum variants. Construction is
infallible. Invalid combinations will be caught by the C allocator
at alloc time.

>> +//! for block in topdown.iter() {
>> +//!     assert_eq!(block.offset(), (SZ_1G - SZ_16M) as u64);
>> +//!     assert_eq!(block.order(), 12); // 2^12 pages
>> +//!     assert_eq!(block.size(), SZ_16M as u64);
>> +//! }
>
> IIUC there should be only one block here, right? That for loop should be
> replaced by a call to `next()` followed by another one checking that the
> result is `None` to be a valid test.

Ah, good point, fixed as follows:
  let block = topdown.iter().next().expect("expected one block");
  assert_eq!(block.offset(), (SZ_1G - SZ_16M) as u64);
  assert_eq!(block.order(), 12);
  assert_eq!(block.size(), SZ_16M as u64);

>> +//! drop(topdown);
>
> Here is a good chance to mention that dropping the allocation will
> return it - it's expected, but not entirely obvious when you read this
> for the first time.

Added a comment: "Dropping the allocation returns the memory to the
buddy allocator."

>> +//! params.buddy_flags = BuddyFlags::try_new(BuddyFlags::RANGE_ALLOCATION)?;
>
> Let's recreate the params for each example to make it self-contained
> instead of modifying the first one.

Done, each example now creates its own self-contained params.

>> +        if flags > u32::MAX as usize {
>
> These `as` conversions are unfortunate - I will try to graduate the
> infallible convertors from Nova into kernel soon so we can avoid them,
> but for now I guess there's nothing we can do unfortunately.

Since I switched to `impl_flags!` with `u32`, the `u32::MAX` check
is gone.

>> +        if (flags & Self::RANGE_ALLOCATION) != 0 && (flags & Self::TOPDOWN_ALLOCATION) != 0 {
>> +            return Err(EINVAL);
>> +        }
>
> This indicates that we should use the type system to prevent such
> constructs from even being attempted - more on this on
> `GpuBuddyAllocParams`.

The C API supports flag combinations (e.g. RANGE+CLEAR, TOPDOWN+CLEAR),
so we model flags as combinable bitflags via `impl_flags!` as you suggested,
rather than a mutually-exclusive enum. Invalid combinations are caught by the C
allocator at runtime. Also fixed a bug here: RANGE+TOPDOWN is valid in C
(TOPDOWN is just unused in the range path), so the old rejection was wrong.
Removed it.

>> +    pub base_offset_bytes: u64,
>
> Let's remove the `_bytes` suffix. Units can be specified in the
> doccomment so they are readily available without making the code
> heavier (`dma.rs` for instance does this).

Done.

>> +pub struct GpuBuddyParams {
>
> This structure doesn't seem to be useful. I would understand using one
> if `GpuBuddyParams` had lots of members, some of which have a sensible
> default value - then we could implement `Default` and let users fill in
> the parameters they need.
>
> But this structure has no constructor of any sort, requiring users to
> fill its 3 members manually - which is actually heavier than having 3
> parameters to `GpuBuddy::new`. It is even deconstructed in
> `GpuBuddyInner` to store its members as 3 different fields! So let's
> skip it.

I'd prefer to keep the struct -- all three parameters are `u64`, so
positional arguments would be easy to silently misorder. The struct
also makes call sites more readable since Rust has no named function call
parameters.

>> +pub struct GpuBuddyAllocParams {
>
> This one also feels like it could be rustified some more.
>
> By this I mean that it e.g. allows the user to specify a range even if
> `RANGE_ALLOCATION` is not set. A C API rejects invalid combinations at
> runtime. A Rust API should make it impossible to even express them.
>
> [...]
>
> That would turn `alloc_blocks` into something like:
>
>   `fn alloc_blocks(&self, alloc: AllocType, size: u64, min_block_size: Alignment, flags: AllocBlocksFlags)`

The C API supports combining allocation modes with modifiers (e.g.
RANGE+CLEAR, TOPDOWN+CLEAR), so modeling the mode as a
mutually-exclusive enum would lose valid combinations. More importantly,
if the C allocator evolves its flag semantics (new combinations become
valid, or existing constraints change), an enum on the Rust side would
break. It's simpler and more maintainable to pass combinable flags and
let the C allocator validate -- which it already does. The switch to
`impl_flags!` will work for us without over-constraining.

>> +/// The internal [`GpuBuddyGuard`] ensures that the lock is held for all
>
> `GpuBuddyGuard` is exported and not internal though.

Fixed, removed "internal" wording.

>> +    base_offset: u64,
>
> This does not appear to be used in the C API - does it belong here? It
> looks like an additional convenience, but I'm not convinced that's the
> role of this type to provide this. But if it really is needed by all
> users (guess I'll find out after looking the Nova code :)), then keeping
> it is fair I guess.

Yes, `base_offset` is needed by nova-core. The GPU's usable VRAM
starts at `usable_vram_start` from the GSP firmware parameters:

    GpuBuddyParams {
        base_offset: params.usable_vram_start,
        physical_memory_size: params.usable_vram_size,
        chunk_size: SZ_4K.into_safe_cast(),
    }

`AllocatedBlock::offset()` then adds `base_offset` to return absolute
VRAM addresses, so callers don't need to track the offset themselves.

>> +/// # Invariants
>> +///
>> +/// The inner `_guard` holds the lock for the duration of this guard's lifetime.
>
> Private members should not be mentioned in public documentation. Also is
> this invariant ever referenced when enforced and to justify an unsafe
> block? If not I don't think there is a point in having it.
>>
>> +pub(crate) struct GpuBuddyGuard<'a> {
>
> IIUC this type can be private.

Done, made `GpuBuddyGuard` private and converted to a regular `//` comment.

>> +    pub fn free_memory_bytes(&self) -> u64 {
>
> Same as struct members, the unit doesn't need to be in the method name -
> the doccomment is sufficient.

Renamed.

>> +pub struct GpuBuddy(Arc<GpuBuddyInner>);
>
> Most people looking for the documentation will reach it through
> `GpuBuddy`. I think we should either move the module-level documentation
> to this type, or add a reference to the module so users can easily find
> how to use it.

Ok, I refer to the module-level doc on the struct.

>> +        let start = params.start_range_address;
>> +        let end = params.end_range_address;
>> +        let size = params.size_bytes;
>> +        let min_block_size = params.min_block_size_bytes;
>> +        let flags = params.buddy_flags;
>
> These local variables are required so the closure below is not confused
> by the lifetime of `params`. But since you are copying its content
> anyway, you could just make `GpuBuddyAllocParams` derive `Copy`, pass
> `params` by value, and use its members directly in the closure.

What I will do is just pass it by value (instead of the reference that I'm
currently passing), and then let the compiler decide if it needs to make copies
or not. In the future, if we change it to not making copies inside the function,
then it will just fallback to a non-copy move. However, if implemented as copy
trait, it might always be copied.

Actually, it turns out that when I pass it by value, I can also get rid of the
intermediate variables so this is great and has the effect you were intending!

>> +        // SAFETY: list contains gpu_buddy_block items linked via __bindgen_anon_1.link.
>
> IIUC the type invariant should be invoked explicitly as we are using it
> to justify this unsafe block (i.e. "per the type invariant, ...").

Fixed.

>> +                self.flags.as_raw() as u32,
>
> You won't need this `as` if you make `BuddyFlags` wrap a `u32` instead
> of a `usize`.

Yes! Done.

>> +// SAFETY: `Block` is not modified after allocation for the lifetime
>> +// of `AllocatedBlock`.
>> +unsafe impl Send for Block {}
>
> This safety comment should not need to reference another type - how
> about something like "`Block` is a wrapper around `gpu_buddy_block`
> which can be sent across threads safely".

Fixed.

>> +// SAFETY: `Block` is not modified after allocation for the lifetime
>> +// of `AllocatedBlock`.
>> +unsafe impl Sync for Block {}
>
> Here as well. I'd say that the block is only accessed through shared
> references after allocation, and thus safe to access concurrently across
> threads.

Fixed.

Thanks for the thorough review! Looks a lot better now, and I am looking forward
to sending the next revision soon.

-- 
Joel Fernandes

^ permalink raw reply

* Re: [PATCH v11 4/4] rust: gpu: Add GPU buddy allocator bindings
From: Alexandre Courbot @ 2026-02-25 14:38 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Dave Airlie, Daniel Almeida,
	Koen Koning, dri-devel, nouveau, rust-for-linux, Nikola Djukic,
	Maarten Lankhorst, Maxime Ripard, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian König, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellström,
	Helge Deller, Alex Gaynor, Boqun Feng, Alistair Popple,
	Andrea Righi, Zhi Wang, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev
In-Reply-To: <20260224224005.3232841-5-joelagnelf@nvidia.com>

On Wed Feb 25, 2026 at 7:40 AM JST, Joel Fernandes wrote:
> Add safe Rust abstractions over the Linux kernel's GPU buddy
> allocator for physical memory management. The GPU buddy allocator
> implements a binary buddy system useful for GPU physical memory
> allocation. nova-core will use it for physical memory allocation.
>
> Cc: Nikola Djukic <ndjukic@nvidia.com>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

<snip>
> diff --git a/rust/kernel/gpu/buddy.rs b/rust/kernel/gpu/buddy.rs
> new file mode 100644
> index 000000000000..4383f82c0fc1
> --- /dev/null
> +++ b/rust/kernel/gpu/buddy.rs
> @@ -0,0 +1,536 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! GPU buddy allocator bindings.
> +//!
> +//! C header: [`include/linux/gpu_buddy.h`](srctree/include/linux/gpu_buddy.h)
> +//!
> +//! This module provides Rust abstractions over the Linux kernel's GPU buddy
> +//! allocator, which implements a binary buddy memory allocator.
> +//!
> +//! The buddy allocator manages a contiguous address space and allocates blocks
> +//! in power-of-two sizes, useful for GPU physical memory management.
> +//!
> +//! # Examples
> +//!
> +//! ```

This is a very long example illustrating many use-cases. It is long
enough that it is difficult to grasp where each example start. Can I
suggest to aerate it a bit by splitting it into several examples, with a
bit of regular text explaining what each example does, similarly to the
documentation of the `Bounded` type?

You can hide the creation of the `GpuBuddy` after the first example to
keep things concise.

> +//! use kernel::{
> +//!     gpu::buddy::{BuddyFlags, GpuBuddy, GpuBuddyAllocParams, GpuBuddyParams},
> +//!     prelude::*,
> +//!     sizes::*, //
> +//! };
> +//!
> +//! // Create a 1GB buddy allocator with 4KB minimum chunk size.
> +//! let buddy = GpuBuddy::new(GpuBuddyParams {
> +//!     base_offset_bytes: 0,
> +//!     physical_memory_size_bytes: SZ_1G as u64,
> +//!     chunk_size_bytes: SZ_4K as u64,
> +//! })?;
> +//!
> +//! // Verify initial state.
> +//! assert_eq!(buddy.size(), SZ_1G as u64);
> +//! assert_eq!(buddy.chunk_size(), SZ_4K as u64);
> +//! let initial_free = buddy.free_memory_bytes();
> +//!
> +//! // Base allocation params - mutated between calls for field overrides.
> +//! let mut params = GpuBuddyAllocParams {
> +//!     start_range_address: 0,
> +//!     end_range_address: 0,   // Entire range.
> +//!     size_bytes: SZ_16M as u64,
> +//!     min_block_size_bytes: SZ_16M as u64,
> +//!     buddy_flags: BuddyFlags::try_new(BuddyFlags::RANGE_ALLOCATION)?,

Why can a `BuddyFlags` creation fail if we give it a valid value? It
looks like its consts should be of the type `BuddyFlags` themselves so
we can use them directly. Actually, we should probably use `impl_flags!`
for it.

> +//! };
> +//!
> +//! // Test top-down allocation (allocates from highest addresses).
> +//! params.buddy_flags = BuddyFlags::try_new(BuddyFlags::TOPDOWN_ALLOCATION)?;
> +//! let topdown = KBox::pin_init(buddy.alloc_blocks(&params), GFP_KERNEL)?;
> +//! assert_eq!(buddy.free_memory_bytes(), initial_free - SZ_16M as u64);
> +//!
> +//! for block in topdown.iter() {
> +//!     assert_eq!(block.offset(), (SZ_1G - SZ_16M) as u64);
> +//!     assert_eq!(block.order(), 12); // 2^12 pages
> +//!     assert_eq!(block.size(), SZ_16M as u64);
> +//! }

IIUC there should be only one block here, right? That for loop should be
replaced by a call to `next()` followed by another one checking that the
result is `None` to be a valid test.

> +//! drop(topdown);

Here is a good chance to mention that dropping the allocation will
return it - it's expected, but not entirely obvious when you read this
for the first time.

> +//! assert_eq!(buddy.free_memory_bytes(), initial_free);
> +//!
> +//! // Allocate 16MB - should result in a single 16MB block at offset 0.
> +//! params.buddy_flags = BuddyFlags::try_new(BuddyFlags::RANGE_ALLOCATION)?;

Let's recreate the params for each example to make it self-contained
instead of modifying the first one.

> +//! let allocated = KBox::pin_init(buddy.alloc_blocks(&params), GFP_KERNEL)?;
> +//! assert_eq!(buddy.free_memory_bytes(), initial_free - SZ_16M as u64);
> +//!
> +//! for block in allocated.iter() {
> +//!     assert_eq!(block.offset(), 0);
> +//!     assert_eq!(block.order(), 12); // 2^12 pages
> +//!     assert_eq!(block.size(), SZ_16M as u64);
> +//! }
> +//! drop(allocated);
> +//! assert_eq!(buddy.free_memory_bytes(), initial_free);
> +//!
> +//! // Test non-contiguous allocation with fragmented memory.
> +//! // Create fragmentation by allocating 4MB blocks at [0,4M) and [8M,12M).
> +//! params.end_range_address = SZ_4M as u64;
> +//! params.size_bytes = SZ_4M as u64;
> +//! params.min_block_size_bytes = SZ_4M as u64;
> +//! let frag1 = KBox::pin_init(buddy.alloc_blocks(&params), GFP_KERNEL)?;
> +//! assert_eq!(buddy.free_memory_bytes(), initial_free - SZ_4M as u64);
> +//!
> +//! params.start_range_address = SZ_8M as u64;
> +//! params.end_range_address = (SZ_8M + SZ_4M) as u64;
> +//! let frag2 = KBox::pin_init(buddy.alloc_blocks(&params), GFP_KERNEL)?;
> +//! assert_eq!(buddy.free_memory_bytes(), initial_free - SZ_8M as u64);
> +//!
> +//! // Allocate 8MB without CONTIGUOUS - should return 2 blocks from the holes.
> +//! params.start_range_address = 0;
> +//! params.end_range_address = SZ_16M as u64;
> +//! params.size_bytes = SZ_8M as u64;
> +//! let fragmented = KBox::pin_init(buddy.alloc_blocks(&params), GFP_KERNEL)?;
> +//! assert_eq!(buddy.free_memory_bytes(), initial_free - (SZ_16M) as u64);
> +//!
> +//! let (mut count, mut total) = (0u32, 0u64);
> +//! for block in fragmented.iter() {
> +//!     // The 8MB allocation should return 2 blocks, each 4MB.
> +//!     assert_eq!(block.size(), SZ_4M as u64);
> +//!     total += block.size();
> +//!     count += 1;
> +//! }
> +//! assert_eq!(total, SZ_8M as u64);
> +//! assert_eq!(count, 2);
> +//! drop(fragmented);
> +//! drop(frag2);
> +//! drop(frag1);
> +//! assert_eq!(buddy.free_memory_bytes(), initial_free);
> +//!
> +//! // Test CONTIGUOUS failure when only fragmented space available.
> +//! // Create a small buddy allocator with only 16MB of memory.
> +//! let small = GpuBuddy::new(GpuBuddyParams {
> +//!     base_offset_bytes: 0,
> +//!     physical_memory_size_bytes: SZ_16M as u64,
> +//!     chunk_size_bytes: SZ_4K as u64,
> +//! })?;
> +//!
> +//! // Allocate 4MB blocks at [0,4M) and [8M,12M) to create fragmented memory.
> +//! params.start_range_address = 0;
> +//! params.end_range_address = SZ_4M as u64;
> +//! params.size_bytes = SZ_4M as u64;
> +//! let hole1 = KBox::pin_init(small.alloc_blocks(&params), GFP_KERNEL)?;
> +//!
> +//! params.start_range_address = SZ_8M as u64;
> +//! params.end_range_address = (SZ_8M + SZ_4M) as u64;
> +//! let hole2 = KBox::pin_init(small.alloc_blocks(&params), GFP_KERNEL)?;
> +//!
> +//! // 8MB contiguous should fail - only two non-contiguous 4MB holes exist.
> +//! params.start_range_address = 0;
> +//! params.end_range_address = 0;
> +//! params.size_bytes = SZ_8M as u64;
> +//! params.buddy_flags = BuddyFlags::try_new(BuddyFlags::CONTIGUOUS_ALLOCATION)?;
> +//! let result = KBox::pin_init(small.alloc_blocks(&params), GFP_KERNEL);
> +//! assert!(result.is_err());
> +//! drop(hole2);
> +//! drop(hole1);
> +//!
> +//! # Ok::<(), Error>(())
> +//! ```
> +
> +use crate::{
> +    bindings,
> +    clist_create,
> +    error::to_result,
> +    ffi::clist::CListHead,
> +    new_mutex,
> +    prelude::*,
> +    sync::{
> +        lock::mutex::MutexGuard,
> +        Arc,
> +        Mutex, //
> +    },
> +    types::Opaque, //
> +};
> +
> +/// Flags for GPU buddy allocator operations.
> +///
> +/// These flags control the allocation behavior of the buddy allocator.
> +#[derive(Clone, Copy, Default, PartialEq, Eq)]
> +pub struct BuddyFlags(usize);
> +
> +impl BuddyFlags {
> +    /// Range-based allocation from start to end addresses.
> +    pub const RANGE_ALLOCATION: usize = bindings::GPU_BUDDY_RANGE_ALLOCATION;
> +
> +    /// Allocate from top of address space downward.
> +    pub const TOPDOWN_ALLOCATION: usize = bindings::GPU_BUDDY_TOPDOWN_ALLOCATION;
> +
> +    /// Allocate physically contiguous blocks.
> +    pub const CONTIGUOUS_ALLOCATION: usize = bindings::GPU_BUDDY_CONTIGUOUS_ALLOCATION;
> +
> +    /// Request allocation from the cleared (zeroed) memory. The zero'ing is not
> +    /// done by the allocator, but by the caller before freeing old blocks.
> +    pub const CLEAR_ALLOCATION: usize = bindings::GPU_BUDDY_CLEAR_ALLOCATION;
> +
> +    /// Disable trimming of partially used blocks.
> +    pub const TRIM_DISABLE: usize = bindings::GPU_BUDDY_TRIM_DISABLE;
> +
> +    /// Mark blocks as cleared (zeroed) when freeing. When set during free,
> +    /// indicates that the caller has already zeroed the memory.
> +    pub const CLEARED: usize = bindings::GPU_BUDDY_CLEARED;
> +
> +    /// Create [`BuddyFlags`] from a raw value with validation.
> +    ///
> +    /// Use `|` operator to combine flags if needed, before calling this method.
> +    pub fn try_new(flags: usize) -> Result<Self> {
> +        // Flags must not exceed u32::MAX to satisfy the GPU buddy allocator C API.
> +        if flags > u32::MAX as usize {

These `as` conversions are unfortunate - I will try to graduate the
infallible convertors from Nova into kernel soon so we can avoid them,
but for now I guess there's nothing we can do unfortunately.

> +            return Err(EINVAL);
> +        }
> +
> +        // `TOPDOWN_ALLOCATION` only works without `RANGE_ALLOCATION`. When both are
> +        // set, `TOPDOWN_ALLOCATION` is silently ignored by the allocator. Reject this.
> +        if (flags & Self::RANGE_ALLOCATION) != 0 && (flags & Self::TOPDOWN_ALLOCATION) != 0 {
> +            return Err(EINVAL);
> +        }

This indicates that we should use the type system to prevent such
constructs from even being attempted - more on this on
`GpuBuddyAllocParams`.

> +
> +        Ok(Self(flags))
> +    }
> +
> +    /// Get raw value of the flags.
> +    pub(crate) fn as_raw(self) -> usize {
> +        self.0
> +    }
> +}
> +
> +/// Parameters for creating a GPU buddy allocator.
> +pub struct GpuBuddyParams {
> +    /// Base offset in bytes where the managed memory region starts.
> +    /// Allocations will be offset by this value.
> +    pub base_offset_bytes: u64,

Let's remove the `_bytes` suffix. Units can be specified in the
doccomment so they are readily available without making the code
heavier (`dma.rs` for instance does this).

> +    /// Total physical memory size managed by the allocator in bytes.
> +    pub physical_memory_size_bytes: u64,
> +    /// Minimum allocation unit / chunk size in bytes, must be >= 4KB.
> +    pub chunk_size_bytes: u64,
> +}

This structure doesn't seem to be useful. I would understand using one
if `GpuBuddyParams` had lots of members, some of which have a sensible
default value - then we could implement `Default` and let users fill in
the parameters they need.

But this structure has no constructor of any sort, requiring users to
fill its 3 members manually - which is actually heavier than having 3
parameters to `GpuBuddy::new`. It is even deconstructed in
`GpuBuddyInner` to store its members as 3 different fields! So let's
skip it.

> +
> +/// Parameters for allocating blocks from a GPU buddy allocator.
> +pub struct GpuBuddyAllocParams {
> +    /// Start of allocation range in bytes. Use 0 for beginning.
> +    pub start_range_address: u64,
> +    /// End of allocation range in bytes. Use 0 for entire range.
> +    pub end_range_address: u64,
> +    /// Total size to allocate in bytes.
> +    pub size_bytes: u64,
> +    /// Minimum block size for fragmented allocations in bytes.
> +    pub min_block_size_bytes: u64,
> +    /// Buddy allocator behavior flags.
> +    pub buddy_flags: BuddyFlags,
> +}

This one also feels like it could be rustified some more.

By this I mean that it e.g. allows the user to specify a range even if
`RANGE_ALLOCATION` is not set. A C API rejects invalid combinations at
runtime. A Rust API should make it impossible to even express them.

IIUC the flags mix the allocation type (simple, range, topdown) with
some orthogonal properties (contiguous, cleared, trim_disable). There is
also one bit (`GPU_BUDDY_CLEARED`) that is not relevant for allocation,
but is freeing a block. We want to use the type system to only allow states
that make sense to be constructed.

The allocation type can be expressed using a three-state enum. Start and
end range only make sense for range allocations, so they would be part
of the `Range` variant.

`size` and `min_block_size` (let's move the unit from the name into the
doccomment) are always relevant, and should be regular arguments of
their own. If `min_block_sizes` only accepts certain values (the C
documentation says "alignment", so power of two?) then it should also
use the relevant type for that (our own `kernel::ptr::Alignment` type?).

`flags` should be its own type (using `impl_flags` again?) allowing a
combination of Contiguous, Cleared, and TrimDisable.

That would turn `alloc_blocks` into something like:

  `fn alloc_blocks(&self, alloc: AllocType, size: u64, min_block_size: Alignment, flags: AllocBlocksFlags)`

(`min_block_size` is a u64 in the C API, but for an alignment a `usize`
is enough and the conversion can be lossless).

> +
> +/// Inner structure holding the actual buddy allocator.
> +///
> +/// # Synchronization
> +///
> +/// The C `gpu_buddy` API requires synchronization (see `include/linux/gpu_buddy.h`).
> +/// The internal [`GpuBuddyGuard`] ensures that the lock is held for all

`GpuBuddyGuard` is exported and not internal though.

> +/// allocator and free operations, preventing races between concurrent allocations
> +/// and the freeing that occurs when [`AllocatedBlocks`] is dropped.
> +///
> +/// # Invariants
> +///
> +/// The inner [`Opaque`] contains a valid, initialized buddy allocator.
> +#[pin_data(PinnedDrop)]
> +struct GpuBuddyInner {
> +    #[pin]
> +    inner: Opaque<bindings::gpu_buddy>,
> +
> +    // TODO: Replace `Mutex<()>` with `Mutex<Opaque<..>>` once `Mutex::new()`
> +    // accepts `impl PinInit<T>`.
> +    #[pin]
> +    lock: Mutex<()>,
> +    /// Base offset for all allocations (does not change after init).
> +    base_offset: u64,

This does not appear to be used in the C API - does it belong here? It
looks like an additional convenience, but I'm not convinced that's the
role of this type to provide this. But if it really is needed by all
users (guess I'll find out after looking the Nova code :)), then keeping
it is fair I guess.

> +    /// Cached chunk size (does not change after init).
> +    chunk_size: u64,
> +    /// Cached total size (does not change after init).
> +    size: u64,
> +}
> +
> +impl GpuBuddyInner {
> +    /// Create a pin-initializer for the buddy allocator.
> +    fn new(params: GpuBuddyParams) -> impl PinInit<Self, Error> {
> +        let base_offset = params.base_offset_bytes;
> +        let size = params.physical_memory_size_bytes;
> +        let chunk_size = params.chunk_size_bytes;
> +
> +        try_pin_init!(Self {
> +            inner <- Opaque::try_ffi_init(|ptr| {
> +                // SAFETY: ptr points to valid uninitialized memory from the pin-init
> +                // infrastructure. gpu_buddy_init will initialize the structure.
> +                to_result(unsafe { bindings::gpu_buddy_init(ptr, size, chunk_size) })
> +            }),
> +            lock <- new_mutex!(()),
> +            base_offset: base_offset,
> +            chunk_size: chunk_size,
> +            size: size,
> +        })
> +    }
> +
> +    /// Lock the mutex and return a guard for accessing the allocator.
> +    fn lock(&self) -> GpuBuddyGuard<'_> {
> +        GpuBuddyGuard {
> +            inner: self,
> +            _guard: self.lock.lock(),
> +        }
> +    }
> +}
> +
> +#[pinned_drop]
> +impl PinnedDrop for GpuBuddyInner {
> +    fn drop(self: Pin<&mut Self>) {
> +        let guard = self.lock();
> +
> +        // SAFETY: guard provides exclusive access to the allocator.
> +        unsafe {
> +            bindings::gpu_buddy_fini(guard.as_raw());
> +        }
> +    }
> +}
> +
> +// SAFETY: GpuBuddyInner can be sent between threads.
> +unsafe impl Send for GpuBuddyInner {}
> +
> +// SAFETY: GpuBuddyInner is `Sync` because the internal GpuBuddyGuard
> +// serializes all access to the C allocator, preventing data races.
> +unsafe impl Sync for GpuBuddyInner {}
> +
> +/// Guard that proves the lock is held, enabling access to the allocator.
> +///
> +/// # Invariants
> +///
> +/// The inner `_guard` holds the lock for the duration of this guard's lifetime.

Private members should not be mentioned in public documentation. Also is
this invariant ever referenced when enforced and to justify an unsafe
block? If not I don't think there is a point in having it.

> +pub(crate) struct GpuBuddyGuard<'a> {

IIUC this type can be private.

> +    inner: &'a GpuBuddyInner,
> +    _guard: MutexGuard<'a, ()>,
> +}
> +
> +impl GpuBuddyGuard<'_> {
> +    /// Get a raw pointer to the underlying C `gpu_buddy` structure.
> +    fn as_raw(&self) -> *mut bindings::gpu_buddy {
> +        self.inner.inner.get()
> +    }
> +}
> +
> +/// GPU buddy allocator instance.
> +///
> +/// This structure wraps the C `gpu_buddy` allocator using reference counting.
> +/// The allocator is automatically cleaned up when all references are dropped.
> +///
> +/// # Invariants
> +///
> +/// The inner [`Arc`] points to a valid, initialized GPU buddy allocator.

Most people looking for the documentation will reach it through
`GpuBuddy`. I think we should either move the module-level documentation
to this type, or add a reference to the module so users can easily find
how to use it.

> +pub struct GpuBuddy(Arc<GpuBuddyInner>);
> +
> +impl GpuBuddy {
> +    /// Create a new buddy allocator.
> +    ///
> +    /// Creates a buddy allocator that manages a contiguous address space of the given
> +    /// size, with the specified minimum allocation unit (chunk_size must be at least 4KB).
> +    pub fn new(params: GpuBuddyParams) -> Result<Self> {
> +        Ok(Self(Arc::pin_init(GpuBuddyInner::new(params), GFP_KERNEL)?))
> +    }
> +
> +    /// Get the base offset for allocations.
> +    pub fn base_offset(&self) -> u64 {
> +        self.0.base_offset
> +    }
> +
> +    /// Get the chunk size (minimum allocation unit).
> +    pub fn chunk_size(&self) -> u64 {
> +        self.0.chunk_size
> +    }
> +
> +    /// Get the total managed size.
> +    pub fn size(&self) -> u64 {
> +        self.0.size
> +    }
> +
> +    /// Get the available (free) memory in bytes.
> +    pub fn free_memory_bytes(&self) -> u64 {

Same as struct members, the unit doesn't need to be in the method name -
the doccomment is sufficient.

> +        let guard = self.0.lock();
> +
> +        // SAFETY: guard provides exclusive access to the allocator.
> +        unsafe { (*guard.as_raw()).avail }
> +    }
> +
> +    /// Allocate blocks from the buddy allocator.
> +    ///
> +    /// Returns a pin-initializer for [`AllocatedBlocks`].
> +    ///
> +    /// Takes `&self` instead of `&mut self` because the internal [`Mutex`] provides
> +    /// synchronization - no external `&mut` exclusivity needed.
> +    pub fn alloc_blocks(
> +        &self,
> +        params: &GpuBuddyAllocParams,
> +    ) -> impl PinInit<AllocatedBlocks, Error> {
> +        let buddy_arc = Arc::clone(&self.0);
> +        let start = params.start_range_address;
> +        let end = params.end_range_address;
> +        let size = params.size_bytes;
> +        let min_block_size = params.min_block_size_bytes;
> +        let flags = params.buddy_flags;

These local variables are required so the closure below is not confused
by the lifetime of `params`. But since you are copying its content
anyway, you could just make `GpuBuddyAllocParams` derive `Copy`, pass
`params` by value, and use its members directly in the closure.

That probably won't be needed if we split `GpuBuddyAllocParams` as I
suggested though.

> +
> +        // Create pin-initializer that initializes list and allocates blocks.
> +        try_pin_init!(AllocatedBlocks {
> +            buddy: buddy_arc,
> +            list <- CListHead::new(),
> +            flags: flags,
> +            _: {
> +                // Lock while allocating to serialize with concurrent frees.
> +                let guard = buddy.lock();
> +
> +                // SAFETY: `guard` provides exclusive access to the buddy allocator.
> +                to_result(unsafe {
> +                    bindings::gpu_buddy_alloc_blocks(
> +                        guard.as_raw(),
> +                        start,
> +                        end,
> +                        size,
> +                        min_block_size,
> +                        list.as_raw(),
> +                        flags.as_raw(),
> +                    )
> +                })?
> +            }
> +        })
> +    }
> +}
> +
> +/// Allocated blocks from the buddy allocator with automatic cleanup.
> +///
> +/// This structure owns a list of allocated blocks and ensures they are
> +/// automatically freed when dropped. Use `iter()` to iterate over all
> +/// allocated [`Block`] structures.
> +///
> +/// # Invariants
> +///
> +/// - `list` is an initialized, valid list head containing allocated blocks.
> +#[pin_data(PinnedDrop)]
> +pub struct AllocatedBlocks {
> +    #[pin]
> +    list: CListHead,
> +    buddy: Arc<GpuBuddyInner>,
> +    flags: BuddyFlags,
> +}
> +
> +impl AllocatedBlocks {
> +    /// Check if the block list is empty.
> +    pub fn is_empty(&self) -> bool {
> +        // An empty list head points to itself.
> +        !self.list.is_linked()
> +    }
> +
> +    /// Iterate over allocated blocks.
> +    ///
> +    /// Returns an iterator yielding [`AllocatedBlock`] values. Each [`AllocatedBlock`]
> +    /// borrows `self` and is only valid for the duration of that borrow.
> +    pub fn iter(&self) -> impl Iterator<Item = AllocatedBlock<'_>> + '_ {
> +        // SAFETY: list contains gpu_buddy_block items linked via __bindgen_anon_1.link.

IIUC the type invariant should be invoked explicitly as we are using it
to justify this unsafe block (i.e. "per the type invariant, ...").

> +        let clist = clist_create!(unsafe {
> +            self.list.as_raw(),
> +            Block,
> +            bindings::gpu_buddy_block,
> +            __bindgen_anon_1.link
> +        });
> +
> +        clist
> +            .iter()
> +            .map(|block| AllocatedBlock { block, alloc: self })
> +    }
> +}
> +
> +#[pinned_drop]
> +impl PinnedDrop for AllocatedBlocks {
> +    fn drop(self: Pin<&mut Self>) {
> +        let guard = self.buddy.lock();
> +
> +        // SAFETY:
> +        // - list is valid per the type's invariants.
> +        // - guard provides exclusive access to the allocator.
> +        // CAST: BuddyFlags were validated to fit in u32 at construction.
> +        unsafe {
> +            bindings::gpu_buddy_free_list(
> +                guard.as_raw(),
> +                self.list.as_raw(),
> +                self.flags.as_raw() as u32,

You won't need this `as` if you make `BuddyFlags` wrap a `u32` instead
of a `usize`.

> +            );
> +        }
> +    }
> +}
> +
> +/// A GPU buddy block.
> +///
> +/// Transparent wrapper over C `gpu_buddy_block` structure. This type is returned
> +/// as references during iteration over [`AllocatedBlocks`].
> +///
> +/// # Invariants
> +///
> +/// The inner [`Opaque`] contains a valid, allocated `gpu_buddy_block`.
> +#[repr(transparent)]
> +pub struct Block(Opaque<bindings::gpu_buddy_block>);
> +
> +impl Block {
> +    /// Get a raw pointer to the underlying C block.
> +    fn as_raw(&self) -> *mut bindings::gpu_buddy_block {
> +        self.0.get()
> +    }
> +
> +    /// Get the block's offset in the address space.
> +    pub(crate) fn offset(&self) -> u64 {
> +        // SAFETY: self.as_raw() is valid per the type's invariants.
> +        unsafe { bindings::gpu_buddy_block_offset(self.as_raw()) }
> +    }
> +
> +    /// Get the block order.
> +    pub(crate) fn order(&self) -> u32 {
> +        // SAFETY: self.as_raw() is valid per the type's invariants.
> +        unsafe { bindings::gpu_buddy_block_order(self.as_raw()) }
> +    }
> +}
> +
> +// SAFETY: `Block` is not modified after allocation for the lifetime
> +// of `AllocatedBlock`.

This safety comment should not need to reference another type - how
about something like "`Block` is a wrapper around `gpu_buddy_block`
which can be sent across threads safely".

> +unsafe impl Send for Block {}
> +
> +// SAFETY: `Block` is not modified after allocation for the lifetime
> +// of `AllocatedBlock`.

Here as well. I'd say that the block is only accessed through shared
references after allocation, and thus safe to access concurrently across
threads.

^ permalink raw reply

* Re: [PATCH v8 16/25] gpu: nova-core: mm: Add page table walker for MMU v2/v3
From: Joel Fernandes @ 2026-02-25 14:26 UTC (permalink / raw)
  To: Gary Guo
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, nouveau, rust-for-linux, Nikola Djukic,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher, Christian Koenig,
	Jani Nikula, Joonas Lahtinen, Vivi Rodrigo, Tvrtko Ursulin,
	Rui Huang, Matthew Auld, Matthew Brost, Lucas De Marchi,
	Thomas Hellstrom, Helge Deller, Alex Gaynor, Boqun Feng,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev
In-Reply-To: <129e9988ff8d3e8747f724fbcc88c5cb@garyguo.net>

On 2026-02-25, Gary Guo <gary@garyguo.net> wrote:
> On 2026-02-24 22:53, Joel Fernandes wrote:
>> +//! ## MMU v2 (Turing/Ampere/Ada) - 5 levels
>> [...]
>> +//! ## MMU v3 (Hopper+) - 6 levels
>
> I think this is called "4 levels" and "5 levels" in kernel MM rather than
> "5 levels" and "6 levels".

Actually, I think "5 levels" and "6 levels" is correct even by x86 kernel MM
convention. In x86 "4-level paging", the 4 levels are PGD, PUD, PMD, PTE -
the root page directory (PGD) IS counted as one of the 4 levels. Similarly,
for the GPU MMU, counting the root PDB (L0) as a level gives us 5 levels for
v2 (PDB/L0 through L4/PTE) and 6 levels for v3 (PDB/L0 through L5/PTE).

This is also consistent with NVIDIA's own hardware definitions in the OpenRM
headers (dev_mmu.h for Turing and Hopper) which define the page table entries
for each of these levels. The virtual address bitfield spans L0 (bits 56:48)
through L4 (bits 20:12) for v2, giving 5 distinct page table levels.

FWIW, the existing nouveau driver also uses this convention - NVKM_VMM_LEVELS_MAX
is defined as 6 in nvkm/subdev/mmu/vmm.c, and the GH100 page table descriptors
in vmmgh100.c list all 6 levels.

-- 
Joel Fernandes


^ permalink raw reply

* Re: [PATCH v11 2/2] rust: clist: Add support to interface with C linked lists
From: Alexandre Courbot @ 2026-02-25  8:25 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Alex Gaynor, Danilo Krummrich, Dave Airlie,
	Maarten Lankhorst, Maxime Ripard, Simona Vetter, Daniel Almeida,
	Koen Koning, Nikola Djukic, Philipp Stanner, Elle Rhumsaa,
	Jonathan Corbet, Alex Deucher, Christian König, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi,
	Thomas Hellström, Helge Deller, Alistair Popple,
	Andrea Righi, Zhi Wang, alexeyi, Eliot Courtney, dri-devel,
	nouveau, rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev
In-Reply-To: <DGNW1KH6TCE1.3DIVLKYG6OURI@nvidia.com>

On Wed Feb 25, 2026 at 5:09 PM JST, Alexandre Courbot wrote:
> On Wed Feb 25, 2026 at 7:27 AM JST, Joel Fernandes wrote:
>> Add a new module `clist` for working with C's doubly circular linked
>> lists. Provide low-level iteration over list nodes.
>>
>> Typed iteration over actual items is provided with a `clist_create`
>> macro to assist in creation of the `CList` type.
>>
>> Cc: Nikola Djukic <ndjukic@nvidia.com>
>> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
>> Acked-by: Gary Guo <gary@garyguo.net>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>
> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
>
> (with one small comment below)

Actually, one more. When trying to build the GPU buddy series on top of
this, I got this warning:

warning: this macro expands metavariables in an unsafe block
   --> ../rust/kernel/ffi/clist.rs:336:9
    |
336 |         unsafe { $crate::ffi::clist::CList::<$rust_type, OFFSET>::from_raw($head) }
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: this allows the user of the macro to write unsafe code outside of an unsafe block
    = help: consider expanding any metavariables outside of this block, e.g. by storing them in a variable
    = help: ... or also expand referenced metavariables in a safe context to require an unsafe block at callsite
    = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.93.0/index.html#macro_metavars_in_unsafe
    = note: `-W clippy::macro-metavars-in-unsafe` implied by `-W clippy::all`
    = help: to override `-W clippy::all` add `#[allow(clippy::macro_metavars_in_unsafe)]`

The fix should be easy and as prescribed.

Btw, your `nova/mm` branch has not been built with `CLIPPY=1` before
submission - there are still a few of few.

^ permalink raw reply

* Re: [PATCH v11 2/2] rust: clist: Add support to interface with C linked lists
From: Alexandre Courbot @ 2026-02-25  8:09 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Alex Gaynor, Danilo Krummrich, Dave Airlie,
	Maarten Lankhorst, Maxime Ripard, Simona Vetter, Daniel Almeida,
	Koen Koning, Nikola Djukic, Philipp Stanner, Elle Rhumsaa,
	Jonathan Corbet, Alex Deucher, Christian König, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi,
	Thomas Hellström, Helge Deller, Alistair Popple,
	Andrea Righi, Zhi Wang, alexeyi, Eliot Courtney, dri-devel,
	nouveau, rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev
In-Reply-To: <20260224222734.3153931-3-joelagnelf@nvidia.com>

On Wed Feb 25, 2026 at 7:27 AM JST, Joel Fernandes wrote:
> Add a new module `clist` for working with C's doubly circular linked
> lists. Provide low-level iteration over list nodes.
>
> Typed iteration over actual items is provided with a `clist_create`
> macro to assist in creation of the `CList` type.
>
> Cc: Nikola Djukic <ndjukic@nvidia.com>
> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
> Acked-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>

(with one small comment below)

> ---
>  MAINTAINERS              |   8 +
>  rust/helpers/helpers.c   |   1 +
>  rust/helpers/list.c      |  17 ++
>  rust/kernel/ffi/clist.rs | 338 +++++++++++++++++++++++++++++++++++++++
>  rust/kernel/ffi/mod.rs   |   2 +
>  rust/kernel/lib.rs       |   1 +
>  6 files changed, 367 insertions(+)
>  create mode 100644 rust/helpers/list.c
>  create mode 100644 rust/kernel/ffi/clist.rs
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dc82a6bd1a61..0dc318c94c99 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -23181,6 +23181,14 @@ T:	git https://github.com/Rust-for-Linux/linux.git alloc-next
>  F:	rust/kernel/alloc.rs
>  F:	rust/kernel/alloc/
>  
> +RUST [FFI HELPER]
> +M:	Joel Fernandes <joelagnelf@nvidia.com> (CLIST)
> +M:	Alexandre Courbot <acourbot@nvidia.com> (CLIST)
> +L:	rust-for-linux@vger.kernel.org
> +S:	Maintained
> +T:	git https://github.com/Rust-for-Linux/linux.git ffi-next
> +F:	rust/kernel/ffi/

Acked-by: Alexandre Courbot <acourbot@nvidia.com>

<snip>
> +impl<'a, T, const OFFSET: usize> Iterator for CListIter<'a, T, OFFSET> {
> +    type Item = &'a T;
> +
> +    #[inline]
> +    fn next(&mut self) -> Option<Self::Item> {
> +        let head = self.head_iter.next()?;
> +
> +        // Convert to item using OFFSET.
> +        // SAFETY: `item_ptr` calculation from `OFFSET` (calculated using offset_of!)

`item_ptr` does not exist. 


^ permalink raw reply

* Re: [PATCH v8 16/25] gpu: nova-core: mm: Add page table walker for MMU v2/v3
From: Gary Guo @ 2026-02-25  5:39 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, nouveau, rust-for-linux, Nikola Djukic,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev
In-Reply-To: <20260224225323.3312204-17-joelagnelf@nvidia.com>

On 2026-02-24 22:53, Joel Fernandes wrote:
> Add the page table walker implementation that traverses the page table
> hierarchy for both MMU v2 (5-level) and MMU v3 (6-level) to resolve
> virtual addresses to physical addresses or find PTE locations.
> 
> Currently only v2 has been tested (nova-core currently boots pre-hopper)
> with some initial prepatory work done for v3.
> 
> Cc: Nikola Djukic <ndjukic@nvidia.com>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/mm/pagetable.rs      |   1 +
>  drivers/gpu/nova-core/mm/pagetable/walk.rs | 218 +++++++++++++++++++++
>  2 files changed, 219 insertions(+)
>  create mode 100644 drivers/gpu/nova-core/mm/pagetable/walk.rs
> 
> diff --git a/drivers/gpu/nova-core/mm/pagetable.rs b/drivers/gpu/nova-core/mm/pagetable.rs
> index 33acb7053fbe..7ebea4cb8437 100644
> --- a/drivers/gpu/nova-core/mm/pagetable.rs
> +++ b/drivers/gpu/nova-core/mm/pagetable.rs
> @@ -9,6 +9,7 @@
>  #![expect(dead_code)]
>  pub(crate) mod ver2;
>  pub(crate) mod ver3;
> +pub(crate) mod walk;
>  
>  use kernel::prelude::*;
>  
> diff --git a/drivers/gpu/nova-core/mm/pagetable/walk.rs b/drivers/gpu/nova-core/mm/pagetable/walk.rs
> new file mode 100644
> index 000000000000..023226af8816
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/pagetable/walk.rs
> @@ -0,0 +1,218 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Page table walker implementation for NVIDIA GPUs.
> +//!
> +//! This module provides page table walking functionality for MMU v2 and v3.
> +//! The walker traverses the page table hierarchy to resolve virtual addresses
> +//! to physical addresses or to find PTE locations.
> +//!
> +//! # Page Table Hierarchy
> +//!
> +//! ## MMU v2 (Turing/Ampere/Ada) - 5 levels
> +//!
> +//! ```text
> +//!     +-------+     +-------+     +-------+     +---------+     +-------+
> +//!     | PDB   |---->|  L1   |---->|  L2   |---->| L3 Dual |---->|  L4   |
> +//!     | (L0)  |     |       |     |       |     | PDE     |     | (PTE) |
> +//!     +-------+     +-------+     +-------+     +---------+     +-------+
> +//!       64-bit        64-bit        64-bit        128-bit         64-bit
> +//!        PDE           PDE           PDE        (big+small)        PTE
> +//! ```
> +//!
> +//! ## MMU v3 (Hopper+) - 6 levels

I think this is called "4 levels" and "5 levels" in kernel MM rather than
"5 levels" and "6 levels".

Best,
Gary

> +//!
> +//! ```text
> +//!     +-------+     +-------+     +-------+     +-------+     +---------+     +-------+
> +//!     | PDB   |---->|  L1   |---->|  L2   |---->|  L3   |---->| L4 Dual |---->|  L5   |
> +//!     | (L0)  |     |       |     |       |     |       |     | PDE     |     | (PTE) |
> +//!     +-------+     +-------+     +-------+     +-------+     +---------+     +-------+
> +//!       64-bit        64-bit        64-bit        64-bit        128-bit         64-bit
> +//!        PDE           PDE           PDE           PDE        (big+small)        PTE
> +//! ```
> +//!
> +//! # Result of a page table walk
> +//!
> +//! The walker returns a [`WalkResult`] indicating the outcome.

^ permalink raw reply

* [PATCH v8 25/25] gpu: nova-core: mm: Add PRAMIN aperture self-tests
From: Joel Fernandes @ 2026-02-24 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, nouveau, rust-for-linux, Nikola Djukic,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260224225323.3312204-1-joelagnelf@nvidia.com>

Add self-tests for the PRAMIN aperture mechanism to verify correct
operation during GPU probe. The tests validate various alignment
requirements and corner cases.

The tests are default disabled and behind CONFIG_NOVA_MM_SELFTESTS.
When enabled, tests run after GSP boot during probe.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs       |   3 +
 drivers/gpu/nova-core/mm/pramin.rs | 161 +++++++++++++++++++++++++++++
 2 files changed, 164 insertions(+)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index fba6ddba6a3f..34827dc2afff 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -396,6 +396,9 @@ fn run_mm_selftests(mut self: Pin<&mut Self>, pdev: &pci::Device<device::Bound>)
 
         let mmu_version = MmuVersion::from(self.spec.chipset.arch());
 
+        // PRAMIN aperture self-tests.
+        crate::mm::pramin::run_self_test(pdev.as_ref(), self.bar.clone(), mmu_version)?;
+
         // BAR1 self-tests.
         let bar1 = Arc::pin_init(
             pdev.iomap_region_sized::<BAR1_SIZE>(1, c"nova-core/bar1"),
diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
index 04b652d3ee4f..30b1dba0c305 100644
--- a/drivers/gpu/nova-core/mm/pramin.rs
+++ b/drivers/gpu/nova-core/mm/pramin.rs
@@ -290,3 +290,164 @@ fn drop(&mut self) {
         // MutexGuard drops automatically, releasing the lock.
     }
 }
+
+/// Run PRAMIN self-tests during boot if self-tests are enabled.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+pub(crate) fn run_self_test(
+    dev: &kernel::device::Device,
+    bar: Arc<Devres<Bar0>>,
+    mmu_version: super::pagetable::MmuVersion,
+) -> Result {
+    use super::pagetable::MmuVersion;
+
+    // PRAMIN support is only for MMU v2 for now (Turing/Ampere/Ada).
+    if mmu_version != MmuVersion::V2 {
+        dev_info!(
+            dev,
+            "PRAMIN: Skipping self-tests for MMU {:?} (only V2 supported)\n",
+            mmu_version
+        );
+        return Ok(());
+    }
+
+    dev_info!(dev, "PRAMIN: Starting self-test...\n");
+
+    let pramin = Arc::pin_init(Pramin::new(bar)?, GFP_KERNEL)?;
+    let mut win = pramin.window()?;
+
+    // Use offset 0x1000 as test area.
+    let base: usize = 0x1000;
+
+    // Test 1: Read/write at byte-aligned locations.
+    for i in 0u8..4 {
+        let offset = base + 1 + usize::from(i); // Offsets 0x1001, 0x1002, 0x1003, 0x1004
+        let val = 0xA0 + i;
+        win.try_write8(offset, val)?;
+        let read_val = win.try_read8(offset)?;
+        if read_val != val {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - offset {:#x}: wrote {:#x}, read {:#x}\n",
+                offset,
+                val,
+                read_val
+            );
+            return Err(EIO);
+        }
+    }
+
+    // Test 2: Write `u32` and read back as `u8`s.
+    let test2_offset = base + 0x10;
+    let test2_val: u32 = 0xDEADBEEF;
+    win.try_write32(test2_offset, test2_val)?;
+
+    // Read back as individual bytes (little-endian: EF BE AD DE).
+    let expected_bytes: [u8; 4] = [0xEF, 0xBE, 0xAD, 0xDE];
+    for (i, &expected) in expected_bytes.iter().enumerate() {
+        let read_val = win.try_read8(test2_offset + i)?;
+        if read_val != expected {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - offset {:#x}: expected {:#x}, read {:#x}\n",
+                test2_offset + i,
+                expected,
+                read_val
+            );
+            return Err(EIO);
+        }
+    }
+
+    // Test 3: Window repositioning across 1MB boundaries.
+    // Write to offset > 1MB to trigger window slide, then verify.
+    let test3_offset_a: usize = base; // First 1MB region.
+    let test3_offset_b: usize = 0x200000 + base; // 2MB + base (different 1MB region).
+    let val_a: u32 = 0x11111111;
+    let val_b: u32 = 0x22222222;
+
+    // Write to first region.
+    win.try_write32(test3_offset_a, val_a)?;
+
+    // Write to second region (triggers window reposition).
+    win.try_write32(test3_offset_b, val_b)?;
+
+    // Read back from second region.
+    let read_b = win.try_read32(test3_offset_b)?;
+    if read_b != val_b {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - offset {:#x}: expected {:#x}, read {:#x}\n",
+            test3_offset_b,
+            val_b,
+            read_b
+        );
+        return Err(EIO);
+    }
+
+    // Read back from first region (triggers window reposition again).
+    let read_a = win.try_read32(test3_offset_a)?;
+    if read_a != val_a {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - offset {:#x}: expected {:#x}, read {:#x}\n",
+            test3_offset_a,
+            val_a,
+            read_a
+        );
+        return Err(EIO);
+    }
+
+    // Test 4: Invalid offset rejection (beyond 40-bit address space).
+    {
+        // 40-bit address space limit check.
+        let invalid_offset: usize = MAX_VRAM_OFFSET + 1;
+        let result = win.try_read32(invalid_offset);
+        if result.is_ok() {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - read at invalid offset {:#x} should have failed\n",
+                invalid_offset
+            );
+            return Err(EIO);
+        }
+    }
+
+    // Test 5: Misaligned multi-byte access rejection.
+    // Verify that misaligned `u16`/`u32`/`u64` accesses are properly rejected.
+    {
+        // `u16` at odd offset (not 2-byte aligned).
+        let offset_u16 = base + 0x21;
+        if win.try_write16(offset_u16, 0xABCD).is_ok() {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - misaligned u16 write at {:#x} should have failed\n",
+                offset_u16
+            );
+            return Err(EIO);
+        }
+
+        // `u32` at 2-byte-aligned (not 4-byte-aligned) offset.
+        let offset_u32 = base + 0x32;
+        if win.try_write32(offset_u32, 0x12345678).is_ok() {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - misaligned u32 write at {:#x} should have failed\n",
+                offset_u32
+            );
+            return Err(EIO);
+        }
+
+        // `u64` read at 4-byte-aligned (not 8-byte-aligned) offset.
+        let offset_u64 = base + 0x44;
+        if win.try_read64(offset_u64).is_ok() {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - misaligned u64 read at {:#x} should have failed\n",
+                offset_u64
+            );
+            return Err(EIO);
+        }
+    }
+
+    dev_info!(dev, "PRAMIN: All self-tests PASSED\n");
+    Ok(())
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v8 24/25] gpu: nova-core: mm: Add BAR1 memory management self-tests
From: Joel Fernandes @ 2026-02-24 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, nouveau, rust-for-linux, Nikola Djukic,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260224225323.3312204-1-joelagnelf@nvidia.com>

Add self-tests for BAR1 access during driver probe when
CONFIG_NOVA_MM_SELFTESTS is enabled (default disabled). This results in
testing the Vmm, GPU buddy allocator and BAR1 region all of which should
function correctly for the tests to pass.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/Kconfig        |  10 ++
 drivers/gpu/nova-core/driver.rs      |   2 +
 drivers/gpu/nova-core/gpu.rs         |  42 +++++
 drivers/gpu/nova-core/mm/bar_user.rs | 250 +++++++++++++++++++++++++++
 4 files changed, 304 insertions(+)

diff --git a/drivers/gpu/nova-core/Kconfig b/drivers/gpu/nova-core/Kconfig
index 6513007bf66f..35de55aabcfc 100644
--- a/drivers/gpu/nova-core/Kconfig
+++ b/drivers/gpu/nova-core/Kconfig
@@ -15,3 +15,13 @@ config NOVA_CORE
 	  This driver is work in progress and may not be functional.
 
 	  If M is selected, the module will be called nova_core.
+
+config NOVA_MM_SELFTESTS
+	bool "Memory management self-tests"
+	depends on NOVA_CORE
+	help
+	  Enable self-tests for the memory management subsystem. When enabled,
+	  tests are run during GPU probe to verify PRAMIN aperture access,
+	  page table walking, and BAR1 virtual memory mapping functionality.
+
+	  This is a testing option and is default-disabled.
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index d8b2e967ba4c..7d0d09939835 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -92,6 +92,8 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
 
             Ok(try_pin_init!(Self {
                 gpu <- Gpu::new(pdev, bar.clone(), bar.access(pdev.as_ref())?),
+                // Run optional GPU selftests.
+                _: { gpu.run_selftests(pdev)? },
                 _reg <- auxiliary::Registration::new(
                     pdev.as_ref(),
                     c"nova-drm",
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 4281487ca52e..fba6ddba6a3f 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -380,4 +380,46 @@ pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
             .inspect(|bar| self.sysmem_flush.unregister(bar))
             .is_err());
     }
+
+    /// Run selftests on the constructed [`Gpu`].
+    pub(crate) fn run_selftests(
+        mut self: Pin<&mut Self>,
+        pdev: &pci::Device<device::Bound>,
+    ) -> Result {
+        self.as_mut().run_mm_selftests(pdev)?;
+        Ok(())
+    }
+
+    #[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+    fn run_mm_selftests(mut self: Pin<&mut Self>, pdev: &pci::Device<device::Bound>) -> Result {
+        use crate::driver::BAR1_SIZE;
+
+        let mmu_version = MmuVersion::from(self.spec.chipset.arch());
+
+        // BAR1 self-tests.
+        let bar1 = Arc::pin_init(
+            pdev.iomap_region_sized::<BAR1_SIZE>(1, c"nova-core/bar1"),
+            GFP_KERNEL,
+        )?;
+        let bar1_access = bar1.access(pdev.as_ref())?;
+
+        let proj = self.as_mut().project();
+        let bar1_pde_base = proj.gsp_static_info.bar1_pde_base();
+        let mm = proj.mm.as_ref().get_ref();
+
+        crate::mm::bar_user::run_self_test(
+            pdev.as_ref(),
+            mm,
+            bar1_access,
+            bar1_pde_base,
+            mmu_version,
+        )?;
+
+        Ok(())
+    }
+
+    #[cfg(not(CONFIG_NOVA_MM_SELFTESTS))]
+    fn run_mm_selftests(self: Pin<&mut Self>, _pdev: &pci::Device<device::Bound>) -> Result {
+        Ok(())
+    }
 }
diff --git a/drivers/gpu/nova-core/mm/bar_user.rs b/drivers/gpu/nova-core/mm/bar_user.rs
index 4af56ac628b6..28dfb10e7cea 100644
--- a/drivers/gpu/nova-core/mm/bar_user.rs
+++ b/drivers/gpu/nova-core/mm/bar_user.rs
@@ -154,3 +154,253 @@ fn drop(&mut self) {
         }
     }
 }
+
+/// Check if the PDB has valid, VRAM-backed page tables.
+///
+/// Returns `Err(ENOENT)` if page tables are missing or not in VRAM.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+fn check_valid_page_tables(mm: &GpuMm, pdb_addr: VramAddress) -> Result {
+    use crate::mm::pagetable::{
+        ver2::Pde,
+        AperturePde, //
+    };
+
+    let mut window = mm.pramin().window()?;
+    let pdb_entry_raw = window.try_read64(pdb_addr.raw())?;
+    let pdb_entry = Pde::new(pdb_entry_raw);
+
+    if !pdb_entry.is_valid() {
+        return Err(ENOENT);
+    }
+
+    if pdb_entry.aperture() != AperturePde::VideoMemory {
+        return Err(ENOENT);
+    }
+
+    Ok(())
+}
+
+/// Run MM subsystem self-tests during probe.
+///
+/// Tests page table infrastructure and `BAR1` MMIO access using the `BAR1`
+/// address space. Uses the `GpuMm`'s buddy allocator to allocate page tables
+/// and test pages as needed.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+pub(crate) fn run_self_test(
+    dev: &kernel::device::Device,
+    mm: &GpuMm,
+    bar1: &crate::driver::Bar1,
+    bar1_pdb: u64,
+    mmu_version: MmuVersion,
+) -> Result {
+    use crate::mm::{
+        vmm::Vmm,
+        PAGE_SIZE, //
+    };
+    use kernel::gpu::buddy::{
+        BuddyFlags,
+        GpuBuddyAllocParams, //
+    };
+    use kernel::sizes::{
+        SZ_4K,
+        SZ_16K,
+        SZ_32K,
+        SZ_64K, //
+    };
+
+    // Self-tests only support MMU v2 for now.
+    if mmu_version != MmuVersion::V2 {
+        dev_info!(
+            dev,
+            "MM: Skipping self-tests for MMU {:?} (only V2 supported)\n",
+            mmu_version
+        );
+        return Ok(());
+    }
+
+    // Test patterns.
+    const PATTERN_PRAMIN: u32 = 0xDEAD_BEEF;
+    const PATTERN_BAR1: u32 = 0xCAFE_BABE;
+
+    dev_info!(dev, "MM: Starting self-test...\n");
+
+    let pdb_addr = VramAddress::new(bar1_pdb);
+
+    // Check if initial page tables are in VRAM.
+    if check_valid_page_tables(mm, pdb_addr).is_err() {
+        dev_info!(dev, "MM: Self-test SKIPPED - no valid VRAM page tables\n");
+        return Ok(());
+    }
+
+    // Set up a test page from the buddy allocator.
+    let alloc_params = GpuBuddyAllocParams {
+        start_range_address: 0,
+        end_range_address: 0,
+        size_bytes: SZ_4K.into_safe_cast(),
+        min_block_size_bytes: SZ_4K.into_safe_cast(),
+        buddy_flags: BuddyFlags::try_new(0)?,
+    };
+
+    let test_page_blocks = KBox::pin_init(mm.buddy().alloc_blocks(&alloc_params), GFP_KERNEL)?;
+    let test_vram_offset = test_page_blocks.iter().next().ok_or(ENOMEM)?.offset();
+    let test_vram = VramAddress::new(test_vram_offset);
+    let test_pfn = Pfn::from(test_vram);
+
+    // Create a VMM of size 64K to track virtual memory mappings.
+    let mut vmm = Vmm::new(pdb_addr, MmuVersion::V2, SZ_64K.into_safe_cast())?;
+
+    // Create a test mapping.
+    let mapped = vmm.map_pages(mm, &[test_pfn], None, true)?;
+    let test_vfn = mapped.vfn_start;
+
+    // Pre-compute test addresses for the PRAMIN to BAR1 read test.
+    let vfn_offset: usize = test_vfn.raw().into_safe_cast();
+    let bar1_base_offset = vfn_offset.checked_mul(PAGE_SIZE).ok_or(EOVERFLOW)?;
+    let bar1_read_offset: usize = bar1_base_offset + 0x100;
+    let vram_read_addr: usize = test_vram.raw() + 0x100;
+
+    // Test 1: Write via PRAMIN, read via BAR1.
+    {
+        let mut window = mm.pramin().window()?;
+        window.try_write32(vram_read_addr, PATTERN_PRAMIN)?;
+    }
+
+    // Read back via BAR1 aperture.
+    let bar1_value = bar1.try_read32(bar1_read_offset)?;
+
+    let test1_passed = if bar1_value == PATTERN_PRAMIN {
+        true
+    } else {
+        dev_err!(
+            dev,
+            "MM: Test 1 FAILED - Expected {:#010x}, got {:#010x}\n",
+            PATTERN_PRAMIN,
+            bar1_value
+        );
+        false
+    };
+
+    // Cleanup - invalidate PTE.
+    vmm.unmap_pages(mm, mapped)?;
+
+    // Test 2: Two-phase prepare/execute API.
+    let prepared = vmm.prepare_map(mm, 1, None)?;
+    let mapped2 = vmm.execute_map(mm, prepared, &[test_pfn], true)?;
+    let readback = vmm.read_mapping(mm, mapped2.vfn_start)?;
+    let test2_passed = if readback == Some(test_pfn) {
+        true
+    } else {
+        dev_err!(dev, "MM: Test 2 FAILED - Two-phase map readback mismatch\n");
+        false
+    };
+    vmm.unmap_pages(mm, mapped2)?;
+
+    // Test 3: Range-constrained allocation with a hole — exercises block.size()-driven
+    // BAR1 mapping. A 4K hole is punched at base+16K, then a single 32K allocation
+    // is requested within [base, base+36K). The buddy allocator must split around the
+    // hole, returning multiple blocks (expected: {16K, 4K, 8K, 4K} = 32K total).
+    // Each block is mapped into BAR1 and verified via PRAMIN read-back.
+    //
+    // Address layout (base = 0x10000):
+    //   [    16K    ] [HOLE 4K] [4K] [ 8K ] [4K]
+    //   0x10000       0x14000  0x15000 0x16000 0x18000 0x19000
+    let range_base: u64 = SZ_64K.into_safe_cast();
+    let range_flag = BuddyFlags::try_new(BuddyFlags::RANGE_ALLOCATION)?;
+    let sz_4k: u64 = SZ_4K.into_safe_cast();
+    let sz_16k: u64 = SZ_16K.into_safe_cast();
+    let sz_32k_4k: u64 = (SZ_32K + SZ_4K).into_safe_cast();
+
+    // Punch a 4K hole at base+16K so the subsequent 32K allocation must split.
+    let _hole = KBox::pin_init(mm.buddy().alloc_blocks(&GpuBuddyAllocParams {
+        start_range_address: range_base + sz_16k,
+        end_range_address: range_base + sz_16k + sz_4k,
+        size_bytes: SZ_4K.into_safe_cast(),
+        min_block_size_bytes: SZ_4K.into_safe_cast(),
+        buddy_flags: range_flag,
+    }), GFP_KERNEL)?;
+
+    // Allocate 32K within [base, base+36K). The hole forces the allocator to return
+    // split blocks whose sizes are determined by buddy alignment.
+    let blocks = KBox::pin_init(mm.buddy().alloc_blocks(&GpuBuddyAllocParams {
+        start_range_address: range_base,
+        end_range_address: range_base + sz_32k_4k,
+        size_bytes: SZ_32K.into_safe_cast(),
+        min_block_size_bytes: SZ_4K.into_safe_cast(),
+        buddy_flags: range_flag,
+    }), GFP_KERNEL)?;
+
+    let mut test3_passed = true;
+    let mut total_size = 0u64;
+
+    for block in blocks.iter() {
+        total_size += block.size();
+
+        // Map all pages of this block.
+        let page_size: u64 = PAGE_SIZE.into_safe_cast();
+        let num_pages: usize = (block.size() / page_size).into_safe_cast();
+
+        let mut pfns = KVec::new();
+        for j in 0..num_pages {
+            let j_u64: u64 = j.into_safe_cast();
+            pfns.push(
+                Pfn::from(VramAddress::new(
+                    block.offset()
+                        + j_u64.checked_mul(page_size)
+                            .ok_or(EOVERFLOW)?,
+                )),
+                GFP_KERNEL,
+            )?;
+        }
+
+        let mapped = vmm.map_pages(mm, &pfns, None, true)?;
+        let bar1_base_vfn: usize = mapped.vfn_start.raw().into_safe_cast();
+        let bar1_base = bar1_base_vfn.checked_mul(PAGE_SIZE).ok_or(EOVERFLOW)?;
+
+        for j in 0..num_pages {
+            let page_bar1_off = bar1_base + j * PAGE_SIZE;
+            let j_u64: u64 = j.into_safe_cast();
+            let page_phys = block.offset()
+                + j_u64.checked_mul(PAGE_SIZE.into_safe_cast())
+                    .ok_or(EOVERFLOW)?;
+
+            bar1.try_write32(PATTERN_BAR1, page_bar1_off)?;
+
+            let pramin_val = {
+                let mut window = mm.pramin().window()?;
+                window.try_read32(page_phys.into_safe_cast())?
+            };
+
+            if pramin_val != PATTERN_BAR1 {
+                dev_err!(
+                    dev,
+                    "MM: Test 3 FAILED block offset {:#x} page {} (val={:#x})\n",
+                    block.offset(),
+                    j,
+                    pramin_val
+                );
+                test3_passed = false;
+            }
+        }
+
+        vmm.unmap_pages(mm, mapped)?;
+    }
+
+    // Verify aggregate: all returned block sizes must sum to allocation size.
+    if total_size != SZ_32K.into_safe_cast() {
+        dev_err!(
+            dev,
+            "MM: Test 3 FAILED - total size {} != expected {}\n",
+            total_size,
+            SZ_32K
+        );
+        test3_passed = false;
+    }
+
+    if test1_passed && test2_passed && test3_passed {
+        dev_info!(dev, "MM: All self-tests PASSED\n");
+        Ok(())
+    } else {
+        dev_err!(dev, "MM: Self-tests FAILED\n");
+        Err(EIO)
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v8 23/25] gpu: nova-core: mm: Add BarUser to struct Gpu and create at boot
From: Joel Fernandes @ 2026-02-24 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, nouveau, rust-for-linux, Nikola Djukic,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260224225323.3312204-1-joelagnelf@nvidia.com>

Add a BarUser field to struct Gpu and eagerly create it during GPU
initialization. The BarUser provides the BAR1 user interface for CPU
access to GPU virtual memory through the GPU's MMU.

The BarUser is initialized using BAR1 PDE base address from GSP static
info, MMU version and BAR1 size obtained from platform device.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs          | 22 +++++++++++++++++++++-
 drivers/gpu/nova-core/gsp/commands.rs |  1 -
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index ba769de2f984..4281487ca52e 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -26,7 +26,12 @@
         commands::GetGspStaticInfoReply,
         Gsp, //
     },
-    mm::GpuMm,
+    mm::{
+        bar_user::BarUser,
+        pagetable::MmuVersion,
+        GpuMm,
+        VramAddress, //
+    },
     num::IntoSafeCast,
     regs,
 };
@@ -36,6 +41,7 @@
 struct BootParams {
     usable_vram_start: u64,
     usable_vram_size: u64,
+    bar1_pde_base: u64,
 }
 
 macro_rules! define_chipset {
@@ -273,6 +279,8 @@ pub(crate) struct Gpu {
     gsp: Gsp,
     /// Static GPU information from GSP.
     gsp_static_info: GetGspStaticInfoReply,
+    /// BAR1 user interface for CPU access to GPU virtual memory.
+    bar_user: BarUser,
 }
 
 impl Gpu {
@@ -286,6 +294,7 @@ pub(crate) fn new<'a>(
         let boot_params: Cell<BootParams> = Cell::new(BootParams {
             usable_vram_start: 0,
             usable_vram_size: 0,
+            bar1_pde_base: 0,
         });
 
         try_pin_init!(Self {
@@ -330,6 +339,7 @@ pub(crate) fn new<'a>(
                 boot_params.set(BootParams {
                     usable_vram_start: usable_vram.start,
                     usable_vram_size: usable_vram.end - usable_vram.start,
+                    bar1_pde_base: info.bar1_pde_base(),
                 });
 
                 info
@@ -346,6 +356,16 @@ pub(crate) fn new<'a>(
                 })?
             },
 
+            // Create BAR1 user interface for CPU access to GPU virtual memory.
+            // Uses the BAR1 PDE base from GSP and full BAR1 size for VA space.
+            bar_user: {
+                let params = boot_params.get();
+                let pdb_addr = VramAddress::new(params.bar1_pde_base);
+                let mmu_version = MmuVersion::from(spec.chipset.arch());
+                let bar1_size = pdev.resource_len(1)?;
+                BarUser::new(pdb_addr, mmu_version, bar1_size)?
+            },
+
             bar: devres_bar,
         })
     }
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 22bd61e2a67e..8b3fa29c57f1 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -235,7 +235,6 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
     }
 
     /// Returns the BAR1 Page Directory Entry base address.
-    #[expect(dead_code)]
     pub(crate) fn bar1_pde_base(&self) -> u64 {
         self.bar1_pde_base
     }
-- 
2.34.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox