public inbox for rust-for-linux@vger.kernel.org
* [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
@ 2026-02-05  4:11 John Hubbard
  2026-02-05  4:16 ` John Hubbard
  2026-02-05 13:44 ` Gary Guo
  0 siblings, 2 replies; 6+ messages in thread
From: John Hubbard @ 2026-02-05  4:11 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Joel Fernandes, Timur Tabi, Alistair Popple, Eliot Courtney,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nouveau, rust-for-linux, LKML, John Hubbard

The auxiliary device registration was using a hardcoded ID of 0, which
caused probe() to fail on multi-GPU systems with:

   sysfs: cannot create duplicate filename '/bus/auxiliary/devices/NovaCore.nova-drm.0'

Fix this by using an atomic counter to generate unique IDs for each
GPU's aux device registration. The TODO item to eventually use XArray
for recycling aux device IDs is retained, but for now, this works very
nicely.

This has the side effect of making debugfs[1] work on multi-GPU systems.

[1] https://lore.kernel.org/20260203224757.871729-1-ttabi@nvidia.com

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Hi,

This is based on today's (Feb 4, 2026) linux-next/master branch.

thanks,
John Hubbard

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 5a4cc047bcfc..a542ec0b40fa 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::sync::atomic::{AtomicU32, Ordering};
+
 use kernel::{
     auxiliary,
     device::Core,
@@ -19,6 +21,9 @@
 
 use crate::gpu::Gpu;
 
+/// Counter for generating unique auxiliary device IDs.
+static AUXILIARY_ID_COUNTER: AtomicU32 = AtomicU32::new(0);
+
 #[pin_data]
 pub(crate) struct NovaCore {
     #[pin]
@@ -85,12 +90,17 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
                 GFP_KERNEL,
             )?;
 
+            // TODO[XARR]: Use XArray for proper ID allocation/recycling; for now we use a simple
+            // atomic counter which never recycles IDs. A unique ID is required for multi-GPU
+            // systems; without it, probe() fails for all but the first GPU.
+            let aux_id = AUXILIARY_ID_COUNTER.fetch_add(1, Ordering::Relaxed);
+
             Ok(try_pin_init!(Self {
                 gpu <- Gpu::new(pdev, bar.clone(), bar.access(pdev.as_ref())?),
                 _reg <- auxiliary::Registration::new(
                     pdev.as_ref(),
                     c"nova-drm",
-                    0, // TODO[XARR]: Once it lands, use XArray; for now we don't use the ID.
+                    aux_id,
                     crate::MODULE_NAME
                 ),
             }))

base-commit: 0f8a890c4524d6e4013ff225e70de2aed7e6d726
-- 
2.53.0
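
The counter approach described in the commit message can be sketched in
userspace Rust. This is only an illustration under stated assumptions: it
uses std atomics rather than the kernel's, and `next_aux_id` and
`probe_concurrently` are hypothetical names that do not appear in the patch.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Process-wide counter, mirroring the static added by the patch.
static AUXILIARY_ID_COUNTER: AtomicU32 = AtomicU32::new(0);

/// Hypothetical helper (not in the patch): returns an ID that no other
/// caller receives, at least until the counter wraps around.
fn next_aux_id() -> u32 {
    // Relaxed is enough here: only the atomicity of the increment matters,
    // no other memory is published through the counter.
    AUXILIARY_ID_COUNTER.fetch_add(1, Ordering::Relaxed)
}

/// Simulates `n` concurrent probes and returns the IDs they got, sorted.
fn probe_concurrently(n: usize) -> Vec<u32> {
    let handles: Vec<_> = (0..n).map(|_| thread::spawn(next_aux_id)).collect();
    let mut ids: Vec<u32> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    ids.sort_unstable();
    ids
}
```

Each simulated probe gets a distinct value, which is the property that avoids
the duplicate sysfs name. IDs are never handed back, so they are not reused
after a device goes away.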



* Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
  2026-02-05  4:11 [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems John Hubbard
@ 2026-02-05  4:16 ` John Hubbard
  2026-02-05 13:44 ` Gary Guo
  1 sibling, 0 replies; 6+ messages in thread
From: John Hubbard @ 2026-02-05  4:16 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: Joel Fernandes, Timur Tabi, Alistair Popple, Eliot Courtney,
	Zhi Wang, David Airlie, Simona Vetter, Bjorn Helgaas,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nouveau, rust-for-linux, LKML

On 2/4/26 8:11 PM, John Hubbard wrote:
> The auxiliary device registration was using a hardcoded ID of 0, which
> caused probe() to fail on multi-GPU systems with:
> 
>    sysfs: cannot create duplicate filename '/bus/auxiliary/devices/NovaCore.nova-drm.0'
> 
> Fix this by using an atomic counter to generate unique IDs for each
> GPU's aux device registration. The TODO item to eventually use XArray
> for recycling aux device IDs is retained, but for now, this works very
> nicely.
> 
> This has the side effect of making debugfs[1] work on multi-GPU systems.
> 
> [1] https://lore.kernel.org/20260203224757.871729-1-ttabi@nvidia.com
> 
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/driver.rs | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> Hi,
> 
> This is based on today's (Feb 4, 2026) linux-next/master branch.
> 
> thanks,
> John Hubbard
> 
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index 5a4cc047bcfc..a542ec0b40fa 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -1,5 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> +use core::sync::atomic::{AtomicU32, Ordering};

Somehow the wrong (non-vertical) formatting snuck back into
my patch! Arggh. I'll be glad when rustfmt support for this
can help me catch this.

> +
>  use kernel::{
>      auxiliary,
>      device::Core,
> @@ -19,6 +21,9 @@
>  
>  use crate::gpu::Gpu;
>  
> +/// Counter for generating unique auxiliary device IDs.
> +static AUXILIARY_ID_COUNTER: AtomicU32 = AtomicU32::new(0);
> +
>  #[pin_data]
>  pub(crate) struct NovaCore {
>      #[pin]
> @@ -85,12 +90,17 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
>                  GFP_KERNEL,
>              )?;
>  
> +            // TODO[XARR]: Use XArray for proper ID allocation/recycling; for now we use a simple

I also did *not* mean to leave the word "we" in there.

Lots of little glitches tonight, sorry about those.

thanks,
-- 
John Hubbard

> +            // atomic counter which never recycles IDs. A unique ID is required for multi-GPU
> +            // systems; without it, probe() fails for all but the first GPU.
> +            let aux_id = AUXILIARY_ID_COUNTER.fetch_add(1, Ordering::Relaxed);
> +
>              Ok(try_pin_init!(Self {
>                  gpu <- Gpu::new(pdev, bar.clone(), bar.access(pdev.as_ref())?),
>                  _reg <- auxiliary::Registration::new(
>                      pdev.as_ref(),
>                      c"nova-drm",
> -                    0, // TODO[XARR]: Once it lands, use XArray; for now we don't use the ID.
> +                    aux_id,
>                      crate::MODULE_NAME
>                  ),
>              }))
> 
> base-commit: 0f8a890c4524d6e4013ff225e70de2aed7e6d726




* Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
  2026-02-05  4:11 [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems John Hubbard
  2026-02-05  4:16 ` John Hubbard
@ 2026-02-05 13:44 ` Gary Guo
  2026-02-05 13:48   ` Matthew Wilcox
  1 sibling, 1 reply; 6+ messages in thread
From: Gary Guo @ 2026-02-05 13:44 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich, Alexandre Courbot
  Cc: Matthew Wilcox, Joel Fernandes, Timur Tabi, Alistair Popple,
	Eliot Courtney, Zhi Wang, David Airlie, Simona Vetter,
	Bjorn Helgaas, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, nouveau, rust-for-linux, LKML

On Thu Feb 5, 2026 at 4:11 AM GMT, John Hubbard wrote:
> The auxiliary device registration was using a hardcoded ID of 0, which
> caused probe() to fail on multi-GPU systems with:
>
>    sysfs: cannot create duplicate filename '/bus/auxiliary/devices/NovaCore.nova-drm.0'
>
> Fix this by using an atomic counter to generate unique IDs for each
> GPU's aux device registration. The TODO item to eventually use XArray
> for recycling aux device IDs is retained, but for now, this works very
> nicely.
>
> This has the side effect of making debugfs[1] work on multi-GPU systems.

Hi John,

Looks like this is something that should be achieved via IDA?

Cc: Matthew Wilcox <willy@infradead.org>

>
> [1] https://lore.kernel.org/20260203224757.871729-1-ttabi@nvidia.com
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/driver.rs | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> Hi,
>
> This is based on today's (Feb 4, 2026) linux-next/master branch.
>
> thanks,
> John Hubbard
>
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index 5a4cc047bcfc..a542ec0b40fa 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -1,5 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> +use core::sync::atomic::{AtomicU32, Ordering};

We're stopping the use of Rust atomics. Please use LKMM atomics available from
`kernel::sync::atomic`.

Best,
Gary

> +
>  use kernel::{
>      auxiliary,
>      device::Core,
> @@ -19,6 +21,9 @@
>  
>  use crate::gpu::Gpu;
>  
> +/// Counter for generating unique auxiliary device IDs.
> +static AUXILIARY_ID_COUNTER: AtomicU32 = AtomicU32::new(0);
> +
>  #[pin_data]
>  pub(crate) struct NovaCore {
>      #[pin]
> @@ -85,12 +90,17 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
>                  GFP_KERNEL,
>              )?;
>  
> +            // TODO[XARR]: Use XArray for proper ID allocation/recycling; for now we use a simple
> +            // atomic counter which never recycles IDs. A unique ID is required for multi-GPU
> +            // systems; without it, probe() fails for all but the first GPU.
> +            let aux_id = AUXILIARY_ID_COUNTER.fetch_add(1, Ordering::Relaxed);
> +
>              Ok(try_pin_init!(Self {
>                  gpu <- Gpu::new(pdev, bar.clone(), bar.access(pdev.as_ref())?),
>                  _reg <- auxiliary::Registration::new(
>                      pdev.as_ref(),
>                      c"nova-drm",
> -                    0, // TODO[XARR]: Once it lands, use XArray; for now we don't use the ID.
> +                    aux_id,
>                      crate::MODULE_NAME
>                  ),
>              }))
>
> base-commit: 0f8a890c4524d6e4013ff225e70de2aed7e6d726



* Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
  2026-02-05 13:44 ` Gary Guo
@ 2026-02-05 13:48   ` Matthew Wilcox
  2026-02-05 14:19     ` Danilo Krummrich
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2026-02-05 13:48 UTC (permalink / raw)
  To: Gary Guo
  Cc: John Hubbard, Danilo Krummrich, Alexandre Courbot, Joel Fernandes,
	Timur Tabi, Alistair Popple, Eliot Courtney, Zhi Wang,
	David Airlie, Simona Vetter, Bjorn Helgaas, Miguel Ojeda,
	Alex Gaynor, Boqun Feng, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau,
	rust-for-linux, LKML

On Thu, Feb 05, 2026 at 01:44:27PM +0000, Gary Guo wrote:
> > Fix this by using an atomic counter to generate unique IDs for each
> > GPU's aux device registration. The TODO item to eventually use XArray
> > for recycling aux device IDs is retained, but for now, this works very
> > nicely.
> >
> > This has the side effect of making debugfs[1] work on multi-GPU systems.
> 
> Hi John,
> 
> Looks like this is something that should be achieved via IDA?

Yes, if you have no need to go from ID to pointer, an IDA is better.
That said, as far as I understand what this code is doing, an atomic_t
solves the problem just fine and is cheaper.
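
To make the trade-off concrete, here is a rough userspace sketch of an
allocator that recycles IDs, which is the behaviour an IDA adds over a plain
counter. It is not the kernel IDA API: `IdPool`, `alloc` and `release` are
hypothetical names, and a real IDA uses a far more compact representation.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an ID allocator that reuses released IDs.
#[derive(Default)]
struct IdPool {
    /// IDs that were handed out and later released.
    released: BTreeSet<u32>,
    /// Smallest ID that has never been handed out.
    next: u32,
}

impl IdPool {
    /// Returns the smallest unused ID, or `None` once the space is exhausted.
    /// For simplicity, `u32::MAX` itself is never handed out.
    fn alloc(&mut self) -> Option<u32> {
        // Released IDs are always below `next`, so prefer them first.
        if let Some(id) = self.released.pop_first() {
            return Some(id);
        }
        let id = self.next;
        self.next = id.checked_add(1)?;
        Some(id)
    }

    /// Makes `id` available for reuse by a later `alloc` call.
    fn release(&mut self, id: u32) {
        debug_assert!(id < self.next, "releasing an ID that was never allocated");
        self.released.insert(id);
    }
}
```

The cost is the extra bookkeeping and the need for exclusive access (here
`&mut self`), compared with a single atomic increment for the counter.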


* Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
  2026-02-05 13:48   ` Matthew Wilcox
@ 2026-02-05 14:19     ` Danilo Krummrich
  2026-02-05 21:44       ` John Hubbard
  0 siblings, 1 reply; 6+ messages in thread
From: Danilo Krummrich @ 2026-02-05 14:19 UTC (permalink / raw)
  To: John Hubbard, Matthew Wilcox
  Cc: Gary Guo, Alexandre Courbot, Joel Fernandes, Timur Tabi,
	Alistair Popple, Eliot Courtney, Zhi Wang, David Airlie,
	Simona Vetter, Bjorn Helgaas, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, nouveau, rust-for-linux, LKML

On Thu Feb 5, 2026 at 2:48 PM CET, Matthew Wilcox wrote:
> On Thu, Feb 05, 2026 at 01:44:27PM +0000, Gary Guo wrote:
>> > Fix this by using an atomic counter to generate unique IDs for each
>> > GPU's aux device registration. The TODO item to eventually use XArray
>> > for recycling aux device IDs is retained, but for now, this works very
>> > nicely.
>> >
>> > This has the side effect of making debugfs[1] work on multi-GPU systems.
>> 
>> Hi John,
>> 
>> Looks like this is something that should be achieved via IDA?
>
> Yes, if you have no need to go from ID to pointer, an IDA is better.
> That said, as far as I understand what this code is doing, an atomic_t
> solves the problem just fine and is cheaper.

I agree, for now an atomic should be perfectly fine. Though, with enough
patience binding/unbinding the driver from sysfs you can probably make this
overflow. :)

The reason for the XArray TODO is that it is one option for a place where
nova-core can store nova-drm / vGPU specific data, once either vGPU or nova-drm
attaches to the auxiliary device. But I think there may be better alternatives.
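
The overflow concern can be illustrated with a userspace sketch that refuses
to wrap instead of silently reusing ID 0. It assumes std atomics, and
`next_id_checked` is a hypothetical helper, not something proposed in this
thread. How a driver would report the exhausted case is left open.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: returns the next ID, or `None` once incrementing
/// the counter would overflow. A plain `fetch_add` would wrap instead.
fn next_id_checked(counter: &AtomicU32) -> Option<u32> {
    counter
        // The closure returns `None` on overflow, which leaves the stored
        // value untouched and makes `fetch_update` return `Err`.
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| v.checked_add(1))
        .ok()
}
```

On success the previous counter value is returned, matching what `fetch_add`
would have produced. Once the counter reaches `u32::MAX`, every later call
yields `None`.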


* Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
  2026-02-05 14:19     ` Danilo Krummrich
@ 2026-02-05 21:44       ` John Hubbard
  0 siblings, 0 replies; 6+ messages in thread
From: John Hubbard @ 2026-02-05 21:44 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Wilcox
  Cc: Gary Guo, Alexandre Courbot, Joel Fernandes, Timur Tabi,
	Alistair Popple, Eliot Courtney, Zhi Wang, David Airlie,
	Simona Vetter, Bjorn Helgaas, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, nouveau, rust-for-linux, LKML

On 2/5/26 6:19 AM, Danilo Krummrich wrote:
> On Thu Feb 5, 2026 at 2:48 PM CET, Matthew Wilcox wrote:
>> On Thu, Feb 05, 2026 at 01:44:27PM +0000, Gary Guo wrote:
>>>> Fix this by using an atomic counter to generate unique IDs for each
>>>> GPU's aux device registration. The TODO item to eventually use XArray
>>>> for recycling aux device IDs is retained, but for now, this works very
>>>> nicely.
>>>>
>>>> This has the side effect of making debugfs[1] work on multi-GPU systems.
>>>
>>> Hi John,
>>>
>>> Looks like this is something that should be achieved via IDA?
>>
>> Yes, if you have no need to go from ID to pointer, an IDA is better.
>> That said, as far as I understand what this code is doing, an atomic_t
>> solves the problem just fine and is cheaper.
> 
> I agree, for now an atomic should be perfectly fine. Though, with enough
> patience binding/unbinding the driver from sysfs you can probably make this
> overflow. :)
> 
> The reason for the Xarray TODO is that it is one option for a place where
> nova-core can store nova-drm / vGPU specific data, once either vGPU or nova-drm
> attaches to the auxiliary device. But I think there may be better alternatives.

OK, this seems like enough information to post a v2, thanks!

thanks,
-- 
John Hubbard


