* pci_probe called concurrently in machine with 2 identical PCI devices causing race condition
@ 2025-06-26 10:14 Jozef Matejcik (Nokia)
2025-06-26 12:08 ` Lukas Wunner
0 siblings, 1 reply; 8+ messages in thread
From: Jozef Matejcik (Nokia) @ 2025-06-26 10:14 UTC (permalink / raw)
To: linux-pci@vger.kernel.org
Hi kernel community,
We have one specific problem related to the Linux PCI subsystem.
We have a device with 2 identical NPUs, so 2 identical PCI devices sharing the same 3rd party driver. Our problem is that _pci_probe of this driver is called concurrently from 2 kernel threads. It happens more frequently when kernel debug logs are enabled in GRUB, roughly on every 20th to 30th reboot of the device.
I am writing this mail because it's possible this is a generic issue in the Linux PCI subsystem which may affect more people/companies - please correct me if I am wrong.
While digging into this in the driver's source and the Linux kernel source, I found this code in pci_call_probe():
	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);
This was added in 0b2c2a71 in 2017. Quoting part of commit message:
PCI: Replace the racy recursion prevention
pci_call_probe() can called recursively when a physcial function is probed
and the probing creates virtual functions, which are populated via
pci_bus_add_device() which in turn can end up calling pci_call_probe()
again.
<end of quote>
So the fix is specifically related to devices with multiple VFs. But does this take into account the setup with 2 separate, but otherwise identical PCI devices? Is it possible this can occur in any machine with 2 identical PCI devices?
Snippet from dmesg (unfortunately, I am not sure how much I can share):
[ 76.586492] linux-kernel-bde (154): DO_NOT_COMMIT: in _pci_probe at 2627
[ 76.586494] linux-kernel-bde (154): DO_NOT_COMMIT: ctrl addr before: 0000000000000000, _ndevices: 0
[ 76.586497] linux-kernel-bde (154): DO_NOT_COMMIT: ctrl addr after: 00000000f24dc905, _ndevices: 0
[ 76.595735] linux-kernel-bde (4688): DO_NOT_COMMIT: _devices at 00000000f24dc905, sizeof(*_devices): 472
[ 76.603415] linux-kernel-bde (154): DO_NOT_COMMIT: ctrl->dev_type set to 256
[ 76.628884] linux-kernel-bde (4688): DO_NOT_COMMIT: dev->device: 8854
[ 76.644076] linux-kernel-bde (4688): DO_NOT_COMMIT: in _pci_probe at 2627
[ 76.661176] linux-kernel-bde (4688): DO_NOT_COMMIT: ctrl addr before: 0000000000000000, _ndevices: 0
[ 76.679854] linux-kernel-bde (4688): DO_NOT_COMMIT: ctrl addr after: 00000000f24dc905, _ndevices: 0
I checked the sources of several drivers for various PCI devices, but none of them seems to account for the probe callback being called from multiple threads at the same time.
Output of uname -a:
Linux Dut-A 6.1.128-13-amd64 #1 SMP PREEMPT_DYNAMIC Thu Jun 12 07:22:21 UTC 2025 x86_64 GNU/Linux
Regards,
Jozef
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition
2025-06-26 10:14 pci_probe called concurrently in machine with 2 identical PCI devices causing race condition Jozef Matejcik (Nokia)
@ 2025-06-26 12:08 ` Lukas Wunner
2025-06-26 12:20 ` Jozef Matejcik (Nokia)
0 siblings, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2025-06-26 12:08 UTC (permalink / raw)
To: Jozef Matejcik (Nokia); +Cc: linux-pci@vger.kernel.org
On Thu, Jun 26, 2025 at 10:14:00AM +0000, Jozef Matejcik (Nokia) wrote:
> We have one specific problem related to Linux PCI subsystem.
>
> We have a device with 2 identical NPUs, so 2 identical PCI devices
> sharing the same 3rd party driver. Our problem is that _pci_probe of
> this driver is called concurrently from 2 kernel threads. It happens
> more frequently when kernel debug logs are enabled in GRUB, appr.
> every 20th or 30th reboot of the device.
So what exactly is the "problem"? Does something not work?
Do you get errors or warnings?
> So the fix is specifically related to devices with multiple VFs.
> But does this take into account the setup with 2 separate, but
> otherwise identical PCI devices? Is it possible this can occur
> in any machine with 2 identical PCI devices?
Not unless probing of one PF creates another PF.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition
2025-06-26 12:08 ` Lukas Wunner
@ 2025-06-26 12:20 ` Jozef Matejcik (Nokia)
2025-06-26 12:26 ` Lukas Wunner
0 siblings, 1 reply; 8+ messages in thread
From: Jozef Matejcik (Nokia) @ 2025-06-26 12:20 UTC (permalink / raw)
To: Lukas Wunner; +Cc: linux-pci@vger.kernel.org
Hi Lukas,
The problem is that when this race occurs, the second NPU (PCI device) remains uninitialized in the kernel driver. And I don't think it's specific to the driver and device we are using, hence I am asking on this mailing list.
The driver keeps an internal global array of initialized devices and their count. The working sequence is this:
- call pci_probe for the 1st NPU, store it at index 0 in the array, increment the count
- call pci_probe for the 2nd NPU, store it at index 1, increment the count
What happens in the erroneous case:
- call pci_probe, store it at index 0
- call pci_probe, store it at index 0 !!
- increment the counter in the first pci_probe
In this case, the datapath on top of these ASICs does not work, because it expects the driver to initialize both ASICs.
I know this can be fixed in the driver by proper locking and we have contacted the vendor. However, I think this can happen in any machine with 2 identical PCI devices, because as far as I know, existing PCI drivers usually do not assume that probe function can be called from multiple threads.
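The locking fix mentioned above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it is not the vendor driver, it uses pthreads instead of kernel primitives (an in-kernel fix would more likely use a struct mutex), and the names model_probe, run_two_probes, devices, ndevices and last_slot are hypothetical (devices and ndevices merely mirror the _devices and _ndevices names seen in the log). The point is that picking the array slot and incrementing the count happen under one lock, so two concurrent probes cannot both land on index 0.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_DEVICES 4

/* Hypothetical stand-in for the driver's per-device control block. */
struct ctrl {
	int id;
};

static struct ctrl devices[MAX_DEVICES];
static int ndevices;
static pthread_mutex_t devices_lock = PTHREAD_MUTEX_INITIALIZER;

/* Slots handed out by the last run_two_probes() call, for inspection. */
static int last_slot[2];

/*
 * Model of a probe callback. The slot is claimed and the counter is
 * incremented inside one critical section, so concurrent callers always
 * get distinct slots. Returns the slot index, or -1 if the array is full.
 */
static int model_probe(int id)
{
	int slot;

	pthread_mutex_lock(&devices_lock);
	if (ndevices >= MAX_DEVICES) {
		pthread_mutex_unlock(&devices_lock);
		return -1;
	}
	slot = ndevices++;
	devices[slot].id = id;
	pthread_mutex_unlock(&devices_lock);

	/* Slow per-device setup would go here, outside the lock. */
	return slot;
}

static void *probe_thread(void *arg)
{
	int *slot = arg;

	/* Use the position in last_slot[] as a made-up device id. */
	*slot = model_probe((int)(slot - last_slot));
	return NULL;
}

/* Probe two "devices" from two threads; returns the resulting count. */
static int run_two_probes(void)
{
	pthread_t t[2];
	int i, rc;

	ndevices = 0;
	for (i = 0; i < 2; i++) {
		rc = pthread_create(&t[i], NULL, probe_thread, &last_slot[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	return ndevices;
}
```

Calling run_two_probes() from a test program should always yield a count of 2 with the two threads holding slots 0 and 1 in some order. Without the lock, the read of the counter and its increment are separate steps, which is the window the erroneous sequence above falls into.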
Thanks,
Jozef
-----Original Message-----
From: Lukas Wunner <lukas@wunner.de>
Sent: Thursday, June 26, 2025 2:09 PM
To: Jozef Matejcik (Nokia) <jozef.matejcik@nokia.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition
On Thu, Jun 26, 2025 at 10:14:00AM +0000, Jozef Matejcik (Nokia) wrote:
> We have one specific problem related to Linux PCI subsystem.
>
> We have a device with 2 identical NPUs, so 2 identical PCI devices
> sharing the same 3rd party driver. Our problem is that _pci_probe of
> this driver is called concurrently from 2 kernel threads. It happens
> more frequently when kernel debug logs are enabled in GRUB, appr.
> every 20th or 30th reboot of the device.
So what exactly is the "problem"? Does something not work?
Do you get errors or warnings?
> So the fix is specifically related to devices with multiple VFs.
> But does this take into account the setup with 2 separate, but
> otherwise identical PCI devices? Is it possible this can occur in any
> machine with 2 identical PCI devices?
Not unless probing of one PF creates another PF.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition
2025-06-26 12:20 ` Jozef Matejcik (Nokia)
@ 2025-06-26 12:26 ` Lukas Wunner
2025-06-26 15:41 ` Keith Busch
0 siblings, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2025-06-26 12:26 UTC (permalink / raw)
To: Jozef Matejcik (Nokia); +Cc: linux-pci@vger.kernel.org
On Thu, Jun 26, 2025 at 12:20:48PM +0000, Jozef Matejcik (Nokia) wrote:
> The driver keeps internal global array of initialized devices and their
> count.
[...]
> I know this can be fixed in the driver by proper locking
Right.
> However, I think this can happen in any machine with 2 identical
> PCI devices, because as far as I know, existing PCI drivers usually
> do not assume that probe function can be called from multiple threads.
That can happen all the time and it would be a bug in the driver
if it caused issues.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition
2025-06-26 12:26 ` Lukas Wunner
@ 2025-06-26 15:41 ` Keith Busch
2025-06-26 18:16 ` Jozef Matejcik (Nokia)
2025-07-04 8:03 ` Lukas Wunner
0 siblings, 2 replies; 8+ messages in thread
From: Keith Busch @ 2025-06-26 15:41 UTC (permalink / raw)
To: Lukas Wunner; +Cc: Jozef Matejcik (Nokia), linux-pci@vger.kernel.org
On Thu, Jun 26, 2025 at 02:26:56PM +0200, Lukas Wunner wrote:
> On Thu, Jun 26, 2025 at 12:20:48PM +0000, Jozef Matejcik (Nokia) wrote:
> > However, I think this can happen in any machine with 2 identical
> > PCI devices, because as far as I know, existing PCI drivers usually
> > do not assume that probe function can be called from multiple threads.
>
> That can happen all the time and it would be a bug in the driver
> if it caused issues.
Wait, is that true? I thought that would only happen if the driver
indicated probe_type PROBE_PREFER_ASYNCHRONOUS. The default appears to
still be the same as PROBE_FORCE_SYNCHRONOUS.
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition
2025-06-26 15:41 ` Keith Busch
@ 2025-06-26 18:16 ` Jozef Matejcik (Nokia)
2025-06-26 22:37 ` Keith Busch
2025-07-04 8:03 ` Lukas Wunner
1 sibling, 1 reply; 8+ messages in thread
From: Jozef Matejcik (Nokia) @ 2025-06-26 18:16 UTC (permalink / raw)
To: Keith Busch, Lukas Wunner; +Cc: linux-pci@vger.kernel.org
> On Thu, Jun 26, 2025 at 02:26:56PM +0200, Lukas Wunner wrote:
> > On Thu, Jun 26, 2025 at 12:20:48PM +0000, Jozef Matejcik (Nokia) wrote:
> > > However, I think this can happen in any machine with 2 identical PCI
> > > devices, because as far as I know, existing PCI drivers usually do
> > > not assume that probe function can be called from multiple threads.
> >
> > That can happen all the time and it would be a bug in the driver if it
> > caused issues.
> Wait, is that true? I thought that would only happen if the driver
> indicated probe_type PROBE_PREFER_ASYNCHRONOUS. The default appears to
> still be the same as PROBE_FORCE_SYNCHRONOUS.
Hi Keith, thanks for stepping in. The implementation in drivers/pci/*.c seems to be agnostic to probe_type: I did not find any place where this enum is read or where a decision is made based on its value.
If the probe_type field of struct device_driver is relevant to the PCI subsystem, I think it should be documented in Documentation/PCI/pci.rst.
In any case, we will push the vendor to fix their driver, but if there is anything that should be improved in the kernel, I can assist.
Regards,
Jozef
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition
2025-06-26 18:16 ` Jozef Matejcik (Nokia)
@ 2025-06-26 22:37 ` Keith Busch
0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2025-06-26 22:37 UTC (permalink / raw)
To: Jozef Matejcik (Nokia); +Cc: Lukas Wunner, linux-pci@vger.kernel.org
On Thu, Jun 26, 2025 at 06:16:31PM +0000, Jozef Matejcik (Nokia) wrote:
> Hi Keith, thanks for stepping in. The implementation in drivers/pci/*.c seems to be pretty agnostic to probe_type. I did not find any place where this enum is accessed and some decisions are made based on its value.
You're looking in the wrong place. The different handling for device
probe type happens in drivers/base/dd.c, function __driver_attach().
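For readers following along, the probe_type decision discussed here can be modelled in a few lines of userspace C. This is a simplified sketch written from memory of the driver core, not a copy of drivers/base/dd.c. The enum values mirror the kernel's enum probe_type, while struct model_driver and model_allows_async_probing() are hypothetical names. The sketch assumes async probing has not been requested any other way (for example via the driver_async_probe= kernel parameter), in which case the default strategy behaves like the forced-synchronous one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the kernel's enum probe_type. */
enum probe_type {
	PROBE_DEFAULT_STRATEGY,
	PROBE_PREFER_ASYNCHRONOUS,
	PROBE_FORCE_SYNCHRONOUS,
};

/* Hypothetical, stripped-down stand-in for struct device_driver. */
struct model_driver {
	const char *name;
	enum probe_type probe_type;
};

/*
 * Simplified model of the "may this driver be probed asynchronously?"
 * check. Assumption: no command-line or module option asks for async
 * probing, so the default strategy falls back to synchronous.
 */
static bool model_allows_async_probing(const struct model_driver *drv)
{
	assert(drv != NULL);

	switch (drv->probe_type) {
	case PROBE_PREFER_ASYNCHRONOUS:
		return true;
	case PROBE_FORCE_SYNCHRONOUS:
		return false;
	default:
		/* PROBE_DEFAULT_STRATEGY: synchronous unless opted in. */
		return false;
	}
}

/* Example drivers: one that leaves probe_type unset, one that opts in. */
static const struct model_driver default_drv = {
	.name = "model-default",
	.probe_type = PROBE_DEFAULT_STRATEGY,
};

static const struct model_driver async_drv = {
	.name = "model-async",
	.probe_type = PROBE_PREFER_ASYNCHRONOUS,
};
```

In a real PCI driver the field would be set through the embedded driver structure, along the lines of .driver = { .probe_type = PROBE_PREFER_ASYNCHRONOUS } inside the struct pci_driver initializer. A driver that leaves it unset gets the default strategy.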
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition
2025-06-26 15:41 ` Keith Busch
2025-06-26 18:16 ` Jozef Matejcik (Nokia)
@ 2025-07-04 8:03 ` Lukas Wunner
1 sibling, 0 replies; 8+ messages in thread
From: Lukas Wunner @ 2025-07-04 8:03 UTC (permalink / raw)
To: Keith Busch; +Cc: Jozef Matejcik (Nokia), linux-pci@vger.kernel.org
On Thu, Jun 26, 2025 at 09:41:58AM -0600, Keith Busch wrote:
> On Thu, Jun 26, 2025 at 02:26:56PM +0200, Lukas Wunner wrote:
> > On Thu, Jun 26, 2025 at 12:20:48PM +0000, Jozef Matejcik (Nokia) wrote:
> > > However, I think this can happen in any machine with 2 identical
> > > PCI devices, because as far as I know, existing PCI drivers usually
> > > do not assume that probe function can be called from multiple threads.
> >
> > That can happen all the time and it would be a bug in the driver
> > if it caused issues.
>
> Wait, is that true? I thought that would only happen if the driver
> indicated probe_type PROBE_PREFER_ASYNCHRONOUS. The default appears to
> still be the same as PROBE_FORCE_SYNCHRONOUS.
You're right, and additionally PROBE_PREFER_ASYNCHRONOUS is only honored
on deferred probing. It appears Jozef is using an out-of-tree driver,
so it's unclear if those conditions are met, but if they are, then the
driver's ->probe() hook may be executed multiple times concurrently.
I guess I went out on a limb with the above-quoted statement, so I
apologize for that.
I've just submitted a patch to honor PROBE_PREFER_ASYNCHRONOUS also on
initial probing:
https://lore.kernel.org/r/53abe6f5ac7c631f95f5d061aa748b192eda0379.1751614426.git.lukas@wunner.de
Would you mind giving it a spin to ascertain that initial probing does
happen asynchronously with it? The nvme driver (which you co-maintain)
already opts in to async probing, so should take advantage of it right
away. GPU drivers seem particularly guilty of lengthy probe times,
so you might want to test async probing with those as well, in order to
have quicker booting on machines used for training neural networks.
Thanks!
Lukas
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-07-04 8:03 UTC | newest]
Thread overview: 8+ messages
2025-06-26 10:14 pci_probe called concurrently in machine with 2 identical PCI devices causing race condition Jozef Matejcik (Nokia)
2025-06-26 12:08 ` Lukas Wunner
2025-06-26 12:20 ` Jozef Matejcik (Nokia)
2025-06-26 12:26 ` Lukas Wunner
2025-06-26 15:41 ` Keith Busch
2025-06-26 18:16 ` Jozef Matejcik (Nokia)
2025-06-26 22:37 ` Keith Busch
2025-07-04 8:03 ` Lukas Wunner