public inbox for netdev@vger.kernel.org
* [PATCH net-next] net/mlx5: Allow asynchronous probe
@ 2026-03-03 10:33 Gerd Bayer
  2026-03-05  7:56 ` Tariq Toukan
From: Gerd Bayer @ 2026-03-03 10:33 UTC
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Niklas Schnelle, Peter Oberparleiter, Halil Pasic,
	Alexandra Winter, netdev, linux-rdma, linux-kernel, linux-s390,
	Gerd Bayer

Announce that mlx5_core supports asynchronous probing.

Tests on s390 - where VFs can show up isolated from their PF in OS
instances - showed symptoms of "mlx5_core: probe of 00e7:00:00.0 failed
with error -12" when booting a system with a large number (> 250) of
Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
(15b3:101e) PCI functions.

Turns out that this is due to systemd-udev's time-out supervision of
"modprobe" killing the sequential initialization of additional functions
if probing exceeds a default of 180 seconds.

According to [1], device drivers can (and slow ones should!) opt in to
having their probe step executed asynchronously - and interleaved. With
the mlx5_core device driver announcing PROBE_PREFER_ASYNCHRONOUS as
proposed by this patch, we've measured 275 VFs being probed successfully
in about 60 seconds.

[1] https://www.kernel.org/doc/html/latest/driver-api/infrastructure.html

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
---
Hi all,

this patch helps to speed up boot times when there is a large number
of Mellanox/NVIDIA VFs in a configuration. Although we've seen real
issues, I hesitate to declare this a fix for commit 9603b61de1ee
("mlx5: Move pci device handling from mlx5_ib to mlx5_core") primarily
because the concept of asynchronous probing with commit 765230b5f084
("driver-core: add asynchronous probing support for drivers") was
introduced only later.

Thanks,
Gerd Bayer
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index fdc3ba20912e4fbc53c65825c62e868996eff56d..b53fc3f2566acf5a07cb8df649124c4a87f3e043 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -2306,6 +2306,9 @@ static struct pci_driver mlx5_core_driver = {
 	.sriov_configure   = mlx5_core_sriov_configure,
 	.sriov_get_vf_total_msix = mlx5_sriov_get_vf_total_msix,
 	.sriov_set_msix_vec_count = mlx5_core_sriov_set_msix_vec_count,
+	.driver		= {
+		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
+	}
 };
 
 /**

---
base-commit: c69855ada28656fdd7e197b6e24cd40a04fe14d3
change-id: 20260303-parprobe_mlx5-d10d2a746d3a

Best regards,
-- 
Gerd Bayer <gbayer@linux.ibm.com>



* Re: [PATCH net-next] net/mlx5: Allow asynchronous probe
  2026-03-03 10:33 [PATCH net-next] net/mlx5: Allow asynchronous probe Gerd Bayer
@ 2026-03-05  7:56 ` Tariq Toukan
  2026-03-05 10:03   ` Gerd Bayer
From: Tariq Toukan @ 2026-03-05  7:56 UTC
  To: Gerd Bayer, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Niklas Schnelle, Peter Oberparleiter, Halil Pasic,
	Alexandra Winter, netdev, linux-rdma, linux-kernel, linux-s390



On 03/03/2026 12:33, Gerd Bayer wrote:
> Announce that mlx5_core supports asynchronous probing.
> 

Hi Gerd,
Interesting patch.

> Tests on s390 - where VFs can show up isolated from their PF in OS
> instances - showed symptoms of "mlx5_core: probe of 00e7:00:00.0 failed
> with error -12" when booting a system with a large number (> 250) of
> Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
> (15b3:101e) PCI functions.
> 
> Turns out that this is due to systemd-udev's time-out supervision of
> "modprobe" killing the sequential initialization of additional functions
> if probing exceeds a default of 180 seconds.
> 
> According to [1], device drivers can (and slow ones should!) opt in to
> having their probe step executed asynchronously - and interleaved. With
> the mlx5_core device driver announcing PROBE_PREFER_ASYNCHRONOUS as
> proposed by this patch, we've measured 275 VFs being probed successfully
> in about 60 seconds.
> 

Nice.

> [1] https://www.kernel.org/doc/html/latest/driver-api/infrastructure.html
> 
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
> Hi all,
> 
> this patch helps to speed up boot times when there is a large number
> of Mellanox/NVIDIA VFs in a configuration. Although we've seen real
> issues, I hesitate to declare this a fix for commit 9603b61de1ee
> ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") primarily
> because the concept of asynchronous probing with commit 765230b5f084
> ("driver-core: add asynchronous probing support for drivers") was
> introduced only later.
> 
> Thanks,
> Gerd Bayer
> ---

This is an interesting problem, and the proposed solution looks 
reasonable. That said, this is a very sensitive area and there may still 
be hidden assumptions or corner cases we haven't considered. This needs 
thorough testing across a wide range of real-world scenarios and 
different system topologies before we can be confident in it.

We'll take this for testing and report back once we have results.

BTW, as you probably know, a possible workaround is to increase the 
systemd-udev timeout.
What timeout is required for it to succeed without this change?

>   drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index fdc3ba20912e4fbc53c65825c62e868996eff56d..b53fc3f2566acf5a07cb8df649124c4a87f3e043 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -2306,6 +2306,9 @@ static struct pci_driver mlx5_core_driver = {
>   	.sriov_configure   = mlx5_core_sriov_configure,
>   	.sriov_get_vf_total_msix = mlx5_sriov_get_vf_total_msix,
>   	.sriov_set_msix_vec_count = mlx5_core_sriov_set_msix_vec_count,
> +	.driver		= {
> +		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
> +	}
>   };
>   
>   /**
> 
> ---
> base-commit: c69855ada28656fdd7e197b6e24cd40a04fe14d3
> change-id: 20260303-parprobe_mlx5-d10d2a746d3a
> 
> Best regards,



* Re: [PATCH net-next] net/mlx5: Allow asynchronous probe
  2026-03-05  7:56 ` Tariq Toukan
@ 2026-03-05 10:03   ` Gerd Bayer
From: Gerd Bayer @ 2026-03-05 10:03 UTC
  To: Tariq Toukan, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Niklas Schnelle, Peter Oberparleiter, Halil Pasic,
	Alexandra Winter, netdev, linux-rdma, linux-kernel, linux-s390

On Thu, 2026-03-05 at 09:56 +0200, Tariq Toukan wrote:
> 
> On 03/03/2026 12:33, Gerd Bayer wrote:
> > Announce that mlx5_core supports asynchronous probing.
> > 
> 
> Hi Gerd,
> Interesting patch.
> 
> > Tests on s390 - where VFs can show up isolated from their PF in OS
> > instances - showed symptoms of "mlx5_core: probe of 00e7:00:00.0 failed
> > with error -12" when booting a system with a large number (> 250) of
> > Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
> > (15b3:101e) PCI functions.
> > 
> > Turns out that this is due to systemd-udev's time-out supervision of
> > "modprobe" killing the sequential initialization of additional functions
> > if probing exceeds a default of 180 seconds.
> > 
> > According to [1], device drivers can (and slow ones should!) opt in to
> > having their probe step executed asynchronously - and interleaved. With
> > the mlx5_core device driver announcing PROBE_PREFER_ASYNCHRONOUS as
> > proposed by this patch, we've measured 275 VFs being probed successfully
> > in about 60 seconds.
> > 
> 
> Nice.
> 
> > [1] https://www.kernel.org/doc/html/latest/driver-api/infrastructure.html
> > 
> > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > ---
> > Hi all,
> > 
> > this patch helps to speed up boot times when there is a large number
> > of Mellanox/NVIDIA VFs in a configuration. Although we've seen real
> > issues, I hesitate to declare this a fix for commit 9603b61de1ee
> > ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") primarily
> > because the concept of asynchronous probing with commit 765230b5f084
> > ("driver-core: add asynchronous probing support for drivers") was
> > introduced only later.
> > 
> > Thanks,
> > Gerd Bayer
> > ---
> 
> This is an interesting problem, and the proposed solution looks 
> reasonable. That said, this is a very sensitive area and there may still 
> be hidden assumptions or corner cases we haven't considered. This needs 
> thorough testing across a wide range of real-world scenarios and 
> different system topologies before we can be confident in it.

I agree that a change like this might expose concurrency issues lurking
both in the driver instance controlling the VFs and in the driver
instance running the PF. I have to admit that my testing so far has
focused primarily on making large configurations work rather than on
"regression tests" with "household configurations" of 1..~10 VFs. I'll
discuss in-house how we can increase coverage as well.

> 
> We'll take this for testing and report back once we have results.

Thank you for your consideration.

> 
> BTW, as you probably know, a possible workaround is to increase the 
> systemd-udev timeout.
> What timeout is required for it to succeed without this change?

Yes, I did some very limited experiments with that, and I measured the
uninterruptible duration of initializing a single VF instance to be
close to one second. That would mean that for the 275 VFs I'd have to
raise the time-out value from 180 to ~300 seconds. That would be ~5
minutes of boot latency (worst case)...

> 
> >   drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > index fdc3ba20912e4fbc53c65825c62e868996eff56d..b53fc3f2566acf5a07cb8df649124c4a87f3e043 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > @@ -2306,6 +2306,9 @@ static struct pci_driver mlx5_core_driver = {
> >   	.sriov_configure   = mlx5_core_sriov_configure,
> >   	.sriov_get_vf_total_msix = mlx5_sriov_get_vf_total_msix,
> >   	.sriov_set_msix_vec_count = mlx5_core_sriov_set_msix_vec_count,
> > +	.driver		= {
> > +		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
> > +	}
> >   };
> >   
> >   /**
> > 
> > ---
> > base-commit: c69855ada28656fdd7e197b6e24cd40a04fe14d3
> > change-id: 20260303-parprobe_mlx5-d10d2a746d3a
> > 
> > Best regards,

Thank you,
Gerd

