public inbox for netdev@vger.kernel.org
From: Tariq Toukan <ttoukan.linux@gmail.com>
To: Gerd Bayer <gbayer@linux.ibm.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>,
	Peter Oberparleiter <oberpar@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	Alexandra Winter <wintera@linux.ibm.com>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH net-next] net/mlx5: Allow asynchronous probe
Date: Thu, 5 Mar 2026 09:56:55 +0200
Message-ID: <d6fcf417-87b1-4bbe-9ec1-cabb2b2ed1a6@gmail.com>
In-Reply-To: <20260303-parprobe_mlx5-v1-1-18194f2a1a3a@linux.ibm.com>



On 03/03/2026 12:33, Gerd Bayer wrote:
> Announce that mlx5_core supports asynchronous probing.
> 

Hi Gerd,
Interesting patch.

> Tests on s390 - where VFs can show up isolated from their PF in OS
> instances - showed symptoms of "mlx5_core: probe of 00e7:00:00.0 failed
> with error -12" when booting a system with a large number (> 250) of
> Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
> (15b3:101e) PCI functions.
> 
> Turns out that this is due to systemd-udev's time-out supervision of
> "modprobe" killing the sequential initialization of additional functions
> if probing exceeds a default of 180 seconds.
> 
> According to [1] device drivers could (slow ones should!) opt in to having
> their probe step executed asynchronously - and interleaved. With
> the mlx5_core device driver announcing PROBE_PREFER_ASYNCHRONOUS as
> proposed by this patch, we've measured 275 VFs being probed successfully
> in about 60 seconds.
> 

Nice.

> [1] https://www.kernel.org/doc/html/latest/driver-api/infrastructure.html
> 
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
> Hi all,
> 
> this patch helps to speed up boot times when there is a large number
> of Mellanox/NVidia VFs in a configuration. Although we've seen real
> issues, I'm hesitating to declare this a fix of commit 9603b61de1ee
> ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") primarily
> because the concept of asynchronous probing with commit 765230b5f084
> ("driver-core: add asynchronous probing support for drivers") was
> introduced only later.
> 
> Thanks,
> Gerd Bayer
> ---

This is an interesting problem, and the proposed solution looks 
reasonable. That said, this is a very sensitive area and there may still 
be hidden assumptions or corner cases we haven't considered. This needs 
thorough testing across a wide range of real-world scenarios and 
different system topologies before we can be confident in it.

We'll take this for testing and report back once we have results.

BTW, as you probably know, a possible workaround is to increase the 
systemd-udev timeout.
What timeout is required for it to succeed without this change?
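
For reference, a minimal sketch of that workaround, assuming the 180-second limit mentioned above is udev's event timeout. The value 600 is an arbitrary placeholder, not a measured requirement:

```
# /etc/udev/udev.conf
# Seconds udev waits for an event to finish before terminating it
# (default: 180). 600 is a placeholder value for illustration only.
event_timeout=600
```

The same setting can also be passed on the kernel command line as udev.event_timeout=<seconds>.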

>   drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index fdc3ba20912e4fbc53c65825c62e868996eff56d..b53fc3f2566acf5a07cb8df649124c4a87f3e043 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -2306,6 +2306,9 @@ static struct pci_driver mlx5_core_driver = {
>   	.sriov_configure   = mlx5_core_sriov_configure,
>   	.sriov_get_vf_total_msix = mlx5_sriov_get_vf_total_msix,
>   	.sriov_set_msix_vec_count = mlx5_core_sriov_set_msix_vec_count,
> +	.driver		= {
> +		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
> +	}
>   };
>   
>   /**
> 
> ---
> base-commit: c69855ada28656fdd7e197b6e24cd40a04fe14d3
> change-id: 20260303-parprobe_mlx5-d10d2a746d3a
> 
> Best regards,


Thread overview: 3+ messages
2026-03-03 10:33 [PATCH net-next] net/mlx5: Allow asynchronous probe Gerd Bayer
2026-03-05  7:56 ` Tariq Toukan [this message]
2026-03-05 10:03   ` Gerd Bayer
