From: Jakub Kicinski <kuba@kernel.org>
To: faicker.mo@gmail.com
Cc: Jakub Kicinski <kuba@kernel.org>,
horms@kernel.org, sridhar.samudrala@intel.com,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
pabeni@redhat.com, sdf@fomichev.me, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] net: net_failover: Fix the deadlock in slave register
Date: Tue, 28 Apr 2026 18:19:54 -0700 [thread overview]
Message-ID: <20260429011954.1519021-1-kuba@kernel.org> (raw)
In-Reply-To: <20260427092923.93901-1-faicker.mo@gmail.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: net_failover: Fix the deadlock in slave register
This patch replaces dev_set_mtu(), dev_open() and dev_close() in
net_failover_slave_register() with their non-locking netif_* counterparts
to avoid a deadlock when the slave is registered from the NETDEV_REGISTER
notifier, where register_netdevice() already holds the per-device ops
lock.
> diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
> index d0361aaf25ef..3f7d31033bae 100644
> --- a/drivers/net/net_failover.c
> +++ b/drivers/net/net_failover.c
> @@ -502,7 +502,7 @@ static int net_failover_slave_register(struct net_device *slave_dev,
>
> /* Align MTU of slave with failover dev */
> orig_mtu = slave_dev->mtu;
> - err = dev_set_mtu(slave_dev, failover_dev->mtu);
> + err = netif_set_mtu(slave_dev, failover_dev->mtu);
Is net_failover_slave_register() only reachable from the NETDEV_REGISTER
notifier path?
Looking at net/core/failover.c, the fops->slave_register callback is
invoked from two places. One is failover_event() on NETDEV_REGISTER,
which matches the call trace in the commit message and where
register_netdevice() has already taken the ops lock via
netdev_lock_ops().
The other is failover_register() -> failover_existing_slave_register(),
which walks existing netdevs under rtnl_lock only:
net/core/failover.c:failover_existing_slave_register() {
rtnl_lock();
for_each_netdev(net, dev) {
if (netif_is_failover(dev))
continue;
if (ether_addr_equal(failover_dev->perm_addr, dev->perm_addr))
failover_slave_register(dev);
}
rtnl_unlock();
}
On that path, the slave's dev->lock is not held by the caller.
netif_set_mtu() -> netif_set_mtu_ext() calls netdev_ops_assert_locked(dev)
for devices with request_ops_lock (for example mlx5e, the driver in the
commit's own call trace).
> @@ -512,11 +512,11 @@ static int net_failover_slave_register(struct net_device *slave_dev,
> dev_hold(slave_dev);
>
> if (netif_running(failover_dev)) {
> - err = dev_open(slave_dev, NULL);
> + err = netif_open(slave_dev, NULL);
Same question for this call. netif_open() -> __dev_open() also does
netdev_ops_assert_locked(dev), and on the failover_existing_slave_register()
path the slave's ops lock is not held.
Before this change, dev_set_mtu()/dev_open()/dev_close() acquired the ops
lock internally via netdev_lock_ops(), so the pre-existing-slave path was
covered. Does switching unconditionally to the non-locking variants
regress that path for slaves whose drivers use the per-device ops lock,
both as a lockdep splat when CONFIG_PROVE_LOCKING is set and as a loss of
ops-lock serialization against concurrent ethtool, queue and netdev-genl
operations on the slave?
> @@ -562,10 +562,10 @@ static int net_failover_slave_register(struct net_device *slave_dev,
> err_vlan_add:
> dev_uc_unsync(slave_dev, failover_dev);
> dev_mc_unsync(slave_dev, failover_dev);
> - dev_close(slave_dev);
> -err_dev_open:
> + netif_close(slave_dev);
> +err_netif_open:
> dev_put(slave_dev);
> - dev_set_mtu(slave_dev, orig_mtu);
> + netif_set_mtu(slave_dev, orig_mtu);
Would it be more appropriate to acquire netdev_lock_ops(slave_dev) in
net_failover_slave_register() when the caller is not the NETDEV_REGISTER
notifier (where the ops lock is already held), instead of dropping the
locking unconditionally?
The commit message explains the NETDEV_REGISTER caller but does not
mention the failover_existing_slave_register() caller; could that
rationale be extended, or the other caller handled explicitly?
--
pw-bot: cr
next prev parent reply other threads:[~2026-04-29 1:23 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-27 9:29 [PATCH v2] net: net_failover: Fix the deadlock in slave register faicker.mo
2026-04-29 1:19 ` Jakub Kicinski [this message]
2026-05-02 2:46 ` faicker mo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260429011954.1519021-1-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=faicker.mo@gmail.com \
--cc=horms@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=sridhar.samudrala@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox