From: Stanislav Fomichev <stfomichev@gmail.com>
To: Cosmin Ratiu <cratiu@nvidia.com>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
Dragos Tatulea <dtatulea@nvidia.com>,
"sdf@fomichev.me" <sdf@fomichev.me>,
"pabeni@redhat.com" <pabeni@redhat.com>,
"jiri@resnulli.us" <jiri@resnulli.us>,
"edumazet@google.com" <edumazet@google.com>,
"kuba@kernel.org" <kuba@kernel.org>,
Saeed Mahameed <saeedm@nvidia.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [PATCH net] net: Lock lower level devices when updating features
Date: Wed, 7 May 2025 14:20:38 -0700 [thread overview]
Message-ID: <aBvOpkIoxcr9PfDg@mini-arch> (raw)
In-Reply-To: <e886c3511fbe9cf7e66b0a142183a5375a42978a.camel@nvidia.com>
On 05/07, Cosmin Ratiu wrote:
> On Wed, 2025-05-07 at 15:13 +0000, Cosmin Ratiu wrote:
> > > In any case, please hold off with picking this patch up, it seems
> > > there's a possibility of a real deadlock. Here's the scenario:
> > >
> > > ============================================
> > > WARNING: possible recursive locking detected
> > > --------------------------------------------
> > > ethtool/44150 is trying to acquire lock:
> > > ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > > __netdev_update_features+0x31e/0xe20
> > >
> > > but task is already holding lock:
> > > ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > > ethnl_set_features+0xbc/0x4b0
> > > and the lock comparison function returns 0:
> > >
> > > other info that might help us debug this:
> > > Possible unsafe locking scenario:
> > >
> > > CPU0
> > > ----
> > > lock(&dev_instance_lock_key#7);
> > > lock(&dev_instance_lock_key#7);
> > >
> > > *** DEADLOCK ***
> > >
> > > May be due to missing lock nesting notation
> > >
> > > 3 locks held by ethtool/44150:
> > > #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
> > > #1: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
> > > ethnl_set_features+0x88/0x4b0
> > > #2: ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > > ethnl_set_features+0xbc/0x4b0
> > >
> > > stack backtrace:
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl+0x69/0xa0
> > > print_deadlock_bug.cold+0xbd/0xca
> > > __lock_acquire+0x163c/0x2f00
> > > lock_acquire+0xd3/0x2e0
> > > __mutex_lock+0x98/0xf10
> > > __netdev_update_features+0x31e/0xe20
> > > netdev_update_features+0x1f/0x60
> > > vlan_device_event+0x57d/0x930 [8021q]
> > > notifier_call_chain+0x3d/0x100
> > > netdev_features_change+0x32/0x50
> > > ethnl_set_features+0x17e/0x4b0
> > > genl_family_rcv_msg_doit+0xe0/0x130
> > > genl_rcv_msg+0x188/0x290
> > > [...]
> > >
> > > I'm not sure how to solve this yet...
> > > Cosmin.
> >
> > If it's not clear, the problem is that:
> > 1. the lower device is already ops locked
> > 2. netdev_feature_change gets called.
> > 3. __netdev_update_features gets called for the vlan (upper) dev.
> > 4. It tries to acquire the same lock instance as 1 (this patch).
> > 5. Deadlock.
> >
> > One solution I can think of would be to run device notifiers for
> > changing features outside the lock, it doesn't seem like the netdev
> > lock has anything to do with what other devices might do with this
> > information.
> >
> > This can be triggered from many scenarios, I have another similar
> > stack
> > involving bonding.
> >
> > What do you think?
>
> All I could think of was to drop the lock during the
> netdev_features_changed notifier calls, like in the following hunk.
> I'm running this through regressions, let's see if it's a good idea or
> not.
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1be7cb73a602..817fd5bc21b1 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1514,7 +1514,12 @@ int dev_get_alias(const struct net_device *dev,
> char *name, size_t len)
> */
> void netdev_features_change(struct net_device *dev)
> {
> + /* Drop the lock to avoid potential deadlocks from e.g. upper
> dev
> + * notifiers altering features of 'dev' and acquiring dev lock
> again.
> + */
> + netdev_unlock_ops(dev);
> call_netdevice_notifiers(NETDEV_FEAT_CHANGE, dev);
> + netdev_lock_ops(dev);
> }
> EXPORT_SYMBOL(netdev_features_change);
>
Hmm, are you sure you're calling __netdev_update_features on the upper?
I don't see how the lower would be locked in that case. From my POW,
this is what happens:
1. your dev (lower) has a vlan on it (upper)
2. you call lro=off on the _lower_
3. this triggers FEAT_CHANGE notifier and vlan_device_event catches it
4. since the lower has a vlan device (dev->vlan_info != NULL), it goes
over every other vlan in the group and triggers netdev_update_features
for the upper (netdev_update_features vlandev)
5. the upper tries to sync the features into the lower (including the
one that triggered FEAT_CHANGE) and that's where the deadlock happens
But I think (5) should be largely a no-op for the device triggering the
notification, because the features have been already applied in ethnl_set_features.
I'd move the lock into netdev_sync_lower_features, and only after checking
the features (and making sure that we are going to change them). The feature
check might be racy, but I think it should still work?
Can you also share the bonding stacktrace as well to confirm it's the
same issue?
diff --git a/net/core/dev.c b/net/core/dev.c
index bcb266ab2912..b5fc8a740e8b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10312,6 +10312,7 @@ static void netdev_sync_lower_features(struct net_device *upper,
for_each_netdev_feature(upper_disables, feature_bit) {
feature = __NETIF_F_BIT(feature_bit);
if (!(features & feature) && (lower->features & feature)) {
+ netdev_lock_ops(lower);
netdev_dbg(upper, "Disabling feature %pNF on lower dev %s.\n",
&feature, lower->name);
lower->wanted_features &= ~feature;
@@ -10322,6 +10323,7 @@ static void netdev_sync_lower_features(struct net_device *upper,
&feature, lower->name);
else
netdev_features_change(lower);
+ netdev_unlock_ops(lower);
}
}
}
next prev parent reply other threads:[~2025-05-07 21:20 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-06 14:21 [PATCH net] net: Lock lower level devices when updating features Cosmin Ratiu
2025-05-06 17:12 ` Stanislav Fomichev
2025-05-06 17:40 ` Stanislav Fomichev
2025-05-06 17:47 ` Cosmin Ratiu
2025-05-06 18:13 ` Stanislav Fomichev
2025-05-07 14:35 ` Cosmin Ratiu
2025-05-07 15:13 ` Cosmin Ratiu
2025-05-07 20:29 ` Cosmin Ratiu
2025-05-07 21:20 ` Stanislav Fomichev [this message]
2025-05-08 10:33 ` Cosmin Ratiu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBvOpkIoxcr9PfDg@mini-arch \
--to=stfomichev@gmail.com \
--cc=cratiu@nvidia.com \
--cc=davem@davemloft.net \
--cc=dtatulea@nvidia.com \
--cc=edumazet@google.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=sdf@fomichev.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.