From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: <edumazet@google.com>
Cc: <davem@davemloft.net>, <horms@kernel.org>, <kuba@kernel.org>,
<kuni1840@gmail.com>, <kuniyu@amazon.com>,
<netdev@vger.kernel.org>, <pabeni@redhat.com>,
<ychemla@nvidia.com>
Subject: Re: [PATCH v2 net 1/2] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
Date: Fri, 7 Feb 2025 15:58:47 +0900
Message-ID: <20250207065847.83672-1-kuniyu@amazon.com>
In-Reply-To: <CANn89iKdg=_uf-gis1knki-XSTbp-oHSXM0=kP-HFm2H39AWcg@mail.gmail.com>
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 7 Feb 2025 07:42:13 +0100
> On Fri, Feb 7, 2025 at 5:43 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > After the cited commit, dev_net(dev) is fetched before holding RTNL
> > and passed to __unregister_netdevice_notifier_net().
> >
> > However, dev_net(dev) might be different after holding RTNL.
> >
> > In the reported case [0], while removing a VF device, its netns was
> > being dismantled and the VF was moved to init_net.
> >
> > So the following sequence is basically illegal when dev was fetched
> > without lookup:
> >
> > net = dev_net(dev);
> > rtnl_net_lock(net);
> >
> > Let's use a new helper rtnl_net_dev_lock() to fix the race.
> >
> > It calls maybe_get_net() for dev_net_rcu(dev) and checks dev_net_rcu(dev)
> > before/after rtnl_net_lock().
> >
> > The dev_net_rcu(dev) pointer itself is valid, thanks to RCU API, but the
> > netns might be being dismantled. maybe_get_net() is to avoid the race.
> > This can be done by holding pernet_ops_rwsem, but it will be overkill.
> >
> >
> > Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> > Reported-by: Yael Chemla <ychemla@nvidia.com>
> > Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > Tested-by: Yael Chemla <ychemla@nvidia.com>
> > ---
> > v2:
> > * Use dev_net_rcu().
> > * Use msleep(1) instead of cond_resched() after maybe_get_net()
> > * Remove cond_resched() after net_eq() check
> >
> > v1: https://lore.kernel.org/netdev/20250130232435.43622-2-kuniyu@amazon.com/
> > ---
> > net/core/dev.c | 63 +++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 50 insertions(+), 13 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index b91658e8aedb..f7430c9d9bc3 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -2070,6 +2070,51 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> > __register_netdevice_notifier_net(dst_net, nb, true);
> > }
> >
> > +static bool from_cleanup_net(void)
> > +{
> > +#ifdef CONFIG_NET_NS
> > + return current == cleanup_net_task;
> > +#else
> > + return false;
> > +#endif
> > +}
> > +
> > +static void rtnl_net_dev_lock(struct net_device *dev)
> > +{
> > + struct net *net;
> > +
> > + DEBUG_NET_WARN_ON_ONCE(from_cleanup_net());
>
> I would rather make sure rtnl_net_dev_lock() _can_ be called from cleanup_net()
>
>
> > +again:
> > + /* netns might be being dismantled. */
> > + rcu_read_lock();
> > + net = maybe_get_net(dev_net_rcu(dev));
>
> I do not think maybe_get_net() is what we want here.
>
> If the netns is already in dismantle phase, the count will be zero.
Yes, that is why I placed the warning above.
Will use net->passive instead, thanks for the suggestion!
>
> Instead:
>
> net = dev_net_rcu(dev);
> refcount_inc(&net->passive);
>
>
> > + rcu_read_unlock();
>
> > + if (!net) {
> > + msleep(1);
> > + goto again;
> > + }
>
> > +
> > + rtnl_net_lock(net);
> > +
> > + /* dev might have been moved to another netns. */
> > + rcu_read_lock();
>
> As we do not dereference the net pointer, I would not acquire
> rcu_read_lock() and instead use
>
> if (!net_eq(net, rcu_access_pointer(dev->nd_net.net))) {
Exactly, will use rcu_access_pointer().
>
>
>
> > + if (!net_eq(net, dev_net_rcu(dev))) {
> > + rcu_read_unlock();
> > + rtnl_net_unlock(net);
>
> > + put_net(net);
> instead :
> net_drop_ns(net);
>
> > + goto again;
> > + }
> > + rcu_read_unlock();
> > +}
> > +
> > +static void rtnl_net_dev_unlock(struct net_device *dev)
> > +{
> > + struct net *net = dev_net(dev);
> > +
> > + rtnl_net_unlock(net);
>
> And replace the put_net() here and above with:
>
> net_drop_ns(net);
>
> > + put_net(net);
> > +}
> > +
> > int register_netdevice_notifier_dev_net(struct net_device *dev,
> > struct notifier_block *nb,
> > struct netdev_net_notifier *nn)
> > @@ -2077,6 +2122,8 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
> > struct net *net = dev_net(dev);
> > int err;
> >
>
> > + DEBUG_NET_WARN_ON_ONCE(!list_empty(&dev->dev_list));
> /* Why is this needed ? */
The following rtnl_net_lock() assumes the dev has not yet been published
by register_netdevice(). I don't think any caller invokes
register_netdevice_notifier_dev_net() after that point, so this is just a
paranoid check.
>
> > +
> > rtnl_net_lock(net);
> > err = __register_netdevice_notifier_net(net, nb, false);
> > if (!err) {
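
For readers following the review, the pattern discussed above (take a passive reference, lock, re-check the owner, retry on mismatch) can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: struct ns, struct dev, ns_hold(), ns_drop(), dev_ns_lock() and dev_ns_unlock() are hypothetical stand-ins for struct net, struct net_device, refcount_inc(&net->passive), net_drop_ns() and the rtnl_net_dev_lock()/rtnl_net_dev_unlock() pair. The sketch also assumes a namespace object is never freed, so it has no equivalent of the RCU read-side section that protects the initial pointer load in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct net: a lock plus a "passive" count
 * that pins the object's memory without keeping the namespace alive. */
struct ns {
	pthread_mutex_t lock;
	atomic_int passive;
};

/* Hypothetical stand-in for struct net_device. */
struct dev {
	_Atomic(struct ns *) ns;	/* another thread may change this */
};

static void ns_hold(struct ns *ns)
{
	atomic_fetch_add(&ns->passive, 1);
}

static void ns_drop(struct ns *ns)
{
	int old = atomic_fetch_sub(&ns->passive, 1);

	assert(old > 0);	/* dropping an unheld reference is a bug */
}

/* Lock the namespace that dev belongs to. Returns it locked and held. */
static struct ns *dev_ns_lock(struct dev *dev)
{
	struct ns *ns;

	for (;;) {
		/* Assumption: ns memory stays valid here (RCU in the kernel). */
		ns = atomic_load(&dev->ns);
		ns_hold(ns);

		pthread_mutex_lock(&ns->lock);

		/* dev might have moved while we waited for the lock. */
		if (ns == atomic_load(&dev->ns))
			return ns;

		pthread_mutex_unlock(&ns->lock);
		ns_drop(ns);
	}
}

static void dev_ns_unlock(struct ns *ns)
{
	pthread_mutex_unlock(&ns->lock);
	ns_drop(ns);
}
```

Unlike rtnl_net_dev_unlock() in the patch, which looks the netns up again via dev_net(dev), this sketch passes the locked namespace back to the caller. That is a simplification for illustration only.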