* [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration.
@ 2025-01-04 6:37 Kuniyuki Iwashima
2025-01-04 6:37 ` [PATCH v1 net-next 1/4] net: Convert netdev_chain to blocking_notifier Kuniyuki Iwashima
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-04 6:37 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
Patch 1 converts the global netdev notifier to blocking_notifier,
which will be called under per-netns RTNL without RTNL, then we
need to protect the ongoing netdev_chain users from unregistration.
Patch 2 ~ 4 adds per-netns RTNL for registration of the global
and per-netns netdev notifiers.
Kuniyuki Iwashima (4):
net: Convert netdev_chain to blocking_notifier.
net: Hold __rtnl_net_lock() in (un)?register_netdevice_notifier().
net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_net().
net: Hold rtnl_net_lock() in
(un)?register_netdevice_notifier_dev_net().
net/core/dev.c | 48 +++++++++++++++++++++++++++++++-----------------
1 file changed, 31 insertions(+), 17 deletions(-)
--
2.39.5 (Apple Git-154)
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v1 net-next 1/4] net: Convert netdev_chain to blocking_notifier.
2025-01-04 6:37 [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Kuniyuki Iwashima
@ 2025-01-04 6:37 ` Kuniyuki Iwashima
2025-01-08 2:28 ` kernel test robot
2025-01-04 6:37 ` [PATCH v1 net-next 2/4] net: Hold __rtnl_net_lock() in (un)?register_netdevice_notifier() Kuniyuki Iwashima
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-04 6:37 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
Once RTNL is converted to per-netns, nothing protects
netdev_chain.
A netdev notifier might disappear while being used by
raw_notifier_call_chain().
Let's convert netdev_chain to blocking_notifier to add
its dedicated rwsem.
Given netdev_chain is touched under RTNL by only one user
at the same time, adding rwsem is unlikely to cause regression.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/core/dev.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index e7223972b9aa..404f5bda821b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -444,7 +444,7 @@ static void unlist_netdevice(struct net_device *dev)
* Our notifier list
*/
-static RAW_NOTIFIER_HEAD(netdev_chain);
+static BLOCKING_NOTIFIER_HEAD(netdev_chain);
/*
* Device drivers call our routines to queue packets here. We empty the
@@ -1770,7 +1770,7 @@ int register_netdevice_notifier(struct notifier_block *nb)
/* Close race with setup_net() and cleanup_net() */
down_write(&pernet_ops_rwsem);
rtnl_lock();
- err = raw_notifier_chain_register(&netdev_chain, nb);
+ err = blocking_notifier_chain_register(&netdev_chain, nb);
if (err)
goto unlock;
if (dev_boot_phase)
@@ -1790,7 +1790,7 @@ int register_netdevice_notifier(struct notifier_block *nb)
for_each_net_continue_reverse(net)
call_netdevice_unregister_net_notifiers(nb, net);
- raw_notifier_chain_unregister(&netdev_chain, nb);
+ blocking_notifier_chain_unregister(&netdev_chain, nb);
goto unlock;
}
EXPORT_SYMBOL(register_netdevice_notifier);
@@ -1817,7 +1817,7 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
/* Close race with setup_net() and cleanup_net() */
down_write(&pernet_ops_rwsem);
rtnl_lock();
- err = raw_notifier_chain_unregister(&netdev_chain, nb);
+ err = blocking_notifier_chain_unregister(&netdev_chain, nb);
if (err)
goto unlock;
@@ -1993,7 +1993,7 @@ int call_netdevice_notifiers_info(unsigned long val,
ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
if (ret & NOTIFY_STOP_MASK)
return ret;
- return raw_notifier_call_chain(&netdev_chain, val, info);
+ return blocking_notifier_call_chain(&netdev_chain, val, info);
}
/**
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v1 net-next 2/4] net: Hold __rtnl_net_lock() in (un)?register_netdevice_notifier().
2025-01-04 6:37 [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Kuniyuki Iwashima
2025-01-04 6:37 ` [PATCH v1 net-next 1/4] net: Convert netdev_chain to blocking_notifier Kuniyuki Iwashima
@ 2025-01-04 6:37 ` Kuniyuki Iwashima
2025-01-04 6:37 ` [PATCH v1 net-next 3/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_net() Kuniyuki Iwashima
` (2 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-04 6:37 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
(un)?register_netdevice_notifier() hold pernet_ops_rwsem and RTNL,
iterate all netns, and trigger the notifier for all netdev.
Let's hold __rtnl_net_lock() before triggering the notifier.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/core/dev.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 404f5bda821b..7d49b4018ea2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1776,7 +1776,9 @@ int register_netdevice_notifier(struct notifier_block *nb)
if (dev_boot_phase)
goto unlock;
for_each_net(net) {
+ __rtnl_net_lock(net);
err = call_netdevice_register_net_notifiers(nb, net);
+ __rtnl_net_unlock(net);
if (err)
goto rollback;
}
@@ -1787,8 +1789,11 @@ int register_netdevice_notifier(struct notifier_block *nb)
return err;
rollback:
- for_each_net_continue_reverse(net)
+ for_each_net_continue_reverse(net) {
+ __rtnl_net_lock(net);
call_netdevice_unregister_net_notifiers(nb, net);
+ __rtnl_net_unlock(net);
+ }
blocking_notifier_chain_unregister(&netdev_chain, nb);
goto unlock;
@@ -1821,8 +1826,11 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
if (err)
goto unlock;
- for_each_net(net)
+ for_each_net(net) {
+ __rtnl_net_lock(net);
call_netdevice_unregister_net_notifiers(nb, net);
+ __rtnl_net_unlock(net);
+ }
unlock:
rtnl_unlock();
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v1 net-next 3/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_net().
2025-01-04 6:37 [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Kuniyuki Iwashima
2025-01-04 6:37 ` [PATCH v1 net-next 1/4] net: Convert netdev_chain to blocking_notifier Kuniyuki Iwashima
2025-01-04 6:37 ` [PATCH v1 net-next 2/4] net: Hold __rtnl_net_lock() in (un)?register_netdevice_notifier() Kuniyuki Iwashima
@ 2025-01-04 6:37 ` Kuniyuki Iwashima
2025-01-04 6:37 ` [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net() Kuniyuki Iwashima
2025-01-04 15:37 ` [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Jakub Kicinski
4 siblings, 0 replies; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-04 6:37 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
(un)?register_netdevice_notifier_net() hold RTNL before triggering the
notifier for all netdev in the netns.
Let's convert the RTNL to rtnl_net_lock().
Note that the per-netns netdev notifier is protected by per-netns RTNL,
so we do not need to convert it to blocking_notifier.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/core/dev.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 7d49b4018ea2..f6c6559e2548 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1894,9 +1894,10 @@ int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb)
{
int err;
- rtnl_lock();
+ rtnl_net_lock(net);
err = __register_netdevice_notifier_net(net, nb, false);
- rtnl_unlock();
+ rtnl_net_unlock(net);
+
return err;
}
EXPORT_SYMBOL(register_netdevice_notifier_net);
@@ -1922,9 +1923,10 @@ int unregister_netdevice_notifier_net(struct net *net,
{
int err;
- rtnl_lock();
+ rtnl_net_lock(net);
err = __unregister_netdevice_notifier_net(net, nb);
- rtnl_unlock();
+ rtnl_net_unlock(net);
+
return err;
}
EXPORT_SYMBOL(unregister_netdevice_notifier_net);
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
2025-01-04 6:37 [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Kuniyuki Iwashima
` (2 preceding siblings ...)
2025-01-04 6:37 ` [PATCH v1 net-next 3/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_net() Kuniyuki Iwashima
@ 2025-01-04 6:37 ` Kuniyuki Iwashima
2025-01-15 22:16 ` Yael Chemla
2025-01-04 15:37 ` [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Jakub Kicinski
4 siblings, 1 reply; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-04 6:37 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
(un)?register_netdevice_notifier_dev_net() hold RTNL before triggering
the notifier for all netdev in the netns.
Let's convert the RTNL to rtnl_net_lock().
Note that move_netdevice_notifiers_dev_net() is assumed to be (but not
yet) protected by per-netns RTNL of both src and dst netns; we need to
convert wireless and hyperv drivers that call dev_change_net_namespace().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/core/dev.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index f6c6559e2548..a0dd34463901 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1943,15 +1943,17 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
{
+ struct net *net = dev_net(dev);
int err;
- rtnl_lock();
- err = __register_netdevice_notifier_net(dev_net(dev), nb, false);
+ rtnl_net_lock(net);
+ err = __register_netdevice_notifier_net(net, nb, false);
if (!err) {
nn->nb = nb;
list_add(&nn->list, &dev->net_notifier_list);
}
- rtnl_unlock();
+ rtnl_net_unlock(net);
+
return err;
}
EXPORT_SYMBOL(register_netdevice_notifier_dev_net);
@@ -1960,12 +1962,14 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
{
+ struct net *net = dev_net(dev);
int err;
- rtnl_lock();
+ rtnl_net_lock(net);
list_del(&nn->list);
- err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
- rtnl_unlock();
+ err = __unregister_netdevice_notifier_net(net, nb);
+ rtnl_net_unlock(net);
+
return err;
}
EXPORT_SYMBOL(unregister_netdevice_notifier_dev_net);
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration.
2025-01-04 6:37 [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Kuniyuki Iwashima
` (3 preceding siblings ...)
2025-01-04 6:37 ` [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net() Kuniyuki Iwashima
@ 2025-01-04 15:37 ` Jakub Kicinski
2025-01-05 7:59 ` Kuniyuki Iwashima
4 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-01-04 15:37 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Kuniyuki Iwashima, netdev
On Sat, 4 Jan 2025 15:37:31 +0900 Kuniyuki Iwashima wrote:
> Patch 1 converts the global netdev notifier to blocking_notifier,
> which will be called under per-netns RTNL without RTNL, then we
> need to protect the ongoing netdev_chain users from unregistration.
>
> Patch 2 ~ 4 adds per-netns RTNL for registration of the global
> and per-netns netdev notifiers.
Lockdep is not happy:
[ 249.261403][ T11] ============================================
[ 249.261592][ T11] WARNING: possible recursive locking detected
[ 249.261769][ T11] 6.13.0-rc5-virtme #1 Not tainted
[ 249.261920][ T11] --------------------------------------------
[ 249.262094][ T11] kworker/u16:0/11 is trying to acquire lock:
[ 249.262293][ T11] ffffffff8a7f6a70 ((netdev_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x50/0x90
[ 249.262591][ T11]
[ 249.262591][ T11] but task is already holding lock:
[ 249.262810][ T11] ffffffff8a7f6a70 ((netdev_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x50/0x90
[ 249.263100][ T11]
[ 249.263100][ T11] other info that might help us debug this:
[ 249.263310][ T11] Possible unsafe locking scenario:
[ 249.263310][ T11]
[ 249.263522][ T11] CPU0
[ 249.263624][ T11] ----
[ 249.263728][ T11] lock((netdev_chain).rwsem);
[ 249.263875][ T11] lock((netdev_chain).rwsem);
[ 249.264020][ T11]
[ 249.264020][ T11] *** DEADLOCK ***
[ 249.264020][ T11]
[ 249.264223][ T11] May be due to missing lock nesting notation
[ 249.264223][ T11]
[ 249.264440][ T11] 5 locks held by kworker/u16:0/11:
[ 249.264582][ T11] #0: ffff8880010b5948 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work+0x7ec/0x16d0
[ 249.264867][ T11] #1: ffffc900000b7da0 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work+0xe0b/0x16d0
[ 249.265118][ T11] #2: ffffffff8a7ec4d0 (pernet_ops_rwsem){++++}-{4:4}, at: cleanup_net+0xbc/0xba0
[ 249.265381][ T11] #3: ffffffff8a807e88 (rtnl_mutex){+.+.}-{4:4}, at: default_device_exit_batch+0x81/0x2e0
[ 249.265668][ T11] #4: ffffffff8a7f6a70 ((netdev_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x50/0x90
[ 249.265954][ T11]
[ 249.265954][ T11] stack backtrace:
[ 249.266126][ T11] CPU: 2 UID: 0 PID: 11 Comm: kworker/u16:0 Not tainted 6.13.0-rc5-virtme #1
[ 249.266389][ T11] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 249.266572][ T11] Workqueue: netns cleanup_net
[ 249.266722][ T11] Call Trace:
[ 249.266826][ T11] <TASK>
[ 249.266907][ T11] dump_stack_lvl+0x82/0xd0
[ 249.267056][ T11] print_deadlock_bug+0x40a/0x650
[ 249.267206][ T11] validate_chain+0x5bf/0xae0
[ 249.267352][ T11] ? __pfx_validate_chain+0x10/0x10
[ 249.267503][ T11] ? hlock_class+0x4e/0x130
[ 249.267642][ T11] ? mark_lock+0x38/0x3e0
[ 249.267751][ T11] __lock_acquire+0xb9a/0x1680
[ 249.267897][ T11] ? spin_bug+0x191/0x1d0
[ 249.268007][ T11] ? debug_object_assert_init+0x2a9/0x370
[ 249.268164][ T11] lock_acquire.part.0+0xeb/0x330
[ 249.268313][ T11] ? blocking_notifier_call_chain+0x50/0x90
[ 249.268497][ T11] ? __pfx_lock_acquire.part.0+0x10/0x10
[ 249.268651][ T11] ? trace_lock_acquire+0x14c/0x1f0
[ 249.268803][ T11] ? lock_acquire+0x32/0xc0
[ 249.268944][ T11] ? blocking_notifier_call_chain+0x50/0x90
[ 249.269132][ T11] down_read+0x9f/0x340
[ 249.269247][ T11] ? blocking_notifier_call_chain+0x50/0x90
[ 249.269436][ T11] ? __pfx_down_read+0x10/0x10
[ 249.269586][ T11] blocking_notifier_call_chain+0x50/0x90
[ 249.269739][ T11] __dev_close_many+0xdf/0x2d0
[ 249.269881][ T11] ? __pfx___dev_close_many+0x10/0x10
[ 249.270031][ T11] dev_close_many+0x202/0x650
--
pw-bot: cr
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration.
2025-01-04 15:37 ` [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Jakub Kicinski
@ 2025-01-05 7:59 ` Kuniyuki Iwashima
0 siblings, 0 replies; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-05 7:59 UTC (permalink / raw)
To: kuba; +Cc: davem, edumazet, horms, kuni1840, kuniyu, netdev, pabeni
From: Jakub Kicinski <kuba@kernel.org>
Date: Sat, 4 Jan 2025 07:37:40 -0800
> On Sat, 4 Jan 2025 15:37:31 +0900 Kuniyuki Iwashima wrote:
> > Patch 1 converts the global netdev notifier to blocking_notifier,
> > which will be called under per-netns RTNL without RTNL, then we
> > need to protect the ongoing netdev_chain users from unregistration.
> >
> > Patch 2 ~ 4 adds per-netns RTNL for registration of the global
> > and per-netns netdev notifiers.
>
> Lockdep is not happy:
>
> [ 249.261403][ T11] ============================================
> [ 249.261592][ T11] WARNING: possible recursive locking detected
> [ 249.261769][ T11] 6.13.0-rc5-virtme #1 Not tainted
> [ 249.261920][ T11] --------------------------------------------
> [ 249.262094][ T11] kworker/u16:0/11 is trying to acquire lock:
> [ 249.262293][ T11] ffffffff8a7f6a70 ((netdev_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x50/0x90
> [ 249.262591][ T11]
> [ 249.262591][ T11] but task is already holding lock:
> [ 249.262810][ T11] ffffffff8a7f6a70 ((netdev_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x50/0x90
> [ 249.263100][ T11]
> [ 249.263100][ T11] other info that might help us debug this:
> [ 249.263310][ T11] Possible unsafe locking scenario:
> [ 249.263310][ T11]
> [ 249.263522][ T11] CPU0
> [ 249.263624][ T11] ----
> [ 249.263728][ T11] lock((netdev_chain).rwsem);
> [ 249.263875][ T11] lock((netdev_chain).rwsem);
Ah, lockdep annotaion was needed for the nested notifier calls.
But this will not be a meaningful annotation and needs to be changed
once rtnl_setlink/dellink supports per-netns RTNL.
I'll drop patch 1 and just leave a comment around RTNL in
register_netdevice_notifier() in patch 2.
Another option would be clone each netdev notifier during registration
and unshare(2)/clone(2) and force notifiers to be namespacified ?
---8<---
diff --git a/net/core/dev.c b/net/core/dev.c
index a0dd34463901..8bf8d565f42d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -446,6 +446,17 @@ static void unlist_netdevice(struct net_device *dev)
static BLOCKING_NOTIFIER_HEAD(netdev_chain);
+#ifdef CONFIG_PROVE_LOCKING
+static int netdev_chain_lock_cmp_fn(const struct lockdep_map *a,
+ const struct lockdep_map *b)
+{
+ if (rtnl_is_locked())
+ return -1;
+
+ return 1;
+}
+#endif
+
/*
* Device drivers call our routines to queue packets here. We empty the
* queue in the local softnet handler.
@@ -12229,6 +12240,8 @@ static int __init net_dev_init(void)
net_dev_struct_check();
+ lock_set_cmp_fn(&netdev_chain.rwsem, netdev_chain_lock_cmp_fn, NULL);
+
if (dev_proc_init())
goto out;
---8<---
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 1/4] net: Convert netdev_chain to blocking_notifier.
2025-01-04 6:37 ` [PATCH v1 net-next 1/4] net: Convert netdev_chain to blocking_notifier Kuniyuki Iwashima
@ 2025-01-08 2:28 ` kernel test robot
0 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2025-01-08 2:28 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: oe-lkp, lkp, netdev, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
Kuniyuki Iwashima, oliver.sang
Hello,
kernel test robot noticed "WARNING:possible_recursive_locking_detected" on:
commit: a105e36eac9fc55523ee0157ab2aeaf0ebe949ae ("[PATCH v1 net-next 1/4] net: Convert netdev_chain to blocking_notifier.")
url: https://github.com/intel-lab-lkp/linux/commits/Kuniyuki-Iwashima/net-Convert-netdev_chain-to-blocking_notifier/20250104-144302
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git 385f186aba3d2f7122b71d6d4c7e236b9d4e8003
patch link: https://lore.kernel.org/all/20250104063735.36945-2-kuniyu@amazon.com/
patch subject: [PATCH v1 net-next 1/4] net: Convert netdev_chain to blocking_notifier.
in testcase: boot
config: i386-randconfig-011-20250106
compiler: clang-19
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202501081030.464a8d0a-lkp@intel.com
[ 5.611439][ T1] WARNING: possible recursive locking detected
[ 5.612070][ T1] 6.13.0-rc5-00768-ga105e36eac9f #1 Tainted: G T
[ 5.612838][ T1] --------------------------------------------
[ 5.613448][ T1] swapper/1 is trying to acquire lock:
[ 5.613981][ T1] 8bb1f2c0 ((netdev_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain (kernel/notifier.c:?)
[ 5.614973][ T1]
[ 5.614973][ T1] but task is already holding lock:
[ 5.615681][ T1] 8bb1f2c0 ((netdev_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain (kernel/notifier.c:?)
[ 5.616710][ T1]
[ 5.616710][ T1] other info that might help us debug this:
[ 5.617477][ T1] Possible unsafe locking scenario:
[ 5.617477][ T1]
[ 5.618212][ T1] CPU0
[ 5.618531][ T1] ----
[ 5.618867][ T1] lock((netdev_chain).rwsem);
[ 5.619378][ T1] lock((netdev_chain).rwsem);
[ 5.619874][ T1]
[ 5.619874][ T1] *** DEADLOCK ***
[ 5.619874][ T1]
[ 5.620686][ T1] May be due to missing lock nesting notation
[ 5.620686][ T1]
[ 5.621503][ T1] 2 locks held by swapper/1:
[ 5.621956][ T1] #0: 8bb1fd24 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock (net/core/rtnetlink.c:80)
[ 5.622737][ T1] #1: 8bb1f2c0 ((netdev_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain (kernel/notifier.c:?)
[ 5.623745][ T1]
[ 5.623745][ T1] stack backtrace:
[ 5.624307][ T1] CPU: 0 UID: 0 PID: 1 Comm: swapper Tainted: G T 6.13.0-rc5-00768-ga105e36eac9f #1
[ 5.625360][ T1] Tainted: [T]=RANDSTRUCT
[ 5.625789][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 5.626829][ T1] Call Trace:
[ 5.627162][ T1] dump_stack_lvl (lib/dump_stack.c:122)
[ 5.627595][ T1] ? vprintk (kernel/printk/printk_safe.c:?)
[ 5.628004][ T1] dump_stack (lib/dump_stack.c:129)
[ 5.628427][ T1] print_deadlock_bug (kernel/locking/lockdep.c:3039)
[ 5.628931][ T1] __lock_acquire (kernel/locking/lockdep.c:3089 kernel/locking/lockdep.c:3891 kernel/locking/lockdep.c:5226)
[ 5.629437][ T1] lock_acquire (kernel/locking/lockdep.c:5849)
[ 5.629883][ T1] ? blocking_notifier_call_chain (kernel/notifier.c:?)
[ 5.630484][ T1] down_read (kernel/locking/rwsem.c:1524)
[ 5.630907][ T1] ? blocking_notifier_call_chain (kernel/notifier.c:?)
[ 5.631522][ T1] ? notifier_call_chain (kernel/notifier.c:?)
[ 5.632054][ T1] blocking_notifier_call_chain (kernel/notifier.c:?)
[ 5.632631][ T1] call_netdevice_notifiers_info (net/core/dev.c:1997)
[ 5.633224][ T1] register_netdevice (include/linux/notifier.h:207 net/core/dev.c:10597)
[ 5.633717][ T1] bpq_device_event (drivers/net/hamradio/bpqether.c:500 drivers/net/hamradio/bpqether.c:542)
[ 5.634190][ T1] notifier_call_chain (kernel/notifier.c:?)
[ 5.634710][ T1] blocking_notifier_call_chain (kernel/notifier.c:380)
[ 5.635284][ T1] call_netdevice_notifiers_info (net/core/dev.c:1997)
[ 5.635875][ T1] __dev_notify_flags (net/core/dev.c:8995)
[ 5.636372][ T1] dev_change_flags (net/core/dev.c:?)
[ 5.636833][ T1] ic_open_devs (net/ipv4/ipconfig.c:242)
[ 5.637265][ T1] ? wait_for_devices (net/ipv4/ipconfig.c:?)
[ 5.637736][ T1] ip_auto_config (net/ipv4/ipconfig.c:1515)
[ 5.638192][ T1] do_one_initcall (init/main.c:1266)
[ 5.638656][ T1] ? root_nfs_parse_addr (net/ipv4/ipconfig.c:1477)
[ 5.639169][ T1] ? __lock_acquire (kernel/locking/lockdep.c:4670 kernel/locking/lockdep.c:5180)
[ 5.639684][ T1] ? ktime_get (include/linux/seqlock.h:368 (discriminator 3) include/linux/seqlock.h:388 (discriminator 3) kernel/time/timekeeping.c:815 (discriminator 3))
[ 5.640142][ T1] ? irqentry_exit (kernel/entry/common.c:?)
[ 5.640619][ T1] ? check_preemption_disabled (lib/smp_processor_id.c:16 (discriminator 1))
[ 5.641205][ T1] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67)
[ 5.641755][ T1] ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4468 (discriminator 5))
[ 5.642293][ T1] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:80)
[ 5.642800][ T1] ? irqentry_exit (kernel/entry/common.c:?)
[ 5.643271][ T1] ? vmware_sched_clock (arch/x86/kernel/apic/apic.c:1049)
[ 5.643776][ T1] ? sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 (discriminator 6))
[ 5.644336][ T1] ? handle_exception (init_task.c:?)
[ 5.644843][ T1] ? next_arg (lib/cmdline.c:273)
[ 5.645293][ T1] do_initcall_level (init/main.c:1327 (discriminator 6))
[ 5.645781][ T1] do_initcalls (init/main.c:1341 (discriminator 2))
[ 5.646223][ T1] do_basic_setup (init/main.c:1364)
[ 5.646697][ T1] kernel_init_freeable (init/main.c:1579)
[ 5.647216][ T1] ? rest_init (init/main.c:1458)
[ 5.647671][ T1] ? rest_init (init/main.c:1458)
[ 5.648135][ T1] kernel_init (init/main.c:1468)
[ 5.648604][ T1] ret_from_fork (arch/x86/kernel/process.c:153)
[ 5.649060][ T1] ret_from_fork_asm (??:?)
[ 5.649533][ T1] entry_INT80_32 (init_task.c:?)
[ 7.669903][ T35] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 7.682723][ T1] Sending DHCP requests ., OK
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250108/202501081030.464a8d0a-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
2025-01-04 6:37 ` [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net() Kuniyuki Iwashima
@ 2025-01-15 22:16 ` Yael Chemla
2025-01-16 2:54 ` Kuniyuki Iwashima
0 siblings, 1 reply; 16+ messages in thread
From: Yael Chemla @ 2025-01-15 22:16 UTC (permalink / raw)
To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: Kuniyuki Iwashima, netdev
we observed in our regression tests the following issue:
BUG: KASAN: slab-use-after-free in notifier_call_chain+0x22c/0x280
kasan_report+0xbd/0xf0
RIP: 0033:0x7f70839018b7
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
__kasan_kmalloc+0x83/0x90
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
kasan_save_free_info+0x37/0x50
__kasan_slab_free+0x33/0x40
page dumped because: kasan: bad access detected
BUG: KASAN: slab-use-after-free in notifier_call_chain+0x222/0x280
kasan_report+0xbd/0xf0
RIP: 0033:0x7f70839018b7
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
__kasan_kmalloc+0x83/0x90
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
kasan_save_free_info+0x37/0x50
__kasan_slab_free+0x33/0x40
page dumped because: kasan: bad access detected
and there are many more of that kind.
it happens after applying commit 7fb1073300a2 ("net: Hold
rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net()")
test scenario includes configuration and traffic over two namespaces
associated with two different VFs.
On 04/01/2025 8:37, Kuniyuki Iwashima wrote:
> (un)?register_netdevice_notifier_dev_net() hold RTNL before triggering
> the notifier for all netdev in the netns.
>
> Let's convert the RTNL to rtnl_net_lock().
>
> Note that move_netdevice_notifiers_dev_net() is assumed to be (but not
> yet) protected by per-netns RTNL of both src and dst netns; we need to
> convert wireless and hyperv drivers that call dev_change_net_namespace().
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
> net/core/dev.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index f6c6559e2548..a0dd34463901 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1943,15 +1943,17 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
> struct notifier_block *nb,
> struct netdev_net_notifier *nn)
> {
> + struct net *net = dev_net(dev);
it seems to happen since the net pointer is acquired here without a lock.
Note that KASAN issue is not triggered when executing with rtnl_lock()
taken before this line. and our kernel .config expands
rtnl_net_lock(net) to rtnl_lock() (CONFIG_DEBUG_NET_SMALL_RTNL is not set).
> int err;
>
> - rtnl_lock();
> - err = __register_netdevice_notifier_net(dev_net(dev), nb, false);
> + rtnl_net_lock(net);
> + err = __register_netdevice_notifier_net(net, nb, false);
> if (!err) {
> nn->nb = nb;
> list_add(&nn->list, &dev->net_notifier_list);
> }
> - rtnl_unlock();
> + rtnl_net_unlock(net);
> +
> return err;
> }
> EXPORT_SYMBOL(register_netdevice_notifier_dev_net);
> @@ -1960,12 +1962,14 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
> struct notifier_block *nb,
> struct netdev_net_notifier *nn)
> {
> + struct net *net = dev_net(dev);
> int err;
>
> - rtnl_lock();
> + rtnl_net_lock(net);
> list_del(&nn->list);
> - err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
> - rtnl_unlock();
> + err = __unregister_netdevice_notifier_net(net, nb);
> + rtnl_net_unlock(net);
> +
> return err;
> }
> EXPORT_SYMBOL(unregister_netdevice_notifier_dev_net);
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
2025-01-15 22:16 ` Yael Chemla
@ 2025-01-16 2:54 ` Kuniyuki Iwashima
2025-01-20 18:55 ` Yael Chemla
0 siblings, 1 reply; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-16 2:54 UTC (permalink / raw)
To: ychemla; +Cc: davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev, pabeni
Thanks for the report!
From: Yael Chemla <ychemla@nvidia.com>
Date: Thu, 16 Jan 2025 00:16:27 +0200
> we observed in our regression tests the following issue:
>
> BUG: KASAN: slab-use-after-free in notifier_call_chain+0x22c/0x280
> kasan_report+0xbd/0xf0
> RIP: 0033:0x7f70839018b7
> kasan_save_stack+0x1c/0x40
> kasan_save_track+0x10/0x30
> __kasan_kmalloc+0x83/0x90
> kasan_save_stack+0x1c/0x40
> kasan_save_track+0x10/0x30
> kasan_save_free_info+0x37/0x50
> __kasan_slab_free+0x33/0x40
> page dumped because: kasan: bad access detected
> BUG: KASAN: slab-use-after-free in notifier_call_chain+0x222/0x280
> kasan_report+0xbd/0xf0
> RIP: 0033:0x7f70839018b7
> kasan_save_stack+0x1c/0x40
> kasan_save_track+0x10/0x30
> __kasan_kmalloc+0x83/0x90
> kasan_save_stack+0x1c/0x40
> kasan_save_track+0x10/0x30
> kasan_save_free_info+0x37/0x50
> __kasan_slab_free+0x33/0x40
> page dumped because: kasan: bad access detected
>
> and there are many more of that kind.
Do you have any other stack traces with more callers info ?
Also can you decode the trace with ./scripts/decode_stacktrace.sh ?
>
> it happens after applying commit 7fb1073300a2 ("net: Hold
> rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net()")
>
> test scenario includes configuration and traffic over two namespaces
> associated with two different VFs.
Could you elaborate more about the test scenario, especially
how each device/netns is dismantled after the test case ?
I guess the VF is moved to init_net ?
>
>
> On 04/01/2025 8:37, Kuniyuki Iwashima wrote:
> > (un)?register_netdevice_notifier_dev_net() hold RTNL before triggering
> > the notifier for all netdev in the netns.
> >
> > Let's convert the RTNL to rtnl_net_lock().
> >
> > Note that move_netdevice_notifiers_dev_net() is assumed to be (but not
> > yet) protected by per-netns RTNL of both src and dst netns; we need to
> > convert wireless and hyperv drivers that call dev_change_net_namespace().
> >
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> > net/core/dev.c | 16 ++++++++++------
> > 1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index f6c6559e2548..a0dd34463901 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1943,15 +1943,17 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
> > struct notifier_block *nb,
> > struct netdev_net_notifier *nn)
> > {
> > + struct net *net = dev_net(dev);
>
> it seems to happen since the net pointer is acquired here without a lock.
> Note that KASAN issue is not triggered when executing with rtnl_lock()
> taken before this line. and our kernel .config expands
> rtnl_net_lock(net) to rtnl_lock() (CONFIG_DEBUG_NET_SMALL_RTNL is not set).
It sounds like the device was being moved to another netns while
unregister_netdevice_notifier_dev_net() was called.
Could you check if dev_net() is changed before/after rtnl_lock() in
* register_netdevice_notifier_dev_net()
* unregister_netdevice_notifier_dev_net()
?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
2025-01-16 2:54 ` Kuniyuki Iwashima
@ 2025-01-20 18:55 ` Yael Chemla
2025-01-27 17:26 ` Yael Chemla
2025-01-27 23:26 ` Kuniyuki Iwashima
0 siblings, 2 replies; 16+ messages in thread
From: Yael Chemla @ 2025-01-20 18:55 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: davem, edumazet, horms, kuba, kuni1840, netdev, pabeni
On 16/01/2025 4:54, Kuniyuki Iwashima wrote:
> Thanks for the report!
>
> From: Yael Chemla <ychemla@nvidia.com>
> Date: Thu, 16 Jan 2025 00:16:27 +0200
>> we observed in our regression tests the following issue:
>>
>> BUG: KASAN: slab-use-after-free in notifier_call_chain+0x22c/0x280
>> kasan_report+0xbd/0xf0
>> RIP: 0033:0x7f70839018b7
>> kasan_save_stack+0x1c/0x40
>> kasan_save_track+0x10/0x30
>> __kasan_kmalloc+0x83/0x90
>> kasan_save_stack+0x1c/0x40
>> kasan_save_track+0x10/0x30
>> kasan_save_free_info+0x37/0x50
>> __kasan_slab_free+0x33/0x40
>> page dumped because: kasan: bad access detected
>> BUG: KASAN: slab-use-after-free in notifier_call_chain+0x222/0x280
>> kasan_report+0xbd/0xf0
>> RIP: 0033:0x7f70839018b7
>> kasan_save_stack+0x1c/0x40
>> kasan_save_track+0x10/0x30
>> __kasan_kmalloc+0x83/0x90
>> kasan_save_stack+0x1c/0x40
>> kasan_save_track+0x10/0x30
>> kasan_save_free_info+0x37/0x50
>> __kasan_slab_free+0x33/0x40
>> page dumped because: kasan: bad access detected
>>
>> and there are many more of that kind.
>
> Do you have any other stack traces with more callers info ?
> Also can you decode the trace with ./scripts/decode_stacktrace.sh ?
>
BUG: KASAN: slab-use-after-free in notifier_call_chain
(/usr/work/linux/kernel/notifier.c:75 (discriminator 2))
Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (/usr/work/linux/lib/dump_stack.c:123)
print_report (/usr/work/linux/mm/kasan/report.c:379
/usr/work/linux/mm/kasan/report.c:489)
? __virt_addr_valid (/usr/work/linux/./arch/x86/include/asm/preempt.h:84
(discriminator 13) /usr/work/linux/./include/linux/rcupdate.h:964
(discriminator 13) /usr/work/linux/./include/linux/mmzone.h:2058
(discriminator 13) /usr/work/linux/arch/x86/mm/physaddr.c:65
(discriminator 13))
kasan_report (/usr/work/linux/mm/kasan/report.c:604)
? notifier_call_chain (/usr/work/linux/kernel/notifier.c:75
(discriminator 2))
? notifier_call_chain (/usr/work/linux/kernel/notifier.c:75
(discriminator 2))
notifier_call_chain (/usr/work/linux/kernel/notifier.c:75 (discriminator 2))
call_netdevice_notifiers_info (/usr/work/linux/net/core/dev.c:2011)
unregister_netdevice_many_notify (/usr/work/linux/net/core/dev.c:11551)
? mark_held_locks (/usr/work/linux/kernel/locking/lockdep.c:4321
(discriminator 1))
? __mutex_lock (/usr/work/linux/kernel/locking/mutex.c:689
(discriminator 2) /usr/work/linux/kernel/locking/mutex.c:735
(discriminator 2))
? lockdep_hardirqs_on_prepare
(/usr/work/linux/kernel/locking/lockdep.c:4347
/usr/work/linux/kernel/locking/lockdep.c:4406)
? dev_ingress_queue_create (/usr/work/linux/net/core/dev.c:11492)
? __mutex_lock (/usr/work/linux/kernel/locking/mutex.c:689
(discriminator 2) /usr/work/linux/kernel/locking/mutex.c:735
(discriminator 2))
? __mutex_lock (/usr/work/linux/./arch/x86/include/asm/preempt.h:84
(discriminator 13) /usr/work/linux/kernel/locking/mutex.c:715
(discriminator 13) /usr/work/linux/kernel/locking/mutex.c:735
(discriminator 13))
? unregister_netdev (/usr/work/linux/./include/linux/netdevice.h:3236
/usr/work/linux/net/core/dev.c:11633)
? mutex_lock_io_nested (/usr/work/linux/kernel/locking/mutex.c:734)
? __mutex_unlock_slowpath
(/usr/work/linux/./arch/x86/include/asm/atomic64_64.h:101
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:4329
/usr/work/linux/./include/linux/atomic/atomic-long.h:1506
/usr/work/linux/./include/linux/atomic/atomic-instrumented.h:4481
/usr/work/linux/kernel/locking/mutex.c:913)
unregister_netdevice_queue (/usr/work/linux/net/core/dev.c:11487)
? unregister_netdevice_many (/usr/work/linux/net/core/dev.c:11476)
unregister_netdev (/usr/work/linux/net/core/dev.c:11635)
mlx5e_remove
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579)
mlx5_core
auxiliary_bus_remove (/usr/work/linux/drivers/base/auxiliary.c:230)
device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1275
/usr/work/linux/drivers/base/dd.c:1296)
? kobject_put (/usr/work/linux/./arch/x86/include/asm/atomic.h:93
(discriminator 4)
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:949
(discriminator 4)
/usr/work/linux/./include/linux/atomic/atomic-instrumented.h:401
(discriminator 4) /usr/work/linux/./include/linux/refcount.h:264
(discriminator 4) /usr/work/linux/./include/linux/refcount.h:307
(discriminator 4) /usr/work/linux/./include/linux/refcount.h:325
(discriminator 4) /usr/work/linux/./include/linux/kref.h:64
(discriminator 4) /usr/work/linux/lib/kobject.c:737 (discriminator 4))
bus_remove_device (/usr/work/linux/./include/linux/kobject.h:193
/usr/work/linux/drivers/base/base.h:73
/usr/work/linux/drivers/base/bus.c:583)
device_del (/usr/work/linux/drivers/base/power/power.h:142
/usr/work/linux/drivers/base/core.c:3855)
? mlx5_core_is_eth_enabled
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/devlink.h:65)
mlx5_core
? __device_link_del (/usr/work/linux/drivers/base/core.c:3809)
? is_ib_enabled
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/devlink.h:58)
mlx5_core
mlx5_rescan_drivers_locked
(/usr/work/linux/./include/linux/auxiliary_bus.h:241
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:333
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:535
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
mlx5_unregister_device
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:468)
mlx5_core
mlx5_uninit_one (/usr/work/linux/./include/linux/instrumented.h:68
/usr/work/linux/./include/asm-generic/bitops/instrumented-non-atomic.h:141
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:1563)
mlx5_core
remove_one
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:965
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:2019)
mlx5_core
pci_device_remove (/usr/work/linux/./include/linux/pm_runtime.h:129
/usr/work/linux/drivers/pci/pci-driver.c:475)
device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1275
/usr/work/linux/drivers/base/dd.c:1296)
unbind_store (/usr/work/linux/drivers/base/bus.c:245)
kernfs_fop_write_iter (/usr/work/linux/fs/kernfs/file.c:338)
vfs_write (/usr/work/linux/fs/read_write.c:587 (discriminator 1)
/usr/work/linux/fs/read_write.c:679 (discriminator 1))
? do_user_addr_fault (/usr/work/linux/./include/linux/rcupdate.h:337
/usr/work/linux/./include/linux/rcupdate.h:849
/usr/work/linux/./include/linux/mm.h:740
/usr/work/linux/arch/x86/mm/fault.c:1340)
? kernel_write (/usr/work/linux/fs/read_write.c:660)
? lock_downgrade (/usr/work/linux/kernel/locking/lockdep.c:5857)
ksys_write (/usr/work/linux/fs/read_write.c:732)
? __x64_sys_read (/usr/work/linux/fs/read_write.c:721)
? do_user_addr_fault
(/usr/work/linux/./arch/x86/include/asm/preempt.h:84 (discriminator 13)
/usr/work/linux/./include/linux/rcupdate.h:98 (discriminator 13)
/usr/work/linux/./include/linux/rcupdate.h:882 (discriminator 13)
/usr/work/linux/./include/linux/mm.h:742 (discriminator 13)
/usr/work/linux/arch/x86/mm/fault.c:1340 (discriminator 13))
do_syscall_64 (/usr/work/linux/arch/x86/entry/common.c:52 (discriminator
1) /usr/work/linux/arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe
(/usr/work/linux/arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f6a4d5018b7
>>
>> it happens after applying commit 7fb1073300a2 ("net: Hold
>> rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net()")
>>
>> test scenario includes configuration and traffic over two namespaces
>> associated with two different VFs.
>
> Could you elaborate more about the test scenario, especially
> how each device/netns is dismantled after the test case ?
>
we set up a network configuration which includes two VFs isolated using
two namespaces (there’s also bridge in this topology), we pass some
traffic between VFs. At the end of test (cleanup) we delete network
namespaces, wait for 0.5 sec and unbind VFs of NIC.
note that when I extended the timeout after deleting the namespaces the
issue doesn’t reproduce.
> I guess the VF is moved to init_net ?
>
this should be the behavior in case deletion of the namespaces happens
before the unbind of VFs.
>>
>>
>> On 04/01/2025 8:37, Kuniyuki Iwashima wrote:
>>> (un)?register_netdevice_notifier_dev_net() hold RTNL before triggering
>>> the notifier for all netdev in the netns.
>>>
>>> Let's convert the RTNL to rtnl_net_lock().
>>>
>>> Note that move_netdevice_notifiers_dev_net() is assumed to be (but not
>>> yet) protected by per-netns RTNL of both src and dst netns; we need to
>>> convert wireless and hyperv drivers that call dev_change_net_namespace().
>>>
>>> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
>>> ---
>>> net/core/dev.c | 16 ++++++++++------
>>> 1 file changed, 10 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index f6c6559e2548..a0dd34463901 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -1943,15 +1943,17 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
>>> struct notifier_block *nb,
>>> struct netdev_net_notifier *nn)
>>> {
>>> + struct net *net = dev_net(dev);
>>
>> it seems to happen since the net pointer is acquired here without a lock.
>> Note that KASAN issue is not triggered when executing with rtnl_lock()
>> taken before this line. and our kernel .config expands
>> rtnl_net_lock(net) to rtnl_lock() (CONFIG_DEBUG_NET_SMALL_RTNL is not set).
>
> It sounds like the device was being moved to another netns while
> unregister_netdevice_notifier_dev_net() was called.
>
> Could you check if dev_net() is changed before/after rtnl_lock() in
>
> * register_netdevice_notifier_dev_net()
> * unregister_netdevice_notifier_dev_net()
>
> ?
When checking dev_net before and after taking the lock the issue won’t
reproduce.
note that when issue reproduce we arrive to
unregister_netdevice_notifier_dev_net with an invalid net pointer
(verified it with prints of its value, and it's not the same consistent
value as is throughout rest of the test).
we suspect the issue related to the async ns deletion.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
2025-01-20 18:55 ` Yael Chemla
@ 2025-01-27 17:26 ` Yael Chemla
2025-01-27 17:54 ` Kuniyuki Iwashima
2025-01-27 23:26 ` Kuniyuki Iwashima
1 sibling, 1 reply; 16+ messages in thread
From: Yael Chemla @ 2025-01-27 17:26 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: davem, edumazet, horms, kuba, kuni1840, netdev, pabeni
Hi Kuniyuki,
any update on this?
do you need further info?
Thanks,
Yael
On 20/01/2025 20:55, Yael Chemla wrote:
> On 16/01/2025 4:54, Kuniyuki Iwashima wrote:
>> Thanks for the report!
>>
>> From: Yael Chemla <ychemla@nvidia.com>
>> Date: Thu, 16 Jan 2025 00:16:27 +0200
>>> we observed in our regression tests the following issue:
>>>
>>> BUG: KASAN: slab-use-after-free in notifier_call_chain+0x22c/0x280
>>> kasan_report+0xbd/0xf0
>>> RIP: 0033:0x7f70839018b7
>>> kasan_save_stack+0x1c/0x40
>>> kasan_save_track+0x10/0x30
>>> __kasan_kmalloc+0x83/0x90
>>> kasan_save_stack+0x1c/0x40
>>> kasan_save_track+0x10/0x30
>>> kasan_save_free_info+0x37/0x50
>>> __kasan_slab_free+0x33/0x40
>>> page dumped because: kasan: bad access detected
>>> BUG: KASAN: slab-use-after-free in notifier_call_chain+0x222/0x280
>>> kasan_report+0xbd/0xf0
>>> RIP: 0033:0x7f70839018b7
>>> kasan_save_stack+0x1c/0x40
>>> kasan_save_track+0x10/0x30
>>> __kasan_kmalloc+0x83/0x90
>>> kasan_save_stack+0x1c/0x40
>>> kasan_save_track+0x10/0x30
>>> kasan_save_free_info+0x37/0x50
>>> __kasan_slab_free+0x33/0x40
>>> page dumped because: kasan: bad access detected
>>>
>>> and there are many more of that kind.
>>
>> Do you have any other stack traces with more callers info ?
>> Also can you decode the trace with ./scripts/decode_stacktrace.sh ?
>>
> BUG: KASAN: slab-use-after-free in notifier_call_chain (/usr/work/linux/
> kernel/notifier.c:75 (discriminator 2))
> Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
>
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-
> gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl (/usr/work/linux/lib/dump_stack.c:123)
> print_report (/usr/work/linux/mm/kasan/report.c:379 /usr/work/linux/mm/
> kasan/report.c:489)
> ? __virt_addr_valid (/usr/work/linux/./arch/x86/include/asm/preempt.h:84
> (discriminator 13) /usr/work/linux/./include/linux/rcupdate.h:964
> (discriminator 13) /usr/work/linux/./include/linux/mmzone.h:2058
> (discriminator 13) /usr/work/linux/arch/x86/mm/physaddr.c:65
> (discriminator 13))
> kasan_report (/usr/work/linux/mm/kasan/report.c:604)
> ? notifier_call_chain (/usr/work/linux/kernel/notifier.c:75
> (discriminator 2))
> ? notifier_call_chain (/usr/work/linux/kernel/notifier.c:75
> (discriminator 2))
> notifier_call_chain (/usr/work/linux/kernel/notifier.c:75 (discriminator
> 2))
> call_netdevice_notifiers_info (/usr/work/linux/net/core/dev.c:2011)
> unregister_netdevice_many_notify (/usr/work/linux/net/core/dev.c:11551)
> ? mark_held_locks (/usr/work/linux/kernel/locking/lockdep.c:4321
> (discriminator 1))
> ? __mutex_lock (/usr/work/linux/kernel/locking/mutex.c:689
> (discriminator 2) /usr/work/linux/kernel/locking/mutex.c:735
> (discriminator 2))
> ? lockdep_hardirqs_on_prepare (/usr/work/linux/kernel/locking/
> lockdep.c:4347 /usr/work/linux/kernel/locking/lockdep.c:4406)
> ? dev_ingress_queue_create (/usr/work/linux/net/core/dev.c:11492)
> ? __mutex_lock (/usr/work/linux/kernel/locking/mutex.c:689
> (discriminator 2) /usr/work/linux/kernel/locking/mutex.c:735
> (discriminator 2))
> ? __mutex_lock (/usr/work/linux/./arch/x86/include/asm/preempt.h:84
> (discriminator 13) /usr/work/linux/kernel/locking/mutex.c:715
> (discriminator 13) /usr/work/linux/kernel/locking/mutex.c:735
> (discriminator 13))
> ? unregister_netdev (/usr/work/linux/./include/linux/netdevice.h:3236 /
> usr/work/linux/net/core/dev.c:11633)
> ? mutex_lock_io_nested (/usr/work/linux/kernel/locking/mutex.c:734)
> ? __mutex_unlock_slowpath (/usr/work/linux/./arch/x86/include/asm/
> atomic64_64.h:101 /usr/work/linux/./include/linux/atomic/atomic-arch-
> fallback.h:4329 /usr/work/linux/./include/linux/atomic/atomic-
> long.h:1506 /usr/work/linux/./include/linux/atomic/atomic-
> instrumented.h:4481 /usr/work/linux/kernel/locking/mutex.c:913)
> unregister_netdevice_queue (/usr/work/linux/net/core/dev.c:11487)
> ? unregister_netdevice_many (/usr/work/linux/net/core/dev.c:11476)
> unregister_netdev (/usr/work/linux/net/core/dev.c:11635)
> mlx5e_remove (/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/
> en_main.c:6552 /usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/
> en_main.c:6579) mlx5_core
> auxiliary_bus_remove (/usr/work/linux/drivers/base/auxiliary.c:230)
> device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1275 /
> usr/work/linux/drivers/base/dd.c:1296)
> ? kobject_put (/usr/work/linux/./arch/x86/include/asm/atomic.h:93
> (discriminator 4) /usr/work/linux/./include/linux/atomic/atomic-arch-
> fallback.h:949 (discriminator 4) /usr/work/linux/./include/linux/atomic/
> atomic-instrumented.h:401 (discriminator 4) /usr/work/linux/./include/
> linux/refcount.h:264 (discriminator 4) /usr/work/linux/./include/linux/
> refcount.h:307 (discriminator 4) /usr/work/linux/./include/linux/
> refcount.h:325 (discriminator 4) /usr/work/linux/./include/linux/
> kref.h:64 (discriminator 4) /usr/work/linux/lib/kobject.c:737
> (discriminator 4))
> bus_remove_device (/usr/work/linux/./include/linux/kobject.h:193 /usr/
> work/linux/drivers/base/base.h:73 /usr/work/linux/drivers/base/bus.c:583)
> device_del (/usr/work/linux/drivers/base/power/power.h:142 /usr/work/
> linux/drivers/base/core.c:3855)
> ? mlx5_core_is_eth_enabled (/usr/work/linux/drivers/net/ethernet/
> mellanox/mlx5/core/devlink.h:65) mlx5_core
> ? __device_link_del (/usr/work/linux/drivers/base/core.c:3809)
> ? is_ib_enabled (/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/
> core/devlink.h:58) mlx5_core
> mlx5_rescan_drivers_locked (/usr/work/linux/./include/linux/
> auxiliary_bus.h:241 /usr/work/linux/drivers/net/ethernet/mellanox/mlx5/
> core/dev.c:333 /usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/
> dev.c:535 /usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/
> dev.c:549) mlx5_core
> mlx5_unregister_device (/usr/work/linux/drivers/net/ethernet/mellanox/
> mlx5/core/dev.c:468) mlx5_core
> mlx5_uninit_one (/usr/work/linux/./include/linux/instrumented.h:68 /usr/
> work/linux/./include/asm-generic/bitops/instrumented-non-atomic.h:141 /
> usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:1563)
> mlx5_core
> remove_one (/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/
> main.c:965 /usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/
> main.c:2019) mlx5_core
> pci_device_remove (/usr/work/linux/./include/linux/pm_runtime.h:129 /
> usr/work/linux/drivers/pci/pci-driver.c:475)
> device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1275 /
> usr/work/linux/drivers/base/dd.c:1296)
> unbind_store (/usr/work/linux/drivers/base/bus.c:245)
> kernfs_fop_write_iter (/usr/work/linux/fs/kernfs/file.c:338)
> vfs_write (/usr/work/linux/fs/read_write.c:587 (discriminator 1) /usr/
> work/linux/fs/read_write.c:679 (discriminator 1))
> ? do_user_addr_fault (/usr/work/linux/./include/linux/rcupdate.h:337 /
> usr/work/linux/./include/linux/rcupdate.h:849 /usr/work/linux/./include/
> linux/mm.h:740 /usr/work/linux/arch/x86/mm/fault.c:1340)
> ? kernel_write (/usr/work/linux/fs/read_write.c:660)
> ? lock_downgrade (/usr/work/linux/kernel/locking/lockdep.c:5857)
> ksys_write (/usr/work/linux/fs/read_write.c:732)
> ? __x64_sys_read (/usr/work/linux/fs/read_write.c:721)
> ? do_user_addr_fault (/usr/work/linux/./arch/x86/include/asm/
> preempt.h:84 (discriminator 13) /usr/work/linux/./include/linux/
> rcupdate.h:98 (discriminator 13) /usr/work/linux/./include/linux/
> rcupdate.h:882 (discriminator 13) /usr/work/linux/./include/linux/
> mm.h:742 (discriminator 13) /usr/work/linux/arch/x86/mm/fault.c:1340
> (discriminator 13))
> do_syscall_64 (/usr/work/linux/arch/x86/entry/common.c:52 (discriminator
> 1) /usr/work/linux/arch/x86/entry/common.c:83 (discriminator 1))
> entry_SYSCALL_64_after_hwframe (/usr/work/linux/arch/x86/entry/
> entry_64.S:130)
> RIP: 0033:0x7f6a4d5018b7
>
>>>
>>> it happens after applying commit 7fb1073300a2 ("net: Hold
>>> rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net()")
>>>
>>> test scenario includes configuration and traffic over two namespaces
>>> associated with two different VFs.
>>
>> Could you elaborate more about the test scenario, especially
>> how each device/netns is dismantled after the test case ?
>>
>
> we set up a network configuration which includes two VFs isolated using
> two namespaces (there’s also bridge in this topology), we pass some
> traffic between VFs. At the end of test (cleanup) we delete network
> namespaces, wait for 0.5 sec and unbind VFs of NIC.
>
> note that when I extended the timeout after deleting the namespaces the
> issue doesn’t reproduce.
>
>> I guess the VF is moved to init_net ?
>>
>
> this should be the behavior in case deletion of the namespaces happens
> before the unbind of VFs.
>
>>>
>>>
>>> On 04/01/2025 8:37, Kuniyuki Iwashima wrote:
>>>> (un)?register_netdevice_notifier_dev_net() hold RTNL before triggering
>>>> the notifier for all netdev in the netns.
>>>>
>>>> Let's convert the RTNL to rtnl_net_lock().
>>>>
>>>> Note that move_netdevice_notifiers_dev_net() is assumed to be (but not
>>>> yet) protected by per-netns RTNL of both src and dst netns; we need to
>>>> convert wireless and hyperv drivers that call
>>>> dev_change_net_namespace().
>>>>
>>>> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
>>>> ---
>>>> net/core/dev.c | 16 ++++++++++------
>>>> 1 file changed, 10 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index f6c6559e2548..a0dd34463901 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -1943,15 +1943,17 @@ int
>>>> register_netdevice_notifier_dev_net(struct net_device *dev,
>>>> struct notifier_block *nb,
>>>> struct netdev_net_notifier *nn)
>>>> {
>>>> + struct net *net = dev_net(dev);
>>>
>>> it seems to happen since the net pointer is acquired here without a
>>> lock.
>>> Note that KASAN issue is not triggered when executing with rtnl_lock()
>>> taken before this line. and our kernel .config expands
>>> rtnl_net_lock(net) to rtnl_lock() (CONFIG_DEBUG_NET_SMALL_RTNL is not
>>> set).
>>
>> It sounds like the device was being moved to another netns while
>> unregister_netdevice_notifier_dev_net() was called.
>>
>> Could you check if dev_net() is changed before/after rtnl_lock() in
>>
>> * register_netdevice_notifier_dev_net()
>> * unregister_netdevice_notifier_dev_net()
>>
>> ?
>
> When checking dev_net before and after taking the lock the issue won’t
> reproduce.
> note that when issue reproduce we arrive to
> unregister_netdevice_notifier_dev_net with an invalid net pointer
> (verified it with prints of its value, and it's not the same consistent
> value as is throughout rest of the test).
> we suspect the issue related to the async ns deletion.
>
>
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
2025-01-27 17:26 ` Yael Chemla
@ 2025-01-27 17:54 ` Kuniyuki Iwashima
0 siblings, 0 replies; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-27 17:54 UTC (permalink / raw)
To: ychemla; +Cc: davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev, pabeni
From: Yael Chemla <ychemla@nvidia.com>
Date: Mon, 27 Jan 2025 19:26:34 +0200
> Hi Kuniyuki,
>
> any update on this?
> do you need further info?
Sorry, I was busy as on-call last week.
I'll investigate it this week.
Thanks!
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
2025-01-20 18:55 ` Yael Chemla
2025-01-27 17:26 ` Yael Chemla
@ 2025-01-27 23:26 ` Kuniyuki Iwashima
2025-01-29 16:21 ` Yael Chemla
1 sibling, 1 reply; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-27 23:26 UTC (permalink / raw)
To: ychemla; +Cc: davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev, pabeni
From: Yael Chemla <ychemla@nvidia.com>
Date: Mon, 20 Jan 2025 20:55:07 +0200
> >>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>> index f6c6559e2548..a0dd34463901 100644
> >>> --- a/net/core/dev.c
> >>> +++ b/net/core/dev.c
> >>> @@ -1943,15 +1943,17 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
> >>> struct notifier_block *nb,
> >>> struct netdev_net_notifier *nn)
> >>> {
> >>> + struct net *net = dev_net(dev);
> >>
> >> it seems to happen since the net pointer is acquired here without a lock.
> >> Note that KASAN issue is not triggered when executing with rtnl_lock()
> >> taken before this line. and our kernel .config expands
> >> rtnl_net_lock(net) to rtnl_lock() (CONFIG_DEBUG_NET_SMALL_RTNL is not set).
> >
> > It sounds like the device was being moved to another netns while
> > unregister_netdevice_notifier_dev_net() was called.
> >
> > Could you check if dev_net() is changed before/after rtnl_lock() in
> >
> > * register_netdevice_notifier_dev_net()
> > * unregister_netdevice_notifier_dev_net()
> >
> > ?
>
> When checking dev_net before and after taking the lock the issue won’t
> reproduce.
> note that when issue reproduce we arrive to
> unregister_netdevice_notifier_dev_net with an invalid net pointer
> (verified it with prints of its value, and it's not the same consistent
> value as is throughout rest of the test).
Does an invalid net pointer means a dead netns pointer ?
dev_net() and dev_net_set() use rcu_dereference() and rcu_assign_pointer(),
so I guess it should not be an invalid address at least.
> we suspect the issue related to the async ns deletion.
I think async netns change would trigger the issue too.
Could you try this patch ?
---8<---
diff --git a/net/core/dev.c b/net/core/dev.c
index afa2282f2604..f4438ec24683 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2070,20 +2070,50 @@ static void __move_netdevice_notifier_net(struct net *src_net,
__register_netdevice_notifier_net(dst_net, nb, true);
}
+static void rtnl_net_dev_lock(struct net_device *dev)
+{
+ struct net *net;
+
+again:
+ /* netns might be being dismantled. */
+ net = maybe_get_net(dev_net(dev));
+ if (!net) {
+ cond_resched();
+ goto again;
+ }
+
+ rtnl_net_lock(net);
+
+ /* dev might be moved to another netns. */
+ if (!net_eq(net, dev_net(dev))) {
+ rtnl_net_unlock(net);
+ put_net(net);
+ cond_resched();
+ goto again;
+ }
+}
+
+static void rtnl_net_dev_unlock(struct net_device *dev)
+{
+ struct net *net = dev_net(dev);
+
+ rtnl_net_unlock(net);
+ put_net(net);
+}
+
int register_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
{
- struct net *net = dev_net(dev);
int err;
- rtnl_net_lock(net);
- err = __register_netdevice_notifier_net(net, nb, false);
+ rtnl_net_dev_lock(dev);
+ err = __register_netdevice_notifier_net(dev_net(dev), nb, false);
if (!err) {
nn->nb = nb;
list_add(&nn->list, &dev->net_notifier_list);
}
- rtnl_net_unlock(net);
+ rtnl_net_dev_unlock(dev);
return err;
}
@@ -2093,13 +2123,12 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
{
- struct net *net = dev_net(dev);
int err;
- rtnl_net_lock(net);
+ rtnl_net_dev_lock(dev);
list_del(&nn->list);
- err = __unregister_netdevice_notifier_net(net, nb);
- rtnl_net_unlock(net);
+ err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
+ rtnl_net_dev_unlock(dev);
return err;
}
---8<---
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
2025-01-27 23:26 ` Kuniyuki Iwashima
@ 2025-01-29 16:21 ` Yael Chemla
2025-01-30 0:32 ` Kuniyuki Iwashima
0 siblings, 1 reply; 16+ messages in thread
From: Yael Chemla @ 2025-01-29 16:21 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: davem, edumazet, horms, kuba, kuni1840, netdev, pabeni
On 28/01/2025 1:26, Kuniyuki Iwashima wrote:
> From: Yael Chemla <ychemla@nvidia.com>
> Date: Mon, 20 Jan 2025 20:55:07 +0200
>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>> index f6c6559e2548..a0dd34463901 100644
>>>>> --- a/net/core/dev.c
>>>>> +++ b/net/core/dev.c
>>>>> @@ -1943,15 +1943,17 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
>>>>> struct notifier_block *nb,
>>>>> struct netdev_net_notifier *nn)
>>>>> {
>>>>> + struct net *net = dev_net(dev);
>>>>
>>>> it seems to happen since the net pointer is acquired here without a lock.
>>>> Note that KASAN issue is not triggered when executing with rtnl_lock()
>>>> taken before this line. and our kernel .config expands
>>>> rtnl_net_lock(net) to rtnl_lock() (CONFIG_DEBUG_NET_SMALL_RTNL is not set).
>>>
>>> It sounds like the device was being moved to another netns while
>>> unregister_netdevice_notifier_dev_net() was called.
>>>
>>> Could you check if dev_net() is changed before/after rtnl_lock() in
>>>
>>> * register_netdevice_notifier_dev_net()
>>> * unregister_netdevice_notifier_dev_net()
>>>
>>> ?
>>
>> When checking dev_net before and after taking the lock the issue won’t
>> reproduce.
>> note that when issue reproduce we arrive to
>> unregister_netdevice_notifier_dev_net with an invalid net pointer
>> (verified it with prints of its value, and it's not the same consistent
>> value as is throughout rest of the test).
>
> Does an invalid net pointer means a dead netns pointer ?
> dev_net() and dev_net_set() use rcu_dereference() and rcu_assign_pointer(),
> so I guess it should not be an invalid address at least.
>
I logged several values at the entrance of
unregister_netdevice_notifier_dev_net when issue reproduced:
1) fields of net->ns (struct ns_common):
count: the namespace refcount is 0 (i.e. net->ns.count, used
refcount_read to read it).
inum: the value doesn't appear to be garbage but differ from its
constant value throughout the test.
2) net pointer (struct net): value differ from its constant value
observed during the rest of the test.
hope this helps and please let me know if more info is needed.
>
>> we suspect the issue related to the async ns deletion.
>
> I think async netns change would trigger the issue too.
>
> Could you try this patch ?
>
I tested your patch and issue won't reproduce with it
(CONFIG_DEBUG_NET_SMALL_RTNL is not set in my config).
Tested-by: Yael Chemla <ychemla@nvidia.com>
Thanks a lot!
> ---8<---
> diff --git a/net/core/dev.c b/net/core/dev.c
> index afa2282f2604..f4438ec24683 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2070,20 +2070,50 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> __register_netdevice_notifier_net(dst_net, nb, true);
> }
>
> +static void rtnl_net_dev_lock(struct net_device *dev)
> +{
> + struct net *net;
> +
> +again:
> + /* netns might be being dismantled. */
> + net = maybe_get_net(dev_net(dev));
> + if (!net) {
> + cond_resched();
> + goto again;
> + }
> +
> + rtnl_net_lock(net);
> +
> + /* dev might be moved to another netns. */
> + if (!net_eq(net, dev_net(dev))) {
> + rtnl_net_unlock(net);
> + put_net(net);
> + cond_resched();
> + goto again;
> + }
> +}
> +
> +static void rtnl_net_dev_unlock(struct net_device *dev)
> +{
> + struct net *net = dev_net(dev);
> +
> + rtnl_net_unlock(net);
> + put_net(net);
> +}
> +
> int register_netdevice_notifier_dev_net(struct net_device *dev,
> struct notifier_block *nb,
> struct netdev_net_notifier *nn)
> {
> - struct net *net = dev_net(dev);
> int err;
>
> - rtnl_net_lock(net);
> - err = __register_netdevice_notifier_net(net, nb, false);
> + rtnl_net_dev_lock(dev);
> + err = __register_netdevice_notifier_net(dev_net(dev), nb, false);
> if (!err) {
> nn->nb = nb;
> list_add(&nn->list, &dev->net_notifier_list);
> }
> - rtnl_net_unlock(net);
> + rtnl_net_dev_unlock(dev);
>
> return err;
> }
> @@ -2093,13 +2123,12 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
> struct notifier_block *nb,
> struct netdev_net_notifier *nn)
> {
> - struct net *net = dev_net(dev);
> int err;
>
> - rtnl_net_lock(net);
> + rtnl_net_dev_lock(dev);
> list_del(&nn->list);
> - err = __unregister_netdevice_notifier_net(net, nb);
> - rtnl_net_unlock(net);
> + err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
> + rtnl_net_dev_unlock(dev);
>
> return err;
> }
> ---8<---
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
2025-01-29 16:21 ` Yael Chemla
@ 2025-01-30 0:32 ` Kuniyuki Iwashima
0 siblings, 0 replies; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-01-30 0:32 UTC (permalink / raw)
To: ychemla; +Cc: davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev, pabeni
From: Yael Chemla <ychemla@nvidia.com>
Date: Wed, 29 Jan 2025 18:21:23 +0200
> On 28/01/2025 1:26, Kuniyuki Iwashima wrote:
> > From: Yael Chemla <ychemla@nvidia.com>
> > Date: Mon, 20 Jan 2025 20:55:07 +0200
> >>>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>>> index f6c6559e2548..a0dd34463901 100644
> >>>>> --- a/net/core/dev.c
> >>>>> +++ b/net/core/dev.c
> >>>>> @@ -1943,15 +1943,17 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
> >>>>> struct notifier_block *nb,
> >>>>> struct netdev_net_notifier *nn)
> >>>>> {
> >>>>> + struct net *net = dev_net(dev);
> >>>>
> >>>> it seems to happen since the net pointer is acquired here without a lock.
> >>>> Note that KASAN issue is not triggered when executing with rtnl_lock()
> >>>> taken before this line. and our kernel .config expands
> >>>> rtnl_net_lock(net) to rtnl_lock() (CONFIG_DEBUG_NET_SMALL_RTNL is not set).
> >>>
> >>> It sounds like the device was being moved to another netns while
> >>> unregister_netdevice_notifier_dev_net() was called.
> >>>
> >>> Could you check if dev_net() is changed before/after rtnl_lock() in
> >>>
> >>> * register_netdevice_notifier_dev_net()
> >>> * unregister_netdevice_notifier_dev_net()
> >>>
> >>> ?
> >>
> >> When checking dev_net before and after taking the lock the issue won’t
> >> reproduce.
> >> note that when issue reproduce we arrive to
> >> unregister_netdevice_notifier_dev_net with an invalid net pointer
> >> (verified it with prints of its value, and it's not the same consistent
> >> value as is throughout rest of the test).
> >
> > Does an invalid net pointer means a dead netns pointer ?
> > dev_net() and dev_net_set() use rcu_dereference() and rcu_assign_pointer(),
> > so I guess it should not be an invalid address at least.
> >
> I logged several values at the entrance of
> unregister_netdevice_notifier_dev_net when issue reproduced:
> 1) fields of net->ns (struct ns_common):
> count: the namespace refcount is 0 (i.e. net->ns.count, used
> refcount_read to read it).
This indicates here we race with cleanup_net().
>
> inum: the value doesn't appear to be garbage but differ from its
> constant value throughout the test.
>
> 2) net pointer (struct net): value differ from its constant value
> observed during the rest of the test.
>
> hope this helps and please let me know if more info is needed.
>
> >
> >> we suspect the issue related to the async ns deletion.
> >
> > I think async netns change would trigger the issue too.
> >
> > Could you try this patch ?
> >
>
> I tested your patch and issue won't reproduce with it
> (CONFIG_DEBUG_NET_SMALL_RTNL is not set in my config).
>
> Tested-by: Yael Chemla <ychemla@nvidia.com>
Thanks for testing !
Will post the fix officially.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-01-30 0:32 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-04 6:37 [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Kuniyuki Iwashima
2025-01-04 6:37 ` [PATCH v1 net-next 1/4] net: Convert netdev_chain to blocking_notifier Kuniyuki Iwashima
2025-01-08 2:28 ` kernel test robot
2025-01-04 6:37 ` [PATCH v1 net-next 2/4] net: Hold __rtnl_net_lock() in (un)?register_netdevice_notifier() Kuniyuki Iwashima
2025-01-04 6:37 ` [PATCH v1 net-next 3/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_net() Kuniyuki Iwashima
2025-01-04 6:37 ` [PATCH v1 net-next 4/4] net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net() Kuniyuki Iwashima
2025-01-15 22:16 ` Yael Chemla
2025-01-16 2:54 ` Kuniyuki Iwashima
2025-01-20 18:55 ` Yael Chemla
2025-01-27 17:26 ` Yael Chemla
2025-01-27 17:54 ` Kuniyuki Iwashima
2025-01-27 23:26 ` Kuniyuki Iwashima
2025-01-29 16:21 ` Yael Chemla
2025-01-30 0:32 ` Kuniyuki Iwashima
2025-01-04 15:37 ` [PATCH v1 net-next 0/4] net: Hold per-netns RTNL during netdev notifier registration Jakub Kicinski
2025-01-05 7:59 ` Kuniyuki Iwashima
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).