* [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)).
@ 2025-02-17 19:11 Kuniyuki Iwashima
2025-02-17 19:11 ` [PATCH v5 net 1/3] net: Add net_passive_inc() and net_passive_dec() Kuniyuki Iwashima
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Kuniyuki Iwashima @ 2025-02-17 19:11 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
Yael Chemla reported that commit 7fb1073300a2 ("net: Hold rtnl_net_lock()
in (un)?register_netdevice_notifier_dev_net().") started to trigger KASAN's
use-after-free splat.
The problem is that dev_net(dev) fetched before rtnl_net_lock() might be
different after rtnl_net_lock().
The patch 2 fixes the issue by checking dev_net(dev) after rtnl_net_lock(),
and the patch 3 fixes the same potential issue that would emerge once RTNL
is removed.
Changes:
v5:
* Use do-while loop instead of goto
v4: https://lore.kernel.org/netdev/20250212064206.18159-1-kuniyu@amazon.com/
* Add patch 1
* Fix build failure for !CONFIG_NET_NS in patch 2
v3: https://lore.kernel.org/netdev/20250211051217.12613-1-kuniyu@amazon.com/
* Bump net->passive instead of maybe_get_net()
* Remove msleep(1) loop
* Use rcu_access_pointer() instead of rcu_read_lock().
v2: https://lore.kernel.org/netdev/20250207044251.65421-1-kuniyu@amazon.com/
* Use dev_net_rcu()
* Use msleep(1) instead of cond_resched() after maybe_get_net()
* Remove cond_resched() after net_eq() check
v1: https://lore.kernel.org/netdev/20250130232435.43622-1-kuniyu@amazon.com/
Kuniyuki Iwashima (3):
net: Add net_passive_inc() and net_passive_dec().
net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
dev: Use rtnl_net_dev_lock() in unregister_netdev().
include/net/net_namespace.h | 11 ++++++++
net/core/dev.c | 54 +++++++++++++++++++++++++++++++------
net/core/net_namespace.c | 8 +++---
3 files changed, 61 insertions(+), 12 deletions(-)
--
2.39.5 (Apple Git-154)
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v5 net 1/3] net: Add net_passive_inc() and net_passive_dec().
2025-02-17 19:11 [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)) Kuniyuki Iwashima
@ 2025-02-17 19:11 ` Kuniyuki Iwashima
2025-02-17 19:11 ` [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net() Kuniyuki Iwashima
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Kuniyuki Iwashima @ 2025-02-17 19:11 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
net_drop_ns() is NULL when CONFIG_NET_NS is disabled.
The next patch introduces a function that increments
and decrements net->passive.
As a prep, let's rename and export net_free() to
net_passive_dec() and add net_passive_inc().
Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/CANn89i+oUCt2VGvrbrweniTendZFEh+nwS=uonc004-aPkWy-Q@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
include/net/net_namespace.h | 11 +++++++++++
net/core/net_namespace.c | 8 ++++----
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 7ba1402ca779..f467a66abc6b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -297,6 +297,7 @@ static inline int check_net(const struct net *net)
}
void net_drop_ns(void *);
+void net_passive_dec(struct net *net);
#else
@@ -326,8 +327,18 @@ static inline int check_net(const struct net *net)
}
#define net_drop_ns NULL
+
+static inline void net_passive_dec(struct net *net)
+{
+ refcount_dec(&net->passive);
+}
#endif
+static inline void net_passive_inc(struct net *net)
+{
+ refcount_inc(&net->passive);
+}
+
/* Returns true if the netns initialization is completed successfully */
static inline bool net_initialized(const struct net *net)
{
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index cb39a12b2f82..4303f2a49262 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -464,7 +464,7 @@ static void net_complete_free(void)
}
-static void net_free(struct net *net)
+void net_passive_dec(struct net *net)
{
if (refcount_dec_and_test(&net->passive)) {
kfree(rcu_access_pointer(net->gen));
@@ -482,7 +482,7 @@ void net_drop_ns(void *p)
struct net *net = (struct net *)p;
if (net)
- net_free(net);
+ net_passive_dec(net);
}
struct net *copy_net_ns(unsigned long flags,
@@ -523,7 +523,7 @@ struct net *copy_net_ns(unsigned long flags,
key_remove_domain(net->key_domain);
#endif
put_user_ns(user_ns);
- net_free(net);
+ net_passive_dec(net);
dec_ucounts:
dec_net_namespaces(ucounts);
return ERR_PTR(rv);
@@ -672,7 +672,7 @@ static void cleanup_net(struct work_struct *work)
key_remove_domain(net->key_domain);
#endif
put_user_ns(net->user_ns);
- net_free(net);
+ net_passive_dec(net);
}
cleanup_net_task = NULL;
}
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
2025-02-17 19:11 [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)) Kuniyuki Iwashima
2025-02-17 19:11 ` [PATCH v5 net 1/3] net: Add net_passive_inc() and net_passive_dec() Kuniyuki Iwashima
@ 2025-02-17 19:11 ` Kuniyuki Iwashima
2025-03-26 13:46 ` Yael Chemla
2025-02-17 19:11 ` [PATCH v5 net 3/3] dev: Use rtnl_net_dev_lock() in unregister_netdev() Kuniyuki Iwashima
2025-02-19 2:50 ` [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)) patchwork-bot+netdevbpf
3 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2025-02-17 19:11 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, Yael Chemla
After the cited commit, dev_net(dev) is fetched before holding RTNL
and passed to __unregister_netdevice_notifier_net().
However, dev_net(dev) might be different after holding RTNL.
In the reported case [0], while removing a VF device, its netns was
being dismantled and the VF was moved to init_net.
So the following sequence is basically illegal when dev was fetched
without lookup:
net = dev_net(dev);
rtnl_net_lock(net);
Let's use a new helper rtnl_net_dev_lock() to fix the race.
It fetches dev_net_rcu(dev), bumps its net->passive, and checks if
dev_net_rcu(dev) is changed after rtnl_net_lock().
[0]:
BUG: KASAN: slab-use-after-free in notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:123)
print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
kasan_report (mm/kasan/report.c:604)
notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
call_netdevice_notifiers_info (net/core/dev.c:2011)
unregister_netdevice_many_notify (net/core/dev.c:11551)
unregister_netdevice_queue (net/core/dev.c:11487)
unregister_netdev (net/core/dev.c:11635)
mlx5e_remove (drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552 drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579) mlx5_core
auxiliary_bus_remove (drivers/base/auxiliary.c:230)
device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
bus_remove_device (./include/linux/kobject.h:193 drivers/base/base.h:73 drivers/base/bus.c:583)
device_del (drivers/base/power/power.h:142 drivers/base/core.c:3855)
mlx5_rescan_drivers_locked (./include/linux/auxiliary_bus.h:241 drivers/net/ethernet/mellanox/mlx5/core/dev.c:333 drivers/net/ethernet/mellanox/mlx5/core/dev.c:535 drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
mlx5_unregister_device (drivers/net/ethernet/mellanox/mlx5/core/dev.c:468) mlx5_core
mlx5_uninit_one (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 drivers/net/ethernet/mellanox/mlx5/core/main.c:1563) mlx5_core
remove_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:965 drivers/net/ethernet/mellanox/mlx5/core/main.c:2019) mlx5_core
pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475)
device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
unbind_store (drivers/base/bus.c:245)
kernfs_fop_write_iter (fs/kernfs/file.c:338)
vfs_write (fs/read_write.c:587 (discriminator 1) fs/read_write.c:679 (discriminator 1))
ksys_write (fs/read_write.c:732)
do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f6a4d5018b7
Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
Reported-by: Yael Chemla <ychemla@nvidia.com>
Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
v5:
* Use do-while loop
v4:
* Fix build failure when !CONFIG_NET_NS
* Use net_passive_dec()
v3:
* Bump net->passive instead of maybe_get_net()
* Remove msleep(1) loop
* Use rcu_access_pointer() instead of rcu_read_lock().
v2:
* Use dev_net_rcu().
* Use msleep(1) instead of cond_resched() after maybe_get_net()
* Remove cond_resched() after net_eq() check
v1: https://lore.kernel.org/netdev/20250130232435.43622-2-kuniyu@amazon.com/
---
net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 44 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index b91658e8aedb..19e268568282 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2070,6 +2070,42 @@ static void __move_netdevice_notifier_net(struct net *src_net,
__register_netdevice_notifier_net(dst_net, nb, true);
}
+static void rtnl_net_dev_lock(struct net_device *dev)
+{
+ bool again;
+
+ do {
+ struct net *net;
+
+ again = false;
+
+ /* netns might be being dismantled. */
+ rcu_read_lock();
+ net = dev_net_rcu(dev);
+ net_passive_inc(net);
+ rcu_read_unlock();
+
+ rtnl_net_lock(net);
+
+#ifdef CONFIG_NET_NS
+ /* dev might have been moved to another netns. */
+ if (!net_eq(net, rcu_access_pointer(dev->nd_net.net))) {
+ rtnl_net_unlock(net);
+ net_passive_dec(net);
+ again = true;
+ }
+#endif
+ } while (again);
+}
+
+static void rtnl_net_dev_unlock(struct net_device *dev)
+{
+ struct net *net = dev_net(dev);
+
+ rtnl_net_unlock(net);
+ net_passive_dec(net);
+}
+
int register_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
@@ -2077,6 +2113,11 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
struct net *net = dev_net(dev);
int err;
+ /* rtnl_net_lock() assumes dev is not yet published by
+ * register_netdevice().
+ */
+ DEBUG_NET_WARN_ON_ONCE(!list_empty(&dev->dev_list));
+
rtnl_net_lock(net);
err = __register_netdevice_notifier_net(net, nb, false);
if (!err) {
@@ -2093,13 +2134,12 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
{
- struct net *net = dev_net(dev);
int err;
- rtnl_net_lock(net);
+ rtnl_net_dev_lock(dev);
list_del(&nn->list);
- err = __unregister_netdevice_notifier_net(net, nb);
- rtnl_net_unlock(net);
+ err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
+ rtnl_net_dev_unlock(dev);
return err;
}
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v5 net 3/3] dev: Use rtnl_net_dev_lock() in unregister_netdev().
2025-02-17 19:11 [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)) Kuniyuki Iwashima
2025-02-17 19:11 ` [PATCH v5 net 1/3] net: Add net_passive_inc() and net_passive_dec() Kuniyuki Iwashima
2025-02-17 19:11 ` [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net() Kuniyuki Iwashima
@ 2025-02-17 19:11 ` Kuniyuki Iwashima
2025-02-19 2:50 ` [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)) patchwork-bot+netdevbpf
3 siblings, 0 replies; 13+ messages in thread
From: Kuniyuki Iwashima @ 2025-02-17 19:11 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
The following sequence is basically illegal when dev was fetched
without lookup because dev_net(dev) might be different after holding
rtnl_net_lock():
net = dev_net(dev);
rtnl_net_lock(net);
Let's use rtnl_net_dev_lock() in unregister_netdev().
Note that there is no real bug in unregister_netdev() for now
because RTNL protects the scope even if dev_net(dev) is changed
before/after RTNL.
Fixes: 00fb9823939e ("dev: Hold per-netns RTNL in (un)?register_netdev().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 19e268568282..fafd2f4b5d5d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11920,11 +11920,9 @@ EXPORT_SYMBOL(unregister_netdevice_many);
*/
void unregister_netdev(struct net_device *dev)
{
- struct net *net = dev_net(dev);
-
- rtnl_net_lock(net);
+ rtnl_net_dev_lock(dev);
unregister_netdevice(dev);
- rtnl_net_unlock(net);
+ rtnl_net_dev_unlock(dev);
}
EXPORT_SYMBOL(unregister_netdev);
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)).
2025-02-17 19:11 [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)) Kuniyuki Iwashima
` (2 preceding siblings ...)
2025-02-17 19:11 ` [PATCH v5 net 3/3] dev: Use rtnl_net_dev_lock() in unregister_netdev() Kuniyuki Iwashima
@ 2025-02-19 2:50 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-02-19 2:50 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: davem, edumazet, kuba, pabeni, horms, kuni1840, netdev
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 17 Feb 2025 11:11:26 -0800 you wrote:
> Yael Chemla reported that commit 7fb1073300a2 ("net: Hold rtnl_net_lock()
> in (un)?register_netdevice_notifier_dev_net().") started to trigger KASAN's
> use-after-free splat.
>
> The problem is that dev_net(dev) fetched before rtnl_net_lock() might be
> different after rtnl_net_lock().
>
> [...]
Here is the summary with links:
- [v5,net,1/3] net: Add net_passive_inc() and net_passive_dec().
https://git.kernel.org/netdev/net/c/e57a6320215c
- [v5,net,2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
https://git.kernel.org/netdev/net/c/65161fb544aa
- [v5,net,3/3] dev: Use rtnl_net_dev_lock() in unregister_netdev().
https://git.kernel.org/netdev/net/c/d4c6bfc83936
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
2025-02-17 19:11 ` [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net() Kuniyuki Iwashima
@ 2025-03-26 13:46 ` Yael Chemla
2025-03-26 21:54 ` Kuniyuki Iwashima
0 siblings, 1 reply; 13+ messages in thread
From: Yael Chemla @ 2025-03-26 13:46 UTC (permalink / raw)
To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: Kuniyuki Iwashima, netdev
On 17/02/2025 21:11, Kuniyuki Iwashima wrote:
> After the cited commit, dev_net(dev) is fetched before holding RTNL
> and passed to __unregister_netdevice_notifier_net().
>
> However, dev_net(dev) might be different after holding RTNL.
>
> In the reported case [0], while removing a VF device, its netns was
> being dismantled and the VF was moved to init_net.
>
> So the following sequence is basically illegal when dev was fetched
> without lookup:
>
> net = dev_net(dev);
> rtnl_net_lock(net);
>
> Let's use a new helper rtnl_net_dev_lock() to fix the race.
>
> It fetches dev_net_rcu(dev), bumps its net->passive, and checks if
> dev_net_rcu(dev) is changed after rtnl_net_lock().
>
> [0]:
> BUG: KASAN: slab-use-after-free in notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
> Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl (lib/dump_stack.c:123)
> print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
> kasan_report (mm/kasan/report.c:604)
> notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
> call_netdevice_notifiers_info (net/core/dev.c:2011)
> unregister_netdevice_many_notify (net/core/dev.c:11551)
> unregister_netdevice_queue (net/core/dev.c:11487)
> unregister_netdev (net/core/dev.c:11635)
> mlx5e_remove (drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552 drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579) mlx5_core
> auxiliary_bus_remove (drivers/base/auxiliary.c:230)
> device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
> bus_remove_device (./include/linux/kobject.h:193 drivers/base/base.h:73 drivers/base/bus.c:583)
> device_del (drivers/base/power/power.h:142 drivers/base/core.c:3855)
> mlx5_rescan_drivers_locked (./include/linux/auxiliary_bus.h:241 drivers/net/ethernet/mellanox/mlx5/core/dev.c:333 drivers/net/ethernet/mellanox/mlx5/core/dev.c:535 drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
> mlx5_unregister_device (drivers/net/ethernet/mellanox/mlx5/core/dev.c:468) mlx5_core
> mlx5_uninit_one (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 drivers/net/ethernet/mellanox/mlx5/core/main.c:1563) mlx5_core
> remove_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:965 drivers/net/ethernet/mellanox/mlx5/core/main.c:2019) mlx5_core
> pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475)
> device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
> unbind_store (drivers/base/bus.c:245)
> kernfs_fop_write_iter (fs/kernfs/file.c:338)
> vfs_write (fs/read_write.c:587 (discriminator 1) fs/read_write.c:679 (discriminator 1))
> ksys_write (fs/read_write.c:732)
> do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> RIP: 0033:0x7f6a4d5018b7
>
> Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> Reported-by: Yael Chemla <ychemla@nvidia.com>
> Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> ---
> v5:
> * Use do-while loop
>
> v4:
> * Fix build failure when !CONFIG_NET_NS
> * Use net_passive_dec()
>
> v3:
> * Bump net->passive instead of maybe_get_net()
> * Remove msleep(1) loop
> * Use rcu_access_pointer() instead of rcu_read_lock().
>
> v2:
> * Use dev_net_rcu().
> * Use msleep(1) instead of cond_resched() after maybe_get_net()
> * Remove cond_resched() after net_eq() check
>
> v1: https://lore.kernel.org/netdev/20250130232435.43622-2-kuniyu@amazon.com/
> ---
> net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b91658e8aedb..19e268568282 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2070,6 +2070,42 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> __register_netdevice_notifier_net(dst_net, nb, true);
> }
>
> +static void rtnl_net_dev_lock(struct net_device *dev)
> +{
> + bool again;
> +
> + do {
> + struct net *net;
> +
> + again = false;
> +
> + /* netns might be being dismantled. */
> + rcu_read_lock();
> + net = dev_net_rcu(dev);
> + net_passive_inc(net);
Hi Kuniyuki,
It seems we are still encountering the previously reorted issue,
even when running with your latest fix. However, the problem has become
less frequent, now requiring multiple test iterations to reproduce.
1) we identified the following warnings (each accompanied by a call
trace; only one is detailed below, though others are similar):
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 6 PID: 1105 at lib/refcount.c:25 refcount_warn_saturate
(/usr/work/linux/lib/refcount.c:25 (discriminator 1))
and also
refcount_t: underflow; use-after-free.
WARNING: CPU: 6 PID: 1105 at lib/refcount.c:28 refcount_warn_saturate
(/usr/work/linux/lib/refcount.c:28 (discriminator 1))
2) test scenario:
sets up a network topology of two VFs on different eSwitch, performs
ping tests between them, verifies traffic rules offloading, and cleans
up the environment afterward.
3) the warning is triggered upon reaching the
unregister_netdevice_notifier_dev_net when both net->ns.count and
net->passive reference counts are zero.
Call Trace:
<TASK>
? __warn (/usr/work/linux/kernel/panic.c:748)
? refcount_warn_saturate (/usr/work/linux/lib/refcount.c:25
(discriminator 1))
? report_bug (/usr/work/linux/lib/bug.c:180 /usr/work/linux/lib/bug.c:219)
? handle_bug (/usr/work/linux/arch/x86/kernel/traps.c:285)
? exc_invalid_op (/usr/work/linux/arch/x86/kernel/traps.c:309
(discriminator 1))
? asm_exc_invalid_op
(/usr/work/linux/./arch/x86/include/asm/idtentry.h:574)
? refcount_warn_saturate (/usr/work/linux/lib/refcount.c:25
(discriminator 1))
? refcount_warn_saturate (/usr/work/linux/lib/refcount.c:25
(discriminator 1))
rtnl_net_dev_lock (/usr/work/linux/./include/linux/refcount.h:190
/usr/work/linux/./include/linux/refcount.h:241
/usr/work/linux/./include/linux/refcount.h:258
/usr/work/linux/./include/net/net_namespace.h:339
/usr/work/linux/net/core/dev.c:2076)
unregister_netdevice_notifier_dev_net
(/usr/work/linux/./include/linux/list.h:218
/usr/work/linux/./include/linux/list.h:229
/usr/work/linux/net/core/dev.c:2125)
mlx5e_tc_nic_cleanup
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:5352)
mlx5_core
mlx5e_cleanup_nic_rx
(/usr/work/linux/./drivers/net/ethernet/mellanox/mlx5/core/en/fs.h:161
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:5765)
mlx5_core
mlx5e_detach_netdev
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6231)
mlx5_core
_mlx5e_suspend
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6399)
mlx5_core
mlx5e_remove
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6526
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6553)
mlx5_core
auxiliary_bus_remove (/usr/work/linux/drivers/base/auxiliary.c:230)
device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1275
/usr/work/linux/drivers/base/dd.c:1296)
bus_remove_device (/usr/work/linux/./include/linux/kobject.h:193
/usr/work/linux/drivers/base/base.h:73
/usr/work/linux/drivers/base/bus.c:586)
device_del (/usr/work/linux/drivers/base/power/power.h:142
/usr/work/linux/drivers/base/core.c:3856)
? devl_param_driverinit_value_get (/usr/work/linux/net/devlink/param.c:778)
mlx5_rescan_drivers_locked
(/usr/work/linux/./include/linux/auxiliary_bus.h:242
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:333
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:535
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
mlx5_unregister_device
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:468)
mlx5_core
mlx5_uninit_one (/usr/work/linux/./arch/x86/include/asm/bitops.h:206
/usr/work/linux/./arch/x86/include/asm/bitops.h:238
/usr/work/linux/./include/asm-generic/bitops/instrumented-non-atomic.h:142
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:1576)
mlx5_core
remove_one
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:969
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:2034)
mlx5_core
pci_device_remove (/usr/work/linux/./arch/x86/include/asm/atomic.h:23
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:457
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:2426
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:2456
/usr/work/linux/./include/linux/atomic/atomic-instrumented.h:1518
/usr/work/linux/./include/linux/pm_runtime.h:129
/usr/work/linux/drivers/pci/pci-driver.c:475)
device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1275
/usr/work/linux/drivers/base/dd.c:1296)
pci_stop_bus_device (/usr/work/linux/drivers/pci/remove.c:46
/usr/work/linux/drivers/pci/remove.c:106)
pci_stop_and_remove_bus_device (/usr/work/linux/drivers/pci/remove.c:141)
pci_iov_remove_virtfn (/usr/work/linux/drivers/pci/iov.c:377)
sriov_disable (/usr/work/linux/drivers/pci/iov.c:712 (discriminator 1)
/usr/work/linux/drivers/pci/iov.c:723 (discriminator 1))
mlx5_sriov_disable
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:208)
mlx5_core
mlx5_core_sriov_configure
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:229)
mlx5_core
sriov_numvfs_store (/usr/work/linux/./include/linux/device.h:1045
/usr/work/linux/drivers/pci/iov.c:471)
kernfs_fop_write_iter (/usr/work/linux/fs/kernfs/file.c:338)
vfs_write (/usr/work/linux/fs/read_write.c:587 (discriminator 1)
/usr/work/linux/fs/read_write.c:679 (discriminator 1))
ksys_write (/usr/work/linux/fs/read_write.c:732)
do_syscall_64 (/usr/work/linux/arch/x86/entry/common.c:52
(discriminator 1) /usr/work/linux/arch/x86/entry/common.c:83
(discriminator 1))
entry_SYSCALL_64_after_hwframe
(/usr/work/linux/arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7fbe3e9018b7
let me know if you need more information
Yael
> + rcu_read_unlock();
> +
> + rtnl_net_lock(net);
> +
> +#ifdef CONFIG_NET_NS
> + /* dev might have been moved to another netns. */
> + if (!net_eq(net, rcu_access_pointer(dev->nd_net.net))) {
> + rtnl_net_unlock(net);
> + net_passive_dec(net);
> + again = true;
> + }
> +#endif
> + } while (again);
> +}
> +
> +static void rtnl_net_dev_unlock(struct net_device *dev)
> +{
> + struct net *net = dev_net(dev);
> +
> + rtnl_net_unlock(net);
> + net_passive_dec(net);
> +}
> +
> int register_netdevice_notifier_dev_net(struct net_device *dev,
> struct notifier_block *nb,
> struct netdev_net_notifier *nn)
> @@ -2077,6 +2113,11 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
> struct net *net = dev_net(dev);
> int err;
>
> + /* rtnl_net_lock() assumes dev is not yet published by
> + * register_netdevice().
> + */
> + DEBUG_NET_WARN_ON_ONCE(!list_empty(&dev->dev_list));
> +
> rtnl_net_lock(net);
> err = __register_netdevice_notifier_net(net, nb, false);
> if (!err) {
> @@ -2093,13 +2134,12 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
> struct notifier_block *nb,
> struct netdev_net_notifier *nn)
> {
> - struct net *net = dev_net(dev);
> int err;
>
> - rtnl_net_lock(net);
> + rtnl_net_dev_lock(dev);
> list_del(&nn->list);
> - err = __unregister_netdevice_notifier_net(net, nb);
> - rtnl_net_unlock(net);
> + err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
> + rtnl_net_dev_unlock(dev);
>
> return err;
> }
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
2025-03-26 13:46 ` Yael Chemla
@ 2025-03-26 21:54 ` Kuniyuki Iwashima
2025-03-26 22:10 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-26 21:54 UTC (permalink / raw)
To: ychemla; +Cc: davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev, pabeni
From: Yael Chemla <ychemla@nvidia.com>
Date: Wed, 26 Mar 2025 15:46:40 +0200
> On 17/02/2025 21:11, Kuniyuki Iwashima wrote:
> > After the cited commit, dev_net(dev) is fetched before holding RTNL
> > and passed to __unregister_netdevice_notifier_net().
> >
> > However, dev_net(dev) might be different after holding RTNL.
> >
> > In the reported case [0], while removing a VF device, its netns was
> > being dismantled and the VF was moved to init_net.
> >
> > So the following sequence is basically illegal when dev was fetched
> > without lookup:
> >
> > net = dev_net(dev);
> > rtnl_net_lock(net);
> >
> > Let's use a new helper rtnl_net_dev_lock() to fix the race.
> >
> > It fetches dev_net_rcu(dev), bumps its net->passive, and checks if
> > dev_net_rcu(dev) is changed after rtnl_net_lock().
> >
> > [0]:
> > BUG: KASAN: slab-use-after-free in notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
> > Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> > <TASK>
> > dump_stack_lvl (lib/dump_stack.c:123)
> > print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
> > kasan_report (mm/kasan/report.c:604)
> > notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
> > call_netdevice_notifiers_info (net/core/dev.c:2011)
> > unregister_netdevice_many_notify (net/core/dev.c:11551)
> > unregister_netdevice_queue (net/core/dev.c:11487)
> > unregister_netdev (net/core/dev.c:11635)
> > mlx5e_remove (drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552 drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579) mlx5_core
> > auxiliary_bus_remove (drivers/base/auxiliary.c:230)
> > device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
> > bus_remove_device (./include/linux/kobject.h:193 drivers/base/base.h:73 drivers/base/bus.c:583)
> > device_del (drivers/base/power/power.h:142 drivers/base/core.c:3855)
> > mlx5_rescan_drivers_locked (./include/linux/auxiliary_bus.h:241 drivers/net/ethernet/mellanox/mlx5/core/dev.c:333 drivers/net/ethernet/mellanox/mlx5/core/dev.c:535 drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
> > mlx5_unregister_device (drivers/net/ethernet/mellanox/mlx5/core/dev.c:468) mlx5_core
> > mlx5_uninit_one (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 drivers/net/ethernet/mellanox/mlx5/core/main.c:1563) mlx5_core
> > remove_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:965 drivers/net/ethernet/mellanox/mlx5/core/main.c:2019) mlx5_core
> > pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475)
> > device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
> > unbind_store (drivers/base/bus.c:245)
> > kernfs_fop_write_iter (fs/kernfs/file.c:338)
> > vfs_write (fs/read_write.c:587 (discriminator 1) fs/read_write.c:679 (discriminator 1))
> > ksys_write (fs/read_write.c:732)
> > do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
> > entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> > RIP: 0033:0x7f6a4d5018b7
> >
> > Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> > Reported-by: Yael Chemla <ychemla@nvidia.com>
> > Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > Reviewed-by: Eric Dumazet <edumazet@google.com>
> > ---
> > v5:
> > * Use do-while loop
> >
> > v4:
> > * Fix build failure when !CONFIG_NET_NS
> > * Use net_passive_dec()
> >
> > v3:
> > * Bump net->passive instead of maybe_get_net()
> > * Remove msleep(1) loop
> > * Use rcu_access_pointer() instead of rcu_read_lock().
> >
> > v2:
> > * Use dev_net_rcu().
> > * Use msleep(1) instead of cond_resched() after maybe_get_net()
> > * Remove cond_resched() after net_eq() check
> >
> > v1: https://lore.kernel.org/netdev/20250130232435.43622-2-kuniyu@amazon.com/
> > ---
> > net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 44 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index b91658e8aedb..19e268568282 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -2070,6 +2070,42 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> > __register_netdevice_notifier_net(dst_net, nb, true);
> > }
> >
> > +static void rtnl_net_dev_lock(struct net_device *dev)
> > +{
> > + bool again;
> > +
> > + do {
> > + struct net *net;
> > +
> > + again = false;
> > +
> > + /* netns might be being dismantled. */
> > + rcu_read_lock();
> > + net = dev_net_rcu(dev);
> > + net_passive_inc(net);
>
> Hi Kuniyuki,
> It seems we are still encountering the previously reorted issue,
> even when running with your latest fix. However, the problem has become
> less frequent, now requiring multiple test iterations to reproduce.
Thanks for reporting!
>
> 1) we identified the following warnings (each accompanied by a call
> trace; only one is detailed below, though others are similar):
>
> refcount_t: addition on 0; use-after-free.
> WARNING: CPU: 6 PID: 1105 at lib/refcount.c:25 refcount_warn_saturate
> (/usr/work/linux/lib/refcount.c:25 (discriminator 1))
>
> and also
>
> refcount_t: underflow; use-after-free.
> WARNING: CPU: 6 PID: 1105 at lib/refcount.c:28 refcount_warn_saturate
> (/usr/work/linux/lib/refcount.c:28 (discriminator 1))
>
>
> 2) test scenario:
> sets up a network topology of two VFs on different eSwitch, performs
> ping tests between them, verifies traffic rules offloading, and cleans
> up the environment afterward.
>
> 3) the warning is triggered upon reaching the
> unregister_netdevice_notifier_dev_net when both net->ns.count and
> net->passive reference counts are zero.
It looks unlikely but I missed there is still a race window.
If dev_net_rcu() is called between synchronize_net() and dev_net_set()
in netif_change_net_namespace(), there might be no synchronize_rcu()
called after that and net_passive_dec() could be called in cleanup_net()
earlier than net_passive_inc() ... ?
Could you test this ?
---8<---
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index bd57d8fb54f1..c275e95c83ab 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -337,9 +337,9 @@ static inline void net_passive_dec(struct net *net)
}
#endif
-static inline void net_passive_inc(struct net *net)
+static inline bool net_passive_inc(struct net *net)
{
- refcount_inc(&net->passive);
+ return refcount_inc_not_zero(&net->passive);
}
/* Returns true if the netns initialization is completed successfully */
diff --git a/net/core/dev.c b/net/core/dev.c
index b597cc27a115..baff9568a34f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2087,7 +2087,11 @@ static void rtnl_net_dev_lock(struct net_device *dev)
/* netns might be being dismantled. */
rcu_read_lock();
net = dev_net_rcu(dev);
- net_passive_inc(net);
+ if (!net_passive_inc(net)) {
+ rcu_read_unlock();
+ msleep(1);
+ continue;
+ }
rcu_read_unlock();
rtnl_net_lock(net);
---8<---
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
2025-03-26 21:54 ` Kuniyuki Iwashima
@ 2025-03-26 22:10 ` Eric Dumazet
2025-03-26 22:32 ` Kuniyuki Iwashima
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2025-03-26 22:10 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: ychemla, davem, horms, kuba, kuni1840, netdev, pabeni
On Wed, Mar 26, 2025 at 10:55 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Yael Chemla <ychemla@nvidia.com>
> Date: Wed, 26 Mar 2025 15:46:40 +0200
> > On 17/02/2025 21:11, Kuniyuki Iwashima wrote:
> > > After the cited commit, dev_net(dev) is fetched before holding RTNL
> > > and passed to __unregister_netdevice_notifier_net().
> > >
> > > However, dev_net(dev) might be different after holding RTNL.
> > >
> > > In the reported case [0], while removing a VF device, its netns was
> > > being dismantled and the VF was moved to init_net.
> > >
> > > So the following sequence is basically illegal when dev was fetched
> > > without lookup:
> > >
> > > net = dev_net(dev);
> > > rtnl_net_lock(net);
> > >
> > > Let's use a new helper rtnl_net_dev_lock() to fix the race.
> > >
> > > It fetches dev_net_rcu(dev), bumps its net->passive, and checks if
> > > dev_net_rcu(dev) is changed after rtnl_net_lock().
> > >
> > > [0]:
> > > BUG: KASAN: slab-use-after-free in notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
> > > Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl (lib/dump_stack.c:123)
> > > print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
> > > kasan_report (mm/kasan/report.c:604)
> > > notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
> > > call_netdevice_notifiers_info (net/core/dev.c:2011)
> > > unregister_netdevice_many_notify (net/core/dev.c:11551)
> > > unregister_netdevice_queue (net/core/dev.c:11487)
> > > unregister_netdev (net/core/dev.c:11635)
> > > mlx5e_remove (drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552 drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579) mlx5_core
> > > auxiliary_bus_remove (drivers/base/auxiliary.c:230)
> > > device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
> > > bus_remove_device (./include/linux/kobject.h:193 drivers/base/base.h:73 drivers/base/bus.c:583)
> > > device_del (drivers/base/power/power.h:142 drivers/base/core.c:3855)
> > > mlx5_rescan_drivers_locked (./include/linux/auxiliary_bus.h:241 drivers/net/ethernet/mellanox/mlx5/core/dev.c:333 drivers/net/ethernet/mellanox/mlx5/core/dev.c:535 drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
> > > mlx5_unregister_device (drivers/net/ethernet/mellanox/mlx5/core/dev.c:468) mlx5_core
> > > mlx5_uninit_one (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 drivers/net/ethernet/mellanox/mlx5/core/main.c:1563) mlx5_core
> > > remove_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:965 drivers/net/ethernet/mellanox/mlx5/core/main.c:2019) mlx5_core
> > > pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475)
> > > device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
> > > unbind_store (drivers/base/bus.c:245)
> > > kernfs_fop_write_iter (fs/kernfs/file.c:338)
> > > vfs_write (fs/read_write.c:587 (discriminator 1) fs/read_write.c:679 (discriminator 1))
> > > ksys_write (fs/read_write.c:732)
> > > do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
> > > entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> > > RIP: 0033:0x7f6a4d5018b7
> > >
> > > Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> > > Reported-by: Yael Chemla <ychemla@nvidia.com>
> > > Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > Reviewed-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > > v5:
> > > * Use do-while loop
> > >
> > > v4:
> > > * Fix build failure when !CONFIG_NET_NS
> > > * Use net_passive_dec()
> > >
> > > v3:
> > > * Bump net->passive instead of maybe_get_net()
> > > * Remove msleep(1) loop
> > > * Use rcu_access_pointer() instead of rcu_read_lock().
> > >
> > > v2:
> > > * Use dev_net_rcu().
> > > * Use msleep(1) instead of cond_resched() after maybe_get_net()
> > > * Remove cond_resched() after net_eq() check
> > >
> > > v1: https://lore.kernel.org/netdev/20250130232435.43622-2-kuniyu@amazon.com/
> > > ---
> > > net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
> > > 1 file changed, 44 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index b91658e8aedb..19e268568282 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -2070,6 +2070,42 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> > > __register_netdevice_notifier_net(dst_net, nb, true);
> > > }
> > >
> > > +static void rtnl_net_dev_lock(struct net_device *dev)
> > > +{
> > > + bool again;
> > > +
> > > + do {
> > > + struct net *net;
> > > +
> > > + again = false;
> > > +
> > > + /* netns might be being dismantled. */
> > > + rcu_read_lock();
> > > + net = dev_net_rcu(dev);
> > > + net_passive_inc(net);
> >
> > Hi Kuniyuki,
> > It seems we are still encountering the previously reorted issue,
> > even when running with your latest fix. However, the problem has become
> > less frequent, now requiring multiple test iterations to reproduce.
>
> Thanks for reporting!
>
>
> >
> > 1) we identified the following warnings (each accompanied by a call
> > trace; only one is detailed below, though others are similar):
> >
> > refcount_t: addition on 0; use-after-free.
> > WARNING: CPU: 6 PID: 1105 at lib/refcount.c:25 refcount_warn_saturate
> > (/usr/work/linux/lib/refcount.c:25 (discriminator 1))
> >
> > and also
> >
> > refcount_t: underflow; use-after-free.
> > WARNING: CPU: 6 PID: 1105 at lib/refcount.c:28 refcount_warn_saturate
> > (/usr/work/linux/lib/refcount.c:28 (discriminator 1))
> >
> >
> > 2) test scenario:
> > sets up a network topology of two VFs on different eSwitch, performs
> > ping tests between them, verifies traffic rules offloading, and cleans
> > up the environment afterward.
> >
> > 3) the warning is triggered upon reaching the
> > unregister_netdevice_notifier_dev_net when both net->ns.count and
> > net->passive reference counts are zero.
>
> It looks unlikely but I missed there is still a race window.
>
> If dev_net_rcu() is called between synchronize_net() and dev_net_set()
> in netif_change_net_namespace(), there might be no synchronize_rcu()
> called after that and net_passive_dec() could be called in cleanup_net()
> earlier than net_passive_inc() ... ?
>
> Could you test this ?
>
> ---8<---
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index bd57d8fb54f1..c275e95c83ab 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -337,9 +337,9 @@ static inline void net_passive_dec(struct net *net)
> }
> #endif
>
> -static inline void net_passive_inc(struct net *net)
> +static inline bool net_passive_inc(struct net *net)
> {
> - refcount_inc(&net->passive);
> + return refcount_inc_not_zero(&net->passive);
Hmm
We want the two oher net_passive_inc() callers to have refcount safety.
Please add a new net_passive_inc_not_zero()
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
2025-03-26 22:10 ` Eric Dumazet
@ 2025-03-26 22:32 ` Kuniyuki Iwashima
2025-04-01 20:49 ` Yael Chemla
0 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-26 22:32 UTC (permalink / raw)
To: edumazet; +Cc: davem, horms, kuba, kuni1840, kuniyu, netdev, pabeni, ychemla
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 26 Mar 2025 23:10:04 +0100
> On Wed, Mar 26, 2025 at 10:55 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Yael Chemla <ychemla@nvidia.com>
> > Date: Wed, 26 Mar 2025 15:46:40 +0200
> > > On 17/02/2025 21:11, Kuniyuki Iwashima wrote:
> > > > After the cited commit, dev_net(dev) is fetched before holding RTNL
> > > > and passed to __unregister_netdevice_notifier_net().
> > > >
> > > > However, dev_net(dev) might be different after holding RTNL.
> > > >
> > > > In the reported case [0], while removing a VF device, its netns was
> > > > being dismantled and the VF was moved to init_net.
> > > >
> > > > So the following sequence is basically illegal when dev was fetched
> > > > without lookup:
> > > >
> > > > net = dev_net(dev);
> > > > rtnl_net_lock(net);
> > > >
> > > > Let's use a new helper rtnl_net_dev_lock() to fix the race.
> > > >
> > > > It fetches dev_net_rcu(dev), bumps its net->passive, and checks if
> > > > dev_net_rcu(dev) is changed after rtnl_net_lock().
> > > >
> > > > [0]:
> > > > BUG: KASAN: slab-use-after-free in notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
> > > > Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > Call Trace:
> > > > <TASK>
> > > > dump_stack_lvl (lib/dump_stack.c:123)
> > > > print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
> > > > kasan_report (mm/kasan/report.c:604)
> > > > notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
> > > > call_netdevice_notifiers_info (net/core/dev.c:2011)
> > > > unregister_netdevice_many_notify (net/core/dev.c:11551)
> > > > unregister_netdevice_queue (net/core/dev.c:11487)
> > > > unregister_netdev (net/core/dev.c:11635)
> > > > mlx5e_remove (drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552 drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579) mlx5_core
> > > > auxiliary_bus_remove (drivers/base/auxiliary.c:230)
> > > > device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
> > > > bus_remove_device (./include/linux/kobject.h:193 drivers/base/base.h:73 drivers/base/bus.c:583)
> > > > device_del (drivers/base/power/power.h:142 drivers/base/core.c:3855)
> > > > mlx5_rescan_drivers_locked (./include/linux/auxiliary_bus.h:241 drivers/net/ethernet/mellanox/mlx5/core/dev.c:333 drivers/net/ethernet/mellanox/mlx5/core/dev.c:535 drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
> > > > mlx5_unregister_device (drivers/net/ethernet/mellanox/mlx5/core/dev.c:468) mlx5_core
> > > > mlx5_uninit_one (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 drivers/net/ethernet/mellanox/mlx5/core/main.c:1563) mlx5_core
> > > > remove_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:965 drivers/net/ethernet/mellanox/mlx5/core/main.c:2019) mlx5_core
> > > > pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475)
> > > > device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
> > > > unbind_store (drivers/base/bus.c:245)
> > > > kernfs_fop_write_iter (fs/kernfs/file.c:338)
> > > > vfs_write (fs/read_write.c:587 (discriminator 1) fs/read_write.c:679 (discriminator 1))
> > > > ksys_write (fs/read_write.c:732)
> > > > do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
> > > > entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> > > > RIP: 0033:0x7f6a4d5018b7
> > > >
> > > > Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> > > > Reported-by: Yael Chemla <ychemla@nvidia.com>
> > > > Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > Reviewed-by: Eric Dumazet <edumazet@google.com>
> > > > ---
> > > > v5:
> > > > * Use do-while loop
> > > >
> > > > v4:
> > > > * Fix build failure when !CONFIG_NET_NS
> > > > * Use net_passive_dec()
> > > >
> > > > v3:
> > > > * Bump net->passive instead of maybe_get_net()
> > > > * Remove msleep(1) loop
> > > > * Use rcu_access_pointer() instead of rcu_read_lock().
> > > >
> > > > v2:
> > > > * Use dev_net_rcu().
> > > > * Use msleep(1) instead of cond_resched() after maybe_get_net()
> > > > * Remove cond_resched() after net_eq() check
> > > >
> > > > v1: https://lore.kernel.org/netdev/20250130232435.43622-2-kuniyu@amazon.com/
> > > > ---
> > > > net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
> > > > 1 file changed, 44 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index b91658e8aedb..19e268568282 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -2070,6 +2070,42 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> > > > __register_netdevice_notifier_net(dst_net, nb, true);
> > > > }
> > > >
> > > > +static void rtnl_net_dev_lock(struct net_device *dev)
> > > > +{
> > > > + bool again;
> > > > +
> > > > + do {
> > > > + struct net *net;
> > > > +
> > > > + again = false;
> > > > +
> > > > + /* netns might be being dismantled. */
> > > > + rcu_read_lock();
> > > > + net = dev_net_rcu(dev);
> > > > + net_passive_inc(net);
> > >
> > > Hi Kuniyuki,
> > > It seems we are still encountering the previously reorted issue,
> > > even when running with your latest fix. However, the problem has become
> > > less frequent, now requiring multiple test iterations to reproduce.
> >
> > Thanks for reporting!
> >
> >
> > >
> > > 1) we identified the following warnings (each accompanied by a call
> > > trace; only one is detailed below, though others are similar):
> > >
> > > refcount_t: addition on 0; use-after-free.
> > > WARNING: CPU: 6 PID: 1105 at lib/refcount.c:25 refcount_warn_saturate
> > > (/usr/work/linux/lib/refcount.c:25 (discriminator 1))
> > >
> > > and also
> > >
> > > refcount_t: underflow; use-after-free.
> > > WARNING: CPU: 6 PID: 1105 at lib/refcount.c:28 refcount_warn_saturate
> > > (/usr/work/linux/lib/refcount.c:28 (discriminator 1))
> > >
> > >
> > > 2) test scenario:
> > > sets up a network topology of two VFs on different eSwitch, performs
> > > ping tests between them, verifies traffic rules offloading, and cleans
> > > up the environment afterward.
> > >
> > > 3) the warning is triggered upon reaching the
> > > unregister_netdevice_notifier_dev_net when both net->ns.count and
> > > net->passive reference counts are zero.
> >
> > It looks unlikely but I missed there is still a race window.
> >
> > If dev_net_rcu() is called between synchronize_net() and dev_net_set()
> > in netif_change_net_namespace(), there might be no synchronize_rcu()
> > called after that and net_passive_dec() could be called in cleanup_net()
> > earlier than net_passive_inc() ... ?
> >
> > Could you test this ?
> >
> > ---8<---
> > diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> > index bd57d8fb54f1..c275e95c83ab 100644
> > --- a/include/net/net_namespace.h
> > +++ b/include/net/net_namespace.h
> > @@ -337,9 +337,9 @@ static inline void net_passive_dec(struct net *net)
> > }
> > #endif
> >
> > -static inline void net_passive_inc(struct net *net)
> > +static inline bool net_passive_inc(struct net *net)
> > {
> > - refcount_inc(&net->passive);
> > + return refcount_inc_not_zero(&net->passive);
>
> Hmm
>
> We want the two oher net_passive_inc() callers to have refcount safety.
>
> Please add a new net_passive_inc_not_zero()
Sure, will do.
Also, I'll remove the weird do-while {} and do everything under
ifdef CONFIG_NET_NS guard. If it's disabled, just calling
rtnl_net_lock() for &init_net is fine.
---8<---
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index bd57d8fb54f1..c91b13e6fb12 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -342,6 +342,11 @@ static inline void net_passive_inc(struct net *net)
refcount_inc(&net->passive);
}
+static inline bool net_passive_inc_not_zero(struct net *net)
+{
+ return refcount_inc_not_zero(&net->passive);
+}
+
/* Returns true if the netns initialization is completed successfully */
static inline bool net_initialized(const struct net *net)
{
diff --git a/net/core/dev.c b/net/core/dev.c
index b597cc27a115..408659d238b8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2077,38 +2077,43 @@ static void __move_netdevice_notifier_net(struct net *src_net,
static void rtnl_net_dev_lock(struct net_device *dev)
{
- bool again;
-
- do {
- struct net *net;
-
- again = false;
+#ifdef CONFIG_NET_NS
+ struct net *net;
- /* netns might be being dismantled. */
- rcu_read_lock();
- net = dev_net_rcu(dev);
- net_passive_inc(net);
+again:
+ /* netns might be being dismantled. */
+ rcu_read_lock();
+ net = dev_net_rcu(dev);
+ if (!net_passive_inc_not_zero(net)) {
rcu_read_unlock();
+ msleep(1);
+ goto again;
+ }
+ rcu_read_unlock();
- rtnl_net_lock(net);
+ rtnl_net_lock(net);
-#ifdef CONFIG_NET_NS
- /* dev might have been moved to another netns. */
- if (!net_eq(net, rcu_access_pointer(dev->nd_net.net))) {
- rtnl_net_unlock(net);
- net_passive_dec(net);
- again = true;
- }
+ /* dev might have been moved to another netns. */
+ if (!net_eq(net, rcu_access_pointer(dev->nd_net.net))) {
+ rtnl_net_unlock(net);
+ net_passive_dec(net);
+ goto again;
+ }
+#else
+ rtnl_net_lock(&init_net);
#endif
- } while (again);
}
static void rtnl_net_dev_unlock(struct net_device *dev)
{
+#ifdef CONFIG_NET_NS
struct net *net = dev_net(dev);
rtnl_net_unlock(net);
net_passive_dec(net);
+#else
+ rtnl_net_unlock(&init_net);
+#endif
}
int register_netdevice_notifier_dev_net(struct net_device *dev,
---8<---
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
2025-03-26 22:32 ` Kuniyuki Iwashima
@ 2025-04-01 20:49 ` Yael Chemla
2025-04-01 21:58 ` Kuniyuki Iwashima
0 siblings, 1 reply; 13+ messages in thread
From: Yael Chemla @ 2025-04-01 20:49 UTC (permalink / raw)
To: Kuniyuki Iwashima, edumazet; +Cc: davem, horms, kuba, kuni1840, netdev, pabeni
On 27/03/2025 0:32, Kuniyuki Iwashima wrote:
> From: Eric Dumazet <edumazet@google.com>
> Date: Wed, 26 Mar 2025 23:10:04 +0100
>> On Wed, Mar 26, 2025 at 10:55 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>>>
>>> From: Yael Chemla <ychemla@nvidia.com>
>>> Date: Wed, 26 Mar 2025 15:46:40 +0200
>>>> On 17/02/2025 21:11, Kuniyuki Iwashima wrote:
>>>>> After the cited commit, dev_net(dev) is fetched before holding RTNL
>>>>> and passed to __unregister_netdevice_notifier_net().
>>>>>
>>>>> However, dev_net(dev) might be different after holding RTNL.
>>>>>
>>>>> In the reported case [0], while removing a VF device, its netns was
>>>>> being dismantled and the VF was moved to init_net.
>>>>>
>>>>> So the following sequence is basically illegal when dev was fetched
>>>>> without lookup:
>>>>>
>>>>> net = dev_net(dev);
>>>>> rtnl_net_lock(net);
>>>>>
>>>>> Let's use a new helper rtnl_net_dev_lock() to fix the race.
>>>>>
>>>>> It fetches dev_net_rcu(dev), bumps its net->passive, and checks if
>>>>> dev_net_rcu(dev) is changed after rtnl_net_lock().
>>>>>
>>>>> [0]:
>>>>> BUG: KASAN: slab-use-after-free in notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
>>>>> Read of size 8 at addr ffff88810cefb4c8 by task test-bridge-lag/21127
>>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>>>>> Call Trace:
>>>>> <TASK>
>>>>> dump_stack_lvl (lib/dump_stack.c:123)
>>>>> print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
>>>>> kasan_report (mm/kasan/report.c:604)
>>>>> notifier_call_chain (kernel/notifier.c:75 (discriminator 2))
>>>>> call_netdevice_notifiers_info (net/core/dev.c:2011)
>>>>> unregister_netdevice_many_notify (net/core/dev.c:11551)
>>>>> unregister_netdevice_queue (net/core/dev.c:11487)
>>>>> unregister_netdev (net/core/dev.c:11635)
>>>>> mlx5e_remove (drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6552 drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6579) mlx5_core
>>>>> auxiliary_bus_remove (drivers/base/auxiliary.c:230)
>>>>> device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
>>>>> bus_remove_device (./include/linux/kobject.h:193 drivers/base/base.h:73 drivers/base/bus.c:583)
>>>>> device_del (drivers/base/power/power.h:142 drivers/base/core.c:3855)
>>>>> mlx5_rescan_drivers_locked (./include/linux/auxiliary_bus.h:241 drivers/net/ethernet/mellanox/mlx5/core/dev.c:333 drivers/net/ethernet/mellanox/mlx5/core/dev.c:535 drivers/net/ethernet/mellanox/mlx5/core/dev.c:549) mlx5_core
>>>>> mlx5_unregister_device (drivers/net/ethernet/mellanox/mlx5/core/dev.c:468) mlx5_core
>>>>> mlx5_uninit_one (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 drivers/net/ethernet/mellanox/mlx5/core/main.c:1563) mlx5_core
>>>>> remove_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:965 drivers/net/ethernet/mellanox/mlx5/core/main.c:2019) mlx5_core
>>>>> pci_device_remove (./include/linux/pm_runtime.h:129 drivers/pci/pci-driver.c:475)
>>>>> device_release_driver_internal (drivers/base/dd.c:1275 drivers/base/dd.c:1296)
>>>>> unbind_store (drivers/base/bus.c:245)
>>>>> kernfs_fop_write_iter (fs/kernfs/file.c:338)
>>>>> vfs_write (fs/read_write.c:587 (discriminator 1) fs/read_write.c:679 (discriminator 1))
>>>>> ksys_write (fs/read_write.c:732)
>>>>> do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
>>>>> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>>>>> RIP: 0033:0x7f6a4d5018b7
>>>>>
>>>>> Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
>>>>> Reported-by: Yael Chemla <ychemla@nvidia.com>
>>>>> Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
>>>>> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
>>>>> Reviewed-by: Eric Dumazet <edumazet@google.com>
>>>>> ---
>>>>> v5:
>>>>> * Use do-while loop
>>>>>
>>>>> v4:
>>>>> * Fix build failure when !CONFIG_NET_NS
>>>>> * Use net_passive_dec()
>>>>>
>>>>> v3:
>>>>> * Bump net->passive instead of maybe_get_net()
>>>>> * Remove msleep(1) loop
>>>>> * Use rcu_access_pointer() instead of rcu_read_lock().
>>>>>
>>>>> v2:
>>>>> * Use dev_net_rcu().
>>>>> * Use msleep(1) instead of cond_resched() after maybe_get_net()
>>>>> * Remove cond_resched() after net_eq() check
>>>>>
>>>>> v1: https://lore.kernel.org/netdev/20250130232435.43622-2-kuniyu@amazon.com/
>>>>> ---
>>>>> net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>>>>> 1 file changed, 44 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>> index b91658e8aedb..19e268568282 100644
>>>>> --- a/net/core/dev.c
>>>>> +++ b/net/core/dev.c
>>>>> @@ -2070,6 +2070,42 @@ static void __move_netdevice_notifier_net(struct net *src_net,
>>>>> __register_netdevice_notifier_net(dst_net, nb, true);
>>>>> }
>>>>>
>>>>> +static void rtnl_net_dev_lock(struct net_device *dev)
>>>>> +{
>>>>> + bool again;
>>>>> +
>>>>> + do {
>>>>> + struct net *net;
>>>>> +
>>>>> + again = false;
>>>>> +
>>>>> + /* netns might be being dismantled. */
>>>>> + rcu_read_lock();
>>>>> + net = dev_net_rcu(dev);
>>>>> + net_passive_inc(net);
>>>>
>>>> Hi Kuniyuki,
>>>> It seems we are still encountering the previously reorted issue,
>>>> even when running with your latest fix. However, the problem has become
>>>> less frequent, now requiring multiple test iterations to reproduce.
>>>
>>> Thanks for reporting!
>>>
>>>
>>>>
>>>> 1) we identified the following warnings (each accompanied by a call
>>>> trace; only one is detailed below, though others are similar):
>>>>
>>>> refcount_t: addition on 0; use-after-free.
>>>> WARNING: CPU: 6 PID: 1105 at lib/refcount.c:25 refcount_warn_saturate
>>>> (/usr/work/linux/lib/refcount.c:25 (discriminator 1))
>>>>
>>>> and also
>>>>
>>>> refcount_t: underflow; use-after-free.
>>>> WARNING: CPU: 6 PID: 1105 at lib/refcount.c:28 refcount_warn_saturate
>>>> (/usr/work/linux/lib/refcount.c:28 (discriminator 1))
>>>>
>>>>
>>>> 2) test scenario:
>>>> sets up a network topology of two VFs on different eSwitch, performs
>>>> ping tests between them, verifies traffic rules offloading, and cleans
>>>> up the environment afterward.
>>>>
>>>> 3) the warning is triggered upon reaching the
>>>> unregister_netdevice_notifier_dev_net when both net->ns.count and
>>>> net->passive reference counts are zero.
>>>
>>> It looks unlikely but I missed there is still a race window.
>>>
>>> If dev_net_rcu() is called between synchronize_net() and dev_net_set()
>>> in netif_change_net_namespace(), there might be no synchronize_rcu()
>>> called after that and net_passive_dec() could be called in cleanup_net()
>>> earlier than net_passive_inc() ... ?
>>>
>>> Could you test this ?
>>>
>>> ---8<---
>>> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
>>> index bd57d8fb54f1..c275e95c83ab 100644
>>> --- a/include/net/net_namespace.h
>>> +++ b/include/net/net_namespace.h
>>> @@ -337,9 +337,9 @@ static inline void net_passive_dec(struct net *net)
>>> }
>>> #endif
>>>
>>> -static inline void net_passive_inc(struct net *net)
>>> +static inline bool net_passive_inc(struct net *net)
>>> {
>>> - refcount_inc(&net->passive);
>>> + return refcount_inc_not_zero(&net->passive);
>>
>> Hmm
>>
>> We want the two oher net_passive_inc() callers to have refcount safety.
>>
>> Please add a new net_passive_inc_not_zero()
>
> Sure, will do.
>
> Also, I'll remove the weird do-while {} and do everything under
> ifdef CONFIG_NET_NS guard. If it's disabled, just calling
> rtnl_net_lock() for &init_net is fine.
>
> ---8<---
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index bd57d8fb54f1..c91b13e6fb12 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -342,6 +342,11 @@ static inline void net_passive_inc(struct net *net)
> refcount_inc(&net->passive);
> }
>
> +static inline bool net_passive_inc_not_zero(struct net *net)
> +{
> + return refcount_inc_not_zero(&net->passive);
> +}
> +
> /* Returns true if the netns initialization is completed successfully */
> static inline bool net_initialized(const struct net *net)
> {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b597cc27a115..408659d238b8 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2077,38 +2077,43 @@ static void __move_netdevice_notifier_net(struct net *src_net,
>
> static void rtnl_net_dev_lock(struct net_device *dev)
> {
> - bool again;
> -
> - do {
> - struct net *net;
> -
> - again = false;
> +#ifdef CONFIG_NET_NS
> + struct net *net;
>
> - /* netns might be being dismantled. */
> - rcu_read_lock();
> - net = dev_net_rcu(dev);
> - net_passive_inc(net);
> +again:
> + /* netns might be being dismantled. */
> + rcu_read_lock();
> + net = dev_net_rcu(dev);
> + if (!net_passive_inc_not_zero(net)) {
> rcu_read_unlock();
> + msleep(1);
> + goto again;
> + }
> + rcu_read_unlock();
>
> - rtnl_net_lock(net);
> + rtnl_net_lock(net);
>
> -#ifdef CONFIG_NET_NS
> - /* dev might have been moved to another netns. */
> - if (!net_eq(net, rcu_access_pointer(dev->nd_net.net))) {
> - rtnl_net_unlock(net);
> - net_passive_dec(net);
> - again = true;
> - }
> + /* dev might have been moved to another netns. */
> + if (!net_eq(net, rcu_access_pointer(dev->nd_net.net))) {
> + rtnl_net_unlock(net);
> + net_passive_dec(net);
> + goto again;
> + }
> +#else
> + rtnl_net_lock(&init_net);
> #endif
> - } while (again);
> }
>
> static void rtnl_net_dev_unlock(struct net_device *dev)
> {
> +#ifdef CONFIG_NET_NS
> struct net *net = dev_net(dev);
>
> rtnl_net_unlock(net);
> net_passive_dec(net);
> +#else
> + rtnl_net_unlock(&init_net);
> +#endif
> }
>
> int register_netdevice_notifier_dev_net(struct net_device *dev,
> ---8<---
Hi Kuniyuki,
Sorry for the delay (I was OOO). I tested your patch, and while the race
occurs much less frequently, it still happens—see the warnings and call
traces below.
Additionally, in some cases, the test which reproduce the race hang.
Debugging shows that we're stuck in an endless loop inside
rtnl_net_dev_lock because the passive refcount is already zero, causing
net_passive_inc_not_zero to return false, thus it go to "again" and this
repeats without ending.
I suspect, as you mentioned before, that in such cases, the passive
refcount was decreased from cleanup_net.
warnings and call traces:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 4 PID: 27219 at lib/refcount.c:25 refcount_warn_saturate
(/usr/work/linux/lib/refcount.c:25 (discriminator 1))
Call Trace:
<TASK>
? __warn (/usr/work/linux/kernel/panic.c:746)
? refcount_warn_saturate (/usr/work/linux/lib/refcount.c:25
(discriminator 1))
? report_bug (/usr/work/linux/lib/bug.c:207)
? handle_bug (/usr/work/linux/arch/x86/kernel/traps.c:285)
? exc_invalid_op (/usr/work/linux/arch/x86/kernel/traps.c:309
(discriminator 1))
? asm_exc_invalid_op (/usr/work/linux/./arch/x86/include/asm/idtentry.h:574)
? refcount_warn_saturate (/usr/work/linux/lib/refcount.c:25
(discriminator 1))
? refcount_warn_saturate (/usr/work/linux/lib/refcount.c:25
(discriminator 1))
rtnl_net_dev_lock (/usr/work/linux/net/core/dev.c:2091)
unregister_netdevice_notifier_dev_net
(/usr/work/linux/./include/linux/list.h:218
/usr/work/linux/./include/linux/list.h:229
/usr/work/linux/net/core/dev.c:2130)
mlx5e_tc_nic_cleanup
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:5355)
mlx5_core
mlx5e_cleanup_nic_rx
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:5765
(discriminator 1)) mlx5_core
mlx5e_detach_netdev
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6391)
mlx5_core
_mlx5e_suspend
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6403)
mlx5_core
mlx5e_remove
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6431
(discriminator 1)
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6507
(discriminator 1)) mlx5_core
auxiliary_bus_remove (/usr/work/linux/drivers/base/auxiliary.c:230)
device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1273
/usr/work/linux/drivers/base/dd.c:1296)
bus_remove_device (/usr/work/linux/./include/linux/kobject.h:193
/usr/work/linux/drivers/base/base.h:73
/usr/work/linux/drivers/base/bus.c:586)
device_del (/usr/work/linux/drivers/base/power/power.h:142
/usr/work/linux/drivers/base/core.c:3856)
? devl_param_driverinit_value_get (/usr/work/linux/net/devlink/param.c:778)
mlx5_rescan_drivers_locked
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:550)
mlx5_core
mlx5_unregister_device
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:448)
mlx5_core
mlx5_uninit_one
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:1587)
mlx5_core
remove_one
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:281
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:973
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:2034)
mlx5_core
pci_device_remove (/usr/work/linux/./arch/x86/include/asm/atomic.h:23
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:457
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:2426
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:2456
/usr/work/linux/./include/linux/atomic/atomic-instrumented.h:1518
/usr/work/linux/./include/linux/pm_runtime.h:129
/usr/work/linux/drivers/pci/pci-driver.c:475)
device_release_driver_internal (/usr/work/linux/drivers/base/dd.c:1273
/usr/work/linux/drivers/base/dd.c:1296)
pci_stop_bus_device (/usr/work/linux/drivers/pci/remove.c:46
/usr/work/linux/drivers/pci/remove.c:106)
pci_stop_and_remove_bus_device (/usr/work/linux/drivers/pci/remove.c:141)
pci_iov_remove_virtfn (/usr/work/linux/drivers/pci/iov.c:377)
sriov_disable (/usr/work/linux/drivers/pci/iov.c:712 (discriminator 1)
/usr/work/linux/drivers/pci/iov.c:723 (discriminator 1))
mlx5_sriov_disable
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:209)
mlx5_core
mlx5_core_sriov_configure
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:195
(discriminator 1)
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:222
(discriminator 1)) mlx5_core
sriov_numvfs_store (/usr/work/linux/drivers/pci/iov.c:450)
kernfs_fop_write_iter (/usr/work/linux/fs/kernfs/file.c:334)
vfs_write (/usr/work/linux/./include/linux/rcu_sync.h:36 (discriminator
2) /usr/work/linux/./include/linux/percpu-rwsem.h:62 (discriminator 2)
/usr/work/linux/./include/linux/fs.h:1785 (discriminator 2)
/usr/work/linux/./include/linux/fs.h:1921 (discriminator 2)
/usr/work/linux/./include/linux/fs.h:3035 (discriminator 2)
/usr/work/linux/fs/read_write.c:675 (discriminator 2))
ksys_write (/usr/work/linux/fs/read_write.c:732)
do_syscall_64 (/usr/work/linux/arch/x86/entry/common.c:52 (discriminator
1) /usr/work/linux/arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe
(/usr/work/linux/arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7fb7b6f018b7
and another warning:
refcount_t: underflow; use-after-free.
[ 3860.462923] WARNING: CPU: 4 PID: 27219 at lib/refcount.c:28
refcount_warn_saturate (/usr/work/linux/lib/refcount.c:28 (discriminator
1)).
[ 3860.479887] Call Trace:
[ 3860.480199] <TASK>
[ 3860.480478] ? __warn (/usr/work/linux/kernel/panic.c:746)
[ 3860.480849] ? refcount_warn_saturate
(/usr/work/linux/lib/refcount.c:28 (discriminator 1))
[ 3860.481327] ? report_bug (/usr/work/linux/lib/bug.c:207)
[ 3860.481797] ? handle_bug (/usr/work/linux/arch/x86/kernel/traps.c:285)
[ 3860.482190] ? exc_invalid_op
(/usr/work/linux/arch/x86/kernel/traps.c:309 (discriminator 1))
[ 3860.482610] ? asm_exc_invalid_op
(/usr/work/linux/./arch/x86/include/asm/idtentry.h:574)
[ 3860.483057] ? refcount_warn_saturate
(/usr/work/linux/lib/refcount.c:28 (discriminator 1))
[ 3860.483535] ? refcount_warn_saturate
(/usr/work/linux/lib/refcount.c:28 (discriminator 1))
[ 3860.484007] unregister_netdevice_notifier_dev_net
(/usr/work/linux/net/core/dev.c:2135)
[ 3860.484569] mlx5e_tc_nic_cleanup
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:5355)
mlx5_core
[ 3860.485148] mlx5e_cleanup_nic_rx
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:5765
(discriminator 1)) mlx5_core
[ 3860.485720] mlx5e_detach_netdev
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6391)
mlx5_core
[ 3860.486337] _mlx5e_suspend
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6403)
mlx5_core
[ 3860.486853] mlx5e_remove
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6431
(discriminator 1)
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/en_main.c:6507
(discriminator 1)) mlx5_core
[ 3860.487377] auxiliary_bus_remove
(/usr/work/linux/drivers/base/auxiliary.c:230)
[ 3860.487824] device_release_driver_internal
(/usr/work/linux/drivers/base/dd.c:1273
/usr/work/linux/drivers/base/dd.c:1296)
[ 3860.488347] bus_remove_device
(/usr/work/linux/./include/linux/kobject.h:193
/usr/work/linux/drivers/base/base.h:73
/usr/work/linux/drivers/base/bus.c:586)
[ 3860.488776] device_del
(/usr/work/linux/drivers/base/power/power.h:142
/usr/work/linux/drivers/base/core.c:3856)
[ 3860.489167] ? devl_param_driverinit_value_get
(/usr/work/linux/net/devlink/param.c:778)
[ 3860.489717] mlx5_rescan_drivers_locked
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:550)
mlx5_core
[ 3860.490376] mlx5_unregister_device
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/dev.c:448)
mlx5_core
[ 3860.490937] mlx5_uninit_one
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:1587)
mlx5_core
[ 3860.491471] remove_one
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:281
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:973
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c:2034)
mlx5_core
[ 3860.491948] pci_device_remove
(/usr/work/linux/./arch/x86/include/asm/atomic.h:23
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:457
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.h:2426
/usr/work/linux/./include/linux/atomic/atomic-arch-fallback.
h:2456 /usr/work/linux/./include/linux/atomic/atomic-instrumented.h:1518
/usr/work/linux/./include/linux/pm_runtime.h:129
/usr/work/linux/drivers/pci/pci-driver.c:475)
[ 3860.492378] device_release_driver_internal
(/usr/work/linux/drivers/base/dd.c:1273
/usr/work/linux/drivers/base/dd.c:1296)
[ 3860.492908] pci_stop_bus_device
(/usr/work/linux/drivers/pci/remove.c:46
/usr/work/linux/drivers/pci/remove.c:106)
[ 3860.493344] pci_stop_and_remove_bus_device
(/usr/work/linux/drivers/pci/remove.c:141)
[ 3860.493918] pci_iov_remove_virtfn (/usr/work/linux/drivers/pci/iov.c:377)
[ 3860.494386] sriov_disable (/usr/work/linux/drivers/pci/iov.c:712
(discriminator 1) /usr/work/linux/drivers/pci/iov.c:723 (discriminator 1))
[ 3860.494786] mlx5_sriov_disable
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:209)
mlx5_core
[ 3860.495327] mlx5_core_sriov_configure
(/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:195
(discriminator 1)
/usr/work/linux/drivers/net/ethernet/mellanox/mlx5/core/sriov.c:222
(discriminator 1)) mlx5_core
[ 3860.495919] sriov_numvfs_store (/usr/work/linux/drivers/pci/iov.c:450)
[ 3860.496358] kernfs_fop_write_iter (/usr/work/linux/fs/kernfs/file.c:334)
[ 3860.496819] vfs_write (/usr/work/linux/./include/linux/rcu_sync.h:36
(discriminator 2) /usr/work/linux/./include/linux/percpu-rwsem.h:62
(discriminator 2) /usr/work/linux/./include/linux/fs.h:1785
(discriminator 2) /usr/work/linux/./include/linux/fs.h:1921
(discriminator 2)
/usr/work/linux/./include/linux/fs.h:3035 (discriminator 2)
/usr/work/linux/fs/read_write.c:675 (discriminator 2))
[ 3860.497201] ksys_write (/usr/work/linux/fs/read_write.c:732)
[ 3860.497598] do_syscall_64 (/usr/work/linux/arch/x86/entry/common.c:52
(discriminator 1) /usr/work/linux/arch/x86/entry/common.c:83
(discriminator 1))
[ 3860.498070] entry_SYSCALL_64_after_hwframe
(/usr/work/linux/arch/x86/entry/entry_64.S:130)
[ 3860.498580] RIP: 0033:0x7fb7b6f018b7
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
2025-04-01 20:49 ` Yael Chemla
@ 2025-04-01 21:58 ` Kuniyuki Iwashima
2025-04-06 15:37 ` Yael Chemla
0 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-01 21:58 UTC (permalink / raw)
To: ychemla; +Cc: davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev, pabeni
Hi Yael,
Thanks for testing!
From: Yael Chemla <ychemla@nvidia.com>
Date: Tue, 1 Apr 2025 23:49:42 +0300
> Hi Kuniyuki,
> Sorry for the delay (I was OOO). I tested your patch, and while the race
> occurs much less frequently, it still happens—see the warnings and call
> traces below.
> Additionally, in some cases, the test which reproduce the race hang.
> Debugging shows that we're stuck in an endless loop inside
> rtnl_net_dev_lock because the passive refcount is already zero, causing
> net_passive_inc_not_zero to return false, thus it go to "again" and this
> repeats without ending.
> I suspect, as you mentioned before, that in such cases, the passive
> refcount was decreased from cleanup_net.
This sounds weird.
We assumed vif will be moved to init_net, then the infinite loop
should never happen.
So the assumption was wrong and vif belonged to the dead netns and
was not moved to init_net ... ??
Even if dev_change_net_namespace() fails, it leads to BUG().
>
>
> warnings and call traces:
>
> refcount_t: addition on 0; use-after-free.
I guess this is from the old log or the test patch was not applied
because _inc_not_zero() will trigger REFCOUNT_ADD_NOT_ZERO_OVF and
then the message will be
refcount_t: saturated; leaking memory
, see __refcount_add_not_zero() and refcount_warn_saturate().
> WARNING: CPU: 4 PID: 27219 at lib/refcount.c:25 refcount_warn_saturate
> (/usr/work/linux/lib/refcount.c:25 (discriminator 1))
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
2025-04-01 21:58 ` Kuniyuki Iwashima
@ 2025-04-06 15:37 ` Yael Chemla
2025-04-23 11:47 ` Yael Chemla
0 siblings, 1 reply; 13+ messages in thread
From: Yael Chemla @ 2025-04-06 15:37 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: davem, edumazet, horms, kuba, kuni1840, netdev, pabeni
On 02/04/2025 0:58, Kuniyuki Iwashima wrote:
> Hi Yael,
>
> Thanks for testing!
>
> From: Yael Chemla <ychemla@nvidia.com>
> Date: Tue, 1 Apr 2025 23:49:42 +0300
>> Hi Kuniyuki,
>> Sorry for the delay (I was OOO). I tested your patch, and while the race
>> occurs much less frequently, it still happens—see the warnings and call
>> traces below.
>> Additionally, in some cases, the test which reproduce the race hang.
>> Debugging shows that we're stuck in an endless loop inside
>> rtnl_net_dev_lock because the passive refcount is already zero, causing
>> net_passive_inc_not_zero to return false, thus it go to "again" and this
>> repeats without ending.
>> I suspect, as you mentioned before, that in such cases, the passive
>> refcount was decreased from cleanup_net.
>
> This sounds weird.
>
> We assumed vif will be moved to init_net, then the infinite loop
> should never happen.
>
> So the assumption was wrong and vif belonged to the dead netns and
> was not moved to init_net ... ??
>
> Even if dev_change_net_namespace() fails, it leads to BUG().
>
Hi Kuniyuki,
In failure scenarios, we observe that cleanup_net is invoked, followed
by net_passive_dec, which reduces the passive refcount to zero. These
are called before the call to unregister_netdevice_notifier_dev_net.
During the test, dev_change_net_namespace is called once, but it
operates on different net_device poiner than the one passed to final
call of unregister_netdevice_notifier_dev_net, a call which enter
infinite loop (with net->passive=0 and net->ns.count=0, inside
rtnl_net_dev_lock, as explained in previous mail).
Do you need additional debug information, perhaps specific details
regarding reassigning the netns to init_net? Please let me know how I
can help further.
>>
>>
>> warnings and call traces:
>>
>> refcount_t: addition on 0; use-after-free.
>
> I guess this is from the old log or the test patch was not applied
> because _inc_not_zero() will trigger REFCOUNT_ADD_NOT_ZERO_OVF and
> then the message will be
>
> refcount_t: saturated; leaking memory
>
> , see __refcount_add_not_zero() and refcount_warn_saturate().
>
you are right it's a mistake, i was unable to reproduce another failure
with call trace info. Test either succeeds or hang (infinite loop).
>
>> WARNING: CPU: 4 PID: 27219 at lib/refcount.c:25 refcount_warn_saturate
>> (/usr/work/linux/lib/refcount.c:25 (discriminator 1))
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
2025-04-06 15:37 ` Yael Chemla
@ 2025-04-23 11:47 ` Yael Chemla
0 siblings, 0 replies; 13+ messages in thread
From: Yael Chemla @ 2025-04-23 11:47 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: davem, edumazet, horms, kuba, kuni1840, netdev, pabeni
On 06/04/2025 18:37, Yael Chemla wrote:
> On 02/04/2025 0:58, Kuniyuki Iwashima wrote:
>> Hi Yael,
>>
>> Thanks for testing!
>>
>> From: Yael Chemla <ychemla@nvidia.com>
>> Date: Tue, 1 Apr 2025 23:49:42 +0300
>>> Hi Kuniyuki,
>>> Sorry for the delay (I was OOO). I tested your patch, and while the race
>>> occurs much less frequently, it still happens—see the warnings and call
>>> traces below.
>>> Additionally, in some cases, the test which reproduce the race hang.
>>> Debugging shows that we're stuck in an endless loop inside
>>> rtnl_net_dev_lock because the passive refcount is already zero, causing
>>> net_passive_inc_not_zero to return false, thus it go to "again" and this
>>> repeats without ending.
>>> I suspect, as you mentioned before, that in such cases, the passive
>>> refcount was decreased from cleanup_net.
>>
>> This sounds weird.
>>
>> We assumed vif will be moved to init_net, then the infinite loop
>> should never happen.
>>
>> So the assumption was wrong and vif belonged to the dead netns and
>> was not moved to init_net ... ??
>>
>> Even if dev_change_net_namespace() fails, it leads to BUG().
>>
>
> Hi Kuniyuki,
> In failure scenarios, we observe that cleanup_net is invoked, followed
> by net_passive_dec, which reduces the passive refcount to zero. These
> are called before the call to unregister_netdevice_notifier_dev_net.
>
> During the test, dev_change_net_namespace is called once, but it
> operates on different net_device poiner than the one passed to final
> call of unregister_netdevice_notifier_dev_net, a call which enter
> infinite loop (with net->passive=0 and net->ns.count=0, inside
> rtnl_net_dev_lock, as explained in previous mail).
>
> Do you need additional debug information, perhaps specific details
> regarding reassigning the netns to init_net? Please let me know how I
> can help further.
>
Hi Kuniyuki,
any updates on this?
thanks,
Yael
>>>
>>>
>>> warnings and call traces:
>>>
>>> refcount_t: addition on 0; use-after-free.
>>
>> I guess this is from the old log or the test patch was not applied
>> because _inc_not_zero() will trigger REFCOUNT_ADD_NOT_ZERO_OVF and
>> then the message will be
>>
>> refcount_t: saturated; leaking memory
>>
>> , see __refcount_add_not_zero() and refcount_warn_saturate().
>>
>
> you are right it's a mistake, i was unable to reproduce another failure
> with call trace info. Test either succeeds or hang (infinite loop).
>
>>
>>> WARNING: CPU: 4 PID: 27219 at lib/refcount.c:25 refcount_warn_saturate
>>> (/usr/work/linux/lib/refcount.c:25 (discriminator 1))
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-04-23 11:48 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-17 19:11 [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)) Kuniyuki Iwashima
2025-02-17 19:11 ` [PATCH v5 net 1/3] net: Add net_passive_inc() and net_passive_dec() Kuniyuki Iwashima
2025-02-17 19:11 ` [PATCH v5 net 2/3] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net() Kuniyuki Iwashima
2025-03-26 13:46 ` Yael Chemla
2025-03-26 21:54 ` Kuniyuki Iwashima
2025-03-26 22:10 ` Eric Dumazet
2025-03-26 22:32 ` Kuniyuki Iwashima
2025-04-01 20:49 ` Yael Chemla
2025-04-01 21:58 ` Kuniyuki Iwashima
2025-04-06 15:37 ` Yael Chemla
2025-04-23 11:47 ` Yael Chemla
2025-02-17 19:11 ` [PATCH v5 net 3/3] dev: Use rtnl_net_dev_lock() in unregister_netdev() Kuniyuki Iwashima
2025-02-19 2:50 ` [PATCH v5 net 0/3] net: Fix race of rtnl_net_lock(dev_net(dev)) patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).