* [PATCH v1 net-next 00/14] net: Support per-netns device unregistration
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
The biggest blocker to per-netns RTNL is netdev unregistration.
It starts within a single netns, but it can eventually involve
multiple namespaces.
There are three types of such cross-netns devices:
  1. Paired devices (e.g., netkit, veth, vxcan)
     -> Unregistering one device also deletes its peer, which
        may reside in another netns.
  2. Tunnel devices (e.g., bareudp, geneve, etc.)
     -> Destroying a netns removes devices in another netns if
        their backend sockets reside in the dying netns.
  3. Stacked devices (e.g., ipvlan, macvlan, etc.)
     -> Removing the lower device also removes multiple upper
        devices, each of which may reside in different namespaces.
While the first two device types require at most two rtnl_net_lock()s,
the stacked type has no upper limit. This makes it impossible to
freeze all necessary namespaces in advance.
This series introduces per-netns work, initially suggested at
NetConf 2024, to delegate the unregistration of such cross-netns
devices.
https://netdev.bots.linux.dev/netconf/2024/kuniyu.pdf#page=62
The first half of the series wraps NETDEV_UNREGISTER (in core) with
per-netns RTNL, adds a helper for per-netns device unregistration,
and forces per-netns device unregistration in the core code when
CONFIG_DEBUG_NET_SMALL_RTNL=y.
The latter half picks out one from each type (veth, bareudp, ipvlan)
and converts them to support per-netns device unregistration,
although the operations are **still serialised under RTNL** for now.
Please note that this series focuses only on the device unregistration
paths. For example, there are ASSERT_RTNL() calls left in other paths,
and Sashiko may point them out, but they are out of scope.
This is just the first step, and we need more incremental changes to
completely remove RTNL anyway.
Now, we can see that unregistering a lower device (veth0 below)
removes upper devices (ipvl2, ipvl3) in different namespaces using
per-netns work with a different PID. The lower device (veth0) is
freed only after all upper ipvlan devices have called netdev_put()
in ipvlan_uninit().
  # ip netns add ns1
  # ip netns add ns2
  # ip netns add ns3
  # ip -n ns1 link add veth0 type veth peer veth1
  # ip -n ns2 link add ipvl2 link veth0 link-netns ns1 type ipvlan mode l2
  # ip -n ns3 link add ipvl3 link veth0 link-netns ns1 type ipvlan mode l2
  # ip -n ns1 link del veth0
  # bpftrace -e '#include <linux/netdevice.h>
  kprobe:ipvlan_uninit,
  kprobe:veth_dellink,
  kprobe:free_netdev {
    $dev = (struct net_device *)arg0;
    printf("PID: %d | DEV: %s%s\n", pid, $dev->name, kstack());
  }'
  PID: 2010 | DEV: veth0
          veth_dellink+5
          rtnl_dellink+1213
          rtnetlink_rcv_msg+1791
          ...
  PID: 440 | DEV: ipvl2
          ipvlan_uninit+5
          unregister_netdevice_many_notify+7129
          unregister_netdevice_many_net+1050
          rtnl_net_work_func+136
          ...
  PID: 440 | DEV: ipvl2
          free_netdev+5
          netdev_run_todo+4798
          process_scheduled_works+2538
          ...
  PID: 440 | DEV: ipvl3
          ipvlan_uninit+5
          unregister_netdevice_many_notify+7129
          unregister_netdevice_many_net+1050
          rtnl_net_work_func+136
          process_scheduled_works+2538
          ...
  PID: 2010 | DEV: veth0
          free_netdev+5
          netdev_run_todo+4798
          rtnl_dellink+1507
          rtnetlink_rcv_msg+1791
          ...
  PID: 440 | DEV: ipvl3
          free_netdev+5
          netdev_run_todo+4798
          process_scheduled_works+2538
          ...
Kuniyuki Iwashima (14):
rtnetlink: Lock sock_net(skb->sk) in rtnl_newlink().
rtnetlink: Call unregister_netdevice_many() only once in
rtnl_link_unregister().
rtnetlink: Add per-netns rtnl_work.
net: Wrap default_device_exit_net() with __rtnl_net_lock().
net: Hold __rtnl_net_lock() in netdev_wait_allrefs_any().
net: Add per-netns netdev unregistration infra.
net: Call unregister_netdevice_many() per netns.
veth: Support per-netns device unregistration.
bareudp: Protect bareudp_list with mutex.
bareudp: Support per-netns netdev unregistration.
ipvlan: Convert ipvl_port.count to refcount_t.
ipvlan: Synchronise ipvlan_init() and ipvlan_uninit() for the same
lower dev.
ipvlan: Protect ipvl_port.ipvlans with mutex.
ipvlan: Support per-netns netdev unregistration.
drivers/net/bareudp.c | 43 ++++++++-
drivers/net/ipvlan/ipvlan.h | 18 +++-
drivers/net/ipvlan/ipvlan_main.c | 153 +++++++++++++++++++++++++------
drivers/net/ipvlan/ipvtap.c | 16 ++--
drivers/net/veth.c | 34 ++++---
include/linux/netdevice.h | 22 +++++
include/linux/rtnetlink.h | 8 ++
include/net/net_namespace.h | 3 +
net/core/dev.c | 129 +++++++++++++++++++++++++-
net/core/net_namespace.c | 4 +
net/core/rtnetlink.c | 57 ++++++++++--
11 files changed, 418 insertions(+), 69 deletions(-)
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 01/14] rtnetlink: Lock sock_net(skb->sk) in rtnl_newlink().
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
There are a few cases where rtnl_net_lock() is not properly
held in rtnl_newlink().
When any of IFLA_NET_NS_PID, IFLA_NET_NS_FD, or IFLA_TARGET_NETNSID
is specified but IFLA_LINK_NETNSID is not, sock_net(skb->sk) is used
as link_net in rtnl_newlink_link_net().
In addition, the do_setlink() path uses both sock_net(skb->sk) and the
netns selected by one of the three attributes, while
rtnl_link_get_net_capable() returns only one of the four.
Let's add sock_net(skb->sk) to rtnl_nets in rtnl_newlink().
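As a rough userspace illustration (not kernel code, and not part of this
patch), the sketch below models a small fixed-size set of namespaces that
is kept sorted by address and free of duplicates, so that each netns is
locked once and in a stable order. The names net_model, nets_model, and
nets_model_add() are hypothetical; the real rtnl_nets_add() is not shown
in this patch, and its exact behaviour is assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net; only its address matters here. */
struct net_model {
	int id;
};

/* Mirrors the new array size in this patch. */
#define NETS_MAX 4

struct nets_model {
	struct net_model *net[NETS_MAX];
	unsigned char len;
};

/* Insert @net while keeping the array sorted by address and unique.
 * Returns 1 if a new slot was used, 0 if @net was already present.
 */
static int nets_model_add(struct nets_model *nets, struct net_model *net)
{
	int i;

	for (i = 0; i < nets->len; i++) {
		if (nets->net[i] == net)
			return 0;	/* already tracked: lock it only once */

		if ((uintptr_t)nets->net[i] > (uintptr_t)net) {
			/* Keep ascending order: carry the larger entry forward. */
			struct net_model *tmp = nets->net[i];

			nets->net[i] = net;
			net = tmp;
		}
	}

	/* A fifth distinct netns would overflow the array. */
	assert(nets->len < NETS_MAX);
	nets->net[i] = net;
	nets->len++;

	return 1;
}
```

In this model, adding the socket's netns unconditionally is harmless when
it is already one of the collected namespaces, because the duplicate is
simply dropped.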
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
No Fixes tag is needed since there is neither a real bug nor a triggered assertion.
---
net/core/rtnetlink.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 12aa3aa1688b..f39c93e80e20 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -282,10 +282,11 @@ static int rtnl_net_cmp_locks(const struct net *net_a, const struct net *net_b)
#endif
struct rtnl_nets {
- /* ->newlink() needs to freeze 3 netns at most;
- * 2 for the new device, 1 for its peer.
+ /* ->newlink() needs to freeze 4 netns at most;
+ * 2 for the new device, 1 for its peer, 1 for
+ * an existing device (do_setlink() path).
*/
- struct net *net[3];
+ struct net *net[4];
unsigned char len;
};
@@ -4155,6 +4156,8 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
}
}
+ rtnl_nets_add(&rtnl_nets, get_net(sock_net(skb->sk)));
+
rtnl_nets_lock(&rtnl_nets);
ret = __rtnl_newlink(skb, nlh, ops, tgt_net, link_net, peer_net, tbs, data, extack);
rtnl_nets_unlock(&rtnl_nets);
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 02/14] rtnetlink: Call unregister_netdevice_many() only once in rtnl_link_unregister().
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
When rtnl_link_unregister() is called during module unload, it
calls __rtnl_kill_links() for every netns.
__rtnl_kill_links() collects all devices of the unloaded module
and passes them to unregister_netdevice_many().
Let's move unregister_netdevice_many() to rtnl_link_unregister()
to unregister all devices across netns in a single batch.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
net/core/rtnetlink.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f39c93e80e20..7207da002fb5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -637,16 +637,15 @@ int rtnl_link_register(struct rtnl_link_ops *ops)
}
EXPORT_SYMBOL_GPL(rtnl_link_register);
-static void __rtnl_kill_links(struct net *net, struct rtnl_link_ops *ops)
+static void __rtnl_kill_links(struct net *net, struct rtnl_link_ops *ops,
+ struct list_head *dev_kill_list)
{
struct net_device *dev;
- LIST_HEAD(list_kill);
for_each_netdev(net, dev) {
if (dev->rtnl_link_ops == ops)
- ops->dellink(dev, &list_kill);
+ ops->dellink(dev, dev_kill_list);
}
- unregister_netdevice_many(&list_kill);
}
/* Return with the rtnl_lock held when there are no network
@@ -677,6 +676,7 @@ static void rtnl_lock_unregistering_all(void)
*/
void rtnl_link_unregister(struct rtnl_link_ops *ops)
{
+ LIST_HEAD(dev_kill_list);
struct net *net;
mutex_lock(&link_ops_mutex);
@@ -691,7 +691,9 @@ void rtnl_link_unregister(struct rtnl_link_ops *ops)
rtnl_lock_unregistering_all();
for_each_net(net)
- __rtnl_kill_links(net, ops);
+ __rtnl_kill_links(net, ops, &dev_kill_list);
+
+ unregister_netdevice_many(&dev_kill_list);
rtnl_unlock();
up_write(&pernet_ops_rwsem);
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 03/14] rtnetlink: Add per-netns rtnl_work.
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
The biggest blocker to per-netns RTNL is netdev unregistration.
It starts within a single netns (e.g., during a device lookup or
netns dismantle), but it can eventually involve multiple namespaces,
such as when upper ipvlan devices reside in different netns.
This prevents us from acquiring multiple rtnl_net_lock()s beforehand.
When we encounter such a cross-netns device, we must delegate the
unregistration to the work of the netns where the device actually
resides.
Let's add per-netns rtnl_work to support the deferred netdev
unregistration.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
include/linux/rtnetlink.h | 8 ++++++++
include/net/net_namespace.h | 1 +
net/core/net_namespace.c | 1 +
net/core/rtnetlink.c | 26 ++++++++++++++++++++++++++
4 files changed, 36 insertions(+)
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index ea39dd23a197..95729339e7a5 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -115,6 +115,10 @@ bool rtnl_net_is_locked(struct net *net);
bool lockdep_rtnl_net_is_held(struct net *net);
+void rtnl_net_queue_work(struct net *net);
+void rtnl_net_flush_workqueue(void);
+void rtnl_net_work_func(struct work_struct *work);
+
#define rcu_dereference_rtnl_net(net, p) \
rcu_dereference_check(p, lockdep_rtnl_net_is_held(net))
#define rtnl_net_dereference(net, p) \
@@ -150,6 +154,10 @@ static inline void ASSERT_RTNL_NET(struct net *net)
ASSERT_RTNL();
}
+static inline void rtnl_net_flush_workqueue(void)
+{
+}
+
#define rcu_dereference_rtnl_net(net, p) \
rcu_dereference_rtnl(p)
#define rtnl_net_dereference(net, p) \
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 80de5e98a66d..a989019af5f7 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -197,6 +197,7 @@ struct net {
#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
/* Move to a better place when the config guard is removed. */
struct mutex rtnl_mutex;
+ struct work_struct rtnl_work;
#endif
#if IS_ENABLED(CONFIG_VSOCKETS)
struct netns_vsock vsock;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index d9dafe24f57e..d1aeff9de580 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -422,6 +422,7 @@ static __net_init int preinit_net(struct net *net, struct user_namespace *user_n
#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
mutex_init(&net->rtnl_mutex);
lock_set_cmp_fn(&net->rtnl_mutex, rtnl_net_lock_cmp_fn, NULL);
+ INIT_WORK(&net->rtnl_work, rtnl_net_work_func);
#endif
INIT_LIST_HEAD(&net->ptype_all);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7207da002fb5..7959519e7375 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -273,6 +273,26 @@ bool lockdep_rtnl_net_is_held(struct net *net)
return lockdep_rtnl_is_held() && lockdep_is_held(&net->rtnl_mutex);
}
EXPORT_SYMBOL(lockdep_rtnl_net_is_held);
+
+static struct workqueue_struct *rtnl_net_wq;
+
+void rtnl_net_queue_work(struct net *net)
+{
+ queue_work(rtnl_net_wq, &net->rtnl_work);
+}
+
+void rtnl_net_flush_workqueue(void)
+{
+ flush_workqueue(rtnl_net_wq);
+}
+
+void rtnl_net_work_func(struct work_struct *work)
+{
+ struct net *net = container_of(work, struct net, rtnl_work);
+
+ rtnl_net_lock(net);
+ rtnl_net_unlock(net);
+}
#else
static int rtnl_net_cmp_locks(const struct net *net_a, const struct net *net_b)
{
@@ -7226,4 +7246,10 @@ void __init rtnetlink_init(void)
register_netdevice_notifier(&rtnetlink_dev_notifier);
rtnl_register_many(rtnetlink_rtnl_msg_handlers);
+
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ rtnl_net_wq = create_workqueue("rtnl_net");
+ if (!rtnl_net_wq)
+ panic("Could not create rtnl_net workq");
+#endif
}
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 04/14] net: Wrap default_device_exit_net() with __rtnl_net_lock().
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
default_device_exit_net() can call dev_change_net_namespace()
to move devices from a dying netns to init_net.
Let's hold __rtnl_net_lock() for both netns around it.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
net/core/dev.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 4b3d5cfdf6e0..c477c4f84ed9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -13034,7 +13034,7 @@ static void __net_exit default_device_exit_net(struct net *net)
* Push all migratable network devices back to the
* initial network namespace
*/
- ASSERT_RTNL();
+
for_each_netdev_safe(net, dev, aux) {
int err;
char fb_name[IFNAMSIZ];
@@ -13077,11 +13077,19 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
LIST_HEAD(dev_kill_list);
rtnl_lock();
+
+ __rtnl_net_lock(&init_net);
+
list_for_each_entry(net, net_list, exit_list) {
+ __rtnl_net_lock(net);
default_device_exit_net(net);
+ __rtnl_net_unlock(net);
+
cond_resched();
}
+ __rtnl_net_unlock(&init_net);
+
list_for_each_entry(net, net_list, exit_list) {
for_each_netdev_reverse(net, dev) {
if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink)
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 05/14] net: Hold __rtnl_net_lock() in netdev_wait_allrefs_any().
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
Currently, netdev_run_todo() processes pending devices from multiple
namespaces in a batch.
To expand the per-netns RTNL coverage for NETDEV_UNREGISTER, let's
acquire __rtnl_net_lock() in netdev_wait_allrefs_any().
Note that netdev_run_todo() itself will need to be namespacified
before RTNL is removed.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
net/core/dev.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index c477c4f84ed9..48818a194fa5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11608,8 +11608,13 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
rtnl_lock();
/* Rebroadcast unregister notification */
- list_for_each_entry(dev, list, todo_list)
+ list_for_each_entry(dev, list, todo_list) {
+ struct net *net = dev_net(dev);
+
+ __rtnl_net_lock(net);
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ __rtnl_net_unlock(net);
+ }
__rtnl_unlock();
rcu_barrier();
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 06/14] net: Add per-netns netdev unregistration infra.
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
When we need to unregister a netdev in a different netns, we will
delegate its unregistration to per-netns work.
There are three types of such cross-netns devices:
  1. Paired devices (e.g., netkit, veth, vxcan)
     -> Unregistering one device also deletes its peer, which
        may reside in another netns.
  2. Tunnel devices (e.g., bareudp, geneve, etc.)
     -> Destroying a netns removes devices in another netns if
        their backend sockets reside in the dying netns.
  3. Stacked devices (e.g., ipvlan, macvlan, etc.)
     -> Removing the lower device also removes multiple upper
        devices, each of which may reside in different namespaces.
In these cases, we will use unregister_netdevice_queue_net() to
queue such potential cross-netns devices for destruction.
unregister_netdevice_queue_net() takes net and dev. If dev resides
in the net, it simply calls unregister_netdevice_queue().
If dev_net(dev) is different from the net, it enqueues the device
to dev_net(dev)->dev_unreg_head and schedules the per-netns work.
When __rtnl_net_unlock() is called from the per-netns work (or another
thread already holding the lock), unregister_netdevice_many_net()
collects the queued devices and calls unregister_netdevice_many()
to perform the actual unregistration.
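The flow described above can be modelled in plain userspace C. This is
only a sketch under simplifying assumptions: there is no locking, the
per-netns list is a fixed array, and the names net_model, dev_model,
queue_net_model(), and unlock_model() are hypothetical stand-ins rather
than kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

struct dev_model;

/* Hypothetical stand-in for struct net. */
struct net_model {
	struct dev_model *pending[MAX_PENDING];	/* models dev_unreg_head */
	int nr_pending;
	int work_queued;	/* models rtnl_net_queue_work() */
	int nr_unregistered;
};

/* Hypothetical stand-in for struct net_device. */
struct dev_model {
	struct net_model *net;	/* models dev_net(dev) */
	int unregistered;
};

/* Models unregister_netdevice_queue_net(): returns 1 when the device goes
 * onto the caller's own kill list, 0 when it is delegated to its netns.
 */
static int queue_net_model(struct net_model *net, struct dev_model *dev,
			   struct dev_model **kill_list, int *nr_kill)
{
	struct net_model *owner = dev->net;

	if (owner == net) {
		kill_list[(*nr_kill)++] = dev;
		return 1;
	}

	/* Cross-netns device: queue it on the owning netns and kick its work. */
	assert(owner->nr_pending < MAX_PENDING);
	owner->pending[owner->nr_pending++] = dev;
	owner->work_queued = 1;

	return 0;
}

/* Models the drain performed when the lock of @net is released. */
static void unlock_model(struct net_model *net)
{
	int i;

	for (i = 0; i < net->nr_pending; i++) {
		net->pending[i]->unregistered = 1;
		net->nr_unregistered++;
	}

	net->nr_pending = 0;
	net->work_queued = 0;
}
```

In the model, a caller working in ns1 that deletes a lower device in ns1
and an upper device in ns2 only ever touches ns1's kill list; the upper
device is unregistered later, when ns2's lock holder calls unlock_model().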
During netns dismantle, rtnl_net_flush_workqueue() is called at the
end of default_device_exit_batch() to ensure that cross-netns
devices in other live netns are unregistered.
Once RTNL is removed, a device could be moved to another netns while
being queued to net->dev_unreg_head.
__dev_change_net_namespace() handles this race by acquiring
net->dev_unreg_lock of both the old and new netns after dev_set_net()
and moving the device between their dev_unreg_head lists.
Since dev_set_net() and unregister_netdevice_queue_net() are
synchronised by netdev_lock(), the device is either queued to the
old netns's dev_unreg_head and then moved, or queued directly to
the new netns.
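Taking the dev_unreg_lock of two namespaces at once needs a fixed order,
otherwise two moves in opposite directions could deadlock. The patch
orders the locks by address in unregister_netdevice_move_net(); the
userspace sketch below shows the same idea with pthread mutexes. The
names net_model, lock_pair(), unlock_pair(), and move_queued() are
hypothetical, and the queue is reduced to a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net with its dev_unreg_lock. */
struct net_model {
	pthread_mutex_t unreg_lock;
	int nr_queued;	/* devices on the modelled dev_unreg_head */
};

/* Take both locks, lower address first, so that two threads moving
 * devices in opposite directions cannot deadlock (ABBA).
 */
static void lock_pair(struct net_model *a, struct net_model *b)
{
	/* Moving within the same netns would lock one mutex twice. */
	assert(a != b);

	if ((uintptr_t)a > (uintptr_t)b) {
		struct net_model *tmp = a;

		a = b;
		b = tmp;
	}

	pthread_mutex_lock(&a->unreg_lock);
	pthread_mutex_lock(&b->unreg_lock);
}

/* Unlock order does not matter for correctness. */
static void unlock_pair(struct net_model *a, struct net_model *b)
{
	pthread_mutex_unlock(&a->unreg_lock);
	pthread_mutex_unlock(&b->unreg_lock);
}

/* Move one device from @src to @dst if it is currently queued.
 * Returns 1 if the device was moved, 0 otherwise.
 */
static int move_queued(struct net_model *src, struct net_model *dst,
		       int queued)
{
	int moved = 0;

	lock_pair(src, dst);

	if (queued && src->nr_queued > 0) {
		src->nr_queued--;
		dst->nr_queued++;
		moved = 1;
	}

	unlock_pair(src, dst);

	return moved;
}
```

A device that is not queued is left alone, which corresponds to the
list_empty() check on unreg_list_net in the patch.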
Note that unregister_netdevice_move_net() does not need to call
rtnl_net_queue_work() because __dev_change_net_namespace() is
(supposed to be) called with rtnl_net_lock(). (Not all callers
hold it yet, but the race does not happen until all callers
are converted and RTNL is removed.)
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
include/linux/netdevice.h | 16 +++++++
include/net/net_namespace.h | 2 +
net/core/dev.c | 85 +++++++++++++++++++++++++++++++++++++
net/core/net_namespace.c | 2 +
net/core/rtnetlink.c | 4 ++
5 files changed, 109 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9981d637f8b5..53454db3611a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2241,6 +2241,9 @@ struct net_device {
struct list_head dev_list;
struct list_head napi_list;
struct list_head unreg_list;
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ struct list_head unreg_list_net;
+#endif
struct list_head close_list;
struct list_head ptype_all;
@@ -3472,6 +3475,19 @@ static inline void unregister_netdevice(struct net_device *dev)
unregister_netdevice_queue(dev, NULL);
}
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+void unregister_netdevice_queue_net(struct net *net, struct net_device *dev,
+ struct list_head *head);
+void unregister_netdevice_many_net(struct net *net);
+#else
+static inline void unregister_netdevice_queue_net(struct net *net,
+ struct net_device *dev,
+ struct list_head *head)
+{
+ unregister_netdevice_queue(dev, head);
+}
+#endif
+
int netdev_refcnt_read(const struct net_device *dev);
void free_netdev(struct net_device *dev);
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index a989019af5f7..501af1999fe8 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -198,6 +198,8 @@ struct net {
/* Move to a better place when the config guard is removed. */
struct mutex rtnl_mutex;
struct work_struct rtnl_work;
+ struct list_head dev_unreg_head;
+ spinlock_t dev_unreg_lock;
#endif
#if IS_ENABLED(CONFIG_VSOCKETS)
struct netns_vsock vsock;
diff --git a/net/core/dev.c b/net/core/dev.c
index 48818a194fa5..0f0bf65f5bf9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -12092,6 +12092,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
INIT_LIST_HEAD(&dev->napi_list);
INIT_LIST_HEAD(&dev->unreg_list);
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ INIT_LIST_HEAD(&dev->unreg_list_net);
+#endif
INIT_LIST_HEAD(&dev->close_list);
INIT_LIST_HEAD(&dev->link_watch_list);
INIT_LIST_HEAD(&dev->adj_list.upper);
@@ -12485,6 +12488,16 @@ void unregister_netdevice_many_notify(struct list_head *head,
synchronize_net();
list_for_each_entry(dev, head, unreg_list) {
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ struct net *net = dev_net(dev);
+
+ /* spin_lock() can be moved outside of the loop
+ * once the per-netns RTNL conversion completes.
+ */
+ spin_lock(&net->dev_unreg_lock);
+ list_del(&dev->unreg_list_net);
+ spin_unlock(&net->dev_unreg_lock);
+#endif
netdev_put(dev, &dev->dev_registered_tracker);
net_set_todo(dev);
cnt++;
@@ -12507,6 +12520,72 @@ void unregister_netdevice_many(struct list_head *head)
}
EXPORT_SYMBOL(unregister_netdevice_many);
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+void unregister_netdevice_queue_net(struct net *net, struct net_device *dev,
+ struct list_head *head)
+{
+ netdev_lock(dev);
+
+ if (net_eq(dev_net(dev), net)) {
+ netdev_unlock(dev);
+ unregister_netdevice_queue(dev, head);
+ return;
+ }
+
+ net = dev_net(dev);
+
+ spin_lock(&net->dev_unreg_lock);
+
+ DEBUG_NET_WARN_ON_ONCE(!list_empty(&dev->unreg_list_net));
+ list_add_tail(&dev->unreg_list_net, &net->dev_unreg_head);
+ rtnl_net_queue_work(net);
+
+ spin_unlock(&net->dev_unreg_lock);
+
+ netdev_unlock(dev);
+}
+EXPORT_SYMBOL(unregister_netdevice_queue_net);
+
+static void unregister_netdevice_move_net(struct net *net_old,
+ struct net *net,
+ struct net_device *dev)
+{
+ if (net_old > net) {
+ spin_lock(&net->dev_unreg_lock);
+ spin_lock(&net_old->dev_unreg_lock);
+ } else {
+ spin_lock(&net_old->dev_unreg_lock);
+ spin_lock(&net->dev_unreg_lock);
+ }
+
+ if (!list_empty(&dev->unreg_list_net)) {
+ list_del(&dev->unreg_list_net);
+ list_add_tail(&dev->unreg_list_net, &net->dev_unreg_head);
+ }
+
+ spin_unlock(&net_old->dev_unreg_lock);
+ spin_unlock(&net->dev_unreg_lock);
+}
+
+void unregister_netdevice_many_net(struct net *net)
+{
+ struct net_device *dev, *tmp;
+ LIST_HEAD(unreg_head_net);
+ LIST_HEAD(unreg_head);
+
+ spin_lock(&net->dev_unreg_lock);
+ list_splice_init(&net->dev_unreg_head, &unreg_head_net);
+ spin_unlock(&net->dev_unreg_lock);
+
+ list_for_each_entry_safe(dev, tmp, &unreg_head_net, unreg_list_net) {
+ list_del_init(&dev->unreg_list_net);
+ list_add_tail(&dev->unreg_list, &unreg_head);
+ }
+
+ unregister_netdevice_many(&unreg_head);
+}
+#endif
+
/**
* unregister_netdev - remove device from the kernel
* @dev: device
@@ -12663,6 +12742,10 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
netdev_unlock(dev);
dev->ifindex = new_ifindex;
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+ unregister_netdevice_move_net(net_old, net, dev);
+#endif
+
if (new_name[0]) {
/* Rename the netdev to prepared name */
write_seqlock_bh(&netdev_rename_lock);
@@ -13105,6 +13188,8 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
}
unregister_netdevice_many(&dev_kill_list);
rtnl_unlock();
+
+ rtnl_net_flush_workqueue();
}
static struct pernet_operations __net_initdata default_device_ops = {
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index d1aeff9de580..578b48cf5318 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -423,6 +423,8 @@ static __net_init int preinit_net(struct net *net, struct user_namespace *user_n
mutex_init(&net->rtnl_mutex);
lock_set_cmp_fn(&net->rtnl_mutex, rtnl_net_lock_cmp_fn, NULL);
INIT_WORK(&net->rtnl_work, rtnl_net_work_func);
+ INIT_LIST_HEAD(&net->dev_unreg_head);
+ spin_lock_init(&net->dev_unreg_lock);
#endif
INIT_LIST_HEAD(&net->ptype_all);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7959519e7375..544498d3c325 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -197,6 +197,7 @@ void __rtnl_net_unlock(struct net *net)
{
ASSERT_RTNL();
+ unregister_netdevice_many_net(net);
mutex_unlock(&net->rtnl_mutex);
}
EXPORT_SYMBOL(__rtnl_net_unlock);
@@ -290,6 +291,9 @@ void rtnl_net_work_func(struct work_struct *work)
{
struct net *net = container_of(work, struct net, rtnl_work);
+ if (list_empty(&net->dev_unreg_head))
+ return;
+
rtnl_net_lock(net);
rtnl_net_unlock(net);
}
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 07/14] net: Call unregister_netdevice_many() per netns.
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
For per-netns device unregistration, the list passed to
unregister_netdevice_many() must contain devices from a single
netns only (once all callers are converted).
Let's move collected devices in the following functions to
net->dev_unreg_head and let __rtnl_net_unlock() pass them to
unregister_netdevice_many().
* default_device_exit_batch()
* ops_exit_rtnl_list()
* __rtnl_kill_links()
This allows incremental conversion of each driver to support
per-netns device unregistration without affecting the normal
kernel where CONFIG_DEBUG_NET_SMALL_RTNL is disabled.
Note that this change unbatches synchronize_rcu() and similar calls
in unregister_netdevice_many(), but we can later split that function
into multiple stages to batch them again.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
include/linux/netdevice.h | 6 ++++++
net/core/dev.c | 27 +++++++++++++++++++++++++++
net/core/net_namespace.c | 1 +
net/core/rtnetlink.c | 6 +++++-
4 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 53454db3611a..0cd26fb59806 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3479,6 +3479,7 @@ static inline void unregister_netdevice(struct net_device *dev)
void unregister_netdevice_queue_net(struct net *net, struct net_device *dev,
struct list_head *head);
void unregister_netdevice_many_net(struct net *net);
+void unregister_netdevice_queue_many_net(struct net *net, struct list_head *head);
#else
static inline void unregister_netdevice_queue_net(struct net *net,
struct net_device *dev,
@@ -3486,6 +3487,11 @@ static inline void unregister_netdevice_queue_net(struct net *net,
{
unregister_netdevice_queue(dev, head);
}
+
+static inline void unregister_netdevice_queue_many_net(struct net *net,
+ struct list_head *head)
+{
+}
#endif
int netdev_refcnt_read(const struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 0f0bf65f5bf9..57fb4741d0ac 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -12546,6 +12546,28 @@ void unregister_netdevice_queue_net(struct net *net, struct net_device *dev,
}
EXPORT_SYMBOL(unregister_netdevice_queue_net);
+void unregister_netdevice_queue_many_net(struct net *net, struct list_head *head)
+{
+ struct net_device *dev, *tmp;
+
+ spin_lock(&net->dev_unreg_lock);
+ list_for_each_entry_safe(dev, tmp, head, unreg_list) {
+ /* Once all cross-netns unregister_netdevice_queue() is
+ * converted to _net() (or for debugging), remove this check.
+ */
+ if (!net_eq(dev_net(dev), net))
+ continue;
+
+ DEBUG_NET_WARN_ONCE(!net_eq(dev_net(dev), net),
+ "%s was unregistered from a different netns.\n",
+ dev->name);
+
+ list_del_init(&dev->unreg_list);
+ list_move_tail(&dev->unreg_list_net, &net->dev_unreg_head);
+ }
+ spin_unlock(&net->dev_unreg_lock);
+}
+
static void unregister_netdevice_move_net(struct net *net_old,
struct net *net,
struct net_device *dev)
@@ -13179,12 +13201,17 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
__rtnl_net_unlock(&init_net);
list_for_each_entry(net, net_list, exit_list) {
+ __rtnl_net_lock(net);
+
for_each_netdev_reverse(net, dev) {
if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink)
dev->rtnl_link_ops->dellink(dev, &dev_kill_list);
else
unregister_netdevice_queue(dev, &dev_kill_list);
}
+
+ unregister_netdevice_queue_many_net(net, &dev_kill_list);
+ __rtnl_net_unlock(net);
}
unregister_netdevice_many(&dev_kill_list);
rtnl_unlock();
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 578b48cf5318..a91d2b58aadd 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -181,6 +181,7 @@ static void ops_exit_rtnl_list(const struct list_head *ops_list,
ops->exit_rtnl(net, &dev_kill_list);
}
+ unregister_netdevice_queue_many_net(net, &dev_kill_list);
__rtnl_net_unlock(net);
}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 544498d3c325..b129f793d851 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -714,8 +714,12 @@ void rtnl_link_unregister(struct rtnl_link_ops *ops)
down_write(&pernet_ops_rwsem);
rtnl_lock_unregistering_all();
- for_each_net(net)
+ for_each_net(net) {
+ __rtnl_net_lock(net);
__rtnl_kill_links(net, ops, &dev_kill_list);
+ unregister_netdevice_queue_many_net(net, &dev_kill_list);
+ __rtnl_net_unlock(net);
+ }
unregister_netdevice_many(&dev_kill_list);
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 08/14] veth: Support per-netns device unregistration.
2026-07-01 21:41 [PATCH v1 net-next 00/14] net: Support per-netns device unregistration Kuniyuki Iwashima
` (6 preceding siblings ...)
2026-07-01 21:41 ` [PATCH v1 net-next 07/14] net: Call unregister_netdevice_many() per netns Kuniyuki Iwashima
@ 2026-07-01 21:41 ` Kuniyuki Iwashima
2026-07-01 21:41 ` [PATCH v1 net-next 09/14] bareudp: Protect bareudp_list with mutex Kuniyuki Iwashima
` (6 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
Currently, veth_dellink() unregisters both local and peer devices
synchronously under RTNL.
Once RTNL is removed, it can be called concurrently from different
netns.
Let's use xchg() and unregister_netdevice_queue_net() to support
per-netns device unregistration.
This way, each device is queued for destruction only once by
the winner of the race.
Note that the extra netdev_hold() ensures that @peer obtained by
the first xchg() is not freed during the subsequent access to
netdev_priv(peer). The 2nd xchg() overwrites @dev to balance
the refcount.
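The "winner of the race" idea above can be sketched in userspace C.
This is only an illustration under stated assumptions: struct pair_dev,
pair_link() and pair_dellink() are hypothetical names, C11
atomic_exchange() stands in for the kernel's xchg(), and a counter
stands in for queueing a device for unregistration. The netdev_hold()
and netdev_put() reference handling from the patch is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a veth-like paired device; not the kernel struct. */
struct pair_dev {
	_Atomic(struct pair_dev *) peer; /* cleared by whoever tears the pair down */
	int queued;                      /* how many times this device was queued */
};

/* Link two devices to each other, as a paired-device newlink would. */
static void pair_link(struct pair_dev *a, struct pair_dev *b)
{
	atomic_init(&a->peer, b);
	atomic_init(&b->peer, a);
	a->queued = 0;
	b->queued = 0;
}

/* Delete @dev and, if nobody else got there first, its peer.
 * Returns the number of devices this caller queued.
 */
static int pair_dellink(struct pair_dev *dev)
{
	struct pair_dev *peer, *self;
	int n = 0;

	/* Claim our own side: only one caller can observe a non-NULL peer. */
	peer = atomic_exchange(&dev->peer, (struct pair_dev *)NULL);
	if (!peer)
		return 0;

	dev->queued++;
	n++;

	/* Claim the peer's side; a concurrent delete of the peer may have won. */
	self = atomic_exchange(&peer->peer, (struct pair_dev *)NULL);
	if (self) {
		assert(self == dev);
		peer->queued++;
		n++;
	}

	return n;
}
```

Calling pair_dellink() on one end queues both devices exactly once, and
a later call on the other end finds a NULL peer and does nothing, which
is the property the commit message relies on.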
Tested:
1. Create two veth pairs (veth1-2, veth3-4) between two netns
(ns1 & ns2).
# ip netns add ns1
# ip netns add ns2
# ip -n ns1 link add veth1 type veth peer veth2 netns ns2
# ip -n ns1 link add veth3 type veth peer veth4 netns ns2
2. Run bpftrace to check that the paired veth devices are NOT
unregistered by the same process
# bpftrace -e '#include <linux/netdevice.h>
kprobe:free_netdev {
$dev = (struct net_device *)arg0;
printf("PID: %d | DEV: %s%s\n", pid, $dev->name, kstack());
}'
3. Remove veth2 in ns2 and check bpftrace output
# ip -n ns2 link del veth2
PID: 2194 | DEV: veth2
free_netdev+5
netdev_run_todo+4798
rtnl_dellink+1507
rtnetlink_rcv_msg+1791
...
PID: 448 | DEV: veth1
free_netdev+5
netdev_run_todo+4798
process_scheduled_works+2538
...
4. Remove ns2 (thus veth4) and check bpftrace output
# ip netns del ns2
PID: 571 | DEV: veth4
free_netdev+5
netdev_run_todo+4798
default_device_exit_batch+2271
ops_undo_list+993
cleanup_net+1122
process_scheduled_works+2538
...
PID: 441 | DEV: veth3
free_netdev+5
netdev_run_todo+4798
process_scheduled_works+2538
...
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
drivers/net/veth.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 1c5142149175..8170bf33ccf9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -77,6 +77,7 @@ struct veth_priv {
struct bpf_prog *_xdp_prog;
struct veth_rq *rq;
unsigned int requested_headroom;
+ netdevice_tracker peer_tracker;
};
struct veth_xdp_tx_bq {
@@ -1901,15 +1902,17 @@ static int veth_newlink(struct net_device *dev,
priv = netdev_priv(dev);
rcu_assign_pointer(priv->peer, peer);
+ netdev_hold(peer, &priv->peer_tracker, GFP_KERNEL);
err = veth_init_queues(dev, tb);
if (err)
goto err_queues;
priv = netdev_priv(peer);
rcu_assign_pointer(priv->peer, dev);
+ netdev_hold(dev, &priv->peer_tracker, GFP_KERNEL);
err = veth_init_queues(peer, tb);
if (err)
- goto err_queues;
+ goto err_peer_queues;
veth_disable_gro(dev);
/* update XDP supported features */
@@ -1918,7 +1921,11 @@ static int veth_newlink(struct net_device *dev,
return 0;
+err_peer_queues:
+ netdev_put(dev, &priv->peer_tracker);
+ priv = netdev_priv(dev);
err_queues:
+ netdev_put(peer, &priv->peer_tracker);
unregister_netdevice(dev);
err_register_dev:
/* nothing to do */
@@ -1933,24 +1940,25 @@ static int veth_newlink(struct net_device *dev,
static void veth_dellink(struct net_device *dev, struct list_head *head)
{
- struct veth_priv *priv;
+ netdevice_tracker *peer_tracker;
struct net_device *peer;
+ struct veth_priv *priv;
priv = netdev_priv(dev);
- peer = rtnl_dereference(priv->peer);
+ peer_tracker = &priv->peer_tracker;
+ peer = unrcu_pointer(xchg(&priv->peer, NULL));
+ if (!peer)
+ return;
- /* Note : dellink() is called from default_device_exit_batch(),
- * before a rcu_synchronize() point. The devices are guaranteed
- * not being freed before one RCU grace period.
- */
- RCU_INIT_POINTER(priv->peer, NULL);
unregister_netdevice_queue(dev, head);
- if (peer) {
- priv = netdev_priv(peer);
- RCU_INIT_POINTER(priv->peer, NULL);
- unregister_netdevice_queue(peer, head);
- }
+ priv = netdev_priv(peer);
+ dev = unrcu_pointer(xchg(&priv->peer, NULL));
+ if (dev)
+ unregister_netdevice_queue_net(dev_net(dev), peer, head);
+
+ netdev_put(peer, peer_tracker);
+ netdev_put(dev, &priv->peer_tracker);
}
static const struct nla_policy veth_policy[VETH_INFO_MAX + 1] = {
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 09/14] bareudp: Protect bareudp_list with mutex.
2026-07-01 21:41 [PATCH v1 net-next 00/14] net: Support per-netns device unregistration Kuniyuki Iwashima
` (7 preceding siblings ...)
2026-07-01 21:41 ` [PATCH v1 net-next 08/14] veth: Support per-netns device unregistration Kuniyuki Iwashima
@ 2026-07-01 21:41 ` Kuniyuki Iwashima
2026-07-01 21:41 ` [PATCH v1 net-next 10/14] bareudp: Support per-netns netdev unregistration Kuniyuki Iwashima
` (5 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
struct bareudp_dev.net is the netns where the backend bareudp
socket resides.
struct bareudp_dev is linked to the bareudp_net.bareudp_list of
the socket's netns.
During netns dismantle or module unload, bareudp_exit_rtnl_net()
iterates the list and queues devices for destruction regardless
of the devices' netns.
Thus, once RTNL is removed, the list can be modified concurrently
from different netns due to device removal.
Let's protect it with per-netns mutex.
bareudp_newlink() is still protected by rtnl_net_lock()s, so
acquiring bn->lock separately in bareudp_find_dev() and
bareudp_configure(), rather than holding it across both, is not
a problem.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
drivers/net/bareudp.c | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index 5ef841c85526..7dedf4867e7b 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -36,6 +36,7 @@ static unsigned int bareudp_net_id;
struct bareudp_net {
struct list_head bareudp_list;
+ struct mutex lock;
};
struct bareudp_conf {
@@ -636,10 +637,15 @@ static struct bareudp_dev *bareudp_find_dev(struct bareudp_net *bn,
{
struct bareudp_dev *bareudp, *t = NULL;
+ mutex_lock(&bn->lock);
+
list_for_each_entry(bareudp, &bn->bareudp_list, next) {
if (conf->port == bareudp->port)
t = bareudp;
}
+
+ mutex_unlock(&bn->lock);
+
return t;
}
@@ -675,7 +681,10 @@ static int bareudp_configure(struct net *net, struct net_device *dev,
if (err)
return err;
+ mutex_lock(&bn->lock);
list_add(&bareudp->next, &bn->bareudp_list);
+ mutex_unlock(&bn->lock);
+
return 0;
}
@@ -692,7 +701,7 @@ static int bareudp_link_config(struct net_device *dev,
return 0;
}
-static void bareudp_dellink(struct net_device *dev, struct list_head *head)
+static void __bareudp_dellink(struct net_device *dev, struct list_head *head)
{
struct bareudp_dev *bareudp = netdev_priv(dev);
@@ -700,6 +709,18 @@ static void bareudp_dellink(struct net_device *dev, struct list_head *head)
unregister_netdevice_queue(dev, head);
}
+static void bareudp_dellink(struct net_device *dev, struct list_head *head)
+{
+ struct bareudp_dev *bareudp = netdev_priv(dev);
+ struct bareudp_net *bn;
+
+ bn = net_generic(bareudp->net, bareudp_net_id);
+
+ mutex_lock(&bn->lock);
+ __bareudp_dellink(dev, head);
+ mutex_unlock(&bn->lock);
+}
+
static int bareudp_newlink(struct net_device *dev,
struct rtnl_newlink_params *params,
struct netlink_ext_ack *extack)
@@ -776,6 +797,8 @@ static __net_init int bareudp_init_net(struct net *net)
struct bareudp_net *bn = net_generic(net, bareudp_net_id);
INIT_LIST_HEAD(&bn->bareudp_list);
+ mutex_init(&bn->lock);
+
return 0;
}
@@ -785,8 +808,12 @@ static void __net_exit bareudp_exit_rtnl_net(struct net *net,
struct bareudp_net *bn = net_generic(net, bareudp_net_id);
struct bareudp_dev *bareudp, *next;
+ mutex_lock(&bn->lock);
+
list_for_each_entry_safe(bareudp, next, &bn->bareudp_list, next)
- bareudp_dellink(bareudp->dev, dev_kill_list);
+ __bareudp_dellink(bareudp->dev, dev_kill_list);
+
+ mutex_unlock(&bn->lock);
}
static struct pernet_operations bareudp_net_ops = {
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 10/14] bareudp: Support per-netns netdev unregistration.
2026-07-01 21:41 [PATCH v1 net-next 00/14] net: Support per-netns device unregistration Kuniyuki Iwashima
` (8 preceding siblings ...)
2026-07-01 21:41 ` [PATCH v1 net-next 09/14] bareudp: Protect bareudp_list with mutex Kuniyuki Iwashima
@ 2026-07-01 21:41 ` Kuniyuki Iwashima
2026-07-01 21:41 ` [PATCH v1 net-next 11/14] ipvlan: Convert ipvl_port.count to refcount_t Kuniyuki Iwashima
` (4 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
bareudp_exit_rtnl_net() iterates bareudp devices whose sockets
are in the dying netns and queues them for destruction.
So the devices may reside in different netns.
Let's use unregister_netdevice_queue_net() to support per-netns
device unregistration.
list_del() is changed to list_del_init() to avoid queueing the
same device twice.
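Why list_del_init() makes the second delete a no-op can be sketched
with a minimal userspace list. This is an illustration only: struct
list_node, struct tunnel and the helper names are hypothetical
stand-ins for the kernel's list_head API and struct bareudp_dev, and
the caller is assumed to hold the equivalent of bn->lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace circular doubly linked list (kernel list_head idea). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_add_head(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink @n and leave it self-linked, so list_is_empty(n) is true afterwards. */
static void list_del_reinit(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical tunnel device; queued counts how often it was queued. */
struct tunnel {
	struct list_node next;
	int queued;
};

/* Returns true if this call queued the device. Caller holds the list lock. */
static bool tunnel_dellink(struct tunnel *t)
{
	if (list_is_empty(&t->next))
		return false; /* already unlinked and queued by another path */

	list_del_reinit(&t->next);
	assert(list_is_empty(&t->next));
	t->queued++;

	return true;
}
```

With a plain unlink that leaves the node's pointers untouched, the
emptiness check on the node itself would not be meaningful, so the
re-initialising variant is what lets the second caller back off.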
Even after bareudp_exit_rtnl_net() queues a cross-netns bareudp
device, bareudp_dellink() could be called concurrently for it
(once RTNL is removed). In such a case, __rtnl_net_unlock() will
perform the unregistration.
Note that bareudp uses register_pernet_subsys() instead of _device(),
so default_device_exit_batch() guarantees that the async per-netns
works are flushed before ->exit().
Tested:
1. Create bareudp device across two netns.
# ip netns add ns1
# ip netns add ns2
# ip -n ns1 link add bareudp0 link-netns ns2 type bareudp \
dstport 9292 ethertype ipv4
2. Run bpftrace to check that bareudp_uninit() is called between
->exit_rtnl() and ->exit().
# bpftrace -e '#include <linux/netdevice.h>
kprobe:bareudp_uninit {
$dev = (struct net_device *)arg0;
printf("PID: %d | DEV: %s%s\n", pid, $dev->name, kstack());
}
kprobe:bareudp_exit_rtnl_net,
kprobe:bareudp_exit_net {
printf("PID: %d%s\n", pid, kstack());
}'
3. Remove the netns where the bareudp socket resides
# ip netns del ns2
Now, we can see bareudp0 is unregistered by per-netns work
instead of cleanup_net() and it finishes before ->exit() to
avoid WARN_ON_ONCE(!list_empty(&bn->bareudp_list)) there.
PID: 576
bareudp_exit_rtnl_net+5
ops_undo_list+702
cleanup_net+1122
process_scheduled_works+2538
...
PID: 470 | DEV: bareudp0
bareudp_uninit+5
unregister_netdevice_many_notify+7129
unregister_netdevice_many_net+1050
rtnl_net_work_func+136
process_scheduled_works+2538
...
PID: 576
bareudp_exit_net+5
ops_undo_list+1064
cleanup_net+1122
process_scheduled_works+2538
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
drivers/net/bareudp.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index 7dedf4867e7b..c3b5ed52d877 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -701,12 +701,13 @@ static int bareudp_link_config(struct net_device *dev,
return 0;
}
-static void __bareudp_dellink(struct net_device *dev, struct list_head *head)
+static void __bareudp_dellink(struct net *net, struct net_device *dev,
+ struct list_head *head)
{
struct bareudp_dev *bareudp = netdev_priv(dev);
- list_del(&bareudp->next);
- unregister_netdevice_queue(dev, head);
+ list_del_init(&bareudp->next);
+ unregister_netdevice_queue_net(net, dev, head);
}
static void bareudp_dellink(struct net_device *dev, struct list_head *head)
@@ -717,7 +718,8 @@ static void bareudp_dellink(struct net_device *dev, struct list_head *head)
bn = net_generic(bareudp->net, bareudp_net_id);
mutex_lock(&bn->lock);
- __bareudp_dellink(dev, head);
+ if (!list_empty(&bareudp->next))
+ __bareudp_dellink(dev_net(dev), dev, head);
mutex_unlock(&bn->lock);
}
@@ -811,14 +813,22 @@ static void __net_exit bareudp_exit_rtnl_net(struct net *net,
mutex_lock(&bn->lock);
list_for_each_entry_safe(bareudp, next, &bn->bareudp_list, next)
- __bareudp_dellink(bareudp->dev, dev_kill_list);
+ __bareudp_dellink(net, bareudp->dev, dev_kill_list);
mutex_unlock(&bn->lock);
}
+static void __net_exit bareudp_exit_net(struct net *net)
+{
+ struct bareudp_net *bn = net_generic(net, bareudp_net_id);
+
+ WARN_ON_ONCE(!list_empty(&bn->bareudp_list));
+}
+
static struct pernet_operations bareudp_net_ops = {
.init = bareudp_init_net,
.exit_rtnl = bareudp_exit_rtnl_net,
+ .exit = bareudp_exit_net,
.id = &bareudp_net_id,
.size = sizeof(struct bareudp_net),
};
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 11/14] ipvlan: Convert ipvl_port.count to refcount_t.
2026-07-01 21:41 [PATCH v1 net-next 00/14] net: Support per-netns device unregistration Kuniyuki Iwashima
` (9 preceding siblings ...)
2026-07-01 21:41 ` [PATCH v1 net-next 10/14] bareudp: Support per-netns netdev unregistration Kuniyuki Iwashima
@ 2026-07-01 21:41 ` Kuniyuki Iwashima
2026-07-01 21:41 ` [PATCH v1 net-next 12/14] ipvlan: Synchronise ipvlan_init() and ipvlan_uninit() for the same lower dev Kuniyuki Iwashima
` (3 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
struct ipvl_port is shared between a lower device and its upper
ipvlan devices.
While each upper device can always access ipvl_port safely via
ipvlan_dev.port, the lower device relies on RTNL to access it
via net_device.rx_handler_data.
Once RTNL is removed, the lower device cannot read ipvl_port safely
in ipvlan_device_event(): when the last ipvlan device is unregistered
in another namespace, the port can be freed concurrently and
net_device.rx_handler_data is set to NULL.
Let's convert ipvl_port.count to refcount_t and use RCU along with
refcount_inc_not_zero() in ipvlan_device_event().
netdev_put() in ipvlan_port_destroy() is also moved down after
cancel_work_sync(), which is the last user of port->dev.
Note that ipvlan->port is now set in ipvlan_init() so that it can
be used in ipvlan_uninit(), instead of ipvlan_port_get_rtnl()
(rtnl_dereference()).
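The "increment unless already zero" step mentioned above can be
sketched in userspace C. This is a simplified illustration, not the
kernel's refcount_t implementation: struct port, port_get_not_zero()
and port_put() are hypothetical names, C11 atomics replace refcount_t,
and the RCU read-side section that keeps the memory valid during the
attempt is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared object; destroyed marks where the free would happen. */
struct port {
	atomic_int count;
	bool destroyed;
};

/* Take a reference only if the object is still live (count != 0). */
static bool port_get_not_zero(struct port *p)
{
	int old = atomic_load(&p->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
	}

	return false; /* the last reference is gone; do not resurrect */
}

/* Drop a reference. Returns true if this call released the last one. */
static bool port_put(struct port *p)
{
	int old = atomic_fetch_sub(&p->count, 1);

	assert(old > 0);
	if (old == 1) {
		p->destroyed = true; /* the real code would destroy the port here */
		return true;
	}

	return false;
}
```

A reader that loses the race against the final put simply sees a
failed get and treats the device as no longer being an ipvlan port,
which matches the early return on a NULL port in the notifier.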
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
drivers/net/ipvlan/ipvlan.h | 2 +-
drivers/net/ipvlan/ipvlan_main.c | 75 ++++++++++++++++++++++----------
2 files changed, 52 insertions(+), 25 deletions(-)
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 80f84fc87008..78f9107fa752 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -96,7 +96,7 @@ struct ipvl_port {
u16 dev_id_start;
struct work_struct wq;
struct sk_buff_head backlog;
- int count;
+ refcount_t count;
struct ida ida;
netdevice_tracker dev_tracker;
};
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index ed46439a9f4e..b4906a8d24ef 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -86,6 +86,7 @@ static int ipvlan_port_create(struct net_device *dev)
goto err;
netdev_hold(dev, &port->dev_tracker, GFP_KERNEL);
+
return 0;
err:
@@ -93,16 +94,18 @@ static int ipvlan_port_create(struct net_device *dev)
return err;
}
-static void ipvlan_port_destroy(struct net_device *dev)
+static void ipvlan_port_destroy(struct ipvl_port *port)
{
- struct ipvl_port *port = ipvlan_port_get_rtnl(dev);
+ struct net_device *dev = port->dev;
struct sk_buff *skb;
- netdev_put(dev, &port->dev_tracker);
if (port->mode == IPVLAN_MODE_L3S)
ipvlan_l3s_unregister(port);
+
netdev_rx_handler_unregister(dev);
cancel_work_sync(&port->wq);
+ netdev_put(dev, &port->dev_tracker);
+
while ((skb = __skb_dequeue(&port->backlog)) != NULL) {
dev_put(skb->dev);
kfree_skb(skb);
@@ -111,6 +114,27 @@ static void ipvlan_port_destroy(struct net_device *dev)
kfree(port);
}
+static void ipvlan_port_put(struct ipvl_port *port)
+{
+ if (refcount_dec_and_test(&port->count))
+ ipvlan_port_destroy(port);
+}
+
+static struct ipvl_port *ipvlan_port_get(struct net_device *dev)
+{
+ struct ipvl_port *port = NULL;
+
+ rcu_read_lock();
+ if (netif_is_ipvlan_port(dev)) {
+ port = ipvlan_port_get_rcu(dev);
+ if (!refcount_inc_not_zero(&port->count))
+ port = NULL;
+ }
+ rcu_read_unlock();
+
+ return port;
+}
+
#define IPVLAN_ALWAYS_ON_OFLOADS \
(NETIF_F_SG | NETIF_F_HW_CSUM | \
NETIF_F_GSO_ROBUST | NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ENCAP_ALL)
@@ -159,24 +183,24 @@ static int ipvlan_init(struct net_device *dev)
free_percpu(ipvlan->pcpu_stats);
return err;
}
+ port = ipvlan_port_get_rtnl(phy_dev);
+ refcount_set(&port->count, 1);
+ } else {
+ port = ipvlan_port_get_rtnl(phy_dev);
+ refcount_inc(&port->count);
}
- port = ipvlan_port_get_rtnl(phy_dev);
- port->count += 1;
+
+ ipvlan->port = port;
+
return 0;
}
static void ipvlan_uninit(struct net_device *dev)
{
struct ipvl_dev *ipvlan = netdev_priv(dev);
- struct net_device *phy_dev = ipvlan->phy_dev;
- struct ipvl_port *port;
free_percpu(ipvlan->pcpu_stats);
-
- port = ipvlan_port_get_rtnl(phy_dev);
- port->count -= 1;
- if (!port->count)
- ipvlan_port_destroy(port->dev);
+ ipvlan_port_put(ipvlan->port);
}
static int ipvlan_open(struct net_device *dev)
@@ -594,9 +618,7 @@ int ipvlan_link_new(struct net_device *dev, struct rtnl_newlink_params *params,
if (err < 0)
return err;
- /* ipvlan_init() would have created the port, if required */
- port = ipvlan_port_get_rtnl(phy_dev);
- ipvlan->port = port;
+ port = ipvlan->port;
/* If the port-id base is at the MAX value, then wrap it around and
* begin from 0x1 again. This may be due to a busy system where lots
@@ -729,14 +751,13 @@ static int ipvlan_device_event(struct notifier_block *unused,
struct netdev_notifier_pre_changeaddr_info *prechaddr_info;
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct ipvl_dev *ipvlan, *next;
+ int err, ret = NOTIFY_DONE;
struct ipvl_port *port;
LIST_HEAD(lst_kill);
- int err;
-
- if (!netif_is_ipvlan_port(dev))
- return NOTIFY_DONE;
- port = ipvlan_port_get_rtnl(dev);
+ port = ipvlan_port_get(dev);
+ if (!port)
+ return ret;
switch (event) {
case NETDEV_UP:
@@ -788,8 +809,10 @@ static int ipvlan_device_event(struct notifier_block *unused,
err = netif_pre_changeaddr_notify(ipvlan->dev,
prechaddr_info->dev_addr,
extack);
- if (err)
- return notifier_from_errno(err);
+ if (err) {
+ ret = notifier_from_errno(err);
+ break;
+ }
}
break;
@@ -802,7 +825,8 @@ static int ipvlan_device_event(struct notifier_block *unused,
case NETDEV_PRE_TYPE_CHANGE:
/* Forbid underlying device to change its type. */
- return NOTIFY_BAD;
+ ret = NOTIFY_BAD;
+ break;
case NETDEV_NOTIFY_PEERS:
case NETDEV_BONDING_FAILOVER:
@@ -810,7 +834,10 @@ static int ipvlan_device_event(struct notifier_block *unused,
list_for_each_entry(ipvlan, &port->ipvlans, pnode)
call_netdevice_notifiers(event, ipvlan->dev);
}
- return NOTIFY_DONE;
+
+ ipvlan_port_put(port);
+
+ return ret;
}
/* the caller must held the addrs lock */
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 12/14] ipvlan: Synchronise ipvlan_init() and ipvlan_uninit() for the same lower dev.
2026-07-01 21:41 [PATCH v1 net-next 00/14] net: Support per-netns device unregistration Kuniyuki Iwashima
` (10 preceding siblings ...)
2026-07-01 21:41 ` [PATCH v1 net-next 11/14] ipvlan: Convert ipvl_port.count to refcount_t Kuniyuki Iwashima
@ 2026-07-01 21:41 ` Kuniyuki Iwashima
2026-07-01 21:41 ` [PATCH v1 net-next 13/14] ipvlan: Protect ipvl_port.ipvlans with mutex Kuniyuki Iwashima
` (2 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
ipvlan_uninit() for the last ipvlan device resets the lower device's
rx_handler_data to NULL.
Once RTNL is removed, ipvlan_init() would race with ipvlan_uninit(),
which could leak a newly allocated ipvl_port.
ipvlan_init() ipvlan_uninit()
| |- if (refcount_dec_and_test(old_port))
... |- ipvlan_port_destroy(old_port)
| '
|- refcount_inc_not_zero(old_port) <-- fails
|- ipvlan_port_create(phy_dev) .
|- new_port = kzalloc() |
|- phy_dev->rx_handler_data = new_port
|- phy_dev->rx_handler_data = NULL
...
`- kfree(old_port);
Let's synchronise the two by holding the lower device's netdev_lock().
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
drivers/net/ipvlan/ipvlan_main.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index b4906a8d24ef..7adad781e9b5 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -177,9 +177,12 @@ static int ipvlan_init(struct net_device *dev)
if (!ipvlan->pcpu_stats)
return -ENOMEM;
+ netdev_lock(phy_dev);
+
if (!netif_is_ipvlan_port(phy_dev)) {
err = ipvlan_port_create(phy_dev);
if (err < 0) {
+ netdev_unlock(phy_dev);
free_percpu(ipvlan->pcpu_stats);
return err;
}
@@ -190,6 +193,8 @@ static int ipvlan_init(struct net_device *dev)
refcount_inc(&port->count);
}
+ netdev_unlock(phy_dev);
+
ipvlan->port = port;
return 0;
@@ -198,9 +203,19 @@ static int ipvlan_init(struct net_device *dev)
static void ipvlan_uninit(struct net_device *dev)
{
struct ipvl_dev *ipvlan = netdev_priv(dev);
+ netdevice_tracker dev_tracker;
+ struct net_device *phy_dev;
free_percpu(ipvlan->pcpu_stats);
+
+ phy_dev = ipvlan->phy_dev;
+ netdev_hold(phy_dev, &dev_tracker, GFP_KERNEL);
+ netdev_lock(phy_dev);
+
ipvlan_port_put(ipvlan->port);
+
+ netdev_unlock(phy_dev);
+ netdev_put(phy_dev, &dev_tracker);
}
static int ipvlan_open(struct net_device *dev)
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 13/14] ipvlan: Protect ipvl_port.ipvlans with mutex.
2026-07-01 21:41 [PATCH v1 net-next 00/14] net: Support per-netns device unregistration Kuniyuki Iwashima
` (11 preceding siblings ...)
2026-07-01 21:41 ` [PATCH v1 net-next 12/14] ipvlan: Synchronise ipvlan_init() and ipvlan_uninit() for the same lower dev Kuniyuki Iwashima
@ 2026-07-01 21:41 ` Kuniyuki Iwashima
2026-07-01 21:41 ` [PATCH v1 net-next 14/14] ipvlan: Support per-netns netdev unregistration Kuniyuki Iwashima
2026-07-02 7:45 ` [syzbot ci] Re: net: Support per-netns device unregistration syzbot ci
14 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
struct ipvl_port is shared between a lower device and its upper
ipvlan devices.
All upper devices are linked to ipvl_port.ipvlans.
Once RTNL is removed, the list can be modified concurrently from
different netns due to device removal.
Let's protect it with a per-port mutex.
NETDEV_PRECHANGEUPPER and NETDEV_CHANGEUPPER are explicitly
skipped to avoid a deadlock when netdev_upper_dev_unlink() is
called while handling NETDEV_UNREGISTER.
Note that __ipvtap_dellink() and struct ipvtap_dev are moved to
ipvlan_main.c and ipvlan.h, respectively, for the case of
CONFIG_IPVLAN=y but CONFIG_IPVTAP=m.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
drivers/net/ipvlan/ipvlan.h | 14 ++++++++-
drivers/net/ipvlan/ipvlan_main.c | 54 +++++++++++++++++++++++++++++---
drivers/net/ipvlan/ipvtap.c | 15 +++------
3 files changed, 67 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 78f9107fa752..a0736f5c89f6 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -16,6 +16,9 @@
#include <linux/if_arp.h>
#include <linux/if_link.h>
#include <linux/if_vlan.h>
+#if IS_ENABLED(CONFIG_IPVTAP)
+#include <linux/if_tap.h>
+#endif
#include <linux/ip.h>
#include <linux/inetdevice.h>
#include <linux/netfilter.h>
@@ -91,6 +94,7 @@ struct ipvl_port {
struct hlist_head hlhead[IPVLAN_HASH_SIZE];
spinlock_t addrs_lock; /* guards hash-table and addrs */
struct list_head ipvlans;
+ struct mutex pnodes_lock;
u16 mode;
u16 flags;
u16 dev_id_start;
@@ -168,7 +172,6 @@ void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
unsigned int len, bool success, bool mcast);
int ipvlan_link_new(struct net_device *dev, struct rtnl_newlink_params *params,
struct netlink_ext_ack *extack);
-void ipvlan_link_delete(struct net_device *dev, struct list_head *head);
void ipvlan_link_setup(struct net_device *dev);
int ipvlan_link_register(struct rtnl_link_ops *ops);
#ifdef CONFIG_IPVLAN_L3S
@@ -207,4 +210,13 @@ static inline bool netif_is_ipvlan_port(const struct net_device *dev)
return rcu_access_pointer(dev->rx_handler) == ipvlan_handle_frame;
}
+#if IS_ENABLED(CONFIG_IPVTAP)
+struct ipvtap_dev {
+ struct ipvl_dev vlan;
+ struct tap_dev tap;
+};
+
+void __ipvtap_dellink(struct net_device *dev, struct list_head *head);
+#endif
+
#endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 7adad781e9b5..41024fe27b78 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -16,6 +16,8 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval,
ASSERT_RTNL();
if (port->mode != nval) {
+ mutex_lock(&port->pnodes_lock);
+
list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
flags = ipvlan->dev->flags;
if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S) {
@@ -40,6 +42,8 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval,
ipvlan_l3s_unregister(port);
}
port->mode = nval;
+
+ mutex_unlock(&port->pnodes_lock);
}
return 0;
@@ -56,6 +60,8 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval,
NULL);
}
+ mutex_unlock(&port->pnodes_lock);
+
return err;
}
@@ -76,6 +82,7 @@ static int ipvlan_port_create(struct net_device *dev)
INIT_HLIST_HEAD(&port->hlhead[idx]);
spin_lock_init(&port->addrs_lock);
+ mutex_init(&port->pnodes_lock);
skb_queue_head_init(&port->backlog);
INIT_WORK(&port->wq, ipvlan_process_multicast);
ida_init(&port->ida);
@@ -676,7 +683,10 @@ int ipvlan_link_new(struct net_device *dev, struct rtnl_newlink_params *params,
if (err)
goto unlink_netdev;
+ mutex_lock(&port->pnodes_lock);
list_add_tail_rcu(&ipvlan->pnode, &port->ipvlans);
+ mutex_unlock(&port->pnodes_lock);
+
netif_stacked_transfer_operstate(phy_dev, dev);
return 0;
@@ -690,7 +700,7 @@ int ipvlan_link_new(struct net_device *dev, struct rtnl_newlink_params *params,
}
EXPORT_SYMBOL_GPL(ipvlan_link_new);
-void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
+static void __ipvlan_link_delete(struct net_device *dev, struct list_head *head)
{
struct ipvl_dev *ipvlan = netdev_priv(dev);
struct ipvl_addr *addr, *next;
@@ -708,7 +718,27 @@ void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
unregister_netdevice_queue(dev, head);
netdev_upper_dev_unlink(ipvlan->phy_dev, dev);
}
-EXPORT_SYMBOL_GPL(ipvlan_link_delete);
+
+static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
+{
+ struct ipvl_dev *ipvlan = netdev_priv(dev);
+
+ mutex_lock(&ipvlan->port->pnodes_lock);
+ __ipvlan_link_delete(dev, head);
+ mutex_unlock(&ipvlan->port->pnodes_lock);
+}
+
+#if IS_ENABLED(CONFIG_IPVTAP)
+void __ipvtap_dellink(struct net_device *dev, struct list_head *head)
+{
+ struct ipvtap_dev *vlantap = netdev_priv(dev);
+
+ netdev_rx_handler_unregister(dev);
+ tap_del_queues(&vlantap->tap);
+ __ipvlan_link_delete(dev, head);
+}
+EXPORT_SYMBOL_GPL(__ipvtap_dellink);
+#endif
void ipvlan_link_setup(struct net_device *dev)
{
@@ -770,10 +800,16 @@ static int ipvlan_device_event(struct notifier_block *unused,
struct ipvl_port *port;
LIST_HEAD(lst_kill);
+ if (event == NETDEV_PRECHANGEUPPER ||
+ event == NETDEV_CHANGEUPPER)
+ return ret;
+
port = ipvlan_port_get(dev);
if (!port)
return ret;
+ mutex_lock(&port->pnodes_lock);
+
switch (event) {
case NETDEV_UP:
case NETDEV_DOWN:
@@ -800,9 +836,15 @@ static int ipvlan_device_event(struct notifier_block *unused,
if (dev->reg_state != NETREG_UNREGISTERING)
break;
- list_for_each_entry_safe(ipvlan, next, &port->ipvlans, pnode)
- ipvlan->dev->rtnl_link_ops->dellink(ipvlan->dev,
- &lst_kill);
+ list_for_each_entry_safe(ipvlan, next, &port->ipvlans, pnode) {
+#if IS_ENABLED(CONFIG_IPVTAP)
+ if (ipvlan->dev->rtnl_link_ops != &ipvlan_link_ops)
+ __ipvtap_dellink(ipvlan->dev, &lst_kill);
+ else
+#endif
+ __ipvlan_link_delete(ipvlan->dev, &lst_kill);
+ }
+
unregister_netdevice_many(&lst_kill);
break;
@@ -850,6 +892,8 @@ static int ipvlan_device_event(struct notifier_block *unused,
call_netdevice_notifiers(event, ipvlan->dev);
}
+ mutex_unlock(&port->pnodes_lock);
+
ipvlan_port_put(port);
return ret;
diff --git a/drivers/net/ipvlan/ipvtap.c b/drivers/net/ipvlan/ipvtap.c
index 2d6bbddd1edd..17b0dd7cf73b 100644
--- a/drivers/net/ipvlan/ipvtap.c
+++ b/drivers/net/ipvlan/ipvtap.c
@@ -2,7 +2,6 @@
#include <linux/etherdevice.h>
#include "ipvlan.h"
#include <linux/if_vlan.h>
-#include <linux/if_tap.h>
#include <linux/interrupt.h>
#include <linux/nsproxy.h>
#include <linux/compat.h>
@@ -43,11 +42,6 @@ static struct class ipvtap_class = {
.namespace = ipvtap_net_namespace,
};
-struct ipvtap_dev {
- struct ipvl_dev vlan;
- struct tap_dev tap;
-};
-
static void ipvtap_count_tx_dropped(struct tap_dev *tap)
{
struct ipvtap_dev *vlantap = container_of(tap, struct ipvtap_dev, tap);
@@ -112,11 +106,12 @@ static int ipvtap_newlink(struct net_device *dev,
static void ipvtap_dellink(struct net_device *dev,
struct list_head *head)
{
- struct ipvtap_dev *vlan = netdev_priv(dev);
+ struct ipvtap_dev *vlantap = netdev_priv(dev);
+ struct ipvl_port *port = vlantap->vlan.port;
- netdev_rx_handler_unregister(dev);
- tap_del_queues(&vlan->tap);
- ipvlan_link_delete(dev, head);
+ mutex_lock(&port->pnodes_lock);
+ __ipvtap_dellink(dev, head);
+ mutex_unlock(&port->pnodes_lock);
}
static void ipvtap_setup(struct net_device *dev)
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v1 net-next 14/14] ipvlan: Support per-netns netdev unregistration.
2026-07-01 21:41 [PATCH v1 net-next 00/14] net: Support per-netns device unregistration Kuniyuki Iwashima
` (12 preceding siblings ...)
2026-07-01 21:41 ` [PATCH v1 net-next 13/14] ipvlan: Protect ipvl_port.ipvlans with mutex Kuniyuki Iwashima
@ 2026-07-01 21:41 ` Kuniyuki Iwashima
2026-07-02 7:45 ` [syzbot ci] Re: net: Support per-netns device unregistration syzbot ci
14 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2026-07-01 21:41 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
When a lower device is unregistered, its upper ipvlan devices
must also be unregistered. However, these upper devices may
reside in a different netns from the lower device.
Let's use unregister_netdevice_queue_net() to support per-netns
device unregistration for ipvlan.
The new dying flag in struct ipvl_dev is used to avoid a race
where ipvlan_link_delete() is called while its lower device is
being removed in ipvlan_device_event().
If dying is true in ipvlan_link_delete(), the ipvlan device has
already been torn down but is not yet unregistered. In this case,
unregistration will be done in __rtnl_net_unlock() of the
->dellink() caller.
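The dying-flag idea can be illustrated with a small userspace sketch. This is not the kernel code: struct port, struct upper, and every function name below are hypothetical stand-ins, and a pthread mutex plays the role of pnodes_lock. It only shows that whichever path runs second sees the flag under the lock and skips the teardown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ipvl_port / ipvl_dev; not the kernel structs. */
struct port {
	pthread_mutex_t lock;	/* plays the role of pnodes_lock */
};

struct upper {
	struct port *port;
	bool dying;		/* set by the lower device's teardown path */
	int teardown_count;	/* how many times teardown actually ran */
};

/* Caller must hold port->lock. */
static void upper_teardown_locked(struct upper *u)
{
	u->teardown_count++;
}

/* Path A: the lower device is going away, so tear down its upper device. */
static void lower_unregister_event(struct upper *u)
{
	pthread_mutex_lock(&u->port->lock);
	u->dying = true;
	upper_teardown_locked(u);
	pthread_mutex_unlock(&u->port->lock);
}

/* Path B: explicit delete request; skip if path A already did the work. */
static void upper_dellink(struct upper *u)
{
	pthread_mutex_lock(&u->port->lock);
	if (!u->dying)
		upper_teardown_locked(u);
	pthread_mutex_unlock(&u->port->lock);
}
```

Because the flag is written and read under the same mutex, path B cannot observe a half-finished teardown from path A.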
Tested:
1. Create veth in ns1 and two ipvlan devices in ns2 and ns3.
# ip netns add ns1
# ip netns add ns2
# ip netns add ns3
# ip -n ns1 link add veth0 type veth peer veth1
# ip -n ns2 link add ipvl2 link veth0 link-netns ns1 type ipvlan mode l2
# ip -n ns3 link add ipvl3 link veth0 link-netns ns1 type ipvlan mode l2
2. Run bpftrace to check that veth is unregistered first but
waits for ipvlan to be unregistered
# bpftrace -e '#include <linux/netdevice.h>
kprobe:ipvlan_uninit,
kprobe:veth_dellink,
kprobe:free_netdev {
$dev = (struct net_device *)arg0;
printf("PID: %d | DEV: %s%s\n", pid, $dev->name, kstack());
}'
3. Remove the lower veth0 in ns1.
# ip -n ns1 link del veth0
We can see that veth0 is freed after unregistering ipvl2 and ipvl3
in per-netns work because ipvl_port holds a refcount on veth0.
PID: 2010 | DEV: veth0
veth_dellink+5
rtnl_dellink+1213
rtnetlink_rcv_msg+1791
...
PID: 440 | DEV: ipvl2
ipvlan_uninit+5
unregister_netdevice_many_notify+7129
unregister_netdevice_many_net+1050
rtnl_net_work_func+136
process_scheduled_works+2538
...
PID: 440 | DEV: ipvl2
free_netdev+5
netdev_run_todo+4798
process_scheduled_works+2538
...
PID: 440 | DEV: ipvl3
ipvlan_uninit+5
unregister_netdevice_many_notify+7129
unregister_netdevice_many_net+1050
rtnl_net_work_func+136
process_scheduled_works+2538
...
PID: 2010 | DEV: veth0
free_netdev+5
netdev_run_todo+4798
rtnl_dellink+1507
rtnetlink_rcv_msg+1791
...
PID: 440 | DEV: ipvl3
free_netdev+5
netdev_run_todo+4798
process_scheduled_works+2538
...
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
drivers/net/ipvlan/ipvlan.h | 4 +++-
drivers/net/ipvlan/ipvlan_main.c | 25 ++++++++++++++++---------
drivers/net/ipvlan/ipvtap.c | 3 ++-
3 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index a0736f5c89f6..a83313244add 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -72,6 +72,7 @@ struct ipvl_dev {
DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
netdev_features_t sfeatures;
u32 msg_enable;
+ bool dying;
};
struct ipvl_addr {
@@ -216,7 +217,8 @@ struct ipvtap_dev {
struct tap_dev tap;
};
-void __ipvtap_dellink(struct net_device *dev, struct list_head *head);
+void __ipvtap_dellink(struct net *net, struct net_device *dev,
+ struct list_head *head);
#endif
#endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 41024fe27b78..7e2cf43ca78a 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -700,7 +700,8 @@ int ipvlan_link_new(struct net_device *dev, struct rtnl_newlink_params *params,
}
EXPORT_SYMBOL_GPL(ipvlan_link_new);
-static void __ipvlan_link_delete(struct net_device *dev, struct list_head *head)
+static void __ipvlan_link_delete(struct net *net, struct net_device *dev,
+ struct list_head *head)
{
struct ipvl_dev *ipvlan = netdev_priv(dev);
struct ipvl_addr *addr, *next;
@@ -715,7 +716,7 @@ static void __ipvlan_link_delete(struct net_device *dev, struct list_head *head)
ida_free(&ipvlan->port->ida, dev->dev_id);
list_del_rcu(&ipvlan->pnode);
- unregister_netdevice_queue(dev, head);
+ unregister_netdevice_queue_net(net, dev, head);
netdev_upper_dev_unlink(ipvlan->phy_dev, dev);
}
@@ -724,18 +725,20 @@ static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
struct ipvl_dev *ipvlan = netdev_priv(dev);
mutex_lock(&ipvlan->port->pnodes_lock);
- __ipvlan_link_delete(dev, head);
+ if (!ipvlan->dying)
+ __ipvlan_link_delete(dev_net(dev), dev, head);
mutex_unlock(&ipvlan->port->pnodes_lock);
}
#if IS_ENABLED(CONFIG_IPVTAP)
-void __ipvtap_dellink(struct net_device *dev, struct list_head *head)
+void __ipvtap_dellink(struct net *net, struct net_device *dev,
+ struct list_head *head)
{
struct ipvtap_dev *vlantap = netdev_priv(dev);
netdev_rx_handler_unregister(dev);
tap_del_queues(&vlantap->tap);
- __ipvlan_link_delete(dev, head);
+ __ipvlan_link_delete(net, dev, head);
}
EXPORT_SYMBOL_GPL(__ipvtap_dellink);
#endif
@@ -832,22 +835,26 @@ static int ipvlan_device_event(struct notifier_block *unused,
ipvlan_migrate_l3s_hook(oldnet, newnet);
break;
}
- case NETDEV_UNREGISTER:
+ case NETDEV_UNREGISTER: {
+ struct net *net = dev_net(dev);
+
if (dev->reg_state != NETREG_UNREGISTERING)
break;
list_for_each_entry_safe(ipvlan, next, &port->ipvlans, pnode) {
+ ipvlan->dying = true;
+
#if IS_ENABLED(CONFIG_IPVTAP)
if (ipvlan->dev->rtnl_link_ops != &ipvlan_link_ops)
- __ipvtap_dellink(ipvlan->dev, &lst_kill);
+ __ipvtap_dellink(net, ipvlan->dev, &lst_kill);
else
#endif
- __ipvlan_link_delete(ipvlan->dev, &lst_kill);
+ __ipvlan_link_delete(net, ipvlan->dev, &lst_kill);
}
unregister_netdevice_many(&lst_kill);
break;
-
+ }
case NETDEV_FEAT_CHANGE:
list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
netif_inherit_tso_max(ipvlan->dev, dev);
diff --git a/drivers/net/ipvlan/ipvtap.c b/drivers/net/ipvlan/ipvtap.c
index 17b0dd7cf73b..b790959c03f5 100644
--- a/drivers/net/ipvlan/ipvtap.c
+++ b/drivers/net/ipvlan/ipvtap.c
@@ -110,7 +110,8 @@ static void ipvtap_dellink(struct net_device *dev,
struct ipvl_port *port = vlantap->vlan.port;
mutex_lock(&port->pnodes_lock);
- __ipvtap_dellink(dev, head);
+ if (!vlantap->vlan.dying)
+ __ipvtap_dellink(dev_net(dev), dev, head);
mutex_unlock(&port->pnodes_lock);
}
--
2.55.0.rc0.799.gd6f94ed593-goog
* [syzbot ci] Re: net: Support per-netns device unregistration
2026-07-01 21:41 [PATCH v1 net-next 00/14] net: Support per-netns device unregistration Kuniyuki Iwashima
` (13 preceding siblings ...)
2026-07-01 21:41 ` [PATCH v1 net-next 14/14] ipvlan: Support per-netns netdev unregistration Kuniyuki Iwashima
@ 2026-07-02 7:45 ` syzbot ci
2026-07-02 21:59 ` Kuniyuki Iwashima
14 siblings, 1 reply; 17+ messages in thread
From: syzbot ci @ 2026-07-02 7:45 UTC (permalink / raw)
To: andrew, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
pabeni
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] net: Support per-netns device unregistration
https://lore.kernel.org/all/20260701214334.266991-1-kuniyu@google.com
* [PATCH v1 net-next 01/14] rtnetlink: Lock sock_net(skb->sk) in rtnl_newlink().
* [PATCH v1 net-next 02/14] rtnetlink: Call unregister_netdevice_many() only once in rtnl_link_unregister().
* [PATCH v1 net-next 03/14] rtnetlink: Add per-netns rtnl_work.
* [PATCH v1 net-next 04/14] net: Wrap default_device_exit_net() with __rtnl_net_lock().
* [PATCH v1 net-next 05/14] net: Hold __rtnl_net_lock() in netdev_wait_allrefs_any().
* [PATCH v1 net-next 06/14] net: Add per-netns netdev unregistration infra.
* [PATCH v1 net-next 07/14] net: Call unregister_netdevice_many() per netns.
* [PATCH v1 net-next 08/14] veth: Support per-netns device unregistration.
* [PATCH v1 net-next 09/14] bareudp: Protect bareudp_list with mutex.
* [PATCH v1 net-next 10/14] bareudp: Support per-netns netdev unregistration.
* [PATCH v1 net-next 11/14] ipvlan: Convert ipvl_port.count to refcount_t.
* [PATCH v1 net-next 12/14] ipvlan: Synchronise ipvlan_init() and ipvlan_uninit() for the same lower dev.
* [PATCH v1 net-next 13/14] ipvlan: Protect ipvl_port.ipvlans with mutex.
* [PATCH v1 net-next 14/14] ipvlan: Support per-netns netdev unregistration.
and found the following issue:
possible deadlock in __dev_change_net_namespace
Full report is available here:
https://ci.syzbot.org/series/a744b257-d741-4780-8a53-f156b2a7afc9
***
possible deadlock in __dev_change_net_namespace
tree: net-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base: d6e81529749190123aa0040626c7e5dbc20fdc9a
arch: amd64
compiler: Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
config: https://ci.syzbot.org/builds/243cd0ec-28f9-4d21-8f16-3d2fbad8388d/config
syz repro: https://ci.syzbot.org/findings/a8a0740d-fdec-4a20-9aa5-7cb955707913/syz_repro
veth0_macvtap: left promiscuous mode
============================================
WARNING: possible recursive locking detected
syzkaller #0 Not tainted
--------------------------------------------
syz.1.18/5814 is trying to acquire lock:
ffffffff9a9b0418 (&net->dev_unreg_lock){+.+.}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
ffffffff9a9b0418 (&net->dev_unreg_lock){+.+.}-{3:3}, at: unregister_netdevice_move_net net/core/dev.c:-1 [inline]
ffffffff9a9b0418 (&net->dev_unreg_lock){+.+.}-{3:3}, at: __dev_change_net_namespace+0x1479/0x2200 net/core/dev.c:12768
but task is already holding lock:
ffff88810b6cf8d8 (&net->dev_unreg_lock){+.+.}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
ffff88810b6cf8d8 (&net->dev_unreg_lock){+.+.}-{3:3}, at: unregister_netdevice_move_net net/core/dev.c:-1 [inline]
ffff88810b6cf8d8 (&net->dev_unreg_lock){+.+.}-{3:3}, at: __dev_change_net_namespace+0x146a/0x2200 net/core/dev.c:12768
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&net->dev_unreg_lock);
lock(&net->dev_unreg_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by syz.1.18/5814:
#0: ffffffff8fdeb0c0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
#0: ffffffff8fdeb0c0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x2a/0x2b0 net/core/rtnetlink.c:366
#1: ffffffff9a9b0380 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x76/0x2b0 net/core/rtnetlink.c:369
#2: ffff88810b6cf840 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0xbf/0x2b0 net/core/rtnetlink.c:369
#3: ffff88810b6cf8d8 (&net->dev_unreg_lock){+.+.}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
#3: ffff88810b6cf8d8 (&net->dev_unreg_lock){+.+.}-{3:3}, at: unregister_netdevice_move_net net/core/dev.c:-1 [inline]
#3: ffff88810b6cf8d8 (&net->dev_unreg_lock){+.+.}-{3:3}, at: __dev_change_net_namespace+0x146a/0x2200 net/core/dev.c:12768
stack backtrace:
CPU: 0 UID: 0 PID: 5814 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_deadlock_bug+0x279/0x290 kernel/locking/lockdep.c:3041
check_deadlock kernel/locking/lockdep.c:3093 [inline]
validate_chain kernel/locking/lockdep.c:3895 [inline]
__lock_acquire+0x24df/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
__raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:158
spin_lock include/linux/spinlock.h:342 [inline]
unregister_netdevice_move_net net/core/dev.c:-1 [inline]
__dev_change_net_namespace+0x1479/0x2200 net/core/dev.c:12768
do_setlink+0x2d1/0x4670 net/core/rtnetlink.c:3148
rtnl_changelink net/core/rtnetlink.c:3880 [inline]
__rtnl_newlink net/core/rtnetlink.c:4053 [inline]
rtnl_newlink+0x1612/0x1b50 net/core/rtnetlink.c:4192
rtnetlink_rcv_msg+0x802/0xc00 net/core/rtnetlink.c:7109
netlink_rcv_skb+0x226/0x4a0 net/netlink/af_netlink.c:2556
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x7bb/0x940 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1900
sock_sendmsg_nosec+0x13a/0x180 net/socket.c:775
__sock_sendmsg net/socket.c:790 [inline]
____sys_sendmsg+0x54e/0x850 net/socket.c:2684
___sys_sendmsg+0x2a5/0x360 net/socket.c:2738
__sys_sendmsg net/socket.c:2770 [inline]
__do_sys_sendmsg net/socket.c:2775 [inline]
__se_sys_sendmsg net/socket.c:2773 [inline]
__x64_sys_sendmsg+0x1b1/0x290 net/socket.c:2773
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fbdc639ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fbdc59fe028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fbdc6615fa0 RCX: 00007fbdc639ce59
RDX: 0000000000000000 RSI: 0000200000000740 RDI: 0000000000000003
RBP: 00007fbdc6432e6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fbdc6616038 R14: 00007fbdc6615fa0 R15: 00007ffd94dcb1f8
</TASK>
syz.1.18 (5814) used greatest stack depth: 19864 bytes left
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
* Re: [syzbot ci] Re: net: Support per-netns device unregistration
2026-07-02 7:45 ` [syzbot ci] Re: net: Support per-netns device unregistration syzbot ci
@ 2026-07-02 21:59 ` Kuniyuki Iwashima
0 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2026-07-02 21:59 UTC (permalink / raw)
To: syzbot+ci052b96c9bf56ca1d
Cc: andrew, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
pabeni, syzbot, syzkaller-bugs
From: syzbot ci <syzbot+ci052b96c9bf56ca1d@syzkaller.appspotmail.com>
Date: Thu, 02 Jul 2026 00:45:10 -0700
> syzbot ci has tested the following series
>
> [v1] net: Support per-netns device unregistration
> https://lore.kernel.org/all/20260701214334.266991-1-kuniyu@google.com
> * [PATCH v1 net-next 01/14] rtnetlink: Lock sock_net(skb->sk) in rtnl_newlink().
> * [PATCH v1 net-next 02/14] rtnetlink: Call unregister_netdevice_many() only once in rtnl_link_unregister().
> * [PATCH v1 net-next 03/14] rtnetlink: Add per-netns rtnl_work.
> * [PATCH v1 net-next 04/14] net: Wrap default_device_exit_net() with __rtnl_net_lock().
> * [PATCH v1 net-next 05/14] net: Hold __rtnl_net_lock() in netdev_wait_allrefs_any().
> * [PATCH v1 net-next 06/14] net: Add per-netns netdev unregistration infra.
> * [PATCH v1 net-next 07/14] net: Call unregister_netdevice_many() per netns.
> * [PATCH v1 net-next 08/14] veth: Support per-netns device unregistration.
> * [PATCH v1 net-next 09/14] bareudp: Protect bareudp_list with mutex.
> * [PATCH v1 net-next 10/14] bareudp: Support per-netns netdev unregistration.
> * [PATCH v1 net-next 11/14] ipvlan: Convert ipvl_port.count to refcount_t.
> * [PATCH v1 net-next 12/14] ipvlan: Synchronise ipvlan_init() and ipvlan_uninit() for the same lower dev.
> * [PATCH v1 net-next 13/14] ipvlan: Protect ipvl_port.ipvlans with mutex.
> * [PATCH v1 net-next 14/14] ipvlan: Support per-netns netdev unregistration.
>
> and found the following issue:
> possible deadlock in __dev_change_net_namespace
>
> Full report is available here:
> https://ci.syzbot.org/series/a744b257-d741-4780-8a53-f156b2a7afc9
>
> ***
>
> possible deadlock in __dev_change_net_namespace
>
> tree: net-next
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
> base: d6e81529749190123aa0040626c7e5dbc20fdc9a
> arch: amd64
> compiler: Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
> config: https://ci.syzbot.org/builds/243cd0ec-28f9-4d21-8f16-3d2fbad8388d/config
> syz repro: https://ci.syzbot.org/findings/a8a0740d-fdec-4a20-9aa5-7cb955707913/syz_repro
>
> veth0_macvtap: left promiscuous mode
> ============================================
> WARNING: possible recursive locking detected
> syzkaller #0 Not tainted
> --------------------------------------------
> syz.1.18/5814 is trying to acquire lock:
> ffffffff9a9b0418 (&net->dev_unreg_lock){+.+.}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
> ffffffff9a9b0418 (&net->dev_unreg_lock){+.+.}-{3:3}, at: unregister_netdevice_move_net net/core/dev.c:-1 [inline]
> ffffffff9a9b0418 (&net->dev_unreg_lock){+.+.}-{3:3}, at: __dev_change_net_namespace+0x1479/0x2200 net/core/dev.c:12768
>
> but task is already holding lock:
> ffff88810b6cf8d8 (&net->dev_unreg_lock){+.+.}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
> ffff88810b6cf8d8 (&net->dev_unreg_lock){+.+.}-{3:3}, at: unregister_netdevice_move_net net/core/dev.c:-1 [inline]
> ffff88810b6cf8d8 (&net->dev_unreg_lock){+.+.}-{3:3}, at: __dev_change_net_namespace+0x146a/0x2200 net/core/dev.c:12768
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&net->dev_unreg_lock);
> lock(&net->dev_unreg_lock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
Oh right, I'll squash this into patch 6.
---8<---
diff --git a/net/core/dev.c b/net/core/dev.c
index 57fb4741d0ac..fcd58c2aa030 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -12574,10 +12574,10 @@ static void unregister_netdevice_move_net(struct net *net_old,
{
if (net_old > net) {
spin_lock(&net->dev_unreg_lock);
- spin_lock(&net_old->dev_unreg_lock);
+ spin_lock_nested(&net_old->dev_unreg_lock, SINGLE_DEPTH_NESTING);
} else {
spin_lock(&net_old->dev_unreg_lock);
- spin_lock(&net->dev_unreg_lock);
+ spin_lock_nested(&net->dev_unreg_lock, SINGLE_DEPTH_NESTING);
}
if (!list_empty(&dev->unreg_list_net)) {
---8<---
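For readers unfamiliar with the pattern in the hunk above, here is a userspace sketch of taking two locks of the same class in address order. It assumes pthread mutexes in place of spinlocks, and struct ns and all function names are hypothetical. Userspace has no lockdep, so the spin_lock_nested() annotation appears only as a comment; the address ordering is what actually prevents an ABBA deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a netns with its own unregistration lock. */
struct ns {
	pthread_mutex_t lock;
	int nr_queued;
};

/* Lock two distinct namespaces, always lower address first. */
static void ns_lock_pair(struct ns *a, struct ns *b)
{
	assert(a != b);	/* locking the same mutex twice would self-deadlock */

	if ((uintptr_t)a > (uintptr_t)b) {
		struct ns *tmp = a;

		a = b;
		b = tmp;
	}

	pthread_mutex_lock(&a->lock);
	/* The kernel would annotate this second acquisition with
	 * spin_lock_nested(..., SINGLE_DEPTH_NESTING) for lockdep.
	 */
	pthread_mutex_lock(&b->lock);
}

static void ns_unlock_pair(struct ns *a, struct ns *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}

/* Move one queued item between namespaces under both locks. */
static void ns_move_one(struct ns *from, struct ns *to)
{
	ns_lock_pair(from, to);
	from->nr_queued--;
	to->nr_queued++;
	ns_unlock_pair(from, to);
}
```

Callers may pass the two namespaces in either order; the helper normalises the order, so two threads moving items in opposite directions still acquire the locks in the same sequence.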