From: Kuniyuki Iwashima <kuniyu@google.com>
To: "David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Simon Horman <horms@kernel.org>,
Kuniyuki Iwashima <kuniyu@google.com>,
Kuniyuki Iwashima <kuni1840@gmail.com>,
netdev@vger.kernel.org
Subject: [PATCH v2 net-next 08/14] veth: Support per-netns device unregistration.
Date: Fri, 3 Jul 2026 00:09:19 +0000
Message-ID: <20260703001009.1572444-9-kuniyu@google.com>
In-Reply-To: <20260703001009.1572444-1-kuniyu@google.com>

Currently, veth_dellink() unregisters both the local and peer devices
synchronously under RTNL.

Once RTNL is removed, veth_dellink() can be called concurrently from
different netns, e.g. for both ends of a pair at once.

Let's use xchg() and unregister_netdevice_queue_net() to support
per-netns device unregistration.

Each caller clears priv->peer with xchg() and queues a device only if
the xchg() returned a non-NULL pointer.  This way, each device is
queued for destruction only once by the winner of the race.
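
For readers who want to see the claim-once idea in isolation, here is a
minimal userspace sketch.  It is not the kernel code: struct fake_dev and
every function below are hypothetical stand-ins, C11 atomic_exchange() plays
the role of xchg(), and the race is simulated on a single thread by running
the two exchanges of each caller in an interleaved order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a veth device; not the kernel structure. */
struct fake_dev {
	_Atomic(struct fake_dev *) peer;	/* models priv->peer */
	int queued;				/* times queued for unregistration */
};

static void fake_pair(struct fake_dev *a, struct fake_dev *b)
{
	atomic_init(&a->peer, b);
	atomic_init(&b->peer, a);
	a->queued = 0;
	b->queued = 0;
}

/* 1st exchange: only the caller that sees a non-NULL peer queues @dev. */
static struct fake_dev *claim_self(struct fake_dev *dev)
{
	struct fake_dev *peer = atomic_exchange(&dev->peer, NULL);

	if (peer)
		dev->queued++;
	return peer;
}

/* 2nd exchange: queue @peer too, unless its own deleter already claimed it. */
static void claim_peer(struct fake_dev *peer)
{
	if (peer && atomic_exchange(&peer->peer, NULL))
		peer->queued++;
}

static void fake_dellink(struct fake_dev *dev)
{
	claim_peer(claim_self(dev));
}

/* One end is deleted, then a late request arrives for the other end. */
int claim_demo_sequential(void)
{
	struct fake_dev a, b;

	fake_pair(&a, &b);
	fake_dellink(&a);	/* queues both a and b */
	fake_dellink(&b);	/* finds b.peer == NULL and does nothing */
	assert(a.queued == 1 && b.queued == 1);
	return a.queued + b.queued;
}

/* Both ends are deleted at once; the exchanges of the two callers interleave. */
int claim_demo_interleaved(void)
{
	struct fake_dev a, b, *pa, *pb;

	fake_pair(&a, &b);
	pa = claim_self(&a);	/* caller A queues a */
	pb = claim_self(&b);	/* caller B queues b */
	claim_peer(pa);		/* A loses the race for b */
	claim_peer(pb);		/* B loses the race for a */
	assert(a.queued == 1 && b.queued == 1);
	return a.queued + b.queued;
}
```

Calling either demo function from a main() should leave each device queued
exactly once, whichever caller wins each exchange.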
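
Note that the extra netdev_hold() ensures that @peer obtained by
the first xchg() is not freed during the subsequent access to
netdev_priv(peer).  The 2nd xchg() overwrites @dev, which becomes
NULL if the other side has already cleared it, so that the final
netdev_put(dev, ...) drops the reference only when this caller
obtained it.  This balances the refcount.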
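
The reference accounting can be sketched the same way in userspace.  Again
this is only an illustration under stated assumptions: struct ref_dev and the
ref_*() helpers are hypothetical, a plain counter stands in for the tracked
netdev reference, and ref_put() treats NULL as a no-op on the assumption that
netdev_put() does the same.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a veth device; only the fields needed here. */
struct ref_dev {
	_Atomic(struct ref_dev *) peer;	/* models priv->peer */
	atomic_int refs;		/* references its peer holds on this device */
};

static void ref_hold(struct ref_dev *d)
{
	atomic_fetch_add(&d->refs, 1);
}

/* NULL is a no-op here, which this sketch assumes netdev_put() also allows. */
static void ref_put(struct ref_dev *d)
{
	if (d)
		atomic_fetch_sub(&d->refs, 1);
}

/* Pair two devices; each side takes a reference on the other. */
static void ref_pair(struct ref_dev *a, struct ref_dev *b)
{
	atomic_init(&a->refs, 0);
	atomic_init(&b->refs, 0);
	atomic_init(&a->peer, b);
	ref_hold(b);
	atomic_init(&b->peer, a);
	ref_hold(a);
}

/* 1st exchange: take over dev->peer together with the reference it carries. */
static struct ref_dev *ref_del_begin(struct ref_dev *dev)
{
	return atomic_exchange(&dev->peer, NULL);
}

/* 2nd exchange plus the two puts; @peer is the result of ref_del_begin(). */
static void ref_del_finish(struct ref_dev *peer)
{
	struct ref_dev *dev;

	if (!peer)
		return;
	/* Safe to touch @peer: the reference claimed in step 1 is still held. */
	dev = atomic_exchange(&peer->peer, NULL);
	ref_put(peer);	/* reference claimed by the 1st exchange */
	ref_put(dev);	/* claimed by the 2nd exchange, or NULL if the other side won */
}

/* One caller deletes the pair alone; returns the references left over. */
int ref_demo_single(void)
{
	struct ref_dev a, b;

	ref_pair(&a, &b);
	ref_del_finish(ref_del_begin(&a));
	return atomic_load(&a.refs) + atomic_load(&b.refs);
}

/* Both ends are deleted at once, with the two halves interleaved. */
int ref_demo_interleaved(void)
{
	struct ref_dev a, b, *pa, *pb;

	ref_pair(&a, &b);
	pa = ref_del_begin(&a);		/* caller A claims the reference on b */
	pb = ref_del_begin(&b);		/* caller B claims the reference on a */
	/* Both devices are still pinned while their private data is accessed. */
	assert(atomic_load(&a.refs) == 1 && atomic_load(&b.refs) == 1);
	ref_del_finish(pa);		/* 2nd exchange on b returns NULL */
	ref_del_finish(pb);		/* 2nd exchange on a returns NULL */
	return atomic_load(&a.refs) + atomic_load(&b.refs);
}
```

In both orderings every reference taken at pairing time is dropped exactly
once, so both demo functions should report zero leftover references.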

Tested:

  1. Create two veth pairs (veth1-2, veth3-4) between two netns
     (ns1 & ns2).

    # ip netns add ns1
    # ip netns add ns2
    # ip -n ns1 link add veth1 type veth peer veth2 netns ns2
    # ip -n ns1 link add veth3 type veth peer veth4 netns ns2

  2. Run bpftrace to check if the same process does NOT
     unregister the paired veth devices

    # bpftrace -e '#include <linux/netdevice.h>
    kprobe:free_netdev {
      $dev = (struct net_device *)arg0;
      printf("PID: %d | DEV: %s%s\n", pid, $dev->name, kstack());
    }'

  3. Remove veth2 in ns2 and check bpftrace output

    # ip -n ns2 link del veth2

    PID: 2194 | DEV: veth2
        free_netdev+5
        netdev_run_todo+4798
        rtnl_dellink+1507
        rtnetlink_rcv_msg+1791
        ...
    PID: 448 | DEV: veth1
        free_netdev+5
        netdev_run_todo+4798
        process_scheduled_works+2538
        ...

  4. Remove ns2 (thus veth4) and check bpftrace output

    # ip netns del ns2

    PID: 571 | DEV: veth4
        free_netdev+5
        netdev_run_todo+4798
        default_device_exit_batch+2271
        ops_undo_list+993
        cleanup_net+1122
        process_scheduled_works+2538
        ...
    PID: 441 | DEV: veth3
        free_netdev+5
        netdev_run_todo+4798
        process_scheduled_works+2538
        ...

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
drivers/net/veth.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 1c5142149175..8170bf33ccf9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -77,6 +77,7 @@ struct veth_priv {
struct bpf_prog *_xdp_prog;
struct veth_rq *rq;
unsigned int requested_headroom;
+ netdevice_tracker peer_tracker;
};
struct veth_xdp_tx_bq {
@@ -1901,15 +1902,17 @@ static int veth_newlink(struct net_device *dev,
priv = netdev_priv(dev);
rcu_assign_pointer(priv->peer, peer);
+ netdev_hold(peer, &priv->peer_tracker, GFP_KERNEL);
err = veth_init_queues(dev, tb);
if (err)
goto err_queues;
priv = netdev_priv(peer);
rcu_assign_pointer(priv->peer, dev);
+ netdev_hold(dev, &priv->peer_tracker, GFP_KERNEL);
err = veth_init_queues(peer, tb);
if (err)
- goto err_queues;
+ goto err_peer_queues;
veth_disable_gro(dev);
/* update XDP supported features */
@@ -1918,7 +1921,11 @@ static int veth_newlink(struct net_device *dev,
return 0;
+err_peer_queues:
+ netdev_put(dev, &priv->peer_tracker);
+ priv = netdev_priv(dev);
err_queues:
+ netdev_put(peer, &priv->peer_tracker);
unregister_netdevice(dev);
err_register_dev:
/* nothing to do */
@@ -1933,24 +1940,25 @@ static int veth_newlink(struct net_device *dev,
static void veth_dellink(struct net_device *dev, struct list_head *head)
{
- struct veth_priv *priv;
+ netdevice_tracker *peer_tracker;
struct net_device *peer;
+ struct veth_priv *priv;
priv = netdev_priv(dev);
- peer = rtnl_dereference(priv->peer);
+ peer_tracker = &priv->peer_tracker;
+ peer = unrcu_pointer(xchg(&priv->peer, NULL));
+ if (!peer)
+ return;
- /* Note : dellink() is called from default_device_exit_batch(),
- * before a rcu_synchronize() point. The devices are guaranteed
- * not being freed before one RCU grace period.
- */
- RCU_INIT_POINTER(priv->peer, NULL);
unregister_netdevice_queue(dev, head);
- if (peer) {
- priv = netdev_priv(peer);
- RCU_INIT_POINTER(priv->peer, NULL);
- unregister_netdevice_queue(peer, head);
- }
+ priv = netdev_priv(peer);
+ dev = unrcu_pointer(xchg(&priv->peer, NULL));
+ if (dev)
+ unregister_netdevice_queue_net(dev_net(dev), peer, head);
+
+ netdev_put(peer, peer_tracker);
+ netdev_put(dev, &priv->peer_tracker);
}
static const struct nla_policy veth_policy[VETH_INFO_MAX + 1] = {
--
2.55.0.rc0.799.gd6f94ed593-goog

Thread overview:
2026-07-03 0:09 [PATCH v2 net-next 00/14] net: Support per-netns device unregistration Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 01/14] rtnetlink: Lock sock_net(skb->sk) in rtnl_newlink() Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 02/14] rtnetlink: Call unregister_netdevice_many() only once in rtnl_link_unregister() Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 03/14] rtnetlink: Add per-netns rtnl_work Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 04/14] net: Wrap default_device_exit_net() with __rtnl_net_lock() Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 05/14] net: Hold __rtnl_net_lock() in netdev_wait_allrefs_any() Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 06/14] net: Add per-netns netdev unregistration infra Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 07/14] net: Call unregister_netdevice_many() per netns Kuniyuki Iwashima
2026-07-03 0:09 ` Kuniyuki Iwashima [this message]
2026-07-03 0:09 ` [PATCH v2 net-next 09/14] bareudp: Protect bareudp_list with mutex Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 10/14] bareudp: Support per-netns netdev unregistration Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 11/14] ipvlan: Convert ipvl_port.count to refcount_t Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 12/14] ipvlan: Synchronise ipvlan_init() and ipvlan_uninit() for the same lower dev Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 13/14] ipvlan: Protect ipvl_port.ipvlans with mutex Kuniyuki Iwashima
2026-07-03 0:09 ` [PATCH v2 net-next 14/14] ipvlan: Support per-netns netdev unregistration Kuniyuki Iwashima