Netdev List
* [PATCH v2 net-next 0/2] rtnetlink: RTNL avoidance in rtnl_getlink()
@ 2026-05-19 11:43 Eric Dumazet
  2026-05-19 11:43 ` [PATCH v2 net-next 1/2] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list() Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Eric Dumazet @ 2026-05-19 11:43 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

Many shell scripts invoke iproute2 commands specifying a device by
its name.

This series improves their performance by avoiding RTNL acquisition
for their (repeated) name->index conversion.

Eric Dumazet (2):
  rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()
  rtnetlink: do not acquire RTNL for RTM_GETLINK with
    RTEXT_FILTER_NAME_ONLY

 net/core/rtnetlink.c | 76 ++++++++++++++++++++++++++++++++------------
 1 file changed, 55 insertions(+), 21 deletions(-)

-- 
2.54.0.563.g4f69b47b94-goog



* [PATCH v2 net-next 1/2] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()
  2026-05-19 11:43 [PATCH v2 net-next 0/2] rtnetlink: RTNL avoidance in rtnl_getlink() Eric Dumazet
@ 2026-05-19 11:43 ` Eric Dumazet
  2026-05-19 16:39   ` Jakub Kicinski
  2026-05-19 11:43 ` [PATCH v2 net-next 2/2] rtnetlink: do not acquire RTNL for RTM_GETLINK with RTEXT_FILTER_NAME_ONLY Eric Dumazet
  2026-05-19 16:37 ` [PATCH v2 net-next 0/2] rtnetlink: RTNL avoidance in rtnl_getlink() Jakub Kicinski
  2 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2026-05-19 11:43 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

Avoid corrupting a netlink message and confusing user space in the
unlikely case that rtnl_fill_prop_list() produces a very big
nested element.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6a5e9ace55a0880d7b1e4303d12dc0a8b8b7c5ed..ae0254f19178735b2805a8189e81a960a49b2858 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1971,7 +1971,9 @@ static int rtnl_fill_prop_list(struct sk_buff *skb,
 	if (ret <= 0)
 		goto nest_cancel;
 
-	nla_nest_end(skb, prop_list);
+	if (nla_nest_end_safe(skb, prop_list) < 0)
+		goto nest_cancel;
+
 	return 0;
 
 nest_cancel:
-- 
2.54.0.563.g4f69b47b94-goog
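
The concern in this patch comes from the 16-bit nla_len field of struct
nlattr: a nest larger than 65535 bytes cannot be described by its own
header. The following is a minimal userspace sketch of that failure mode,
not the kernel implementation. The names demo_nlattr, demo_nest_end() and
demo_nest_end_safe() are hypothetical, and the checked variant only
approximates what a "safe" nest close could look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct nlattr: the length field is 16 bits wide. */
struct demo_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Unchecked close: store the nest length, silently truncating to 16 bits. */
static uint16_t demo_nest_end(struct demo_nlattr *nest, size_t total_len)
{
	assert(nest != NULL);
	nest->nla_len = (uint16_t)total_len;
	return nest->nla_len;
}

/*
 * Checked close (hypothetical): refuse a length that does not fit in
 * 16 bits and leave the header untouched, so the caller can cancel the nest.
 * Returns 0 on success, -1 if the nest is too large.
 */
static int demo_nest_end_safe(struct demo_nlattr *nest, size_t total_len)
{
	assert(nest != NULL);
	if (total_len > UINT16_MAX)
		return -1;
	nest->nla_len = (uint16_t)total_len;
	return 0;
}
```

With a 70000-byte nest, the unchecked variant records a length of 4464
(70000 modulo 65536), which is the kind of corrupted message a parser in
user space would then misread. The checked variant reports the failure
instead.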



* [PATCH v2 net-next 2/2] rtnetlink: do not acquire RTNL for RTM_GETLINK with RTEXT_FILTER_NAME_ONLY
  2026-05-19 11:43 [PATCH v2 net-next 0/2] rtnetlink: RTNL avoidance in rtnl_getlink() Eric Dumazet
  2026-05-19 11:43 ` [PATCH v2 net-next 1/2] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list() Eric Dumazet
@ 2026-05-19 11:43 ` Eric Dumazet
  2026-05-19 16:37 ` [PATCH v2 net-next 0/2] rtnetlink: RTNL avoidance in rtnl_getlink() Jakub Kicinski
  2 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2026-05-19 11:43 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

When RTEXT_FILTER_NAME_ONLY is requested, rtnl_fill_ifinfo()
only dumps device attributes that do not need RTNL protection.

Many shell scripts invoke iproute2 commands specifying a device by
its name. After this patch, they will no longer add RTNL pressure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
v2: move the ASSERT_RTNL() in rtnl_fill_ifinfo()

 net/core/rtnetlink.c | 72 ++++++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index ae0254f19178735b2805a8189e81a960a49b2858..68cd2238ee170f44841caf47c86ef48303a3d15e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2068,7 +2068,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	struct nlmsghdr *nlh;
 	struct Qdisc *qdisc;
 
-	ASSERT_RTNL();
 	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifm), flags);
 	if (nlh == NULL)
 		return -EMSGSIZE;
@@ -2091,6 +2090,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	if (ext_filter_mask & RTEXT_FILTER_NAME_ONLY)
 		goto end;
 
+	ASSERT_RTNL();
 	if (tgt_netnsid >= 0 &&
 	    nla_put_s32(skb, IFLA_TARGET_NETNSID, tgt_netnsid))
 		goto nla_put_failure;
@@ -3468,6 +3468,21 @@ static struct net_device *rtnl_dev_get(struct net *net,
 	return __dev_get_by_name(net, ifname);
 }
 
+static struct net_device *rtnl_dev_get_rcu(struct net *net,
+					   struct nlattr *tb[])
+{
+	char ifname[ALTIFNAMSIZ];
+
+	if (tb[IFLA_IFNAME])
+		nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
+	else if (tb[IFLA_ALT_IFNAME])
+		nla_strscpy(ifname, tb[IFLA_ALT_IFNAME], ALTIFNAMSIZ);
+	else
+		return NULL;
+
+	return dev_get_by_name_rcu(net, ifname);
+}
+
 static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
@@ -4187,14 +4202,15 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
 	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[IFLA_MAX + 1];
+	netdevice_tracker dev_tracker;
+	struct net_device *dev = NULL;
 	struct net *tgt_net = net;
+	u32 ext_filter_mask = 0;
 	struct ifinfomsg *ifm;
-	struct nlattr *tb[IFLA_MAX+1];
-	struct net_device *dev = NULL;
 	struct sk_buff *nskb;
 	int netnsid = -1;
 	int err;
-	u32 ext_filter_mask = 0;
 
 	err = rtnl_valid_getlink_req(skb, nlh, tb, extack);
 	if (err < 0)
@@ -4214,14 +4230,19 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (tb[IFLA_EXT_MASK])
 		ext_filter_mask = nla_get_u32(tb[IFLA_EXT_MASK]);
 
-	err = -EINVAL;
 	ifm = nlmsg_data(nlh);
-	if (ifm->ifi_index > 0)
-		dev = __dev_get_by_index(tgt_net, ifm->ifi_index);
-	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
-		dev = rtnl_dev_get(tgt_net, tb);
-	else
+	rcu_read_lock();
+	if (ifm->ifi_index > 0) {
+		dev = dev_get_by_index_rcu(tgt_net, ifm->ifi_index);
+	} else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) {
+		dev = rtnl_dev_get_rcu(tgt_net, tb);
+	} else {
+		rcu_read_unlock();
+		err = -EINVAL;
 		goto out;
+	}
+	netdev_hold(dev, &dev_tracker, GFP_ATOMIC);
+	rcu_read_unlock();
 
 	err = -ENODEV;
 	if (dev == NULL)
@@ -4232,25 +4253,35 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (nskb == NULL)
 		goto out;
 
-	/* Synchronize the carrier state so we don't report a state
-	 * that we're not actually going to honour immediately; if
-	 * the driver just did a carrier off->on transition, we can
-	 * only TX if link watch work has run, but without this we'd
-	 * already report carrier on, even if it doesn't work yet.
-	 */
-	linkwatch_sync_dev(dev);
+	if (!(ext_filter_mask & RTEXT_FILTER_NAME_ONLY)) {
+		rtnl_lock();
+		/* Synchronize the carrier state so we don't report a state
+		 * that we're not actually going to honour immediately; if
+		 * the driver just did a carrier off->on transition, we can
+		 * only TX if link watch work has run, but without this we'd
+		 * already report carrier on, even if it doesn't work yet.
+		 */
+		linkwatch_sync_dev(dev);
+	}
 
 	err = rtnl_fill_ifinfo(nskb, dev, net,
 			       RTM_NEWLINK, NETLINK_CB(skb).portid,
 			       nlh->nlmsg_seq, 0, 0, ext_filter_mask,
 			       0, NULL, 0, netnsid, GFP_KERNEL);
+
+	if (!(ext_filter_mask & RTEXT_FILTER_NAME_ONLY))
+		rtnl_unlock();
+
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in if_nlmsg_size */
-		WARN_ON(err == -EMSGSIZE);
+		WARN_ON_ONCE(err == -EMSGSIZE &&
+			     !(ext_filter_mask & RTEXT_FILTER_NAME_ONLY));
 		kfree_skb(nskb);
-	} else
+	} else {
 		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+	}
 out:
+	netdev_put(dev, &dev_tracker);
 	if (netnsid >= 0)
 		put_net(tgt_net);
 
@@ -7116,7 +7147,8 @@ static const struct rtnl_msg_handler rtnetlink_rtnl_msg_handlers[] __initconst =
 	{.msgtype = RTM_DELLINK, .doit = rtnl_dellink,
 	 .flags = RTNL_FLAG_DOIT_PERNET_WIP},
 	{.msgtype = RTM_GETLINK, .doit = rtnl_getlink,
-	 .dumpit = rtnl_dump_ifinfo, .flags = RTNL_FLAG_DUMP_SPLIT_NLM_DONE},
+	 .dumpit = rtnl_dump_ifinfo,
+	 .flags = RTNL_FLAG_DUMP_SPLIT_NLM_DONE | RTNL_FLAG_DOIT_UNLOCKED},
 	{.msgtype = RTM_SETLINK, .doit = rtnl_setlink,
 	 .flags = RTNL_FLAG_DOIT_PERNET_WIP},
 	{.msgtype = RTM_GETADDR, .dumpit = rtnl_dump_all},
-- 
2.54.0.563.g4f69b47b94-goog



* Re: [PATCH v2 net-next 0/2] rtnetlink: RTNL avoidance in rtnl_getlink()
  2026-05-19 11:43 [PATCH v2 net-next 0/2] rtnetlink: RTNL avoidance in rtnl_getlink() Eric Dumazet
  2026-05-19 11:43 ` [PATCH v2 net-next 1/2] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list() Eric Dumazet
  2026-05-19 11:43 ` [PATCH v2 net-next 2/2] rtnetlink: do not acquire RTNL for RTM_GETLINK with RTEXT_FILTER_NAME_ONLY Eric Dumazet
@ 2026-05-19 16:37 ` Jakub Kicinski
  2026-05-19 17:17   ` Eric Dumazet
  2 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2026-05-19 16:37 UTC
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	netdev, eric.dumazet

On Tue, 19 May 2026 11:43:53 +0000 Eric Dumazet wrote:
> Many shell scripts invoke iproute2 commands specifying a device by
> its name.
> 
> This series improves their performance by avoiding RTNL acquisition
> for their (repeated) name->index conversion.

Hm.

[ 1414.868166][T10284] BUG: KASAN: slab-use-after-free in rtnl_fill_prop_list+0x5c0/0x620
[ 1414.868291][T10284] Read of size 8 at addr ff11000001d2c150 by task (udev-worker)/10284
[ 1414.868404][T10284] 
[ 1414.868445][T10284] CPU: 2 UID: 0 PID: 10284 Comm: (udev-worker) Not tainted 7.1.0-rc3-virtme #1 PREEMPT(full) 
[ 1414.868448][T10284] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1414.868450][T10284] Call Trace:
[ 1414.868452][T10284]  <TASK>
[ 1414.868453][T10284]  dump_stack_lvl+0x6f/0xa0
[ 1414.868459][T10284]  print_address_description.constprop.0+0x56/0x2d0
[ 1414.868464][T10284]  print_report+0xfc/0x1fa
[ 1414.868466][T10284]  ? __virt_addr_valid+0x102/0x440
[ 1414.868470][T10284]  ? __virt_addr_valid+0x1da/0x440
[ 1414.868472][T10284]  kasan_report+0x108/0x130
[ 1414.868475][T10284]  ? rtnl_fill_prop_list+0x5c0/0x620
[ 1414.868477][T10284]  ? rtnl_fill_prop_list+0x5c0/0x620
[ 1414.868479][T10284]  rtnl_fill_prop_list+0x5c0/0x620
[ 1414.868480][T10284]  ? __asan_memcpy+0x3c/0x60
[ 1414.868482][T10284]  rtnl_fill_ifinfo.isra.0+0x3d6/0x2c90
[ 1414.868484][T10284]  ? rcu_read_lock_any_held+0x3c/0x90
[ 1414.868487][T10284]  ? validate_chain+0x38b/0xc20
[ 1414.868490][T10284]  ? rtnl_fill_vf+0x460/0x460
[ 1414.868491][T10284]  ? lockdep_hardirqs_on_prepare.part.0+0x9a/0x160
[ 1414.868493][T10284]  ? lockdep_hardirqs_on+0x8c/0x130
[ 1414.868496][T10284]  ? __lock_acquire+0x508/0xc10
[ 1414.868498][T10284]  ? lock_acquire.part.0+0xbc/0x260
[ 1414.868499][T10284]  ? find_held_lock+0x2b/0x80
[ 1414.868502][T10284]  ? __lock_release.isra.0+0x6b/0x1a0
[ 1414.868504][T10284]  ? mark_held_locks+0x40/0x70
[ 1414.868505][T10284]  ? lockdep_hardirqs_on_prepare.part.0+0x9a/0x160
[ 1414.868507][T10284]  ? lockdep_hardirqs_on+0x8c/0x130
[ 1414.868508][T10284]  ? _raw_spin_unlock_irqrestore+0x53/0x80
[ 1414.868510][T10284]  rtnl_getlink+0xa48/0xe50
[ 1414.868513][T10284]  ? find_held_lock+0x2b/0x80
[ 1414.868515][T10284]  ? rtnl_dump_ifinfo+0xfb0/0xfb0
[ 1414.868516][T10284]  ? mark_usage+0x61/0x170
[ 1414.868517][T10284]  ? __lock_release.isra.0+0x6b/0x1a0
[ 1414.868518][T10284]  ? __lock_acquire+0x508/0xc10
[ 1414.868525][T10284]  ? lock_acquire.part.0+0xbc/0x260
[ 1414.868526][T10284]  ? find_held_lock+0x2b/0x80
[ 1414.868529][T10284]  ? mark_usage+0x61/0x170
[ 1414.868530][T10284]  ? __lock_release.isra.0+0x6b/0x1a0
[ 1414.868531][T10284]  ? __lock_acquire+0x508/0xc10
[ 1414.868532][T10284]  ? bpf_address_lookup+0x232/0x290
[ 1414.868536][T10284]  ? lock_acquire.part.0+0xbc/0x260
[ 1414.868537][T10284]  ? find_held_lock+0x2b/0x80
[ 1414.868539][T10284]  ? rtnl_dump_ifinfo+0xfb0/0xfb0
[ 1414.868540][T10284]  ? __lock_release.isra.0+0x6b/0x1a0
[ 1414.868542][T10284]  ? rtnl_dump_ifinfo+0xfb0/0xfb0
[ 1414.868543][T10284]  rtnetlink_rcv_msg+0x6fd/0xbd0
[ 1414.868545][T10284]  ? validate_chain+0x38b/0xc20
[ 1414.868546][T10284]  ? rtnl_link_fill+0x920/0x920
[ 1414.868547][T10284]  ? __lock_acquire+0x508/0xc10
[ 1414.868549][T10284]  ? lock_acquire.part.0+0xbc/0x260
[ 1414.868551][T10284]  ? find_held_lock+0x2b/0x80
[ 1414.868553][T10284]  netlink_rcv_skb+0x14e/0x3a0
[ 1414.868556][T10284]  ? rtnl_link_fill+0x920/0x920
[ 1414.868558][T10284]  ? netlink_ack+0xce0/0xce0
[ 1414.868560][T10284]  ? netlink_deliver_tap+0xc5/0x330
[ 1414.868562][T10284]  ? netlink_deliver_tap+0x13c/0x330
[ 1414.868564][T10284]  netlink_unicast+0x47c/0x740
[ 1414.868566][T10284]  ? netlink_attachskb+0x800/0x800
[ 1414.868568][T10284]  ? __lock_acquire+0x508/0xc10
[ 1414.868570][T10284]  netlink_sendmsg+0x735/0xc60
[ 1414.868572][T10284]  ? netlink_unicast+0x740/0x740
[ 1414.868574][T10284]  ? __might_fault+0x97/0x140
[ 1414.868577][T10284]  ? __might_fault+0x97/0x140
[ 1414.868579][T10284]  __sys_sendto+0x2c9/0x400
[ 1414.868582][T10284]  ? __ia32_sys_getpeername+0xd0/0xd0
[ 1414.868586][T10284]  ? fput_close_sync+0xde/0x1b0
[ 1414.868589][T10284]  ? alloc_file_clone+0xe0/0xe0
[ 1414.868591][T10284]  __x64_sys_sendto+0xe4/0x1f0
[ 1414.868593][T10284]  ? trace_irq_enable.constprop.0+0x9b/0x180
[ 1414.868596][T10284]  ? lockdep_hardirqs_on+0x8c/0x130
[ 1414.868597][T10284]  ? do_syscall_64+0x82/0xfc0
[ 1414.868599][T10284]  do_syscall_64+0x117/0xfc0
[ 1414.868600][T10284]  ? trace_hardirqs_off+0xd/0x30
[ 1414.868602][T10284]  ? exc_page_fault+0xee/0x100
[ 1414.868604][T10284]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1414.868606][T10284] RIP: 0033:0x7fcd4191e08e
[ 1414.868609][T10284] Code: 4d 89 d8 e8 94 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 03 ff ff ff 0f 1f 00 f3 0f 1e fa
[ 1414.868611][T10284] RSP: 002b:00007ffc4c8b41c0 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 1414.868614][T10284] RAX: ffffffffffffffda RBX: 0000557d5eaaee20 RCX: 00007fcd4191e08e
[ 1414.868615][T10284] RDX: 0000000000000020 RSI: 0000557d5eaa98e0 RDI: 0000000000000012
[ 1414.868616][T10284] RBP: 00007ffc4c8b41d0 R08: 00007ffc4c8b4220 R09: 0000000000000080
[ 1414.868617][T10284] R10: 0000000000000000 R11: 0000000000000202 R12: 0000557d5ec08060
[ 1414.868618][T10284] R13: 00007ffc4c8b4304 R14: 0000000000000000 R15: 00007ffc4c8b43a0
[ 1414.868621][T10284]  </TASK>
[ 1414.868621][T10284] 
[ 1414.876011][T10284] Allocated by task 10304:
[ 1414.876091][T10284]  kasan_save_stack+0x2f/0x50
[ 1414.876205][T10284]  kasan_save_track+0x14/0x30
[ 1414.876286][T10284]  __kasan_kmalloc+0x7b/0x90
[ 1414.876399][T10284]  register_netdevice+0x48b/0x1bc0
[ 1414.876477][T10284]  geneve_configure+0x6c3/0xcf0 [geneve]
[ 1414.876591][T10284]  geneve_newlink+0x189/0x220 [geneve]
[ 1414.876669][T10284]  rtnl_newlink_create+0x2da/0x8c0
[ 1414.876747][T10284]  __rtnl_newlink+0x22b/0xa50
[ 1414.876858][T10284]  rtnl_newlink+0x8d1/0xef0
[ 1414.876973][T10284]  rtnetlink_rcv_msg+0x6fd/0xbd0
[ 1414.877049][T10284]  netlink_rcv_skb+0x14e/0x3a0
[ 1414.877128][T10284]  netlink_unicast+0x47c/0x740
[ 1414.877204][T10284]  netlink_sendmsg+0x735/0xc60
[ 1414.877280][T10284]  __sys_sendto+0x2c9/0x400
[ 1414.877392][T10284]  __x64_sys_sendto+0xe4/0x1f0
[ 1414.877470][T10284]  do_syscall_64+0x117/0xfc0
[ 1414.877547][T10284]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1414.877680][T10284] 
[ 1414.877719][T10284] Freed by task 10304:
[ 1414.877818][T10284]  kasan_save_stack+0x2f/0x50
[ 1414.877894][T10284]  kasan_save_track+0x14/0x30
[ 1414.877968][T10284]  kasan_save_free_info+0x3b/0x60
[ 1414.878084][T10284]  __kasan_slab_free+0x43/0x70
[ 1414.878161][T10284]  kfree+0x123/0x5a0
[ 1414.878218][T10284]  unregister_netdevice_many_notify+0xf0d/0x1f20
[ 1414.878313][T10284]  rtnl_dellink+0x4a0/0xae0
[ 1414.878425][T10284]  rtnetlink_rcv_msg+0x6fd/0xbd0
[ 1414.878499][T10284]  netlink_rcv_skb+0x14e/0x3a0
[ 1414.878577][T10284]  netlink_unicast+0x47c/0x740
[ 1414.878657][T10284]  netlink_sendmsg+0x735/0xc60
[ 1414.878778][T10284]  __sys_sendto+0x2c9/0x400
[ 1414.878853][T10284]  __x64_sys_sendto+0xe4/0x1f0
[ 1414.878928][T10284]  do_syscall_64+0x117/0xfc0
[ 1414.879004][T10284]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1414.879098][T10284] 
[ 1414.879148][T10284] The buggy address belongs to the object at ff11000001d2c140
[ 1414.879148][T10284]  which belongs to the cache kmalloc-64 of size 64
[ 1414.879339][T10284] The buggy address is located 16 bytes inside of
[ 1414.879339][T10284]  freed 64-byte region [ff11000001d2c140, ff11000001d2c180)
[ 1414.879521][T10284] 
[ 1414.879559][T10284] The buggy address belongs to the physical page:
[ 1414.879689][T10284] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d2c
[ 1414.879910][T10284] flags: 0x80000000000000(node=0|zone=1)
[ 1414.879992][T10284] page_type: f5(slab)
[ 1414.880053][T10284] raw: 0080000000000000 ff1100000103cac0 ffd400000023ac90 ffd4000000074c90
[ 1414.880232][T10284] raw: 0000000000000000 0000000000100010 00000000f5000000 0000000000000000
[ 1414.880365][T10284] page dumped because: kasan: bad access detected
[ 1414.880495][T10284] 
[ 1414.880534][T10284] Memory state around the buggy address:
[ 1414.880609][T10284]  ff11000001d2c000: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
[ 1414.880726][T10284]  ff11000001d2c080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1414.880837][T10284] >ff11000001d2c100: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
[ 1414.880948][T10284]                                                  ^
[ 1414.881042][T10284]  ff11000001d2c180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1414.881190][T10284]  ff11000001d2c200: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb

decoded: https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/653382/vm-crash-thr0-0


* Re: [PATCH v2 net-next 1/2] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()
  2026-05-19 11:43 ` [PATCH v2 net-next 1/2] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list() Eric Dumazet
@ 2026-05-19 16:39   ` Jakub Kicinski
  2026-05-19 16:53     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2026-05-19 16:39 UTC
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	netdev, eric.dumazet

On Tue, 19 May 2026 11:43:54 +0000 Eric Dumazet wrote:
> Avoid corrupting a netlink message and confusing user space in the
> unlikely case that rtnl_fill_prop_list() produces a very big
> nested element.

Should we not prevent it from happening in the first place?
IIUC otherwise if user adds a lot of altnames ip link will no longer
work?

> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 6a5e9ace55a0880d7b1e4303d12dc0a8b8b7c5ed..ae0254f19178735b2805a8189e81a960a49b2858 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1971,7 +1971,9 @@ static int rtnl_fill_prop_list(struct sk_buff *skb,
>  	if (ret <= 0)
>  		goto nest_cancel;
>  
> -	nla_nest_end(skb, prop_list);
> +	if (nla_nest_end_safe(skb, prop_list) < 0)
> +		goto nest_cancel;
> +
>  	return 0;
>  
>  nest_cancel:



* Re: [PATCH v2 net-next 1/2] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()
  2026-05-19 16:39   ` Jakub Kicinski
@ 2026-05-19 16:53     ` Eric Dumazet
  2026-05-19 22:17       ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2026-05-19 16:53 UTC
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	netdev, eric.dumazet

On Tue, May 19, 2026 at 9:39 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 19 May 2026 11:43:54 +0000 Eric Dumazet wrote:
> > Avoid corrupting a netlink message and confusing user space in the
> > unlikely case that rtnl_fill_prop_list() produces a very big
> > nested element.
>
> Should we not prevent it from happening in the first place?
> IIUC otherwise if user adds a lot of altnames ip link will no longer
> work?

We cannot prevent this unless we add a mutual exclusion.

If a reader iterates an RCU list, other threads can delete items
(before the reader's cursor) and add new ones to the end of the list.
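
The effect described above can be sketched in userspace as a
single-threaded simulation of one possible interleaving. This is only an
illustration under stated assumptions: struct item, unlink_item(),
append_item() and reader_visits_with_churn() are hypothetical names, and
no real RCU primitives are used. The list never holds more than WINDOW
entries, yet the reader ends up visiting TOTAL of them.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW 2	/* entries present in the list at any one time */
#define TOTAL  6	/* entries that exist over the whole run */

struct item {
	int id;
	struct item *next;
};

/* Unlink 'victim' but leave victim->next intact, as an RCU-style removal would. */
static void unlink_item(struct item **head, struct item *victim)
{
	struct item **pp = head;

	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	if (*pp)
		*pp = victim->next;
}

/* Append 'node' at the tail of the list. */
static void append_item(struct item **head, struct item *node)
{
	struct item **pp = head;

	node->next = NULL;
	while (*pp)
		pp = &(*pp)->next;
	*pp = node;
}

/*
 * Walk the list while a "writer" deletes the oldest entry (always behind
 * the reader's cursor) and appends a new one after each reader step.
 * Returns how many entries the reader visited; *max_len_out receives the
 * largest list length ever reached.
 */
static int reader_visits_with_churn(int *max_len_out)
{
	struct item nodes[TOTAL];
	struct item *head = NULL;
	struct item *cursor;
	int added = 0, len = 0, max_len = 0, seen = 0;

	while (added < WINDOW) {
		nodes[added].id = added;
		append_item(&head, &nodes[added++]);
		len++;
	}
	max_len = len;

	for (cursor = head; cursor; cursor = cursor->next) {
		seen++;	/* the reader would emit one attribute here */

		/* Writer turn: only touch entries strictly behind the cursor. */
		if (cursor != head && added < TOTAL) {
			unlink_item(&head, head);
			len--;
			nodes[added].id = added;
			append_item(&head, &nodes[added++]);
			len++;
			if (len > max_len)
				max_len = len;
		}
	}

	*max_len_out = max_len;
	return seen;
}
```

Because the reader can emit more entries than any limit enforced on the
writer side, a size check at fill time is one way to bound the output
without adding mutual exclusion.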


* Re: [PATCH v2 net-next 0/2] rtnetlink: RTNL avoidance in rtnl_getlink()
  2026-05-19 16:37 ` [PATCH v2 net-next 0/2] rtnetlink: RTNL avoidance in rtnl_getlink() Jakub Kicinski
@ 2026-05-19 17:17   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2026-05-19 17:17 UTC
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	netdev, eric.dumazet

On Tue, May 19, 2026 at 9:37 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 19 May 2026 11:43:53 +0000 Eric Dumazet wrote:
> > Many shell scripts invoke iproute2 commands specifying a device by
> > its name.
> >
> > This series improves their performance by avoiding RTNL acquisition
> > for their (repeated) name->index conversion.
>
> Hm.
>

We are probably missing this:

diff --git a/net/core/dev.c b/net/core/dev.c
index 26ac8eb9b259d489159c7ab5a2b206d425110b3b..92f17c270da988ca46f4cfbb4ca67ebecd4e7e8e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -398,7 +398,7 @@ static void netdev_name_node_alt_flush(struct net_device *dev)
        struct netdev_name_node *name_node, *tmp;

        list_for_each_entry_safe(name_node, tmp, &dev->name_node->list, list) {
-               list_del(&name_node->list);
+               list_del_rcu(&name_node->list);
                netdev_name_node_alt_free(&name_node->rcu);
        }
 }
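
The difference between the two helpers in the hunk above can be sketched
in userspace. This is a simplified mimic of the kernel list API, not the
kernel code: every demo_ name and both poison constants are stand-ins
chosen for illustration. It shows only the pointer handling: list_del()
poisons both links of the removed node, while list_del_rcu() leaves the
next pointer intact so a reader already standing on that node can keep
walking. It does not model freeing, which must still wait for readers to
finish.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Stand-ins for the kernel's LIST_POISON1 / LIST_POISON2 markers. */
#define DEMO_POISON1 ((struct list_node *)0x100)
#define DEMO_POISON2 ((struct list_node *)0x122)

static void demo_list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void demo_list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Splice the neighbours of 'node' together; 'node' itself is unchanged. */
static void demo_unlink(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/* Mimics list_del(): both links of the removed node are poisoned. */
static void demo_list_del(struct list_node *node)
{
	demo_unlink(node);
	node->next = DEMO_POISON1;
	node->prev = DEMO_POISON2;
}

/* Mimics list_del_rcu(): only prev is poisoned, next stays usable. */
static void demo_list_del_rcu(struct list_node *node)
{
	demo_unlink(node);
	node->prev = DEMO_POISON2;
}

/* Can a reader parked on 'node' still reach 'head' by following next? */
static int demo_reader_can_continue(const struct list_node *node,
				    const struct list_node *head)
{
	int steps;

	for (steps = 0; steps < 16; steps++) {
		if (node == DEMO_POISON1)
			return 0;	/* would dereference a poisoned pointer */
		if (node == head)
			return 1;
		node = node->next;
	}
	return 0;
}
```

A reader parked on a node removed with demo_list_del_rcu() still reaches
the list head, while one parked on a node removed with demo_list_del()
runs into the poison value on its next step.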




* Re: [PATCH v2 net-next 1/2] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()
  2026-05-19 16:53     ` Eric Dumazet
@ 2026-05-19 22:17       ` Jakub Kicinski
  0 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2026-05-19 22:17 UTC
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	netdev, eric.dumazet

On Tue, 19 May 2026 09:53:08 -0700 Eric Dumazet wrote:
> On Tue, May 19, 2026 at 9:39 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 19 May 2026 11:43:54 +0000 Eric Dumazet wrote:
> > > Avoid corrupting a netlink message and confusing user space in the
> > > unlikely case that rtnl_fill_prop_list() produces a very big
> > > nested element.
> >
> > Should we not prevent it from happening in the first place?
> > IIUC otherwise if user adds a lot of altnames ip link will no longer
> > work?
> 
> We cannot prevent this unless we add a mutual exclusion.

Today it's under rtnl_lock, AFAICT, so nop. If we want to lift
rtnl_lock from RTM_NEWLINKPROP, we can probably slide the props
under dev->lock or add a new lock?

