* [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo()
@ 2026-05-22 17:29 Eric Dumazet
  2026-05-22 17:29 ` [PATCH v4 net-next 1/5] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list() Eric Dumazet
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-05-22 17:29 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

Many shell scripts invoke iproute2 commands specifying a device by
its name.

This series improves their performance by avoiding RTNL acquisition
for their (repeated) name->index conversion.

v3: insert patch 2/3 in the series (Jakub reported a KASAN splat)
v4: Addressed Sashiko's feedback.
    added 2 patches for rtnl_dump_ifinfo().

Eric Dumazet (5):
  rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()
  net: defer netdev_name_node_alt_flush() call to netdev_run_todo()
  rtnetlink: do not acquire RTNL in rtnl_getlink() with
    RTEXT_FILTER_NAME_ONLY
  rtnetlink: do not assume RTNL is held in link_master_filtered()
  rtnetlink: add RTEXT_FILTER_NAME_ONLY support to rtnl_dump_ifinfo()

 net/core/dev.c       |   4 +-
 net/core/rtnetlink.c | 131 ++++++++++++++++++++++++++++++-------------
 2 files changed, 95 insertions(+), 40 deletions(-)

-- 
2.54.0.746.g67dd491aae-goog



* [PATCH v4 net-next 1/5] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()
  2026-05-22 17:29 [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Eric Dumazet
@ 2026-05-22 17:29 ` Eric Dumazet
  2026-05-22 17:29 ` [PATCH v4 net-next 2/5] net: defer netdev_name_node_alt_flush() call to netdev_run_todo() Eric Dumazet
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-05-22 17:29 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

Avoid corrupting a netlink message and confusing user space in the
very unlikely case that rtnl_fill_prop_list() produces a very big
nested element.

This is extremely unlikely, because rtnl_prop_list_size()
provisions nla_total_size(ALTIFNAMSIZ) per altname.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
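
As a rough userspace illustration of the failure mode the commit message
describes: the length field of a netlink attribute header is 16 bits
wide, so a nest that grows past 65535 bytes cannot record its own
length. The sketch below is only a model. struct model_nlattr,
nest_end_unchecked() and nest_end_checked() are hypothetical names, and
the checked variant is an assumption about what nla_nest_end_safe()
does, inferred from how the patch tests its return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a netlink attribute header: as in the
 * kernel's struct nlattr, the length field is only 16 bits wide.
 */
struct model_nlattr {
	uint16_t len;
	uint16_t type;
};

/* Models an unchecked nest end: the total length is stored as-is,
 * so anything above 65535 is silently truncated.
 */
static void nest_end_unchecked(struct model_nlattr *nest, size_t total_len)
{
	assert(nest != NULL);
	nest->len = (uint16_t)total_len;
}

/* Models a checked nest end (assumed behaviour): refuse a length that
 * does not fit in 16 bits and report -EMSGSIZE instead.
 */
static int nest_end_checked(struct model_nlattr *nest, size_t total_len)
{
	assert(nest != NULL);
	if (total_len > UINT16_MAX)
		return -EMSGSIZE;
	nest->len = (uint16_t)total_len;
	return 0;
}
```

In this model a 70000-byte nest is recorded as 4464 bytes by the
unchecked variant, which is the kind of corrupted message a parser in
user space would then misread.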

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3d40ebe035b37ae0f38fb81f918eb76742371ef1..3dfa28927c7f92f906a0d89b7a1812b975d13854 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1971,12 +1971,14 @@ static int rtnl_fill_prop_list(struct sk_buff *skb,
 	if (ret <= 0)
 		goto nest_cancel;
 
-	nla_nest_end(skb, prop_list);
+	if (nla_nest_end_safe(skb, prop_list) < 0)
+		goto nest_cancel;
+
 	return 0;
 
 nest_cancel:
 	nla_nest_cancel(skb, prop_list);
-	return ret;
+	return -EMSGSIZE;
 }
 
 static int rtnl_fill_proto_down(struct sk_buff *skb,
-- 
2.54.0.746.g67dd491aae-goog



* [PATCH v4 net-next 2/5] net: defer netdev_name_node_alt_flush() call to netdev_run_todo()
  2026-05-22 17:29 [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Eric Dumazet
  2026-05-22 17:29 ` [PATCH v4 net-next 1/5] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list() Eric Dumazet
@ 2026-05-22 17:29 ` Eric Dumazet
  2026-05-22 17:30 ` [PATCH v4 net-next 3/5] rtnetlink: do not acquire RTNL in rtnl_getlink() with RTEXT_FILTER_NAME_ONLY Eric Dumazet
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-05-22 17:29 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

In the following patch, we want to call rtnl_fill_prop_list() without
RTNL being held, but after a device reference was taken.

We need to free altnames in netdev_run_todo() instead of
unregister_netdevice_many_notify().

Freeing will only happen once all device references
have been released.

Note that dev->name_node serves as the anchor for altnames,
and thus must also be freed in netdev_run_todo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/core/dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 26ac8eb9b259d489159c7ab5a2b206d425110b3b..2d795f3f569be00361809823fd3e59fb1871919c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11738,6 +11738,8 @@ void netdev_run_todo(void)
 		WARN_ON(rcu_access_pointer(dev->ip_ptr));
 		WARN_ON(rcu_access_pointer(dev->ip6_ptr));
 
+		netdev_name_node_alt_flush(dev);
+		netdev_name_node_free(dev->name_node);
 		netdev_do_free_pcpu_stats(dev);
 		if (dev->priv_destructor)
 			dev->priv_destructor(dev);
@@ -12451,8 +12453,6 @@ void unregister_netdevice_many_notify(struct list_head *head,
 		dev_uc_flush(dev);
 		dev_mc_flush(dev);
 
-		netdev_name_node_alt_flush(dev);
-		netdev_name_node_free(dev->name_node);
 
 		netdev_rss_contexts_free(dev);
 
-- 
2.54.0.746.g67dd491aae-goog



* [PATCH v4 net-next 3/5] rtnetlink: do not acquire RTNL in rtnl_getlink() with RTEXT_FILTER_NAME_ONLY
  2026-05-22 17:29 [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Eric Dumazet
  2026-05-22 17:29 ` [PATCH v4 net-next 1/5] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list() Eric Dumazet
  2026-05-22 17:29 ` [PATCH v4 net-next 2/5] net: defer netdev_name_node_alt_flush() call to netdev_run_todo() Eric Dumazet
@ 2026-05-22 17:30 ` Eric Dumazet
  2026-05-22 17:30 ` [PATCH v4 net-next 4/5] rtnetlink: do not assume RTNL is held in link_master_filtered() Eric Dumazet
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-05-22 17:30 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

When RTEXT_FILTER_NAME_ONLY is requested, rtnl_fill_ifinfo()
only dumps device attributes that do not need RTNL protection.

Many shell scripts invoke iproute2 commands specifying a device by
its name. After this patch, they will no longer add RTNL pressure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 94 +++++++++++++++++++++++++++++++-------------
 1 file changed, 67 insertions(+), 27 deletions(-)
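
The control flow this patch gives rtnl_getlink() can be modelled in
userspace roughly as follows: size the reply and fill it without the
lock first, and only when the fill reports -EMSGSIZE take the lock and
try once more. This is a sketch under stated assumptions, not kernel
code. Every name in it (struct model_dev, model_msg_size(),
model_fill(), model_getlink()) is hypothetical, and the concurrent_add
field merely simulates altnames being added between sizing and filling
on the unlocked pass.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NAME_COST 16	/* arbitrary bytes per name in this model */

/* Hypothetical device model: only the number of names matters here. */
struct model_dev {
	int altnames;		/* current number of alternate names */
	int concurrent_add;	/* names that appear during an unlocked pass */
	int lock_taken;		/* how often the RTNL stand-in was taken */
};

/* Bytes needed for the primary name plus all alternate names. */
static size_t model_msg_size(const struct model_dev *dev)
{
	return (size_t)(1 + dev->altnames) * NAME_COST;
}

/* Fails with -EMSGSIZE if the device outgrew the room sized earlier. */
static int model_fill(const struct model_dev *dev, size_t room)
{
	return model_msg_size(dev) > room ? -EMSGSIZE : 0;
}

static int model_getlink(struct model_dev *dev, bool name_only)
{
	bool need_lock = !name_only;
	size_t room;
	int err;

	assert(dev != NULL);
retry:
	if (need_lock)
		dev->lock_taken++;	/* stands in for rtnl_lock() */

	room = model_msg_size(dev);
	if (!need_lock) {
		/* Without the lock, the device may change under us. */
		dev->altnames += dev->concurrent_add;
		dev->concurrent_add = 0;
	}
	err = model_fill(dev, room);

	/* rtnl_unlock() would go here when need_lock is set. */

	if (err == -EMSGSIZE && !need_lock) {
		/* Names were added meanwhile: retry once, locked. */
		need_lock = true;
		goto retry;
	}
	return err;
}
```

In the model, a name-only request on a quiet device never touches the
lock, while one that races with an altname addition falls back to a
single locked pass.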

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3dfa28927c7f92f906a0d89b7a1812b975d13854..c342b22528e4478a61f22e204a3934ba1a48cb3c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2068,7 +2068,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	struct nlmsghdr *nlh;
 	struct Qdisc *qdisc;
 
-	ASSERT_RTNL();
 	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifm), flags);
 	if (nlh == NULL)
 		return -EMSGSIZE;
@@ -2091,6 +2090,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	if (ext_filter_mask & RTEXT_FILTER_NAME_ONLY)
 		goto end;
 
+	ASSERT_RTNL();
 	if (tgt_netnsid >= 0 &&
 	    nla_put_s32(skb, IFLA_TARGET_NETNSID, tgt_netnsid))
 		goto nla_put_failure;
@@ -3468,6 +3468,21 @@ static struct net_device *rtnl_dev_get(struct net *net,
 	return __dev_get_by_name(net, ifname);
 }
 
+static struct net_device *rtnl_dev_get_rcu(struct net *net,
+					   struct nlattr *tb[])
+{
+	char ifname[ALTIFNAMSIZ];
+
+	if (tb[IFLA_IFNAME])
+		nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
+	else if (tb[IFLA_ALT_IFNAME])
+		nla_strscpy(ifname, tb[IFLA_ALT_IFNAME], ALTIFNAMSIZ);
+	else
+		return NULL;
+
+	return dev_get_by_name_rcu(net, ifname);
+}
+
 static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
@@ -4187,14 +4202,16 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
 	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[IFLA_MAX + 1];
+	netdevice_tracker dev_tracker;
+	struct net_device *dev = NULL;
 	struct net *tgt_net = net;
+	u32 ext_filter_mask = 0;
 	struct ifinfomsg *ifm;
-	struct nlattr *tb[IFLA_MAX+1];
-	struct net_device *dev = NULL;
 	struct sk_buff *nskb;
 	int netnsid = -1;
+	bool need_rtnl;
 	int err;
-	u32 ext_filter_mask = 0;
 
 	err = rtnl_valid_getlink_req(skb, nlh, tb, extack);
 	if (err < 0)
@@ -4214,43 +4231,65 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (tb[IFLA_EXT_MASK])
 		ext_filter_mask = nla_get_u32(tb[IFLA_EXT_MASK]);
 
-	err = -EINVAL;
 	ifm = nlmsg_data(nlh);
-	if (ifm->ifi_index > 0)
-		dev = __dev_get_by_index(tgt_net, ifm->ifi_index);
-	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
-		dev = rtnl_dev_get(tgt_net, tb);
-	else
+	rcu_read_lock();
+	if (ifm->ifi_index > 0) {
+		dev = dev_get_by_index_rcu(tgt_net, ifm->ifi_index);
+	} else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) {
+		dev = rtnl_dev_get_rcu(tgt_net, tb);
+	} else {
+		rcu_read_unlock();
+		err = -EINVAL;
 		goto out;
+	}
+	netdev_hold(dev, &dev_tracker, GFP_ATOMIC);
+	rcu_read_unlock();
 
 	err = -ENODEV;
 	if (dev == NULL)
 		goto out;
 
+	need_rtnl = !(ext_filter_mask & RTEXT_FILTER_NAME_ONLY);
+
+retry:
+	if (need_rtnl) {
+		rtnl_lock();
+		/* Synchronize the carrier state so we don't report a state
+		 * that we're not actually going to honour immediately; if
+		 * the driver just did a carrier off->on transition, we can
+		 * only TX if link watch work has run, but without this we'd
+		 * already report carrier on, even if it doesn't work yet.
+		 */
+		linkwatch_sync_dev(dev);
+	}
+
 	err = -ENOBUFS;
 	nskb = nlmsg_new_large(if_nlmsg_size(dev, ext_filter_mask));
-	if (nskb == NULL)
-		goto out;
+	if (nskb)
+		err = rtnl_fill_ifinfo(nskb, dev, net,
+				       RTM_NEWLINK, NETLINK_CB(skb).portid,
+				       nlh->nlmsg_seq, 0, 0, ext_filter_mask,
+				       0, NULL, 0, netnsid, GFP_KERNEL);
 
-	/* Synchronize the carrier state so we don't report a state
-	 * that we're not actually going to honour immediately; if
-	 * the driver just did a carrier off->on transition, we can
-	 * only TX if link watch work has run, but without this we'd
-	 * already report carrier on, even if it doesn't work yet.
-	 */
-	linkwatch_sync_dev(dev);
+	if (need_rtnl)
+		rtnl_unlock();
 
-	err = rtnl_fill_ifinfo(nskb, dev, net,
-			       RTM_NEWLINK, NETLINK_CB(skb).portid,
-			       nlh->nlmsg_seq, 0, 0, ext_filter_mask,
-			       0, NULL, 0, netnsid, GFP_KERNEL);
 	if (err < 0) {
-		/* -EMSGSIZE implies BUG in if_nlmsg_size */
-		WARN_ON(err == -EMSGSIZE);
 		kfree_skb(nskb);
-	} else
+		if (err == -EMSGSIZE) {
+			if (!need_rtnl) {
+				/* Some altnames were added, retry with RTNL. */
+				need_rtnl = true;
+				goto retry;
+			}
+			/* -EMSGSIZE implies BUG in if_nlmsg_size */
+			WARN_ON_ONCE(1);
+		}
+	} else {
 		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+	}
 out:
+	netdev_put(dev, &dev_tracker);
 	if (netnsid >= 0)
 		put_net(tgt_net);
 
@@ -7117,7 +7156,8 @@ static const struct rtnl_msg_handler rtnetlink_rtnl_msg_handlers[] __initconst =
 	{.msgtype = RTM_DELLINK, .doit = rtnl_dellink,
 	 .flags = RTNL_FLAG_DOIT_PERNET_WIP},
 	{.msgtype = RTM_GETLINK, .doit = rtnl_getlink,
-	 .dumpit = rtnl_dump_ifinfo, .flags = RTNL_FLAG_DUMP_SPLIT_NLM_DONE},
+	 .dumpit = rtnl_dump_ifinfo,
+	 .flags = RTNL_FLAG_DUMP_SPLIT_NLM_DONE | RTNL_FLAG_DOIT_UNLOCKED},
 	{.msgtype = RTM_SETLINK, .doit = rtnl_setlink,
 	 .flags = RTNL_FLAG_DOIT_PERNET_WIP},
 	{.msgtype = RTM_GETADDR, .dumpit = rtnl_dump_all},
-- 
2.54.0.746.g67dd491aae-goog



* [PATCH v4 net-next 4/5] rtnetlink: do not assume RTNL is held in link_master_filtered()
  2026-05-22 17:29 [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Eric Dumazet
                   ` (2 preceding siblings ...)
  2026-05-22 17:30 ` [PATCH v4 net-next 3/5] rtnetlink: do not acquire RTNL in rtnl_getlink() with RTEXT_FILTER_NAME_ONLY Eric Dumazet
@ 2026-05-22 17:30 ` Eric Dumazet
  2026-05-22 17:30 ` [PATCH v4 net-next 5/5] rtnetlink: add RTEXT_FILTER_NAME_ONLY support to rtnl_dump_ifinfo() Eric Dumazet
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-05-22 17:30 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

RTNL might no longer be held by the caller in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
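
The decision table of link_master_filtered() is not changed by this
patch; only the locking around the master lookup is. A userspace sketch
of that table, with the master's ifindex passed in directly instead of
being looked up: model_master_filtered() is a hypothetical name, and
using 0 for "device has no master" is a convention of this model only.

```c
#include <assert.h>
#include <stdbool.h>

/* master_ifindex: ifindex of the device's master, or 0 if it has none
 *                 (model convention).
 * master_idx:     the filter value; 0 means IFLA_MASTER was not passed,
 *                 -1 means "only devices without a master".
 * Returns true if the device must be skipped by the dump.
 */
static bool model_master_filtered(int master_ifindex, int master_idx)
{
	bool res = false;

	assert(master_ifindex >= 0);
	if (!master_idx)
		return false;

	/* rcu_read_lock() would bracket the master lookup from here ... */
	if (master_idx == -1)
		res = master_ifindex != 0;
	else if (master_ifindex == 0 || master_ifindex != master_idx)
		res = true;
	/* ... to here; the single exit keeps the unlock from being skipped. */

	return res;
}
```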

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c342b22528e4478a61f22e204a3934ba1a48cb3c..bad036ef7614ffae52a65c447344ac1314f5521b 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2371,22 +2371,24 @@ static struct rtnl_link_ops *linkinfo_to_kind_ops(const struct nlattr *nla,
 static bool link_master_filtered(struct net_device *dev, int master_idx)
 {
 	struct net_device *master;
+	bool res = false;
 
 	if (!master_idx)
 		return false;
 
-	master = netdev_master_upper_dev_get(dev);
+	rcu_read_lock();
+	master = netdev_master_upper_dev_get_rcu(dev);
 
 	/* 0 is already used to denote IFLA_MASTER wasn't passed, therefore need
 	 * another invalid value for ifindex to denote "no master".
 	 */
 	if (master_idx == -1)
-		return !!master;
-
-	if (!master || master->ifindex != master_idx)
-		return true;
+		res = !!master;
+	else if (!master || master->ifindex != master_idx)
+		res = true;
+	rcu_read_unlock();
 
-	return false;
+	return res;
 }
 
 static bool link_kind_filtered(const struct net_device *dev,
-- 
2.54.0.746.g67dd491aae-goog



* [PATCH v4 net-next 5/5] rtnetlink: add RTEXT_FILTER_NAME_ONLY support to rtnl_dump_ifinfo()
  2026-05-22 17:29 [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Eric Dumazet
                   ` (3 preceding siblings ...)
  2026-05-22 17:30 ` [PATCH v4 net-next 4/5] rtnetlink: do not assume RTNL is held in link_master_filtered() Eric Dumazet
@ 2026-05-22 17:30 ` Eric Dumazet
  2026-05-22 21:29 ` [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Jakub Kicinski
  2026-05-23  7:00 ` [syzbot ci] " syzbot ci
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-05-22 17:30 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

When the user requests the RTEXT_FILTER_NAME_ONLY flag, we limit the
dump to these parts:

 - struct nlmsghdr
 - IFLA_IFNAME
 - IFLA_PROP_LIST (alternate names)

- This saves space in the dump, pushing more devices per system call.
- This can be done without acquiring RTNL.

I still have a long term goal to avoid RTNL in rtnl_dump_ifinfo()
regardless of RTEXT_FILTER_NAME_ONLY being used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index bad036ef7614ffae52a65c447344ac1314f5521b..9045285ba2f8be8d7ff32e4f90ee546651f1a05f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2499,6 +2499,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 	int ops_srcu_index;
 	int master_idx = 0;
 	int netnsid = -1;
+	bool need_rtnl;
 	int err, i;
 
 	err = rtnl_valid_dump_ifinfo_req(nlh, cb->strict_check, tb, extack);
@@ -2548,6 +2549,12 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 
 walk_entries:
 	err = 0;
+	need_rtnl = !(ext_filter_mask & RTEXT_FILTER_NAME_ONLY);
+	if (need_rtnl)
+		rtnl_lock();
+	else
+		rcu_read_lock();
+
 	for_each_netdev_dump(tgt_net, dev, ctx->ifindex) {
 		if (link_dump_filtered(dev, master_idx, kind_ops))
 			continue;
@@ -2559,11 +2566,13 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 		if (err < 0)
 			break;
 	}
-
-
-	cb->seq = tgt_net->dev_base_seq;
+	cb->seq = READ_ONCE(tgt_net->dev_base_seq);
 	nl_dump_check_consistent(cb, nlmsg_hdr(skb));
 
+	if (need_rtnl)
+		rtnl_unlock();
+	else
+		rcu_read_unlock();
 out:
 
 	if (kind_ops)
@@ -7159,7 +7168,9 @@ static const struct rtnl_msg_handler rtnetlink_rtnl_msg_handlers[] __initconst =
 	 .flags = RTNL_FLAG_DOIT_PERNET_WIP},
 	{.msgtype = RTM_GETLINK, .doit = rtnl_getlink,
 	 .dumpit = rtnl_dump_ifinfo,
-	 .flags = RTNL_FLAG_DUMP_SPLIT_NLM_DONE | RTNL_FLAG_DOIT_UNLOCKED},
+	 .flags = RTNL_FLAG_DUMP_SPLIT_NLM_DONE |
+		  RTNL_FLAG_DOIT_UNLOCKED |
+		  RTNL_FLAG_DUMP_UNLOCKED},
 	{.msgtype = RTM_SETLINK, .doit = rtnl_setlink,
 	 .flags = RTNL_FLAG_DOIT_PERNET_WIP},
 	{.msgtype = RTM_GETADDR, .dumpit = rtnl_dump_all},
-- 
2.54.0.746.g67dd491aae-goog



* Re: [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo()
  2026-05-22 17:29 [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Eric Dumazet
                   ` (4 preceding siblings ...)
  2026-05-22 17:30 ` [PATCH v4 net-next 5/5] rtnetlink: add RTEXT_FILTER_NAME_ONLY support to rtnl_dump_ifinfo() Eric Dumazet
@ 2026-05-22 21:29 ` Jakub Kicinski
  2026-05-23  4:48   ` Eric Dumazet
  2026-05-23  7:00 ` [syzbot ci] " syzbot ci
  6 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2026-05-22 21:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	netdev, eric.dumazet

On Fri, 22 May 2026 17:29:57 +0000 Eric Dumazet wrote:
> Many shell scripts invoke iproute2 commands specifying a device by
> its name.
> 
> This series improves their performance avoiding RTNL acquisition
> for their (repeated) name->index conversion.
> 
> v3: insert patch 2/3 in the series (Jakub reported a KASAN splat)
> v4: Addressed Sashiko's feedback.
>     added 2 patches for rtnl_dump_ifinfo().

The CI looks fried, various errors:

# 0.02 [+0.02] RTNETLINK answers: File exists
# 0.02 [+0.00] Failed to create netif

# CMD: ip -d -j link show dev eth8
#   EXIT: 1
#   STDOUT: []
#   STDERR: RTNETLINK answers: Message too long
#           Cannot send link get request: Message too long

At boot we hit:

[    0.661578] ------------[ cut here ]------------
[    0.661608] WARNING: net/core/rtnetlink.c:4296 at rtnl_getlink+0x457/0x5e0, CPU#3: ip/71
[    0.661656] Modules linked in:
[    0.661681] CPU: 3 UID: 0 PID: 71 Comm: ip Tainted: G        W           7.1.0-rc4-virtme #1 PREEMPT(lazy) 
[    0.661735] Tainted: [W]=WARN
[    0.661756] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.661795] RIP: 0010:rtnl_getlink+0x457/0x5e0
[    0.661831] Code: ff ff 89 c3 48 83 c4 38 e8 76 44 fe ff 85 db 48 8b 7c 24 08 0f 84 2c 01 00 00 48 89 fe ba 02 00 00 00 31 ff e8 4a b7 fb ff 90 <0f> 0b 90 b8 a6 ff ff ff 49 8b 94 24 40 05 00 00 65 ff 0a 80 7c 24
[    0.661928] RSP: 0018:ff87a370c027b7a8 EFLAGS: 00010296
[    0.661956] RAX: 0000000000000010 RBX: 00000000ffffffa6 RCX: ff368f1541d3ef00
[    0.661998] RDX: ff368f157edadda0 RSI: 0000000000000011 RDI: ff368f15411ffa00
[    0.662041] RBP: ff368f1541e60b00 R08: ff368f1541845810 R09: ffffffff9ec3bbe6
[    0.662076] R10: ffd7c3a340079800 R11: ff368f15411ffa00 R12: ff368f1541e31000
[    0.662126] R13: ffffffff9fe4f700 R14: 0000000000000009 R15: ffffffff9fe4f700
[    0.662176] FS:  00007fa9d67e0600(0000) GS:ff368f15df00c000(0000) knlGS:0000000000000000
[    0.662219] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.662253] CR2: 00007fff3c059c78 CR3: 000000000190f005 CR4: 0000000000771ef0
[    0.662294] PKRU: 55555554
[    0.662307] Call Trace:
[    0.662319]  <TASK>
[    0.662338]  ? rtnl_fill_ifinfo.isra.0+0x1670/0x1670
[    0.662366]  rtnetlink_rcv_msg+0x39f/0x460
[    0.662390]  ? rtnl_calcit.isra.0+0x160/0x160
[    0.662420]  netlink_rcv_skb+0xca/0x140
[    0.662445]  netlink_unicast+0x26b/0x3a0
[    0.662467]  netlink_sendmsg+0x1e2/0x430
[    0.662489]  ____sys_sendmsg+0x14c/0x2b0
[    0.662511]  ___sys_sendmsg+0xe1/0x120
[    0.662539]  __sys_sendmsg+0xad/0x100
[    0.662560]  do_syscall_64+0x104/0xfc0
[    0.662585]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[    0.662609] RIP: 0033:0x7fa9d6a1808e
[    0.662631] Code: 4d 89 d8 e8 94 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 03 ff ff ff 0f 1f 00 f3 0f 1e fa
[    0.662733] RSP: 002b:00007fff3c059b50 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[    0.662775] RAX: ffffffffffffffda RBX: 00007fff3c05ce1d RCX: 00007fa9d6a1808e
[    0.662823] RDX: 0000000000000000 RSI: 00007fff3c059c00 RDI: 0000000000000004
[    0.662864] RBP: 00007fff3c059b60 R08: 0000000000000000 R09: 0000000000000000
[    0.662907] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[    0.662947] R13: 000000006a10c531 R14: 00007fff3c059ca0 R15: 00007fff3c05ce1d
[    0.662982]  </TASK>
[    0.662998] ---[ end trace 0000000000000000 ]---
[    0.663079] ------------[ cut here ]------------
[    0.663110] WARNING: net/core/rtnetlink.c:4523 at rtmsg_ifinfo_build_skb+0xc8/0x110, CPU#3: ip/71
[    0.663162] Modules linked in:
[    0.663186] CPU: 3 UID: 0 PID: 71 Comm: ip Tainted: G        W           7.1.0-rc4-virtme #1 PREEMPT(lazy) 
[    0.663235] Tainted: [W]=WARN
[    0.663256] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.663289] RIP: 0010:rtmsg_ifinfo_build_skb+0xc8/0x110
[    0.663319] Code: 80 00 00 00 e8 59 d9 ff ff 48 83 c4 38 85 c0 75 18 48 83 c4 08 4c 89 f8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 8b 48 08 eb b4 90 <0f> 0b 90 ba 02 00 00 00 4c 89 fe 31 ff e8 36 ab fb ff b9 a6 ff ff
[    0.663414] RSP: 0018:ff87a370c027b720 EFLAGS: 00010286
[    0.663439] RAX: 00000000ffffffa6 RBX: 0000000000000001 RCX: 0000000000000000
[    0.663485] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff368f1541e60700
[    0.663527] RBP: 0000000000000000 R08: ff368f1541845810 R09: ff368f154298f02c
[    0.663569] R10: ff368f1541e31120 R11: fefefefefefefeff R12: 0000000000000000
[    0.663612] R13: 0000000000000010 R14: ff368f1541e31000 R15: ff368f1541e60700
[    0.663655] FS:  00007fa9d67e0600(0000) GS:ff368f15df00c000(0000) knlGS:0000000000000000
[    0.663699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.663730] CR2: 000055ed56288988 CR3: 000000000190f005 CR4: 0000000000771ef0
[    0.663774] PKRU: 55555554
[    0.663787] Call Trace:
[    0.663803]  <TASK>
[    0.663821]  rtmsg_ifinfo+0x3c/0xa0
[    0.663845]  __dev_notify_flags+0xb1/0xf0
[    0.663867]  ? rtnl_getlink+0x456/0x5e0
[    0.663887]  netif_change_flags+0x54/0x70
[    0.663913]  do_setlink.isra.0+0x3a2/0x1500
[    0.663939]  ? __nla_validate_parse+0x76/0xf20
[    0.663970]  rtnl_newlink+0x9d3/0xd90
[    0.663993]  ? do_setlink.isra.0+0x1500/0x1500
[    0.664015]  rtnetlink_rcv_msg+0x39f/0x460
[    0.664035]  ? get_page_from_freelist+0x157a/0x16a0
[    0.664068]  ? rtnl_calcit.isra.0+0x160/0x160
[    0.664091]  netlink_rcv_skb+0xca/0x140
[    0.664118]  netlink_unicast+0x26b/0x3a0
[    0.664140]  netlink_sendmsg+0x1e2/0x430
[    0.664162]  ____sys_sendmsg+0x14c/0x2b0
[    0.664182]  ___sys_sendmsg+0xe1/0x120
[    0.664204]  __sys_sendmsg+0xad/0x100
[    0.664226]  do_syscall_64+0x104/0xfc0
[    0.664248]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[    0.664275] RIP: 0033:0x7fa9d6a1808e
[    0.664295] Code: 4d 89 d8 e8 94 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 03 ff ff ff 0f 1f 00 f3 0f 1e fa
[    0.664397] RSP: 002b:00007fff3c05b1b0 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[    0.664437] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fa9d6a1808e
[    0.664478] RDX: 0000000000000000 RSI: 00007fff3c05b260 RDI: 0000000000000003
[    0.664520] RBP: 00007fff3c05b1c0 R08: 0000000000000000 R09: 0000000000000000
[    0.664559] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
[    0.664604] R13: 000000006a10c531 R14: 000055ed1cfb1040 R15: 0000000000000000
[    0.664645]  </TASK>
[    0.664658] ---[ end trace 0000000000000000 ]---


* Re: [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo()
  2026-05-22 21:29 ` [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Jakub Kicinski
@ 2026-05-23  4:48   ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-05-23  4:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
	netdev, eric.dumazet

On Fri, May 22, 2026 at 2:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 22 May 2026 17:29:57 +0000 Eric Dumazet wrote:
> > Many shell scripts invoke iproute2 commands specifying a device by
> > its name.
> >
> > This series improves their performance avoiding RTNL acquisition
> > for their (repeated) name->index conversion.
> >
> > v3: insert patch 2/3 in the series (Jakub reported a KASAN splat)
> > v4: Addressed Sashiko's feedback.
> >     added 2 patches for rtnl_dump_ifinfo().
>
> The CI looks fried, various errors:
>
> # 0.02 [+0.02] RTNETLINK answers: File exists
> # 0.02 [+0.00] Failed to create netif
>
> # CMD: ip -d -j link show dev eth8
> #   EXIT: 1
> #   STDOUT: []
> #   STDERR: RTNETLINK answers: Message too long
> #           Cannot send link get request: Message too long
>
> At boot we hit:
>
> [    0.661578] ------------[ cut here ]------------
> [    0.661608] WARNING: net/core/rtnetlink.c:4296 at rtnl_getlink+0x457/0x5e0, CPU#3: ip/71

Ah right, the first patch went wrong: I missed that we call
nla_nest_cancel() when there are no altnames to dump.
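
A userspace model of the regression described above, as a hedged
sketch. In patch 1/5 the nest_cancel label is reached both on error and
when the altname fill returns 0, and the patch changes its return value
from ret to -EMSGSIZE. The function names below are hypothetical, ret
models the result of filling the altnames (count written, 0 if none,
negative errno on failure), and the "fixed" variant is just one
possible correction, not something taken from this thread.

```c
#include <assert.h>
#include <errno.h>

/* Before the patch: the cancel path returns ret, so a device without
 * altnames yields 0 (success, with the empty nest removed).
 */
static int model_fill_prop_list_old(int ret)
{
	if (ret <= 0)
		return ret;	/* nest cancelled */
	return 0;		/* nest ended */
}

/* As posted in v4 1/5: the cancel path always returns -EMSGSIZE, so a
 * device without altnames now looks like an overflow.
 * nest_fits models the outcome of the nla_nest_end_safe() check.
 */
static int model_fill_prop_list_v4(int ret, int nest_fits)
{
	assert(nest_fits == 0 || nest_fits == 1);
	if (ret <= 0 || !nest_fits)
		return -EMSGSIZE;
	return 0;
}

/* One possible correction (an assumption): keep returning ret for the
 * early cancel, and use -EMSGSIZE only when the nest-end check fails.
 */
static int model_fill_prop_list_fixed(int ret, int nest_fits)
{
	assert(nest_fits == 0 || nest_fits == 1);
	if (ret <= 0)
		return ret;
	if (!nest_fits)
		return -EMSGSIZE;
	return 0;
}
```

The spurious -EMSGSIZE for ret == 0 is consistent with the 0xffffffa6
(-90) value visible in the register dumps and with the WARN in
rtmsg_ifinfo_build_skb() reported for devices that have no altnames.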


* [syzbot ci] Re: rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo()
  2026-05-22 17:29 [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Eric Dumazet
                   ` (5 preceding siblings ...)
  2026-05-22 21:29 ` [PATCH v4 net-next 0/5] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo() Jakub Kicinski
@ 2026-05-23  7:00 ` syzbot ci
  6 siblings, 0 replies; 9+ messages in thread
From: syzbot ci @ 2026-05-23  7:00 UTC (permalink / raw)
  To: davem, edumazet, eric.dumazet, horms, kuba, kuniyu, netdev,
	pabeni
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v4] rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo()
https://lore.kernel.org/all/20260522173002.2181677-1-edumazet@google.com
* [PATCH v4 net-next 1/5] rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()
* [PATCH v4 net-next 2/5] net: defer netdev_name_node_alt_flush() call to netdev_run_todo()
* [PATCH v4 net-next 3/5] rtnetlink: do not acquire RTNL in rtnl_getlink() with RTEXT_FILTER_NAME_ONLY
* [PATCH v4 net-next 4/5] rtnetlink: do not assume RTNL is held in link_master_filtered()
* [PATCH v4 net-next 5/5] rtnetlink: add RTEXT_FILTER_NAME_ONLY support to rtnl_dump_ifinfo()

and found the following issue:
WARNING in rtmsg_ifinfo_build_skb

Full report is available here:
https://ci.syzbot.org/series/583940d4-d5e9-48ca-a2e6-544edbb1d63c

***

WARNING in rtmsg_ifinfo_build_skb

tree:      net-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base:      1a1f055318d82e64485a6ff8420e5f70b4267998
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/513a249e-70b7-4622-8d72-6f62840955c7/config

pci 0000:00:01.0: BAR 2 [mem 0xfebf0000-0xfebf0fff]
pci 0000:00:01.0: ROM [mem 0xfebe0000-0xfebeffff pref]
pci 0000:00:01.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
pci 0000:00:02.0: [1af4:1005] type 00 class 0x00ff00 conventional PCI endpoint
pci 0000:00:02.0: BAR 0 [io  0xc080-0xc09f]
pci 0000:00:02.0: BAR 1 [mem 0xfebf1000-0xfebf1fff]
pci 0000:00:02.0: BAR 4 [mem 0xfe000000-0xfe003fff 64bit pref]
pci 0000:00:03.0: [8086:100e] type 00 class 0x020000 conventional PCI endpoint
pci 0000:00:03.0: BAR 0 [mem 0xfebc0000-0xfebdffff]
pci 0000:00:03.0: BAR 1 [io  0xc000-0xc03f]
pci 0000:00:03.0: ROM [mem 0xfeb80000-0xfebbffff pref]
pci 0000:00:1f.0: [8086:2918] type 00 class 0x060100 conventional PCI endpoint
pci 0000:00:1f.0: quirk: [io  0x0600-0x067f] claimed by ICH6 ACPI/GPIO/TCO
pci 0000:00:1f.2: [8086:2922] type 00 class 0x010601 conventional PCI endpoint
pci 0000:00:1f.2: BAR 4 [io  0xc0a0-0xc0bf]
pci 0000:00:1f.2: BAR 5 [mem 0xfebf2000-0xfebf2fff]
pci 0000:00:1f.3: [8086:2930] type 00 class 0x0c0500 conventional PCI endpoint
pci 0000:00:1f.3: BAR 4 [io  0x0700-0x073f]
ACPI: PCI: Interrupt link LNKA configured for IRQ 10
ACPI: PCI: Interrupt link LNKB configured for IRQ 10
ACPI: PCI: Interrupt link LNKC configured for IRQ 11
ACPI: PCI: Interrupt link LNKD configured for IRQ 11
ACPI: PCI: Interrupt link LNKE configured for IRQ 10
ACPI: PCI: Interrupt link LNKF configured for IRQ 10
ACPI: PCI: Interrupt link LNKG configured for IRQ 11
ACPI: PCI: Interrupt link LNKH configured for IRQ 11
ACPI: PCI: Interrupt link GSIA configured for IRQ 16
ACPI: PCI: Interrupt link GSIB configured for IRQ 17
ACPI: PCI: Interrupt link GSIC configured for IRQ 18
ACPI: PCI: Interrupt link GSID configured for IRQ 19
ACPI: PCI: Interrupt link GSIE configured for IRQ 20
ACPI: PCI: Interrupt link GSIF configured for IRQ 21
ACPI: PCI: Interrupt link GSIG configured for IRQ 22
ACPI: PCI: Interrupt link GSIH configured for IRQ 23
iommu: Default domain type: Translated
iommu: DMA domain TLB invalidation policy: lazy mode
SCSI subsystem initialized
ACPI: bus type USB registered
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
mc: Linux media interface: v0.10
videodev: Linux video capture interface: v2.00
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
EDAC MC: Ver: 3.0.0
Advanced Linux Sound Architecture Driver Initialized.
------------[ cut here ]------------
err == -EMSGSIZE
WARNING: net/core/rtnetlink.c:4524 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: swapper/0/1
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260
Code: f6 ba 01 00 00 00 89 e9 e8 45 ac 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc e8 39 dc 40 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89
RSP: 0000:ffffc90000067438 EFLAGS: 00010293
RAX: ffffffff8984e887 RBX: 0000000000000000 RCX: ffff8881026f5880
RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6
RBP: 00000000ffffffa6 R08: ffffffff8984f746 R09: 0000000000000000
R10: fffff5200000ce30 R11: ffffed1020c50405 R12: 1ffff11020c51c21
R13: 0000000000000000 R14: ffff888103a82480 R15: ffff88810628e000
FS:  0000000000000000(0000) GS:ffff88818dc76000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000000e74a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 rtmsg_ifinfo+0x8c/0x1a0
 register_netdevice+0x1aca/0x1ec0
 register_netdev+0x40/0x60
 loopback_net_init+0x75/0x150
 ops_init+0x35c/0x5c0
 register_pernet_operations+0x343/0x830
 register_pernet_device+0x2a/0x80
 net_dev_init+0x973/0xa90
 do_one_initcall+0x250/0x870
 do_initcall_level+0x104/0x190
 do_initcalls+0x59/0xa0
 kernel_init_freeable+0x2a6/0x3e0
 kernel_init+0x1d/0x1d0
 ret_from_fork+0x514/0xb70
 ret_from_fork_asm+0x1a/0x30
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.

