netdev.vger.kernel.org archive mirror
* [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps
@ 2024-02-22 10:50 Eric Dumazet
  2024-02-22 10:50 ` [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU Eric Dumazet
                   ` (14 more replies)
  0 siblings, 15 replies; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

This series restarts the conversion of rtnl dump operations
to RCU protection, instead of requiring RTNL.

In this new attempt (the prior one failed in 2011), I chose to
allow a gradual conversion of selected operations.

After this series, "ip -6 addr" and "ip -4 ro" no longer
need to acquire RTNL.

I refrained from changing inet_dump_ifaddr() and inet6_dump_addr()
to avoid merge conflicts with two fixes in the net tree.

I also started work on the future conversion of "ip link".

v2: rtnl_fill_link_ifmap() always emits IFLA_MAP (Jiri Pirko)
    Added "nexthop: allow nexthop_mpath_fill_node()
           to be called without RTNL" to avoid a lockdep splat (Ido Schimmel)

Eric Dumazet (14):
  rtnetlink: prepare nla_put_iflink() to run under RCU
  ipv6: prepare inet6_fill_ifla6_attrs() for RCU
  ipv6: prepare inet6_fill_ifinfo() for RCU protection
  ipv6: use xarray iterator to implement inet6_dump_ifinfo()
  netlink: fix netlink_diag_dump() return value
  netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
  rtnetlink: change nlk->cb_mutex role
  rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
  ipv6: switch inet6_dump_ifinfo() to RCU protection
  inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
  nexthop: allow nexthop_mpath_fill_node() to be called without RTNL
  inet: switch inet_dump_fib() to RCU protection
  rtnetlink: make rtnl_fill_link_ifmap() RCU ready
  rtnetlink: provide RCU protection to rtnl_fill_prop_list()

 drivers/infiniband/ulp/ipoib/ipoib_main.c     |   4 +-
 drivers/net/can/vxcan.c                       |   2 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_vnd.c   |   2 +-
 drivers/net/ipvlan/ipvlan_main.c              |   2 +-
 drivers/net/macsec.c                          |   2 +-
 drivers/net/macvlan.c                         |   2 +-
 drivers/net/netkit.c                          |   2 +-
 drivers/net/veth.c                            |   2 +-
 drivers/net/wireless/virtual/virt_wifi.c      |   2 +-
 include/linux/netdevice.h                     |   6 +-
 include/linux/netlink.h                       |   2 +
 include/net/ip_fib.h                          |   1 +
 include/net/nexthop.h                         |   2 +-
 include/net/rtnetlink.h                       |   1 +
 net/8021q/vlan_dev.c                          |   4 +-
 net/core/dev.c                                |   6 +-
 net/core/rtnetlink.c                          |  36 +--
 net/dsa/user.c                                |   2 +-
 net/ieee802154/6lowpan/core.c                 |   2 +-
 net/ipv4/fib_frontend.c                       |  50 ++--
 net/ipv4/fib_trie.c                           |   4 +-
 net/ipv4/ipmr.c                               |   4 +-
 net/ipv6/addrconf.c                           | 222 +++++++++---------
 net/ipv6/ip6_fib.c                            |   7 +-
 net/ipv6/ip6_tunnel.c                         |   2 +-
 net/ipv6/ip6mr.c                              |   4 +-
 net/ipv6/ndisc.c                              |   2 +-
 net/mpls/af_mpls.c                            |   4 +-
 net/netlink/af_netlink.c                      |  46 ++--
 net/netlink/af_netlink.h                      |   5 +-
 net/netlink/diag.c                            |   2 +-
 net/xfrm/xfrm_interface_core.c                |   2 +-
 32 files changed, 238 insertions(+), 198 deletions(-)

-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 13:29   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 02/14] ipv6: prepare inet6_fill_ifla6_attrs() for RCU Eric Dumazet
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.

This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.
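
As a rough userspace illustration of the pattern (not the kernel code
itself), the sketch below models a getter that loads ifindex exactly once
and a caller that compares its own index against the result. READ_ONCE()
here is a simplified stand-in for the kernel macro, and struct demo_dev,
demo_get_iflink() and demo_should_emit_iflink() are hypothetical names
introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's READ_ONCE() (assumption: the real
 * macro does additional type checking). __typeof__ is a GNU C extension.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, stripped-down device: only the fields this sketch needs. */
struct demo_dev {
	int ifindex;
	const struct demo_dev *lower;	/* NULL when there is no lower device */
};

/* Loosely follows the shape of a get_iflink helper: report the lower
 * device's index if there is one, else our own, with a single marked load.
 */
static int demo_get_iflink(const struct demo_dev *dev)
{
	if (dev->lower)
		return READ_ONCE(dev->lower->ifindex);
	return READ_ONCE(dev->ifindex);
}

/* Loosely follows the decision in nla_put_iflink(): the link attribute is
 * emitted when forced, or when iflink differs from the device's ifindex.
 */
static bool demo_should_emit_iflink(const struct demo_dev *dev, bool force)
{
	int iflink = demo_get_iflink(dev);

	return force || READ_ONCE(dev->ifindex) != iflink;
}
```

The marked loads tell the compiler not to tear, refetch or fuse the
access, which matters once a writer may update the field while a reader
holds only rcu_read_lock().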

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c       | 4 ++--
 drivers/net/can/vxcan.c                         | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 2 +-
 drivers/net/ipvlan/ipvlan_main.c                | 2 +-
 drivers/net/macsec.c                            | 2 +-
 drivers/net/macvlan.c                           | 2 +-
 drivers/net/netkit.c                            | 2 +-
 drivers/net/veth.c                              | 2 +-
 drivers/net/wireless/virtual/virt_wifi.c        | 2 +-
 net/8021q/vlan_dev.c                            | 4 ++--
 net/core/dev.c                                  | 2 +-
 net/core/rtnetlink.c                            | 6 +++---
 net/dsa/user.c                                  | 2 +-
 net/ieee802154/6lowpan/core.c                   | 2 +-
 net/ipv6/ip6_tunnel.c                           | 2 +-
 net/xfrm/xfrm_interface_core.c                  | 2 +-
 16 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 7a5be705d71830d5bb3aa26a96a4463df03883a4..6f2a688fccbfb02ae7bdf3d55cca0e77fa9b56b4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1272,10 +1272,10 @@ static int ipoib_get_iflink(const struct net_device *dev)
 
 	/* parent interface */
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
-		return dev->ifindex;
+		return READ_ONCE(dev->ifindex);
 
 	/* child/vlan interface */
-	return priv->parent->ifindex;
+	return READ_ONCE(priv->parent->ifindex);
 }
 
 static u32 ipoib_addr_hash(struct ipoib_neigh_hash *htbl, u8 *daddr)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index 98c669ad5141479b509ee924ddba3da6bca554cd..f7fabba707ea640cab8863e63bb19294e333ba2c 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -119,7 +119,7 @@ static int vxcan_get_iflink(const struct net_device *dev)
 
 	rcu_read_lock();
 	peer = rcu_dereference(priv->peer);
-	iflink = peer ? peer->ifindex : 0;
+	iflink = peer ? READ_ONCE(peer->ifindex) : 0;
 	rcu_read_unlock();
 
 	return iflink;
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 046b5f7d8e7cab33a9f09079858bac2a972e968a..9d2a9562c96ff4937da7a389c773acce01508ca3 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -98,7 +98,7 @@ static int rmnet_vnd_get_iflink(const struct net_device *dev)
 {
 	struct rmnet_priv *priv = netdev_priv(dev);
 
-	return priv->real_dev->ifindex;
+	return READ_ONCE(priv->real_dev->ifindex);
 }
 
 static int rmnet_vnd_init(struct net_device *dev)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index df7c43a109e1a7376c6ce3216cb3dd4223eac04c..5920f7e6335230cf07a3da528e4ac7a050c2fd41 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -349,7 +349,7 @@ static int ipvlan_get_iflink(const struct net_device *dev)
 {
 	struct ipvl_dev *ipvlan = netdev_priv(dev);
 
-	return ipvlan->phy_dev->ifindex;
+	return READ_ONCE(ipvlan->phy_dev->ifindex);
 }
 
 static const struct net_device_ops ipvlan_netdev_ops = {
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 7f5426285c61b1e35afd74d4c044f80c77f34e7f..4b5513c9c2befe42e054fee6ecdadc9aabb0ce19 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3753,7 +3753,7 @@ static void macsec_get_stats64(struct net_device *dev,
 
 static int macsec_get_iflink(const struct net_device *dev)
 {
-	return macsec_priv(dev)->real_dev->ifindex;
+	return READ_ONCE(macsec_priv(dev)->real_dev->ifindex);
 }
 
 static const struct net_device_ops macsec_netdev_ops = {
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index a3cc665757e8727d3ffb24d8dbfbcd321fc93ffd..0cec2783a3e712b7769572482bf59aa336b9ca15 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1158,7 +1158,7 @@ static int macvlan_dev_get_iflink(const struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
-	return vlan->lowerdev->ifindex;
+	return READ_ONCE(vlan->lowerdev->ifindex);
 }
 
 static const struct ethtool_ops macvlan_ethtool_ops = {
diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 39171380ccf29e27412bb2b9cee7102acc4a83ab..a4d2e76a8d587cc6ce7ad7f98e382a1c81f76e67 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -145,7 +145,7 @@ static int netkit_get_iflink(const struct net_device *dev)
 	rcu_read_lock();
 	peer = rcu_dereference(nk->peer);
 	if (peer)
-		iflink = peer->ifindex;
+		iflink = READ_ONCE(peer->ifindex);
 	rcu_read_unlock();
 	return iflink;
 }
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 500b9dfccd08ee8f91b22d78e3d8195f3de26088..dd5aa8ab65a865dc9dbaa596861671d189bfe1af 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1461,7 +1461,7 @@ static int veth_get_iflink(const struct net_device *dev)
 
 	rcu_read_lock();
 	peer = rcu_dereference(priv->peer);
-	iflink = peer ? peer->ifindex : 0;
+	iflink = peer ? READ_ONCE(peer->ifindex) : 0;
 	rcu_read_unlock();
 
 	return iflink;
diff --git a/drivers/net/wireless/virtual/virt_wifi.c b/drivers/net/wireless/virtual/virt_wifi.c
index ba14d83353a4b226e44d420a16e33460a9dc762d..6a84ec58d618bcbf966dab6e38cfe02b886a712f 100644
--- a/drivers/net/wireless/virtual/virt_wifi.c
+++ b/drivers/net/wireless/virtual/virt_wifi.c
@@ -453,7 +453,7 @@ static int virt_wifi_net_device_get_iflink(const struct net_device *dev)
 {
 	struct virt_wifi_netdev_priv *priv = netdev_priv(dev);
 
-	return priv->lowerdev->ifindex;
+	return READ_ONCE(priv->lowerdev->ifindex);
 }
 
 static const struct net_device_ops virt_wifi_ops = {
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index df55525182517e49b2cfbffe7f102967c66b5952..39876eff51d21f830c3bde1682e07aac698c633e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -762,9 +762,9 @@ static void vlan_dev_netpoll_cleanup(struct net_device *dev)
 
 static int vlan_dev_get_iflink(const struct net_device *dev)
 {
-	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
+	const struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
 
-	return real_dev->ifindex;
+	return READ_ONCE(real_dev->ifindex);
 }
 
 static int vlan_dev_fill_forward_path(struct net_device_path_ctx *ctx,
diff --git a/net/core/dev.c b/net/core/dev.c
index c588808be77f563c429eb4a2eaee5c8062d99582..0628d8ff1ed932efdd45ab7b79599dcfcca6c4eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -641,7 +641,7 @@ int dev_get_iflink(const struct net_device *dev)
 	if (dev->netdev_ops && dev->netdev_ops->ndo_get_iflink)
 		return dev->netdev_ops->ndo_get_iflink(dev);
 
-	return dev->ifindex;
+	return READ_ONCE(dev->ifindex);
 }
 EXPORT_SYMBOL(dev_get_iflink);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c54dbe05c4c5df126d0b58403049ebc1d272907e..060543fe7919c13c7a5c6cf22f9e7606d0897345 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1611,10 +1611,10 @@ static int put_master_ifindex(struct sk_buff *skb, struct net_device *dev)
 static int nla_put_iflink(struct sk_buff *skb, const struct net_device *dev,
 			  bool force)
 {
-	int ifindex = dev_get_iflink(dev);
+	int iflink = dev_get_iflink(dev);
 
-	if (force || dev->ifindex != ifindex)
-		return nla_put_u32(skb, IFLA_LINK, ifindex);
+	if (force || READ_ONCE(dev->ifindex) != iflink)
+		return nla_put_u32(skb, IFLA_LINK, iflink);
 
 	return 0;
 }
diff --git a/net/dsa/user.c b/net/dsa/user.c
index 4d53c76a9840a789511b9ee0d9a39c70de77f72c..9c42a6edcdc8a8de94241ce4a238f31583b738ec 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -352,7 +352,7 @@ void dsa_user_mii_bus_init(struct dsa_switch *ds)
 /* user device handling ****************************************************/
 static int dsa_user_get_iflink(const struct net_device *dev)
 {
-	return dsa_user_to_conduit(dev)->ifindex;
+	return READ_ONCE(dsa_user_to_conduit(dev)->ifindex);
 }
 
 static int dsa_user_open(struct net_device *dev)
diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c
index e643f52663f9bed8c4707b205a73d0d2bad5bb73..77b4e92027c5dfdadefc3019ca82ee8967a9006e 100644
--- a/net/ieee802154/6lowpan/core.c
+++ b/net/ieee802154/6lowpan/core.c
@@ -93,7 +93,7 @@ static int lowpan_neigh_construct(struct net_device *dev, struct neighbour *n)
 
 static int lowpan_get_iflink(const struct net_device *dev)
 {
-	return lowpan_802154_dev(dev)->wdev->ifindex;
+	return READ_ONCE(lowpan_802154_dev(dev)->wdev->ifindex);
 }
 
 static const struct net_device_ops lowpan_netdev_ops = {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 44406c28445dc457fb47a7cdec295778eb30b31f..5fd07581efafe3c57cc8732ddaae9910d6726f30 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1756,7 +1756,7 @@ int ip6_tnl_get_iflink(const struct net_device *dev)
 {
 	struct ip6_tnl *t = netdev_priv(dev);
 
-	return t->parms.link;
+	return READ_ONCE(t->parms.link);
 }
 EXPORT_SYMBOL(ip6_tnl_get_iflink);
 
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index dafefef3cf51a79fd6701a8b78c3f8fcfd10615d..717855b9acf1c413d506f681aec636af9b075af5 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -727,7 +727,7 @@ static int xfrmi_get_iflink(const struct net_device *dev)
 {
 	struct xfrm_if *xi = netdev_priv(dev);
 
-	return xi->p.link;
+	return READ_ONCE(xi->p.link);
 }
 
 static const struct net_device_ops xfrmi_netdev_ops = {
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 02/14] ipv6: prepare inet6_fill_ifla6_attrs() for RCU
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
  2024-02-22 10:50 ` [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 14:35   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection Eric Dumazet
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs()
in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations.
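
A minimal userspace sketch of how the two annotations pair up, assuming
simplified stand-ins for the kernel macros: the writer (which would still
run under RTNL) stores with WRITE_ONCE(), and the lockless reader takes
one snapshot with READ_ONCE() so that the test and the use see the same
value. struct demo_idev and the demo_* helpers are hypothetical names for
this example only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros (assumption: the real ones
 * perform extra checks). __typeof__ is a GNU C extension.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical subset of the per-device IPv6 state. */
struct demo_idev {
	uint32_t ra_mtu;
	uint8_t addr_gen_mode;
};

/* Writer side: in the kernel this would run with RTNL held. */
static void demo_set_addr_gen_mode(struct demo_idev *idev, uint8_t mode)
{
	WRITE_ONCE(idev->addr_gen_mode, mode);
}

/* Reader side: snapshot ra_mtu once. Returns true and stores the value
 * in *out when the attribute would be emitted (ra_mtu is non-zero).
 */
static bool demo_fill_ra_mtu(const struct demo_idev *idev, uint32_t *out)
{
	uint32_t ra_mtu = READ_ONCE(idev->ra_mtu);

	if (!ra_mtu)
		return false;
	*out = ra_mtu;
	return true;
}
```

Without the local snapshot, a plain "if (idev->ra_mtu && put(idev->ra_mtu))"
could test one value and emit another if a writer ran in between.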

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/addrconf.c | 163 ++++++++++++++++++++++++--------------------
 net/ipv6/ndisc.c    |   2 +-
 2 files changed, 90 insertions(+), 75 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d3f4b7b9cf1fe380757225a110153fbad51bf763..3c8bdad0105dc9542489b612890ba86de9c44bdc 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3477,7 +3477,8 @@ static void addrconf_dev_config(struct net_device *dev)
 	/* this device type has no EUI support */
 	if (dev->type == ARPHRD_NONE &&
 	    idev->cnf.addr_gen_mode == IN6_ADDR_GEN_MODE_EUI64)
-		idev->cnf.addr_gen_mode = IN6_ADDR_GEN_MODE_RANDOM;
+		WRITE_ONCE(idev->cnf.addr_gen_mode,
+			   IN6_ADDR_GEN_MODE_RANDOM);
 
 	addrconf_addr_gen(idev, false);
 }
@@ -3749,7 +3750,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 				rt6_mtu_change(dev, dev->mtu);
 				idev->cnf.mtu6 = dev->mtu;
 			}
-			idev->tstamp = jiffies;
+			WRITE_ONCE(idev->tstamp, jiffies);
 			inet6_ifinfo_notify(RTM_NEWLINK, idev);
 
 			/*
@@ -3991,7 +3992,7 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
 		ipv6_mc_down(idev);
 	}
 
-	idev->tstamp = jiffies;
+	WRITE_ONCE(idev->tstamp, jiffies);
 	idev->ra_mtu = 0;
 
 	/* Last: Shot the device (if unregistered) */
@@ -5619,87 +5620,97 @@ static void inet6_ifa_notify(int event, struct inet6_ifaddr *ifa)
 		rtnl_set_sk_err(net, RTNLGRP_IPV6_IFADDR, err);
 }
 
-static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
-				__s32 *array, int bytes)
+static void ipv6_store_devconf(const struct ipv6_devconf *cnf,
+			       __s32 *array, int bytes)
 {
 	BUG_ON(bytes < (DEVCONF_MAX * 4));
 
 	memset(array, 0, bytes);
-	array[DEVCONF_FORWARDING] = cnf->forwarding;
-	array[DEVCONF_HOPLIMIT] = cnf->hop_limit;
-	array[DEVCONF_MTU6] = cnf->mtu6;
-	array[DEVCONF_ACCEPT_RA] = cnf->accept_ra;
-	array[DEVCONF_ACCEPT_REDIRECTS] = cnf->accept_redirects;
-	array[DEVCONF_AUTOCONF] = cnf->autoconf;
-	array[DEVCONF_DAD_TRANSMITS] = cnf->dad_transmits;
-	array[DEVCONF_RTR_SOLICITS] = cnf->rtr_solicits;
+	array[DEVCONF_FORWARDING] = READ_ONCE(cnf->forwarding);
+	array[DEVCONF_HOPLIMIT] = READ_ONCE(cnf->hop_limit);
+	array[DEVCONF_MTU6] = READ_ONCE(cnf->mtu6);
+	array[DEVCONF_ACCEPT_RA] = READ_ONCE(cnf->accept_ra);
+	array[DEVCONF_ACCEPT_REDIRECTS] = READ_ONCE(cnf->accept_redirects);
+	array[DEVCONF_AUTOCONF] = READ_ONCE(cnf->autoconf);
+	array[DEVCONF_DAD_TRANSMITS] = READ_ONCE(cnf->dad_transmits);
+	array[DEVCONF_RTR_SOLICITS] = READ_ONCE(cnf->rtr_solicits);
 	array[DEVCONF_RTR_SOLICIT_INTERVAL] =
-		jiffies_to_msecs(cnf->rtr_solicit_interval);
+		jiffies_to_msecs(READ_ONCE(cnf->rtr_solicit_interval));
 	array[DEVCONF_RTR_SOLICIT_MAX_INTERVAL] =
-		jiffies_to_msecs(cnf->rtr_solicit_max_interval);
+		jiffies_to_msecs(READ_ONCE(cnf->rtr_solicit_max_interval));
 	array[DEVCONF_RTR_SOLICIT_DELAY] =
-		jiffies_to_msecs(cnf->rtr_solicit_delay);
-	array[DEVCONF_FORCE_MLD_VERSION] = cnf->force_mld_version;
+		jiffies_to_msecs(READ_ONCE(cnf->rtr_solicit_delay));
+	array[DEVCONF_FORCE_MLD_VERSION] = READ_ONCE(cnf->force_mld_version);
 	array[DEVCONF_MLDV1_UNSOLICITED_REPORT_INTERVAL] =
-		jiffies_to_msecs(cnf->mldv1_unsolicited_report_interval);
+		jiffies_to_msecs(READ_ONCE(cnf->mldv1_unsolicited_report_interval));
 	array[DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL] =
-		jiffies_to_msecs(cnf->mldv2_unsolicited_report_interval);
-	array[DEVCONF_USE_TEMPADDR] = cnf->use_tempaddr;
-	array[DEVCONF_TEMP_VALID_LFT] = cnf->temp_valid_lft;
-	array[DEVCONF_TEMP_PREFERED_LFT] = cnf->temp_prefered_lft;
-	array[DEVCONF_REGEN_MAX_RETRY] = cnf->regen_max_retry;
-	array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor;
-	array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses;
-	array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr;
-	array[DEVCONF_RA_DEFRTR_METRIC] = cnf->ra_defrtr_metric;
-	array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = cnf->accept_ra_min_hop_limit;
-	array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo;
+		jiffies_to_msecs(READ_ONCE(cnf->mldv2_unsolicited_report_interval));
+	array[DEVCONF_USE_TEMPADDR] = READ_ONCE(cnf->use_tempaddr);
+	array[DEVCONF_TEMP_VALID_LFT] = READ_ONCE(cnf->temp_valid_lft);
+	array[DEVCONF_TEMP_PREFERED_LFT] = READ_ONCE(cnf->temp_prefered_lft);
+	array[DEVCONF_REGEN_MAX_RETRY] = READ_ONCE(cnf->regen_max_retry);
+	array[DEVCONF_MAX_DESYNC_FACTOR] = READ_ONCE(cnf->max_desync_factor);
+	array[DEVCONF_MAX_ADDRESSES] = READ_ONCE(cnf->max_addresses);
+	array[DEVCONF_ACCEPT_RA_DEFRTR] = READ_ONCE(cnf->accept_ra_defrtr);
+	array[DEVCONF_RA_DEFRTR_METRIC] = READ_ONCE(cnf->ra_defrtr_metric);
+	array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] =
+		READ_ONCE(cnf->accept_ra_min_hop_limit);
+	array[DEVCONF_ACCEPT_RA_PINFO] = READ_ONCE(cnf->accept_ra_pinfo);
 #ifdef CONFIG_IPV6_ROUTER_PREF
-	array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref;
+	array[DEVCONF_ACCEPT_RA_RTR_PREF] = READ_ONCE(cnf->accept_ra_rtr_pref);
 	array[DEVCONF_RTR_PROBE_INTERVAL] =
-		jiffies_to_msecs(cnf->rtr_probe_interval);
+		jiffies_to_msecs(READ_ONCE(cnf->rtr_probe_interval));
 #ifdef CONFIG_IPV6_ROUTE_INFO
-	array[DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN] = cnf->accept_ra_rt_info_min_plen;
-	array[DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN] = cnf->accept_ra_rt_info_max_plen;
+	array[DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN] =
+		READ_ONCE(cnf->accept_ra_rt_info_min_plen);
+	array[DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN] =
+		READ_ONCE(cnf->accept_ra_rt_info_max_plen);
 #endif
 #endif
-	array[DEVCONF_PROXY_NDP] = cnf->proxy_ndp;
-	array[DEVCONF_ACCEPT_SOURCE_ROUTE] = cnf->accept_source_route;
+	array[DEVCONF_PROXY_NDP] = READ_ONCE(cnf->proxy_ndp);
+	array[DEVCONF_ACCEPT_SOURCE_ROUTE] =
+		READ_ONCE(cnf->accept_source_route);
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
-	array[DEVCONF_OPTIMISTIC_DAD] = cnf->optimistic_dad;
-	array[DEVCONF_USE_OPTIMISTIC] = cnf->use_optimistic;
+	array[DEVCONF_OPTIMISTIC_DAD] = READ_ONCE(cnf->optimistic_dad);
+	array[DEVCONF_USE_OPTIMISTIC] = READ_ONCE(cnf->use_optimistic);
 #endif
 #ifdef CONFIG_IPV6_MROUTE
 	array[DEVCONF_MC_FORWARDING] = atomic_read(&cnf->mc_forwarding);
 #endif
-	array[DEVCONF_DISABLE_IPV6] = cnf->disable_ipv6;
-	array[DEVCONF_ACCEPT_DAD] = cnf->accept_dad;
-	array[DEVCONF_FORCE_TLLAO] = cnf->force_tllao;
-	array[DEVCONF_NDISC_NOTIFY] = cnf->ndisc_notify;
-	array[DEVCONF_SUPPRESS_FRAG_NDISC] = cnf->suppress_frag_ndisc;
-	array[DEVCONF_ACCEPT_RA_FROM_LOCAL] = cnf->accept_ra_from_local;
-	array[DEVCONF_ACCEPT_RA_MTU] = cnf->accept_ra_mtu;
-	array[DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN] = cnf->ignore_routes_with_linkdown;
+	array[DEVCONF_DISABLE_IPV6] = READ_ONCE(cnf->disable_ipv6);
+	array[DEVCONF_ACCEPT_DAD] = READ_ONCE(cnf->accept_dad);
+	array[DEVCONF_FORCE_TLLAO] = READ_ONCE(cnf->force_tllao);
+	array[DEVCONF_NDISC_NOTIFY] = READ_ONCE(cnf->ndisc_notify);
+	array[DEVCONF_SUPPRESS_FRAG_NDISC] =
+		READ_ONCE(cnf->suppress_frag_ndisc);
+	array[DEVCONF_ACCEPT_RA_FROM_LOCAL] =
+		READ_ONCE(cnf->accept_ra_from_local);
+	array[DEVCONF_ACCEPT_RA_MTU] = READ_ONCE(cnf->accept_ra_mtu);
+	array[DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN] =
+		READ_ONCE(cnf->ignore_routes_with_linkdown);
 	/* we omit DEVCONF_STABLE_SECRET for now */
-	array[DEVCONF_USE_OIF_ADDRS_ONLY] = cnf->use_oif_addrs_only;
-	array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] = cnf->drop_unicast_in_l2_multicast;
-	array[DEVCONF_DROP_UNSOLICITED_NA] = cnf->drop_unsolicited_na;
-	array[DEVCONF_KEEP_ADDR_ON_DOWN] = cnf->keep_addr_on_down;
-	array[DEVCONF_SEG6_ENABLED] = cnf->seg6_enabled;
+	array[DEVCONF_USE_OIF_ADDRS_ONLY] = READ_ONCE(cnf->use_oif_addrs_only);
+	array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] =
+		READ_ONCE(cnf->drop_unicast_in_l2_multicast);
+	array[DEVCONF_DROP_UNSOLICITED_NA] = READ_ONCE(cnf->drop_unsolicited_na);
+	array[DEVCONF_KEEP_ADDR_ON_DOWN] = READ_ONCE(cnf->keep_addr_on_down);
+	array[DEVCONF_SEG6_ENABLED] = READ_ONCE(cnf->seg6_enabled);
 #ifdef CONFIG_IPV6_SEG6_HMAC
-	array[DEVCONF_SEG6_REQUIRE_HMAC] = cnf->seg6_require_hmac;
+	array[DEVCONF_SEG6_REQUIRE_HMAC] = READ_ONCE(cnf->seg6_require_hmac);
 #endif
-	array[DEVCONF_ENHANCED_DAD] = cnf->enhanced_dad;
-	array[DEVCONF_ADDR_GEN_MODE] = cnf->addr_gen_mode;
-	array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
-	array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
-	array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
-	array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
-	array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
-	array[DEVCONF_IOAM6_ID_WIDE] = cnf->ioam6_id_wide;
-	array[DEVCONF_NDISC_EVICT_NOCARRIER] = cnf->ndisc_evict_nocarrier;
-	array[DEVCONF_ACCEPT_UNTRACKED_NA] = cnf->accept_untracked_na;
-	array[DEVCONF_ACCEPT_RA_MIN_LFT] = cnf->accept_ra_min_lft;
+	array[DEVCONF_ENHANCED_DAD] = READ_ONCE(cnf->enhanced_dad);
+	array[DEVCONF_ADDR_GEN_MODE] = READ_ONCE(cnf->addr_gen_mode);
+	array[DEVCONF_DISABLE_POLICY] = READ_ONCE(cnf->disable_policy);
+	array[DEVCONF_NDISC_TCLASS] = READ_ONCE(cnf->ndisc_tclass);
+	array[DEVCONF_RPL_SEG_ENABLED] = READ_ONCE(cnf->rpl_seg_enabled);
+	array[DEVCONF_IOAM6_ENABLED] = READ_ONCE(cnf->ioam6_enabled);
+	array[DEVCONF_IOAM6_ID] = READ_ONCE(cnf->ioam6_id);
+	array[DEVCONF_IOAM6_ID_WIDE] = READ_ONCE(cnf->ioam6_id_wide);
+	array[DEVCONF_NDISC_EVICT_NOCARRIER] =
+		READ_ONCE(cnf->ndisc_evict_nocarrier);
+	array[DEVCONF_ACCEPT_UNTRACKED_NA] =
+		READ_ONCE(cnf->accept_untracked_na);
+	array[DEVCONF_ACCEPT_RA_MIN_LFT] = READ_ONCE(cnf->accept_ra_min_lft);
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -5779,13 +5790,14 @@ static void snmp6_fill_stats(u64 *stats, struct inet6_dev *idev, int attrtype,
 static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev,
 				  u32 ext_filter_mask)
 {
-	struct nlattr *nla;
 	struct ifla_cacheinfo ci;
+	struct nlattr *nla;
+	u32 ra_mtu;
 
-	if (nla_put_u32(skb, IFLA_INET6_FLAGS, idev->if_flags))
+	if (nla_put_u32(skb, IFLA_INET6_FLAGS, READ_ONCE(idev->if_flags)))
 		goto nla_put_failure;
 	ci.max_reasm_len = IPV6_MAXPLEN;
-	ci.tstamp = cstamp_delta(idev->tstamp);
+	ci.tstamp = cstamp_delta(READ_ONCE(idev->tstamp));
 	ci.reachable_time = jiffies_to_msecs(idev->nd_parms->reachable_time);
 	ci.retrans_time = jiffies_to_msecs(NEIGH_VAR(idev->nd_parms, RETRANS_TIME));
 	if (nla_put(skb, IFLA_INET6_CACHEINFO, sizeof(ci), &ci))
@@ -5817,11 +5829,12 @@ static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev,
 	memcpy(nla_data(nla), idev->token.s6_addr, nla_len(nla));
 	read_unlock_bh(&idev->lock);
 
-	if (nla_put_u8(skb, IFLA_INET6_ADDR_GEN_MODE, idev->cnf.addr_gen_mode))
+	if (nla_put_u8(skb, IFLA_INET6_ADDR_GEN_MODE,
+		       READ_ONCE(idev->cnf.addr_gen_mode)))
 		goto nla_put_failure;
 
-	if (idev->ra_mtu &&
-	    nla_put_u32(skb, IFLA_INET6_RA_MTU, idev->ra_mtu))
+	ra_mtu = READ_ONCE(idev->ra_mtu);
+	if (ra_mtu && nla_put_u32(skb, IFLA_INET6_RA_MTU, ra_mtu))
 		goto nla_put_failure;
 
 	return 0;
@@ -6022,7 +6035,7 @@ static int inet6_set_link_af(struct net_device *dev, const struct nlattr *nla,
 	if (tb[IFLA_INET6_ADDR_GEN_MODE]) {
 		u8 mode = nla_get_u8(tb[IFLA_INET6_ADDR_GEN_MODE]);
 
-		idev->cnf.addr_gen_mode = mode;
+		WRITE_ONCE(idev->cnf.addr_gen_mode, mode);
 	}
 
 	return 0;
@@ -6501,7 +6514,7 @@ static int addrconf_sysctl_addr_gen_mode(struct ctl_table *ctl, int write,
 			}
 
 			if (idev->cnf.addr_gen_mode != new_val) {
-				idev->cnf.addr_gen_mode = new_val;
+				WRITE_ONCE(idev->cnf.addr_gen_mode, new_val);
 				addrconf_init_auto_addrs(idev->dev);
 			}
 		} else if (&net->ipv6.devconf_all->addr_gen_mode == ctl->data) {
@@ -6512,7 +6525,8 @@ static int addrconf_sysctl_addr_gen_mode(struct ctl_table *ctl, int write,
 				idev = __in6_dev_get(dev);
 				if (idev &&
 				    idev->cnf.addr_gen_mode != new_val) {
-					idev->cnf.addr_gen_mode = new_val;
+					WRITE_ONCE(idev->cnf.addr_gen_mode,
+						  new_val);
 					addrconf_init_auto_addrs(idev->dev);
 				}
 			}
@@ -6577,14 +6591,15 @@ static int addrconf_sysctl_stable_secret(struct ctl_table *ctl, int write,
 			struct inet6_dev *idev = __in6_dev_get(dev);
 
 			if (idev) {
-				idev->cnf.addr_gen_mode =
-					IN6_ADDR_GEN_MODE_STABLE_PRIVACY;
+				WRITE_ONCE(idev->cnf.addr_gen_mode,
+					   IN6_ADDR_GEN_MODE_STABLE_PRIVACY);
 			}
 		}
 	} else {
 		struct inet6_dev *idev = ctl->extra1;
 
-		idev->cnf.addr_gen_mode = IN6_ADDR_GEN_MODE_STABLE_PRIVACY;
+		WRITE_ONCE(idev->cnf.addr_gen_mode,
+			   IN6_ADDR_GEN_MODE_STABLE_PRIVACY);
 	}
 
 out:
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 73cb31afe93542285e3f11b7140d2cc1619006e7..8523f0595b01899a9f6cf82809c1b4bcfc233202 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1975,7 +1975,7 @@ int ndisc_ifinfo_sysctl_change(struct ctl_table *ctl, int write, void *buffer,
 		if (ctl->data == &NEIGH_VAR(idev->nd_parms, BASE_REACHABLE_TIME))
 			idev->nd_parms->reachable_time =
 					neigh_rand_reach_time(NEIGH_VAR(idev->nd_parms, BASE_REACHABLE_TIME));
-		idev->tstamp = jiffies;
+		WRITE_ONCE(idev->tstamp, jiffies);
 		inet6_ifinfo_notify(RTM_NEWLINK, idev);
 		in6_dev_put(idev);
 	}
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
  2024-02-22 10:50 ` [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU Eric Dumazet
  2024-02-22 10:50 ` [PATCH v2 net-next 02/14] ipv6: prepare inet6_fill_ifla6_attrs() for RCU Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-22 16:36   ` Jiri Pirko
  2024-02-22 10:50 ` [PATCH v2 net-next 04/14] ipv6: use xarray iterator to implement inet6_dump_ifinfo() Eric Dumazet
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

We want to use RCU protection instead of RTNL
for inet6_fill_ifinfo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h |  6 ++++--
 net/core/dev.c            |  4 ++--
 net/ipv6/addrconf.c       | 11 +++++++----
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f07c8374f29cb936fe11236fc63e06e741b1c965..09023e44db4e2c3a2133afc52ba5a335d6030646 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4354,8 +4354,10 @@ static inline bool netif_testing(const struct net_device *dev)
  */
 static inline bool netif_oper_up(const struct net_device *dev)
 {
-	return (dev->operstate == IF_OPER_UP ||
-		dev->operstate == IF_OPER_UNKNOWN /* backward compat */);
+	unsigned int operstate = READ_ONCE(dev->operstate);
+
+	return	operstate == IF_OPER_UP ||
+		operstate == IF_OPER_UNKNOWN /* backward compat */;
 }
 
 /**
diff --git a/net/core/dev.c b/net/core/dev.c
index 0628d8ff1ed932efdd45ab7b79599dcfcca6c4eb..275fd5259a4a92d0bd2e145d66a716248b6c2804 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8632,12 +8632,12 @@ unsigned int dev_get_flags(const struct net_device *dev)
 {
 	unsigned int flags;
 
-	flags = (dev->flags & ~(IFF_PROMISC |
+	flags = (READ_ONCE(dev->flags) & ~(IFF_PROMISC |
 				IFF_ALLMULTI |
 				IFF_RUNNING |
 				IFF_LOWER_UP |
 				IFF_DORMANT)) |
-		(dev->gflags & (IFF_PROMISC |
+		(READ_ONCE(dev->gflags) & (IFF_PROMISC |
 				IFF_ALLMULTI));
 
 	if (netif_running(dev)) {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3c8bdad0105dc9542489b612890ba86de9c44bdc..df3c6feea74e2d95144140eceb6df5cef2dce1f4 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6047,6 +6047,7 @@ static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev,
 	struct net_device *dev = idev->dev;
 	struct ifinfomsg *hdr;
 	struct nlmsghdr *nlh;
+	int ifindex, iflink;
 	void *protoinfo;
 
 	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*hdr), flags);
@@ -6057,16 +6058,18 @@ static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev,
 	hdr->ifi_family = AF_INET6;
 	hdr->__ifi_pad = 0;
 	hdr->ifi_type = dev->type;
-	hdr->ifi_index = dev->ifindex;
+	ifindex = READ_ONCE(dev->ifindex);
+	hdr->ifi_index = ifindex;
 	hdr->ifi_flags = dev_get_flags(dev);
 	hdr->ifi_change = 0;
 
+	iflink = dev_get_iflink(dev);
 	if (nla_put_string(skb, IFLA_IFNAME, dev->name) ||
 	    (dev->addr_len &&
 	     nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr)) ||
-	    nla_put_u32(skb, IFLA_MTU, dev->mtu) ||
-	    (dev->ifindex != dev_get_iflink(dev) &&
-	     nla_put_u32(skb, IFLA_LINK, dev_get_iflink(dev))) ||
+	    nla_put_u32(skb, IFLA_MTU, READ_ONCE(dev->mtu)) ||
+	    (ifindex != iflink &&
+	     nla_put_u32(skb, IFLA_LINK, iflink)) ||
 	    nla_put_u8(skb, IFLA_OPERSTATE,
 		       netif_running(dev) ? READ_ONCE(dev->operstate) : IF_OPER_DOWN))
 		goto nla_put_failure;
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 04/14] ipv6: use xarray iterator to implement inet6_dump_ifinfo()
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (2 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 14:42   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 05/14] netlink: fix netlink_diag_dump() return value Eric Dumazet
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

Prepare inet6_dump_ifinfo() to run with RCU protection
instead of RTNL, and use the for_each_netdev_dump() interface.

Also properly return 0 at the end of a dump, avoiding
an extra recvmsg() system call and RTNL acquisition.

Note that RTNL-less dumps need core changes, coming later
in the series.
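
The return convention described above can be sketched in userspace.
This is only a rough model under stated assumptions: struct model_skb,
struct model_ctx, model_fill() and the MODEL_* constants are
hypothetical stand-ins, not kernel APIs. The cursor stays on the entry
that did not fit, so the next call retries it, and a finished dump
returns 0 even when the buffer holds data.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; none of these names
 * come from the patch.
 */
struct model_skb {
	size_t len;       /* bytes already queued in this message */
	size_t capacity;  /* room available in this message */
};

struct model_ctx {
	unsigned long ifindex; /* cursor: first index not yet dumped */
};

#define MODEL_NDEV      5   /* pretend there are five devices */
#define MODEL_ITEM_SIZE 40  /* pretend every record is 40 bytes */
#define MODEL_EMSGSIZE  90

/* Append one fixed-size record, or fail if it does not fit. */
static int model_fill(struct model_skb *skb)
{
	if (skb->len + MODEL_ITEM_SIZE > skb->capacity)
		return -MODEL_EMSGSIZE;
	skb->len += MODEL_ITEM_SIZE;
	return 0;
}

/* Returns 0 when the dump is complete, skb->len when the buffer filled
 * up after some progress, or a negative error when nothing fit at all.
 */
static int model_dump(struct model_skb *skb, struct model_ctx *ctx)
{
	int err = 0;

	for (; ctx->ifindex < MODEL_NDEV; ctx->ifindex++) {
		err = model_fill(skb);
		if (err < 0) {
			if (skb->len)
				err = (int)skb->len;
			break; /* cursor still points at the failed entry */
		}
	}
	return err;
}
```

With a 100 byte buffer, this model needs three calls for five records;
the third call returns 0 while still carrying the last record, which is
the point where the core could append NLMSG_DONE.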

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/ipv6/addrconf.c | 46 +++++++++++++++++++--------------------------
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index df3c6feea74e2d95144140eceb6df5cef2dce1f4..8994ddc6c859e6bc68303e6e61663baf330aee00 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6117,50 +6117,42 @@ static int inet6_valid_dump_ifinfo(const struct nlmsghdr *nlh,
 static int inet6_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct net *net = sock_net(skb->sk);
-	int h, s_h;
-	int idx = 0, s_idx;
+	struct {
+		unsigned long ifindex;
+	} *ctx = (void *)cb->ctx;
 	struct net_device *dev;
 	struct inet6_dev *idev;
-	struct hlist_head *head;
+	int err;
 
 	/* only requests using strict checking can pass data to
 	 * influence the dump
 	 */
 	if (cb->strict_check) {
-		int err = inet6_valid_dump_ifinfo(cb->nlh, cb->extack);
+		err = inet6_valid_dump_ifinfo(cb->nlh, cb->extack);
 
 		if (err < 0)
 			return err;
 	}
 
-	s_h = cb->args[0];
-	s_idx = cb->args[1];
-
+	err = 0;
 	rcu_read_lock();
-	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
-		idx = 0;
-		head = &net->dev_index_head[h];
-		hlist_for_each_entry_rcu(dev, head, index_hlist) {
-			if (idx < s_idx)
-				goto cont;
-			idev = __in6_dev_get(dev);
-			if (!idev)
-				goto cont;
-			if (inet6_fill_ifinfo(skb, idev,
-					      NETLINK_CB(cb->skb).portid,
-					      cb->nlh->nlmsg_seq,
-					      RTM_NEWLINK, NLM_F_MULTI) < 0)
-				goto out;
-cont:
-			idx++;
+	for_each_netdev_dump(net, dev, ctx->ifindex) {
+		idev = __in6_dev_get(dev);
+		if (!idev)
+			continue;
+		err = inet6_fill_ifinfo(skb, idev,
+					NETLINK_CB(cb->skb).portid,
+					cb->nlh->nlmsg_seq,
+					RTM_NEWLINK, NLM_F_MULTI);
+		if (err < 0) {
+			if (likely(skb->len))
+				err = skb->len;
+			break;
 		}
 	}
-out:
 	rcu_read_unlock();
-	cb->args[1] = idx;
-	cb->args[0] = h;
 
-	return skb->len;
+	return err;
 }
 
 void inet6_ifinfo_notify(int event, struct inet6_dev *idev)
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 05/14] netlink: fix netlink_diag_dump() return value
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (3 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 04/14] ipv6: use xarray iterator to implement inet6_dump_ifinfo() Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 12:30   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start() Eric Dumazet
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

__netlink_diag_dump() returns 1 if the dump is not complete,
and zero if it completed without error.

If the err variable is zero, the dump is complete:
we should return 0 in this case, not skb->len.

This allows NLMSG_DONE to be appended to the skb.
User space does not have to call us again only to get NLMSG_DONE.
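
As an illustration only, the old and new return expressions can be
compared side by side. diag_ret_old() and diag_ret_new() are
hypothetical helper names; 'err' models the value reported by
__netlink_diag_dump() and 'len' models skb->len.

```c
/* 'err': negative error, 0 = dump complete, 1 = more to dump.
 * 'len': stand-in for skb->len.
 * Both helpers are hypothetical and exist only for this comparison.
 */
static int diag_ret_old(int err, int len)
{
	/* Old expression: a completed dump (err == 0) still returns len,
	 * so the caller believes more data may follow.
	 */
	return err < 0 ? err : len;
}

static int diag_ret_new(int err, int len)
{
	/* New expression: a completed dump returns 0. */
	return err <= 0 ? err : len;
}
```

Only the err == 0 case differs between the two helpers; errors and
partial dumps are reported the same way as before.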

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/netlink/diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index e12c90d5f6ad29446ea1990c88c19bcb0ee856c3..61981e01fd6ff189dcb46a06a4d265cf6029b840 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -207,7 +207,7 @@ static int netlink_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 		err = __netlink_diag_dump(skb, cb, req->sdiag_protocol, s_num);
 	}
 
-	return err < 0 ? err : skb->len;
+	return err <= 0 ? err : skb->len;
 }
 
 static int netlink_diag_dump_done(struct netlink_callback *cb)
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (4 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 05/14] netlink: fix netlink_diag_dump() return value Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-22 16:20   ` Jiri Pirko
  2024-02-22 10:50 ` [PATCH v2 net-next 07/14] rtnetlink: change nlk->cb_mutex role Eric Dumazet
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

__netlink_dump_start() releases nlk->cb_mutex right before
calling netlink_dump() which grabs it again.

This seems dangerous, even if KASAN has not complained yet.

Add a @lock_taken parameter to netlink_dump() to let it
grab the mutex if called from netlink_recvmsg() only.
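
The @lock_taken idea can be modelled in userspace with a pthread mutex.
This is a sketch under assumptions: struct lt_sock and the lt_*()
functions are hypothetical names, and the real netlink_dump() does far
more work. The start path keeps the mutex held across the first dump,
so there is no window between setting cb_running and dumping.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a netlink socket; not the kernel structure. */
struct lt_sock {
	pthread_mutex_t cb_mutex;
	bool cb_running;
	int dumps;          /* number of dump passes performed */
};

static void lt_sock_init(struct lt_sock *sk)
{
	pthread_mutex_init(&sk->cb_mutex, NULL);
	sk->cb_running = false;
	sk->dumps = 0;
}

/* Takes the mutex only if the caller does not hold it already.
 * Always releases it before returning, as the modelled function does.
 */
static int lt_dump(struct lt_sock *sk, bool lock_taken)
{
	int err = 0;

	if (!lock_taken)
		pthread_mutex_lock(&sk->cb_mutex);
	if (!sk->cb_running)
		err = -EINVAL;
	else
		sk->dumps++;
	pthread_mutex_unlock(&sk->cb_mutex);
	return err;
}

/* Start path: the mutex stays held from the cb_running check until
 * lt_dump() releases it.
 */
static int lt_dump_start(struct lt_sock *sk)
{
	pthread_mutex_lock(&sk->cb_mutex);
	if (sk->cb_running) {
		pthread_mutex_unlock(&sk->cb_mutex);
		return -EBUSY;
	}
	sk->cb_running = true;
	return lt_dump(sk, true);
}

/* Receive path: the mutex is not held on entry. */
static int lt_recvmsg(struct lt_sock *sk)
{
	return lt_dump(sk, false);
}
```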

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/netlink/af_netlink.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 9c962347cf859f16fc76e4d8a2fd22cdb3d142d6..94f3860526bfaa5793e8b3917250ec0e751687b5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -130,7 +130,7 @@ static const char *const nlk_cb_mutex_key_strings[MAX_LINKS + 1] = {
 	"nlk_cb_mutex-MAX_LINKS"
 };
 
-static int netlink_dump(struct sock *sk);
+static int netlink_dump(struct sock *sk, bool lock_taken);
 
 /* nl_table locking explained:
  * Lookup and traversal are protected with an RCU read-side lock. Insertion
@@ -1987,7 +1987,7 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	if (READ_ONCE(nlk->cb_running) &&
 	    atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
-		ret = netlink_dump(sk);
+		ret = netlink_dump(sk, false);
 		if (ret) {
 			WRITE_ONCE(sk->sk_err, -ret);
 			sk_error_report(sk);
@@ -2196,7 +2196,7 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
 	return 0;
 }
 
-static int netlink_dump(struct sock *sk)
+static int netlink_dump(struct sock *sk, bool lock_taken)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
 	struct netlink_ext_ack extack = {};
@@ -2208,7 +2208,8 @@ static int netlink_dump(struct sock *sk)
 	int alloc_min_size;
 	int alloc_size;
 
-	mutex_lock(nlk->cb_mutex);
+	if (!lock_taken)
+		mutex_lock(nlk->cb_mutex);
 	if (!nlk->cb_running) {
 		err = -EINVAL;
 		goto errout_skb;
@@ -2365,9 +2366,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	WRITE_ONCE(nlk->cb_running, true);
 	nlk->dump_done_errno = INT_MAX;
 
-	mutex_unlock(nlk->cb_mutex);
-
-	ret = netlink_dump(sk);
+	ret = netlink_dump(sk, true);
 
 	sock_put(sk);
 
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 07/14] rtnetlink: change nlk->cb_mutex role
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (5 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start() Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-22 10:50 ` [PATCH v2 net-next 08/14] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag Eric Dumazet
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it"), Patrick McHardy used
a common mutex to protect both nlk->cb and the dump() operations.

The override is used for rtnl dumps, registered with
rtnl_register() and rtnl_register_module().

We want some dump() operations to be able to opt out of
acquiring RTNL, so we need to protect nlk->cb
with a per-socket mutex.

This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex.

The optional pointer to the mutex used to protect the dump()
call is stored in nlk->dump_cb_mutex.
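
The split between the two mutexes can be sketched in userspace as
follows. All names here (tm_rtnl, struct tm_sock, tm_netlink_dump(),
tm_probe()) are hypothetical stand-ins. The per-socket mutex is always
taken, and the optional one only wraps the dump() callback.

```c
#include <pthread.h>
#include <stddef.h>

/* Stand-in for a global lock such as RTNL (assumption of this model). */
static pthread_mutex_t tm_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct tm_sock {
	pthread_mutex_t nl_cb_mutex;      /* always protects callback state */
	pthread_mutex_t *dump_cb_mutex;   /* optional, may be NULL */
	int (*dump)(struct tm_sock *sk);
};

static void tm_sock_init(struct tm_sock *sk, pthread_mutex_t *dump_cb_mutex,
			 int (*dump)(struct tm_sock *sk))
{
	pthread_mutex_init(&sk->nl_cb_mutex, NULL);
	sk->dump_cb_mutex = dump_cb_mutex;
	sk->dump = dump;
}

static int tm_netlink_dump(struct tm_sock *sk)
{
	pthread_mutex_t *extra_mutex = sk->dump_cb_mutex;
	int ret;

	pthread_mutex_lock(&sk->nl_cb_mutex);
	if (extra_mutex)
		pthread_mutex_lock(extra_mutex);
	ret = sk->dump(sk);
	if (extra_mutex)
		pthread_mutex_unlock(extra_mutex);
	pthread_mutex_unlock(&sk->nl_cb_mutex);
	return ret;
}

/* Sample callback: returns 1 if the stand-in global lock is currently
 * held, 0 otherwise. It relies on trylock failing on a held,
 * non-recursive mutex.
 */
static int tm_probe(struct tm_sock *sk)
{
	(void)sk;
	if (pthread_mutex_trylock(&tm_rtnl) == 0) {
		pthread_mutex_unlock(&tm_rtnl);
		return 0;
	}
	return 1;
}
```

A socket created with a NULL dump_cb_mutex runs its callback under the
per-socket mutex only, while one created with &tm_rtnl also serializes
against the global lock.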

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/netlink/af_netlink.c | 32 ++++++++++++++++++--------------
 net/netlink/af_netlink.h |  5 +++--
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 94f3860526bfaa5793e8b3917250ec0e751687b5..84cad7be6d4335bfb5301ef49f84af8e7b3bc842 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -636,7 +636,7 @@ static struct proto netlink_proto = {
 };
 
 static int __netlink_create(struct net *net, struct socket *sock,
-			    struct mutex *cb_mutex, int protocol,
+			    struct mutex *dump_cb_mutex, int protocol,
 			    int kern)
 {
 	struct sock *sk;
@@ -651,15 +651,11 @@ static int __netlink_create(struct net *net, struct socket *sock,
 	sock_init_data(sock, sk);
 
 	nlk = nlk_sk(sk);
-	if (cb_mutex) {
-		nlk->cb_mutex = cb_mutex;
-	} else {
-		nlk->cb_mutex = &nlk->cb_def_mutex;
-		mutex_init(nlk->cb_mutex);
-		lockdep_set_class_and_name(nlk->cb_mutex,
+	mutex_init(&nlk->nl_cb_mutex);
+	lockdep_set_class_and_name(&nlk->nl_cb_mutex,
 					   nlk_cb_mutex_keys + protocol,
 					   nlk_cb_mutex_key_strings[protocol]);
-	}
+	nlk->dump_cb_mutex = dump_cb_mutex;
 	init_waitqueue_head(&nlk->wait);
 
 	sk->sk_destruct = netlink_sock_destruct;
@@ -2209,7 +2205,7 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	int alloc_size;
 
 	if (!lock_taken)
-		mutex_lock(nlk->cb_mutex);
+		mutex_lock(&nlk->nl_cb_mutex);
 	if (!nlk->cb_running) {
 		err = -EINVAL;
 		goto errout_skb;
@@ -2261,14 +2257,22 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	netlink_skb_set_owner_r(skb, sk);
 
 	if (nlk->dump_done_errno > 0) {
+		struct mutex *extra_mutex = nlk->dump_cb_mutex;
+
 		cb->extack = &extack;
+
+		if (extra_mutex)
+			mutex_lock(extra_mutex);
 		nlk->dump_done_errno = cb->dump(skb, cb);
+		if (extra_mutex)
+			mutex_unlock(extra_mutex);
+
 		cb->extack = NULL;
 	}
 
 	if (nlk->dump_done_errno > 0 ||
 	    skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
-		mutex_unlock(nlk->cb_mutex);
+		mutex_unlock(&nlk->nl_cb_mutex);
 
 		if (sk_filter(sk, skb))
 			kfree_skb(skb);
@@ -2302,13 +2306,13 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	WRITE_ONCE(nlk->cb_running, false);
 	module = cb->module;
 	skb = cb->skb;
-	mutex_unlock(nlk->cb_mutex);
+	mutex_unlock(&nlk->nl_cb_mutex);
 	module_put(module);
 	consume_skb(skb);
 	return 0;
 
 errout_skb:
-	mutex_unlock(nlk->cb_mutex);
+	mutex_unlock(&nlk->nl_cb_mutex);
 	kfree_skb(skb);
 	return err;
 }
@@ -2331,7 +2335,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	nlk = nlk_sk(sk);
-	mutex_lock(nlk->cb_mutex);
+	mutex_lock(&nlk->nl_cb_mutex);
 	/* A dump is in progress... */
 	if (nlk->cb_running) {
 		ret = -EBUSY;
@@ -2382,7 +2386,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	module_put(control->module);
 error_unlock:
 	sock_put(sk);
-	mutex_unlock(nlk->cb_mutex);
+	mutex_unlock(&nlk->nl_cb_mutex);
 error_free:
 	kfree_skb(skb);
 	return ret;
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 2145979b9986a0331b34b6ba2fda867f23d0d71c..9751e29d4bbb9ad9cb7900e2cfaedbe7ab138cf4 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -39,8 +39,9 @@ struct netlink_sock {
 	bool			cb_running;
 	int			dump_done_errno;
 	struct netlink_callback	cb;
-	struct mutex		*cb_mutex;
-	struct mutex		cb_def_mutex;
+	struct mutex		nl_cb_mutex;
+
+	struct mutex		*dump_cb_mutex;
 	void			(*netlink_rcv)(struct sk_buff *skb);
 	int			(*netlink_bind)(struct net *net, int group);
 	void			(*netlink_unbind)(struct net *net, int group);
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 08/14] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (6 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 07/14] rtnetlink: change nlk->cb_mutex role Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 15:19   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 09/14] ipv6: switch inet6_dump_ifinfo() to RCU protection Eric Dumazet
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag
allows dump operations registered via rtnl_register()
or rtnl_register_module() to opt out of RTNL protection.
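
A minimal sketch of the opt-out decision, assuming a flags word that is
carried from registration down to the dump path. struct fl_mutex,
the FL_* names and fl_pick_dump_mutex() are hypothetical and only
mirror the shape of the check.

```c
#include <stddef.h>

#define FL_BIT(n) (1U << (n))

/* Hypothetical mirror of the registration flags. */
enum fl_link_flags {
	FL_DOIT_UNLOCKED      = FL_BIT(0),
	FL_BULK_DEL_SUPPORTED = FL_BIT(1),
	FL_DUMP_UNLOCKED      = FL_BIT(2),
};

/* Placeholder mutex type; only its address matters in this model. */
struct fl_mutex {
	int unused;
};

/* Returns the mutex to hold around the dump callback, or NULL when the
 * operation opted out with FL_DUMP_UNLOCKED.
 */
static struct fl_mutex *fl_pick_dump_mutex(struct fl_mutex *dump_cb_mutex,
					   unsigned int flags)
{
	if (flags & FL_DUMP_UNLOCKED)
		return NULL;
	return dump_cb_mutex;
}
```

Operations registered without the flag keep the existing behaviour,
which is what allows the conversion to proceed one handler at a time.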

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netlink.h  | 2 ++
 include/net/rtnetlink.h  | 1 +
 net/core/rtnetlink.c     | 2 ++
 net/netlink/af_netlink.c | 3 +++
 4 files changed, 8 insertions(+)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 1a4445bf2ab9acff630b3712453c8a6cdf8fc47c..5df7340d4dabc0c0b1728dafde43b5522dacd024 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -291,6 +291,7 @@ struct netlink_callback {
 	u16			answer_flags;
 	u32			min_dump_alloc;
 	unsigned int		prev_seq, seq;
+	int			flags;
 	bool			strict_check;
 	union {
 		u8		ctx[48];
@@ -323,6 +324,7 @@ struct netlink_dump_control {
 	void *data;
 	struct module *module;
 	u32 min_dump_alloc;
+	int flags;
 };
 
 int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 6506221c5fe31f49ccaca470e0b24dffb703c28e..3bfb80bad1739d244a3906fa7f0e1a606dfaf868 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -12,6 +12,7 @@ typedef int (*rtnl_dumpit_func)(struct sk_buff *, struct netlink_callback *);
 enum rtnl_link_flags {
 	RTNL_FLAG_DOIT_UNLOCKED		= BIT(0),
 	RTNL_FLAG_BULK_DEL_SUPPORTED	= BIT(1),
+	RTNL_FLAG_DUMP_UNLOCKED		= BIT(2),
 };
 
 enum rtnl_kinds {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 060543fe7919c13c7a5c6cf22f9e7606d0897345..1b26dfa5668d22fb2e30ceefbf143e98df13ae29 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -6532,6 +6532,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 		}
 		owner = link->owner;
 		dumpit = link->dumpit;
+		flags = link->flags;
 
 		if (type == RTM_GETLINK - RTM_BASE)
 			min_dump_alloc = rtnl_calcit(skb, nlh);
@@ -6549,6 +6550,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 				.dump		= dumpit,
 				.min_dump_alloc	= min_dump_alloc,
 				.module		= owner,
+				.flags		= flags,
 			};
 			err = netlink_dump_start(rtnl, skb, nlh, &c);
 			/* netlink_dump_start() will keep a reference on
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 84cad7be6d4335bfb5301ef49f84af8e7b3bc842..be5792b638aa563232cdb96de8c97c4fe45b3718 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2261,6 +2261,8 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 
 		cb->extack = &extack;
 
+		if (cb->flags & RTNL_FLAG_DUMP_UNLOCKED)
+			extra_mutex = NULL;
 		if (extra_mutex)
 			mutex_lock(extra_mutex);
 		nlk->dump_done_errno = cb->dump(skb, cb);
@@ -2355,6 +2357,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	cb->data = control->data;
 	cb->module = control->module;
 	cb->min_dump_alloc = control->min_dump_alloc;
+	cb->flags = control->flags;
 	cb->skb = skb;
 
 	cb->strict_check = nlk_test_bit(STRICT_CHK, NETLINK_CB(skb).sk);
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 09/14] ipv6: switch inet6_dump_ifinfo() to RCU protection
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (7 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 08/14] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 15:19   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 10/14] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU Eric Dumazet
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

No longer hold RTNL while calling inet6_dump_ifinfo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/addrconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8994ddc6c859e6bc68303e6e61663baf330aee00..244b670a44b92f10b8f18c444d72a2467f8ed90a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -7447,7 +7447,7 @@ int __init addrconf_init(void)
 	rtnl_af_register(&inet6_ops);
 
 	err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETLINK,
-				   NULL, inet6_dump_ifinfo, 0);
+				   NULL, inet6_dump_ifinfo, RTNL_FLAG_DUMP_UNLOCKED);
 	if (err < 0)
 		goto errout;
 
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 10/14] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (8 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 09/14] ipv6: switch inet6_dump_ifinfo() to RCU protection Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 15:22   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 11/14] nexthop: allow nexthop_mpath_fill_node() to be called without RTNL Eric Dumazet
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

Add a new field to struct fib_dump_filter, to let callers
indicate whether they hold RTNL or rely on RCU.

This is used in the following patch, when inet_dump_fib()
no longer holds RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ip_fib.h    |  1 +
 net/ipv4/fib_frontend.c | 15 +++++++++++----
 net/ipv4/ipmr.c         |  4 +++-
 net/ipv6/ip6_fib.c      |  7 +++++--
 net/ipv6/ip6mr.c        |  4 +++-
 net/mpls/af_mpls.c      |  4 +++-
 6 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index d4667b7797e3e4591f3ff1fe641f168295e0a894..9b2f69ba5e4981fb108581c229ff008d04750ade 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -264,6 +264,7 @@ struct fib_dump_filter {
 	bool			filter_set;
 	bool			dump_routes;
 	bool			dump_exceptions;
+	bool			rtnl_held;
 	unsigned char		protocol;
 	unsigned char		rt_type;
 	unsigned int		flags;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 390f4be7f7bec20f33aa80e9bf12d5e2f3760562..39f67990e01c19b73a622dced0220a1bba21d5e6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -916,7 +916,8 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
 	struct rtmsg *rtm;
 	int err, i;
 
-	ASSERT_RTNL();
+	if (filter->rtnl_held)
+		ASSERT_RTNL();
 
 	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*rtm))) {
 		NL_SET_ERR_MSG(extack, "Invalid header for FIB dump request");
@@ -961,7 +962,10 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
 			break;
 		case RTA_OIF:
 			ifindex = nla_get_u32(tb[i]);
-			filter->dev = __dev_get_by_index(net, ifindex);
+			if (filter->rtnl_held)
+				filter->dev = __dev_get_by_index(net, ifindex);
+			else
+				filter->dev = dev_get_by_index_rcu(net, ifindex);
 			if (!filter->dev)
 				return -ENODEV;
 			break;
@@ -983,8 +987,11 @@ EXPORT_SYMBOL_GPL(ip_valid_fib_dump_req);
 
 static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	struct fib_dump_filter filter = { .dump_routes = true,
-					  .dump_exceptions = true };
+	struct fib_dump_filter filter = {
+		.dump_routes = true,
+		.dump_exceptions = true,
+		.rtnl_held = true,
+	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
 	unsigned int h, s_h;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5561bce3a37e8f72d08ff8062d6b8cde08bbed44..0708ac6f6c582681ab1f2b52c5ce1f2a4acd10de 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2587,7 +2587,9 @@ static int ipmr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 
 static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	struct fib_dump_filter filter = {};
+	struct fib_dump_filter filter = {
+		.rtnl_held = true,
+	};
 	int err;
 
 	if (cb->strict_check) {
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 6540d877d3693e788d000309950f3735554c937d..5c558dc1c6838681c2848412dced72a41fe764be 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -620,8 +620,11 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
 
 static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	struct rt6_rtnl_dump_arg arg = { .filter.dump_exceptions = true,
-					 .filter.dump_routes = true };
+	struct rt6_rtnl_dump_arg arg = {
+		.filter.dump_exceptions = true,
+		.filter.dump_routes = true,
+		.filter.rtnl_held = true,
+	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
 	unsigned int h, s_h;
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 1f19743f254064852139809143b60c1d397fe1d8..cb0ee81a068a4c895d5d8b21f3fc557bf1784dfb 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2592,7 +2592,9 @@ static int ip6mr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	const struct nlmsghdr *nlh = cb->nlh;
-	struct fib_dump_filter filter = {};
+	struct fib_dump_filter filter = {
+		.rtnl_held = true,
+	};
 	int err;
 
 	if (cb->strict_check) {
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1af29af65388584e9666f4fcb73a16e8ff159587..6dab883a08dda46ff6ddc1e6e407e6f48a10c8aa 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -2179,7 +2179,9 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
 	struct mpls_route __rcu **platform_label;
-	struct fib_dump_filter filter = {};
+	struct fib_dump_filter filter = {
+		.rtnl_held = true,
+	};
 	unsigned int flags = NLM_F_MULTI;
 	size_t platform_labels;
 	unsigned int index;
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 11/14] nexthop: allow nexthop_mpath_fill_node() to be called without RTNL
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (9 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 10/14] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 15:21   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 12/14] inet: switch inet_dump_fib() to RCU protection Eric Dumazet
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

nexthop_mpath_fill_node() can potentially be called
from contexts holding rcu_read_lock() instead of RTNL.

Suggested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/all/ZdZDWVdjMaQkXBgW@shredder/
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/nexthop.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 6647ad509faa02a9a13d58f3405c4a540abc5077..77e99cba60ade85d25329074905b33424c11e7f5 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -317,7 +317,7 @@ static inline
 int nexthop_mpath_fill_node(struct sk_buff *skb, struct nexthop *nh,
 			    u8 rt_family)
 {
-	struct nh_group *nhg = rtnl_dereference(nh->nh_grp);
+	struct nh_group *nhg = rcu_dereference_rtnl(nh->nh_grp);
 	int i;
 
 	for (i = 0; i < nhg->num_nh; i++) {
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 12/14] inet: switch inet_dump_fib() to RCU protection
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (10 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 11/14] nexthop: allow nexthop_mpath_fill_node() to be called without RTNL Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 15:25   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 13/14] rtnetlink: make rtnl_fill_link_ifmap() RCU ready Eric Dumazet
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

No longer hold RTNL while calling inet_dump_fib().

Also change return value for a completed dump:

Returning 0 instead of skb->len allows NLMSG_DONE
to be appended to the skb. User space does not have
to call us again to get a standalone NLMSG_DONE marker.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/fib_frontend.c | 37 ++++++++++++++++++-------------------
 net/ipv4/fib_trie.c     |  4 ++--
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 39f67990e01c19b73a622dced0220a1bba21d5e6..bf3a2214fe29b6f9b494581b293259e6c5ce6f8c 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -990,7 +990,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	struct fib_dump_filter filter = {
 		.dump_routes = true,
 		.dump_exceptions = true,
-		.rtnl_held = true,
+		.rtnl_held = false,
 	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
@@ -998,12 +998,13 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	unsigned int e = 0, s_e;
 	struct fib_table *tb;
 	struct hlist_head *head;
-	int dumped = 0, err;
+	int dumped = 0, err = 0;
 
+	rcu_read_lock();
 	if (cb->strict_check) {
 		err = ip_valid_fib_dump_req(net, nlh, &filter, cb);
 		if (err < 0)
-			return err;
+			goto unlock;
 	} else if (nlmsg_len(nlh) >= sizeof(struct rtmsg)) {
 		struct rtmsg *rtm = nlmsg_data(nlh);
 
@@ -1012,29 +1013,28 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 
 	/* ipv4 does not use prefix flag */
 	if (filter.flags & RTM_F_PREFIX)
-		return skb->len;
+		goto unlock;
 
 	if (filter.table_id) {
 		tb = fib_get_table(net, filter.table_id);
 		if (!tb) {
 			if (rtnl_msg_family(cb->nlh) != PF_INET)
-				return skb->len;
+				goto unlock;
 
 			NL_SET_ERR_MSG(cb->extack, "ipv4: FIB table does not exist");
-			return -ENOENT;
+			err = -ENOENT;
+			goto unlock;
 		}
-
-		rcu_read_lock();
 		err = fib_table_dump(tb, skb, cb, &filter);
-		rcu_read_unlock();
-		return skb->len ? : err;
+		if (err < 0 && skb->len)
+			err = skb->len;
+		goto unlock;
 	}
 
 	s_h = cb->args[0];
 	s_e = cb->args[1];
 
-	rcu_read_lock();
-
+	err = 0;
 	for (h = s_h; h < FIB_TABLE_HASHSZ; h++, s_e = 0) {
 		e = 0;
 		head = &net->ipv4.fib_table_hash[h];
@@ -1047,9 +1047,8 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 			err = fib_table_dump(tb, skb, cb, &filter);
 			if (err < 0) {
 				if (likely(skb->len))
-					goto out;
-
-				goto out_err;
+					err = skb->len;
+				goto out;
 			}
 			dumped = 1;
 next:
@@ -1057,13 +1056,12 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 		}
 	}
 out:
-	err = skb->len;
-out_err:
-	rcu_read_unlock();
 
 	cb->args[1] = e;
 	cb->args[0] = h;
 
+unlock:
+	rcu_read_unlock();
 	return err;
 }
 
@@ -1666,5 +1664,6 @@ void __init ip_fib_init(void)
 
 	rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL, 0);
 	rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL, 0);
-	rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib, 0);
+	rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib,
+		      RTNL_FLAG_DUMP_UNLOCKED);
 }
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 0fc7ab5832d1ae00e33fdf6fad4ef379c7d0bd4d..f474106464d2f2a52fa6b7ecaf2146977d05eecc 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2368,7 +2368,7 @@ int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
 	 * and key == 0 means the dump has wrapped around and we are done.
 	 */
 	if (count && !key)
-		return skb->len;
+		return 0;
 
 	while ((l = leaf_walk_rcu(&tp, key)) != NULL) {
 		int err;
@@ -2394,7 +2394,7 @@ int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
 	cb->args[3] = key;
 	cb->args[2] = count;
 
-	return skb->len;
+	return 0;
 }
 
 void __init fib_trie_init(void)
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 13/14] rtnetlink: make rtnl_fill_link_ifmap() RCU ready
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (11 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 12/14] inet: switch inet_dump_fib() to RCU protection Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 13:03   ` Donald Hunter
  2024-02-22 10:50 ` [PATCH v2 net-next 14/14] rtnetlink: provide RCU protection to rtnl_fill_prop_list() Eric Dumazet
  2024-02-26 11:50 ` [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps patchwork-bot+netdevbpf
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

Use READ_ONCE() to read the following device fields:

	dev->mem_start
	dev->mem_end
	dev->base_addr
	dev->irq
	dev->dma
	dev->if_port

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1b26dfa5668d22fb2e30ceefbf143e98df13ae29..2d83ab76a3c95c3200016a404e740bb058f23ada 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1455,17 +1455,18 @@ static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
 	return 0;
 }
 
-static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
+static int rtnl_fill_link_ifmap(struct sk_buff *skb,
+				const struct net_device *dev)
 {
 	struct rtnl_link_ifmap map;
 
 	memset(&map, 0, sizeof(map));
-	map.mem_start   = dev->mem_start;
-	map.mem_end     = dev->mem_end;
-	map.base_addr   = dev->base_addr;
-	map.irq         = dev->irq;
-	map.dma         = dev->dma;
-	map.port        = dev->if_port;
+	map.mem_start = READ_ONCE(dev->mem_start);
+	map.mem_end   = READ_ONCE(dev->mem_end);
+	map.base_addr = READ_ONCE(dev->base_addr);
+	map.irq       = READ_ONCE(dev->irq);
+	map.dma       = READ_ONCE(dev->dma);
+	map.port      = READ_ONCE(dev->if_port);
 
 	if (nla_put_64bit(skb, IFLA_MAP, sizeof(map), &map, IFLA_PAD))
 		return -EMSGSIZE;
@@ -1875,9 +1876,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 			goto nla_put_failure;
 	}
 
-	if (rtnl_fill_link_ifmap(skb, dev))
-		goto nla_put_failure;
-
 	if (dev->addr_len) {
 		if (nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr) ||
 		    nla_put(skb, IFLA_BROADCAST, dev->addr_len, dev->broadcast))
@@ -1927,6 +1925,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	rcu_read_lock();
 	if (rtnl_fill_link_af(skb, dev, ext_filter_mask))
 		goto nla_put_failure_rcu;
+	if (rtnl_fill_link_ifmap(skb, dev))
+		goto nla_put_failure_rcu;
+
 	rcu_read_unlock();
 
 	if (rtnl_fill_prop_list(skb, dev))
-- 
2.44.0.rc1.240.g4c46232300-goog



* [PATCH v2 net-next 14/14] rtnetlink: provide RCU protection to rtnl_fill_prop_list()
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (12 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 13/14] rtnetlink: make rtnl_fill_link_ifmap() RCU ready Eric Dumazet
@ 2024-02-22 10:50 ` Eric Dumazet
  2024-02-23 13:03   ` Donald Hunter
  2024-02-26 11:50 ` [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps patchwork-bot+netdevbpf
  14 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 10:50 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Ido Schimmel, Jiri Pirko, eric.dumazet, Eric Dumazet

We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.

dev->name_node items are already rcu protected.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2d83ab76a3c95c3200016a404e740bb058f23ada..39f17d0b6ceaa9fcf29905ab0a97645a4e831990 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1699,7 +1699,7 @@ static int rtnl_fill_alt_ifnames(struct sk_buff *skb,
 	struct netdev_name_node *name_node;
 	int count = 0;
 
-	list_for_each_entry(name_node, &dev->name_node->list, list) {
+	list_for_each_entry_rcu(name_node, &dev->name_node->list, list) {
 		if (nla_put_string(skb, IFLA_ALT_IFNAME, name_node->name))
 			return -EMSGSIZE;
 		count++;
@@ -1707,6 +1707,7 @@ static int rtnl_fill_alt_ifnames(struct sk_buff *skb,
 	return count;
 }
 
+/* RCU protected. */
 static int rtnl_fill_prop_list(struct sk_buff *skb,
 			       const struct net_device *dev)
 {
@@ -1927,11 +1928,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 		goto nla_put_failure_rcu;
 	if (rtnl_fill_link_ifmap(skb, dev))
 		goto nla_put_failure_rcu;
-
-	rcu_read_unlock();
-
 	if (rtnl_fill_prop_list(skb, dev))
-		goto nla_put_failure;
+		goto nla_put_failure_rcu;
+	rcu_read_unlock();
 
 	if (dev->dev.parent &&
 	    nla_put_string(skb, IFLA_PARENT_DEV_NAME,
-- 
2.44.0.rc1.240.g4c46232300-goog



* Re: [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
  2024-02-22 10:50 ` [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start() Eric Dumazet
@ 2024-02-22 16:20   ` Jiri Pirko
  2024-06-09  8:17     ` Tetsuo Handa
  0 siblings, 1 reply; 41+ messages in thread
From: Jiri Pirko @ 2024-02-22 16:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Thu, Feb 22, 2024 at 11:50:13AM CET, edumazet@google.com wrote:
>__netlink_dump_start() releases nlk->cb_mutex right before
>calling netlink_dump() which grabs it again.

Yeah, I spotted this recently as well. Good to get rid of it.


>
>This seems dangerous, even if KASAN did not bother yet.
>
>Add a @lock_taken parameter to netlink_dump() to let it
>grab the mutex if called from netlink_recvmsg() only.
>
>Signed-off-by: Eric Dumazet <edumazet@google.com>


Reviewed-by: Jiri Pirko <jiri@nvidia.com>


* Re: [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection
  2024-02-22 10:50 ` [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection Eric Dumazet
@ 2024-02-22 16:36   ` Jiri Pirko
  2024-02-22 16:43     ` Eric Dumazet
  2024-02-22 16:45     ` Eric Dumazet
  0 siblings, 2 replies; 41+ messages in thread
From: Jiri Pirko @ 2024-02-22 16:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Thu, Feb 22, 2024 at 11:50:10AM CET, edumazet@google.com wrote:
>We want to use RCU protection instead of RTNL

Is this a royal "We"? :)


>for inet6_fill_ifinfo().

This is a motivation for this patch, not what the patch does.

Would it be possible to maintain some sort of culture for the patch
descriptions, even of the patches which are small and simple?

https://www.kernel.org/doc/html/v6.6/process/submitting-patches.html#describe-your-changes

Your patch descriptions are usually hard for me to follow and understand
what the patch does :( Yes, I know you do it "to displease me" as you
wrote a couple of months ago but maybe think about the others too, also
the ones looking in a git log/show and guessing.

Don't beat me.


>
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>---
> include/linux/netdevice.h |  6 ++++--
> net/core/dev.c            |  4 ++--
> net/ipv6/addrconf.c       | 11 +++++++----
> 3 files changed, 13 insertions(+), 8 deletions(-)
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index f07c8374f29cb936fe11236fc63e06e741b1c965..09023e44db4e2c3a2133afc52ba5a335d6030646 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -4354,8 +4354,10 @@ static inline bool netif_testing(const struct net_device *dev)
>  */
> static inline bool netif_oper_up(const struct net_device *dev)
> {
>-	return (dev->operstate == IF_OPER_UP ||
>-		dev->operstate == IF_OPER_UNKNOWN /* backward compat */);
>+	unsigned int operstate = READ_ONCE(dev->operstate);
>+
>+	return	operstate == IF_OPER_UP ||

double space  ^^


>+		operstate == IF_OPER_UNKNOWN /* backward compat */;
> }
> 
> /**
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 0628d8ff1ed932efdd45ab7b79599dcfcca6c4eb..275fd5259a4a92d0bd2e145d66a716248b6c2804 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -8632,12 +8632,12 @@ unsigned int dev_get_flags(const struct net_device *dev)
> {
> 	unsigned int flags;
> 
>-	flags = (dev->flags & ~(IFF_PROMISC |
>+	flags = (READ_ONCE(dev->flags) & ~(IFF_PROMISC |
> 				IFF_ALLMULTI |
> 				IFF_RUNNING |
> 				IFF_LOWER_UP |
> 				IFF_DORMANT)) |
>-		(dev->gflags & (IFF_PROMISC |
>+		(READ_ONCE(dev->gflags) & (IFF_PROMISC |
> 				IFF_ALLMULTI));
> 
> 	if (netif_running(dev)) {
>diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>index 3c8bdad0105dc9542489b612890ba86de9c44bdc..df3c6feea74e2d95144140eceb6df5cef2dce1f4 100644
>--- a/net/ipv6/addrconf.c
>+++ b/net/ipv6/addrconf.c
>@@ -6047,6 +6047,7 @@ static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev,
> 	struct net_device *dev = idev->dev;
> 	struct ifinfomsg *hdr;
> 	struct nlmsghdr *nlh;
>+	int ifindex, iflink;
> 	void *protoinfo;
> 
> 	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*hdr), flags);
>@@ -6057,16 +6058,18 @@ static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev,
> 	hdr->ifi_family = AF_INET6;
> 	hdr->__ifi_pad = 0;
> 	hdr->ifi_type = dev->type;
>-	hdr->ifi_index = dev->ifindex;
>+	ifindex = READ_ONCE(dev->ifindex);
>+	hdr->ifi_index = ifindex;
> 	hdr->ifi_flags = dev_get_flags(dev);
> 	hdr->ifi_change = 0;
> 
>+	iflink = dev_get_iflink(dev);
> 	if (nla_put_string(skb, IFLA_IFNAME, dev->name) ||
> 	    (dev->addr_len &&
> 	     nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr)) ||
>-	    nla_put_u32(skb, IFLA_MTU, dev->mtu) ||
>-	    (dev->ifindex != dev_get_iflink(dev) &&
>-	     nla_put_u32(skb, IFLA_LINK, dev_get_iflink(dev))) ||
>+	    nla_put_u32(skb, IFLA_MTU, READ_ONCE(dev->mtu)) ||
>+	    (ifindex != iflink &&
>+	     nla_put_u32(skb, IFLA_LINK, iflink)) ||
> 	    nla_put_u8(skb, IFLA_OPERSTATE,
> 		       netif_running(dev) ? READ_ONCE(dev->operstate) : IF_OPER_DOWN))
> 		goto nla_put_failure;
>-- 
>2.44.0.rc1.240.g4c46232300-goog
>
>


* Re: [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection
  2024-02-22 16:36   ` Jiri Pirko
@ 2024-02-22 16:43     ` Eric Dumazet
  2024-02-23  7:19       ` Jiri Pirko
  2024-02-22 16:45     ` Eric Dumazet
  1 sibling, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 16:43 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

On Thu, Feb 22, 2024 at 5:36 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Thu, Feb 22, 2024 at 11:50:10AM CET, edumazet@google.com wrote:
> >We want to use RCU protection instead of RTNL
>
> Is this a royal "We"? :)
>
>
> >for inet6_fill_ifinfo().
>
> This is a motivation for this patch, not what the patch does.
>
> Would it be possible to maintain some sort of culture for the patch
> descriptions, even of the patches which are small and simple?
>
> https://www.kernel.org/doc/html/v6.6/process/submitting-patches.html#describe-your-changes
>
> Your patch descriptions are usually hard for me to follow and understand
> what the patch does :( Yes, I know you do it "to displease me" as you
> wrote a couple of months ago but maybe think about the others too, also
> the ones looking in a git log/show and guessing.
>
> Don't beat me.
>

I dunno.

Do I need to explain why we need READ_ONCE()/WRITE_ONCE() on RCU for
all the patches?

Documentation/RCU already has 36000 lines...


* Re: [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection
  2024-02-22 16:36   ` Jiri Pirko
  2024-02-22 16:43     ` Eric Dumazet
@ 2024-02-22 16:45     ` Eric Dumazet
  2024-02-23  7:16       ` Jiri Pirko
  1 sibling, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-22 16:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

On Thu, Feb 22, 2024 at 5:36 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Thu, Feb 22, 2024 at 11:50:10AM CET, edumazet@google.com wrote:
> >We want to use RCU protection instead of RTNL
>
> Is this a royal "We"? :)

I was hoping reducing RTNL pressure was a team effort.

If not, maybe I should consider doing something else, if hundreds of
kernel engineers are adding more and more stuff depending on RTNL.


* Re: [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection
  2024-02-22 16:45     ` Eric Dumazet
@ 2024-02-23  7:16       ` Jiri Pirko
  0 siblings, 0 replies; 41+ messages in thread
From: Jiri Pirko @ 2024-02-23  7:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Thu, Feb 22, 2024 at 05:45:20PM CET, edumazet@google.com wrote:
>On Thu, Feb 22, 2024 at 5:36 PM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Thu, Feb 22, 2024 at 11:50:10AM CET, edumazet@google.com wrote:
>> >We want to use RCU protection instead of RTNL
>>
>> Is this a royal "We"? :)
>
>I was hoping reducing RTNL pressure was a team effort.

Yeah sure, it just reads odd to me, that's it. Basically if you state
the motivation in the cover letter, then in the patches you just tell
the codebase what to do and this "we want" statement becomes redundant.


>
>If not, maybe I should consider doing something else, if hundreds of
>kernel engineers are adding more and more stuff depending on RTNL.


* Re: [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection
  2024-02-22 16:43     ` Eric Dumazet
@ 2024-02-23  7:19       ` Jiri Pirko
  0 siblings, 0 replies; 41+ messages in thread
From: Jiri Pirko @ 2024-02-23  7:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Thu, Feb 22, 2024 at 05:43:17PM CET, edumazet@google.com wrote:
>On Thu, Feb 22, 2024 at 5:36 PM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Thu, Feb 22, 2024 at 11:50:10AM CET, edumazet@google.com wrote:
>> >We want to use RCU protection instead of RTNL
>>
>> Is this a royal "We"? :)
>>
>>
>> >for inet6_fill_ifinfo().
>>
>> This is a motivation for this patch, not what the patch does.
>>
>> Would it be possible to maintain some sort of culture for the patch
>> descriptions, even of the patches which are small and simple?
>>
>> https://www.kernel.org/doc/html/v6.6/process/submitting-patches.html#describe-your-changes
>>
>> Your patch descriptions are usually hard for me to follow and understand
>> what the patch does :( Yes, I know you do it "to displease me" as you
>> wrote a couple of months ago but maybe think about the others too, also
>> the ones looking in a git log/show and guessing.
>>
>> Don't beat me.
>>
>
>I dunno.
>
>Do I need to explain why we need READ_ONCE()/WRITE_ONCE() on RCU for
>all the patches?

I don't think so. If the motivation is described properly in the cover
letter, then in the incremental patches you just clearly tell the
codebase what to change, which describes the substance of the changes.
No redundancy, clear motivation, clear patch description, easy to
understand for everyone.


>
>Documentation/RCU already has 36000 lines...


* Re: [PATCH v2 net-next 05/14] netlink: fix netlink_diag_dump() return value
  2024-02-22 10:50 ` [PATCH v2 net-next 05/14] netlink: fix netlink_diag_dump() return value Eric Dumazet
@ 2024-02-23 12:30   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 12:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> __netlink_diag_dump() returns 1 if the dump is not complete,
> zero if no error occurred.
>
> If err variable is zero, this means the dump is complete:
> We should not return skb->len in this case, but 0.
>
> This allows NLMSG_DONE to be appended to the skb.
> User space does not have to call us again only to get NLMSG_DONE.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
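
The return convention described in the quoted commit message can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct toy_dump, toy_dump_cb() and toy_rounds_until_done() are hypothetical names, and the model assumes the netlink core keeps calling the dump callback while it returns a positive value and appends NLMSG_DONE once it returns 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the dump state a callback keeps in cb->args[] (hypothetical). */
struct toy_dump {
	int next;		/* next item to emit */
	int total;		/* number of items in the table */
	bool legacy_return;	/* true: return the skb length even when complete */
};

/* Emit up to 'room' items into an imaginary skb and mimic the return value. */
static int toy_dump_cb(struct toy_dump *d, int room)
{
	int skb_len = 0;

	while (d->next < d->total) {
		if (skb_len == room)
			return skb_len;	/* buffer full: ask to be called again */
		d->next++;
		skb_len++;
	}

	/* Dump complete: the old style still reports the bytes just written. */
	return d->legacy_return ? skb_len : 0;
}

/* Count the recvmsg()-like rounds user space needs before it sees DONE. */
static int toy_rounds_until_done(struct toy_dump *d, int room)
{
	int rounds = 0;

	for (;;) {
		rounds++;
		if (toy_dump_cb(d, room) == 0)
			return rounds;	/* the core would append NLMSG_DONE here */
		assert(rounds < 1000);	/* guard against a non-terminating model */
	}
}
```

With three items and room for eight, the legacy convention needs two rounds (the second one carries nothing but the end marker), while returning 0 on completion needs one. The model ignores the case where NLMSG_DONE itself does not fit in the remaining buffer space.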



* Re: [PATCH v2 net-next 14/14] rtnetlink: provide RCU protection to rtnl_fill_prop_list()
  2024-02-22 10:50 ` [PATCH v2 net-next 14/14] rtnetlink: provide RCU protection to rtnl_fill_prop_list() Eric Dumazet
@ 2024-02-23 13:03   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 13:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> We want to be able to run rtnl_fill_ifinfo() under RCU protection
> instead of RTNL in the future.
>
> dev->name_node items are already rcu protected.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>


* Re: [PATCH v2 net-next 13/14] rtnetlink: make rtnl_fill_link_ifmap() RCU ready
  2024-02-22 10:50 ` [PATCH v2 net-next 13/14] rtnetlink: make rtnl_fill_link_ifmap() RCU ready Eric Dumazet
@ 2024-02-23 13:03   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 13:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> Use READ_ONCE() to read the following device fields:
>
> 	dev->mem_start
> 	dev->mem_end
> 	dev->base_addr
> 	dev->irq
> 	dev->dma
> 	dev->if_port
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
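
For readers outside the kernel tree, the pattern the patch applies can be sketched in user space. The assumptions are loud ones: READ_ONCE() below is a simplified approximation of the kernel macro (a single volatile load, relying on the GCC/Clang __typeof__ extension), and struct toy_dev, struct toy_ifmap and toy_fill_ifmap() are hypothetical stand-ins for struct net_device, struct rtnl_link_ifmap and rtnl_fill_link_ifmap().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified user-space approximation of the kernel's READ_ONCE(). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical subset of the device fields named in the commit message. */
struct toy_dev {
	unsigned long mem_start;
	unsigned long mem_end;
	unsigned long base_addr;
	int irq;
	unsigned char dma;
	unsigned char if_port;
};

/* Hypothetical stand-in for the structure copied into the IFLA_MAP attribute. */
struct toy_ifmap {
	uint64_t mem_start;
	uint64_t mem_end;
	uint64_t base_addr;
	uint16_t irq;
	uint8_t dma;
	uint8_t port;
};

/*
 * Snapshot each field exactly once. Without a lock, a writer may update
 * the fields concurrently, so every load is marked to keep the compiler
 * from re-reading or tearing it.
 */
static void toy_fill_ifmap(struct toy_ifmap *map, const struct toy_dev *dev)
{
	assert(map && dev);

	memset(map, 0, sizeof(*map));	/* also clears padding before copy-out */
	map->mem_start = READ_ONCE(dev->mem_start);
	map->mem_end   = READ_ONCE(dev->mem_end);
	map->base_addr = READ_ONCE(dev->base_addr);
	map->irq       = READ_ONCE(dev->irq);
	map->dma       = READ_ONCE(dev->dma);
	map->port      = READ_ONCE(dev->if_port);
}
```

Each field is individually consistent, but the six loads are not atomic as a group: a reader racing with a writer may observe a mix of old and new values.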


* Re: [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU
  2024-02-22 10:50 ` [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU Eric Dumazet
@ 2024-02-23 13:29   ` Donald Hunter
  2024-02-24  8:21     ` Eric Dumazet
  0 siblings, 1 reply; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 13:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> We want to be able to run rtnl_fill_ifinfo() under RCU protection
> instead of RTNL in the future.
>
> This patch prepares dev_get_iflink() and nla_put_iflink()
> to run either with RTNL or RCU held.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

I notice that several of the *_get_iflink() implementations are wrapped
with rcu_read_lock()/unlock() and many are not. Shouldn't this be done
consistently for all?

e.g.

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 7a5be705d71830d5bb3aa26a96a4463df03883a4..6f2a688fccbfb02ae7bdf3d55cca0e77fa9b56b4 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1272,10 +1272,10 @@ static int ipoib_get_iflink(const struct net_device *dev)
>  
>  	/* parent interface */
>  	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
> -		return dev->ifindex;
> +		return READ_ONCE(dev->ifindex);
>  
>  	/* child/vlan interface */
> -	return priv->parent->ifindex;
> +	return READ_ONCE(priv->parent->ifindex);
>  }
>  
>  static u32 ipoib_addr_hash(struct ipoib_neigh_hash *htbl, u8 *daddr)
> diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
> index 98c669ad5141479b509ee924ddba3da6bca554cd..f7fabba707ea640cab8863e63bb19294e333ba2c 100644
> --- a/drivers/net/can/vxcan.c
> +++ b/drivers/net/can/vxcan.c
> @@ -119,7 +119,7 @@ static int vxcan_get_iflink(const struct net_device *dev)
>  
>  	rcu_read_lock();
>  	peer = rcu_dereference(priv->peer);
> -	iflink = peer ? peer->ifindex : 0;
> +	iflink = peer ? READ_ONCE(peer->ifindex) : 0;
>  	rcu_read_unlock();
>  
>  	return iflink;


* Re: [PATCH v2 net-next 02/14] ipv6: prepare inet6_fill_ifla6_attrs() for RCU
  2024-02-22 10:50 ` [PATCH v2 net-next 02/14] ipv6: prepare inet6_fill_ifla6_attrs() for RCU Eric Dumazet
@ 2024-02-23 14:35   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 14:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs()
> in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>


* Re: [PATCH v2 net-next 04/14] ipv6: use xarray iterator to implement inet6_dump_ifinfo()
  2024-02-22 10:50 ` [PATCH v2 net-next 04/14] ipv6: use xarray iterator to implement inet6_dump_ifinfo() Eric Dumazet
@ 2024-02-23 14:42   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 14:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> Prepare inet6_dump_ifinfo() to run with RCU protection
> instead of RTNL and use for_each_netdev_dump() interface.
>
> Also properly return 0 at the end of a dump, avoiding
> an extra recvmsg() system call and RTNL acquisition.
>
> Note that RTNL-less dumps need core changes, coming later
> in the series.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>

Nice to see cleaner code with for_each_netdev_dump().

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>



* Re: [PATCH v2 net-next 08/14] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
  2024-02-22 10:50 ` [PATCH v2 net-next 08/14] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag Eric Dumazet
@ 2024-02-23 15:19   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 15:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag
> allows dump operations registered via rtnl_register()
> or rtnl_register_module() to opt-out from RTNL protection.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>


* Re: [PATCH v2 net-next 09/14] ipv6: switch inet6_dump_ifinfo() to RCU protection
  2024-02-22 10:50 ` [PATCH v2 net-next 09/14] ipv6: switch inet6_dump_ifinfo() to RCU protection Eric Dumazet
@ 2024-02-23 15:19   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 15:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> No longer hold RTNL while calling inet6_dump_ifinfo()
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>


* Re: [PATCH v2 net-next 11/14] nexthop: allow nexthop_mpath_fill_node() to be called without RTNL
  2024-02-22 10:50 ` [PATCH v2 net-next 11/14] nexthop: allow nexthop_mpath_fill_node() to be called without RTNL Eric Dumazet
@ 2024-02-23 15:21   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 15:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> nexthop_mpath_fill_node() will be potentially called
> from contexts holding rcu_lock instead of RTNL.
>
> Suggested-by: Ido Schimmel <idosch@nvidia.com>
> Link: https://lore.kernel.org/all/ZdZDWVdjMaQkXBgW@shredder/
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>


* Re: [PATCH v2 net-next 10/14] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
  2024-02-22 10:50 ` [PATCH v2 net-next 10/14] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU Eric Dumazet
@ 2024-02-23 15:22   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 15:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> Add a new field into struct fib_dump_filter, to let callers
> tell if they use RTNL locking or RCU.
>
> This is used in the following patch, when inet_dump_fib()
> no longer holds RTNL.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>


* Re: [PATCH v2 net-next 12/14] inet: switch inet_dump_fib() to RCU protection
  2024-02-22 10:50 ` [PATCH v2 net-next 12/14] inet: switch inet_dump_fib() to RCU protection Eric Dumazet
@ 2024-02-23 15:25   ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-23 15:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:

> No longer hold RTNL while calling inet_dump_fib().
>
> Also change return value for a completed dump:
>
> Returning 0 instead of skb->len allows NLMSG_DONE
> to be appended to the skb. User space does not have
> to call us again to get a standalone NLMSG_DONE marker.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>


* Re: [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU
  2024-02-23 13:29   ` Donald Hunter
@ 2024-02-24  8:21     ` Eric Dumazet
  2024-02-24 10:46       ` Donald Hunter
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-24  8:21 UTC (permalink / raw)
  To: Donald Hunter
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

On Fri, Feb 23, 2024 at 4:25 PM Donald Hunter <donald.hunter@gmail.com> wrote:
>
> Eric Dumazet <edumazet@google.com> writes:
>
> > We want to be able to run rtnl_fill_ifinfo() under RCU protection
> > instead of RTNL in the future.
> >
> > This patch prepares dev_get_iflink() and nla_put_iflink()
> > to run either with RTNL or RCU held.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> I notice that several of the *_get_iflink() implementations are wrapped
> with rcu_read_lock()/unlock() and many are not. Shouldn't this be done
> consistently for all?

I do not understand the question, could you give one example of what
you saw so that I can comment ?

We do not need an rcu_read_lock() only to fetch dev->ifindex, if this
is what concerns you.


* Re: [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU
  2024-02-24  8:21     ` Eric Dumazet
@ 2024-02-24 10:46       ` Donald Hunter
  2024-02-24 11:08         ` Eric Dumazet
  0 siblings, 1 reply; 41+ messages in thread
From: Donald Hunter @ 2024-02-24 10:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

On Sat, 24 Feb 2024 at 08:21, Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Feb 23, 2024 at 4:25 PM Donald Hunter <donald.hunter@gmail.com> wrote:
> >
> > I notice that several of the *_get_iflink() implementations are wrapped
> > with rcu_read_lock()/unlock() and many are not. Shouldn't this be done
> > consistently for all?
>
> I do not understand the question, could you give one example of what
> you saw so that I can comment ?

I did include a snippet of your patch showing ipoib_get_iflink() which
does not use rcu_read_lock() / unlock() and vxcan_get_iflink() which
does. Sorry if that wasn't clear. My concern is that I'd expect all
implementers of .ndo_get_iflink to need to be consistent, whether that
is with or without the calls. Does it just mean that individual
drivers are being overly cautious, or are protecting internal usage?

No use of rcu_read_lock() / unlock() here:

> index 7a5be705d71830d5bb3aa26a96a4463df03883a4..6f2a688fccbfb02ae7bdf3d55cca0e77fa9b56b4 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1272,10 +1272,10 @@ static int ipoib_get_iflink(const struct net_device *dev)
>
>       /* parent interface */
>       if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
> -             return dev->ifindex;
> +             return READ_ONCE(dev->ifindex);
>
>       /* child/vlan interface */
> -     return priv->parent->ifindex;
> +     return READ_ONCE(priv->parent->ifindex);
>  }

And use of them here:

> diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
> index 98c669ad5141479b509ee924ddba3da6bca554cd..f7fabba707ea640cab8863e63bb19294e333ba2c 100644
> --- a/drivers/net/can/vxcan.c
> +++ b/drivers/net/can/vxcan.c
> @@ -119,7 +119,7 @@ static int vxcan_get_iflink(const struct net_device *dev)
>
>       rcu_read_lock();
>       peer = rcu_dereference(priv->peer);
> -     iflink = peer ? peer->ifindex : 0;
> +     iflink = peer ? READ_ONCE(peer->ifindex) : 0;
>       rcu_read_unlock();
>
>       return iflink;


> We do not need an rcu_read_lock() only to fetch dev->ifindex, if this
> is what concerns you.

In which case, it seems that no .ndo_get_iflink implementations should
need the rcu_read_* calls?


* Re: [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU
  2024-02-24 10:46       ` Donald Hunter
@ 2024-02-24 11:08         ` Eric Dumazet
  2024-02-26  8:59           ` Donald Hunter
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-02-24 11:08 UTC (permalink / raw)
  To: Donald Hunter
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

On Sat, Feb 24, 2024 at 11:46 AM Donald Hunter <donald.hunter@gmail.com> wrote:
>
> On Sat, 24 Feb 2024 at 08:21, Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Fri, Feb 23, 2024 at 4:25 PM Donald Hunter <donald.hunter@gmail.com> wrote:
> > >
> > > I notice that several of the *_get_iflink() implementations are wrapped
> > > with rcu_read_lock()/unlock() and many are not. Shouldn't this be done
> > > consistently for all?
> >
> > I do not understand the question, could you give one example of what
> > you saw so that I can comment ?
>
> I did include a snippet of your patch showing ipoib_get_iflink() which
> does not use rcu_read_lock() / unlock() and vxcan_get_iflink() which
> does. Sorry if that wasn't clear. My concern is that I'd expect all
> implementers of .ndo_get_iflink to need to be consistent, whether that
> is with or without the calls. Does it just mean that individual
> drivers are being overly cautious, or are protecting internal usage?
>
> No use of rcu_read_lock() / unlock() here:
>
> > index 7a5be705d71830d5bb3aa26a96a4463df03883a4..6f2a688fccbfb02ae7bdf3d55cca0e77fa9b56b4 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -1272,10 +1272,10 @@ static int ipoib_get_iflink(const struct net_device *dev)
> >
> >       /* parent interface */
> >       if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
> > -             return dev->ifindex;
> > +             return READ_ONCE(dev->ifindex);

No need here, because dev is guaranteed to be alive during this call.

> >
> >       /* child/vlan interface */
> > -     return priv->parent->ifindex;
> > +     return READ_ONCE(priv->parent->ifindex);

Sure, no need for rcu_read_lock() here, because priv->parent is stable
(it cannot change during the device's lifetime).


> >  }
>
> And use of them here:
>
> > diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
> > index 98c669ad5141479b509ee924ddba3da6bca554cd..f7fabba707ea640cab8863e63bb19294e333ba2c 100644
> > --- a/drivers/net/can/vxcan.c
> > +++ b/drivers/net/can/vxcan.c
> > @@ -119,7 +119,7 @@ static int vxcan_get_iflink(const struct net_device *dev)
> >
> >       rcu_read_lock();
> >       peer = rcu_dereference(priv->peer);
> > -     iflink = peer ? peer->ifindex : 0;
> > +     iflink = peer ? READ_ONCE(peer->ifindex) : 0;
> >       rcu_read_unlock();
> >
> >       return iflink;
>
>
> > We do not need an rcu_read_lock() only to fetch dev->ifindex, if this
> > is what concerns you.
>
> In which case, it seems that no .ndo_get_iflink implementations should
> need the rcu_read_* calls?

rcu_read_lock() is needed whenever a pointer is dereferenced with the
expectation that RCU protects it.

In vxcan_get_iflink(), we access priv->peer, then peer->ifindex.

rcu_read_lock() is needed because of the second dereference, peer->ifindex.

Without rcu_read_lock(), peer could be freed before we get a chance to
read peer->ifindex.
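
To make the distinction above concrete, here is a minimal userspace sketch
of the two patterns. It is an illustration only, not driver code: the
READ_ONCE(), rcu_dereference() and rcu_read_lock()/rcu_read_unlock()
definitions are no-op stand-ins for the kernel primitives so the file
compiles outside the kernel, and stable_priv, paired_priv and both helper
functions are hypothetical names. __typeof__ is a GCC/Clang extension.

```c
#include <stddef.h>

/* Userspace stand-ins (assumptions): in the kernel these come from
 * <linux/compiler.h> and <linux/rcupdate.h> and do real work. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define rcu_dereference(p) READ_ONCE(p)
static inline void rcu_read_lock(void) { }
static inline void rcu_read_unlock(void) { }

struct net_device { int ifindex; };

/* Hypothetical: parent is set once and never changes afterwards. */
struct stable_priv { struct net_device *parent; };

/* Hypothetical: peer can be cleared and the peer device freed later;
 * in the kernel this field would be annotated __rcu. */
struct paired_priv { struct net_device *peer; };

/* ipoib-like case: the pointer is stable, so only the racy read of
 * ifindex needs READ_ONCE(); no RCU read-side section is required. */
static int stable_get_iflink(const struct stable_priv *priv)
{
	return READ_ONCE(priv->parent->ifindex);
}

/* vxcan-like case: the read-side section must cover the second
 * dereference (peer->ifindex), otherwise peer could be freed between
 * loading the pointer and reading through it. */
static int paired_get_iflink(const struct paired_priv *priv)
{
	const struct net_device *peer;
	int iflink;

	rcu_read_lock();
	peer = rcu_dereference(priv->peer);
	iflink = peer ? READ_ONCE(peer->ifindex) : 0;
	rcu_read_unlock();

	return iflink;
}
```

The value copied into iflink stays usable after rcu_read_unlock(); only the
peer pointer itself must not be used past that point.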


* Re: [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU
  2024-02-24 11:08         ` Eric Dumazet
@ 2024-02-26  8:59           ` Donald Hunter
  0 siblings, 0 replies; 41+ messages in thread
From: Donald Hunter @ 2024-02-26  8:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Eric Dumazet <edumazet@google.com> writes:
>>
>> And use of them here:
>>
>> > diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
>> > index 98c669ad5141479b509ee924ddba3da6bca554cd..f7fabba707ea640cab8863e63bb19294e333ba2c 100644
>> > --- a/drivers/net/can/vxcan.c
>> > +++ b/drivers/net/can/vxcan.c
>> > @@ -119,7 +119,7 @@ static int vxcan_get_iflink(const struct net_device *dev)
>> >
>> >       rcu_read_lock();
>> >       peer = rcu_dereference(priv->peer);
>> > -     iflink = peer ? peer->ifindex : 0;
>> > +     iflink = peer ? READ_ONCE(peer->ifindex) : 0;
>> >       rcu_read_unlock();
>> >
>> >       return iflink;
>>
>>
>> > We do not need an rcu_read_lock() only to fetch dev->ifindex, if this
>> > is what concerns you.
>>
>> In which case, it seems that no .ndo_get_iflink implementations should
>> need the rcu_read_* calls?
>
> rcu_read_lock() is needed in all cases a dereference is performed,
> expecting RCU protection of the pointer.
>
> In vxcan_get_iflink(), we access priv->peer, then peer->ifindex.
>
> rcu_read_lock() is needed because of the second dereference, peer->ifindex.
>
> Without rcu_read_lock(), peer could be freed before we get a chance to
> read peer->ifindex.

Thanks for the detailed explanation.


* Re: [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps
  2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
                   ` (13 preceding siblings ...)
  2024-02-22 10:50 ` [PATCH v2 net-next 14/14] rtnetlink: provide RCU protection to rtnl_fill_prop_list() Eric Dumazet
@ 2024-02-26 11:50 ` patchwork-bot+netdevbpf
  14 siblings, 0 replies; 41+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-02-26 11:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, netdev, idosch, jiri, eric.dumazet

Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 22 Feb 2024 10:50:07 +0000 you wrote:
> This series restarts the conversion of rtnl dump operations
> to RCU protection, instead of requiring RTNL.
> 
> In this new attempt (prior one failed in 2011), I chose to
> allow a gradual conversion of selected operations.
> 
> After this series, "ip -6 addr" and "ip -4 ro" no longer
> need to acquire RTNL.
> 
> [...]

Here is the summary with links:
  - [v2,net-next,01/14] rtnetlink: prepare nla_put_iflink() to run under RCU
    https://git.kernel.org/netdev/net-next/c/e353ea9ce471
  - [v2,net-next,02/14] ipv6: prepare inet6_fill_ifla6_attrs() for RCU
    https://git.kernel.org/netdev/net-next/c/4ad268136421
  - [v2,net-next,03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection
    https://git.kernel.org/netdev/net-next/c/8afc7a78d55d
  - [v2,net-next,04/14] ipv6: use xarray iterator to implement inet6_dump_ifinfo()
    https://git.kernel.org/netdev/net-next/c/ac14ad9755d4
  - [v2,net-next,05/14] netlink: fix netlink_diag_dump() return value
    https://git.kernel.org/netdev/net-next/c/6647b338fc5c
  - [v2,net-next,06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
    https://git.kernel.org/netdev/net-next/c/b5590270068c
  - [v2,net-next,07/14] rtnetlink: change nlk->cb_mutex role
    https://git.kernel.org/netdev/net-next/c/e39951d965bf
  - [v2,net-next,08/14] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
    https://git.kernel.org/netdev/net-next/c/386520e0ecc0
  - [v2,net-next,09/14] ipv6: switch inet6_dump_ifinfo() to RCU protection
    https://git.kernel.org/netdev/net-next/c/69fdb7e411b6
  - [v2,net-next,10/14] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
    https://git.kernel.org/netdev/net-next/c/22e36ea9f5d7
  - [v2,net-next,11/14] nexthop: allow nexthop_mpath_fill_node() to be called without RTNL
    https://git.kernel.org/netdev/net-next/c/0ac3fa0c3b36
  - [v2,net-next,12/14] inet: switch inet_dump_fib() to RCU protection
    https://git.kernel.org/netdev/net-next/c/4ce5dc9316de
  - [v2,net-next,13/14] rtnetlink: make rtnl_fill_link_ifmap() RCU ready
    https://git.kernel.org/netdev/net-next/c/74808e72e0b2
  - [v2,net-next,14/14] rtnetlink: provide RCU protection to rtnl_fill_prop_list()
    https://git.kernel.org/netdev/net-next/c/0ec4e48c3a23

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
  2024-02-22 16:20   ` Jiri Pirko
@ 2024-06-09  8:17     ` Tetsuo Handa
  2024-06-09  8:29       ` Tetsuo Handa
  0 siblings, 1 reply; 41+ messages in thread
From: Tetsuo Handa @ 2024-06-09  8:17 UTC (permalink / raw)
  To: Jiri Pirko, Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

Hello.

While investigating hung task reports involving rtnl_mutex, I came to
suspect that commit b5590270068c ("netlink: hold nlk->cb_mutex longer
in __netlink_dump_start()") is buggy, because that commit made only the
mutex_lock(nlk->cb_mutex) side conditional. Why don't we need to make
the mutex_unlock(nlk->cb_mutex) side conditional as well?

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index fa9c090cf629..c23a8d4ddcae 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2352,7 +2352,8 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 
 	if (nlk->dump_done_errno > 0 ||
 	    skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
-		mutex_unlock(&nlk->nl_cb_mutex);
+		if (!lock_taken)
+			mutex_unlock(&nlk->nl_cb_mutex);
 
 		if (sk_filter(sk, skb))
 			kfree_skb(skb);
@@ -2386,13 +2387,15 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	WRITE_ONCE(nlk->cb_running, false);
 	module = cb->module;
 	skb = cb->skb;
-	mutex_unlock(&nlk->nl_cb_mutex);
+	if (!lock_taken)
+		mutex_unlock(&nlk->nl_cb_mutex);
 	module_put(module);
 	consume_skb(skb);
 	return 0;
 
 errout_skb:
-	mutex_unlock(&nlk->nl_cb_mutex);
+	if (!lock_taken)
+		mutex_unlock(&nlk->nl_cb_mutex);
 	kfree_skb(skb);
 	return err;
 }



* Re: [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
  2024-06-09  8:17     ` Tetsuo Handa
@ 2024-06-09  8:29       ` Tetsuo Handa
  2024-06-10 12:59         ` Eric Dumazet
  0 siblings, 1 reply; 41+ messages in thread
From: Tetsuo Handa @ 2024-06-09  8:29 UTC (permalink / raw)
  To: Jiri Pirko, Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

On 2024/06/09 17:17, Tetsuo Handa wrote:
> Hello.
> 
> While investigating hung task reports involving rtnl_mutex, I came to
> suspect that commit b5590270068c ("netlink: hold nlk->cb_mutex longer
> in __netlink_dump_start()") is buggy, for that commit made only
> mutex_lock(nlk->cb_mutex) side conditionally. Why don't we need to make
> mutex_unlock(nlk->cb_mutex) side conditionally?
> 

Sorry for the noise. That commit should be correct, for the caller
no longer calls mutex_unlock(nlk->cb_mutex).

I'll try a debug printk() patch for linux-next.
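
The lock hand-off that makes the unconditional unlock correct can be
sketched in userspace as follows. This is an assumption-laden illustration
using pthreads, not the kernel code: struct dump_state, start_dump() and
dump_one_chunk() are hypothetical stand-ins for the netlink socket,
__netlink_dump_start() and netlink_dump(sk, lock_taken).

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state guarded by the callback mutex. */
struct dump_state {
	pthread_mutex_t cb_mutex;
	int chunks_sent;
};

/* Same shape as netlink_dump(sk, lock_taken): the lock is taken only
 * if the caller has not already taken it, but it is always released
 * here, because ownership was handed over by the caller. */
static int dump_one_chunk(struct dump_state *st, bool lock_taken)
{
	if (!lock_taken)
		pthread_mutex_lock(&st->cb_mutex);

	st->chunks_sent++;

	pthread_mutex_unlock(&st->cb_mutex);
	return 0;
}

/* Same shape as the start path: take the mutex, set up the dump, then
 * pass ownership of the mutex to dump_one_chunk(). The caller must not
 * unlock afterwards, or the mutex would be released twice. */
static int start_dump(struct dump_state *st)
{
	pthread_mutex_lock(&st->cb_mutex);
	st->chunks_sent = 0;
	return dump_one_chunk(st, true);
}
```

Making the unlock conditional on !lock_taken, as in the diff above, would
leave the mutex held forever on the start path in this model.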



* Re: [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
  2024-06-09  8:29       ` Tetsuo Handa
@ 2024-06-10 12:59         ` Eric Dumazet
  2024-06-10 13:21           ` Tetsuo Handa
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2024-06-10 12:59 UTC (permalink / raw)
  To: Tetsuo Handa, Dmitry Vyukov
  Cc: Jiri Pirko, David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

On Sun, Jun 9, 2024 at 10:29 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2024/06/09 17:17, Tetsuo Handa wrote:
> > Hello.
> >
> > While investigating hung task reports involving rtnl_mutex, I came to
> > suspect that commit b5590270068c ("netlink: hold nlk->cb_mutex longer
> > in __netlink_dump_start()") is buggy, for that commit made only
> > mutex_lock(nlk->cb_mutex) side conditionally. Why don't we need to make
> > mutex_unlock(nlk->cb_mutex) side conditionally?
> >
>
> Sorry for the noise. That commit should be correct, for the caller
> no longer calls mutex_unlock(nlk->cb_mutex).
>
> I'll try a debug printk() patch for linux-next.

I also have a lot of hung task reports, but in most of them the console
is flooded before the crashes.


[  276.515597][    C1] yealink 4-1:36.0: urb_ctl_callback - urb status -71
[  276.522774][    C1] yealink 4-1:36.0: urb_irq_callback - urb status -71
[  276.529566][    C1] yealink 4-1:36.0: unexpected response 0
[  276.535875][    C1] yealink 4-1:36.0: urb_ctl_callback - urb status -71
[  276.543011][    C1] yealink 4-1:36.0: urb_irq_callback - urb status -71
[  276.549951][    C1] yealink 4-1:36.0: unexpected response 0
[  276.556111][    C1] yealink 4-1:36.0: urb_ctl_callback - urb status -71
[  276.563143][    C1] yealink 4-1:36.0: urb_irq_callback - urb status -71
[  276.570382][    C1] yealink 4-1:36.0: unexpected response 0
[  276.576399][    C1] yealink 4-1:36.0: urb_ctl_callback - urb status -71
[  276.584381][    C1] yealink 4-1:36.0: urb_irq_callback - urb status -71
[  276.591617][    C1] yealink 4-1:36.0: unexpected response 0
[  276.597904][    C1] yealink 4-1:36.0: urb_ctl_callback - urb status -71
[  276.605126][    C1] yealink 4-1:36.0: urb_irq_callback - urb status -71
[  276.612153][    C1] yealink 4-1:36.0: unexpected response 0
[  276.618588][    C1] yealink 4-1:36.0: urb_ctl_callback - urb status -71
[  276.626153][    C1] yealink 4-1:36.0: urb_irq_callback - urb status -71
[  276.631595][   T30] INFO: task dhcpcd:4749 blocked for more than 143 seconds.
[  276.633015][    C1] yealink 4-1:36.0: unexpected response 0
[  276.646813][    C1] yealink 4-1:36.0: urb_ctl_callback - urb status -71
[  276.654401][   T30]       Not tainted 6.10.0-rc2-syzkaller-00269-g96e09b8f8166 #0


2024/06/08 02:48:35 SYZFATAL: failed to recv *flatrpc.HostMessageRaw: EOF

[  276.654461][    C1] yealink 4-1:36.0: urb_irq_callback - urb status -71

I wonder how to deal with SYZFATAL; maybe the reports are truncated and we
do not see who owns the rtnl mutex.


* Re: [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
  2024-06-10 12:59         ` Eric Dumazet
@ 2024-06-10 13:21           ` Tetsuo Handa
  0 siblings, 0 replies; 41+ messages in thread
From: Tetsuo Handa @ 2024-06-10 13:21 UTC (permalink / raw)
  To: Eric Dumazet, Dmitry Vyukov
  Cc: Jiri Pirko, David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Ido Schimmel, Jiri Pirko, eric.dumazet

On 2024/06/10 21:59, Eric Dumazet wrote:
> On Sun, Jun 9, 2024 at 10:29 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> On 2024/06/09 17:17, Tetsuo Handa wrote:
>>> Hello.
>>>
>>> While investigating hung task reports involving rtnl_mutex, I came to
>>> suspect that commit b5590270068c ("netlink: hold nlk->cb_mutex longer
>>> in __netlink_dump_start()") is buggy, for that commit made only
>>> mutex_lock(nlk->cb_mutex) side conditionally. Why don't we need to make
>>> mutex_unlock(nlk->cb_mutex) side conditionally?
>>>
>>
>> Sorry for the noise. That commit should be correct, for the caller
>> no longer calls mutex_unlock(nlk->cb_mutex).
>>
>> I'll try a debug printk() patch for linux-next.
> 
> I also have a lot of hung task reports as well, but in most reports
> the console is flooded
> before the crashes.

Yeah, printk() flooding is the cause of some of the hung task reports.

I queued https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2bfadd666b5852974071df0588d7eb0f499b7b5/
for linux-next.git. You can try this patch to see what the owner of rtnl_mutex is doing.




Thread overview: 41+ messages
2024-02-22 10:50 [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
2024-02-22 10:50 ` [PATCH v2 net-next 01/14] rtnetlink: prepare nla_put_iflink() to run under RCU Eric Dumazet
2024-02-23 13:29   ` Donald Hunter
2024-02-24  8:21     ` Eric Dumazet
2024-02-24 10:46       ` Donald Hunter
2024-02-24 11:08         ` Eric Dumazet
2024-02-26  8:59           ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 02/14] ipv6: prepare inet6_fill_ifla6_attrs() for RCU Eric Dumazet
2024-02-23 14:35   ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 03/14] ipv6: prepare inet6_fill_ifinfo() for RCU protection Eric Dumazet
2024-02-22 16:36   ` Jiri Pirko
2024-02-22 16:43     ` Eric Dumazet
2024-02-23  7:19       ` Jiri Pirko
2024-02-22 16:45     ` Eric Dumazet
2024-02-23  7:16       ` Jiri Pirko
2024-02-22 10:50 ` [PATCH v2 net-next 04/14] ipv6: use xarray iterator to implement inet6_dump_ifinfo() Eric Dumazet
2024-02-23 14:42   ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 05/14] netlink: fix netlink_diag_dump() return value Eric Dumazet
2024-02-23 12:30   ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 06/14] netlink: hold nlk->cb_mutex longer in __netlink_dump_start() Eric Dumazet
2024-02-22 16:20   ` Jiri Pirko
2024-06-09  8:17     ` Tetsuo Handa
2024-06-09  8:29       ` Tetsuo Handa
2024-06-10 12:59         ` Eric Dumazet
2024-06-10 13:21           ` Tetsuo Handa
2024-02-22 10:50 ` [PATCH v2 net-next 07/14] rtnetlink: change nlk->cb_mutex role Eric Dumazet
2024-02-22 10:50 ` [PATCH v2 net-next 08/14] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag Eric Dumazet
2024-02-23 15:19   ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 09/14] ipv6: switch inet6_dump_ifinfo() to RCU protection Eric Dumazet
2024-02-23 15:19   ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 10/14] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU Eric Dumazet
2024-02-23 15:22   ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 11/14] nexthop: allow nexthop_mpath_fill_node() to be called without RTNL Eric Dumazet
2024-02-23 15:21   ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 12/14] inet: switch inet_dump_fib() to RCU protection Eric Dumazet
2024-02-23 15:25   ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 13/14] rtnetlink: make rtnl_fill_link_ifmap() RCU ready Eric Dumazet
2024-02-23 13:03   ` Donald Hunter
2024-02-22 10:50 ` [PATCH v2 net-next 14/14] rtnetlink: provide RCU protection to rtnl_fill_prop_list() Eric Dumazet
2024-02-23 13:03   ` Donald Hunter
2024-02-26 11:50 ` [PATCH v2 net-next 00/14] rtnetlink: reduce RTNL pressure for dumps patchwork-bot+netdevbpf
