* [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps
@ 2024-02-21 10:59 Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 01/13] rtnetlink: prepare nla_put_iflink() to run under RCU Eric Dumazet
` (12 more replies)
0 siblings, 13 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
This series restarts the conversion of rtnl dump operations
to RCU protection, instead of requiring RTNL.
In this new attempt (prior one failed in 2009), I chose to
allow a gradual conversion of selected operations.
After this series, "ip -6 addr" and "ip -4 ro" no longer
need to acquire RTNL.
I refrained from changing inet_dump_ifaddr() and inet6_dump_addr()
to avoid merge conflicts because of two fixes in net tree.
I also started preparatory work for a future "ip link" conversion.
Eric Dumazet (13):
rtnetlink: prepare nla_put_iflink() to run under RCU
ipv6: prepare inet6_fill_ifla6_attrs() for RCU
ipv6: prepare inet6_fill_ifinfo() for RCU protection
ipv6: use xarray iterator to implement inet6_dump_ifinfo()
netlink: fix netlink_diag_dump() return value
netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
rtnetlink: change nlk->cb_mutex role
rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
ipv6: switch inet6_dump_ifinfo() to RCU protection
inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
inet: switch inet_dump_fib() to RCU protection
rtnetlink: make rtnl_fill_link_ifmap() RCU ready
rtnetlink: provide RCU protection to rtnl_fill_prop_list()
drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 +-
drivers/net/can/vxcan.c | 2 +-
.../net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 2 +-
drivers/net/ipvlan/ipvlan_main.c | 2 +-
drivers/net/macsec.c | 2 +-
drivers/net/macvlan.c | 2 +-
drivers/net/netkit.c | 2 +-
drivers/net/veth.c | 2 +-
drivers/net/wireless/virtual/virt_wifi.c | 2 +-
include/linux/netdevice.h | 6 +-
include/linux/netlink.h | 2 +
include/net/ip_fib.h | 1 +
include/net/rtnetlink.h | 1 +
net/8021q/vlan_dev.c | 4 +-
net/core/dev.c | 6 +-
net/core/rtnetlink.c | 41 ++--
net/dsa/user.c | 2 +-
net/ieee802154/6lowpan/core.c | 2 +-
net/ipv4/fib_frontend.c | 50 ++--
net/ipv4/fib_trie.c | 4 +-
net/ipv4/ipmr.c | 4 +-
net/ipv6/addrconf.c | 222 +++++++++---------
net/ipv6/ip6_fib.c | 7 +-
net/ipv6/ip6_tunnel.c | 2 +-
net/ipv6/ip6mr.c | 4 +-
net/ipv6/ndisc.c | 2 +-
net/mpls/af_mpls.c | 4 +-
net/netlink/af_netlink.c | 46 ++--
net/netlink/af_netlink.h | 5 +-
net/netlink/diag.c | 2 +-
net/xfrm/xfrm_interface_core.c | 2 +-
31 files changed, 240 insertions(+), 199 deletions(-)
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 01/13] rtnetlink: prepare nla_put_iflink() to run under RCU
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 02/13] ipv6: prepare inet6_fill_ifla6_attrs() for RCU Eric Dumazet
` (11 subsequent siblings)
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.
This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.
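
For readers unfamiliar with the pattern, here is a minimal userspace sketch of what the diff below does. It is not kernel code: READ_ONCE() is a simplified stand-in for the kernel macro, and struct fake_dev, fake_get_iflink() and fake_should_put_iflink() are hypothetical names invented for illustration. The point is that each lockless read of ifindex is a single marked load, and that the value compared is the same value that gets emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's READ_ONCE() (assumption). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, stripped-down device: only what the sketch needs. */
struct fake_dev {
	int ifindex;
	const struct fake_dev *lower;	/* NULL when there is no lower device */
};

/* Same shape as dev_get_iflink(): one marked load of the relevant ifindex. */
static int fake_get_iflink(const struct fake_dev *dev)
{
	if (dev->lower)
		return READ_ONCE(dev->lower->ifindex);
	return READ_ONCE(dev->ifindex);
}

/*
 * Same shape as nla_put_iflink(): the link index is fetched once into a
 * local, then compared and (if needed) reported. Returns true when an
 * IFLA_LINK attribute would be emitted, storing its value in *out.
 */
static bool fake_should_put_iflink(const struct fake_dev *dev, bool force,
				   int *out)
{
	int iflink = fake_get_iflink(dev);

	assert(out != NULL);
	if (force || READ_ONCE(dev->ifindex) != iflink) {
		*out = iflink;
		return true;
	}
	return false;
}
```

A device stacked on a lower device reports the lower device's index; a standalone device reports nothing unless force is set.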
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++--
drivers/net/can/vxcan.c | 2 +-
drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 2 +-
drivers/net/ipvlan/ipvlan_main.c | 2 +-
drivers/net/macsec.c | 2 +-
drivers/net/macvlan.c | 2 +-
drivers/net/netkit.c | 2 +-
drivers/net/veth.c | 2 +-
drivers/net/wireless/virtual/virt_wifi.c | 2 +-
net/8021q/vlan_dev.c | 4 ++--
net/core/dev.c | 2 +-
net/core/rtnetlink.c | 6 +++---
net/dsa/user.c | 2 +-
net/ieee802154/6lowpan/core.c | 2 +-
net/ipv6/ip6_tunnel.c | 2 +-
net/xfrm/xfrm_interface_core.c | 2 +-
16 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 7a5be705d71830d5bb3aa26a96a4463df03883a4..6f2a688fccbfb02ae7bdf3d55cca0e77fa9b56b4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1272,10 +1272,10 @@ static int ipoib_get_iflink(const struct net_device *dev)
/* parent interface */
if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
- return dev->ifindex;
+ return READ_ONCE(dev->ifindex);
/* child/vlan interface */
- return priv->parent->ifindex;
+ return READ_ONCE(priv->parent->ifindex);
}
static u32 ipoib_addr_hash(struct ipoib_neigh_hash *htbl, u8 *daddr)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index 98c669ad5141479b509ee924ddba3da6bca554cd..f7fabba707ea640cab8863e63bb19294e333ba2c 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -119,7 +119,7 @@ static int vxcan_get_iflink(const struct net_device *dev)
rcu_read_lock();
peer = rcu_dereference(priv->peer);
- iflink = peer ? peer->ifindex : 0;
+ iflink = peer ? READ_ONCE(peer->ifindex) : 0;
rcu_read_unlock();
return iflink;
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 046b5f7d8e7cab33a9f09079858bac2a972e968a..9d2a9562c96ff4937da7a389c773acce01508ca3 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -98,7 +98,7 @@ static int rmnet_vnd_get_iflink(const struct net_device *dev)
{
struct rmnet_priv *priv = netdev_priv(dev);
- return priv->real_dev->ifindex;
+ return READ_ONCE(priv->real_dev->ifindex);
}
static int rmnet_vnd_init(struct net_device *dev)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index df7c43a109e1a7376c6ce3216cb3dd4223eac04c..5920f7e6335230cf07a3da528e4ac7a050c2fd41 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -349,7 +349,7 @@ static int ipvlan_get_iflink(const struct net_device *dev)
{
struct ipvl_dev *ipvlan = netdev_priv(dev);
- return ipvlan->phy_dev->ifindex;
+ return READ_ONCE(ipvlan->phy_dev->ifindex);
}
static const struct net_device_ops ipvlan_netdev_ops = {
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 7f5426285c61b1e35afd74d4c044f80c77f34e7f..4b5513c9c2befe42e054fee6ecdadc9aabb0ce19 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3753,7 +3753,7 @@ static void macsec_get_stats64(struct net_device *dev,
static int macsec_get_iflink(const struct net_device *dev)
{
- return macsec_priv(dev)->real_dev->ifindex;
+ return READ_ONCE(macsec_priv(dev)->real_dev->ifindex);
}
static const struct net_device_ops macsec_netdev_ops = {
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index a3cc665757e8727d3ffb24d8dbfbcd321fc93ffd..0cec2783a3e712b7769572482bf59aa336b9ca15 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1158,7 +1158,7 @@ static int macvlan_dev_get_iflink(const struct net_device *dev)
{
struct macvlan_dev *vlan = netdev_priv(dev);
- return vlan->lowerdev->ifindex;
+ return READ_ONCE(vlan->lowerdev->ifindex);
}
static const struct ethtool_ops macvlan_ethtool_ops = {
diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 39171380ccf29e27412bb2b9cee7102acc4a83ab..a4d2e76a8d587cc6ce7ad7f98e382a1c81f76e67 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -145,7 +145,7 @@ static int netkit_get_iflink(const struct net_device *dev)
rcu_read_lock();
peer = rcu_dereference(nk->peer);
if (peer)
- iflink = peer->ifindex;
+ iflink = READ_ONCE(peer->ifindex);
rcu_read_unlock();
return iflink;
}
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 500b9dfccd08ee8f91b22d78e3d8195f3de26088..dd5aa8ab65a865dc9dbaa596861671d189bfe1af 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1461,7 +1461,7 @@ static int veth_get_iflink(const struct net_device *dev)
rcu_read_lock();
peer = rcu_dereference(priv->peer);
- iflink = peer ? peer->ifindex : 0;
+ iflink = peer ? READ_ONCE(peer->ifindex) : 0;
rcu_read_unlock();
return iflink;
diff --git a/drivers/net/wireless/virtual/virt_wifi.c b/drivers/net/wireless/virtual/virt_wifi.c
index ba14d83353a4b226e44d420a16e33460a9dc762d..6a84ec58d618bcbf966dab6e38cfe02b886a712f 100644
--- a/drivers/net/wireless/virtual/virt_wifi.c
+++ b/drivers/net/wireless/virtual/virt_wifi.c
@@ -453,7 +453,7 @@ static int virt_wifi_net_device_get_iflink(const struct net_device *dev)
{
struct virt_wifi_netdev_priv *priv = netdev_priv(dev);
- return priv->lowerdev->ifindex;
+ return READ_ONCE(priv->lowerdev->ifindex);
}
static const struct net_device_ops virt_wifi_ops = {
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index df55525182517e49b2cfbffe7f102967c66b5952..39876eff51d21f830c3bde1682e07aac698c633e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -762,9 +762,9 @@ static void vlan_dev_netpoll_cleanup(struct net_device *dev)
static int vlan_dev_get_iflink(const struct net_device *dev)
{
- struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
+ const struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
- return real_dev->ifindex;
+ return READ_ONCE(real_dev->ifindex);
}
static int vlan_dev_fill_forward_path(struct net_device_path_ctx *ctx,
diff --git a/net/core/dev.c b/net/core/dev.c
index c588808be77f563c429eb4a2eaee5c8062d99582..0628d8ff1ed932efdd45ab7b79599dcfcca6c4eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -641,7 +641,7 @@ int dev_get_iflink(const struct net_device *dev)
if (dev->netdev_ops && dev->netdev_ops->ndo_get_iflink)
return dev->netdev_ops->ndo_get_iflink(dev);
- return dev->ifindex;
+ return READ_ONCE(dev->ifindex);
}
EXPORT_SYMBOL(dev_get_iflink);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c54dbe05c4c5df126d0b58403049ebc1d272907e..060543fe7919c13c7a5c6cf22f9e7606d0897345 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1611,10 +1611,10 @@ static int put_master_ifindex(struct sk_buff *skb, struct net_device *dev)
static int nla_put_iflink(struct sk_buff *skb, const struct net_device *dev,
bool force)
{
- int ifindex = dev_get_iflink(dev);
+ int iflink = dev_get_iflink(dev);
- if (force || dev->ifindex != ifindex)
- return nla_put_u32(skb, IFLA_LINK, ifindex);
+ if (force || READ_ONCE(dev->ifindex) != iflink)
+ return nla_put_u32(skb, IFLA_LINK, iflink);
return 0;
}
diff --git a/net/dsa/user.c b/net/dsa/user.c
index 4d53c76a9840a789511b9ee0d9a39c70de77f72c..9c42a6edcdc8a8de94241ce4a238f31583b738ec 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -352,7 +352,7 @@ void dsa_user_mii_bus_init(struct dsa_switch *ds)
/* user device handling ****************************************************/
static int dsa_user_get_iflink(const struct net_device *dev)
{
- return dsa_user_to_conduit(dev)->ifindex;
+ return READ_ONCE(dsa_user_to_conduit(dev)->ifindex);
}
static int dsa_user_open(struct net_device *dev)
diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c
index e643f52663f9bed8c4707b205a73d0d2bad5bb73..77b4e92027c5dfdadefc3019ca82ee8967a9006e 100644
--- a/net/ieee802154/6lowpan/core.c
+++ b/net/ieee802154/6lowpan/core.c
@@ -93,7 +93,7 @@ static int lowpan_neigh_construct(struct net_device *dev, struct neighbour *n)
static int lowpan_get_iflink(const struct net_device *dev)
{
- return lowpan_802154_dev(dev)->wdev->ifindex;
+ return READ_ONCE(lowpan_802154_dev(dev)->wdev->ifindex);
}
static const struct net_device_ops lowpan_netdev_ops = {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 44406c28445dc457fb47a7cdec295778eb30b31f..5fd07581efafe3c57cc8732ddaae9910d6726f30 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1756,7 +1756,7 @@ int ip6_tnl_get_iflink(const struct net_device *dev)
{
struct ip6_tnl *t = netdev_priv(dev);
- return t->parms.link;
+ return READ_ONCE(t->parms.link);
}
EXPORT_SYMBOL(ip6_tnl_get_iflink);
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index dafefef3cf51a79fd6701a8b78c3f8fcfd10615d..717855b9acf1c413d506f681aec636af9b075af5 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -727,7 +727,7 @@ static int xfrmi_get_iflink(const struct net_device *dev)
{
struct xfrm_if *xi = netdev_priv(dev);
- return xi->p.link;
+ return READ_ONCE(xi->p.link);
}
static const struct net_device_ops xfrmi_netdev_ops = {
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 02/13] ipv6: prepare inet6_fill_ifla6_attrs() for RCU
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 01/13] rtnetlink: prepare nla_put_iflink() to run under RCU Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 03/13] ipv6: prepare inet6_fill_ifinfo() for RCU protection Eric Dumazet
` (10 subsequent siblings)
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs()
in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations.
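
As a rough illustration of why the ra_mtu hunk below reads the field into a local first, here is a userspace sketch. The macros are simplified stand-ins for the kernel ones, and struct fake_idev, fake_set_ra_mtu() and fake_fill_ra_mtu() are hypothetical. Without the snapshot, a concurrent writer could change the field between the "is it non-zero" test and the value that is actually reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, stripped-down inet6_dev holding a single field. */
struct fake_idev {
	uint32_t ra_mtu;
};

/* Writer side: a marked store that pairs with the reader's marked load. */
static void fake_set_ra_mtu(struct fake_idev *idev, uint32_t mtu)
{
	WRITE_ONCE(idev->ra_mtu, mtu);
}

/*
 * Reader side: snapshot the field once, then test and report the snapshot.
 * Returns 1 and stores the value in *attr when an attribute would be
 * emitted, 0 when the field is unset.
 */
static int fake_fill_ra_mtu(const struct fake_idev *idev, uint32_t *attr)
{
	uint32_t ra_mtu = READ_ONCE(idev->ra_mtu);

	assert(attr != NULL);
	if (!ra_mtu)
		return 0;
	*attr = ra_mtu;
	return 1;
}
```

The marked accesses only prevent torn or repeated loads and stores; they do not order this field against any other field.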
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv6/addrconf.c | 163 ++++++++++++++++++++++++--------------------
net/ipv6/ndisc.c | 2 +-
2 files changed, 90 insertions(+), 75 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d3f4b7b9cf1fe380757225a110153fbad51bf763..3c8bdad0105dc9542489b612890ba86de9c44bdc 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3477,7 +3477,8 @@ static void addrconf_dev_config(struct net_device *dev)
/* this device type has no EUI support */
if (dev->type == ARPHRD_NONE &&
idev->cnf.addr_gen_mode == IN6_ADDR_GEN_MODE_EUI64)
- idev->cnf.addr_gen_mode = IN6_ADDR_GEN_MODE_RANDOM;
+ WRITE_ONCE(idev->cnf.addr_gen_mode,
+ IN6_ADDR_GEN_MODE_RANDOM);
addrconf_addr_gen(idev, false);
}
@@ -3749,7 +3750,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
rt6_mtu_change(dev, dev->mtu);
idev->cnf.mtu6 = dev->mtu;
}
- idev->tstamp = jiffies;
+ WRITE_ONCE(idev->tstamp, jiffies);
inet6_ifinfo_notify(RTM_NEWLINK, idev);
/*
@@ -3991,7 +3992,7 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
ipv6_mc_down(idev);
}
- idev->tstamp = jiffies;
+ WRITE_ONCE(idev->tstamp, jiffies);
idev->ra_mtu = 0;
/* Last: Shot the device (if unregistered) */
@@ -5619,87 +5620,97 @@ static void inet6_ifa_notify(int event, struct inet6_ifaddr *ifa)
rtnl_set_sk_err(net, RTNLGRP_IPV6_IFADDR, err);
}
-static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
- __s32 *array, int bytes)
+static void ipv6_store_devconf(const struct ipv6_devconf *cnf,
+ __s32 *array, int bytes)
{
BUG_ON(bytes < (DEVCONF_MAX * 4));
memset(array, 0, bytes);
- array[DEVCONF_FORWARDING] = cnf->forwarding;
- array[DEVCONF_HOPLIMIT] = cnf->hop_limit;
- array[DEVCONF_MTU6] = cnf->mtu6;
- array[DEVCONF_ACCEPT_RA] = cnf->accept_ra;
- array[DEVCONF_ACCEPT_REDIRECTS] = cnf->accept_redirects;
- array[DEVCONF_AUTOCONF] = cnf->autoconf;
- array[DEVCONF_DAD_TRANSMITS] = cnf->dad_transmits;
- array[DEVCONF_RTR_SOLICITS] = cnf->rtr_solicits;
+ array[DEVCONF_FORWARDING] = READ_ONCE(cnf->forwarding);
+ array[DEVCONF_HOPLIMIT] = READ_ONCE(cnf->hop_limit);
+ array[DEVCONF_MTU6] = READ_ONCE(cnf->mtu6);
+ array[DEVCONF_ACCEPT_RA] = READ_ONCE(cnf->accept_ra);
+ array[DEVCONF_ACCEPT_REDIRECTS] = READ_ONCE(cnf->accept_redirects);
+ array[DEVCONF_AUTOCONF] = READ_ONCE(cnf->autoconf);
+ array[DEVCONF_DAD_TRANSMITS] = READ_ONCE(cnf->dad_transmits);
+ array[DEVCONF_RTR_SOLICITS] = READ_ONCE(cnf->rtr_solicits);
array[DEVCONF_RTR_SOLICIT_INTERVAL] =
- jiffies_to_msecs(cnf->rtr_solicit_interval);
+ jiffies_to_msecs(READ_ONCE(cnf->rtr_solicit_interval));
array[DEVCONF_RTR_SOLICIT_MAX_INTERVAL] =
- jiffies_to_msecs(cnf->rtr_solicit_max_interval);
+ jiffies_to_msecs(READ_ONCE(cnf->rtr_solicit_max_interval));
array[DEVCONF_RTR_SOLICIT_DELAY] =
- jiffies_to_msecs(cnf->rtr_solicit_delay);
- array[DEVCONF_FORCE_MLD_VERSION] = cnf->force_mld_version;
+ jiffies_to_msecs(READ_ONCE(cnf->rtr_solicit_delay));
+ array[DEVCONF_FORCE_MLD_VERSION] = READ_ONCE(cnf->force_mld_version);
array[DEVCONF_MLDV1_UNSOLICITED_REPORT_INTERVAL] =
- jiffies_to_msecs(cnf->mldv1_unsolicited_report_interval);
+ jiffies_to_msecs(READ_ONCE(cnf->mldv1_unsolicited_report_interval));
array[DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL] =
- jiffies_to_msecs(cnf->mldv2_unsolicited_report_interval);
- array[DEVCONF_USE_TEMPADDR] = cnf->use_tempaddr;
- array[DEVCONF_TEMP_VALID_LFT] = cnf->temp_valid_lft;
- array[DEVCONF_TEMP_PREFERED_LFT] = cnf->temp_prefered_lft;
- array[DEVCONF_REGEN_MAX_RETRY] = cnf->regen_max_retry;
- array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor;
- array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses;
- array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr;
- array[DEVCONF_RA_DEFRTR_METRIC] = cnf->ra_defrtr_metric;
- array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = cnf->accept_ra_min_hop_limit;
- array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo;
+ jiffies_to_msecs(READ_ONCE(cnf->mldv2_unsolicited_report_interval));
+ array[DEVCONF_USE_TEMPADDR] = READ_ONCE(cnf->use_tempaddr);
+ array[DEVCONF_TEMP_VALID_LFT] = READ_ONCE(cnf->temp_valid_lft);
+ array[DEVCONF_TEMP_PREFERED_LFT] = READ_ONCE(cnf->temp_prefered_lft);
+ array[DEVCONF_REGEN_MAX_RETRY] = READ_ONCE(cnf->regen_max_retry);
+ array[DEVCONF_MAX_DESYNC_FACTOR] = READ_ONCE(cnf->max_desync_factor);
+ array[DEVCONF_MAX_ADDRESSES] = READ_ONCE(cnf->max_addresses);
+ array[DEVCONF_ACCEPT_RA_DEFRTR] = READ_ONCE(cnf->accept_ra_defrtr);
+ array[DEVCONF_RA_DEFRTR_METRIC] = READ_ONCE(cnf->ra_defrtr_metric);
+ array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] =
+ READ_ONCE(cnf->accept_ra_min_hop_limit);
+ array[DEVCONF_ACCEPT_RA_PINFO] = READ_ONCE(cnf->accept_ra_pinfo);
#ifdef CONFIG_IPV6_ROUTER_PREF
- array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref;
+ array[DEVCONF_ACCEPT_RA_RTR_PREF] = READ_ONCE(cnf->accept_ra_rtr_pref);
array[DEVCONF_RTR_PROBE_INTERVAL] =
- jiffies_to_msecs(cnf->rtr_probe_interval);
+ jiffies_to_msecs(READ_ONCE(cnf->rtr_probe_interval));
#ifdef CONFIG_IPV6_ROUTE_INFO
- array[DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN] = cnf->accept_ra_rt_info_min_plen;
- array[DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN] = cnf->accept_ra_rt_info_max_plen;
+ array[DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN] =
+ READ_ONCE(cnf->accept_ra_rt_info_min_plen);
+ array[DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN] =
+ READ_ONCE(cnf->accept_ra_rt_info_max_plen);
#endif
#endif
- array[DEVCONF_PROXY_NDP] = cnf->proxy_ndp;
- array[DEVCONF_ACCEPT_SOURCE_ROUTE] = cnf->accept_source_route;
+ array[DEVCONF_PROXY_NDP] = READ_ONCE(cnf->proxy_ndp);
+ array[DEVCONF_ACCEPT_SOURCE_ROUTE] =
+ READ_ONCE(cnf->accept_source_route);
#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
- array[DEVCONF_OPTIMISTIC_DAD] = cnf->optimistic_dad;
- array[DEVCONF_USE_OPTIMISTIC] = cnf->use_optimistic;
+ array[DEVCONF_OPTIMISTIC_DAD] = READ_ONCE(cnf->optimistic_dad);
+ array[DEVCONF_USE_OPTIMISTIC] = READ_ONCE(cnf->use_optimistic);
#endif
#ifdef CONFIG_IPV6_MROUTE
array[DEVCONF_MC_FORWARDING] = atomic_read(&cnf->mc_forwarding);
#endif
- array[DEVCONF_DISABLE_IPV6] = cnf->disable_ipv6;
- array[DEVCONF_ACCEPT_DAD] = cnf->accept_dad;
- array[DEVCONF_FORCE_TLLAO] = cnf->force_tllao;
- array[DEVCONF_NDISC_NOTIFY] = cnf->ndisc_notify;
- array[DEVCONF_SUPPRESS_FRAG_NDISC] = cnf->suppress_frag_ndisc;
- array[DEVCONF_ACCEPT_RA_FROM_LOCAL] = cnf->accept_ra_from_local;
- array[DEVCONF_ACCEPT_RA_MTU] = cnf->accept_ra_mtu;
- array[DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN] = cnf->ignore_routes_with_linkdown;
+ array[DEVCONF_DISABLE_IPV6] = READ_ONCE(cnf->disable_ipv6);
+ array[DEVCONF_ACCEPT_DAD] = READ_ONCE(cnf->accept_dad);
+ array[DEVCONF_FORCE_TLLAO] = READ_ONCE(cnf->force_tllao);
+ array[DEVCONF_NDISC_NOTIFY] = READ_ONCE(cnf->ndisc_notify);
+ array[DEVCONF_SUPPRESS_FRAG_NDISC] =
+ READ_ONCE(cnf->suppress_frag_ndisc);
+ array[DEVCONF_ACCEPT_RA_FROM_LOCAL] =
+ READ_ONCE(cnf->accept_ra_from_local);
+ array[DEVCONF_ACCEPT_RA_MTU] = READ_ONCE(cnf->accept_ra_mtu);
+ array[DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN] =
+ READ_ONCE(cnf->ignore_routes_with_linkdown);
/* we omit DEVCONF_STABLE_SECRET for now */
- array[DEVCONF_USE_OIF_ADDRS_ONLY] = cnf->use_oif_addrs_only;
- array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] = cnf->drop_unicast_in_l2_multicast;
- array[DEVCONF_DROP_UNSOLICITED_NA] = cnf->drop_unsolicited_na;
- array[DEVCONF_KEEP_ADDR_ON_DOWN] = cnf->keep_addr_on_down;
- array[DEVCONF_SEG6_ENABLED] = cnf->seg6_enabled;
+ array[DEVCONF_USE_OIF_ADDRS_ONLY] = READ_ONCE(cnf->use_oif_addrs_only);
+ array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] =
+ READ_ONCE(cnf->drop_unicast_in_l2_multicast);
+ array[DEVCONF_DROP_UNSOLICITED_NA] = READ_ONCE(cnf->drop_unsolicited_na);
+ array[DEVCONF_KEEP_ADDR_ON_DOWN] = READ_ONCE(cnf->keep_addr_on_down);
+ array[DEVCONF_SEG6_ENABLED] = READ_ONCE(cnf->seg6_enabled);
#ifdef CONFIG_IPV6_SEG6_HMAC
- array[DEVCONF_SEG6_REQUIRE_HMAC] = cnf->seg6_require_hmac;
+ array[DEVCONF_SEG6_REQUIRE_HMAC] = READ_ONCE(cnf->seg6_require_hmac);
#endif
- array[DEVCONF_ENHANCED_DAD] = cnf->enhanced_dad;
- array[DEVCONF_ADDR_GEN_MODE] = cnf->addr_gen_mode;
- array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
- array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
- array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
- array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
- array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
- array[DEVCONF_IOAM6_ID_WIDE] = cnf->ioam6_id_wide;
- array[DEVCONF_NDISC_EVICT_NOCARRIER] = cnf->ndisc_evict_nocarrier;
- array[DEVCONF_ACCEPT_UNTRACKED_NA] = cnf->accept_untracked_na;
- array[DEVCONF_ACCEPT_RA_MIN_LFT] = cnf->accept_ra_min_lft;
+ array[DEVCONF_ENHANCED_DAD] = READ_ONCE(cnf->enhanced_dad);
+ array[DEVCONF_ADDR_GEN_MODE] = READ_ONCE(cnf->addr_gen_mode);
+ array[DEVCONF_DISABLE_POLICY] = READ_ONCE(cnf->disable_policy);
+ array[DEVCONF_NDISC_TCLASS] = READ_ONCE(cnf->ndisc_tclass);
+ array[DEVCONF_RPL_SEG_ENABLED] = READ_ONCE(cnf->rpl_seg_enabled);
+ array[DEVCONF_IOAM6_ENABLED] = READ_ONCE(cnf->ioam6_enabled);
+ array[DEVCONF_IOAM6_ID] = READ_ONCE(cnf->ioam6_id);
+ array[DEVCONF_IOAM6_ID_WIDE] = READ_ONCE(cnf->ioam6_id_wide);
+ array[DEVCONF_NDISC_EVICT_NOCARRIER] =
+ READ_ONCE(cnf->ndisc_evict_nocarrier);
+ array[DEVCONF_ACCEPT_UNTRACKED_NA] =
+ READ_ONCE(cnf->accept_untracked_na);
+ array[DEVCONF_ACCEPT_RA_MIN_LFT] = READ_ONCE(cnf->accept_ra_min_lft);
}
static inline size_t inet6_ifla6_size(void)
@@ -5779,13 +5790,14 @@ static void snmp6_fill_stats(u64 *stats, struct inet6_dev *idev, int attrtype,
static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev,
u32 ext_filter_mask)
{
- struct nlattr *nla;
struct ifla_cacheinfo ci;
+ struct nlattr *nla;
+ u32 ra_mtu;
- if (nla_put_u32(skb, IFLA_INET6_FLAGS, idev->if_flags))
+ if (nla_put_u32(skb, IFLA_INET6_FLAGS, READ_ONCE(idev->if_flags)))
goto nla_put_failure;
ci.max_reasm_len = IPV6_MAXPLEN;
- ci.tstamp = cstamp_delta(idev->tstamp);
+ ci.tstamp = cstamp_delta(READ_ONCE(idev->tstamp));
ci.reachable_time = jiffies_to_msecs(idev->nd_parms->reachable_time);
ci.retrans_time = jiffies_to_msecs(NEIGH_VAR(idev->nd_parms, RETRANS_TIME));
if (nla_put(skb, IFLA_INET6_CACHEINFO, sizeof(ci), &ci))
@@ -5817,11 +5829,12 @@ static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev,
memcpy(nla_data(nla), idev->token.s6_addr, nla_len(nla));
read_unlock_bh(&idev->lock);
- if (nla_put_u8(skb, IFLA_INET6_ADDR_GEN_MODE, idev->cnf.addr_gen_mode))
+ if (nla_put_u8(skb, IFLA_INET6_ADDR_GEN_MODE,
+ READ_ONCE(idev->cnf.addr_gen_mode)))
goto nla_put_failure;
- if (idev->ra_mtu &&
- nla_put_u32(skb, IFLA_INET6_RA_MTU, idev->ra_mtu))
+ ra_mtu = READ_ONCE(idev->ra_mtu);
+ if (ra_mtu && nla_put_u32(skb, IFLA_INET6_RA_MTU, ra_mtu))
goto nla_put_failure;
return 0;
@@ -6022,7 +6035,7 @@ static int inet6_set_link_af(struct net_device *dev, const struct nlattr *nla,
if (tb[IFLA_INET6_ADDR_GEN_MODE]) {
u8 mode = nla_get_u8(tb[IFLA_INET6_ADDR_GEN_MODE]);
- idev->cnf.addr_gen_mode = mode;
+ WRITE_ONCE(idev->cnf.addr_gen_mode, mode);
}
return 0;
@@ -6501,7 +6514,7 @@ static int addrconf_sysctl_addr_gen_mode(struct ctl_table *ctl, int write,
}
if (idev->cnf.addr_gen_mode != new_val) {
- idev->cnf.addr_gen_mode = new_val;
+ WRITE_ONCE(idev->cnf.addr_gen_mode, new_val);
addrconf_init_auto_addrs(idev->dev);
}
} else if (&net->ipv6.devconf_all->addr_gen_mode == ctl->data) {
@@ -6512,7 +6525,8 @@ static int addrconf_sysctl_addr_gen_mode(struct ctl_table *ctl, int write,
idev = __in6_dev_get(dev);
if (idev &&
idev->cnf.addr_gen_mode != new_val) {
- idev->cnf.addr_gen_mode = new_val;
+ WRITE_ONCE(idev->cnf.addr_gen_mode,
+ new_val);
addrconf_init_auto_addrs(idev->dev);
}
}
@@ -6577,14 +6591,15 @@ static int addrconf_sysctl_stable_secret(struct ctl_table *ctl, int write,
struct inet6_dev *idev = __in6_dev_get(dev);
if (idev) {
- idev->cnf.addr_gen_mode =
- IN6_ADDR_GEN_MODE_STABLE_PRIVACY;
+ WRITE_ONCE(idev->cnf.addr_gen_mode,
+ IN6_ADDR_GEN_MODE_STABLE_PRIVACY);
}
}
} else {
struct inet6_dev *idev = ctl->extra1;
- idev->cnf.addr_gen_mode = IN6_ADDR_GEN_MODE_STABLE_PRIVACY;
+ WRITE_ONCE(idev->cnf.addr_gen_mode,
+ IN6_ADDR_GEN_MODE_STABLE_PRIVACY);
}
out:
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 73cb31afe93542285e3f11b7140d2cc1619006e7..8523f0595b01899a9f6cf82809c1b4bcfc233202 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1975,7 +1975,7 @@ int ndisc_ifinfo_sysctl_change(struct ctl_table *ctl, int write, void *buffer,
if (ctl->data == &NEIGH_VAR(idev->nd_parms, BASE_REACHABLE_TIME))
idev->nd_parms->reachable_time =
neigh_rand_reach_time(NEIGH_VAR(idev->nd_parms, BASE_REACHABLE_TIME));
- idev->tstamp = jiffies;
+ WRITE_ONCE(idev->tstamp, jiffies);
inet6_ifinfo_notify(RTM_NEWLINK, idev);
in6_dev_put(idev);
}
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 03/13] ipv6: prepare inet6_fill_ifinfo() for RCU protection
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 01/13] rtnetlink: prepare nla_put_iflink() to run under RCU Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 02/13] ipv6: prepare inet6_fill_ifla6_attrs() for RCU Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 04/13] ipv6: use xarray iterator to implement inet6_dump_ifinfo() Eric Dumazet
` (9 subsequent siblings)
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
We want to use RCU protection instead of RTNL
for inet6_fill_ifinfo().
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/netdevice.h | 6 ++++--
net/core/dev.c | 4 ++--
net/ipv6/addrconf.c | 11 +++++++----
3 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f07c8374f29cb936fe11236fc63e06e741b1c965..09023e44db4e2c3a2133afc52ba5a335d6030646 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4354,8 +4354,10 @@ static inline bool netif_testing(const struct net_device *dev)
*/
static inline bool netif_oper_up(const struct net_device *dev)
{
- return (dev->operstate == IF_OPER_UP ||
- dev->operstate == IF_OPER_UNKNOWN /* backward compat */);
+ unsigned int operstate = READ_ONCE(dev->operstate);
+
+ return operstate == IF_OPER_UP ||
+ operstate == IF_OPER_UNKNOWN /* backward compat */;
}
/**
diff --git a/net/core/dev.c b/net/core/dev.c
index 0628d8ff1ed932efdd45ab7b79599dcfcca6c4eb..275fd5259a4a92d0bd2e145d66a716248b6c2804 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8632,12 +8632,12 @@ unsigned int dev_get_flags(const struct net_device *dev)
{
unsigned int flags;
- flags = (dev->flags & ~(IFF_PROMISC |
+ flags = (READ_ONCE(dev->flags) & ~(IFF_PROMISC |
IFF_ALLMULTI |
IFF_RUNNING |
IFF_LOWER_UP |
IFF_DORMANT)) |
- (dev->gflags & (IFF_PROMISC |
+ (READ_ONCE(dev->gflags) & (IFF_PROMISC |
IFF_ALLMULTI));
if (netif_running(dev)) {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3c8bdad0105dc9542489b612890ba86de9c44bdc..df3c6feea74e2d95144140eceb6df5cef2dce1f4 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6047,6 +6047,7 @@ static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev,
struct net_device *dev = idev->dev;
struct ifinfomsg *hdr;
struct nlmsghdr *nlh;
+ int ifindex, iflink;
void *protoinfo;
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*hdr), flags);
@@ -6057,16 +6058,18 @@ static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev,
hdr->ifi_family = AF_INET6;
hdr->__ifi_pad = 0;
hdr->ifi_type = dev->type;
- hdr->ifi_index = dev->ifindex;
+ ifindex = READ_ONCE(dev->ifindex);
+ hdr->ifi_index = ifindex;
hdr->ifi_flags = dev_get_flags(dev);
hdr->ifi_change = 0;
+ iflink = dev_get_iflink(dev);
if (nla_put_string(skb, IFLA_IFNAME, dev->name) ||
(dev->addr_len &&
nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr)) ||
- nla_put_u32(skb, IFLA_MTU, dev->mtu) ||
- (dev->ifindex != dev_get_iflink(dev) &&
- nla_put_u32(skb, IFLA_LINK, dev_get_iflink(dev))) ||
+ nla_put_u32(skb, IFLA_MTU, READ_ONCE(dev->mtu)) ||
+ (ifindex != iflink &&
+ nla_put_u32(skb, IFLA_LINK, iflink)) ||
nla_put_u8(skb, IFLA_OPERSTATE,
netif_running(dev) ? READ_ONCE(dev->operstate) : IF_OPER_DOWN))
goto nla_put_failure;
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 04/13] ipv6: use xarray iterator to implement inet6_dump_ifinfo()
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (2 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 03/13] ipv6: prepare inet6_fill_ifinfo() for RCU protection Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 18:39 ` Ido Schimmel
2024-02-21 10:59 ` [PATCH net-next 05/13] netlink: fix netlink_diag_dump() return value Eric Dumazet
` (8 subsequent siblings)
12 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet, Ido Schimmel
Prepare inet6_dump_ifinfo() to run with RCU protection
instead of RTNL and use for_each_netdev_dump() interface.
Also properly return 0 at the end of a dump, avoiding
an extra recvmsg() system call and RTNL acquisition.
Note that RTNL-less dumps need core changes, yet to come.
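
The return convention described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct fake_skb, struct fake_ctx, fake_fill_one() and fake_dump() are hypothetical, and a plain counter stands in for the xarray walk done by for_each_netdev_dump(). The callback returns a positive length when the buffer filled up and it must be called again, a negative errno when not even one entry fit, and 0 when the walk finished.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: bytes used and total capacity. */
struct fake_skb {
	size_t len;
	size_t cap;
};

/* Resume cursor kept across calls, like the ifindex stored in cb->ctx. */
struct fake_ctx {
	unsigned long next;
};

/* Append one fixed-size entry, or fail when it does not fit. */
static int fake_fill_one(struct fake_skb *skb, size_t entry_size)
{
	if (skb->len + entry_size > skb->cap)
		return -EMSGSIZE;
	skb->len += entry_size;
	return 0;
}

/*
 * Dump entries [ctx->next, nr_entries). The cursor is not advanced past an
 * entry that failed to fit, so that entry is retried on the next call.
 */
static int fake_dump(struct fake_skb *skb, struct fake_ctx *ctx,
		     unsigned long nr_entries, size_t entry_size)
{
	int err = 0;

	assert(entry_size > 0);
	for (; ctx->next < nr_entries; ctx->next++) {
		err = fake_fill_one(skb, entry_size);
		if (err < 0) {
			if (skb->len)		/* partial progress: call again */
				err = (int)skb->len;
			break;
		}
	}
	return err;
}
```

With the old "return skb->len" convention, the final call would still return a positive value, and the caller would need one more round trip just to learn that nothing is left.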
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ido Schimmel <idosch@nvidia.com>
---
net/ipv6/addrconf.c | 46 +++++++++++++++++++--------------------------
1 file changed, 19 insertions(+), 27 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index df3c6feea74e2d95144140eceb6df5cef2dce1f4..8994ddc6c859e6bc68303e6e61663baf330aee00 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6117,50 +6117,42 @@ static int inet6_valid_dump_ifinfo(const struct nlmsghdr *nlh,
static int inet6_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
{
struct net *net = sock_net(skb->sk);
- int h, s_h;
- int idx = 0, s_idx;
+ struct {
+ unsigned long ifindex;
+ } *ctx = (void *)cb->ctx;
struct net_device *dev;
struct inet6_dev *idev;
- struct hlist_head *head;
+ int err;
/* only requests using strict checking can pass data to
* influence the dump
*/
if (cb->strict_check) {
- int err = inet6_valid_dump_ifinfo(cb->nlh, cb->extack);
+ err = inet6_valid_dump_ifinfo(cb->nlh, cb->extack);
if (err < 0)
return err;
}
- s_h = cb->args[0];
- s_idx = cb->args[1];
-
+ err = 0;
rcu_read_lock();
- for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
- idx = 0;
- head = &net->dev_index_head[h];
- hlist_for_each_entry_rcu(dev, head, index_hlist) {
- if (idx < s_idx)
- goto cont;
- idev = __in6_dev_get(dev);
- if (!idev)
- goto cont;
- if (inet6_fill_ifinfo(skb, idev,
- NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq,
- RTM_NEWLINK, NLM_F_MULTI) < 0)
- goto out;
-cont:
- idx++;
+ for_each_netdev_dump(net, dev, ctx->ifindex) {
+ idev = __in6_dev_get(dev);
+ if (!idev)
+ continue;
+ err = inet6_fill_ifinfo(skb, idev,
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq,
+ RTM_NEWLINK, NLM_F_MULTI);
+ if (err < 0) {
+ if (likely(skb->len))
+ err = skb->len;
+ break;
}
}
-out:
rcu_read_unlock();
- cb->args[1] = idx;
- cb->args[0] = h;
- return skb->len;
+ return err;
}
void inet6_ifinfo_notify(int event, struct inet6_dev *idev)
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 05/13] netlink: fix netlink_diag_dump() return value
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (3 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 04/13] ipv6: use xarray iterator to implement inet6_dump_ifinfo() Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 06/13] netlink: hold nlk->cb_mutex longer in __netlink_dump_start() Eric Dumazet
` (7 subsequent siblings)
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
__netlink_diag_dump() returns 1 if the dump is not complete,
zero if no error occurred.
If err variable is zero, this means the dump is complete:
We should not return skb->len in this case, but 0.
This allows NLMSG_DONE to be appended to the skb.
User space does not have to call us again only to get NLMSG_DONE.
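
[Editor's sketch, not part of the patch.] The effect of the one-character change can be checked with a small user-space model. The helper names below are hypothetical; toy_done_can_be_appended() only mirrors the shape of the test in netlink_dump() (a positive callback result, or too little tailroom, means the skb goes out without NLMSG_DONE), and the sizes used are arbitrary.

```c
/* Return expression before the patch: a finished dump (err == 0)
 * still reports skb_len, which the core reads as "more to come". */
static int toy_ret_old(int err, int skb_len)
{
    return err < 0 ? err : skb_len;
}

/* Return expression after the patch: 0 is passed through unchanged. */
static int toy_ret_new(int err, int skb_len)
{
    return err <= 0 ? err : skb_len;
}

/* Models the core decision: NLMSG_DONE fits in the current skb only if
 * the callback did not ask to be called again and there is room left. */
static int toy_done_can_be_appended(int dump_ret, int tailroom, int done_size)
{
    return !(dump_ret > 0 || tailroom < done_size);
}
```

With err == 0 and a non-empty skb, the old expression yields a positive value, so user space needs one more recvmsg() just to fetch NLMSG_DONE. The new expression yields 0 and the marker can go out in the same skb when room allows.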
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/netlink/diag.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index e12c90d5f6ad29446ea1990c88c19bcb0ee856c3..61981e01fd6ff189dcb46a06a4d265cf6029b840 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -207,7 +207,7 @@ static int netlink_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
err = __netlink_diag_dump(skb, cb, req->sdiag_protocol, s_num);
}
- return err < 0 ? err : skb->len;
+ return err <= 0 ? err : skb->len;
}
static int netlink_diag_dump_done(struct netlink_callback *cb)
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 06/13] netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (4 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 05/13] netlink: fix netlink_diag_dump() return value Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 07/13] rtnetlink: change nlk->cb_mutex role Eric Dumazet
` (6 subsequent siblings)
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
__netlink_dump_start() releases nlk->cb_mutex right before
calling netlink_dump() which grabs it again.
This seems dangerous, even if KASAN has not complained yet.
Add a @lock_taken parameter to netlink_dump() to let it
grab the mutex if called from netlink_recvmsg() only.
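
[Editor's sketch, not part of the patch.] The lock hand-over can be modeled in user space as follows. All names are hypothetical, a counter-based toy lock replaces the real mutex, and the dump body is reduced to a single check. The point is only the control flow: the start path keeps the lock and passes lock_taken = 1, while the recvmsg-like path passes 0 and lets the dump function lock for itself.

```c
/* Toy lock that records how often it was taken. */
struct toy_lock {
    int held;
    int lock_calls;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    l->held = 1;
    l->lock_calls++;
}

static void toy_lock_release(struct toy_lock *l)
{
    l->held = 0;
}

struct toy_sock {
    struct toy_lock cb_lock;    /* models nlk->cb_mutex */
    int cb_running;             /* models nlk->cb_running */
};

/* Models netlink_dump(): takes the lock only if the caller has not.
 * Always releases it before returning. */
static int toy_netlink_dump(struct toy_sock *sk, int lock_taken)
{
    int ret = 0;

    if (!lock_taken)
        toy_lock_acquire(&sk->cb_lock);
    if (!sk->cb_running)
        ret = -1;               /* no dump in progress */
    toy_lock_release(&sk->cb_lock);
    return ret;
}

/* Models __netlink_dump_start(): no unlock/relock window between
 * setting cb_running and running the first dump pass. */
static int toy_dump_start(struct toy_sock *sk)
{
    toy_lock_acquire(&sk->cb_lock);
    if (sk->cb_running) {
        toy_lock_release(&sk->cb_lock);
        return -2;              /* a dump is already in progress */
    }
    sk->cb_running = 1;
    return toy_netlink_dump(sk, 1);
}
```

In this model the start path takes the lock exactly once, so no other thread can observe cb_running set while the lock is free.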
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/netlink/af_netlink.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 9c962347cf859f16fc76e4d8a2fd22cdb3d142d6..94f3860526bfaa5793e8b3917250ec0e751687b5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -130,7 +130,7 @@ static const char *const nlk_cb_mutex_key_strings[MAX_LINKS + 1] = {
"nlk_cb_mutex-MAX_LINKS"
};
-static int netlink_dump(struct sock *sk);
+static int netlink_dump(struct sock *sk, bool lock_taken);
/* nl_table locking explained:
* Lookup and traversal are protected with an RCU read-side lock. Insertion
@@ -1987,7 +1987,7 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
if (READ_ONCE(nlk->cb_running) &&
atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
- ret = netlink_dump(sk);
+ ret = netlink_dump(sk, false);
if (ret) {
WRITE_ONCE(sk->sk_err, -ret);
sk_error_report(sk);
@@ -2196,7 +2196,7 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
return 0;
}
-static int netlink_dump(struct sock *sk)
+static int netlink_dump(struct sock *sk, bool lock_taken)
{
struct netlink_sock *nlk = nlk_sk(sk);
struct netlink_ext_ack extack = {};
@@ -2208,7 +2208,8 @@ static int netlink_dump(struct sock *sk)
int alloc_min_size;
int alloc_size;
- mutex_lock(nlk->cb_mutex);
+ if (!lock_taken)
+ mutex_lock(nlk->cb_mutex);
if (!nlk->cb_running) {
err = -EINVAL;
goto errout_skb;
@@ -2365,9 +2366,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
WRITE_ONCE(nlk->cb_running, true);
nlk->dump_done_errno = INT_MAX;
- mutex_unlock(nlk->cb_mutex);
-
- ret = netlink_dump(sk);
+ ret = netlink_dump(sk, true);
sock_put(sk);
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 07/13] rtnetlink: change nlk->cb_mutex role
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (5 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 06/13] netlink: hold nlk->cb_mutex longer in __netlink_dump_start() Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 08/13] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag Eric Dumazet
` (5 subsequent siblings)
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it"), Patrick McHardy used
a common mutex to protect both nlk->cb and the dump() operations.
The override is used for rtnl dumps, registered with
rtnl_register() and rtnl_register_module().
We want some dump() operations to be able to opt out of
acquiring RTNL, so we need to protect nlk->cb
with a per-socket mutex.
This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex.
The optional pointer to the mutex used to protect the dump()
call is stored in nlk->dump_cb_mutex.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/netlink/af_netlink.c | 32 ++++++++++++++++++--------------
net/netlink/af_netlink.h | 5 +++--
2 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 94f3860526bfaa5793e8b3917250ec0e751687b5..84cad7be6d4335bfb5301ef49f84af8e7b3bc842 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -636,7 +636,7 @@ static struct proto netlink_proto = {
};
static int __netlink_create(struct net *net, struct socket *sock,
- struct mutex *cb_mutex, int protocol,
+ struct mutex *dump_cb_mutex, int protocol,
int kern)
{
struct sock *sk;
@@ -651,15 +651,11 @@ static int __netlink_create(struct net *net, struct socket *sock,
sock_init_data(sock, sk);
nlk = nlk_sk(sk);
- if (cb_mutex) {
- nlk->cb_mutex = cb_mutex;
- } else {
- nlk->cb_mutex = &nlk->cb_def_mutex;
- mutex_init(nlk->cb_mutex);
- lockdep_set_class_and_name(nlk->cb_mutex,
+ mutex_init(&nlk->nl_cb_mutex);
+ lockdep_set_class_and_name(&nlk->nl_cb_mutex,
nlk_cb_mutex_keys + protocol,
nlk_cb_mutex_key_strings[protocol]);
- }
+ nlk->dump_cb_mutex = dump_cb_mutex;
init_waitqueue_head(&nlk->wait);
sk->sk_destruct = netlink_sock_destruct;
@@ -2209,7 +2205,7 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
int alloc_size;
if (!lock_taken)
- mutex_lock(nlk->cb_mutex);
+ mutex_lock(&nlk->nl_cb_mutex);
if (!nlk->cb_running) {
err = -EINVAL;
goto errout_skb;
@@ -2261,14 +2257,22 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
netlink_skb_set_owner_r(skb, sk);
if (nlk->dump_done_errno > 0) {
+ struct mutex *extra_mutex = nlk->dump_cb_mutex;
+
cb->extack = &extack;
+
+ if (extra_mutex)
+ mutex_lock(extra_mutex);
nlk->dump_done_errno = cb->dump(skb, cb);
+ if (extra_mutex)
+ mutex_unlock(extra_mutex);
+
cb->extack = NULL;
}
if (nlk->dump_done_errno > 0 ||
skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
- mutex_unlock(nlk->cb_mutex);
+ mutex_unlock(&nlk->nl_cb_mutex);
if (sk_filter(sk, skb))
kfree_skb(skb);
@@ -2302,13 +2306,13 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
WRITE_ONCE(nlk->cb_running, false);
module = cb->module;
skb = cb->skb;
- mutex_unlock(nlk->cb_mutex);
+ mutex_unlock(&nlk->nl_cb_mutex);
module_put(module);
consume_skb(skb);
return 0;
errout_skb:
- mutex_unlock(nlk->cb_mutex);
+ mutex_unlock(&nlk->nl_cb_mutex);
kfree_skb(skb);
return err;
}
@@ -2331,7 +2335,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
}
nlk = nlk_sk(sk);
- mutex_lock(nlk->cb_mutex);
+ mutex_lock(&nlk->nl_cb_mutex);
/* A dump is in progress... */
if (nlk->cb_running) {
ret = -EBUSY;
@@ -2382,7 +2386,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
module_put(control->module);
error_unlock:
sock_put(sk);
- mutex_unlock(nlk->cb_mutex);
+ mutex_unlock(&nlk->nl_cb_mutex);
error_free:
kfree_skb(skb);
return ret;
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 2145979b9986a0331b34b6ba2fda867f23d0d71c..9751e29d4bbb9ad9cb7900e2cfaedbe7ab138cf4 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -39,8 +39,9 @@ struct netlink_sock {
bool cb_running;
int dump_done_errno;
struct netlink_callback cb;
- struct mutex *cb_mutex;
- struct mutex cb_def_mutex;
+ struct mutex nl_cb_mutex;
+
+ struct mutex *dump_cb_mutex;
void (*netlink_rcv)(struct sk_buff *skb);
int (*netlink_bind)(struct net *net, int group);
void (*netlink_unbind)(struct net *net, int group);
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 08/13] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (6 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 07/13] rtnetlink: change nlk->cb_mutex role Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 09/13] ipv6: switch inet6_dump_ifinfo() to RCU protection Eric Dumazet
` (4 subsequent siblings)
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag
allows dump operations registered via rtnl_register()
or rtnl_register_module() to opt out of RTNL protection.
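
[Editor's sketch, not part of the patch.] The opt-out can be pictured with a user-space model. Everything here is hypothetical except the idea taken from the diff below: the optional dump mutex is skipped when the callback carries the unlocked flag. TOY_FLAG_DUMP_UNLOCKED copies the BIT(2) value only for illustration.

```c
#include <stddef.h>

#define TOY_FLAG_DUMP_UNLOCKED (1u << 2)   /* mirrors BIT(2) in the patch */

/* Toy mutex: just remembers whether it is currently held. */
struct toy_mutex {
    int held;
};

/* Models the locking around cb->dump(): take the optional mutex unless
 * the registered flags ask for an unlocked dump. */
static int toy_call_dump(struct toy_mutex *dump_cb_mutex,
                         unsigned int cb_flags,
                         int (*dump)(void *arg), void *arg)
{
    struct toy_mutex *extra = dump_cb_mutex;
    int ret;

    if (cb_flags & TOY_FLAG_DUMP_UNLOCKED)
        extra = NULL;
    if (extra)
        extra->held = 1;
    ret = dump(arg);
    if (extra)
        extra->held = 0;
    return ret;
}

/* Hypothetical dump callback: reports whether the mutex passed as arg
 * was held while the callback ran. */
static int toy_report_held(void *arg)
{
    return ((struct toy_mutex *)arg)->held;
}
```

A callback registered without the flag sees the mutex held, exactly as before. One registered with the flag runs with no extra mutex and has to rely on its own protection, for example RCU.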
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/netlink.h | 2 ++
include/net/rtnetlink.h | 1 +
net/core/rtnetlink.c | 2 ++
net/netlink/af_netlink.c | 3 +++
4 files changed, 8 insertions(+)
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 1a4445bf2ab9acff630b3712453c8a6cdf8fc47c..5df7340d4dabc0c0b1728dafde43b5522dacd024 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -291,6 +291,7 @@ struct netlink_callback {
u16 answer_flags;
u32 min_dump_alloc;
unsigned int prev_seq, seq;
+ int flags;
bool strict_check;
union {
u8 ctx[48];
@@ -323,6 +324,7 @@ struct netlink_dump_control {
void *data;
struct module *module;
u32 min_dump_alloc;
+ int flags;
};
int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 6506221c5fe31f49ccaca470e0b24dffb703c28e..3bfb80bad1739d244a3906fa7f0e1a606dfaf868 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -12,6 +12,7 @@ typedef int (*rtnl_dumpit_func)(struct sk_buff *, struct netlink_callback *);
enum rtnl_link_flags {
RTNL_FLAG_DOIT_UNLOCKED = BIT(0),
RTNL_FLAG_BULK_DEL_SUPPORTED = BIT(1),
+ RTNL_FLAG_DUMP_UNLOCKED = BIT(2),
};
enum rtnl_kinds {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 060543fe7919c13c7a5c6cf22f9e7606d0897345..1b26dfa5668d22fb2e30ceefbf143e98df13ae29 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -6532,6 +6532,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
}
owner = link->owner;
dumpit = link->dumpit;
+ flags = link->flags;
if (type == RTM_GETLINK - RTM_BASE)
min_dump_alloc = rtnl_calcit(skb, nlh);
@@ -6549,6 +6550,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
.dump = dumpit,
.min_dump_alloc = min_dump_alloc,
.module = owner,
+ .flags = flags,
};
err = netlink_dump_start(rtnl, skb, nlh, &c);
/* netlink_dump_start() will keep a reference on
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 84cad7be6d4335bfb5301ef49f84af8e7b3bc842..be5792b638aa563232cdb96de8c97c4fe45b3718 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2261,6 +2261,8 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
cb->extack = &extack;
+ if (cb->flags & RTNL_FLAG_DUMP_UNLOCKED)
+ extra_mutex = NULL;
if (extra_mutex)
mutex_lock(extra_mutex);
nlk->dump_done_errno = cb->dump(skb, cb);
@@ -2355,6 +2357,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
cb->data = control->data;
cb->module = control->module;
cb->min_dump_alloc = control->min_dump_alloc;
+ cb->flags = control->flags;
cb->skb = skb;
cb->strict_check = nlk_test_bit(STRICT_CHK, NETLINK_CB(skb).sk);
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 09/13] ipv6: switch inet6_dump_ifinfo() to RCU protection
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (7 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 08/13] rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 10/13] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU Eric Dumazet
` (3 subsequent siblings)
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
No longer hold RTNL while calling inet6_dump_ifinfo().
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv6/addrconf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8994ddc6c859e6bc68303e6e61663baf330aee00..244b670a44b92f10b8f18c444d72a2467f8ed90a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -7447,7 +7447,7 @@ int __init addrconf_init(void)
rtnl_af_register(&inet6_ops);
err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETLINK,
- NULL, inet6_dump_ifinfo, 0);
+ NULL, inet6_dump_ifinfo, RTNL_FLAG_DUMP_UNLOCKED);
if (err < 0)
goto errout;
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 10/13] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (8 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 09/13] ipv6: switch inet6_dump_ifinfo() to RCU protection Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 11/13] inet: switch inet_dump_fib() to RCU protection Eric Dumazet
` (2 subsequent siblings)
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
Add a new field into struct fib_dump_filter, to let callers
tell if they use RTNL locking or RCU.
This is used in the following patch, when inet_dump_fib()
no longer holds RTNL.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/ip_fib.h | 1 +
net/ipv4/fib_frontend.c | 15 +++++++++++----
net/ipv4/ipmr.c | 4 +++-
net/ipv6/ip6_fib.c | 7 +++++--
net/ipv6/ip6mr.c | 4 +++-
net/mpls/af_mpls.c | 4 +++-
6 files changed, 26 insertions(+), 9 deletions(-)
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index d4667b7797e3e4591f3ff1fe641f168295e0a894..9b2f69ba5e4981fb108581c229ff008d04750ade 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -264,6 +264,7 @@ struct fib_dump_filter {
bool filter_set;
bool dump_routes;
bool dump_exceptions;
+ bool rtnl_held;
unsigned char protocol;
unsigned char rt_type;
unsigned int flags;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 390f4be7f7bec20f33aa80e9bf12d5e2f3760562..39f67990e01c19b73a622dced0220a1bba21d5e6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -916,7 +916,8 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
struct rtmsg *rtm;
int err, i;
- ASSERT_RTNL();
+ if (filter->rtnl_held)
+ ASSERT_RTNL();
if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*rtm))) {
NL_SET_ERR_MSG(extack, "Invalid header for FIB dump request");
@@ -961,7 +962,10 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
break;
case RTA_OIF:
ifindex = nla_get_u32(tb[i]);
- filter->dev = __dev_get_by_index(net, ifindex);
+ if (filter->rtnl_held)
+ filter->dev = __dev_get_by_index(net, ifindex);
+ else
+ filter->dev = dev_get_by_index_rcu(net, ifindex);
if (!filter->dev)
return -ENODEV;
break;
@@ -983,8 +987,11 @@ EXPORT_SYMBOL_GPL(ip_valid_fib_dump_req);
static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
{
- struct fib_dump_filter filter = { .dump_routes = true,
- .dump_exceptions = true };
+ struct fib_dump_filter filter = {
+ .dump_routes = true,
+ .dump_exceptions = true,
+ .rtnl_held = true,
+ };
const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
unsigned int h, s_h;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 3622298365105d99c0277f1c1616fb5fc63cdc2d..792726671dd39ace276189ea643e7b5a555dd987 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2587,7 +2587,9 @@ static int ipmr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
{
- struct fib_dump_filter filter = {};
+ struct fib_dump_filter filter = {
+ .rtnl_held = true,
+ };
int err;
if (cb->strict_check) {
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 805bbf26b3efd0239c04c0d7a658b5eac26efd34..10ab771bd89d02cd471014112a46067ea3757cb7 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -620,8 +620,11 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
{
- struct rt6_rtnl_dump_arg arg = { .filter.dump_exceptions = true,
- .filter.dump_routes = true };
+ struct rt6_rtnl_dump_arg arg = {
+ .filter.dump_exceptions = true,
+ .filter.dump_routes = true,
+ .filter.rtnl_held = true,
+ };
const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
unsigned int h, s_h;
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 9782c180fee646ab0fad7f0f911254b4b3a592c4..4af0e03fdd520f6bf5bedd64074fc1ee63abd09d 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2595,7 +2595,9 @@ static int ip6mr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
{
const struct nlmsghdr *nlh = cb->nlh;
- struct fib_dump_filter filter = {};
+ struct fib_dump_filter filter = {
+ .rtnl_held = true,
+ };
int err;
if (cb->strict_check) {
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1af29af65388584e9666f4fcb73a16e8ff159587..6dab883a08dda46ff6ddc1e6e407e6f48a10c8aa 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -2179,7 +2179,9 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
struct mpls_route __rcu **platform_label;
- struct fib_dump_filter filter = {};
+ struct fib_dump_filter filter = {
+ .rtnl_held = true,
+ };
unsigned int flags = NLM_F_MULTI;
size_t platform_labels;
unsigned int index;
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 11/13] inet: switch inet_dump_fib() to RCU protection
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (9 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 10/13] inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 12/13] rtnetlink: make rtnl_fill_link_ifmap() RCU ready Eric Dumazet
2024-02-21 10:59 ` [PATCH net-next 13/13] rtnetlink: provide RCU protection to rtnl_fill_prop_list() Eric Dumazet
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
No longer hold RTNL while calling inet_dump_fib().
Also change return value for a completed dump:
Returning 0 instead of skb->len allows NLMSG_DONE
to be appended to the skb. User space does not have
to call us again to get a standalone NLMSG_DONE marker.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/fib_frontend.c | 37 ++++++++++++++++++-------------------
net/ipv4/fib_trie.c | 4 ++--
2 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 39f67990e01c19b73a622dced0220a1bba21d5e6..bf3a2214fe29b6f9b494581b293259e6c5ce6f8c 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -990,7 +990,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
struct fib_dump_filter filter = {
.dump_routes = true,
.dump_exceptions = true,
- .rtnl_held = true,
+ .rtnl_held = false,
};
const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
@@ -998,12 +998,13 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
unsigned int e = 0, s_e;
struct fib_table *tb;
struct hlist_head *head;
- int dumped = 0, err;
+ int dumped = 0, err = 0;
+ rcu_read_lock();
if (cb->strict_check) {
err = ip_valid_fib_dump_req(net, nlh, &filter, cb);
if (err < 0)
- return err;
+ goto unlock;
} else if (nlmsg_len(nlh) >= sizeof(struct rtmsg)) {
struct rtmsg *rtm = nlmsg_data(nlh);
@@ -1012,29 +1013,28 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
/* ipv4 does not use prefix flag */
if (filter.flags & RTM_F_PREFIX)
- return skb->len;
+ goto unlock;
if (filter.table_id) {
tb = fib_get_table(net, filter.table_id);
if (!tb) {
if (rtnl_msg_family(cb->nlh) != PF_INET)
- return skb->len;
+ goto unlock;
NL_SET_ERR_MSG(cb->extack, "ipv4: FIB table does not exist");
- return -ENOENT;
+ err = -ENOENT;
+ goto unlock;
}
-
- rcu_read_lock();
err = fib_table_dump(tb, skb, cb, &filter);
- rcu_read_unlock();
- return skb->len ? : err;
+ if (err < 0 && skb->len)
+ err = skb->len;
+ goto unlock;
}
s_h = cb->args[0];
s_e = cb->args[1];
- rcu_read_lock();
-
+ err = 0;
for (h = s_h; h < FIB_TABLE_HASHSZ; h++, s_e = 0) {
e = 0;
head = &net->ipv4.fib_table_hash[h];
@@ -1047,9 +1047,8 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
err = fib_table_dump(tb, skb, cb, &filter);
if (err < 0) {
if (likely(skb->len))
- goto out;
-
- goto out_err;
+ err = skb->len;
+ goto out;
}
dumped = 1;
next:
@@ -1057,13 +1056,12 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
}
}
out:
- err = skb->len;
-out_err:
- rcu_read_unlock();
cb->args[1] = e;
cb->args[0] = h;
+unlock:
+ rcu_read_unlock();
return err;
}
@@ -1666,5 +1664,6 @@ void __init ip_fib_init(void)
rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL, 0);
rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL, 0);
- rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib, 0);
+ rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib,
+ RTNL_FLAG_DUMP_UNLOCKED);
}
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 0fc7ab5832d1ae00e33fdf6fad4ef379c7d0bd4d..f474106464d2f2a52fa6b7ecaf2146977d05eecc 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2368,7 +2368,7 @@ int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
* and key == 0 means the dump has wrapped around and we are done.
*/
if (count && !key)
- return skb->len;
+ return 0;
while ((l = leaf_walk_rcu(&tp, key)) != NULL) {
int err;
@@ -2394,7 +2394,7 @@ int fib_table_dump(struct fib_table *tb, struct sk_buff *skb,
cb->args[3] = key;
cb->args[2] = count;
- return skb->len;
+ return 0;
}
void __init fib_trie_init(void)
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 12/13] rtnetlink: make rtnl_fill_link_ifmap() RCU ready
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (10 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 11/13] inet: switch inet_dump_fib() to RCU protection Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
2024-02-21 16:02 ` Jiri Pirko
2024-02-21 10:59 ` [PATCH net-next 13/13] rtnetlink: provide RCU protection to rtnl_fill_prop_list() Eric Dumazet
12 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
Use READ_ONCE() to read the following device fields:
dev->mem_start
dev->mem_end
dev->base_addr
dev->irq
dev->dma
dev->if_port
Provide IFLA_MAP attribute only if at least one of these fields
is not zero. This saves some space in the output skb for most devices.
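
[Editor's sketch, not part of the patch.] The "only report non-zero information" test can be tried in user space. memchr_inv() is a kernel helper, so this model substitutes a plain byte scan, toy_is_all_zero(), which is a hypothetical name. struct toy_ifmap is also hypothetical: it follows the six fields listed above and adds an explicit reserved member so that the toy struct has no implicit padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical copy of the reported fields. */
struct toy_ifmap {
    uint64_t mem_start;
    uint64_t mem_end;
    uint64_t base_addr;
    uint16_t irq;
    uint8_t dma;
    uint8_t port;
    uint32_t reserved;          /* toy-only: keeps the struct padding-free */
};

/* User-space substitute for !memchr_inv(p, 0, len):
 * returns 1 if every byte is zero, 0 otherwise. */
static int toy_is_all_zero(const void *p, size_t len)
{
    const unsigned char *b = p;
    size_t i;

    for (i = 0; i < len; i++)
        if (b[i])
            return 0;
    return 1;
}

/* Returns 1 if the attribute should be emitted, 0 if it can be skipped. */
static int toy_should_emit_map(const struct toy_ifmap *src)
{
    struct toy_ifmap map;

    memset(&map, 0, sizeof(map));   /* start from a fully zeroed struct */
    map.mem_start = src->mem_start;
    map.mem_end = src->mem_end;
    map.base_addr = src->base_addr;
    map.irq = src->irq;
    map.dma = src->dma;
    map.port = src->port;
    return !toy_is_all_zero(&map, sizeof(map));
}
```

A device with all six fields at zero yields 0 here, which corresponds to skipping IFLA_MAP. Any single non-zero field is enough to emit it.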
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/rtnetlink.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1b26dfa5668d22fb2e30ceefbf143e98df13ae29..b91ec216c593aaebf97ea69aa0d2d265ab61c098 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1455,19 +1455,21 @@ static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
return 0;
}
-static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
+static int rtnl_fill_link_ifmap(struct sk_buff *skb,
+ const struct net_device *dev)
{
struct rtnl_link_ifmap map;
memset(&map, 0, sizeof(map));
- map.mem_start = dev->mem_start;
- map.mem_end = dev->mem_end;
- map.base_addr = dev->base_addr;
- map.irq = dev->irq;
- map.dma = dev->dma;
- map.port = dev->if_port;
-
- if (nla_put_64bit(skb, IFLA_MAP, sizeof(map), &map, IFLA_PAD))
+ map.mem_start = READ_ONCE(dev->mem_start);
+ map.mem_end = READ_ONCE(dev->mem_end);
+ map.base_addr = READ_ONCE(dev->base_addr);
+ map.irq = READ_ONCE(dev->irq);
+ map.dma = READ_ONCE(dev->dma);
+ map.port = READ_ONCE(dev->if_port);
+ /* Only report non zero information. */
+ if (memchr_inv(&map, 0, sizeof(map)) &&
+ nla_put_64bit(skb, IFLA_MAP, sizeof(map), &map, IFLA_PAD))
return -EMSGSIZE;
return 0;
@@ -1875,9 +1877,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
goto nla_put_failure;
}
- if (rtnl_fill_link_ifmap(skb, dev))
- goto nla_put_failure;
-
if (dev->addr_len) {
if (nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr) ||
nla_put(skb, IFLA_BROADCAST, dev->addr_len, dev->broadcast))
@@ -1927,6 +1926,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
rcu_read_lock();
if (rtnl_fill_link_af(skb, dev, ext_filter_mask))
goto nla_put_failure_rcu;
+ if (rtnl_fill_link_ifmap(skb, dev))
+ goto nla_put_failure_rcu;
+
rcu_read_unlock();
if (rtnl_fill_prop_list(skb, dev))
--
2.44.0.rc0.258.g7320e95886-goog
* [PATCH net-next 13/13] rtnetlink: provide RCU protection to rtnl_fill_prop_list()
2024-02-21 10:59 [PATCH net-next 00/13] rtnetlink: reduce RTNL pressure for dumps Eric Dumazet
` (11 preceding siblings ...)
2024-02-21 10:59 ` [PATCH net-next 12/13] rtnetlink: make rtnl_fill_link_ifmap() RCU ready Eric Dumazet
@ 2024-02-21 10:59 ` Eric Dumazet
12 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 10:59 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.
dev->name_node items are already RCU protected.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/rtnetlink.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index b91ec216c593aaebf97ea69aa0d2d265ab61c098..59b64febb244b51969651bb37740a799376ad35f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1700,7 +1700,7 @@ static int rtnl_fill_alt_ifnames(struct sk_buff *skb,
struct netdev_name_node *name_node;
int count = 0;
- list_for_each_entry(name_node, &dev->name_node->list, list) {
+ list_for_each_entry_rcu(name_node, &dev->name_node->list, list) {
if (nla_put_string(skb, IFLA_ALT_IFNAME, name_node->name))
return -EMSGSIZE;
count++;
@@ -1708,6 +1708,7 @@ static int rtnl_fill_alt_ifnames(struct sk_buff *skb,
return count;
}
+/* RCU protected. */
static int rtnl_fill_prop_list(struct sk_buff *skb,
const struct net_device *dev)
{
@@ -1928,11 +1929,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
goto nla_put_failure_rcu;
if (rtnl_fill_link_ifmap(skb, dev))
goto nla_put_failure_rcu;
-
- rcu_read_unlock();
-
if (rtnl_fill_prop_list(skb, dev))
- goto nla_put_failure;
+ goto nla_put_failure_rcu;
+ rcu_read_unlock();
if (dev->dev.parent &&
nla_put_string(skb, IFLA_PARENT_DEV_NAME,
--
2.44.0.rc0.258.g7320e95886-goog
* Re: [PATCH net-next 12/13] rtnetlink: make rtnl_fill_link_ifmap() RCU ready
2024-02-21 10:59 ` [PATCH net-next 12/13] rtnetlink: make rtnl_fill_link_ifmap() RCU ready Eric Dumazet
@ 2024-02-21 16:02 ` Jiri Pirko
2024-02-21 17:15 ` Eric Dumazet
0 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2024-02-21 16:02 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
Wed, Feb 21, 2024 at 11:59:14AM CET, edumazet@google.com wrote:
>Use READ_ONCE() to read the following device fields:
>
> dev->mem_start
> dev->mem_end
> dev->base_addr
> dev->irq
> dev->dma
> dev->if_port
>
>Provide IFLA_MAP attribute only if at least one of these fields
>is not zero. This saves some space in the output skb for most devices.
>
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>---
> net/core/rtnetlink.c | 26 ++++++++++++++------------
> 1 file changed, 14 insertions(+), 12 deletions(-)
>
>diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>index 1b26dfa5668d22fb2e30ceefbf143e98df13ae29..b91ec216c593aaebf97ea69aa0d2d265ab61c098 100644
>--- a/net/core/rtnetlink.c
>+++ b/net/core/rtnetlink.c
>@@ -1455,19 +1455,21 @@ static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
> return 0;
> }
>
>-static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
>+static int rtnl_fill_link_ifmap(struct sk_buff *skb,
>+ const struct net_device *dev)
> {
> struct rtnl_link_ifmap map;
>
> memset(&map, 0, sizeof(map));
>- map.mem_start = dev->mem_start;
>- map.mem_end = dev->mem_end;
>- map.base_addr = dev->base_addr;
>- map.irq = dev->irq;
>- map.dma = dev->dma;
>- map.port = dev->if_port;
>-
>- if (nla_put_64bit(skb, IFLA_MAP, sizeof(map), &map, IFLA_PAD))
>+ map.mem_start = READ_ONCE(dev->mem_start);
>+ map.mem_end = READ_ONCE(dev->mem_end);
>+ map.base_addr = READ_ONCE(dev->base_addr);
>+ map.irq = READ_ONCE(dev->irq);
>+ map.dma = READ_ONCE(dev->dma);
>+ map.port = READ_ONCE(dev->if_port);
>+ /* Only report non zero information. */
>+ if (memchr_inv(&map, 0, sizeof(map)) &&
This check (an optimization) is unrelated to the rest of the patch, correct?
If so, could it be split into a separate patch?
>+ nla_put_64bit(skb, IFLA_MAP, sizeof(map), &map, IFLA_PAD))
> return -EMSGSIZE;
>
> return 0;
>@@ -1875,9 +1877,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
> goto nla_put_failure;
> }
>
>- if (rtnl_fill_link_ifmap(skb, dev))
>- goto nla_put_failure;
>-
> if (dev->addr_len) {
> if (nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr) ||
> nla_put(skb, IFLA_BROADCAST, dev->addr_len, dev->broadcast))
>@@ -1927,6 +1926,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
> rcu_read_lock();
> if (rtnl_fill_link_af(skb, dev, ext_filter_mask))
> goto nla_put_failure_rcu;
>+ if (rtnl_fill_link_ifmap(skb, dev))
>+ goto nla_put_failure_rcu;
>+
> rcu_read_unlock();
>
> if (rtnl_fill_prop_list(skb, dev))
>--
>2.44.0.rc0.258.g7320e95886-goog
>
>
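For readers following the review, here is a minimal user-space sketch of the two things the quoted hunk does: single-load reads of the device fields, and emitting the attribute only when the map holds a non-zero byte. Everything suffixed _sketch is a hypothetical stand-in and not kernel code. READ_ONCE_SKETCH() only approximates the kernel's READ_ONCE(), memchr_inv_sketch() mimics memchr_inv(), and struct dev_sketch is an assumed subset of struct net_device. The field widths of struct ifmap_sketch follow struct rtnl_link_ifmap.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: simplified stand-in for the kernel's READ_ONCE(). The volatile
 * access asks the compiler for exactly one load. Uses GNU __typeof__. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical user-space stand-in for memchr_inv(): returns a pointer to
 * the first byte that differs from c, or NULL if all n bytes equal c. */
static const void *memchr_inv_sketch(const void *p, int c, size_t n)
{
	const unsigned char *s = p;
	size_t i;

	for (i = 0; i < n; i++)
		if (s[i] != (unsigned char)c)
			return s + i;
	return NULL;
}

/* Same field order and widths as struct rtnl_link_ifmap. */
struct ifmap_sketch {
	unsigned long long mem_start;
	unsigned long long mem_end;
	unsigned long long base_addr;
	unsigned short irq;
	unsigned char dma;
	unsigned char port;
};

/* Hypothetical subset of struct net_device, for illustration only. */
struct dev_sketch {
	unsigned long mem_start;
	unsigned long mem_end;
	unsigned long base_addr;
	int irq;
	unsigned char dma;
	unsigned char if_port;
};

/* Fills *map and returns 1 if the attribute would be emitted, 0 if every
 * byte is zero. The memset() also clears the struct's tail padding, which
 * is what makes a byte-wise scan of the whole struct meaningful. */
static int fill_ifmap_sketch(const struct dev_sketch *dev,
			     struct ifmap_sketch *map)
{
	memset(map, 0, sizeof(*map));
	map->mem_start = READ_ONCE_SKETCH(dev->mem_start);
	map->mem_end   = READ_ONCE_SKETCH(dev->mem_end);
	map->base_addr = READ_ONCE_SKETCH(dev->base_addr);
	map->irq       = (unsigned short)READ_ONCE_SKETCH(dev->irq);
	map->dma       = READ_ONCE_SKETCH(dev->dma);
	map->port      = READ_ONCE_SKETCH(dev->if_port);

	return memchr_inv_sketch(map, 0, sizeof(*map)) != NULL;
}
```

The non-zero test is independent of the READ_ONCE() conversion, which is the point of the review comment: either half of fill_ifmap_sketch() could be changed without touching the other.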
* Re: [PATCH net-next 12/13] rtnetlink: make rtnl_fill_link_ifmap() RCU ready
2024-02-21 16:02 ` Jiri Pirko
@ 2024-02-21 17:15 ` Eric Dumazet
2024-02-21 18:22 ` Jiri Pirko
2024-02-21 18:56 ` Jakub Kicinski
0 siblings, 2 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 17:15 UTC (permalink / raw)
To: Jiri Pirko
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Wed, Feb 21, 2024 at 5:03 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Wed, Feb 21, 2024 at 11:59:14AM CET, edumazet@google.com wrote:
> >Use READ_ONCE() to read the following device fields:
> >
> > dev->mem_start
> > dev->mem_end
> > dev->base_addr
> > dev->irq
> > dev->dma
> > dev->if_port
> >
> >Provide IFLA_MAP attribute only if at least one of these fields
> >is not zero. This saves some space in the output skb for most devices.
> >
> >Signed-off-by: Eric Dumazet <edumazet@google.com>
> >---
> > net/core/rtnetlink.c | 26 ++++++++++++++------------
> > 1 file changed, 14 insertions(+), 12 deletions(-)
> >
> >diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> >index 1b26dfa5668d22fb2e30ceefbf143e98df13ae29..b91ec216c593aaebf97ea69aa0d2d265ab61c098 100644
> >--- a/net/core/rtnetlink.c
> >+++ b/net/core/rtnetlink.c
> >@@ -1455,19 +1455,21 @@ static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
> > return 0;
> > }
> >
> >-static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
> >+static int rtnl_fill_link_ifmap(struct sk_buff *skb,
> >+ const struct net_device *dev)
> > {
> > struct rtnl_link_ifmap map;
> >
> > memset(&map, 0, sizeof(map));
> >- map.mem_start = dev->mem_start;
> >- map.mem_end = dev->mem_end;
> >- map.base_addr = dev->base_addr;
> >- map.irq = dev->irq;
> >- map.dma = dev->dma;
> >- map.port = dev->if_port;
> >-
> >- if (nla_put_64bit(skb, IFLA_MAP, sizeof(map), &map, IFLA_PAD))
> >+ map.mem_start = READ_ONCE(dev->mem_start);
> >+ map.mem_end = READ_ONCE(dev->mem_end);
> >+ map.base_addr = READ_ONCE(dev->base_addr);
> >+ map.irq = READ_ONCE(dev->irq);
> >+ map.dma = READ_ONCE(dev->dma);
> >+ map.port = READ_ONCE(dev->if_port);
> >+ /* Only report non zero information. */
> >+ if (memchr_inv(&map, 0, sizeof(map)) &&
>
> This check(optimization) is unrelated to the rest of the patch, correct?
> If yes, could it be a separate patch?
Sure thing. BTW, do you know which tool is using this?
I could not find IFLA_MAP being used in iproute2 or ethtool.
* Re: [PATCH net-next 12/13] rtnetlink: make rtnl_fill_link_ifmap() RCU ready
2024-02-21 17:15 ` Eric Dumazet
@ 2024-02-21 18:22 ` Jiri Pirko
2024-02-21 18:56 ` Jakub Kicinski
1 sibling, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2024-02-21 18:22 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
Wed, Feb 21, 2024 at 06:15:11PM CET, edumazet@google.com wrote:
>On Wed, Feb 21, 2024 at 5:03 PM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Wed, Feb 21, 2024 at 11:59:14AM CET, edumazet@google.com wrote:
>> >Use READ_ONCE() to read the following device fields:
>> >
>> > dev->mem_start
>> > dev->mem_end
>> > dev->base_addr
>> > dev->irq
>> > dev->dma
>> > dev->if_port
>> >
>> >Provide IFLA_MAP attribute only if at least one of these fields
>> >is not zero. This saves some space in the output skb for most devices.
>> >
>> >Signed-off-by: Eric Dumazet <edumazet@google.com>
>> >---
>> > net/core/rtnetlink.c | 26 ++++++++++++++------------
>> > 1 file changed, 14 insertions(+), 12 deletions(-)
>> >
>> >diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>> >index 1b26dfa5668d22fb2e30ceefbf143e98df13ae29..b91ec216c593aaebf97ea69aa0d2d265ab61c098 100644
>> >--- a/net/core/rtnetlink.c
>> >+++ b/net/core/rtnetlink.c
>> >@@ -1455,19 +1455,21 @@ static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
>> > return 0;
>> > }
>> >
>> >-static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
>> >+static int rtnl_fill_link_ifmap(struct sk_buff *skb,
>> >+ const struct net_device *dev)
>> > {
>> > struct rtnl_link_ifmap map;
>> >
>> > memset(&map, 0, sizeof(map));
>> >- map.mem_start = dev->mem_start;
>> >- map.mem_end = dev->mem_end;
>> >- map.base_addr = dev->base_addr;
>> >- map.irq = dev->irq;
>> >- map.dma = dev->dma;
>> >- map.port = dev->if_port;
>> >-
>> >- if (nla_put_64bit(skb, IFLA_MAP, sizeof(map), &map, IFLA_PAD))
>> >+ map.mem_start = READ_ONCE(dev->mem_start);
>> >+ map.mem_end = READ_ONCE(dev->mem_end);
>> >+ map.base_addr = READ_ONCE(dev->base_addr);
>> >+ map.irq = READ_ONCE(dev->irq);
>> >+ map.dma = READ_ONCE(dev->dma);
>> >+ map.port = READ_ONCE(dev->if_port);
>> >+ /* Only report non zero information. */
>> >+ if (memchr_inv(&map, 0, sizeof(map)) &&
>>
>> This check(optimization) is unrelated to the rest of the patch, correct?
>> If yes, could it be a separate patch?
>
>Sure thing. BTW, do you know which tool is using this ?
>
>I could not find IFLA_MAP being used in iproute2 or ethtool.
No clue. Never spotted it.
* Re: [PATCH net-next 04/13] ipv6: use xarray iterator to implement inet6_dump_ifinfo()
2024-02-21 10:59 ` [PATCH net-next 04/13] ipv6: use xarray iterator to implement inet6_dump_ifinfo() Eric Dumazet
@ 2024-02-21 18:39 ` Ido Schimmel
2024-02-21 18:57 ` Eric Dumazet
0 siblings, 1 reply; 20+ messages in thread
From: Ido Schimmel @ 2024-02-21 18:39 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Wed, Feb 21, 2024 at 10:59:06AM +0000, Eric Dumazet wrote:
> Prepare inet6_dump_ifinfo() to run with RCU protection
> instead of RTNL and use for_each_netdev_dump() interface.
>
> Also properly return 0 at the end of a dump, avoiding
> an extra recvmsg() system call and RTNL acquisition.
>
> Note that RTNL-less dumps need core changes, yet to come.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
BTW, not sure if you saw, but there's a failure in the fib_nexthops test
in Jakub's CI due to a lockdep splat [1]. Reproducer:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 2 dev dummy1
# ip nexthop add id 10 group 1/2
# ip route add 198.51.100.0/24 nhid 10
# ip -4 r s
Seems like an oversight in nexthop code and fixed by:
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 6647ad509faa..77e99cba60ad 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -317,7 +317,7 @@ static inline
int nexthop_mpath_fill_node(struct sk_buff *skb, struct nexthop *nh,
u8 rt_family)
{
- struct nh_group *nhg = rtnl_dereference(nh->nh_grp);
+ struct nh_group *nhg = rcu_dereference_rtnl(nh->nh_grp);
int i;
for (i = 0; i < nhg->num_nh; i++) {
[1]
=============================
WARNING: suspicious RCU usage
6.8.0-rc4-custom-g85d71c2cf96e #20 Not tainted
-----------------------------
include/net/nexthop.h:320 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by ip/668:
#0: ffff88801c1a3eb0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x155/0x9e0
#1: ffffffff85d4fba0 (rcu_read_lock){....}-{1:2}, at: inet_dump_fib+0x133/0xab0
stack backtrace:
CPU: 19 PID: 668 Comm: ip Not tainted 6.8.0-rc4-custom-g85d71c2cf96e #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xbd/0xe0
lockdep_rcu_suspicious+0x211/0x3b0
fib_dump_info+0x1ae2/0x1e10
fib_table_dump+0xc2e/0xf70
inet_dump_fib+0x7fa/0xab0
netlink_dump+0xd47/0x10f0
__netlink_dump_start+0x702/0x9e0
rtnetlink_rcv_msg+0xb6e/0xf20
netlink_rcv_skb+0x170/0x440
netlink_unicast+0x540/0x820
netlink_sendmsg+0x8d8/0xda0
__sys_sendto+0x27a/0x3f0
__x64_sys_sendto+0xe5/0x1c0
do_syscall_64+0xc5/0x1d0
entry_SYSCALL_64_after_hwframe+0x63/0x6b
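To make the difference between the two helpers in the fix concrete, here is a small user-space model of the conditions lockdep checks. It is a sketch only: struct lock_ctx and both *_ok() functions are hypothetical names for illustration. In the kernel, rtnl_dereference() is rcu_dereference_protected() with an "RTNL is held" condition, while rcu_dereference_rtnl() accepts either an RCU read-side critical section or RTNL.

```c
#include <stdbool.h>

/* Hypothetical model of the locking context of the calling code. */
struct lock_ctx {
	bool rtnl_held;		/* caller holds RTNL */
	bool rcu_read_held;	/* caller is inside rcu_read_lock() */
};

/* Models rtnl_dereference(): valid only when RTNL is held. */
static bool rtnl_dereference_ok(const struct lock_ctx *c)
{
	return c->rtnl_held;
}

/* Models rcu_dereference_rtnl(): valid under RCU read lock or RTNL. */
static bool rcu_dereference_rtnl_ok(const struct lock_ctx *c)
{
	return c->rcu_read_held || c->rtnl_held;
}
```

In the splat above, the task holds nlk_cb_mutex and rcu_read_lock but not RTNL. That corresponds to { .rtnl_held = false, .rcu_read_held = true }, for which only the second check passes.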
* Re: [PATCH net-next 12/13] rtnetlink: make rtnl_fill_link_ifmap() RCU ready
2024-02-21 17:15 ` Eric Dumazet
2024-02-21 18:22 ` Jiri Pirko
@ 2024-02-21 18:56 ` Jakub Kicinski
1 sibling, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2024-02-21 18:56 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jiri Pirko, David S . Miller, Paolo Abeni, netdev, eric.dumazet
On Wed, 21 Feb 2024 18:15:11 +0100 Eric Dumazet wrote:
> > This check(optimization) is unrelated to the rest of the patch, correct?
> > If yes, could it be a separate patch?
>
> Sure thing. BTW, do you know which tool is using this ?
>
> I could not find IFLA_MAP being used in iproute2 or ethtool.
FWIW I think it's just a blind forward-port of the ioctl functionality.
Would be really great to find a way to phase those fields out of struct
net_device if not the uAPI :(
* Re: [PATCH net-next 04/13] ipv6: use xarray iterator to implement inet6_dump_ifinfo()
2024-02-21 18:39 ` Ido Schimmel
@ 2024-02-21 18:57 ` Eric Dumazet
0 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-02-21 18:57 UTC (permalink / raw)
To: Ido Schimmel
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Wed, Feb 21, 2024 at 7:39 PM Ido Schimmel <idosch@nvidia.com> wrote:
>
> On Wed, Feb 21, 2024 at 10:59:06AM +0000, Eric Dumazet wrote:
> > Prepare inet6_dump_ifinfo() to run with RCU protection
> > instead of RTNL and use for_each_netdev_dump() interface.
> >
> > Also properly return 0 at the end of a dump, avoiding
> > an extra recvmsg() system call and RTNL acquisition.
> >
> > Note that RTNL-less dumps need core changes, yet to come.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Ido Schimmel <idosch@nvidia.com>
>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
>
> BTW, not sure if you saw, but there's a failure in the fib_nexthops test
> in Jakub's CI due to a lockdep splat [1]. Reproducer:
I have not seen this yet, thanks!
>
> # ip link add name dummy1 up type dummy
> # ip nexthop add id 1 dev dummy1
> # ip nexthop add id 2 dev dummy1
> # ip nexthop add id 10 group 1/2
> # ip route add 198.51.100.0/24 nhid 10
> # ip -4 r s
>
> Seems like an oversight in nexthop code and fixed by:
>
> diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> index 6647ad509faa..77e99cba60ad 100644
> --- a/include/net/nexthop.h
> +++ b/include/net/nexthop.h
> @@ -317,7 +317,7 @@ static inline
> int nexthop_mpath_fill_node(struct sk_buff *skb, struct nexthop *nh,
> u8 rt_family)
> {
> - struct nh_group *nhg = rtnl_dereference(nh->nh_grp);
> + struct nh_group *nhg = rcu_dereference_rtnl(nh->nh_grp);
> int i;
>
Indeed, and this is followed a few lines later by

	struct nh_info *nhi = rcu_dereference_rtnl(nhe->nh_info); // This was done nicely