* [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo()
@ 2024-05-03 19:20 Eric Dumazet
2024-05-03 19:20 ` [PATCH net-next 1/8] rtnetlink: do not depend on RTNL for IFLA_QDISC output Eric Dumazet
` (8 more replies)
0 siblings, 9 replies; 27+ messages in thread
From: Eric Dumazet @ 2024-05-03 19:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
We want to no longer rely on RTNL for "ip link show" command.
This is a long road, this series takes care of some parts.
Eric Dumazet (8):
rtnetlink: do not depend on RTNL for IFLA_QDISC output
rtnetlink: do not depend on RTNL for IFLA_IFNAME output
rtnetlink: do not depend on RTNL for IFLA_TXQLEN output
net: write once on dev->allmulti and dev->promiscuity
rtnetlink: do not depend on RTNL for many attributes
rtnetlink: do not depend on RTNL in rtnl_fill_proto_down()
rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb()
rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection
drivers/net/ppp/ppp_generic.c | 2 +-
drivers/net/vxlan/vxlan_core.c | 2 +-
net/core/dev.c | 51 ++++++++++---------
net/core/rtnetlink.c | 90 ++++++++++++++++++++--------------
net/ipv4/ip_tunnel.c | 2 +-
net/ipv6/ip6_tunnel.c | 2 +-
net/sched/sch_api.c | 2 +-
net/sched/sch_teql.c | 2 +-
net/xfrm/xfrm_interface_core.c | 2 +-
9 files changed, 89 insertions(+), 66 deletions(-)
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH net-next 1/8] rtnetlink: do not depend on RTNL for IFLA_QDISC output
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
@ 2024-05-03 19:20 ` Eric Dumazet
2024-05-05 14:48 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 2/8] rtnetlink: do not depend on RTNL for IFLA_IFNAME output Eric Dumazet
` (7 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-03 19:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
dev->qdisc can be read using RCU protection.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/rtnetlink.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 283e42f48af68504af193ed5763d4e0fcd667d99..f4a87f89d5cde0cdd35c156d78ebe31511d4a31c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1832,7 +1832,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
if (tgt_netnsid >= 0 && nla_put_s32(skb, IFLA_TARGET_NETNSID, tgt_netnsid))
goto nla_put_failure;
- qdisc = rtnl_dereference(dev->qdisc);
if (nla_put_string(skb, IFLA_IFNAME, dev->name) ||
nla_put_u32(skb, IFLA_TXQLEN, dev->tx_queue_len) ||
nla_put_u8(skb, IFLA_OPERSTATE,
@@ -1857,8 +1856,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
#endif
put_master_ifindex(skb, dev) ||
nla_put_u8(skb, IFLA_CARRIER, netif_carrier_ok(dev)) ||
- (qdisc &&
- nla_put_string(skb, IFLA_QDISC, qdisc->ops->id)) ||
nla_put_ifalias(skb, dev) ||
nla_put_u32(skb, IFLA_CARRIER_CHANGES,
atomic_read(&dev->carrier_up_count) +
@@ -1924,6 +1921,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
goto nla_put_failure;
rcu_read_lock();
+ qdisc = rcu_dereference(dev->qdisc);
+ if (qdisc && nla_put_string(skb, IFLA_QDISC, qdisc->ops->id))
+ goto nla_put_failure_rcu;
if (rtnl_fill_link_af(skb, dev, ext_filter_mask))
goto nla_put_failure_rcu;
if (rtnl_fill_link_ifmap(skb, dev))
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH net-next 2/8] rtnetlink: do not depend on RTNL for IFLA_IFNAME output
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
2024-05-03 19:20 ` [PATCH net-next 1/8] rtnetlink: do not depend on RTNL for IFLA_QDISC output Eric Dumazet
@ 2024-05-03 19:20 ` Eric Dumazet
2024-05-05 14:49 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output Eric Dumazet
` (6 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-03 19:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
We can use netdev_copy_name() to no longer rely on RTNL
to fetch dev->name.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/rtnetlink.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f4a87f89d5cde0cdd35c156d78ebe31511d4a31c..a92e3b533d8d2ed1a52a40e02eb994c3070ede38 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1812,6 +1812,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
u32 event, int *new_nsid, int new_ifindex,
int tgt_netnsid, gfp_t gfp)
{
+ char devname[IFNAMSIZ];
struct ifinfomsg *ifm;
struct nlmsghdr *nlh;
struct Qdisc *qdisc;
@@ -1832,8 +1833,11 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
if (tgt_netnsid >= 0 && nla_put_s32(skb, IFLA_TARGET_NETNSID, tgt_netnsid))
goto nla_put_failure;
- if (nla_put_string(skb, IFLA_IFNAME, dev->name) ||
- nla_put_u32(skb, IFLA_TXQLEN, dev->tx_queue_len) ||
+ netdev_copy_name(dev, devname);
+ if (nla_put_string(skb, IFLA_IFNAME, devname))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, IFLA_TXQLEN, dev->tx_queue_len) ||
nla_put_u8(skb, IFLA_OPERSTATE,
netif_running(dev) ? dev->operstate : IF_OPER_DOWN) ||
nla_put_u8(skb, IFLA_LINKMODE, dev->link_mode) ||
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH net-next 3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
2024-05-03 19:20 ` [PATCH net-next 1/8] rtnetlink: do not depend on RTNL for IFLA_QDISC output Eric Dumazet
2024-05-03 19:20 ` [PATCH net-next 2/8] rtnetlink: do not depend on RTNL for IFLA_IFNAME output Eric Dumazet
@ 2024-05-03 19:20 ` Eric Dumazet
2024-05-05 14:43 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 4/8] net: write once on dev->allmulti and dev->promiscuity Eric Dumazet
` (5 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-03 19:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
rtnl_fill_ifinfo() can read dev->tx_queue_len locklessly,
granted we add corresponding READ_ONCE()/WRITE_ONCE() annotations.
Add missing READ_ONCE(dev->tx_queue_len) in teql_enqueue()
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 4 ++--
net/core/rtnetlink.c | 2 +-
net/sched/sch_api.c | 2 +-
net/sched/sch_teql.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index e02d2363347e2e403ccb2a59d44d35cee9a1b367..9c8c2ab2d76c3587d9114bc86a395341e1fd4d2b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8959,7 +8959,7 @@ int dev_change_tx_queue_len(struct net_device *dev, unsigned long new_len)
return -ERANGE;
if (new_len != orig_len) {
- dev->tx_queue_len = new_len;
+ WRITE_ONCE(dev->tx_queue_len, new_len);
res = call_netdevice_notifiers(NETDEV_CHANGE_TX_QUEUE_LEN, dev);
res = notifier_to_errno(res);
if (res)
@@ -8973,7 +8973,7 @@ int dev_change_tx_queue_len(struct net_device *dev, unsigned long new_len)
err_rollback:
netdev_err(dev, "refused to change device tx_queue_len\n");
- dev->tx_queue_len = orig_len;
+ WRITE_ONCE(dev->tx_queue_len, orig_len);
return res;
}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a92e3b533d8d2ed1a52a40e02eb994c3070ede38..77d14528bdefc8b655f5da37ed88d0b937f35a61 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1837,7 +1837,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
if (nla_put_string(skb, IFLA_IFNAME, devname))
goto nla_put_failure;
- if (nla_put_u32(skb, IFLA_TXQLEN, dev->tx_queue_len) ||
+ if (nla_put_u32(skb, IFLA_TXQLEN, READ_ONCE(dev->tx_queue_len)) ||
nla_put_u8(skb, IFLA_OPERSTATE,
netif_running(dev) ? dev->operstate : IF_OPER_DOWN) ||
nla_put_u8(skb, IFLA_LINKMODE, dev->link_mode) ||
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 6292d6d73b720fef6766d08ed01d8b93a99f97b6..74afc210527d237cca3b48166be5918f802eb326 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1334,7 +1334,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
* before again attaching a qdisc.
*/
if ((dev->priv_flags & IFF_NO_QUEUE) && (dev->tx_queue_len == 0)) {
- dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
+ WRITE_ONCE(dev->tx_queue_len, DEFAULT_TX_QUEUE_LEN);
netdev_info(dev, "Caught tx_queue_len zero misconfig\n");
}
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index 59304611dc0050e525de5f45b2a3b8628b684ff3..29850d0f073308290ac1a479bc98315034990663 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -78,7 +78,7 @@ teql_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
struct net_device *dev = qdisc_dev(sch);
struct teql_sched_data *q = qdisc_priv(sch);
- if (q->q.qlen < dev->tx_queue_len) {
+ if (q->q.qlen < READ_ONCE(dev->tx_queue_len)) {
__skb_queue_tail(&q->q, skb);
return NET_XMIT_SUCCESS;
}
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH net-next 4/8] net: write once on dev->allmulti and dev->promiscuity
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
` (2 preceding siblings ...)
2024-05-03 19:20 ` [PATCH net-next 3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output Eric Dumazet
@ 2024-05-03 19:20 ` Eric Dumazet
2024-05-05 14:54 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes Eric Dumazet
` (4 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-03 19:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
In the following patch we want to read dev->allmulti
and dev->promiscuity locklessly from rtnl_fill_ifinfo()
In this patch I change __dev_set_promiscuity() and
__dev_set_allmulti() to write these fields (and dev->flags)
only if they succeed, with WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 9c8c2ab2d76c3587d9114bc86a395341e1fd4d2b..35ce603ffc57fa209dc9a57e60981a2bca5e6a29 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8544,27 +8544,29 @@ static void dev_change_rx_flags(struct net_device *dev, int flags)
static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify)
{
unsigned int old_flags = dev->flags;
+ unsigned int promiscuity, flags;
kuid_t uid;
kgid_t gid;
ASSERT_RTNL();
- dev->flags |= IFF_PROMISC;
- dev->promiscuity += inc;
- if (dev->promiscuity == 0) {
+ promiscuity = dev->promiscuity + inc;
+ if (promiscuity == 0) {
/*
* Avoid overflow.
* If inc causes overflow, untouch promisc and return error.
*/
- if (inc < 0)
- dev->flags &= ~IFF_PROMISC;
- else {
- dev->promiscuity -= inc;
+ if (unlikely(inc > 0)) {
netdev_warn(dev, "promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken.\n");
return -EOVERFLOW;
}
+ flags = old_flags & ~IFF_PROMISC;
+ } else {
+ flags = old_flags | IFF_PROMISC;
}
- if (dev->flags != old_flags) {
+ WRITE_ONCE(dev->promiscuity, promiscuity);
+ if (flags != old_flags) {
+ WRITE_ONCE(dev->flags, flags);
netdev_info(dev, "%s promiscuous mode\n",
dev->flags & IFF_PROMISC ? "entered" : "left");
if (audit_enabled) {
@@ -8615,25 +8617,27 @@ EXPORT_SYMBOL(dev_set_promiscuity);
static int __dev_set_allmulti(struct net_device *dev, int inc, bool notify)
{
unsigned int old_flags = dev->flags, old_gflags = dev->gflags;
+ unsigned int allmulti, flags;
ASSERT_RTNL();
- dev->flags |= IFF_ALLMULTI;
- dev->allmulti += inc;
- if (dev->allmulti == 0) {
+ allmulti = dev->allmulti + inc;
+ if (allmulti == 0) {
/*
* Avoid overflow.
* If inc causes overflow, untouch allmulti and return error.
*/
- if (inc < 0)
- dev->flags &= ~IFF_ALLMULTI;
- else {
- dev->allmulti -= inc;
+ if (unlikely(inc > 0)) {
netdev_warn(dev, "allmulti touches roof, set allmulti failed. allmulti feature of device might be broken.\n");
return -EOVERFLOW;
}
+ flags = old_flags & ~IFF_ALLMULTI;
+ } else {
+ flags = old_flags | IFF_ALLMULTI;
}
- if (dev->flags ^ old_flags) {
+ WRITE_ONCE(dev->allmulti, allmulti);
+ if (flags != old_flags) {
+ WRITE_ONCE(dev->flags, flags);
netdev_info(dev, "%s allmulticast mode\n",
dev->flags & IFF_ALLMULTI ? "entered" : "left");
dev_change_rx_flags(dev, IFF_ALLMULTI);
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
` (3 preceding siblings ...)
2024-05-03 19:20 ` [PATCH net-next 4/8] net: write once on dev->allmulti and dev->promiscuity Eric Dumazet
@ 2024-05-03 19:20 ` Eric Dumazet
2024-05-05 14:46 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 6/8] rtnetlink: do not depend on RTNL in rtnl_fill_proto_down() Eric Dumazet
` (3 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-03 19:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
Following device fields can be read locklessly
in rtnl_fill_ifinfo() :
type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
tso_max_segs, num_rx_queues.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/rtnetlink.c | 51 +++++++++++++++++++++++++++-----------------
1 file changed, 31 insertions(+), 20 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 77d14528bdefc8b655f5da37ed88d0b937f35a61..242c24e857ec4c799f0239be3371fd589a8ed191 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1603,7 +1603,8 @@ static int put_master_ifindex(struct sk_buff *skb, struct net_device *dev)
upper_dev = netdev_master_upper_dev_get_rcu(dev);
if (upper_dev)
- ret = nla_put_u32(skb, IFLA_MASTER, upper_dev->ifindex);
+ ret = nla_put_u32(skb, IFLA_MASTER,
+ READ_ONCE(upper_dev->ifindex));
rcu_read_unlock();
return ret;
@@ -1825,8 +1826,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
ifm = nlmsg_data(nlh);
ifm->ifi_family = AF_UNSPEC;
ifm->__ifi_pad = 0;
- ifm->ifi_type = dev->type;
- ifm->ifi_index = dev->ifindex;
+ ifm->ifi_type = READ_ONCE(dev->type);
+ ifm->ifi_index = READ_ONCE(dev->ifindex);
ifm->ifi_flags = dev_get_flags(dev);
ifm->ifi_change = change;
@@ -1839,24 +1840,34 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
if (nla_put_u32(skb, IFLA_TXQLEN, READ_ONCE(dev->tx_queue_len)) ||
nla_put_u8(skb, IFLA_OPERSTATE,
- netif_running(dev) ? dev->operstate : IF_OPER_DOWN) ||
- nla_put_u8(skb, IFLA_LINKMODE, dev->link_mode) ||
- nla_put_u32(skb, IFLA_MTU, dev->mtu) ||
- nla_put_u32(skb, IFLA_MIN_MTU, dev->min_mtu) ||
- nla_put_u32(skb, IFLA_MAX_MTU, dev->max_mtu) ||
- nla_put_u32(skb, IFLA_GROUP, dev->group) ||
- nla_put_u32(skb, IFLA_PROMISCUITY, dev->promiscuity) ||
- nla_put_u32(skb, IFLA_ALLMULTI, dev->allmulti) ||
- nla_put_u32(skb, IFLA_NUM_TX_QUEUES, dev->num_tx_queues) ||
- nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) ||
- nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) ||
- nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
- nla_put_u32(skb, IFLA_GSO_IPV4_MAX_SIZE, dev->gso_ipv4_max_size) ||
- nla_put_u32(skb, IFLA_GRO_IPV4_MAX_SIZE, dev->gro_ipv4_max_size) ||
- nla_put_u32(skb, IFLA_TSO_MAX_SIZE, dev->tso_max_size) ||
- nla_put_u32(skb, IFLA_TSO_MAX_SEGS, dev->tso_max_segs) ||
+ netif_running(dev) ? READ_ONCE(dev->operstate) :
+ IF_OPER_DOWN) ||
+ nla_put_u8(skb, IFLA_LINKMODE, READ_ONCE(dev->link_mode)) ||
+ nla_put_u32(skb, IFLA_MTU, READ_ONCE(dev->mtu)) ||
+ nla_put_u32(skb, IFLA_MIN_MTU, READ_ONCE(dev->min_mtu)) ||
+ nla_put_u32(skb, IFLA_MAX_MTU, READ_ONCE(dev->max_mtu)) ||
+ nla_put_u32(skb, IFLA_GROUP, READ_ONCE(dev->group)) ||
+ nla_put_u32(skb, IFLA_PROMISCUITY, READ_ONCE(dev->promiscuity)) ||
+ nla_put_u32(skb, IFLA_ALLMULTI, READ_ONCE(dev->allmulti)) ||
+ nla_put_u32(skb, IFLA_NUM_TX_QUEUES,
+ READ_ONCE(dev->num_tx_queues)) ||
+ nla_put_u32(skb, IFLA_GSO_MAX_SEGS,
+ READ_ONCE(dev->gso_max_segs)) ||
+ nla_put_u32(skb, IFLA_GSO_MAX_SIZE,
+ READ_ONCE(dev->gso_max_size)) ||
+ nla_put_u32(skb, IFLA_GRO_MAX_SIZE,
+ READ_ONCE(dev->gro_max_size)) ||
+ nla_put_u32(skb, IFLA_GSO_IPV4_MAX_SIZE,
+ READ_ONCE(dev->gso_ipv4_max_size)) ||
+ nla_put_u32(skb, IFLA_GRO_IPV4_MAX_SIZE,
+ READ_ONCE(dev->gro_ipv4_max_size)) ||
+ nla_put_u32(skb, IFLA_TSO_MAX_SIZE,
+ READ_ONCE(dev->tso_max_size)) ||
+ nla_put_u32(skb, IFLA_TSO_MAX_SEGS,
+ READ_ONCE(dev->tso_max_segs)) ||
#ifdef CONFIG_RPS
- nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
+ nla_put_u32(skb, IFLA_NUM_RX_QUEUES,
+ READ_ONCE(dev->num_rx_queues)) ||
#endif
put_master_ifindex(skb, dev) ||
nla_put_u8(skb, IFLA_CARRIER, netif_carrier_ok(dev)) ||
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH net-next 6/8] rtnetlink: do not depend on RTNL in rtnl_fill_proto_down()
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
` (4 preceding siblings ...)
2024-05-03 19:20 ` [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes Eric Dumazet
@ 2024-05-03 19:20 ` Eric Dumazet
2024-05-05 14:52 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 7/8] rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb() Eric Dumazet
` (2 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-03 19:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
Change dev_change_proto_down() and dev_change_proto_down_reason()
to write once on dev->proto_down and dev->proto_down_reason.
Then rtnl_fill_proto_down() can use READ_ONCE() annotations
and run locklessly.
rtnl_proto_down_size() should assume worst case,
because readng dev->proto_down_reason multiple
times would be racy without RTNL in the future.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 11 +++++++----
net/core/rtnetlink.c | 8 ++++----
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 35ce603ffc57fa209dc9a57e60981a2bca5e6a29..06304f07146c7f672573a977b069e7f2c031122e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9223,7 +9223,7 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
netif_carrier_off(dev);
else
netif_carrier_on(dev);
- dev->proto_down = proto_down;
+ WRITE_ONCE(dev->proto_down, proto_down);
return 0;
}
@@ -9237,18 +9237,21 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
void dev_change_proto_down_reason(struct net_device *dev, unsigned long mask,
u32 value)
{
+ u32 proto_down_reason;
int b;
if (!mask) {
- dev->proto_down_reason = value;
+ proto_down_reason = value;
} else {
+ proto_down_reason = dev->proto_down_reason;
for_each_set_bit(b, &mask, 32) {
if (value & (1 << b))
- dev->proto_down_reason |= BIT(b);
+ proto_down_reason |= BIT(b);
else
- dev->proto_down_reason &= ~BIT(b);
+ proto_down_reason &= ~BIT(b);
}
}
+ WRITE_ONCE(dev->proto_down_reason, proto_down_reason);
}
struct bpf_xdp_link {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 242c24e857ec4c799f0239be3371fd589a8ed191..6af7f00503b43d4989d0aaafc8b968216a6e77f5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1036,8 +1036,8 @@ static size_t rtnl_proto_down_size(const struct net_device *dev)
{
size_t size = nla_total_size(1);
- if (dev->proto_down_reason)
- size += nla_total_size(0) + nla_total_size(4);
+ /* Assume dev->proto_down_reason is not zero. */
+ size += nla_total_size(0) + nla_total_size(4);
return size;
}
@@ -1737,10 +1737,10 @@ static int rtnl_fill_proto_down(struct sk_buff *skb,
struct nlattr *pr;
u32 preason;
- if (nla_put_u8(skb, IFLA_PROTO_DOWN, dev->proto_down))
+ if (nla_put_u8(skb, IFLA_PROTO_DOWN, READ_ONCE(dev->proto_down)))
goto nla_put_failure;
- preason = dev->proto_down_reason;
+ preason = READ_ONCE(dev->proto_down_reason);
if (!preason)
return 0;
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH net-next 7/8] rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb()
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
` (5 preceding siblings ...)
2024-05-03 19:20 ` [PATCH net-next 6/8] rtnetlink: do not depend on RTNL in rtnl_fill_proto_down() Eric Dumazet
@ 2024-05-03 19:20 ` Eric Dumazet
2024-05-05 14:53 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 8/8] rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection Eric Dumazet
2024-05-07 9:40 ` [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() patchwork-bot+netdevbpf
8 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-03 19:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
dev->xdp_prog is protected by RCU, we can lift RTNL requirement
from rtnl_xdp_prog_skb().
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/rtnetlink.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6af7f00503b43d4989d0aaafc8b968216a6e77f5..66e5be7b92686deb03f58ee43c9707470b8c70d6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1477,13 +1477,15 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb,
static u32 rtnl_xdp_prog_skb(struct net_device *dev)
{
const struct bpf_prog *generic_xdp_prog;
+ u32 res = 0;
- ASSERT_RTNL();
+ rcu_read_lock();
+ generic_xdp_prog = rcu_dereference(dev->xdp_prog);
+ if (generic_xdp_prog)
+ res = generic_xdp_prog->aux->id;
+ rcu_read_unlock();
- generic_xdp_prog = rtnl_dereference(dev->xdp_prog);
- if (!generic_xdp_prog)
- return 0;
- return generic_xdp_prog->aux->id;
+ return res;
}
static u32 rtnl_xdp_prog_drv(struct net_device *dev)
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH net-next 8/8] rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
` (6 preceding siblings ...)
2024-05-03 19:20 ` [PATCH net-next 7/8] rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb() Eric Dumazet
@ 2024-05-03 19:20 ` Eric Dumazet
2024-05-05 14:54 ` Simon Horman
2024-05-07 9:40 ` [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() patchwork-bot+netdevbpf
8 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-03 19:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.
All rtnl_link_ops->get_link_net() methods already using dev_net()
are ready. I added READ_ONCE() annotations on others.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/net/ppp/ppp_generic.c | 2 +-
drivers/net/vxlan/vxlan_core.c | 2 +-
net/core/rtnetlink.c | 5 ++---
net/ipv4/ip_tunnel.c | 2 +-
net/ipv6/ip6_tunnel.c | 2 +-
net/xfrm/xfrm_interface_core.c | 2 +-
6 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index fe380fe196e7b4a1ab4a6f15569d258132c00bac..0a65b6d690feb9fb5d4f1a2046d6195ac0bd39f9 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1357,7 +1357,7 @@ static struct net *ppp_nl_get_link_net(const struct net_device *dev)
{
struct ppp *ppp = netdev_priv(dev);
- return ppp->ppp_net;
+ return READ_ONCE(ppp->ppp_net);
}
static struct rtnl_link_ops ppp_link_ops __read_mostly = {
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 8884913e04738b32848b951c671ae3ede9a828e7..7e3a7d1f2018120fbe729eb5c53a6783f492ead6 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -4569,7 +4569,7 @@ static struct net *vxlan_get_link_net(const struct net_device *dev)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
- return vxlan->net;
+ return READ_ONCE(vxlan->net);
}
static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 66e5be7b92686deb03f58ee43c9707470b8c70d6..91ba27e9169664126f15d19be0299151ce722cd4 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1923,9 +1923,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
goto nla_put_failure;
}
- if (rtnl_fill_link_netnsid(skb, dev, src_net, gfp))
- goto nla_put_failure;
-
if (new_nsid &&
nla_put_s32(skb, IFLA_NEW_NETNSID, *new_nsid) < 0)
goto nla_put_failure;
@@ -1938,6 +1935,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
goto nla_put_failure;
rcu_read_lock();
+ if (rtnl_fill_link_netnsid(skb, dev, src_net, GFP_ATOMIC))
+ goto nla_put_failure_rcu;
qdisc = rcu_dereference(dev->qdisc);
if (qdisc && nla_put_string(skb, IFLA_QDISC, qdisc->ops->id))
goto nla_put_failure_rcu;
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index ba46cf7612f4fc2cba33c098933b6578dd885587..f1c5f6c3f2f82e19b8e9b696c6900948a946bacc 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -1120,7 +1120,7 @@ struct net *ip_tunnel_get_link_net(const struct net_device *dev)
{
struct ip_tunnel *tunnel = netdev_priv(dev);
- return tunnel->net;
+ return READ_ONCE(tunnel->net);
}
EXPORT_SYMBOL(ip_tunnel_get_link_net);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 57bb3b3ea0c5a463f0c90659fcffe9358a4084b2..5aec79c2af1a58ed1c57cca6d06951403e087d62 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -2146,7 +2146,7 @@ struct net *ip6_tnl_get_link_net(const struct net_device *dev)
{
struct ip6_tnl *tunnel = netdev_priv(dev);
- return tunnel->net;
+ return READ_ONCE(tunnel->net);
}
EXPORT_SYMBOL(ip6_tnl_get_link_net);
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index 4df5c06e3ece834039e1713377538bd7f4d12a3e..e50e4bf993fa473769a0062ffcc661daefaf1b6b 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -926,7 +926,7 @@ static struct net *xfrmi_get_link_net(const struct net_device *dev)
{
struct xfrm_if *xi = netdev_priv(dev);
- return xi->net;
+ return READ_ONCE(xi->net);
}
static const struct nla_policy xfrmi_policy[IFLA_XFRM_MAX + 1] = {
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output
2024-05-03 19:20 ` [PATCH net-next 3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output Eric Dumazet
@ 2024-05-05 14:43 ` Simon Horman
2024-05-07 9:26 ` Paolo Abeni
0 siblings, 1 reply; 27+ messages in thread
From: Simon Horman @ 2024-05-05 14:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Fri, May 03, 2024 at 07:20:54PM +0000, Eric Dumazet wrote:
> rtnl_fill_ifinfo() can read dev->tx_queue_len locklessly,
> granted we add corresponding READ_ONCE()/WRITE_ONCE() annotations.
>
> Add missing READ_ONCE(dev->tx_queue_len) in teql_enqueue()
Hi Eric,
I am wondering if READ_ONCE(caifd->netdev->tx_queue_len)
is also missing from net/caif/caif_dev.c:transmit().
...
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes
2024-05-03 19:20 ` [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes Eric Dumazet
@ 2024-05-05 14:46 ` Simon Horman
2024-05-05 15:00 ` Eric Dumazet
0 siblings, 1 reply; 27+ messages in thread
From: Simon Horman @ 2024-05-05 14:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> Following device fields can be read locklessly
> in rtnl_fill_ifinfo() :
>
> type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> tso_max_segs, num_rx_queues.
Hi Eric,
* Regarding mtu, as the comment you added to sruct net_device
some time ago mentions, mtu is written in many places.
I'm wondering if, in particular wrt ndo_change_mtu implementations,
if some it is appropriate to add WRITE_ONCE() annotations.
* Likewise, is it appropriate to add WRITE_ONCE() to dev_set_group() ?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 1/8] rtnetlink: do not depend on RTNL for IFLA_QDISC output
2024-05-03 19:20 ` [PATCH net-next 1/8] rtnetlink: do not depend on RTNL for IFLA_QDISC output Eric Dumazet
@ 2024-05-05 14:48 ` Simon Horman
0 siblings, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-05-05 14:48 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Fri, May 03, 2024 at 07:20:52PM +0000, Eric Dumazet wrote:
> dev->qdisc can be read using RCU protection.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 2/8] rtnetlink: do not depend on RTNL for IFLA_IFNAME output
2024-05-03 19:20 ` [PATCH net-next 2/8] rtnetlink: do not depend on RTNL for IFLA_IFNAME output Eric Dumazet
@ 2024-05-05 14:49 ` Simon Horman
0 siblings, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-05-05 14:49 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Fri, May 03, 2024 at 07:20:53PM +0000, Eric Dumazet wrote:
> We can use netdev_copy_name() to no longer rely on RTNL
> to fetch dev->name.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 6/8] rtnetlink: do not depend on RTNL in rtnl_fill_proto_down()
2024-05-03 19:20 ` [PATCH net-next 6/8] rtnetlink: do not depend on RTNL in rtnl_fill_proto_down() Eric Dumazet
@ 2024-05-05 14:52 ` Simon Horman
0 siblings, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-05-05 14:52 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Fri, May 03, 2024 at 07:20:57PM +0000, Eric Dumazet wrote:
> Change dev_change_proto_down() and dev_change_proto_down_reason()
> to write once on dev->proto_down and dev->proto_down_reason.
>
> Then rtnl_fill_proto_down() can use READ_ONCE() annotations
> and run locklessly.
>
> rtnl_proto_down_size() should assume worst case,
> because readng dev->proto_down_reason multiple
> times would be racy without RTNL in the future.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 7/8] rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb()
2024-05-03 19:20 ` [PATCH net-next 7/8] rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb() Eric Dumazet
@ 2024-05-05 14:53 ` Simon Horman
0 siblings, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-05-05 14:53 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Fri, May 03, 2024 at 07:20:58PM +0000, Eric Dumazet wrote:
> dev->xdp_prog is protected by RCU, we can lift RTNL requirement
> from rtnl_xdp_prog_skb().
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 8/8] rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection
2024-05-03 19:20 ` [PATCH net-next 8/8] rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection Eric Dumazet
@ 2024-05-05 14:54 ` Simon Horman
0 siblings, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-05-05 14:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Fri, May 03, 2024 at 07:20:59PM +0000, Eric Dumazet wrote:
> We want to be able to run rtnl_fill_ifinfo() under RCU protection
> instead of RTNL in the future.
>
> All rtnl_link_ops->get_link_net() methods already using dev_net()
> are ready. I added READ_ONCE() annotations on others.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 4/8] net: write once on dev->allmulti and dev->promiscuity
2024-05-03 19:20 ` [PATCH net-next 4/8] net: write once on dev->allmulti and dev->promiscuity Eric Dumazet
@ 2024-05-05 14:54 ` Simon Horman
0 siblings, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-05-05 14:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Fri, May 03, 2024 at 07:20:55PM +0000, Eric Dumazet wrote:
> In the following patch we want to read dev->allmulti
> and dev->promiscuity locklessly from rtnl_fill_ifinfo()
>
> In this patch I change __dev_set_promiscuity() and
> __dev_set_allmulti() to write these fields (and dev->flags)
> only if they succeed, with WRITE_ONCE() annotations.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes
2024-05-05 14:46 ` Simon Horman
@ 2024-05-05 15:00 ` Eric Dumazet
2024-05-05 15:06 ` Simon Horman
0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-05 15:00 UTC (permalink / raw)
To: Simon Horman
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
>
> On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > Following device fields can be read locklessly
> > in rtnl_fill_ifinfo() :
> >
> > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > tso_max_segs, num_rx_queues.
>
> Hi Eric,
>
> * Regarding mtu, as the comment you added to sruct net_device
> some time ago mentions, mtu is written in many places.
>
> I'm wondering if, in particular wrt ndo_change_mtu implementations,
> if some it is appropriate to add WRITE_ONCE() annotations.
Sure thing. I called for these changes in commit
501a90c94510 ("inet: protect against too small mtu values.")
when I said "Hopefully we will add the missing ones in followup patches."
>
> * Likewise, is it appropriate to add WRITE_ONCE() to dev_set_group() ?
In general, a lot of write sides would need to be changed.
In practice, most compilers will not perform store tearing, this would
be quite expensive.
Also, adding WRITE_ONCE() will not prevent a reader from reading some
temporary values,
take a look at dev_change_tx_queue_len() for instance.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes
2024-05-05 15:00 ` Eric Dumazet
@ 2024-05-05 15:06 ` Simon Horman
2024-05-05 15:14 ` Eric Dumazet
0 siblings, 1 reply; 27+ messages in thread
From: Simon Horman @ 2024-05-05 15:06 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> >
> > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > Following device fields can be read locklessly
> > > in rtnl_fill_ifinfo() :
> > >
> > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > tso_max_segs, num_rx_queues.
> >
> > Hi Eric,
> >
> > * Regarding mtu, as the comment you added to sruct net_device
> > some time ago mentions, mtu is written in many places.
> >
> > I'm wondering if, in particular wrt ndo_change_mtu implementations,
> > if some it is appropriate to add WRITE_ONCE() annotations.
>
> Sure thing. I called for these changes in commit
> 501a90c94510 ("inet: protect against too small mtu values.")
> when I said "Hopefully we will add the missing ones in followup patches."
Ok, so basically it would be nice to add them,
but they don't block progress of this patchset?
> > * Likewise, is it appropriate to add WRITE_ONCE() to dev_set_group() ?
>
> In general, a lot of write sides would need to be changed.
>
> In practice, most compilers will not perform store tearing, this would
> be quite expensive.
>
> Also, adding WRITE_ONCE() will not prevent a reader from reading some
> temporary values,
> take a look at dev_change_tx_queue_len() for instance.
Thank you, I will study this more closely.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes
2024-05-05 15:06 ` Simon Horman
@ 2024-05-05 15:14 ` Eric Dumazet
2024-05-07 16:38 ` Simon Horman
0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-05 15:14 UTC (permalink / raw)
To: Simon Horman
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote:
>
> On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> > >
> > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > > Following device fields can be read locklessly
> > > > in rtnl_fill_ifinfo() :
> > > >
> > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > > tso_max_segs, num_rx_queues.
> > >
> > > Hi Eric,
> > >
> > > * Regarding mtu, as the comment you added to sruct net_device
> > > some time ago mentions, mtu is written in many places.
> > >
> > > I'm wondering if, in particular wrt ndo_change_mtu implementations,
> > > if some it is appropriate to add WRITE_ONCE() annotations.
> >
> > Sure thing. I called for these changes in commit
> > 501a90c94510 ("inet: protect against too small mtu values.")
> > when I said "Hopefully we will add the missing ones in followup patches."
>
> Ok, so basically it would be nice to add them,
> but they don't block progress of this patchset?
A patch set adding WRITE_ONCE() on all dev->mtu would be great,
and seems orthogonal. Note that we already have many points in the
stack where dev->mtu is read locklessly.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output
2024-05-05 14:43 ` Simon Horman
@ 2024-05-07 9:26 ` Paolo Abeni
2024-05-07 9:35 ` Eric Dumazet
2024-05-07 16:32 ` Simon Horman
0 siblings, 2 replies; 27+ messages in thread
From: Paolo Abeni @ 2024-05-07 9:26 UTC (permalink / raw)
To: Simon Horman, Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, netdev, eric.dumazet
On Sun, 2024-05-05 at 15:43 +0100, Simon Horman wrote:
> On Fri, May 03, 2024 at 07:20:54PM +0000, Eric Dumazet wrote:
> > rtnl_fill_ifinfo() can read dev->tx_queue_len locklessly,
> > granted we add corresponding READ_ONCE()/WRITE_ONCE() annotations.
> >
> > Add missing READ_ONCE(dev->tx_queue_len) in teql_enqueue()
>
> Hi Eric,
>
> I am wondering if READ_ONCE(caifd->netdev->tx_queue_len)
> is also missing from net/caif/caif_dev.c:transmit().
I agree such read is outside the rtnl lock and could use a READ_ONCE
annotation. I think it's better to handle that as an eventual follow-up
instead of blocking this series.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output
2024-05-07 9:26 ` Paolo Abeni
@ 2024-05-07 9:35 ` Eric Dumazet
2024-05-07 16:32 ` Simon Horman
1 sibling, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2024-05-07 9:35 UTC (permalink / raw)
To: Paolo Abeni
Cc: Simon Horman, David S. Miller, Jakub Kicinski, netdev,
eric.dumazet
On Tue, May 7, 2024 at 11:26 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Sun, 2024-05-05 at 15:43 +0100, Simon Horman wrote:
> > On Fri, May 03, 2024 at 07:20:54PM +0000, Eric Dumazet wrote:
> > > rtnl_fill_ifinfo() can read dev->tx_queue_len locklessly,
> > > granted we add corresponding READ_ONCE()/WRITE_ONCE() annotations.
> > >
> > > Add missing READ_ONCE(dev->tx_queue_len) in teql_enqueue()
> >
> > Hi Eric,
> >
> > I am wondering if READ_ONCE(caifd->netdev->tx_queue_len)
> > is also missing from net/caif/caif_dev.c:transmit().
>
> I agree such read is outside the rtnl lock and could use a READ_ONCE
> annotation. I think it's better to handle that as an eventual follow-up
> instead of blocking this series.
I missed Simon feedback, sorry.
Yes, we can add missing READ_ONCE() as follow ups.
They are all orthogonal.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo()
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
` (7 preceding siblings ...)
2024-05-03 19:20 ` [PATCH net-next 8/8] rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection Eric Dumazet
@ 2024-05-07 9:40 ` patchwork-bot+netdevbpf
8 siblings, 0 replies; 27+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-05-07 9:40 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, kuba, pabeni, netdev, eric.dumazet
Hello:
This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Fri, 3 May 2024 19:20:51 +0000 you wrote:
> We want to no longer rely on RTNL for "ip link show" command.
>
> This is a long road, this series takes care of some parts.
>
> Eric Dumazet (8):
> rtnetlink: do not depend on RTNL for IFLA_QDISC output
> rtnetlink: do not depend on RTNL for IFLA_IFNAME output
> rtnetlink: do not depend on RTNL for IFLA_TXQLEN output
> net: write once on dev->allmulti and dev->promiscuity
> rtnetlink: do not depend on RTNL for many attributes
> rtnetlink: do not depend on RTNL in rtnl_fill_proto_down()
> rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb()
> rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection
>
> [...]
Here is the summary with links:
- [net-next,1/8] rtnetlink: do not depend on RTNL for IFLA_QDISC output
https://git.kernel.org/netdev/net-next/c/698419ffb6fc
- [net-next,2/8] rtnetlink: do not depend on RTNL for IFLA_IFNAME output
https://git.kernel.org/netdev/net-next/c/8a5826813362
- [net-next,3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output
https://git.kernel.org/netdev/net-next/c/ad13b5b0d1f9
- [net-next,4/8] net: write once on dev->allmulti and dev->promiscuity
https://git.kernel.org/netdev/net-next/c/55a2c86c8db3
- [net-next,5/8] rtnetlink: do not depend on RTNL for many attributes
https://git.kernel.org/netdev/net-next/c/6747a5d4990b
- [net-next,6/8] rtnetlink: do not depend on RTNL in rtnl_fill_proto_down()
https://git.kernel.org/netdev/net-next/c/6890ab31d1a3
- [net-next,7/8] rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb()
https://git.kernel.org/netdev/net-next/c/979aad40da92
- [net-next,8/8] rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection
https://git.kernel.org/netdev/net-next/c/9cf621bd5fcb
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output
2024-05-07 9:26 ` Paolo Abeni
2024-05-07 9:35 ` Eric Dumazet
@ 2024-05-07 16:32 ` Simon Horman
1 sibling, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-05-07 16:32 UTC (permalink / raw)
To: Paolo Abeni
Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, netdev,
eric.dumazet
On Tue, May 07, 2024 at 11:26:10AM +0200, Paolo Abeni wrote:
> On Sun, 2024-05-05 at 15:43 +0100, Simon Horman wrote:
> > On Fri, May 03, 2024 at 07:20:54PM +0000, Eric Dumazet wrote:
> > > rtnl_fill_ifinfo() can read dev->tx_queue_len locklessly,
> > > granted we add corresponding READ_ONCE()/WRITE_ONCE() annotations.
> > >
> > > Add missing READ_ONCE(dev->tx_queue_len) in teql_enqueue()
> >
> > Hi Eric,
> >
> > I am wondering if READ_ONCE(caifd->netdev->tx_queue_len)
> > is also missing from net/caif/caif_dev.c:transmit().
>
> I agree such read is outside the rtnl lock and could use a READ_ONCE
> annotation. I think it's better to handle that as an eventual follow-up
> instead of blocking this series.
Yes, that is fine by me too.
And I am happy to add it to my TODO list if no one else wants to take it.
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes
2024-05-05 15:14 ` Eric Dumazet
@ 2024-05-07 16:38 ` Simon Horman
2024-05-07 16:39 ` Eric Dumazet
0 siblings, 1 reply; 27+ messages in thread
From: Simon Horman @ 2024-05-07 16:38 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Sun, May 05, 2024 at 05:14:58PM +0200, Eric Dumazet wrote:
> On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote:
> >
> > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> > > >
> > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > > > Following device fields can be read locklessly
> > > > > in rtnl_fill_ifinfo() :
> > > > >
> > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > > > tso_max_segs, num_rx_queues.
> > > >
> > > > Hi Eric,
> > > >
> > > > * Regarding mtu, as the comment you added to sruct net_device
> > > > some time ago mentions, mtu is written in many places.
> > > >
> > > > I'm wondering if, in particular wrt ndo_change_mtu implementations,
> > > > if some it is appropriate to add WRITE_ONCE() annotations.
> > >
> > > Sure thing. I called for these changes in commit
> > > 501a90c94510 ("inet: protect against too small mtu values.")
> > > when I said "Hopefully we will add the missing ones in followup patches."
> >
> > Ok, so basically it would be nice to add them,
> > but they don't block progress of this patchset?
>
> A patch set adding WRITE_ONCE() on all dev->mtu would be great,
> and seems orthogonal.
Ack. I'm guessing an incremental approach to getting better coverage would
be best. I'll add this to my todo list.
> Note that we already have many points in the
> stack where dev->mtu is read locklessly.
Understood.
For the record, I have no objections to this patch.
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes
2024-05-07 16:38 ` Simon Horman
@ 2024-05-07 16:39 ` Eric Dumazet
2024-05-07 16:55 ` Simon Horman
0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2024-05-07 16:39 UTC (permalink / raw)
To: Simon Horman
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Tue, May 7, 2024 at 6:38 PM Simon Horman <horms@kernel.org> wrote:
>
> On Sun, May 05, 2024 at 05:14:58PM +0200, Eric Dumazet wrote:
> > On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote:
> > >
> > > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> > > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> > > > >
> > > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > > > > Following device fields can be read locklessly
> > > > > > in rtnl_fill_ifinfo() :
> > > > > >
> > > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > > > > tso_max_segs, num_rx_queues.
> > > > >
> > > > > Hi Eric,
> > > > >
> > > > > * Regarding mtu, as the comment you added to sruct net_device
> > > > > some time ago mentions, mtu is written in many places.
> > > > >
> > > > > I'm wondering if, in particular wrt ndo_change_mtu implementations,
> > > > > if some it is appropriate to add WRITE_ONCE() annotations.
> > > >
> > > > Sure thing. I called for these changes in commit
> > > > 501a90c94510 ("inet: protect against too small mtu values.")
> > > > when I said "Hopefully we will add the missing ones in followup patches."
> > >
> > > Ok, so basically it would be nice to add them,
> > > but they don't block progress of this patchset?
> >
> > A patch set adding WRITE_ONCE() on all dev->mtu would be great,
> > and seems orthogonal.
>
> Ack. I'm guessing an incremental approach to getting better coverage would
> be best. I'll add this to my todo list.
I sent a single patch about that already ;)
https://patchwork.kernel.org/project/netdevbpf/patch/20240506102812.3025432-1-edumazet@google.com/
Thanks !
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes
2024-05-07 16:39 ` Eric Dumazet
@ 2024-05-07 16:55 ` Simon Horman
0 siblings, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-05-07 16:55 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet
On Tue, May 07, 2024 at 06:39:54PM +0200, Eric Dumazet wrote:
> On Tue, May 7, 2024 at 6:38 PM Simon Horman <horms@kernel.org> wrote:
> >
> > On Sun, May 05, 2024 at 05:14:58PM +0200, Eric Dumazet wrote:
> > > On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote:
> > > >
> > > > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> > > > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> > > > > >
> > > > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > > > > > Following device fields can be read locklessly
> > > > > > > in rtnl_fill_ifinfo() :
> > > > > > >
> > > > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > > > > > tso_max_segs, num_rx_queues.
> > > > > >
> > > > > > Hi Eric,
> > > > > >
> > > > > > * Regarding mtu, as the comment you added to sruct net_device
> > > > > > some time ago mentions, mtu is written in many places.
> > > > > >
> > > > > > I'm wondering if, in particular wrt ndo_change_mtu implementations,
> > > > > > if some it is appropriate to add WRITE_ONCE() annotations.
> > > > >
> > > > > Sure thing. I called for these changes in commit
> > > > > 501a90c94510 ("inet: protect against too small mtu values.")
> > > > > when I said "Hopefully we will add the missing ones in followup patches."
> > > >
> > > > Ok, so basically it would be nice to add them,
> > > > but they don't block progress of this patchset?
> > >
> > > A patch set adding WRITE_ONCE() on all dev->mtu would be great,
> > > and seems orthogonal.
> >
> > Ack. I'm guessing an incremental approach to getting better coverage would
> > be best. I'll add this to my todo list.
>
> I sent a single patch about that already ;)
:)
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2024-05-07 16:55 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-03 19:20 [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() Eric Dumazet
2024-05-03 19:20 ` [PATCH net-next 1/8] rtnetlink: do not depend on RTNL for IFLA_QDISC output Eric Dumazet
2024-05-05 14:48 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 2/8] rtnetlink: do not depend on RTNL for IFLA_IFNAME output Eric Dumazet
2024-05-05 14:49 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 3/8] rtnetlink: do not depend on RTNL for IFLA_TXQLEN output Eric Dumazet
2024-05-05 14:43 ` Simon Horman
2024-05-07 9:26 ` Paolo Abeni
2024-05-07 9:35 ` Eric Dumazet
2024-05-07 16:32 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 4/8] net: write once on dev->allmulti and dev->promiscuity Eric Dumazet
2024-05-05 14:54 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 5/8] rtnetlink: do not depend on RTNL for many attributes Eric Dumazet
2024-05-05 14:46 ` Simon Horman
2024-05-05 15:00 ` Eric Dumazet
2024-05-05 15:06 ` Simon Horman
2024-05-05 15:14 ` Eric Dumazet
2024-05-07 16:38 ` Simon Horman
2024-05-07 16:39 ` Eric Dumazet
2024-05-07 16:55 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 6/8] rtnetlink: do not depend on RTNL in rtnl_fill_proto_down() Eric Dumazet
2024-05-05 14:52 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 7/8] rtnetlink: do not depend on RTNL in rtnl_xdp_prog_skb() Eric Dumazet
2024-05-05 14:53 ` Simon Horman
2024-05-03 19:20 ` [PATCH net-next 8/8] rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protection Eric Dumazet
2024-05-05 14:54 ` Simon Horman
2024-05-07 9:40 ` [PATCH net-next 0/8] rtnetlink: more rcu conversions for rtnl_fill_ifinfo() patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).