* [PATCH v2 net 01/16] net: add dev_net_rcu() helper
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 02/16] ipv4: add RCU protection to ip4_dst_hoplimit() Eric Dumazet
` (14 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
dev->nd_net can change, readers should either
use rcu_read_lock() or RTNL.
We currently use a generic helper, dev_net() with
no debugging support. We probably have many hidden bugs.
Add dev_net_rcu() helper for callers using rcu_read_lock()
protection.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/linux/netdevice.h | 6 ++++++
include/net/net_namespace.h | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2a59034a5fa2fb53300657968c2053ab354bb746..046015adf2856f859b9a671e2be4ef674125ef96 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2663,6 +2663,12 @@ struct net *dev_net(const struct net_device *dev)
return read_pnet(&dev->nd_net);
}
+static inline
+struct net *dev_net_rcu(const struct net_device *dev)
+{
+ return read_pnet_rcu(&dev->nd_net);
+}
+
static inline
void dev_net_set(struct net_device *dev, struct net *net)
{
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 0f5eb9db0c6264efc1ac83ab577511fd6823f4fe..7ba1402ca7796663bed3373b1a0c6a0249cd1599 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -398,7 +398,7 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
#endif
}
-static inline struct net *read_pnet_rcu(possible_net_t *pnet)
+static inline struct net *read_pnet_rcu(const possible_net_t *pnet)
{
#ifdef CONFIG_NET_NS
return rcu_dereference(pnet->net);
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 02/16] ipv4: add RCU protection to ip4_dst_hoplimit()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 01/16] net: add dev_net_rcu() helper Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 03/16] ipv4: use RCU protection in ip_dst_mtu_maybe_forward() Eric Dumazet
` (13 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
ip4_dst_hoplimit() must use RCU protection to make
sure the net structure it reads does not disappear.
Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/net/route.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index f86775be3e2934697533a61f566aca1ef196d74e..c605fd5ec0c08cc7658c3cf6aa6223790d463ede 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -382,10 +382,15 @@ static inline int inet_iif(const struct sk_buff *skb)
static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
{
int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
- struct net *net = dev_net(dst->dev);
- if (hoplimit == 0)
+ if (hoplimit == 0) {
+ const struct net *net;
+
+ rcu_read_lock();
+ net = dev_net_rcu(dst->dev);
hoplimit = READ_ONCE(net->ipv4.sysctl_ip_default_ttl);
+ rcu_read_unlock();
+ }
return hoplimit;
}
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 03/16] ipv4: use RCU protection in ip_dst_mtu_maybe_forward()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 01/16] net: add dev_net_rcu() helper Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 02/16] ipv4: add RCU protection to ip4_dst_hoplimit() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 04/16] ipv4: use RCU protection in ipv4_default_advmss() Eric Dumazet
` (12 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
ip_dst_mtu_maybe_forward() must use RCU protection to make
sure the net structure it reads does not disappear.
Fixes: f87c10a8aa1e8 ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/net/ip.h | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index 9f5e33e371fcdd8ea88c54584b8d4b6c50e7d0c9..ba7b43447775e51b3b9a8cbf5c3345d6308bb525 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -471,9 +471,12 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
bool forwarding)
{
const struct rtable *rt = dst_rtable(dst);
- struct net *net = dev_net(dst->dev);
- unsigned int mtu;
+ unsigned int mtu, res;
+ struct net *net;
+
+ rcu_read_lock();
+ net = dev_net_rcu(dst->dev);
if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
ip_mtu_locked(dst) ||
!forwarding) {
@@ -497,7 +500,11 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
out:
mtu = min_t(unsigned int, mtu, IP_MAX_MTU);
- return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
+ res = mtu - lwtunnel_headroom(dst->lwtstate, mtu);
+
+ rcu_read_unlock();
+
+ return res;
}
static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 04/16] ipv4: use RCU protection in ipv4_default_advmss()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (2 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 03/16] ipv4: use RCU protection in ip_dst_mtu_maybe_forward() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 05/16] ipv4: use RCU protection in rt_is_expired() Eric Dumazet
` (11 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
ipv4_default_advmss() must use RCU protection to make
sure the net structure it reads does not disappear.
Fixes: 2e9589ff809e ("ipv4: Namespaceify min_adv_mss sysctl knob")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv4/route.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 577b88a43293aa801c3ee736d7e5cc4d97917717..74c074f45758be5ae78a87edb31837481cc40278 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1307,10 +1307,15 @@ static void set_class_tag(struct rtable *rt, u32 tag)
static unsigned int ipv4_default_advmss(const struct dst_entry *dst)
{
- struct net *net = dev_net(dst->dev);
unsigned int header_size = sizeof(struct tcphdr) + sizeof(struct iphdr);
- unsigned int advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size,
- net->ipv4.ip_rt_min_advmss);
+ unsigned int advmss;
+ struct net *net;
+
+ rcu_read_lock();
+ net = dev_net_rcu(dst->dev);
+ advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size,
+ net->ipv4.ip_rt_min_advmss);
+ rcu_read_unlock();
return min(advmss, IPV4_MAX_PMTU - header_size);
}
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 05/16] ipv4: use RCU protection in rt_is_expired()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (3 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 04/16] ipv4: use RCU protection in ipv4_default_advmss() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 06/16] tcp: convert to dev_net_rcu() Eric Dumazet
` (10 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
rt_is_expired() must use RCU protection to make
sure the net structure it reads does not disappear.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv4/route.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 74c074f45758be5ae78a87edb31837481cc40278..e959327c0ba8979ce5c7ca8c46ae41068824edc6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -390,7 +390,13 @@ static inline int ip_rt_proc_init(void)
static inline bool rt_is_expired(const struct rtable *rth)
{
- return rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev));
+ bool res;
+
+ rcu_read_lock();
+ res = rth->rt_genid != rt_genid_ipv4(dev_net_rcu(rth->dst.dev));
+ rcu_read_unlock();
+
+ return res;
}
void rt_cache_flush(struct net *net)
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 06/16] tcp: convert to dev_net_rcu()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (4 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 05/16] ipv4: use RCU protection in rt_is_expired() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 07/16] net: gro: convert four dev_net() calls Eric Dumazet
` (9 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
TCP uses of dev_net() are safe, change them to dev_net_rcu()
to get LOCKDEP support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/net/inet6_hashtables.h | 2 +-
include/net/inet_hashtables.h | 2 +-
net/ipv4/tcp_ipv4.c | 8 ++++----
net/ipv4/tcp_metrics.c | 6 +++---
net/ipv6/tcp_ipv6.c | 10 +++++-----
5 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index 74dd90ff5f129fe4c8adad67a642ae5070410518..c32878c69179dac5a7fcfa098a297420d9adfab2 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -150,7 +150,7 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
int iif, int sdif,
bool *refcounted)
{
- struct net *net = dev_net(skb_dst(skb)->dev);
+ struct net *net = dev_net_rcu(skb_dst(skb)->dev);
const struct ipv6hdr *ip6h = ipv6_hdr(skb);
struct sock *sk;
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 5eea47f135a421ce8275d4cd83c5771b3f448e5c..da818fb0205fed6b4120946bc032e67e046b716f 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -492,7 +492,7 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo,
const int sdif,
bool *refcounted)
{
- struct net *net = dev_net(skb_dst(skb)->dev);
+ struct net *net = dev_net_rcu(skb_dst(skb)->dev);
const struct iphdr *iph = ip_hdr(skb);
struct sock *sk;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cc2b5194a18d2e64595f474f62c6f2fd3eff319f..3bd835220d43d6d6491fd5c8d5e9954c37303f83 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -503,7 +503,7 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
struct request_sock *fastopen;
u32 seq, snd_una;
int err;
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
iph->daddr, th->dest, iph->saddr,
@@ -788,7 +788,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb,
arg.iov[0].iov_base = (unsigned char *)&rep;
arg.iov[0].iov_len = sizeof(rep.th);
- net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev);
+ net = sk ? sock_net(sk) : dev_net_rcu(skb_dst(skb)->dev);
/* Invalid TCP option size or twice included auth */
if (tcp_parse_auth_options(tcp_hdr(skb), &md5_hash_location, &aoh))
@@ -1967,7 +1967,7 @@ EXPORT_SYMBOL(tcp_v4_do_rcv);
int tcp_v4_early_demux(struct sk_buff *skb)
{
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
const struct iphdr *iph;
const struct tcphdr *th;
struct sock *sk;
@@ -2178,7 +2178,7 @@ static void tcp_v4_fill_cb(struct sk_buff *skb, const struct iphdr *iph,
int tcp_v4_rcv(struct sk_buff *skb)
{
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
enum skb_drop_reason drop_reason;
int sdif = inet_sdif(skb);
int dif = inet_iif(skb);
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 95669935494ef8003a1877e2b86c76bd27307afd..4251670e328c83b55eff7bbda3cc3d97d78563a8 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -170,7 +170,7 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst,
bool reclaim = false;
spin_lock_bh(&tcp_metrics_lock);
- net = dev_net(dst->dev);
+ net = dev_net_rcu(dst->dev);
/* While waiting for the spin-lock the cache might have been populated
* with this entry and so we have to check again.
@@ -273,7 +273,7 @@ static struct tcp_metrics_block *__tcp_get_metrics_req(struct request_sock *req,
return NULL;
}
- net = dev_net(dst->dev);
+ net = dev_net_rcu(dst->dev);
hash ^= net_hash_mix(net);
hash = hash_32(hash, tcp_metrics_hash_log);
@@ -318,7 +318,7 @@ static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk,
else
return NULL;
- net = dev_net(dst->dev);
+ net = dev_net_rcu(dst->dev);
hash ^= net_hash_mix(net);
hash = hash_32(hash, tcp_metrics_hash_log);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2debdf085a3b4d2452b2b316cb5368507b17efc8..429f8a5ab511b671aa405ae20f7c1b3163839779 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -376,7 +376,7 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
{
const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data;
const struct tcphdr *th = (struct tcphdr *)(skb->data+offset);
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
struct request_sock *fastopen;
struct ipv6_pinfo *np;
struct tcp_sock *tp;
@@ -868,7 +868,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
struct tcphdr *t1;
struct sk_buff *buff;
struct flowi6 fl6;
- struct net *net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev);
+ struct net *net = sk ? sock_net(sk) : dev_net_rcu(skb_dst(skb)->dev);
struct sock *ctl_sk = net->ipv6.tcp_sk;
unsigned int tot_len = sizeof(struct tcphdr);
__be32 mrst = 0, *topt;
@@ -1039,7 +1039,7 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb,
if (!sk && !ipv6_unicast_destination(skb))
return;
- net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev);
+ net = sk ? sock_net(sk) : dev_net_rcu(skb_dst(skb)->dev);
/* Invalid TCP option size or twice included auth */
if (tcp_parse_auth_options(th, &md5_hash_location, &aoh))
return;
@@ -1744,6 +1744,7 @@ static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr,
INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
{
+ struct net *net = dev_net_rcu(skb->dev);
enum skb_drop_reason drop_reason;
int sdif = inet6_sdif(skb);
int dif = inet6_iif(skb);
@@ -1753,7 +1754,6 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
bool refcounted;
int ret;
u32 isn;
- struct net *net = dev_net(skb->dev);
drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
if (skb->pkt_type != PACKET_HOST)
@@ -2004,7 +2004,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
void tcp_v6_early_demux(struct sk_buff *skb)
{
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
const struct ipv6hdr *hdr;
const struct tcphdr *th;
struct sock *sk;
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 07/16] net: gro: convert four dev_net() calls
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (5 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 06/16] tcp: convert to dev_net_rcu() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 08/16] udp: convert to dev_net_rcu() Eric Dumazet
` (8 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
tcp4_check_fraglist_gro(), tcp6_check_fraglist_gro(),
udp4_gro_lookup_skb() and udp6_gro_lookup_skb()
assume RCU is held so that the net structure does not disappear.
Use dev_net_rcu() instead of dev_net() to get LOCKDEP support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv4/tcp_offload.c | 2 +-
net/ipv4/udp_offload.c | 2 +-
net/ipv6/tcpv6_offload.c | 2 +-
net/ipv6/udp_offload.c | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 2308665b51c5388814e5b61a262a1636d897c4a9..ecef16c58c07146cbeebade0620a5ec7251ddbc5 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -425,7 +425,7 @@ static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
inet_get_iif_sdif(skb, &iif, &sdif);
iph = skb_gro_network_header(skb);
- net = dev_net(skb->dev);
+ net = dev_net_rcu(skb->dev);
sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
iph->saddr, th->source,
iph->daddr, ntohs(th->dest),
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index a5be6e4ed326fbdc6a9b3889db4da903f7f25d37..c1a85b300ee87758ee683a834248a600a3e7f18d 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -630,7 +630,7 @@ static struct sock *udp4_gro_lookup_skb(struct sk_buff *skb, __be16 sport,
__be16 dport)
{
const struct iphdr *iph = skb_gro_network_header(skb);
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
int iif, sdif;
inet_get_iif_sdif(skb, &iif, &sdif);
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index a45bf17cb2a172d4612cb42f51481b97bbf364cd..91b88daa5b555cb1af591db7680b7d829ce7b1b7 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -35,7 +35,7 @@ static void tcp6_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
inet6_get_iif_sdif(skb, &iif, &sdif);
hdr = skb_gro_network_header(skb);
- net = dev_net(skb->dev);
+ net = dev_net_rcu(skb->dev);
sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
&hdr->saddr, th->source,
&hdr->daddr, ntohs(th->dest),
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index b41152dd424697a9fc3cef13fbb430de49dcb913..404212dfc99abba4d48fc27a574b48ab53731d39 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -117,7 +117,7 @@ static struct sock *udp6_gro_lookup_skb(struct sk_buff *skb, __be16 sport,
__be16 dport)
{
const struct ipv6hdr *iph = skb_gro_network_header(skb);
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
int iif, sdif;
inet6_get_iif_sdif(skb, &iif, &sdif);
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 08/16] udp: convert to dev_net_rcu()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (6 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 07/16] net: gro: convert four dev_net() calls Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 09/16] ipv4: icmp: " Eric Dumazet
` (7 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
UDP uses of dev_net() are safe, change them to dev_net_rcu()
to get LOCKDEP support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv4/udp.c | 19 ++++++++++---------
net/ipv6/udp.c | 18 +++++++++---------
2 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index a9bb9ce5438eaa9f9ceede1e4ac080dc6ab74588..fc1e37eb49190cb7e2671ebd54ac4fca54b77ac2 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -750,7 +750,7 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
{
const struct iphdr *iph = ip_hdr(skb);
- return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport,
+ return __udp4_lib_lookup(dev_net_rcu(skb->dev), iph->saddr, sport,
iph->daddr, dport, inet_iif(skb),
inet_sdif(skb), udptable, skb);
}
@@ -760,7 +760,7 @@ struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
{
const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
int iif, sdif;
inet_get_iif_sdif(skb, &iif, &sdif);
@@ -934,13 +934,13 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
struct inet_sock *inet;
const struct iphdr *iph = (const struct iphdr *)skb->data;
struct udphdr *uh = (struct udphdr *)(skb->data+(iph->ihl<<2));
+ struct net *net = dev_net_rcu(skb->dev);
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
bool tunnel = false;
struct sock *sk;
int harderr;
int err;
- struct net *net = dev_net(skb->dev);
sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
iph->saddr, uh->source, skb->dev->ifindex,
@@ -1025,7 +1025,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
int udp_err(struct sk_buff *skb, u32 info)
{
- return __udp4_lib_err(skb, info, dev_net(skb->dev)->ipv4.udp_table);
+ return __udp4_lib_err(skb, info, dev_net_rcu(skb->dev)->ipv4.udp_table);
}
/*
@@ -2466,7 +2466,7 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
udp_post_segment_fix_csum(skb);
ret = udp_queue_rcv_one_skb(sk, skb);
if (ret > 0)
- ip_protocol_deliver_rcu(dev_net(skb->dev), skb, ret);
+ ip_protocol_deliver_rcu(dev_net_rcu(skb->dev), skb, ret);
}
return 0;
}
@@ -2632,12 +2632,12 @@ static int udp_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
int proto)
{
+ struct net *net = dev_net_rcu(skb->dev);
+ struct rtable *rt = skb_rtable(skb);
struct sock *sk = NULL;
struct udphdr *uh;
unsigned short ulen;
- struct rtable *rt = skb_rtable(skb);
__be32 saddr, daddr;
- struct net *net = dev_net(skb->dev);
bool refcounted;
int drop_reason;
@@ -2804,7 +2804,7 @@ static struct sock *__udp4_lib_demux_lookup(struct net *net,
int udp_v4_early_demux(struct sk_buff *skb)
{
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
struct in_device *in_dev = NULL;
const struct iphdr *iph;
const struct udphdr *uh;
@@ -2873,7 +2873,8 @@ int udp_v4_early_demux(struct sk_buff *skb)
int udp_rcv(struct sk_buff *skb)
{
- return __udp4_lib_rcv(skb, dev_net(skb->dev)->ipv4.udp_table, IPPROTO_UDP);
+ return __udp4_lib_rcv(skb, dev_net_rcu(skb->dev)->ipv4.udp_table,
+ IPPROTO_UDP);
}
void udp_destroy_sock(struct sock *sk)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c6ea438b5c7588edd2971997f21382c26446a45c..d0b8f724e4362ec35352dae547e916c912716cab 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -410,7 +410,7 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
{
const struct ipv6hdr *iph = ipv6_hdr(skb);
- return __udp6_lib_lookup(dev_net(skb->dev), &iph->saddr, sport,
+ return __udp6_lib_lookup(dev_net_rcu(skb->dev), &iph->saddr, sport,
&iph->daddr, dport, inet6_iif(skb),
inet6_sdif(skb), udptable, skb);
}
@@ -420,7 +420,7 @@ struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
{
const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset);
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
int iif, sdif;
inet6_get_iif_sdif(skb, &iif, &sdif);
@@ -702,16 +702,16 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
u8 type, u8 code, int offset, __be32 info,
struct udp_table *udptable)
{
- struct ipv6_pinfo *np;
const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data;
const struct in6_addr *saddr = &hdr->saddr;
const struct in6_addr *daddr = seg6_get_daddr(skb, opt) ? : &hdr->daddr;
struct udphdr *uh = (struct udphdr *)(skb->data+offset);
+ struct net *net = dev_net_rcu(skb->dev);
+ struct ipv6_pinfo *np;
bool tunnel = false;
struct sock *sk;
int harderr;
int err;
- struct net *net = dev_net(skb->dev);
sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
@@ -818,7 +818,7 @@ static __inline__ int udpv6_err(struct sk_buff *skb,
u8 code, int offset, __be32 info)
{
return __udp6_lib_err(skb, opt, type, code, offset, info,
- dev_net(skb->dev)->ipv4.udp_table);
+ dev_net_rcu(skb->dev)->ipv4.udp_table);
}
static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
@@ -929,7 +929,7 @@ static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
udp_post_segment_fix_csum(skb);
ret = udpv6_queue_rcv_one_skb(sk, skb);
if (ret > 0)
- ip6_protocol_deliver_rcu(dev_net(skb->dev), skb, ret,
+ ip6_protocol_deliver_rcu(dev_net_rcu(skb->dev), skb, ret,
true);
}
return 0;
@@ -1071,8 +1071,8 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
int proto)
{
enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
+ struct net *net = dev_net_rcu(skb->dev);
const struct in6_addr *saddr, *daddr;
- struct net *net = dev_net(skb->dev);
struct sock *sk = NULL;
struct udphdr *uh;
bool refcounted;
@@ -1220,7 +1220,7 @@ static struct sock *__udp6_lib_demux_lookup(struct net *net,
void udp_v6_early_demux(struct sk_buff *skb)
{
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
const struct udphdr *uh;
struct sock *sk;
struct dst_entry *dst;
@@ -1262,7 +1262,7 @@ void udp_v6_early_demux(struct sk_buff *skb)
INDIRECT_CALLABLE_SCOPE int udpv6_rcv(struct sk_buff *skb)
{
- return __udp6_lib_rcv(skb, dev_net(skb->dev)->ipv4.udp_table, IPPROTO_UDP);
+ return __udp6_lib_rcv(skb, dev_net_rcu(skb->dev)->ipv4.udp_table, IPPROTO_UDP);
}
/*
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 09/16] ipv4: icmp: convert to dev_net_rcu()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (7 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 08/16] udp: convert to dev_net_rcu() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 23:36 ` Jakub Kicinski
2025-02-03 14:30 ` [PATCH v2 net 10/16] ipv6: " Eric Dumazet
` (6 subsequent siblings)
15 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
ICMP uses of dev_net() are safe, change them to dev_net_rcu()
to get LOCKDEP support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv4/icmp.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 094084b61bff8a17c4e85c99019b84e9cba21599..19bf8edd6759872fe667af82790b77b01212271b 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -401,7 +401,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
{
struct ipcm_cookie ipc;
struct rtable *rt = skb_rtable(skb);
- struct net *net = dev_net(rt->dst.dev);
+ struct net *net = dev_net_rcu(rt->dst.dev);
bool apply_ratelimit = false;
struct flowi4 fl4;
struct sock *sk;
@@ -611,9 +611,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
goto out;
if (rt->dst.dev)
- net = dev_net(rt->dst.dev);
+ net = dev_net_rcu(rt->dst.dev);
else if (skb_in->dev)
- net = dev_net(skb_in->dev);
+ net = dev_net_rcu(skb_in->dev);
else
goto out;
@@ -834,7 +834,7 @@ static void icmp_socket_deliver(struct sk_buff *skb, u32 info)
* avoid additional coding at protocol handlers.
*/
if (!pskb_may_pull(skb, iph->ihl * 4 + 8)) {
- __ICMP_INC_STATS(dev_net(skb->dev), ICMP_MIB_INERRORS);
+ __ICMP_INC_STATS(dev_net_rcu(skb->dev), ICMP_MIB_INERRORS);
return;
}
@@ -868,7 +868,7 @@ static enum skb_drop_reason icmp_unreach(struct sk_buff *skb)
struct net *net;
u32 info = 0;
- net = dev_net(skb_dst(skb)->dev);
+ net = dev_net_rcu(skb_dst(skb)->dev);
/*
* Incomplete header ?
@@ -979,7 +979,7 @@ static enum skb_drop_reason icmp_unreach(struct sk_buff *skb)
static enum skb_drop_reason icmp_redirect(struct sk_buff *skb)
{
if (skb->len < sizeof(struct iphdr)) {
- __ICMP_INC_STATS(dev_net(skb->dev), ICMP_MIB_INERRORS);
+ __ICMP_INC_STATS(dev_net_rcu(skb->dev), ICMP_MIB_INERRORS);
return SKB_DROP_REASON_PKT_TOO_SMALL;
}
@@ -1011,7 +1011,7 @@ static enum skb_drop_reason icmp_echo(struct sk_buff *skb)
struct icmp_bxm icmp_param;
struct net *net;
- net = dev_net(skb_dst(skb)->dev);
+ net = dev_net_rcu(skb_dst(skb)->dev);
/* should there be an ICMP stat for ignored echos? */
if (READ_ONCE(net->ipv4.sysctl_icmp_echo_ignore_all))
return SKB_NOT_DROPPED_YET;
@@ -1040,9 +1040,9 @@ static enum skb_drop_reason icmp_echo(struct sk_buff *skb)
bool icmp_build_probe(struct sk_buff *skb, struct icmphdr *icmphdr)
{
+ struct net *net = dev_net_rcu(skb->dev);
struct icmp_ext_hdr *ext_hdr, _ext_hdr;
struct icmp_ext_echo_iio *iio, _iio;
- struct net *net = dev_net(skb->dev);
struct inet6_dev *in6_dev;
struct in_device *in_dev;
struct net_device *dev;
@@ -1181,7 +1181,7 @@ static enum skb_drop_reason icmp_timestamp(struct sk_buff *skb)
return SKB_NOT_DROPPED_YET;
out_err:
- __ICMP_INC_STATS(dev_net(skb_dst(skb)->dev), ICMP_MIB_INERRORS);
+ __ICMP_INC_STATS(dev_net_rcu(skb_dst(skb)->dev), ICMP_MIB_INERRORS);
return SKB_DROP_REASON_PKT_TOO_SMALL;
}
@@ -1198,7 +1198,7 @@ int icmp_rcv(struct sk_buff *skb)
{
enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
struct rtable *rt = skb_rtable(skb);
- struct net *net = dev_net(rt->dst.dev);
+ struct net *net = dev_net_rcu(rt->dst.dev);
struct icmphdr *icmph;
if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
@@ -1371,9 +1371,9 @@ int icmp_err(struct sk_buff *skb, u32 info)
struct iphdr *iph = (struct iphdr *)skb->data;
int offset = iph->ihl<<2;
struct icmphdr *icmph = (struct icmphdr *)(skb->data + offset);
+ struct net *net = dev_net_rcu(skb->dev);
int type = icmp_hdr(skb)->type;
int code = icmp_hdr(skb)->code;
- struct net *net = dev_net(skb->dev);
/*
* Use ping_err to handle all icmp errors except those
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* Re: [PATCH v2 net 09/16] ipv4: icmp: convert to dev_net_rcu()
2025-02-03 14:30 ` [PATCH v2 net 09/16] ipv4: icmp: " Eric Dumazet
@ 2025-02-03 23:36 ` Jakub Kicinski
2025-02-04 4:14 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2025-02-03 23:36 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, netdev, Kuniyuki Iwashima,
Simon Horman, eric.dumazet
On Mon, 3 Feb 2025 14:30:39 +0000 Eric Dumazet wrote:
> @@ -611,9 +611,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> goto out;
>
> if (rt->dst.dev)
> - net = dev_net(rt->dst.dev);
> + net = dev_net_rcu(rt->dst.dev);
> else if (skb_in->dev)
> - net = dev_net(skb_in->dev);
> + net = dev_net_rcu(skb_in->dev);
> else
> goto out;
Hm. Weird. NIPA says this one is not under RCU.
[ 275.730657][ C1] ./include/net/net_namespace.h:404 suspicious rcu_dereference_check() usage!
[ 275.731033][ C1]
[ 275.731033][ C1] other info that might help us debug this:
[ 275.731033][ C1]
[ 275.731471][ C1]
[ 275.731471][ C1] rcu_scheduler_active = 2, debug_locks = 1
[ 275.731799][ C1] 1 lock held by swapper/1/0:
[ 275.732000][ C1] #0: ffffc900001e0ae8 ((&n->timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x230
[ 275.732354][ C1]
[ 275.732354][ C1] stack backtrace:
[ 275.732638][ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.13.0-virtme #1
[ 275.732643][ C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 275.732646][ C1] Call Trace:
[ 275.732647][ C1] <IRQ>
[ 275.732651][ C1] dump_stack_lvl+0xb0/0xd0
[ 275.732663][ C1] lockdep_rcu_suspicious+0x1ea/0x280
[ 275.732678][ C1] __icmp_send+0xb0d/0x1580
[ 275.732695][ C1] ? tcp_data_queue+0x8/0x22d0
[ 275.732701][ C1] ? lockdep_hardirqs_on_prepare+0x12b/0x410
[ 275.732712][ C1] ? __pfx___icmp_send+0x10/0x10
[ 275.732719][ C1] ? tcp_check_space+0x3ce/0x5f0
[ 275.732742][ C1] ? rcu_read_lock_any_held+0x43/0xb0
[ 275.732750][ C1] ? validate_chain+0x1fe/0xae0
[ 275.732771][ C1] ? __pfx_validate_chain+0x10/0x10
[ 275.732778][ C1] ? hlock_class+0x4e/0x130
[ 275.732784][ C1] ? mark_lock+0x38/0x3e0
[ 275.732788][ C1] ? sock_put+0x1a/0x60
[ 275.732806][ C1] ? __lock_acquire+0xb9a/0x1680
[ 275.732822][ C1] ipv4_send_dest_unreach+0x3b4/0x800
[ 275.732829][ C1] ? neigh_invalidate+0x1c7/0x540
[ 275.732837][ C1] ? __pfx_ipv4_send_dest_unreach+0x10/0x10
[ 275.732850][ C1] ipv4_link_failure+0x1b/0x190
[ 275.732856][ C1] arp_error_report+0x96/0x170
[ 275.732862][ C1] neigh_invalidate+0x209/0x540
[ 275.732873][ C1] neigh_timer_handler+0x87a/0xdf0
[ 275.732883][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
[ 275.732886][ C1] call_timer_fn+0x13b/0x230
[ 275.732891][ C1] ? call_timer_fn+0xe8/0x230
[ 275.732894][ C1] ? call_timer_fn+0xe8/0x230
[ 275.732899][ C1] ? __pfx_call_timer_fn+0x10/0x10
[ 275.732902][ C1] ? mark_lock+0x38/0x3e0
[ 275.732920][ C1] __run_timers+0x545/0x810
[ 275.732925][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
[ 275.732936][ C1] ? __pfx___run_timers+0x10/0x10
[ 275.732939][ C1] ? __lock_release+0x103/0x460
[ 275.732947][ C1] ? do_raw_spin_lock+0x131/0x270
[ 275.732952][ C1] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 275.732956][ C1] ? lock_acquire+0x32/0xc0
[ 275.732958][ C1] ? timer_expire_remote+0x96/0xf0
[ 275.732967][ C1] timer_expire_remote+0x9e/0xf0
[ 275.732970][ C1] tmigr_handle_remote_cpu+0x278/0x440
[ 275.732977][ C1] ? __pfx_tmigr_handle_remote_cpu+0x10/0x10
[ 275.732981][ C1] ? __pfx___lock_release+0x10/0x10
[ 275.732985][ C1] ? __pfx_lock_acquire.part.0+0x10/0x10
[ 275.733015][ C1] tmigr_handle_remote_up+0x1a6/0x270
[ 275.733027][ C1] ? __pfx_tmigr_handle_remote_up+0x10/0x10
[ 275.733036][ C1] __walk_groups.isra.0+0x44/0x160
[ 275.733051][ C1] tmigr_handle_remote+0x20b/0x300
Decoded:
https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/976941/vm-crash-thr0-1
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 net 09/16] ipv4: icmp: convert to dev_net_rcu()
2025-02-03 23:36 ` Jakub Kicinski
@ 2025-02-04 4:14 ` Eric Dumazet
2025-02-04 4:57 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2025-02-04 4:14 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, netdev, Kuniyuki Iwashima,
Simon Horman, eric.dumazet
On Tue, Feb 4, 2025 at 12:36 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 3 Feb 2025 14:30:39 +0000 Eric Dumazet wrote:
> > @@ -611,9 +611,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> > goto out;
> >
> > if (rt->dst.dev)
> > - net = dev_net(rt->dst.dev);
> > + net = dev_net_rcu(rt->dst.dev);
> > else if (skb_in->dev)
> > - net = dev_net(skb_in->dev);
> > + net = dev_net_rcu(skb_in->dev);
> > else
> > goto out;
>
> Hm. Weird. NIPA says this one is not under RCU.
>
> [ 275.730657][ C1] ./include/net/net_namespace.h:404 suspicious rcu_dereference_check() usage!
> [ 275.731033][ C1]
> [ 275.731033][ C1] other info that might help us debug this:
> [ 275.731033][ C1]
> [ 275.731471][ C1]
> [ 275.731471][ C1] rcu_scheduler_active = 2, debug_locks = 1
> [ 275.731799][ C1] 1 lock held by swapper/1/0:
> [ 275.732000][ C1] #0: ffffc900001e0ae8 ((&n->timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x230
> [ 275.732354][ C1]
> [ 275.732354][ C1] stack backtrace:
> [ 275.732638][ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.13.0-virtme #1
> [ 275.732643][ C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 275.732646][ C1] Call Trace:
> [ 275.732647][ C1] <IRQ>
> [ 275.732651][ C1] dump_stack_lvl+0xb0/0xd0
> [ 275.732663][ C1] lockdep_rcu_suspicious+0x1ea/0x280
> [ 275.732678][ C1] __icmp_send+0xb0d/0x1580
> [ 275.732695][ C1] ? tcp_data_queue+0x8/0x22d0
> [ 275.732701][ C1] ? lockdep_hardirqs_on_prepare+0x12b/0x410
> [ 275.732712][ C1] ? __pfx___icmp_send+0x10/0x10
> [ 275.732719][ C1] ? tcp_check_space+0x3ce/0x5f0
> [ 275.732742][ C1] ? rcu_read_lock_any_held+0x43/0xb0
> [ 275.732750][ C1] ? validate_chain+0x1fe/0xae0
> [ 275.732771][ C1] ? __pfx_validate_chain+0x10/0x10
> [ 275.732778][ C1] ? hlock_class+0x4e/0x130
> [ 275.732784][ C1] ? mark_lock+0x38/0x3e0
> [ 275.732788][ C1] ? sock_put+0x1a/0x60
> [ 275.732806][ C1] ? __lock_acquire+0xb9a/0x1680
> [ 275.732822][ C1] ipv4_send_dest_unreach+0x3b4/0x800
> [ 275.732829][ C1] ? neigh_invalidate+0x1c7/0x540
> [ 275.732837][ C1] ? __pfx_ipv4_send_dest_unreach+0x10/0x10
> [ 275.732850][ C1] ipv4_link_failure+0x1b/0x190
> [ 275.732856][ C1] arp_error_report+0x96/0x170
> [ 275.732862][ C1] neigh_invalidate+0x209/0x540
> [ 275.732873][ C1] neigh_timer_handler+0x87a/0xdf0
> [ 275.732883][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> [ 275.732886][ C1] call_timer_fn+0x13b/0x230
> [ 275.732891][ C1] ? call_timer_fn+0xe8/0x230
> [ 275.732894][ C1] ? call_timer_fn+0xe8/0x230
> [ 275.732899][ C1] ? __pfx_call_timer_fn+0x10/0x10
> [ 275.732902][ C1] ? mark_lock+0x38/0x3e0
> [ 275.732920][ C1] __run_timers+0x545/0x810
> [ 275.732925][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> [ 275.732936][ C1] ? __pfx___run_timers+0x10/0x10
> [ 275.732939][ C1] ? __lock_release+0x103/0x460
> [ 275.732947][ C1] ? do_raw_spin_lock+0x131/0x270
> [ 275.732952][ C1] ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 275.732956][ C1] ? lock_acquire+0x32/0xc0
> [ 275.732958][ C1] ? timer_expire_remote+0x96/0xf0
> [ 275.732967][ C1] timer_expire_remote+0x9e/0xf0
> [ 275.732970][ C1] tmigr_handle_remote_cpu+0x278/0x440
> [ 275.732977][ C1] ? __pfx_tmigr_handle_remote_cpu+0x10/0x10
> [ 275.732981][ C1] ? __pfx___lock_release+0x10/0x10
> [ 275.732985][ C1] ? __pfx_lock_acquire.part.0+0x10/0x10
> [ 275.733015][ C1] tmigr_handle_remote_up+0x1a6/0x270
> [ 275.733027][ C1] ? __pfx_tmigr_handle_remote_up+0x10/0x10
> [ 275.733036][ C1] __walk_groups.isra.0+0x44/0x160
> [ 275.733051][ C1] tmigr_handle_remote+0x20b/0x300
>
> Decoded:
> https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/976941/vm-crash-thr0-1
Oops, I thought I ran the tests on the whole series. I missed this one.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 net 09/16] ipv4: icmp: convert to dev_net_rcu()
2025-02-04 4:14 ` Eric Dumazet
@ 2025-02-04 4:57 ` Eric Dumazet
2025-02-04 10:35 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2025-02-04 4:57 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, netdev, Kuniyuki Iwashima,
Simon Horman, eric.dumazet
On Tue, Feb 4, 2025 at 5:14 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Feb 4, 2025 at 12:36 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Mon, 3 Feb 2025 14:30:39 +0000 Eric Dumazet wrote:
> > > @@ -611,9 +611,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> > > goto out;
> > >
> > > if (rt->dst.dev)
> > > - net = dev_net(rt->dst.dev);
> > > + net = dev_net_rcu(rt->dst.dev);
> > > else if (skb_in->dev)
> > > - net = dev_net(skb_in->dev);
> > > + net = dev_net_rcu(skb_in->dev);
> > > else
> > > goto out;
> >
> > Hm. Weird. NIPA says this one is not under RCU.
> >
> > [ 275.730657][ C1] ./include/net/net_namespace.h:404 suspicious rcu_dereference_check() usage!
> > [ 275.731033][ C1]
> > [ 275.731033][ C1] other info that might help us debug this:
> > [ 275.731033][ C1]
> > [ 275.731471][ C1]
> > [ 275.731471][ C1] rcu_scheduler_active = 2, debug_locks = 1
> > [ 275.731799][ C1] 1 lock held by swapper/1/0:
> > [ 275.732000][ C1] #0: ffffc900001e0ae8 ((&n->timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x230
> > [ 275.732354][ C1]
> > [ 275.732354][ C1] stack backtrace:
> > [ 275.732638][ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.13.0-virtme #1
> > [ 275.732643][ C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [ 275.732646][ C1] Call Trace:
> > [ 275.732647][ C1] <IRQ>
> > [ 275.732651][ C1] dump_stack_lvl+0xb0/0xd0
> > [ 275.732663][ C1] lockdep_rcu_suspicious+0x1ea/0x280
> > [ 275.732678][ C1] __icmp_send+0xb0d/0x1580
> > [ 275.732695][ C1] ? tcp_data_queue+0x8/0x22d0
> > [ 275.732701][ C1] ? lockdep_hardirqs_on_prepare+0x12b/0x410
> > [ 275.732712][ C1] ? __pfx___icmp_send+0x10/0x10
> > [ 275.732719][ C1] ? tcp_check_space+0x3ce/0x5f0
> > [ 275.732742][ C1] ? rcu_read_lock_any_held+0x43/0xb0
> > [ 275.732750][ C1] ? validate_chain+0x1fe/0xae0
> > [ 275.732771][ C1] ? __pfx_validate_chain+0x10/0x10
> > [ 275.732778][ C1] ? hlock_class+0x4e/0x130
> > [ 275.732784][ C1] ? mark_lock+0x38/0x3e0
> > [ 275.732788][ C1] ? sock_put+0x1a/0x60
> > [ 275.732806][ C1] ? __lock_acquire+0xb9a/0x1680
> > [ 275.732822][ C1] ipv4_send_dest_unreach+0x3b4/0x800
> > [ 275.732829][ C1] ? neigh_invalidate+0x1c7/0x540
> > [ 275.732837][ C1] ? __pfx_ipv4_send_dest_unreach+0x10/0x10
> > [ 275.732850][ C1] ipv4_link_failure+0x1b/0x190
> > [ 275.732856][ C1] arp_error_report+0x96/0x170
> > [ 275.732862][ C1] neigh_invalidate+0x209/0x540
> > [ 275.732873][ C1] neigh_timer_handler+0x87a/0xdf0
> > [ 275.732883][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> > [ 275.732886][ C1] call_timer_fn+0x13b/0x230
> > [ 275.732891][ C1] ? call_timer_fn+0xe8/0x230
> > [ 275.732894][ C1] ? call_timer_fn+0xe8/0x230
> > [ 275.732899][ C1] ? __pfx_call_timer_fn+0x10/0x10
> > [ 275.732902][ C1] ? mark_lock+0x38/0x3e0
> > [ 275.732920][ C1] __run_timers+0x545/0x810
> > [ 275.732925][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> > [ 275.732936][ C1] ? __pfx___run_timers+0x10/0x10
> > [ 275.732939][ C1] ? __lock_release+0x103/0x460
> > [ 275.732947][ C1] ? do_raw_spin_lock+0x131/0x270
> > [ 275.732952][ C1] ? __pfx_do_raw_spin_lock+0x10/0x10
> > [ 275.732956][ C1] ? lock_acquire+0x32/0xc0
> > [ 275.732958][ C1] ? timer_expire_remote+0x96/0xf0
> > [ 275.732967][ C1] timer_expire_remote+0x9e/0xf0
> > [ 275.732970][ C1] tmigr_handle_remote_cpu+0x278/0x440
> > [ 275.732977][ C1] ? __pfx_tmigr_handle_remote_cpu+0x10/0x10
> > [ 275.732981][ C1] ? __pfx___lock_release+0x10/0x10
> > [ 275.732985][ C1] ? __pfx_lock_acquire.part.0+0x10/0x10
> > [ 275.733015][ C1] tmigr_handle_remote_up+0x1a6/0x270
> > [ 275.733027][ C1] ? __pfx_tmigr_handle_remote_up+0x10/0x10
> > [ 275.733036][ C1] __walk_groups.isra.0+0x44/0x160
> > [ 275.733051][ C1] tmigr_handle_remote+0x20b/0x300
> >
> > Decoded:
> > https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/976941/vm-crash-thr0-1
>
> Oops, I thought I ran the tests on the whole series. I missed this one.
BTW, ICMPv6 has the same potential problem, I will amend both cases.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 net 09/16] ipv4: icmp: convert to dev_net_rcu()
2025-02-04 4:57 ` Eric Dumazet
@ 2025-02-04 10:35 ` Eric Dumazet
2025-02-04 16:21 ` Matthieu Baerts
2025-02-04 16:21 ` Jakub Kicinski
0 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-04 10:35 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, netdev, Kuniyuki Iwashima,
Simon Horman, eric.dumazet
On Tue, Feb 4, 2025 at 5:57 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Feb 4, 2025 at 5:14 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Feb 4, 2025 at 12:36 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Mon, 3 Feb 2025 14:30:39 +0000 Eric Dumazet wrote:
> > > > @@ -611,9 +611,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> > > > goto out;
> > > >
> > > > if (rt->dst.dev)
> > > > - net = dev_net(rt->dst.dev);
> > > > + net = dev_net_rcu(rt->dst.dev);
> > > > else if (skb_in->dev)
> > > > - net = dev_net(skb_in->dev);
> > > > + net = dev_net_rcu(skb_in->dev);
> > > > else
> > > > goto out;
> > >
> > > Hm. Weird. NIPA says this one is not under RCU.
> > >
> > > [ 275.730657][ C1] ./include/net/net_namespace.h:404 suspicious rcu_dereference_check() usage!
> > > [ 275.731033][ C1]
> > > [ 275.731033][ C1] other info that might help us debug this:
> > > [ 275.731033][ C1]
> > > [ 275.731471][ C1]
> > > [ 275.731471][ C1] rcu_scheduler_active = 2, debug_locks = 1
> > > [ 275.731799][ C1] 1 lock held by swapper/1/0:
> > > [ 275.732000][ C1] #0: ffffc900001e0ae8 ((&n->timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x230
> > > [ 275.732354][ C1]
> > > [ 275.732354][ C1] stack backtrace:
> > > [ 275.732638][ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.13.0-virtme #1
> > > [ 275.732643][ C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > > [ 275.732646][ C1] Call Trace:
> > > [ 275.732647][ C1] <IRQ>
> > > [ 275.732651][ C1] dump_stack_lvl+0xb0/0xd0
> > > [ 275.732663][ C1] lockdep_rcu_suspicious+0x1ea/0x280
> > > [ 275.732678][ C1] __icmp_send+0xb0d/0x1580
> > > [ 275.732695][ C1] ? tcp_data_queue+0x8/0x22d0
> > > [ 275.732701][ C1] ? lockdep_hardirqs_on_prepare+0x12b/0x410
> > > [ 275.732712][ C1] ? __pfx___icmp_send+0x10/0x10
> > > [ 275.732719][ C1] ? tcp_check_space+0x3ce/0x5f0
> > > [ 275.732742][ C1] ? rcu_read_lock_any_held+0x43/0xb0
> > > [ 275.732750][ C1] ? validate_chain+0x1fe/0xae0
> > > [ 275.732771][ C1] ? __pfx_validate_chain+0x10/0x10
> > > [ 275.732778][ C1] ? hlock_class+0x4e/0x130
> > > [ 275.732784][ C1] ? mark_lock+0x38/0x3e0
> > > [ 275.732788][ C1] ? sock_put+0x1a/0x60
> > > [ 275.732806][ C1] ? __lock_acquire+0xb9a/0x1680
> > > [ 275.732822][ C1] ipv4_send_dest_unreach+0x3b4/0x800
> > > [ 275.732829][ C1] ? neigh_invalidate+0x1c7/0x540
> > > [ 275.732837][ C1] ? __pfx_ipv4_send_dest_unreach+0x10/0x10
> > > [ 275.732850][ C1] ipv4_link_failure+0x1b/0x190
> > > [ 275.732856][ C1] arp_error_report+0x96/0x170
> > > [ 275.732862][ C1] neigh_invalidate+0x209/0x540
> > > [ 275.732873][ C1] neigh_timer_handler+0x87a/0xdf0
> > > [ 275.732883][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> > > [ 275.732886][ C1] call_timer_fn+0x13b/0x230
> > > [ 275.732891][ C1] ? call_timer_fn+0xe8/0x230
> > > [ 275.732894][ C1] ? call_timer_fn+0xe8/0x230
> > > [ 275.732899][ C1] ? __pfx_call_timer_fn+0x10/0x10
> > > [ 275.732902][ C1] ? mark_lock+0x38/0x3e0
> > > [ 275.732920][ C1] __run_timers+0x545/0x810
> > > [ 275.732925][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> > > [ 275.732936][ C1] ? __pfx___run_timers+0x10/0x10
> > > [ 275.732939][ C1] ? __lock_release+0x103/0x460
> > > [ 275.732947][ C1] ? do_raw_spin_lock+0x131/0x270
> > > [ 275.732952][ C1] ? __pfx_do_raw_spin_lock+0x10/0x10
> > > [ 275.732956][ C1] ? lock_acquire+0x32/0xc0
> > > [ 275.732958][ C1] ? timer_expire_remote+0x96/0xf0
> > > [ 275.732967][ C1] timer_expire_remote+0x9e/0xf0
> > > [ 275.732970][ C1] tmigr_handle_remote_cpu+0x278/0x440
> > > [ 275.732977][ C1] ? __pfx_tmigr_handle_remote_cpu+0x10/0x10
> > > [ 275.732981][ C1] ? __pfx___lock_release+0x10/0x10
> > > [ 275.732985][ C1] ? __pfx_lock_acquire.part.0+0x10/0x10
> > > [ 275.733015][ C1] tmigr_handle_remote_up+0x1a6/0x270
> > > [ 275.733027][ C1] ? __pfx_tmigr_handle_remote_up+0x10/0x10
> > > [ 275.733036][ C1] __walk_groups.isra.0+0x44/0x160
> > > [ 275.733051][ C1] tmigr_handle_remote+0x20b/0x300
> > >
> > > Decoded:
> > > https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/976941/vm-crash-thr0-1
> >
> > Oops, I thought I ran the tests on the whole series. I missed this one.
>
> BTW, ICMPv6 has the same potential problem, I will amend both cases.
I ran again the tests for v3, got an unrelated crash, FYI.
14237.095216] #PF: supervisor instruction fetch in kernel mode
[14237.095570] #PF: error_code(0x0010) - not-present page
[14237.095915] PGD 1e58067 P4D 1e58067 PUD ce1c067 PMD 0
[14237.096991] Oops: Oops: 0010 [#1] SMP DEBUG_PAGEALLOC NOPTI
[14237.097507] CPU: 0 UID: 0 PID: 6371 Comm: python3 Not tainted
6.13.0-virtme #1559
[14237.098045] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[14237.098578] RIP: 0010:0x0
[14237.099324] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[14237.099752] RSP: 0018:ffffacfd4486bed0 EFLAGS: 00000286
[14237.100079] RAX: 0000000000000000 RBX: ffff9af502607200 RCX: 0000000000000002
[14237.100452] RDX: 00007fffc684a690 RSI: 0000000000005401 RDI: ffff9af502607200
[14237.100821] RBP: 0000000000005401 R08: 0000000000000001 R09: 0000000000000000
[14237.101182] R10: 0000000000000001 R11: 0000000000000000 R12: 00007fffc684a690
[14237.101542] R13: ffff9af50888ed68 R14: ffff9af502607200 R15: 0000000000000000
[14237.101956] FS: 00007f76b73f95c0(0000) GS:ffff9af57cc00000(0000)
knlGS:0000000000000000
[14237.102372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14237.102679] CR2: ffffffffffffffd6 CR3: 00000000039ca000 CR4: 00000000000006f0
[14237.103160] Call Trace:
[14237.103435] <TASK>
[14237.103720] ? __die_body.cold+0x19/0x26
[14237.104340] ? page_fault_oops+0x134/0x2a0
[14237.104553] ? cp_new_stat+0x157/0x190
[14237.104799] ? exc_page_fault+0x68/0x230
[14237.105013] ? asm_exc_page_fault+0x26/0x30
[14237.105259] full_proxy_unlocked_ioctl+0x63/0x90
[14237.105546] __x64_sys_ioctl+0x97/0xc0
[14237.105754] do_syscall_64+0x72/0x180
[14237.105949] entry_SYSCALL_64_after_hwframe+0x76/0x7e
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 net 09/16] ipv4: icmp: convert to dev_net_rcu()
2025-02-04 10:35 ` Eric Dumazet
@ 2025-02-04 16:21 ` Matthieu Baerts
2025-02-04 16:56 ` Eric Dumazet
2025-02-04 16:21 ` Jakub Kicinski
1 sibling, 1 reply; 26+ messages in thread
From: Matthieu Baerts @ 2025-02-04 16:21 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, netdev, Kuniyuki Iwashima,
Simon Horman, eric.dumazet
Hi Eric,
On 04/02/2025 11:35, Eric Dumazet wrote:
> On Tue, Feb 4, 2025 at 5:57 AM Eric Dumazet <edumazet@google.com> wrote:
>>
>> On Tue, Feb 4, 2025 at 5:14 AM Eric Dumazet <edumazet@google.com> wrote:
>>>
>>> On Tue, Feb 4, 2025 at 12:36 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Mon, 3 Feb 2025 14:30:39 +0000 Eric Dumazet wrote:
>>>>> @@ -611,9 +611,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
>>>>> goto out;
>>>>>
>>>>> if (rt->dst.dev)
>>>>> - net = dev_net(rt->dst.dev);
>>>>> + net = dev_net_rcu(rt->dst.dev);
>>>>> else if (skb_in->dev)
>>>>> - net = dev_net(skb_in->dev);
>>>>> + net = dev_net_rcu(skb_in->dev);
>>>>> else
>>>>> goto out;
>>>>
>>>> Hm. Weird. NIPA says this one is not under RCU.
>>>>
>>>> [ 275.730657][ C1] ./include/net/net_namespace.h:404 suspicious rcu_dereference_check() usage!
>>>> [ 275.731033][ C1]
>>>> [ 275.731033][ C1] other info that might help us debug this:
>>>> [ 275.731033][ C1]
>>>> [ 275.731471][ C1]
>>>> [ 275.731471][ C1] rcu_scheduler_active = 2, debug_locks = 1
>>>> [ 275.731799][ C1] 1 lock held by swapper/1/0:
>>>> [ 275.732000][ C1] #0: ffffc900001e0ae8 ((&n->timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x230
>>>> [ 275.732354][ C1]
>>>> [ 275.732354][ C1] stack backtrace:
>>>> [ 275.732638][ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.13.0-virtme #1
>>>> [ 275.732643][ C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>>> [ 275.732646][ C1] Call Trace:
>>>> [ 275.732647][ C1] <IRQ>
>>>> [ 275.732651][ C1] dump_stack_lvl+0xb0/0xd0
>>>> [ 275.732663][ C1] lockdep_rcu_suspicious+0x1ea/0x280
>>>> [ 275.732678][ C1] __icmp_send+0xb0d/0x1580
>>>> [ 275.732695][ C1] ? tcp_data_queue+0x8/0x22d0
>>>> [ 275.732701][ C1] ? lockdep_hardirqs_on_prepare+0x12b/0x410
>>>> [ 275.732712][ C1] ? __pfx___icmp_send+0x10/0x10
>>>> [ 275.732719][ C1] ? tcp_check_space+0x3ce/0x5f0
>>>> [ 275.732742][ C1] ? rcu_read_lock_any_held+0x43/0xb0
>>>> [ 275.732750][ C1] ? validate_chain+0x1fe/0xae0
>>>> [ 275.732771][ C1] ? __pfx_validate_chain+0x10/0x10
>>>> [ 275.732778][ C1] ? hlock_class+0x4e/0x130
>>>> [ 275.732784][ C1] ? mark_lock+0x38/0x3e0
>>>> [ 275.732788][ C1] ? sock_put+0x1a/0x60
>>>> [ 275.732806][ C1] ? __lock_acquire+0xb9a/0x1680
>>>> [ 275.732822][ C1] ipv4_send_dest_unreach+0x3b4/0x800
>>>> [ 275.732829][ C1] ? neigh_invalidate+0x1c7/0x540
>>>> [ 275.732837][ C1] ? __pfx_ipv4_send_dest_unreach+0x10/0x10
>>>> [ 275.732850][ C1] ipv4_link_failure+0x1b/0x190
>>>> [ 275.732856][ C1] arp_error_report+0x96/0x170
>>>> [ 275.732862][ C1] neigh_invalidate+0x209/0x540
>>>> [ 275.732873][ C1] neigh_timer_handler+0x87a/0xdf0
>>>> [ 275.732883][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
>>>> [ 275.732886][ C1] call_timer_fn+0x13b/0x230
>>>> [ 275.732891][ C1] ? call_timer_fn+0xe8/0x230
>>>> [ 275.732894][ C1] ? call_timer_fn+0xe8/0x230
>>>> [ 275.732899][ C1] ? __pfx_call_timer_fn+0x10/0x10
>>>> [ 275.732902][ C1] ? mark_lock+0x38/0x3e0
>>>> [ 275.732920][ C1] __run_timers+0x545/0x810
>>>> [ 275.732925][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
>>>> [ 275.732936][ C1] ? __pfx___run_timers+0x10/0x10
>>>> [ 275.732939][ C1] ? __lock_release+0x103/0x460
>>>> [ 275.732947][ C1] ? do_raw_spin_lock+0x131/0x270
>>>> [ 275.732952][ C1] ? __pfx_do_raw_spin_lock+0x10/0x10
>>>> [ 275.732956][ C1] ? lock_acquire+0x32/0xc0
>>>> [ 275.732958][ C1] ? timer_expire_remote+0x96/0xf0
>>>> [ 275.732967][ C1] timer_expire_remote+0x9e/0xf0
>>>> [ 275.732970][ C1] tmigr_handle_remote_cpu+0x278/0x440
>>>> [ 275.732977][ C1] ? __pfx_tmigr_handle_remote_cpu+0x10/0x10
>>>> [ 275.732981][ C1] ? __pfx___lock_release+0x10/0x10
>>>> [ 275.732985][ C1] ? __pfx_lock_acquire.part.0+0x10/0x10
>>>> [ 275.733015][ C1] tmigr_handle_remote_up+0x1a6/0x270
>>>> [ 275.733027][ C1] ? __pfx_tmigr_handle_remote_up+0x10/0x10
>>>> [ 275.733036][ C1] __walk_groups.isra.0+0x44/0x160
>>>> [ 275.733051][ C1] tmigr_handle_remote+0x20b/0x300
>>>>
>>>> Decoded:
>>>> https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/976941/vm-crash-thr0-1
>>>
>>> Oops, I thought I ran the tests on the whole series. I missed this one.
>>
>> BTW, ICMPv6 has the same potential problem, I will amend both cases.
>
> I ran again the tests for v3, got an unrelated crash, FYI.
>
> 14237.095216] #PF: supervisor instruction fetch in kernel mode
> [14237.095570] #PF: error_code(0x0010) - not-present page
> [14237.095915] PGD 1e58067 P4D 1e58067 PUD ce1c067 PMD 0
> [14237.096991] Oops: Oops: 0010 [#1] SMP DEBUG_PAGEALLOC NOPTI
> [14237.097507] CPU: 0 UID: 0 PID: 6371 Comm: python3 Not tainted
> 6.13.0-virtme #1559
> [14237.098045] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [14237.098578] RIP: 0010:0x0
> [14237.099324] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> [14237.099752] RSP: 0018:ffffacfd4486bed0 EFLAGS: 00000286
> [14237.100079] RAX: 0000000000000000 RBX: ffff9af502607200 RCX: 0000000000000002
> [14237.100452] RDX: 00007fffc684a690 RSI: 0000000000005401 RDI: ffff9af502607200
> [14237.100821] RBP: 0000000000005401 R08: 0000000000000001 R09: 0000000000000000
> [14237.101182] R10: 0000000000000001 R11: 0000000000000000 R12: 00007fffc684a690
> [14237.101542] R13: ffff9af50888ed68 R14: ffff9af502607200 R15: 0000000000000000
> [14237.101956] FS: 00007f76b73f95c0(0000) GS:ffff9af57cc00000(0000)
> knlGS:0000000000000000
> [14237.102372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [14237.102679] CR2: ffffffffffffffd6 CR3: 00000000039ca000 CR4: 00000000000006f0
> [14237.103160] Call Trace:
> [14237.103435] <TASK>
> [14237.103720] ? __die_body.cold+0x19/0x26
> [14237.104340] ? page_fault_oops+0x134/0x2a0
> [14237.104553] ? cp_new_stat+0x157/0x190
> [14237.104799] ? exc_page_fault+0x68/0x230
> [14237.105013] ? asm_exc_page_fault+0x26/0x30
> [14237.105259] full_proxy_unlocked_ioctl+0x63/0x90
> [14237.105546] __x64_sys_ioctl+0x97/0xc0
> [14237.105754] do_syscall_64+0x72/0x180
> [14237.105949] entry_SYSCALL_64_after_hwframe+0x76/0x7e
I think I got this issue as well on MPTCP side, when using GCOV, but
something else using the debugfs could trigger that as well I guess. It
is apparently fixed in the Linus tree, see 57b314752ec0 ("debugfs: Fix
the missing initializations in __debugfs_file_get()")
https://lore.kernel.org/all/20250129191937.GR1977892@ZenIV/
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 net 09/16] ipv4: icmp: convert to dev_net_rcu()
2025-02-04 16:21 ` Matthieu Baerts
@ 2025-02-04 16:56 ` Eric Dumazet
0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-04 16:56 UTC (permalink / raw)
To: Matthieu Baerts
Cc: Jakub Kicinski, David S . Miller, Paolo Abeni, netdev,
Kuniyuki Iwashima, Simon Horman, eric.dumazet
On Tue, Feb 4, 2025 at 5:21 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Eric,
>
> On 04/02/2025 11:35, Eric Dumazet wrote:
> > On Tue, Feb 4, 2025 at 5:57 AM Eric Dumazet <edumazet@google.com> wrote:
> >>
> >> On Tue, Feb 4, 2025 at 5:14 AM Eric Dumazet <edumazet@google.com> wrote:
> >>>
> >>> On Tue, Feb 4, 2025 at 12:36 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>>
> >>>> On Mon, 3 Feb 2025 14:30:39 +0000 Eric Dumazet wrote:
> >>>>> @@ -611,9 +611,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> >>>>> goto out;
> >>>>>
> >>>>> if (rt->dst.dev)
> >>>>> - net = dev_net(rt->dst.dev);
> >>>>> + net = dev_net_rcu(rt->dst.dev);
> >>>>> else if (skb_in->dev)
> >>>>> - net = dev_net(skb_in->dev);
> >>>>> + net = dev_net_rcu(skb_in->dev);
> >>>>> else
> >>>>> goto out;
> >>>>
> >>>> Hm. Weird. NIPA says this one is not under RCU.
> >>>>
> >>>> [ 275.730657][ C1] ./include/net/net_namespace.h:404 suspicious rcu_dereference_check() usage!
> >>>> [ 275.731033][ C1]
> >>>> [ 275.731033][ C1] other info that might help us debug this:
> >>>> [ 275.731033][ C1]
> >>>> [ 275.731471][ C1]
> >>>> [ 275.731471][ C1] rcu_scheduler_active = 2, debug_locks = 1
> >>>> [ 275.731799][ C1] 1 lock held by swapper/1/0:
> >>>> [ 275.732000][ C1] #0: ffffc900001e0ae8 ((&n->timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x230
> >>>> [ 275.732354][ C1]
> >>>> [ 275.732354][ C1] stack backtrace:
> >>>> [ 275.732638][ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.13.0-virtme #1
> >>>> [ 275.732643][ C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> >>>> [ 275.732646][ C1] Call Trace:
> >>>> [ 275.732647][ C1] <IRQ>
> >>>> [ 275.732651][ C1] dump_stack_lvl+0xb0/0xd0
> >>>> [ 275.732663][ C1] lockdep_rcu_suspicious+0x1ea/0x280
> >>>> [ 275.732678][ C1] __icmp_send+0xb0d/0x1580
> >>>> [ 275.732695][ C1] ? tcp_data_queue+0x8/0x22d0
> >>>> [ 275.732701][ C1] ? lockdep_hardirqs_on_prepare+0x12b/0x410
> >>>> [ 275.732712][ C1] ? __pfx___icmp_send+0x10/0x10
> >>>> [ 275.732719][ C1] ? tcp_check_space+0x3ce/0x5f0
> >>>> [ 275.732742][ C1] ? rcu_read_lock_any_held+0x43/0xb0
> >>>> [ 275.732750][ C1] ? validate_chain+0x1fe/0xae0
> >>>> [ 275.732771][ C1] ? __pfx_validate_chain+0x10/0x10
> >>>> [ 275.732778][ C1] ? hlock_class+0x4e/0x130
> >>>> [ 275.732784][ C1] ? mark_lock+0x38/0x3e0
> >>>> [ 275.732788][ C1] ? sock_put+0x1a/0x60
> >>>> [ 275.732806][ C1] ? __lock_acquire+0xb9a/0x1680
> >>>> [ 275.732822][ C1] ipv4_send_dest_unreach+0x3b4/0x800
> >>>> [ 275.732829][ C1] ? neigh_invalidate+0x1c7/0x540
> >>>> [ 275.732837][ C1] ? __pfx_ipv4_send_dest_unreach+0x10/0x10
> >>>> [ 275.732850][ C1] ipv4_link_failure+0x1b/0x190
> >>>> [ 275.732856][ C1] arp_error_report+0x96/0x170
> >>>> [ 275.732862][ C1] neigh_invalidate+0x209/0x540
> >>>> [ 275.732873][ C1] neigh_timer_handler+0x87a/0xdf0
> >>>> [ 275.732883][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> >>>> [ 275.732886][ C1] call_timer_fn+0x13b/0x230
> >>>> [ 275.732891][ C1] ? call_timer_fn+0xe8/0x230
> >>>> [ 275.732894][ C1] ? call_timer_fn+0xe8/0x230
> >>>> [ 275.732899][ C1] ? __pfx_call_timer_fn+0x10/0x10
> >>>> [ 275.732902][ C1] ? mark_lock+0x38/0x3e0
> >>>> [ 275.732920][ C1] __run_timers+0x545/0x810
> >>>> [ 275.732925][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> >>>> [ 275.732936][ C1] ? __pfx___run_timers+0x10/0x10
> >>>> [ 275.732939][ C1] ? __lock_release+0x103/0x460
> >>>> [ 275.732947][ C1] ? do_raw_spin_lock+0x131/0x270
> >>>> [ 275.732952][ C1] ? __pfx_do_raw_spin_lock+0x10/0x10
> >>>> [ 275.732956][ C1] ? lock_acquire+0x32/0xc0
> >>>> [ 275.732958][ C1] ? timer_expire_remote+0x96/0xf0
> >>>> [ 275.732967][ C1] timer_expire_remote+0x9e/0xf0
> >>>> [ 275.732970][ C1] tmigr_handle_remote_cpu+0x278/0x440
> >>>> [ 275.732977][ C1] ? __pfx_tmigr_handle_remote_cpu+0x10/0x10
> >>>> [ 275.732981][ C1] ? __pfx___lock_release+0x10/0x10
> >>>> [ 275.732985][ C1] ? __pfx_lock_acquire.part.0+0x10/0x10
> >>>> [ 275.733015][ C1] tmigr_handle_remote_up+0x1a6/0x270
> >>>> [ 275.733027][ C1] ? __pfx_tmigr_handle_remote_up+0x10/0x10
> >>>> [ 275.733036][ C1] __walk_groups.isra.0+0x44/0x160
> >>>> [ 275.733051][ C1] tmigr_handle_remote+0x20b/0x300
> >>>>
> >>>> Decoded:
> >>>> https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/976941/vm-crash-thr0-1
> >>>
> >>> Oops, I thought I ran the tests on the whole series. I missed this one.
> >>
> >> BTW, ICMPv6 has the same potential problem, I will amend both cases.
> >
> > I ran again the tests for v3, got an unrelated crash, FYI.
> >
> > 14237.095216] #PF: supervisor instruction fetch in kernel mode
> > [14237.095570] #PF: error_code(0x0010) - not-present page
> > [14237.095915] PGD 1e58067 P4D 1e58067 PUD ce1c067 PMD 0
> > [14237.096991] Oops: Oops: 0010 [#1] SMP DEBUG_PAGEALLOC NOPTI
> > [14237.097507] CPU: 0 UID: 0 PID: 6371 Comm: python3 Not tainted
> > 6.13.0-virtme #1559
> > [14237.098045] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > [14237.098578] RIP: 0010:0x0
> > [14237.099324] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> > [14237.099752] RSP: 0018:ffffacfd4486bed0 EFLAGS: 00000286
> > [14237.100079] RAX: 0000000000000000 RBX: ffff9af502607200 RCX: 0000000000000002
> > [14237.100452] RDX: 00007fffc684a690 RSI: 0000000000005401 RDI: ffff9af502607200
> > [14237.100821] RBP: 0000000000005401 R08: 0000000000000001 R09: 0000000000000000
> > [14237.101182] R10: 0000000000000001 R11: 0000000000000000 R12: 00007fffc684a690
> > [14237.101542] R13: ffff9af50888ed68 R14: ffff9af502607200 R15: 0000000000000000
> > [14237.101956] FS: 00007f76b73f95c0(0000) GS:ffff9af57cc00000(0000)
> > knlGS:0000000000000000
> > [14237.102372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [14237.102679] CR2: ffffffffffffffd6 CR3: 00000000039ca000 CR4: 00000000000006f0
> > [14237.103160] Call Trace:
> > [14237.103435] <TASK>
> > [14237.103720] ? __die_body.cold+0x19/0x26
> > [14237.104340] ? page_fault_oops+0x134/0x2a0
> > [14237.104553] ? cp_new_stat+0x157/0x190
> > [14237.104799] ? exc_page_fault+0x68/0x230
> > [14237.105013] ? asm_exc_page_fault+0x26/0x30
> > [14237.105259] full_proxy_unlocked_ioctl+0x63/0x90
> > [14237.105546] __x64_sys_ioctl+0x97/0xc0
> > [14237.105754] do_syscall_64+0x72/0x180
> > [14237.105949] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> I think I got this issue as well on MPTCP side, when using GCOV, but
> something else using the debugfs could trigger that as well I guess. It
> is apparently fixed in the Linus tree, see 57b314752ec0 ("debugfs: Fix
> the missing initializations in __debugfs_file_get()")
>
> https://lore.kernel.org/all/20250129191937.GR1977892@ZenIV/
Great, thanks for the info :)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 net 09/16] ipv4: icmp: convert to dev_net_rcu()
2025-02-04 10:35 ` Eric Dumazet
2025-02-04 16:21 ` Matthieu Baerts
@ 2025-02-04 16:21 ` Jakub Kicinski
1 sibling, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2025-02-04 16:21 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, netdev, Kuniyuki Iwashima,
Simon Horman, eric.dumazet
On Tue, 4 Feb 2025 11:35:46 +0100 Eric Dumazet wrote:
> > > Oops, I thought I ran the tests on the whole series. I missed this one.
> >
> > BTW, ICMPv6 has the same potential problem, I will amend both cases.
>
> I ran again the tests for v3, got an unrelated crash, FYI.
Yes, FWIW that's
https://lore.kernel.org/all/20250129191937.GR1977892@ZenIV/
we have the fix queued up locally in NIPA. Merge windows are fun!
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v2 net 10/16] ipv6: icmp: convert to dev_net_rcu()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (8 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 09/16] ipv4: icmp: " Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 11/16] ipv6: input: " Eric Dumazet
` (5 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
ICMP uses of dev_net() are safe, change them to dev_net_rcu()
to get LOCKDEP support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv6/icmp.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index a6984a29fdb9dd972a11ca9f8d5e794c443bac6f..cb9ba5d8b6bab340fd4900f2fa99baa1ebeacb0f 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -76,7 +76,7 @@ static int icmpv6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
{
/* icmpv6_notify checks 8 bytes can be pulled, icmp6hdr is 8 bytes */
struct icmp6hdr *icmp6 = (struct icmp6hdr *) (skb->data + offset);
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
if (type == ICMPV6_PKT_TOOBIG)
ip6_update_pmtu(skb, net, info, skb->dev->ifindex, 0, sock_net_uid(net, NULL));
@@ -473,7 +473,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
if (!skb->dev)
return;
- net = dev_net(skb->dev);
+ net = dev_net_rcu(skb->dev);
mark = IP6_REPLY_MARK(net, skb->mark);
/*
* Make sure we respect the rules
@@ -679,8 +679,8 @@ int ip6_err_gen_icmpv6_unreach(struct sk_buff *skb, int nhs, int type,
skb_pull(skb2, nhs);
skb_reset_network_header(skb2);
- rt = rt6_lookup(dev_net(skb->dev), &ipv6_hdr(skb2)->saddr, NULL, 0,
- skb, 0);
+ rt = rt6_lookup(dev_net_rcu(skb->dev), &ipv6_hdr(skb2)->saddr,
+ NULL, 0, skb, 0);
if (rt && rt->dst.dev)
skb2->dev = rt->dst.dev;
@@ -717,7 +717,7 @@ EXPORT_SYMBOL(ip6_err_gen_icmpv6_unreach);
static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb)
{
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
struct sock *sk;
struct inet6_dev *idev;
struct ipv6_pinfo *np;
@@ -832,7 +832,7 @@ enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type,
u8 code, __be32 info)
{
struct inet6_skb_parm *opt = IP6CB(skb);
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
const struct inet6_protocol *ipprot;
enum skb_drop_reason reason;
int inner_offset;
@@ -889,7 +889,7 @@ enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type,
static int icmpv6_rcv(struct sk_buff *skb)
{
enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
struct net_device *dev = icmp6_dev(skb);
struct inet6_dev *idev = __in6_dev_get(dev);
const struct in6_addr *saddr, *daddr;
@@ -921,7 +921,7 @@ static int icmpv6_rcv(struct sk_buff *skb)
skb_set_network_header(skb, nh);
}
- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_INMSGS);
+ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INMSGS);
saddr = &ipv6_hdr(skb)->saddr;
daddr = &ipv6_hdr(skb)->daddr;
@@ -939,7 +939,7 @@ static int icmpv6_rcv(struct sk_buff *skb)
type = hdr->icmp6_type;
- ICMP6MSGIN_INC_STATS(dev_net(dev), idev, type);
+ ICMP6MSGIN_INC_STATS(dev_net_rcu(dev), idev, type);
switch (type) {
case ICMPV6_ECHO_REQUEST:
@@ -1034,9 +1034,9 @@ static int icmpv6_rcv(struct sk_buff *skb)
csum_error:
reason = SKB_DROP_REASON_ICMP_CSUM;
- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_CSUMERRORS);
+ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_CSUMERRORS);
discard_it:
- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_INERRORS);
+ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INERRORS);
drop_no_count:
kfree_skb_reason(skb, reason);
return 0;
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 11/16] ipv6: input: convert to dev_net_rcu()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (9 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 10/16] ipv6: " Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 12/16] ipv6: output: " Eric Dumazet
` (4 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
dev_net() calls from net/ipv6/ip6_input.c seem to
happen under RCU protection.
Convert them to dev_net_rcu() to ensure LOCKDEP support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv6/ip6_input.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 70c0e16c0ae6837d1c64d0036829c8b61799578b..4030527ebe098e86764f37c9068d2f2f9af2d183 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -301,7 +301,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff *skb, struct net_device *dev,
int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
{
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
skb = ip6_rcv_core(skb, dev, net);
if (skb == NULL)
@@ -330,7 +330,7 @@ void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
list_for_each_entry_safe(skb, next, head, list) {
struct net_device *dev = skb->dev;
- struct net *net = dev_net(dev);
+ struct net *net = dev_net_rcu(dev);
skb_list_del_init(skb);
skb = ip6_rcv_core(skb, dev, net);
@@ -488,7 +488,7 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
int ip6_input(struct sk_buff *skb)
{
return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_IN,
- dev_net(skb->dev), NULL, skb, skb->dev, NULL,
+ dev_net_rcu(skb->dev), NULL, skb, skb->dev, NULL,
ip6_input_finish);
}
EXPORT_SYMBOL_GPL(ip6_input);
@@ -500,14 +500,14 @@ int ip6_mc_input(struct sk_buff *skb)
struct net_device *dev;
bool deliver;
- __IP6_UPD_PO_STATS(dev_net(skb_dst(skb)->dev),
+ __IP6_UPD_PO_STATS(dev_net_rcu(skb_dst(skb)->dev),
__in6_dev_get_safely(skb->dev), IPSTATS_MIB_INMCAST,
skb->len);
/* skb->dev passed may be master dev for vrfs. */
if (sdif) {
rcu_read_lock();
- dev = dev_get_by_index_rcu(dev_net(skb->dev), sdif);
+ dev = dev_get_by_index_rcu(dev_net_rcu(skb->dev), sdif);
if (!dev) {
rcu_read_unlock();
kfree_skb(skb);
@@ -526,7 +526,7 @@ int ip6_mc_input(struct sk_buff *skb)
/*
* IPv6 multicast router mode is now supported ;)
*/
- if (atomic_read(&dev_net(skb->dev)->ipv6.devconf_all->mc_forwarding) &&
+ if (atomic_read(&dev_net_rcu(skb->dev)->ipv6.devconf_all->mc_forwarding) &&
!(ipv6_addr_type(&hdr->daddr) &
(IPV6_ADDR_LOOPBACK|IPV6_ADDR_LINKLOCAL)) &&
likely(!(IP6CB(skb)->flags & IP6SKB_FORWARDED))) {
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 12/16] ipv6: output: convert to dev_net_rcu()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (10 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 11/16] ipv6: input: " Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 13/16] ipv6: use RCU protection in ip6_default_advmss() Eric Dumazet
` (3 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
dev_net() calls from net/ipv6/ip6_output.c
and net/ipv6/output_core.c are happening under RCU
protection.
Convert them to dev_net_rcu() to ensure LOCKDEP support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv6/ip6_output.c | 4 ++--
net/ipv6/output_core.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index d577bf2f3053873d27b241029592cdbb0a124ad7..4c73a4cdcb23f76d81e572d5b1bd0f6902447c0e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -393,7 +393,7 @@ static int ip6_call_ra_chain(struct sk_buff *skb, int sel)
sk->sk_bound_dev_if == skb->dev->ifindex)) {
if (inet6_test_bit(RTALERT_ISOLATE, sk) &&
- !net_eq(sock_net(sk), dev_net(skb->dev))) {
+ !net_eq(sock_net(sk), dev_net_rcu(skb->dev))) {
continue;
}
if (last) {
@@ -503,7 +503,7 @@ int ip6_forward(struct sk_buff *skb)
struct dst_entry *dst = skb_dst(skb);
struct ipv6hdr *hdr = ipv6_hdr(skb);
struct inet6_skb_parm *opt = IP6CB(skb);
- struct net *net = dev_net(dst->dev);
+ struct net *net = dev_net_rcu(dst->dev);
struct inet6_dev *idev;
SKB_DR(reason);
u32 mtu;
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 806d4b5dd1e60b27726facbb59bbef97d6fee7f5..94438fd4f0e833bb8f5ea4822c7312376ea79304 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -113,7 +113,7 @@ int ip6_dst_hoplimit(struct dst_entry *dst)
if (idev)
hoplimit = READ_ONCE(idev->cnf.hop_limit);
else
- hoplimit = READ_ONCE(dev_net(dev)->ipv6.devconf_all->hop_limit);
+ hoplimit = READ_ONCE(dev_net_rcu(dev)->ipv6.devconf_all->hop_limit);
rcu_read_unlock();
}
return hoplimit;
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 13/16] ipv6: use RCU protection in ip6_default_advmss()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (11 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 12/16] ipv6: output: " Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 14/16] net: filter: convert to dev_net_rcu() Eric Dumazet
` (2 subsequent siblings)
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
ip6_default_advmss() needs rcu protection to make
sure the net structure it reads does not disappear.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv6/route.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 78362822b9070df138a0724dc76003b63026f9e2..ef2d23a1e3d532f5db37ca94ca482c5522dddffc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3196,13 +3196,18 @@ static unsigned int ip6_default_advmss(const struct dst_entry *dst)
{
struct net_device *dev = dst->dev;
unsigned int mtu = dst_mtu(dst);
- struct net *net = dev_net(dev);
+ struct net *net;
mtu -= sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
+ rcu_read_lock();
+
+ net = dev_net_rcu(dev);
if (mtu < net->ipv6.sysctl.ip6_rt_min_advmss)
mtu = net->ipv6.sysctl.ip6_rt_min_advmss;
+ rcu_read_unlock();
+
/*
* Maximal non-jumbo IPv6 payload is IPV6_MAXPLEN and
* corresponding MSS is IPV6_MAXPLEN - tcp_header_size.
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 14/16] net: filter: convert to dev_net_rcu()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (12 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 13/16] ipv6: use RCU protection in ip6_default_advmss() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 15/16] flow_dissector: use rcu protection to fetch dev_net() Eric Dumazet
2025-02-03 14:30 ` [PATCH v2 net 16/16] ipv4: use RCU protection in inet_select_addr() Eric Dumazet
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
All calls to dev_net() from net/core/filter.c are currently
done under rcu_read_lock().
Convert them to dev_net_rcu() to ensure LOCKDEP support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/core/filter.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 2ec162dd83c463640dcf3c151327206f519b217a..4db537a982d55fa9b42aaa70820cb337d5283299 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2244,7 +2244,7 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
struct bpf_nh_params *nh)
{
const struct ipv6hdr *ip6h = ipv6_hdr(skb);
- struct net *net = dev_net(dev);
+ struct net *net = dev_net_rcu(dev);
int err, ret = NET_XMIT_DROP;
if (!nh) {
@@ -2348,7 +2348,7 @@ static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
struct bpf_nh_params *nh)
{
const struct iphdr *ip4h = ip_hdr(skb);
- struct net *net = dev_net(dev);
+ struct net *net = dev_net_rcu(dev);
int err, ret = NET_XMIT_DROP;
if (!nh) {
@@ -2438,7 +2438,7 @@ BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
if (unlikely(flags & (~(BPF_F_INGRESS) | BPF_F_REDIRECT_INTERNAL)))
return -EINVAL;
- dev = dev_get_by_index_rcu(dev_net(skb->dev), ifindex);
+ dev = dev_get_by_index_rcu(dev_net_rcu(skb->dev), ifindex);
if (unlikely(!dev))
return -EINVAL;
@@ -2482,7 +2482,7 @@ static struct net_device *skb_get_peer_dev(struct net_device *dev)
int skb_do_redirect(struct sk_buff *skb)
{
struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
struct net_device *dev;
u32 flags = ri->flags;
@@ -2497,7 +2497,7 @@ int skb_do_redirect(struct sk_buff *skb)
dev = skb_get_peer_dev(dev);
if (unlikely(!dev ||
!(dev->flags & IFF_UP) ||
- net_eq(net, dev_net(dev))))
+ net_eq(net, dev_net_rcu(dev))))
goto out_drop;
skb->dev = dev;
dev_sw_netstats_rx_add(dev, skb->len);
@@ -4425,7 +4425,7 @@ __xdp_do_redirect_frame(struct bpf_redirect_info *ri, struct net_device *dev,
break;
case BPF_MAP_TYPE_UNSPEC:
if (map_id == INT_MAX) {
- fwd = dev_get_by_index_rcu(dev_net(dev), ri->tgt_index);
+ fwd = dev_get_by_index_rcu(dev_net_rcu(dev), ri->tgt_index);
if (unlikely(!fwd)) {
err = -EINVAL;
break;
@@ -4550,7 +4550,7 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
ri->map_type = BPF_MAP_TYPE_UNSPEC;
if (map_type == BPF_MAP_TYPE_UNSPEC && map_id == INT_MAX) {
- fwd = dev_get_by_index_rcu(dev_net(dev), ri->tgt_index);
+ fwd = dev_get_by_index_rcu(dev_net_rcu(dev), ri->tgt_index);
if (unlikely(!fwd)) {
err = -EINVAL;
goto err;
@@ -6203,12 +6203,12 @@ BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
switch (params->family) {
#if IS_ENABLED(CONFIG_INET)
case AF_INET:
- return bpf_ipv4_fib_lookup(dev_net(ctx->rxq->dev), params,
+ return bpf_ipv4_fib_lookup(dev_net_rcu(ctx->rxq->dev), params,
flags, true);
#endif
#if IS_ENABLED(CONFIG_IPV6)
case AF_INET6:
- return bpf_ipv6_fib_lookup(dev_net(ctx->rxq->dev), params,
+ return bpf_ipv6_fib_lookup(dev_net_rcu(ctx->rxq->dev), params,
flags, true);
#endif
}
@@ -6228,7 +6228,7 @@ static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
struct bpf_fib_lookup *, params, int, plen, u32, flags)
{
- struct net *net = dev_net(skb->dev);
+ struct net *net = dev_net_rcu(skb->dev);
int rc = -EAFNOSUPPORT;
bool check_mtu = false;
@@ -6283,7 +6283,7 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
static struct net_device *__dev_via_ifindex(struct net_device *dev_curr,
u32 ifindex)
{
- struct net *netns = dev_net(dev_curr);
+ struct net *netns = dev_net_rcu(dev_curr);
/* Non-redirect use-cases can use ifindex=0 and save ifindex lookup */
if (ifindex == 0)
@@ -6806,7 +6806,7 @@ bpf_skc_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
int ifindex;
if (skb->dev) {
- caller_net = dev_net(skb->dev);
+ caller_net = dev_net_rcu(skb->dev);
ifindex = skb->dev->ifindex;
} else {
caller_net = sock_net(skb->sk);
@@ -6906,7 +6906,7 @@ BPF_CALL_5(bpf_tc_skc_lookup_tcp, struct sk_buff *, skb,
{
struct net_device *dev = skb->dev;
int ifindex = dev->ifindex, sdif = dev_sdif(dev);
- struct net *caller_net = dev_net(dev);
+ struct net *caller_net = dev_net_rcu(dev);
return (unsigned long)__bpf_skc_lookup(skb, tuple, len, caller_net,
ifindex, IPPROTO_TCP, netns_id,
@@ -6930,7 +6930,7 @@ BPF_CALL_5(bpf_tc_sk_lookup_tcp, struct sk_buff *, skb,
{
struct net_device *dev = skb->dev;
int ifindex = dev->ifindex, sdif = dev_sdif(dev);
- struct net *caller_net = dev_net(dev);
+ struct net *caller_net = dev_net_rcu(dev);
return (unsigned long)__bpf_sk_lookup(skb, tuple, len, caller_net,
ifindex, IPPROTO_TCP, netns_id,
@@ -6954,7 +6954,7 @@ BPF_CALL_5(bpf_tc_sk_lookup_udp, struct sk_buff *, skb,
{
struct net_device *dev = skb->dev;
int ifindex = dev->ifindex, sdif = dev_sdif(dev);
- struct net *caller_net = dev_net(dev);
+ struct net *caller_net = dev_net_rcu(dev);
return (unsigned long)__bpf_sk_lookup(skb, tuple, len, caller_net,
ifindex, IPPROTO_UDP, netns_id,
@@ -6992,7 +6992,7 @@ BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
{
struct net_device *dev = ctx->rxq->dev;
int ifindex = dev->ifindex, sdif = dev_sdif(dev);
- struct net *caller_net = dev_net(dev);
+ struct net *caller_net = dev_net_rcu(dev);
return (unsigned long)__bpf_sk_lookup(NULL, tuple, len, caller_net,
ifindex, IPPROTO_UDP, netns_id,
@@ -7016,7 +7016,7 @@ BPF_CALL_5(bpf_xdp_skc_lookup_tcp, struct xdp_buff *, ctx,
{
struct net_device *dev = ctx->rxq->dev;
int ifindex = dev->ifindex, sdif = dev_sdif(dev);
- struct net *caller_net = dev_net(dev);
+ struct net *caller_net = dev_net_rcu(dev);
return (unsigned long)__bpf_skc_lookup(NULL, tuple, len, caller_net,
ifindex, IPPROTO_TCP, netns_id,
@@ -7040,7 +7040,7 @@ BPF_CALL_5(bpf_xdp_sk_lookup_tcp, struct xdp_buff *, ctx,
{
struct net_device *dev = ctx->rxq->dev;
int ifindex = dev->ifindex, sdif = dev_sdif(dev);
- struct net *caller_net = dev_net(dev);
+ struct net *caller_net = dev_net_rcu(dev);
return (unsigned long)__bpf_sk_lookup(NULL, tuple, len, caller_net,
ifindex, IPPROTO_TCP, netns_id,
@@ -7510,7 +7510,7 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
return -EINVAL;
if (!skb_at_tc_ingress(skb))
return -EOPNOTSUPP;
- if (unlikely(dev_net(skb->dev) != sock_net(sk)))
+ if (unlikely(dev_net_rcu(skb->dev) != sock_net(sk)))
return -ENETUNREACH;
if (sk_unhashed(sk))
return -EOPNOTSUPP;
@@ -11985,7 +11985,7 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
if (!skb_at_tc_ingress(skb))
return -EINVAL;
- net = dev_net(skb->dev);
+ net = dev_net_rcu(skb->dev);
if (net != sock_net(sk))
return -ENETUNREACH;
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH v2 net 15/16] flow_dissector: use rcu protection to fetch dev_net()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (13 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 14/16] net: filter: convert to dev_net_rcu() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
2025-02-03 23:38 ` Jakub Kicinski
2025-02-03 14:30 ` [PATCH v2 net 16/16] ipv4: use RCU protection in inet_select_addr() Eric Dumazet
15 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
__skb_flow_dissect() can be called from arbitrary contexts.
It must extend its rcu protection section to include
the call to dev_net(), which can become dev_net_rcu().
This makes sure the net structure can not disappear under us.
Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case")flow_dissect")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/core/flow_dissector.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 0e638a37aa0961de6281deeed227b3e7ef70e546..5db41bf2ed93e0df721c216ca4557dad16aa5f83 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1108,10 +1108,12 @@ bool __skb_flow_dissect(const struct net *net,
FLOW_DISSECTOR_KEY_BASIC,
target_container);
+ rcu_read_lock();
+
if (skb) {
if (!net) {
if (skb->dev)
- net = dev_net(skb->dev);
+ net = dev_net_rcu(skb->dev);
else if (skb->sk)
net = sock_net(skb->sk);
}
@@ -1122,7 +1124,6 @@ bool __skb_flow_dissect(const struct net *net,
enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
struct bpf_prog_array *run_array;
- rcu_read_lock();
run_array = rcu_dereference(init_net.bpf.run_array[type]);
if (!run_array)
run_array = rcu_dereference(net->bpf.run_array[type]);
@@ -1150,17 +1151,17 @@ bool __skb_flow_dissect(const struct net *net,
prog = READ_ONCE(run_array->items[0].prog);
result = bpf_flow_dissect(prog, &ctx, n_proto, nhoff,
hlen, flags);
- if (result == BPF_FLOW_DISSECTOR_CONTINUE)
- goto dissect_continue;
- __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
- target_container);
- rcu_read_unlock();
- return result == BPF_OK;
+ if (result != BPF_FLOW_DISSECTOR_CONTINUE) {
+ __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
+ target_container);
+ rcu_read_unlock();
+ return result == BPF_OK;
+ }
}
-dissect_continue:
- rcu_read_unlock();
}
+ rcu_read_unlock();
+
if (dissector_uses_key(flow_dissector,
FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
struct ethhdr *eth = eth_hdr(skb);
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread* Re: [PATCH v2 net 15/16] flow_dissector: use rcu protection to fetch dev_net()
2025-02-03 14:30 ` [PATCH v2 net 15/16] flow_dissector: use rcu protection to fetch dev_net() Eric Dumazet
@ 2025-02-03 23:38 ` Jakub Kicinski
2025-02-04 4:16 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2025-02-03 23:38 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, netdev, Kuniyuki Iwashima,
Simon Horman, eric.dumazet
On Mon, 3 Feb 2025 14:30:45 +0000 Eric Dumazet wrote:
> Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case")flow_dissect")
This Fixes tag looks corrupted, in case you need to post v3
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 net 15/16] flow_dissector: use rcu protection to fetch dev_net()
2025-02-03 23:38 ` Jakub Kicinski
@ 2025-02-04 4:16 ` Eric Dumazet
0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-04 4:16 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, netdev, Kuniyuki Iwashima,
Simon Horman, eric.dumazet
On Tue, Feb 4, 2025 at 12:38 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 3 Feb 2025 14:30:45 +0000 Eric Dumazet wrote:
> > Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case")flow_dissect")
>
> This Fixes tag looks corrupted, in case you need to post v3
ACK, thanks.
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v2 net 16/16] ipv4: use RCU protection in inet_select_addr()
2025-02-03 14:30 [PATCH v2 net 00/16] net: first round to use dev_net_rcu() Eric Dumazet
` (14 preceding siblings ...)
2025-02-03 14:30 ` [PATCH v2 net 15/16] flow_dissector: use rcu protection to fetch dev_net() Eric Dumazet
@ 2025-02-03 14:30 ` Eric Dumazet
15 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2025-02-03 14:30 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Kuniyuki Iwashima, Simon Horman, eric.dumazet,
Eric Dumazet
inet_select_addr() must use RCU protection to make
sure the net structure it reads does not disappear.
Fixes: c4544c724322 ("[NETNS]: Process inet_select_addr inside a namespace.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/devinet.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index c8b3cf5fba4c02941b919687a6a657cf68f5f99a..55b8151759bc9f76ebdbfae27544d6ee666a4809 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1371,10 +1371,11 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
__be32 addr = 0;
unsigned char localnet_scope = RT_SCOPE_HOST;
struct in_device *in_dev;
- struct net *net = dev_net(dev);
+ struct net *net;
int master_idx;
rcu_read_lock();
+ net = dev_net_rcu(dev);
in_dev = __in_dev_get_rcu(dev);
if (!in_dev)
goto no_in_dev;
--
2.48.1.362.g079036d154-goog
^ permalink raw reply related [flat|nested] 26+ messages in thread