* [PATCH net-next 01/14] ipv6: lockless IPV6_UNICAST_HOPS implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
@ 2023-09-12 16:01 ` Eric Dumazet
2023-09-14 14:51 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 02/14] ipv6: lockless IPV6_MULTICAST_LOOP implementation Eric Dumazet
` (14 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:01 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Some np->hop_limit accesses are racy when the socket lock is not held.
Add the missing annotations and switch to a fully lockless implementation.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 12 +-----------
include/net/ipv6.h | 2 +-
net/ipv6/ip6_output.c | 2 +-
net/ipv6/ipv6_sockglue.c | 20 +++++++++++---------
net/ipv6/mcast.c | 2 +-
net/ipv6/ndisc.c | 2 +-
6 files changed, 16 insertions(+), 24 deletions(-)
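A minimal userspace sketch of the pattern this patch applies to np->hop_limit. It assumes relaxed C11 atomics as a stand-in for READ_ONCE()/WRITE_ONCE() (they are not the kernel macros). The names pinfo_sketch, set_unicast_hops() and effective_hop_limit() are hypothetical, and route_default plays the role of ip6_dst_hoplimit(dst). READ_ONCE() takes the address of its argument, so a 9-bit bitfield cannot be annotated; a C11 atomic has the same requirement, which is why the field is modelled as a plain short here.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ipv6_pinfo, reduced to one field.
 * An atomic object must be addressable, so it cannot be a bitfield. */
struct pinfo_sketch {
	_Atomic short hop_limit; /* -1 means "use the route's default" */
};

/* Shape of the lockless IPV6_UNICAST_HOPS branch: validate, then publish
 * with a single store (relaxed atomic standing in for WRITE_ONCE()). */
static int set_unicast_hops(struct pinfo_sketch *np, int val, size_t optlen)
{
	if (optlen < sizeof(int))
		return -EINVAL;
	if (val > 255 || val < -1)
		return -EINVAL;
	atomic_store_explicit(&np->hop_limit, (short)val, memory_order_relaxed);
	return 0;
}

/* Shape of the reader side: load once (READ_ONCE() stand-in), then fall back
 * to the route value when the socket value is negative.
 * route_default is a hypothetical stand-in for ip6_dst_hoplimit(dst). */
static int effective_hop_limit(struct pinfo_sketch *np, int route_default)
{
	int hlimit = atomic_load_explicit(&np->hop_limit, memory_order_relaxed);

	if (hlimit < 0)
		hlimit = route_default;
	return hlimit;
}
```

Because the setter validates before publishing, a concurrent reader can only observe -1 or a value in [0, 255], never a half-written or out-of-range value.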
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index af8a771a053c51eed297516f927a5fd003315ef4..c2e0870713849fbbf1a8ec2d60cca80caab0cb98 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -213,17 +213,7 @@ struct ipv6_pinfo {
__be32 flow_label;
__u32 frag_size;
- /*
- * Packed in 16bits.
- * Omit one shift by putting the signed field at MSB.
- */
-#if defined(__BIG_ENDIAN_BITFIELD)
- __s16 hop_limit:9;
- __u16 __unused_1:7;
-#else
- __u16 __unused_1:7;
- __s16 hop_limit:9;
-#endif
+ s16 hop_limit;
#if defined(__BIG_ENDIAN_BITFIELD)
/* Packed in 16bits. */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 0675be0f3fa0efc55575bb5b2569dc8a1dbb9f24..61007db0036482e27121747add0eec77f912b54a 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -911,7 +911,7 @@ static inline int ip6_sk_dst_hoplimit(struct ipv6_pinfo *np, struct flowi6 *fl6,
if (ipv6_addr_is_multicast(&fl6->daddr))
hlimit = np->mcast_hops;
else
- hlimit = np->hop_limit;
+ hlimit = READ_ONCE(np->hop_limit);
if (hlimit < 0)
hlimit = ip6_dst_hoplimit(dst);
return hlimit;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 54fc4c711f2c545f2ca625d6b0e09f2bb8e6d513..1e16d56d8c38ac51bd999038ae4e8478bf2f5f8c 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -309,7 +309,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
* Fill in the IPv6 header
*/
if (np)
- hlimit = np->hop_limit;
+ hlimit = READ_ONCE(np->hop_limit);
if (hlimit < 0)
hlimit = ip6_dst_hoplimit(dst);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 0e2a0847b387f0f6f50211b89f92ac1e00a0b07a..f27993a1470dddd876f34f65c1f171c576eca272 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -415,6 +415,16 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
if (ip6_mroute_opt(optname))
return ip6_mroute_setsockopt(sk, optname, optval, optlen);
+ /* Handle options that can be set without locking the socket. */
+ switch (optname) {
+ case IPV6_UNICAST_HOPS:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ if (val > 255 || val < -1)
+ return -EINVAL;
+ WRITE_ONCE(np->hop_limit, val);
+ return 0;
+ }
if (needs_rtnl)
rtnl_lock();
sockopt_lock_sock(sk);
@@ -733,14 +743,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
}
break;
}
- case IPV6_UNICAST_HOPS:
- if (optlen < sizeof(int))
- goto e_inval;
- if (val > 255 || val < -1)
- goto e_inval;
- np->hop_limit = val;
- retv = 0;
- break;
case IPV6_MULTICAST_HOPS:
if (sk->sk_type == SOCK_STREAM)
@@ -1347,7 +1349,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
struct dst_entry *dst;
if (optname == IPV6_UNICAST_HOPS)
- val = np->hop_limit;
+ val = READ_ONCE(np->hop_limit);
else
val = np->mcast_hops;
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 5ce25bcb9974de97f26635d0d3d54695af3070a7..6a33a50687bcf7201e75574f03e619fe89636068 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1716,7 +1716,7 @@ static void ip6_mc_hdr(const struct sock *sk, struct sk_buff *skb,
hdr->payload_len = htons(len);
hdr->nexthdr = proto;
- hdr->hop_limit = inet6_sk(sk)->hop_limit;
+ hdr->hop_limit = READ_ONCE(inet6_sk(sk)->hop_limit);
hdr->saddr = *saddr;
hdr->daddr = *daddr;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 553c8664e0a7a37d7858393ab6a30616ab13a3bf..b554fd40bdc3787eb3bafa1d9923076d6078217e 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -500,7 +500,7 @@ void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr,
csum_partial(icmp6h,
skb->len, 0));
- ip6_nd_hdr(skb, saddr, daddr, inet6_sk(sk)->hop_limit, skb->len);
+ ip6_nd_hdr(skb, saddr, daddr, READ_ONCE(inet6_sk(sk)->hop_limit), skb->len);
rcu_read_lock();
idev = __in6_dev_get(dst->dev);
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 01/14] ipv6: lockless IPV6_UNICAST_HOPS implementation
2023-09-12 16:01 ` [PATCH net-next 01/14] ipv6: lockless IPV6_UNICAST_HOPS implementation Eric Dumazet
@ 2023-09-14 14:51 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 14:51 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:01 AM, Eric Dumazet wrote:
> Some np->hop_limit accesses are racy when the socket lock is not held.
>
> Add the missing annotations and switch to a fully lockless implementation.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/ipv6.h | 12 +-----------
> include/net/ipv6.h | 2 +-
> net/ipv6/ip6_output.c | 2 +-
> net/ipv6/ipv6_sockglue.c | 20 +++++++++++---------
> net/ipv6/mcast.c | 2 +-
> net/ipv6/ndisc.c | 2 +-
> 6 files changed, 16 insertions(+), 24 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* [PATCH net-next 02/14] ipv6: lockless IPV6_MULTICAST_LOOP implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
2023-09-12 16:01 ` [PATCH net-next 01/14] ipv6: lockless IPV6_UNICAST_HOPS implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 14:54 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 03/14] ipv6: lockless IPV6_MULTICAST_HOPS implementation Eric Dumazet
` (13 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Add inet6_{test|set|clear|assign}_bit() helpers.
Note that I am using bits from inet->inet_flags;
this might change in the future if we need more flags.
While fixing data-races on np->mc_loop,
this patch also makes it possible to implement lockless
accesses to np->mcast_hops in the following patch.
Also constify the sk_mc_loop() argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 18 ++++++++++++++----
include/net/inet_sock.h | 1 +
include/net/sock.h | 2 +-
net/core/sock.c | 4 ++--
net/ipv6/af_inet6.c | 2 +-
net/ipv6/ipv6_sockglue.c | 18 ++++++++----------
net/ipv6/ndisc.c | 2 +-
net/netfilter/ipvs/ip_vs_sync.c | 8 ++------
8 files changed, 30 insertions(+), 25 deletions(-)
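The inet6_*_bit() helpers rely on token pasting, so inet6_test_bit(MC6_LOOP, sk) becomes a test_bit() on INET_FLAGS_MC6_LOOP. The sketch below mimics that in userspace, assuming C11 atomic fetch-or/fetch-and as stand-ins for the kernel's set_bit()/clear_bit(). The bit numbers are copied from the series; struct sock_sketch and the sketch_* macros are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit numbers copied from the INET_FLAGS_* additions in this series. */
enum {
	INET_FLAGS_MC6_LOOP = 20,
	INET_FLAGS_RECVERR6_RFC4884 = 21,
};

/* Hypothetical stand-in for inet_sk(sk)->inet_flags. */
struct sock_sketch {
	atomic_ulong inet_flags;
};

/* Same token-pasting trick as inet6_test_bit() and friends: callers write
 * MC6_LOOP and the macro expands it to INET_FLAGS_MC6_LOOP.  Each update is
 * one atomic read-modify-write on the whole flags word. */
#define sketch_test_bit(nr, sk) \
	((atomic_load(&(sk)->inet_flags) >> INET_FLAGS_##nr) & 1UL)
#define sketch_set_bit(nr, sk) \
	atomic_fetch_or(&(sk)->inet_flags, 1UL << INET_FLAGS_##nr)
#define sketch_clear_bit(nr, sk) \
	atomic_fetch_and(&(sk)->inet_flags, ~(1UL << INET_FLAGS_##nr))
#define sketch_assign_bit(nr, sk, val) \
	((val) ? (void)sketch_set_bit(nr, sk) : (void)sketch_clear_bit(nr, sk))
```

Since every update is a single atomic operation on inet_flags, no socket lock is needed to keep neighbouring bits intact, which is what lets the IPVS set_mcast_loop() hunk drop lock_sock()/release_sock().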
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index c2e0870713849fbbf1a8ec2d60cca80caab0cb98..68cf1ca949141e419abf2031db2b42105b821ab0 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -218,11 +218,9 @@ struct ipv6_pinfo {
#if defined(__BIG_ENDIAN_BITFIELD)
/* Packed in 16bits. */
__s16 mcast_hops:9;
- __u16 __unused_2:6,
- mc_loop:1;
+ __u16 __unused_2:7,
#else
- __u16 mc_loop:1,
- __unused_2:6;
+ __u16 __unused_2:7;
__s16 mcast_hops:9;
#endif
int ucast_oif;
@@ -283,6 +281,18 @@ struct ipv6_pinfo {
struct inet6_cork cork;
};
+/* We currently use available bits from inet_sk(sk)->inet_flags,
+ * this could change in the future.
+ */
+#define inet6_test_bit(nr, sk) \
+ test_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags)
+#define inet6_set_bit(nr, sk) \
+ set_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags)
+#define inet6_clear_bit(nr, sk) \
+ clear_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags)
+#define inet6_assign_bit(nr, sk, val) \
+ assign_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags, val)
+
/* WARNING: don't change the layout of the members in {raw,udp,tcp}6_sock! */
struct raw6_sock {
/* inet_sock has to be the first member of raw6_sock */
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 2de0e4d4a027889706323b7ee4b96e406101bff4..b5a9dca92fb45425c032bdf08bfa88cad77926b8 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -268,6 +268,7 @@ enum {
INET_FLAGS_NODEFRAG = 17,
INET_FLAGS_BIND_ADDRESS_NO_PORT = 18,
INET_FLAGS_DEFER_CONNECT = 19,
+ INET_FLAGS_MC6_LOOP = 20,
};
/* cmsg flags for inet */
diff --git a/include/net/sock.h b/include/net/sock.h
index b770261fbdaf59d4d1c0b30adb2592c56442e9e3..9e1c17e56971f8714d421d58e408bf3face421b0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2239,7 +2239,7 @@ static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
}
}
-bool sk_mc_loop(struct sock *sk);
+bool sk_mc_loop(const struct sock *sk);
static inline bool sk_can_gso(const struct sock *sk)
{
diff --git a/net/core/sock.c b/net/core/sock.c
index 16584e2dd6481a3fc28d796db785439f0446703b..b2a9b5630bb513d5e5b99a6b7d3cef54af3a4b6f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -759,7 +759,7 @@ static int sock_getbindtodevice(struct sock *sk, sockptr_t optval,
return ret;
}
-bool sk_mc_loop(struct sock *sk)
+bool sk_mc_loop(const struct sock *sk)
{
if (dev_recursion_level())
return false;
@@ -771,7 +771,7 @@ bool sk_mc_loop(struct sock *sk)
return inet_test_bit(MC_LOOP, sk);
#if IS_ENABLED(CONFIG_IPV6)
case AF_INET6:
- return inet6_sk(sk)->mc_loop;
+ return inet6_test_bit(MC6_LOOP, sk);
#endif
}
WARN_ON_ONCE(1);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 368824fe9719f92b46512f3f78446fe5bc802ef7..bbd4aa1b96d09d346c521dab2194045123e7a5a6 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -217,7 +217,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
inet_sk(sk)->pinet6 = np = inet6_sk_generic(sk);
np->hop_limit = -1;
np->mcast_hops = IPV6_DEFAULT_MCASTHOPS;
- np->mc_loop = 1;
+ inet6_set_bit(MC6_LOOP, sk);
np->mc_all = 1;
np->pmtudisc = IPV6_PMTUDISC_WANT;
np->repflow = net->ipv6.sysctl.flowlabel_reflect & FLOWLABEL_REFLECT_ESTABLISHED;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index f27993a1470dddd876f34f65c1f171c576eca272..755fac85a120de44272f685529b579e7118d306b 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -424,6 +424,13 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
WRITE_ONCE(np->hop_limit, val);
return 0;
+ case IPV6_MULTICAST_LOOP:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ if (val != valbool)
+ return -EINVAL;
+ inet6_assign_bit(MC6_LOOP, sk, valbool);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -755,15 +762,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
retv = 0;
break;
- case IPV6_MULTICAST_LOOP:
- if (optlen < sizeof(int))
- goto e_inval;
- if (val != valbool)
- goto e_inval;
- np->mc_loop = valbool;
- retv = 0;
- break;
-
case IPV6_UNICAST_IF:
{
struct net_device *dev = NULL;
@@ -1367,7 +1365,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
}
case IPV6_MULTICAST_LOOP:
- val = np->mc_loop;
+ val = inet6_test_bit(MC6_LOOP, sk);
break;
case IPV6_MULTICAST_IF:
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index b554fd40bdc3787eb3bafa1d9923076d6078217e..679443d7ecb586af17fa22f9ecf573318a6ac49d 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1996,7 +1996,7 @@ static int __net_init ndisc_net_init(struct net *net)
np = inet6_sk(sk);
np->hop_limit = 255;
/* Do not loopback ndisc messages */
- np->mc_loop = 0;
+ inet6_clear_bit(MC6_LOOP, sk);
return 0;
}
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index da5af28ff57b5254c0ec8976c4180113037c96a0..3c2251cabd0439834ca0fc2b8bbf0ecc6cfe9266 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1298,17 +1298,13 @@ static void set_sock_size(struct sock *sk, int mode, int val)
static void set_mcast_loop(struct sock *sk, u_char loop)
{
/* setsockopt(sock, SOL_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)); */
- lock_sock(sk);
inet_assign_bit(MC_LOOP, sk, loop);
#ifdef CONFIG_IP_VS_IPV6
- if (sk->sk_family == AF_INET6) {
- struct ipv6_pinfo *np = inet6_sk(sk);
-
+ if (READ_ONCE(sk->sk_family) == AF_INET6) {
/* IPV6_MULTICAST_LOOP */
- np->mc_loop = loop ? 1 : 0;
+ inet6_assign_bit(MC6_LOOP, sk, loop);
}
#endif
- release_sock(sk);
}
/*
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 02/14] ipv6: lockless IPV6_MULTICAST_LOOP implementation
2023-09-12 16:02 ` [PATCH net-next 02/14] ipv6: lockless IPV6_MULTICAST_LOOP implementation Eric Dumazet
@ 2023-09-14 14:54 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 14:54 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:02 AM, Eric Dumazet wrote:
> Add inet6_{test|set|clear|assign}_bit() helpers.
>
> Note that I am using bits from inet->inet_flags;
> this might change in the future if we need more flags.
>
> While fixing data-races on np->mc_loop,
> this patch also makes it possible to implement lockless
> accesses to np->mcast_hops in the following patch.
>
> Also constify the sk_mc_loop() argument.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/ipv6.h | 18 ++++++++++++++----
> include/net/inet_sock.h | 1 +
> include/net/sock.h | 2 +-
> net/core/sock.c | 4 ++--
> net/ipv6/af_inet6.c | 2 +-
> net/ipv6/ipv6_sockglue.c | 18 ++++++++----------
> net/ipv6/ndisc.c | 2 +-
> net/netfilter/ipvs/ip_vs_sync.c | 8 ++------
> 8 files changed, 30 insertions(+), 25 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* [PATCH net-next 03/14] ipv6: lockless IPV6_MULTICAST_HOPS implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
2023-09-12 16:01 ` [PATCH net-next 01/14] ipv6: lockless IPV6_UNICAST_HOPS implementation Eric Dumazet
2023-09-12 16:02 ` [PATCH net-next 02/14] ipv6: lockless IPV6_MULTICAST_LOOP implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 14:55 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 04/14] ipv6: lockless IPV6_MTU implementation Eric Dumazet
` (12 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
This fixes data-races around np->mcast_hops
and makes IPV6_MULTICAST_HOPS lockless.
Note that np->mcast_hops is never negative,
so it fits in a u8 field instead of an s16.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 9 +--------
include/net/ipv6.h | 2 +-
net/dccp/ipv6.c | 2 +-
net/ipv6/ipv6_sockglue.c | 28 +++++++++++++++-------------
net/ipv6/tcp_ipv6.c | 3 ++-
net/netfilter/ipvs/ip_vs_sync.c | 2 +-
6 files changed, 21 insertions(+), 25 deletions(-)
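A userspace sketch of both sides of IPV6_MULTICAST_HOPS after this patch. Relaxed C11 atomics stand in for READ_ONCE()/WRITE_ONCE(), all *_sketch names are hypothetical, and SKETCH_DEFAULT_MCASTHOPS assumes IPV6_DEFAULT_MCASTHOPS is 1. Because -1 is translated to the default when the option is set, the stored value is always in [0, 255] and fits in a u8.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's IPV6_DEFAULT_MCASTHOPS value. */
#define SKETCH_DEFAULT_MCASTHOPS 1

/* Hypothetical stand-in for the two hop-limit fields of struct ipv6_pinfo. */
struct pinfo_sketch {
	_Atomic short hop_limit;    /* may hold -1: "use the route default" */
	_Atomic uint8_t mcast_hops; /* never negative once -1 is mapped */
};

static int set_multicast_hops(struct pinfo_sketch *np, int val, size_t optlen)
{
	if (optlen < sizeof(int))
		return -EINVAL;
	if (val > 255 || val < -1)
		return -EINVAL;
	/* -1 is translated here, so the stored value always fits in a u8. */
	atomic_store_explicit(&np->mcast_hops,
			      (uint8_t)(val == -1 ? SKETCH_DEFAULT_MCASTHOPS : val),
			      memory_order_relaxed);
	return 0;
}

/* Shape of ip6_sk_dst_hoplimit(): pick the multicast or unicast value with a
 * single load, and consult the route only when that value is negative.
 * route_default is a hypothetical stand-in for ip6_dst_hoplimit(dst). */
static int sk_dst_hoplimit(struct pinfo_sketch *np, bool mcast_daddr,
			   int route_default)
{
	int hlimit;

	if (mcast_daddr)
		hlimit = atomic_load_explicit(&np->mcast_hops, memory_order_relaxed);
	else
		hlimit = atomic_load_explicit(&np->hop_limit, memory_order_relaxed);
	if (hlimit < 0)
		hlimit = route_default;
	return hlimit;
}
```

With mcast_hops unsigned, the hlimit < 0 fallback can only trigger on the unicast path; the sketch keeps the check for both branches to mirror the shape of ip6_sk_dst_hoplimit().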
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 68cf1ca949141e419abf2031db2b42105b821ab0..9cc278b5e4f42ce097e57ecd95a50479a947fd82 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -214,15 +214,8 @@ struct ipv6_pinfo {
__u32 frag_size;
s16 hop_limit;
+ u8 mcast_hops;
-#if defined(__BIG_ENDIAN_BITFIELD)
- /* Packed in 16bits. */
- __s16 mcast_hops:9;
- __u16 __unused_2:7,
-#else
- __u16 __unused_2:7;
- __s16 mcast_hops:9;
-#endif
int ucast_oif;
int mcast_oif;
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 61007db0036482e27121747add0eec77f912b54a..0af1a7565a3602e4deb68762267cba454750341e 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -909,7 +909,7 @@ static inline int ip6_sk_dst_hoplimit(struct ipv6_pinfo *np, struct flowi6 *fl6,
int hlimit;
if (ipv6_addr_is_multicast(&fl6->daddr))
- hlimit = np->mcast_hops;
+ hlimit = READ_ONCE(np->mcast_hops);
else
hlimit = READ_ONCE(np->hop_limit);
if (hlimit < 0)
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 33f6ccf6ba77b9bcc24054b09857aaee4bb71acf..83617a16b98e70aa577c08a394df63e006e53e9e 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -676,7 +676,7 @@ static int dccp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
if (np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo)
np->mcast_oif = inet6_iif(opt_skb);
if (np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim)
- np->mcast_hops = ipv6_hdr(opt_skb)->hop_limit;
+ WRITE_ONCE(np->mcast_hops, ipv6_hdr(opt_skb)->hop_limit);
if (np->rxopt.bits.rxflow || np->rxopt.bits.rxtclass)
np->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(opt_skb));
if (np->repflow)
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 755fac85a120de44272f685529b579e7118d306b..5fff19a87c75518358bae067dfeb227d6738bb03 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -431,6 +431,16 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
inet6_assign_bit(MC6_LOOP, sk, valbool);
return 0;
+ case IPV6_MULTICAST_HOPS:
+ if (sk->sk_type == SOCK_STREAM)
+ return retv;
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ if (val > 255 || val < -1)
+ return -EINVAL;
+ WRITE_ONCE(np->mcast_hops,
+ val == -1 ? IPV6_DEFAULT_MCASTHOPS : val);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -751,16 +761,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
break;
}
- case IPV6_MULTICAST_HOPS:
- if (sk->sk_type == SOCK_STREAM)
- break;
- if (optlen < sizeof(int))
- goto e_inval;
- if (val > 255 || val < -1)
- goto e_inval;
- np->mcast_hops = (val == -1 ? IPV6_DEFAULT_MCASTHOPS : val);
- retv = 0;
- break;
case IPV6_UNICAST_IF:
{
@@ -1180,7 +1180,8 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
put_cmsg(&msg, SOL_IPV6, IPV6_PKTINFO, sizeof(src_info), &src_info);
}
if (np->rxopt.bits.rxhlim) {
- int hlim = np->mcast_hops;
+ int hlim = READ_ONCE(np->mcast_hops);
+
put_cmsg(&msg, SOL_IPV6, IPV6_HOPLIMIT, sizeof(hlim), &hlim);
}
if (np->rxopt.bits.rxtclass) {
@@ -1197,7 +1198,8 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
put_cmsg(&msg, SOL_IPV6, IPV6_2292PKTINFO, sizeof(src_info), &src_info);
}
if (np->rxopt.bits.rxohlim) {
- int hlim = np->mcast_hops;
+ int hlim = READ_ONCE(np->mcast_hops);
+
put_cmsg(&msg, SOL_IPV6, IPV6_2292HOPLIMIT, sizeof(hlim), &hlim);
}
if (np->rxopt.bits.rxflow) {
@@ -1349,7 +1351,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
if (optname == IPV6_UNICAST_HOPS)
val = READ_ONCE(np->hop_limit);
else
- val = np->mcast_hops;
+ val = READ_ONCE(np->mcast_hops);
if (val < 0) {
rcu_read_lock();
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 3a88545a265d6bd064ecc41d33c9541a75fe0f4d..54db5fab318bc68cf9efbe6f26dacba614fa8562 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1542,7 +1542,8 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
if (np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo)
np->mcast_oif = tcp_v6_iif(opt_skb);
if (np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim)
- np->mcast_hops = ipv6_hdr(opt_skb)->hop_limit;
+ WRITE_ONCE(np->mcast_hops,
+ ipv6_hdr(opt_skb)->hop_limit);
if (np->rxopt.bits.rxflow || np->rxopt.bits.rxtclass)
np->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(opt_skb));
if (np->repflow)
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 3c2251cabd0439834ca0fc2b8bbf0ecc6cfe9266..df1b33b61059eef1e86baefc63e138108a50a081 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1322,7 +1322,7 @@ static void set_mcast_ttl(struct sock *sk, u_char ttl)
struct ipv6_pinfo *np = inet6_sk(sk);
/* IPV6_MULTICAST_HOPS */
- np->mcast_hops = ttl;
+ WRITE_ONCE(np->mcast_hops, ttl);
}
#endif
release_sock(sk);
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 03/14] ipv6: lockless IPV6_MULTICAST_HOPS implementation
2023-09-12 16:02 ` [PATCH net-next 03/14] ipv6: lockless IPV6_MULTICAST_HOPS implementation Eric Dumazet
@ 2023-09-14 14:55 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 14:55 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:02 AM, Eric Dumazet wrote:
> This fixes data-races around np->mcast_hops
> and makes IPV6_MULTICAST_HOPS lockless.
>
> Note that np->mcast_hops is never negative,
> so it fits in a u8 field instead of an s16.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/ipv6.h | 9 +--------
> include/net/ipv6.h | 2 +-
> net/dccp/ipv6.c | 2 +-
> net/ipv6/ipv6_sockglue.c | 28 +++++++++++++++-------------
> net/ipv6/tcp_ipv6.c | 3 ++-
> net/netfilter/ipvs/ip_vs_sync.c | 2 +-
> 6 files changed, 21 insertions(+), 25 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* [PATCH net-next 04/14] ipv6: lockless IPV6_MTU implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (2 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 03/14] ipv6: lockless IPV6_MULTICAST_HOPS implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 14:58 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 05/14] ipv6: lockless IPV6_MINHOPCOUNT implementation Eric Dumazet
` (11 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
np->frag_size can be read or written without holding the socket lock.
Add the missing annotations and make IPV6_MTU setsockopt() lockless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv6/ip6_output.c | 19 +++++++++++--------
net/ipv6/ipv6_sockglue.c | 15 +++++++--------
2 files changed, 18 insertions(+), 16 deletions(-)
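Why the patch loads np->frag_size once into a local: the old code read the field twice, so an IPV6_MTU update landing between the comparison and the assignment could make the result exceed the route MTU. The sketch replays that interleaving deterministically by passing both loaded values to clamp_two_loads(), then shows the single-load version. It is a userspace model with hypothetical names, using a relaxed C11 atomic in place of READ_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the frag_size field of struct ipv6_pinfo. */
struct pinfo_sketch {
	_Atomic uint32_t frag_size; /* 0 means "no IPV6_MTU override" */
};

/* Pre-patch shape: two separate loads of frag_size.  The two loaded values
 * are passed in explicitly so a concurrent IPV6_MTU write between them can
 * be replayed without real threads. */
static unsigned int clamp_two_loads(unsigned int mtu, uint32_t first_load,
				    uint32_t second_load)
{
	if (first_load < mtu) {
		if (second_load)
			mtu = second_load;
	}
	return mtu;
}

/* Post-patch shape: snapshot once, then use only the local copy. */
static unsigned int clamp_one_load(struct pinfo_sketch *np, unsigned int mtu)
{
	uint32_t frag_size = atomic_load_explicit(&np->frag_size,
						  memory_order_relaxed);

	if (frag_size && frag_size < mtu)
		mtu = frag_size;
	return mtu;
}
```

For example, with a route MTU of 1500 and a writer changing frag_size from 1280 to 9000 between the two loads, clamp_two_loads(1500, 1280, 9000) returns 9000, while clamp_one_load() can only ever return a value no larger than the route MTU.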
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 1e16d56d8c38ac51bd999038ae4e8478bf2f5f8c..ab7ede4a731a96fe6dce3205df29b298c923acc7 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -881,9 +881,11 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
mtu = IPV6_MIN_MTU;
}
- if (np && np->frag_size < mtu) {
- if (np->frag_size)
- mtu = np->frag_size;
+ if (np) {
+ u32 frag_size = READ_ONCE(np->frag_size);
+
+ if (frag_size && frag_size < mtu)
+ mtu = frag_size;
}
if (mtu < hlen + sizeof(struct frag_hdr) + 8)
goto fail_toobig;
@@ -1392,7 +1394,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
struct rt6_info *rt)
{
struct ipv6_pinfo *np = inet6_sk(sk);
- unsigned int mtu;
+ unsigned int mtu, frag_size;
struct ipv6_txoptions *nopt, *opt = ipc6->opt;
/* callers pass dst together with a reference, set it first so
@@ -1441,10 +1443,11 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
else
mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
READ_ONCE(rt->dst.dev->mtu) : dst_mtu(xfrm_dst_path(&rt->dst));
- if (np->frag_size < mtu) {
- if (np->frag_size)
- mtu = np->frag_size;
- }
+
+ frag_size = READ_ONCE(np->frag_size);
+ if (frag_size && frag_size < mtu)
+ mtu = frag_size;
+
cork->base.fragsize = mtu;
cork->base.gso_size = ipc6->gso_size;
cork->base.tx_flags = 0;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 5fff19a87c75518358bae067dfeb227d6738bb03..3b2a34828daab5c666d7b429afa961279739c70b 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -441,6 +441,13 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
WRITE_ONCE(np->mcast_hops,
val == -1 ? IPV6_DEFAULT_MCASTHOPS : val);
return 0;
+ case IPV6_MTU:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ if (val && val < IPV6_MIN_MTU)
+ return -EINVAL;
+ WRITE_ONCE(np->frag_size, val);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -910,14 +917,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
np->pmtudisc = val;
retv = 0;
break;
- case IPV6_MTU:
- if (optlen < sizeof(int))
- goto e_inval;
- if (val && val < IPV6_MIN_MTU)
- goto e_inval;
- np->frag_size = val;
- retv = 0;
- break;
case IPV6_RECVERR:
if (optlen < sizeof(int))
goto e_inval;
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 04/14] ipv6: lockless IPV6_MTU implementation
2023-09-12 16:02 ` [PATCH net-next 04/14] ipv6: lockless IPV6_MTU implementation Eric Dumazet
@ 2023-09-14 14:58 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 14:58 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:02 AM, Eric Dumazet wrote:
> np->frag_size can be read or written without holding the socket lock.
>
> Add the missing annotations and make IPV6_MTU setsockopt() lockless.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> net/ipv6/ip6_output.c | 19 +++++++++++--------
> net/ipv6/ipv6_sockglue.c | 15 +++++++--------
> 2 files changed, 18 insertions(+), 16 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* [PATCH net-next 05/14] ipv6: lockless IPV6_MINHOPCOUNT implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (3 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 04/14] ipv6: lockless IPV6_MTU implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:01 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 06/14] ipv6: lockless IPV6_RECVERR_RFC4884 implementation Eric Dumazet
` (10 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Add one missing READ_ONCE() annotation in do_ipv6_getsockopt()
and make IPV6_MINHOPCOUNT setsockopt() lockless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv6/ipv6_sockglue.c | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)
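A userspace sketch of the IPV6_MINHOPCOUNT setter and the receive-side check it guards. It assumes, hypothetically, that a one-way atomic_bool can model the ip6_min_hopcount static key (the real key patches code rather than loading a flag). The comparison in hop_limit_ok() is assumed to follow the shape of the tcp_v6_rcv() check, which drops a packet whose hop limit is below min_hopcount; all names here are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the ip6_min_hopcount static key: once any socket
 * sets a non-zero minimum, the check stays enabled. */
static atomic_bool min_hopcount_used;

/* Hypothetical stand-in for the min_hopcount field of struct ipv6_pinfo. */
struct pinfo_sketch {
	_Atomic uint8_t min_hopcount;
};

static int set_min_hopcount(struct pinfo_sketch *np, int val, size_t optlen)
{
	if (optlen < sizeof(int))
		return -EINVAL;
	if (val < 0 || val > 255)
		return -EINVAL;
	if (val)
		atomic_store(&min_hopcount_used, true); /* enable the check first */
	/* then publish the value; relaxed store stands in for WRITE_ONCE() */
	atomic_store_explicit(&np->min_hopcount, (uint8_t)val,
			      memory_order_relaxed);
	return 0;
}

/* Returns true if a packet with this hop limit should be accepted. */
static bool hop_limit_ok(struct pinfo_sketch *np, uint8_t pkt_hop_limit)
{
	if (!atomic_load(&min_hopcount_used))
		return true;
	return pkt_hop_limit >= atomic_load_explicit(&np->min_hopcount,
						     memory_order_relaxed);
}
```

Enabling the key before publishing the value follows the order used in the patch; a reader that still sees the key disabled simply skips the check for that packet.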
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 3b2a34828daab5c666d7b429afa961279739c70b..bbc8a009e05d3de49868e1ccf469a12bc31b8e22 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -448,6 +448,20 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
WRITE_ONCE(np->frag_size, val);
return 0;
+ case IPV6_MINHOPCOUNT:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ if (val < 0 || val > 255)
+ return -EINVAL;
+
+ if (val)
+ static_branch_enable(&ip6_min_hopcount);
+
+ /* tcp_v6_err() and tcp_v6_rcv() might read min_hopcount
+ * while we are changing it.
+ */
+ WRITE_ONCE(np->min_hopcount, val);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -947,21 +961,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
retv = __ip6_sock_set_addr_preferences(sk, val);
break;
- case IPV6_MINHOPCOUNT:
- if (optlen < sizeof(int))
- goto e_inval;
- if (val < 0 || val > 255)
- goto e_inval;
-
- if (val)
- static_branch_enable(&ip6_min_hopcount);
-
- /* tcp_v6_err() and tcp_v6_rcv() might read min_hopcount
- * while we are changing it.
- */
- WRITE_ONCE(np->min_hopcount, val);
- retv = 0;
- break;
case IPV6_DONTFRAG:
np->dontfrag = valbool;
retv = 0;
@@ -1443,7 +1442,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_MINHOPCOUNT:
- val = np->min_hopcount;
+ val = READ_ONCE(np->min_hopcount);
break;
case IPV6_DONTFRAG:
--
2.42.0.283.g2d96d420d3-goog
* [PATCH net-next 06/14] ipv6: lockless IPV6_RECVERR_RFC4884 implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (4 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 05/14] ipv6: lockless IPV6_MINHOPCOUNT implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:02 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 07/14] ipv6: lockless IPV6_MULTICAST_ALL implementation Eric Dumazet
` (9 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Move np->recverr_rfc4884 to an atomic flag to fix data-races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 1 -
include/net/inet_sock.h | 1 +
net/ipv6/datagram.c | 2 +-
net/ipv6/ipv6_sockglue.c | 17 ++++++++---------
4 files changed, 10 insertions(+), 11 deletions(-)
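One way packed 1-bit fields such as recverr_rfc4884 go wrong once some accesses run without the socket lock: a bitfield store is a read-modify-write of the whole containing word, so two concurrent updates of neighbouring bits can lose one of them. The sketch replays such an interleaving step by step (no real threads) and contrasts it with atomic fetch-or, which is roughly what set_bit() provides. FLAG_A and FLAG_B are hypothetical bit positions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical neighbouring 1-bit flags sharing one word, like the
 * bitfields still packed together in struct ipv6_pinfo. */
#define FLAG_A (1u << 0)
#define FLAG_B (1u << 1)

/* Replays an unlucky interleaving of two plain read-modify-write updates,
 * which is what a store to a bitfield compiles to. */
static unsigned int racy_interleaving(void)
{
	unsigned int word = 0;
	unsigned int seen_by_a = word;	/* CPU A loads the word */
	unsigned int seen_by_b = word;	/* CPU B loads the same word */

	word = seen_by_a | FLAG_A;	/* A stores its update */
	word = seen_by_b | FLAG_B;	/* B stores, overwriting A's bit */
	return word;
}

/* The same two updates as single atomic RMW operations: neither can be
 * lost, in whatever order they execute. */
static unsigned int atomic_updates(void)
{
	atomic_uint word;

	atomic_init(&word, 0u);
	atomic_fetch_or(&word, FLAG_A);
	atomic_fetch_or(&word, FLAG_B);
	return atomic_load(&word);
}
```

racy_interleaving() ends with only FLAG_B set, while atomic_updates() keeps both bits.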
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 9cc278b5e4f42ce097e57ecd95a50479a947fd82..0d2b0a1b2daeaee51a03624adab5a385cc852cc7 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -256,7 +256,6 @@ struct ipv6_pinfo {
autoflowlabel:1,
autoflowlabel_set:1,
mc_all:1,
- recverr_rfc4884:1,
rtalert_isolate:1;
__u8 min_hopcount;
__u8 tclass;
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index b5a9dca92fb45425c032bdf08bfa88cad77926b8..8cf1f7b442348bef83cc3d9648521a01667efae7 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -269,6 +269,7 @@ enum {
INET_FLAGS_BIND_ADDRESS_NO_PORT = 18,
INET_FLAGS_DEFER_CONNECT = 19,
INET_FLAGS_MC6_LOOP = 20,
+ INET_FLAGS_RECVERR6_RFC4884 = 21,
};
/* cmsg flags for inet */
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 41ebc4e574734456357169e883c3d13e42fa66b2..e81892814935fb3934fbf0e6f9defc702ec29152 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -332,7 +332,7 @@ void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err,
__skb_pull(skb, payload - skb->data);
- if (inet6_sk(sk)->recverr_rfc4884)
+ if (inet6_test_bit(RECVERR6_RFC4884, sk))
ipv6_icmp_error_rfc4884(skb, &serr->ee.ee_rfc4884);
skb_reset_transport_header(skb);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index bbc8a009e05d3de49868e1ccf469a12bc31b8e22..b65e73ac2ccdee79aa293948d3ba9853966e1e2d 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -462,6 +462,13 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
*/
WRITE_ONCE(np->min_hopcount, val);
return 0;
+ case IPV6_RECVERR_RFC4884:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ if (val < 0 || val > 1)
+ return -EINVAL;
+ inet6_assign_bit(RECVERR6_RFC4884, sk, valbool);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -974,14 +981,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
np->rxopt.bits.recvfragsize = valbool;
retv = 0;
break;
- case IPV6_RECVERR_RFC4884:
- if (optlen < sizeof(int))
- goto e_inval;
- if (val < 0 || val > 1)
- goto e_inval;
- np->recverr_rfc4884 = valbool;
- retv = 0;
- break;
}
unlock:
@@ -1462,7 +1461,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_RECVERR_RFC4884:
- val = np->recverr_rfc4884;
+ val = inet6_test_bit(RECVERR6_RFC4884, sk);
break;
default:
--
2.42.0.283.g2d96d420d3-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread* [PATCH net-next 07/14] ipv6: lockless IPV6_MULTICAST_ALL implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (5 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 06/14] ipv6: lockless IPV6_RECVERR_RFC4884 implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:03 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 08/14] ipv6: lockless IPV6_AUTOFLOWLABEL implementation Eric Dumazet
` (8 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Move np->mc_all to atomic flags to fix data-races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 1 -
include/net/inet_sock.h | 1 +
net/ipv6/af_inet6.c | 2 +-
net/ipv6/ipv6_sockglue.c | 14 ++++++--------
net/ipv6/mcast.c | 2 +-
5 files changed, 9 insertions(+), 11 deletions(-)
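One reason the old bitfield is unsafe without the socket lock: `mc_all:1` shared a `__u16` storage unit with neighbours such as `dontfrag:1` and `autoflowlabel:1`, and a plain store to one bitfield is a read-modify-write of the whole unit. A lockless writer could therefore overwrite a concurrent update to a neighbouring field. The sketch below uses C11 atomics as a hypothetical stand-in for `inet->inet_flags`. It shows the default set in `inet6_create()` and the fallback in `inet6_mc_check()`, and checks that clearing MC6_ALL leaves MC6_LOOP untouched. All `fake_*` names are illustrative; group lookup and source filtering are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers from the INET_FLAGS_* enum shown in this series. */
enum { FLAG_MC6_LOOP = 20, FLAG_MC6_ALL = 22 };

/* Hypothetical stand-in for a socket: just the shared atomic flags word. */
struct fake_sock {
	atomic_ulong inet_flags;
};

static inline void set_flag(struct fake_sock *sk, unsigned int nr)
{
	assert(nr < 8 * sizeof(unsigned long));
	atomic_fetch_or(&sk->inet_flags, 1UL << nr);
}

static inline void clear_flag(struct fake_sock *sk, unsigned int nr)
{
	assert(nr < 8 * sizeof(unsigned long));
	atomic_fetch_and(&sk->inet_flags, ~(1UL << nr));
}

static inline bool test_flag(struct fake_sock *sk, unsigned int nr)
{
	return atomic_load(&sk->inet_flags) & (1UL << nr);
}

/* Like inet6_create(): both multicast flags start out set. */
void fake_inet6_create(struct fake_sock *sk)
{
	atomic_init(&sk->inet_flags, 0);
	set_flag(sk, FLAG_MC6_LOOP);
	set_flag(sk, FLAG_MC6_ALL);
}

/* Like the "!mc" branch of inet6_mc_check(): when the socket has not joined
 * the destination group, IPV6_MULTICAST_ALL alone decides delivery. */
bool fake_mc_check_not_joined(struct fake_sock *sk)
{
	return test_flag(sk, FLAG_MC6_ALL);
}
```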
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 0d2b0a1b2daeaee51a03624adab5a385cc852cc7..d88e91b7f0a319a816488025ef213c4fb90ed359 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -255,7 +255,6 @@ struct ipv6_pinfo {
dontfrag:1,
autoflowlabel:1,
autoflowlabel_set:1,
- mc_all:1,
rtalert_isolate:1;
__u8 min_hopcount;
__u8 tclass;
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 8cf1f7b442348bef83cc3d9648521a01667efae7..97e70a97dae888e6ab93c6446f4f3ba58cd8583e 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -270,6 +270,7 @@ enum {
INET_FLAGS_DEFER_CONNECT = 19,
INET_FLAGS_MC6_LOOP = 20,
INET_FLAGS_RECVERR6_RFC4884 = 21,
+ INET_FLAGS_MC6_ALL = 22,
};
/* cmsg flags for inet */
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index bbd4aa1b96d09d346c521dab2194045123e7a5a6..372fb7b9112c8dfed09b6ddfdb37016a1a668494 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -218,7 +218,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
np->hop_limit = -1;
np->mcast_hops = IPV6_DEFAULT_MCASTHOPS;
inet6_set_bit(MC6_LOOP, sk);
- np->mc_all = 1;
+ inet6_set_bit(MC6_ALL, sk);
np->pmtudisc = IPV6_PMTUDISC_WANT;
np->repflow = net->ipv6.sysctl.flowlabel_reflect & FLOWLABEL_REFLECT_ESTABLISHED;
sk->sk_ipv6only = net->ipv6.sysctl.bindv6only;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index b65e73ac2ccdee79aa293948d3ba9853966e1e2d..7a181831f226c67813446145f8f58fa58908e3ae 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -469,6 +469,11 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
inet6_assign_bit(RECVERR6_RFC4884, sk, valbool);
return 0;
+ case IPV6_MULTICAST_ALL:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ inet6_assign_bit(MC6_ALL, sk, valbool);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -890,13 +895,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
retv = ipv6_sock_ac_drop(sk, mreq.ipv6mr_ifindex, &mreq.ipv6mr_acaddr);
break;
}
- case IPV6_MULTICAST_ALL:
- if (optlen < sizeof(int))
- goto e_inval;
- np->mc_all = valbool;
- retv = 0;
- break;
-
case MCAST_JOIN_GROUP:
case MCAST_LEAVE_GROUP:
if (in_compat_syscall())
@@ -1372,7 +1370,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_MULTICAST_ALL:
- val = np->mc_all;
+ val = inet6_test_bit(MC6_ALL, sk);
break;
case IPV6_UNICAST_IF:
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 6a33a50687bcf7201e75574f03e619fe89636068..483f797ae44d538009184b5e53ad7755d73bab4a 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -642,7 +642,7 @@ bool inet6_mc_check(const struct sock *sk, const struct in6_addr *mc_addr,
}
if (!mc) {
rcu_read_unlock();
- return np->mc_all;
+ return inet6_test_bit(MC6_ALL, sk);
}
psl = rcu_dereference(mc->sflist);
if (!psl) {
--
2.42.0.283.g2d96d420d3-goog
* [PATCH net-next 08/14] ipv6: lockless IPV6_AUTOFLOWLABEL implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (6 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 07/14] ipv6: lockless IPV6_MULTICAST_ALL implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:04 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 09/14] ipv6: lockless IPV6_DONTFRAG implementation Eric Dumazet
` (7 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Move np->autoflowlabel and np->autoflowlabel_set into inet->inet_flags,
to fix data-races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 2 --
include/net/inet_sock.h | 2 ++
include/net/ipv6.h | 2 +-
net/ipv6/ip6_output.c | 12 +++++-------
net/ipv6/ipv6_sockglue.c | 11 +++++------
5 files changed, 13 insertions(+), 16 deletions(-)
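After this patch, `ip6_autoflowlabel()` takes the socket instead of `ipv6_pinfo`. It returns the per-netns default until `IPV6_AUTOFLOWLABEL` has been set explicitly, and the setsockopt path writes the value bit before the `_SET` bit. Below is a minimal userspace sketch of that two-bit protocol, assuming C11 atomics as the bit store. `struct fake_net` is a hypothetical stand-in for whatever `ip6_default_np_autolabel(net)` consults, and all `fake_*` names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers from the INET_FLAGS_* enum added by this patch. */
enum { FLAG_AUTOFLOWLABEL_SET = 23, FLAG_AUTOFLOWLABEL = 24 };

struct fake_sock {
	atomic_ulong inet_flags;
};

/* Hypothetical stand-in for the default ip6_default_np_autolabel() returns. */
struct fake_net {
	bool default_autolabel;
};

static inline void assign_flag(struct fake_sock *sk, unsigned int nr, bool val)
{
	unsigned long mask;

	assert(nr < 8 * sizeof(unsigned long));
	mask = 1UL << nr;
	if (val)
		atomic_fetch_or(&sk->inet_flags, mask);
	else
		atomic_fetch_and(&sk->inet_flags, ~mask);
}

static inline bool test_flag(struct fake_sock *sk, unsigned int nr)
{
	return atomic_load(&sk->inet_flags) & (1UL << nr);
}

/* setsockopt(IPV6_AUTOFLOWLABEL): store the value, then mark it as set,
 * in the same order as the patch. */
void fake_set_autoflowlabel(struct fake_sock *sk, bool val)
{
	assign_flag(sk, FLAG_AUTOFLOWLABEL, val);
	assign_flag(sk, FLAG_AUTOFLOWLABEL_SET, true);
}

/* ip6_autoflowlabel(): use the netns default until the option is set. */
bool fake_autoflowlabel(const struct fake_net *net, struct fake_sock *sk)
{
	if (!test_flag(sk, FLAG_AUTOFLOWLABEL_SET))
		return net->default_autolabel;
	return test_flag(sk, FLAG_AUTOFLOWLABEL);
}
```

In this sketch both bits live in one atomic word, and read-modify-writes on a single atomic object are totally ordered. A reader that sees `AUTOFLOWLABEL_SET` therefore also sees the value update made just before it.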
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index d88e91b7f0a319a816488025ef213c4fb90ed359..e3be5dc21b7d27080b398f1425bf11145896a4f3 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -253,8 +253,6 @@ struct ipv6_pinfo {
* 100: prefer care-of address
*/
dontfrag:1,
- autoflowlabel:1,
- autoflowlabel_set:1,
rtalert_isolate:1;
__u8 min_hopcount;
__u8 tclass;
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 97e70a97dae888e6ab93c6446f4f3ba58cd8583e..f1af64a4067310258a3bc45b84ad3fd093bddbab 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -271,6 +271,8 @@ enum {
INET_FLAGS_MC6_LOOP = 20,
INET_FLAGS_RECVERR6_RFC4884 = 21,
INET_FLAGS_MC6_ALL = 22,
+ INET_FLAGS_AUTOFLOWLABEL_SET = 23,
+ INET_FLAGS_AUTOFLOWLABEL = 24,
};
/* cmsg flags for inet */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 0af1a7565a3602e4deb68762267cba454750341e..fe1978a288630a20ba03dc3a36e22938495082e4 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -428,7 +428,7 @@ int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq,
int flags);
int ip6_flowlabel_init(void);
void ip6_flowlabel_cleanup(void);
-bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np);
+bool ip6_autoflowlabel(struct net *net, const struct sock *sk);
static inline void fl6_sock_release(struct ip6_flowlabel *fl)
{
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ab7ede4a731a96fe6dce3205df29b298c923acc7..47aa42f93ccda8b49ed6ecd7a7a07703ae147928 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -232,12 +232,11 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
}
EXPORT_SYMBOL(ip6_output);
-bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
+bool ip6_autoflowlabel(struct net *net, const struct sock *sk)
{
- if (!np->autoflowlabel_set)
+ if (!inet6_test_bit(AUTOFLOWLABEL_SET, sk))
return ip6_default_np_autolabel(net);
- else
- return np->autoflowlabel;
+ return inet6_test_bit(AUTOFLOWLABEL, sk);
}
/*
@@ -314,7 +313,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
hlimit = ip6_dst_hoplimit(dst);
ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
- ip6_autoflowlabel(net, np), fl6));
+ ip6_autoflowlabel(net, sk), fl6));
hdr->payload_len = htons(seg_len);
hdr->nexthdr = proto;
@@ -1938,7 +1937,6 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
struct sk_buff *skb, *tmp_skb;
struct sk_buff **tail_skb;
struct in6_addr *final_dst;
- struct ipv6_pinfo *np = inet6_sk(sk);
struct net *net = sock_net(sk);
struct ipv6hdr *hdr;
struct ipv6_txoptions *opt = v6_cork->opt;
@@ -1981,7 +1979,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
ip6_flow_hdr(hdr, v6_cork->tclass,
ip6_make_flowlabel(net, skb, fl6->flowlabel,
- ip6_autoflowlabel(net, np), fl6));
+ ip6_autoflowlabel(net, sk), fl6));
hdr->hop_limit = v6_cork->hop_limit;
hdr->nexthdr = proto;
hdr->saddr = fl6->saddr;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 7a181831f226c67813446145f8f58fa58908e3ae..d5d428a695f728d96a7d075d86f806cc3f926e0a 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -474,6 +474,10 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
inet6_assign_bit(MC6_ALL, sk, valbool);
return 0;
+ case IPV6_AUTOFLOWLABEL:
+ inet6_assign_bit(AUTOFLOWLABEL, sk, valbool);
+ inet6_set_bit(AUTOFLOWLABEL_SET, sk);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -970,11 +974,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
np->dontfrag = valbool;
retv = 0;
break;
- case IPV6_AUTOFLOWLABEL:
- np->autoflowlabel = valbool;
- np->autoflowlabel_set = 1;
- retv = 0;
- break;
case IPV6_RECVFRAGSIZE:
np->rxopt.bits.recvfragsize = valbool;
retv = 0;
@@ -1447,7 +1446,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_AUTOFLOWLABEL:
- val = ip6_autoflowlabel(sock_net(sk), np);
+ val = ip6_autoflowlabel(sock_net(sk), sk);
break;
case IPV6_RECVFRAGSIZE:
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 08/14] ipv6: lockless IPV6_AUTOFLOWLABEL implementation
2023-09-12 16:02 ` [PATCH net-next 08/14] ipv6: lockless IPV6_AUTOFLOWLABEL implementation Eric Dumazet
@ 2023-09-14 15:04 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 15:04 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:02 AM, Eric Dumazet wrote:
> Move np->autoflowlabel and np->autoflowlabel_set into inet->inet_flags,
> to fix data-races.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/ipv6.h | 2 --
> include/net/inet_sock.h | 2 ++
> include/net/ipv6.h | 2 +-
> net/ipv6/ip6_output.c | 12 +++++-------
> net/ipv6/ipv6_sockglue.c | 11 +++++------
> 5 files changed, 13 insertions(+), 16 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* [PATCH net-next 09/14] ipv6: lockless IPV6_DONTFRAG implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (7 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 08/14] ipv6: lockless IPV6_AUTOFLOWLABEL implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:05 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 10/14] ipv6: lockless IPV6_RECVERR implementation Eric Dumazet
` (6 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Move np->dontfrag flag to inet->inet_flags to fix data-races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 1 -
include/net/inet_sock.h | 1 +
include/net/ipv6.h | 6 +++---
include/net/xfrm.h | 2 +-
net/ipv6/icmp.c | 4 ++--
net/ipv6/ip6_output.c | 2 +-
net/ipv6/ipv6_sockglue.c | 9 ++++-----
net/ipv6/ping.c | 2 +-
net/ipv6/raw.c | 2 +-
net/ipv6/udp.c | 2 +-
net/l2tp/l2tp_ip6.c | 2 +-
11 files changed, 16 insertions(+), 17 deletions(-)
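The hunks in this patch apply one rule in several send paths. A per-call value from ancillary data wins, and `-1` means "not given", which falls back to the socket's DONTFRAG flag. `ipcm6_init_sk()` instead copies the flag up front. The sketch below models that resolution with a hypothetical two-field subset of `struct ipcm6_cookie` and C11 atomics in place of the kernel bitops. All `fake_*` names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit number from the INET_FLAGS_* enum added by this patch. */
enum { FLAG_DONTFRAG = 25 };

struct fake_sock {
	atomic_ulong inet_flags;
};

/* Hypothetical subset of struct ipcm6_cookie: -1 means "not set yet". */
struct fake_ipcm6_cookie {
	short hlimit;
	signed char dontfrag;
};

static inline void assign_flag(struct fake_sock *sk, unsigned int nr, bool val)
{
	unsigned long mask;

	assert(nr < 8 * sizeof(unsigned long));
	mask = 1UL << nr;
	if (val)
		atomic_fetch_or(&sk->inet_flags, mask);
	else
		atomic_fetch_and(&sk->inet_flags, ~mask);
}

static inline bool test_flag(struct fake_sock *sk, unsigned int nr)
{
	return atomic_load(&sk->inet_flags) & (1UL << nr);
}

/* "Not set" defaults, as a sender starts with before parsing cmsgs. */
void fake_ipcm6_init(struct fake_ipcm6_cookie *c)
{
	*c = (struct fake_ipcm6_cookie){ .hlimit = -1, .dontfrag = -1 };
}

/* Like ipcm6_init_sk(): take the socket's setting immediately. */
void fake_ipcm6_init_sk(struct fake_ipcm6_cookie *c, struct fake_sock *sk)
{
	*c = (struct fake_ipcm6_cookie){
		.hlimit = -1,
		.dontfrag = test_flag(sk, FLAG_DONTFRAG),
	};
}

/* Fallback used in udpv6_sendmsg(), rawv6_sendmsg(), l2tp_ip6_sendmsg()
 * and ip6_make_skb(): keep a cmsg value, else read the socket flag once. */
void fake_resolve_dontfrag(struct fake_ipcm6_cookie *c, struct fake_sock *sk)
{
	if (c->dontfrag < 0)
		c->dontfrag = test_flag(sk, FLAG_DONTFRAG);
}
```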
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index e3be5dc21b7d27080b398f1425bf11145896a4f3..57d563f1d4b1707264f0d79406c4c139cc0fa525 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -252,7 +252,6 @@ struct ipv6_pinfo {
* 010: prefer public address
* 100: prefer care-of address
*/
- dontfrag:1,
rtalert_isolate:1;
__u8 min_hopcount;
__u8 tclass;
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index f1af64a4067310258a3bc45b84ad3fd093bddbab..ac75324e9e1eafe68cee7b0581e472cbb4f49aa3 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -273,6 +273,7 @@ enum {
INET_FLAGS_MC6_ALL = 22,
INET_FLAGS_AUTOFLOWLABEL_SET = 23,
INET_FLAGS_AUTOFLOWLABEL = 24,
+ INET_FLAGS_DONTFRAG = 25,
};
/* cmsg flags for inet */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index fe1978a288630a20ba03dc3a36e22938495082e4..d2cf7e176f2b97dac957e65b75d5e69a39c546b5 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -373,12 +373,12 @@ static inline void ipcm6_init(struct ipcm6_cookie *ipc6)
}
static inline void ipcm6_init_sk(struct ipcm6_cookie *ipc6,
- const struct ipv6_pinfo *np)
+ const struct sock *sk)
{
*ipc6 = (struct ipcm6_cookie) {
.hlimit = -1,
- .tclass = np->tclass,
- .dontfrag = np->dontfrag,
+ .tclass = inet6_sk(sk)->tclass,
+ .dontfrag = inet6_test_bit(DONTFRAG, sk),
};
}
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 363c7d5105542ec7f43f91e5071b877314584bc5..98d7aa78addaab129f7ce060b10b7652fd0acba1 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -2166,7 +2166,7 @@ static inline bool xfrm6_local_dontfrag(const struct sock *sk)
proto = sk->sk_protocol;
if (proto == IPPROTO_UDP || proto == IPPROTO_RAW)
- return inet6_sk(sk)->dontfrag;
+ return inet6_test_bit(DONTFRAG, sk);
return false;
}
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 93a594a901d12befb754e7035f56726273eead92..8fb4a791881a48d5efcebc990c8829d8f77fe94f 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -588,7 +588,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
else if (!fl6.flowi6_oif)
fl6.flowi6_oif = np->ucast_oif;
- ipcm6_init_sk(&ipc6, np);
+ ipcm6_init_sk(&ipc6, sk);
ipc6.sockc.mark = mark;
fl6.flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6.flowlabel);
@@ -791,7 +791,7 @@ static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb)
msg.offset = 0;
msg.type = type;
- ipcm6_init_sk(&ipc6, np);
+ ipcm6_init_sk(&ipc6, sk);
ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
ipc6.tclass = ipv6_get_dsfield(ipv6_hdr(skb));
ipc6.sockc.mark = mark;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 47aa42f93ccda8b49ed6ecd7a7a07703ae147928..8851fe5d45a0781c8b78c995c2c4c6c81e10cd52 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -2092,7 +2092,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
return ERR_PTR(err);
}
if (ipc6->dontfrag < 0)
- ipc6->dontfrag = inet6_sk(sk)->dontfrag;
+ ipc6->dontfrag = inet6_test_bit(DONTFRAG, sk);
err = __ip6_append_data(sk, &queue, cork, &v6_cork,
&current->task_frag, getfrag, from,
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index d5d428a695f728d96a7d075d86f806cc3f926e0a..33dd4dd872e6bca2ee18a634283640007adcc692 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -478,6 +478,9 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
inet6_assign_bit(AUTOFLOWLABEL, sk, valbool);
inet6_set_bit(AUTOFLOWLABEL_SET, sk);
return 0;
+ case IPV6_DONTFRAG:
+ inet6_assign_bit(DONTFRAG, sk, valbool);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -970,10 +973,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
retv = __ip6_sock_set_addr_preferences(sk, val);
break;
- case IPV6_DONTFRAG:
- np->dontfrag = valbool;
- retv = 0;
- break;
case IPV6_RECVFRAGSIZE:
np->rxopt.bits.recvfragsize = valbool;
retv = 0;
@@ -1442,7 +1441,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_DONTFRAG:
- val = np->dontfrag;
+ val = inet6_test_bit(DONTFRAG, sk);
break;
case IPV6_AUTOFLOWLABEL:
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index 5831aaa53d75eae7b764d54ab52da65db4030d73..4444b61eb23bbf483068d2b119a7559e49ba3880 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -118,7 +118,7 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
l3mdev_master_ifindex_by_index(sock_net(sk), oif) != sk->sk_bound_dev_if))
return -EINVAL;
- ipcm6_init_sk(&ipc6, np);
+ ipcm6_init_sk(&ipc6, sk);
ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags);
ipc6.sockc.mark = READ_ONCE(sk->sk_mark);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 42fcec3ecf5e171a5ebe724b8c971d90885abe41..cc9673c1809fb238f6d9ab6915116cf0dd6eb593 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -898,7 +898,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
if (ipc6.dontfrag < 0)
- ipc6.dontfrag = np->dontfrag;
+ ipc6.dontfrag = inet6_test_bit(DONTFRAG, sk);
if (msg->msg_flags&MSG_CONFIRM)
goto do_confirm;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 86b5d509a4688cacb2f40667c9ddc10f81ade2fe..d904c5450a07bf1df10d94ee6bb9b2a8fb9381b5 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1593,7 +1593,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
do_append_data:
if (ipc6.dontfrag < 0)
- ipc6.dontfrag = np->dontfrag;
+ ipc6.dontfrag = inet6_test_bit(DONTFRAG, sk);
up->len += ulen;
err = ip6_append_data(sk, getfrag, msg, ulen, sizeof(struct udphdr),
&ipc6, fl6, (struct rt6_info *)dst,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index ed8ebb6f59097ac18bb284d1c48f9e801e9a92c2..40af2431e73aad74ab64e97db8a5ee79dda0879d 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -621,7 +621,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
if (ipc6.dontfrag < 0)
- ipc6.dontfrag = np->dontfrag;
+ ipc6.dontfrag = inet6_test_bit(DONTFRAG, sk);
if (msg->msg_flags & MSG_CONFIRM)
goto do_confirm;
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 09/14] ipv6: lockless IPV6_DONTFRAG implementation
2023-09-12 16:02 ` [PATCH net-next 09/14] ipv6: lockless IPV6_DONTFRAG implementation Eric Dumazet
@ 2023-09-14 15:05 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 15:05 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:02 AM, Eric Dumazet wrote:
> Move np->dontfrag flag to inet->inet_flags to fix data-races.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/ipv6.h | 1 -
> include/net/inet_sock.h | 1 +
> include/net/ipv6.h | 6 +++---
> include/net/xfrm.h | 2 +-
> net/ipv6/icmp.c | 4 ++--
> net/ipv6/ip6_output.c | 2 +-
> net/ipv6/ipv6_sockglue.c | 9 ++++-----
> net/ipv6/ping.c | 2 +-
> net/ipv6/raw.c | 2 +-
> net/ipv6/udp.c | 2 +-
> net/l2tp/l2tp_ip6.c | 2 +-
> 11 files changed, 16 insertions(+), 17 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* [PATCH net-next 10/14] ipv6: lockless IPV6_RECVERR implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (8 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 09/14] ipv6: lockless IPV6_DONTFRAG implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:06 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 11/14] ipv6: move np->repflow to atomic flags Eric Dumazet
` (5 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
np->recverr is moved to inet->inet_flags to fix data-races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 3 +--
include/net/inet_sock.h | 1 +
include/net/ipv6.h | 4 +---
net/dccp/ipv6.c | 2 +-
net/ipv4/ping.c | 2 +-
net/ipv6/datagram.c | 6 ++----
net/ipv6/ipv6_sockglue.c | 17 ++++++++---------
net/ipv6/raw.c | 10 +++++-----
net/ipv6/tcp_ipv6.c | 2 +-
net/ipv6/udp.c | 6 +++---
net/sctp/ipv6.c | 4 +---
11 files changed, 25 insertions(+), 32 deletions(-)
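The `rawv6_err()` hunk reads the flag once into a local `recverr` instead of calling `inet6_test_bit()` at each use. `setsockopt()` can now flip the bit without the socket lock, so separate reads inside one call could disagree. The sketch below condenses that decision logic and the `-ENOBUFS` filtering from `udp_v6_send_skb()` and `rawv6_send_hdrinc()`. It assumes C11 atomics in place of the kernel bitops. All `fake_*` names are hypothetical, and the PMTU and redirect paths, along with the SNMP counters, are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit number from the INET_FLAGS_* enum added by this patch. */
enum { FLAG_RECVERR6 = 26 };

struct fake_sock {
	atomic_ulong inet_flags;
};

/* Hypothetical summary of what the error handler decided to do. */
struct fake_err_action {
	bool queue_error;	/* ipv6_icmp_error() would queue the error */
	bool report_error;	/* sk->sk_err would be set and reported */
};

static inline void assign_flag(struct fake_sock *sk, unsigned int nr, bool val)
{
	unsigned long mask;

	assert(nr < 8 * sizeof(unsigned long));
	mask = 1UL << nr;
	if (val)
		atomic_fetch_or(&sk->inet_flags, mask);
	else
		atomic_fetch_and(&sk->inet_flags, ~mask);
}

static inline bool test_flag(struct fake_sock *sk, unsigned int nr)
{
	return atomic_load(&sk->inet_flags) & (1UL << nr);
}

/* Condensed from rawv6_err(): one load, so every branch agrees. */
struct fake_err_action fake_rawv6_err(struct fake_sock *sk, bool established,
				      bool harderr)
{
	bool recverr = test_flag(sk, FLAG_RECVERR6);
	struct fake_err_action act = { false, false };

	if (!recverr && !established)
		return act;
	act.queue_error = recverr;
	act.report_error = recverr || harderr;
	return act;
}

/* Without IPV6_RECVERR, a local -ENOBUFS is hidden from the caller. */
int fake_filter_enobufs(struct fake_sock *sk, int err)
{
	if (err == -ENOBUFS && !test_flag(sk, FLAG_RECVERR6))
		return 0;
	return err;
}
```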
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 57d563f1d4b1707264f0d79406c4c139cc0fa525..53f4f1b97a787ac01fc274a8057494a28fa270fd 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -243,8 +243,7 @@ struct ipv6_pinfo {
} rxopt;
/* sockopt flags */
- __u16 recverr:1,
- sndflow:1,
+ __u16 sndflow:1,
repflow:1,
pmtudisc:3,
padding:1, /* 1 bit hole */
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index ac75324e9e1eafe68cee7b0581e472cbb4f49aa3..3b79bc759ff478f96d729f2669c6963bbe768ba1 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -274,6 +274,7 @@ enum {
INET_FLAGS_AUTOFLOWLABEL_SET = 23,
INET_FLAGS_AUTOFLOWLABEL = 24,
INET_FLAGS_DONTFRAG = 25,
+ INET_FLAGS_RECVERR6 = 26,
};
/* cmsg flags for inet */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index d2cf7e176f2b97dac957e65b75d5e69a39c546b5..51c94fddd8039f980eb5a14441936623fd9b7a5d 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1298,9 +1298,7 @@ static inline int ip6_sock_set_v6only(struct sock *sk)
static inline void ip6_sock_set_recverr(struct sock *sk)
{
- lock_sock(sk);
- inet6_sk(sk)->recverr = true;
- release_sock(sk);
+ inet6_set_bit(RECVERR6, sk);
}
static inline int __ip6_sock_set_addr_preferences(struct sock *sk, int val)
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 83617a16b98e70aa577c08a394df63e006e53e9e..e6c3d84c2b9ec2df9b89ab0879991b3b312d0b6f 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -185,7 +185,7 @@ static int dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
goto out;
}
- if (!sock_owned_by_user(sk) && np->recverr) {
+ if (!sock_owned_by_user(sk) && inet6_test_bit(RECVERR6, sk)) {
sk->sk_err = err;
sk_error_report(sk);
} else {
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 75e0aee35eb787a6c9f70394294b30490c980a64..bc01ad5fc01ab97f71f7704a671eaf644ec040be 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -581,7 +581,7 @@ void ping_err(struct sk_buff *skb, int offset, u32 info)
* 4.1.3.3.
*/
if ((family == AF_INET && !inet_test_bit(RECVERR, sk)) ||
- (family == AF_INET6 && !inet6_sk(sk)->recverr)) {
+ (family == AF_INET6 && !inet6_test_bit(RECVERR6, sk))) {
if (!harderr || sk->sk_state != TCP_ESTABLISHED)
goto out;
} else {
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index e81892814935fb3934fbf0e6f9defc702ec29152..74673a5eff319f23871e64584a33f5299fa7b521 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -305,11 +305,10 @@ static void ipv6_icmp_error_rfc4884(const struct sk_buff *skb,
void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err,
__be16 port, u32 info, u8 *payload)
{
- struct ipv6_pinfo *np = inet6_sk(sk);
struct icmp6hdr *icmph = icmp6_hdr(skb);
struct sock_exterr_skb *serr;
- if (!np->recverr)
+ if (!inet6_test_bit(RECVERR6, sk))
return;
skb = skb_clone(skb, GFP_ATOMIC);
@@ -344,12 +343,11 @@ EXPORT_SYMBOL_GPL(ipv6_icmp_error);
void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info)
{
- const struct ipv6_pinfo *np = inet6_sk(sk);
struct sock_exterr_skb *serr;
struct ipv6hdr *iph;
struct sk_buff *skb;
- if (!np->recverr)
+ if (!inet6_test_bit(RECVERR6, sk))
return;
skb = alloc_skb(sizeof(struct ipv6hdr), GFP_ATOMIC);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 33dd4dd872e6bca2ee18a634283640007adcc692..ec10b45c49c15f9655466a529046f741f8b9fc69 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -481,6 +481,13 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
case IPV6_DONTFRAG:
inet6_assign_bit(DONTFRAG, sk, valbool);
return 0;
+ case IPV6_RECVERR:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ inet6_assign_bit(RECVERR6, sk, valbool);
+ if (!val)
+ skb_errqueue_purge(&sk->sk_error_queue);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -943,14 +950,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
np->pmtudisc = val;
retv = 0;
break;
- case IPV6_RECVERR:
- if (optlen < sizeof(int))
- goto e_inval;
- np->recverr = valbool;
- if (!val)
- skb_errqueue_purge(&sk->sk_error_queue);
- retv = 0;
- break;
case IPV6_FLOWINFO_SEND:
if (optlen < sizeof(int))
goto e_inval;
@@ -1380,7 +1379,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_RECVERR:
- val = np->recverr;
+ val = inet6_test_bit(RECVERR6, sk);
break;
case IPV6_FLOWINFO_SEND:
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index cc9673c1809fb238f6d9ab6915116cf0dd6eb593..71f6bdccfa1f39290e1b573ff8c647d91fd007a4 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -291,6 +291,7 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
struct inet6_skb_parm *opt,
u8 type, u8 code, int offset, __be32 info)
{
+ bool recverr = inet6_test_bit(RECVERR6, sk);
struct ipv6_pinfo *np = inet6_sk(sk);
int err;
int harderr;
@@ -300,7 +301,7 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
2. Socket is connected (otherwise the error indication
is useless without recverr and error is hard.
*/
- if (!np->recverr && sk->sk_state != TCP_ESTABLISHED)
+ if (!recverr && sk->sk_state != TCP_ESTABLISHED)
return;
harderr = icmpv6_err_convert(type, code, &err);
@@ -312,14 +313,14 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
ip6_sk_redirect(skb, sk);
return;
}
- if (np->recverr) {
+ if (recverr) {
u8 *payload = skb->data;
if (!inet_test_bit(HDRINCL, sk))
payload += offset;
ipv6_icmp_error(sk, skb, err, 0, ntohl(info), payload);
}
- if (np->recverr || harderr) {
+ if (recverr || harderr) {
sk->sk_err = err;
sk_error_report(sk);
}
@@ -587,7 +588,6 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
struct flowi6 *fl6, struct dst_entry **dstp,
unsigned int flags, const struct sockcm_cookie *sockc)
{
- struct ipv6_pinfo *np = inet6_sk(sk);
struct net *net = sock_net(sk);
struct ipv6hdr *iph;
struct sk_buff *skb;
@@ -668,7 +668,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
error:
IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
error_check:
- if (err == -ENOBUFS && !np->recverr)
+ if (err == -ENOBUFS && !inet6_test_bit(RECVERR6, sk))
err = 0;
return err;
}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 54db5fab318bc68cf9efbe6f26dacba614fa8562..b5954b136b57306429690594238f7a01b0cf15de 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -508,7 +508,7 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
tcp_ld_RTO_revert(sk, seq);
}
- if (!sock_owned_by_user(sk) && np->recverr) {
+ if (!sock_owned_by_user(sk) && inet6_test_bit(RECVERR6, sk)) {
WRITE_ONCE(sk->sk_err, err);
sk_error_report(sk);
} else {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d904c5450a07bf1df10d94ee6bb9b2a8fb9381b5..65f6217d36cb7c862f1511a058a7a5973c40cef8 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -619,7 +619,7 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
goto out;
}
- if (!np->recverr) {
+ if (!inet6_test_bit(RECVERR6, sk)) {
if (!harderr || sk->sk_state != TCP_ESTABLISHED)
goto out;
} else {
@@ -1281,7 +1281,7 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
send:
err = ip6_send_skb(skb);
if (err) {
- if (err == -ENOBUFS && !inet6_sk(sk)->recverr) {
+ if (err == -ENOBUFS && !inet6_test_bit(RECVERR6, sk)) {
UDP6_INC_STATS(sock_net(sk),
UDP_MIB_SNDBUFERRORS, is_udplite);
err = 0;
@@ -1606,7 +1606,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
up->pending = 0;
if (err > 0)
- err = np->recverr ? net_xmit_errno(err) : 0;
+ err = inet6_test_bit(RECVERR6, sk) ? net_xmit_errno(err) : 0;
release_sock(sk);
out:
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 43f2731bf590e5757b7ad2d3a92a12e4098e0d47..42b5b853ea01c767e1fe878772eeabe5c05adb6d 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -128,7 +128,6 @@ static void sctp_v6_err_handle(struct sctp_transport *t, struct sk_buff *skb,
{
struct sctp_association *asoc = t->asoc;
struct sock *sk = asoc->base.sk;
- struct ipv6_pinfo *np;
int err = 0;
switch (type) {
@@ -149,9 +148,8 @@ static void sctp_v6_err_handle(struct sctp_transport *t, struct sk_buff *skb,
break;
}
- np = inet6_sk(sk);
icmpv6_err_convert(type, code, &err);
- if (!sock_owned_by_user(sk) && np->recverr) {
+ if (!sock_owned_by_user(sk) && inet6_test_bit(RECVERR6, sk)) {
sk->sk_err = err;
sk_error_report(sk);
} else {
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 10/14] ipv6: lockless IPV6_RECVERR implementation
2023-09-12 16:02 ` [PATCH net-next 10/14] ipv6: lockless IPV6_RECVERR implementation Eric Dumazet
@ 2023-09-14 15:06 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 15:06 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:02 AM, Eric Dumazet wrote:
> np->recverr is moved to inet->inet_flags to fix data-races.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/ipv6.h | 3 +--
> include/net/inet_sock.h | 1 +
> include/net/ipv6.h | 4 +---
> net/dccp/ipv6.c | 2 +-
> net/ipv4/ping.c | 2 +-
> net/ipv6/datagram.c | 6 ++----
> net/ipv6/ipv6_sockglue.c | 17 ++++++++---------
> net/ipv6/raw.c | 10 +++++-----
> net/ipv6/tcp_ipv6.c | 2 +-
> net/ipv6/udp.c | 6 +++---
> net/sctp/ipv6.c | 4 +---
> 11 files changed, 25 insertions(+), 32 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH net-next 11/14] ipv6: move np->repflow to atomic flags
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (9 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 10/14] ipv6: lockless IPV6_RECVERR implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:07 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 12/14] ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementation Eric Dumazet
` (4 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Move np->repflow to inet->inet_flags to fix data-races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 1 -
include/net/inet_sock.h | 1 +
net/dccp/ipv6.c | 2 +-
net/ipv6/af_inet6.c | 3 ++-
net/ipv6/ip6_flowlabel.c | 8 ++++----
net/ipv6/tcp_ipv6.c | 14 ++++++--------
6 files changed, 14 insertions(+), 15 deletions(-)
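REPFLOW is seeded in `inet6_create()` from the `flowlabel_reflect` sysctl. It is later cleared or set by the TCP-only reflect branches of `ipv6_flowlabel_put()` and `ipv6_flowlabel_get()`. The sketch below models those three paths, assuming C11 atomics as the bit store. The `FAKE_FLOWLABEL_REFLECT_*` values are assumptions, so check `include/net/ipv6.h` for the real definitions. The `np->flow_label` reset that accompanies the clear is omitted, and all `fake_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed sysctl bit values; the kernel's definitions are authoritative. */
#define FAKE_FLOWLABEL_REFLECT_ESTABLISHED	1
#define FAKE_FLOWLABEL_REFLECT_TCP_RESET	2

/* Bit number from the INET_FLAGS_* enum added by this patch. */
enum { FLAG_REPFLOW = 27 };

struct fake_sock {
	atomic_ulong inet_flags;
};

static inline void assign_flag(struct fake_sock *sk, unsigned int nr, bool val)
{
	unsigned long mask;

	assert(nr < 8 * sizeof(unsigned long));
	mask = 1UL << nr;
	if (val)
		atomic_fetch_or(&sk->inet_flags, mask);
	else
		atomic_fetch_and(&sk->inet_flags, ~mask);
}

static inline bool test_flag(struct fake_sock *sk, unsigned int nr)
{
	return atomic_load(&sk->inet_flags) & (1UL << nr);
}

/* Like inet6_create(): any non-zero masked sysctl value enables reflection. */
void fake_inet6_create(struct fake_sock *sk, int flowlabel_reflect)
{
	atomic_init(&sk->inet_flags, 0);
	assign_flag(sk, FLAG_REPFLOW,
		    flowlabel_reflect & FAKE_FLOWLABEL_REFLECT_ESTABLISHED);
}

/* Reflect branch of ipv6_flowlabel_put(): TCP only, must currently be on. */
int fake_reflect_put(struct fake_sock *sk, bool is_tcp)
{
	if (!is_tcp)
		return -ENOPROTOOPT;
	if (!test_flag(sk, FLAG_REPFLOW))
		return -ESRCH;
	assign_flag(sk, FLAG_REPFLOW, false);
	return 0;
}

/* Reflect branch of ipv6_flowlabel_get(): TCP only, turns reflection on. */
int fake_reflect_get(struct fake_sock *sk, bool is_tcp)
{
	if (!is_tcp)
		return -ENOPROTOOPT;
	assign_flag(sk, FLAG_REPFLOW, true);
	return 0;
}
```

Passing the masked `int` to a `bool` parameter normalizes any non-zero value to `true`, which is how `inet6_assign_bit()` is used in the `af_inet6.c` hunk.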
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 53f4f1b97a787ac01fc274a8057494a28fa270fd..e62413371ea40cbd9f13aa6ac6b6be41a6831237 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -244,7 +244,6 @@ struct ipv6_pinfo {
/* sockopt flags */
__u16 sndflow:1,
- repflow:1,
pmtudisc:3,
padding:1, /* 1 bit hole */
srcprefs:3, /* 001: prefer temporary address
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 3b79bc759ff478f96d729f2669c6963bbe768ba1..5d61c7dc6577827740254f0e9aa288065f1bda7f 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -275,6 +275,7 @@ enum {
INET_FLAGS_AUTOFLOWLABEL = 24,
INET_FLAGS_DONTFRAG = 25,
INET_FLAGS_RECVERR6 = 26,
+ INET_FLAGS_REPFLOW = 27,
};
/* cmsg flags for inet */
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index e6c3d84c2b9ec2df9b89ab0879991b3b312d0b6f..d7e63eea705dfe5c40d374301f93987e1c34748b 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -679,7 +679,7 @@ static int dccp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
WRITE_ONCE(np->mcast_hops, ipv6_hdr(opt_skb)->hop_limit);
if (np->rxopt.bits.rxflow || np->rxopt.bits.rxtclass)
np->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(opt_skb));
- if (np->repflow)
+ if (inet6_test_bit(REPFLOW, sk))
np->flow_label = ip6_flowlabel(ipv6_hdr(opt_skb));
if (ipv6_opt_accepted(sk, opt_skb,
&DCCP_SKB_CB(opt_skb)->header.h6)) {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 372fb7b9112c8dfed09b6ddfdb37016a1a668494..48737363377fef32f471075fd3f000bc742fd4e4 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -220,7 +220,8 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
inet6_set_bit(MC6_LOOP, sk);
inet6_set_bit(MC6_ALL, sk);
np->pmtudisc = IPV6_PMTUDISC_WANT;
- np->repflow = net->ipv6.sysctl.flowlabel_reflect & FLOWLABEL_REFLECT_ESTABLISHED;
+ inet6_assign_bit(REPFLOW, sk, net->ipv6.sysctl.flowlabel_reflect &
+ FLOWLABEL_REFLECT_ESTABLISHED);
sk->sk_ipv6only = net->ipv6.sysctl.bindv6only;
sk->sk_txrehash = READ_ONCE(net->core.sysctl_txrehash);
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index b3ca4beb4405aa9dc4ce610abda9a46ac3ceb5fb..eca07e10e21fcf11b3a8ebe6353f38789b87bdaf 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -513,7 +513,7 @@ int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq,
return 0;
}
- if (np->repflow) {
+ if (inet6_test_bit(REPFLOW, sk)) {
freq->flr_label = np->flow_label;
return 0;
}
@@ -551,10 +551,10 @@ static int ipv6_flowlabel_put(struct sock *sk, struct in6_flowlabel_req *freq)
if (freq->flr_flags & IPV6_FL_F_REFLECT) {
if (sk->sk_protocol != IPPROTO_TCP)
return -ENOPROTOOPT;
- if (!np->repflow)
+ if (!inet6_test_bit(REPFLOW, sk))
return -ESRCH;
np->flow_label = 0;
- np->repflow = 0;
+ inet6_clear_bit(REPFLOW, sk);
return 0;
}
@@ -626,7 +626,7 @@ static int ipv6_flowlabel_get(struct sock *sk, struct in6_flowlabel_req *freq,
if (sk->sk_protocol != IPPROTO_TCP)
return -ENOPROTOOPT;
- np->repflow = 1;
+ inet6_set_bit(REPFLOW, sk);
return 0;
}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index b5954b136b57306429690594238f7a01b0cf15de..201caf88bb99e4ff87048fab3d89b6ea22269df3 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -548,7 +548,7 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
&ireq->ir_v6_rmt_addr);
fl6->daddr = ireq->ir_v6_rmt_addr;
- if (np->repflow && ireq->pktopts)
+ if (inet6_test_bit(REPFLOW, sk) && ireq->pktopts)
fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts));
tclass = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos) ?
@@ -797,7 +797,7 @@ static void tcp_v6_init_req(struct request_sock *req,
(ipv6_opt_accepted(sk_listener, skb, &TCP_SKB_CB(skb)->header.h6) ||
np->rxopt.bits.rxinfo ||
np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim ||
- np->rxopt.bits.rxohlim || np->repflow)) {
+ np->rxopt.bits.rxohlim || inet6_test_bit(REPFLOW, sk_listener))) {
refcount_inc(&skb->users);
ireq->pktopts = skb;
}
@@ -1055,10 +1055,8 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb)
if (sk) {
oif = sk->sk_bound_dev_if;
if (sk_fullsock(sk)) {
- const struct ipv6_pinfo *np = tcp_inet6_sk(sk);
-
trace_tcp_send_reset(sk, skb);
- if (np->repflow)
+ if (inet6_test_bit(REPFLOW, sk))
label = ip6_flowlabel(ipv6h);
priority = sk->sk_priority;
txhash = sk->sk_txhash;
@@ -1247,7 +1245,7 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
newnp->mcast_oif = inet_iif(skb);
newnp->mcast_hops = ip_hdr(skb)->ttl;
newnp->rcv_flowinfo = 0;
- if (np->repflow)
+ if (inet6_test_bit(REPFLOW, sk))
newnp->flow_label = 0;
/*
@@ -1320,7 +1318,7 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
newnp->mcast_oif = tcp_v6_iif(skb);
newnp->mcast_hops = ipv6_hdr(skb)->hop_limit;
newnp->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(skb));
- if (np->repflow)
+ if (inet6_test_bit(REPFLOW, sk))
newnp->flow_label = ip6_flowlabel(ipv6_hdr(skb));
/* Set ToS of the new socket based upon the value of incoming SYN.
@@ -1546,7 +1544,7 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
ipv6_hdr(opt_skb)->hop_limit);
if (np->rxopt.bits.rxflow || np->rxopt.bits.rxtclass)
np->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(opt_skb));
- if (np->repflow)
+ if (inet6_test_bit(REPFLOW, sk))
np->flow_label = ip6_flowlabel(ipv6_hdr(opt_skb));
if (ipv6_opt_accepted(sk, opt_skb, &TCP_SKB_CB(opt_skb)->header.h6)) {
tcp_v6_restore_cb(opt_skb);
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 11/14] ipv6: move np->repflow to atomic flags
2023-09-12 16:02 ` [PATCH net-next 11/14] ipv6: move np->repflow to atomic flags Eric Dumazet
@ 2023-09-14 15:07 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 15:07 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:02 AM, Eric Dumazet wrote:
> Move np->repflow to inet->inet_flags to fix data-races.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/ipv6.h | 1 -
> include/net/inet_sock.h | 1 +
> net/dccp/ipv6.c | 2 +-
> net/ipv6/af_inet6.c | 3 ++-
> net/ipv6/ip6_flowlabel.c | 8 ++++----
> net/ipv6/tcp_ipv6.c | 14 ++++++--------
> 6 files changed, 14 insertions(+), 15 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* [PATCH net-next 12/14] ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (10 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 11/14] ipv6: move np->repflow to atomic flags Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:08 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 13/14] ipv6: lockless IPV6_MTU_DISCOVER implementation Eric Dumazet
` (3 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Reads from np->rtalert_isolate are racy.
Move this flag to inet->inet_flags to fix data-races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 3 +--
include/net/inet_sock.h | 1 +
net/ipv6/ip6_output.c | 3 +--
net/ipv6/ipv6_sockglue.c | 13 ++++++-------
4 files changed, 9 insertions(+), 11 deletions(-)
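A minimal userspace model of this change, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic bitops. The flag becomes one bit in a shared word that is updated with a single atomic read-modify-write, so setting RTALERT_ISOLATE cannot clobber a neighboring bit such as REPFLOW. By contrast, a plain bitfield like the old np->rtalert_isolate is written with a non-atomic read-modify-write of its containing word. The names flag_sock, flag_test_bit() and flag_assign_bit() are hypothetical and are not the kernel's inet6_test_bit()/inet6_assign_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers copied from the INET_FLAGS_* enum in include/net/inet_sock.h. */
enum {
	FLAG_REPFLOW = 27,
	FLAG_RTALERT_ISOLATE = 28,
};

/* Hypothetical stand-in for a socket: one word holds every flag,
 * like inet->inet_flags. */
struct flag_sock {
	_Atomic unsigned long flags;
};

static bool flag_test_bit(int nr, struct flag_sock *sk)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	return (atomic_load_explicit(&sk->flags, memory_order_relaxed) >> nr) & 1UL;
}

static void flag_assign_bit(int nr, struct flag_sock *sk, bool val)
{
	unsigned long mask;

	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	mask = 1UL << nr;
	/* One atomic read-modify-write that changes only this bit, so a
	 * concurrent update of another bit in the same word is not lost. */
	if (val)
		atomic_fetch_or_explicit(&sk->flags, mask, memory_order_relaxed);
	else
		atomic_fetch_and_explicit(&sk->flags, ~mask, memory_order_relaxed);
}
```

For example, after flag_assign_bit(FLAG_RTALERT_ISOLATE, &sk, false), a REPFLOW bit set earlier is still reported by flag_test_bit(). This is also why ip6_call_ra_chain() no longer needs the np pointer: it only reads one bit.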
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index e62413371ea40cbd9f13aa6ac6b6be41a6831237..f288a35f157f73ded445639c30f3365047fd9ddc 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -246,11 +246,10 @@ struct ipv6_pinfo {
__u16 sndflow:1,
pmtudisc:3,
padding:1, /* 1 bit hole */
- srcprefs:3, /* 001: prefer temporary address
+ srcprefs:3; /* 001: prefer temporary address
* 010: prefer public address
* 100: prefer care-of address
*/
- rtalert_isolate:1;
__u8 min_hopcount;
__u8 tclass;
__be32 rcv_flowinfo;
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 5d61c7dc6577827740254f0e9aa288065f1bda7f..befee0f66c0555f3ac4524fd8f7780ff21c04aaa 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -276,6 +276,7 @@ enum {
INET_FLAGS_DONTFRAG = 25,
INET_FLAGS_RECVERR6 = 26,
INET_FLAGS_REPFLOW = 27,
+ INET_FLAGS_RTALERT_ISOLATE = 28,
};
/* cmsg flags for inet */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 8851fe5d45a0781c8b78c995c2c4c6c81e10cd52..f87d8491d7e273f167b7b144a7e134783e1b80f6 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -368,9 +368,8 @@ static int ip6_call_ra_chain(struct sk_buff *skb, int sel)
if (sk && ra->sel == sel &&
(!sk->sk_bound_dev_if ||
sk->sk_bound_dev_if == skb->dev->ifindex)) {
- struct ipv6_pinfo *np = inet6_sk(sk);
- if (np && np->rtalert_isolate &&
+ if (inet6_test_bit(RTALERT_ISOLATE, sk) &&
!net_eq(sock_net(sk), dev_net(skb->dev))) {
continue;
}
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index ec10b45c49c15f9655466a529046f741f8b9fc69..c22a492e05360b68ef6868707e363f2ce84a4c35 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -488,6 +488,11 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
if (!val)
skb_errqueue_purge(&sk->sk_error_queue);
return 0;
+ case IPV6_ROUTER_ALERT_ISOLATE:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ inet6_assign_bit(RTALERT_ISOLATE, sk, valbool);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -936,12 +941,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
retv = ip6_ra_control(sk, val);
break;
- case IPV6_ROUTER_ALERT_ISOLATE:
- if (optlen < sizeof(int))
- goto e_inval;
- np->rtalert_isolate = valbool;
- retv = 0;
- break;
case IPV6_MTU_DISCOVER:
if (optlen < sizeof(int))
goto e_inval;
@@ -1452,7 +1451,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_ROUTER_ALERT_ISOLATE:
- val = np->rtalert_isolate;
+ val = inet6_test_bit(RTALERT_ISOLATE, sk);
break;
case IPV6_RECVERR_RFC4884:
--
2.42.0.283.g2d96d420d3-goog
* [PATCH net-next 13/14] ipv6: lockless IPV6_MTU_DISCOVER implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (11 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 12/14] ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:10 ` David Ahern
2023-09-12 16:02 ` [PATCH net-next 14/14] ipv6: lockless IPV6_FLOWINFO_SEND implementation Eric Dumazet
` (2 subsequent siblings)
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
Most np->pmtudisc reads are racy.
Move this 3-bit field to a full byte, add annotations
and make IPV6_MTU_DISCOVER setsockopt() lockless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 5 ++---
include/net/ip6_route.h | 14 +++++++++-----
net/ipv6/ip6_output.c | 4 ++--
net/ipv6/ipv6_sockglue.c | 17 ++++++++---------
net/ipv6/raw.c | 2 +-
net/ipv6/udp.c | 2 +-
net/netfilter/ipvs/ip_vs_sync.c | 2 +-
7 files changed, 24 insertions(+), 22 deletions(-)
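A minimal sketch of the access pattern this patch adopts, modeled in userspace with C11 relaxed atomics standing in for READ_ONCE()/WRITE_ONCE() (an assumption; the kernel macros are not C11 atomics). Once pmtudisc has its own byte, the writer validates the value and publishes it with one store, and each reader takes a single snapshot into a local, as ip6_sk_accept_pmtu() and ip6_sk_ignore_df() now do, so both comparisons see the same value. The enum values match the IPV6_PMTUDISC_* constants in the Linux UAPI header; struct model_pinfo and the model_* functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same values as IPV6_PMTUDISC_DONT ... IPV6_PMTUDISC_OMIT in <linux/in6.h>. */
enum {
	MODEL_PMTUDISC_DONT = 0,
	MODEL_PMTUDISC_WANT = 1,
	MODEL_PMTUDISC_DO = 2,
	MODEL_PMTUDISC_PROBE = 3,
	MODEL_PMTUDISC_INTERFACE = 4,
	MODEL_PMTUDISC_OMIT = 5,
};

/* Hypothetical stand-in for struct ipv6_pinfo: pmtudisc owns a whole byte,
 * so it is loaded and stored as a unit rather than via a 3-bit bitfield. */
struct model_pinfo {
	_Atomic unsigned char pmtudisc;
};

/* Writer: validate first, then publish with one store (like WRITE_ONCE()). */
static int model_set_pmtudisc(struct model_pinfo *np, int val)
{
	if (val < MODEL_PMTUDISC_DONT || val > MODEL_PMTUDISC_OMIT)
		return -EINVAL;
	atomic_store_explicit(&np->pmtudisc, (unsigned char)val,
			      memory_order_relaxed);
	return 0;
}

/* Readers: snapshot once (like READ_ONCE()) and compare only the local. */
static bool model_accept_pmtu(struct model_pinfo *np)
{
	unsigned char pmtudisc =
		atomic_load_explicit(&np->pmtudisc, memory_order_relaxed);

	/* The writer only ever publishes validated values. */
	assert(pmtudisc <= MODEL_PMTUDISC_OMIT);
	return pmtudisc != MODEL_PMTUDISC_INTERFACE &&
	       pmtudisc != MODEL_PMTUDISC_OMIT;
}

static bool model_ignore_df(struct model_pinfo *np)
{
	unsigned char pmtudisc =
		atomic_load_explicit(&np->pmtudisc, memory_order_relaxed);

	return pmtudisc < MODEL_PMTUDISC_DO || pmtudisc == MODEL_PMTUDISC_OMIT;
}
```

Reading np->pmtudisc twice, as the old helpers did, could observe two different values if a lockless setsockopt() ran in between; the local copy rules that out.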
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index f288a35f157f73ded445639c30f3365047fd9ddc..10f521a6a9c8a881b4677d53597929622ae95b67 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -243,13 +243,12 @@ struct ipv6_pinfo {
} rxopt;
/* sockopt flags */
- __u16 sndflow:1,
- pmtudisc:3,
- padding:1, /* 1 bit hole */
+ __u8 sndflow:1,
srcprefs:3; /* 001: prefer temporary address
* 010: prefer public address
* 100: prefer care-of address
*/
+ __u8 pmtudisc;
__u8 min_hopcount;
__u8 tclass;
__be32 rcv_flowinfo;
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index b32539bb0fb05c67b5849bb219be59fabe5bb51c..b1ea49900b4ae17cb3436f884e26f5ae3a7a761c 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -266,7 +266,7 @@ static inline unsigned int ip6_skb_dst_mtu(const struct sk_buff *skb)
const struct dst_entry *dst = skb_dst(skb);
unsigned int mtu;
- if (np && np->pmtudisc >= IPV6_PMTUDISC_PROBE) {
+ if (np && READ_ONCE(np->pmtudisc) >= IPV6_PMTUDISC_PROBE) {
mtu = READ_ONCE(dst->dev->mtu);
mtu -= lwtunnel_headroom(dst->lwtstate, mtu);
} else {
@@ -277,14 +277,18 @@ static inline unsigned int ip6_skb_dst_mtu(const struct sk_buff *skb)
static inline bool ip6_sk_accept_pmtu(const struct sock *sk)
{
- return inet6_sk(sk)->pmtudisc != IPV6_PMTUDISC_INTERFACE &&
- inet6_sk(sk)->pmtudisc != IPV6_PMTUDISC_OMIT;
+ u8 pmtudisc = READ_ONCE(inet6_sk(sk)->pmtudisc);
+
+ return pmtudisc != IPV6_PMTUDISC_INTERFACE &&
+ pmtudisc != IPV6_PMTUDISC_OMIT;
}
static inline bool ip6_sk_ignore_df(const struct sock *sk)
{
- return inet6_sk(sk)->pmtudisc < IPV6_PMTUDISC_DO ||
- inet6_sk(sk)->pmtudisc == IPV6_PMTUDISC_OMIT;
+ u8 pmtudisc = READ_ONCE(inet6_sk(sk)->pmtudisc);
+
+ return pmtudisc < IPV6_PMTUDISC_DO ||
+ pmtudisc == IPV6_PMTUDISC_OMIT;
}
static inline const struct in6_addr *rt6_nexthop(const struct rt6_info *rt,
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f87d8491d7e273f167b7b144a7e134783e1b80f6..7e5d9eeb990fd4549be753fdaaf1e6c6c21d3f8d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1436,10 +1436,10 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
v6_cork->hop_limit = ipc6->hlimit;
v6_cork->tclass = ipc6->tclass;
if (rt->dst.flags & DST_XFRM_TUNNEL)
- mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
+ mtu = READ_ONCE(np->pmtudisc) >= IPV6_PMTUDISC_PROBE ?
READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
else
- mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
+ mtu = READ_ONCE(np->pmtudisc) >= IPV6_PMTUDISC_PROBE ?
READ_ONCE(rt->dst.dev->mtu) : dst_mtu(xfrm_dst_path(&rt->dst));
frag_size = READ_ONCE(np->frag_size);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index c22a492e05360b68ef6868707e363f2ce84a4c35..85ea42644dcbbe3ed8f625e51ffc6d55ada40156 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -493,6 +493,13 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
inet6_assign_bit(RTALERT_ISOLATE, sk, valbool);
return 0;
+ case IPV6_MTU_DISCOVER:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ if (val < IPV6_PMTUDISC_DONT || val > IPV6_PMTUDISC_OMIT)
+ return -EINVAL;
+ WRITE_ONCE(np->pmtudisc, val);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -941,14 +948,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
retv = ip6_ra_control(sk, val);
break;
- case IPV6_MTU_DISCOVER:
- if (optlen < sizeof(int))
- goto e_inval;
- if (val < IPV6_PMTUDISC_DONT || val > IPV6_PMTUDISC_OMIT)
- goto e_inval;
- np->pmtudisc = val;
- retv = 0;
- break;
case IPV6_FLOWINFO_SEND:
if (optlen < sizeof(int))
goto e_inval;
@@ -1374,7 +1373,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_MTU_DISCOVER:
- val = np->pmtudisc;
+ val = READ_ONCE(np->pmtudisc);
break;
case IPV6_RECVERR:
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 71f6bdccfa1f39290e1b573ff8c647d91fd007a4..47372cceb98f6e606346b74230b03e76e303822c 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -307,7 +307,7 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
harderr = icmpv6_err_convert(type, code, &err);
if (type == ICMPV6_PKT_TOOBIG) {
ip6_sk_update_pmtu(skb, sk, info);
- harderr = (np->pmtudisc == IPV6_PMTUDISC_DO);
+ harderr = (READ_ONCE(np->pmtudisc) == IPV6_PMTUDISC_DO);
}
if (type == NDISC_REDIRECT) {
ip6_sk_redirect(skb, sk);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 65f6217d36cb7c862f1511a058a7a5973c40cef8..97fabbd7e7aa8bf66bfe21a98f97d4408af13d2b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -598,7 +598,7 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
if (!ip6_sk_accept_pmtu(sk))
goto out;
ip6_sk_update_pmtu(skb, sk, info);
- if (np->pmtudisc != IPV6_PMTUDISC_DONT)
+ if (READ_ONCE(np->pmtudisc) != IPV6_PMTUDISC_DONT)
harderr = 1;
}
if (type == NDISC_REDIRECT) {
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index df1b33b61059eef1e86baefc63e138108a50a081..5820a8156c4701bb163f569d735c389d7a8e3820 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1341,7 +1341,7 @@ static void set_mcast_pmtudisc(struct sock *sk, int val)
struct ipv6_pinfo *np = inet6_sk(sk);
/* IPV6_MTU_DISCOVER */
- np->pmtudisc = val;
+ WRITE_ONCE(np->pmtudisc, val);
}
#endif
release_sock(sk);
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 13/14] ipv6: lockless IPV6_MTU_DISCOVER implementation
2023-09-12 16:02 ` [PATCH net-next 13/14] ipv6: lockless IPV6_MTU_DISCOVER implementation Eric Dumazet
@ 2023-09-14 15:10 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 15:10 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:02 AM, Eric Dumazet wrote:
> Most np->pmtudisc reads are racy.
>
> Move this 3-bit field to a full byte, add annotations
> and make IPV6_MTU_DISCOVER setsockopt() lockless.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/ipv6.h | 5 ++---
> include/net/ip6_route.h | 14 +++++++++-----
> net/ipv6/ip6_output.c | 4 ++--
> net/ipv6/ipv6_sockglue.c | 17 ++++++++---------
> net/ipv6/raw.c | 2 +-
> net/ipv6/udp.c | 2 +-
> net/netfilter/ipvs/ip_vs_sync.c | 2 +-
> 7 files changed, 24 insertions(+), 22 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* [PATCH net-next 14/14] ipv6: lockless IPV6_FLOWINFO_SEND implementation
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (12 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 13/14] ipv6: lockless IPV6_MTU_DISCOVER implementation Eric Dumazet
@ 2023-09-12 16:02 ` Eric Dumazet
2023-09-14 15:11 ` David Ahern
2023-09-14 10:25 ` [PATCH net-next 00/14] ipv6: round of data-races fixes Simon Horman
2023-09-15 9:40 ` patchwork-bot+netdevbpf
15 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2023-09-12 16:02 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, netdev, eric.dumazet, Eric Dumazet
np->sndflow reads are racy.
Use one bit from atomic inet->inet_flags instead,
so IPV6_FLOWINFO_SEND setsockopt() can be lockless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/ipv6.h | 3 +--
include/net/inet_sock.h | 1 +
net/dccp/ipv6.c | 2 +-
net/ipv4/ping.c | 3 +--
net/ipv6/af_inet6.c | 2 +-
net/ipv6/datagram.c | 7 ++++---
net/ipv6/ipv6_sockglue.c | 13 ++++++-------
net/ipv6/ping.c | 2 +-
net/ipv6/raw.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
net/ipv6/udp.c | 2 +-
net/l2tp/l2tp_ip6.c | 4 ++--
net/sctp/ipv6.c | 3 ++-
13 files changed, 23 insertions(+), 23 deletions(-)
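A rough userspace sketch of the setsockopt() structure that patches 12 to 14 converge on, assuming a pthread mutex as a stand-in for lock_sock()/release_sock() and C11 atomics for inet_flags. Options backed by an atomic flag (or a single annotated field) are handled in a first switch that returns before the lock is taken; everything else falls through to the locked path. Similar to do_ipv6_setsockopt(), val is read only when optlen covers an int, and each case still rejects a short optlen. The option ids, struct opt_sock, opt_setsockopt() and opt_sndflow() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option ids, not the real IPV6_* numbers. */
enum opt_name {
	OPT_FLOWINFO_SEND,	/* lockless, like IPV6_FLOWINFO_SEND after this patch */
	OPT_LOCKED_EXAMPLE,	/* still handled under the lock */
};

#define OPT_FLAGS_SNDFLOW 29	/* same bit number as INET_FLAGS_SNDFLOW */

struct opt_sock {
	pthread_mutex_t lock;		/* stands in for lock_sock() */
	_Atomic unsigned long flags;	/* stands in for inet->inet_flags */
	int locked_value;		/* a field that still needs the lock */
};

static bool opt_sndflow(struct opt_sock *sk)
{
	return (atomic_load(&sk->flags) >> OPT_FLAGS_SNDFLOW) & 1UL;
}

static int opt_setsockopt(struct opt_sock *sk, enum opt_name optname,
			  const int *optval, size_t optlen)
{
	int val, ret;

	assert(optlen < sizeof(int) || optval != NULL);
	/* Read val only when optlen covers an int; otherwise treat it as 0. */
	val = optlen >= sizeof(int) ? *optval : 0;

	/* Lockless fast path: these cases return before the lock is taken. */
	switch (optname) {
	case OPT_FLOWINFO_SEND:
		if (optlen < sizeof(int))
			return -EINVAL;
		if (val)
			atomic_fetch_or(&sk->flags, 1UL << OPT_FLAGS_SNDFLOW);
		else
			atomic_fetch_and(&sk->flags, ~(1UL << OPT_FLAGS_SNDFLOW));
		return 0;
	default:
		break;
	}

	/* Slow path: remaining options are still serialized by the lock. */
	pthread_mutex_lock(&sk->lock);
	switch (optname) {
	case OPT_LOCKED_EXAMPLE:
		if (optlen < sizeof(int)) {
			ret = -EINVAL;
			break;
		}
		sk->locked_value = val;
		ret = 0;
		break;
	default:
		ret = -ENOPROTOOPT;
		break;
	}
	pthread_mutex_unlock(&sk->lock);
	return ret;
}
```

Readers such as opt_sndflow() never take the lock, which mirrors how the inet6_test_bit(SNDFLOW, sk) call sites in this patch read the flag.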
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 10f521a6a9c8a881b4677d53597929622ae95b67..09253825c99c7a94c4c8a3f176f0ceecd0b166bc 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -243,8 +243,7 @@ struct ipv6_pinfo {
} rxopt;
/* sockopt flags */
- __u8 sndflow:1,
- srcprefs:3; /* 001: prefer temporary address
+ __u8 srcprefs:3; /* 001: prefer temporary address
* 010: prefer public address
* 100: prefer care-of address
*/
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index befee0f66c0555f3ac4524fd8f7780ff21c04aaa..98e11958cdff688249fddf1893ce06b45ecb68d9 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -277,6 +277,7 @@ enum {
INET_FLAGS_RECVERR6 = 26,
INET_FLAGS_REPFLOW = 27,
INET_FLAGS_RTALERT_ISOLATE = 28,
+ INET_FLAGS_SNDFLOW = 29,
};
/* cmsg flags for inet */
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index d7e63eea705dfe5c40d374301f93987e1c34748b..4803f06148488b07ba027138c93014d2b5fa28db 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -844,7 +844,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
memset(&fl6, 0, sizeof(fl6));
- if (np->sndflow) {
+ if (inet6_test_bit(SNDFLOW, sk)) {
fl6.flowlabel = usin->sin6_flowinfo & IPV6_FLOWINFO_MASK;
IP6_ECN_flow_init(fl6.flowlabel);
if (fl6.flowlabel & IPV6_FLOWLABEL_MASK) {
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index bc01ad5fc01ab97f71f7704a671eaf644ec040be..4dd809b7b18867154df42bc28809b886913e253c 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -899,7 +899,6 @@ int ping_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
#if IS_ENABLED(CONFIG_IPV6)
} else if (family == AF_INET6) {
- struct ipv6_pinfo *np = inet6_sk(sk);
struct ipv6hdr *ip6 = ipv6_hdr(skb);
DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
@@ -908,7 +907,7 @@ int ping_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
sin6->sin6_port = 0;
sin6->sin6_addr = ip6->saddr;
sin6->sin6_flowinfo = 0;
- if (np->sndflow)
+ if (inet6_test_bit(SNDFLOW, sk))
sin6->sin6_flowinfo = ip6_flowinfo(ip6);
sin6->sin6_scope_id =
ipv6_iface_scope_id(&sin6->sin6_addr,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 48737363377fef32f471075fd3f000bc742fd4e4..c6ad0d6e99b5e2259648e260e2cad54f34c90cfd 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -537,7 +537,7 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
}
sin->sin6_port = inet->inet_dport;
sin->sin6_addr = sk->sk_v6_daddr;
- if (np->sndflow)
+ if (inet6_test_bit(SNDFLOW, sk))
sin->sin6_flowinfo = np->flow_label;
BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin,
CGROUP_INET6_GETPEERNAME);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 74673a5eff319f23871e64584a33f5299fa7b521..cc6a502db39d2e446c39656ccc398e6ac20abf6b 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -80,7 +80,8 @@ int ip6_datagram_dst_update(struct sock *sk, bool fix_sk_saddr)
struct flowi6 fl6;
int err = 0;
- if (np->sndflow && (np->flow_label & IPV6_FLOWLABEL_MASK)) {
+ if (inet6_test_bit(SNDFLOW, sk) &&
+ (np->flow_label & IPV6_FLOWLABEL_MASK)) {
flowlabel = fl6_sock_lookup(sk, np->flow_label);
if (IS_ERR(flowlabel))
return -EINVAL;
@@ -163,7 +164,7 @@ int __ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr,
if (usin->sin6_family != AF_INET6)
return -EAFNOSUPPORT;
- if (np->sndflow)
+ if (inet6_test_bit(SNDFLOW, sk))
fl6_flowlabel = usin->sin6_flowinfo & IPV6_FLOWINFO_MASK;
if (ipv6_addr_any(&usin->sin6_addr)) {
@@ -491,7 +492,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
const struct ipv6hdr *ip6h = container_of((struct in6_addr *)(nh + serr->addr_offset),
struct ipv6hdr, daddr);
sin->sin6_addr = ip6h->daddr;
- if (np->sndflow)
+ if (inet6_test_bit(SNDFLOW, sk))
sin->sin6_flowinfo = ip6_flowinfo(ip6h);
sin->sin6_scope_id =
ipv6_iface_scope_id(&sin->sin6_addr,
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 85ea42644dcbbe3ed8f625e51ffc6d55ada40156..e9dc6f881bb92db267903a71f3f3e4de4c557819 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -500,6 +500,11 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
WRITE_ONCE(np->pmtudisc, val);
return 0;
+ case IPV6_FLOWINFO_SEND:
+ if (optlen < sizeof(int))
+ return -EINVAL;
+ inet6_assign_bit(SNDFLOW, sk, valbool);
+ return 0;
}
if (needs_rtnl)
rtnl_lock();
@@ -948,12 +953,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
retv = ip6_ra_control(sk, val);
break;
- case IPV6_FLOWINFO_SEND:
- if (optlen < sizeof(int))
- goto e_inval;
- np->sndflow = valbool;
- retv = 0;
- break;
case IPV6_FLOWLABEL_MGR:
retv = ipv6_flowlabel_opt(sk, optval, optlen);
break;
@@ -1381,7 +1380,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_FLOWINFO_SEND:
- val = np->sndflow;
+ val = inet6_test_bit(SNDFLOW, sk);
break;
case IPV6_FLOWLABEL_MGR:
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index 4444b61eb23bbf483068d2b119a7559e49ba3880..e8fb0d275cc2d9adf997f944a42a8fc456f8b950 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -89,7 +89,7 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
return -EAFNOSUPPORT;
}
daddr = &(u->sin6_addr);
- if (np->sndflow)
+ if (inet6_test_bit(SNDFLOW, sk))
fl6.flowlabel = u->sin6_flowinfo & IPV6_FLOWINFO_MASK;
if (__ipv6_addr_needs_scope_id(ipv6_addr_type(daddr)))
oif = u->sin6_scope_id;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 47372cceb98f6e606346b74230b03e76e303822c..a2aa54a2baaec0169fecd490588a2cd4e8a2f2d7 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -795,7 +795,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
return -EINVAL;
daddr = &sin6->sin6_addr;
- if (np->sndflow) {
+ if (inet6_test_bit(SNDFLOW, sk)) {
fl6.flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
if (fl6.flowlabel&IPV6_FLOWLABEL_MASK) {
flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 201caf88bb99e4ff87048fab3d89b6ea22269df3..94afb8d0f2d0e4974c3dbe4e3301f0152b5cb9e1 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -163,7 +163,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
memset(&fl6, 0, sizeof(fl6));
- if (np->sndflow) {
+ if (inet6_test_bit(SNDFLOW, sk)) {
fl6.flowlabel = usin->sin6_flowinfo&IPV6_FLOWINFO_MASK;
IP6_ECN_flow_init(fl6.flowlabel);
if (fl6.flowlabel&IPV6_FLOWLABEL_MASK) {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 97fabbd7e7aa8bf66bfe21a98f97d4408af13d2b..b55e23ba1da53eba2ee4c468e30f9428a6fee3a7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1427,7 +1427,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
fl6->fl6_dport = sin6->sin6_port;
daddr = &sin6->sin6_addr;
- if (np->sndflow) {
+ if (inet6_test_bit(SNDFLOW, sk)) {
fl6->flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
if (fl6->flowlabel & IPV6_FLOWLABEL_MASK) {
flowlabel = fl6_sock_lookup(sk, fl6->flowlabel);
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 40af2431e73aad74ab64e97db8a5ee79dda0879d..44cfb72bbd18a34e83e50bebca09729c55df524f 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -431,7 +431,7 @@ static int l2tp_ip6_getname(struct socket *sock, struct sockaddr *uaddr,
return -ENOTCONN;
lsa->l2tp_conn_id = lsk->peer_conn_id;
lsa->l2tp_addr = sk->sk_v6_daddr;
- if (np->sndflow)
+ if (inet6_test_bit(SNDFLOW, sk))
lsa->l2tp_flowinfo = np->flow_label;
} else {
if (ipv6_addr_any(&sk->sk_v6_rcv_saddr))
@@ -529,7 +529,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
return -EAFNOSUPPORT;
daddr = &lsa->l2tp_addr;
- if (np->sndflow) {
+ if (inet6_test_bit(SNDFLOW, sk)) {
fl6.flowlabel = lsa->l2tp_flowinfo & IPV6_FLOWINFO_MASK;
if (fl6.flowlabel & IPV6_FLOWLABEL_MASK) {
flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 42b5b853ea01c767e1fe878772eeabe5c05adb6d..5c0ed5909d85a1fc137e8652e32df75d8bef28ac 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -296,7 +296,8 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
if (t->flowlabel & SCTP_FLOWLABEL_SET_MASK)
fl6->flowlabel = htonl(t->flowlabel & SCTP_FLOWLABEL_VAL_MASK);
- if (np->sndflow && (fl6->flowlabel & IPV6_FLOWLABEL_MASK)) {
+ if (inet6_test_bit(SNDFLOW, sk) &&
+ (fl6->flowlabel & IPV6_FLOWLABEL_MASK)) {
struct ip6_flowlabel *flowlabel;
flowlabel = fl6_sock_lookup(sk, fl6->flowlabel);
--
2.42.0.283.g2d96d420d3-goog
* Re: [PATCH net-next 14/14] ipv6: lockless IPV6_FLOWINFO_SEND implementation
2023-09-12 16:02 ` [PATCH net-next 14/14] ipv6: lockless IPV6_FLOWINFO_SEND implementation Eric Dumazet
@ 2023-09-14 15:11 ` David Ahern
0 siblings, 0 replies; 31+ messages in thread
From: David Ahern @ 2023-09-14 15:11 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet
On 9/12/23 10:02 AM, Eric Dumazet wrote:
> np->sndflow reads are racy.
>
> Use one bit from atomic inet->inet_flags instead,
> so IPV6_FLOWINFO_SEND setsockopt() can be lockless.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/ipv6.h | 3 +--
> include/net/inet_sock.h | 1 +
> net/dccp/ipv6.c | 2 +-
> net/ipv4/ping.c | 3 +--
> net/ipv6/af_inet6.c | 2 +-
> net/ipv6/datagram.c | 7 ++++---
> net/ipv6/ipv6_sockglue.c | 13 ++++++-------
> net/ipv6/ping.c | 2 +-
> net/ipv6/raw.c | 2 +-
> net/ipv6/tcp_ipv6.c | 2 +-
> net/ipv6/udp.c | 2 +-
> net/l2tp/l2tp_ip6.c | 4 ++--
> net/sctp/ipv6.c | 3 ++-
> 13 files changed, 23 insertions(+), 23 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* Re: [PATCH net-next 00/14] ipv6: round of data-races fixes
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (13 preceding siblings ...)
2023-09-12 16:02 ` [PATCH net-next 14/14] ipv6: lockless IPV6_FLOWINFO_SEND implementation Eric Dumazet
@ 2023-09-14 10:25 ` Simon Horman
2023-09-15 9:40 ` patchwork-bot+netdevbpf
15 siblings, 0 replies; 31+ messages in thread
From: Simon Horman @ 2023-09-14 10:25 UTC
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, David Ahern, netdev,
eric.dumazet
On Tue, Sep 12, 2023 at 04:01:58PM +0000, Eric Dumazet wrote:
> This series is inspired by one related syzbot report.
>
> Many reads and writes of inet6_sk(sk) fields are racy.
>
> Move 1-bit fields to inet->inet_flags to provide
> atomic safety. inet6_{test|set|clear|assign}_bit() helpers
> could be changed later if we need to make room in inet_flags.
>
> Also add missing READ_ONCE()/WRITE_ONCE() when
> lockless readers need access to specific fields.
>
> np->srcprefs will be handled separately to avoid merge conflicts
> because a prior patch was posted for net tree.
>
> Eric Dumazet (14):
> ipv6: lockless IPV6_UNICAST_HOPS implementation
> ipv6: lockless IPV6_MULTICAST_LOOP implementation
> ipv6: lockless IPV6_MULTICAST_HOPS implementation
> ipv6: lockless IPV6_MTU implementation
> ipv6: lockless IPV6_MINHOPCOUNT implementation
> ipv6: lockless IPV6_RECVERR_RFC4884 implementation
> ipv6: lockless IPV6_MULTICAST_ALL implementation
> ipv6: lockless IPV6_AUTOFLOWLABEL implementation
> ipv6: lockless IPV6_DONTFRAG implementation
> ipv6: lockless IPV6_RECVERR implemetation
> ipv6: move np->repflow to atomic flags
> ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementation
> ipv6: lockless IPV6_MTU_DISCOVER implementation
> ipv6: lockless IPV6_FLOWINFO_SEND implementation
For series,
Reviewed-by: Simon Horman <horms@kernel.org>
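The cover letter notes that the inet6_{test|set|clear|assign}_bit() helpers could be changed later if inet_flags runs out of room. A hedged sketch of why that works, assuming helpers built by pasting the flag name onto an enum prefix: call sites such as inet6_test_bit(SNDFLOW, sk) name only the flag, so relocating the bits would touch the macros rather than every caller. The demo_* macros, DEMO_FLAGS_* names and struct demo_sock are hypothetical, with C11 atomics standing in for the kernel's set_bit()/clear_bit().

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit numbers copied from the INET_FLAGS_* enum in include/net/inet_sock.h. */
enum {
	DEMO_FLAGS_REPFLOW = 27,
	DEMO_FLAGS_RTALERT_ISOLATE = 28,
	DEMO_FLAGS_SNDFLOW = 29,
};

/* unsigned long has at least 32 bits, so these bit numbers always fit. */
static_assert(DEMO_FLAGS_SNDFLOW < 32, "flag bits must fit in unsigned long");

/* Hypothetical stand-in for struct inet_sock. */
struct demo_sock {
	_Atomic unsigned long inet_flags;
};

/* Callers pass a bare flag name; ## pastes on the DEMO_FLAGS_ prefix.
 * Only these four macros know which word holds the bits. */
#define demo_test_bit(nr, sk) \
	((atomic_load(&(sk)->inet_flags) >> DEMO_FLAGS_##nr) & 1UL)
#define demo_set_bit(nr, sk) \
	((void)atomic_fetch_or(&(sk)->inet_flags, 1UL << DEMO_FLAGS_##nr))
#define demo_clear_bit(nr, sk) \
	((void)atomic_fetch_and(&(sk)->inet_flags, ~(1UL << DEMO_FLAGS_##nr)))
/* Any nonzero val sets the bit, as with the kernel's assign_bit(). */
#define demo_assign_bit(nr, sk, val) \
	((val) ? demo_set_bit(nr, sk) : demo_clear_bit(nr, sk))
```

Treating any nonzero value as true matters for calls like the inet6_create() hunk in patch 11, which passes a masked sysctl value (flowlabel_reflect & FLOWLABEL_REFLECT_ESTABLISHED) rather than a strict 0 or 1.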
* Re: [PATCH net-next 00/14] ipv6: round of data-races fixes
2023-09-12 16:01 [PATCH net-next 00/14] ipv6: round of data-races fixes Eric Dumazet
` (14 preceding siblings ...)
2023-09-14 10:25 ` [PATCH net-next 00/14] ipv6: round of data-races fixes Simon Horman
@ 2023-09-15 9:40 ` patchwork-bot+netdevbpf
15 siblings, 0 replies; 31+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-09-15 9:40 UTC
To: Eric Dumazet; +Cc: davem, kuba, pabeni, dsahern, netdev, eric.dumazet
Hello:
This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:
On Tue, 12 Sep 2023 16:01:58 +0000 you wrote:
> This series is inspired by one related syzbot report.
>
> Many reads and writes of inet6_sk(sk) fields are racy.
>
> Move 1-bit fields to inet->inet_flags to provide
> atomic safety. inet6_{test|set|clear|assign}_bit() helpers
> could be changed later if we need to make room in inet_flags.
>
> [...]
Here is the summary with links:
- [net-next,01/14] ipv6: lockless IPV6_UNICAST_HOPS implementation
https://git.kernel.org/netdev/net-next/c/b0adfba7ee77
- [net-next,02/14] ipv6: lockless IPV6_MULTICAST_LOOP implementation
https://git.kernel.org/netdev/net-next/c/d986f52124e0
- [net-next,03/14] ipv6: lockless IPV6_MULTICAST_HOPS implementation
https://git.kernel.org/netdev/net-next/c/2da23eb07c91
- [net-next,04/14] ipv6: lockless IPV6_MTU implementation
https://git.kernel.org/netdev/net-next/c/15f926c4457a
- [net-next,05/14] ipv6: lockless IPV6_MINHOPCOUNT implementation
https://git.kernel.org/netdev/net-next/c/273784d3c574
- [net-next,06/14] ipv6: lockless IPV6_RECVERR_RFC4884 implementation
https://git.kernel.org/netdev/net-next/c/dcae74622c05
- [net-next,07/14] ipv6: lockless IPV6_MULTICAST_ALL implementation
https://git.kernel.org/netdev/net-next/c/6559c0ff3bc2
- [net-next,08/14] ipv6: lockless IPV6_AUTOFLOWLABEL implementation
https://git.kernel.org/netdev/net-next/c/5121516b0c47
- [net-next,09/14] ipv6: lockless IPV6_DONTFRAG implementation
https://git.kernel.org/netdev/net-next/c/1086ca7cce29
- [net-next,10/14] ipv6: lockless IPV6_RECVERR implemetation
https://git.kernel.org/netdev/net-next/c/3fa29971c695
- [net-next,11/14] ipv6: move np->repflow to atomic flags
https://git.kernel.org/netdev/net-next/c/3cccda8db2cf
- [net-next,12/14] ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementation
https://git.kernel.org/netdev/net-next/c/83cd5eb654b3
- [net-next,13/14] ipv6: lockless IPV6_MTU_DISCOVER implementation
https://git.kernel.org/netdev/net-next/c/6b724bc4300b
- [net-next,14/14] ipv6: lockless IPV6_FLOWINFO_SEND implementation
https://git.kernel.org/netdev/net-next/c/859f8b265fc2
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html