* [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance
@ 2023-08-11 7:36 Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 01/15] inet: introduce inet->inet_flags Eric Dumazet
` (14 more replies)
0 siblings, 15 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
In this series, I converted 20 bits in "struct inet_sock" and made
them truly atomic.
This allows to implement many IP_ socket options in a lockless
fashion (no need to acquire socket lock), and fixes data-races
that were showing up in various KCSAN reports.
I also took care of IP_TTL/IP_MINTTL, but left few other options
for another series.
v2: addressed a feedback from a build bot in patch 9 by removing
unused issk variable in mptcp_setsockopt_sol_ip_set_transparent()
Added Acked-by: tags from Soheil (thanks!)
Eric Dumazet (15):
inet: introduce inet->inet_flags
inet: set/get simple options locklessly
inet: move inet->recverr to inet->inet_flags
inet: move inet->recverr_rfc4884 to inet->inet_flags
inet: move inet->freebind to inet->inet_flags
inet: move inet->hdrincl to inet->inet_flags
inet: move inet->mc_loop to inet->inet_frags
inet: move inet->mc_all to inet->inet_frags
inet: move inet->transparent to inet->inet_flags
inet: move inet->is_icsk to inet->inet_flags
inet: move inet->nodefrag to inet->inet_flags
inet: move inet->bind_address_no_port to inet->inet_flags
inet: move inet->defer_connect to inet->inet_flags
inet: implement lockless IP_TTL
inet: implement lockless IP_MINTTL
include/net/inet_connection_sock.h | 4 +-
include/net/inet_sock.h | 92 ++++---
include/net/ipv6.h | 3 +-
include/net/route.h | 2 +-
include/net/tcp.h | 2 +-
net/core/sock.c | 2 +-
net/dccp/ipv4.c | 4 +-
net/ipv4/af_inet.c | 16 +-
net/ipv4/cipso_ipv4.c | 4 +-
net/ipv4/igmp.c | 2 +-
net/ipv4/inet_diag.c | 22 +-
net/ipv4/inet_timewait_sock.c | 2 +-
net/ipv4/ip_output.c | 7 +-
net/ipv4/ip_sockglue.c | 405 +++++++++++++---------------
net/ipv4/netfilter/nf_defrag_ipv4.c | 2 +-
net/ipv4/ping.c | 7 +-
net/ipv4/raw.c | 26 +-
net/ipv4/route.c | 8 +-
net/ipv4/tcp.c | 12 +-
net/ipv4/tcp_fastopen.c | 2 +-
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_ipv4.c | 5 +-
net/ipv4/tcp_minisocks.c | 3 +-
net/ipv4/udp.c | 7 +-
net/ipv4/udp_tunnel_core.c | 2 +-
net/ipv6/af_inet6.c | 8 +-
net/ipv6/datagram.c | 2 +-
net/ipv6/ip6_output.c | 5 +-
net/ipv6/ipv6_sockglue.c | 12 +-
net/ipv6/raw.c | 16 +-
net/ipv6/udp.c | 2 +-
net/l2tp/l2tp_ip.c | 2 +-
net/mptcp/protocol.c | 12 +-
net/mptcp/sockopt.c | 19 +-
net/netfilter/ipvs/ip_vs_core.c | 4 +-
net/sctp/input.c | 2 +-
net/sctp/protocol.c | 2 +-
net/sctp/socket.c | 2 +-
38 files changed, 364 insertions(+), 367 deletions(-)
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 01/15] inet: introduce inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 02/15] inet: set/get simple options locklessly Eric Dumazet
` (13 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
Various inet fields are currently racy.
do_ip_setsockopt() and do_ip_getsockopt() are mostly holding
the socket lock, but some (fast) paths do not.
Use a new inet->inet_flags to hold atomic bits in the series.
Remove inet->cmsg_flags, and use instead 9 bits from inet_flags.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 55 +++++++++++++++++++++------
net/ipv4/ip_sockglue.c | 84 +++++++++++++++--------------------------
net/ipv4/ping.c | 5 ++-
net/ipv4/raw.c | 2 +-
net/ipv4/udp.c | 2 +-
net/ipv6/datagram.c | 2 +-
net/ipv6/udp.c | 2 +-
net/l2tp/l2tp_ip.c | 2 +-
8 files changed, 83 insertions(+), 71 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 0bb32bfc61832dd787abcb2db3ee85d55c83f2c9..e3b35b0015f335fefa350fd81797a9466ba32f32 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -194,6 +194,7 @@ struct rtable;
* @inet_rcv_saddr - Bound local IPv4 addr
* @inet_dport - Destination port
* @inet_num - Local port
+ * @inet_flags - various atomic flags
* @inet_saddr - Sending source
* @uc_ttl - Unicast TTL
* @inet_sport - Source port
@@ -218,11 +219,11 @@ struct inet_sock {
#define inet_dport sk.__sk_common.skc_dport
#define inet_num sk.__sk_common.skc_num
+ unsigned long inet_flags;
__be32 inet_saddr;
__s16 uc_ttl;
- __u16 cmsg_flags;
- struct ip_options_rcu __rcu *inet_opt;
__be16 inet_sport;
+ struct ip_options_rcu __rcu *inet_opt;
__u16 inet_id;
__u8 tos;
@@ -259,16 +260,48 @@ struct inet_sock {
#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
#define IPCORK_ALLFRAG 2 /* always fragment (for ipv6 for now) */
+enum {
+ INET_FLAGS_PKTINFO = 0,
+ INET_FLAGS_TTL = 1,
+ INET_FLAGS_TOS = 2,
+ INET_FLAGS_RECVOPTS = 3,
+ INET_FLAGS_RETOPTS = 4,
+ INET_FLAGS_PASSSEC = 5,
+ INET_FLAGS_ORIGDSTADDR = 6,
+ INET_FLAGS_CHECKSUM = 7,
+ INET_FLAGS_RECVFRAGSIZE = 8,
+};
+
/* cmsg flags for inet */
-#define IP_CMSG_PKTINFO BIT(0)
-#define IP_CMSG_TTL BIT(1)
-#define IP_CMSG_TOS BIT(2)
-#define IP_CMSG_RECVOPTS BIT(3)
-#define IP_CMSG_RETOPTS BIT(4)
-#define IP_CMSG_PASSSEC BIT(5)
-#define IP_CMSG_ORIGDSTADDR BIT(6)
-#define IP_CMSG_CHECKSUM BIT(7)
-#define IP_CMSG_RECVFRAGSIZE BIT(8)
+#define IP_CMSG_PKTINFO BIT(INET_FLAGS_PKTINFO)
+#define IP_CMSG_TTL BIT(INET_FLAGS_TTL)
+#define IP_CMSG_TOS BIT(INET_FLAGS_TOS)
+#define IP_CMSG_RECVOPTS BIT(INET_FLAGS_RECVOPTS)
+#define IP_CMSG_RETOPTS BIT(INET_FLAGS_RETOPTS)
+#define IP_CMSG_PASSSEC BIT(INET_FLAGS_PASSSEC)
+#define IP_CMSG_ORIGDSTADDR BIT(INET_FLAGS_ORIGDSTADDR)
+#define IP_CMSG_CHECKSUM BIT(INET_FLAGS_CHECKSUM)
+#define IP_CMSG_RECVFRAGSIZE BIT(INET_FLAGS_RECVFRAGSIZE)
+
+#define IP_CMSG_ALL (IP_CMSG_PKTINFO | IP_CMSG_TTL | \
+ IP_CMSG_TOS | IP_CMSG_RECVOPTS | \
+ IP_CMSG_RETOPTS | IP_CMSG_PASSSEC | \
+ IP_CMSG_ORIGDSTADDR | IP_CMSG_CHECKSUM | \
+ IP_CMSG_RECVFRAGSIZE)
+
+static inline unsigned long inet_cmsg_flags(const struct inet_sock *inet)
+{
+ return READ_ONCE(inet->inet_flags) & IP_CMSG_ALL;
+}
+
+#define inet_test_bit(nr, sk) \
+ test_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags)
+#define inet_set_bit(nr, sk) \
+ set_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags)
+#define inet_clear_bit(nr, sk) \
+ clear_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags)
+#define inet_assign_bit(nr, sk, val) \
+ assign_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags, val)
static inline bool sk_is_inet(struct sock *sk)
{
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index d41bce8927b2cca825a804dc113450b62262cc94..66f55f3db5ec88e1c771847444eba1d554aca8dc 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -171,8 +171,10 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb)
void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk,
struct sk_buff *skb, int tlen, int offset)
{
- struct inet_sock *inet = inet_sk(sk);
- unsigned int flags = inet->cmsg_flags;
+ unsigned long flags = inet_cmsg_flags(inet_sk(sk));
+
+ if (!flags)
+ return;
/* Ordered by supposed usage frequency */
if (flags & IP_CMSG_PKTINFO) {
@@ -568,7 +570,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
if (ipv4_datagram_support_cmsg(sk, skb, serr->ee.ee_origin)) {
sin->sin_family = AF_INET;
sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
- if (inet_sk(sk)->cmsg_flags)
+ if (inet_cmsg_flags(inet_sk(sk)))
ip_cmsg_recv(msg, skb);
}
@@ -635,7 +637,7 @@ EXPORT_SYMBOL(ip_sock_set_mtu_discover);
void ip_sock_set_pktinfo(struct sock *sk)
{
lock_sock(sk);
- inet_sk(sk)->cmsg_flags |= IP_CMSG_PKTINFO;
+ inet_set_bit(PKTINFO, sk);
release_sock(sk);
}
EXPORT_SYMBOL(ip_sock_set_pktinfo);
@@ -990,67 +992,43 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
break;
}
case IP_PKTINFO:
- if (val)
- inet->cmsg_flags |= IP_CMSG_PKTINFO;
- else
- inet->cmsg_flags &= ~IP_CMSG_PKTINFO;
+ inet_assign_bit(PKTINFO, sk, val);
break;
case IP_RECVTTL:
- if (val)
- inet->cmsg_flags |= IP_CMSG_TTL;
- else
- inet->cmsg_flags &= ~IP_CMSG_TTL;
+ inet_assign_bit(TTL, sk, val);
break;
case IP_RECVTOS:
- if (val)
- inet->cmsg_flags |= IP_CMSG_TOS;
- else
- inet->cmsg_flags &= ~IP_CMSG_TOS;
+ inet_assign_bit(TOS, sk, val);
break;
case IP_RECVOPTS:
- if (val)
- inet->cmsg_flags |= IP_CMSG_RECVOPTS;
- else
- inet->cmsg_flags &= ~IP_CMSG_RECVOPTS;
+ inet_assign_bit(RECVOPTS, sk, val);
break;
case IP_RETOPTS:
- if (val)
- inet->cmsg_flags |= IP_CMSG_RETOPTS;
- else
- inet->cmsg_flags &= ~IP_CMSG_RETOPTS;
+ inet_assign_bit(RETOPTS, sk, val);
break;
case IP_PASSSEC:
- if (val)
- inet->cmsg_flags |= IP_CMSG_PASSSEC;
- else
- inet->cmsg_flags &= ~IP_CMSG_PASSSEC;
+ inet_assign_bit(PASSSEC, sk, val);
break;
case IP_RECVORIGDSTADDR:
- if (val)
- inet->cmsg_flags |= IP_CMSG_ORIGDSTADDR;
- else
- inet->cmsg_flags &= ~IP_CMSG_ORIGDSTADDR;
+ inet_assign_bit(ORIGDSTADDR, sk, val);
break;
case IP_CHECKSUM:
if (val) {
- if (!(inet->cmsg_flags & IP_CMSG_CHECKSUM)) {
+ if (!(inet_test_bit(CHECKSUM, sk))) {
inet_inc_convert_csum(sk);
- inet->cmsg_flags |= IP_CMSG_CHECKSUM;
+ inet_set_bit(CHECKSUM, sk);
}
} else {
- if (inet->cmsg_flags & IP_CMSG_CHECKSUM) {
+ if (inet_test_bit(CHECKSUM, sk)) {
inet_dec_convert_csum(sk);
- inet->cmsg_flags &= ~IP_CMSG_CHECKSUM;
+ inet_clear_bit(CHECKSUM, sk);
}
}
break;
case IP_RECVFRAGSIZE:
if (sk->sk_type != SOCK_RAW && sk->sk_type != SOCK_DGRAM)
goto e_inval;
- if (val)
- inet->cmsg_flags |= IP_CMSG_RECVFRAGSIZE;
- else
- inet->cmsg_flags &= ~IP_CMSG_RECVFRAGSIZE;
+ inet_assign_bit(RECVFRAGSIZE, sk, val);
break;
case IP_TOS: /* This sets both TOS and Precedence */
__ip_sock_set_tos(sk, val);
@@ -1415,7 +1393,7 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
{
struct in_pktinfo *pktinfo = PKTINFO_SKB_CB(skb);
- bool prepare = (inet_sk(sk)->cmsg_flags & IP_CMSG_PKTINFO) ||
+ bool prepare = inet_test_bit(PKTINFO, sk) ||
ipv6_sk_rxinfo(sk);
if (prepare && skb_rtable(skb)) {
@@ -1601,31 +1579,31 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
return 0;
}
case IP_PKTINFO:
- val = (inet->cmsg_flags & IP_CMSG_PKTINFO) != 0;
+ val = inet_test_bit(PKTINFO, sk);
break;
case IP_RECVTTL:
- val = (inet->cmsg_flags & IP_CMSG_TTL) != 0;
+ val = inet_test_bit(TTL, sk);
break;
case IP_RECVTOS:
- val = (inet->cmsg_flags & IP_CMSG_TOS) != 0;
+ val = inet_test_bit(TOS, sk);
break;
case IP_RECVOPTS:
- val = (inet->cmsg_flags & IP_CMSG_RECVOPTS) != 0;
+ val = inet_test_bit(RECVOPTS, sk);
break;
case IP_RETOPTS:
- val = (inet->cmsg_flags & IP_CMSG_RETOPTS) != 0;
+ val = inet_test_bit(RETOPTS, sk);
break;
case IP_PASSSEC:
- val = (inet->cmsg_flags & IP_CMSG_PASSSEC) != 0;
+ val = inet_test_bit(PASSSEC, sk);
break;
case IP_RECVORIGDSTADDR:
- val = (inet->cmsg_flags & IP_CMSG_ORIGDSTADDR) != 0;
+ val = inet_test_bit(ORIGDSTADDR, sk);
break;
case IP_CHECKSUM:
- val = (inet->cmsg_flags & IP_CMSG_CHECKSUM) != 0;
+ val = inet_test_bit(CHECKSUM, sk);
break;
case IP_RECVFRAGSIZE:
- val = (inet->cmsg_flags & IP_CMSG_RECVFRAGSIZE) != 0;
+ val = inet_test_bit(RECVFRAGSIZE, sk);
break;
case IP_TOS:
val = inet->tos;
@@ -1737,7 +1715,7 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
msg.msg_controllen = len;
msg.msg_flags = in_compat_syscall() ? MSG_CMSG_COMPAT : 0;
- if (inet->cmsg_flags & IP_CMSG_PKTINFO) {
+ if (inet_test_bit(PKTINFO, sk)) {
struct in_pktinfo info;
info.ipi_addr.s_addr = inet->inet_rcv_saddr;
@@ -1745,11 +1723,11 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
info.ipi_ifindex = inet->mc_index;
put_cmsg(&msg, SOL_IP, IP_PKTINFO, sizeof(info), &info);
}
- if (inet->cmsg_flags & IP_CMSG_TTL) {
+ if (inet_test_bit(TTL, sk)) {
int hlim = inet->mc_ttl;
put_cmsg(&msg, SOL_IP, IP_TTL, sizeof(hlim), &hlim);
}
- if (inet->cmsg_flags & IP_CMSG_TOS) {
+ if (inet_test_bit(TOS, sk)) {
int tos = inet->rcv_tos;
put_cmsg(&msg, SOL_IP, IP_TOS, sizeof(tos), &tos);
}
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 25dd78cee1792fdbd46873e58a503ab5a45d85b2..7e8702cb6634465f5e319a10e8f845093a354f47 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -894,7 +894,7 @@ int ping_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
*addr_len = sizeof(*sin);
}
- if (isk->cmsg_flags)
+ if (inet_cmsg_flags(isk))
ip_cmsg_recv(msg, skb);
#if IS_ENABLED(CONFIG_IPV6)
@@ -921,7 +921,8 @@ int ping_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
if (skb->protocol == htons(ETH_P_IPV6) &&
inet6_sk(sk)->rxopt.all)
pingv6_ops.ip6_datagram_recv_specific_ctl(sk, msg, skb);
- else if (skb->protocol == htons(ETH_P_IP) && isk->cmsg_flags)
+ else if (skb->protocol == htons(ETH_P_IP) &&
+ inet_cmsg_flags(isk))
ip_cmsg_recv(msg, skb);
#endif
} else {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index cb381f5aa46438394cdec520a99f7a8bc67fcfb9..e6e813f4aa317c3a5f242776a889610ccc1aa72f 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -767,7 +767,7 @@ static int raw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
memset(&sin->sin_zero, 0, sizeof(sin->sin_zero));
*addr_len = sizeof(*sin);
}
- if (inet->cmsg_flags)
+ if (inet_cmsg_flags(inet))
ip_cmsg_recv(msg, skb);
if (flags & MSG_TRUNC)
copied = skb->len;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3e2f29c14fa8c4ec10009b08e5740b7be052132d..4b791133989c0abe4f869ef0c56649c9d671db1a 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1870,7 +1870,7 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
if (udp_sk(sk)->gro_enabled)
udp_cmsg_recv(msg, sk, skb);
- if (inet->cmsg_flags)
+ if (inet_cmsg_flags(inet))
ip_cmsg_recv_offset(msg, sk, skb, sizeof(struct udphdr), off);
err = copied;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index d80d6024cafa90932ba7d749c0b8d4cd8f9d9cc3..41ebc4e574734456357169e883c3d13e42fa66b2 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -524,7 +524,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
} else {
ipv6_addr_set_v4mapped(ip_hdr(skb)->saddr,
&sin->sin6_addr);
- if (inet_sk(sk)->cmsg_flags)
+ if (inet_cmsg_flags(inet_sk(sk)))
ip_cmsg_recv(msg, skb);
}
}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 1ea01b0d9be383b2f7095ca40d89fd1599e0e33f..ebc6ae47cfeadc699e3f5a1f46be85803ff37fdd 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -420,7 +420,7 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
ip6_datagram_recv_common_ctl(sk, msg, skb);
if (is_udp4) {
- if (inet->cmsg_flags)
+ if (inet_cmsg_flags(inet))
ip_cmsg_recv_offset(msg, sk, skb,
sizeof(struct udphdr), off);
} else {
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index f9073bc7281f96267f6b40b830a19fa0e8df140f..9a2a9ed3ba478b9d00885b1a00e87f0edde5cb33 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -552,7 +552,7 @@ static int l2tp_ip_recvmsg(struct sock *sk, struct msghdr *msg,
memset(&sin->sin_zero, 0, sizeof(sin->sin_zero));
*addr_len = sizeof(*sin);
}
- if (inet->cmsg_flags)
+ if (inet_cmsg_flags(inet))
ip_cmsg_recv(msg, skb);
if (flags & MSG_TRUNC)
copied = skb->len;
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 02/15] inet: set/get simple options locklessly
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 01/15] inet: introduce inet->inet_flags Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 03/15] inet: move inet->recverr to inet->inet_flags Eric Dumazet
` (12 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
Now we have inet->inet_flags, we can set following options
without having to hold the socket lock:
IP_PKTINFO, IP_RECVTTL, IP_RECVTOS, IP_RECVOPTS, IP_RETOPTS,
IP_PASSSEC, IP_RECVORIGDSTADDR, IP_RECVFRAGSIZE.
ip_sock_set_pktinfo() no longer hold the socket lock.
Similarly we can get the following options whithout holding
the socket lock:
IP_PKTINFO, IP_RECVTTL, IP_RECVTOS, IP_RECVOPTS, IP_RETOPTS,
IP_PASSSEC, IP_RECVORIGDSTADDR, IP_CHECKSUM, IP_RECVFRAGSIZE.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
net/ipv4/ip_sockglue.c | 118 ++++++++++++++++++++++-------------------
1 file changed, 62 insertions(+), 56 deletions(-)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 66f55f3db5ec88e1c771847444eba1d554aca8dc..69b87518348aa5697edc6d88679384f00681f539 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -636,9 +636,7 @@ EXPORT_SYMBOL(ip_sock_set_mtu_discover);
void ip_sock_set_pktinfo(struct sock *sk)
{
- lock_sock(sk);
inet_set_bit(PKTINFO, sk);
- release_sock(sk);
}
EXPORT_SYMBOL(ip_sock_set_pktinfo);
@@ -952,6 +950,36 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
if (ip_mroute_opt(optname))
return ip_mroute_setsockopt(sk, optname, optval, optlen);
+ /* Handle options that can be set without locking the socket. */
+ switch (optname) {
+ case IP_PKTINFO:
+ inet_assign_bit(PKTINFO, sk, val);
+ return 0;
+ case IP_RECVTTL:
+ inet_assign_bit(TTL, sk, val);
+ return 0;
+ case IP_RECVTOS:
+ inet_assign_bit(TOS, sk, val);
+ return 0;
+ case IP_RECVOPTS:
+ inet_assign_bit(RECVOPTS, sk, val);
+ return 0;
+ case IP_RETOPTS:
+ inet_assign_bit(RETOPTS, sk, val);
+ return 0;
+ case IP_PASSSEC:
+ inet_assign_bit(PASSSEC, sk, val);
+ return 0;
+ case IP_RECVORIGDSTADDR:
+ inet_assign_bit(ORIGDSTADDR, sk, val);
+ return 0;
+ case IP_RECVFRAGSIZE:
+ if (sk->sk_type != SOCK_RAW && sk->sk_type != SOCK_DGRAM)
+ return -EINVAL;
+ inet_assign_bit(RECVFRAGSIZE, sk, val);
+ return 0;
+ }
+
err = 0;
if (needs_rtnl)
rtnl_lock();
@@ -991,27 +1019,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
kfree_rcu(old, rcu);
break;
}
- case IP_PKTINFO:
- inet_assign_bit(PKTINFO, sk, val);
- break;
- case IP_RECVTTL:
- inet_assign_bit(TTL, sk, val);
- break;
- case IP_RECVTOS:
- inet_assign_bit(TOS, sk, val);
- break;
- case IP_RECVOPTS:
- inet_assign_bit(RECVOPTS, sk, val);
- break;
- case IP_RETOPTS:
- inet_assign_bit(RETOPTS, sk, val);
- break;
- case IP_PASSSEC:
- inet_assign_bit(PASSSEC, sk, val);
- break;
- case IP_RECVORIGDSTADDR:
- inet_assign_bit(ORIGDSTADDR, sk, val);
- break;
case IP_CHECKSUM:
if (val) {
if (!(inet_test_bit(CHECKSUM, sk))) {
@@ -1025,11 +1032,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
}
}
break;
- case IP_RECVFRAGSIZE:
- if (sk->sk_type != SOCK_RAW && sk->sk_type != SOCK_DGRAM)
- goto e_inval;
- inet_assign_bit(RECVFRAGSIZE, sk, val);
- break;
case IP_TOS: /* This sets both TOS and Precedence */
__ip_sock_set_tos(sk, val);
break;
@@ -1544,6 +1546,37 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
if (len < 0)
return -EINVAL;
+ /* Handle options that can be read without locking the socket. */
+ switch (optname) {
+ case IP_PKTINFO:
+ val = inet_test_bit(PKTINFO, sk);
+ goto copyval;
+ case IP_RECVTTL:
+ val = inet_test_bit(TTL, sk);
+ goto copyval;
+ case IP_RECVTOS:
+ val = inet_test_bit(TOS, sk);
+ goto copyval;
+ case IP_RECVOPTS:
+ val = inet_test_bit(RECVOPTS, sk);
+ goto copyval;
+ case IP_RETOPTS:
+ val = inet_test_bit(RETOPTS, sk);
+ goto copyval;
+ case IP_PASSSEC:
+ val = inet_test_bit(PASSSEC, sk);
+ goto copyval;
+ case IP_RECVORIGDSTADDR:
+ val = inet_test_bit(ORIGDSTADDR, sk);
+ goto copyval;
+ case IP_CHECKSUM:
+ val = inet_test_bit(CHECKSUM, sk);
+ goto copyval;
+ case IP_RECVFRAGSIZE:
+ val = inet_test_bit(RECVFRAGSIZE, sk);
+ goto copyval;
+ }
+
if (needs_rtnl)
rtnl_lock();
sockopt_lock_sock(sk);
@@ -1578,33 +1611,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
return -EFAULT;
return 0;
}
- case IP_PKTINFO:
- val = inet_test_bit(PKTINFO, sk);
- break;
- case IP_RECVTTL:
- val = inet_test_bit(TTL, sk);
- break;
- case IP_RECVTOS:
- val = inet_test_bit(TOS, sk);
- break;
- case IP_RECVOPTS:
- val = inet_test_bit(RECVOPTS, sk);
- break;
- case IP_RETOPTS:
- val = inet_test_bit(RETOPTS, sk);
- break;
- case IP_PASSSEC:
- val = inet_test_bit(PASSSEC, sk);
- break;
- case IP_RECVORIGDSTADDR:
- val = inet_test_bit(ORIGDSTADDR, sk);
- break;
- case IP_CHECKSUM:
- val = inet_test_bit(CHECKSUM, sk);
- break;
- case IP_RECVFRAGSIZE:
- val = inet_test_bit(RECVFRAGSIZE, sk);
- break;
case IP_TOS:
val = inet->tos;
break;
@@ -1754,7 +1760,7 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
return -ENOPROTOOPT;
}
sockopt_release_sock(sk);
-
+copyval:
if (len < sizeof(int) && len > 0 && val >= 0 && val <= 255) {
unsigned char ucval = (unsigned char)val;
len = 1;
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 03/15] inet: move inet->recverr to inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 01/15] inet: introduce inet->inet_flags Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 02/15] inet: set/get simple options locklessly Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 04/15] inet: move inet->recverr_rfc4884 " Eric Dumazet
` (11 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
IP_RECVERR socket option can now be set/get without locking the socket.
This patch potentially avoid data-races around inet->recverr.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 5 +++--
net/dccp/ipv4.c | 4 +---
net/ipv4/inet_diag.c | 2 +-
net/ipv4/ip_sockglue.c | 23 ++++++++++-------------
net/ipv4/ping.c | 2 +-
net/ipv4/raw.c | 14 ++++++++------
net/ipv4/tcp_ipv4.c | 5 ++---
net/ipv4/udp.c | 5 +++--
net/sctp/input.c | 2 +-
9 files changed, 30 insertions(+), 32 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index e3b35b0015f335fefa350fd81797a9466ba32f32..552188aa5a2d2f968b1d95e963d48a063ec4fd59 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -230,8 +230,7 @@ struct inet_sock {
__u8 min_ttl;
__u8 mc_ttl;
__u8 pmtudisc;
- __u8 recverr:1,
- is_icsk:1,
+ __u8 is_icsk:1,
freebind:1,
hdrincl:1,
mc_loop:1,
@@ -270,6 +269,8 @@ enum {
INET_FLAGS_ORIGDSTADDR = 6,
INET_FLAGS_CHECKSUM = 7,
INET_FLAGS_RECVFRAGSIZE = 8,
+
+ INET_FLAGS_RECVERR = 9,
};
/* cmsg flags for inet */
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 8e919cfe6e2333d066b0ce08b752c1c89dc8fe64..8dd6837c476a96071f39ef63b517a15b7b1e8cb0 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -247,7 +247,6 @@ static int dccp_v4_err(struct sk_buff *skb, u32 info)
const u8 offset = iph->ihl << 2;
const struct dccp_hdr *dh;
struct dccp_sock *dp;
- struct inet_sock *inet;
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
struct sock *sk;
@@ -361,8 +360,7 @@ static int dccp_v4_err(struct sk_buff *skb, u32 info)
* --ANK (980905)
*/
- inet = inet_sk(sk);
- if (!sock_owned_by_user(sk) && inet->recverr) {
+ if (!sock_owned_by_user(sk) && inet_test_bit(RECVERR, sk)) {
sk->sk_err = err;
sk_error_report(sk);
} else { /* Only an error on timeout */
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index f7426926a10413b42ec3b99d97f59445b6d1becc..25d5f76b66bd82be2c2abc6bd5206ec54f736be6 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -182,7 +182,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
r->idiag_inode = sock_i_ino(sk);
memset(&inet_sockopt, 0, sizeof(inet_sockopt));
- inet_sockopt.recverr = inet->recverr;
+ inet_sockopt.recverr = inet_test_bit(RECVERR, sk);
inet_sockopt.is_icsk = inet->is_icsk;
inet_sockopt.freebind = inet->freebind;
inet_sockopt.hdrincl = inet->hdrincl;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 69b87518348aa5697edc6d88679384f00681f539..8283d862a9dbb5040db4e419e9dff31bbd3cff81 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -446,12 +446,11 @@ EXPORT_SYMBOL_GPL(ip_icmp_error);
void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 info)
{
- struct inet_sock *inet = inet_sk(sk);
struct sock_exterr_skb *serr;
struct iphdr *iph;
struct sk_buff *skb;
- if (!inet->recverr)
+ if (!inet_test_bit(RECVERR, sk))
return;
skb = alloc_skb(sizeof(struct iphdr), GFP_ATOMIC);
@@ -617,9 +616,7 @@ EXPORT_SYMBOL(ip_sock_set_freebind);
void ip_sock_set_recverr(struct sock *sk)
{
- lock_sock(sk);
- inet_sk(sk)->recverr = true;
- release_sock(sk);
+ inet_set_bit(RECVERR, sk);
}
EXPORT_SYMBOL(ip_sock_set_recverr);
@@ -978,6 +975,11 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
inet_assign_bit(RECVFRAGSIZE, sk, val);
return 0;
+ case IP_RECVERR:
+ inet_assign_bit(RECVERR, sk, val);
+ if (!val)
+ skb_queue_purge(&sk->sk_error_queue);
+ return 0;
}
err = 0;
@@ -1064,11 +1066,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
inet->pmtudisc = val;
break;
- case IP_RECVERR:
- inet->recverr = !!val;
- if (!val)
- skb_queue_purge(&sk->sk_error_queue);
- break;
case IP_RECVERR_RFC4884:
if (val < 0 || val > 1)
goto e_inval;
@@ -1575,6 +1572,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_RECVFRAGSIZE:
val = inet_test_bit(RECVFRAGSIZE, sk);
goto copyval;
+ case IP_RECVERR:
+ val = inet_test_bit(RECVERR, sk);
+ goto copyval;
}
if (needs_rtnl)
@@ -1649,9 +1649,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
}
break;
}
- case IP_RECVERR:
- val = inet->recverr;
- break;
case IP_RECVERR_RFC4884:
val = inet->recverr_rfc4884;
break;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 7e8702cb6634465f5e319a10e8f845093a354f47..75e0aee35eb787a6c9f70394294b30490c980a64 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -580,7 +580,7 @@ void ping_err(struct sk_buff *skb, int offset, u32 info)
* RFC1122: OK. Passes ICMP errors back to application, as per
* 4.1.3.3.
*/
- if ((family == AF_INET && !inet_sock->recverr) ||
+ if ((family == AF_INET && !inet_test_bit(RECVERR, sk)) ||
(family == AF_INET6 && !inet6_sk(sk)->recverr)) {
if (!harderr || sk->sk_state != TCP_ESTABLISHED)
goto out;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index e6e813f4aa317c3a5f242776a889610ccc1aa72f..f4c27dc5714bd4be7bbd4a8e5b614c9426e6b987 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -203,8 +203,9 @@ static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info)
struct inet_sock *inet = inet_sk(sk);
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
- int err = 0;
int harderr = 0;
+ bool recverr;
+ int err = 0;
if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED)
ipv4_sk_update_pmtu(skb, sk, info);
@@ -218,7 +219,8 @@ static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info)
2. Socket is connected (otherwise the error indication
is useless without ip_recverr and error is hard.
*/
- if (!inet->recverr && sk->sk_state != TCP_ESTABLISHED)
+ recverr = inet_test_bit(RECVERR, sk);
+ if (!recverr && sk->sk_state != TCP_ESTABLISHED)
return;
switch (type) {
@@ -245,7 +247,7 @@ static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info)
}
}
- if (inet->recverr) {
+ if (recverr) {
const struct iphdr *iph = (const struct iphdr *)skb->data;
u8 *payload = skb->data + (iph->ihl << 2);
@@ -254,7 +256,7 @@ static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info)
ip_icmp_error(sk, skb, err, 0, info, payload);
}
- if (inet->recverr || harderr) {
+ if (recverr || harderr) {
sk->sk_err = err;
sk_error_report(sk);
}
@@ -413,7 +415,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
kfree_skb(skb);
error:
IP_INC_STATS(net, IPSTATS_MIB_OUTDISCARDS);
- if (err == -ENOBUFS && !inet->recverr)
+ if (err == -ENOBUFS && !inet_test_bit(RECVERR, sk))
err = 0;
return err;
}
@@ -645,7 +647,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
ip_flush_pending_frames(sk);
else if (!(msg->msg_flags & MSG_MORE)) {
err = ip_push_pending_frames(sk, &fl4);
- if (err == -ENOBUFS && !inet->recverr)
+ if (err == -ENOBUFS && !inet_test_bit(RECVERR, sk))
err = 0;
}
release_sock(sk);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5b18a048f613e9ab2807c4774882df6320754a8d..2a662d5f3072f5eef5398314ac9a91703ac816bb 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -477,7 +477,6 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
const struct iphdr *iph = (const struct iphdr *)skb->data;
struct tcphdr *th = (struct tcphdr *)(skb->data + (iph->ihl << 2));
struct tcp_sock *tp;
- struct inet_sock *inet;
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
struct sock *sk;
@@ -625,8 +624,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
* --ANK (980905)
*/
- inet = inet_sk(sk);
- if (!sock_owned_by_user(sk) && inet->recverr) {
+ if (!sock_owned_by_user(sk) &&
+ inet_test_bit(RECVERR, sk)) {
WRITE_ONCE(sk->sk_err, err);
sk_error_report(sk);
} else { /* Only an error on timeout */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4b791133989c0abe4f869ef0c56649c9d671db1a..0794a2c46a568d644cc488c1d7f6ee676180a5bd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -779,7 +779,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
(u8 *)(uh+1));
goto out;
}
- if (!inet->recverr) {
+ if (!inet_test_bit(RECVERR, sk)) {
if (!harderr || sk->sk_state != TCP_ESTABLISHED)
goto out;
} else
@@ -962,7 +962,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
send:
err = ip_send_skb(sock_net(sk), skb);
if (err) {
- if (err == -ENOBUFS && !inet->recverr) {
+ if (err == -ENOBUFS &&
+ !inet_test_bit(RECVERR, sk)) {
UDP_INC_STATS(sock_net(sk),
UDP_MIB_SNDBUFERRORS, is_udplite);
err = 0;
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 2613c4d74b1699aad9e480663600841059fc0d6b..17fcaa9b0df9452bbfe7c3bb4b2d300e6ca6ce40 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -581,7 +581,7 @@ static void sctp_v4_err_handle(struct sctp_transport *t, struct sk_buff *skb,
default:
return;
}
- if (!sock_owned_by_user(sk) && inet_sk(sk)->recverr) {
+ if (!sock_owned_by_user(sk) && inet_test_bit(RECVERR, sk)) {
sk->sk_err = err;
sk_error_report(sk);
} else { /* Only an error on timeout */
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 04/15] inet: move inet->recverr_rfc4884 to inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (2 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 03/15] inet: move inet->recverr to inet->inet_flags Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 05/15] inet: move inet->freebind " Eric Dumazet
` (10 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
IP_RECVERR_RFC4884 socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 2 +-
net/ipv4/inet_diag.c | 2 +-
net/ipv4/ip_sockglue.c | 18 +++++++++---------
3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 552188aa5a2d2f968b1d95e963d48a063ec4fd59..c01f1f64a8617582c68079048f74e0db606e1834 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -238,7 +238,6 @@ struct inet_sock {
mc_all:1,
nodefrag:1;
__u8 bind_address_no_port:1,
- recverr_rfc4884:1,
defer_connect:1; /* Indicates that fastopen_connect is set
* and cookie exists so we defer connect
* until first data frame is written
@@ -271,6 +270,7 @@ enum {
INET_FLAGS_RECVFRAGSIZE = 8,
INET_FLAGS_RECVERR = 9,
+ INET_FLAGS_RECVERR_RFC4884 = 10,
};
/* cmsg flags for inet */
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 25d5f76b66bd82be2c2abc6bd5206ec54f736be6..6255d6fdbc80d82904583a8fc6c439a25e875a0b 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -191,7 +191,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
inet_sockopt.mc_all = inet->mc_all;
inet_sockopt.nodefrag = inet->nodefrag;
inet_sockopt.bind_address_no_port = inet->bind_address_no_port;
- inet_sockopt.recverr_rfc4884 = inet->recverr_rfc4884;
+ inet_sockopt.recverr_rfc4884 = inet_test_bit(RECVERR_RFC4884, sk);
inet_sockopt.defer_connect = inet->defer_connect;
if (nla_put(skb, INET_DIAG_SOCKOPT, sizeof(inet_sockopt),
&inet_sockopt))
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 8283d862a9dbb5040db4e419e9dff31bbd3cff81..f75f44ad7b11ac169b343b3c26d744cdc81d747c 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -433,7 +433,7 @@ void ip_icmp_error(struct sock *sk, struct sk_buff *skb, int err,
serr->port = port;
if (skb_pull(skb, payload - skb->data)) {
- if (inet_sk(sk)->recverr_rfc4884)
+ if (inet_test_bit(RECVERR_RFC4884, sk))
ipv4_icmp_error_rfc4884(skb, &serr->ee.ee_rfc4884);
skb_reset_transport_header(skb);
@@ -980,6 +980,11 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
if (!val)
skb_queue_purge(&sk->sk_error_queue);
return 0;
+ case IP_RECVERR_RFC4884:
+ if (val < 0 || val > 1)
+ return -EINVAL;
+ inet_assign_bit(RECVERR_RFC4884, sk, val);
+ return 0;
}
err = 0;
@@ -1066,11 +1071,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
inet->pmtudisc = val;
break;
- case IP_RECVERR_RFC4884:
- if (val < 0 || val > 1)
- goto e_inval;
- inet->recverr_rfc4884 = !!val;
- break;
case IP_MULTICAST_TTL:
if (sk->sk_type == SOCK_STREAM)
goto e_inval;
@@ -1575,6 +1575,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_RECVERR:
val = inet_test_bit(RECVERR, sk);
goto copyval;
+ case IP_RECVERR_RFC4884:
+ val = inet_test_bit(RECVERR_RFC4884, sk);
+ goto copyval;
}
if (needs_rtnl)
@@ -1649,9 +1652,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
}
break;
}
- case IP_RECVERR_RFC4884:
- val = inet->recverr_rfc4884;
- break;
case IP_MULTICAST_TTL:
val = inet->mc_ttl;
break;
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 05/15] inet: move inet->freebind to inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (3 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 04/15] inet: move inet->recverr_rfc4884 " Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 06/15] inet: move inet->hdrincl " Eric Dumazet
` (9 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
IP_FREEBIND socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 5 +++--
include/net/ipv6.h | 3 ++-
net/ipv4/inet_diag.c | 2 +-
net/ipv4/ip_sockglue.c | 21 +++++++++------------
net/ipv6/ipv6_sockglue.c | 4 ++--
net/mptcp/sockopt.c | 8 +++++---
net/sctp/protocol.c | 2 +-
7 files changed, 23 insertions(+), 22 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index c01f1f64a8617582c68079048f74e0db606e1834..d6ba963534b4a5aa5dc6f88b94dd36f260be765b 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -231,7 +231,6 @@ struct inet_sock {
__u8 mc_ttl;
__u8 pmtudisc;
__u8 is_icsk:1,
- freebind:1,
hdrincl:1,
mc_loop:1,
transparent:1,
@@ -271,6 +270,7 @@ enum {
INET_FLAGS_RECVERR = 9,
INET_FLAGS_RECVERR_RFC4884 = 10,
+ INET_FLAGS_FREEBIND = 11,
};
/* cmsg flags for inet */
@@ -423,7 +423,8 @@ static inline bool inet_can_nonlocal_bind(struct net *net,
struct inet_sock *inet)
{
return READ_ONCE(net->ipv4.sysctl_ip_nonlocal_bind) ||
- inet->freebind || inet->transparent;
+ test_bit(INET_FLAGS_FREEBIND, &inet->inet_flags) ||
+ inet->transparent;
}
static inline bool inet_addr_valid_or_nonlocal(struct net *net,
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 2acc4c808d45d1c1bb1c5076e79842e136203e4c..5f513503e7d568c189a7b14439612f4e27ba539b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -937,7 +937,8 @@ static inline bool ipv6_can_nonlocal_bind(struct net *net,
struct inet_sock *inet)
{
return net->ipv6.sysctl.ip_nonlocal_bind ||
- inet->freebind || inet->transparent;
+ test_bit(INET_FLAGS_FREEBIND, &inet->inet_flags) ||
+ inet->transparent;
}
/* Sysctl settings for net ipv6.auto_flowlabels */
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 6255d6fdbc80d82904583a8fc6c439a25e875a0b..5a96f4f28eca6ae6e84cb3761531309e8da0be09 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -184,7 +184,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
memset(&inet_sockopt, 0, sizeof(inet_sockopt));
inet_sockopt.recverr = inet_test_bit(RECVERR, sk);
inet_sockopt.is_icsk = inet->is_icsk;
- inet_sockopt.freebind = inet->freebind;
+ inet_sockopt.freebind = inet_test_bit(FREEBIND, sk);
inet_sockopt.hdrincl = inet->hdrincl;
inet_sockopt.mc_loop = inet->mc_loop;
inet_sockopt.transparent = inet->transparent;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index f75f44ad7b11ac169b343b3c26d744cdc81d747c..6af84310631288c07f26c19734c5abc0fd82dc23 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -608,9 +608,7 @@ EXPORT_SYMBOL(ip_sock_set_tos);
void ip_sock_set_freebind(struct sock *sk)
{
- lock_sock(sk);
- inet_sk(sk)->freebind = true;
- release_sock(sk);
+ inet_set_bit(FREEBIND, sk);
}
EXPORT_SYMBOL(ip_sock_set_freebind);
@@ -985,6 +983,11 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
inet_assign_bit(RECVERR_RFC4884, sk, val);
return 0;
+ case IP_FREEBIND:
+ if (optlen < 1)
+ return -EINVAL;
+ inet_assign_bit(FREEBIND, sk, val);
+ return 0;
}
err = 0;
@@ -1310,12 +1313,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
inet->mc_all = val;
break;
- case IP_FREEBIND:
- if (optlen < 1)
- goto e_inval;
- inet->freebind = !!val;
- break;
-
case IP_IPSEC_POLICY:
case IP_XFRM_POLICY:
err = -EPERM;
@@ -1578,6 +1575,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_RECVERR_RFC4884:
val = inet_test_bit(RECVERR_RFC4884, sk);
goto copyval;
+ case IP_FREEBIND:
+ val = inet_test_bit(FREEBIND, sk);
+ goto copyval;
}
if (needs_rtnl)
@@ -1737,9 +1737,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
len -= msg.msg_controllen;
return copy_to_sockptr(optlen, &len, sizeof(int));
}
- case IP_FREEBIND:
- val = inet->freebind;
- break;
case IP_TRANSPARENT:
val = inet->transparent;
break;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index ca377159967c8aa9c18a80f9b189f4ef41398d01..3eb38436f8d431ca37200869bfe57ec33b46bf8b 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -641,7 +641,7 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
if (optlen < sizeof(int))
goto e_inval;
/* we also don't have a separate freebind bit for IPV6 */
- inet_sk(sk)->freebind = valbool;
+ inet_assign_bit(FREEBIND, sk, valbool);
retv = 0;
break;
@@ -1334,7 +1334,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_FREEBIND:
- val = inet_sk(sk)->freebind;
+ val = inet_test_bit(FREEBIND, sk);
break;
case IPV6_RECVORIGDSTADDR:
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index a3f1fe810cc961bf689fe8edda49d227a3170f91..1f3331f9f7c85f3b2a1e8dc03cf80be73af4ed0d 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -419,7 +419,8 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
inet_sk(sk)->transparent = inet_sk(ssock->sk)->transparent;
break;
case IPV6_FREEBIND:
- inet_sk(sk)->freebind = inet_sk(ssock->sk)->freebind;
+ inet_assign_bit(FREEBIND, sk,
+ inet_test_bit(FREEBIND, ssock->sk));
break;
}
@@ -704,7 +705,8 @@ static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int o
switch (optname) {
case IP_FREEBIND:
- issk->freebind = inet_sk(sk)->freebind;
+ inet_assign_bit(FREEBIND, ssock->sk,
+ inet_test_bit(FREEBIND, sk));
break;
case IP_TRANSPARENT:
issk->transparent = inet_sk(sk)->transparent;
@@ -1442,7 +1444,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
__tcp_sock_set_nodelay(ssk, !!msk->nodelay);
inet_sk(ssk)->transparent = inet_sk(sk)->transparent;
- inet_sk(ssk)->freebind = inet_sk(sk)->freebind;
+ inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
}
static void __mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 33c0895e101c08d042f16adad7d6ea5ff2bc05c0..2185f44198deb002bc8ed7f1b0f3fe02d6bb9f09 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -360,7 +360,7 @@ static int sctp_v4_available(union sctp_addr *addr, struct sctp_sock *sp)
ret = inet_addr_type_table(net, addr->v4.sin_addr.s_addr, tb_id);
if (addr->v4.sin_addr.s_addr != htonl(INADDR_ANY) &&
ret != RTN_LOCAL &&
- !sp->inet.freebind &&
+ !inet_test_bit(FREEBIND, sk) &&
!READ_ONCE(net->ipv4.sysctl_ip_nonlocal_bind))
return 0;
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 06/15] inet: move inet->hdrincl to inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (4 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 05/15] inet: move inet->freebind " Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 07/15] inet: move inet->mc_loop to inet->inet_frags Eric Dumazet
` (8 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
IP_HDRINCL socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 4 ++--
net/ipv4/af_inet.c | 2 +-
net/ipv4/inet_diag.c | 2 +-
net/ipv4/ip_output.c | 5 +++--
net/ipv4/ip_sockglue.c | 18 ++++++++----------
net/ipv4/raw.c | 10 +++-------
net/ipv4/route.c | 8 ++++----
net/ipv6/af_inet6.c | 2 +-
net/ipv6/ip6_output.c | 5 +++--
net/ipv6/raw.c | 16 +++++-----------
10 files changed, 31 insertions(+), 41 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index d6ba963534b4a5aa5dc6f88b94dd36f260be765b..ad1895e32e7d9bbad4ce210bda9698328e026b18 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -231,7 +231,6 @@ struct inet_sock {
__u8 mc_ttl;
__u8 pmtudisc;
__u8 is_icsk:1,
- hdrincl:1,
mc_loop:1,
transparent:1,
mc_all:1,
@@ -271,6 +270,7 @@ enum {
INET_FLAGS_RECVERR = 9,
INET_FLAGS_RECVERR_RFC4884 = 10,
INET_FLAGS_FREEBIND = 11,
+ INET_FLAGS_HDRINCL = 12,
};
/* cmsg flags for inet */
@@ -397,7 +397,7 @@ static inline __u8 inet_sk_flowi_flags(const struct sock *sk)
{
__u8 flags = 0;
- if (inet_sk(sk)->transparent || inet_sk(sk)->hdrincl)
+ if (inet_sk(sk)->transparent || inet_test_bit(HDRINCL, sk))
flags |= FLOWI_FLAG_ANYSRC;
return flags;
}
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9b2ca2fcc5a1176ffcaab4abee1492c6466ce5ca..a42ae7a6a7aa17cf15faf4a9674241bc38e59e42 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -332,7 +332,7 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
if (SOCK_RAW == sock->type) {
inet->inet_num = protocol;
if (IPPROTO_RAW == protocol)
- inet->hdrincl = 1;
+ inet_set_bit(HDRINCL, sk);
}
if (READ_ONCE(net->ipv4.sysctl_ip_no_pmtu_disc))
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 5a96f4f28eca6ae6e84cb3761531309e8da0be09..98f3eb0ce16ab32daccf3c2407630622e9cdb71d 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -185,7 +185,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
inet_sockopt.recverr = inet_test_bit(RECVERR, sk);
inet_sockopt.is_icsk = inet->is_icsk;
inet_sockopt.freebind = inet_test_bit(FREEBIND, sk);
- inet_sockopt.hdrincl = inet->hdrincl;
+ inet_sockopt.hdrincl = inet_test_bit(HDRINCL, sk);
inet_sockopt.mc_loop = inet->mc_loop;
inet_sockopt.transparent = inet->transparent;
inet_sockopt.mc_all = inet->mc_all;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index f28c87533a46567dca565a9cd47789cdefe9ac07..8f396eada1b6e61ab174473e9859bc62a10a0d1c 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1039,7 +1039,7 @@ static int __ip_append_data(struct sock *sk,
}
}
} else if ((flags & MSG_SPLICE_PAGES) && length) {
- if (inet->hdrincl)
+ if (inet_test_bit(HDRINCL, sk))
return -EPERM;
if (rt->dst.dev->features & NETIF_F_SG &&
getfrag == ip_generic_getfrag)
@@ -1467,7 +1467,8 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
* so icmphdr does not in skb linear region and can not get icmp_type
* by icmp_hdr(skb)->type.
*/
- if (sk->sk_type == SOCK_RAW && !inet_sk(sk)->hdrincl)
+ if (sk->sk_type == SOCK_RAW &&
+ !inet_test_bit(HDRINCL, sk))
icmp_type = fl4->fl4_icmp_type;
else
icmp_type = icmp_hdr(skb)->type;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 6af84310631288c07f26c19734c5abc0fd82dc23..763456fd4f4faac8e46d649a281f178be05a7cef 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -988,6 +988,11 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
inet_assign_bit(FREEBIND, sk, val);
return 0;
+ case IP_HDRINCL:
+ if (sk->sk_type != SOCK_RAW)
+ return -ENOPROTOOPT;
+ inet_assign_bit(HDRINCL, sk, val);
+ return 0;
}
err = 0;
@@ -1052,13 +1057,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
inet->uc_ttl = val;
break;
- case IP_HDRINCL:
- if (sk->sk_type != SOCK_RAW) {
- err = -ENOPROTOOPT;
- break;
- }
- inet->hdrincl = val ? 1 : 0;
- break;
case IP_NODEFRAG:
if (sk->sk_type != SOCK_RAW) {
err = -ENOPROTOOPT;
@@ -1578,6 +1576,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_FREEBIND:
val = inet_test_bit(FREEBIND, sk);
goto copyval;
+ case IP_HDRINCL:
+ val = inet_test_bit(HDRINCL, sk);
+ goto copyval;
}
if (needs_rtnl)
@@ -1625,9 +1626,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
inet->uc_ttl);
break;
}
- case IP_HDRINCL:
- val = inet->hdrincl;
- break;
case IP_NODEFRAG:
val = inet->nodefrag;
break;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index f4c27dc5714bd4be7bbd4a8e5b614c9426e6b987..4b5db5d1edc279df1fd7412af2845a7a79c95ec8 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -251,7 +251,7 @@ static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info)
const struct iphdr *iph = (const struct iphdr *)skb->data;
u8 *payload = skb->data + (iph->ihl << 2);
- if (inet->hdrincl)
+ if (inet_test_bit(HDRINCL, sk))
payload = skb->data;
ip_icmp_error(sk, skb, err, 0, info, payload);
}
@@ -491,12 +491,8 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (len > 0xFFFF)
goto out;
- /* hdrincl should be READ_ONCE(inet->hdrincl)
- * but READ_ONCE() doesn't work with bit fields.
- * Doing this indirectly yields the same result.
- */
- hdrincl = inet->hdrincl;
- hdrincl = READ_ONCE(hdrincl);
+ hdrincl = inet_test_bit(HDRINCL, sk);
+
/*
* Check the flags.
*/
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 92fede388d52052ee3bd2337298b8cb0608dc362..a4e153dd615ba9321d8252a5026acafaa294a149 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -515,13 +515,12 @@ static void __build_flow_key(const struct net *net, struct flowi4 *fl4,
__u8 scope = RT_SCOPE_UNIVERSE;
if (sk) {
- const struct inet_sock *inet = inet_sk(sk);
-
oif = sk->sk_bound_dev_if;
mark = READ_ONCE(sk->sk_mark);
tos = ip_sock_rt_tos(sk);
scope = ip_sock_rt_scope(sk);
- prot = inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol;
+ prot = inet_test_bit(HDRINCL, sk) ? IPPROTO_RAW :
+ sk->sk_protocol;
}
flowi4_init_output(fl4, oif, mark, tos & IPTOS_RT_MASK, scope,
@@ -555,7 +554,8 @@ static void build_sk_flow_key(struct flowi4 *fl4, const struct sock *sk)
flowi4_init_output(fl4, sk->sk_bound_dev_if, READ_ONCE(sk->sk_mark),
ip_sock_rt_tos(sk) & IPTOS_RT_MASK,
ip_sock_rt_scope(sk),
- inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+ inet_test_bit(HDRINCL, sk) ?
+ IPPROTO_RAW : sk->sk_protocol,
inet_sk_flowi_flags(sk),
daddr, inet->inet_saddr, 0, 0, sk->sk_uid);
rcu_read_unlock();
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 9f9c4b838664a76cb4d7efbeb16056e22f12b358..138270e59ea6e2f30fcd75440609f92306bd4975 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -205,7 +205,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
if (SOCK_RAW == sock->type) {
inet->inet_num = protocol;
if (IPPROTO_RAW == protocol)
- inet->hdrincl = 1;
+ inet_set_bit(HDRINCL, sk);
}
sk->sk_destruct = inet6_sock_destruct;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index bc96559bbf0f8d27e2afc05696a13da6b4c1f33c..f8a1f6bb3f87251836fe6a478f16ef948239ed93 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1591,7 +1591,7 @@ static int __ip6_append_data(struct sock *sk,
}
}
} else if ((flags & MSG_SPLICE_PAGES) && length) {
- if (inet_sk(sk)->hdrincl)
+ if (inet_test_bit(HDRINCL, sk))
return -EPERM;
if (rt->dst.dev->features & NETIF_F_SG &&
getfrag == ip_generic_getfrag)
@@ -1995,7 +1995,8 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
u8 icmp6_type;
- if (sk->sk_socket->type == SOCK_RAW && !inet_sk(sk)->hdrincl)
+ if (sk->sk_socket->type == SOCK_RAW &&
+ !inet_test_bit(HDRINCL, sk))
icmp6_type = fl6->fl6_icmp_type;
else
icmp6_type = icmp6_hdr(skb)->icmp6_type;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index ea16734f5e1f7f81d329c337efbd02ab466b7ec2..0eae7661a85c4487a64384c6054a3fb827387ce7 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -291,7 +291,6 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
struct inet6_skb_parm *opt,
u8 type, u8 code, int offset, __be32 info)
{
- struct inet_sock *inet = inet_sk(sk);
struct ipv6_pinfo *np = inet6_sk(sk);
int err;
int harderr;
@@ -315,7 +314,7 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
}
if (np->recverr) {
u8 *payload = skb->data;
- if (!inet->hdrincl)
+ if (!inet_test_bit(HDRINCL, sk))
payload += offset;
ipv6_icmp_error(sk, skb, err, 0, ntohl(info), payload);
}
@@ -406,7 +405,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb)
skb->len,
inet->inet_num, 0));
- if (inet->hdrincl) {
+ if (inet_test_bit(HDRINCL, sk)) {
if (skb_checksum_complete(skb)) {
atomic_inc(&sk->sk_drops);
kfree_skb_reason(skb, SKB_DROP_REASON_SKB_CSUM);
@@ -762,12 +761,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (msg->msg_flags & MSG_OOB)
return -EOPNOTSUPP;
- /* hdrincl should be READ_ONCE(inet->hdrincl)
- * but READ_ONCE() doesn't work with bit fields.
- * Doing this indirectly yields the same result.
- */
- hdrincl = inet->hdrincl;
- hdrincl = READ_ONCE(hdrincl);
+ hdrincl = inet_test_bit(HDRINCL, sk);
/*
* Get and verify the address.
@@ -1000,7 +994,7 @@ static int do_rawv6_setsockopt(struct sock *sk, int level, int optname,
case IPV6_HDRINCL:
if (sk->sk_type != SOCK_RAW)
return -EINVAL;
- inet_sk(sk)->hdrincl = !!val;
+ inet_assign_bit(HDRINCL, sk, val);
return 0;
case IPV6_CHECKSUM:
if (inet_sk(sk)->inet_num == IPPROTO_ICMPV6 &&
@@ -1068,7 +1062,7 @@ static int do_rawv6_getsockopt(struct sock *sk, int level, int optname,
switch (optname) {
case IPV6_HDRINCL:
- val = inet_sk(sk)->hdrincl;
+ val = inet_test_bit(HDRINCL, sk);
break;
case IPV6_CHECKSUM:
/*
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 07/15] inet: move inet->mc_loop to inet->inet_frags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (5 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 06/15] inet: move inet->hdrincl " Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 14:10 ` kernel test robot
2023-08-11 7:36 ` [PATCH v2 net-next 08/15] inet: move inet->mc_all " Eric Dumazet
` (7 subsequent siblings)
14 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
IP_MULTICAST_LOOP socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 2 +-
net/core/sock.c | 2 +-
net/ipv4/af_inet.c | 2 +-
net/ipv4/inet_diag.c | 2 +-
net/ipv4/ip_sockglue.c | 16 ++++++++--------
net/ipv4/udp_tunnel_core.c | 2 +-
net/ipv6/af_inet6.c | 2 +-
net/sctp/socket.c | 2 +-
8 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index ad1895e32e7d9bbad4ce210bda9698328e026b18..6c4eeca59f608ff18e5f05dec33700189d6e2198 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -231,7 +231,6 @@ struct inet_sock {
__u8 mc_ttl;
__u8 pmtudisc;
__u8 is_icsk:1,
- mc_loop:1,
transparent:1,
mc_all:1,
nodefrag:1;
@@ -271,6 +270,7 @@ enum {
INET_FLAGS_RECVERR_RFC4884 = 10,
INET_FLAGS_FREEBIND = 11,
INET_FLAGS_HDRINCL = 12,
+ INET_FLAGS_MC_LOOP = 13,
};
/* cmsg flags for inet */
diff --git a/net/core/sock.c b/net/core/sock.c
index 525619776c6f4945552d5c4117c5742fe7e14f5e..22d94394335fb75f12da65368e87c5a65167cc0e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -767,7 +767,7 @@ bool sk_mc_loop(struct sock *sk)
return true;
switch (sk->sk_family) {
case AF_INET:
- return inet_sk(sk)->mc_loop;
+ return inet_test_bit(MC_LOOP, sk);
#if IS_ENABLED(CONFIG_IPV6)
case AF_INET6:
return inet6_sk(sk)->mc_loop;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index a42ae7a6a7aa17cf15faf4a9674241bc38e59e42..80e2a3c897a540c76b979355957b81a024bd8259 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -350,7 +350,7 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
sk->sk_txrehash = READ_ONCE(net->core.sysctl_txrehash);
inet->uc_ttl = -1;
- inet->mc_loop = 1;
+ inet_set_bit(MC_LOOP, sk);
inet->mc_ttl = 1;
inet->mc_all = 1;
inet->mc_index = 0;
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 98f3eb0ce16ab32daccf3c2407630622e9cdb71d..cc797261893b902f626b5a36074e4b4bf7535063 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -186,7 +186,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
inet_sockopt.is_icsk = inet->is_icsk;
inet_sockopt.freebind = inet_test_bit(FREEBIND, sk);
inet_sockopt.hdrincl = inet_test_bit(HDRINCL, sk);
- inet_sockopt.mc_loop = inet->mc_loop;
+ inet_sockopt.mc_loop = inet_test_bit(MC_LOOP, sk);
inet_sockopt.transparent = inet->transparent;
inet_sockopt.mc_all = inet->mc_all;
inet_sockopt.nodefrag = inet->nodefrag;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 763456fd4f4faac8e46d649a281f178be05a7cef..be569032b612bef1277e802400a1ee6ec20e877a 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -993,6 +993,11 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
return -ENOPROTOOPT;
inet_assign_bit(HDRINCL, sk, val);
return 0;
+ case IP_MULTICAST_LOOP:
+ if (optlen < 1)
+ return -EINVAL;
+ inet_assign_bit(MC_LOOP, sk, val);
+ return 0;
}
err = 0;
@@ -1083,11 +1088,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
inet->mc_ttl = val;
break;
- case IP_MULTICAST_LOOP:
- if (optlen < 1)
- goto e_inval;
- inet->mc_loop = !!val;
- break;
case IP_UNICAST_IF:
{
struct net_device *dev = NULL;
@@ -1579,6 +1579,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_HDRINCL:
val = inet_test_bit(HDRINCL, sk);
goto copyval;
+ case IP_MULTICAST_LOOP:
+ val = inet_test_bit(MC_LOOP, sk);
+ goto copyval;
}
if (needs_rtnl)
@@ -1653,9 +1656,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_MULTICAST_TTL:
val = inet->mc_ttl;
break;
- case IP_MULTICAST_LOOP:
- val = inet->mc_loop;
- break;
case IP_UNICAST_IF:
val = (__force int)htonl((__u32) inet->uc_index);
break;
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index 5f8104cf082d0e25a204f1d7ae5c27d9961914ea..9b18f371af0d49bd3ee9a440f222d03efd8a4911 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -63,7 +63,7 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
struct sock *sk = sock->sk;
/* Disable multicast loopback */
- inet_sk(sk)->mc_loop = 0;
+ inet_clear_bit(MC_LOOP, sk);
/* Enable CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE conversion */
inet_inc_convert_csum(sk);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 138270e59ea6e2f30fcd75440609f92306bd4975..4a34a4ba62b229991307ebed74ac7cd9f3a943ba 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -229,7 +229,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
*/
inet->uc_ttl = -1;
- inet->mc_loop = 1;
+ inet_set_bit(MC_LOOP, sk);
inet->mc_ttl = 1;
inet->mc_index = 0;
RCU_INIT_POINTER(inet->mc_list, NULL);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 6e3d28aa587cdb64f7a1ac384fa28a34d4c6739c..04b390892827b8abbb7e7433d71f4f54dd1dac21 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -9482,7 +9482,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
newinet->inet_id = get_random_u16();
newinet->uc_ttl = inet->uc_ttl;
- newinet->mc_loop = 1;
+ inet_set_bit(MC_LOOP, newsk);
newinet->mc_ttl = 1;
newinet->mc_index = 0;
newinet->mc_list = NULL;
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 08/15] inet: move inet->mc_all to inet->inet_frags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (6 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 07/15] inet: move inet->mc_loop to inet->inet_frags Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 09/15] inet: move inet->transparent to inet->inet_flags Eric Dumazet
` (6 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
IP_MULTICAST_ALL socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 2 +-
net/ipv4/af_inet.c | 2 +-
net/ipv4/igmp.c | 2 +-
net/ipv4/inet_diag.c | 2 +-
net/ipv4/ip_sockglue.c | 22 +++++++++++-----------
5 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 6c4eeca59f608ff18e5f05dec33700189d6e2198..fffd34fa6a7cb92a98e29bd6b36ccf907b5e3a6d 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -232,7 +232,6 @@ struct inet_sock {
__u8 pmtudisc;
__u8 is_icsk:1,
transparent:1,
- mc_all:1,
nodefrag:1;
__u8 bind_address_no_port:1,
defer_connect:1; /* Indicates that fastopen_connect is set
@@ -271,6 +270,7 @@ enum {
INET_FLAGS_FREEBIND = 11,
INET_FLAGS_HDRINCL = 12,
INET_FLAGS_MC_LOOP = 13,
+ INET_FLAGS_MC_ALL = 14,
};
/* cmsg flags for inet */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 80e2a3c897a540c76b979355957b81a024bd8259..c15aae4a386097b66a8908e2dcf23c549200e86f 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -352,7 +352,7 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
inet->uc_ttl = -1;
inet_set_bit(MC_LOOP, sk);
inet->mc_ttl = 1;
- inet->mc_all = 1;
+ inet_set_bit(MC_ALL, sk);
inet->mc_index = 0;
inet->mc_list = NULL;
inet->rcv_tos = 0;
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 48ff5f13e7979dc00da60b466ee2e74ddce0891b..0c9e768e5628b1c8fd7e87bebe528762ea4a6e1e 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -2658,7 +2658,7 @@ int ip_mc_sf_allow(const struct sock *sk, __be32 loc_addr, __be32 rmt_addr,
(sdif && pmc->multi.imr_ifindex == sdif)))
break;
}
- ret = inet->mc_all;
+ ret = inet_test_bit(MC_ALL, sk);
if (!pmc)
goto unlock;
psl = rcu_dereference(pmc->sflist);
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index cc797261893b902f626b5a36074e4b4bf7535063..e009dab80c3546c5222c587531acd394f2eeff0d 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -188,7 +188,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
inet_sockopt.hdrincl = inet_test_bit(HDRINCL, sk);
inet_sockopt.mc_loop = inet_test_bit(MC_LOOP, sk);
inet_sockopt.transparent = inet->transparent;
- inet_sockopt.mc_all = inet->mc_all;
+ inet_sockopt.mc_all = inet_test_bit(MC_ALL, sk);
inet_sockopt.nodefrag = inet->nodefrag;
inet_sockopt.bind_address_no_port = inet->bind_address_no_port;
inet_sockopt.recverr_rfc4884 = inet_test_bit(RECVERR_RFC4884, sk);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index be569032b612bef1277e802400a1ee6ec20e877a..2f27c30a4eccca5d23b70851daeb5115bcc1de16 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -998,6 +998,14 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
inet_assign_bit(MC_LOOP, sk, val);
return 0;
+ case IP_MULTICAST_ALL:
+ if (optlen < 1)
+ return -EINVAL;
+ if (val != 0 && val != 1)
+ return -EINVAL;
+ inet_assign_bit(MC_ALL, sk, val);
+ return 0;
+
}
err = 0;
@@ -1303,14 +1311,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
else
err = ip_set_mcast_msfilter(sk, optval, optlen);
break;
- case IP_MULTICAST_ALL:
- if (optlen < 1)
- goto e_inval;
- if (val != 0 && val != 1)
- goto e_inval;
- inet->mc_all = val;
- break;
-
case IP_IPSEC_POLICY:
case IP_XFRM_POLICY:
err = -EPERM;
@@ -1582,6 +1582,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_MULTICAST_LOOP:
val = inet_test_bit(MC_LOOP, sk);
goto copyval;
+ case IP_MULTICAST_ALL:
+ val = inet_test_bit(MC_ALL, sk);
+ goto copyval;
}
if (needs_rtnl)
@@ -1694,9 +1697,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
else
err = ip_get_mcast_msfilter(sk, optval, optlen, len);
goto out;
- case IP_MULTICAST_ALL:
- val = inet->mc_all;
- break;
case IP_PKTOPTIONS:
{
struct msghdr msg;
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 09/15] inet: move inet->transparent to inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (7 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 08/15] inet: move inet->mc_all " Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 10/15] inet: move inet->is_icsk " Eric Dumazet
` (5 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
IP_TRANSPARENT socket option can now be set/read
without locking the socket.
v2: removed unused issk variable in mptcp_setsockopt_sol_ip_set_transparent()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 6 +++---
include/net/ipv6.h | 2 +-
include/net/route.h | 2 +-
include/net/tcp.h | 2 +-
net/ipv4/inet_diag.c | 2 +-
net/ipv4/inet_timewait_sock.c | 2 +-
net/ipv4/ip_sockglue.c | 28 +++++++++++++---------------
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_minisocks.c | 3 +--
net/ipv6/ipv6_sockglue.c | 4 ++--
net/mptcp/sockopt.c | 11 +++++------
11 files changed, 30 insertions(+), 34 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index fffd34fa6a7cb92a98e29bd6b36ccf907b5e3a6d..cefd9a60dc6d8432cc685716c2e556be7a7dc2ec 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -231,7 +231,6 @@ struct inet_sock {
__u8 mc_ttl;
__u8 pmtudisc;
__u8 is_icsk:1,
- transparent:1,
nodefrag:1;
__u8 bind_address_no_port:1,
defer_connect:1; /* Indicates that fastopen_connect is set
@@ -271,6 +270,7 @@ enum {
INET_FLAGS_HDRINCL = 12,
INET_FLAGS_MC_LOOP = 13,
INET_FLAGS_MC_ALL = 14,
+ INET_FLAGS_TRANSPARENT = 15,
};
/* cmsg flags for inet */
@@ -397,7 +397,7 @@ static inline __u8 inet_sk_flowi_flags(const struct sock *sk)
{
__u8 flags = 0;
- if (inet_sk(sk)->transparent || inet_test_bit(HDRINCL, sk))
+ if (inet_test_bit(TRANSPARENT, sk) || inet_test_bit(HDRINCL, sk))
flags |= FLOWI_FLAG_ANYSRC;
return flags;
}
@@ -424,7 +424,7 @@ static inline bool inet_can_nonlocal_bind(struct net *net,
{
return READ_ONCE(net->ipv4.sysctl_ip_nonlocal_bind) ||
test_bit(INET_FLAGS_FREEBIND, &inet->inet_flags) ||
- inet->transparent;
+ test_bit(INET_FLAGS_TRANSPARENT, &inet->inet_flags);
}
static inline bool inet_addr_valid_or_nonlocal(struct net *net,
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 5f513503e7d568c189a7b14439612f4e27ba539b..9a3520d5340198ad48b5d52e22653d8a7a9d80af 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -938,7 +938,7 @@ static inline bool ipv6_can_nonlocal_bind(struct net *net,
{
return net->ipv6.sysctl.ip_nonlocal_bind ||
test_bit(INET_FLAGS_FREEBIND, &inet->inet_flags) ||
- inet->transparent;
+ test_bit(INET_FLAGS_TRANSPARENT, &inet->inet_flags);
}
/* Sysctl settings for net ipv6.auto_flowlabels */
diff --git a/include/net/route.h b/include/net/route.h
index d9ca98d2366ff96a754682f5749037ffcdadcc8e..51a45b1887b562bfb473f9f8c50897d5d3073476 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -298,7 +298,7 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst,
{
__u8 flow_flags = 0;
- if (inet_sk(sk)->transparent)
+ if (inet_test_bit(TRANSPARENT, sk))
flow_flags |= FLOWI_FLAG_ANYSRC;
flowi4_init_output(fl4, oif, READ_ONCE(sk->sk_mark), ip_sock_rt_tos(sk),
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6d77c08d83b76a8bf4347bbb05dc6e808b5857d0..07b21d9a962072e4fbd3986162458e16a62abfb0 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2031,7 +2031,7 @@ static inline bool inet_sk_transparent(const struct sock *sk)
case TCP_NEW_SYN_RECV:
return inet_rsk(inet_reqsk(sk))->no_srccheck;
}
- return inet_sk(sk)->transparent;
+ return inet_test_bit(TRANSPARENT, sk);
}
/* Determines whether this is a thin stream (which may suffer from
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index e009dab80c3546c5222c587531acd394f2eeff0d..45fefd2f31fd7b921d796b0317b72b8858ca9c5b 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -187,7 +187,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
inet_sockopt.freebind = inet_test_bit(FREEBIND, sk);
inet_sockopt.hdrincl = inet_test_bit(HDRINCL, sk);
inet_sockopt.mc_loop = inet_test_bit(MC_LOOP, sk);
- inet_sockopt.transparent = inet->transparent;
+ inet_sockopt.transparent = inet_test_bit(TRANSPARENT, sk);
inet_sockopt.mc_all = inet_test_bit(MC_ALL, sk);
inet_sockopt.nodefrag = inet->nodefrag;
inet_sockopt.bind_address_no_port = inet->bind_address_no_port;
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 2c1b245dba8e8d1403018fb5b8caee1981ee1043..dd37a5bf6881117aafc4f2a0631979c4e3928be6 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -203,7 +203,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
tw->tw_reuseport = sk->sk_reuseport;
tw->tw_hash = sk->sk_hash;
tw->tw_ipv6only = 0;
- tw->tw_transparent = inet->transparent;
+ tw->tw_transparent = inet_test_bit(TRANSPARENT, sk);
tw->tw_prot = sk->sk_prot_creator;
atomic64_set(&tw->tw_cookie, atomic64_read(&sk->sk_cookie));
twsk_net_set(tw, sock_net(sk));
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 2f27c30a4eccca5d23b70851daeb5115bcc1de16..3f5323a230b3d84048838cb03d648b213bd95fab 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1005,7 +1005,16 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
inet_assign_bit(MC_ALL, sk, val);
return 0;
-
+ case IP_TRANSPARENT:
+ if (!!val && !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) &&
+ !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
+ err = -EPERM;
+ break;
+ }
+ if (optlen < 1)
+ goto e_inval;
+ inet_assign_bit(TRANSPARENT, sk, val);
+ return 0;
}
err = 0;
@@ -1319,17 +1328,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
err = xfrm_user_policy(sk, optname, optval, optlen);
break;
- case IP_TRANSPARENT:
- if (!!val && !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) &&
- !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
- err = -EPERM;
- break;
- }
- if (optlen < 1)
- goto e_inval;
- inet->transparent = !!val;
- break;
-
case IP_MINTTL:
if (optlen < 1)
goto e_inval;
@@ -1585,6 +1583,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_MULTICAST_ALL:
val = inet_test_bit(MC_ALL, sk);
goto copyval;
+ case IP_TRANSPARENT:
+ val = inet_test_bit(TRANSPARENT, sk);
+ goto copyval;
}
if (needs_rtnl)
@@ -1735,9 +1736,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
len -= msg.msg_controllen;
return copy_to_sockptr(optlen, &len, sizeof(int));
}
- case IP_TRANSPARENT:
- val = inet->transparent;
- break;
case IP_MINTTL:
val = inet->min_ttl;
break;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8e96ebe373d7ec88213adac9f85cc367200694ec..5ad755c014b7a33d3d1f096d648f654d856c78e3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6994,7 +6994,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
tmp_opt.tstamp_ok = tmp_opt.saw_tstamp;
tcp_openreq_init(req, &tmp_opt, skb, sk);
- inet_rsk(req)->no_srccheck = inet_sk(sk)->transparent;
+ inet_rsk(req)->no_srccheck = inet_test_bit(TRANSPARENT, sk);
/* Note: tcp_v6_init_req() might override ir_iif for link locals */
inet_rsk(req)->ir_iif = inet_request_bound_dev_if(sk, skb);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 13ee12983c4293afe2ddabe282155be045a2e9b2..b98d476f1594bd8f9a70e6ff53d7f868a15997c5 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -289,9 +289,8 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
if (tw) {
struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
const int rto = (icsk->icsk_rto << 2) - (icsk->icsk_rto >> 1);
- struct inet_sock *inet = inet_sk(sk);
- tw->tw_transparent = inet->transparent;
+ tw->tw_transparent = inet_test_bit(TRANSPARENT, sk);
tw->tw_mark = sk->sk_mark;
tw->tw_priority = sk->sk_priority;
tw->tw_rcv_wscale = tp->rx_opt.rcv_wscale;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 3eb38436f8d431ca37200869bfe57ec33b46bf8b..eb334122512c2a7b41dc5f6bc83aaa3c2b946a06 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -633,7 +633,7 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
if (optlen < sizeof(int))
goto e_inval;
/* we don't have a separate transparent bit for IPV6 we use the one in the IPv4 socket */
- inet_sk(sk)->transparent = valbool;
+ inet_assign_bit(TRANSPARENT, sk, valbool);
retv = 0;
break;
@@ -1330,7 +1330,7 @@ int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
}
case IPV6_TRANSPARENT:
- val = inet_sk(sk)->transparent;
+ val = inet_test_bit(TRANSPARENT, sk);
break;
case IPV6_FREEBIND:
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 1f3331f9f7c85f3b2a1e8dc03cf80be73af4ed0d..64bd9e1193f465f882d63a88c90a19946047121c 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -416,7 +416,8 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
sk->sk_ipv6only = ssock->sk->sk_ipv6only;
break;
case IPV6_TRANSPARENT:
- inet_sk(sk)->transparent = inet_sk(ssock->sk)->transparent;
+ inet_assign_bit(TRANSPARENT, sk,
+ inet_test_bit(TRANSPARENT, ssock->sk));
break;
case IPV6_FREEBIND:
inet_assign_bit(FREEBIND, sk,
@@ -685,7 +686,6 @@ static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int o
sockptr_t optval, unsigned int optlen)
{
struct sock *sk = (struct sock *)msk;
- struct inet_sock *issk;
struct socket *ssock;
int err;
@@ -701,15 +701,14 @@ static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int o
return PTR_ERR(ssock);
}
- issk = inet_sk(ssock->sk);
-
switch (optname) {
case IP_FREEBIND:
inet_assign_bit(FREEBIND, ssock->sk,
inet_test_bit(FREEBIND, sk));
break;
case IP_TRANSPARENT:
- issk->transparent = inet_sk(sk)->transparent;
+ inet_assign_bit(TRANSPARENT, ssock->sk,
+ inet_test_bit(TRANSPARENT, sk));
break;
default:
release_sock(sk);
@@ -1443,7 +1442,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
__tcp_sock_set_cork(ssk, !!msk->cork);
__tcp_sock_set_nodelay(ssk, !!msk->nodelay);
- inet_sk(ssk)->transparent = inet_sk(sk)->transparent;
+ inet_assign_bit(TRANSPARENT, ssk, inet_test_bit(TRANSPARENT, sk));
inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
}
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 10/15] inet: move inet->is_icsk to inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (8 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 09/15] inet: move inet->transparent to inet->inet_flags Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 11/15] inet: move inet->nodefrag " Eric Dumazet
` (4 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
We move single bit fields to inet->inet_flags to avoid races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_connection_sock.h | 4 ++--
include/net/inet_sock.h | 5 ++---
net/ipv4/af_inet.c | 2 +-
net/ipv4/cipso_ipv4.c | 4 ++--
net/ipv4/inet_diag.c | 2 +-
net/ipv4/ip_sockglue.c | 4 ++--
net/ipv6/af_inet6.c | 2 +-
net/ipv6/ipv6_sockglue.c | 4 ++--
8 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c2b15f7e551617b06863a3b52056348d9c53bb12..7402918b015636bd668ae4ebb53b3f73d1cd34a6 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -341,9 +341,9 @@ static inline bool inet_csk_in_pingpong_mode(struct sock *sk)
return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
}
-static inline bool inet_csk_has_ulp(struct sock *sk)
+static inline bool inet_csk_has_ulp(const struct sock *sk)
{
- return inet_sk(sk)->is_icsk && !!inet_csk(sk)->icsk_ulp_ops;
+ return inet_test_bit(IS_ICSK, sk) && !!inet_csk(sk)->icsk_ulp_ops;
}
#endif /* _INET_CONNECTION_SOCK_H */
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index cefd9a60dc6d8432cc685716c2e556be7a7dc2ec..38f7fc1c4dacfb4ecacbbb38ae484ed06f2638e2 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -201,7 +201,6 @@ struct rtable;
* @inet_id - ID counter for DF pkts
* @tos - TOS
* @mc_ttl - Multicasting TTL
- * @is_icsk - is this an inet_connection_sock?
* @uc_index - Unicast outgoing device index
* @mc_index - Multicast device index
* @mc_list - Group array
@@ -230,8 +229,7 @@ struct inet_sock {
__u8 min_ttl;
__u8 mc_ttl;
__u8 pmtudisc;
- __u8 is_icsk:1,
- nodefrag:1;
+ __u8 nodefrag:1;
__u8 bind_address_no_port:1,
defer_connect:1; /* Indicates that fastopen_connect is set
* and cookie exists so we defer connect
@@ -271,6 +269,7 @@ enum {
INET_FLAGS_MC_LOOP = 13,
INET_FLAGS_MC_ALL = 14,
INET_FLAGS_TRANSPARENT = 15,
+ INET_FLAGS_IS_ICSK = 16,
};
/* cmsg flags for inet */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index c15aae4a386097b66a8908e2dcf23c549200e86f..7655574b2de152fad70b258e779fcdadfb283f32 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -325,7 +325,7 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
sk->sk_reuse = SK_CAN_REUSE;
inet = inet_sk(sk);
- inet->is_icsk = (INET_PROTOSW_ICSK & answer_flags) != 0;
+ inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags);
inet->nodefrag = 0;
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 79ae7204e8edb973764e53349d3270dda78e18c4..d048aa83329386b0bbe4c68d4dee2c86871f8efb 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1881,7 +1881,7 @@ int cipso_v4_sock_setattr(struct sock *sk,
old = rcu_dereference_protected(sk_inet->inet_opt,
lockdep_sock_is_held(sk));
- if (sk_inet->is_icsk) {
+ if (inet_test_bit(IS_ICSK, sk)) {
sk_conn = inet_csk(sk);
if (old)
sk_conn->icsk_ext_hdr_len -= old->opt.optlen;
@@ -2051,7 +2051,7 @@ void cipso_v4_sock_delattr(struct sock *sk)
sk_inet = inet_sk(sk);
hdr_delta = cipso_v4_delopt(&sk_inet->inet_opt);
- if (sk_inet->is_icsk && hdr_delta > 0) {
+ if (inet_test_bit(IS_ICSK, sk) && hdr_delta > 0) {
struct inet_connection_sock *sk_conn = inet_csk(sk);
sk_conn->icsk_ext_hdr_len -= hdr_delta;
sk_conn->icsk_sync_mss(sk, sk_conn->icsk_pmtu_cookie);
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 45fefd2f31fd7b921d796b0317b72b8858ca9c5b..ada198fc1a92bfbaa1abe691da24489edf281f22 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -183,7 +183,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
memset(&inet_sockopt, 0, sizeof(inet_sockopt));
inet_sockopt.recverr = inet_test_bit(RECVERR, sk);
- inet_sockopt.is_icsk = inet->is_icsk;
+ inet_sockopt.is_icsk = inet_test_bit(IS_ICSK, sk);
inet_sockopt.freebind = inet_test_bit(FREEBIND, sk);
inet_sockopt.hdrincl = inet_test_bit(HDRINCL, sk);
inet_sockopt.mc_loop = inet_test_bit(MC_LOOP, sk);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 3f5323a230b3d84048838cb03d648b213bd95fab..dac471ed067b4ba276fc0a9379750df54ea8987c 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1034,7 +1034,7 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
break;
old = rcu_dereference_protected(inet->inet_opt,
lockdep_sock_is_held(sk));
- if (inet->is_icsk) {
+ if (inet_test_bit(IS_ICSK, sk)) {
struct inet_connection_sock *icsk = inet_csk(sk);
#if IS_ENABLED(CONFIG_IPV6)
if (sk->sk_family == PF_INET ||
@@ -1209,7 +1209,7 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
struct ip_mreqn mreq;
err = -EPROTO;
- if (inet_sk(sk)->is_icsk)
+ if (inet_test_bit(IS_ICSK, sk))
break;
if (optlen < sizeof(struct ip_mreq))
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 4a34a4ba62b229991307ebed74ac7cd9f3a943ba..fea7918ad6ef351afc6bfb45d54aae8d658d4b55 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -200,7 +200,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
sk->sk_reuse = SK_CAN_REUSE;
inet = inet_sk(sk);
- inet->is_icsk = (INET_PROTOSW_ICSK & answer_flags) != 0;
+ inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags);
if (SOCK_RAW == sock->type) {
inet->inet_num = protocol;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index eb334122512c2a7b41dc5f6bc83aaa3c2b946a06..d19577a94bcc6120e85dafb2768521e6567c0511 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -102,7 +102,7 @@ int ip6_ra_control(struct sock *sk, int sel)
struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
struct ipv6_txoptions *opt)
{
- if (inet_sk(sk)->is_icsk) {
+ if (inet_test_bit(IS_ICSK, sk)) {
if (opt &&
!((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
inet_sk(sk)->inet_daddr != LOOPBACK4_IPV6) {
@@ -831,7 +831,7 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
retv = -EPROTO;
- if (inet_sk(sk)->is_icsk)
+ if (inet_test_bit(IS_ICSK, sk))
break;
retv = -EFAULT;
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 11/15] inet: move inet->nodefrag to inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (9 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 10/15] inet: move inet->is_icsk " Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 12/15] inet: move inet->bind_address_no_port " Eric Dumazet
` (3 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
IP_NODEFRAG socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 2 +-
net/ipv4/af_inet.c | 2 +-
net/ipv4/inet_diag.c | 2 +-
net/ipv4/ip_sockglue.c | 18 ++++++++----------
net/ipv4/netfilter/nf_defrag_ipv4.c | 2 +-
net/netfilter/ipvs/ip_vs_core.c | 4 ++--
6 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 38f7fc1c4dacfb4ecacbbb38ae484ed06f2638e2..0e6e1b017efb1f738be1682448675ecece43c1f7 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -229,7 +229,6 @@ struct inet_sock {
__u8 min_ttl;
__u8 mc_ttl;
__u8 pmtudisc;
- __u8 nodefrag:1;
__u8 bind_address_no_port:1,
defer_connect:1; /* Indicates that fastopen_connect is set
* and cookie exists so we defer connect
@@ -270,6 +269,7 @@ enum {
INET_FLAGS_MC_ALL = 14,
INET_FLAGS_TRANSPARENT = 15,
INET_FLAGS_IS_ICSK = 16,
+ INET_FLAGS_NODEFRAG = 17,
};
/* cmsg flags for inet */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7655574b2de152fad70b258e779fcdadfb283f32..f684310c8f24ca08170f39ec955d20209566d7c5 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -327,7 +327,7 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
inet = inet_sk(sk);
inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags);
- inet->nodefrag = 0;
+ inet_clear_bit(NODEFRAG, sk);
if (SOCK_RAW == sock->type) {
inet->inet_num = protocol;
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index ada198fc1a92bfbaa1abe691da24489edf281f22..39606caad484a99a78beae399e38e56584f23f28 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -189,7 +189,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
inet_sockopt.mc_loop = inet_test_bit(MC_LOOP, sk);
inet_sockopt.transparent = inet_test_bit(TRANSPARENT, sk);
inet_sockopt.mc_all = inet_test_bit(MC_ALL, sk);
- inet_sockopt.nodefrag = inet->nodefrag;
+ inet_sockopt.nodefrag = inet_test_bit(NODEFRAG, sk);
inet_sockopt.bind_address_no_port = inet->bind_address_no_port;
inet_sockopt.recverr_rfc4884 = inet_test_bit(RECVERR_RFC4884, sk);
inet_sockopt.defer_connect = inet->defer_connect;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index dac471ed067b4ba276fc0a9379750df54ea8987c..ec946c13ea206dde3c5634d6dcd07aab7090cad8 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1015,6 +1015,11 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
inet_assign_bit(TRANSPARENT, sk, val);
return 0;
+ case IP_NODEFRAG:
+ if (sk->sk_type != SOCK_RAW)
+ return -ENOPROTOOPT;
+ inet_assign_bit(NODEFRAG, sk, val);
+ return 0;
}
err = 0;
@@ -1079,13 +1084,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
inet->uc_ttl = val;
break;
- case IP_NODEFRAG:
- if (sk->sk_type != SOCK_RAW) {
- err = -ENOPROTOOPT;
- break;
- }
- inet->nodefrag = val ? 1 : 0;
- break;
case IP_BIND_ADDRESS_NO_PORT:
inet->bind_address_no_port = val ? 1 : 0;
break;
@@ -1586,6 +1584,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_TRANSPARENT:
val = inet_test_bit(TRANSPARENT, sk);
goto copyval;
+ case IP_NODEFRAG:
+ val = inet_test_bit(NODEFRAG, sk);
+ goto copyval;
}
if (needs_rtnl)
@@ -1633,9 +1634,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
inet->uc_ttl);
break;
}
- case IP_NODEFRAG:
- val = inet->nodefrag;
- break;
case IP_BIND_ADDRESS_NO_PORT:
val = inet->bind_address_no_port;
break;
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index a9ba7de092c42895e01d808beeab18affe196abc..265b39bc435b4c7f356a7e92705e43353adb426a 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -66,7 +66,7 @@ static unsigned int ipv4_conntrack_defrag(void *priv,
struct sock *sk = skb->sk;
if (sk && sk_fullsock(sk) && (sk->sk_family == PF_INET) &&
- inet_sk(sk)->nodefrag)
+ inet_test_bit(NODEFRAG, sk))
return NF_ACCEPT;
#if IS_ENABLED(CONFIG_NF_CONNTRACK)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index cb83ca506c5c9de43012b1e66b9a4619ffda7de4..3230506ae3ffd8c120f0c96b07d78a7b58a4aaac 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1346,7 +1346,7 @@ ip_vs_out_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *stat
if (unlikely(sk && hooknum == NF_INET_LOCAL_OUT &&
af == AF_INET)) {
- if (sk->sk_family == PF_INET && inet_sk(sk)->nodefrag)
+ if (sk->sk_family == PF_INET && inet_test_bit(NODEFRAG, sk))
return NF_ACCEPT;
}
@@ -1946,7 +1946,7 @@ ip_vs_in_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *state
if (unlikely(sk && hooknum == NF_INET_LOCAL_OUT &&
af == AF_INET)) {
- if (sk->sk_family == PF_INET && inet_sk(sk)->nodefrag)
+ if (sk->sk_family == PF_INET && inet_test_bit(NODEFRAG, sk))
return NF_ACCEPT;
}
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 12/15] inet: move inet->bind_address_no_port to inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (10 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 11/15] inet: move inet->nodefrag " Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 13/15] inet: move inet->defer_connect " Eric Dumazet
` (2 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
IP_BIND_ADDRESS_NO_PORT socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 4 ++--
net/ipv4/af_inet.c | 2 +-
net/ipv4/inet_diag.c | 2 +-
net/ipv4/ip_sockglue.c | 12 ++++++------
net/ipv6/af_inet6.c | 2 +-
5 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 0e6e1b017efb1f738be1682448675ecece43c1f7..5eca2e70cbb2c16d26caa7f219ae53fe066ea3bd 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -229,8 +229,7 @@ struct inet_sock {
__u8 min_ttl;
__u8 mc_ttl;
__u8 pmtudisc;
- __u8 bind_address_no_port:1,
- defer_connect:1; /* Indicates that fastopen_connect is set
+ __u8 defer_connect:1; /* Indicates that fastopen_connect is set
* and cookie exists so we defer connect
* until first data frame is written
*/
@@ -270,6 +269,7 @@ enum {
INET_FLAGS_TRANSPARENT = 15,
INET_FLAGS_IS_ICSK = 16,
INET_FLAGS_NODEFRAG = 17,
+ INET_FLAGS_BIND_ADDRESS_NO_PORT = 18,
};
/* cmsg flags for inet */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f684310c8f24ca08170f39ec955d20209566d7c5..c591f04eb6a9fc3b7b37a4b93b826a35488b9b50 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -519,7 +519,7 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
inet->inet_saddr = 0; /* Use device */
/* Make sure we are allowed to bind here. */
- if (snum || !(inet->bind_address_no_port ||
+ if (snum || !(inet_test_bit(BIND_ADDRESS_NO_PORT, sk) ||
(flags & BIND_FORCE_ADDRESS_NO_PORT))) {
err = sk->sk_prot->get_port(sk, snum);
if (err) {
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 39606caad484a99a78beae399e38e56584f23f28..128966dea5540caaa94f6b87db4d3960d177caac 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -190,7 +190,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
inet_sockopt.transparent = inet_test_bit(TRANSPARENT, sk);
inet_sockopt.mc_all = inet_test_bit(MC_ALL, sk);
inet_sockopt.nodefrag = inet_test_bit(NODEFRAG, sk);
- inet_sockopt.bind_address_no_port = inet->bind_address_no_port;
+ inet_sockopt.bind_address_no_port = inet_test_bit(BIND_ADDRESS_NO_PORT, sk);
inet_sockopt.recverr_rfc4884 = inet_test_bit(RECVERR_RFC4884, sk);
inet_sockopt.defer_connect = inet->defer_connect;
if (nla_put(skb, INET_DIAG_SOCKOPT, sizeof(inet_sockopt),
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ec946c13ea206dde3c5634d6dcd07aab7090cad8..cfa65a0b0900f2f77bfd800f105ea079e2afff7c 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1020,6 +1020,9 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
return -ENOPROTOOPT;
inet_assign_bit(NODEFRAG, sk, val);
return 0;
+ case IP_BIND_ADDRESS_NO_PORT:
+ inet_assign_bit(BIND_ADDRESS_NO_PORT, sk, val);
+ return 0;
}
err = 0;
@@ -1084,9 +1087,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
goto e_inval;
inet->uc_ttl = val;
break;
- case IP_BIND_ADDRESS_NO_PORT:
- inet->bind_address_no_port = val ? 1 : 0;
- break;
case IP_MTU_DISCOVER:
if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_OMIT)
goto e_inval;
@@ -1587,6 +1587,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_NODEFRAG:
val = inet_test_bit(NODEFRAG, sk);
goto copyval;
+ case IP_BIND_ADDRESS_NO_PORT:
+ val = inet_test_bit(BIND_ADDRESS_NO_PORT, sk);
+ goto copyval;
}
if (needs_rtnl)
@@ -1634,9 +1637,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
inet->uc_ttl);
break;
}
- case IP_BIND_ADDRESS_NO_PORT:
- val = inet->bind_address_no_port;
- break;
case IP_MTU_DISCOVER:
val = inet->pmtudisc;
break;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index fea7918ad6ef351afc6bfb45d54aae8d658d4b55..37af30fefeca317a6fa1a32db84b6ee3500301a9 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -399,7 +399,7 @@ static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
sk->sk_ipv6only = 1;
/* Make sure we are allowed to bind here. */
- if (snum || !(inet->bind_address_no_port ||
+ if (snum || !(inet_test_bit(BIND_ADDRESS_NO_PORT, sk) ||
(flags & BIND_FORCE_ADDRESS_NO_PORT))) {
err = sk->sk_prot->get_port(sk, snum);
if (err) {
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 13/15] inet: move inet->defer_connect to inet->inet_flags
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (11 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 12/15] inet: move inet->bind_address_no_port " Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 14/15] inet: implement lockless IP_TTL Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 15/15] inet: implement lockless IP_MINTTL Eric Dumazet
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
Make room in struct inet_sock by removing this bit field,
using one available bit in inet_flags instead.
Also move local_port_range to fill the resulting hole,
saving 8 bytes on 64bit arches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/net/inet_sock.h | 10 ++++------
net/ipv4/af_inet.c | 4 ++--
net/ipv4/inet_diag.c | 2 +-
net/ipv4/tcp.c | 12 +++++++-----
net/ipv4/tcp_fastopen.c | 2 +-
net/mptcp/protocol.c | 12 ++++++++----
6 files changed, 23 insertions(+), 19 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 5eca2e70cbb2c16d26caa7f219ae53fe066ea3bd..acbb93d7607ab873783802b4be6a23f54e2086d3 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -229,21 +229,18 @@ struct inet_sock {
__u8 min_ttl;
__u8 mc_ttl;
__u8 pmtudisc;
- __u8 defer_connect:1; /* Indicates that fastopen_connect is set
- * and cookie exists so we defer connect
- * until first data frame is written
- */
__u8 rcv_tos;
__u8 convert_csum;
int uc_index;
int mc_index;
__be32 mc_addr;
- struct ip_mc_socklist __rcu *mc_list;
- struct inet_cork_full cork;
struct {
__u16 lo;
__u16 hi;
} local_port_range;
+
+ struct ip_mc_socklist __rcu *mc_list;
+ struct inet_cork_full cork;
};
#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
@@ -270,6 +267,7 @@ enum {
INET_FLAGS_IS_ICSK = 16,
INET_FLAGS_NODEFRAG = 17,
INET_FLAGS_BIND_ADDRESS_NO_PORT = 18,
+ INET_FLAGS_DEFER_CONNECT = 19,
};
/* cmsg flags for inet */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index c591f04eb6a9fc3b7b37a4b93b826a35488b9b50..3f4ac026b07ddcc8d5d8a791da363b56f2ce2746 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -646,7 +646,7 @@ int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
err = -EISCONN;
goto out;
case SS_CONNECTING:
- if (inet_sk(sk)->defer_connect)
+ if (inet_test_bit(DEFER_CONNECT, sk))
err = is_sendmsg ? -EINPROGRESS : -EISCONN;
else
err = -EALREADY;
@@ -669,7 +669,7 @@ int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
sock->state = SS_CONNECTING;
- if (!err && inet_sk(sk)->defer_connect)
+ if (!err && inet_test_bit(DEFER_CONNECT, sk))
goto out;
/* Just entered SS_CONNECTING state; the only
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 128966dea5540caaa94f6b87db4d3960d177caac..e13a84433413ed88088435ff8e11efeb30fc3cca 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -192,7 +192,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
inet_sockopt.nodefrag = inet_test_bit(NODEFRAG, sk);
inet_sockopt.bind_address_no_port = inet_test_bit(BIND_ADDRESS_NO_PORT, sk);
inet_sockopt.recverr_rfc4884 = inet_test_bit(RECVERR_RFC4884, sk);
- inet_sockopt.defer_connect = inet->defer_connect;
+ inet_sockopt.defer_connect = inet_test_bit(DEFER_CONNECT, sk);
if (nla_put(skb, INET_DIAG_SOCKOPT, sizeof(inet_sockopt),
&inet_sockopt))
goto errout;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4fbc7ff8c53c05cbef3d108527239c7ec8c1363e..cee1e548660cb93835102836fe8103666c4c4697 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -583,7 +583,8 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
if (urg_data & TCP_URG_VALID)
mask |= EPOLLPRI;
- } else if (state == TCP_SYN_SENT && inet_sk(sk)->defer_connect) {
+ } else if (state == TCP_SYN_SENT &&
+ inet_test_bit(DEFER_CONNECT, sk)) {
/* Active TCP fastopen socket with defer_connect
* Return EPOLLOUT so application can call write()
* in order for kernel to generate SYN+data
@@ -1007,7 +1008,7 @@ int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied,
tp->fastopen_req->size = size;
tp->fastopen_req->uarg = uarg;
- if (inet->defer_connect) {
+ if (inet_test_bit(DEFER_CONNECT, sk)) {
err = tcp_connect(sk);
/* Same failure procedure as in tcp_v4/6_connect */
if (err) {
@@ -1025,7 +1026,7 @@ int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied,
if (tp->fastopen_req) {
*copied = tp->fastopen_req->copied;
tcp_free_fastopen_req(tp);
- inet->defer_connect = 0;
+ inet_clear_bit(DEFER_CONNECT, sk);
}
return err;
}
@@ -1066,7 +1067,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
zc = MSG_SPLICE_PAGES;
}
- if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) &&
+ if (unlikely(flags & MSG_FASTOPEN ||
+ inet_test_bit(DEFER_CONNECT, sk)) &&
!tp->repair) {
err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size, uarg);
if (err == -EINPROGRESS && copied_syn > 0)
@@ -3088,7 +3090,7 @@ int tcp_disconnect(struct sock *sk, int flags)
/* Clean up fastopen related fields */
tcp_free_fastopen_req(tp);
- inet->defer_connect = 0;
+ inet_clear_bit(DEFER_CONNECT, sk);
tp->fastopen_client_fail = 0;
WARN_ON(inet->inet_num && !icsk->icsk_bind_hash);
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 85e4953f118215ba7100931dccb37ad871c5dfd2..8ed54e7334a9c646dfbbc6dc41b9ef11b925de0a 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -451,7 +451,7 @@ bool tcp_fastopen_defer_connect(struct sock *sk, int *err)
if (tp->fastopen_connect && !tp->fastopen_req) {
if (tcp_fastopen_cookie_check(sk, &mss, &cookie)) {
- inet_sk(sk)->defer_connect = 1;
+ inet_set_bit(DEFER_CONNECT, sk);
return true;
}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 48e649fe2360daf3939fccb0f9ec1a2398670a04..2332e1c4ec7b52a12a1c29d41064b6d8277f864e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1690,7 +1690,7 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
if (!mptcp_disconnect(sk, 0))
sk->sk_socket->state = SS_UNCONNECTED;
}
- inet_sk(sk)->defer_connect = 0;
+ inet_clear_bit(DEFER_CONNECT, sk);
return ret;
}
@@ -1708,7 +1708,8 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
lock_sock(sk);
- if (unlikely(inet_sk(sk)->defer_connect || msg->msg_flags & MSG_FASTOPEN)) {
+ if (unlikely(inet_test_bit(DEFER_CONNECT, sk) ||
+ msg->msg_flags & MSG_FASTOPEN)) {
int copied_syn = 0;
ret = mptcp_sendmsg_fastopen(sk, msg, len, &copied_syn);
@@ -3618,7 +3619,9 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
err = __inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK, 1);
else
err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
- inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
+
+ inet_assign_bit(DEFER_CONNECT, sk,
+ inet_test_bit(DEFER_CONNECT, ssock->sk));
/* on successful connect, the msk state will be moved to established by
* subflow_finish_connect()
@@ -3837,7 +3840,8 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
mask |= EPOLLOUT | EPOLLWRNORM;
else
mask |= mptcp_check_writeable(msk);
- } else if (state == TCP_SYN_SENT && inet_sk(sk)->defer_connect) {
+ } else if (state == TCP_SYN_SENT &&
+ inet_test_bit(DEFER_CONNECT, sk)) {
/* cf tcp_poll() note about TFO */
mask |= EPOLLOUT | EPOLLWRNORM;
}
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 14/15] inet: implement lockless IP_TTL
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (12 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 13/15] inet: move inet->defer_connect " Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 15/15] inet: implement lockless IP_MINTTL Eric Dumazet
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
ip_select_ttl() is racy, because it reads inet->uc_ttl
without proper locking.
Add READ_ONCE()/WRITE_ONCE() annotations while
allowing IP_TTL socket option to be set/read without
holding the socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
net/ipv4/ip_output.c | 2 +-
net/ipv4/ip_sockglue.c | 27 ++++++++++++---------------
2 files changed, 13 insertions(+), 16 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8f396eada1b6e61ab174473e9859bc62a10a0d1c..ce6257860a4019d01e28d57d3ce4981fe79d0a0e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -133,7 +133,7 @@ EXPORT_SYMBOL_GPL(ip_local_out);
static inline int ip_select_ttl(const struct inet_sock *inet,
const struct dst_entry *dst)
{
- int ttl = inet->uc_ttl;
+ int ttl = READ_ONCE(inet->uc_ttl);
if (ttl < 0)
ttl = ip4_dst_hoplimit(dst);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index cfa65a0b0900f2f77bfd800f105ea079e2afff7c..dbb2d2342ebf0c1f1366ee6b6b2158a6118b2659 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1023,6 +1023,13 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
case IP_BIND_ADDRESS_NO_PORT:
inet_assign_bit(BIND_ADDRESS_NO_PORT, sk, val);
return 0;
+ case IP_TTL:
+ if (optlen < 1)
+ return -EINVAL;
+ if (val != -1 && (val < 1 || val > 255))
+ return -EINVAL;
+ WRITE_ONCE(inet->uc_ttl, val);
+ return 0;
}
err = 0;
@@ -1080,13 +1087,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
case IP_TOS: /* This sets both TOS and Precedence */
__ip_sock_set_tos(sk, val);
break;
- case IP_TTL:
- if (optlen < 1)
- goto e_inval;
- if (val != -1 && (val < 1 || val > 255))
- goto e_inval;
- inet->uc_ttl = val;
- break;
case IP_MTU_DISCOVER:
if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_OMIT)
goto e_inval;
@@ -1590,6 +1590,11 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_BIND_ADDRESS_NO_PORT:
val = inet_test_bit(BIND_ADDRESS_NO_PORT, sk);
goto copyval;
+ case IP_TTL:
+ val = READ_ONCE(inet->uc_ttl);
+ if (val < 0)
+ val = READ_ONCE(sock_net(sk)->ipv4.sysctl_ip_default_ttl);
+ goto copyval;
}
if (needs_rtnl)
@@ -1629,14 +1634,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_TOS:
val = inet->tos;
break;
- case IP_TTL:
- {
- struct net *net = sock_net(sk);
- val = (inet->uc_ttl == -1 ?
- READ_ONCE(net->ipv4.sysctl_ip_default_ttl) :
- inet->uc_ttl);
- break;
- }
case IP_MTU_DISCOVER:
val = inet->pmtudisc;
break;
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 net-next 15/15] inet: implement lockless IP_MINTTL
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
` (13 preceding siblings ...)
2023-08-11 7:36 ` [PATCH v2 net-next 14/15] inet: implement lockless IP_TTL Eric Dumazet
@ 2023-08-11 7:36 ` Eric Dumazet
14 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 7:36 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet,
Eric Dumazet
inet->min_ttl is already read with READ_ONCE().
Implementing IP_MINTTL socket option set/read
without holding the socket lock is easy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
net/ipv4/ip_sockglue.c | 32 ++++++++++++++------------------
1 file changed, 14 insertions(+), 18 deletions(-)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index dbb2d2342ebf0c1f1366ee6b6b2158a6118b2659..61b2e7bc7031501ff5a3ebeffc3f90be180fa09e 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1030,6 +1030,17 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
WRITE_ONCE(inet->uc_ttl, val);
return 0;
+ case IP_MINTTL:
+ if (optlen < 1)
+ return -EINVAL;
+ if (val < 0 || val > 255)
+ return -EINVAL;
+
+ if (val)
+ static_branch_enable(&ip4_min_ttl);
+
+ WRITE_ONCE(inet->min_ttl, val);
+ return 0;
}
err = 0;
@@ -1326,21 +1337,6 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname,
err = xfrm_user_policy(sk, optname, optval, optlen);
break;
- case IP_MINTTL:
- if (optlen < 1)
- goto e_inval;
- if (val < 0 || val > 255)
- goto e_inval;
-
- if (val)
- static_branch_enable(&ip4_min_ttl);
-
- /* tcp_v4_err() and tcp_v4_rcv() might read min_ttl
- * while we are changint it.
- */
- WRITE_ONCE(inet->min_ttl, val);
- break;
-
case IP_LOCAL_PORT_RANGE:
{
const __u16 lo = val;
@@ -1595,6 +1591,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
if (val < 0)
val = READ_ONCE(sock_net(sk)->ipv4.sysctl_ip_default_ttl);
goto copyval;
+ case IP_MINTTL:
+ val = READ_ONCE(inet->min_ttl);
+ goto copyval;
}
if (needs_rtnl)
@@ -1731,9 +1730,6 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
len -= msg.msg_controllen;
return copy_to_sockptr(optlen, &len, sizeof(int));
}
- case IP_MINTTL:
- val = inet->min_ttl;
- break;
case IP_LOCAL_PORT_RANGE:
val = inet->local_port_range.hi << 16 | inet->local_port_range.lo;
break;
--
2.41.0.640.ga95def55d0-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 net-next 07/15] inet: move inet->mc_loop to inet->inet_frags
2023-08-11 7:36 ` [PATCH v2 net-next 07/15] inet: move inet->mc_loop to inet->inet_frags Eric Dumazet
@ 2023-08-11 14:10 ` kernel test robot
2023-08-11 15:48 ` Eric Dumazet
0 siblings, 1 reply; 18+ messages in thread
From: kernel test robot @ 2023-08-11 14:10 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: oe-kbuild-all, Simon Horman, Soheil Hassas Yeganeh, netdev,
eric.dumazet, Eric Dumazet
Hi Eric,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/inet-introduce-inet-inet_flags/20230811-154157
base: net-next/main
patch link: https://lore.kernel.org/r/20230811073621.2874702-8-edumazet%40google.com
patch subject: [PATCH v2 net-next 07/15] inet: move inet->mc_loop to inet->inet_frags
config: nios2-randconfig-r001-20230811 (https://download.01.org/0day-ci/archive/20230811/202308112211.xpcXWwEP-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230811/202308112211.xpcXWwEP-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308112211.xpcXWwEP-lkp@intel.com/
All errors (new ones prefixed by >>):
net/netfilter/ipvs/ip_vs_sync.c: In function 'set_mcast_loop':
>> net/netfilter/ipvs/ip_vs_sync.c:1304:13: error: 'struct inet_sock' has no member named 'mc_loop'
1304 | inet->mc_loop = loop ? 1 : 0;
| ^~
vim +1304 net/netfilter/ipvs/ip_vs_sync.c
1c003b1580e20f net/netfilter/ipvs/ip_vs_sync.c Pablo Neira Ayuso 2012-05-08 1294
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1295 /*
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1296 * Setup loopback of outgoing multicasts on a sending socket
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1297 */
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1298 static void set_mcast_loop(struct sock *sk, u_char loop)
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1299 {
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1300 struct inet_sock *inet = inet_sk(sk);
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1301
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1302 /* setsockopt(sock, SOL_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)); */
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1303 lock_sock(sk);
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 @1304 inet->mc_loop = loop ? 1 : 0;
d33288172e72c4 net/netfilter/ipvs/ip_vs_sync.c Julian Anastasov 2015-07-26 1305 #ifdef CONFIG_IP_VS_IPV6
d33288172e72c4 net/netfilter/ipvs/ip_vs_sync.c Julian Anastasov 2015-07-26 1306 if (sk->sk_family == AF_INET6) {
d33288172e72c4 net/netfilter/ipvs/ip_vs_sync.c Julian Anastasov 2015-07-26 1307 struct ipv6_pinfo *np = inet6_sk(sk);
d33288172e72c4 net/netfilter/ipvs/ip_vs_sync.c Julian Anastasov 2015-07-26 1308
d33288172e72c4 net/netfilter/ipvs/ip_vs_sync.c Julian Anastasov 2015-07-26 1309 /* IPV6_MULTICAST_LOOP */
d33288172e72c4 net/netfilter/ipvs/ip_vs_sync.c Julian Anastasov 2015-07-26 1310 np->mc_loop = loop ? 1 : 0;
d33288172e72c4 net/netfilter/ipvs/ip_vs_sync.c Julian Anastasov 2015-07-26 1311 }
d33288172e72c4 net/netfilter/ipvs/ip_vs_sync.c Julian Anastasov 2015-07-26 1312 #endif
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1313 release_sock(sk);
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1314 }
^1da177e4c3f41 net/ipv4/ipvs/ip_vs_sync.c Linus Torvalds 2005-04-16 1315
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 net-next 07/15] inet: move inet->mc_loop to inet->inet_frags
2023-08-11 14:10 ` kernel test robot
@ 2023-08-11 15:48 ` Eric Dumazet
0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-08-11 15:48 UTC (permalink / raw)
To: kernel test robot
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, oe-kbuild-all,
Simon Horman, Soheil Hassas Yeganeh, netdev, eric.dumazet
On Fri, Aug 11, 2023 at 4:11 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Eric,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on net-next/main]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/inet-introduce-inet-inet_flags/20230811-154157
> base: net-next/main
> patch link: https://lore.kernel.org/r/20230811073621.2874702-8-edumazet%40google.com
> patch subject: [PATCH v2 net-next 07/15] inet: move inet->mc_loop to inet->inet_frags
> config: nios2-randconfig-r001-20230811 (https://download.01.org/0day-ci/archive/20230811/202308112211.xpcXWwEP-lkp@intel.com/config)
> compiler: nios2-linux-gcc (GCC) 12.3.0
> reproduce: (https://download.01.org/0day-ci/archive/20230811/202308112211.xpcXWwEP-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202308112211.xpcXWwEP-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> net/netfilter/ipvs/ip_vs_sync.c: In function 'set_mcast_loop':
> >> net/netfilter/ipvs/ip_vs_sync.c:1304:13: error: 'struct inet_sock' has no member named 'mc_loop'
> 1304 | inet->mc_loop = loop ? 1 : 0;
> | ^~
>
>
> vim +1304 net/netfilter/ipvs/ip_vs_sync.c
Oh right, thanks.
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2023-08-11 15:48 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-11 7:36 [PATCH v2 net-next 00/15] inet: socket lock and data-races avoidance Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 01/15] inet: introduce inet->inet_flags Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 02/15] inet: set/get simple options locklessly Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 03/15] inet: move inet->recverr to inet->inet_flags Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 04/15] inet: move inet->recverr_rfc4884 " Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 05/15] inet: move inet->freebind " Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 06/15] inet: move inet->hdrincl " Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 07/15] inet: move inet->mc_loop to inet->inet_frags Eric Dumazet
2023-08-11 14:10 ` kernel test robot
2023-08-11 15:48 ` Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 08/15] inet: move inet->mc_all " Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 09/15] inet: move inet->transparent to inet->inet_flags Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 10/15] inet: move inet->is_icsk " Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 11/15] inet: move inet->nodefrag " Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 12/15] inet: move inet->bind_address_no_port " Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 13/15] inet: move inet->defer_connect " Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 14/15] inet: implement lockless IP_TTL Eric Dumazet
2023-08-11 7:36 ` [PATCH v2 net-next 15/15] inet: implement lockless IP_MINTTL Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).