* [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields.
@ 2024-06-03 14:32 Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
` (14 more replies)
0 siblings, 15 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
The patch 1 fixes a bug that SOCK_DGRAM's sk->sk_state is changed
to TCP_CLOSE even if the socket is connect()ed to another socket.
The rest of this series annotates lockless accesses to the following
fields:
* sk->sk_state
* sk->sk_sndbuf
* net->unx.sysctl_max_dgram_qlen
* sk->sk_receive_queue.qlen
* sk->sk_shutdown
Note that with this series there is skb_queue_empty() left in
unix_dgram_disconnected() that needs to be changed to lockless
version, and unix_peer(other) access there should be protected
by unix_state_lock().
This will require some refactoring, so another series will follow.
Kuniyuki Iwashima (15):
af_unix: Set sk->sk_state under unix_state_lock() for truly
disconencted peer.
af_unix: Annodate data-races around sk->sk_state for writers.
af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
af_unix: Annotate data-races around sk->sk_state in unix_write_space()
and poll().
af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
af_unix: Annotate data-race of sk->sk_state in unix_accept().
af_unix: Annotate data-races around sk->sk_state in sendmsg() and
recvmsg().
af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
af_unix: Annotate data-races around sk->sk_sndbuf.
af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
net/unix/af_unix.c | 90 +++++++++++++++++++++++-----------------------
net/unix/diag.c | 12 +++----
2 files changed, 50 insertions(+), 52 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 16:26 ` Cong Wang
2024-06-03 14:32 ` [PATCH v1 net 02/15] af_unix: Annodate data-races around sk->sk_state for writers Kuniyuki Iwashima
` (13 subsequent siblings)
14 siblings, 1 reply; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, Cong Wang
When a SOCK_DGRAM socket connect()s to another socket, the both sockets'
sk->sk_state are changed to TCP_ESTABLISHED so that we can register them
to BPF SOCKMAP.
When the socket disconnects from the peer by connect(AF_UNSPEC), the state
is set back to TCP_CLOSE.
Then, the peer's state is also set to TCP_CLOSE, but the update is done
locklessly and unconditionally.
Let's say socket A connect()ed to B, B connect()ed to C, and A disconnects
from B.
After the first two connect()s, all three sockets' sk->sk_state are
TCP_ESTABLISHED:
$ ss -xa
Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
u_dgr ESTAB 0 0 @A 641 * 642
u_dgr ESTAB 0 0 @B 642 * 643
u_dgr ESTAB 0 0 @C 643 * 0
And after the disconnect, B's state is TCP_CLOSE even though it's still
connected to C and C's state is TCP_ESTABLISHED.
$ ss -xa
Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
u_dgr UNCONN 0 0 @A 641 * 642
u_dgr UNCONN 0 0 @B 642 * 643
u_dgr ESTAB 0 0 @C 643 * 0
In this case, we cannot register B to SOCKMAP.
So, when a socket disconnects from the peer, we should not set TCP_CLOSE to
the peer if the peer is connected to yet another socket, and this must be
done under unix_state_lock().
Note that we use WRITE_ONCE() for sk->sk_state as there are many lockless
readers. These data-races will be fixed in the following patches.
Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
CC: Cong Wang <cong.wang@bytedance.com>
---
net/unix/af_unix.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 25b49efc0926..541216382cf5 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -570,7 +570,6 @@ static void unix_dgram_disconnected(struct sock *sk, struct sock *other)
sk_error_report(other);
}
}
- other->sk_state = TCP_CLOSE;
}
static void unix_sock_destructor(struct sock *sk)
@@ -1424,8 +1423,15 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
unix_state_double_unlock(sk, other);
- if (other != old_peer)
+ if (other != old_peer) {
unix_dgram_disconnected(sk, old_peer);
+
+ unix_state_lock(old_peer);
+ if (!unix_peer(old_peer))
+ WRITE_ONCE(old_peer->sk_state, TCP_CLOSE);
+ unix_state_lock(old_peer);
+ }
+
sock_put(old_peer);
} else {
unix_peer(sk) = other;
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 02/15] af_unix: Annodate data-races around sk->sk_state for writers.
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 03/15] af_unix: Annotate data-race of sk->sk_state in unix_inq_len() Kuniyuki Iwashima
` (12 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
sk->sk_state is changed under unix_state_lock(), but it's read locklessly
in many places.
This patch adds WRITE_ONCE() on the writer side.
We will add READ_ONCE() to the lockless readers in the following patches.
Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 541216382cf5..124ca3af1452 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -616,7 +616,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
u->path.dentry = NULL;
u->path.mnt = NULL;
state = sk->sk_state;
- sk->sk_state = TCP_CLOSE;
+ WRITE_ONCE(sk->sk_state, TCP_CLOSE);
skpair = unix_peer(sk);
unix_peer(sk) = NULL;
@@ -738,7 +738,8 @@ static int unix_listen(struct socket *sock, int backlog)
if (backlog > sk->sk_max_ack_backlog)
wake_up_interruptible_all(&u->peer_wait);
sk->sk_max_ack_backlog = backlog;
- sk->sk_state = TCP_LISTEN;
+ WRITE_ONCE(sk->sk_state, TCP_LISTEN);
+
/* set credentials so connect can copy them */
init_peercred(sk);
err = 0;
@@ -1401,7 +1402,8 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
if (err)
goto out_unlock;
- sk->sk_state = other->sk_state = TCP_ESTABLISHED;
+ WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED);
+ WRITE_ONCE(other->sk_state, TCP_ESTABLISHED);
} else {
/*
* 1003.1g breaking connected state with AF_UNSPEC
@@ -1418,7 +1420,7 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
unix_peer(sk) = other;
if (!other)
- sk->sk_state = TCP_CLOSE;
+ WRITE_ONCE(sk->sk_state, TCP_CLOSE);
unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
unix_state_double_unlock(sk, other);
@@ -1639,7 +1641,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
copy_peercred(sk, other);
sock->state = SS_CONNECTED;
- sk->sk_state = TCP_ESTABLISHED;
+ WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED);
sock_hold(newsk);
smp_mb__after_atomic(); /* sock_hold() does an atomic_inc() */
@@ -2050,7 +2052,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
unix_peer(sk) = NULL;
unix_dgram_peer_wake_disconnect_wakeup(sk, other);
- sk->sk_state = TCP_CLOSE;
+ WRITE_ONCE(sk->sk_state, TCP_CLOSE);
unix_state_unlock(sk);
unix_dgram_disconnected(sk, other);
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 03/15] af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 02/15] af_unix: Annodate data-races around sk->sk_state for writers Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 04/15] af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll() Kuniyuki Iwashima
` (11 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
ioctl(SIOCINQ) calls unix_inq_len() that checks sk->sk_state first
and returns -EINVAL if it's TCP_LISTEN.
Then, for SOCK_STREAM sockets, unix_inq_len() returns the number of
bytes in recvq.
However, unix_inq_len() does not hold unix_state_lock(), and the
concurrent listen() might change the state after checking sk->sk_state.
If the race occurs, 0 is returned for the listener, instead of -EINVAL,
because the length of skb with embryo is 0.
We could hold unix_state_lock() in unix_inq_len(), but it's overkill
given the result is true for pre-listen() TCP_CLOSE state.
So, let's use READ_ONCE() for sk->sk_state in unix_inq_len().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 124ca3af1452..dfb74822ed2e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3017,7 +3017,7 @@ long unix_inq_len(struct sock *sk)
struct sk_buff *skb;
long amount = 0;
- if (sk->sk_state == TCP_LISTEN)
+ if (READ_ONCE(sk->sk_state) == TCP_LISTEN)
return -EINVAL;
spin_lock(&sk->sk_receive_queue.lock);
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 04/15] af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (2 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 03/15] af_unix: Annotate data-race of sk->sk_state in unix_inq_len() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 05/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_connect() Kuniyuki Iwashima
` (10 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
unix_poll() and unix_dgram_poll() read sk->sk_state locklessly and
calls unix_writable() which also reads sk->sk_state without holding
unix_state_lock().
Let's use READ_ONCE() in unix_poll() and unix_dgram_poll() and pass
it to unix_writable().
While at it, we remove TCP_SYN_SENT check in unix_dgram_poll() as
that state does not exist for AF_UNIX socket since the code was added.
Fixes: 1586a5877db9 ("af_unix: do not report POLLOUT on listeners")
Fixes: 3c73419c09a5 ("af_unix: fix 'poll for write'/ connected DGRAM sockets")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index dfb74822ed2e..e7a756e8711d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -530,9 +530,9 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
return 0;
}
-static int unix_writable(const struct sock *sk)
+static int unix_writable(const struct sock *sk, unsigned char state)
{
- return sk->sk_state != TCP_LISTEN &&
+ return state != TCP_LISTEN &&
(refcount_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
}
@@ -541,7 +541,7 @@ static void unix_write_space(struct sock *sk)
struct socket_wq *wq;
rcu_read_lock();
- if (unix_writable(sk)) {
+ if (unix_writable(sk, READ_ONCE(sk->sk_state))) {
wq = rcu_dereference(sk->sk_wq);
if (skwq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait,
@@ -3129,12 +3129,14 @@ static int unix_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon
static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wait)
{
struct sock *sk = sock->sk;
+ unsigned char state;
__poll_t mask;
u8 shutdown;
sock_poll_wait(file, sock, wait);
mask = 0;
shutdown = READ_ONCE(sk->sk_shutdown);
+ state = READ_ONCE(sk->sk_state);
/* exceptional events? */
if (READ_ONCE(sk->sk_err))
@@ -3156,14 +3158,14 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
/* Connection-based need to check for termination and startup */
if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) &&
- sk->sk_state == TCP_CLOSE)
+ state == TCP_CLOSE)
mask |= EPOLLHUP;
/*
* we set writable also when the other side has shut down the
* connection. This prevents stuck sockets.
*/
- if (unix_writable(sk))
+ if (unix_writable(sk, state))
mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;
return mask;
@@ -3174,12 +3176,14 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
{
struct sock *sk = sock->sk, *other;
unsigned int writable;
+ unsigned char state;
__poll_t mask;
u8 shutdown;
sock_poll_wait(file, sock, wait);
mask = 0;
shutdown = READ_ONCE(sk->sk_shutdown);
+ state = READ_ONCE(sk->sk_state);
/* exceptional events? */
if (READ_ONCE(sk->sk_err) ||
@@ -3199,19 +3203,14 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
mask |= EPOLLIN | EPOLLRDNORM;
/* Connection-based need to check for termination and startup */
- if (sk->sk_type == SOCK_SEQPACKET) {
- if (sk->sk_state == TCP_CLOSE)
- mask |= EPOLLHUP;
- /* connection hasn't started yet? */
- if (sk->sk_state == TCP_SYN_SENT)
- return mask;
- }
+ if (sk->sk_type == SOCK_SEQPACKET && state == TCP_CLOSE)
+ mask |= EPOLLHUP;
/* No write status requested, avoid expensive OUT tests. */
if (!(poll_requested_events(wait) & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
return mask;
- writable = unix_writable(sk);
+ writable = unix_writable(sk, state);
if (writable) {
unix_state_lock(sk);
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 05/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (3 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 04/15] af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 06/15] af_unix: Annotate data-race of sk->sk_state in unix_accept() Kuniyuki Iwashima
` (9 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
As small optimisation, unix_stream_connect() prefetches the client's
sk->sk_state without unix_state_lock() and checks if it's TCP_CLOSE.
Later, sk->sk_state is checked again under unix_state_lock().
Let's use READ_ONCE() for the first check and TCP_CLOSE directly for
the second check.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e7a756e8711d..c01ca584a014 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1481,7 +1481,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
struct sk_buff *skb = NULL;
long timeo;
int err;
- int st;
err = unix_validate_addr(sunaddr, addr_len);
if (err)
@@ -1571,9 +1570,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
Well, and we have to recheck the state after socket locked.
*/
- st = sk->sk_state;
-
- switch (st) {
+ switch (READ_ONCE(sk->sk_state)) {
case TCP_CLOSE:
/* This is ok... continue with connect */
break;
@@ -1588,7 +1585,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
unix_state_lock_nested(sk, U_LOCK_SECOND);
- if (sk->sk_state != st) {
+ if (sk->sk_state != TCP_CLOSE) {
unix_state_unlock(sk);
unix_state_unlock(other);
sock_put(other);
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 06/15] af_unix: Annotate data-race of sk->sk_state in unix_accept().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (4 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 05/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_connect() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 07/15] af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg() Kuniyuki Iwashima
` (8 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
Once sk->sk_state is changed to TCP_LISTEN, it never changes.
unix_accept() takes the advantage and reads sk->sk_state without
holding unix_state_lock().
Let's use READ_ONCE() there.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index c01ca584a014..a97f4305b74f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1710,7 +1710,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock,
goto out;
arg->err = -EINVAL;
- if (sk->sk_state != TCP_LISTEN)
+ if (READ_ONCE(sk->sk_state) != TCP_LISTEN)
goto out;
/* If socket state is TCP_LISTEN it cannot change (for now...),
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 07/15] af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (5 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 06/15] af_unix: Annotate data-race of sk->sk_state in unix_accept() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 08/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb() Kuniyuki Iwashima
` (7 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
The following functions read sk->sk_state locklessly and proceed only if
the state is TCP_ESTABLISHED.
* unix_stream_sendmsg
* unix_stream_read_generic
* unix_seqpacket_sendmsg
* unix_seqpacket_recvmsg
Let's use READ_ONCE() there.
Fixes: a05d2ad1c1f3 ("af_unix: Only allow recv on connected seqpacket sockets.")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a97f4305b74f..43605bed0ef7 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2226,7 +2226,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
}
if (msg->msg_namelen) {
- err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP;
+ err = READ_ONCE(sk->sk_state) == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP;
goto out_err;
} else {
err = -ENOTCONN;
@@ -2340,7 +2340,7 @@ static int unix_seqpacket_sendmsg(struct socket *sock, struct msghdr *msg,
if (err)
return err;
- if (sk->sk_state != TCP_ESTABLISHED)
+ if (READ_ONCE(sk->sk_state) != TCP_ESTABLISHED)
return -ENOTCONN;
if (msg->msg_namelen)
@@ -2354,7 +2354,7 @@ static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg,
{
struct sock *sk = sock->sk;
- if (sk->sk_state != TCP_ESTABLISHED)
+ if (READ_ONCE(sk->sk_state) != TCP_ESTABLISHED)
return -ENOTCONN;
return unix_dgram_recvmsg(sock, msg, size, flags);
@@ -2683,7 +2683,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
size_t size = state->size;
unsigned int last_len;
- if (unlikely(sk->sk_state != TCP_ESTABLISHED)) {
+ if (unlikely(READ_ONCE(sk->sk_state) != TCP_ESTABLISHED)) {
err = -EINVAL;
goto out;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 08/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (6 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 07/15] af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 09/15] af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG Kuniyuki Iwashima
` (6 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
unix_stream_read_skb() is called from sk->sk_data_ready() context
where unix_state_lock() is not held.
Let's use READ_ONCE() there.
Fixes: 77462de14a43 ("af_unix: Add read_sock for stream socket types")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 43605bed0ef7..8eadc6464301 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2659,7 +2659,7 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
{
- if (unlikely(sk->sk_state != TCP_ESTABLISHED))
+ if (unlikely(READ_ONCE(sk->sk_state) != TCP_ESTABLISHED))
return -ENOTCONN;
return unix_read_skb(sk, recv_actor);
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 09/15] af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (7 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 08/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 10/15] af_unix: Annotate data-races around sk->sk_sndbuf Kuniyuki Iwashima
` (5 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read
locklessly.
Let's use READ_ONCE() there.
Note that the result could be inconsistent if the socket is dumped
during the state change. This is common for other SOCK_DIAG and
similar interfaces.
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA")
Fixes: 45a96b9be6ec ("unix_diag: Dumping all sockets core")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/diag.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/unix/diag.c b/net/unix/diag.c
index ae39538c5042..116cf508aea4 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -65,7 +65,7 @@ static int sk_diag_dump_icons(struct sock *sk, struct sk_buff *nlskb)
u32 *buf;
int i;
- if (sk->sk_state == TCP_LISTEN) {
+ if (READ_ONCE(sk->sk_state) == TCP_LISTEN) {
spin_lock(&sk->sk_receive_queue.lock);
attr = nla_reserve(nlskb, UNIX_DIAG_ICONS,
@@ -103,7 +103,7 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
{
struct unix_diag_rqlen rql;
- if (sk->sk_state == TCP_LISTEN) {
+ if (READ_ONCE(sk->sk_state) == TCP_LISTEN) {
rql.udiag_rqueue = sk->sk_receive_queue.qlen;
rql.udiag_wqueue = sk->sk_max_ack_backlog;
} else {
@@ -136,7 +136,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r
rep = nlmsg_data(nlh);
rep->udiag_family = AF_UNIX;
rep->udiag_type = sk->sk_type;
- rep->udiag_state = sk->sk_state;
+ rep->udiag_state = READ_ONCE(sk->sk_state);
rep->pad = 0;
rep->udiag_ino = sk_ino;
sock_diag_save_cookie(sk, rep->udiag_cookie);
@@ -215,7 +215,7 @@ static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
sk_for_each(sk, &net->unx.table.buckets[slot]) {
if (num < s_num)
goto next;
- if (!(req->udiag_states & (1 << sk->sk_state)))
+ if (!(req->udiag_states & (1 << READ_ONCE(sk->sk_state))))
goto next;
if (sk_diag_dump(sk, skb, req, sk_user_ns(skb->sk),
NETLINK_CB(cb->skb).portid,
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 10/15] af_unix: Annotate data-races around sk->sk_sndbuf.
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (8 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 09/15] af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 11/15] af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen Kuniyuki Iwashima
` (4 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's
not used in af_unix.c.
Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(),
unix_dgram_sendmsg(), and unix_stream_sendmsg().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 8eadc6464301..e99b94fc80b3 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -533,7 +533,7 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
static int unix_writable(const struct sock *sk, unsigned char state)
{
return state != TCP_LISTEN &&
- (refcount_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
+ (refcount_read(&sk->sk_wmem_alloc) << 2) <= READ_ONCE(sk->sk_sndbuf);
}
static void unix_write_space(struct sock *sk)
@@ -1967,7 +1967,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
}
err = -EMSGSIZE;
- if (len > sk->sk_sndbuf - 32)
+ if (len > READ_ONCE(sk->sk_sndbuf) - 32)
goto out;
if (len > SKB_MAX_ALLOC) {
@@ -2247,7 +2247,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
&err, 0);
} else {
/* Keep two messages in the pipe so it schedules better */
- size = min_t(int, size, (sk->sk_sndbuf >> 1) - 64);
+ size = min_t(int, size, (READ_ONCE(sk->sk_sndbuf) >> 1) - 64);
/* allow fallback to order-0 allocations */
size = min_t(int, size, SKB_MAX_HEAD(0) + UNIX_SKB_FRAGS_SZ);
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 11/15] af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (9 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 10/15] af_unix: Annotate data-races around sk->sk_sndbuf Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 12/15] af_unix: Use unix_recvq_full_lockless() in unix_stream_connect() Kuniyuki Iwashima
` (3 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be
changed concurrently.
Let's use READ_ONCE() in unix_create1().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e99b94fc80b3..6773f1465e50 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -976,7 +976,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
sk->sk_hash = unix_unbound_hash(sk);
sk->sk_allocation = GFP_KERNEL_ACCOUNT;
sk->sk_write_space = unix_write_space;
- sk->sk_max_ack_backlog = net->unx.sysctl_max_dgram_qlen;
+ sk->sk_max_ack_backlog = READ_ONCE(net->unx.sysctl_max_dgram_qlen);
sk->sk_destruct = unix_sock_destructor;
u = unix_sk(sk);
u->listener = NULL;
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 12/15] af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (10 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 11/15] af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 13/15] af_unix: Use skb_queue_empty_lockless() in unix_release_sock() Kuniyuki Iwashima
` (2 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
Once sk->sk_state is changed to TCP_LISTEN, it never changes.
unix_accept() takes advantage of this characteristics; it does not
hold the listener's unix_state_lock() and only acquires recvq lock
to pop one skb.
It means unix_state_lock() does not prevent the queue length from
changing in unix_stream_connect().
Thus, we need to use unix_recvq_full_lockless() to avoid data-race.
Now we remove unix_recvq_full() as no one uses it.
Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in
unix_recvq_full_lockless() because of the following reasons:
(1) For SOCK_DGRAM, it is a written-once field in unix_create1()
(2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the
listener's unix_state_lock() in unix_listen(), and we hold
the lock in unix_stream_connect()
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 6773f1465e50..88cda8af30d4 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -221,15 +221,9 @@ static inline int unix_may_send(struct sock *sk, struct sock *osk)
return unix_peer(osk) == NULL || unix_our_peer(sk, osk);
}
-static inline int unix_recvq_full(const struct sock *sk)
-{
- return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
-}
-
static inline int unix_recvq_full_lockless(const struct sock *sk)
{
- return skb_queue_len_lockless(&sk->sk_receive_queue) >
- READ_ONCE(sk->sk_max_ack_backlog);
+ return skb_queue_len_lockless(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
}
struct sock *unix_peer_get(struct sock *s)
@@ -1545,7 +1539,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
if (other->sk_shutdown & RCV_SHUTDOWN)
goto out_unlock;
- if (unix_recvq_full(other)) {
+ if (unix_recvq_full_lockless(other)) {
err = -EAGAIN;
if (!timeo)
goto out_unlock;
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 13/15] af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (11 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 12/15] af_unix: Use unix_recvq_full_lockless() in unix_stream_connect() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 14/15] af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 15/15] af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill() Kuniyuki Iwashima
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock()
checks the length of the peer socket's recvq under unix_state_lock().
However, unix_stream_read_generic() calls skb_unlink() after releasing
the lock. Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks
skb without unix_state_lock().
Thues, unix_state_lock() does not protect qlen.
Let's use skb_queue_empty_lockless() in unix_release_sock().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 88cda8af30d4..0c49c34551e6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -631,7 +631,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
unix_state_lock(skpair);
/* No more writes */
WRITE_ONCE(skpair->sk_shutdown, SHUTDOWN_MASK);
- if (!skb_queue_empty(&sk->sk_receive_queue) || embrion)
+ if (!skb_queue_empty_lockless(&sk->sk_receive_queue) || embrion)
WRITE_ONCE(skpair->sk_err, ECONNRESET);
unix_state_unlock(skpair);
skpair->sk_state_change(skpair);
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 14/15] af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (12 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 13/15] af_unix: Use skb_queue_empty_lockless() in unix_release_sock() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 15/15] af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill() Kuniyuki Iwashima
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
We can dump the socket queue length via UNIX_DIAG by specifying
UDIAG_SHOW_RQLEN.
If sk->sk_state is TCP_LISTEN, we return the recv queue length,
but here we do not hold recvq lock.
Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/diag.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/diag.c b/net/unix/diag.c
index 116cf508aea4..321336f91a0a 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -104,7 +104,7 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
struct unix_diag_rqlen rql;
if (READ_ONCE(sk->sk_state) == TCP_LISTEN) {
- rql.udiag_rqueue = sk->sk_receive_queue.qlen;
+ rql.udiag_rqueue = skb_queue_len_lockless(&sk->sk_receive_queue);
rql.udiag_wqueue = sk->sk_max_ack_backlog;
} else {
rql.udiag_rqueue = (u32) unix_inq_len(sk);
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 net 15/15] af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
` (13 preceding siblings ...)
2024-06-03 14:32 ` [PATCH v1 net 14/15] af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().
Let's use READ_ONCE() to read sk->sk_shutdown.
Fixes: e4e541a84863 ("sock-diag: Report shutdown for inet and unix sockets (v2)")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/diag.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/diag.c b/net/unix/diag.c
index 321336f91a0a..937edf4afed4 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -165,7 +165,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r
sock_diag_put_meminfo(sk, skb, UNIX_DIAG_MEMINFO))
goto out_nlmsg_trim;
- if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, sk->sk_shutdown))
+ if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, READ_ONCE(sk->sk_shutdown)))
goto out_nlmsg_trim;
if ((req->udiag_show & UDIAG_SHOW_UID) &&
--
2.30.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
@ 2024-06-03 16:26 ` Cong Wang
2024-06-03 16:32 ` Kuniyuki Iwashima
0 siblings, 1 reply; 18+ messages in thread
From: Cong Wang @ 2024-06-03 16:26 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Kuniyuki Iwashima, netdev, Cong Wang
On Mon, Jun 03, 2024 at 07:32:17AM -0700, Kuniyuki Iwashima wrote:
> - if (other != old_peer)
> + if (other != old_peer) {
> unix_dgram_disconnected(sk, old_peer);
> +
> + unix_state_lock(old_peer);
> + if (!unix_peer(old_peer))
> + WRITE_ONCE(old_peer->sk_state, TCP_CLOSE);
> + unix_state_lock(old_peer);
lock() old_peer twice? Has it been tested? ;-)
Regards.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
2024-06-03 16:26 ` Cong Wang
@ 2024-06-03 16:32 ` Kuniyuki Iwashima
0 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 16:32 UTC (permalink / raw)
To: xiyou.wangcong
Cc: cong.wang, davem, edumazet, kuba, kuni1840, kuniyu, netdev,
pabeni
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 3 Jun 2024 09:26:29 -0700
> On Mon, Jun 03, 2024 at 07:32:17AM -0700, Kuniyuki Iwashima wrote:
> > - if (other != old_peer)
> > + if (other != old_peer) {
> > unix_dgram_disconnected(sk, old_peer);
> > +
> > + unix_state_lock(old_peer);
> > + if (!unix_peer(old_peer))
> > + WRITE_ONCE(old_peer->sk_state, TCP_CLOSE);
> > + unix_state_lock(old_peer);
>
> lock() old_peer twice? Has it been tested? ;-)B
Ugh, apparently no :S (compile-test only)
Should've run the same command in the changelog.
Will fix in v2.
Thanks!
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-06-03 16:32 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
2024-06-03 16:26 ` Cong Wang
2024-06-03 16:32 ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 02/15] af_unix: Annodate data-races around sk->sk_state for writers Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 03/15] af_unix: Annotate data-race of sk->sk_state in unix_inq_len() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 04/15] af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 05/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_connect() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 06/15] af_unix: Annotate data-race of sk->sk_state in unix_accept() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 07/15] af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 08/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 09/15] af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 10/15] af_unix: Annotate data-races around sk->sk_sndbuf Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 11/15] af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 12/15] af_unix: Use unix_recvq_full_lockless() in unix_stream_connect() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 13/15] af_unix: Use skb_queue_empty_lockless() in unix_release_sock() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 14/15] af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 15/15] af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill() Kuniyuki Iwashima
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).