netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields.
@ 2024-06-03 14:32 Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
                   ` (14 more replies)
  0 siblings, 15 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

The patch 1 fixes a bug that SOCK_DGRAM's sk->sk_state is changed
to TCP_CLOSE even if the socket is connect()ed to another socket.

The rest of this series annotates lockless accesses to the following
fields:

  * sk->sk_state
  * sk->sk_sndbuf
  * net->unx.sysctl_max_dgram_qlen
  * sk->sk_receive_queue.qlen
  * sk->sk_shutdown

Note that with this series there is skb_queue_empty() left in
unix_dgram_disconnected() that needs to be changed to lockless
version, and unix_peer(other) access there should be protected
by unix_state_lock().

This will require some refactoring, so another series will follow.


Kuniyuki Iwashima (15):
  af_unix: Set sk->sk_state under unix_state_lock() for truly
    disconencted peer.
  af_unix: Annodate data-races around sk->sk_state for writers.
  af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
  af_unix: Annotate data-races around sk->sk_state in unix_write_space()
    and poll().
  af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
  af_unix: Annotate data-race of sk->sk_state in unix_accept().
  af_unix: Annotate data-races around sk->sk_state in sendmsg() and
    recvmsg().
  af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
  af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
  af_unix: Annotate data-races around sk->sk_sndbuf.
  af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
  af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
  af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
  af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
  af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().

 net/unix/af_unix.c | 90 +++++++++++++++++++++++-----------------------
 net/unix/diag.c    | 12 +++----
 2 files changed, 50 insertions(+), 52 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 16:26   ` Cong Wang
  2024-06-03 14:32 ` [PATCH v1 net 02/15] af_unix: Annodate data-races around sk->sk_state for writers Kuniyuki Iwashima
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, Cong Wang

When a SOCK_DGRAM socket connect()s to another socket, the both sockets'
sk->sk_state are changed to TCP_ESTABLISHED so that we can register them
to BPF SOCKMAP.

When the socket disconnects from the peer by connect(AF_UNSPEC), the state
is set back to TCP_CLOSE.

Then, the peer's state is also set to TCP_CLOSE, but the update is done
locklessly and unconditionally.

Let's say socket A connect()ed to B, B connect()ed to C, and A disconnects
from B.

After the first two connect()s, all three sockets' sk->sk_state are
TCP_ESTABLISHED:

  $ ss -xa
  Netid State  Recv-Q Send-Q  Local Address:Port  Peer Address:PortProcess
  u_dgr ESTAB  0      0       @A 641              * 642
  u_dgr ESTAB  0      0       @B 642              * 643
  u_dgr ESTAB  0      0       @C 643              * 0

And after the disconnect, B's state is TCP_CLOSE even though it's still
connected to C and C's state is TCP_ESTABLISHED.

  $ ss -xa
  Netid State  Recv-Q Send-Q  Local Address:Port  Peer Address:PortProcess
  u_dgr UNCONN 0      0       @A 641              * 642
  u_dgr UNCONN 0      0       @B 642              * 643
  u_dgr ESTAB  0      0       @C 643              * 0

In this case, we cannot register B to SOCKMAP.

So, when a socket disconnects from the peer, we should not set TCP_CLOSE to
the peer if the peer is connected to yet another socket, and this must be
done under unix_state_lock().

Note that we use WRITE_ONCE() for sk->sk_state as there are many lockless
readers.  These data-races will be fixed in the following patches.

Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
CC: Cong Wang <cong.wang@bytedance.com>
---
 net/unix/af_unix.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 25b49efc0926..541216382cf5 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -570,7 +570,6 @@ static void unix_dgram_disconnected(struct sock *sk, struct sock *other)
 			sk_error_report(other);
 		}
 	}
-	other->sk_state = TCP_CLOSE;
 }
 
 static void unix_sock_destructor(struct sock *sk)
@@ -1424,8 +1423,15 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 
 		unix_state_double_unlock(sk, other);
 
-		if (other != old_peer)
+		if (other != old_peer) {
 			unix_dgram_disconnected(sk, old_peer);
+
+			unix_state_lock(old_peer);
+			if (!unix_peer(old_peer))
+				WRITE_ONCE(old_peer->sk_state, TCP_CLOSE);
+			unix_state_lock(old_peer);
+		}
+
 		sock_put(old_peer);
 	} else {
 		unix_peer(sk) = other;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 02/15] af_unix: Annodate data-races around sk->sk_state for writers.
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 03/15] af_unix: Annotate data-race of sk->sk_state in unix_inq_len() Kuniyuki Iwashima
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

sk->sk_state is changed under unix_state_lock(), but it's read locklessly
in many places.

This patch adds WRITE_ONCE() on the writer side.

We will add READ_ONCE() to the lockless readers in the following patches.

Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 541216382cf5..124ca3af1452 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -616,7 +616,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
 	u->path.dentry = NULL;
 	u->path.mnt = NULL;
 	state = sk->sk_state;
-	sk->sk_state = TCP_CLOSE;
+	WRITE_ONCE(sk->sk_state, TCP_CLOSE);
 
 	skpair = unix_peer(sk);
 	unix_peer(sk) = NULL;
@@ -738,7 +738,8 @@ static int unix_listen(struct socket *sock, int backlog)
 	if (backlog > sk->sk_max_ack_backlog)
 		wake_up_interruptible_all(&u->peer_wait);
 	sk->sk_max_ack_backlog	= backlog;
-	sk->sk_state		= TCP_LISTEN;
+	WRITE_ONCE(sk->sk_state, TCP_LISTEN);
+
 	/* set credentials so connect can copy them */
 	init_peercred(sk);
 	err = 0;
@@ -1401,7 +1402,8 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 		if (err)
 			goto out_unlock;
 
-		sk->sk_state = other->sk_state = TCP_ESTABLISHED;
+		WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED);
+		WRITE_ONCE(other->sk_state, TCP_ESTABLISHED);
 	} else {
 		/*
 		 *	1003.1g breaking connected state with AF_UNSPEC
@@ -1418,7 +1420,7 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 
 		unix_peer(sk) = other;
 		if (!other)
-			sk->sk_state = TCP_CLOSE;
+			WRITE_ONCE(sk->sk_state, TCP_CLOSE);
 		unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
 
 		unix_state_double_unlock(sk, other);
@@ -1639,7 +1641,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	copy_peercred(sk, other);
 
 	sock->state	= SS_CONNECTED;
-	sk->sk_state	= TCP_ESTABLISHED;
+	WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED);
 	sock_hold(newsk);
 
 	smp_mb__after_atomic();	/* sock_hold() does an atomic_inc() */
@@ -2050,7 +2052,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 			unix_peer(sk) = NULL;
 			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
 
-			sk->sk_state = TCP_CLOSE;
+			WRITE_ONCE(sk->sk_state, TCP_CLOSE);
 			unix_state_unlock(sk);
 
 			unix_dgram_disconnected(sk, other);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 03/15] af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 02/15] af_unix: Annodate data-races around sk->sk_state for writers Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 04/15] af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll() Kuniyuki Iwashima
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

ioctl(SIOCINQ) calls unix_inq_len() that checks sk->sk_state first
and returns -EINVAL if it's TCP_LISTEN.

Then, for SOCK_STREAM sockets, unix_inq_len() returns the number of
bytes in recvq.

However, unix_inq_len() does not hold unix_state_lock(), and the
concurrent listen() might change the state after checking sk->sk_state.

If the race occurs, 0 is returned for the listener, instead of -EINVAL,
because the length of skb with embryo is 0.

We could hold unix_state_lock() in unix_inq_len(), but it's overkill
given the result is true for pre-listen() TCP_CLOSE state.

So, let's use READ_ONCE() for sk->sk_state in unix_inq_len().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 124ca3af1452..dfb74822ed2e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3017,7 +3017,7 @@ long unix_inq_len(struct sock *sk)
 	struct sk_buff *skb;
 	long amount = 0;
 
-	if (sk->sk_state == TCP_LISTEN)
+	if (READ_ONCE(sk->sk_state) == TCP_LISTEN)
 		return -EINVAL;
 
 	spin_lock(&sk->sk_receive_queue.lock);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 04/15] af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (2 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 03/15] af_unix: Annotate data-race of sk->sk_state in unix_inq_len() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 05/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_connect() Kuniyuki Iwashima
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

unix_poll() and unix_dgram_poll() read sk->sk_state locklessly and
calls unix_writable() which also reads sk->sk_state without holding
unix_state_lock().

Let's use READ_ONCE() in unix_poll() and unix_dgram_poll() and pass
it to unix_writable().

While at it, we remove TCP_SYN_SENT check in unix_dgram_poll() as
that state does not exist for AF_UNIX socket since the code was added.

Fixes: 1586a5877db9 ("af_unix: do not report POLLOUT on listeners")
Fixes: 3c73419c09a5 ("af_unix: fix 'poll for write'/ connected DGRAM sockets")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index dfb74822ed2e..e7a756e8711d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -530,9 +530,9 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
 	return 0;
 }
 
-static int unix_writable(const struct sock *sk)
+static int unix_writable(const struct sock *sk, unsigned char state)
 {
-	return sk->sk_state != TCP_LISTEN &&
+	return state != TCP_LISTEN &&
 	       (refcount_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
 }
 
@@ -541,7 +541,7 @@ static void unix_write_space(struct sock *sk)
 	struct socket_wq *wq;
 
 	rcu_read_lock();
-	if (unix_writable(sk)) {
+	if (unix_writable(sk, READ_ONCE(sk->sk_state))) {
 		wq = rcu_dereference(sk->sk_wq);
 		if (skwq_has_sleeper(wq))
 			wake_up_interruptible_sync_poll(&wq->wait,
@@ -3129,12 +3129,14 @@ static int unix_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon
 static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wait)
 {
 	struct sock *sk = sock->sk;
+	unsigned char state;
 	__poll_t mask;
 	u8 shutdown;
 
 	sock_poll_wait(file, sock, wait);
 	mask = 0;
 	shutdown = READ_ONCE(sk->sk_shutdown);
+	state = READ_ONCE(sk->sk_state);
 
 	/* exceptional events? */
 	if (READ_ONCE(sk->sk_err))
@@ -3156,14 +3158,14 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
 
 	/* Connection-based need to check for termination and startup */
 	if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) &&
-	    sk->sk_state == TCP_CLOSE)
+	    state == TCP_CLOSE)
 		mask |= EPOLLHUP;
 
 	/*
 	 * we set writable also when the other side has shut down the
 	 * connection. This prevents stuck sockets.
 	 */
-	if (unix_writable(sk))
+	if (unix_writable(sk, state))
 		mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;
 
 	return mask;
@@ -3174,12 +3176,14 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 {
 	struct sock *sk = sock->sk, *other;
 	unsigned int writable;
+	unsigned char state;
 	__poll_t mask;
 	u8 shutdown;
 
 	sock_poll_wait(file, sock, wait);
 	mask = 0;
 	shutdown = READ_ONCE(sk->sk_shutdown);
+	state = READ_ONCE(sk->sk_state);
 
 	/* exceptional events? */
 	if (READ_ONCE(sk->sk_err) ||
@@ -3199,19 +3203,14 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
-	if (sk->sk_type == SOCK_SEQPACKET) {
-		if (sk->sk_state == TCP_CLOSE)
-			mask |= EPOLLHUP;
-		/* connection hasn't started yet? */
-		if (sk->sk_state == TCP_SYN_SENT)
-			return mask;
-	}
+	if (sk->sk_type == SOCK_SEQPACKET && state == TCP_CLOSE)
+		mask |= EPOLLHUP;
 
 	/* No write status requested, avoid expensive OUT tests. */
 	if (!(poll_requested_events(wait) & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
 		return mask;
 
-	writable = unix_writable(sk);
+	writable = unix_writable(sk, state);
 	if (writable) {
 		unix_state_lock(sk);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 05/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (3 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 04/15] af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 06/15] af_unix: Annotate data-race of sk->sk_state in unix_accept() Kuniyuki Iwashima
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

As small optimisation, unix_stream_connect() prefetches the client's
sk->sk_state without unix_state_lock() and checks if it's TCP_CLOSE.

Later, sk->sk_state is checked again under unix_state_lock().

Let's use READ_ONCE() for the first check and TCP_CLOSE directly for
the second check.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e7a756e8711d..c01ca584a014 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1481,7 +1481,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	struct sk_buff *skb = NULL;
 	long timeo;
 	int err;
-	int st;
 
 	err = unix_validate_addr(sunaddr, addr_len);
 	if (err)
@@ -1571,9 +1570,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 
 	   Well, and we have to recheck the state after socket locked.
 	 */
-	st = sk->sk_state;
-
-	switch (st) {
+	switch (READ_ONCE(sk->sk_state)) {
 	case TCP_CLOSE:
 		/* This is ok... continue with connect */
 		break;
@@ -1588,7 +1585,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 
 	unix_state_lock_nested(sk, U_LOCK_SECOND);
 
-	if (sk->sk_state != st) {
+	if (sk->sk_state != TCP_CLOSE) {
 		unix_state_unlock(sk);
 		unix_state_unlock(other);
 		sock_put(other);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 06/15] af_unix: Annotate data-race of sk->sk_state in unix_accept().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (4 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 05/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_connect() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 07/15] af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg() Kuniyuki Iwashima
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Once sk->sk_state is changed to TCP_LISTEN, it never changes.

unix_accept() takes the advantage and reads sk->sk_state without
holding unix_state_lock().

Let's use READ_ONCE() there.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index c01ca584a014..a97f4305b74f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1710,7 +1710,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock,
 		goto out;
 
 	arg->err = -EINVAL;
-	if (sk->sk_state != TCP_LISTEN)
+	if (READ_ONCE(sk->sk_state) != TCP_LISTEN)
 		goto out;
 
 	/* If socket state is TCP_LISTEN it cannot change (for now...),
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 07/15] af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (5 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 06/15] af_unix: Annotate data-race of sk->sk_state in unix_accept() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 08/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb() Kuniyuki Iwashima
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

The following functions read sk->sk_state locklessly and proceed only if
the state is TCP_ESTABLISHED.

  * unix_stream_sendmsg
  * unix_stream_read_generic
  * unix_seqpacket_sendmsg
  * unix_seqpacket_recvmsg

Let's use READ_ONCE() there.

Fixes: a05d2ad1c1f3 ("af_unix: Only allow recv on connected seqpacket sockets.")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a97f4305b74f..43605bed0ef7 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2226,7 +2226,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 	}
 
 	if (msg->msg_namelen) {
-		err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP;
+		err = READ_ONCE(sk->sk_state) == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP;
 		goto out_err;
 	} else {
 		err = -ENOTCONN;
@@ -2340,7 +2340,7 @@ static int unix_seqpacket_sendmsg(struct socket *sock, struct msghdr *msg,
 	if (err)
 		return err;
 
-	if (sk->sk_state != TCP_ESTABLISHED)
+	if (READ_ONCE(sk->sk_state) != TCP_ESTABLISHED)
 		return -ENOTCONN;
 
 	if (msg->msg_namelen)
@@ -2354,7 +2354,7 @@ static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg,
 {
 	struct sock *sk = sock->sk;
 
-	if (sk->sk_state != TCP_ESTABLISHED)
+	if (READ_ONCE(sk->sk_state) != TCP_ESTABLISHED)
 		return -ENOTCONN;
 
 	return unix_dgram_recvmsg(sock, msg, size, flags);
@@ -2683,7 +2683,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
 	size_t size = state->size;
 	unsigned int last_len;
 
-	if (unlikely(sk->sk_state != TCP_ESTABLISHED)) {
+	if (unlikely(READ_ONCE(sk->sk_state) != TCP_ESTABLISHED)) {
 		err = -EINVAL;
 		goto out;
 	}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 08/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (6 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 07/15] af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 09/15] af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG Kuniyuki Iwashima
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

unix_stream_read_skb() is called from sk->sk_data_ready() context
where unix_state_lock() is not held.

Let's use READ_ONCE() there.

Fixes: 77462de14a43 ("af_unix: Add read_sock for stream socket types")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 43605bed0ef7..8eadc6464301 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2659,7 +2659,7 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
 
 static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
-	if (unlikely(sk->sk_state != TCP_ESTABLISHED))
+	if (unlikely(READ_ONCE(sk->sk_state) != TCP_ESTABLISHED))
 		return -ENOTCONN;
 
 	return unix_read_skb(sk, recv_actor);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 09/15] af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (7 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 08/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 10/15] af_unix: Annotate data-races around sk->sk_sndbuf Kuniyuki Iwashima
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read
locklessly.

Let's use READ_ONCE() there.

Note that the result could be inconsistent if the socket is dumped
during the state change.  This is common for other SOCK_DIAG and
similar interfaces.

Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA")
Fixes: 45a96b9be6ec ("unix_diag: Dumping all sockets core")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/diag.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/unix/diag.c b/net/unix/diag.c
index ae39538c5042..116cf508aea4 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -65,7 +65,7 @@ static int sk_diag_dump_icons(struct sock *sk, struct sk_buff *nlskb)
 	u32 *buf;
 	int i;
 
-	if (sk->sk_state == TCP_LISTEN) {
+	if (READ_ONCE(sk->sk_state) == TCP_LISTEN) {
 		spin_lock(&sk->sk_receive_queue.lock);
 
 		attr = nla_reserve(nlskb, UNIX_DIAG_ICONS,
@@ -103,7 +103,7 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
 {
 	struct unix_diag_rqlen rql;
 
-	if (sk->sk_state == TCP_LISTEN) {
+	if (READ_ONCE(sk->sk_state) == TCP_LISTEN) {
 		rql.udiag_rqueue = sk->sk_receive_queue.qlen;
 		rql.udiag_wqueue = sk->sk_max_ack_backlog;
 	} else {
@@ -136,7 +136,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r
 	rep = nlmsg_data(nlh);
 	rep->udiag_family = AF_UNIX;
 	rep->udiag_type = sk->sk_type;
-	rep->udiag_state = sk->sk_state;
+	rep->udiag_state = READ_ONCE(sk->sk_state);
 	rep->pad = 0;
 	rep->udiag_ino = sk_ino;
 	sock_diag_save_cookie(sk, rep->udiag_cookie);
@@ -215,7 +215,7 @@ static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 		sk_for_each(sk, &net->unx.table.buckets[slot]) {
 			if (num < s_num)
 				goto next;
-			if (!(req->udiag_states & (1 << sk->sk_state)))
+			if (!(req->udiag_states & (1 << READ_ONCE(sk->sk_state))))
 				goto next;
 			if (sk_diag_dump(sk, skb, req, sk_user_ns(skb->sk),
 					 NETLINK_CB(cb->skb).portid,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 10/15] af_unix: Annotate data-races around sk->sk_sndbuf.
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (8 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 09/15] af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 11/15] af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen Kuniyuki Iwashima
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's
not used in af_unix.c.

Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(),
unix_dgram_sendmsg(), and unix_stream_sendmsg().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 8eadc6464301..e99b94fc80b3 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -533,7 +533,7 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
 static int unix_writable(const struct sock *sk, unsigned char state)
 {
 	return state != TCP_LISTEN &&
-	       (refcount_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
+		(refcount_read(&sk->sk_wmem_alloc) << 2) <= READ_ONCE(sk->sk_sndbuf);
 }
 
 static void unix_write_space(struct sock *sk)
@@ -1967,7 +1967,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	}
 
 	err = -EMSGSIZE;
-	if (len > sk->sk_sndbuf - 32)
+	if (len > READ_ONCE(sk->sk_sndbuf) - 32)
 		goto out;
 
 	if (len > SKB_MAX_ALLOC) {
@@ -2247,7 +2247,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 						   &err, 0);
 		} else {
 			/* Keep two messages in the pipe so it schedules better */
-			size = min_t(int, size, (sk->sk_sndbuf >> 1) - 64);
+			size = min_t(int, size, (READ_ONCE(sk->sk_sndbuf) >> 1) - 64);
 
 			/* allow fallback to order-0 allocations */
 			size = min_t(int, size, SKB_MAX_HEAD(0) + UNIX_SKB_FRAGS_SZ);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 11/15] af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (9 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 10/15] af_unix: Annotate data-races around sk->sk_sndbuf Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 12/15] af_unix: Use unix_recvq_full_lockless() in unix_stream_connect() Kuniyuki Iwashima
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be
changed concurrently.

Let's use READ_ONCE() in unix_create1().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e99b94fc80b3..6773f1465e50 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -976,7 +976,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
 	sk->sk_hash		= unix_unbound_hash(sk);
 	sk->sk_allocation	= GFP_KERNEL_ACCOUNT;
 	sk->sk_write_space	= unix_write_space;
-	sk->sk_max_ack_backlog	= net->unx.sysctl_max_dgram_qlen;
+	sk->sk_max_ack_backlog	= READ_ONCE(net->unx.sysctl_max_dgram_qlen);
 	sk->sk_destruct		= unix_sock_destructor;
 	u = unix_sk(sk);
 	u->listener = NULL;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 12/15] af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (10 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 11/15] af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 13/15] af_unix: Use skb_queue_empty_lockless() in unix_release_sock() Kuniyuki Iwashima
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Once sk->sk_state is changed to TCP_LISTEN, it never changes.

unix_accept() takes advantage of this characteristics; it does not
hold the listener's unix_state_lock() and only acquires recvq lock
to pop one skb.

It means unix_state_lock() does not prevent the queue length from
changing in unix_stream_connect().

Thus, we need to use unix_recvq_full_lockless() to avoid data-race.

Now we remove unix_recvq_full() as no one uses it.

Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in
unix_recvq_full_lockless() because of the following reasons:

  (1) For SOCK_DGRAM, it is a written-once field in unix_create1()

  (2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the
      listener's unix_state_lock() in unix_listen(), and we hold
      the lock in unix_stream_connect()

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 6773f1465e50..88cda8af30d4 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -221,15 +221,9 @@ static inline int unix_may_send(struct sock *sk, struct sock *osk)
 	return unix_peer(osk) == NULL || unix_our_peer(sk, osk);
 }
 
-static inline int unix_recvq_full(const struct sock *sk)
-{
-	return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
-}
-
 static inline int unix_recvq_full_lockless(const struct sock *sk)
 {
-	return skb_queue_len_lockless(&sk->sk_receive_queue) >
-		READ_ONCE(sk->sk_max_ack_backlog);
+	return skb_queue_len_lockless(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
 }
 
 struct sock *unix_peer_get(struct sock *s)
@@ -1545,7 +1539,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	if (other->sk_shutdown & RCV_SHUTDOWN)
 		goto out_unlock;
 
-	if (unix_recvq_full(other)) {
+	if (unix_recvq_full_lockless(other)) {
 		err = -EAGAIN;
 		if (!timeo)
 			goto out_unlock;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 13/15] af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (11 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 12/15] af_unix: Use unix_recvq_full_lockless() in unix_stream_connect() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 14/15] af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen() Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 15/15] af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill() Kuniyuki Iwashima
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock()
checks the length of the peer socket's recvq under unix_state_lock().

However, unix_stream_read_generic() calls skb_unlink() after releasing
the lock.  Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks
skb without unix_state_lock().

Thues, unix_state_lock() does not protect qlen.

Let's use skb_queue_empty_lockless() in unix_release_sock().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 88cda8af30d4..0c49c34551e6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -631,7 +631,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
 			unix_state_lock(skpair);
 			/* No more writes */
 			WRITE_ONCE(skpair->sk_shutdown, SHUTDOWN_MASK);
-			if (!skb_queue_empty(&sk->sk_receive_queue) || embrion)
+			if (!skb_queue_empty_lockless(&sk->sk_receive_queue) || embrion)
 				WRITE_ONCE(skpair->sk_err, ECONNRESET);
 			unix_state_unlock(skpair);
 			skpair->sk_state_change(skpair);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 14/15] af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (12 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 13/15] af_unix: Use skb_queue_empty_lockless() in unix_release_sock() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  2024-06-03 14:32 ` [PATCH v1 net 15/15] af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill() Kuniyuki Iwashima
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We can dump the socket queue length via UNIX_DIAG by specifying
UDIAG_SHOW_RQLEN.

If sk->sk_state is TCP_LISTEN, we return the recv queue length,
but here we do not hold recvq lock.

Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().

Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/diag.c b/net/unix/diag.c
index 116cf508aea4..321336f91a0a 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -104,7 +104,7 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
 	struct unix_diag_rqlen rql;
 
 	if (READ_ONCE(sk->sk_state) == TCP_LISTEN) {
-		rql.udiag_rqueue = sk->sk_receive_queue.qlen;
+		rql.udiag_rqueue = skb_queue_len_lockless(&sk->sk_receive_queue);
 		rql.udiag_wqueue = sk->sk_max_ack_backlog;
 	} else {
 		rql.udiag_rqueue = (u32) unix_inq_len(sk);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net 15/15] af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
  2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
                   ` (13 preceding siblings ...)
  2024-06-03 14:32 ` [PATCH v1 net 14/15] af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen() Kuniyuki Iwashima
@ 2024-06-03 14:32 ` Kuniyuki Iwashima
  14 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 14:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().

Let's use READ_ONCE() to read sk->sk_shutdown.

Fixes: e4e541a84863 ("sock-diag: Report shutdown for inet and unix sockets (v2)")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/diag.c b/net/unix/diag.c
index 321336f91a0a..937edf4afed4 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -165,7 +165,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r
 	    sock_diag_put_meminfo(sk, skb, UNIX_DIAG_MEMINFO))
 		goto out_nlmsg_trim;
 
-	if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, sk->sk_shutdown))
+	if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, READ_ONCE(sk->sk_shutdown)))
 		goto out_nlmsg_trim;
 
 	if ((req->udiag_show & UDIAG_SHOW_UID) &&
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
  2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
@ 2024-06-03 16:26   ` Cong Wang
  2024-06-03 16:32     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 18+ messages in thread
From: Cong Wang @ 2024-06-03 16:26 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, netdev, Cong Wang

On Mon, Jun 03, 2024 at 07:32:17AM -0700, Kuniyuki Iwashima wrote:
> -		if (other != old_peer)
> +		if (other != old_peer) {
>  			unix_dgram_disconnected(sk, old_peer);
> +
> +			unix_state_lock(old_peer);
> +			if (!unix_peer(old_peer))
> +				WRITE_ONCE(old_peer->sk_state, TCP_CLOSE);
> +			unix_state_lock(old_peer);

lock() old_peer twice? Has it been tested? ;-)

Regards.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
  2024-06-03 16:26   ` Cong Wang
@ 2024-06-03 16:32     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 18+ messages in thread
From: Kuniyuki Iwashima @ 2024-06-03 16:32 UTC (permalink / raw)
  To: xiyou.wangcong
  Cc: cong.wang, davem, edumazet, kuba, kuni1840, kuniyu, netdev,
	pabeni

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 3 Jun 2024 09:26:29 -0700
> On Mon, Jun 03, 2024 at 07:32:17AM -0700, Kuniyuki Iwashima wrote:
> > -		if (other != old_peer)
> > +		if (other != old_peer) {
> >  			unix_dgram_disconnected(sk, old_peer);
> > +
> > +			unix_state_lock(old_peer);
> > +			if (!unix_peer(old_peer))
> > +				WRITE_ONCE(old_peer->sk_state, TCP_CLOSE);
> > +			unix_state_lock(old_peer);
> 
> lock() old_peer twice? Has it been tested? ;-)B

Ugh, apparently no :S  (compile-test only)

Should've run the same command in the changelog.
Will fix in v2.

Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-06-03 16:32 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-03 14:32 [PATCH v1 net 00/15] af_unix: Fix lockless access of sk->sk_state and others fields Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer Kuniyuki Iwashima
2024-06-03 16:26   ` Cong Wang
2024-06-03 16:32     ` Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 02/15] af_unix: Annodate data-races around sk->sk_state for writers Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 03/15] af_unix: Annotate data-race of sk->sk_state in unix_inq_len() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 04/15] af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 05/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_connect() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 06/15] af_unix: Annotate data-race of sk->sk_state in unix_accept() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 07/15] af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 08/15] af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 09/15] af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 10/15] af_unix: Annotate data-races around sk->sk_sndbuf Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 11/15] af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 12/15] af_unix: Use unix_recvq_full_lockless() in unix_stream_connect() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 13/15] af_unix: Use skb_queue_empty_lockless() in unix_release_sock() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 14/15] af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen() Kuniyuki Iwashima
2024-06-03 14:32 ` [PATCH v1 net 15/15] af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill() Kuniyuki Iwashima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).