From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>,
Kuniyuki Iwashima <kuniyu@amazon.com>,
Kuniyuki Iwashima <kuni1840@gmail.com>, <netdev@vger.kernel.org>
Subject: [PATCH v2 net-next 03/11] af_unix: Don't retry after unix_state_lock_nested() in unix_stream_connect().
Date: Tue, 11 Jun 2024 15:28:57 -0700 [thread overview]
Message-ID: <20240611222905.34695-4-kuniyu@amazon.com> (raw)
In-Reply-To: <20240611222905.34695-1-kuniyu@amazon.com>
When a SOCK_(STREAM|SEQPACKET) socket connect()s to another one, we need
to lock the two sockets to check their states in unix_stream_connect().
We use unix_state_lock() for the server and unix_state_lock_nested() for
client with tricky sk->sk_state check to avoid deadlock.
The possible deadlock scenario are the following:
1) Self connect()
2) Simultaneous connect()
The former is simple, attempt to grab the same lock, and the latter is
AB-BA deadlock.
After the server's unix_state_lock(), we check the server socket's state,
and if it's not TCP_LISTEN, connect() fails with -EINVAL.
Then, we avoid the former deadlock by checking the client's state before
unix_state_lock_nested(). If its state is not TCP_LISTEN, we can make
sure that the client and the server are not identical based on the state.
Also, the latter deadlock can be avoided in the same way. Due to the
server sk->sk_state requirement, AB-BA deadlock could happen only with
TCP_LISTEN sockets. So, if the client's state is TCP_LISTEN, we can
give up the second lock to avoid the deadlock.
CPU 1 CPU 2 CPU 3
connect(A -> B) connect(B -> A) listen(A)
--- --- ---
unix_state_lock(B)
B->sk_state == TCP_LISTEN
READ_ONCE(A->sk_state) == TCP_CLOSE
^^^^^^^^^
ok, will lock A unix_state_lock(A)
.--------------' WRITE_ONCE(A->sk_state, TCP_LISTEN)
| unix_state_unlock(A)
|
| unix_state_lock(A)
| A->sk_sk_state == TCP_LISTEN
| READ_ONCE(B->sk_state) == TCP_LISTEN
v ^^^^^^^^^^
unix_state_lock_nested(A) Don't lock B !!
Currently, while checking the client's state, we also check if it's
TCP_ESTABLISHED, but this is unlikely and can be checked after we know
the state is not TCP_CLOSE.
Moreover, if it happens after the second lock, we now jump to the restart
label, but it's unlikely that the server is not found during the retry,
so the jump is mostly to revist the client state check.
Let's remove the retry logic and check the state against TCP_CLOSE first.
Note that sk->sk_state does not change once it's changed from TCP_CLOSE,
so READ_ONCE() is not needed in the second state read in the first check.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/unix/af_unix.c | 34 ++++++++--------------------------
1 file changed, 8 insertions(+), 26 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index c09bf2b03582..a6dc8bb360ca 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1546,7 +1546,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
goto out;
}
- /* Latch state of peer */
unix_state_lock(other);
/* Apparently VFS overslept socket death. Retry. */
@@ -1576,37 +1575,20 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
goto restart;
}
- /* Latch our state.
-
- It is tricky place. We need to grab our state lock and cannot
- drop lock on peer. It is dangerous because deadlock is
- possible. Connect to self case and simultaneous
- attempt to connect are eliminated by checking socket
- state. other is TCP_LISTEN, if sk is TCP_LISTEN we
- check this before attempt to grab lock.
-
- Well, and we have to recheck the state after socket locked.
+ /* self connect and simultaneous connect are eliminated
+ * by rejecting TCP_LISTEN socket to avoid deadlock.
*/
- switch (READ_ONCE(sk->sk_state)) {
- case TCP_CLOSE:
- /* This is ok... continue with connect */
- break;
- case TCP_ESTABLISHED:
- /* Socket is already connected */
- err = -EISCONN;
- goto out_unlock;
- default:
- err = -EINVAL;
+ if (unlikely(READ_ONCE(sk->sk_state) != TCP_CLOSE)) {
+ err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
goto out_unlock;
}
unix_state_lock_nested(sk, U_LOCK_SECOND);
- if (sk->sk_state != TCP_CLOSE) {
- unix_state_unlock(sk);
- unix_state_unlock(other);
- sock_put(other);
- goto restart;
+ if (unlikely(sk->sk_state != TCP_CLOSE)) {
+ err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
+ unix_state_lock(sk);
+ goto out_unlock;
}
err = security_unix_stream_connect(sk, other, newsk);
--
2.30.2
next prev parent reply other threads:[~2024-06-11 22:30 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-11 22:28 [PATCH v2 net-next 00/11] af_unix: Remove spin_lock_nested() and convert to lock_cmp_fn Kuniyuki Iwashima
2024-06-11 22:28 ` [PATCH v2 net-next 01/11] af_unix: Define locking order for unix_table_double_lock() Kuniyuki Iwashima
2024-06-11 23:15 ` Kent Overstreet
2024-06-11 22:28 ` [PATCH v2 net-next 02/11] af_unix: Define locking order for U_LOCK_SECOND in unix_state_double_lock() Kuniyuki Iwashima
2024-06-11 23:16 ` Kent Overstreet
2024-06-11 22:28 ` Kuniyuki Iwashima [this message]
2024-06-14 10:49 ` [PATCH v2 net-next 03/11] af_unix: Don't retry after unix_state_lock_nested() in unix_stream_connect() Paolo Abeni
2024-06-14 18:53 ` Kuniyuki Iwashima
2024-06-11 22:28 ` [PATCH v2 net-next 04/11] af_unix: Define locking order for U_LOCK_SECOND " Kuniyuki Iwashima
2024-06-11 22:28 ` [PATCH v2 net-next 05/11] af_unix: Don't acquire unix_state_lock() for sock_i_ino() Kuniyuki Iwashima
2024-06-11 22:29 ` [PATCH v2 net-next 06/11] af_unix: Remove U_LOCK_DIAG Kuniyuki Iwashima
2024-06-11 22:29 ` [PATCH v2 net-next 07/11] af_unix: Remove U_LOCK_GC_LISTENER Kuniyuki Iwashima
2024-06-11 22:29 ` [PATCH v2 net-next 08/11] af_unix: Define locking order for U_RECVQ_LOCK_EMBRYO in unix_collect_skb() Kuniyuki Iwashima
2024-06-11 23:17 ` Kent Overstreet
2024-06-11 23:23 ` Kuniyuki Iwashima
2024-06-14 11:04 ` Paolo Abeni
2024-06-14 18:55 ` Kuniyuki Iwashima
2024-06-11 22:29 ` [PATCH v2 net-next 09/11] af_unix: Set sk_peer_pid/sk_peer_cred locklessly for new socket Kuniyuki Iwashima
2024-06-11 22:29 ` [PATCH v2 net-next 10/11] af_unix: Remove put_pid()/put_cred() in copy_peercred() Kuniyuki Iwashima
2024-06-11 22:29 ` [PATCH v2 net-next 11/11] af_unix: Don't use spin_lock_nested() " Kuniyuki Iwashima
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240611222905.34695-4-kuniyu@amazon.com \
--to=kuniyu@amazon.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kent.overstreet@linux.dev \
--cc=kuba@kernel.org \
--cc=kuni1840@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).