netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting
@ 2025-03-17  9:52 Michal Luczaj
  2025-03-17  9:52 ` [PATCH net v4 1/3] vsock/bpf: Fix EINTR connect() racing sockmap update Michal Luczaj
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Michal Luczaj @ 2025-03-17  9:52 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michael S. Tsirkin, Bobby Eshleman,
	Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan
  Cc: netdev, bpf, virtualization, linux-kernel, linux-kselftest,
	Michal Luczaj

Signal delivery during connect() may disconnect an already established
socket. Problem is that such socket might have been placed in a sockmap
before the connection was closed.

PATCH 1 ensures this race won't lead to an unconnected vsock staying in the
sockmap. PATCH 2 selftests it. 

PATCH 3 fixes a related race. Note that selftest in PATCH 2 does test this
code as well, but winning this race variant may take more than 2 seconds,
so I'm not advertising it.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
Changes in v4:
- Selftest: send signal to only our own process
- Link to v3: https://lore.kernel.org/r/20250316-vsock-trans-signal-race-v3-0-17a6862277c9@rbox.co

Changes in v3:
- Selftest: drop unnecessary variable initialization and reorder the calls
- Link to v2: https://lore.kernel.org/r/20250314-vsock-trans-signal-race-v2-0-421a41f60f42@rbox.co

Changes in v2:
- Handle one more path of tripping the warning
- Add a selftest
- Collect R-b [Stefano]
- Link to v1: https://lore.kernel.org/r/20250307-vsock-trans-signal-race-v1-1-3aca3f771fbd@rbox.co

---
Michal Luczaj (3):
      vsock/bpf: Fix EINTR connect() racing sockmap update
      selftest/bpf: Add test for AF_VSOCK connect() racing sockmap update
      vsock/bpf: Fix bpf recvmsg() racing transport reassignment

 net/vmw_vsock/af_vsock.c                           | 10 ++-
 net/vmw_vsock/vsock_bpf.c                          | 24 ++++--
 .../selftests/bpf/prog_tests/sockmap_basic.c       | 99 ++++++++++++++++++++++
 3 files changed, 124 insertions(+), 9 deletions(-)
---
base-commit: da9e8efe7ee10e8425dc356a9fc593502c8e3933
change-id: 20250305-vsock-trans-signal-race-d62f7718d099

Best regards,
-- 
Michal Luczaj <mhal@rbox.co>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net v4 1/3] vsock/bpf: Fix EINTR connect() racing sockmap update
  2025-03-17  9:52 [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting Michal Luczaj
@ 2025-03-17  9:52 ` Michal Luczaj
  2025-03-17  9:52 ` [PATCH net v4 2/3] selftest/bpf: Add test for AF_VSOCK " Michal Luczaj
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Michal Luczaj @ 2025-03-17  9:52 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michael S. Tsirkin, Bobby Eshleman,
	Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan
  Cc: netdev, bpf, virtualization, linux-kernel, linux-kselftest,
	Michal Luczaj

Signal delivery during connect() may result in a disconnect of an already
TCP_ESTABLISHED socket. Problem is that such established socket might have
been placed in a sockmap before the connection was closed. We end up with a
SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
reassign (unconnected) vsock's transport to NULL, breaks the sockmap
contract. As manifested by WARN_ON_ONCE.

connect
  / state = SS_CONNECTED /
                                sock_map_update_elem
  if signal_pending
    state = SS_UNCONNECTED

connect
  transport = NULL
                                vsock_bpf_recvmsg
                                  WARN_ON_ONCE(!vsk->transport)

Ensure the socket does not stay in sockmap.

WARNING: CPU: 8 PID: 1228 at net/vmw_vsock/vsock_bpf.c:97 vsock_bpf_recvmsg+0xb43/0xe00
CPU: 8 UID: 0 PID: 1228 Comm: a.out Not tainted 6.14.0-rc5+
RIP: 0010:vsock_bpf_recvmsg+0xb43/0xe00
 sock_recvmsg+0x1b2/0x220
 __sys_recvfrom+0x190/0x270
 __x64_sys_recvfrom+0xdc/0x1b0
 do_syscall_64+0x93/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 634f1a7110b4 ("vsock: support sockmap")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 net/vmw_vsock/af_vsock.c  | 10 +++++++++-
 net/vmw_vsock/vsock_bpf.c |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 7e3db87ae4333cf63327ec105ca99253569bb9fe..81b1b8e9c946a646778367ab78ca180cef75ef72 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1579,7 +1579,15 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
 
 		if (signal_pending(current)) {
 			err = sock_intr_errno(timeout);
-			sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE;
+			if (sk->sk_state == TCP_ESTABLISHED) {
+				/* Might have raced with a sockmap update. */
+				if (sk->sk_prot->unhash)
+					sk->sk_prot->unhash(sk);
+
+				sk->sk_state = TCP_CLOSING;
+			} else {
+				sk->sk_state = TCP_CLOSE;
+			}
 			sock->state = SS_UNCONNECTED;
 			vsock_transport_cancel_pkt(vsk);
 			vsock_remove_connected(vsk);
diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index 07b96d56f3a577af71021b1b8132743554996c4f..c68fdaf09046b68254dac3ea70ffbe73dfa45cef 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -127,6 +127,7 @@ static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas
 {
 	*prot        = *base;
 	prot->close  = sock_map_close;
+	prot->unhash = sock_map_unhash;
 	prot->recvmsg = vsock_bpf_recvmsg;
 	prot->sock_is_readable = sk_msg_is_readable;
 }

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net v4 2/3] selftest/bpf: Add test for AF_VSOCK connect() racing sockmap update
  2025-03-17  9:52 [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting Michal Luczaj
  2025-03-17  9:52 ` [PATCH net v4 1/3] vsock/bpf: Fix EINTR connect() racing sockmap update Michal Luczaj
@ 2025-03-17  9:52 ` Michal Luczaj
  2025-03-19  9:17   ` Stefano Garzarella
  2025-03-17  9:52 ` [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment Michal Luczaj
  2025-03-19  9:21 ` [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting Michael S. Tsirkin
  3 siblings, 1 reply; 12+ messages in thread
From: Michal Luczaj @ 2025-03-17  9:52 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michael S. Tsirkin, Bobby Eshleman,
	Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan
  Cc: netdev, bpf, virtualization, linux-kernel, linux-kselftest,
	Michal Luczaj

Racing signal-interrupted connect() and sockmap update may result in an
unconnected (and missing vsock transport) socket in a sockmap.

Test spends 2 seconds attempting to reach WARN_ON_ONCE().

connect
  / state = SS_CONNECTED /
                                sock_map_update_elem
  if signal_pending
    state = SS_UNCONNECTED

connect
  transport = NULL
                                vsock_bpf_recvmsg
                                  WARN_ON_ONCE(!vsk->transport)

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c       | 99 ++++++++++++++++++++++
 1 file changed, 99 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 1e3e4392dcca0e1722c1982ecc649a80c27443b2..2f8bba27866354848f1e30b5473cedb6a85244ff 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -3,6 +3,7 @@
 #include <error.h>
 #include <netinet/tcp.h>
 #include <sys/epoll.h>
+#include <linux/time64.h>
 
 #include "test_progs.h"
 #include "test_skmsg_load_helpers.skel.h"
@@ -1042,6 +1043,102 @@ static void test_sockmap_vsock_unconnected(void)
 	xclose(map);
 }
 
+#define CONNECT_SIGNAL_RACE_TIMEOUT 2 /* seconds */
+
+static void sig_handler(int signum)
+{
+	/* nop */
+}
+
+static void connect_signal_racer_cleanup(void *map)
+{
+	xclose(*(int *)map);
+}
+
+static void *connect_signal_racer(void *arg)
+{
+	pid_t pid;
+	int map;
+
+	map = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(int),
+			     sizeof(int), 1, NULL);
+	if (!ASSERT_OK_FD(map, "bpf_map_create"))
+		return NULL;
+
+	pthread_cleanup_push(connect_signal_racer_cleanup, &map);
+	pid = getpid();
+
+	for (;;) {
+		int c = *(int *)arg;
+		int zero = 0;
+
+		(void)bpf_map_update_elem(map, &zero, &c, BPF_ANY);
+
+		if (kill(pid, SIGUSR1)) {
+			FAIL_ERRNO("kill");
+			break;
+		}
+
+		if ((recv(c, NULL, 0, MSG_DONTWAIT) < 0) && errno == ENODEV) {
+			FAIL_ERRNO("recv");
+			break;
+		}
+	}
+
+	pthread_cleanup_pop(1);
+
+	return NULL;
+}
+
+static void test_sockmap_vsock_connect_signal_race(void)
+{
+	struct sockaddr_vm addr, bad_addr;
+	socklen_t alen = sizeof(addr);
+	sighandler_t orig_handler;
+	pthread_t thread;
+	int s, c, p;
+	__u64 tout;
+
+	orig_handler = signal(SIGUSR1, sig_handler);
+	if (!ASSERT_NEQ(orig_handler, SIG_ERR, "signal handler setup"))
+		return;
+
+	s = socket_loopback(AF_VSOCK, SOCK_SEQPACKET | SOCK_NONBLOCK);
+	if (s < 0)
+		goto restore;
+
+	if (xgetsockname(s, (struct sockaddr *)&addr, &alen))
+		goto close;
+
+	bad_addr = addr;
+	bad_addr.svm_cid = 0x42424242; /* non-existing */
+
+	if (xpthread_create(&thread, 0, connect_signal_racer, &c))
+		goto close;
+
+	tout = get_time_ns() + CONNECT_SIGNAL_RACE_TIMEOUT * NSEC_PER_SEC;
+	do {
+		c = xsocket(AF_VSOCK, SOCK_SEQPACKET, 0);
+		if (c < 0)
+			break;
+
+		if (connect(c, (struct sockaddr *)&addr, alen) && errno == EINTR)
+			(void)connect(c, (struct sockaddr *)&bad_addr, alen);
+
+		xclose(c);
+		p = accept(s, NULL, NULL);
+		if (p >= 0)
+			xclose(p);
+	} while (get_time_ns() < tout);
+
+	ASSERT_OK(pthread_cancel(thread), "pthread_cancel");
+	xpthread_join(thread, NULL);
+close:
+	xclose(s);
+restore:
+	ASSERT_NEQ(signal(SIGUSR1, orig_handler), SIG_ERR, "handler restore");
+}
+
 void test_sockmap_basic(void)
 {
 	if (test__start_subtest("sockmap create_update_free"))
@@ -1108,4 +1205,6 @@ void test_sockmap_basic(void)
 		test_sockmap_skb_verdict_vsock_poll();
 	if (test__start_subtest("sockmap vsock unconnected"))
 		test_sockmap_vsock_unconnected();
+	if (test__start_subtest("sockmap vsock connect signal race"))
+		test_sockmap_vsock_connect_signal_race();
 }

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment
  2025-03-17  9:52 [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting Michal Luczaj
  2025-03-17  9:52 ` [PATCH net v4 1/3] vsock/bpf: Fix EINTR connect() racing sockmap update Michal Luczaj
  2025-03-17  9:52 ` [PATCH net v4 2/3] selftest/bpf: Add test for AF_VSOCK " Michal Luczaj
@ 2025-03-17  9:52 ` Michal Luczaj
  2025-03-19  9:34   ` Stefano Garzarella
  2025-03-19 22:18   ` Cong Wang
  2025-03-19  9:21 ` [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting Michael S. Tsirkin
  3 siblings, 2 replies; 12+ messages in thread
From: Michal Luczaj @ 2025-03-17  9:52 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michael S. Tsirkin, Bobby Eshleman,
	Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan
  Cc: netdev, bpf, virtualization, linux-kernel, linux-kselftest,
	Michal Luczaj

Signal delivery during connect() may lead to a disconnect of an already
established socket. That involves removing socket from any sockmap and
resetting state to SS_UNCONNECTED. While it correctly restores socket's
proto, a call to vsock_bpf_recvmsg() might have been already under way in
another thread. If the connect()ing thread reassigns the vsock transport to
NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.

connect
  / state = SS_CONNECTED /
                                sock_map_update_elem
                                vsock_bpf_recvmsg
                                  psock = sk_psock_get()
  lock sk
  if signal_pending
    unhash
      sock_map_remove_links
    state = SS_UNCONNECTED
  release sk

connect
  transport = NULL
                                  lock sk
                                  WARN_ON_ONCE(!vsk->transport)

Protect recvmsg() from racing against transport reassignment. Enforce the
sockmap invariant that psock implies transport: lock socket before getting
psock.

WARNING: CPU: 9 PID: 1222 at net/vmw_vsock/vsock_bpf.c:92 vsock_bpf_recvmsg+0xb55/0xe00
CPU: 9 UID: 0 PID: 1222 Comm: a.out Not tainted 6.14.0-rc5+
RIP: 0010:vsock_bpf_recvmsg+0xb55/0xe00
 sock_recvmsg+0x1b2/0x220
 __sys_recvfrom+0x190/0x270
 __x64_sys_recvfrom+0xdc/0x1b0
 do_syscall_64+0x93/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 net/vmw_vsock/vsock_bpf.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index c68fdaf09046b68254dac3ea70ffbe73dfa45cef..5138195d91fb258d4bc09b48e80e13651d62863a 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -73,28 +73,35 @@ static int __vsock_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int
 	return err;
 }
 
-static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
-			     size_t len, int flags, int *addr_len)
+static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+			     int flags, int *addr_len)
 {
 	struct sk_psock *psock;
 	struct vsock_sock *vsk;
 	int copied;
 
+	/* Since signal delivery during connect() may reset the state of socket
+	 * that's already in a sockmap, take the lock before checking on psock.
+	 * This serializes a possible transport reassignment, protecting this
+	 * function from running with NULL transport.
+	 */
+	lock_sock(sk);
+
 	psock = sk_psock_get(sk);
-	if (unlikely(!psock))
+	if (unlikely(!psock)) {
+		release_sock(sk);
 		return __vsock_recvmsg(sk, msg, len, flags);
+	}
 
-	lock_sock(sk);
 	vsk = vsock_sk(sk);
-
 	if (WARN_ON_ONCE(!vsk->transport)) {
 		copied = -ENODEV;
 		goto out;
 	}
 
 	if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
-		release_sock(sk);
 		sk_psock_put(sk, psock);
+		release_sock(sk);
 		return __vsock_recvmsg(sk, msg, len, flags);
 	}
 
@@ -108,8 +115,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
 		}
 
 		if (sk_psock_queue_empty(psock)) {
-			release_sock(sk);
 			sk_psock_put(sk, psock);
+			release_sock(sk);
 			return __vsock_recvmsg(sk, msg, len, flags);
 		}
 
@@ -117,8 +124,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
 	}
 
 out:
-	release_sock(sk);
 	sk_psock_put(sk, psock);
+	release_sock(sk);
 
 	return copied;
 }

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net v4 2/3] selftest/bpf: Add test for AF_VSOCK connect() racing sockmap update
  2025-03-17  9:52 ` [PATCH net v4 2/3] selftest/bpf: Add test for AF_VSOCK " Michal Luczaj
@ 2025-03-19  9:17   ` Stefano Garzarella
  0 siblings, 0 replies; 12+ messages in thread
From: Stefano Garzarella @ 2025-03-19  9:17 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Michael S. Tsirkin, Bobby Eshleman, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, netdev, bpf, virtualization, linux-kernel,
	linux-kselftest

On Mon, Mar 17, 2025 at 10:52:24AM +0100, Michal Luczaj wrote:
>Racing signal-interrupted connect() and sockmap update may result in an
>unconnected (and missing vsock transport) socket in a sockmap.
>
>Test spends 2 seconds attempting to reach WARN_ON_ONCE().
>
>connect
>  / state = SS_CONNECTED /
>                                sock_map_update_elem
>  if signal_pending
>    state = SS_UNCONNECTED
>
>connect
>  transport = NULL
>                                vsock_bpf_recvmsg
>                                  WARN_ON_ONCE(!vsk->transport)
>
>Signed-off-by: Michal Luczaj <mhal@rbox.co>
>---
> .../selftests/bpf/prog_tests/sockmap_basic.c       | 99 ++++++++++++++++++++++
> 1 file changed, 99 insertions(+)

LGTM for the vsock part!

Acked-by: Stefano Garzarella <sgarzare@redhat.com>

>
>diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
>index 1e3e4392dcca0e1722c1982ecc649a80c27443b2..2f8bba27866354848f1e30b5473cedb6a85244ff 100644
>--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
>+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
>@@ -3,6 +3,7 @@
> #include <error.h>
> #include <netinet/tcp.h>
> #include <sys/epoll.h>
>+#include <linux/time64.h>
>
> #include "test_progs.h"
> #include "test_skmsg_load_helpers.skel.h"
>@@ -1042,6 +1043,102 @@ static void test_sockmap_vsock_unconnected(void)
> 	xclose(map);
> }
>
>+#define CONNECT_SIGNAL_RACE_TIMEOUT 2 /* seconds */
>+
>+static void sig_handler(int signum)
>+{
>+	/* nop */
>+}
>+
>+static void connect_signal_racer_cleanup(void *map)
>+{
>+	xclose(*(int *)map);
>+}
>+
>+static void *connect_signal_racer(void *arg)
>+{
>+	pid_t pid;
>+	int map;
>+
>+	map = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(int),
>+			     sizeof(int), 1, NULL);
>+	if (!ASSERT_OK_FD(map, "bpf_map_create"))
>+		return NULL;
>+
>+	pthread_cleanup_push(connect_signal_racer_cleanup, &map);
>+	pid = getpid();
>+
>+	for (;;) {
>+		int c = *(int *)arg;
>+		int zero = 0;
>+
>+		(void)bpf_map_update_elem(map, &zero, &c, BPF_ANY);
>+
>+		if (kill(pid, SIGUSR1)) {
>+			FAIL_ERRNO("kill");
>+			break;
>+		}
>+
>+		if ((recv(c, NULL, 0, MSG_DONTWAIT) < 0) && errno == ENODEV) {
>+			FAIL_ERRNO("recv");
>+			break;
>+		}
>+	}
>+
>+	pthread_cleanup_pop(1);
>+
>+	return NULL;
>+}
>+
>+static void test_sockmap_vsock_connect_signal_race(void)
>+{
>+	struct sockaddr_vm addr, bad_addr;
>+	socklen_t alen = sizeof(addr);
>+	sighandler_t orig_handler;
>+	pthread_t thread;
>+	int s, c, p;
>+	__u64 tout;
>+
>+	orig_handler = signal(SIGUSR1, sig_handler);
>+	if (!ASSERT_NEQ(orig_handler, SIG_ERR, "signal handler setup"))
>+		return;
>+
>+	s = socket_loopback(AF_VSOCK, SOCK_SEQPACKET | SOCK_NONBLOCK);
>+	if (s < 0)
>+		goto restore;
>+
>+	if (xgetsockname(s, (struct sockaddr *)&addr, &alen))
>+		goto close;
>+
>+	bad_addr = addr;
>+	bad_addr.svm_cid = 0x42424242; /* non-existing */
>+
>+	if (xpthread_create(&thread, 0, connect_signal_racer, &c))
>+		goto close;
>+
>+	tout = get_time_ns() + CONNECT_SIGNAL_RACE_TIMEOUT * NSEC_PER_SEC;
>+	do {
>+		c = xsocket(AF_VSOCK, SOCK_SEQPACKET, 0);
>+		if (c < 0)
>+			break;
>+
>+		if (connect(c, (struct sockaddr *)&addr, alen) && errno == EINTR)
>+			(void)connect(c, (struct sockaddr *)&bad_addr, alen);
>+
>+		xclose(c);
>+		p = accept(s, NULL, NULL);
>+		if (p >= 0)
>+			xclose(p);
>+	} while (get_time_ns() < tout);
>+
>+	ASSERT_OK(pthread_cancel(thread), "pthread_cancel");
>+	xpthread_join(thread, NULL);
>+close:
>+	xclose(s);
>+restore:
>+	ASSERT_NEQ(signal(SIGUSR1, orig_handler), SIG_ERR, "handler restore");
>+}
>+
> void test_sockmap_basic(void)
> {
> 	if (test__start_subtest("sockmap create_update_free"))
>@@ -1108,4 +1205,6 @@ void test_sockmap_basic(void)
> 		test_sockmap_skb_verdict_vsock_poll();
> 	if (test__start_subtest("sockmap vsock unconnected"))
> 		test_sockmap_vsock_unconnected();
>+	if (test__start_subtest("sockmap vsock connect signal race"))
>+		test_sockmap_vsock_connect_signal_race();
> }
>
>-- 
>2.48.1
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting
  2025-03-17  9:52 [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting Michal Luczaj
                   ` (2 preceding siblings ...)
  2025-03-17  9:52 ` [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment Michal Luczaj
@ 2025-03-19  9:21 ` Michael S. Tsirkin
  3 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2025-03-19  9:21 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Bobby Eshleman, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, netdev, bpf, virtualization, linux-kernel,
	linux-kselftest

On Mon, Mar 17, 2025 at 10:52:22AM +0100, Michal Luczaj wrote:
> Signal delivery during connect() may disconnect an already established
> socket. Problem is that such socket might have been placed in a sockmap
> before the connection was closed.
> 
> PATCH 1 ensures this race won't lead to an unconnected vsock staying in the
> sockmap. PATCH 2 selftests it. 
> 
> PATCH 3 fixes a related race. Note that selftest in PATCH 2 does test this
> code as well, but winning this race variant may take more than 2 seconds,
> so I'm not advertising it.
> 
> Signed-off-by: Michal Luczaj <mhal@rbox.co>

vsock things:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> Changes in v4:
> - Selftest: send signal to only our own process
> - Link to v3: https://lore.kernel.org/r/20250316-vsock-trans-signal-race-v3-0-17a6862277c9@rbox.co
> 
> Changes in v3:
> - Selftest: drop unnecessary variable initialization and reorder the calls
> - Link to v2: https://lore.kernel.org/r/20250314-vsock-trans-signal-race-v2-0-421a41f60f42@rbox.co
> 
> Changes in v2:
> - Handle one more path of tripping the warning
> - Add a selftest
> - Collect R-b [Stefano]
> - Link to v1: https://lore.kernel.org/r/20250307-vsock-trans-signal-race-v1-1-3aca3f771fbd@rbox.co
> 
> ---
> Michal Luczaj (3):
>       vsock/bpf: Fix EINTR connect() racing sockmap update
>       selftest/bpf: Add test for AF_VSOCK connect() racing sockmap update
>       vsock/bpf: Fix bpf recvmsg() racing transport reassignment
> 
>  net/vmw_vsock/af_vsock.c                           | 10 ++-
>  net/vmw_vsock/vsock_bpf.c                          | 24 ++++--
>  .../selftests/bpf/prog_tests/sockmap_basic.c       | 99 ++++++++++++++++++++++
>  3 files changed, 124 insertions(+), 9 deletions(-)
> ---
> base-commit: da9e8efe7ee10e8425dc356a9fc593502c8e3933
> change-id: 20250305-vsock-trans-signal-race-d62f7718d099
> 
> Best regards,
> -- 
> Michal Luczaj <mhal@rbox.co>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment
  2025-03-17  9:52 ` [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment Michal Luczaj
@ 2025-03-19  9:34   ` Stefano Garzarella
  2025-03-19 19:05     ` Michal Luczaj
  2025-03-19 22:18   ` Cong Wang
  1 sibling, 1 reply; 12+ messages in thread
From: Stefano Garzarella @ 2025-03-19  9:34 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Michael S. Tsirkin, Bobby Eshleman, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, netdev, bpf, virtualization, linux-kernel,
	linux-kselftest

On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
>Signal delivery during connect() may lead to a disconnect of an already
>established socket. That involves removing socket from any sockmap and
>resetting state to SS_UNCONNECTED. While it correctly restores socket's
>proto, a call to vsock_bpf_recvmsg() might have been already under way in
>another thread. If the connect()ing thread reassigns the vsock transport to
>NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
>
>connect
>  / state = SS_CONNECTED /
>                                sock_map_update_elem
>                                vsock_bpf_recvmsg
>                                  psock = sk_psock_get()
>  lock sk
>  if signal_pending
>    unhash
>      sock_map_remove_links
>    state = SS_UNCONNECTED
>  release sk
>
>connect
>  transport = NULL
>                                  lock sk
>                                  WARN_ON_ONCE(!vsk->transport)
>
>Protect recvmsg() from racing against transport reassignment. Enforce the
>sockmap invariant that psock implies transport: lock socket before getting
>psock.
>
>WARNING: CPU: 9 PID: 1222 at net/vmw_vsock/vsock_bpf.c:92 vsock_bpf_recvmsg+0xb55/0xe00
>CPU: 9 UID: 0 PID: 1222 Comm: a.out Not tainted 6.14.0-rc5+
>RIP: 0010:vsock_bpf_recvmsg+0xb55/0xe00
> sock_recvmsg+0x1b2/0x220
> __sys_recvfrom+0x190/0x270
> __x64_sys_recvfrom+0xdc/0x1b0
> do_syscall_64+0x93/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>Fixes: 634f1a7110b4 ("vsock: support sockmap")
>Signed-off-by: Michal Luczaj <mhal@rbox.co>
>---
> net/vmw_vsock/vsock_bpf.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
>diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
>index c68fdaf09046b68254dac3ea70ffbe73dfa45cef..5138195d91fb258d4bc09b48e80e13651d62863a 100644
>--- a/net/vmw_vsock/vsock_bpf.c
>+++ b/net/vmw_vsock/vsock_bpf.c
>@@ -73,28 +73,35 @@ static int __vsock_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int
> 	return err;
> }
>
>-static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
>-			     size_t len, int flags, int *addr_len)
>+static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>+			     int flags, int *addr_len)

I would avoid this change, especially in a patch with the Fixes tag then 
to be backported.

> {
> 	struct sk_psock *psock;
> 	struct vsock_sock *vsk;
> 	int copied;
>
>+	/* Since signal delivery during connect() may reset the state of socket
>+	 * that's already in a sockmap, take the lock before checking on psock.
>+	 * This serializes a possible transport reassignment, protecting this
>+	 * function from running with NULL transport.
>+	 */
>+	lock_sock(sk);
>+
> 	psock = sk_psock_get(sk);
>-	if (unlikely(!psock))
>+	if (unlikely(!psock)) {
>+		release_sock(sk);
> 		return __vsock_recvmsg(sk, msg, len, flags);
>+	}
>
>-	lock_sock(sk);
> 	vsk = vsock_sk(sk);
>-
> 	if (WARN_ON_ONCE(!vsk->transport)) {
> 		copied = -ENODEV;
> 		goto out;
> 	}
>
> 	if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
>-		release_sock(sk);
> 		sk_psock_put(sk, psock);
>+		release_sock(sk);

But here we release it, so can still a reset happen at this point, 
before calling __vsock_connectible_recvmsg().
In there anyway we handle the case where transport is null, so there's 
no problem, right?

The rest LTGM.

Thanks,
Stefano

> 		return __vsock_recvmsg(sk, msg, len, flags);
> 	}
>
>@@ -108,8 +115,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
> 		}
>
> 		if (sk_psock_queue_empty(psock)) {
>-			release_sock(sk);
> 			sk_psock_put(sk, psock);
>+			release_sock(sk);
> 			return __vsock_recvmsg(sk, msg, len, flags);
> 		}
>
>@@ -117,8 +124,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
> 	}
>
> out:
>-	release_sock(sk);
> 	sk_psock_put(sk, psock);
>+	release_sock(sk);
>
> 	return copied;
> }
>
>-- 
>2.48.1
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment
  2025-03-19  9:34   ` Stefano Garzarella
@ 2025-03-19 19:05     ` Michal Luczaj
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Luczaj @ 2025-03-19 19:05 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Michael S. Tsirkin, Bobby Eshleman, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, netdev, bpf, virtualization, linux-kernel,
	linux-kselftest

On 3/19/25 10:34, Stefano Garzarella wrote:
> On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
>> ...
>> -static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
>> -			     size_t len, int flags, int *addr_len)
>> +static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>> +			     int flags, int *addr_len)
> 
> I would avoid this change, especially in a patch with the Fixes tag then 
> to be backported.

I thought that since I've modified this function in so many places, doing
this wouldn't hurt. But ok, I'll drop this change.

>> {
>> 	struct sk_psock *psock;
>> 	struct vsock_sock *vsk;
>> 	int copied;
>>
>> +	/* Since signal delivery during connect() may reset the state of socket
>> +	 * that's already in a sockmap, take the lock before checking on psock.
>> +	 * This serializes a possible transport reassignment, protecting this
>> +	 * function from running with NULL transport.
>> +	 */
>> +	lock_sock(sk);
>> +
>> 	psock = sk_psock_get(sk);
>> -	if (unlikely(!psock))
>> +	if (unlikely(!psock)) {
>> +		release_sock(sk);
>> 		return __vsock_recvmsg(sk, msg, len, flags);
>> +	}
>>
>> -	lock_sock(sk);
>> 	vsk = vsock_sk(sk);
>> -
>> 	if (WARN_ON_ONCE(!vsk->transport)) {
>> 		copied = -ENODEV;
>> 		goto out;
>> 	}
>>
>> 	if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
>> -		release_sock(sk);
>> 		sk_psock_put(sk, psock);
>> +		release_sock(sk);
> 
> But here we release it, so can still a reset happen at this point, 
> before calling __vsock_connectible_recvmsg().
> In there anyway we handle the case where transport is null, so there's 
> no problem, right?

Yes, I think we're good. That function needs to gracefully handle being
called without a transport, and it does.

Thanks,
Michal


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment
  2025-03-17  9:52 ` [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment Michal Luczaj
  2025-03-19  9:34   ` Stefano Garzarella
@ 2025-03-19 22:18   ` Cong Wang
  2025-03-20 12:05     ` Michal Luczaj
  1 sibling, 1 reply; 12+ messages in thread
From: Cong Wang @ 2025-03-19 22:18 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michael S. Tsirkin, Bobby Eshleman,
	Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, netdev, bpf, virtualization,
	linux-kernel, linux-kselftest

On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
> Signal delivery during connect() may lead to a disconnect of an already
> established socket. That involves removing socket from any sockmap and
> resetting state to SS_UNCONNECTED. While it correctly restores socket's
> proto, a call to vsock_bpf_recvmsg() might have been already under way in
> another thread. If the connect()ing thread reassigns the vsock transport to
> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
> 
> connect
>   / state = SS_CONNECTED /
>                                 sock_map_update_elem
>                                 vsock_bpf_recvmsg
>                                   psock = sk_psock_get()
>   lock sk
>   if signal_pending
>     unhash
>       sock_map_remove_links

So vsock's ->recvmsg() should be restored after this, right? Then how is
vsock_bpf_recvmsg() called afterward?

>     state = SS_UNCONNECTED
>   release sk
> 
> connect
>   transport = NULL
>                                   lock sk
>                                   WARN_ON_ONCE(!vsk->transport)
> 

And I am wondering why we need to WARN here since we can handle this error
case correctly?

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment
  2025-03-19 22:18   ` Cong Wang
@ 2025-03-20 12:05     ` Michal Luczaj
  2025-03-20 20:54       ` Cong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Luczaj @ 2025-03-20 12:05 UTC (permalink / raw)
  To: Cong Wang
  Cc: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michael S. Tsirkin, Bobby Eshleman,
	Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, netdev, bpf, virtualization,
	linux-kernel, linux-kselftest

On 3/19/25 23:18, Cong Wang wrote:
> On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
>> Signal delivery during connect() may lead to a disconnect of an already
>> established socket. That involves removing socket from any sockmap and
>> resetting state to SS_UNCONNECTED. While it correctly restores socket's
>> proto, a call to vsock_bpf_recvmsg() might have been already under way in
>> another thread. If the connect()ing thread reassigns the vsock transport to
>> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
>>

   *THREAD 1*                      *THREAD 2*

>> connect
>>   / state = SS_CONNECTED /
>>                                 sock_map_update_elem
>>                                 vsock_bpf_recvmsg
>>                                   psock = sk_psock_get()
>>   lock sk
>>   if signal_pending
>>     unhash
>>       sock_map_remove_links
> 
> So vsock's ->recvmsg() should be restored after this, right? Then how is
> vsock_bpf_recvmsg() called afterward?

I'm not sure I understand the question, so I've added a header above: those
are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called
afterwards. It was called before sock_map_remove_links(). Note that at the
time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still
executing (in T2).

>>     state = SS_UNCONNECTED
>>   release sk
>>
>> connect
>>   transport = NULL
>>                                   lock sk
>>                                   WARN_ON_ONCE(!vsk->transport)
>>
> 
> And I am wondering why we need to WARN here since we can handle this error
> case correctly?

The WARN and transport check are here for defensive measures, and to state
a contract.

But I think I get your point. If we accept for a fact of life that BPF code
should be able to handle transport disappearing - then WARN can be removed
(while keeping the check) and this patch can be dropped.

My aim, instead, was to keep things consistent. By which I mean sticking to
the conditions expressed in vsock_bpf_update_proto() as invariants; so that
vsock with a psock is guaranteed to have transport assigned.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment
  2025-03-20 12:05     ` Michal Luczaj
@ 2025-03-20 20:54       ` Cong Wang
  2025-03-20 22:16         ` Michal Luczaj
  0 siblings, 1 reply; 12+ messages in thread
From: Cong Wang @ 2025-03-20 20:54 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michael S. Tsirkin, Bobby Eshleman,
	Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, netdev, bpf, virtualization,
	linux-kernel, linux-kselftest

On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote:
> On 3/19/25 23:18, Cong Wang wrote:
> > On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
> >> Signal delivery during connect() may lead to a disconnect of an already
> >> established socket. That involves removing socket from any sockmap and
> >> resetting state to SS_UNCONNECTED. While it correctly restores socket's
> >> proto, a call to vsock_bpf_recvmsg() might have been already under way in
> >> another thread. If the connect()ing thread reassigns the vsock transport to
> >> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
> >>
> 
>    *THREAD 1*                      *THREAD 2*
> 
> >> connect
> >>   / state = SS_CONNECTED /
> >>                                 sock_map_update_elem
> >>                                 vsock_bpf_recvmsg
> >>                                   psock = sk_psock_get()
> >>   lock sk
> >>   if signal_pending
> >>     unhash
> >>       sock_map_remove_links
> > 
> > So vsock's ->recvmsg() should be restored after this, right? Then how is
> > vsock_bpf_recvmsg() called afterward?
> 
> I'm not sure I understand the question, so I've added a header above: those
> are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called
> afterwards. It was called before sock_map_remove_links(). Note that at the
> time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still
> executing (in T2).

I thought the above vsock_bpf_recvmsg() on the right side completed
before sock_map_remove_links(), sorry for the confusion.

> 
> >>     state = SS_UNCONNECTED
> >>   release sk
> >>
> >> connect
> >>   transport = NULL
> >>                                   lock sk
> >>                                   WARN_ON_ONCE(!vsk->transport)
> >>
> > 
> > And I am wondering why we need to WARN here since we can handle this error
> > case correctly?
> 
> The WARN and transport check are here for defensive measures, and to state
> a contract.
> 
> But I think I get your point. If we accept for a fact of life that BPF code
> should be able to handle transport disappearing - then WARN can be removed
> (while keeping the check) and this patch can be dropped.

I am thinking whether we have more elegant way to handle this case,
WARN looks not pretty.

> 
> My aim, instead, was to keep things consistent. By which I mean sticking to
> the conditions expressed in vsock_bpf_update_proto() as invariants; so that
> vsock with a psock is guaranteed to have transport assigned.

Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg()
because for example UDP is (almost) lockless, so enforcing the sock lock
for all vsock types looks not flexible and may hurt performance.

Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks
for different struct proto (as we did for TCP/UDP)?

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment
  2025-03-20 20:54       ` Cong Wang
@ 2025-03-20 22:16         ` Michal Luczaj
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Luczaj @ 2025-03-20 22:16 UTC (permalink / raw)
  To: Cong Wang
  Cc: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michael S. Tsirkin, Bobby Eshleman,
	Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, netdev, bpf, virtualization,
	linux-kernel, linux-kselftest

On 3/20/25 21:54, Cong Wang wrote:
> On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote:
>> On 3/19/25 23:18, Cong Wang wrote:
>>> On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
>>>> Signal delivery during connect() may lead to a disconnect of an already
>>>> established socket. That involves removing socket from any sockmap and
>>>> resetting state to SS_UNCONNECTED. While it correctly restores socket's
>>>> proto, a call to vsock_bpf_recvmsg() might have been already under way in
>>>> another thread. If the connect()ing thread reassigns the vsock transport to
>>>> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
>>>>
>>
>>    *THREAD 1*                      *THREAD 2*
>>
>>>> connect
>>>>   / state = SS_CONNECTED /
>>>>                                 sock_map_update_elem
>>>>                                 vsock_bpf_recvmsg
>>>>                                   psock = sk_psock_get()
>>>>   lock sk
>>>>   if signal_pending
>>>>     unhash
>>>>       sock_map_remove_links
>>>
>>> So vsock's ->recvmsg() should be restored after this, right? Then how is
>>> vsock_bpf_recvmsg() called afterward?
>>
>> I'm not sure I understand the question, so I've added a header above: those
>> are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called
>> afterwards. It was called before sock_map_remove_links(). Note that at the
>> time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still
>> executing (in T2).
> 
> I thought the above vsock_bpf_recvmsg() on the right side completed
> before sock_map_remove_links(), sorry for the confusion.

No problem, I see why you've might. Perhaps deeper indentation would make
things clearer.

>>>>     state = SS_UNCONNECTED
>>>>   release sk
>>>>
>>>> connect
>>>>   transport = NULL
>>>>                                   lock sk
>>>>                                   WARN_ON_ONCE(!vsk->transport)
>>>>
>>>
>>> And I am wondering why we need to WARN here since we can handle this error
>>> case correctly?
>>
>> The WARN and transport check are here for defensive measures, and to state
>> a contract.
>>
>> But I think I get your point. If we accept for a fact of life that BPF code
>> should be able to handle transport disappearing - then WARN can be removed
>> (while keeping the check) and this patch can be dropped.
> 
> I am thinking whether we have more elegant way to handle this case,
> WARN looks not pretty.

Since the case should never happen, I like to think of WARN as a deliberate
eyesore :)

>> My aim, instead, was to keep things consistent. By which I mean sticking to
>> the conditions expressed in vsock_bpf_update_proto() as invariants; so that
>> vsock with a psock is guaranteed to have transport assigned.
> 
> Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg()
> because for example UDP is (almost) lockless, so enforcing the sock lock
> for all vsock types looks not flexible and may hurt performance.
>
> Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks
> for different struct proto (as we did for TCP/UDP)?

By UDP you mean vsock SOCK_DGRAM? No need to worry. VMCI is the only
transport that features VSOCK_TRANSPORT_F_DGRAM, but it does not
implemented read_skb() callback, making it unsupported by BPF/sockmap.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-03-20 22:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-17  9:52 [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting Michal Luczaj
2025-03-17  9:52 ` [PATCH net v4 1/3] vsock/bpf: Fix EINTR connect() racing sockmap update Michal Luczaj
2025-03-17  9:52 ` [PATCH net v4 2/3] selftest/bpf: Add test for AF_VSOCK " Michal Luczaj
2025-03-19  9:17   ` Stefano Garzarella
2025-03-17  9:52 ` [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment Michal Luczaj
2025-03-19  9:34   ` Stefano Garzarella
2025-03-19 19:05     ` Michal Luczaj
2025-03-19 22:18   ` Cong Wang
2025-03-20 12:05     ` Michal Luczaj
2025-03-20 20:54       ` Cong Wang
2025-03-20 22:16         ` Michal Luczaj
2025-03-19  9:21 ` [PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).