netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment
@ 2025-01-10  8:35 Stefano Garzarella
  2025-01-10  8:35 ` [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes Stefano Garzarella
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-10  8:35 UTC (permalink / raw)
  To: netdev
  Cc: Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi, David S. Miller,
	Wongi Lee, Stefano Garzarella, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, Michal Luczaj, virtualization, Bobby Eshleman

v1: https://lore.kernel.org/netdev/20250108180617.154053-1-sgarzare@redhat.com/
v2:
- Added patch 3 to cancel the virtio close delayed work when de-assigning
  the transport
- Added patch 4 to clean the socket state after de-assigning the transport
- Added patch 5 as suggested by Michael and Hyunwoo Kim. It's based on
  Hyunwoo Kim and Wongi Lee patch [1] but using WARN_ON and covering more
  functions
- Added R-b/T-b tags

This series includes two patches discussed in the thread started by
Hyunwoo Kim a few weeks ago [1], plus 3 more patches added after some
discussions on v1 (see changelog). All related to the case where a vsock
socket is de-assigned from a transport (e.g., because the connect fails
or is interrupted by a signal) and then assigned to another transport
or to no-one (NULL).

I tested with usual vsock test suite, plus Michal repro [2]. (Note: the repo
works only if a G2H transport is not loaded, e.g. virtio-vsock driver).

The first patch is a fix more appropriate to the problem reported in
that thread, the second patch on the other hand is a related fix but
of a different problem highlighted by Michal Luczaj. It's present only
in vsock_bpf and already handled in af_vsock.c
The third patch is to cancel the virtio close delayed work when de-assigning
the transport, the fourth patch is to clean the socket state after de-assigning
the transport, the last patch adds warnings and prevents null-ptr-deref in
vsock_*[has_data|has_space].

Hyunwoo Kim, Michal, if you can test and report your Tested-by that
would be great!

[1] https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/
[2] https://lore.kernel.org/netdev/2b3062e3-bdaa-4c94-a3c0-2930595b9670@rbox.co/

Stefano Garzarella (5):
  vsock/virtio: discard packets if the transport changes
  vsock/bpf: return early if transport is not assigned
  vsock/virtio: cancel close work in the destructor
  vsock: reset socket state when de-assigning the transport
  vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]

 net/vmw_vsock/af_vsock.c                | 18 +++++++++++++
 net/vmw_vsock/virtio_transport_common.c | 36 ++++++++++++++++++-------
 net/vmw_vsock/vsock_bpf.c               |  9 +++++++
 3 files changed, 53 insertions(+), 10 deletions(-)


base-commit: fbfd64d25c7af3b8695201ebc85efe90be28c5a3
-- 
2.47.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-10  8:35 [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment Stefano Garzarella
@ 2025-01-10  8:35 ` Stefano Garzarella
  2025-01-10 22:46   ` Hyunwoo Kim
  2025-01-12 22:42   ` Michal Luczaj
  2025-01-10  8:35 ` [PATCH net v2 2/5] vsock/bpf: return early if transport is not assigned Stefano Garzarella
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-10  8:35 UTC (permalink / raw)
  To: netdev
  Cc: Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi, David S. Miller,
	Wongi Lee, Stefano Garzarella, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, Michal Luczaj, virtualization, Bobby Eshleman,
	stable

If the socket has been de-assigned or assigned to another transport,
we must discard any packets received because they are not expected
and would cause issues when we access vsk->transport.

A possible scenario is described by Hyunwoo Kim in the attached link,
where after a first connect() interrupted by a signal, and a second
connect() failed, we can find `vsk->transport` at NULL, leading to a
NULL pointer dereference.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Reported-by: Wongi Lee <qwerty@theori.io>
Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 9acc13ab3f82..51a494b69be8 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1628,8 +1628,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 
 	lock_sock(sk);
 
-	/* Check if sk has been closed before lock_sock */
-	if (sock_flag(sk, SOCK_DONE)) {
+	/* Check if sk has been closed or assigned to another transport before
+	 * lock_sock (note: listener sockets are not assigned to any transport)
+	 */
+	if (sock_flag(sk, SOCK_DONE) ||
+	    (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
 		(void)virtio_transport_reset_no_sock(t, skb);
 		release_sock(sk);
 		sock_put(sk);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net v2 2/5] vsock/bpf: return early if transport is not assigned
  2025-01-10  8:35 [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment Stefano Garzarella
  2025-01-10  8:35 ` [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes Stefano Garzarella
@ 2025-01-10  8:35 ` Stefano Garzarella
  2025-01-10  8:35 ` [PATCH net v2 3/5] vsock/virtio: cancel close work in the destructor Stefano Garzarella
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-10  8:35 UTC (permalink / raw)
  To: netdev
  Cc: Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi, David S. Miller,
	Wongi Lee, Stefano Garzarella, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, Michal Luczaj, virtualization, Bobby Eshleman,
	stable, syzbot+3affdbfc986ecd9200fd

Some of the core functions can only be called if the transport
has been assigned.

As Michal reported, a socket might have the transport at NULL,
for example after a failed connect(), causing the following trace:

    BUG: kernel NULL pointer dereference, address: 00000000000000a0
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
    Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
    RIP: 0010:vsock_connectible_has_data+0x1f/0x40
    Call Trace:
     vsock_bpf_recvmsg+0xca/0x5e0
     sock_recvmsg+0xb9/0xc0
     __sys_recvfrom+0xb3/0x130
     __x64_sys_recvfrom+0x20/0x30
     do_syscall_64+0x93/0x180
     entry_SYSCALL_64_after_hwframe+0x76/0x7e

So we need to check the `vsk->transport` in vsock_bpf_recvmsg(),
especially for connected sockets (stream/seqpacket) as we already
do in __vsock_connectible_recvmsg().

Fixes: 634f1a7110b4 ("vsock: support sockmap")
Cc: stable@vger.kernel.org
Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Tested-by: Michal Luczaj <mhal@rbox.co>
Reported-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Tested-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/vsock_bpf.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index 4aa6e74ec295..f201d9eca1df 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -77,6 +77,7 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
 			     size_t len, int flags, int *addr_len)
 {
 	struct sk_psock *psock;
+	struct vsock_sock *vsk;
 	int copied;
 
 	psock = sk_psock_get(sk);
@@ -84,6 +85,13 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
 		return __vsock_recvmsg(sk, msg, len, flags);
 
 	lock_sock(sk);
+	vsk = vsock_sk(sk);
+
+	if (!vsk->transport) {
+		copied = -ENODEV;
+		goto out;
+	}
+
 	if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
 		release_sock(sk);
 		sk_psock_put(sk, psock);
@@ -108,6 +116,7 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
 		copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
 	}
 
+out:
 	release_sock(sk);
 	sk_psock_put(sk, psock);
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net v2 3/5] vsock/virtio: cancel close work in the destructor
  2025-01-10  8:35 [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment Stefano Garzarella
  2025-01-10  8:35 ` [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes Stefano Garzarella
  2025-01-10  8:35 ` [PATCH net v2 2/5] vsock/bpf: return early if transport is not assigned Stefano Garzarella
@ 2025-01-10  8:35 ` Stefano Garzarella
  2025-01-10 10:57   ` Luigi Leonardi
  2025-01-10 22:48   ` Hyunwoo Kim
  2025-01-10  8:35 ` [PATCH net v2 4/5] vsock: reset socket state when de-assigning the transport Stefano Garzarella
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-10  8:35 UTC (permalink / raw)
  To: netdev
  Cc: Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi, David S. Miller,
	Wongi Lee, Stefano Garzarella, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, Michal Luczaj, virtualization, Bobby Eshleman,
	stable

During virtio_transport_release() we can schedule a delayed work to
perform the closing of the socket before destruction.

The destructor is called either when the socket is really destroyed
(reference counter to zero), or it can also be called when we are
de-assigning the transport.

In the former case, we are sure the delayed work has completed, because
it holds a reference until it completes, so the destructor will
definitely be called after the delayed work is finished.
But in the latter case, the destructor is called by AF_VSOCK core, just
after the release(), so there may still be delayed work scheduled.

Refactor the code, moving the code to delete the close work already in
the do_close() to a new function. Invoke it during destruction to make
sure we don't leave any pending work.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 29 ++++++++++++++++++-------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 51a494b69be8..7f7de6d88096 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -26,6 +26,9 @@
 /* Threshold for detecting small packets to copy */
 #define GOOD_COPY_LEN  128
 
+static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
+					       bool cancel_timeout);
+
 static const struct virtio_transport *
 virtio_transport_get_ops(struct vsock_sock *vsk)
 {
@@ -1109,6 +1112,8 @@ void virtio_transport_destruct(struct vsock_sock *vsk)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
 
+	virtio_transport_cancel_close_work(vsk, true);
+
 	kfree(vvs);
 	vsk->trans = NULL;
 }
@@ -1204,17 +1209,11 @@ static void virtio_transport_wait_close(struct sock *sk, long timeout)
 	}
 }
 
-static void virtio_transport_do_close(struct vsock_sock *vsk,
-				      bool cancel_timeout)
+static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
+					       bool cancel_timeout)
 {
 	struct sock *sk = sk_vsock(vsk);
 
-	sock_set_flag(sk, SOCK_DONE);
-	vsk->peer_shutdown = SHUTDOWN_MASK;
-	if (vsock_stream_has_data(vsk) <= 0)
-		sk->sk_state = TCP_CLOSING;
-	sk->sk_state_change(sk);
-
 	if (vsk->close_work_scheduled &&
 	    (!cancel_timeout || cancel_delayed_work(&vsk->close_work))) {
 		vsk->close_work_scheduled = false;
@@ -1226,6 +1225,20 @@ static void virtio_transport_do_close(struct vsock_sock *vsk,
 	}
 }
 
+static void virtio_transport_do_close(struct vsock_sock *vsk,
+				      bool cancel_timeout)
+{
+	struct sock *sk = sk_vsock(vsk);
+
+	sock_set_flag(sk, SOCK_DONE);
+	vsk->peer_shutdown = SHUTDOWN_MASK;
+	if (vsock_stream_has_data(vsk) <= 0)
+		sk->sk_state = TCP_CLOSING;
+	sk->sk_state_change(sk);
+
+	virtio_transport_cancel_close_work(vsk, cancel_timeout);
+}
+
 static void virtio_transport_close_timeout(struct work_struct *work)
 {
 	struct vsock_sock *vsk =
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net v2 4/5] vsock: reset socket state when de-assigning the transport
  2025-01-10  8:35 [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment Stefano Garzarella
                   ` (2 preceding siblings ...)
  2025-01-10  8:35 ` [PATCH net v2 3/5] vsock/virtio: cancel close work in the destructor Stefano Garzarella
@ 2025-01-10  8:35 ` Stefano Garzarella
  2025-01-10 10:56   ` Luigi Leonardi
  2025-01-10  8:35 ` [PATCH net v2 5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space] Stefano Garzarella
  2025-01-14 11:50 ` [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment patchwork-bot+netdevbpf
  5 siblings, 1 reply; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-10  8:35 UTC (permalink / raw)
  To: netdev
  Cc: Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi, David S. Miller,
	Wongi Lee, Stefano Garzarella, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, Michal Luczaj, virtualization, Bobby Eshleman,
	stable

Transport's release() and destruct() are called when de-assigning the
vsock transport. These callbacks can touch some socket state like
sock flags, sk_state, and peer_shutdown.

Since we are reassigning the socket to a new transport during
vsock_connect(), let's reset these fields to have a clean state with
the new transport.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/af_vsock.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 5cf8109f672a..74d35a871644 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -491,6 +491,15 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		 */
 		vsk->transport->release(vsk);
 		vsock_deassign_transport(vsk);
+
+		/* transport's release() and destruct() can touch some socket
+		 * state, since we are reassigning the socket to a new transport
+		 * during vsock_connect(), let's reset these fields to have a
+		 * clean state.
+		 */
+		sock_reset_flag(sk, SOCK_DONE);
+		sk->sk_state = TCP_CLOSE;
+		vsk->peer_shutdown = 0;
 	}
 
 	/* We increase the module refcnt to prevent the transport unloading
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net v2 5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]
  2025-01-10  8:35 [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment Stefano Garzarella
                   ` (3 preceding siblings ...)
  2025-01-10  8:35 ` [PATCH net v2 4/5] vsock: reset socket state when de-assigning the transport Stefano Garzarella
@ 2025-01-10  8:35 ` Stefano Garzarella
  2025-01-10  9:49   ` Luigi Leonardi
  2025-01-10 22:52   ` Hyunwoo Kim
  2025-01-14 11:50 ` [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment patchwork-bot+netdevbpf
  5 siblings, 2 replies; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-10  8:35 UTC (permalink / raw)
  To: netdev
  Cc: Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi, David S. Miller,
	Wongi Lee, Stefano Garzarella, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, Michal Luczaj, virtualization, Bobby Eshleman,
	stable

Recent reports have shown how we sometimes call vsock_*_has_data()
when a vsock socket has been de-assigned from a transport (see attached
links), but we shouldn't.

Previous commits should have solved the real problems, but we may have
more in the future, so to avoid null-ptr-deref, we can return 0
(no space, no data available) but with a warning.

This way the code should continue to run in a nearly consistent state
and have a warning that allows us to debug future problems.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/
Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Co-developed-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Co-developed-by: Wongi Lee <qwerty@theori.io>
Signed-off-by: Wongi Lee <qwerty@theori.io>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/af_vsock.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 74d35a871644..fa9d1b49599b 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -879,6 +879,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
 
 s64 vsock_stream_has_data(struct vsock_sock *vsk)
 {
+	if (WARN_ON(!vsk->transport))
+		return 0;
+
 	return vsk->transport->stream_has_data(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_stream_has_data);
@@ -887,6 +890,9 @@ s64 vsock_connectible_has_data(struct vsock_sock *vsk)
 {
 	struct sock *sk = sk_vsock(vsk);
 
+	if (WARN_ON(!vsk->transport))
+		return 0;
+
 	if (sk->sk_type == SOCK_SEQPACKET)
 		return vsk->transport->seqpacket_has_data(vsk);
 	else
@@ -896,6 +902,9 @@ EXPORT_SYMBOL_GPL(vsock_connectible_has_data);
 
 s64 vsock_stream_has_space(struct vsock_sock *vsk)
 {
+	if (WARN_ON(!vsk->transport))
+		return 0;
+
 	return vsk->transport->stream_has_space(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_stream_has_space);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]
  2025-01-10  8:35 ` [PATCH net v2 5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space] Stefano Garzarella
@ 2025-01-10  9:49   ` Luigi Leonardi
  2025-01-10 22:52   ` Hyunwoo Kim
  1 sibling, 0 replies; 26+ messages in thread
From: Luigi Leonardi @ 2025-01-10  9:49 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, David S. Miller, Wongi Lee,
	Eugenio Pérez, Michael S. Tsirkin, Eric Dumazet, kvm,
	Paolo Abeni, Stefan Hajnoczi, Jason Wang, Simon Horman,
	Hyunwoo Kim, Jakub Kicinski, Michal Luczaj, virtualization,
	Bobby Eshleman, stable

On Fri, Jan 10, 2025 at 09:35:11AM +0100, Stefano Garzarella wrote:
>Recent reports have shown how we sometimes call vsock_*_has_data()
>when a vsock socket has been de-assigned from a transport (see attached
>links), but we shouldn't.
>
>Previous commits should have solved the real problems, but we may have
>more in the future, so to avoid null-ptr-deref, we can return 0
>(no space, no data available) but with a warning.
>
>This way the code should continue to run in a nearly consistent state
>and have a warning that allows us to debug future problems.
>
>Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
>Cc: stable@vger.kernel.org
>Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/
>Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
>Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
>Co-developed-by: Hyunwoo Kim <v4bel@theori.io>
>Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
>Co-developed-by: Wongi Lee <qwerty@theori.io>
>Signed-off-by: Wongi Lee <qwerty@theori.io>
>Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>---
> net/vmw_vsock/af_vsock.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 74d35a871644..fa9d1b49599b 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -879,6 +879,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
>
> s64 vsock_stream_has_data(struct vsock_sock *vsk)
> {
>+	if (WARN_ON(!vsk->transport))
>+		return 0;
>+
> 	return vsk->transport->stream_has_data(vsk);
> }
> EXPORT_SYMBOL_GPL(vsock_stream_has_data);
>@@ -887,6 +890,9 @@ s64 vsock_connectible_has_data(struct vsock_sock *vsk)
> {
> 	struct sock *sk = sk_vsock(vsk);
>
>+	if (WARN_ON(!vsk->transport))
>+		return 0;
>+
> 	if (sk->sk_type == SOCK_SEQPACKET)
> 		return vsk->transport->seqpacket_has_data(vsk);
> 	else
>@@ -896,6 +902,9 @@ EXPORT_SYMBOL_GPL(vsock_connectible_has_data);
>
> s64 vsock_stream_has_space(struct vsock_sock *vsk)
> {
>+	if (WARN_ON(!vsk->transport))
>+		return 0;
>+
> 	return vsk->transport->stream_has_space(vsk);
> }
> EXPORT_SYMBOL_GPL(vsock_stream_has_space);
>-- 
>2.47.1
>

Reviewed-by: Luigi Leonardi <leonardi@redhat.com>

Thanks!
Luigi


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 4/5] vsock: reset socket state when de-assigning the transport
  2025-01-10  8:35 ` [PATCH net v2 4/5] vsock: reset socket state when de-assigning the transport Stefano Garzarella
@ 2025-01-10 10:56   ` Luigi Leonardi
  2025-01-10 11:25     ` Stefano Garzarella
  0 siblings, 1 reply; 26+ messages in thread
From: Luigi Leonardi @ 2025-01-10 10:56 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, David S. Miller, Wongi Lee,
	Eugenio Pérez, Michael S. Tsirkin, Eric Dumazet, kvm,
	Paolo Abeni, Stefan Hajnoczi, Jason Wang, Simon Horman,
	Hyunwoo Kim, Jakub Kicinski, Michal Luczaj, virtualization,
	Bobby Eshleman, stable

On Fri, Jan 10, 2025 at 09:35:10AM +0100, Stefano Garzarella wrote:
>Transport's release() and destruct() are called when de-assigning the
>vsock transport. These callbacks can touch some socket state like
>sock flags, sk_state, and peer_shutdown.
>
>Since we are reassigning the socket to a new transport during
>vsock_connect(), let's reset these fields to have a clean state with
>the new transport.
>
>Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
>Cc: stable@vger.kernel.org
>Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>---
> net/vmw_vsock/af_vsock.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 5cf8109f672a..74d35a871644 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -491,6 +491,15 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> 		 */
> 		vsk->transport->release(vsk);
> 		vsock_deassign_transport(vsk);
>+
>+		/* transport's release() and destruct() can touch some socket
>+		 * state, since we are reassigning the socket to a new transport
>+		 * during vsock_connect(), let's reset these fields to have a
>+		 * clean state.
>+		 */
>+		sock_reset_flag(sk, SOCK_DONE);
>+		sk->sk_state = TCP_CLOSE;
>+		vsk->peer_shutdown = 0;
> 	}
>
> 	/* We increase the module refcnt to prevent the transport unloading
>-- 
>2.47.1
>

Hi Stefano,
I spent some time investigating what would happen if the scheduled work
ran before `virtio_transport_cancel_close_work`. IIUC that should do no 
harm and all the fields are reset correctly.

Thank you,
Luigi

Reviewed-by: Luigi Leonardi <leonardi@redhat.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 3/5] vsock/virtio: cancel close work in the destructor
  2025-01-10  8:35 ` [PATCH net v2 3/5] vsock/virtio: cancel close work in the destructor Stefano Garzarella
@ 2025-01-10 10:57   ` Luigi Leonardi
  2025-01-10 22:48   ` Hyunwoo Kim
  1 sibling, 0 replies; 26+ messages in thread
From: Luigi Leonardi @ 2025-01-10 10:57 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, David S. Miller, Wongi Lee,
	Eugenio Pérez, Michael S. Tsirkin, Eric Dumazet, kvm,
	Paolo Abeni, Stefan Hajnoczi, Jason Wang, Simon Horman,
	Hyunwoo Kim, Jakub Kicinski, Michal Luczaj, virtualization,
	Bobby Eshleman, stable

On Fri, Jan 10, 2025 at 09:35:09AM +0100, Stefano Garzarella wrote:
>During virtio_transport_release() we can schedule a delayed work to
>perform the closing of the socket before destruction.
>
>The destructor is called either when the socket is really destroyed
>(reference counter to zero), or it can also be called when we are
>de-assigning the transport.
>
>In the former case, we are sure the delayed work has completed, because
>it holds a reference until it completes, so the destructor will
>definitely be called after the delayed work is finished.
>But in the latter case, the destructor is called by AF_VSOCK core, just
>after the release(), so there may still be delayed work scheduled.
>
>Refactor the code, moving the code to delete the close work already in
>the do_close() to a new function. Invoke it during destruction to make
>sure we don't leave any pending work.
>
>Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
>Cc: stable@vger.kernel.org
>Reported-by: Hyunwoo Kim <v4bel@theori.io>
>Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/
>Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>---
> net/vmw_vsock/virtio_transport_common.c | 29 ++++++++++++++++++-------
> 1 file changed, 21 insertions(+), 8 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 51a494b69be8..7f7de6d88096 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -26,6 +26,9 @@
> /* Threshold for detecting small packets to copy */
> #define GOOD_COPY_LEN  128
>
>+static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
>+					       bool cancel_timeout);
>+
> static const struct virtio_transport *
> virtio_transport_get_ops(struct vsock_sock *vsk)
> {
>@@ -1109,6 +1112,8 @@ void virtio_transport_destruct(struct vsock_sock *vsk)
> {
> 	struct virtio_vsock_sock *vvs = vsk->trans;
>
>+	virtio_transport_cancel_close_work(vsk, true);
>+
> 	kfree(vvs);
> 	vsk->trans = NULL;
> }
>@@ -1204,17 +1209,11 @@ static void virtio_transport_wait_close(struct sock *sk, long timeout)
> 	}
> }
>
>-static void virtio_transport_do_close(struct vsock_sock *vsk,
>-				      bool cancel_timeout)
>+static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
>+					       bool cancel_timeout)
> {
> 	struct sock *sk = sk_vsock(vsk);
>
>-	sock_set_flag(sk, SOCK_DONE);
>-	vsk->peer_shutdown = SHUTDOWN_MASK;
>-	if (vsock_stream_has_data(vsk) <= 0)
>-		sk->sk_state = TCP_CLOSING;
>-	sk->sk_state_change(sk);
>-
> 	if (vsk->close_work_scheduled &&
> 	    (!cancel_timeout || cancel_delayed_work(&vsk->close_work))) {
> 		vsk->close_work_scheduled = false;
>@@ -1226,6 +1225,20 @@ static void virtio_transport_do_close(struct vsock_sock *vsk,
> 	}
> }
>
>+static void virtio_transport_do_close(struct vsock_sock *vsk,
>+				      bool cancel_timeout)
>+{
>+	struct sock *sk = sk_vsock(vsk);
>+
>+	sock_set_flag(sk, SOCK_DONE);
>+	vsk->peer_shutdown = SHUTDOWN_MASK;
>+	if (vsock_stream_has_data(vsk) <= 0)
>+		sk->sk_state = TCP_CLOSING;
>+	sk->sk_state_change(sk);
>+
>+	virtio_transport_cancel_close_work(vsk, cancel_timeout);
>+}
>+
> static void virtio_transport_close_timeout(struct work_struct *work)
> {
> 	struct vsock_sock *vsk =
>-- 
>2.47.1
>

Thanks!

Reviewed-by: Luigi Leonardi <leonardi@redhat.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 4/5] vsock: reset socket state when de-assigning the transport
  2025-01-10 10:56   ` Luigi Leonardi
@ 2025-01-10 11:25     ` Stefano Garzarella
  0 siblings, 0 replies; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-10 11:25 UTC (permalink / raw)
  To: Luigi Leonardi
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, David S. Miller, Wongi Lee,
	Eugenio Pérez, Michael S. Tsirkin, Eric Dumazet, kvm,
	Paolo Abeni, Stefan Hajnoczi, Jason Wang, Simon Horman,
	Hyunwoo Kim, Jakub Kicinski, Michal Luczaj, virtualization,
	Bobby Eshleman, stable

On Fri, Jan 10, 2025 at 11:56:28AM +0100, Luigi Leonardi wrote:
>On Fri, Jan 10, 2025 at 09:35:10AM +0100, Stefano Garzarella wrote:
>>Transport's release() and destruct() are called when de-assigning the
>>vsock transport. These callbacks can touch some socket state like
>>sock flags, sk_state, and peer_shutdown.
>>
>>Since we are reassigning the socket to a new transport during
>>vsock_connect(), let's reset these fields to have a clean state with
>>the new transport.
>>
>>Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
>>Cc: stable@vger.kernel.org
>>Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>---
>>net/vmw_vsock/af_vsock.c | 9 +++++++++
>>1 file changed, 9 insertions(+)
>>
>>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>index 5cf8109f672a..74d35a871644 100644
>>--- a/net/vmw_vsock/af_vsock.c
>>+++ b/net/vmw_vsock/af_vsock.c
>>@@ -491,6 +491,15 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>>		 */
>>		vsk->transport->release(vsk);
>>		vsock_deassign_transport(vsk);
>>+
>>+		/* transport's release() and destruct() can touch some socket
>>+		 * state, since we are reassigning the socket to a new transport
>>+		 * during vsock_connect(), let's reset these fields to have a
>>+		 * clean state.
>>+		 */
>>+		sock_reset_flag(sk, SOCK_DONE);
>>+		sk->sk_state = TCP_CLOSE;
>>+		vsk->peer_shutdown = 0;
>>	}
>>
>>	/* We increase the module refcnt to prevent the transport unloading
>>-- 
>>2.47.1
>>
>
>Hi Stefano,
>I spent some time investigating what would happen if the scheduled work
>ran before `virtio_transport_cancel_close_work`. IIUC that should do 
>no harm and all the fields are reset correctly.

Yep, after transport->destruct() call, the delayed work should have 
already finished or canceled.

>
>Thank you,
>Luigi
>
>Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
>

Thanks for the review,
Stefano


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-10  8:35 ` [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes Stefano Garzarella
@ 2025-01-10 22:46   ` Hyunwoo Kim
  2025-01-12 22:42   ` Michal Luczaj
  1 sibling, 0 replies; 26+ messages in thread
From: Hyunwoo Kim @ 2025-01-10 22:46 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Jakub Kicinski,
	Michal Luczaj, virtualization, Bobby Eshleman, stable, imv4bel,
	v4bel

On Fri, Jan 10, 2025 at 09:35:07AM +0100, Stefano Garzarella wrote:
> If the socket has been de-assigned or assigned to another transport,
> we must discard any packets received because they are not expected
> and would cause issues when we access vsk->transport.
> 
> A possible scenario is described by Hyunwoo Kim in the attached link,
> where after a first connect() interrupted by a signal, and a second
> connect() failed, we can find `vsk->transport` at NULL, leading to a
> NULL pointer dereference.
> 
> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> Cc: stable@vger.kernel.org
> Reported-by: Hyunwoo Kim <v4bel@theori.io>
> Reported-by: Wongi Lee <qwerty@theori.io>
> Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 9acc13ab3f82..51a494b69be8 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1628,8 +1628,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  
>  	lock_sock(sk);
>  
> -	/* Check if sk has been closed before lock_sock */
> -	if (sock_flag(sk, SOCK_DONE)) {
> +	/* Check if sk has been closed or assigned to another transport before
> +	 * lock_sock (note: listener sockets are not assigned to any transport)
> +	 */
> +	if (sock_flag(sk, SOCK_DONE) ||
> +	    (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
>  		(void)virtio_transport_reset_no_sock(t, skb);
>  		release_sock(sk);
>  		sock_put(sk);
> -- 
> 2.47.1
> 

Reviewed-by: Hyunwoo Kim <v4bel@theori.io>


Regards,
Hyunwoo Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 3/5] vsock/virtio: cancel close work in the destructor
  2025-01-10  8:35 ` [PATCH net v2 3/5] vsock/virtio: cancel close work in the destructor Stefano Garzarella
  2025-01-10 10:57   ` Luigi Leonardi
@ 2025-01-10 22:48   ` Hyunwoo Kim
  1 sibling, 0 replies; 26+ messages in thread
From: Hyunwoo Kim @ 2025-01-10 22:48 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Jakub Kicinski,
	Michal Luczaj, virtualization, Bobby Eshleman, stable, imv4bel,
	v4bel

On Fri, Jan 10, 2025 at 09:35:09AM +0100, Stefano Garzarella wrote:
> During virtio_transport_release() we can schedule a delayed work to
> perform the closing of the socket before destruction.
> 
> The destructor is called either when the socket is really destroyed
> (reference counter to zero), or it can also be called when we are
> de-assigning the transport.
> 
> In the former case, we are sure the delayed work has completed, because
> it holds a reference until it completes, so the destructor will
> definitely be called after the delayed work is finished.
> But in the latter case, the destructor is called by AF_VSOCK core, just
> after the release(), so there may still be delayed work scheduled.
> 
> Refactor the code, moving the code to delete the close work already in
> the do_close() to a new function. Invoke it during destruction to make
> sure we don't leave any pending work.
> 
> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> Cc: stable@vger.kernel.org
> Reported-by: Hyunwoo Kim <v4bel@theori.io>
> Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 29 ++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 51a494b69be8..7f7de6d88096 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -26,6 +26,9 @@
>  /* Threshold for detecting small packets to copy */
>  #define GOOD_COPY_LEN  128
>  
> +static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
> +					       bool cancel_timeout);
> +
>  static const struct virtio_transport *
>  virtio_transport_get_ops(struct vsock_sock *vsk)
>  {
> @@ -1109,6 +1112,8 @@ void virtio_transport_destruct(struct vsock_sock *vsk)
>  {
>  	struct virtio_vsock_sock *vvs = vsk->trans;
>  
> +	virtio_transport_cancel_close_work(vsk, true);
> +
>  	kfree(vvs);
>  	vsk->trans = NULL;
>  }
> @@ -1204,17 +1209,11 @@ static void virtio_transport_wait_close(struct sock *sk, long timeout)
>  	}
>  }
>  
> -static void virtio_transport_do_close(struct vsock_sock *vsk,
> -				      bool cancel_timeout)
> +static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
> +					       bool cancel_timeout)
>  {
>  	struct sock *sk = sk_vsock(vsk);
>  
> -	sock_set_flag(sk, SOCK_DONE);
> -	vsk->peer_shutdown = SHUTDOWN_MASK;
> -	if (vsock_stream_has_data(vsk) <= 0)
> -		sk->sk_state = TCP_CLOSING;
> -	sk->sk_state_change(sk);
> -
>  	if (vsk->close_work_scheduled &&
>  	    (!cancel_timeout || cancel_delayed_work(&vsk->close_work))) {
>  		vsk->close_work_scheduled = false;
> @@ -1226,6 +1225,20 @@ static void virtio_transport_do_close(struct vsock_sock *vsk,
>  	}
>  }
>  
> +static void virtio_transport_do_close(struct vsock_sock *vsk,
> +				      bool cancel_timeout)
> +{
> +	struct sock *sk = sk_vsock(vsk);
> +
> +	sock_set_flag(sk, SOCK_DONE);
> +	vsk->peer_shutdown = SHUTDOWN_MASK;
> +	if (vsock_stream_has_data(vsk) <= 0)
> +		sk->sk_state = TCP_CLOSING;
> +	sk->sk_state_change(sk);
> +
> +	virtio_transport_cancel_close_work(vsk, cancel_timeout);
> +}
> +
>  static void virtio_transport_close_timeout(struct work_struct *work)
>  {
>  	struct vsock_sock *vsk =
> -- 
> 2.47.1
> 

The two scenarios I presented have been resolved.

Tested-by: Hyunwoo Kim <v4bel@theori.io>


Regards,
Hyunwoo Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]
  2025-01-10  8:35 ` [PATCH net v2 5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space] Stefano Garzarella
  2025-01-10  9:49   ` Luigi Leonardi
@ 2025-01-10 22:52   ` Hyunwoo Kim
  1 sibling, 0 replies; 26+ messages in thread
From: Hyunwoo Kim @ 2025-01-10 22:52 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Jakub Kicinski,
	Michal Luczaj, virtualization, Bobby Eshleman, stable, imv4bel,
	v4bel

On Fri, Jan 10, 2025 at 09:35:11AM +0100, Stefano Garzarella wrote:
> Recent reports have shown how we sometimes call vsock_*_has_data()
> when a vsock socket has been de-assigned from a transport (see attached
> links), but we shouldn't.
> 
> Previous commits should have solved the real problems, but we may have
> more in the future, so to avoid null-ptr-deref, we can return 0
> (no space, no data available) but with a warning.
> 
> This way the code should continue to run in a nearly consistent state
> and have a warning that allows us to debug future problems.
> 
> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/
> Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
> Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
> Co-developed-by: Hyunwoo Kim <v4bel@theori.io>
> Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
> Co-developed-by: Wongi Lee <qwerty@theori.io>
> Signed-off-by: Wongi Lee <qwerty@theori.io>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/af_vsock.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 74d35a871644..fa9d1b49599b 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -879,6 +879,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
>  
>  s64 vsock_stream_has_data(struct vsock_sock *vsk)
>  {
> +	if (WARN_ON(!vsk->transport))
> +		return 0;
> +
>  	return vsk->transport->stream_has_data(vsk);
>  }
>  EXPORT_SYMBOL_GPL(vsock_stream_has_data);
> @@ -887,6 +890,9 @@ s64 vsock_connectible_has_data(struct vsock_sock *vsk)
>  {
>  	struct sock *sk = sk_vsock(vsk);
>  
> +	if (WARN_ON(!vsk->transport))
> +		return 0;
> +
>  	if (sk->sk_type == SOCK_SEQPACKET)
>  		return vsk->transport->seqpacket_has_data(vsk);
>  	else
> @@ -896,6 +902,9 @@ EXPORT_SYMBOL_GPL(vsock_connectible_has_data);
>  
>  s64 vsock_stream_has_space(struct vsock_sock *vsk)
>  {
> +	if (WARN_ON(!vsk->transport))
> +		return 0;
> +
>  	return vsk->transport->stream_has_space(vsk);
>  }
>  EXPORT_SYMBOL_GPL(vsock_stream_has_space);
> -- 
> 2.47.1
> 

Reviewed-by: Hyunwoo Kim <v4bel@theori.io>


Regards,
Hyunwoo Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-10  8:35 ` [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes Stefano Garzarella
  2025-01-10 22:46   ` Hyunwoo Kim
@ 2025-01-12 22:42   ` Michal Luczaj
  2025-01-13  8:57     ` Stefano Garzarella
  1 sibling, 1 reply; 26+ messages in thread
From: Michal Luczaj @ 2025-01-12 22:42 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi, David S. Miller,
	Wongi Lee, Eugenio Pérez, Michael S. Tsirkin, Eric Dumazet,
	kvm, Paolo Abeni, Stefan Hajnoczi, Jason Wang, Simon Horman,
	Hyunwoo Kim, Jakub Kicinski, virtualization, Bobby Eshleman,
	stable

On 1/10/25 09:35, Stefano Garzarella wrote:
> If the socket has been de-assigned or assigned to another transport,
> we must discard any packets received because they are not expected
> and would cause issues when we access vsk->transport.
> 
> A possible scenario is described by Hyunwoo Kim in the attached link,
> where after a first connect() interrupted by a signal, and a second
> connect() failed, we can find `vsk->transport` at NULL, leading to a
> NULL pointer dereference.
> 
> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> Cc: stable@vger.kernel.org
> Reported-by: Hyunwoo Kim <v4bel@theori.io>
> Reported-by: Wongi Lee <qwerty@theori.io>
> Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 9acc13ab3f82..51a494b69be8 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1628,8 +1628,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  
>  	lock_sock(sk);
>  
> -	/* Check if sk has been closed before lock_sock */
> -	if (sock_flag(sk, SOCK_DONE)) {
> +	/* Check if sk has been closed or assigned to another transport before
> +	 * lock_sock (note: listener sockets are not assigned to any transport)
> +	 */
> +	if (sock_flag(sk, SOCK_DONE) ||
> +	    (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
>  		(void)virtio_transport_reset_no_sock(t, skb);
>  		release_sock(sk);
>  		sock_put(sk);

I wanted to check if such special-casing for TCP_LISTEN doesn't bother
BPF/sockmap, but instead I've hit a UAF.

```
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* net/vmw_vsock/af_vsock.c */
#define MAX_PORT_RETRIES	24

static void die(const char *msg)
{
	perror(msg);
	exit(-1);
}

int socket_bind(int port)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = VMADDR_CID_LOCAL,
		.svm_port = port,
	};
	int s;

	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
	if (s < 0)
		die("socket");

	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)))
		die("bind");

	return s;
}

int main(void)
{
	struct sockaddr_vm addr;
	socklen_t alen = sizeof(addr);
	int dummy, i, s;

	/* Play with `static u32 port` in __vsock_bind_connectible()
	 * to fail vsock_auto_bind() at connect #1.
	 */
	dummy = socket_bind(VMADDR_PORT_ANY);
	if (getsockname(dummy, (struct sockaddr *)&addr, &alen))
		die("getsockname");
	for (i = 0; i < MAX_PORT_RETRIES; ++i)
		socket_bind(++addr.svm_port);

	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
	if (s < 0)
		die("socket s");

	if (!connect(s, (struct sockaddr *)&addr, alen))
		die("connect #1");
	perror("ok, connect #1 failed; transport set, sk in unbound list");

	addr.svm_cid = 42; /* non-existing */
	if (!connect(s, (struct sockaddr *)&addr, alen))
		die("connect #2");
	/* vsock_assign_transport
	 *   virtio_transport_release (vsk->transport->release)
	 *     virtio_transport_remove_sock
	 *       vsock_remove_sock
	 *         vsock_remove_bound
	 *           __vsock_remove_bound
	 *             sock_put(&vsk->sk)
	 */
	perror("ok, connect #2 failed; transport unset, sk ref dropped");

	addr.svm_cid = VMADDR_CID_LOCAL;
	addr.svm_port = VMADDR_PORT_ANY;
	if (bind(s, (struct sockaddr *)&addr, alen))
		die("bind s");
	/* vsock_bind
	 *   __vsock_bind
	 *     __vsock_bind_connectible
	 *       __vsock_remove_bound
	 *         sock_put(&vsk->sk)
	 */

	printf("done\n");
	return 0;
}
```

=========================
WARNING: held lock freed!
6.13.0-rc6+ #146 Not tainted
-------------------------
a.out/2057 is freeing memory ffff88816b46a200-ffff88816b46a9f7, with a lock still held there!
ffff88816b46a458 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_bind+0x8a/0xe0
2 locks held by a.out/2057:
 #0: ffff88816b46a458 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_bind+0x8a/0xe0
 #1: ffffffff86574a78 (vsock_table_lock){+...}-{3:3}, at: __vsock_bind+0x129/0x730

stack backtrace:
CPU: 7 UID: 1000 PID: 2057 Comm: a.out Not tainted 6.13.0-rc6+ #146
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0x90
 debug_check_no_locks_freed+0x21a/0x280
 ? lockdep_hardirqs_on+0x78/0x100
 kmem_cache_free+0x142/0x590
 ? security_sk_free+0x54/0xf0
 ? __sk_destruct+0x388/0x5a0
 __sk_destruct+0x388/0x5a0
 __vsock_bind+0x5e1/0x730
 ? __pfx___vsock_bind+0x10/0x10
 ? __local_bh_enable_ip+0xab/0x140
 vsock_bind+0x97/0xe0
 ? __pfx_vsock_bind+0x10/0x10
 __sys_bind+0x154/0x1f0
 ? __pfx___sys_bind+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 __x64_sys_bind+0x6e/0xb0
 ? lockdep_hardirqs_on+0x78/0x100
 do_syscall_64+0x93/0x1b0
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9a618e34b
Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
 </TASK>
==================================================================
BUG: KASAN: slab-use-after-free in __vsock_bind+0x62e/0x730
Read of size 4 at addr ffff88816b46a74c by task a.out/2057

CPU: 7 UID: 1000 PID: 2057 Comm: a.out Not tainted 6.13.0-rc6+ #146
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0x90
 print_report+0x174/0x4f6
 ? __virt_addr_valid+0x208/0x400
 ? __vsock_bind+0x62e/0x730
 kasan_report+0xb9/0x190
 ? __vsock_bind+0x62e/0x730
 __vsock_bind+0x62e/0x730
 ? __pfx___vsock_bind+0x10/0x10
 ? __local_bh_enable_ip+0xab/0x140
 vsock_bind+0x97/0xe0
 ? __pfx_vsock_bind+0x10/0x10
 __sys_bind+0x154/0x1f0
 ? __pfx___sys_bind+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 __x64_sys_bind+0x6e/0xb0
 ? lockdep_hardirqs_on+0x78/0x100
 do_syscall_64+0x93/0x1b0
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9a618e34b
Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
 </TASK>

Allocated by task 2057:
 kasan_save_stack+0x1e/0x40
 kasan_save_track+0x10/0x30
 __kasan_slab_alloc+0x85/0x90
 kmem_cache_alloc_noprof+0x131/0x450
 sk_prot_alloc+0x5b/0x220
 sk_alloc+0x2c/0x870
 __vsock_create.constprop.0+0x2e/0xb60
 vsock_create+0xe4/0x420
 __sock_create+0x241/0x650
 __sys_socket+0xf2/0x1a0
 __x64_sys_socket+0x6e/0xb0
 do_syscall_64+0x93/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Freed by task 2057:
 kasan_save_stack+0x1e/0x40
 kasan_save_track+0x10/0x30
 kasan_save_free_info+0x37/0x60
 __kasan_slab_free+0x4b/0x70
 kmem_cache_free+0x1a1/0x590
 __sk_destruct+0x388/0x5a0
 __vsock_bind+0x5e1/0x730
 vsock_bind+0x97/0xe0
 __sys_bind+0x154/0x1f0
 __x64_sys_bind+0x6e/0xb0
 do_syscall_64+0x93/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The buggy address belongs to the object at ffff88816b46a200
 which belongs to the cache AF_VSOCK of size 2040
The buggy address is located 1356 bytes inside of
 freed 2040-byte region [ffff88816b46a200, ffff88816b46a9f8)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16b468
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff888125368401
flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: f5(slab)
raw: 0017ffffc0000040 ffff888110115540 dead000000000122 0000000000000000
raw: 0000000000000000 00000000800f000f 00000001f5000000 ffff888125368401
head: 0017ffffc0000040 ffff888110115540 dead000000000122 0000000000000000
head: 0000000000000000 00000000800f000f 00000001f5000000 ffff888125368401
head: 0017ffffc0000003 ffffea0005ad1a01 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88816b46a600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88816b46a680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88816b46a700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
 ffff88816b46a780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88816b46a800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 2057 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
Modules linked in: 9p kvm_intel kvm 9pnet_virtio 9pnet netfs i2c_piix4 i2c_smbus zram virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
CPU: 7 UID: 1000 PID: 2057 Comm: a.out Tainted: G    B              6.13.0-rc6+ #146
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
RIP: 0010:refcount_warn_saturate+0xce/0x150
Code: 7b fe d8 03 01 e8 22 db ac fe 0f 0b eb b1 80 3d 6e fe d8 03 00 75 a8 48 c7 c7 e0 da 95 84 c6 05 5e fe d8 03 01 e8 02 db ac fe <0f> 0b eb 91 80 3d 4d fe d8 03 00 75 88 48 c7 c7 40 db 95 84 c6 05
RSP: 0018:ffff8881285c7c90 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88816b46a280 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed10bcd76349
R10: ffff8885e6bb1a4b R11: 0000000000000000 R12: ffff88816b46a768
R13: ffff88816b46a280 R14: ffff88816b46a770 R15: ffffffff88901520
FS:  00007fa9a606e740(0000) GS:ffff8885e6b80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000010f8d488 CR3: 0000000130c4a000 CR4: 0000000000752ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn.cold+0x5f/0x1ff
 ? refcount_warn_saturate+0xce/0x150
 ? report_bug+0x1ec/0x390
 ? handle_bug+0x58/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? refcount_warn_saturate+0xce/0x150
 __vsock_bind+0x66d/0x730
 ? __pfx___vsock_bind+0x10/0x10
 ? __local_bh_enable_ip+0xab/0x140
 vsock_bind+0x97/0xe0
 ? __pfx_vsock_bind+0x10/0x10
 __sys_bind+0x154/0x1f0
 ? __pfx___sys_bind+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 __x64_sys_bind+0x6e/0xb0
 ? lockdep_hardirqs_on+0x78/0x100
 do_syscall_64+0x93/0x1b0
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9a618e34b
Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
 </TASK>
irq event stamp: 9836
hardirqs last  enabled at (9836): [<ffffffff8152121f>] __call_rcu_common.constprop.0+0x32f/0xe90
hardirqs last disabled at (9835): [<ffffffff8152127c>] __call_rcu_common.constprop.0+0x38c/0xe90
softirqs last  enabled at (9810): [<ffffffff84168aca>] vsock_bind+0x8a/0xe0
softirqs last disabled at (9812): [<ffffffff84168429>] __vsock_bind+0x129/0x730
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 7 PID: 2057 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
Modules linked in: 9p kvm_intel kvm 9pnet_virtio 9pnet netfs i2c_piix4 i2c_smbus zram virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
CPU: 7 UID: 1000 PID: 2057 Comm: a.out Tainted: G    B   W          6.13.0-rc6+ #146
Tainted: [B]=BAD_PAGE, [W]=WARN
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
RIP: 0010:refcount_warn_saturate+0xee/0x150
Code: 5e fe d8 03 01 e8 02 db ac fe 0f 0b eb 91 80 3d 4d fe d8 03 00 75 88 48 c7 c7 40 db 95 84 c6 05 3d fe d8 03 01 e8 e2 da ac fe <0f> 0b e9 6e ff ff ff 80 3d 2d fe d8 03 00 0f 85 61 ff ff ff 48 c7
RSP: 0018:ffff8881285c7b68 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff88816b46a280 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: 0000000000000003 R08: 0000000000000001 R09: ffffed10bcd76349
R10: ffff8885e6bb1a4b R11: 0000000000000000 R12: ffff88816b46a770
R13: ffffffff88901520 R14: ffffffff88901520 R15: ffff888128cff640
FS:  0000000000000000(0000) GS:ffff8885e6b80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa9a6156050 CR3: 0000000005a74000 CR4: 0000000000752ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn.cold+0x5f/0x1ff
 ? refcount_warn_saturate+0xee/0x150
 ? report_bug+0x1ec/0x390
 ? handle_bug+0x58/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? refcount_warn_saturate+0xee/0x150
 ? refcount_warn_saturate+0xee/0x150
 vsock_remove_bound+0x187/0x1e0
 __vsock_release+0x383/0x4a0
 ? down_write+0x129/0x1c0
 vsock_release+0x90/0x120
 __sock_release+0xa3/0x250
 sock_close+0x14/0x20
 __fput+0x359/0xa80
 ? trace_irq_enable.constprop.0+0xce/0x110
 task_work_run+0x107/0x1d0
 ? __pfx_do_raw_spin_lock+0x10/0x10
 ? __pfx_task_work_run+0x10/0x10
 do_exit+0x847/0x2560
 ? __pfx_lock_release+0x10/0x10
 ? do_raw_spin_lock+0x11a/0x240
 ? __pfx_do_exit+0x10/0x10
 ? rcu_is_watching+0x11/0xb0
 ? trace_irq_enable.constprop.0+0xce/0x110
 do_group_exit+0xb8/0x250
 __x64_sys_exit_group+0x3a/0x50
 x64_sys_call+0xfec/0x14f0
 do_syscall_64+0x93/0x1b0
 ? __pfx___up_read+0x10/0x10
 ? rcu_is_watching+0x11/0xb0
 ? trace_irq_enable.constprop.0+0xce/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9a615606d
Code: Unable to access opcode bytes at 0x7fa9a6156043.
RSP: 002b:00007fff5e2d2f58 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007fa9a6259fa8 RCX: 00007fa9a615606d
RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
RBP: 00007fff5e2d2fb0 R08: 00007fff5e2d2f00 R09: 00007fff5e2d2e8f
R10: 00007fff5e2d2e10 R11: 0000000000000206 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fa9a6258680 R15: 00007fa9a6259fc0
 </TASK>
irq event stamp: 9836
hardirqs last  enabled at (9836): [<ffffffff8152121f>] __call_rcu_common.constprop.0+0x32f/0xe90
hardirqs last disabled at (9835): [<ffffffff8152127c>] __call_rcu_common.constprop.0+0x38c/0xe90
softirqs last  enabled at (9810): [<ffffffff84168aca>] vsock_bind+0x8a/0xe0
softirqs last disabled at (9812): [<ffffffff84168429>] __vsock_bind+0x129/0x730
---[ end trace 0000000000000000 ]---

So, if I get this right:
1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
2. transport->release() calls vsock_remove_bound() without checking if sk
   was bound and moved to bound list (refcnt=1)
3. vsock_bind() assumes sk is in unbound list and before
   __vsock_insert_bound(vsock_bound_sockets()) calls
   __vsock_remove_bound() which does:
      list_del_init(&vsk->bound_table); // nop
      sock_put(&vsk->sk);               // refcnt=0

The following fixes things for me. I'm just not certain that's the only
place where transport destruction may lead to an unbound socket being
removed from the unbound list.

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 7f7de6d88096..0fe807c8c052 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
 
 	if (remove_sock) {
 		sock_set_flag(sk, SOCK_DONE);
-		virtio_transport_remove_sock(vsk);
+		if (vsock_addr_bound(&vsk->local_addr))
+			virtio_transport_remove_sock(vsk);
 	}
 }
 EXPORT_SYMBOL_GPL(virtio_transport_release);

Thanks,
Michal


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-12 22:42   ` Michal Luczaj
@ 2025-01-13  8:57     ` Stefano Garzarella
  2025-01-13  9:07       ` Stefano Garzarella
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-13  8:57 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, Bobby Eshleman, stable

On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
>On 1/10/25 09:35, Stefano Garzarella wrote:
>> If the socket has been de-assigned or assigned to another transport,
>> we must discard any packets received because they are not expected
>> and would cause issues when we access vsk->transport.
>>
>> A possible scenario is described by Hyunwoo Kim in the attached link,
>> where after a first connect() interrupted by a signal, and a second
>> connect() failed, we can find `vsk->transport` at NULL, leading to a
>> NULL pointer dereference.
>>
>> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
>> Cc: stable@vger.kernel.org
>> Reported-by: Hyunwoo Kim <v4bel@theori.io>
>> Reported-by: Wongi Lee <qwerty@theori.io>
>> Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>>  net/vmw_vsock/virtio_transport_common.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 9acc13ab3f82..51a494b69be8 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -1628,8 +1628,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>>
>>  	lock_sock(sk);
>>
>> -	/* Check if sk has been closed before lock_sock */
>> -	if (sock_flag(sk, SOCK_DONE)) {
>> +	/* Check if sk has been closed or assigned to another transport before
>> +	 * lock_sock (note: listener sockets are not assigned to any transport)
>> +	 */
>> +	if (sock_flag(sk, SOCK_DONE) ||
>> +	    (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
>>  		(void)virtio_transport_reset_no_sock(t, skb);
>>  		release_sock(sk);
>>  		sock_put(sk);
>
>I wanted to check if such special-casing for TCP_LISTEN doesn't bother
>BPF/sockmap, but instead I've hit a UAF.
>
>```
>#include <stdio.h>
>#include <stdlib.h>
>#include <sys/socket.h>
>#include <linux/vm_sockets.h>
>
>/* net/vmw_vsock/af_vsock.c */
>#define MAX_PORT_RETRIES	24
>
>static void die(const char *msg)
>{
>	perror(msg);
>	exit(-1);
>}
>
>int socket_bind(int port)
>{
>	struct sockaddr_vm addr = {
>		.svm_family = AF_VSOCK,
>		.svm_cid = VMADDR_CID_LOCAL,
>		.svm_port = port,
>	};
>	int s;
>
>	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
>	if (s < 0)
>		die("socket");
>
>	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)))
>		die("bind");
>
>	return s;
>}
>
>int main(void)
>{
>	struct sockaddr_vm addr;
>	socklen_t alen = sizeof(addr);
>	int dummy, i, s;
>
>	/* Play with `static u32 port` in __vsock_bind_connectible()
>	 * to fail vsock_auto_bind() at connect #1.
>	 */
>	dummy = socket_bind(VMADDR_PORT_ANY);
>	if (getsockname(dummy, (struct sockaddr *)&addr, &alen))
>		die("getsockname");
>	for (i = 0; i < MAX_PORT_RETRIES; ++i)
>		socket_bind(++addr.svm_port);
>
>	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
>	if (s < 0)
>		die("socket s");
>
>	if (!connect(s, (struct sockaddr *)&addr, alen))
>		die("connect #1");
>	perror("ok, connect #1 failed; transport set, sk in unbound list");
>
>	addr.svm_cid = 42; /* non-existing */
>	if (!connect(s, (struct sockaddr *)&addr, alen))
>		die("connect #2");
>	/* vsock_assign_transport
>	 *   virtio_transport_release (vsk->transport->release)
>	 *     virtio_transport_remove_sock
>	 *       vsock_remove_sock
>	 *         vsock_remove_bound
>	 *           __vsock_remove_bound
>	 *             sock_put(&vsk->sk)
>	 */
>	perror("ok, connect #2 failed; transport unset, sk ref dropped");
>
>	addr.svm_cid = VMADDR_CID_LOCAL;
>	addr.svm_port = VMADDR_PORT_ANY;
>	if (bind(s, (struct sockaddr *)&addr, alen))
>		die("bind s");
>	/* vsock_bind
>	 *   __vsock_bind
>	 *     __vsock_bind_connectible
>	 *       __vsock_remove_bound
>	 *         sock_put(&vsk->sk)
>	 */
>
>	printf("done\n");
>	return 0;
>}
>```
>
>=========================
>WARNING: held lock freed!
>6.13.0-rc6+ #146 Not tainted
>-------------------------
>a.out/2057 is freeing memory ffff88816b46a200-ffff88816b46a9f7, with a lock still held there!
>ffff88816b46a458 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_bind+0x8a/0xe0
>2 locks held by a.out/2057:
> #0: ffff88816b46a458 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_bind+0x8a/0xe0
> #1: ffffffff86574a78 (vsock_table_lock){+...}-{3:3}, at: __vsock_bind+0x129/0x730
>
>stack backtrace:
>CPU: 7 UID: 1000 PID: 2057 Comm: a.out Not tainted 6.13.0-rc6+ #146
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>Call Trace:
> <TASK>
> dump_stack_lvl+0x68/0x90
> debug_check_no_locks_freed+0x21a/0x280
> ? lockdep_hardirqs_on+0x78/0x100
> kmem_cache_free+0x142/0x590
> ? security_sk_free+0x54/0xf0
> ? __sk_destruct+0x388/0x5a0
> __sk_destruct+0x388/0x5a0
> __vsock_bind+0x5e1/0x730
> ? __pfx___vsock_bind+0x10/0x10
> ? __local_bh_enable_ip+0xab/0x140
> vsock_bind+0x97/0xe0
> ? __pfx_vsock_bind+0x10/0x10
> __sys_bind+0x154/0x1f0
> ? __pfx___sys_bind+0x10/0x10
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> __x64_sys_bind+0x6e/0xb0
> ? lockdep_hardirqs_on+0x78/0x100
> do_syscall_64+0x93/0x1b0
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>RIP: 0033:0x7fa9a618e34b
>Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
>RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
>RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
>RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
>RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
>R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
>R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
> </TASK>
>==================================================================
>BUG: KASAN: slab-use-after-free in __vsock_bind+0x62e/0x730
>Read of size 4 at addr ffff88816b46a74c by task a.out/2057
>
>CPU: 7 UID: 1000 PID: 2057 Comm: a.out Not tainted 6.13.0-rc6+ #146
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>Call Trace:
> <TASK>
> dump_stack_lvl+0x68/0x90
> print_report+0x174/0x4f6
> ? __virt_addr_valid+0x208/0x400
> ? __vsock_bind+0x62e/0x730
> kasan_report+0xb9/0x190
> ? __vsock_bind+0x62e/0x730
> __vsock_bind+0x62e/0x730
> ? __pfx___vsock_bind+0x10/0x10
> ? __local_bh_enable_ip+0xab/0x140
> vsock_bind+0x97/0xe0
> ? __pfx_vsock_bind+0x10/0x10
> __sys_bind+0x154/0x1f0
> ? __pfx___sys_bind+0x10/0x10
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> __x64_sys_bind+0x6e/0xb0
> ? lockdep_hardirqs_on+0x78/0x100
> do_syscall_64+0x93/0x1b0
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>RIP: 0033:0x7fa9a618e34b
>Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
>RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
>RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
>RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
>RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
>R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
>R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
> </TASK>
>
>Allocated by task 2057:
> kasan_save_stack+0x1e/0x40
> kasan_save_track+0x10/0x30
> __kasan_slab_alloc+0x85/0x90
> kmem_cache_alloc_noprof+0x131/0x450
> sk_prot_alloc+0x5b/0x220
> sk_alloc+0x2c/0x870
> __vsock_create.constprop.0+0x2e/0xb60
> vsock_create+0xe4/0x420
> __sock_create+0x241/0x650
> __sys_socket+0xf2/0x1a0
> __x64_sys_socket+0x6e/0xb0
> do_syscall_64+0x93/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>Freed by task 2057:
> kasan_save_stack+0x1e/0x40
> kasan_save_track+0x10/0x30
> kasan_save_free_info+0x37/0x60
> __kasan_slab_free+0x4b/0x70
> kmem_cache_free+0x1a1/0x590
> __sk_destruct+0x388/0x5a0
> __vsock_bind+0x5e1/0x730
> vsock_bind+0x97/0xe0
> __sys_bind+0x154/0x1f0
> __x64_sys_bind+0x6e/0xb0
> do_syscall_64+0x93/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>The buggy address belongs to the object at ffff88816b46a200
> which belongs to the cache AF_VSOCK of size 2040
>The buggy address is located 1356 bytes inside of
> freed 2040-byte region [ffff88816b46a200, ffff88816b46a9f8)
>
>The buggy address belongs to the physical page:
>page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16b468
>head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>memcg:ffff888125368401
>flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
>page_type: f5(slab)
>raw: 0017ffffc0000040 ffff888110115540 dead000000000122 0000000000000000
>raw: 0000000000000000 00000000800f000f 00000001f5000000 ffff888125368401
>head: 0017ffffc0000040 ffff888110115540 dead000000000122 0000000000000000
>head: 0000000000000000 00000000800f000f 00000001f5000000 ffff888125368401
>head: 0017ffffc0000003 ffffea0005ad1a01 ffffffffffffffff 0000000000000000
>head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
>page dumped because: kasan: bad access detected
>
>Memory state around the buggy address:
> ffff88816b46a600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88816b46a680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>ffff88816b46a700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                              ^
> ffff88816b46a780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88816b46a800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>==================================================================
>------------[ cut here ]------------
>refcount_t: addition on 0; use-after-free.
>WARNING: CPU: 7 PID: 2057 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
>Modules linked in: 9p kvm_intel kvm 9pnet_virtio 9pnet netfs i2c_piix4 i2c_smbus zram virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
>CPU: 7 UID: 1000 PID: 2057 Comm: a.out Tainted: G    B              6.13.0-rc6+ #146
>Tainted: [B]=BAD_PAGE
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>RIP: 0010:refcount_warn_saturate+0xce/0x150
>Code: 7b fe d8 03 01 e8 22 db ac fe 0f 0b eb b1 80 3d 6e fe d8 03 00 75 a8 48 c7 c7 e0 da 95 84 c6 05 5e fe d8 03 01 e8 02 db ac fe <0f> 0b eb 91 80 3d 4d fe d8 03 00 75 88 48 c7 c7 40 db 95 84 c6 05
>RSP: 0018:ffff8881285c7c90 EFLAGS: 00010282
>RAX: 0000000000000000 RBX: ffff88816b46a280 RCX: 0000000000000000
>RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
>RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed10bcd76349
>R10: ffff8885e6bb1a4b R11: 0000000000000000 R12: ffff88816b46a768
>R13: ffff88816b46a280 R14: ffff88816b46a770 R15: ffffffff88901520
>FS:  00007fa9a606e740(0000) GS:ffff8885e6b80000(0000) knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>CR2: 0000000010f8d488 CR3: 0000000130c4a000 CR4: 0000000000752ef0
>PKRU: 55555554
>Call Trace:
> <TASK>
> ? __warn.cold+0x5f/0x1ff
> ? refcount_warn_saturate+0xce/0x150
> ? report_bug+0x1ec/0x390
> ? handle_bug+0x58/0x90
> ? exc_invalid_op+0x13/0x40
> ? asm_exc_invalid_op+0x16/0x20
> ? refcount_warn_saturate+0xce/0x150
> __vsock_bind+0x66d/0x730
> ? __pfx___vsock_bind+0x10/0x10
> ? __local_bh_enable_ip+0xab/0x140
> vsock_bind+0x97/0xe0
> ? __pfx_vsock_bind+0x10/0x10
> __sys_bind+0x154/0x1f0
> ? __pfx___sys_bind+0x10/0x10
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> __x64_sys_bind+0x6e/0xb0
> ? lockdep_hardirqs_on+0x78/0x100
> do_syscall_64+0x93/0x1b0
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>RIP: 0033:0x7fa9a618e34b
>Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
>RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
>RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
>RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
>RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
>R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
>R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
> </TASK>
>irq event stamp: 9836
>hardirqs last  enabled at (9836): [<ffffffff8152121f>] __call_rcu_common.constprop.0+0x32f/0xe90
>hardirqs last disabled at (9835): [<ffffffff8152127c>] __call_rcu_common.constprop.0+0x38c/0xe90
>softirqs last  enabled at (9810): [<ffffffff84168aca>] vsock_bind+0x8a/0xe0
>softirqs last disabled at (9812): [<ffffffff84168429>] __vsock_bind+0x129/0x730
>---[ end trace 0000000000000000 ]---
>------------[ cut here ]------------
>refcount_t: underflow; use-after-free.
>WARNING: CPU: 7 PID: 2057 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
>Modules linked in: 9p kvm_intel kvm 9pnet_virtio 9pnet netfs i2c_piix4 i2c_smbus zram virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
>CPU: 7 UID: 1000 PID: 2057 Comm: a.out Tainted: G    B   W          6.13.0-rc6+ #146
>Tainted: [B]=BAD_PAGE, [W]=WARN
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>RIP: 0010:refcount_warn_saturate+0xee/0x150
>Code: 5e fe d8 03 01 e8 02 db ac fe 0f 0b eb 91 80 3d 4d fe d8 03 00 75 88 48 c7 c7 40 db 95 84 c6 05 3d fe d8 03 01 e8 e2 da ac fe <0f> 0b e9 6e ff ff ff 80 3d 2d fe d8 03 00 0f 85 61 ff ff ff 48 c7
>RSP: 0018:ffff8881285c7b68 EFLAGS: 00010296
>RAX: 0000000000000000 RBX: ffff88816b46a280 RCX: 0000000000000000
>RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
>RBP: 0000000000000003 R08: 0000000000000001 R09: ffffed10bcd76349
>R10: ffff8885e6bb1a4b R11: 0000000000000000 R12: ffff88816b46a770
>R13: ffffffff88901520 R14: ffffffff88901520 R15: ffff888128cff640
>FS:  0000000000000000(0000) GS:ffff8885e6b80000(0000) knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>CR2: 00007fa9a6156050 CR3: 0000000005a74000 CR4: 0000000000752ef0
>PKRU: 55555554
>Call Trace:
> <TASK>
> ? __warn.cold+0x5f/0x1ff
> ? refcount_warn_saturate+0xee/0x150
> ? report_bug+0x1ec/0x390
> ? handle_bug+0x58/0x90
> ? exc_invalid_op+0x13/0x40
> ? asm_exc_invalid_op+0x16/0x20
> ? refcount_warn_saturate+0xee/0x150
> ? refcount_warn_saturate+0xee/0x150
> vsock_remove_bound+0x187/0x1e0
> __vsock_release+0x383/0x4a0
> ? down_write+0x129/0x1c0
> vsock_release+0x90/0x120
> __sock_release+0xa3/0x250
> sock_close+0x14/0x20
> __fput+0x359/0xa80
> ? trace_irq_enable.constprop.0+0xce/0x110
> task_work_run+0x107/0x1d0
> ? __pfx_do_raw_spin_lock+0x10/0x10
> ? __pfx_task_work_run+0x10/0x10
> do_exit+0x847/0x2560
> ? __pfx_lock_release+0x10/0x10
> ? do_raw_spin_lock+0x11a/0x240
> ? __pfx_do_exit+0x10/0x10
> ? rcu_is_watching+0x11/0xb0
> ? trace_irq_enable.constprop.0+0xce/0x110
> do_group_exit+0xb8/0x250
> __x64_sys_exit_group+0x3a/0x50
> x64_sys_call+0xfec/0x14f0
> do_syscall_64+0x93/0x1b0
> ? __pfx___up_read+0x10/0x10
> ? rcu_is_watching+0x11/0xb0
> ? trace_irq_enable.constprop.0+0xce/0x110
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>RIP: 0033:0x7fa9a615606d
>Code: Unable to access opcode bytes at 0x7fa9a6156043.
>RSP: 002b:00007fff5e2d2f58 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
>RAX: ffffffffffffffda RBX: 00007fa9a6259fa8 RCX: 00007fa9a615606d
>RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
>RBP: 00007fff5e2d2fb0 R08: 00007fff5e2d2f00 R09: 00007fff5e2d2e8f
>R10: 00007fff5e2d2e10 R11: 0000000000000206 R12: 0000000000000001
>R13: 0000000000000000 R14: 00007fa9a6258680 R15: 00007fa9a6259fc0
> </TASK>
>irq event stamp: 9836
>hardirqs last  enabled at (9836): [<ffffffff8152121f>] __call_rcu_common.constprop.0+0x32f/0xe90
>hardirqs last disabled at (9835): [<ffffffff8152127c>] __call_rcu_common.constprop.0+0x38c/0xe90
>softirqs last  enabled at (9810): [<ffffffff84168aca>] vsock_bind+0x8a/0xe0
>softirqs last disabled at (9812): [<ffffffff84168429>] __vsock_bind+0x129/0x730
>---[ end trace 0000000000000000 ]---
>
>So, if I get this right:
>1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>2. transport->release() calls vsock_remove_bound() without checking if sk
>   was bound and moved to bound list (refcnt=1)
>3. vsock_bind() assumes sk is in unbound list and before
>   __vsock_insert_bound(vsock_bound_sockets()) calls
>   __vsock_remove_bound() which does:
>      list_del_init(&vsk->bound_table); // nop
>      sock_put(&vsk->sk);               // refcnt=0
>
>The following fixes things for me. I'm just not certain that's the only
>place where transport destruction may lead to an unbound socket being
>removed from the unbound list.
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 7f7de6d88096..0fe807c8c052 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>
> 	if (remove_sock) {
> 		sock_set_flag(sk, SOCK_DONE);
>-		virtio_transport_remove_sock(vsk);
>+		if (vsock_addr_bound(&vsk->local_addr))
>+			virtio_transport_remove_sock(vsk);

I don't get this fix, virtio_transport_remove_sock() calls
   vsock_remove_sock()
     vsock_remove_bound()
       if (__vsock_in_bound_table(vsk))
           __vsock_remove_bound(vsk);


So, should already avoid this issue, no?

Can the problem be in vsock_bind() ?

Is this issue pre-existing or introduced by this series?

I'll also investigate a bit more.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-13  8:57     ` Stefano Garzarella
@ 2025-01-13  9:07       ` Stefano Garzarella
  2025-01-13 10:12         ` Michal Luczaj
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-13  9:07 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, Bobby Eshleman, stable

On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:

[...]

> >
> >So, if I get this right:
> >1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
> >2. transport->release() calls vsock_remove_bound() without checking if sk
> >   was bound and moved to bound list (refcnt=1)
> >3. vsock_bind() assumes sk is in unbound list and before
> >   __vsock_insert_bound(vsock_bound_sockets()) calls
> >   __vsock_remove_bound() which does:
> >      list_del_init(&vsk->bound_table); // nop
> >      sock_put(&vsk->sk);               // refcnt=0
> >
> >The following fixes things for me. I'm just not certain that's the only
> >place where transport destruction may lead to an unbound socket being
> >removed from the unbound list.
> >
> >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >index 7f7de6d88096..0fe807c8c052 100644
> >--- a/net/vmw_vsock/virtio_transport_common.c
> >+++ b/net/vmw_vsock/virtio_transport_common.c
> >@@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
> >
> >       if (remove_sock) {
> >               sock_set_flag(sk, SOCK_DONE);
> >-              virtio_transport_remove_sock(vsk);
> >+              if (vsock_addr_bound(&vsk->local_addr))
> >+                      virtio_transport_remove_sock(vsk);
>
> I don't get this fix, virtio_transport_remove_sock() calls
>    vsock_remove_sock()
>      vsock_remove_bound()
>        if (__vsock_in_bound_table(vsk))
>            __vsock_remove_bound(vsk);
>
>
> So, should already avoid this issue, no?

I got it wrong, I see now what are you trying to do, but I don't think
we should skip virtio_transport_remove_sock() entirely, it also purge
the rx_queue.

>
> Can the problem be in vsock_bind() ?
>
> Is this issue pre-existing or introduced by this series?

I think this is pre-existing, can you confirm?

In that case, I'd not stop this series, and fix it in another patch/series.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-13  9:07       ` Stefano Garzarella
@ 2025-01-13 10:12         ` Michal Luczaj
  2025-01-13 11:05           ` Stefano Garzarella
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Luczaj @ 2025-01-13 10:12 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, Bobby Eshleman, stable

On 1/13/25 10:07, Stefano Garzarella wrote:
> On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
>> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
> 
> [...]
> 
>>>
>>> So, if I get this right:
>>> 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>>> 2. transport->release() calls vsock_remove_bound() without checking if sk
>>>   was bound and moved to bound list (refcnt=1)
>>> 3. vsock_bind() assumes sk is in unbound list and before
>>>   __vsock_insert_bound(vsock_bound_sockets()) calls
>>>   __vsock_remove_bound() which does:
>>>      list_del_init(&vsk->bound_table); // nop
>>>      sock_put(&vsk->sk);               // refcnt=0
>>>
>>> The following fixes things for me. I'm just not certain that's the only
>>> place where transport destruction may lead to an unbound socket being
>>> removed from the unbound list.
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index 7f7de6d88096..0fe807c8c052 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>>>
>>>       if (remove_sock) {
>>>               sock_set_flag(sk, SOCK_DONE);
>>> -              virtio_transport_remove_sock(vsk);
>>> +              if (vsock_addr_bound(&vsk->local_addr))
>>> +                      virtio_transport_remove_sock(vsk);
>>
>> I don't get this fix, virtio_transport_remove_sock() calls
>>    vsock_remove_sock()
>>      vsock_remove_bound()
>>        if (__vsock_in_bound_table(vsk))
>>            __vsock_remove_bound(vsk);
>>
>>
>> So, should already avoid this issue, no?
> 
> I got it wrong, I see now what are you trying to do, but I don't think
> we should skip virtio_transport_remove_sock() entirely, it also purge
> the rx_queue.

Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?

>> Can the problem be in vsock_bind() ?

Well, I wouldn't say so.

>> Is this issue pre-existing or introduced by this series?
> 
> I think this is pre-existing, can you confirm?

Yup, I agree, pre-existing.

> In that case, I'd not stop this series, and fix it in another patch/series.

Yeah, sure thing.

Thanks,
Michal


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-13 10:12         ` Michal Luczaj
@ 2025-01-13 11:05           ` Stefano Garzarella
  2025-01-13 13:51             ` Michal Luczaj
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-13 11:05 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, Bobby Eshleman, stable

On Mon, Jan 13, 2025 at 11:12:52AM +0100, Michal Luczaj wrote:
>On 1/13/25 10:07, Stefano Garzarella wrote:
>> On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
>>
>> [...]
>>
>>>>
>>>> So, if I get this right:
>>>> 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>>>> 2. transport->release() calls vsock_remove_bound() without checking if sk
>>>>   was bound and moved to bound list (refcnt=1)
>>>> 3. vsock_bind() assumes sk is in unbound list and before
>>>>   __vsock_insert_bound(vsock_bound_sockets()) calls
>>>>   __vsock_remove_bound() which does:
>>>>      list_del_init(&vsk->bound_table); // nop
>>>>      sock_put(&vsk->sk);               // refcnt=0
>>>>
>>>> The following fixes things for me. I'm just not certain that's the only
>>>> place where transport destruction may lead to an unbound socket being
>>>> removed from the unbound list.
>>>>
>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>> index 7f7de6d88096..0fe807c8c052 100644
>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>> @@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>>>>
>>>>       if (remove_sock) {
>>>>               sock_set_flag(sk, SOCK_DONE);
>>>> -              virtio_transport_remove_sock(vsk);
>>>> +              if (vsock_addr_bound(&vsk->local_addr))
>>>> +                      virtio_transport_remove_sock(vsk);
>>>
>>> I don't get this fix, virtio_transport_remove_sock() calls
>>>    vsock_remove_sock()
>>>      vsock_remove_bound()
>>>        if (__vsock_in_bound_table(vsk))
>>>            __vsock_remove_bound(vsk);
>>>
>>>
>>> So, should already avoid this issue, no?
>>
>> I got it wrong, I see now what are you trying to do, but I don't think
>> we should skip virtio_transport_remove_sock() entirely, it also purge
>> the rx_queue.
>
>Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?

It could be.

But I see some other issues:
- we need to fix also in the other transports, since they do the same
- we need to check delayed cancel work too that call 
   virtio_transport_remove_sock()

An alternative approach, which would perhaps allow us to avoid all this, 
is to re-insert the socket in the unbound list after calling release() 
when we deassign the transport.

WDYT?

Stefano

>
>>> Can the problem be in vsock_bind() ?
>
>Well, I wouldn't say so.
>
>>> Is this issue pre-existing or introduced by this series?
>>
>> I think this is pre-existing, can you confirm?
>
>Yup, I agree, pre-existing.
>
>> In that case, I'd not stop this series, and fix it in another patch/series.
>
>Yeah, sure thing.
>
>Thanks,
>Michal
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-13 11:05           ` Stefano Garzarella
@ 2025-01-13 13:51             ` Michal Luczaj
  2025-01-13 15:01               ` Stefano Garzarella
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Luczaj @ 2025-01-13 13:51 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, Bobby Eshleman, stable

On 1/13/25 12:05, Stefano Garzarella wrote:
> On Mon, Jan 13, 2025 at 11:12:52AM +0100, Michal Luczaj wrote:
>> On 1/13/25 10:07, Stefano Garzarella wrote:
>>> On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
>>>
>>> [...]
>>>
>>>>>
>>>>> So, if I get this right:
>>>>> 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>>>>> 2. transport->release() calls vsock_remove_bound() without checking if sk
>>>>>   was bound and moved to bound list (refcnt=1)
>>>>> 3. vsock_bind() assumes sk is in unbound list and before
>>>>>   __vsock_insert_bound(vsock_bound_sockets()) calls
>>>>>   __vsock_remove_bound() which does:
>>>>>      list_del_init(&vsk->bound_table); // nop
>>>>>      sock_put(&vsk->sk);               // refcnt=0
>>>>>
>>>>> The following fixes things for me. I'm just not certain that's the only
>>>>> place where transport destruction may lead to an unbound socket being
>>>>> removed from the unbound list.
>>>>>
>>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>>> index 7f7de6d88096..0fe807c8c052 100644
>>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>>> @@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>>>>>
>>>>>       if (remove_sock) {
>>>>>               sock_set_flag(sk, SOCK_DONE);
>>>>> -              virtio_transport_remove_sock(vsk);
>>>>> +              if (vsock_addr_bound(&vsk->local_addr))
>>>>> +                      virtio_transport_remove_sock(vsk);
>>>>
>>>> I don't get this fix, virtio_transport_remove_sock() calls
>>>>    vsock_remove_sock()
>>>>      vsock_remove_bound()
>>>>        if (__vsock_in_bound_table(vsk))
>>>>            __vsock_remove_bound(vsk);
>>>>
>>>>
>>>> So, should already avoid this issue, no?
>>>
>>> I got it wrong, I see now what are you trying to do, but I don't think
>>> we should skip virtio_transport_remove_sock() entirely, it also purge
>>> the rx_queue.
>>
>> Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?
> 
> It could be.
> 
> But I see some other issues:
> - we need to fix also in the other transports, since they do the same

Ahh, yes, VMCI and Hyper-V would need that, too.

> - we need to check delayed cancel work too that call 
>    virtio_transport_remove_sock()

That's the "I'm just not certain" part. As with rx_queue, I though delayed
cancel can only happen for a bound socket. So, per architecture, no need to
deal with that here, right?

> An alternative approach, which would perhaps allow us to avoid all this, 
> is to re-insert the socket in the unbound list after calling release() 
> when we deassign the transport.
> 
> WDYT?

If we can't keep the old state (sk_state, transport, etc) on failed
re-connect() then reverting back to initial state sounds, uhh, like an
option :) I'm not sure how well this aligns with (user's expectations of)
good ol' socket API, but maybe that train has already left.

Another possibility would be to simply brick the socket on failed (re)connect.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-13 13:51             ` Michal Luczaj
@ 2025-01-13 15:01               ` Stefano Garzarella
  2025-01-14  0:09                 ` Michal Luczaj
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-13 15:01 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, Bobby Eshleman, stable

On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>On 1/13/25 12:05, Stefano Garzarella wrote:
>> On Mon, Jan 13, 2025 at 11:12:52AM +0100, Michal Luczaj wrote:
>>> On 1/13/25 10:07, Stefano Garzarella wrote:
>>>> On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
>>>>
>>>> [...]
>>>>
>>>>>>
>>>>>> So, if I get this right:
>>>>>> 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>>>>>> 2. transport->release() calls vsock_remove_bound() without checking if sk
>>>>>>   was bound and moved to bound list (refcnt=1)
>>>>>> 3. vsock_bind() assumes sk is in unbound list and before
>>>>>>   __vsock_insert_bound(vsock_bound_sockets()) calls
>>>>>>   __vsock_remove_bound() which does:
>>>>>>      list_del_init(&vsk->bound_table); // nop
>>>>>>      sock_put(&vsk->sk);               // refcnt=0
>>>>>>
>>>>>> The following fixes things for me. I'm just not certain that's the only
>>>>>> place where transport destruction may lead to an unbound socket being
>>>>>> removed from the unbound list.
>>>>>>
>>>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>>>> index 7f7de6d88096..0fe807c8c052 100644
>>>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>>>> @@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>>>>>>
>>>>>>       if (remove_sock) {
>>>>>>               sock_set_flag(sk, SOCK_DONE);
>>>>>> -              virtio_transport_remove_sock(vsk);
>>>>>> +              if (vsock_addr_bound(&vsk->local_addr))
>>>>>> +                      virtio_transport_remove_sock(vsk);
>>>>>
>>>>> I don't get this fix, virtio_transport_remove_sock() calls
>>>>>    vsock_remove_sock()
>>>>>      vsock_remove_bound()
>>>>>        if (__vsock_in_bound_table(vsk))
>>>>>            __vsock_remove_bound(vsk);
>>>>>
>>>>>
>>>>> So, should already avoid this issue, no?
>>>>
>>>> I got it wrong, I see now what are you trying to do, but I don't think
>>>> we should skip virtio_transport_remove_sock() entirely, it also purge
>>>> the rx_queue.
>>>
>>> Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?
>>
>> It could be.
>>
>> But I see some other issues:
>> - we need to fix also in the other transports, since they do the same
>
>Ahh, yes, VMCI and Hyper-V would need that, too.
>
>> - we need to check delayed cancel work too that call
>>    virtio_transport_remove_sock()
>
>That's the "I'm just not certain" part. As with rx_queue, I though delayed
>cancel can only happen for a bound socket. So, per architecture, no need to
>deal with that here, right?

I think so.

>
>> An alternative approach, which would perhaps allow us to avoid all this,
>> is to re-insert the socket in the unbound list after calling release()
>> when we deassign the transport.
>>
>> WDYT?
>
>If we can't keep the old state (sk_state, transport, etc) on failed
>re-connect() then reverting back to initial state sounds, uhh, like an
>option :) I'm not sure how well this aligns with (user's expectations of)
>good ol' socket API, but maybe that train has already left.

We really want to behave as similar as possible with the other sockets,
like AF_INET, so I would try to continue toward that train.

>
>Another possibility would be to simply brick the socket on failed (re)connect.
>

I see, though, this is not the behavior of AF_INET for example, right?

Do you have time to investigate/fix this problem?
If not, I'll try to look into it in the next few days, maybe next week.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-13 15:01               ` Stefano Garzarella
@ 2025-01-14  0:09                 ` Michal Luczaj
  2025-01-14 10:16                   ` Stefano Garzarella
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Luczaj @ 2025-01-14  0:09 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, stable

On 1/13/25 16:01, Stefano Garzarella wrote:
> On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>> On 1/13/25 12:05, Stefano Garzarella wrote:
>>> ...
>>> An alternative approach, which would perhaps allow us to avoid all this,
>>> is to re-insert the socket in the unbound list after calling release()
>>> when we deassign the transport.
>>>
>>> WDYT?
>>
>> If we can't keep the old state (sk_state, transport, etc) on failed
>> re-connect() then reverting back to initial state sounds, uhh, like an
>> option :) I'm not sure how well this aligns with (user's expectations of)
>> good ol' socket API, but maybe that train has already left.
> 
> We really want to behave as similar as possible with the other sockets,
> like AF_INET, so I would try to continue toward that train.

I was worried that such connect()/transport error handling may have some
user visible side effects, but I guess I was wrong. I mean you can still
reach a sk_state=TCP_LISTEN with a transport assigned[1], but perhaps
that's a different issue.

I've tried your suggestion on top of this series. Passes the tests.

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index fa9d1b49599b..4718fe86689d 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -492,6 +492,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		vsk->transport->release(vsk);
 		vsock_deassign_transport(vsk);
 
+		vsock_addr_unbind(&vsk->local_addr);
+		vsock_addr_unbind(&vsk->remote_addr);
+		vsock_insert_unbound(vsk);
+
 		/* transport's release() and destruct() can touch some socket
 		 * state, since we are reassigning the socket to a new transport
 		 * during vsock_connect(), let's reset these fields to have a

>> Another possibility would be to simply brick the socket on failed (re)connect.
> 
> I see, though, this is not the behavior of AF_INET for example, right?

Right.

> Do you have time to investigate/fix this problem?
> If not, I'll try to look into it in the next few days, maybe next week.

I'm happy to help, but it's not like I have any better ideas.

Michal

[1]: E.g. this way:
```
from socket import *

MAX_PORT_RETRIES = 24 # net/vmw_vsock/af_vsock.c
VMADDR_CID_LOCAL = 1
VMADDR_PORT_ANY = -1
hold = []

def take_port(port):
	s = socket(AF_VSOCK, SOCK_SEQPACKET)
	s.bind((VMADDR_CID_LOCAL, port))
	hold.append(s)
	return s

s = take_port(VMADDR_PORT_ANY)
_, port = s.getsockname()
for _ in range(MAX_PORT_RETRIES):
	port += 1
	take_port(port);

s = socket(AF_VSOCK, SOCK_SEQPACKET)
err = s.connect_ex((VMADDR_CID_LOCAL, port))
assert err != 0
print("ok, connect failed; transport set")

s.bind((VMADDR_CID_LOCAL, port+1))
s.listen(16)
```


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-14  0:09                 ` Michal Luczaj
@ 2025-01-14 10:16                   ` Stefano Garzarella
  2025-01-14 16:31                     ` Michal Luczaj
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-14 10:16 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, stable

On Tue, Jan 14, 2025 at 01:09:24AM +0100, Michal Luczaj wrote:
>On 1/13/25 16:01, Stefano Garzarella wrote:
>> On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>>> On 1/13/25 12:05, Stefano Garzarella wrote:
>>>> ...
>>>> An alternative approach, which would perhaps allow us to avoid all this,
>>>> is to re-insert the socket in the unbound list after calling release()
>>>> when we deassign the transport.
>>>>
>>>> WDYT?
>>>
>>> If we can't keep the old state (sk_state, transport, etc) on failed
>>> re-connect() then reverting back to initial state sounds, uhh, like an
>>> option :) I'm not sure how well this aligns with (user's expectations of)
>>> good ol' socket API, but maybe that train has already left.
>>
>> We really want to behave as similar as possible with the other sockets,
>> like AF_INET, so I would try to continue toward that train.
>
>I was worried that such connect()/transport error handling may have some
>user visible side effects, but I guess I was wrong. I mean you can still
>reach a sk_state=TCP_LISTEN with a transport assigned[1], but perhaps
>that's a different issue.
>
>I've tried your suggestion on top of this series. Passes the tests.

Great, thanks!

>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index fa9d1b49599b..4718fe86689d 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -492,6 +492,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> 		vsk->transport->release(vsk);
> 		vsock_deassign_transport(vsk);
>
>+		vsock_addr_unbind(&vsk->local_addr);
>+		vsock_addr_unbind(&vsk->remote_addr);

My only doubt is that if a user did a specific bind() before the
connect, this way we're resetting everything, is that right?

Maybe we need to look better at the release, and prevent it from
removing the socket from the lists as you suggested, maybe adding a
function in af_vsock.c that all transports can call.

Thanks,
Stefano

>+		vsock_insert_unbound(vsk);
>+
> 		/* transport's release() and destruct() can touch some socket
> 		 * state, since we are reassigning the socket to a new transport
> 		 * during vsock_connect(), let's reset these fields to have a
>
>>> Another possibility would be to simply brick the socket on failed (re)connect.
>>
>> I see, though, this is not the behavior of AF_INET for example, right?
>
>Right.
>
>> Do you have time to investigate/fix this problem?
>> If not, I'll try to look into it in the next few days, maybe next week.
>
>I'm happy to help, but it's not like I have any better ideas.
>
>Michal
>
>[1]: E.g. this way:
>```
>from socket import *
>
>MAX_PORT_RETRIES = 24 # net/vmw_vsock/af_vsock.c
>VMADDR_CID_LOCAL = 1
>VMADDR_PORT_ANY = -1
>hold = []
>
>def take_port(port):
>	s = socket(AF_VSOCK, SOCK_SEQPACKET)
>	s.bind((VMADDR_CID_LOCAL, port))
>	hold.append(s)
>	return s
>
>s = take_port(VMADDR_PORT_ANY)
>_, port = s.getsockname()
>for _ in range(MAX_PORT_RETRIES):
>	port += 1
>	take_port(port);
>
>s = socket(AF_VSOCK, SOCK_SEQPACKET)
>err = s.connect_ex((VMADDR_CID_LOCAL, port))
>assert err != 0
>print("ok, connect failed; transport set")
>
>s.bind((VMADDR_CID_LOCAL, port+1))
>s.listen(16)
>```
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment
  2025-01-10  8:35 [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment Stefano Garzarella
                   ` (4 preceding siblings ...)
  2025-01-10  8:35 ` [PATCH net v2 5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space] Stefano Garzarella
@ 2025-01-14 11:50 ` patchwork-bot+netdevbpf
  5 siblings, 0 replies; 26+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-01-14 11:50 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, xuanzhuo, bpf, linux-kernel, leonardi, davem, qwerty,
	eperezma, mst, edumazet, kvm, pabeni, stefanha, jasowang, horms,
	v4bel, kuba, mhal, virtualization, bobby.eshleman

Hello:

This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Fri, 10 Jan 2025 09:35:06 +0100 you wrote:
> v1: https://lore.kernel.org/netdev/20250108180617.154053-1-sgarzare@redhat.com/
> v2:
> - Added patch 3 to cancel the virtio close delayed work when de-assigning
>   the transport
> - Added patch 4 to clean the socket state after de-assigning the transport
> - Added patch 5 as suggested by Michael and Hyunwoo Kim. It's based on
>   Hyunwoo Kim and Wongi Lee patch [1] but using WARN_ON and covering more
>   functions
> - Added R-b/T-b tags
> 
> [...]

Here is the summary with links:
  - [net,v2,1/5] vsock/virtio: discard packets if the transport changes
    https://git.kernel.org/netdev/net/c/2cb7c756f605
  - [net,v2,2/5] vsock/bpf: return early if transport is not assigned
    https://git.kernel.org/netdev/net/c/f6abafcd32f9
  - [net,v2,3/5] vsock/virtio: cancel close work in the destructor
    https://git.kernel.org/netdev/net/c/df137da9d6d1
  - [net,v2,4/5] vsock: reset socket state when de-assigning the transport
    https://git.kernel.org/netdev/net/c/a24009bc9be6
  - [net,v2,5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]
    https://git.kernel.org/netdev/net/c/91751e248256

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-14 10:16                   ` Stefano Garzarella
@ 2025-01-14 16:31                     ` Michal Luczaj
  2025-01-16  8:57                       ` Stefano Garzarella
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Luczaj @ 2025-01-14 16:31 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, stable

On 1/14/25 11:16, Stefano Garzarella wrote:
> On Tue, Jan 14, 2025 at 01:09:24AM +0100, Michal Luczaj wrote:
>> On 1/13/25 16:01, Stefano Garzarella wrote:
>>> On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>>>> On 1/13/25 12:05, Stefano Garzarella wrote:
>>>>> ...
>>>>> An alternative approach, which would perhaps allow us to avoid all this,
>>>>> is to re-insert the socket in the unbound list after calling release()
>>>>> when we deassign the transport.
>>>>>
>>>>> WDYT?
>>>>
>>>> If we can't keep the old state (sk_state, transport, etc) on failed
>>>> re-connect() then reverting back to initial state sounds, uhh, like an
>>>> option :) I'm not sure how well this aligns with (user's expectations of)
>>>> good ol' socket API, but maybe that train has already left.
>>>
>>> We really want to behave as similar as possible with the other sockets,
>>> like AF_INET, so I would try to continue toward that train.
>>
>> I was worried that such connect()/transport error handling may have some
>> user visible side effects, but I guess I was wrong. I mean you can still
>> reach a sk_state=TCP_LISTEN with a transport assigned[1], but perhaps
>> that's a different issue.
>>
>> I've tried your suggestion on top of this series. Passes the tests.
> 
> Great, thanks!
> 
>>
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index fa9d1b49599b..4718fe86689d 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -492,6 +492,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>> 		vsk->transport->release(vsk);
>> 		vsock_deassign_transport(vsk);
>>
>> +		vsock_addr_unbind(&vsk->local_addr);
>> +		vsock_addr_unbind(&vsk->remote_addr);
> 
> My only doubt is that if a user did a specific bind() before the
> connect, this way we're resetting everything, is that right?

That is right.

But we aren't changing much. Transport release already removes vsk from
vsock_bound_sockets. So even though vsk->local_addr is untouched (i.e.
vsock_addr_bound() returns `true`), vsk can't be picked by
vsock_find_bound_socket(). User can't bind() it again, either.

And when patched as above: bind() works as "expected", but socket is pretty
much useless, anyway. If I'm correct, the first failing connect() trips
virtio_transport_recv_connecting(), which sets `sk->sk_err`. I don't see it
being reset. Does the vsock suppose to keep sk_err state once set?

Currently only AF_VSOCK throws ConnectionResetError:
```
from socket import *

def test(family, addr):
	s = socket(family, SOCK_STREAM)
	assert s.connect_ex(addr) != 0

	lis = socket(family, SOCK_STREAM)
	lis.bind(addr)
	lis.listen()
	s.connect(addr)

	p, _ = lis.accept()
	p.send(b'x')
	assert s.recv(1) == b'x'

test(AF_INET, ('127.0.0.1', 2000))
test(AF_UNIX, '\0/tmp/foo')
test(AF_VSOCK, (1, 2000)) # VMADDR_CID_LOCAL
```

> Maybe we need to look better at the release, and prevent it from
> removing the socket from the lists as you suggested, maybe adding a
> function in af_vsock.c that all transports can call.

I'd be happy to submit a proper patch, but it would be helpful to decide
how close to AF_INET/AF_UNIX's behaviour is close enough. Or would you
rather have that UAF plugged first?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-14 16:31                     ` Michal Luczaj
@ 2025-01-16  8:57                       ` Stefano Garzarella
  2025-01-17 22:02                         ` Michal Luczaj
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Garzarella @ 2025-01-16  8:57 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, stable

On Tue, Jan 14, 2025 at 05:31:08PM +0100, Michal Luczaj wrote:
>On 1/14/25 11:16, Stefano Garzarella wrote:
>> On Tue, Jan 14, 2025 at 01:09:24AM +0100, Michal Luczaj wrote:
>>> On 1/13/25 16:01, Stefano Garzarella wrote:
>>>> On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>>>>> On 1/13/25 12:05, Stefano Garzarella wrote:
>>>>>> ...
>>>>>> An alternative approach, which would perhaps allow us to avoid all this,
>>>>>> is to re-insert the socket in the unbound list after calling release()
>>>>>> when we deassign the transport.
>>>>>>
>>>>>> WDYT?
>>>>>
>>>>> If we can't keep the old state (sk_state, transport, etc) on failed
>>>>> re-connect() then reverting back to initial state sounds, uhh, like an
>>>>> option :) I'm not sure how well this aligns with (user's expectations of)
>>>>> good ol' socket API, but maybe that train has already left.
>>>>
>>>> We really want to behave as similar as possible with the other sockets,
>>>> like AF_INET, so I would try to continue toward that train.
>>>
>>> I was worried that such connect()/transport error handling may have some
>>> user visible side effects, but I guess I was wrong. I mean you can still
>>> reach a sk_state=TCP_LISTEN with a transport assigned[1], but perhaps
>>> that's a different issue.
>>>
>>> I've tried your suggestion on top of this series. Passes the tests.
>>
>> Great, thanks!
>>
>>>
>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>> index fa9d1b49599b..4718fe86689d 100644
>>> --- a/net/vmw_vsock/af_vsock.c
>>> +++ b/net/vmw_vsock/af_vsock.c
>>> @@ -492,6 +492,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>>> 		vsk->transport->release(vsk);
>>> 		vsock_deassign_transport(vsk);
>>>
>>> +		vsock_addr_unbind(&vsk->local_addr);
>>> +		vsock_addr_unbind(&vsk->remote_addr);
>>
>> My only doubt is that if a user did a specific bind() before the
>> connect, this way we're resetting everything, is that right?
>
>That is right.
>
>But we aren't changing much. Transport release already removes vsk from
>vsock_bound_sockets. So even though vsk->local_addr is untouched (i.e.
>vsock_addr_bound() returns `true`), vsk can't be picked by
>vsock_find_bound_socket(). User can't bind() it again, either.

Okay, I see, so maybe for now makes sense to merge your patch, to fix 
the UAF fist.

>
>And when patched as above: bind() works as "expected", but socket is pretty
>much useless, anyway. If I'm correct, the first failing connect() trips
>virtio_transport_recv_connecting(), which sets `sk->sk_err`. I don't see it
>being reset. Does the vsock suppose to keep sk_err state once set?

Nope, I think this is another thing to fix.

>
>Currently only AF_VSOCK throws ConnectionResetError:
>```
>from socket import *
>
>def test(family, addr):
>	s = socket(family, SOCK_STREAM)
>	assert s.connect_ex(addr) != 0
>
>	lis = socket(family, SOCK_STREAM)
>	lis.bind(addr)
>	lis.listen()
>	s.connect(addr)
>
>	p, _ = lis.accept()
>	p.send(b'x')
>	assert s.recv(1) == b'x'
>
>test(AF_INET, ('127.0.0.1', 2000))
>test(AF_UNIX, '\0/tmp/foo')
>test(AF_VSOCK, (1, 2000)) # VMADDR_CID_LOCAL
>```
>
>> Maybe we need to look better at the release, and prevent it from
>> removing the socket from the lists as you suggested, maybe adding a
>> function in af_vsock.c that all transports can call.
>
>I'd be happy to submit a proper patch, but it would be helpful to decide
>how close to AF_INET/AF_UNIX's behaviour is close enough. Or would you
>rather have that UAF plugged first?
>

I'd say, let's fix the UAF first, then fix the behaviour (also in a
single series, but I prefer 2 separate patches if possible).
About that, AF_VSOCK was started with the goal of following AF_INET as
closely as possible, and the test suite should serve that as well, so if
we can solve this problem and get closer to AF_INET, possibly even
adding a dedicated test, that would be ideal!

Thank you very much for the help!
Stefano


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
  2025-01-16  8:57                       ` Stefano Garzarella
@ 2025-01-17 22:02                         ` Michal Luczaj
  0 siblings, 0 replies; 26+ messages in thread
From: Michal Luczaj @ 2025-01-17 22:02 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Xuan Zhuo, bpf, linux-kernel, Luigi Leonardi,
	David S. Miller, Wongi Lee, Eugenio Pérez,
	Michael S. Tsirkin, Eric Dumazet, kvm, Paolo Abeni,
	Stefan Hajnoczi, Jason Wang, Simon Horman, Hyunwoo Kim,
	Jakub Kicinski, virtualization, stable

On 1/16/25 09:57, Stefano Garzarella wrote:
> On Tue, Jan 14, 2025 at 05:31:08PM +0100, Michal Luczaj wrote:
>>> ...
>>> Maybe we need to look better at the release, and prevent it from
>>> removing the socket from the lists as you suggested, maybe adding a
>>> function in af_vsock.c that all transports can call.
>>
>> I'd be happy to submit a proper patch, but it would be helpful to decide
>> how close to AF_INET/AF_UNIX's behaviour is close enough. Or would you
>> rather have that UAF plugged first?
>>
> 
> I'd say, let's fix the UAF first, then fix the behaviour (also in a
> single series, but I prefer 2 separate patches if possible).
> About that, AF_VSOCK was started with the goal of following AF_INET as
> closely as possible, and the test suite should serve that as well, so if
> we can solve this problem and get closer to AF_INET, possibly even
> adding a dedicated test, that would be ideal!

All right, so let's keep the binding and allow removal from (un)bound list
only on socket destruction. This is transport independent, changes are
pretty minimal and, well, keeps the binding. Mixes well with the connect()
behaviour fix.

Let me know what you think:
https://lore.kernel.org/netdev/20250117-vsock-transport-vs-autobind-v1-0-c802c803762d@rbox.co/

Thanks,
Michal


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2025-01-17 22:03 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-10  8:35 [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment Stefano Garzarella
2025-01-10  8:35 ` [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes Stefano Garzarella
2025-01-10 22:46   ` Hyunwoo Kim
2025-01-12 22:42   ` Michal Luczaj
2025-01-13  8:57     ` Stefano Garzarella
2025-01-13  9:07       ` Stefano Garzarella
2025-01-13 10:12         ` Michal Luczaj
2025-01-13 11:05           ` Stefano Garzarella
2025-01-13 13:51             ` Michal Luczaj
2025-01-13 15:01               ` Stefano Garzarella
2025-01-14  0:09                 ` Michal Luczaj
2025-01-14 10:16                   ` Stefano Garzarella
2025-01-14 16:31                     ` Michal Luczaj
2025-01-16  8:57                       ` Stefano Garzarella
2025-01-17 22:02                         ` Michal Luczaj
2025-01-10  8:35 ` [PATCH net v2 2/5] vsock/bpf: return early if transport is not assigned Stefano Garzarella
2025-01-10  8:35 ` [PATCH net v2 3/5] vsock/virtio: cancel close work in the destructor Stefano Garzarella
2025-01-10 10:57   ` Luigi Leonardi
2025-01-10 22:48   ` Hyunwoo Kim
2025-01-10  8:35 ` [PATCH net v2 4/5] vsock: reset socket state when de-assigning the transport Stefano Garzarella
2025-01-10 10:56   ` Luigi Leonardi
2025-01-10 11:25     ` Stefano Garzarella
2025-01-10  8:35 ` [PATCH net v2 5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space] Stefano Garzarella
2025-01-10  9:49   ` Luigi Leonardi
2025-01-10 22:52   ` Hyunwoo Kim
2025-01-14 11:50 ` [PATCH net v2 0/5] vsock: some fixes due to transport de-assignment patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).