* [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock
@ 2025-10-23 18:27 Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 01/14] vsock: a per-net vsock NS mode state Bobby Eshleman
` (14 more replies)
0 siblings, 15 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
This series adds namespace support to vhost-vsock and loopback. It does
not add namespaces to any of the other guest transports (virtio-vsock,
hyperv, or vmci).
The current revision supports two modes: local and global. Local mode
completely isolates a namespace's CIDs from every other namespace, while
global mode shares CIDs across all global-mode namespaces (the original
behavior).
The mode is set using /proc/sys/net/vsock/ns_mode.
Modes are per-netns and write-once. This allows a system to configure
namespaces independently (some may share CIDs, others are completely
isolated). This also leaves room for a possible future mixed mode, in
which global-mode namespaces spin up VMs while mixed-mode namespaces
provide services to those VMs but are not allowed to allocate from the
global CID pool (this mode is not implemented in this series).
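As a rough illustration of the write-once rule, here is a minimal
userspace sketch. It is not the kernel code: the names ns_state and
ns_write_mode() are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are not kernel API. */
enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

struct ns_state {
	enum ns_mode mode;	/* a fresh namespace starts as global */
	bool mode_locked;	/* set by the first successful write */
};

/* Return true if the write took effect, false if the mode was
 * already written once for this namespace.
 */
static bool ns_write_mode(struct ns_state *ns, enum ns_mode mode)
{
	if (ns->mode_locked)
		return false;

	ns->mode = mode;
	ns->mode_locked = true;
	return true;
}
```

In this model the first write to a namespace succeeds and every later
write is rejected, leaving the mode chosen by the first write in place.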
If a socket or VM is created when a namespace is global but the
namespace changes to local, the socket or VM will continue working
normally. That is, the socket or VM assumes the mode behavior of the
namespace at the time the socket/VM was created. The original mode is
captured in vsock_create() and so occurs at the time of socket(2) and
accept(2) for sockets and open(2) on /dev/vhost-vsock for VMs. This
prevents a socket/VM connection from suddenly breaking due to a
namespace mode change. Any new sockets/VMs created after the mode change
will adopt the new mode's behavior.
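The snapshot behavior can be sketched in userspace as follows. This is
an illustrative model only, assuming the rule that endpoints in
different namespaces are reachable only if both were created in global
mode; struct endpoint, endpoint_create() and endpoints_reachable() are
hypothetical names, not functions from this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are not kernel API. */
enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

struct ns {
	enum ns_mode mode;	/* current mode of the namespace */
};

struct endpoint {
	const struct ns *ns;		/* namespace the endpoint lives in */
	enum ns_mode mode_at_create;	/* snapshot taken at creation time */
};

/* Capture the namespace's mode once, when the endpoint is created. */
static struct endpoint endpoint_create(const struct ns *ns)
{
	struct endpoint ep = { .ns = ns, .mode_at_create = ns->mode };

	return ep;
}

/* Same namespace: always reachable. Different namespaces: reachable
 * only if both endpoints were created while their namespace was global.
 */
static bool endpoints_reachable(const struct endpoint *a,
				const struct endpoint *b)
{
	if (a->ns == b->ns)
		return true;

	return a->mode_at_create == NS_MODE_GLOBAL &&
	       b->mode_at_create == NS_MODE_GLOBAL;
}
```

Because the check reads mode_at_create rather than ns->mode, an endpoint
created before a mode change keeps its original reachability, while an
endpoint created afterwards follows the new mode.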
Additionally, this series adds tests for the new namespace features:
tools/testing/selftests/vsock/vmtest.sh
1..30
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_host_vsock_ns_mode_ok
ok 5 ns_host_vsock_ns_mode_write_once_ok
ok 6 ns_global_same_cid_fails
ok 7 ns_local_same_cid_ok
ok 8 ns_global_local_same_cid_ok
ok 9 ns_local_global_same_cid_ok
ok 10 ns_diff_global_host_connect_to_global_vm_ok
ok 11 ns_diff_global_host_connect_to_local_vm_fails
ok 12 ns_diff_global_vm_connect_to_global_host_ok
ok 13 ns_diff_global_vm_connect_to_local_host_fails
ok 14 ns_diff_local_host_connect_to_local_vm_fails
ok 15 ns_diff_local_vm_connect_to_local_host_fails
ok 16 ns_diff_global_to_local_loopback_local_fails
ok 17 ns_diff_local_to_global_loopback_fails
ok 18 ns_diff_local_to_local_loopback_fails
ok 19 ns_diff_global_to_global_loopback_ok
ok 20 ns_same_local_loopback_ok
ok 21 ns_same_local_host_connect_to_local_vm_ok
ok 22 ns_same_local_vm_connect_to_local_host_ok
ok 23 ns_mode_change_connection_continue_vm_ok
ok 24 ns_mode_change_connection_continue_host_ok
ok 25 ns_mode_change_connection_continue_both_ok
ok 26 ns_delete_vm_ok
ok 27 ns_delete_host_ok
ok 28 ns_delete_both_ok
ok 29 ns_loopback_global_global_late_module_load_ok
ok 30 ns_loopback_local_local_late_module_load_fails
SUMMARY: PASS=30 SKIP=0 FAIL=0
Dependent on series:
https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463@meta.com/
Thanks again for everyone's help and reviews!
Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
To: Shuah Khan <shuah@kernel.org>
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Simon Horman <horms@kernel.org>
To: Stefan Hajnoczi <stefanha@redhat.com>
To: Michael S. Tsirkin <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: Eugenio Pérez <eperezma@redhat.com>
To: K. Y. Srinivasan <kys@microsoft.com>
To: Haiyang Zhang <haiyangz@microsoft.com>
To: Wei Liu <wei.liu@kernel.org>
To: Dexuan Cui <decui@microsoft.com>
To: Bryan Tan <bryan-bt.tan@broadcom.com>
To: Vishnu Dasa <vishnu.dasa@broadcom.com>
To: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: virtualization@lists.linux.dev
Cc: netdev@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: berrange@redhat.com
Changes in v8:
- Break generic cleanup/refactoring patches into standalone series,
remove those from this series
- Link to dependency: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463@meta.com/
- Link to v7: https://lore.kernel.org/r/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.com
Changes in v7:
- fix hv_sock build
- break out vmtest patches into distinct, more well-scoped patches
- change `orig_net_mode` to `net_mode`
- many fixes and style changes in per-patch change sets (see individual
patches for specific changes)
- optimize `virtio_vsock_skb_cb` layout
- update commit messages with more useful descriptions
- vsock_loopback: use orig_net_mode instead of current net mode
- add tests for edge cases (ns deletion, mode changing, loopback module
load ordering)
- Link to v6: https://lore.kernel.org/r/20250916-vsock-vmtest-v6-0-064d2eb0c89d@meta.com
Changes in v6:
- define behavior when mode changes to local while socket/VM is alive
- af_vsock: clarify description of CID behavior
- af_vsock: use stronger language around CID rules (don't use "may")
- af_vsock: improve naming of buf/buffer
- af_vsock: improve string length checking on proc writes
- vsock_loopback: add space in struct to clarify lock protection
- vsock_loopback: do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net() instead of sock_net()
- vsock_loopback: set loopback to NULL after kfree()
- vsock_loopback: use pernet_operations and remove callback mechanism
- vsock_loopback: add macros for "global" and "local"
- vsock_loopback: fix length checking
- vmtest.sh: check for namespace support in vmtest.sh
- Link to v5: https://lore.kernel.org/r/20250827-vsock-vmtest-v5-0-0ba580bede5b@meta.com
Changes in v5:
- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
- vsock_global_net -> vsock_global_dummy_net
- fix netns lookup in vhost_vsock to respect pid namespaces
- add callbacks for vsock_loopback to avoid circular dependency
- vmtest.sh loads vsock_loopback module
- remove vsock_net_mode_can_set()
- change vsock_net_write_mode() to return true/false based on success
- make vsock_net_mode enum instead of u8
- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com
Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test
as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context
in commit message)
- lots of cleanup
Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2:
https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com
Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes
impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1:
https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com
Changes in v1:
- added 'netns' module param to vsock.ko to enable the
network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket
only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
---
Bobby Eshleman (14):
vsock: a per-net vsock NS mode state
vsock/virtio: pack struct virtio_vsock_skb_cb
vsock: add netns to vsock skb cb
vsock: add netns to vsock core
vsock/loopback: add netns support
vsock/virtio: add netns to virtio transport common
vhost/vsock: add netns support
selftests/vsock: add namespace helpers to vmtest.sh
selftests/vsock: prepare vm management helpers for namespaces
selftests/vsock: add tests for proc sys vsock ns_mode
selftests/vsock: add namespace tests for CID collisions
selftests/vsock: add tests for host <-> vm connectivity with namespaces
selftests/vsock: add tests for namespace deletion and mode changes
selftests/vsock: add tests for module loading order
MAINTAINERS | 1 +
drivers/vhost/vsock.c | 48 +-
include/linux/virtio_vsock.h | 47 +-
include/net/af_vsock.h | 70 ++-
include/net/net_namespace.h | 4 +
include/net/netns/vsock.h | 22 +
net/vmw_vsock/af_vsock.c | 264 +++++++-
net/vmw_vsock/virtio_transport.c | 7 +-
net/vmw_vsock/virtio_transport_common.c | 21 +-
net/vmw_vsock/vsock_loopback.c | 89 ++-
tools/testing/selftests/vsock/vmtest.sh | 1044 ++++++++++++++++++++++++++++++-
11 files changed, 1532 insertions(+), 85 deletions(-)
---
base-commit: 962ac5ca99a5c3e7469215bf47572440402dfd59
change-id: 20250325-vsock-vmtest-b3a21d2102c2
prerequisite-message-id: <20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463@meta.com>
prerequisite-patch-id: a2eecc3851f2509ed40009a7cab6990c6d7cfff5
prerequisite-patch-id: 501db2100636b9c8fcb3b64b8b1df797ccbede85
prerequisite-patch-id: ba1a2f07398a035bc48ef72edda41888614be449
prerequisite-patch-id: fd5cc5445aca9355ce678e6d2bfa89fab8a57e61
prerequisite-patch-id: 795ab4432ffb0843e22b580374782e7e0d99b909
prerequisite-patch-id: 1499d263dc933e75366c09e045d2125ca39f7ddd
prerequisite-patch-id: f92d99bb1d35d99b063f818a19dcda999152d74c
prerequisite-patch-id: e3296f38cdba6d903e061cff2bbb3e7615e8e671
prerequisite-patch-id: bc4662b4710d302d4893f58708820fc2a0624325
prerequisite-patch-id: f8991f2e98c2661a706183fde6b35e2b8d9aedcf
prerequisite-patch-id: 44bf9ed69353586d284e5ee63d6fffa30439a698
prerequisite-patch-id: d50621bc630eeaf608bbaf260370c8dabf6326df
Best regards,
--
Bobby Eshleman <bobbyeshleman@meta.com>
* [PATCH net-next v8 01/14] vsock: a per-net vsock NS mode state
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-11-06 16:16 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 02/14] vsock/virtio: pack struct virtio_vsock_skb_cb Bobby Eshleman
` (13 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add the per-net vsock NS mode state. This only adds the structure for
holding the mode and some of the functions for setting/getting and
checking the mode, but does not integrate the functionality yet.
A "net_mode" field is added to vsock_sock to store the mode of the
namespace when the vsock_sock was created. In order to evaluate
namespace mode rules we need to know both a) which namespace the
endpoints are in, and b) what mode that namespace had when the endpoints
were created. This allows us to handle the changing of modes from global
to local *after* a socket has been created by remembering that the mode
was global when the socket was created. If we were to use the current
net's mode instead, then the lookup would fail and the socket would
break.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v7:
- clarify vsock_net_check_mode() comments
- change to `orig_net_mode == VSOCK_NET_MODE_GLOBAL && orig_net_mode == vsk->orig_net_mode`
- remove extraneous explanation of `orig_net_mode`
- rename `written` to `mode_locked`
- rename `vsock_hdr` to `sysctl_hdr`
- change `orig_net_mode` to `net_mode`
- make vsock_net_check_mode() more generic by taking just net pointers
and modes, instead of a vsock_sock ptr, for reuse by transports
(e.g., vhost_vsock)
Changes in v6:
- add orig_net_mode to store mode at creation time which will be used to
avoid breakage when namespace changes mode during socket/VM lifespan
Changes in v5:
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- change from net->vsock.ns_mode to net->vsock.mode
- change vsock_net_set_mode() to vsock_net_write_mode()
- vsock_net_write_mode() returns bool for write success to avoid
need to use vsock_net_mode_can_set()
- remove vsock_net_mode_can_set()
---
MAINTAINERS | 1 +
include/net/af_vsock.h | 56 +++++++++++++++++++++++++++++++++++++++++++++
include/net/net_namespace.h | 4 ++++
include/net/netns/vsock.h | 20 ++++++++++++++++
4 files changed, 81 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index ea72b3bd2248..dd765bbf79ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27070,6 +27070,7 @@ L: netdev@vger.kernel.org
S: Maintained
F: drivers/vhost/vsock.c
F: include/linux/virtio_vsock.h
+F: include/net/netns/vsock.h
F: include/uapi/linux/virtio_vsock.h
F: net/vmw_vsock/virtio_transport.c
F: net/vmw_vsock/virtio_transport_common.c
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d40e978126e3..bce5389ef742 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -10,6 +10,7 @@
#include <linux/kernel.h>
#include <linux/workqueue.h>
+#include <net/netns/vsock.h>
#include <net/sock.h>
#include <uapi/linux/vm_sockets.h>
@@ -65,6 +66,7 @@ struct vsock_sock {
u32 peer_shutdown;
bool sent_request;
bool ignore_connecting_rst;
+ enum vsock_net_mode net_mode;
/* Protected by lock_sock(sk) */
u64 buffer_size;
@@ -256,4 +258,58 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
{
return t->msgzerocopy_allow && t->msgzerocopy_allow();
}
+
+static inline enum vsock_net_mode vsock_net_mode(struct net *net)
+{
+ enum vsock_net_mode ret;
+
+ spin_lock_bh(&net->vsock.lock);
+ ret = net->vsock.mode;
+ spin_unlock_bh(&net->vsock.lock);
+ return ret;
+}
+
+static inline bool vsock_net_write_mode(struct net *net, u8 mode)
+{
+ bool ret;
+
+ spin_lock_bh(&net->vsock.lock);
+
+ if (net->vsock.mode_locked) {
+ ret = false;
+ goto skip;
+ }
+
+ net->vsock.mode = mode;
+ net->vsock.mode_locked = true;
+ ret = true;
+
+skip:
+ spin_unlock_bh(&net->vsock.lock);
+ return ret;
+}
+
+/* Return true if two namespaces and modes pass the mode rules. Otherwise,
+ * return false.
+ *
+ * ns0 and ns1 are the namespaces being checked.
+ * mode0 and mode1 are the vsock namespace modes of ns0 and ns1.
+ *
+ * Read more about modes in the comment header of net/vmw_vsock/af_vsock.c.
+ */
+static inline bool vsock_net_check_mode(struct net *ns0, enum vsock_net_mode mode0,
+ struct net *ns1, enum vsock_net_mode mode1)
+{
+ /* Any vsocks within the same network namespace are always reachable,
+ * regardless of the mode.
+ */
+ if (net_eq(ns0, ns1))
+ return true;
+
+ /*
+ * If the network namespaces differ, vsocks are only reachable if both
+ * were created in VSOCK_NET_MODE_GLOBAL mode.
+ */
+ return mode0 == VSOCK_NET_MODE_GLOBAL && mode0 == mode1;
+}
#endif /* __AF_VSOCK_H__ */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index cb664f6e3558..66d3de1d935f 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -37,6 +37,7 @@
#include <net/netns/smc.h>
#include <net/netns/bpf.h>
#include <net/netns/mctp.h>
+#include <net/netns/vsock.h>
#include <net/net_trackers.h>
#include <linux/ns_common.h>
#include <linux/idr.h>
@@ -196,6 +197,9 @@ struct net {
/* Move to a better place when the config guard is removed. */
struct mutex rtnl_mutex;
#endif
+#if IS_ENABLED(CONFIG_VSOCKETS)
+ struct netns_vsock vsock;
+#endif
} __randomize_layout;
#include <linux/seq_file_net.h>
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
new file mode 100644
index 000000000000..c9a438ad52f2
--- /dev/null
+++ b/include/net/netns/vsock.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_NET_NAMESPACE_VSOCK_H
+#define __NET_NET_NAMESPACE_VSOCK_H
+
+#include <linux/types.h>
+
+enum vsock_net_mode {
+ VSOCK_NET_MODE_GLOBAL,
+ VSOCK_NET_MODE_LOCAL,
+};
+
+struct netns_vsock {
+ struct ctl_table_header *sysctl_hdr;
+ spinlock_t lock;
+
+ /* protected by lock */
+ enum vsock_net_mode mode;
+ bool mode_locked;
+};
+#endif /* __NET_NET_NAMESPACE_VSOCK_H */
--
2.47.3
* [PATCH net-next v8 02/14] vsock/virtio: pack struct virtio_vsock_skb_cb
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 01/14] vsock: a per-net vsock NS mode state Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-11-06 16:16 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 03/14] vsock: add netns to vsock skb cb Bobby Eshleman
` (12 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
From: Bobby Eshleman <bobbyeshleman@meta.com>
Reduce holes in struct virtio_vsock_skb_cb. As this struct continues to
grow, we want to keep it trimmed down so it doesn't exceed the size of
skb->cb (currently 48 bytes). Eliminating the 2 byte hole provides an
additional two bytes for new fields at the end of the structure. It does
not shrink the total size, however.
Future work could include combining fields like reply and tap_delivered
into a single bitfield, but currently doing so will not make the total
struct size smaller (although it would extend the tail-end padding area
by one byte).
Before this patch:
struct virtio_vsock_skb_cb {
bool reply; /* 0 1 */
bool tap_delivered; /* 1 1 */
/* XXX 2 bytes hole, try to pack */
u32 offset; /* 4 4 */
/* size: 8, cachelines: 1, members: 3 */
/* sum members: 6, holes: 1, sum holes: 2 */
/* last cacheline: 8 bytes */
};
After this patch:
struct virtio_vsock_skb_cb {
u32 offset; /* 0 4 */
bool reply; /* 4 1 */
bool tap_delivered; /* 5 1 */
/* size: 8, cachelines: 1, members: 3 */
/* padding: 2 */
/* last cacheline: 8 bytes */
};
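The two layouts can also be checked without pahole. The sketch below is
a userspace approximation that assumes a typical ABI (1-byte bool,
4-byte aligned uint32_t); cb_before and cb_after are stand-in names for
the struct before and after this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the layout before the patch: 2-byte hole before offset. */
struct cb_before {
	bool reply;
	bool tap_delivered;
	uint32_t offset;
};

/* Stand-in for the layout after the patch: hole becomes tail padding. */
struct cb_after {
	uint32_t offset;
	bool reply;
	bool tap_delivered;
};

/* Both layouts occupy 8 bytes; only the position of the slack differs. */
_Static_assert(sizeof(struct cb_before) == 8, "unexpected ABI");
_Static_assert(sizeof(struct cb_after) == 8, "unexpected ABI");

/* Bytes left over after the last member of the reordered struct. */
static size_t cb_after_tail_padding(void)
{
	return sizeof(struct cb_after) -
	       (offsetof(struct cb_after, tap_delivered) + sizeof(bool));
}
```

On such an ABI, offset sits at byte 4 in the old layout and at byte 0 in
the new one, and the two spare bytes end up after tap_delivered, where a
later field can use them.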
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
include/linux/virtio_vsock.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 0c67543a45c8..87cf4dcac78a 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -10,9 +10,9 @@
#define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))
struct virtio_vsock_skb_cb {
+ u32 offset;
bool reply;
bool tap_delivered;
- u32 offset;
};
#define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb))
--
2.47.3
* [PATCH net-next v8 03/14] vsock: add netns to vsock skb cb
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 01/14] vsock: a per-net vsock NS mode state Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 02/14] vsock/virtio: pack struct virtio_vsock_skb_cb Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-11-06 16:17 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 04/14] vsock: add netns to vsock core Bobby Eshleman
` (11 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add a net pointer and net_mode to the vsock skb and helpers for
getting/setting them. When skbs are received the transport needs a way
to tell the vsock layer and/or virtio common layer which namespace and
what namespace mode the packet belongs to. This will be used by those
upper layers for finding the correct socket object. This patch stashes
these fields in the skb control buffer.
This extends virtio_vsock_skb_cb to 24 bytes:
struct virtio_vsock_skb_cb {
struct net * net; /* 0 8 */
enum vsock_net_mode net_mode; /* 8 4 */
u32 offset; /* 12 4 */
bool reply; /* 16 1 */
bool tap_delivered; /* 17 1 */
/* size: 24, cachelines: 1, members: 5 */
/* padding: 6 */
/* last cacheline: 24 bytes */
};
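To illustrate the idea of stashing per-packet state in a fixed-size
control buffer, here is a small userspace sketch. fake_skb, fake_net and
fake_cb are hypothetical stand-ins; the kernel casts skb->cb directly
through VIRTIO_VSOCK_SKB_CB(), whereas this model copies with memcpy()
to stay within standard C aliasing rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FAKE_SKB_CB_SIZE 48	/* mirrors the 48-byte skb->cb noted above */

/* Hypothetical stand-ins; none of these are kernel types. */
enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

struct fake_net {
	int id;
};

struct fake_cb {
	struct fake_net *net;
	enum ns_mode net_mode;
	uint32_t offset;
	bool reply;
	bool tap_delivered;
};

struct fake_skb {
	char cb[FAKE_SKB_CB_SIZE];
};

/* The private state must never outgrow the control buffer. */
_Static_assert(sizeof(struct fake_cb) <= FAKE_SKB_CB_SIZE,
	       "fake_cb does not fit in cb");

/* Store the private state into the packet's control buffer. */
static void fake_skb_cb_set(struct fake_skb *skb, const struct fake_cb *cb)
{
	memcpy(skb->cb, cb, sizeof(*cb));
}

/* Load the private state back out of the control buffer. */
static struct fake_cb fake_skb_cb_get(const struct fake_skb *skb)
{
	struct fake_cb cb;

	memcpy(&cb, skb->cb, sizeof(cb));
	return cb;
}
```

The compile-time size check plays the same role as watching the pahole
output: growing the struct past the buffer becomes a build failure
rather than silent corruption.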
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v7:
- rename `orig_net_mode` to `net_mode`
- update commit message with a more complete explanation of changes
Changes in v5:
- some diff context change due to rebase to current net-next
---
include/linux/virtio_vsock.h | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 87cf4dcac78a..7f334a32133c 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -10,6 +10,8 @@
#define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))
struct virtio_vsock_skb_cb {
+ struct net *net;
+ enum vsock_net_mode net_mode;
u32 offset;
bool reply;
bool tap_delivered;
@@ -130,6 +132,27 @@ static inline size_t virtio_vsock_skb_len(struct sk_buff *skb)
return (size_t)(skb_end_pointer(skb) - skb->head);
}
+static inline struct net *virtio_vsock_skb_net(struct sk_buff *skb)
+{
+ return VIRTIO_VSOCK_SKB_CB(skb)->net;
+}
+
+static inline void virtio_vsock_skb_set_net(struct sk_buff *skb, struct net *net)
+{
+ VIRTIO_VSOCK_SKB_CB(skb)->net = net;
+}
+
+static inline enum vsock_net_mode virtio_vsock_skb_net_mode(struct sk_buff *skb)
+{
+ return VIRTIO_VSOCK_SKB_CB(skb)->net_mode;
+}
+
+static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
+ enum vsock_net_mode net_mode)
+{
+ VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
+}
+
/* Dimension the RX SKB so that the entire thing fits exactly into
* a single 4KiB page. This avoids wasting memory due to alloc_skb()
* rounding up to the next page order and also means that we
--
2.47.3
* [PATCH net-next v8 04/14] vsock: add netns to vsock core
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (2 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 03/14] vsock: add netns to vsock skb cb Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-11-06 16:18 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 05/14] vsock/loopback: add netns support Bobby Eshleman
` (10 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add netns logic to vsock core. Additionally, modify transport hook
prototypes to be used by later transport-specific patches (e.g.,
*_seqpacket_allow()).
Namespaces are supported primarily by changing socket lookup functions
(e.g., vsock_find_connected_socket()) to take into account the socket
namespace and the namespace mode before considering a candidate socket a
"match".
Introduce a dummy namespace struct, __vsock_global_dummy_net, to be
used by transports that do not support namespacing. This dummy always
has mode "global" to preserve previous CID behavior.
This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
accepts the "global" or "local" mode strings.
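For illustration, mapping the two mode strings to an enum might look
like the userspace sketch below. ns_mode_parse() is a hypothetical
helper, not the sysctl handler from this patch, and tolerating one
trailing newline (as "echo" would write) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; these names are not kernel API. */
enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

/* Parse "global" or "local", optionally followed by a single newline.
 * Return true and fill *out on success; return false for anything else.
 */
static bool ns_mode_parse(const char *buf, enum ns_mode *out)
{
	size_t len = strcspn(buf, "\n");	/* length up to first newline */

	/* Reject any data that follows the first newline. */
	if (buf[len] == '\n' && buf[len + 1] != '\0')
		return false;

	if (len == strlen("global") && strncmp(buf, "global", len) == 0) {
		*out = NS_MODE_GLOBAL;
		return true;
	}

	if (len == strlen("local") && strncmp(buf, "local", len) == 0) {
		*out = NS_MODE_LOCAL;
		return true;
	}

	return false;
}
```

Comparing the length as well as the bytes keeps prefixes and extensions
of a valid mode string (such as "glob" or "globally") from being
accepted.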
The transports (besides vhost) are modified to use the global dummy,
which makes them behave as if always in the global namespace. Vhost is
an exception because it inherits its namespace from the process that
opens the vhost device.
Add netns functionality (initialization, passing to transports, procfs,
etc...) to the af_vsock socket layer. Later patches that add netns
support to transports depend on this patch.
seqpacket_allow() callbacks are modified to take a vsk so that transport
implementations can inspect sock_net(sk) and vsk->net_mode when performing
lookups (e.g., vhost does this in its future netns patch). Because the
API change affects all transports, it seemed more appropriate to make
this internal API change in the "vsock core" patch than in the "vhost"
patch.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v7:
- hv_sock: fix hyperv build error
- explain why vhost does not use the dummy
- explain usage of __vsock_global_dummy_net
- explain why VSOCK_NET_MODE_STR_MAX is 8 characters
- use switch-case in vsock_net_mode_string()
- avoid changing transports as much as possible
- add vsock_find_{bound,connected}_socket_net()
- rename `vsock_hdr` to `sysctl_hdr`
- add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
global mode for virtio-vsock, move skb->cb zero-ing into wrapper
- explain seqpacket_allow() change
- move net setting to __vsock_create() instead of vsock_create() so
that child sockets also have their net assigned upon accept()
Changes in v6:
- unregister sysctl ops in vsock_exit()
- af_vsock: clarify description of CID behavior
- af_vsock: fix buf vs buffer naming, and length checking
- af_vsock: fix length checking w/ correct ctl_table->maxlen
Changes in v5:
- vsock_global_net() -> vsock_global_dummy_net()
- update comments for new uAPI
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- add prototype changes so patch remains compilable
---
drivers/vhost/vsock.c | 4 +-
include/linux/virtio_vsock.h | 21 ++++
include/net/af_vsock.h | 14 ++-
net/vmw_vsock/af_vsock.c | 264 ++++++++++++++++++++++++++++++++++++---
net/vmw_vsock/virtio_transport.c | 7 +-
net/vmw_vsock/vsock_loopback.c | 4 +-
6 files changed, 288 insertions(+), 26 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index ae01457ea2cd..34adf0cf9124 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
return true;
}
-static bool vhost_transport_seqpacket_allow(u32 remote_cid);
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
static struct virtio_transport vhost_transport = {
.transport = {
@@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
.send_pkt = vhost_transport_send_pkt,
};
-static bool vhost_transport_seqpacket_allow(u32 remote_cid)
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
{
struct vhost_vsock *vsock;
bool seqpacket_allow = false;
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 7f334a32133c..29290395054c 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -153,6 +153,27 @@ static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
}
+static inline struct sk_buff *
+virtio_vsock_alloc_rx_skb(unsigned int size, gfp_t mask)
+{
+ struct sk_buff *skb;
+
+ skb = virtio_vsock_alloc_linear_skb(size, mask);
+ if (!skb)
+ return NULL;
+
+ memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
+
+ /* virtio-vsock does not yet support namespaces, so on receive
+ * we force legacy namespace behavior using the global dummy net
+ * and global net mode.
+ */
+ virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
+ virtio_vsock_skb_set_net_mode(skb, VSOCK_NET_MODE_GLOBAL);
+
+ return skb;
+}
+
/* Dimension the RX SKB so that the entire thing fits exactly into
* a single 4KiB page. This avoids wasting memory due to alloc_skb()
* rounding up to the next page order and also means that we
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index bce5389ef742..69bb70c3c0fd 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -145,7 +145,7 @@ struct vsock_transport {
int flags);
int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
size_t len);
- bool (*seqpacket_allow)(u32 remote_cid);
+ bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
/* Notification. */
@@ -218,6 +218,12 @@ void vsock_remove_connected(struct vsock_sock *vsk);
struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
struct sockaddr_vm *dst);
+struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr, struct net *net,
+ enum vsock_net_mode net_mode);
+struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
+ struct sockaddr_vm *dst,
+ struct net *net,
+ enum vsock_net_mode net_mode);
void vsock_remove_sock(struct vsock_sock *vsk);
void vsock_for_each_connected_socket(struct vsock_transport *transport,
void (*fn)(struct sock *sk));
@@ -259,6 +265,12 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
return t->msgzerocopy_allow && t->msgzerocopy_allow();
}
+extern struct net __vsock_global_dummy_net;
+static inline struct net *vsock_global_dummy_net(void)
+{
+ return &__vsock_global_dummy_net;
+}
+
static inline enum vsock_net_mode vsock_net_mode(struct net *net)
{
enum vsock_net_mode ret;
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 4c2db6cca557..656a78810c68 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -83,6 +83,35 @@
* TCP_ESTABLISHED - connected
* TCP_CLOSING - disconnecting
* TCP_LISTEN - listening
+ *
+ * - Namespaces in vsock support two different modes configured
+ * through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
+ * Each mode defines how the namespace interacts with CIDs.
+ * /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
+ * and locked down by a namespace manager. The default is "global". The mode
+ * is set per-namespace.
+ *
+ * The modes affect the allocation and accessibility of CIDs as follows:
+ *
+ * - global - access and allocation are all system-wide
+ * - all CID allocation from global namespaces draw from the same
+ * system-wide pool
+ * - if one global namespace has already allocated some CID, another
+ * global namespace will not be able to allocate the same CID
+ * - global mode AF_VSOCK sockets can reach any VM or socket in any global
+ * namespace, they are not contained to only their own namespace
+ * - AF_VSOCK sockets in a global mode namespace cannot reach VMs or
+ * sockets in any local mode namespace
+ * - local - access and allocation are contained within the namespace
+ * - CID allocation draws only from a private pool local only to the
+ * namespace, and does not affect the CIDs available for allocation in any
+ * other namespace (global or local)
+ * - VMs in a local namespace do not collide with CIDs in any other local
+ * namespace or any global namespace. For example, if a VM in a local mode
+ * namespace is given CID 10, then CID 10 is still available for
+ * allocation in any other namespace, but not in the same namespace
+ * - AF_VSOCK sockets in a local mode namespace can connect only to VMs or
+ * other sockets within their own namespace.
*/
#include <linux/compat.h>
@@ -100,6 +129,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/net.h>
+#include <linux/proc_fs.h>
#include <linux/poll.h>
#include <linux/random.h>
#include <linux/skbuff.h>
@@ -111,9 +141,18 @@
#include <linux/workqueue.h>
#include <net/sock.h>
#include <net/af_vsock.h>
+#include <net/netns/vsock.h>
#include <uapi/linux/vm_sockets.h>
#include <uapi/asm-generic/ioctls.h>
+#define VSOCK_NET_MODE_STR_GLOBAL "global"
+#define VSOCK_NET_MODE_STR_LOCAL "local"
+
+/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'.
+ * The newline is added by proc_dostring() for read operations.
+ */
+#define VSOCK_NET_MODE_STR_MAX 8
+
static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
static void vsock_sk_destruct(struct sock *sk);
static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
@@ -149,6 +188,15 @@ static const struct vsock_transport *transport_dgram;
static const struct vsock_transport *transport_local;
static DEFINE_MUTEX(vsock_register_mutex);
+/* This net is used only for transports that do not support namespaces. It is never
+ * registered with the namespace subsystem and always has
+ * VSOCK_NET_MODE_GLOBAL. Pass this net to the net lookup functions (e.g.,
+ * vsock_find_bound_socket_net()) when you want to force global-mode or the
+ * same behavior as before namespaces were supported.
+ */
+struct net __vsock_global_dummy_net;
+EXPORT_SYMBOL_GPL(__vsock_global_dummy_net);
+
/**** UTILS ****/
/* Each bound VSocket is stored in the bind hash table and each connected
@@ -235,33 +283,44 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
sock_put(&vsk->sk);
}
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr,
+ struct net *net,
+ enum vsock_net_mode net_mode)
{
struct vsock_sock *vsk;
list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
- if (vsock_addr_equals_addr(addr, &vsk->local_addr))
- return sk_vsock(vsk);
+ struct sock *sk = sk_vsock(vsk);
+
+ if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
+ vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode))
+ return sk;
if (addr->svm_port == vsk->local_addr.svm_port &&
(vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
- addr->svm_cid == VMADDR_CID_ANY))
- return sk_vsock(vsk);
+ addr->svm_cid == VMADDR_CID_ANY) &&
+ vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode))
+ return sk;
}
return NULL;
}
-static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
- struct sockaddr_vm *dst)
+static struct sock *__vsock_find_connected_socket_net(struct sockaddr_vm *src,
+ struct sockaddr_vm *dst,
+ struct net *net,
+ enum vsock_net_mode net_mode)
{
struct vsock_sock *vsk;
list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
connected_table) {
+ struct sock *sk = sk_vsock(vsk);
+
if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
- dst->svm_port == vsk->local_addr.svm_port) {
- return sk_vsock(vsk);
+ dst->svm_port == vsk->local_addr.svm_port &&
+ vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode)) {
+ return sk;
}
}
@@ -304,12 +363,14 @@ void vsock_remove_connected(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(vsock_remove_connected);
-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
+struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
+ struct net *net,
+ enum vsock_net_mode net_mode)
{
struct sock *sk;
spin_lock_bh(&vsock_table_lock);
- sk = __vsock_find_bound_socket(addr);
+ sk = __vsock_find_bound_socket_net(addr, net, net_mode);
if (sk)
sock_hold(sk);
@@ -317,15 +378,24 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
return sk;
}
+EXPORT_SYMBOL_GPL(vsock_find_bound_socket_net);
+
+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
+{
+ return vsock_find_bound_socket_net(addr, vsock_global_dummy_net(),
+ VSOCK_NET_MODE_GLOBAL);
+}
EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
-struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
- struct sockaddr_vm *dst)
+struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
+ struct sockaddr_vm *dst,
+ struct net *net,
+ enum vsock_net_mode net_mode)
{
struct sock *sk;
spin_lock_bh(&vsock_table_lock);
- sk = __vsock_find_connected_socket(src, dst);
+ sk = __vsock_find_connected_socket_net(src, dst, net, net_mode);
if (sk)
sock_hold(sk);
@@ -333,6 +403,15 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
return sk;
}
+EXPORT_SYMBOL_GPL(vsock_find_connected_socket_net);
+
+struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
+ struct sockaddr_vm *dst)
+{
+ return vsock_find_connected_socket_net(src, dst,
+ vsock_global_dummy_net(),
+ VSOCK_NET_MODE_GLOBAL);
+}
EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
void vsock_remove_sock(struct vsock_sock *vsk)
@@ -528,7 +607,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
if (sk->sk_type == SOCK_SEQPACKET) {
if (!new_transport->seqpacket_allow ||
- !new_transport->seqpacket_allow(remote_cid)) {
+ !new_transport->seqpacket_allow(vsk, remote_cid)) {
module_put(new_transport->module);
return -ESOCKTNOSUPPORT;
}
@@ -676,6 +755,7 @@ static void vsock_pending_work(struct work_struct *work)
static int __vsock_bind_connectible(struct vsock_sock *vsk,
struct sockaddr_vm *addr)
{
+ struct net *net = sock_net(sk_vsock(vsk));
static u32 port;
struct sockaddr_vm new_addr;
@@ -695,7 +775,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
new_addr.svm_port = port++;
- if (!__vsock_find_bound_socket(&new_addr)) {
+ if (!__vsock_find_bound_socket_net(&new_addr, net,
+ vsk->net_mode)) {
found = true;
break;
}
@@ -712,7 +793,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
return -EACCES;
}
- if (__vsock_find_bound_socket(&new_addr))
+ if (__vsock_find_bound_socket_net(&new_addr, net,
+ vsk->net_mode))
return -EADDRINUSE;
}
@@ -836,6 +918,8 @@ static struct sock *__vsock_create(struct net *net,
vsk->buffer_max_size = VSOCK_DEFAULT_BUFFER_MAX_SIZE;
}
+ vsk->net_mode = vsock_net_mode(net);
+
return sk;
}
@@ -2636,6 +2720,142 @@ static struct miscdevice vsock_device = {
.fops = &vsock_device_ops,
};
+static int vsock_net_mode_string(const struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ char data[VSOCK_NET_MODE_STR_MAX] = {0};
+ enum vsock_net_mode mode;
+ struct ctl_table tmp;
+ struct net *net;
+ int ret;
+
+ if (!table->data || !table->maxlen || !*lenp) {
+ *lenp = 0;
+ return 0;
+ }
+
+ net = current->nsproxy->net_ns;
+ tmp = *table;
+ tmp.data = data;
+
+ if (!write) {
+ const char *p;
+
+ mode = vsock_net_mode(net);
+
+ switch (mode) {
+ case VSOCK_NET_MODE_GLOBAL:
+ p = VSOCK_NET_MODE_STR_GLOBAL;
+ break;
+ case VSOCK_NET_MODE_LOCAL:
+ p = VSOCK_NET_MODE_STR_LOCAL;
+ break;
+ default:
+ WARN_ONCE(true, "netns has invalid vsock mode");
+ *lenp = 0;
+ return 0;
+ }
+
+ strscpy(data, p, sizeof(data));
+ tmp.maxlen = strlen(p);
+ }
+
+ ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
+ if (ret)
+ return ret;
+
+ if (write) {
+ if (*lenp >= sizeof(data))
+ return -EINVAL;
+
+ if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
+ mode = VSOCK_NET_MODE_GLOBAL;
+ else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
+ mode = VSOCK_NET_MODE_LOCAL;
+ else
+ return -EINVAL;
+
+ if (!vsock_net_write_mode(net, mode))
+ return -EPERM;
+ }
+
+ return 0;
+}
+
+static struct ctl_table vsock_table[] = {
+ {
+ .procname = "ns_mode",
+ .data = &init_net.vsock.mode,
+ .maxlen = VSOCK_NET_MODE_STR_MAX,
+ .mode = 0644,
+ .proc_handler = vsock_net_mode_string
+ },
+};
+
+static int __net_init vsock_sysctl_register(struct net *net)
+{
+ struct ctl_table *table;
+
+ if (net_eq(net, &init_net)) {
+ table = vsock_table;
+ } else {
+ table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL);
+ if (!table)
+ goto err_alloc;
+
+ table[0].data = &net->vsock.mode;
+ }
+
+ net->vsock.sysctl_hdr = register_net_sysctl_sz(net, "net/vsock", table,
+ ARRAY_SIZE(vsock_table));
+ if (!net->vsock.sysctl_hdr)
+ goto err_reg;
+
+ return 0;
+
+err_reg:
+ if (!net_eq(net, &init_net))
+ kfree(table);
+err_alloc:
+ return -ENOMEM;
+}
+
+static void vsock_sysctl_unregister(struct net *net)
+{
+ const struct ctl_table *table;
+
+ table = net->vsock.sysctl_hdr->ctl_table_arg;
+ unregister_net_sysctl_table(net->vsock.sysctl_hdr);
+ if (!net_eq(net, &init_net))
+ kfree(table);
+}
+
+static void vsock_net_init(struct net *net)
+{
+ spin_lock_init(&net->vsock.lock);
+ net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
+}
+
+static __net_init int vsock_sysctl_init_net(struct net *net)
+{
+ vsock_net_init(net);
+
+ if (vsock_sysctl_register(net))
+ return -ENOMEM;
+
+ return 0;
+}
+
+static __net_exit void vsock_sysctl_exit_net(struct net *net)
+{
+ vsock_sysctl_unregister(net);
+}
+
+static struct pernet_operations vsock_sysctl_ops __net_initdata = {
+ .init = vsock_sysctl_init_net,
+ .exit = vsock_sysctl_exit_net,
+};
+
static int __init vsock_init(void)
{
int err = 0;
@@ -2663,10 +2883,19 @@ static int __init vsock_init(void)
goto err_unregister_proto;
}
+ if (register_pernet_subsys(&vsock_sysctl_ops)) {
+ err = -ENOMEM;
+ goto err_unregister_sock;
+ }
+
+ vsock_net_init(&init_net);
+ vsock_net_init(vsock_global_dummy_net());
vsock_bpf_build_proto();
return 0;
+err_unregister_sock:
+ sock_unregister(AF_VSOCK);
err_unregister_proto:
proto_unregister(&vsock_proto);
err_deregister_misc:
@@ -2680,6 +2909,7 @@ static void __exit vsock_exit(void)
misc_deregister(&vsock_device);
sock_unregister(AF_VSOCK);
proto_unregister(&vsock_proto);
+ unregister_pernet_subsys(&vsock_sysctl_ops);
}
const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk)
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 8c867023a2e5..6abec6b9b5bc 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -316,11 +316,10 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
vq = vsock->vqs[VSOCK_VQ_RX];
do {
- skb = virtio_vsock_alloc_linear_skb(total_len, GFP_KERNEL);
+ skb = virtio_vsock_alloc_rx_skb(total_len, GFP_KERNEL);
if (!skb)
break;
- memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
sg_init_one(&pkt, virtio_vsock_hdr(skb), total_len);
p = &pkt;
ret = virtqueue_add_sgs(vq, &p, 0, 1, skb, GFP_KERNEL);
@@ -536,7 +535,7 @@ static bool virtio_transport_msgzerocopy_allow(void)
return true;
}
-static bool virtio_transport_seqpacket_allow(u32 remote_cid);
+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
static struct virtio_transport virtio_transport = {
.transport = {
@@ -593,7 +592,7 @@ static struct virtio_transport virtio_transport = {
.can_msgzerocopy = virtio_transport_can_msgzerocopy,
};
-static bool virtio_transport_seqpacket_allow(u32 remote_cid)
+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
{
struct virtio_vsock *vsock;
bool seqpacket_allow;
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index bc2ff918b315..a8f218f0c5a3 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -46,7 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
return 0;
}
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
static bool vsock_loopback_msgzerocopy_allow(void)
{
return true;
@@ -106,7 +106,7 @@ static struct virtio_transport loopback_transport = {
.send_pkt = vsock_loopback_send_pkt,
};
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
{
return true;
}
--
2.47.3
* [PATCH net-next v8 05/14] vsock/loopback: add netns support
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (3 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 04/14] vsock: add netns to vsock core Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-11-06 16:18 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common Bobby Eshleman
` (9 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add NS support to vsock loopback. Sockets in a global mode netns
communicate with each other, regardless of namespace. Sockets in a local
mode netns may only communicate with other sockets within the same
namespace.
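
As a rough illustration of the rule above, here is a minimal userspace
sketch. The names (ns_mode, struct ns, ns_can_reach) are hypothetical
stand-ins, not the kernel's, and the predicate is only what the
description implies; the real check is vsock_net_check_mode(), whose body
is not shown in this patch.

```c
#include <stdbool.h>

/* Illustrative stand-ins for enum vsock_net_mode and struct net. */
enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };
struct ns { int id; };

/*
 * Two endpoints may communicate if they live in the same namespace, or
 * if both were created while their namespaces were in global mode.
 * This is an assumed model of the rule, not the kernel implementation.
 */
static bool ns_can_reach(const struct ns *a, enum ns_mode mode_a,
			 const struct ns *b, enum ns_mode mode_b)
{
	if (a == b)
		return true;

	return mode_a == NS_MODE_GLOBAL && mode_b == NS_MODE_GLOBAL;
}
```

Under this model a global-mode socket cannot reach a local-mode socket in
another namespace, and two local-mode sockets in different namespaces
never see each other.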
Use pernet_ops to install a vsock_loopback for every namespace that is
created (to be used if local mode is enabled).
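
The instance selection this implies can be sketched in userspace as
follows. All names here are hypothetical; the sketch only mirrors the
branch in vsock_loopback_send_pkt() that picks the per-namespace instance
for local-mode packets and the single shared instance otherwise.

```c
#include <stddef.h>

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

/* Stand-in for struct vsock_loopback (queue and worker omitted). */
struct loopback { int queued; };

/* Stand-in for struct net; priv models net->vsock.priv. */
struct ns { struct loopback *priv; };

/* Shared instance used by every global-mode sender. */
static struct loopback global_loopback;

/* Pick the loopback instance a packet should be queued on. */
static struct loopback *loopback_for(struct ns *ns, enum ns_mode pkt_mode)
{
	if (pkt_mode == NS_MODE_LOCAL)
		return ns->priv;

	return &global_loopback;
}
```

Keeping one instance per namespace means local-mode traffic never shares
a queue with another namespace, while global-mode traffic keeps the
original single-queue behavior.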
Retroactively call init/exit on every namespace when the vsock_loopback
module is loaded in order to initialize the per-ns device.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v7:
- drop for_each_net() init/exit, drop net_rwsem, the pernet registration
handles this automatically and race-free
- flush workqueue before destruction, purge pkt list
- remember net_mode instead of current net mode
- keep space after INIT_WORK()
- change vsock_loopback in netns_vsock to ->priv void ptr
- rename `orig_net_mode` to `net_mode`
- remove useless comment
- protect `register_pernet_subsys()` with `net_rwsem`
- do cleanup before releasing `net_rwsem` when failure happens
- call `unregister_pernet_subsys()` in `vsock_loopback_exit()`
- call `vsock_loopback_deinit_vsock()` in `vsock_loopback_exit()`
Changes in v6:
- init pernet ops for vsock_loopback module
- vsock_loopback: add space in struct to clarify lock protection
- do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net()
Changes in v5:
- add callbacks code to avoid reverse dependency
- add logic for handling vsock_loopback setup for already existing
namespaces
---
include/net/netns/vsock.h | 2 +
net/vmw_vsock/vsock_loopback.c | 85 ++++++++++++++++++++++++++++++++++++------
2 files changed, 75 insertions(+), 12 deletions(-)
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
index c9a438ad52f2..9d0d8e2fbc37 100644
--- a/include/net/netns/vsock.h
+++ b/include/net/netns/vsock.h
@@ -16,5 +16,7 @@ struct netns_vsock {
/* protected by lock */
enum vsock_net_mode mode;
bool mode_locked;
+
+ void *priv;
};
#endif /* __NET_NET_NAMESPACE_VSOCK_H */
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index a8f218f0c5a3..474083d4cfcb 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -28,8 +28,16 @@ static u32 vsock_loopback_get_local_cid(void)
static int vsock_loopback_send_pkt(struct sk_buff *skb)
{
- struct vsock_loopback *vsock = &the_vsock_loopback;
+ struct vsock_loopback *vsock;
int len = skb->len;
+ struct net *net;
+
+ net = virtio_vsock_skb_net(skb);
+
+ if (virtio_vsock_skb_net_mode(skb) == VSOCK_NET_MODE_LOCAL)
+ vsock = (struct vsock_loopback *)net->vsock.priv;
+ else
+ vsock = &the_vsock_loopback;
virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
queue_work(vsock->workqueue, &vsock->pkt_work);
@@ -134,11 +142,8 @@ static void vsock_loopback_work(struct work_struct *work)
}
}
-static int __init vsock_loopback_init(void)
+static int vsock_loopback_init_vsock(struct vsock_loopback *vsock)
{
- struct vsock_loopback *vsock = &the_vsock_loopback;
- int ret;
-
vsock->workqueue = alloc_workqueue("vsock-loopback", WQ_PERCPU, 0);
if (!vsock->workqueue)
return -ENOMEM;
@@ -146,15 +151,73 @@ static int __init vsock_loopback_init(void)
skb_queue_head_init(&vsock->pkt_queue);
INIT_WORK(&vsock->pkt_work, vsock_loopback_work);
+ return 0;
+}
+
+static void vsock_loopback_deinit_vsock(struct vsock_loopback *vsock)
+{
+ if (vsock->workqueue) {
+ flush_work(&vsock->pkt_work);
+ virtio_vsock_skb_queue_purge(&vsock->pkt_queue);
+ destroy_workqueue(vsock->workqueue);
+ vsock->workqueue = NULL;
+ }
+}
+
+static int vsock_loopback_init_net(struct net *net)
+{
+ int ret;
+
+ net->vsock.priv = kzalloc(sizeof(struct vsock_loopback), GFP_KERNEL);
+ if (!net->vsock.priv)
+ return -ENOMEM;
+
+ ret = vsock_loopback_init_vsock((struct vsock_loopback *)net->vsock.priv);
+ if (ret < 0) {
+ kfree(net->vsock.priv);
+ net->vsock.priv = NULL;
+ return ret;
+ }
+
+ return 0;
+}
+
+static void vsock_loopback_exit_net(struct net *net)
+{
+ vsock_loopback_deinit_vsock(net->vsock.priv);
+ kfree(net->vsock.priv);
+ net->vsock.priv = NULL;
+}
+
+static struct pernet_operations vsock_loopback_net_ops = {
+ .init = vsock_loopback_init_net,
+ .exit = vsock_loopback_exit_net,
+};
+
+static int __init vsock_loopback_init(void)
+{
+ struct vsock_loopback *vsock = &the_vsock_loopback;
+ int ret;
+
+ ret = vsock_loopback_init_vsock(vsock);
+ if (ret < 0)
+ return ret;
+
+ ret = register_pernet_subsys(&vsock_loopback_net_ops);
+ if (ret < 0)
+ goto out_deinit_vsock;
+
ret = vsock_core_register(&loopback_transport.transport,
VSOCK_TRANSPORT_F_LOCAL);
if (ret)
- goto out_wq;
+ goto out_unregister_pernet_subsys;
return 0;
-out_wq:
- destroy_workqueue(vsock->workqueue);
+out_unregister_pernet_subsys:
+ unregister_pernet_subsys(&vsock_loopback_net_ops);
+out_deinit_vsock:
+ vsock_loopback_deinit_vsock(vsock);
return ret;
}
@@ -164,11 +227,9 @@ static void __exit vsock_loopback_exit(void)
vsock_core_unregister(&loopback_transport.transport);
- flush_work(&vsock->pkt_work);
-
- virtio_vsock_skb_queue_purge(&vsock->pkt_queue);
+ unregister_pernet_subsys(&vsock_loopback_net_ops);
- destroy_workqueue(vsock->workqueue);
+ vsock_loopback_deinit_vsock(vsock);
}
module_init(vsock_loopback_init);
--
2.47.3
* [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (4 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 05/14] vsock/loopback: add netns support Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-11-06 16:20 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 07/14] vhost/vsock: add netns support Bobby Eshleman
` (8 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Enable network namespace support in the virtio-vsock common transport
layer by carrying the namespace pointer and namespace mode through the
transmit and receive paths.
The changes include:
1. Add a 'net' field to virtio_vsock_pkt_info to carry the namespace
pointer for outgoing packets.
2. Store the namespace and namespace mode in the skb control buffer when
allocating packets (except for VIRTIO_VSOCK_OP_RST packets which do
not have an associated socket).
3. Retrieve namespace information from skbs on the receive path for
lookups using vsock_find_connected_socket_net() and
vsock_find_bound_socket_net().
This allows users of virtio transport common code
(vhost-vsock/virtio-vsock) to later enable namespace support.
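
For illustration, the tag-on-transmit and lookup-on-receive flow
described in the list above might be modeled in userspace like this. The
types and helpers (sock_model, pkt_model, pkt_tag, rx_lookup) are
hypothetical, the reachability predicate is an assumed reading of
vsock_net_check_mode(), and the socket-less VIRTIO_VSOCK_OP_RST case is
left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };
struct ns { int id; };

/* Socket: namespace, mode captured at creation time, bound port. */
struct sock_model { const struct ns *ns; enum ns_mode mode; unsigned int port; };

/* Packet: carries the sender's namespace and mode, like the skb cb. */
struct pkt_model { const struct ns *ns; enum ns_mode mode; unsigned int dst_port; };

/* Assumed rule: same namespace, or both sides in global mode. */
static bool ns_can_reach(const struct ns *a, enum ns_mode ma,
			 const struct ns *b, enum ns_mode mb)
{
	return a == b || (ma == NS_MODE_GLOBAL && mb == NS_MODE_GLOBAL);
}

/* Transmit side: stamp the packet with the sending socket's ns and mode. */
static void pkt_tag(struct pkt_model *pkt, const struct sock_model *sk,
		    unsigned int dst_port)
{
	pkt->ns = sk->ns;
	pkt->mode = sk->mode;
	pkt->dst_port = dst_port;
}

/* Receive side: match on port, then filter by namespace and mode. */
static const struct sock_model *rx_lookup(const struct sock_model *tbl,
					  size_t n,
					  const struct pkt_model *pkt)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].port == pkt->dst_port &&
		    ns_can_reach(tbl[i].ns, tbl[i].mode, pkt->ns, pkt->mode))
			return &tbl[i];
	}

	return NULL;
}
```

With two local-mode namespaces that each bind the same port, a packet
tagged in one namespace resolves only to the socket in that namespace.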
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v7:
- add comment explaining the !vsk case in virtio_transport_alloc_skb()
---
include/linux/virtio_vsock.h | 1 +
net/vmw_vsock/virtio_transport_common.c | 21 +++++++++++++++++++--
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 29290395054c..f90646f82993 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -217,6 +217,7 @@ struct virtio_vsock_pkt_info {
u32 remote_cid, remote_port;
struct vsock_sock *vsk;
struct msghdr *msg;
+ struct net *net;
u32 pkt_len;
u16 type;
u16 op;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index dcc8a1d5851e..b8e52c71920a 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -316,6 +316,15 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
info->flags,
zcopy);
+ /*
+ * If there is no corresponding socket, then we don't have a
+ * corresponding namespace. This only happens for VIRTIO_VSOCK_OP_RST.
+ */
+ if (vsk) {
+ virtio_vsock_skb_set_net(skb, info->net);
+ virtio_vsock_skb_set_net_mode(skb, vsk->net_mode);
+ }
+
return skb;
out:
kfree_skb(skb);
@@ -527,6 +536,7 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk)
struct virtio_vsock_pkt_info info = {
.op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1067,6 +1077,7 @@ int virtio_transport_connect(struct vsock_sock *vsk)
struct virtio_vsock_pkt_info info = {
.op = VIRTIO_VSOCK_OP_REQUEST,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1082,6 +1093,7 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
(mode & SEND_SHUTDOWN ?
VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1108,6 +1120,7 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
.msg = msg,
.pkt_len = len,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1145,6 +1158,7 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
.op = VIRTIO_VSOCK_OP_RST,
.reply = !!skb,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
/* Send RST only if the original pkt is not a RST pkt */
@@ -1465,6 +1479,7 @@ virtio_transport_send_response(struct vsock_sock *vsk,
.remote_port = le32_to_cpu(hdr->src_port),
.reply = true,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1578,7 +1593,9 @@ static bool virtio_transport_valid_type(u16 type)
void virtio_transport_recv_pkt(struct virtio_transport *t,
struct sk_buff *skb)
{
+ enum vsock_net_mode net_mode = virtio_vsock_skb_net_mode(skb);
struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+ struct net *net = virtio_vsock_skb_net(skb);
struct sockaddr_vm src, dst;
struct vsock_sock *vsk;
struct sock *sk;
@@ -1606,9 +1623,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
/* The socket must be in connected or bound table
* otherwise send reset back
*/
- sk = vsock_find_connected_socket(&src, &dst);
+ sk = vsock_find_connected_socket_net(&src, &dst, net, net_mode);
if (!sk) {
- sk = vsock_find_bound_socket(&dst);
+ sk = vsock_find_bound_socket_net(&dst, net, net_mode);
if (!sk) {
(void)virtio_transport_reset_no_sock(t, skb);
goto free_pkt;
--
2.47.3
* [PATCH net-next v8 07/14] vhost/vsock: add netns support
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (5 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-11-06 16:21 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 08/14] selftests/vsock: add namespace helpers to vmtest.sh Bobby Eshleman
` (7 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add the ability to isolate vhost-vsock flows using namespaces.
The VM, via the vhost_vsock struct, inherits its namespace from the
process that opens the vhost-vsock device. vhost_vsock lookup functions
are modified to take into account the mode (e.g., if CIDs are matching
but modes don't align, then return NULL).
vhost_vsock now acquires a reference to the namespace.
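
A minimal userspace sketch of the mode-aware CID lookup follows. The
names (vm_model, vm_get, cid_in_use) are hypothetical and the
reachability predicate is an assumed reading of vsock_net_check_mode();
the sketch only shows why the same CID can be handed out in two local
namespaces while still colliding between global ones.

```c
#include <stdbool.h>
#include <stddef.h>

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };
struct ns { int id; };

/* Stand-in for struct vhost_vsock: CID plus ns and mode cached at open. */
struct vm_model {
	unsigned int guest_cid;	/* 0 means no CID assigned yet */
	const struct ns *ns;
	enum ns_mode mode;
};

/* Assumed rule: same namespace, or both sides in global mode. */
static bool ns_can_reach(const struct ns *a, enum ns_mode ma,
			 const struct ns *b, enum ns_mode mb)
{
	return a == b || (ma == NS_MODE_GLOBAL && mb == NS_MODE_GLOBAL);
}

/* Find a VM by CID, ignoring VMs the caller's namespace cannot reach. */
static const struct vm_model *vm_get(const struct vm_model *vms, size_t n,
				     unsigned int cid, const struct ns *ns,
				     enum ns_mode mode)
{
	for (size_t i = 0; i < n; i++) {
		if (vms[i].guest_cid == 0)
			continue;
		if (vms[i].guest_cid == cid &&
		    ns_can_reach(ns, mode, vms[i].ns, vms[i].mode))
			return &vms[i];
	}

	return NULL;
}

/* Models the "refuse if CID is already in use" check when setting a CID. */
static bool cid_in_use(const struct vm_model *vms, size_t n,
		       const struct vm_model *self, unsigned int cid)
{
	const struct vm_model *other = vm_get(vms, n, cid, self->ns, self->mode);

	return other && other != self;
}
```

In this model a VM in a second local namespace can claim a CID that a VM
in the first local namespace already holds, whereas a second global-mode
VM asking for the same CID is refused.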
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v7:
- remove the check_global flag of vhost_vsock_get(), that logic was both
wrong and not necessary, reuse vsock_net_check_mode() instead
- remove 'delete me' comment
Changes in v5:
- respect pid namespaces when assigning namespace to vhost_vsock
---
drivers/vhost/vsock.c | 44 ++++++++++++++++++++++++++++++++++----------
1 file changed, 34 insertions(+), 10 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 34adf0cf9124..df6136633cd8 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -46,6 +46,11 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
struct vhost_vsock {
struct vhost_dev dev;
struct vhost_virtqueue vqs[2];
+ struct net *net;
+ netns_tracker ns_tracker;
+
+ /* The ns mode at the time vhost_vsock was created */
+ enum vsock_net_mode net_mode;
/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
struct hlist_node hash;
@@ -67,7 +72,8 @@ static u32 vhost_transport_get_local_cid(void)
/* Callers that dereference the return value must hold vhost_vsock_mutex or the
* RCU read lock.
*/
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net,
+ enum vsock_net_mode mode)
{
struct vhost_vsock *vsock;
@@ -78,9 +84,9 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
if (other_cid == 0)
continue;
- if (other_cid == guest_cid)
+ if (other_cid == guest_cid &&
+ vsock_net_check_mode(net, mode, vsock->net, vsock->net_mode))
return vsock;
-
}
return NULL;
@@ -271,14 +277,16 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
static int
vhost_transport_send_pkt(struct sk_buff *skb)
{
+ enum vsock_net_mode mode = virtio_vsock_skb_net_mode(skb);
struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+ struct net *net = virtio_vsock_skb_net(skb);
struct vhost_vsock *vsock;
int len = skb->len;
rcu_read_lock();
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid));
+ vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid), net, mode);
if (!vsock) {
rcu_read_unlock();
kfree_skb(skb);
@@ -305,7 +313,8 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
rcu_read_lock();
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+ vsock = vhost_vsock_get(vsk->remote_addr.svm_cid,
+ sock_net(sk_vsock(vsk)), vsk->net_mode);
if (!vsock)
goto out;
@@ -327,7 +336,7 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
}
static struct sk_buff *
-vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
+vhost_vsock_alloc_skb(struct vhost_vsock *vsock, struct vhost_virtqueue *vq,
unsigned int out, unsigned int in)
{
struct virtio_vsock_hdr *hdr;
@@ -353,6 +362,9 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
if (!skb)
return NULL;
+ virtio_vsock_skb_set_net(skb, vsock->net);
+ virtio_vsock_skb_set_net_mode(skb, vsock->net_mode);
+
iov_iter_init(&iov_iter, ITER_SOURCE, vq->iov, out, len);
hdr = virtio_vsock_hdr(skb);
@@ -462,11 +474,12 @@ static struct virtio_transport vhost_transport = {
static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
{
+ struct net *net = sock_net(sk_vsock(vsk));
struct vhost_vsock *vsock;
bool seqpacket_allow = false;
rcu_read_lock();
- vsock = vhost_vsock_get(remote_cid);
+ vsock = vhost_vsock_get(remote_cid, net, vsk->net_mode);
if (vsock)
seqpacket_allow = vsock->seqpacket_allow;
@@ -520,7 +533,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
break;
}
- skb = vhost_vsock_alloc_skb(vq, out, in);
+ skb = vhost_vsock_alloc_skb(vsock, vq, out, in);
if (!skb) {
vq_err(vq, "Faulted on pkt\n");
continue;
@@ -652,8 +665,10 @@ static void vhost_vsock_free(struct vhost_vsock *vsock)
static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
{
+
struct vhost_virtqueue **vqs;
struct vhost_vsock *vsock;
+ struct net *net;
int ret;
/* This struct is large and allocation could fail, fall back to vmalloc
@@ -669,6 +684,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
goto out;
}
+ net = current->nsproxy->net_ns;
+ vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
+
+ /* Cache the mode of the namespace so that if that netns mode changes,
+ * the vhost_vsock will continue to function as expected.
+ */
+ vsock->net_mode = vsock_net_mode(net);
+
vsock->guest_cid = 0; /* no CID assigned yet */
vsock->seqpacket_allow = false;
@@ -708,7 +731,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
*/
/* If the peer is still valid, no need to reset connection */
- if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+ if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk), vsk->net_mode))
return;
/* If the close timeout is pending, let it expire. This avoids races
@@ -753,6 +776,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue);
vhost_dev_cleanup(&vsock->dev);
+ put_net_track(vsock->net, &vsock->ns_tracker);
kfree(vsock->dev.vqs);
vhost_vsock_free(vsock);
return 0;
@@ -779,7 +803,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
/* Refuse if CID is already in use */
mutex_lock(&vhost_vsock_mutex);
- other = vhost_vsock_get(guest_cid);
+ other = vhost_vsock_get(guest_cid, vsock->net, vsock->net_mode);
if (other && other != vsock) {
mutex_unlock(&vhost_vsock_mutex);
return -EADDRINUSE;
--
2.47.3
* [PATCH net-next v8 08/14] selftests/vsock: add namespace helpers to vmtest.sh
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (6 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 07/14] vhost/vsock: add netns support Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 09/14] selftests/vsock: prepare vm management helpers for namespaces Bobby Eshleman
` (6 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add functions for initializing namespaces with the different vsock NS
modes. Callers can use add_namespaces() and del_namespaces() to create
namespaces global0, global1, local0, and local1.
The init_namespaces() function initializes global0, global1, local0, and
local1 with their respective vsock NS modes. This function is separate so
that tests that depend on this initialization can use it, while tests
that exercise the initialization interface itself can start with a clean
slate by omitting this call.
Remove namespaces upon exiting the program in cleanup(). This is
unlikely to be needed for a healthy run, but it is useful when a test is
manually killed mid-run. In that case, this patch prevents the next test
run from finding stale namespaces whose write-once vsock NS modes are
already locked.
This patch is in preparation for later namespace tests.
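As a rough illustration of the naming scheme described above, the
following sketch lists the namespaces that add_namespaces() would create
and prints the commands as a dry run, without touching the system.
ns_names() is a hypothetical helper that is not part of vmtest.sh; it
only mirrors the "${mode}0"/"${mode}1" pattern.

```shell
#!/bin/sh
# Hypothetical helper (not in vmtest.sh): print the namespace names that
# the add_namespaces() loop creates, one per line, in creation order.
ns_names() {
	for mode in local global; do
		for idx in 0 1; do
			echo "${mode}${idx}"
		done
	done
}

# Dry run: show the commands instead of executing them (no root needed).
for ns in $(ns_names); do
	echo "ip netns add ${ns}"
done
```

A real caller would invoke add_namespaces() and, when the test depends on
preset modes, init_namespaces() afterwards.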
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 45 +++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 62b4f5ede9f6..5f4bae952e13 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -46,6 +46,7 @@ readonly TEST_DESCS=(
)
readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly NS_MODES=("local" "global")
VERBOSE=0
@@ -100,11 +101,55 @@ check_result() {
cnt_total=$(( cnt_total + 1 ))
}
+add_namespaces() {
+ # add namespaces local0, local1, global0, and global1
+ for mode in "${NS_MODES[@]}"; do
+ ip netns add "${mode}0" 2>/dev/null
+ ip netns add "${mode}1" 2>/dev/null
+ done
+}
+
+init_namespaces() {
+ for mode in "${NS_MODES[@]}"; do
+ ns_set_mode "${mode}0" "${mode}"
+ ns_set_mode "${mode}1" "${mode}"
+
+ log_host "set ns ${mode}0 to mode ${mode}"
+ log_host "set ns ${mode}1 to mode ${mode}"
+
+ # we need lo for qemu port forwarding
+ ip netns exec "${mode}0" ip link set dev lo up
+ ip netns exec "${mode}1" ip link set dev lo up
+ done
+}
+
+del_namespaces() {
+ for mode in "${NS_MODES[@]}"; do
+ ip netns del "${mode}0" &>/dev/null
+ ip netns del "${mode}1" &>/dev/null
+ log_host "removed ns ${mode}0"
+ log_host "removed ns ${mode}1"
+ done
+}
+
+ns_set_mode() {
+ local ns=$1
+ local mode=$2
+
+ echo "${mode}" | ip netns exec "${ns}" \
+ tee /proc/sys/net/vsock/ns_mode &>/dev/null
+}
+
vm_ssh() {
ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
return $?
}
+cleanup() {
+ del_namespaces
+}
+
+trap cleanup EXIT
check_args() {
local found
--
2.47.3
* [PATCH net-next v8 09/14] selftests/vsock: prepare vm management helpers for namespaces
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (7 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 08/14] selftests/vsock: add namespace helpers to vmtest.sh Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 10/14] selftests/vsock: add tests for proc sys vsock ns_mode Bobby Eshleman
` (5 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add namespace support to vm management, ssh helpers, and vsock_test
wrapper functions. This enables running VMs and test helpers in specific
namespaces, which is required for upcoming namespace isolation tests.
The functions still work correctly within the init ns, but the caller
must now pass "init_ns" explicitly. Existing tests have been updated to
do so, so there is no functional change for them.
Affected functions (such as vm_start() and vm_ssh()) now wrap their
commands with 'ip netns exec' when executing commands in non-init
namespaces.
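The wrapping pattern can be sketched roughly as follows. This is an
illustration only: ns_exec_prefix() is a hypothetical name that does not
exist in vmtest.sh, and the ssh command line is shortened. The loop
prints the command lines rather than running them.

```shell
#!/bin/sh
# Sketch of the prefix selection: empty for the "init_ns" sentinel,
# "ip netns exec <ns>" for any other namespace name.
ns_exec_prefix() {
	if [ "$1" = "init_ns" ]; then
		echo ""
	else
		echo "ip netns exec $1"
	fi
}

# Dry run: print the command lines that would be executed.
for ns in init_ns global0; do
	# The prefix is deliberately left unquoted so that an empty
	# prefix expands to nothing instead of an empty argument.
	echo $(ns_exec_prefix "${ns}") ssh -q localhost true
done
```

Note that "init_ns" is a sentinel string understood by the helpers, not
the name of a namespace created with 'ip netns add'.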
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 102 ++++++++++++++++++++++----------
1 file changed, 71 insertions(+), 31 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 5f4bae952e13..d047f6d27df4 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -141,7 +141,18 @@ ns_set_mode() {
}
vm_ssh() {
- ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
+ local ns_exec
+
+ if [[ "${1}" == init_ns ]]; then
+ ns_exec=""
+ else
+ ns_exec="ip netns exec ${1}"
+ fi
+
+ shift
+
+ ${ns_exec} ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
+
return $?
}
@@ -254,10 +265,12 @@ terminate_pidfiles() {
vm_start() {
local pidfile=$1
+ local ns=$2
local logfile=/dev/null
local verbose_opt=""
local kernel_opt=""
local qemu_opts=""
+ local ns_exec=""
local qemu
qemu=$(command -v "${QEMU}")
@@ -278,7 +291,11 @@ vm_start() {
kernel_opt="${KERNEL_CHECKOUT}"
fi
- vng \
+ if [[ "${ns}" != "init_ns" ]]; then
+ ns_exec="ip netns exec ${ns}"
+ fi
+
+ ${ns_exec} vng \
--run \
${kernel_opt} \
${verbose_opt} \
@@ -293,6 +310,7 @@ vm_start() {
}
vm_wait_for_ssh() {
+ local ns=$1
local i
i=0
@@ -300,7 +318,8 @@ vm_wait_for_ssh() {
if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then
die "Timed out waiting for guest ssh"
fi
- if vm_ssh -- true; then
+
+ if vm_ssh "${ns}" -- true; then
break
fi
i=$(( i + 1 ))
@@ -344,28 +363,42 @@ wait_for_listener()
}
vm_wait_for_listener() {
- local port=$1
+ local ns=$1
+ local port=$2
+
+ log "Waiting for listener on port ${port} on vm"
- vm_ssh <<EOF
+ vm_ssh "${ns}" <<EOF
$(declare -f wait_for_listener)
wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
EOF
}
host_wait_for_listener() {
- wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+ local ns=$1
+ local port=$2
+
+ if [[ "${ns}" == init_ns ]]; then
+ wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+ else
+ ip netns exec "${ns}" bash <<-EOF
+ $(declare -f wait_for_listener)
+ wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
+ EOF
+ fi
}
vm_vsock_test() {
- local host=$1
- local cid=$2
- local port=$3
+ local ns=$1
+ local host=$2
+ local cid=$3
+ local port=$4
local rc
set -o pipefail
if [[ "${host}" != server ]]; then
# log output and use pipefail to respect vsock_test errors
- vm_ssh -- "${VSOCK_TEST}" \
+ vm_ssh "${ns}" -- "${VSOCK_TEST}" \
--mode=client \
--control-host="${host}" \
--peer-cid="${cid}" \
@@ -374,7 +407,7 @@ vm_vsock_test() {
rc=$?
else
# log output and use pipefail to respect vsock_test errors
- vm_ssh -- "${VSOCK_TEST}" \
+ vm_ssh "${ns}" -- "${VSOCK_TEST}" \
--mode=server \
--peer-cid="${cid}" \
--control-port="${port}" \
@@ -386,7 +419,7 @@ vm_vsock_test() {
return $rc
fi
- vm_wait_for_listener "${port}"
+ vm_wait_for_listener "${ns}" "${port}"
rc=$?
fi
set +o pipefail
@@ -395,22 +428,28 @@ vm_vsock_test() {
}
host_vsock_test() {
- local host=$1
- local cid=$2
- local port=$3
+ local ns=$1
+ local host=$2
+ local cid=$3
+ local port=$4
local rc
+ local cmd="${VSOCK_TEST}"
+ if [[ "${ns}" != "init_ns" ]]; then
+ cmd="ip netns exec ${ns} ${cmd}"
+ fi
+
# log output and use pipefail to respect vsock_test errors
set -o pipefail
if [[ "${host}" != server ]]; then
- ${VSOCK_TEST} \
+ ${cmd} \
--mode=client \
--peer-cid="${cid}" \
--control-host="${host}" \
--control-port="${port}" 2>&1 | log_host
rc=$?
else
- ${VSOCK_TEST} \
+ ${cmd} \
--mode=server \
--peer-cid="${cid}" \
--control-port="${port}" 2>&1 | log_host &
@@ -420,7 +459,7 @@ host_vsock_test() {
return $rc
fi
- host_wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+ host_wait_for_listener "${ns}" "${port}"
rc=$?
fi
set +o pipefail
@@ -464,11 +503,11 @@ log_guest() {
}
test_vm_server_host_client() {
- if ! vm_vsock_test "server" 2 "${TEST_GUEST_PORT}"; then
+ if ! vm_vsock_test "init_ns" "server" 2 "${TEST_GUEST_PORT}"; then
return "${KSFT_FAIL}"
fi
- if ! host_vsock_test "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"; then
+ if ! host_vsock_test "init_ns" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"; then
return "${KSFT_FAIL}"
fi
@@ -476,11 +515,11 @@ test_vm_server_host_client() {
}
test_vm_client_host_server() {
- if ! host_vsock_test "server" "${VSOCK_CID}" "${TEST_HOST_PORT_LISTENER}"; then
+ if ! host_vsock_test "init_ns" "server" "${VSOCK_CID}" "${TEST_HOST_PORT_LISTENER}"; then
return "${KSFT_FAIL}"
fi
- if ! vm_vsock_test "10.0.2.2" 2 "${TEST_HOST_PORT_LISTENER}"; then
+ if ! vm_vsock_test "init_ns" "10.0.2.2" 2 "${TEST_HOST_PORT_LISTENER}"; then
return "${KSFT_FAIL}"
fi
@@ -490,13 +529,14 @@ test_vm_client_host_server() {
test_vm_loopback() {
local port=60000 # non-forwarded local port
- vm_ssh -- modprobe vsock_loopback &> /dev/null || :
+ vm_ssh "init_ns" -- modprobe vsock_loopback &> /dev/null || :
- if ! vm_vsock_test "server" 1 "${port}"; then
+ if ! vm_vsock_test "init_ns" "server" 1 "${port}"; then
return "${KSFT_FAIL}"
fi
- if ! vm_vsock_test "127.0.0.1" 1 "${port}"; then
+
+ if ! vm_vsock_test "init_ns" "127.0.0.1" 1 "${port}"; then
return "${KSFT_FAIL}"
fi
@@ -554,8 +594,8 @@ run_shared_vm_test() {
host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
host_warn_cnt_before=$(dmesg --level=warn | grep -c -i 'vsock')
- vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
- vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | grep -c -i 'vsock')
+ vm_oops_cnt_before=$(vm_ssh "init_ns" -- dmesg | grep -c -i 'Oops')
+ vm_warn_cnt_before=$(vm_ssh "init_ns" -- dmesg --level=warn | grep -c -i 'vsock')
name=$(echo "${1}" | awk '{ print $1 }')
eval test_"${name}"
@@ -573,13 +613,13 @@ run_shared_vm_test() {
rc=$KSFT_FAIL
fi
- vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+ vm_oops_cnt_after=$(vm_ssh "init_ns" -- dmesg | grep -i 'Oops' | wc -l)
if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
echo "FAIL: kernel oops detected on vm" | log_host
rc=$KSFT_FAIL
fi
- vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | grep -c -i vsock)
+ vm_warn_cnt_after=$(vm_ssh "init_ns" -- dmesg --level=warn | grep -c -i vsock)
if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
echo "FAIL: kernel warning detected on vm" | log_host
rc=$KSFT_FAIL
@@ -623,8 +663,8 @@ cnt_total=0
if shared_vm_tests_requested "${ARGS[@]}"; then
log_host "Booting up VM"
pidfile=$(mktemp $PIDFILE_TEMPLATE)
- vm_start "${pidfile}"
- vm_wait_for_ssh
+ vm_start "${pidfile}" "init_ns"
+ vm_wait_for_ssh "init_ns"
log_host "VM booted up"
run_shared_vm_tests "${ARGS[@]}"
--
2.47.3
* [PATCH net-next v8 10/14] selftests/vsock: add tests for proc sys vsock ns_mode
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (8 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 09/14] selftests/vsock: prepare vm management helpers for namespaces Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 11/14] selftests/vsock: add namespace tests for CID collisions Bobby Eshleman
` (4 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests for the /proc/sys/net/vsock/ns_mode interface. Specifically,
check that it accepts the "global" and "local" strings and that it
enforces a write-once policy.
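The expected write-once behavior can be modeled in userspace. The sketch
below is only a stand-in: write_once() is a hypothetical function that
operates on a regular temporary file, not on the real ns_mode sysctl, and
it assumes that any second write is rejected, which is what the
write-once test expects.

```shell
#!/bin/sh
# Userspace model only: accept the first write to a file, reject later ones.
write_once() {
	file=$1
	value=$2

	# A non-empty file means the mode has already been set.
	if [ -s "${file}" ]; then
		return 1
	fi
	printf '%s\n' "${value}" > "${file}"
}

state=$(mktemp)
write_once "${state}" local && echo "first write accepted"
write_once "${state}" global || echo "second write rejected"
echo "mode is still: $(cat "${state}")"
rm -f "${state}"
```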
Start a convention of commenting the test name over the test
description. Add test name comments over test descriptions that existed
before this convention.
Add a check_netns() function that checks if the test requires namespaces
and if the current kernel supports namespaces. Skip tests that require
namespaces if the system does not have namespace support.
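A minimal sketch of that decision, assuming (as the patch does) that
namespace tests are identified by an "ns_" name prefix and that
/proc/self/ns indicates namespace support. requires_netns() is a
hypothetical name used for illustration; the patch implements the check
inside check_netns().

```shell
#!/bin/sh
# Sketch: a test needs namespace support when its name starts with "ns_".
requires_netns() {
	case "$1" in
	ns_*) return 0 ;;
	*) return 1 ;;
	esac
}

for t in vm_loopback ns_host_vsock_ns_mode_ok; do
	if requires_netns "${t}" && [ ! -e /proc/self/ns ]; then
		echo "SKIP ${t}: no namespace support"
	else
		echo "RUN  ${t}"
	fi
done
```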
This patch is the first to add tests that do *not* re-use the same
shared VM. For that reason, it adds a run_tests() function to run these
tests and filter out the shared VM tests.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 99 ++++++++++++++++++++++++++++++++-
1 file changed, 98 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index d047f6d27df4..b775fb0cd4ed 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -38,11 +38,28 @@ readonly KERNEL_CMDLINE="\
virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \
"
readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log)
-readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly TEST_NAMES=(
+ vm_server_host_client
+ vm_client_host_server
+ vm_loopback
+ ns_host_vsock_ns_mode_ok
+ ns_host_vsock_ns_mode_write_once_ok
+)
readonly TEST_DESCS=(
+ # vm_server_host_client
"Run vsock_test in server mode on the VM and in client mode on the host."
+
+ # vm_client_host_server
"Run vsock_test in client mode on the VM and in server mode on the host."
+
+ # vm_loopback
"Run vsock_test using the loopback transport in the VM."
+
+ # ns_host_vsock_ns_mode_ok
+ "Check /proc/sys/net/vsock/ns_mode strings on the host."
+
+ # ns_host_vsock_ns_mode_write_once_ok
+ "Check /proc/sys/net/vsock/ns_mode is write-once on the host."
)
readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
@@ -203,6 +220,20 @@ check_deps() {
fi
}
+check_netns() {
+ local tname=$1
+
+ # If the test requires NS support, check if NS support exists
+ # using /proc/self/ns
+ if [[ "${tname}" =~ ^ns_ ]] &&
+ [[ ! -e /proc/self/ns ]]; then
+ log_host "No NS support detected for test ${tname}"
+ return 1
+ fi
+
+ return 0
+}
+
check_vng() {
local tested_versions
local version
@@ -502,6 +533,43 @@ log_guest() {
LOG_PREFIX=guest log $@
}
+test_ns_host_vsock_ns_mode_ok() {
+ add_namespaces
+
+ for mode in "${NS_MODES[@]}"; do
+ if ! ns_set_mode "${mode}0" "${mode}"; then
+ del_namespaces
+ return "${KSFT_FAIL}"
+ fi
+ done
+
+ del_namespaces
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_host_vsock_ns_mode_write_once_ok() {
+ add_namespaces
+
+ for mode in "${NS_MODES[@]}"; do
+ local ns="${mode}0"
+ if ! ns_set_mode "${ns}" "${mode}"; then
+ del_namespaces
+ return "${KSFT_FAIL}"
+ fi
+
+ # try writing again and expect failure
+ if ns_set_mode "${ns}" "${mode}"; then
+ del_namespaces
+ return "${KSFT_FAIL}"
+ fi
+ done
+
+ del_namespaces
+
+ return "${KSFT_PASS}"
+}
+
test_vm_server_host_client() {
if ! vm_vsock_test "init_ns" "server" 2 "${TEST_GUEST_PORT}"; then
return "${KSFT_FAIL}"
@@ -575,6 +643,11 @@ run_shared_vm_tests() {
continue
fi
+ if ! check_netns "${arg}"; then
+ check_result "${KSFT_SKIP}"
+ continue
+ fi
+
run_shared_vm_test "${arg}"
check_result $?
done
@@ -628,6 +701,28 @@ run_shared_vm_test() {
return "${rc}"
}
+run_tests() {
+ for arg in "${ARGS[@]}"; do
+ if shared_vm_test "${arg}"; then
+ continue
+ fi
+
+ if ! check_netns "${arg}"; then
+ check_result "${KSFT_SKIP}"
+ continue
+ fi
+
+ add_namespaces
+
+ name=$(echo "${arg}" | awk '{ print $1 }')
+ log_host "Executing test_${name}"
+ eval test_"${name}"
+ check_result $?
+
+ del_namespaces
+ done
+}
+
BUILD=0
QEMU="qemu-system-$(uname -m)"
@@ -671,6 +766,8 @@ if shared_vm_tests_requested "${ARGS[@]}"; then
terminate_pidfiles "${pidfile}"
fi
+run_tests "${ARGS[@]}"
+
echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
echo "Log: ${LOG}"
--
2.47.3
* [PATCH net-next v8 11/14] selftests/vsock: add namespace tests for CID collisions
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (9 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 10/14] selftests/vsock: add tests for proc sys vsock ns_mode Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 12/14] selftests/vsock: add tests for host <-> vm connectivity with namespaces Bobby Eshleman
` (3 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests to verify CID collision rules across different vsock namespace
modes.
1. Two VMs with the same CID cannot start in different global namespaces
(ns_global_same_cid_fails)
2. Two VMs with the same CID can start in different local namespaces
(ns_local_same_cid_ok)
3. VMs with the same CID can coexist when one is in a global namespace
and another is in a local namespace (ns_global_local_same_cid_ok and
ns_local_global_same_cid_ok)
The tests ns_global_local_same_cid_ok and ns_local_global_same_cid_ok
make sure that ordering does not matter.
The tests use a shared helper function namespaces_can_boot_same_cid()
that attempts to start two VMs with identical CIDs in the specified
namespaces and verifies whether VM initialization failed or succeeded.
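The three rules above can be summarized as a small predicate. This is a
sketch of the expected outcomes only, not of how the kernel decides:
same_cid_allowed() is a hypothetical function, and it assumes the two VMs
are started in two different namespaces.

```shell
#!/bin/sh
# Sketch of the rules listed above: the same CID collides only when both
# VMs run in global-mode namespaces. Arguments are the two ns modes.
# Assumes the two VMs are in different namespaces.
same_cid_allowed() {
	if [ "$1" = "global" ] && [ "$2" = "global" ]; then
		return 1
	fi
	return 0
}

for pair in "global global" "local local" "global local" "local global"; do
	# Intentional word splitting: each pair holds two mode names.
	set -- ${pair}
	if same_cid_allowed "$1" "$2"; then
		echo "$1 + $2: same CID allowed"
	else
		echo "$1 + $2: same CID refused"
	fi
done
```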
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 74 +++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index b775fb0cd4ed..f2a99cde9fb4 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -44,6 +44,10 @@ readonly TEST_NAMES=(
vm_loopback
ns_host_vsock_ns_mode_ok
ns_host_vsock_ns_mode_write_once_ok
+ ns_global_same_cid_fails
+ ns_local_same_cid_ok
+ ns_global_local_same_cid_ok
+ ns_local_global_same_cid_ok
)
readonly TEST_DESCS=(
# vm_server_host_client
@@ -60,6 +64,18 @@ readonly TEST_DESCS=(
# ns_host_vsock_ns_mode_write_once_ok
"Check /proc/sys/net/vsock/ns_mode is write-once on the host."
+
+ # ns_global_same_cid_fails
+ "Check QEMU fails to start two VMs with same CID in two different global namespaces."
+
+ # ns_local_same_cid_ok
+ "Check QEMU successfully starts two VMs with same CID in two different local namespaces."
+
+ # ns_global_local_same_cid_ok
+ "Check QEMU successfully starts one VM in a global ns and then another VM in a local ns with the same CID."
+
+ # ns_local_global_same_cid_ok
+ "Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
)
readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
@@ -548,6 +564,64 @@ test_ns_host_vsock_ns_mode_ok() {
return "${KSFT_PASS}"
}
+namespaces_can_boot_same_cid() {
+ local ns0=$1
+ local ns1=$2
+ local pidfile1 pidfile2
+ local rc
+
+ pidfile1=$(mktemp $PIDFILE_TEMPLATE)
+ vm_start "${pidfile1}" "${ns0}"
+
+ pidfile2=$(mktemp $PIDFILE_TEMPLATE)
+ vm_start "${pidfile2}" "${ns1}"
+
+ rc=$?
+ terminate_pidfiles "${pidfile1}" "${pidfile2}"
+
+ return $rc
+}
+
+test_ns_global_same_cid_fails() {
+ init_namespaces
+
+ if namespaces_can_boot_same_cid "global0" "global1"; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_local_global_same_cid_ok() {
+ init_namespaces
+
+ if namespaces_can_boot_same_cid "local0" "global0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_global_local_same_cid_ok() {
+ init_namespaces
+
+ if namespaces_can_boot_same_cid "global0" "local0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_local_same_cid_ok() {
+ init_namespaces
+
+ if namespaces_can_boot_same_cid "local0" "local1"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
test_ns_host_vsock_ns_mode_write_once_ok() {
add_namespaces
--
2.47.3
* [PATCH net-next v8 12/14] selftests/vsock: add tests for host <-> vm connectivity with namespaces
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (10 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 11/14] selftests/vsock: add namespace tests for CID collisions Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 13/14] selftests/vsock: add tests for namespace deletion and mode changes Bobby Eshleman
` (2 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests to validate namespace correctness using vsock_test and socat.
The vsock_test tool is used for tests that expect success, while socat
is used for tests that expect failure. Using socat ensures that
connections are rejected outright rather than failing due to some other
socket behavior (as tested in vsock_test). socat is also already
required for tunneling TCP traffic from vsock_test. Using only one of
the vsock_test tests, such as 'test_stream_client_close_client', would
have yielded a similar result, but doing so would not remove the socat
dependency.
Also check for the socat dependency. socat needs special handling beyond
checking that it is on the path, because it must be compiled with
support for both vsock and unix sockets. The function check_socat()
checks that this support exists.
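The feature check can be sketched without socat installed by matching
against text shaped like the 'socat -V' output. has_feature() and the
sample text are made up for illustration; the patch itself matches the
strings "WITH_VSOCK 1" and "WITH_UNIX 1" inside check_socat().

```shell
#!/bin/sh
# Sketch: check a feature flag in text shaped like `socat -V` output.
# $1 = version text, $2 = feature name (e.g. WITH_VSOCK).
has_feature() {
	case "$1" in
	*"$2 1"*) return 0 ;;
	*) return 1 ;;
	esac
}

# Made-up sample text for illustration; real output is much longer.
sample='#define WITH_UNIX 1
#define WITH_VSOCK 1'

for feature in WITH_VSOCK WITH_UNIX; do
	if has_feature "${sample}" "${feature}"; then
		echo "${feature}: present"
	else
		echo "${feature}: missing"
	fi
done
```

On a real system the first argument would be "$(socat -V)".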
Add more padding to test name printf strings because the tests added in
this patch would otherwise overflow.
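The outcomes encoded in the new test names can be read as a small
reachability rule. The sketch below is an interpretation of those names,
not kernel logic: expect_connect() is a hypothetical function that takes
two namespace names in the global0/local1 style used by vmtest.sh.

```shell
#!/bin/sh
# Sketch: expected result of a vsock connection between two namespaces,
# derived from the *_ok and *_fails test names added in this patch.
expect_connect() {
	# Strip the trailing digit to recover the mode from the name.
	mode_a=${1%[0-9]}
	mode_b=${2%[0-9]}

	# Same namespace: always reachable.
	[ "$1" = "$2" ] && return 0
	# Different namespaces: reachable only if both are global.
	[ "${mode_a}" = "global" ] && [ "${mode_b}" = "global" ]
}

for pair in "global0 global1" "global0 local0" "local0 local1" "local0 local0"; do
	# Intentional word splitting: each pair holds two namespace names.
	set -- ${pair}
	if expect_connect "$1" "$2"; then
		echo "$1 -> $2: expect ok"
	else
		echo "$1 -> $2: expect fails"
	fi
done
```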
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 463 +++++++++++++++++++++++++++++++-
1 file changed, 461 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index f2a99cde9fb4..60d349c80153 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -7,6 +7,7 @@
# * virtme-ng
# * busybox-static (used by virtme-ng)
# * qemu (used by virtme-ng)
+# * socat
readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../)
@@ -48,6 +49,19 @@ readonly TEST_NAMES=(
ns_local_same_cid_ok
ns_global_local_same_cid_ok
ns_local_global_same_cid_ok
+ ns_diff_global_host_connect_to_global_vm_ok
+ ns_diff_global_host_connect_to_local_vm_fails
+ ns_diff_global_vm_connect_to_global_host_ok
+ ns_diff_global_vm_connect_to_local_host_fails
+ ns_diff_local_host_connect_to_local_vm_fails
+ ns_diff_local_vm_connect_to_local_host_fails
+ ns_diff_global_to_local_loopback_local_fails
+ ns_diff_local_to_global_loopback_fails
+ ns_diff_local_to_local_loopback_fails
+ ns_diff_global_to_global_loopback_ok
+ ns_same_local_loopback_ok
+ ns_same_local_host_connect_to_local_vm_ok
+ ns_same_local_vm_connect_to_local_host_ok
)
readonly TEST_DESCS=(
# vm_server_host_client
@@ -76,6 +90,45 @@ readonly TEST_DESCS=(
# ns_local_global_same_cid_ok
"Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
+
+ # ns_diff_global_host_connect_to_global_vm_ok
+ "Run vsock_test client in global ns with server in VM in another global ns."
+
+ # ns_diff_global_host_connect_to_local_vm_fails
+ "Run socat to test a process in a global ns fails to connect to a VM in a local ns."
+
+ # ns_diff_global_vm_connect_to_global_host_ok
+ "Run vsock_test client in VM in a global ns with server in another global ns."
+
+ # ns_diff_global_vm_connect_to_local_host_fails
+ "Run socat to test a VM in a global ns fails to connect to a host process in a local ns."
+
+ # ns_diff_local_host_connect_to_local_vm_fails
+ "Run socat to test a host process in a local ns fails to connect to a VM in another local ns."
+
+ # ns_diff_local_vm_connect_to_local_host_fails
+ "Run socat to test a VM in a local ns fails to connect to a host process in another local ns."
+
+ # ns_diff_global_to_local_loopback_local_fails
+ "Run socat to test a loopback vsock in a global ns fails to connect to a vsock in a local ns."
+
+ # ns_diff_local_to_global_loopback_fails
+ "Run socat to test a loopback vsock in a local ns fails to connect to a vsock in a global ns."
+
+ # ns_diff_local_to_local_loopback_fails
+ "Run socat to test a loopback vsock in a local ns fails to connect to a vsock in another local ns."
+
+ # ns_diff_global_to_global_loopback_ok
+ "Run socat to test a loopback vsock in a global ns successfully connects to a vsock in another global ns."
+
+ # ns_same_local_loopback_ok
+ "Run socat to test a loopback vsock in a local ns successfully connects to a vsock in the same ns."
+
+ # ns_same_local_host_connect_to_local_vm_ok
+ "Run vsock_test client in a local ns with server in VM in same ns."
+
+ # ns_same_local_vm_connect_to_local_host_ok
+ "Run vsock_test client in VM in a local ns with server in same ns."
)
readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
@@ -102,7 +155,7 @@ usage() {
for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
name=${TEST_NAMES[${i}]}
desc=${TEST_DESCS[${i}]}
- printf "\t%-35s%-35s\n" "${name}" "${desc}"
+ printf "\t%-55s%-35s\n" "${name}" "${desc}"
done
echo
@@ -222,7 +275,7 @@ check_args() {
}
check_deps() {
- for dep in vng ${QEMU} busybox pkill ssh; do
+ for dep in vng ${QEMU} busybox pkill ssh socat; do
if [[ ! -x $(command -v "${dep}") ]]; then
echo -e "skip: dependency ${dep} not found!\n"
exit "${KSFT_SKIP}"
@@ -273,6 +326,20 @@ check_vng() {
fi
}
+check_socat() {
+ local support_string
+
+ support_string="$(socat -V)"
+
+ if [[ "${support_string}" != *"WITH_VSOCK 1"* ]]; then
+ die "err: socat is missing vsock support"
+ fi
+
+ if [[ "${support_string}" != *"WITH_UNIX 1"* ]]; then
+ die "err: socat is missing unix support"
+ fi
+}
+
handle_build() {
if [[ ! "${BUILD}" -eq 1 ]]; then
return
@@ -310,6 +377,14 @@ terminate_pidfiles() {
done
}
+terminate_pids() {
+ local pid
+
+ for pid in "$@"; do
+ kill -SIGTERM "${pid}" &>/dev/null || :
+ done
+}
+
vm_start() {
local pidfile=$1
local ns=$2
@@ -564,6 +639,389 @@ test_ns_host_vsock_ns_mode_ok() {
return "${KSFT_PASS}"
}
+test_ns_diff_global_host_connect_to_global_vm_ok() {
+ local pids pid pidfile
+ local ns0 ns1 port
+ declare -a pids
+ local unixfile
+ ns0="global0"
+ ns1="global1"
+ port=1234
+ local rc
+
+ init_namespaces
+
+ pidfile=$(mktemp $PIDFILE_TEMPLATE)
+
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ return "${KSFT_FAIL}"
+ fi
+
+ unixfile=$(mktemp -u /tmp/XXXX.sock)
+ ip netns exec "${ns1}" \
+ socat TCP-LISTEN:"${TEST_HOST_PORT}",fork \
+ UNIX-CONNECT:"${unixfile}" &
+ pids+=($!)
+ host_wait_for_listener "${ns1}" "${TEST_HOST_PORT}"
+
+ ip netns exec "${ns0}" socat UNIX-LISTEN:"${unixfile}",fork \
+ TCP-CONNECT:localhost:"${TEST_HOST_PORT}" &
+ pids+=($!)
+
+ vm_vsock_test "${ns0}" "server" 2 "${TEST_GUEST_PORT}"
+ vm_wait_for_listener "${ns0}" "${TEST_GUEST_PORT}"
+ host_vsock_test "${ns1}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+ rc=$?
+
+ for pid in "${pids[@]}"; do
+ if [[ "$(jobs -p)" = *"${pid}"* ]]; then
+ kill -SIGTERM "${pid}" &>/dev/null
+ fi
+ done
+
+ terminate_pidfiles "${pidfile}"
+
+ if [[ $rc -ne 0 ]]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_diff_global_host_connect_to_local_vm_fails() {
+ local ns0="global0"
+ local ns1="local0"
+ local port=12345
+ local pidfile
+ local result
+ local pid
+
+ init_namespaces
+
+ outfile=$(mktemp)
+
+ pidfile=$(mktemp $PIDFILE_TEMPLATE)
+ if ! vm_start "${pidfile}" "${ns1}"; then
+ log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns1})"
+ return $KSFT_FAIL
+ fi
+
+ vm_wait_for_ssh "${ns1}"
+ vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+ echo TEST | ip netns exec "${ns0}" \
+ socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+
+ terminate_pidfiles "${pidfile}"
+
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" != TEST ]]; then
+ return $KSFT_PASS
+ fi
+
+ return $KSFT_FAIL
+}
+
+test_ns_diff_global_vm_connect_to_global_host_ok() {
+ local ns0="global0"
+ local ns1="global1"
+ local port=12345
+ local unixfile
+ local pidfile
+ local pids
+
+ init_namespaces
+
+ declare -a pids
+
+ log_host "Setup socat bridge from ns ${ns0} to ns ${ns1} over port ${port}"
+
+ unixfile=$(mktemp -u /tmp/XXXX.sock)
+
+ ip netns exec "${ns0}" \
+ socat TCP-LISTEN:"${port}" UNIX-CONNECT:"${unixfile}" &
+ pids+=($!)
+
+ ip netns exec "${ns1}" \
+ socat UNIX-LISTEN:"${unixfile}" TCP-CONNECT:127.0.0.1:"${port}" &
+ pids+=($!)
+
+ log_host "Launching ${VSOCK_TEST} in ns ${ns1}"
+ host_vsock_test "${ns1}" "server" "${VSOCK_CID}" "${port}"
+
+ pidfile=$(mktemp $PIDFILE_TEMPLATE)
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+ terminate_pids "${pids[@]}"
+ rm -f "${unixfile}"
+ return $KSFT_FAIL
+ fi
+
+ vm_wait_for_ssh "${ns0}"
+ vm_vsock_test "${ns0}" "10.0.2.2" 2 "${port}"
+ rc=$?
+
+ terminate_pidfiles "${pidfile}"
+ terminate_pids "${pids[@]}"
+ rm -f "${unixfile}"
+
+ if [[ ! $rc -eq 0 ]]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+
+}
+
+test_ns_diff_global_vm_connect_to_local_host_fails() {
+ local ns0="global0"
+ local ns1="local0"
+ local port=12345
+ local pidfile
+ local result
+ local pid
+
+ init_namespaces
+
+ log_host "Launching socat in ns ${ns1}"
+ outfile=$(mktemp)
+ ip netns exec "${ns1}" socat VSOCK-LISTEN:${port} STDOUT &> "${outfile}" &
+ pid=$!
+
+ pidfile=$(mktemp $PIDFILE_TEMPLATE)
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+ terminate_pids "${pid}"
+ rm -f "${outfile}"
+ return $KSFT_FAIL
+ fi
+
+ vm_wait_for_ssh "${ns0}"
+
+ vm_ssh "${ns0}" -- \
+ bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+
+ terminate_pidfiles "${pidfile}"
+ terminate_pids "${pid}"
+
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" != TEST ]]; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_host_connect_to_local_vm_fails() {
+ local ns0="local0"
+ local ns1="local1"
+ local port=12345
+ local pidfile
+ local result
+ local pid
+
+ init_namespaces
+
+ outfile=$(mktemp)
+
+ pidfile=$(mktemp $PIDFILE_TEMPLATE)
+ if ! vm_start "${pidfile}" "${ns1}"; then
+ log_host "failed to start vm (cid=${cid}, ns=${ns1})"
+ return $KSFT_FAIL
+ fi
+
+ vm_wait_for_ssh "${ns1}"
+ vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+ echo TEST | ip netns exec "${ns0}" \
+ socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+
+ terminate_pidfiles "${pidfile}"
+
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" != TEST ]]; then
+ return $KSFT_PASS
+ fi
+
+ return $KSFT_FAIL
+}
+
+test_ns_diff_local_vm_connect_to_local_host_fails() {
+ local ns0="local0"
+ local ns1="local1"
+ local port=12345
+ local pidfile
+ local result
+ local pid
+
+ init_namespaces
+
+ log_host "Launching socat in ns ${ns1}"
+ outfile=$(mktemp)
+ ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" &
+ pid=$!
+
+ pidfile=$(mktemp $PIDFILE_TEMPLATE)
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ log_host "failed to start vm (cid=${cid}, ns=${ns0})"
+ rm -f "${outfile}"
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_wait_for_ssh "${ns0}"
+
+ vm_ssh "${ns0}" -- \
+ bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+
+ terminate_pidfiles "${pidfile}"
+ terminate_pids "${pid}"
+
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" != TEST ]]; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+__test_loopback_two_netns() {
+ local ns0=$1
+ local ns1=$2
+ local port=12345
+ local result
+ local pid
+
+ modprobe vsock_loopback &> /dev/null || :
+
+ log_host "Launching socat in ns ${ns1}"
+ outfile=$(mktemp)
+ ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" 2>/dev/null &
+ pid=$!
+
+ log_host "Launching socat in ns ${ns0}"
+ echo TEST | ip netns exec "${ns0}" socat STDIN VSOCK-CONNECT:1:"${port}" 2>/dev/null
+ terminate_pids "${pid}"
+
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" == TEST ]]; then
+ return 0
+ fi
+
+ return 1
+}
+
+test_ns_diff_global_to_local_loopback_local_fails() {
+ init_namespaces
+
+ if ! __test_loopback_two_netns "global0" "local0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_to_global_loopback_fails() {
+ init_namespaces
+
+ if ! __test_loopback_two_netns "local0" "global0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_to_local_loopback_fails() {
+ init_namespaces
+
+ if ! __test_loopback_two_netns "local0" "local1"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_diff_global_to_global_loopback_ok() {
+ init_namespaces
+
+ if __test_loopback_two_netns "global0" "global1"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_same_local_loopback_ok() {
+ init_namespaces
+
+ if __test_loopback_two_netns "local0" "local0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_same_local_host_connect_to_local_vm_ok() {
+ local ns="local0"
+ local port=1234
+ local pidfile
+ local rc
+
+ init_namespaces
+
+ pidfile=$(mktemp $PIDFILE_TEMPLATE)
+
+ if ! vm_start "${pidfile}" "${ns}"; then
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}"
+ host_vsock_test "${ns}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+ rc=$?
+
+ terminate_pidfiles "${pidfile}"
+
+ if [[ $rc -ne 0 ]]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_same_local_vm_connect_to_local_host_ok() {
+ local ns="local0"
+ local port=1234
+ local pidfile
+ local rc
+
+ init_namespaces
+
+ pidfile=$(mktemp $PIDFILE_TEMPLATE)
+
+ if ! vm_start "${pidfile}" "${ns}"; then
+ return "${KSFT_FAIL}"
+ fi
+
+ host_vsock_test "${ns}" "server" "${VSOCK_CID}" "${port}"
+ vm_vsock_test "${ns}" "10.0.2.2" 2 "${port}"
+ rc=$?
+
+ terminate_pidfiles "${pidfile}"
+
+ if [[ $rc -ne 0 ]]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+}
+
namespaces_can_boot_same_cid() {
local ns0=$1
local ns1=$2
@@ -820,6 +1278,7 @@ fi
check_args "${ARGS[@]}"
check_deps
check_vng
+check_socat
handle_build
echo "1..${#ARGS[@]}"
--
2.47.3
* [PATCH net-next v8 13/14] selftests/vsock: add tests for namespace deletion and mode changes
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (11 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 12/14] selftests/vsock: add tests for host <-> vm connectivity with namespaces Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 14/14] selftests/vsock: add tests for module loading order Bobby Eshleman
2025-10-27 13:28 ` [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Stefano Garzarella
14 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests that validate vsock sockets are resilient to deleting
namespaces or changing namespace modes from global to local. The vsock
sockets should still function normally.
The function check_ns_changes_dont_break_connection() is added to reuse
the step-by-step logic of 1) setting up connections, 2) doing something
that might break the connections, and 3) checking that the connections
are still ok.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 123 ++++++++++++++++++++++++++++++++
1 file changed, 123 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 60d349c80153..014cecd93858 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -62,6 +62,12 @@ readonly TEST_NAMES=(
ns_same_local_loopback_ok
ns_same_local_host_connect_to_local_vm_ok
ns_same_local_vm_connect_to_local_host_ok
+ ns_mode_change_connection_continue_vm_ok
+ ns_mode_change_connection_continue_host_ok
+ ns_mode_change_connection_continue_both_ok
+ ns_delete_vm_ok
+ ns_delete_host_ok
+ ns_delete_both_ok
)
readonly TEST_DESCS=(
# vm_server_host_client
@@ -129,6 +135,24 @@ readonly TEST_DESCS=(
# ns_same_local_vm_connect_to_local_host_ok
"Run vsock_test client in VM in a local ns with server in same ns."
+
+ # ns_mode_change_connection_continue_vm_ok
+ "Check that changing NS mode of VM namespace from global to local after a connection is established doesn't break the connection"
+
+ # ns_mode_change_connection_continue_host_ok
+ "Check that changing NS mode of host namespace from global to local after a connection is established doesn't break the connection"
+
+ # ns_mode_change_connection_continue_both_ok
+ "Check that changing NS mode of host and VM namespaces from global to local after a connection is established doesn't break the connection"
+
+ # ns_delete_vm_ok
+ "Check that deleting the VM's namespace does not break the socket connection"
+
+ # ns_delete_host_ok
+ "Check that deleting the host's namespace does not break the socket connection"
+
+ # ns_delete_both_ok
+ "Check that deleting the VM and host's namespaces does not break the socket connection"
)
readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
@@ -1143,6 +1167,105 @@ test_vm_loopback() {
return "${KSFT_PASS}"
}
+check_ns_changes_dont_break_connection() {
+ local ns0="global0"
+ local ns1="global1"
+ local port=12345
+ local pidfile
+ local outfile
+ local pids=()
+ local rc=0
+
+ init_namespaces
+
+ pidfile=$(mktemp $PIDFILE_TEMPLATE)
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ return "${KSFT_FAIL}"
+ fi
+ vm_wait_for_ssh "${ns0}"
+
+ outfile=$(mktemp)
+ vm_ssh "${ns0}" -- \
+ socat VSOCK-LISTEN:"${port}",fork STDOUT > "${outfile}" 2>/dev/null &
+ pids+=($!)
+
+ # wait_for_listener() does not work for vsock because vsock does not
+ # export socket state to /proc/net/. Instead, we have no choice but to
+ # sleep for some hardcoded time.
+ sleep ${WAIT_PERIOD}
+
+ # We use a pipe here so that we can echo into the pipe instead of
+ # using socat and a unix socket file.
+ local pipefile=$(mktemp -u /tmp/vmtest_pipe_XXXX)
+ ip netns exec "${ns1}" \
+ socat PIPE:"${pipefile}" VSOCK-CONNECT:"${VSOCK_CID}":"${port}" &
+ pids+=($!)
+
+ timeout ${WAIT_PERIOD} \
+ bash -c 'while [[ ! -e '"${pipefile}"' ]]; do sleep 1; done; exit 0'
+
+ if [[ $2 == "delete" ]]; then
+ if [[ "$1" == "vm" ]]; then
+ ip netns del "${ns0}"
+ elif [[ "$1" == "host" ]]; then
+ ip netns del "${ns1}"
+ elif [[ "$1" == "both" ]]; then
+ ip netns del "${ns0}"
+ ip netns del "${ns1}"
+ fi
+ elif [[ $2 == "change_mode" ]]; then
+ if [[ "$1" == "vm" ]]; then
+ ns_set_mode "${ns0}" "local"
+ elif [[ "$1" == "host" ]]; then
+ ns_set_mode "${ns1}" "local"
+ elif [[ "$1" == "both" ]]; then
+ ns_set_mode "${ns0}" "local"
+ ns_set_mode "${ns1}" "local"
+ fi
+ fi
+
+ echo "TEST" > "${pipefile}"
+
+ timeout ${WAIT_PERIOD} \
+ bash -c 'while [[ ! -s '"${outfile}"' ]]; do sleep 1; done; exit 0'
+
+ if grep -q "TEST" "${outfile}"; then
+ rc="${KSFT_PASS}"
+ else
+ rc="${KSFT_FAIL}"
+ fi
+
+ terminate_pidfiles "${pidfile}"
+ terminate_pids "${pids[@]}"
+ rm -f "${outfile}" "${pipefile}"
+
+ return "${rc}"
+}
+
+test_ns_mode_change_connection_continue_vm_ok() {
+ check_ns_changes_dont_break_connection "vm" "change_mode"
+}
+
+test_ns_mode_change_connection_continue_host_ok() {
+ check_ns_changes_dont_break_connection "host" "change_mode"
+}
+
+test_ns_mode_change_connection_continue_both_ok() {
+ check_ns_changes_dont_break_connection "both" "change_mode"
+}
+
+test_ns_delete_vm_ok() {
+ check_ns_changes_dont_break_connection "vm" "delete"
+}
+
+test_ns_delete_host_ok() {
+ check_ns_changes_dont_break_connection "host" "delete"
+}
+
+test_ns_delete_both_ok() {
+ check_ns_changes_dont_break_connection "both" "delete"
+}
+
shared_vm_test() {
local tname
--
2.47.3
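
The setup / perturb / verify flow that check_ns_changes_dont_break_connection() follows in the patch above can be tried in isolation. The following is only a rough sketch under loud assumptions: the function name is hypothetical and not part of vmtest.sh, and a FIFO with a blocked reader stands in for the vsock connection so that it runs without a VM, socat, or namespaces.

```shell
# Hypothetical helper, not part of vmtest.sh: a FIFO stands in for the vsock
# connection so the flow can be tried without a VM or namespaces.
check_perturbation_keeps_channel() {
	local perturb="$1"
	local dir fifo out reader rc

	dir=$(mktemp -d) || return 1
	fifo="${dir}/chan"
	out="${dir}/out"
	mkfifo "${fifo}" || { rm -rf "${dir}"; return 1; }

	# 1) Set up the "connection": a reader that blocks on the FIFO.
	cat "${fifo}" > "${out}" &
	reader=$!

	# 2) Do something that might break it (namespace deletion or a
	#    mode change in the real tests).
	"${perturb}"

	# 3) Check that data still gets through.
	echo TEST > "${fifo}"
	wait "${reader}"

	rc=1
	if grep -q TEST "${out}"; then
		rc=0
	fi

	rm -rf "${dir}"
	return "${rc}"
}
```

Calling it as `check_perturbation_keeps_channel true` exercises the path with a no-op perturbation. If a perturbation really did kill the reader, step 3 would block on the FIFO, so a real harness would probably want a timeout there, much as the patch wraps its polling loops in timeout.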
* [PATCH net-next v8 14/14] selftests/vsock: add tests for module loading order
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (12 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 13/14] selftests/vsock: add tests for namespace deletion and mode changes Bobby Eshleman
@ 2025-10-23 18:27 ` Bobby Eshleman
2025-10-27 13:28 ` [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Stefano Garzarella
14 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-23 18:27 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, Broadcom internal kernel review list, Bobby Eshleman
Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
linux-hyperv, berrange, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests to check that module loading order does not break
vsock_loopback. Because vsock_loopback has some per-namespace data
structure initialization that affects vsock namespace modes, let's make
sure that namespace modes are respected and loopback sockets are
functional even when the namespaces and modes are set prior to loading
the vsock_loopback module.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 138 ++++++++++++++++++++++++++++++++
1 file changed, 138 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 014cecd93858..9aa3200b160f 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -68,6 +68,8 @@ readonly TEST_NAMES=(
ns_delete_vm_ok
ns_delete_host_ok
ns_delete_both_ok
+ ns_loopback_global_global_late_module_load_ok
+ ns_loopback_local_local_late_module_load_fails
)
readonly TEST_DESCS=(
# vm_server_host_client
@@ -153,6 +155,12 @@ readonly TEST_DESCS=(
# ns_delete_both_ok
"Check that deleting the VM and host's namespaces does not break the socket connection"
+
+ # ns_loopback_global_global_late_module_load_ok
+ "Test that loopback still works in global namespaces initialized prior to loading the vsock_loopback kmod"
+
+ # ns_loopback_local_local_late_module_load_fails
+ "Test that loopback connections still fail between local namespaces initialized prior to loading the vsock_loopback kmod"
)
readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
@@ -914,6 +922,30 @@ test_ns_diff_local_vm_connect_to_local_host_fails() {
return "${KSFT_FAIL}"
}
+unload_module() {
+ local module=$1
+ local retries=5
+ readonly retries
+ local delay=1
+ local i
+
+ # Sometimes previously executed tests may result in a delayed release
+ # of the reference to the vsock_loopback module and result in the
+ # module being unremovable. For that reason, we use retries to allow
+ # some time for those references to be dropped.
+ for ((i = 0; i < ${retries}; i++)); do
+ modprobe -r "${module}" 2>/dev/null || :
+
+ if [[ "$(lsmod | grep -c "${module}")" -eq 0 ]]; then
+ return 0
+ fi
+
+ sleep ${delay}
+ done
+
+ return 1
+}
+
__test_loopback_two_netns() {
local ns0=$1
local ns1=$2
@@ -1266,6 +1298,112 @@ test_ns_delete_both_ok() {
check_ns_changes_dont_break_connection "both" "delete"
}
+test_ns_loopback_global_global_late_module_load_ok() {
+ declare -a pids
+ local unixfile
+ local ns0 ns1
+ local pids
+ local port
+
+ if ! unload_module vsock_loopback; then
+ log_host "Unable to unload vsock_loopback, skipping..."
+ return "${KSFT_SKIP}"
+ fi
+
+ ns0=loopback_ns0
+ ns1=loopback_ns1
+
+ ip netns del "${ns0}" &>/dev/null || :
+ ip netns del "${ns1}" &>/dev/null || :
+ ip netns add "${ns0}"
+ ip netns add "${ns1}"
+ ns_set_mode "${ns0}" global
+ ns_set_mode "${ns1}" global
+ ip netns exec "${ns0}" ip link set dev lo up
+ ip netns exec "${ns1}" ip link set dev lo up
+
+ modprobe vsock_loopback &> /dev/null || :
+
+ unixfile=$(mktemp -u /tmp/XXXX.sock)
+ port=321
+ ip netns exec "${ns1}" \
+ socat TCP-LISTEN:"${port}",fork \
+ UNIX-CONNECT:"${unixfile}" &
+ pids+=($!)
+
+ host_wait_for_listener "${ns1}" "${port}"
+ ip netns exec "${ns0}" socat UNIX-LISTEN:"${unixfile}",fork \
+ TCP-CONNECT:localhost:"${port}" &
+ pids+=($!)
+
+ if ! host_vsock_test "${ns0}" "server" 1 "${port}"; then
+ ip netns del "${ns0}" &>/dev/null || :
+ ip netns del "${ns1}" &>/dev/null || :
+ terminate_pids "${pids[@]}"
+ return "${KSFT_FAIL}"
+ fi
+
+ if ! host_vsock_test "${ns1}" "127.0.0.1" 1 "${port}"; then
+ ip netns del "${ns0}" &>/dev/null || :
+ ip netns del "${ns1}" &>/dev/null || :
+ terminate_pids "${pids[@]}"
+ return "${KSFT_FAIL}"
+ fi
+
+ ip netns del "${ns0}" &>/dev/null || :
+ ip netns del "${ns1}" &>/dev/null || :
+ terminate_pids "${pids[@]}"
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_loopback_local_local_late_module_load_fails() {
+ declare -a pids
+ local ns0 ns1
+ local outfile
+ local port=12345
+ local rc
+
+ if ! unload_module vsock_loopback; then
+ log_host "Unable to unload vsock_loopback, skipping..."
+ return "${KSFT_SKIP}"
+ fi
+
+ ns0=loopback_ns0
+ ns1=loopback_ns1
+
+ ip netns del "${ns0}" &>/dev/null || :
+ ip netns del "${ns1}" &>/dev/null || :
+ ip netns add "${ns0}"
+ ip netns add "${ns1}"
+ ns_set_mode "${ns0}" local
+ ns_set_mode "${ns1}" local
+
+ modprobe vsock_loopback &> /dev/null || :
+
+ outfile=$(mktemp /tmp/XXXX.vmtest.out)
+ ip netns exec "${ns0}" socat VSOCK-LISTEN:${port} STDOUT \
+ > "${outfile}" 2>/dev/null &
+ pids+=($!)
+
+ echo TEST | \
+ ip netns exec "${ns1}" socat STDIN VSOCK-CONNECT:1:${port} \
+ 2>/dev/null
+
+ if grep -q "TEST" "${outfile}" 2>/dev/null; then
+ rc="${KSFT_FAIL}"
+ else
+ rc="${KSFT_PASS}"
+ fi
+
+ ip netns del "${ns0}" &>/dev/null || :
+ ip netns del "${ns1}" &>/dev/null || :
+ terminate_pids "${pids[@]}"
+ rm -f "${outfile}"
+
+ return "${rc}"
+}
+
shared_vm_test() {
local tname
--
2.47.3
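
The retry loop in unload_module() above (try, check, sleep, try again) is a pattern that could be factored into a generic helper. This is a minimal sketch, not something the patch does: the name retry and its argument order are assumptions made for illustration only.

```shell
# Hypothetical generic helper, not part of vmtest.sh.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are used up.
retry() {
	local retries="$1"
	local delay="$2"
	local i=0

	shift 2

	while [ "${i}" -lt "${retries}" ]; do
		# Stop at the first successful run of the command.
		if "$@"; then
			return 0
		fi

		i=$((i + 1))
		sleep "${delay}"
	done

	# Every attempt failed.
	return 1
}
```

The command passed in should be the success check itself (for unload_module(), something that both attempts the removal and reports whether the module is gone), since retry only looks at the exit status.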
* Re: [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
` (13 preceding siblings ...)
2025-10-23 18:27 ` [PATCH net-next v8 14/14] selftests/vsock: add tests for module loading order Bobby Eshleman
@ 2025-10-27 13:28 ` Stefano Garzarella
2025-10-27 17:25 ` Bobby Eshleman
14 siblings, 1 reply; 36+ messages in thread
From: Stefano Garzarella @ 2025-10-27 13:28 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
Hi Bobby,
On Thu, Oct 23, 2025 at 11:27:39AM -0700, Bobby Eshleman wrote:
>This series adds namespace support to vhost-vsock and loopback. It does
>not add namespaces to any of the other guest transports (virtio-vsock,
>hyperv, or vmci).
>
>The current revision supports two modes: local and global. Local
>mode is complete isolation of namespaces, while global mode is complete
>sharing between namespaces of CIDs (the original behavior).
>
>The mode is set using /proc/sys/net/vsock/ns_mode.
>
>Modes are per-netns and write-once. This allows a system to configure
>namespaces independently (some may share CIDs, others are completely
>isolated). This also supports future possible mixed use cases, where
>there may be namespaces in global mode spinning up VMs while there are
>mixed mode namespaces that provide services to the VMs, but are not
>allowed to allocate from the global CID pool (this mode not implemented
>in this series).
>
>If a socket or VM is created when a namespace is global but the
>namespace changes to local, the socket or VM will continue working
>normally. That is, the socket or VM assumes the mode behavior of the
>namespace at the time the socket/VM was created. The original mode is
>captured in vsock_create() and so occurs at the time of socket(2) and
>accept(2) for sockets and open(2) on /dev/vhost-vsock for VMs. This
>prevents a socket/VM connection from suddenly breaking due to a
>namespace mode change. Any new sockets/VMs created after the mode change
>will adopt the new mode's behavior.
>
>Additionally, added tests for the new namespace features:
>
>tools/testing/selftests/vsock/vmtest.sh
>1..30
>ok 1 vm_server_host_client
>ok 2 vm_client_host_server
>ok 3 vm_loopback
>ok 4 ns_host_vsock_ns_mode_ok
>ok 5 ns_host_vsock_ns_mode_write_once_ok
>ok 6 ns_global_same_cid_fails
>ok 7 ns_local_same_cid_ok
>ok 8 ns_global_local_same_cid_ok
>ok 9 ns_local_global_same_cid_ok
>ok 10 ns_diff_global_host_connect_to_global_vm_ok
>ok 11 ns_diff_global_host_connect_to_local_vm_fails
>ok 12 ns_diff_global_vm_connect_to_global_host_ok
>ok 13 ns_diff_global_vm_connect_to_local_host_fails
>ok 14 ns_diff_local_host_connect_to_local_vm_fails
>ok 15 ns_diff_local_vm_connect_to_local_host_fails
>ok 16 ns_diff_global_to_local_loopback_local_fails
>ok 17 ns_diff_local_to_global_loopback_fails
>ok 18 ns_diff_local_to_local_loopback_fails
>ok 19 ns_diff_global_to_global_loopback_ok
>ok 20 ns_same_local_loopback_ok
>ok 21 ns_same_local_host_connect_to_local_vm_ok
>ok 22 ns_same_local_vm_connect_to_local_host_ok
>ok 23 ns_mode_change_connection_continue_vm_ok
>ok 24 ns_mode_change_connection_continue_host_ok
>ok 25 ns_mode_change_connection_continue_both_ok
>ok 26 ns_delete_vm_ok
>ok 27 ns_delete_host_ok
>ok 28 ns_delete_both_ok
>ok 29 ns_loopback_global_global_late_module_load_ok
>ok 30 ns_loopback_local_local_late_module_load_fails
>SUMMARY: PASS=30 SKIP=0 FAIL=0
>
>Dependent on series:
>https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463@meta.com/
>
>Thanks again for everyone's help and reviews!
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
>To: Stefano Garzarella <sgarzare@redhat.com>
>To: Shuah Khan <shuah@kernel.org>
>To: David S. Miller <davem@davemloft.net>
>To: Eric Dumazet <edumazet@google.com>
>To: Jakub Kicinski <kuba@kernel.org>
>To: Paolo Abeni <pabeni@redhat.com>
>To: Simon Horman <horms@kernel.org>
>To: Stefan Hajnoczi <stefanha@redhat.com>
>To: Michael S. Tsirkin <mst@redhat.com>
>To: Jason Wang <jasowang@redhat.com>
>To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>To: Eugenio Pérez <eperezma@redhat.com>
>To: K. Y. Srinivasan <kys@microsoft.com>
>To: Haiyang Zhang <haiyangz@microsoft.com>
>To: Wei Liu <wei.liu@kernel.org>
>To: Dexuan Cui <decui@microsoft.com>
>To: Bryan Tan <bryan-bt.tan@broadcom.com>
>To: Vishnu Dasa <vishnu.dasa@broadcom.com>
>To: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
>Cc: virtualization@lists.linux.dev
>Cc: netdev@vger.kernel.org
>Cc: linux-kselftest@vger.kernel.org
>Cc: linux-kernel@vger.kernel.org
>Cc: kvm@vger.kernel.org
>Cc: linux-hyperv@vger.kernel.org
>Cc: berrange@redhat.com
>
>Changes in v8:
>- Break generic cleanup/refactoring patches into standalone series,
> remove those from this series
Yep, thanks for splitting the series. I'll review it ASAP since it's a
dependency.
I was at GSoC mentor summit last week, so I'm a bit busy with the backlog,
but I'll do my best to review both series this week.
Thanks,
Stefano
>- Link to dependency: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463@meta.com/
>- Link to v7: https://lore.kernel.org/r/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.com
>
>Changes in v7:
>- fix hv_sock build
>- break out vmtest patches into distinct, more well-scoped patches
>- change `orig_net_mode` to `net_mode`
>- many fixes and style changes in per-patch change sets (see individual
> patches for specific changes)
>- optimize `virtio_vsock_skb_cb` layout
>- update commit messages with more useful descriptions
>- vsock_loopback: use orig_net_mode instead of current net mode
>- add tests for edge cases (ns deletion, mode changing, loopback module
> load ordering)
>- Link to v6: https://lore.kernel.org/r/20250916-vsock-vmtest-v6-0-064d2eb0c89d@meta.com
>
>Changes in v6:
>- define behavior when mode changes to local while socket/VM is alive
>- af_vsock: clarify description of CID behavior
>- af_vsock: use stronger language around CID rules (don't use "may")
>- af_vsock: improve naming of buf/buffer
>- af_vsock: improve string length checking on proc writes
>- vsock_loopback: add space in struct to clarify lock protection
>- vsock_loopback: do proper cleanup/unregister on vsock_loopback_exit()
>- vsock_loopback: use virtio_vsock_skb_net() instead of sock_net()
>- vsock_loopback: set loopback to NULL after kfree()
>- vsock_loopback: use pernet_operations and remove callback mechanism
>- vsock_loopback: add macros for "global" and "local"
>- vsock_loopback: fix length checking
>- vmtest.sh: check for namespace support in vmtest.sh
>- Link to v5: https://lore.kernel.org/r/20250827-vsock-vmtest-v5-0-0ba580bede5b@meta.com
>
>Changes in v5:
>- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
>- vsock_global_net -> vsock_global_dummy_net
>- fix netns lookup in vhost_vsock to respect pid namespaces
>- add callbacks for vsock_loopback to avoid circular dependency
>- vmtest.sh loads vsock_loopback module
>- remove vsock_net_mode_can_set()
>- change vsock_net_write_mode() to return true/false based on success
>- make vsock_net_mode enum instead of u8
>- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com
>
>Changes in v4:
>- removed RFC tag
>- implemented loopback support
>- renamed new tests to better reflect behavior
>- completed suite of tests with permutations of ns modes and vsock_test
> as guest/host
>- simplified socat bridging with unix socket instead of tcp + veth
>- only use vsock_test for success case, socat for failure case (context
> in commit message)
>- lots of cleanup
>
>Changes in v3:
>- add notion of "modes"
>- add procfs /proc/net/vsock_ns_mode
>- local and global modes only
>- no /dev/vhost-vsock-netns
>- vmtest.sh already merged, so new patch just adds new tests for NS
>- Link to v2:
> https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com
>
>Changes in v2:
>- only support vhost-vsock namespaces
>- all g2h namespaces retain old behavior, only common API changes
> impacted by vhost-vsock changes
>- add /dev/vhost-vsock-netns for "opt-in"
>- leave /dev/vhost-vsock to old behavior
>- removed netns module param
>- Link to v1:
> https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com
>
>Changes in v1:
>- added 'netns' module param to vsock.ko to enable the
> network namespace support (disabled by default)
>- added 'vsock_net_eq()' to check the "net" assigned to a socket
> only when 'netns' support is enabled
>- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
>
>---
>Bobby Eshleman (14):
> vsock: a per-net vsock NS mode state
> vsock/virtio: pack struct virtio_vsock_skb_cb
> vsock: add netns to vsock skb cb
> vsock: add netns to vsock core
> vsock/loopback: add netns support
> vsock/virtio: add netns to virtio transport common
> vhost/vsock: add netns support
> selftests/vsock: add namespace helpers to vmtest.sh
> selftests/vsock: prepare vm management helpers for namespaces
> selftests/vsock: add tests for proc sys vsock ns_mode
> selftests/vsock: add namespace tests for CID collisions
> selftests/vsock: add tests for host <-> vm connectivity with namespaces
> selftests/vsock: add tests for namespace deletion and mode changes
> selftests/vsock: add tests for module loading order
>
> MAINTAINERS | 1 +
> drivers/vhost/vsock.c | 48 +-
> include/linux/virtio_vsock.h | 47 +-
> include/net/af_vsock.h | 70 ++-
> include/net/net_namespace.h | 4 +
> include/net/netns/vsock.h | 22 +
> net/vmw_vsock/af_vsock.c | 264 +++++++-
> net/vmw_vsock/virtio_transport.c | 7 +-
> net/vmw_vsock/virtio_transport_common.c | 21 +-
> net/vmw_vsock/vsock_loopback.c | 89 ++-
> tools/testing/selftests/vsock/vmtest.sh | 1044 ++++++++++++++++++++++++++++++-
> 11 files changed, 1532 insertions(+), 85 deletions(-)
>---
>base-commit: 962ac5ca99a5c3e7469215bf47572440402dfd59
>change-id: 20250325-vsock-vmtest-b3a21d2102c2
>prerequisite-message-id: <20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463@meta.com>
>prerequisite-patch-id: a2eecc3851f2509ed40009a7cab6990c6d7cfff5
>prerequisite-patch-id: 501db2100636b9c8fcb3b64b8b1df797ccbede85
>prerequisite-patch-id: ba1a2f07398a035bc48ef72edda41888614be449
>prerequisite-patch-id: fd5cc5445aca9355ce678e6d2bfa89fab8a57e61
>prerequisite-patch-id: 795ab4432ffb0843e22b580374782e7e0d99b909
>prerequisite-patch-id: 1499d263dc933e75366c09e045d2125ca39f7ddd
>prerequisite-patch-id: f92d99bb1d35d99b063f818a19dcda999152d74c
>prerequisite-patch-id: e3296f38cdba6d903e061cff2bbb3e7615e8e671
>prerequisite-patch-id: bc4662b4710d302d4893f58708820fc2a0624325
>prerequisite-patch-id: f8991f2e98c2661a706183fde6b35e2b8d9aedcf
>prerequisite-patch-id: 44bf9ed69353586d284e5ee63d6fffa30439a698
>prerequisite-patch-id: d50621bc630eeaf608bbaf260370c8dabf6326df
>
>Best regards,
>--
>Bobby Eshleman <bobbyeshleman@meta.com>
>
* Re: [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock
2025-10-27 13:28 ` [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Stefano Garzarella
@ 2025-10-27 17:25 ` Bobby Eshleman
2025-11-06 16:23 ` Stefano Garzarella
0 siblings, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-10-27 17:25 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Mon, Oct 27, 2025 at 02:28:31PM +0100, Stefano Garzarella wrote:
> Hi Bobby,
>
> >
> > Changes in v8:
> > - Break generic cleanup/refactoring patches into standalone series,
> > remove those from this series
>
> Yep, thanks for splitting the series. I'll review it ASAP since it's a
> dependency.
>
> I was at GSoC mentor summit last week, so I'm a bit busy with the backlog, but
> I'll do my best to review both series this week.
>
> Thanks,
> Stefano
>
Thanks for the heads up!
Best,
Bobby
* Re: [PATCH net-next v8 01/14] vsock: a per-net vsock NS mode state
2025-10-23 18:27 ` [PATCH net-next v8 01/14] vsock: a per-net vsock NS mode state Bobby Eshleman
@ 2025-11-06 16:16 ` Stefano Garzarella
2025-11-07 1:09 ` Bobby Eshleman
0 siblings, 1 reply; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-06 16:16 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Oct 23, 2025 at 11:27:40AM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Add the per-net vsock NS mode state. This only adds the structure for
>holding the mode and some of the functions for setting/getting and
>checking the mode, but does not integrate the functionality yet.
>
>A "net_mode" field is added to vsock_sock to store the mode of the
>namespace when the vsock_sock was created. In order to evaluate
>namespace mode rules we need to know both a) which namespace the
>endpoints are in, and b) what mode that namespace had when the endpoints
>were created. This allows us to handle the changing of modes from global
>to local *after* a socket has been created by remembering that the mode
>was global when the socket was created. If we were to use the current
>net's mode instead, then the lookup would fail and the socket would
>break.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
>Changes in v7:
>- clarify vsock_net_check_mode() comments
>- change to `orig_net_mode == VSOCK_NET_MODE_GLOBAL && orig_net_mode == vsk->orig_net_mode`
>- remove extraneous explanation of `orig_net_mode`
>- rename `written` to `mode_locked`
>- rename `vsock_hdr` to `sysctl_hdr`
>- change `orig_net_mode` to `net_mode`
>- make vsock_net_check_mode() more generic by taking just net pointers
> and modes, instead of a vsock_sock ptr, for reuse by transports
> (e.g., vhost_vsock)
>
>Changes in v6:
>- add orig_net_mode to store mode at creation time which will be used to
> avoid breakage when namespace changes mode during socket/VM lifespan
>
>Changes in v5:
>- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
>- change from net->vsock.ns_mode to net->vsock.mode
>- change vsock_net_set_mode() to vsock_net_write_mode()
>- vsock_net_write_mode() returns bool for write success to avoid
> need to use vsock_net_mode_can_set()
>- remove vsock_net_mode_can_set()
>---
> MAINTAINERS | 1 +
> include/net/af_vsock.h | 56 +++++++++++++++++++++++++++++++++++++++++++++
> include/net/net_namespace.h | 4 ++++
> include/net/netns/vsock.h | 20 ++++++++++++++++
> 4 files changed, 81 insertions(+)
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index ea72b3bd2248..dd765bbf79ab 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -27070,6 +27070,7 @@ L: netdev@vger.kernel.org
> S: Maintained
> F: drivers/vhost/vsock.c
> F: include/linux/virtio_vsock.h
>+F: include/net/netns/vsock.h
> F: include/uapi/linux/virtio_vsock.h
> F: net/vmw_vsock/virtio_transport.c
> F: net/vmw_vsock/virtio_transport_common.c
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index d40e978126e3..bce5389ef742 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -10,6 +10,7 @@
>
> #include <linux/kernel.h>
> #include <linux/workqueue.h>
>+#include <net/netns/vsock.h>
> #include <net/sock.h>
> #include <uapi/linux/vm_sockets.h>
>
>@@ -65,6 +66,7 @@ struct vsock_sock {
> u32 peer_shutdown;
> bool sent_request;
> bool ignore_connecting_rst;
>+ enum vsock_net_mode net_mode;
>
> /* Protected by lock_sock(sk) */
> u64 buffer_size;
>@@ -256,4 +258,58 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
> {
> return t->msgzerocopy_allow && t->msgzerocopy_allow();
> }
>+
>+static inline enum vsock_net_mode vsock_net_mode(struct net *net)
>+{
>+ enum vsock_net_mode ret;
>+
>+ spin_lock_bh(&net->vsock.lock);
>+ ret = net->vsock.mode;
Do we really need a spin_lock just to set/get a variable?
What about WRITE_ONCE/READ_ONCE and/or atomic?

Not a strong opinion, just to check if we can do something like this:

static inline enum vsock_net_mode vsock_net_mode(struct net *net)
{
	return READ_ONCE(net->vsock.mode);
}

static inline bool vsock_net_write_mode(struct net *net, u8 mode)
{
	// Or using test_and_set_bit() if you prefer
	if (xchg(&net->vsock.mode_locked, true))
		return false;

	WRITE_ONCE(net->vsock.mode, mode);
	return true;
}

Thanks,
Stefano
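
For anyone who wants to experiment with the lock-free idea outside the kernel, here is a minimal userspace sketch. It assumes C11 atomics as stand-ins for READ_ONCE()/WRITE_ONCE()/xchg(); the names demo_netns_vsock, demo_net_mode() and demo_net_write_mode() are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_net_mode {
	DEMO_NET_MODE_GLOBAL,
	DEMO_NET_MODE_LOCAL,
};

/* Hypothetical userspace stand-in for struct netns_vsock, minus the lock. */
struct demo_netns_vsock {
	_Atomic int mode;	/* holds an enum demo_net_mode value */
	atomic_bool mode_locked;	/* set by the first successful writer */
};

/* Plays the role of READ_ONCE(net->vsock.mode). */
static inline enum demo_net_mode demo_net_mode(struct demo_netns_vsock *ns)
{
	assert(ns);
	return (enum demo_net_mode)atomic_load_explicit(&ns->mode,
							memory_order_relaxed);
}

/* Plays the role of xchg() followed by WRITE_ONCE(): first caller wins. */
static inline bool demo_net_write_mode(struct demo_netns_vsock *ns,
				       enum demo_net_mode mode)
{
	assert(ns);

	/* atomic_exchange() returns the previous value of the flag. */
	if (atomic_exchange(&ns->mode_locked, true))
		return false;

	atomic_store_explicit(&ns->mode, mode, memory_order_relaxed);
	return true;
}
```

In this sketch a reader racing with the first writer may briefly observe mode_locked already set while mode still holds the default value. Whether that window matters for the real sysctl path is a question for the thread, not something the sketch settles.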
>+ spin_unlock_bh(&net->vsock.lock);
>+ return ret;
>+}
>+
>+static inline bool vsock_net_write_mode(struct net *net, u8 mode)
>+{
>+ bool ret;
>+
>+ spin_lock_bh(&net->vsock.lock);
>+
>+ if (net->vsock.mode_locked) {
>+ ret = false;
>+ goto skip;
>+ }
>+
>+ net->vsock.mode = mode;
>+ net->vsock.mode_locked = true;
>+ ret = true;
>+
>+skip:
>+ spin_unlock_bh(&net->vsock.lock);
>+ return ret;
>+}
>+
>+/* Return true if two namespaces and modes pass the mode rules. Otherwise,
>+ * return false.
>+ *
>+ * ns0 and ns1 are the namespaces being checked.
>+ * mode0 and mode1 are the vsock namespace modes of ns0 and ns1.
>+ *
>+ * Read more about modes in the comment header of net/vmw_vsock/af_vsock.c.
>+ */
>+static inline bool vsock_net_check_mode(struct net *ns0, enum vsock_net_mode mode0,
>+ struct net *ns1, enum vsock_net_mode mode1)
>+{
>+ /* Any vsocks within the same network namespace are always reachable,
>+ * regardless of the mode.
>+ */
>+ if (net_eq(ns0, ns1))
>+ return true;
>+
>+ /*
>+ * If the network namespaces differ, vsocks are only reachable if both
>+ * were created in VSOCK_NET_MODE_GLOBAL mode.
>+ */
>+ return mode0 == VSOCK_NET_MODE_GLOBAL && mode0 == mode1;
>+}
> #endif /* __AF_VSOCK_H__ */
>diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
>index cb664f6e3558..66d3de1d935f 100644
>--- a/include/net/net_namespace.h
>+++ b/include/net/net_namespace.h
>@@ -37,6 +37,7 @@
> #include <net/netns/smc.h>
> #include <net/netns/bpf.h>
> #include <net/netns/mctp.h>
>+#include <net/netns/vsock.h>
> #include <net/net_trackers.h>
> #include <linux/ns_common.h>
> #include <linux/idr.h>
>@@ -196,6 +197,9 @@ struct net {
> /* Move to a better place when the config guard is removed. */
> struct mutex rtnl_mutex;
> #endif
>+#if IS_ENABLED(CONFIG_VSOCKETS)
>+ struct netns_vsock vsock;
>+#endif
> } __randomize_layout;
>
> #include <linux/seq_file_net.h>
>diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>new file mode 100644
>index 000000000000..c9a438ad52f2
>--- /dev/null
>+++ b/include/net/netns/vsock.h
>@@ -0,0 +1,20 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+#ifndef __NET_NET_NAMESPACE_VSOCK_H
>+#define __NET_NET_NAMESPACE_VSOCK_H
>+
>+#include <linux/types.h>
>+
>+enum vsock_net_mode {
>+ VSOCK_NET_MODE_GLOBAL,
>+ VSOCK_NET_MODE_LOCAL,
>+};
>+
>+struct netns_vsock {
>+ struct ctl_table_header *sysctl_hdr;
>+ spinlock_t lock;
>+
>+ /* protected by lock */
>+ enum vsock_net_mode mode;
>+ bool mode_locked;
>+};
>+#endif /* __NET_NET_NAMESPACE_VSOCK_H */
>
>--
>2.47.3
>
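
The reachability rule in vsock_net_check_mode() is small enough to exercise as a truth table. The following is a userspace sketch only: struct demo_net is a hypothetical stand-in for struct net, namespace identity is modelled as pointer equality (as net_eq() does), and the demo_ names do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_net_mode {
	DEMO_NET_MODE_GLOBAL,
	DEMO_NET_MODE_LOCAL,
};

/* Hypothetical stand-in for struct net; only its address matters here. */
struct demo_net {
	int unused;
};

/* Mirrors the rule in the patch: same namespace always matches, different
 * namespaces match only if both modes are global.
 */
static inline bool demo_net_check_mode(const struct demo_net *ns0,
				       enum demo_net_mode mode0,
				       const struct demo_net *ns1,
				       enum demo_net_mode mode1)
{
	assert(ns0 && ns1);

	if (ns0 == ns1)
		return true;

	return mode0 == DEMO_NET_MODE_GLOBAL && mode0 == mode1;
}
```

Note that the same-namespace case returns true even when the two stored modes differ, which is what lets a socket created before a global-to-local switch keep working inside its own namespace.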
* Re: [PATCH net-next v8 02/14] vsock/virtio: pack struct virtio_vsock_skb_cb
2025-10-23 18:27 ` [PATCH net-next v8 02/14] vsock/virtio: pack struct virtio_vsock_skb_cb Bobby Eshleman
@ 2025-11-06 16:16 ` Stefano Garzarella
0 siblings, 0 replies; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-06 16:16 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Oct 23, 2025 at 11:27:41AM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Reduce holes in struct virtio_vsock_skb_cb. As this struct continues to
>grow, we want to keep it trimmed down so it doesn't exceed the size of
>skb->cb (currently 48 bytes). Eliminating the 2 byte hole provides an
>additional two bytes for new fields at the end of the structure. It does
>not shrink the total size, however.
>
>Future work could include combining fields like reply and tap_delivered
>into a single bitfield, but currently doing so will not make the total
>struct size smaller (although, would extend the tail-end padding area by
>one byte).
>
>Before this patch:
>
>struct virtio_vsock_skb_cb {
> bool reply; /* 0 1 */
> bool tap_delivered; /* 1 1 */
>
> /* XXX 2 bytes hole, try to pack */
>
> u32 offset; /* 4 4 */
>
> /* size: 8, cachelines: 1, members: 3 */
> /* sum members: 6, holes: 1, sum holes: 2 */
> /* last cacheline: 8 bytes */
>};
>
>After this patch:
>
>struct virtio_vsock_skb_cb {
> u32 offset; /* 0 4 */
> bool reply; /* 4 1 */
> bool tap_delivered; /* 5 1 */
>
> /* size: 8, cachelines: 1, members: 3 */
> /* padding: 2 */
> /* last cacheline: 8 bytes */
>};
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
> include/linux/virtio_vsock.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Yeah, thanks for that!
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 0c67543a45c8..87cf4dcac78a 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -10,9 +10,9 @@
> #define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))
>
> struct virtio_vsock_skb_cb {
>+ u32 offset;
> bool reply;
> bool tap_delivered;
>- u32 offset;
> };
>
> #define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb))
>
>--
>2.47.3
>
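
The hole-versus-tail-padding point from the commit message can be reproduced without pahole. This is a userspace sketch under the assumption of a common ABI (1-byte bool, 4-byte aligned uint32_t); the demo_ struct and helper names are hypothetical, and DEMO_SKB_CB_SIZE simply restates the 48 bytes quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SKB_CB_SIZE 48	/* size of skb->cb per the commit message */

/* Field order before the patch. */
struct demo_cb_before {
	bool reply;
	bool tap_delivered;
	uint32_t offset;
};

/* Field order after the patch. */
struct demo_cb_after {
	uint32_t offset;
	bool reply;
	bool tap_delivered;
};

static_assert(sizeof(struct demo_cb_before) <= DEMO_SKB_CB_SIZE,
	      "old layout must fit in the control buffer");
static_assert(sizeof(struct demo_cb_after) <= DEMO_SKB_CB_SIZE,
	      "new layout must fit in the control buffer");

/* Bytes wasted between tap_delivered and offset in the old layout. */
static inline size_t demo_hole_before(void)
{
	return offsetof(struct demo_cb_before, offset) -
	       (offsetof(struct demo_cb_before, tap_delivered) + sizeof(bool));
}

/* Bytes left after the last member in the new layout. */
static inline size_t demo_tail_padding_after(void)
{
	return sizeof(struct demo_cb_after) -
	       (offsetof(struct demo_cb_after, tap_delivered) + sizeof(bool));
}
```

The total size is unchanged by the reorder; the two spare bytes just move from the middle of the struct to the end, where a new small field can use them.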
* Re: [PATCH net-next v8 03/14] vsock: add netns to vsock skb cb
2025-10-23 18:27 ` [PATCH net-next v8 03/14] vsock: add netns to vsock skb cb Bobby Eshleman
@ 2025-11-06 16:17 ` Stefano Garzarella
0 siblings, 0 replies; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-06 16:17 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Oct 23, 2025 at 11:27:42AM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Add a net pointer and net_mode to the vsock skb and helpers for
>getting/setting them. When skbs are received the transport needs a way
>to tell the vsock layer and/or virtio common layer which namespace and
>what namespace mode the packet belongs to. This will be used by those
>upper layers for finding the correct socket object. This patch stashes
>these fields in the skb control buffer.
>
>This extends virtio_vsock_skb_cb to 24 bytes:
>
>struct virtio_vsock_skb_cb {
> struct net * net; /* 0 8 */
> enum vsock_net_mode net_mode; /* 8 4 */
> u32 offset; /* 12 4 */
> bool reply; /* 16 1 */
> bool tap_delivered; /* 17 1 */
>
> /* size: 24, cachelines: 1, members: 5 */
> /* padding: 6 */
> /* last cacheline: 24 bytes */
>};
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
>Changes in v7:
>- rename `orig_net_mode` to `net_mode`
>- update commit message with a more complete explanation of changes
>
>Changes in v5:
>- some diff context change due to rebase to current net-next
>---
> include/linux/virtio_vsock.h | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 87cf4dcac78a..7f334a32133c 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -10,6 +10,8 @@
> #define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))
>
> struct virtio_vsock_skb_cb {
>+ struct net *net;
>+ enum vsock_net_mode net_mode;
> u32 offset;
> bool reply;
> bool tap_delivered;
>@@ -130,6 +132,27 @@ static inline size_t virtio_vsock_skb_len(struct sk_buff *skb)
> return (size_t)(skb_end_pointer(skb) - skb->head);
> }
>
>+static inline struct net *virtio_vsock_skb_net(struct sk_buff *skb)
>+{
>+ return VIRTIO_VSOCK_SKB_CB(skb)->net;
>+}
>+
>+static inline void virtio_vsock_skb_set_net(struct sk_buff *skb, struct net *net)
>+{
>+ VIRTIO_VSOCK_SKB_CB(skb)->net = net;
>+}
>+
>+static inline enum vsock_net_mode virtio_vsock_skb_net_mode(struct sk_buff *skb)
>+{
>+ return VIRTIO_VSOCK_SKB_CB(skb)->net_mode;
>+}
>+
>+static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
>+ enum vsock_net_mode net_mode)
>+{
>+ VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
>+}
>+
> /* Dimension the RX SKB so that the entire thing fits exactly into
> * a single 4KiB page. This avoids wasting memory due to alloc_skb()
> * rounding up to the next page order and also means that we
>
>--
>2.47.3
>
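
As an illustration of how the new fields ride along in the control buffer, here is a userspace sketch. Everything named demo_ is hypothetical: struct demo_skb models only the 48-byte cb area, and a union is used instead of the kernel's pointer cast so the example stays within ISO C aliasing rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SKB_CB_SIZE 48

enum demo_net_mode {
	DEMO_NET_MODE_GLOBAL,
	DEMO_NET_MODE_LOCAL,
};

struct demo_net {
	int unused;
};

/* Same member order as the struct shown in the commit message. */
struct demo_vsock_skb_cb {
	struct demo_net *net;
	enum demo_net_mode net_mode;
	uint32_t offset;
	bool reply;
	bool tap_delivered;
};

static_assert(sizeof(struct demo_vsock_skb_cb) <= DEMO_SKB_CB_SIZE,
	      "control block must fit in the cb area");

/* Hypothetical skb that models nothing but the control buffer. */
struct demo_skb {
	union {
		unsigned char raw[DEMO_SKB_CB_SIZE];
		struct demo_vsock_skb_cb vsock;
	} cb;
};

static inline struct demo_net *demo_skb_net(struct demo_skb *skb)
{
	return skb->cb.vsock.net;
}

static inline void demo_skb_set_net(struct demo_skb *skb, struct demo_net *net)
{
	skb->cb.vsock.net = net;
}

static inline enum demo_net_mode demo_skb_net_mode(struct demo_skb *skb)
{
	return skb->cb.vsock.net_mode;
}

static inline void demo_skb_set_net_mode(struct demo_skb *skb,
					 enum demo_net_mode net_mode)
{
	skb->cb.vsock.net_mode = net_mode;
}
```

A receive path would call the two setters once per packet, and the upper layers would read the values back when looking up the destination socket.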
* Re: [PATCH net-next v8 04/14] vsock: add netns to vsock core
2025-10-23 18:27 ` [PATCH net-next v8 04/14] vsock: add netns to vsock core Bobby Eshleman
@ 2025-11-06 16:18 ` Stefano Garzarella
2025-11-07 2:03 ` Bobby Eshleman
0 siblings, 1 reply; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-06 16:18 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Oct 23, 2025 at 11:27:43AM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Add netns logic to vsock core. Additionally, modify transport hook
>prototypes to be used by later transport-specific patches (e.g.,
>*_seqpacket_allow()).
>
>Namespaces are supported primarily by changing socket lookup functions
>(e.g., vsock_find_connected_socket()) to take into account the socket
>namespace and the namespace mode before considering a candidate socket a
>"match".
>
>Introduce a dummy namespace struct, __vsock_global_dummy_net, to be
>used by transports that do not support namespacing. This dummy always
>has mode "global" to preserve previous CID behavior.
>
>This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
>accepts the "global" or "local" mode strings.
>
>The transports (besides vhost) are modified to use the global dummy,
>which makes them behave as if always in the global namespace. Vhost is
>an exception because it inherits its namespace from the process that
>opens the vhost device.
>
>Add netns functionality (initialization, passing to transports, procfs,
>etc...) to the af_vsock socket layer. Later patches that add netns
>support to transports depend on this patch.
>
>seqpacket_allow() callbacks are modified to take a vsk so that transport
>implementations can inspect sock_net(sk) and vsk->net_mode when performing
>lookups (e.g., vhost does this in its future netns patch). Because the
>API change affects all transports, it seemed more appropriate to make
>this internal API change in the "vsock core" patch than in the "vhost"
>patch.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
>Changes in v7:
>- hv_sock: fix hyperv build error
>- explain why vhost does not use the dummy
>- explain usage of __vsock_global_dummy_net
>- explain why VSOCK_NET_MODE_STR_MAX is 8 characters
>- use switch-case in vsock_net_mode_string()
>- avoid changing transports as much as possible
>- add vsock_find_{bound,connected}_socket_net()
>- rename `vsock_hdr` to `sysctl_hdr`
>- add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
> global mode for virtio-vsock, move skb->cb zero-ing into wrapper
>- explain seqpacket_allow() change
>- move net setting to __vsock_create() instead of vsock_create() so
> that child sockets also have their net assigned upon accept()
>
>Changes in v6:
>- unregister sysctl ops in vsock_exit()
>- af_vsock: clarify description of CID behavior
>- af_vsock: fix buf vs buffer naming, and length checking
>- af_vsock: fix length checking w/ correct ctl_table->maxlen
>
>Changes in v5:
>- vsock_global_net() -> vsock_global_dummy_net()
>- update comments for new uAPI
>- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
>- add prototype changes so patch remains compilable
>---
> drivers/vhost/vsock.c | 4 +-
> include/linux/virtio_vsock.h | 21 ++++
> include/net/af_vsock.h | 14 ++-
> net/vmw_vsock/af_vsock.c | 264 ++++++++++++++++++++++++++++++++++++---
> net/vmw_vsock/virtio_transport.c | 7 +-
> net/vmw_vsock/vsock_loopback.c | 4 +-
> 6 files changed, 288 insertions(+), 26 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index ae01457ea2cd..34adf0cf9124 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
> return true;
> }
>
>-static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
> .transport = {
>@@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
> .send_pkt = vhost_transport_send_pkt,
> };
>
>-static bool vhost_transport_seqpacket_allow(u32 remote_cid)
>+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> {
> struct vhost_vsock *vsock;
> bool seqpacket_allow = false;
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 7f334a32133c..29290395054c 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -153,6 +153,27 @@ static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
> VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
> }
>
>+static inline struct sk_buff *
>+virtio_vsock_alloc_rx_skb(unsigned int size, gfp_t mask)
>+{
>+ struct sk_buff *skb;
>+
>+ skb = virtio_vsock_alloc_linear_skb(size, mask);
>+ if (!skb)
>+ return NULL;
>+
>+ memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
>+
>+ /* virtio-vsock does not yet support namespaces, so on receive
>+ * we force legacy namespace behavior using the global dummy net
>+ * and global net mode.
>+ */
>+ virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
>+ virtio_vsock_skb_set_net_mode(skb, VSOCK_NET_MODE_GLOBAL);
>+
>+ return skb;
>+}
Why are we introducing this change in this patch?
Where is the net of the virtio skb read?
>+
> /* Dimension the RX SKB so that the entire thing fits exactly into
> * a single 4KiB page. This avoids wasting memory due to alloc_skb()
> * rounding up to the next page order and also means that we
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index bce5389ef742..69bb70c3c0fd 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -145,7 +145,7 @@ struct vsock_transport {
> int flags);
> int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
> size_t len);
>- bool (*seqpacket_allow)(u32 remote_cid);
>+ bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
> u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
>
> /* Notification. */
>@@ -218,6 +218,12 @@ void vsock_remove_connected(struct vsock_sock *vsk);
> struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
> struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> struct sockaddr_vm *dst);
>+struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr, struct net *net,
>+ enum vsock_net_mode net_mode);
>+struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
>+ struct sockaddr_vm *dst,
>+ struct net *net,
>+ enum vsock_net_mode net_mode);
> void vsock_remove_sock(struct vsock_sock *vsk);
> void vsock_for_each_connected_socket(struct vsock_transport *transport,
> void (*fn)(struct sock *sk));
>@@ -259,6 +265,12 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
> return t->msgzerocopy_allow && t->msgzerocopy_allow();
> }
>
>+extern struct net __vsock_global_dummy_net;
>+static inline struct net *vsock_global_dummy_net(void)
>+{
>+ return &__vsock_global_dummy_net;
>+}
>+
> static inline enum vsock_net_mode vsock_net_mode(struct net *net)
> {
> enum vsock_net_mode ret;
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 4c2db6cca557..656a78810c68 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -83,6 +83,35 @@
> * TCP_ESTABLISHED - connected
> * TCP_CLOSING - disconnecting
> * TCP_LISTEN - listening
>+ *
>+ * - Namespaces in vsock support two different modes configured
>+ * through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
>+ * Each mode defines how the namespace interacts with CIDs.
>+ * /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
>+ * and locked down by a namespace manager. The default is "global". The mode
>+ * is set per-namespace.
>+ *
>+ * The modes affect the allocation and accessibility of CIDs as follows:
>+ *
>+ * - global - access and allocation are all system-wide
>+ * - all CID allocation from global namespaces draw from the same
>+ * system-wide pool
>+ * - if one global namespace has already allocated some CID, another
>+ * global namespace will not be able to allocate the same CID
>+ * - global mode AF_VSOCK sockets can reach any VM or socket in any global
>+ * namespace, they are not contained to only their own namespace
>+ * - AF_VSOCK sockets in a global mode namespace cannot reach VMs or
>+ * sockets in any local mode namespace
>+ * - local - access and allocation are contained within the namespace
>+ * - CID allocation draws only from a private pool local only to the
>+ * namespace, and does not affect the CIDs available for allocation in any
>+ * other namespace (global or local)
>+ * - VMs in a local namespace do not collide with CIDs in any other local
>+ * namespace or any global namespace. For example, if a VM in a local mode
>+ * namespace is given CID 10, then CID 10 is still available for
>+ * allocation in any other namespace, but not in the same namespace
>+ * - AF_VSOCK sockets in a local mode namespace can connect only to VMs or
>+ * other sockets within their own namespace.
> */
>
> #include <linux/compat.h>
>@@ -100,6 +129,7 @@
> #include <linux/module.h>
> #include <linux/mutex.h>
> #include <linux/net.h>
>+#include <linux/proc_fs.h>
> #include <linux/poll.h>
> #include <linux/random.h>
> #include <linux/skbuff.h>
>@@ -111,9 +141,18 @@
> #include <linux/workqueue.h>
> #include <net/sock.h>
> #include <net/af_vsock.h>
>+#include <net/netns/vsock.h>
> #include <uapi/linux/vm_sockets.h>
> #include <uapi/asm-generic/ioctls.h>
>
>+#define VSOCK_NET_MODE_STR_GLOBAL "global"
>+#define VSOCK_NET_MODE_STR_LOCAL "local"
>+
>+/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'.
>+ * The newline is added by proc_dostring() for read operations.
>+ */
>+#define VSOCK_NET_MODE_STR_MAX 8
>+
> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> static void vsock_sk_destruct(struct sock *sk);
> static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
>@@ -149,6 +188,15 @@ static const struct vsock_transport *transport_dgram;
> static const struct vsock_transport *transport_local;
> static DEFINE_MUTEX(vsock_register_mutex);
>
>+/* This net is used only for transports that do not support namespaces. It is never
>+ * registered with the namespace subsystem and always has
>+ * VSOCK_NET_MODE_GLOBAL. Pass this net to the net lookup functions (e.g.,
>+ * vsock_find_bound_socket_net()) when you want to force global-mode or the
>+ * same behavior as before namespaces were supported.
>+ */
>+struct net __vsock_global_dummy_net;
>+EXPORT_SYMBOL_GPL(__vsock_global_dummy_net);
>+
> /**** UTILS ****/
>
> /* Each bound VSocket is stored in the bind hash table and each connected
>@@ -235,33 +283,44 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> sock_put(&vsk->sk);
> }
>
>-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>+static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr,
>+ struct net *net,
>+ enum vsock_net_mode net_mode)
> {
> struct vsock_sock *vsk;
>
> list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
>- if (vsock_addr_equals_addr(addr, &vsk->local_addr))
>- return sk_vsock(vsk);
>+ struct sock *sk = sk_vsock(vsk);
>+
>+ if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
>+ vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode))
>+ return sk;
>
> if (addr->svm_port == vsk->local_addr.svm_port &&
> (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
>- addr->svm_cid == VMADDR_CID_ANY))
>- return sk_vsock(vsk);
>+ addr->svm_cid == VMADDR_CID_ANY) &&
>+ vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode))
>+ return sk;
> }
>
> return NULL;
> }
>
>-static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
>- struct sockaddr_vm *dst)
>+static struct sock *__vsock_find_connected_socket_net(struct sockaddr_vm *src,
>+ struct sockaddr_vm *dst,
>+ struct net *net,
>+ enum vsock_net_mode net_mode)
> {
> struct vsock_sock *vsk;
>
> list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
> connected_table) {
>+ struct sock *sk = sk_vsock(vsk);
>+
> if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
>- dst->svm_port == vsk->local_addr.svm_port) {
>- return sk_vsock(vsk);
>+ dst->svm_port == vsk->local_addr.svm_port &&
>+ vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode)) {
>+ return sk;
> }
> }
>
>@@ -304,12 +363,14 @@ void vsock_remove_connected(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(vsock_remove_connected);
>
>-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
>+struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
>+ struct net *net,
>+ enum vsock_net_mode net_mode)
> {
> struct sock *sk;
>
> spin_lock_bh(&vsock_table_lock);
>- sk = __vsock_find_bound_socket(addr);
>+ sk = __vsock_find_bound_socket_net(addr, net, net_mode);
> if (sk)
> sock_hold(sk);
>
>@@ -317,15 +378,24 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
>
> return sk;
> }
>+EXPORT_SYMBOL_GPL(vsock_find_bound_socket_net);
>+
>+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
>+{
>+ return vsock_find_bound_socket_net(addr, vsock_global_dummy_net(),
>+ VSOCK_NET_MODE_GLOBAL);
>+}
> EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
>
>-struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>- struct sockaddr_vm *dst)
>+struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
>+ struct sockaddr_vm *dst,
>+ struct net *net,
>+ enum vsock_net_mode net_mode)
> {
> struct sock *sk;
>
> spin_lock_bh(&vsock_table_lock);
>- sk = __vsock_find_connected_socket(src, dst);
>+ sk = __vsock_find_connected_socket_net(src, dst, net, net_mode);
> if (sk)
> sock_hold(sk);
>
>@@ -333,6 +403,15 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>
> return sk;
> }
>+EXPORT_SYMBOL_GPL(vsock_find_connected_socket_net);
>+
>+struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>+ struct sockaddr_vm *dst)
>+{
>+ return vsock_find_connected_socket_net(src, dst,
>+ vsock_global_dummy_net(),
>+ VSOCK_NET_MODE_GLOBAL);
>+}
> EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
>
> void vsock_remove_sock(struct vsock_sock *vsk)
>@@ -528,7 +607,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>
> if (sk->sk_type == SOCK_SEQPACKET) {
> if (!new_transport->seqpacket_allow ||
>- !new_transport->seqpacket_allow(remote_cid)) {
>+ !new_transport->seqpacket_allow(vsk, remote_cid)) {
> module_put(new_transport->module);
> return -ESOCKTNOSUPPORT;
> }
>@@ -676,6 +755,7 @@ static void vsock_pending_work(struct work_struct *work)
> static int __vsock_bind_connectible(struct vsock_sock *vsk,
> struct sockaddr_vm *addr)
> {
>+ struct net *net = sock_net(sk_vsock(vsk));
> static u32 port;
> struct sockaddr_vm new_addr;
>
>@@ -695,7 +775,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>
> new_addr.svm_port = port++;
>
>- if (!__vsock_find_bound_socket(&new_addr)) {
>+ if (!__vsock_find_bound_socket_net(&new_addr, net,
>+ vsk->net_mode)) {
> found = true;
> break;
> }
>@@ -712,7 +793,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> return -EACCES;
> }
>
>- if (__vsock_find_bound_socket(&new_addr))
>+ if (__vsock_find_bound_socket_net(&new_addr, net,
>+ vsk->net_mode))
> return -EADDRINUSE;
> }
>
>@@ -836,6 +918,8 @@ static struct sock *__vsock_create(struct net *net,
> vsk->buffer_max_size = VSOCK_DEFAULT_BUFFER_MAX_SIZE;
> }
>
>+ vsk->net_mode = vsock_net_mode(net);
>+
> return sk;
> }
>
>@@ -2636,6 +2720,142 @@ static struct miscdevice vsock_device = {
> .fops = &vsock_device_ops,
> };
>
>+static int vsock_net_mode_string(const struct ctl_table *table, int write,
>+ void *buffer, size_t *lenp, loff_t *ppos)
>+{
>+ char data[VSOCK_NET_MODE_STR_MAX] = {0};
>+ enum vsock_net_mode mode;
>+ struct ctl_table tmp;
>+ struct net *net;
>+ int ret;
>+
>+ if (!table->data || !table->maxlen || !*lenp) {
>+ *lenp = 0;
>+ return 0;
>+ }
>+
>+ net = current->nsproxy->net_ns;
>+ tmp = *table;
>+ tmp.data = data;
>+
>+ if (!write) {
>+ const char *p;
>+
>+ mode = vsock_net_mode(net);
>+
>+ switch (mode) {
>+ case VSOCK_NET_MODE_GLOBAL:
>+ p = VSOCK_NET_MODE_STR_GLOBAL;
>+ break;
>+ case VSOCK_NET_MODE_LOCAL:
>+ p = VSOCK_NET_MODE_STR_LOCAL;
>+ break;
>+ default:
>+ WARN_ONCE(true, "netns has invalid vsock mode");
>+ *lenp = 0;
>+ return 0;
>+ }
>+
>+ strscpy(data, p, sizeof(data));
>+ tmp.maxlen = strlen(p);
>+ }
>+
>+ ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
>+ if (ret)
>+ return ret;
>+
>+ if (write) {
Do we need to check some capability, e.g. CAP_NET_ADMIN?
The rest LGTM!
Stefano
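
For reference, the write path of vsock_net_mode_string() reduces to "parse the string, then write once". The sketch below models only that, in single-threaded userspace code with no locking and no capability check. It assumes the input string has already been trimmed of any trailing newline; the demo_ names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

enum demo_net_mode {
	DEMO_NET_MODE_GLOBAL,
	DEMO_NET_MODE_LOCAL,
};

/* Hypothetical per-namespace state; locking is deliberately left out. */
struct demo_netns_vsock {
	enum demo_net_mode mode;
	bool mode_locked;
};

/* Returns 0 on success, -EINVAL for an unknown mode string, and -EPERM if
 * the mode has already been written once.
 */
static inline int demo_ns_mode_write(struct demo_netns_vsock *ns,
				     const char *str)
{
	enum demo_net_mode mode;

	assert(ns && str);

	if (!strcmp(str, "global"))
		mode = DEMO_NET_MODE_GLOBAL;
	else if (!strcmp(str, "local"))
		mode = DEMO_NET_MODE_LOCAL;
	else
		return -EINVAL;

	if (ns->mode_locked)
		return -EPERM;

	ns->mode = mode;
	ns->mode_locked = true;
	return 0;
}
```

As in the patch, a rejected string does not consume the single allowed write, while any accepted write (including an explicit "global") locks the mode.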
>+ if (*lenp >= sizeof(data))
>+ return -EINVAL;
>+
>+ if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
>+ mode = VSOCK_NET_MODE_GLOBAL;
>+ else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
>+ mode = VSOCK_NET_MODE_LOCAL;
>+ else
>+ return -EINVAL;
>+
>+ if (!vsock_net_write_mode(net, mode))
>+ return -EPERM;
>+ }
>+
>+ return 0;
>+}
>+
>+static struct ctl_table vsock_table[] = {
>+ {
>+ .procname = "ns_mode",
>+ .data = &init_net.vsock.mode,
>+ .maxlen = VSOCK_NET_MODE_STR_MAX,
>+ .mode = 0644,
>+ .proc_handler = vsock_net_mode_string
>+ },
>+};
>+
>+static int __net_init vsock_sysctl_register(struct net *net)
>+{
>+ struct ctl_table *table;
>+
>+ if (net_eq(net, &init_net)) {
>+ table = vsock_table;
>+ } else {
>+ table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL);
>+ if (!table)
>+ goto err_alloc;
>+
>+ table[0].data = &net->vsock.mode;
>+ }
>+
>+ net->vsock.sysctl_hdr = register_net_sysctl_sz(net, "net/vsock", table,
>+ ARRAY_SIZE(vsock_table));
>+ if (!net->vsock.sysctl_hdr)
>+ goto err_reg;
>+
>+ return 0;
>+
>+err_reg:
>+ if (!net_eq(net, &init_net))
>+ kfree(table);
>+err_alloc:
>+ return -ENOMEM;
>+}
>+
>+static void vsock_sysctl_unregister(struct net *net)
>+{
>+ const struct ctl_table *table;
>+
>+ table = net->vsock.sysctl_hdr->ctl_table_arg;
>+ unregister_net_sysctl_table(net->vsock.sysctl_hdr);
>+ if (!net_eq(net, &init_net))
>+ kfree(table);
>+}
>+
>+static void vsock_net_init(struct net *net)
>+{
>+ spin_lock_init(&net->vsock.lock);
>+ net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
>+}
>+
>+static __net_init int vsock_sysctl_init_net(struct net *net)
>+{
>+ vsock_net_init(net);
>+
>+ if (vsock_sysctl_register(net))
>+ return -ENOMEM;
>+
>+ return 0;
>+}
>+
>+static __net_exit void vsock_sysctl_exit_net(struct net *net)
>+{
>+ vsock_sysctl_unregister(net);
>+}
>+
>+static struct pernet_operations vsock_sysctl_ops __net_initdata = {
>+ .init = vsock_sysctl_init_net,
>+ .exit = vsock_sysctl_exit_net,
>+};
>+
> static int __init vsock_init(void)
> {
> int err = 0;
>@@ -2663,10 +2883,19 @@ static int __init vsock_init(void)
> goto err_unregister_proto;
> }
>
>+ if (register_pernet_subsys(&vsock_sysctl_ops)) {
>+ err = -ENOMEM;
>+ goto err_unregister_sock;
>+ }
>+
>+ vsock_net_init(&init_net);
>+ vsock_net_init(vsock_global_dummy_net());
> vsock_bpf_build_proto();
>
> return 0;
>
>+err_unregister_sock:
>+ sock_unregister(AF_VSOCK);
> err_unregister_proto:
> proto_unregister(&vsock_proto);
> err_deregister_misc:
>@@ -2680,6 +2909,7 @@ static void __exit vsock_exit(void)
> misc_deregister(&vsock_device);
> sock_unregister(AF_VSOCK);
> proto_unregister(&vsock_proto);
>+ unregister_pernet_subsys(&vsock_sysctl_ops);
> }
>
> const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk)
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 8c867023a2e5..6abec6b9b5bc 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -316,11 +316,10 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> vq = vsock->vqs[VSOCK_VQ_RX];
>
> do {
>- skb = virtio_vsock_alloc_linear_skb(total_len, GFP_KERNEL);
>+ skb = virtio_vsock_alloc_rx_skb(total_len, GFP_KERNEL);
> if (!skb)
> break;
>
>- memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
> sg_init_one(&pkt, virtio_vsock_hdr(skb), total_len);
> p = &pkt;
> ret = virtqueue_add_sgs(vq, &p, 0, 1, skb, GFP_KERNEL);
>@@ -536,7 +535,7 @@ static bool virtio_transport_msgzerocopy_allow(void)
> return true;
> }
>
>-static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
> .transport = {
>@@ -593,7 +592,7 @@ static struct virtio_transport virtio_transport = {
> .can_msgzerocopy = virtio_transport_can_msgzerocopy,
> };
>
>-static bool virtio_transport_seqpacket_allow(u32 remote_cid)
>+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> {
> struct virtio_vsock *vsock;
> bool seqpacket_allow;
>diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>index bc2ff918b315..a8f218f0c5a3 100644
>--- a/net/vmw_vsock/vsock_loopback.c
>+++ b/net/vmw_vsock/vsock_loopback.c
>@@ -46,7 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
> return 0;
> }
>
>-static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
>+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
> static bool vsock_loopback_msgzerocopy_allow(void)
> {
> return true;
>@@ -106,7 +106,7 @@ static struct virtio_transport loopback_transport = {
> .send_pkt = vsock_loopback_send_pkt,
> };
>
>-static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
>+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> {
> return true;
> }
>
>--
>2.47.3
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next v8 05/14] vsock/loopback: add netns support
2025-10-23 18:27 ` [PATCH net-next v8 05/14] vsock/loopback: add netns support Bobby Eshleman
@ 2025-11-06 16:18 ` Stefano Garzarella
2025-11-07 2:17 ` Bobby Eshleman
0 siblings, 1 reply; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-06 16:18 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Oct 23, 2025 at 11:27:44AM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Add NS support to vsock loopback. Sockets in a global mode netns
>communicate with each other, regardless of namespace. Sockets in a local
>mode netns may only communicate with other sockets within the same
>namespace.
>
>Use pernet_ops to install a vsock_loopback for every namespace that is
>created (to be used if local mode is enabled).
>
>Retroactively call init/exit on every namespace when the vsock_loopback
>module is loaded in order to initialize the per-ns device.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
I'm a bit confused: should we move this after the next patch, which adds
netns support to the virtio common module?
Or is this a prerequisite?
>Changes in v7:
>- drop for_each_net() init/exit, drop net_rwsem, the pernet registration
> handles this automatically and race-free
>- flush workqueue before destruction, purge pkt list
>- remember net_mode instead of current net mode
>- keep space after INIT_WORK()
>- change vsock_loopback in netns_vsock to ->priv void ptr
>- rename `orig_net_mode` to `net_mode`
>- remove useless comment
>- protect `register_pernet_subsys()` with `net_rwsem`
>- do cleanup before releasing `net_rwsem` when failure happens
>- call `unregister_pernet_subsys()` in `vsock_loopback_exit()`
>- call `vsock_loopback_deinit_vsock()` in `vsock_loopback_exit()`
>
>Changes in v6:
>- init pernet ops for vsock_loopback module
>- vsock_loopback: add space in struct to clarify lock protection
>- do proper cleanup/unregister on vsock_loopback_exit()
>- vsock_loopback: use virtio_vsock_skb_net()
>
>Changes in v5:
>- add callbacks code to avoid reverse dependency
>- add logic for handling vsock_loopback setup for already existing
> namespaces
>---
> include/net/netns/vsock.h | 2 +
> net/vmw_vsock/vsock_loopback.c | 85 ++++++++++++++++++++++++++++++++++++------
> 2 files changed, 75 insertions(+), 12 deletions(-)
>
>diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>index c9a438ad52f2..9d0d8e2fbc37 100644
>--- a/include/net/netns/vsock.h
>+++ b/include/net/netns/vsock.h
>@@ -16,5 +16,7 @@ struct netns_vsock {
> /* protected by lock */
> enum vsock_net_mode mode;
> bool mode_locked;
>+
>+ void *priv;
> };
> #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>index a8f218f0c5a3..474083d4cfcb 100644
>--- a/net/vmw_vsock/vsock_loopback.c
>+++ b/net/vmw_vsock/vsock_loopback.c
>@@ -28,8 +28,16 @@ static u32 vsock_loopback_get_local_cid(void)
>
> static int vsock_loopback_send_pkt(struct sk_buff *skb)
> {
>- struct vsock_loopback *vsock = &the_vsock_loopback;
>+ struct vsock_loopback *vsock;
> int len = skb->len;
>+ struct net *net;
>+
>+ net = virtio_vsock_skb_net(skb);
>+
>+ if (virtio_vsock_skb_net_mode(skb) == VSOCK_NET_MODE_LOCAL)
>+ vsock = (struct vsock_loopback *)net->vsock.priv;
Is there some kind of refcount on the net?
What I mean is, are we sure this pointer is still valid? Could the net
disappear in the meantime?
The rest LGTM!
Thanks,
Stefano
>+ else
>+ vsock = &the_vsock_loopback;
>
> virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
> queue_work(vsock->workqueue, &vsock->pkt_work);
>@@ -134,11 +142,8 @@ static void vsock_loopback_work(struct work_struct *work)
> }
> }
>
>-static int __init vsock_loopback_init(void)
>+static int vsock_loopback_init_vsock(struct vsock_loopback *vsock)
> {
>- struct vsock_loopback *vsock = &the_vsock_loopback;
>- int ret;
>-
> vsock->workqueue = alloc_workqueue("vsock-loopback", WQ_PERCPU, 0);
> if (!vsock->workqueue)
> return -ENOMEM;
>@@ -146,15 +151,73 @@ static int __init vsock_loopback_init(void)
> skb_queue_head_init(&vsock->pkt_queue);
> INIT_WORK(&vsock->pkt_work, vsock_loopback_work);
>
>+ return 0;
>+}
>+
>+static void vsock_loopback_deinit_vsock(struct vsock_loopback *vsock)
>+{
>+ if (vsock->workqueue) {
>+ flush_work(&vsock->pkt_work);
>+ virtio_vsock_skb_queue_purge(&vsock->pkt_queue);
>+ destroy_workqueue(vsock->workqueue);
>+ vsock->workqueue = NULL;
>+ }
>+}
>+
>+static int vsock_loopback_init_net(struct net *net)
>+{
>+ int ret;
>+
>+ net->vsock.priv = kzalloc(sizeof(struct vsock_loopback), GFP_KERNEL);
>+ if (!net->vsock.priv)
>+ return -ENOMEM;
>+
>+ ret = vsock_loopback_init_vsock((struct vsock_loopback *)net->vsock.priv);
>+ if (ret < 0) {
>+ kfree(net->vsock.priv);
>+ net->vsock.priv = NULL;
>+ return ret;
>+ }
>+
>+ return 0;
>+}
>+
>+static void vsock_loopback_exit_net(struct net *net)
>+{
>+ vsock_loopback_deinit_vsock(net->vsock.priv);
>+ kfree(net->vsock.priv);
>+ net->vsock.priv = NULL;
>+}
>+
>+static struct pernet_operations vsock_loopback_net_ops = {
>+ .init = vsock_loopback_init_net,
>+ .exit = vsock_loopback_exit_net,
>+};
>+
>+static int __init vsock_loopback_init(void)
>+{
>+ struct vsock_loopback *vsock = &the_vsock_loopback;
>+ int ret;
>+
>+ ret = vsock_loopback_init_vsock(vsock);
>+ if (ret < 0)
>+ return ret;
>+
>+ ret = register_pernet_subsys(&vsock_loopback_net_ops);
>+ if (ret < 0)
>+ goto out_deinit_vsock;
>+
> ret = vsock_core_register(&loopback_transport.transport,
> VSOCK_TRANSPORT_F_LOCAL);
> if (ret)
>- goto out_wq;
>+ goto out_unregister_pernet_subsys;
>
> return 0;
>
>-out_wq:
>- destroy_workqueue(vsock->workqueue);
>+out_unregister_pernet_subsys:
>+ unregister_pernet_subsys(&vsock_loopback_net_ops);
>+out_deinit_vsock:
>+ vsock_loopback_deinit_vsock(vsock);
> return ret;
> }
>
>@@ -164,11 +227,9 @@ static void __exit vsock_loopback_exit(void)
>
> vsock_core_unregister(&loopback_transport.transport);
>
>- flush_work(&vsock->pkt_work);
>-
>- virtio_vsock_skb_queue_purge(&vsock->pkt_queue);
>+ unregister_pernet_subsys(&vsock_loopback_net_ops);
>
>- destroy_workqueue(vsock->workqueue);
>+ vsock_loopback_deinit_vsock(vsock);
> }
>
> module_init(vsock_loopback_init);
>
>--
>2.47.3
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common
2025-10-23 18:27 ` [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common Bobby Eshleman
@ 2025-11-06 16:20 ` Stefano Garzarella
2025-11-07 2:52 ` Bobby Eshleman
0 siblings, 1 reply; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-06 16:20 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Oct 23, 2025 at 11:27:45AM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Enable network namespace support in the virtio-vsock common transport
>layer by declaring namespace pointers in the transmit and receive
>paths.
>
>The changes include:
>1. Add a 'net' field to virtio_vsock_pkt_info to carry the namespace
> pointer for outgoing packets.
>2. Store the namespace and namespace mode in the skb control buffer when
> allocating packets (except for VIRTIO_VSOCK_OP_RST packets which do
> not have an associated socket).
>3. Retrieve namespace information from skbs on the receive path for
> lookups using vsock_find_connected_socket_net() and
> vsock_find_bound_socket_net().
>
>This allows users of virtio transport common code
>(vhost-vsock/virtio-vsock) to later enable namespace support.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
>Changes in v7:
>- add comment explaining the !vsk case in virtio_transport_alloc_skb()
>---
> include/linux/virtio_vsock.h | 1 +
> net/vmw_vsock/virtio_transport_common.c | 21 +++++++++++++++++++--
> 2 files changed, 20 insertions(+), 2 deletions(-)
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 29290395054c..f90646f82993 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -217,6 +217,7 @@ struct virtio_vsock_pkt_info {
> u32 remote_cid, remote_port;
> struct vsock_sock *vsk;
> struct msghdr *msg;
>+ struct net *net;
> u32 pkt_len;
> u16 type;
> u16 op;
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index dcc8a1d5851e..b8e52c71920a 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -316,6 +316,15 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
> info->flags,
> zcopy);
>
>+ /*
>+ * If there is no corresponding socket, then we don't have a
>+ * corresponding namespace. This only happens for VIRTIO_VSOCK_OP_RST.
>+ */
So, in virtio_transport_recv_pkt() should we check that `net` is not
set?
Should we set it to NULL here?
>+ if (vsk) {
>+ virtio_vsock_skb_set_net(skb, info->net);
Ditto here about the net refcnt: can the net disappear?
Should we use get_net() in some way, or will the socket prevent that?
>+ virtio_vsock_skb_set_net_mode(skb, vsk->net_mode);
>+ }
>+
> return skb;
> out:
> kfree_skb(skb);
>@@ -527,6 +536,7 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk)
> struct virtio_vsock_pkt_info info = {
> .op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
> .vsk = vsk,
>+ .net = sock_net(sk_vsock(vsk)),
> };
>
> return virtio_transport_send_pkt_info(vsk, &info);
>@@ -1067,6 +1077,7 @@ int virtio_transport_connect(struct vsock_sock *vsk)
> struct virtio_vsock_pkt_info info = {
> .op = VIRTIO_VSOCK_OP_REQUEST,
> .vsk = vsk,
>+ .net = sock_net(sk_vsock(vsk)),
> };
>
> return virtio_transport_send_pkt_info(vsk, &info);
>@@ -1082,6 +1093,7 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
> (mode & SEND_SHUTDOWN ?
> VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
> .vsk = vsk,
>+ .net = sock_net(sk_vsock(vsk)),
> };
>
> return virtio_transport_send_pkt_info(vsk, &info);
>@@ -1108,6 +1120,7 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
> .msg = msg,
> .pkt_len = len,
> .vsk = vsk,
>+ .net = sock_net(sk_vsock(vsk)),
> };
>
> return virtio_transport_send_pkt_info(vsk, &info);
>@@ -1145,6 +1158,7 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
> .op = VIRTIO_VSOCK_OP_RST,
> .reply = !!skb,
> .vsk = vsk,
>+ .net = sock_net(sk_vsock(vsk)),
> };
>
> /* Send RST only if the original pkt is not a RST pkt */
>@@ -1465,6 +1479,7 @@ virtio_transport_send_response(struct vsock_sock *vsk,
> .remote_port = le32_to_cpu(hdr->src_port),
> .reply = true,
> .vsk = vsk,
>+ .net = sock_net(sk_vsock(vsk)),
> };
>
> return virtio_transport_send_pkt_info(vsk, &info);
>@@ -1578,7 +1593,9 @@ static bool virtio_transport_valid_type(u16 type)
> void virtio_transport_recv_pkt(struct virtio_transport *t,
> struct sk_buff *skb)
> {
>+ enum vsock_net_mode net_mode = virtio_vsock_skb_net_mode(skb);
> struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
>+ struct net *net = virtio_vsock_skb_net(skb);
Okay, so this is where the skb net is read. Why, then, do we touch the
virtio-vsock driver (virtio_transport.c) in the other patch, where we
otherwise changed just af_vsock.c?
IMO we should move that change here, or into a separate commit.
Or maybe I missed some dependency :-)
Thanks,
Stefano
> struct sockaddr_vm src, dst;
> struct vsock_sock *vsk;
> struct sock *sk;
>@@ -1606,9 +1623,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> /* The socket must be in connected or bound table
> * otherwise send reset back
> */
>- sk = vsock_find_connected_socket(&src, &dst);
>+ sk = vsock_find_connected_socket_net(&src, &dst, net, net_mode);
> if (!sk) {
>- sk = vsock_find_bound_socket(&dst);
>+ sk = vsock_find_bound_socket_net(&dst, net, net_mode);
> if (!sk) {
> (void)virtio_transport_reset_no_sock(t, skb);
> goto free_pkt;
>
>--
>2.47.3
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next v8 07/14] vhost/vsock: add netns support
2025-10-23 18:27 ` [PATCH net-next v8 07/14] vhost/vsock: add netns support Bobby Eshleman
@ 2025-11-06 16:21 ` Stefano Garzarella
2025-11-07 3:07 ` Bobby Eshleman
0 siblings, 1 reply; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-06 16:21 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Oct 23, 2025 at 11:27:46AM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Add the ability to isolate vhost-vsock flows using namespaces.
>
>The VM, via the vhost_vsock struct, inherits its namespace from the
>process that opens the vhost-vsock device. vhost_vsock lookup functions
>are modified to take into account the mode (e.g., if CIDs are matching
>but modes don't align, then return NULL).
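As a rough illustration of the lookup rule described above, here is a
userspace model, not the kernel code. The predicate is an assumption inferred
from the series description (two endpoints match if they share a namespace,
or if both are in global mode); `net_model`, `vhost_vsock_model`,
`model_check_mode()` and `model_vsock_get()` are hypothetical names that only
mirror the shape of vhost_vsock_get() in this patch.

```c
#include <stdbool.h>
#include <stddef.h>

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };

/* Hypothetical stand-in for struct net; identity is the pointer value. */
struct net_model { int id; };

/* Hypothetical stand-in for the fields of struct vhost_vsock used in lookup. */
struct vhost_vsock_model {
	unsigned int guest_cid;
	const struct net_model *net;
	enum vsock_net_mode net_mode;	/* mode cached at open() time */
};

/* Assumed rule: same namespace, or both sides in global mode. */
static bool model_check_mode(const struct net_model *n1, enum vsock_net_mode m1,
			     const struct net_model *n2, enum vsock_net_mode m2)
{
	return n1 == n2 ||
	       (m1 == VSOCK_NET_MODE_GLOBAL && m2 == VSOCK_NET_MODE_GLOBAL);
}

/* Return the first entry whose CID matches and whose namespace/mode align. */
static const struct vhost_vsock_model *
model_vsock_get(const struct vhost_vsock_model *tbl, size_t n,
		unsigned int cid, const struct net_model *net,
		enum vsock_net_mode mode)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].guest_cid == 0)	/* no CID assigned yet */
			continue;

		if (tbl[i].guest_cid == cid &&
		    model_check_mode(net, mode, tbl[i].net, tbl[i].net_mode))
			return &tbl[i];
	}

	return NULL;
}
```

Under this assumed rule, a matching CID alone is not enough: a caller in a
local-mode namespace only ever finds a VM registered in that same namespace.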
>
>vhost_vsock now acquires a reference to the namespace.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
>Changes in v7:
>- remove the check_global flag of vhost_vsock_get(), that logic was both
> wrong and not necessary, reuse vsock_net_check_mode() instead
>- remove 'delete me' comment
>Changes in v5:
>- respect pid namespaces when assigning namespace to vhost_vsock
>---
> drivers/vhost/vsock.c | 44 ++++++++++++++++++++++++++++++++++----------
> 1 file changed, 34 insertions(+), 10 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 34adf0cf9124..df6136633cd8 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -46,6 +46,11 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
> struct vhost_vsock {
> struct vhost_dev dev;
> struct vhost_virtqueue vqs[2];
>+ struct net *net;
>+ netns_tracker ns_tracker;
>+
>+ /* The ns mode at the time vhost_vsock was created */
>+ enum vsock_net_mode net_mode;
>
> /* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
> struct hlist_node hash;
>@@ -67,7 +72,8 @@ static u32 vhost_transport_get_local_cid(void)
> /* Callers that dereference the return value must hold vhost_vsock_mutex or the
> * RCU read lock.
> */
>-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net,
>+ enum vsock_net_mode mode)
> {
> struct vhost_vsock *vsock;
>
>@@ -78,9 +84,9 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
> if (other_cid == 0)
> continue;
>
>- if (other_cid == guest_cid)
>+ if (other_cid == guest_cid &&
>+ vsock_net_check_mode(net, mode, vsock->net, vsock->net_mode))
> return vsock;
>-
> }
>
> return NULL;
>@@ -271,14 +277,16 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
> static int
> vhost_transport_send_pkt(struct sk_buff *skb)
> {
>+ enum vsock_net_mode mode = virtio_vsock_skb_net_mode(skb);
> struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
>+ struct net *net = virtio_vsock_skb_net(skb);
> struct vhost_vsock *vsock;
> int len = skb->len;
>
> rcu_read_lock();
>
> /* Find the vhost_vsock according to guest context id */
>- vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid));
>+ vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid), net, mode);
> if (!vsock) {
> rcu_read_unlock();
> kfree_skb(skb);
>@@ -305,7 +313,8 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
> rcu_read_lock();
>
> /* Find the vhost_vsock according to guest context id */
>- vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
>+ vsock = vhost_vsock_get(vsk->remote_addr.svm_cid,
>+ sock_net(sk_vsock(vsk)), vsk->net_mode);
> if (!vsock)
> goto out;
>
>@@ -327,7 +336,7 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
> }
>
> static struct sk_buff *
>-vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
>+vhost_vsock_alloc_skb(struct vhost_vsock *vsock, struct vhost_virtqueue *vq,
> unsigned int out, unsigned int in)
> {
> struct virtio_vsock_hdr *hdr;
>@@ -353,6 +362,9 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
> if (!skb)
> return NULL;
>
>+ virtio_vsock_skb_set_net(skb, vsock->net);
>+ virtio_vsock_skb_set_net_mode(skb, vsock->net_mode);
>+
> iov_iter_init(&iov_iter, ITER_SOURCE, vq->iov, out, len);
>
> hdr = virtio_vsock_hdr(skb);
>@@ -462,11 +474,12 @@ static struct virtio_transport vhost_transport = {
>
> static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> {
>+ struct net *net = sock_net(sk_vsock(vsk));
> struct vhost_vsock *vsock;
> bool seqpacket_allow = false;
>
> rcu_read_lock();
>- vsock = vhost_vsock_get(remote_cid);
>+ vsock = vhost_vsock_get(remote_cid, net, vsk->net_mode);
>
> if (vsock)
> seqpacket_allow = vsock->seqpacket_allow;
>@@ -520,7 +533,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> break;
> }
>
>- skb = vhost_vsock_alloc_skb(vq, out, in);
>+ skb = vhost_vsock_alloc_skb(vsock, vq, out, in);
> if (!skb) {
> vq_err(vq, "Faulted on pkt\n");
> continue;
>@@ -652,8 +665,10 @@ static void vhost_vsock_free(struct vhost_vsock *vsock)
>
> static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> {
>+
> struct vhost_virtqueue **vqs;
> struct vhost_vsock *vsock;
>+ struct net *net;
> int ret;
>
> /* This struct is large and allocation could fail, fall back to vmalloc
>@@ -669,6 +684,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> goto out;
> }
>
>+ net = current->nsproxy->net_ns;
>+ vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
>+
>+ /* Cache the mode of the namespace so that if that netns mode changes,
>+ * the vhost_vsock will continue to function as expected.
>+ */
I think we should document this in the commit description too, and in
both places we should also give the reason. (IIRC, it was to simplify
everything: it prevents a VM from changing modes while running, which
would otherwise require tracking all of its packets.)
>+ vsock->net_mode = vsock_net_mode(net);
>+
> vsock->guest_cid = 0; /* no CID assigned yet */
> vsock->seqpacket_allow = false;
>
>@@ -708,7 +731,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
> */
>
> /* If the peer is still valid, no need to reset connection */
>- if (vhost_vsock_get(vsk->remote_addr.svm_cid))
>+ if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk), vsk->net_mode))
> return;
>
> /* If the close timeout is pending, let it expire. This avoids races
>@@ -753,6 +776,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
> virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue);
>
> vhost_dev_cleanup(&vsock->dev);
>+ put_net_track(vsock->net, &vsock->ns_tracker);
Doing this after virtio_vsock_skb_queue_purge() should ensure that all
skbs have been drained, so no skb should still be in flight with this
netns. Perhaps this clarifies my doubts about the skb net, but should we
do something similar for loopback as well?
And maybe we should also document that in the virtio_vsock_skb_cb.
The rest LGTM.
Thanks,
Stefano
> kfree(vsock->dev.vqs);
> vhost_vsock_free(vsock);
> return 0;
>@@ -779,7 +803,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
>
> /* Refuse if CID is already in use */
> mutex_lock(&vhost_vsock_mutex);
>- other = vhost_vsock_get(guest_cid);
>+ other = vhost_vsock_get(guest_cid, vsock->net, vsock->net_mode);
> if (other && other != vsock) {
> mutex_unlock(&vhost_vsock_mutex);
> return -EADDRINUSE;
>
>--
>2.47.3
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock
2025-10-27 17:25 ` Bobby Eshleman
@ 2025-11-06 16:23 ` Stefano Garzarella
2025-11-07 1:00 ` Bobby Eshleman
0 siblings, 1 reply; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-06 16:23 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Mon, Oct 27, 2025 at 10:25:29AM -0700, Bobby Eshleman wrote:
>On Mon, Oct 27, 2025 at 02:28:31PM +0100, Stefano Garzarella wrote:
>> Hi Bobby,
>>
>> >
>> > Changes in v8:
>> > - Break generic cleanup/refactoring patches into standalone series,
>> > remove those from this series
>>
>> Yep, thanks for splitting the series. I'll review it ASAP since it's a
>> dependency.
>>
>> I was at the GSoC mentor summit last week, so I'm a bit busy with the backlog, but
>> I'll do my best to review both series this week.
>>
>> Thanks,
>> Stefano
>>
>
>Thanks for the heads up!
I just reviewed the code changes. I skipped the selftest, since we are
still discussing the other series (indeed I can't apply this anymore on
top of that), so I'll check the rest later.
Thanks for the great work!
Stefano
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock
2025-11-06 16:23 ` Stefano Garzarella
@ 2025-11-07 1:00 ` Bobby Eshleman
0 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-11-07 1:00 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Nov 06, 2025 at 05:23:53PM +0100, Stefano Garzarella wrote:
> On Mon, Oct 27, 2025 at 10:25:29AM -0700, Bobby Eshleman wrote:
> > On Mon, Oct 27, 2025 at 02:28:31PM +0100, Stefano Garzarella wrote:
[...]
>
> I just reviewed the code changes. I skipped the selftest, since we are still
> discussing the other series (indeed I can't apply this anymore on top of
> that), so I'll check the rest later.
>
> Thanks for the great work!
>
> Stefano
>
I appreciate it! Thanks again for the work on your side reviewing.
I'll address your feedback and rebase onto that other series shortly.
Best,
Bobby
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next v8 01/14] vsock: a per-net vsock NS mode state
2025-11-06 16:16 ` Stefano Garzarella
@ 2025-11-07 1:09 ` Bobby Eshleman
0 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-11-07 1:09 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Nov 06, 2025 at 05:16:29PM +0100, Stefano Garzarella wrote:
> On Thu, Oct 23, 2025 at 11:27:40AM -0700, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
[...]
> > @@ -65,6 +66,7 @@ struct vsock_sock {
> > u32 peer_shutdown;
> > bool sent_request;
> > bool ignore_connecting_rst;
> > + enum vsock_net_mode net_mode;
> >
> > /* Protected by lock_sock(sk) */
> > u64 buffer_size;
> > @@ -256,4 +258,58 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
> > {
> > return t->msgzerocopy_allow && t->msgzerocopy_allow();
> > }
> > +
> > +static inline enum vsock_net_mode vsock_net_mode(struct net *net)
> > +{
> > + enum vsock_net_mode ret;
> > +
> > + spin_lock_bh(&net->vsock.lock);
> > + ret = net->vsock.mode;
>
> Do we really need a spin_lock just to set/get a variable?
> What about WRITE_ONCE/READ_ONCE and/or atomic ?
>
> Not a strong opinion, just to check if we can do something like this:
>
> static inline enum vsock_net_mode vsock_net_mode(struct net *net)
> {
> return READ_ONCE(net->vsock.mode);
> }
>
> static inline bool vsock_net_write_mode(struct net *net, u8 mode)
> {
> // Or using test_and_set_bit() if you prefer
> if (xchg(&net->vsock.mode_locked, true))
> return false;
>
> WRITE_ONCE(net->vsock.mode, mode);
> return true;
> }
>
I think that works and seems worth it to avoid the lock on the read
side. I'll move this over for the next rev.
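For reference, a minimal userspace sketch of the lockless write-once pattern
suggested above. It uses C11 atomics as stand-ins for the kernel's
READ_ONCE()/WRITE_ONCE()/xchg(), so it is an analogue rather than the code for
the next revision; `netns_vsock_model` and the `model_*` helpers are
hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };

/* Hypothetical userspace stand-in for the per-netns vsock state. */
struct netns_vsock_model {
	_Atomic int mode;
	atomic_bool mode_locked;
};

/* Namespaces start in global mode with the mode still writable. */
static inline void model_net_init(struct netns_vsock_model *ns)
{
	atomic_init(&ns->mode, VSOCK_NET_MODE_GLOBAL);
	atomic_init(&ns->mode_locked, false);
}

/* Lock-free read, playing the role of READ_ONCE(net->vsock.mode). */
static inline enum vsock_net_mode model_net_mode(struct netns_vsock_model *ns)
{
	return (enum vsock_net_mode)atomic_load_explicit(&ns->mode,
							 memory_order_relaxed);
}

/*
 * Write-once update: the first caller to flip mode_locked wins and stores
 * the mode; every later caller sees the flag already set and is refused.
 */
static inline bool model_net_write_mode(struct netns_vsock_model *ns,
					enum vsock_net_mode mode)
{
	if (atomic_exchange(&ns->mode_locked, true))
		return false;

	atomic_store_explicit(&ns->mode, mode, memory_order_relaxed);
	return true;
}
```

One property of this shape worth keeping in mind: between the exchange and the
store, a concurrent reader can observe the flag as locked while still reading
the old mode. Whether that window matters depends on how callers use the two
fields together.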
[...]
Best,
Bobby
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next v8 04/14] vsock: add netns to vsock core
2025-11-06 16:18 ` Stefano Garzarella
@ 2025-11-07 2:03 ` Bobby Eshleman
2025-11-07 13:53 ` Stefano Garzarella
0 siblings, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-11-07 2:03 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Nov 06, 2025 at 05:18:00PM +0100, Stefano Garzarella wrote:
> On Thu, Oct 23, 2025 at 11:27:43AM -0700, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> >
> > Add netns logic to vsock core. Additionally, modify transport hook
> > prototypes to be used by later transport-specific patches (e.g.,
> > *_seqpacket_allow()).
> >
> > Namespaces are supported primarily by changing socket lookup functions
> > (e.g., vsock_find_connected_socket()) to take into account the socket
> > namespace and the namespace mode before considering a candidate socket a
> > "match".
> >
> > Introduce a dummy namespace struct, __vsock_global_dummy_net, to be
> > used by transports that do not support namespacing. This dummy always
> > has mode "global" to preserve previous CID behavior.
> >
> > This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
> > accepts the "global" or "local" mode strings.
> >
> > The transports (besides vhost) are modified to use the global dummy,
> > which makes them behave as if always in the global namespace. Vhost is
> > an exception because it inherits its namespace from the process that
> > opens the vhost device.
> >
> > Add netns functionality (initialization, passing to transports, procfs,
> > etc...) to the af_vsock socket layer. Later patches that add netns
> > support to transports depend on this patch.
> >
> > seqpacket_allow() callbacks are modified to take a vsk so that transport
> > implementations can inspect sock_net(sk) and vsk->net_mode when performing
> > lookups (e.g., vhost does this in its future netns patch). Because the
> > API change affects all transports, it seemed more appropriate to make
> > this internal API change in the "vsock core" patch than in the "vhost"
> > patch.
> >
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > Changes in v7:
> > - hv_sock: fix hyperv build error
> > - explain why vhost does not use the dummy
> > - explain usage of __vsock_global_dummy_net
> > - explain why VSOCK_NET_MODE_STR_MAX is 8 characters
> > - use switch-case in vsock_net_mode_string()
> > - avoid changing transports as much as possible
> > - add vsock_find_{bound,connected}_socket_net()
> > - rename `vsock_hdr` to `sysctl_hdr`
> > - add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
> > global mode for virtio-vsock, move skb->cb zero-ing into wrapper
> > - explain seqpacket_allow() change
> > - move net setting to __vsock_create() instead of vsock_create() so
> > that child sockets also have their net assigned upon accept()
> >
> > Changes in v6:
> > - unregister sysctl ops in vsock_exit()
> > - af_vsock: clarify description of CID behavior
> > - af_vsock: fix buf vs buffer naming, and length checking
> > - af_vsock: fix length checking w/ correct ctl_table->maxlen
> >
> > Changes in v5:
> > - vsock_global_net() -> vsock_global_dummy_net()
> > - update comments for new uAPI
> > - use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
> > - add prototype changes so patch remains compilable
> > ---
> > drivers/vhost/vsock.c | 4 +-
> > include/linux/virtio_vsock.h | 21 ++++
> > include/net/af_vsock.h | 14 ++-
> > net/vmw_vsock/af_vsock.c | 264 ++++++++++++++++++++++++++++++++++++---
> > net/vmw_vsock/virtio_transport.c | 7 +-
> > net/vmw_vsock/vsock_loopback.c | 4 +-
> > 6 files changed, 288 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index ae01457ea2cd..34adf0cf9124 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
> > return true;
> > }
> >
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
> >
> > static struct virtio_transport vhost_transport = {
> > .transport = {
> > @@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
> > .send_pkt = vhost_transport_send_pkt,
> > };
> >
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> > {
> > struct vhost_vsock *vsock;
> > bool seqpacket_allow = false;
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 7f334a32133c..29290395054c 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -153,6 +153,27 @@ static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
> > VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
> > }
> >
> > +static inline struct sk_buff *
> > +virtio_vsock_alloc_rx_skb(unsigned int size, gfp_t mask)
> > +{
> > + struct sk_buff *skb;
> > +
> > + skb = virtio_vsock_alloc_linear_skb(size, mask);
> > + if (!skb)
> > + return NULL;
> > +
> > + memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
> > +
> > + /* virtio-vsock does not yet support namespaces, so on receive
> > + * we force legacy namespace behavior using the global dummy net
> > + * and global net mode.
> > + */
> > + virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
> > + virtio_vsock_skb_set_net_mode(skb, VSOCK_NET_MODE_GLOBAL);
> > +
> > + return skb;
> > +}
>
> Why are we introducing this change in this patch?
>
> Where is the net of the virtio skb read?
>
Oh good point, this is a weird place for this. I'll move this to where
it is actually used.
[...]
> >
> > +static int vsock_net_mode_string(const struct ctl_table *table, int write,
> > + void *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > + char data[VSOCK_NET_MODE_STR_MAX] = {0};
> > + enum vsock_net_mode mode;
> > + struct ctl_table tmp;
> > + struct net *net;
> > + int ret;
> > +
> > + if (!table->data || !table->maxlen || !*lenp) {
> > + *lenp = 0;
> > + return 0;
> > + }
> > +
> > + net = current->nsproxy->net_ns;
> > + tmp = *table;
> > + tmp.data = data;
> > +
> > + if (!write) {
> > + const char *p;
> > +
> > + mode = vsock_net_mode(net);
> > +
> > + switch (mode) {
> > + case VSOCK_NET_MODE_GLOBAL:
> > + p = VSOCK_NET_MODE_STR_GLOBAL;
> > + break;
> > + case VSOCK_NET_MODE_LOCAL:
> > + p = VSOCK_NET_MODE_STR_LOCAL;
> > + break;
> > + default:
> > + WARN_ONCE(true, "netns has invalid vsock mode");
> > + *lenp = 0;
> > + return 0;
> > + }
> > +
> > + strscpy(data, p, sizeof(data));
> > + tmp.maxlen = strlen(p);
> > + }
> > +
> > + ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
> > + if (ret)
> > + return ret;
> > +
> > + if (write) {
>
> Do we need to check some capability, e.g. CAP_NET_ADMIN ?
>
We get that for free via the sysctl_net registration, through this path
on open (CAP_NET_ADMIN is checked in net_ctl_permissions):

	net_ctl_permissions+1
	sysctl_perm+24
	proc_sys_permission+117
	inode_permission+217
	link_path_walk+162
	path_openat+152
	do_filp_open+171
	do_sys_openat2+98
	__x64_sys_openat+69
	do_syscall_64+93

Verified with:

	cp /bin/echo /tmp/echo_netadmin
	setcap cap_net_admin+ep /tmp/echo_netadmin

(non-root user fails with regular echo, succeeds with
/tmp/echo_netadmin)
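The same check can also be driven from C rather than from a copied echo binary. This is a minimal sketch that assumes only the /proc/sys/net/vsock/ns_mode path from this series; the helper name vsock_write_ns_mode() is made up for illustration and is not part of the patches. It returns 0 on success or a negative errno, so a caller lacking CAP_NET_ADMIN would be expected to see the failure at open(2), matching the call path above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the series): write a mode string such as
 * "local" or "global" to a sysctl file. Returns 0 on success, -errno on error.
 * Any permission check done by the kernel is hit in open(2), before write(2).
 */
static int vsock_write_ns_mode(const char *path, const char *mode)
{
	size_t len;
	ssize_t n;
	int fd, err;

	assert(path && mode);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	len = strlen(mode);
	n = write(fd, mode, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* short write: treat as an I/O error */
	else
		err = 0;

	close(fd);
	return err;
}
```

Calling vsock_write_ns_mode("/proc/sys/net/vsock/ns_mode", "local") as a non-root user should return a negative value, while the same call from a binary carrying cap_net_admin (as in the setcap test) is expected to get past open(2).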
Best regards,
Bobby
* Re: [PATCH net-next v8 05/14] vsock/loopback: add netns support
2025-11-06 16:18 ` Stefano Garzarella
@ 2025-11-07 2:17 ` Bobby Eshleman
0 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-11-07 2:17 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Nov 06, 2025 at 05:18:36PM +0100, Stefano Garzarella wrote:
> On Thu, Oct 23, 2025 at 11:27:44AM -0700, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> >
> > Add NS support to vsock loopback. Sockets in a global mode netns
> > communicate with each other, regardless of namespace. Sockets in a local
> > mode netns may only communicate with other sockets within the same
> > namespace.
> >
> > Use pernet_ops to install a vsock_loopback for every namespace that is
> > created (to be used if local mode is enabled).
> >
> > Retroactively call init/exit on every namespace when the vsock_loopback
> > module is loaded in order to initialize the per-ns device.
> >
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
>
> I'm a bit confused, should we move this after the next patch that adds
> netns support to the virtio common module?
>
> Or is this a prerequisite?
>
Yes, let's do that; it does depend on the common patch.
[...]
> > #endif /* __NET_NET_NAMESPACE_VSOCK_H */
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index a8f218f0c5a3..474083d4cfcb 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -28,8 +28,16 @@ static u32 vsock_loopback_get_local_cid(void)
> >
> > static int vsock_loopback_send_pkt(struct sk_buff *skb)
> > {
> > - struct vsock_loopback *vsock = &the_vsock_loopback;
> > + struct vsock_loopback *vsock;
> > int len = skb->len;
> > + struct net *net;
> > +
> > + net = virtio_vsock_skb_net(skb);
> > +
> > + if (virtio_vsock_skb_net_mode(skb) == VSOCK_NET_MODE_LOCAL)
> > + vsock = (struct vsock_loopback *)net->vsock.priv;
>
> Is there some kind of refcount on the net?
> What I mean is, are we sure this pointer is still valid? Could the net
> disappear in the meantime?
I only considered the case of net being removed, which I think is okay
because user sockets take a net reference in sk_alloc(), and we can't
reach this point after the sock is destroyed and the reference is
released because the transport will be unassigned prior.
But... I'm now realizing there is the case of
virtio_transport_reset_no_sock() where skb net is null. I can't see why
that wouldn't be possible for loopback?
Let's handle that case to be sure...
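To make the NULL case concrete, here is a minimal user-space sketch of the instance selection in vsock_loopback_send_pkt(). The *_model types and loopback_for_skb() are stand-ins invented for illustration, and falling back to the global instance when the skb carries no net is an assumption, not something settled in this thread.

```c
#include <assert.h>
#include <stddef.h>

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };

/* Stand-ins for struct vsock_loopback and struct net (illustration only). */
struct vsock_loopback_model { int id; };
struct net_model { struct vsock_loopback_model *vsock_priv; };

/* Models the single global instance, the_vsock_loopback. */
static struct vsock_loopback_model the_vsock_loopback = { .id = 0 };

/*
 * Pick the loopback instance for an skb. ASSUMPTION: an skb without a net
 * (e.g. an RST built by virtio_transport_reset_no_sock()) falls back to the
 * global instance instead of dereferencing a NULL net.
 */
static struct vsock_loopback_model *
loopback_for_skb(const struct net_model *net, enum vsock_net_mode mode)
{
	if (mode == VSOCK_NET_MODE_LOCAL && net) {
		assert(net->vsock_priv);	/* set up by the pernet init hook */
		return net->vsock_priv;
	}

	return &the_vsock_loopback;
}
```

Whether the fallback should be the global instance or an early drop of the skb is a design choice left open here; the point is only that the net pointer is checked before it is used.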
>
> The rest LGTM!
>
> Thanks, Stefano
>
Best,
Bobby
* Re: [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common
2025-11-06 16:20 ` Stefano Garzarella
@ 2025-11-07 2:52 ` Bobby Eshleman
2025-11-07 14:30 ` Stefano Garzarella
2025-11-07 14:33 ` Bobby Eshleman
0 siblings, 2 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-11-07 2:52 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Nov 06, 2025 at 05:20:05PM +0100, Stefano Garzarella wrote:
> On Thu, Oct 23, 2025 at 11:27:45AM -0700, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> >
> > Enable network namespace support in the virtio-vsock common transport
> > layer by declaring namespace pointers in the transmit and receive
> > paths.
> >
> > The changes include:
> > 1. Add a 'net' field to virtio_vsock_pkt_info to carry the namespace
> > pointer for outgoing packets.
> > 2. Store the namespace and namespace mode in the skb control buffer when
> > allocating packets (except for VIRTIO_VSOCK_OP_RST packets which do
> > not have an associated socket).
> > 3. Retrieve namespace information from skbs on the receive path for
> > lookups using vsock_find_connected_socket_net() and
> > vsock_find_bound_socket_net().
> >
> > This allows users of virtio transport common code
> > (vhost-vsock/virtio-vsock) to later enable namespace support.
> >
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > Changes in v7:
> > - add comment explaining the !vsk case in virtio_transport_alloc_skb()
> > ---
> > include/linux/virtio_vsock.h | 1 +
> > net/vmw_vsock/virtio_transport_common.c | 21 +++++++++++++++++++--
> > 2 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 29290395054c..f90646f82993 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -217,6 +217,7 @@ struct virtio_vsock_pkt_info {
> > u32 remote_cid, remote_port;
> > struct vsock_sock *vsk;
> > struct msghdr *msg;
> > + struct net *net;
> > u32 pkt_len;
> > u16 type;
> > u16 op;
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index dcc8a1d5851e..b8e52c71920a 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -316,6 +316,15 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
> > info->flags,
> > zcopy);
> >
> > + /*
> > + * If there is no corresponding socket, then we don't have a
> > + * corresponding namespace. This only happens for VIRTIO_VSOCK_OP_RST.
> > + */
>
> So, in virtio_transport_recv_pkt() should we check that `net` is not set?
>
> Should we set it to NULL here?
>
Sounds good to me.
> > + if (vsk) {
> > + virtio_vsock_skb_set_net(skb, info->net);
>
> Ditto here about the net refcnt, can the net disappear?
> Should we use get_net() in some way, or the socket will prevent that?
>
As long as the socket has an outstanding skb it can't be destroyed and
so will have a reference to the net, that is after skb_set_owner_w() and
freeing... so I think this is okay.
But, maybe we could simplify the implied relationship between skb, sk,
and net by removing the VIRTIO_VSOCK_SKB_CB(skb)->net entirely, and only
ever referring to sock_net(skb->sk)? I remember originally having a
reason for adding it to the cb, but my hunch is that it was probably
some confusion over the !vsk case.
WDYT?
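A rough user-space model of what "only ever referring to sock_net(skb->sk)" would look like, including the !vsk case. The *_model structs and skb_net_from_sk() are hypothetical names used only for this sketch; the real types are struct sk_buff, struct sock and struct net.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration-only stand-ins for struct net, struct sock and struct sk_buff. */
struct net_model { int id; };
struct sock_model { struct net_model *net; };
struct skb_model { struct sock_model *sk; };	/* NULL when no socket owns the skb */

/*
 * Derive the namespace from the owning socket instead of a copy in the
 * control buffer. Returns NULL for skbs without a socket, which in this
 * thread is the VIRTIO_VSOCK_OP_RST case.
 */
static struct net_model *skb_net_from_sk(const struct skb_model *skb)
{
	assert(skb);
	return skb->sk ? skb->sk->net : NULL;
}
```

With this shape the net has a single owner (the socket), so the lifetime question reduces to whether the skb still holds its socket; callers have to handle the NULL return explicitly.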
[...]
> >
> > return virtio_transport_send_pkt_info(vsk, &info);
> > @@ -1578,7 +1593,9 @@ static bool virtio_transport_valid_type(u16 type)
> > void virtio_transport_recv_pkt(struct virtio_transport *t,
> > struct sk_buff *skb)
> > {
> > + enum vsock_net_mode net_mode = virtio_vsock_skb_net_mode(skb);
> > struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
> > + struct net *net = virtio_vsock_skb_net(skb);
>
> Okay, so this is where the skb net is read, so why do we touch the virtio-vsock
> driver (virtio_transport.c) in the other patch where we changed just
> af_vsock.c?
>
> IMO we should move that change here, or in a separate commit.
> Or maybe I missed some dependency :-)
>
100% agree.
> Thanks,
> Stefano
>
Thanks!
-Bobby
* Re: [PATCH net-next v8 07/14] vhost/vsock: add netns support
2025-11-06 16:21 ` Stefano Garzarella
@ 2025-11-07 3:07 ` Bobby Eshleman
0 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-11-07 3:07 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Nov 06, 2025 at 05:21:35PM +0100, Stefano Garzarella wrote:
> On Thu, Oct 23, 2025 at 11:27:46AM -0700, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> >
> > Add the ability to isolate vhost-vsock flows using namespaces.
> >
> > The VM, via the vhost_vsock struct, inherits its namespace from the
> > process that opens the vhost-vsock device. vhost_vsock lookup functions
> > are modified to take into account the mode (e.g., if CIDs are matching
> > but modes don't align, then return NULL).
> >
> > vhost_vsock now acquires a reference to the namespace.
> >
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > Changes in v7:
> > - remove the check_global flag of vhost_vsock_get(), that logic was both
> > wrong and not necessary, reuse vsock_net_check_mode() instead
> > - remove 'delete me' comment
> > Changes in v5:
> > - respect pid namespaces when assigning namespace to vhost_vsock
> > ---
> > drivers/vhost/vsock.c | 44 ++++++++++++++++++++++++++++++++++----------
> > 1 file changed, 34 insertions(+), 10 deletions(-)
[...]
> > static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> > {
> > +
> > struct vhost_virtqueue **vqs;
> > struct vhost_vsock *vsock;
> > + struct net *net;
> > int ret;
> >
> > /* This struct is large and allocation could fail, fall back to vmalloc
> > @@ -669,6 +684,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> > goto out;
> > }
> >
> > + net = current->nsproxy->net_ns;
> > + vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
> > +
> > + /* Cache the mode of the namespace so that if that netns mode changes,
> > + * the vhost_vsock will continue to function as expected.
> > + */
>
> I think we should document this in the commit description, and in both
> places we should also add the reason. (IIRC, it was to simplify everything and
> prevent a VM from changing modes while running, which would require tracking
> all its packets)
>
Sounds good!
> > + vsock->net_mode = vsock_net_mode(net);
> > +
> > vsock->guest_cid = 0; /* no CID assigned yet */
> > vsock->seqpacket_allow = false;
> >
> > @@ -708,7 +731,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
> > */
> >
> > /* If the peer is still valid, no need to reset connection */
> > - if (vhost_vsock_get(vsk->remote_addr.svm_cid))
> > + if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk), vsk->net_mode))
> > return;
> >
> > /* If the close timeout is pending, let it expire. This avoids races
> > @@ -753,6 +776,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
> > virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue);
> >
> > vhost_dev_cleanup(&vsock->dev);
> > + put_net_track(vsock->net, &vsock->ns_tracker);
>
> Doing this after virtio_vsock_skb_queue_purge() should ensure that all skbs
> have been drained, so no skb should still be in flight with this netns. Perhaps
> this clarifies my doubts about the skb net, but should we do something
> similar for loopback as well?
100% - for loopback the skb purge is done in the net exit hook, which is
called just before netns destruction. Maybe it is worth commenting that
context there too.
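The ordering argument (drain the queued skbs first, drop the netns reference last) can be sketched in user space as follows. Everything here is a simplified model: the *_model structs and model_open()/model_release() are invented names, and plain integers stand in for the skb queue and the tracked netns reference.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration-only stand-ins; refcnt models the netns reference count. */
struct net_model { int refcnt; };
struct vhost_vsock_model {
	struct net_model *net;
	int queued_skbs;	/* skbs that may still point at ->net */
};

static void model_open(struct vhost_vsock_model *v, struct net_model *net)
{
	net->refcnt++;		/* stands in for get_net_track() */
	v->net = net;
	v->queued_skbs = 0;
}

static void model_release(struct vhost_vsock_model *v)
{
	/* Purge first: the net must still be referenced while skbs exist. */
	assert(v->net && v->net->refcnt > 0);
	v->queued_skbs = 0;	/* stands in for virtio_vsock_skb_queue_purge() */

	/* Only after the queue is empty is the reference dropped. */
	v->net->refcnt--;	/* stands in for put_net_track() */
	v->net = NULL;
}
```

For loopback the same invariant would hold if the purge runs in the pernet exit hook, since that hook is called before the namespace itself goes away.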
> And maybe we should document that also in the virtio_vsock_skb_cb.
>
sgtm!
Best,
Bobby
* Re: [PATCH net-next v8 04/14] vsock: add netns to vsock core
2025-11-07 2:03 ` Bobby Eshleman
@ 2025-11-07 13:53 ` Stefano Garzarella
0 siblings, 0 replies; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-07 13:53 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Nov 06, 2025 at 06:03:10PM -0800, Bobby Eshleman wrote:
>On Thu, Nov 06, 2025 at 05:18:00PM +0100, Stefano Garzarella wrote:
>> On Thu, Oct 23, 2025 at 11:27:43AM -0700, Bobby Eshleman wrote:
>> > From: Bobby Eshleman <bobbyeshleman@meta.com>
>> >
>> > Add netns logic to vsock core. Additionally, modify transport hook
>> > prototypes to be used by later transport-specific patches (e.g.,
>> > *_seqpacket_allow()).
>> >
>> > Namespaces are supported primarily by changing socket lookup functions
>> > (e.g., vsock_find_connected_socket()) to take into account the socket
>> > namespace and the namespace mode before considering a candidate socket a
>> > "match".
>> >
>> > Introduce a dummy namespace struct, __vsock_global_dummy_net, to be
>> > used by transports that do not support namespacing. This dummy always
>> > has mode "global" to preserve previous CID behavior.
>> >
>> > This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
>> > accepts the "global" or "local" mode strings.
>> >
>> > The transports (besides vhost) are modified to use the global dummy,
>> > which makes them behave as if always in the global namespace. Vhost is
>> > an exception because it inherits its namespace from the process that
>> > opens the vhost device.
>> >
>> > Add netns functionality (initialization, passing to transports, procfs,
>> > etc...) to the af_vsock socket layer. Later patches that add netns
>> > support to transports depend on this patch.
>> >
>> > seqpacket_allow() callbacks are modified to take a vsk so that transport
>> > implementations can inspect sock_net(sk) and vsk->net_mode when performing
>> > lookups (e.g., vhost does this in its future netns patch). Because the
>> > API change affects all transports, it seemed more appropriate to make
>> > this internal API change in the "vsock core" patch than in the "vhost"
>> > patch.
>> >
>> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>> > ---
>> > Changes in v7:
>> > - hv_sock: fix hyperv build error
>> > - explain why vhost does not use the dummy
>> > - explain usage of __vsock_global_dummy_net
>> > - explain why VSOCK_NET_MODE_STR_MAX is 8 characters
>> > - use switch-case in vsock_net_mode_string()
>> > - avoid changing transports as much as possible
>> > - add vsock_find_{bound,connected}_socket_net()
>> > - rename `vsock_hdr` to `sysctl_hdr`
>> > - add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
>> > global mode for virtio-vsock, move skb->cb zero-ing into wrapper
>> > - explain seqpacket_allow() change
>> > - move net setting to __vsock_create() instead of vsock_create() so
>> > that child sockets also have their net assigned upon accept()
>> >
>> > Changes in v6:
>> > - unregister sysctl ops in vsock_exit()
>> > - af_vsock: clarify description of CID behavior
>> > - af_vsock: fix buf vs buffer naming, and length checking
>> > - af_vsock: fix length checking w/ correct ctl_table->maxlen
>> >
>> > Changes in v5:
>> > - vsock_global_net() -> vsock_global_dummy_net()
>> > - update comments for new uAPI
>> > - use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
>> > - add prototype changes so patch remains compilable
>> > ---
>> > drivers/vhost/vsock.c | 4 +-
>> > include/linux/virtio_vsock.h | 21 ++++
>> > include/net/af_vsock.h | 14 ++-
>> > net/vmw_vsock/af_vsock.c | 264 ++++++++++++++++++++++++++++++++++++---
>> > net/vmw_vsock/virtio_transport.c | 7 +-
>> > net/vmw_vsock/vsock_loopback.c | 4 +-
>> > 6 files changed, 288 insertions(+), 26 deletions(-)
>> >
>> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > index ae01457ea2cd..34adf0cf9124 100644
>> > --- a/drivers/vhost/vsock.c
>> > +++ b/drivers/vhost/vsock.c
>> > @@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
>> > return true;
>> > }
>> >
>> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
>> >
>> > static struct virtio_transport vhost_transport = {
>> > .transport = {
>> > @@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
>> > .send_pkt = vhost_transport_send_pkt,
>> > };
>> >
>> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid)
>> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
>> > {
>> > struct vhost_vsock *vsock;
>> > bool seqpacket_allow = false;
>> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > index 7f334a32133c..29290395054c 100644
>> > --- a/include/linux/virtio_vsock.h
>> > +++ b/include/linux/virtio_vsock.h
>> > @@ -153,6 +153,27 @@ static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
>> > VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
>> > }
>> >
>> > +static inline struct sk_buff *
>> > +virtio_vsock_alloc_rx_skb(unsigned int size, gfp_t mask)
>> > +{
>> > + struct sk_buff *skb;
>> > +
>> > + skb = virtio_vsock_alloc_linear_skb(size, mask);
>> > + if (!skb)
>> > + return NULL;
>> > +
>> > + memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
>> > +
>> > + /* virtio-vsock does not yet support namespaces, so on receive
>> > + * we force legacy namespace behavior using the global dummy net
>> > + * and global net mode.
>> > + */
>> > + virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
>> > + virtio_vsock_skb_set_net_mode(skb, VSOCK_NET_MODE_GLOBAL);
>> > +
>> > + return skb;
>> > +}
>>
>> Why are we introducing this change in this patch?
>>
>> Where is the net of the virtio skb read?
>>
>
>Oh good point, this is a weird place for this. I'll move this to where
>it is actually used.
>
>[...]
>
>> >
>> > +static int vsock_net_mode_string(const struct ctl_table *table, int write,
>> > + void *buffer, size_t *lenp, loff_t *ppos)
>> > +{
>> > + char data[VSOCK_NET_MODE_STR_MAX] = {0};
>> > + enum vsock_net_mode mode;
>> > + struct ctl_table tmp;
>> > + struct net *net;
>> > + int ret;
>> > +
>> > + if (!table->data || !table->maxlen || !*lenp) {
>> > + *lenp = 0;
>> > + return 0;
>> > + }
>> > +
>> > + net = current->nsproxy->net_ns;
>> > + tmp = *table;
>> > + tmp.data = data;
>> > +
>> > + if (!write) {
>> > + const char *p;
>> > +
>> > + mode = vsock_net_mode(net);
>> > +
>> > + switch (mode) {
>> > + case VSOCK_NET_MODE_GLOBAL:
>> > + p = VSOCK_NET_MODE_STR_GLOBAL;
>> > + break;
>> > + case VSOCK_NET_MODE_LOCAL:
>> > + p = VSOCK_NET_MODE_STR_LOCAL;
>> > + break;
>> > + default:
>> > + WARN_ONCE(true, "netns has invalid vsock mode");
>> > + *lenp = 0;
>> > + return 0;
>> > + }
>> > +
>> > + strscpy(data, p, sizeof(data));
>> > + tmp.maxlen = strlen(p);
>> > + }
>> > +
>> > + ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
>> > + if (ret)
>> > + return ret;
>> > +
>> > + if (write) {
>>
>> Do we need to check some capability, e.g. CAP_NET_ADMIN ?
>>
>
>We get that for free via the sysctl_net registration, through this path
>on open (CAP_NET_ADMIN is checked in net_ctl_permissions):
>
> net_ctl_permissions+1
> sysctl_perm+24
> proc_sys_permission+117
> inode_permission+217
> link_path_walk+162
> path_openat+152
> do_filp_open+171
> do_sys_openat2+98
> __x64_sys_openat+69
> do_syscall_64+93
>
>Verified with:
>
>cp /bin/echo /tmp/echo_netadmin
>setcap cap_net_admin+ep /tmp/echo_netadmin
>
>(non-root user fails with regular echo, succeeds with
>/tmp/echo_netadmin)
Thanks for checking!
Stefano
* Re: [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common
2025-11-07 2:52 ` Bobby Eshleman
@ 2025-11-07 14:30 ` Stefano Garzarella
2025-11-07 14:33 ` Bobby Eshleman
1 sibling, 0 replies; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-07 14:30 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Thu, Nov 06, 2025 at 06:52:15PM -0800, Bobby Eshleman wrote:
>On Thu, Nov 06, 2025 at 05:20:05PM +0100, Stefano Garzarella wrote:
>> On Thu, Oct 23, 2025 at 11:27:45AM -0700, Bobby Eshleman wrote:
>> > From: Bobby Eshleman <bobbyeshleman@meta.com>
>> >
>> > Enable network namespace support in the virtio-vsock common transport
>> > layer by declaring namespace pointers in the transmit and receive
>> > paths.
>> >
>> > The changes include:
>> > 1. Add a 'net' field to virtio_vsock_pkt_info to carry the namespace
>> > pointer for outgoing packets.
>> > 2. Store the namespace and namespace mode in the skb control buffer when
>> > allocating packets (except for VIRTIO_VSOCK_OP_RST packets which do
>> > not have an associated socket).
>> > 3. Retrieve namespace information from skbs on the receive path for
>> > lookups using vsock_find_connected_socket_net() and
>> > vsock_find_bound_socket_net().
>> >
>> > This allows users of virtio transport common code
>> > (vhost-vsock/virtio-vsock) to later enable namespace support.
>> >
>> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>> > ---
>> > Changes in v7:
>> > - add comment explaining the !vsk case in
>> > virtio_transport_alloc_skb()
>> > ---
>> > include/linux/virtio_vsock.h | 1 +
>> > net/vmw_vsock/virtio_transport_common.c | 21 +++++++++++++++++++--
>> > 2 files changed, 20 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > index 29290395054c..f90646f82993 100644
>> > --- a/include/linux/virtio_vsock.h
>> > +++ b/include/linux/virtio_vsock.h
>> > @@ -217,6 +217,7 @@ struct virtio_vsock_pkt_info {
>> > u32 remote_cid, remote_port;
>> > struct vsock_sock *vsk;
>> > struct msghdr *msg;
>> > + struct net *net;
>> > u32 pkt_len;
>> > u16 type;
>> > u16 op;
>> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> > index dcc8a1d5851e..b8e52c71920a 100644
>> > --- a/net/vmw_vsock/virtio_transport_common.c
>> > +++ b/net/vmw_vsock/virtio_transport_common.c
>> > @@ -316,6 +316,15 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
>> > info->flags,
>> > zcopy);
>> >
>> > + /*
>> > + * If there is no corresponding socket, then we don't have a
>> > + * corresponding namespace. This only happens for VIRTIO_VSOCK_OP_RST.
>> > + */
>>
>> So, in virtio_transport_recv_pkt() should we check that `net` is not set?
>>
>> Should we set it to NULL here?
>>
>
>Sounds good to me.
>
>> > + if (vsk) {
>> > + virtio_vsock_skb_set_net(skb, info->net);
>>
>> Ditto here about the net refcnt, can the net disappear?
>> Should we use get_net() in some way, or the socket will prevent that?
>>
>
>As long as the socket has an outstanding skb it can't be destroyed and
>so will have a reference to the net, that is after skb_set_owner_w() and
>freeing... so I think this is okay.
>
>But, maybe we could simplify the implied relationship between skb, sk,
>and net by removing the VIRTIO_VSOCK_SKB_CB(skb)->net entirely, and only
>ever referring to sock_net(skb->sk)? I remember originally having a
>reason for adding it to the cb, but my hunch is that it was probably
>some confusion over the !vsk case.
>
>WDYT?
If vsk == NULL, I'd expect skb->sk to be invalid as well, right?
Indeed, we call skb_set_owner_w() only if vsk != NULL in
virtio_transport_alloc_skb().

Maybe we need to change virtio_transport_recv_pkt() so that the `net`
is passed in some way by the caller; maybe this is the reason
why you needed it in the cb. But also in that case, I think we can get
the `net` in some way and pass it to virtio_transport_recv_pkt() and
avoid the change in the cb:
- vsock_loopback.c: in vsock_loopback_work() we can use vsock->net
- vhost/vsock.c: in vhost_vsock_handle_tx_kick(), ditto, we can use
  vsock->net
- virtio_transport.c: we can just always pass the dummy_net

Same for the net_mode.

Maybe the real problem is in the send_pkt callbacks, where the skb is
used to get the socket, but as you mention, I think in this path
skb_set_owner_w() has already been called, so we can get that info from
there in some way.

Not sure, but yeah, if we can remove that, it will be much clearer IMO.
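The per-transport choice listed above can be sketched as a small user-space model in which each receive path builds the (net, mode) pair it would hand to virtio_transport_recv_pkt(). The rx_ctx struct and both helper functions are hypothetical names for this sketch only; no such API exists in the series as posted.

```c
#include <assert.h>
#include <stddef.h>

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };

/* Illustration-only stand-in for struct net. */
struct net_model { int id; };

/* Hypothetical bundle a caller would pass to the receive entry point. */
struct rx_ctx {
	struct net_model *net;
	enum vsock_net_mode mode;
};

/* Models the global dummy net used by transports without netns support. */
static struct net_model global_dummy_net = { .id = 0 };

/* vhost and loopback: use the net and mode cached on the device/instance. */
static struct rx_ctx rx_ctx_from_instance(struct net_model *net,
					  enum vsock_net_mode mode)
{
	assert(net);
	return (struct rx_ctx){ .net = net, .mode = mode };
}

/* virtio-vsock: no namespace support yet, so always dummy net + global. */
static struct rx_ctx rx_ctx_virtio(void)
{
	return (struct rx_ctx){ .net = &global_dummy_net,
				.mode = VSOCK_NET_MODE_GLOBAL };
}
```

Passing the pair explicitly means the receive path never reads a namespace out of the skb control buffer, which is the simplification being discussed.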
Thanks,
Stefano
* Re: [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common
2025-11-07 2:52 ` Bobby Eshleman
2025-11-07 14:30 ` Stefano Garzarella
@ 2025-11-07 14:33 ` Bobby Eshleman
2025-11-07 15:07 ` Stefano Garzarella
1 sibling, 1 reply; 36+ messages in thread
From: Bobby Eshleman @ 2025-11-07 14:33 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index dcc8a1d5851e..b8e52c71920a 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -316,6 +316,15 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
> > > info->flags,
> > > zcopy);
> > >
> > > + /*
> > > + * If there is no corresponding socket, then we don't have a
> > > + * corresponding namespace. This only happens for VIRTIO_VSOCK_OP_RST.
> > > + */
> >
> > So, in virtio_transport_recv_pkt() should we check that `net` is not set?
> >
> > Should we set it to NULL here?
> >
>
> Sounds good to me.
>
> > > + if (vsk) {
> > > + virtio_vsock_skb_set_net(skb, info->net);
> >
> > Ditto here about the net refcnt, can the net disappear?
> > Should we use get_net() in some way, or the socket will prevent that?
> >
>
> As long as the socket has an outstanding skb it can't be destroyed and
> so will have a reference to the net, that is after skb_set_owner_w() and
> freeing... so I think this is okay.
>
> But, maybe we could simplify the implied relationship between skb, sk,
> and net by removing the VIRTIO_VSOCK_SKB_CB(skb)->net entirely, and only
> ever referring to sock_net(skb->sk)? I remember originally having a
> reason for adding it to the cb, but my hunch is that it was probably
> some confusion over the !vsk case.
>
> WDYT?
>
... now I remember the reason, because I didn't want two different
places for storing the net for RX and TX.
Best,
Bobby
* Re: [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common
2025-11-07 14:33 ` Bobby Eshleman
@ 2025-11-07 15:07 ` Stefano Garzarella
2025-11-07 15:47 ` Bobby Eshleman
0 siblings, 1 reply; 36+ messages in thread
From: Stefano Garzarella @ 2025-11-07 15:07 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Fri, Nov 07, 2025 at 06:33:33AM -0800, Bobby Eshleman wrote:
>> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> > > index dcc8a1d5851e..b8e52c71920a 100644
>> > > --- a/net/vmw_vsock/virtio_transport_common.c
>> > > +++ b/net/vmw_vsock/virtio_transport_common.c
>> > > @@ -316,6 +316,15 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
>> > > info->flags,
>> > > zcopy);
>> > >
>> > > + /*
>> > > + * If there is no corresponding socket, then we don't have a
>> > > + * corresponding namespace. This only happens for VIRTIO_VSOCK_OP_RST.
>> > > + */
>> >
>> > So, in virtio_transport_recv_pkt() should we check that `net` is not set?
>> >
>> > Should we set it to NULL here?
>> >
>>
>> Sounds good to me.
>>
>> > > + if (vsk) {
>> > > + virtio_vsock_skb_set_net(skb, info->net);
>> >
>> > Ditto here about the net refcnt, can the net disappear?
>> > Should we use get_net() in some way, or the socket will prevent that?
>> >
>>
>> As long as the socket has an outstanding skb it can't be destroyed and
>> so will have a reference to the net, that is after skb_set_owner_w() and
>> freeing... so I think this is okay.
>>
>> But, maybe we could simplify the implied relationship between skb, sk,
>> and net by removing the VIRTIO_VSOCK_SKB_CB(skb)->net entirely, and only
>> ever referring to sock_net(skb->sk)? I remember originally having a
>> reason for adding it to the cb, but my hunch is that it was probably
>> some confusion over the !vsk case.
>>
>> WDYT?
>>
>
>... now I remember the reason, because I didn't want two different
>places for storing the net for RX and TX.
Yeah, but if we can reuse skb->sk for one path and pass it as a parameter
to the other path (see my prev email), why store it?

Or even in the TX path maybe it can be passed to .send_pkt() in some way,
e.g. by storing it in struct virtio_vsock_sock instead of in each skb.
Stefano
* Re: [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common
2025-11-07 15:07 ` Stefano Garzarella
@ 2025-11-07 15:47 ` Bobby Eshleman
0 siblings, 0 replies; 36+ messages in thread
From: Bobby Eshleman @ 2025-11-07 15:47 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, virtualization, netdev,
linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
Bobby Eshleman
On Fri, Nov 07, 2025 at 04:07:39PM +0100, Stefano Garzarella wrote:
> On Fri, Nov 07, 2025 at 06:33:33AM -0800, Bobby Eshleman wrote:
> > > > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > > > index dcc8a1d5851e..b8e52c71920a 100644
> > > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > > @@ -316,6 +316,15 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
> > > > > info->flags,
> > > > > zcopy);
> > > > >
> > > > > + /*
> > > > > + * If there is no corresponding socket, then we don't have a
> > > > > + * corresponding namespace. This only happens For VIRTIO_VSOCK_OP_RST.
> > > > > + */
> > > >
> > > > So, in virtio_transport_recv_pkt() should we check that `net` is not set?
> > > >
> > > > Should we set it to NULL here?
> > > >
> > >
> > > Sounds good to me.
> > >
> > > > > + if (vsk) {
> > > > > + virtio_vsock_skb_set_net(skb, info->net);
> > > >
> > > > Ditto here about the net refcnt, can the net disappear?
> > > > Should we use get_net() in some way, or the socket will prevent that?
> > > >
> > >
> > > As long as the socket has an outstanding skb it can't be destroyed and
> > > so will have a reference to the net, that is after skb_set_owner_w() and
> > > freeing... so I think this is okay.
> > >
> > > But, maybe we could simplify the implied relationship between skb, sk,
> > > and net by removing the VIRTIO_VSOCK_SKB_CB(skb)->net entirely, and only
> > > ever referring to sock_net(skb->sk)? I remember originally having a
> > > reason for adding it to the cb, but my hunch is it that it was probably
> > > some confusion over the !vsk case.
> > >
> > > WDYT?
> > >
> >
> > ... now I remember the reason, because I didn't want two different
> > places for storing the net for RX and TX.
>
> Yeah, but if we can reuse skb->sk for one path and pass it as parameter to
> the other path (see my prev email), why store it?
>
> Or even in the TX maybe it can be passed to .send_pkt() in some way, e.g.
> storing it in struct virtio_vsock_sock instead that for each skb.
>
> Stefano
>
That's a good point: the RX path only needs to pass it to recv_pkt(), and
it is not needed after the socket lookup there.

With TX, it does look like we could get rid of it via the
virtio_vsock_sock.

Best,
Bobby
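
The direction agreed on above (no per-skb net pointer: TX derives the
namespace from the owning socket or from per-socket transport state, RX
receives it as a parameter that is only used for the socket lookup) can be
modelled in plain userspace C. This is only a sketch and is not code from
the series: every struct below is a simplified stand-in for the kernel type
of the same name, the `net` field in `struct virtio_vsock_sock` is
hypothetical, and the helper names `tx_skb_net()`, `tx_vvs_net()` and
`rx_lookup()` are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for illustration only; NOT the kernel definitions. */
struct net { int id; };

struct sock {
	struct net *net;	/* models what sock_net(sk) yields in the kernel */
};

struct sk_buff {
	struct sock *sk;	/* owning socket; NULL when none exists (the RST case) */
};

/* Hypothetical: namespace cached once per socket instead of once per skb. */
struct virtio_vsock_sock {
	struct sock *sk;
	struct net *net;
};

/* TX option 1: derive the namespace from the skb's owner, NULL if unowned. */
static struct net *tx_skb_net(const struct sk_buff *skb)
{
	return skb->sk ? skb->sk->net : NULL;
}

/* TX option 2: read the namespace from per-socket transport state. */
static struct net *tx_vvs_net(const struct virtio_vsock_sock *vvs)
{
	return vvs ? vvs->net : NULL;
}

/*
 * RX: the caller passes its namespace as a parameter. It is consulted only
 * during the socket lookup and never stored on the skb. A real lookup would
 * also match on the vsock address; that part is omitted here.
 */
static struct sock *rx_lookup(struct net *net, struct sock **socks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (socks[i]->net == net)
			return socks[i];
	return NULL;
}
```

In this model an unowned skb yields a NULL namespace on TX, which mirrors
the !vsk case raised earlier in the review, and the RX side needs no
per-skb state at all once the lookup has returned.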
Thread overview: 36+ messages
2025-10-23 18:27 [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 01/14] vsock: a per-net vsock NS mode state Bobby Eshleman
2025-11-06 16:16 ` Stefano Garzarella
2025-11-07 1:09 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 02/14] vsock/virtio: pack struct virtio_vsock_skb_cb Bobby Eshleman
2025-11-06 16:16 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 03/14] vsock: add netns to vsock skb cb Bobby Eshleman
2025-11-06 16:17 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 04/14] vsock: add netns to vsock core Bobby Eshleman
2025-11-06 16:18 ` Stefano Garzarella
2025-11-07 2:03 ` Bobby Eshleman
2025-11-07 13:53 ` Stefano Garzarella
2025-10-23 18:27 ` [PATCH net-next v8 05/14] vsock/loopback: add netns support Bobby Eshleman
2025-11-06 16:18 ` Stefano Garzarella
2025-11-07 2:17 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 06/14] vsock/virtio: add netns to virtio transport common Bobby Eshleman
2025-11-06 16:20 ` Stefano Garzarella
2025-11-07 2:52 ` Bobby Eshleman
2025-11-07 14:30 ` Stefano Garzarella
2025-11-07 14:33 ` Bobby Eshleman
2025-11-07 15:07 ` Stefano Garzarella
2025-11-07 15:47 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 07/14] vhost/vsock: add netns support Bobby Eshleman
2025-11-06 16:21 ` Stefano Garzarella
2025-11-07 3:07 ` Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 08/14] selftests/vsock: add namespace helpers to vmtest.sh Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 09/14] selftests/vsock: prepare vm management helpers for namespaces Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 10/14] selftests/vsock: add tests for proc sys vsock ns_mode Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 11/14] selftests/vsock: add namespace tests for CID collisions Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 12/14] selftests/vsock: add tests for host <-> vm connectivity with namespaces Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 13/14] selftests/vsock: add tests for namespace deletion and mode changes Bobby Eshleman
2025-10-23 18:27 ` [PATCH net-next v8 14/14] selftests/vsock: add tests for module loading order Bobby Eshleman
2025-10-27 13:28 ` [PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock Stefano Garzarella
2025-10-27 17:25 ` Bobby Eshleman
2025-11-06 16:23 ` Stefano Garzarella
2025-11-07 1:00 ` Bobby Eshleman