linux-kernel.vger.kernel.org archive mirror
* [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock
@ 2025-08-28  0:31 Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 1/9] vsock: a per-net vsock NS mode state Bobby Eshleman
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)
  To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list
  Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
	linux-hyperv, Bobby Eshleman, berrange, Bobby Eshleman

This series adds namespace support to vhost-vsock and loopback. It does
not add namespaces to any of the other guest transports (virtio-vsock,
hyperv, or vmci).

The current revision supports only two modes: local and global. Local
mode completely isolates a namespace's CIDs from other namespaces, while
global mode shares CIDs across namespaces (the original behavior).
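
As a rough illustration of the two modes, the reachability rule that
vsock_net_check_mode() in patch 1 expresses can be modeled in userspace.
This is only a sketch: struct model_ns, enum ns_mode and
model_ns_reachable() are hypothetical names, and an integer id stands in
for namespace identity.

```c
#include <stdbool.h>

/* Userspace model only; these names are illustrative, not kernel API. */
enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

struct model_ns {
	int id;			/* stands in for namespace identity */
	enum ns_mode mode;
};

/* Two endpoints can reach each other if they share a namespace, or if
 * both namespaces are in global mode.
 */
bool model_ns_reachable(const struct model_ns *a, const struct model_ns *b)
{
	return a->id == b->id ||
	       (a->mode == NS_MODE_GLOBAL && b->mode == NS_MODE_GLOBAL);
}
```

Under this rule a local-mode namespace only ever reaches itself, which
is consistent with the *_local_*_fails cases in the test list below.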

The mode is set using /proc/sys/net/vsock/ns_mode.

Modes are per-netns and write-once. This allows a system to configure
namespaces independently (some may share CIDs, others are completely
isolated). This also supports possible future mixed use cases, where
there may be namespaces in global mode spinning up VMs while there are
mixed mode namespaces that provide services to the VMs, but are not
allowed to allocate from the global CID pool.
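
The write-once behavior can be sketched in userspace as follows. This
is a simplified model, not the kernel code: struct wo_ns_state and
wo_set_mode() are hypothetical names, and the sketch is single-threaded,
whereas the patch protects the equivalent fields with a spinlock.

```c
#include <stdbool.h>

/* Userspace model; names are illustrative. The kernel patch guards the
 * equivalent fields with a spinlock, which this sketch omits.
 */
enum wo_mode { WO_MODE_GLOBAL, WO_MODE_LOCAL };

struct wo_ns_state {
	enum wo_mode mode;	/* starts out as global */
	bool written;		/* set by the first successful write */
};

/* Returns true if the mode was stored, false if it was already locked. */
bool wo_set_mode(struct wo_ns_state *st, enum wo_mode mode)
{
	if (st->written)
		return false;

	st->mode = mode;
	st->written = true;
	return true;
}
```

The first write wins, including a write of the default value, so a
namespace manager can lock a namespace into global mode as well as into
local mode.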

Additionally, this series adds tests for the new semantics:

tools/testing/selftests/vsock/vmtest.sh
1..22
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 host_vsock_ns_mode_ok
ok 5 host_vsock_ns_mode_write_once_ok
ok 6 global_same_cid_fails
ok 7 local_same_cid_ok
ok 8 global_local_same_cid_ok
ok 9 local_global_same_cid_ok
ok 10 diff_ns_global_host_connect_to_global_vm_ok
ok 11 diff_ns_global_host_connect_to_local_vm_fails
ok 12 diff_ns_global_vm_connect_to_global_host_ok
ok 13 diff_ns_global_vm_connect_to_local_host_fails
ok 14 diff_ns_local_host_connect_to_local_vm_fails
ok 15 diff_ns_local_vm_connect_to_local_host_fails
ok 16 diff_ns_global_to_local_loopback_local_fails
ok 17 diff_ns_local_to_global_loopback_fails
ok 18 diff_ns_local_to_local_loopback_fails
ok 19 diff_ns_global_to_global_loopback_ok
ok 20 same_ns_local_loopback_ok
ok 21 same_ns_local_host_connect_to_local_vm_ok
ok 22 same_ns_local_vm_connect_to_local_host_ok
SUMMARY: PASS=22 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_OQC4.log

Thanks again for everyone's help and reviews!

Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
To: Shuah Khan <shuah@kernel.org>
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Simon Horman <horms@kernel.org>
To: Stefan Hajnoczi <stefanha@redhat.com>
To: Michael S. Tsirkin <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: Eugenio Pérez <eperezma@redhat.com>
To: K. Y. Srinivasan <kys@microsoft.com>
To: Haiyang Zhang <haiyangz@microsoft.com>
To: Wei Liu <wei.liu@kernel.org>
To: Dexuan Cui <decui@microsoft.com>
To: Bryan Tan <bryan-bt.tan@broadcom.com>
To: Vishnu Dasa <vishnu.dasa@broadcom.com>
To: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: virtualization@lists.linux.dev
Cc: netdev@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: berrange@redhat.com

Changes in v5:
- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
- vsock_global_net -> vsock_global_dummy_net
- fix netns lookup in vhost_vsock to respect pid namespaces
- add callbacks for vsock_loopback to avoid circular dependency
- vmtest.sh loads vsock_loopback module
- remove vsock_net_mode_can_set()
- change vsock_net_write_mode() to return true/false based on success
- make vsock_net_mode enum instead of u8
- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com

Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test
  as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context
  in commit message)
- lots of cleanup

Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2:
  https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com

Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes
  impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1:
  https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com

Changes in v1:
- added 'netns' module param to vsock.ko to enable the
  network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket
  only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/

---
Bobby Eshleman (9):
      vsock: a per-net vsock NS mode state
      vsock: add net to vsock skb cb
      vsock: add netns to vsock core
      vsock/loopback: add netns support
      vsock/virtio: add netns to virtio transport common
      vhost/vsock: add netns support
      selftests/vsock: improve logging in vmtest.sh
      selftests/vsock: invoke vsock_test through helpers
      selftests/vsock: add namespace tests

 MAINTAINERS                             |    1 +
 drivers/vhost/vsock.c                   |   30 +-
 include/linux/virtio_vsock.h            |   12 +
 include/net/af_vsock.h                  |   89 ++-
 include/net/net_namespace.h             |    4 +
 include/net/netns/vsock.h               |   25 +
 net/vmw_vsock/af_vsock.c                |  312 ++++++++-
 net/vmw_vsock/hyperv_transport.c        |    2 +-
 net/vmw_vsock/virtio_transport.c        |    5 +-
 net/vmw_vsock/virtio_transport_common.c |   14 +-
 net/vmw_vsock/vmci_transport.c          |    4 +-
 net/vmw_vsock/vsock_loopback.c          |   76 ++-
 tools/testing/selftests/vsock/vmtest.sh | 1092 ++++++++++++++++++++++++++-----
 13 files changed, 1475 insertions(+), 191 deletions(-)
---
base-commit: 242041164339594ca019481d54b4f68a7aaff64e
change-id: 20250325-vsock-vmtest-b3a21d2102c2

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>



* [PATCH net-next v5 1/9] vsock: a per-net vsock NS mode state
  2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
@ 2025-08-28  0:31 ` Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 2/9] vsock: add net to vsock skb cb Bobby Eshleman
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)
  To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list
  Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
	linux-hyperv, Bobby Eshleman, berrange, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add the per-net vsock NS mode state. This only adds the structure for
holding the mode and some of the functions for setting/getting and
checking the mode, but does not integrate the functionality yet.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

---
Changes in v5:
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- change from net->vsock.ns_mode to net->vsock.mode
- change vsock_net_set_mode() to vsock_net_write_mode()
- vsock_net_write_mode() returns bool for write success to avoid
  need to use vsock_net_mode_can_set()
- remove vsock_net_mode_can_set()
---
 MAINTAINERS                 |  1 +
 include/net/af_vsock.h      | 42 ++++++++++++++++++++++++++++++++++++++++++
 include/net/net_namespace.h |  4 ++++
 include/net/netns/vsock.h   | 20 ++++++++++++++++++++
 4 files changed, 67 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bce96dd254b8..deaf7f02ec32 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26578,6 +26578,7 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/vhost/vsock.c
 F:	include/linux/virtio_vsock.h
+F:	include/net/netns/vsock.h
 F:	include/uapi/linux/virtio_vsock.h
 F:	net/vmw_vsock/virtio_transport.c
 F:	net/vmw_vsock/virtio_transport_common.c
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d40e978126e3..5707514c30b6 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -10,6 +10,7 @@
 
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
+#include <net/netns/vsock.h>
 #include <net/sock.h>
 #include <uapi/linux/vm_sockets.h>
 
@@ -256,4 +257,45 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
 {
 	return t->msgzerocopy_allow && t->msgzerocopy_allow();
 }
+
+static inline u8 vsock_net_mode(struct net *net)
+{
+	enum vsock_net_mode ret;
+
+	spin_lock_bh(&net->vsock.lock);
+	ret = net->vsock.mode;
+	spin_unlock_bh(&net->vsock.lock);
+	return ret;
+}
+
+static inline bool vsock_net_write_mode(struct net *net, u8 mode)
+{
+	bool ret;
+
+	spin_lock_bh(&net->vsock.lock);
+
+	if (net->vsock.written) {
+		ret = false;
+		goto skip;
+	}
+
+	net->vsock.mode = mode;
+	net->vsock.written = true;
+	ret = true;
+
+skip:
+	spin_unlock_bh(&net->vsock.lock);
+	return ret;
+}
+
+/* Return true if vsock net mode check passes. Otherwise, return false.
+ *
+ * Read more about modes in comment header of net/vmw_vsock/af_vsock.c.
+ */
+static inline bool vsock_net_check_mode(struct net *n1, struct net *n2)
+{
+	return net_eq(n1, n2) ||
+	       (vsock_net_mode(n1) == VSOCK_NET_MODE_GLOBAL &&
+		vsock_net_mode(n2) == VSOCK_NET_MODE_GLOBAL);
+}
 #endif /* __AF_VSOCK_H__ */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 025a7574b275..005c0da4fb62 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -37,6 +37,7 @@
 #include <net/netns/smc.h>
 #include <net/netns/bpf.h>
 #include <net/netns/mctp.h>
+#include <net/netns/vsock.h>
 #include <net/net_trackers.h>
 #include <linux/ns_common.h>
 #include <linux/idr.h>
@@ -196,6 +197,9 @@ struct net {
 	/* Move to a better place when the config guard is removed. */
 	struct mutex		rtnl_mutex;
 #endif
+#if IS_ENABLED(CONFIG_VSOCKETS)
+	struct netns_vsock	vsock;
+#endif
 } __randomize_layout;
 
 #include <linux/seq_file_net.h>
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
new file mode 100644
index 000000000000..d4593c0b8dc4
--- /dev/null
+++ b/include/net/netns/vsock.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_NET_NAMESPACE_VSOCK_H
+#define __NET_NET_NAMESPACE_VSOCK_H
+
+#include <linux/types.h>
+
+enum vsock_net_mode {
+	VSOCK_NET_MODE_GLOBAL,
+	VSOCK_NET_MODE_LOCAL,
+};
+
+struct netns_vsock {
+	struct ctl_table_header *vsock_hdr;
+	spinlock_t lock;
+
+	/* protected by lock */
+	enum vsock_net_mode mode;
+	bool written;
+};
+#endif /* __NET_NET_NAMESPACE_VSOCK_H */

-- 
2.47.3



* [PATCH net-next v5 2/9] vsock: add net to vsock skb cb
  2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 1/9] vsock: a per-net vsock NS mode state Bobby Eshleman
@ 2025-08-28  0:31 ` Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 3/9] vsock: add netns to vsock core Bobby Eshleman
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)
  To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list
  Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
	linux-hyperv, Bobby Eshleman, berrange, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add a net pointer to the vsock skb and helpers for getting/setting it.
This is in preparation for adding vsock NS support.
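
The general pattern (typed per-packet metadata kept in a fixed-size
scratch area, accessed through small helpers) can be sketched in
userspace as below. Everything here is hypothetical: struct model_pkt,
struct model_pkt_cb, the 48-byte size and the helper names are chosen
for illustration. The kernel helpers cast skb->cb directly; this sketch
uses memcpy() instead to stay within ISO C aliasing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PKT_CB_SIZE 48	/* illustrative scratch-area size */

struct model_ns {
	int id;
};

/* A packet carrying an untyped scratch area for its current owner. */
struct model_pkt {
	unsigned char cb[MODEL_PKT_CB_SIZE];
};

/* Typed view of the scratch area, including the namespace pointer. */
struct model_pkt_cb {
	uint32_t offset;
	const struct model_ns *ns;
};

static_assert(sizeof(struct model_pkt_cb) <= MODEL_PKT_CB_SIZE,
	      "control block must fit in the scratch area");

/* Read the namespace pointer stored in the packet's control block. */
const struct model_ns *model_pkt_ns(const struct model_pkt *pkt)
{
	struct model_pkt_cb cb;

	memcpy(&cb, pkt->cb, sizeof(cb));
	return cb.ns;
}

/* Store a namespace pointer, leaving the other cb fields unchanged. */
void model_pkt_set_ns(struct model_pkt *pkt, const struct model_ns *ns)
{
	struct model_pkt_cb cb;

	memcpy(&cb, pkt->cb, sizeof(cb));
	cb.ns = ns;
	memcpy(pkt->cb, &cb, sizeof(cb));
}
```

The compile-time size check is the important part of the pattern: a new
field is only safe to add if the typed view still fits in the scratch
area.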

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

---
Changes in v5:
- some diff context change due to rebase to current net-next
---
 include/linux/virtio_vsock.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 0c67543a45c8..c547cda7196b 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -13,6 +13,7 @@ struct virtio_vsock_skb_cb {
 	bool reply;
 	bool tap_delivered;
 	u32 offset;
+	struct net *net;
 };
 
 #define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb))
@@ -130,6 +131,16 @@ static inline size_t virtio_vsock_skb_len(struct sk_buff *skb)
 	return (size_t)(skb_end_pointer(skb) - skb->head);
 }
 
+static inline struct net *virtio_vsock_skb_net(struct sk_buff *skb)
+{
+	return VIRTIO_VSOCK_SKB_CB(skb)->net;
+}
+
+static inline void virtio_vsock_skb_set_net(struct sk_buff *skb, struct net *net)
+{
+	VIRTIO_VSOCK_SKB_CB(skb)->net = net;
+}
+
 /* Dimension the RX SKB so that the entire thing fits exactly into
  * a single 4KiB page. This avoids wasting memory due to alloc_skb()
  * rounding up to the next page order and also means that we

-- 
2.47.3



* [PATCH net-next v5 3/9] vsock: add netns to vsock core
  2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 1/9] vsock: a per-net vsock NS mode state Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 2/9] vsock: add net to vsock skb cb Bobby Eshleman
@ 2025-08-28  0:31 ` Bobby Eshleman
  2025-09-02 15:39   ` Stefano Garzarella
  2025-08-28  0:31 ` [PATCH net-next v5 4/9] vsock/loopback: add netns support Bobby Eshleman
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)
  To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list
  Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
	linux-hyperv, Bobby Eshleman, berrange, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add netns logic to vsock core. Additionally, modify transport hook
prototypes to be used by later transport-specific patches (e.g.,
*_seqpacket_allow()).

Namespaces are supported primarily by changing socket lookup functions
(e.g., vsock_find_connected_socket()) to take into account the socket
namespace and the namespace mode before considering a candidate socket a
"match".
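
A simplified userspace sketch of such a namespace-aware lookup follows.
It assumes a flat array instead of the hashed bound table and omits the
VMADDR_CID_ANY wildcard handling; struct lk_sock, struct lk_ns and
lk_find_bound() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; these names are illustrative, not kernel API. */
enum lk_mode { LK_MODE_GLOBAL, LK_MODE_LOCAL };

struct lk_ns {
	int id;
	enum lk_mode mode;
};

struct lk_sock {
	uint32_t cid;
	uint32_t port;
	const struct lk_ns *ns;	/* namespace that owns the socket */
};

bool lk_ns_reachable(const struct lk_ns *a, const struct lk_ns *b)
{
	return a->id == b->id ||
	       (a->mode == LK_MODE_GLOBAL && b->mode == LK_MODE_GLOBAL);
}

/* Return the first socket bound to (cid, port) that the caller's
 * namespace is allowed to reach, or NULL if there is none.
 */
const struct lk_sock *lk_find_bound(const struct lk_sock *tbl, size_t n,
				    uint32_t cid, uint32_t port,
				    const struct lk_ns *ns)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].cid == cid && tbl[i].port == port &&
		    lk_ns_reachable(ns, tbl[i].ns))
			return &tbl[i];
	}

	return NULL;
}
```

Because the namespace check is part of the match, two local-mode
namespaces can each hold a socket on the same CID and port without
either lookup seeing the other's socket.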

Introduce a dummy namespace struct, __vsock_global_dummy_net, to be
used by transports that do not support namespacing. This dummy always
has mode "global" to preserve previous CID behavior.

This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
accepts the "global" or "local" mode strings.
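
For illustration, mapping the two mode strings to an enum could look
like the userspace sketch below. Note the assumption: this sketch
matches the whole string and tolerates one trailing newline, which is
stricter than the prefix comparison used by the handler in this patch.
ps_parse_mode() and enum ps_mode are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Userspace model only; these names are illustrative, not kernel API. */
enum ps_mode { PS_MODE_GLOBAL, PS_MODE_LOCAL };

/* Parse a mode string, tolerating one trailing newline (as a shell
 * `echo` would add). Returns 0 and stores the mode on success, or -1
 * if the string is not a known mode.
 */
int ps_parse_mode(const char *s, enum ps_mode *out)
{
	size_t len = strlen(s);

	if (len > 0 && s[len - 1] == '\n')
		len--;

	if (len == 6 && memcmp(s, "global", 6) == 0) {
		*out = PS_MODE_GLOBAL;
		return 0;
	}

	if (len == 5 && memcmp(s, "local", 5) == 0) {
		*out = PS_MODE_LOCAL;
		return 0;
	}

	return -1;
}
```

Whole-string matching rejects inputs such as "globalx", which a pure
prefix comparison would accept.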

The transports (besides vhost) are modified to use the global dummy.

Add netns functionality (initialization, passing to transports, procfs,
etc.) to the af_vsock socket layer. Later patches that add netns
support to transports depend on this patch.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

---
Changes in v5:
- vsock_global_net() -> vsock_global_dummy_net()
- update comments for new uAPI
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- add prototype changes so patch remains compilable
---
 drivers/vhost/vsock.c                   |   4 +-
 include/net/af_vsock.h                  |  13 +-
 net/vmw_vsock/af_vsock.c                | 202 +++++++++++++++++++++++++++++---
 net/vmw_vsock/hyperv_transport.c        |   2 +-
 net/vmw_vsock/virtio_transport.c        |   5 +-
 net/vmw_vsock/virtio_transport_common.c |   4 +-
 net/vmw_vsock/vmci_transport.c          |   4 +-
 net/vmw_vsock/vsock_loopback.c          |   4 +-
 8 files changed, 210 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index ae01457ea2cd..34adf0cf9124 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
 	return true;
 }
 
-static bool vhost_transport_seqpacket_allow(u32 remote_cid);
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
 
 static struct virtio_transport vhost_transport = {
 	.transport = {
@@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
 	.send_pkt = vhost_transport_send_pkt,
 };
 
-static bool vhost_transport_seqpacket_allow(u32 remote_cid)
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
 	struct vhost_vsock *vsock;
 	bool seqpacket_allow = false;
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 5707514c30b6..83f873174ba3 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -144,7 +144,7 @@ struct vsock_transport {
 				     int flags);
 	int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
 				 size_t len);
-	bool (*seqpacket_allow)(u32 remote_cid);
+	bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
 	u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
 
 	/* Notification. */
@@ -214,9 +214,10 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected);
 void vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
-					 struct sockaddr_vm *dst);
+					 struct sockaddr_vm *dst,
+					 struct net *net);
 void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(struct vsock_transport *transport,
 				     void (*fn)(struct sock *sk));
@@ -258,6 +259,12 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
 	return t->msgzerocopy_allow && t->msgzerocopy_allow();
 }
 
+extern struct net __vsock_global_dummy_net;
+static inline struct net *vsock_global_dummy_net(void)
+{
+	return &__vsock_global_dummy_net;
+}
+
 static inline u8 vsock_net_mode(struct net *net)
 {
 	enum vsock_net_mode ret;
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 0538948d5fd9..68a8875c8106 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -83,6 +83,24 @@
  *   TCP_ESTABLISHED - connected
  *   TCP_CLOSING - disconnecting
  *   TCP_LISTEN - listening
+ *
+ * - Namespaces in vsock support two different modes configured
+ *   through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
+ *   Each mode defines how the namespace interacts with CIDs.
+ *   /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
+ *   and locked down by a namespace manager. The default is "global". The mode
+ *   is set per-namespace.
+ *
+ *   The modes affect the allocation and accessibility of CIDs as follows:
+ *   - global - aka fully public
+ *      - CID allocation draws from the public pool
+ *      - AF_VSOCK sockets may reach any CID allocated from the public pool
+ *      - AF_VSOCK sockets may not reach CIDs allocated from private pools
+ *
+ *   - local - aka fully private
+ *     - CID allocation draws only from the private pool, does not affect public pool
+ *     - AF_VSOCK sockets may only reach CIDs from the private pool
+ *     - AF_VSOCK sockets may not reach CIDs allocated from outside the pool
  */
 
 #include <linux/compat.h>
@@ -100,6 +118,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/net.h>
+#include <linux/proc_fs.h>
 #include <linux/poll.h>
 #include <linux/random.h>
 #include <linux/skbuff.h>
@@ -111,6 +130,7 @@
 #include <linux/workqueue.h>
 #include <net/sock.h>
 #include <net/af_vsock.h>
+#include <net/netns/vsock.h>
 #include <uapi/linux/vm_sockets.h>
 #include <uapi/asm-generic/ioctls.h>
 
@@ -149,6 +169,9 @@ static const struct vsock_transport *transport_dgram;
 static const struct vsock_transport *transport_local;
 static DEFINE_MUTEX(vsock_register_mutex);
 
+struct net __vsock_global_dummy_net;
+EXPORT_SYMBOL_GPL(__vsock_global_dummy_net);
+
 /**** UTILS ****/
 
 /* Each bound VSocket is stored in the bind hash table and each connected
@@ -235,33 +258,42 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
 	sock_put(&vsk->sk);
 }
 
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr,
+					      struct net *net)
 {
 	struct vsock_sock *vsk;
 
 	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
+		struct sock *sk = sk_vsock(vsk);
+
 		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
-			return sk_vsock(vsk);
+			if (vsock_net_check_mode(net, sock_net(sk)))
+				return sk;
 
 		if (addr->svm_port == vsk->local_addr.svm_port &&
 		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
-		     addr->svm_cid == VMADDR_CID_ANY))
-			return sk_vsock(vsk);
+		     addr->svm_cid == VMADDR_CID_ANY) &&
+		     vsock_net_check_mode(net, sock_net(sk)))
+				return sk;
 	}
 
 	return NULL;
 }
 
 static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
-						  struct sockaddr_vm *dst)
+						  struct sockaddr_vm *dst,
+						  struct net *net)
 {
 	struct vsock_sock *vsk;
 
 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
 			    connected_table) {
+		struct sock *sk = sk_vsock(vsk);
+
 		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
-		    dst->svm_port == vsk->local_addr.svm_port) {
-			return sk_vsock(vsk);
+		    dst->svm_port == vsk->local_addr.svm_port &&
+		    vsock_net_check_mode(net, sock_net(sk))) {
+			return sk;
 		}
 	}
 
@@ -304,12 +336,12 @@ void vsock_remove_connected(struct vsock_sock *vsk)
 }
 EXPORT_SYMBOL_GPL(vsock_remove_connected);
 
-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net)
 {
 	struct sock *sk;
 
 	spin_lock_bh(&vsock_table_lock);
-	sk = __vsock_find_bound_socket(addr);
+	sk = __vsock_find_bound_socket(addr, net);
 	if (sk)
 		sock_hold(sk);
 
@@ -320,12 +352,13 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
 EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
 
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
-					 struct sockaddr_vm *dst)
+					 struct sockaddr_vm *dst,
+					 struct net *net)
 {
 	struct sock *sk;
 
 	spin_lock_bh(&vsock_table_lock);
-	sk = __vsock_find_connected_socket(src, dst);
+	sk = __vsock_find_connected_socket(src, dst, net);
 	if (sk)
 		sock_hold(sk);
 
@@ -528,7 +561,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 
 	if (sk->sk_type == SOCK_SEQPACKET) {
 		if (!new_transport->seqpacket_allow ||
-		    !new_transport->seqpacket_allow(remote_cid)) {
+		    !new_transport->seqpacket_allow(vsk, remote_cid)) {
 			module_put(new_transport->module);
 			return -ESOCKTNOSUPPORT;
 		}
@@ -678,6 +711,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 {
 	static u32 port;
 	struct sockaddr_vm new_addr;
+	struct net *net = sock_net(sk_vsock(vsk));
 
 	if (!port)
 		port = get_random_u32_above(LAST_RESERVED_PORT);
@@ -695,7 +729,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 
 			new_addr.svm_port = port++;
 
-			if (!__vsock_find_bound_socket(&new_addr)) {
+			if (!__vsock_find_bound_socket(&new_addr, net)) {
 				found = true;
 				break;
 			}
@@ -712,7 +746,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 			return -EACCES;
 		}
 
-		if (__vsock_find_bound_socket(&new_addr))
+		if (__vsock_find_bound_socket(&new_addr, net))
 			return -EADDRINUSE;
 	}
 
@@ -2636,6 +2670,137 @@ static struct miscdevice vsock_device = {
 	.fops		= &vsock_device_ops,
 };
 
+#define VSOCK_NET_MODE_STRING_MAX 7
+
+static int vsock_net_mode_string(const struct ctl_table *table, int write,
+				 void *buffer, size_t *lenp, loff_t *ppos)
+{
+	char buf[VSOCK_NET_MODE_STRING_MAX] = {0};
+	enum vsock_net_mode mode;
+	struct ctl_table tmp;
+	struct net *net;
+	const char *p;
+	int ret;
+
+	if (!table->data || !table->maxlen || !*lenp) {
+		*lenp = 0;
+		return 0;
+	}
+
+	net = current->nsproxy->net_ns;
+	tmp = *table;
+	tmp.data = buf;
+
+	if (!write) {
+		mode = vsock_net_mode(net);
+
+		if (mode == VSOCK_NET_MODE_GLOBAL) {
+			p = "global";
+		} else if (mode == VSOCK_NET_MODE_LOCAL) {
+			p = "local";
+		} else {
+			WARN_ONCE(true, "netns has invalid vsock mode");
+			*lenp = 0;
+			return 0;
+		}
+
+		strscpy(buf, p, sizeof(buf));
+		tmp.maxlen = strlen(p);
+	}
+
+	ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	if (write) {
+		if (!strncmp(buffer, "global", 6))
+			mode = VSOCK_NET_MODE_GLOBAL;
+		else if (!strncmp(buffer, "local", 5))
+			mode = VSOCK_NET_MODE_LOCAL;
+		else
+			return -EINVAL;
+
+		if (!vsock_net_write_mode(net, mode))
+			return -EPERM;
+	}
+
+	return 0;
+}
+
+static struct ctl_table vsock_table[] = {
+	{
+		.procname	= "ns_mode",
+		.data		= &init_net.vsock.mode,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= vsock_net_mode_string
+	},
+};
+
+static int __net_init vsock_sysctl_register(struct net *net)
+{
+	struct ctl_table *table;
+
+	if (net_eq(net, &init_net)) {
+		table = vsock_table;
+	} else {
+		table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL);
+		if (!table)
+			goto err_alloc;
+
+		table[0].data = &net->vsock.mode;
+	}
+
+	net->vsock.vsock_hdr = register_net_sysctl_sz(net, "net/vsock", table,
+						      ARRAY_SIZE(vsock_table));
+	if (!net->vsock.vsock_hdr)
+		goto err_reg;
+
+	return 0;
+
+err_reg:
+	if (!net_eq(net, &init_net))
+		kfree(table);
+err_alloc:
+	return -ENOMEM;
+}
+
+static void vsock_sysctl_unregister(struct net *net)
+{
+	const struct ctl_table *table;
+
+	table = net->vsock.vsock_hdr->ctl_table_arg;
+	unregister_net_sysctl_table(net->vsock.vsock_hdr);
+	if (!net_eq(net, &init_net))
+		kfree(table);
+}
+
+static void vsock_net_init(struct net *net)
+{
+	spin_lock_init(&net->vsock.lock);
+	net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
+}
+
+static __net_init int vsock_sysctl_init_net(struct net *net)
+{
+	vsock_net_init(net);
+
+	if (vsock_sysctl_register(net))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static __net_exit void vsock_sysctl_exit_net(struct net *net)
+{
+	vsock_sysctl_unregister(net);
+}
+
+static struct pernet_operations vsock_sysctl_ops __net_initdata = {
+	.init = vsock_sysctl_init_net,
+	.exit = vsock_sysctl_exit_net,
+};
+
 static int __init vsock_init(void)
 {
 	int err = 0;
@@ -2663,10 +2828,19 @@ static int __init vsock_init(void)
 		goto err_unregister_proto;
 	}
 
+	if (register_pernet_subsys(&vsock_sysctl_ops)) {
+		err = -ENOMEM;
+		goto err_unregister_sock;
+	}
+
+	vsock_net_init(&init_net);
+	vsock_net_init(vsock_global_dummy_net());
 	vsock_bpf_build_proto();
 
 	return 0;
 
+err_unregister_sock:
+	sock_unregister(AF_VSOCK);
 err_unregister_proto:
 	proto_unregister(&vsock_proto);
 err_deregister_misc:
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 432fcbbd14d4..79bc55eeecb3 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -313,7 +313,7 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		return;
 
 	hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
-	sk = vsock_find_bound_socket(&addr);
+	sk = vsock_find_bound_socket(&addr, vsock_global_dummy_net());
 	if (!sk)
 		return;
 
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index b6569b0ca2bb..af3e924fcc31 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -536,7 +536,7 @@ static bool virtio_transport_msgzerocopy_allow(void)
 	return true;
 }
 
-static bool virtio_transport_seqpacket_allow(u32 remote_cid);
+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
 
 static struct virtio_transport virtio_transport = {
 	.transport = {
@@ -593,7 +593,7 @@ static struct virtio_transport virtio_transport = {
 	.can_msgzerocopy = virtio_transport_can_msgzerocopy,
 };
 
-static bool virtio_transport_seqpacket_allow(u32 remote_cid)
+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
 	struct virtio_vsock *vsock;
 	bool seqpacket_allow;
@@ -659,6 +659,7 @@ static void virtio_transport_rx_work(struct work_struct *work)
 			if (payload_len)
 				virtio_vsock_skb_put(skb, payload_len);
 
+			virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
 			virtio_transport_deliver_tap_pkt(skb);
 			virtio_transport_recv_pkt(&virtio_transport, skb);
 		}
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index fe92e5fa95b4..9b3aa4f0395d 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1604,9 +1604,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 	/* The socket must be in connected or bound table
 	 * otherwise send reset back
 	 */
-	sk = vsock_find_connected_socket(&src, &dst);
+	sk = vsock_find_connected_socket(&src, &dst, vsock_global_dummy_net());
 	if (!sk) {
-		sk = vsock_find_bound_socket(&dst);
+		sk = vsock_find_bound_socket(&dst, vsock_global_dummy_net());
 		if (!sk) {
 			(void)virtio_transport_reset_no_sock(t, skb);
 			goto free_pkt;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 7eccd6708d66..fd600ad77d73 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -703,9 +703,9 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
 	vsock_addr_init(&src, pkt->dg.src.context, pkt->src_port);
 	vsock_addr_init(&dst, pkt->dg.dst.context, pkt->dst_port);
 
-	sk = vsock_find_connected_socket(&src, &dst);
+	sk = vsock_find_connected_socket(&src, &dst, vsock_global_dummy_net());
 	if (!sk) {
-		sk = vsock_find_bound_socket(&dst);
+		sk = vsock_find_bound_socket(&dst, vsock_global_dummy_net());
 		if (!sk) {
 			/* We could not find a socket for this specified
 			 * address.  If this packet is a RST, we just drop it.
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 6e78927a598e..1b2fab73e0d0 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -46,7 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
 	return 0;
 }
 
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
 static bool vsock_loopback_msgzerocopy_allow(void)
 {
 	return true;
@@ -106,7 +106,7 @@ static struct virtio_transport loopback_transport = {
 	.send_pkt = vsock_loopback_send_pkt,
 };
 
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
 	return true;
 }

-- 
2.47.3



* [PATCH net-next v5 4/9] vsock/loopback: add netns support
  2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
                   ` (2 preceding siblings ...)
  2025-08-28  0:31 ` [PATCH net-next v5 3/9] vsock: add netns to vsock core Bobby Eshleman
@ 2025-08-28  0:31 ` Bobby Eshleman
  2025-08-28 10:35   ` kernel test robot
  2025-09-02 15:39   ` Stefano Garzarella
  2025-08-28  0:31 ` [PATCH net-next v5 5/9] vsock/virtio: add netns to virtio transport common Bobby Eshleman
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add namespace support to vsock loopback. Sockets in global-mode
namespaces can communicate with each other regardless of which namespace
they are in. Sockets in a local-mode namespace may only communicate with
other sockets in the same namespace.
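
For readers following along, the rule above can be modelled in userspace
as a minimal sketch. The type and function names here (fake_net,
ns_may_communicate) are made up for illustration and are not kernel
symbols. The same-namespace clause is taken from the description above,
since only the tail of vsock_net_check_mode() is visible in this patch.

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel types; names are illustrative only. */
enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

struct fake_net {
	enum ns_mode mode;
};

/* Two endpoints may talk if they share a namespace, or if both
 * namespaces are in global mode.
 */
static bool ns_may_communicate(const struct fake_net *a,
			       const struct fake_net *b)
{
	if (a == b)
		return true;

	return a->mode == NS_MODE_GLOBAL && b->mode == NS_MODE_GLOBAL;
}
```

Under this model a local-mode namespace is isolated from every other
namespace, including global-mode ones.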

Add callbacks that let a transport hook into the initialization and exit
of net namespaces.

The transport's init hook is called once for each netns that is
initialized, and its exit hook once for each netns that exits.

When a set of init/exit callbacks is registered, the init callback is
also called on every namespace that already exists.

Only one callback registration is supported for now; vsock_loopback is
currently the only user.
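
The registration flow described above can be sketched in userspace
roughly as follows. This is a simplified model, not the code in this
patch: it has no locking or module refcounting, every name in it is
hypothetical, and its failure path only unwinds the namespaces that were
already initialised and then clears the slot, which is an assumption of
the sketch.

```c
#include <stddef.h>

#define MAX_NETS 4

struct fake_net {
	int id;
	int loopback_ready;	/* set by init, cleared by exit */
};

struct net_callbacks {
	int (*init)(struct fake_net *net);
	void (*exit)(struct fake_net *net);
};

static struct fake_net nets[MAX_NETS];	/* namespaces that already exist */
static size_t nr_nets;
static struct net_callbacks cbs;	/* single slot: one registrant only */

/* Example callbacks standing in for the loopback transport's hooks. */
static int demo_init(struct fake_net *net)
{
	net->loopback_ready = 1;
	return 0;
}

static void demo_exit(struct fake_net *net)
{
	net->loopback_ready = 0;
}

/* Register the callbacks and run init on every existing namespace.
 * On failure, undo the namespaces initialised so far and free the slot.
 */
static int register_net_callbacks(int (*init)(struct fake_net *),
				  void (*exit)(struct fake_net *))
{
	size_t i;
	int ret = 0;

	if (cbs.init)
		return -1;	/* slot already taken */

	cbs.init = init;
	cbs.exit = exit;

	for (i = 0; i < nr_nets; i++) {
		ret = cbs.init(&nets[i]);
		if (ret < 0)
			break;
	}

	if (ret < 0) {
		while (i-- > 0)
			cbs.exit(&nets[i]);
		cbs.init = NULL;
		cbs.exit = NULL;
	}

	return ret;
}

/* Called for a namespace created after registration. */
static int net_created(struct fake_net *net)
{
	return cbs.init ? cbs.init(net) : 0;
}
```

The single static slot mirrors the "only one registration" limitation;
supporting several registrants would need a list instead.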

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

---
Changes in v5:
- add callbacks code to avoid reverse dependency
- add logic for handling vsock_loopback setup for already existing
  namespaces
---
 include/net/af_vsock.h         |  34 +++++++++++++
 include/net/netns/vsock.h      |   5 ++
 net/vmw_vsock/af_vsock.c       | 110 +++++++++++++++++++++++++++++++++++++++++
 net/vmw_vsock/vsock_loopback.c |  72 ++++++++++++++++++++++++---
 4 files changed, 213 insertions(+), 8 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 83f873174ba3..9333a98b9a1e 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -305,4 +305,38 @@ static inline bool vsock_net_check_mode(struct net *n1, struct net *n2)
 	       (vsock_net_mode(n1) == VSOCK_NET_MODE_GLOBAL &&
 		vsock_net_mode(n2) == VSOCK_NET_MODE_GLOBAL);
 }
+
+struct vsock_net_callbacks {
+	int (*init)(struct net *net);
+	void (*exit)(struct net *net);
+	struct module *owner;
+};
+
+#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
+
+#define vsock_register_net_callbacks(__init, __exit) \
+	__vsock_register_net_callbacks((__init), (__exit), THIS_MODULE)
+
+int __vsock_register_net_callbacks(int (*init)(struct net *net),
+				   void (*exit)(struct net *net),
+				   struct module *owner);
+void vsock_unregister_net_callbacks(void);
+
+#else
+
+#define vsock_register_net_callbacks(__init, __exit) do { } while (0)
+
+static inline int __vsock_register_net_callbacks(int (*init)(struct net *net),
+						 void (*exit)(struct net *net),
+						 struct module *owner)
+{
+	return 0;
+}
+
+static inline void vsock_unregister_net_callbacks(void) {}
+static inline int vsock_net_call_init(struct net *net) { return 0; }
+static inline void vsock_net_call_exit(struct net *net) {}
+
+#endif /* CONFIG_VSOCKETS_LOOPBACK */
+
 #endif /* __AF_VSOCK_H__ */
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
index d4593c0b8dc4..08d9a933c540 100644
--- a/include/net/netns/vsock.h
+++ b/include/net/netns/vsock.h
@@ -9,6 +9,8 @@ enum vsock_net_mode {
 	VSOCK_NET_MODE_LOCAL,
 };
 
+struct vsock_loopback;
+
 struct netns_vsock {
 	struct ctl_table_header *vsock_hdr;
 	spinlock_t lock;
@@ -16,5 +18,8 @@ struct netns_vsock {
 	/* protected by lock */
 	enum vsock_net_mode mode;
 	bool written;
+#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
+	struct vsock_loopback *loopback;
+#endif
 };
 #endif /* __NET_NET_NAMESPACE_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 68a8875c8106..5a73d9e1a96f 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -134,6 +134,9 @@
 #include <uapi/linux/vm_sockets.h>
 #include <uapi/asm-generic/ioctls.h>
 
+static struct vsock_net_callbacks vsock_net_callbacks;
+static DEFINE_MUTEX(vsock_net_callbacks_lock);
+
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
 static void vsock_sk_destruct(struct sock *sk);
 static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
@@ -2781,6 +2784,49 @@ static void vsock_net_init(struct net *net)
 	net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
 }
 
+#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
+static int vsock_net_call_init(struct net *net)
+{
+	struct vsock_net_callbacks *cbs;
+	int ret;
+
+	mutex_lock(&vsock_net_callbacks_lock);
+	cbs = &vsock_net_callbacks;
+
+	ret = 0;
+	if (!cbs->owner)
+		goto out;
+
+	if (try_module_get(cbs->owner)) {
+		ret = cbs->init(net);
+		module_put(cbs->owner);
+	}
+
+out:
+	mutex_unlock(&vsock_net_callbacks_lock);
+	return ret;
+}
+
+static void vsock_net_call_exit(struct net *net)
+{
+	struct vsock_net_callbacks *cbs;
+
+	mutex_lock(&vsock_net_callbacks_lock);
+	cbs = &vsock_net_callbacks;
+
+	if (!cbs->owner)
+		goto out;
+
+	if (try_module_get(cbs->owner)) {
+		cbs->exit(net);
+		module_put(cbs->owner);
+	}
+
+out:
+	mutex_unlock(&vsock_net_callbacks_lock);
+}
+#endif /* CONFIG_VSOCKETS_LOOPBACK */
+
 static __net_init int vsock_sysctl_init_net(struct net *net)
 {
 	vsock_net_init(net);
@@ -2788,12 +2834,20 @@ static __net_init int vsock_sysctl_init_net(struct net *net)
 	if (vsock_sysctl_register(net))
 		return -ENOMEM;
 
+	if (vsock_net_call_init(net) < 0)
+		goto err_sysctl;
+
 	return 0;
+
+err_sysctl:
+	vsock_sysctl_unregister(net);
+	return -ENOMEM;
 }
 
 static __net_exit void vsock_sysctl_exit_net(struct net *net)
 {
 	vsock_sysctl_unregister(net);
+	vsock_net_call_exit(net);
 }
 
 static struct pernet_operations vsock_sysctl_ops __net_initdata = {
@@ -2938,6 +2992,62 @@ void vsock_core_unregister(const struct vsock_transport *t)
 }
 EXPORT_SYMBOL_GPL(vsock_core_unregister);
 
+#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
+int __vsock_register_net_callbacks(int (*init)(struct net *net),
+				   void (*exit)(struct net *net),
+				   struct module *owner)
+{
+	struct vsock_net_callbacks *cbs;
+	struct net *net;
+	int ret = 0;
+
+	mutex_lock(&vsock_net_callbacks_lock);
+
+	cbs = &vsock_net_callbacks;
+	cbs->init = init;
+	cbs->exit = exit;
+	cbs->owner = owner;
+
+	/* call callbacks on any net previously created */
+	down_read(&net_rwsem);
+
+	if (try_module_get(cbs->owner)) {
+		for_each_net(net) {
+			ret = cbs->init(net);
+			if (ret < 0)
+				break;
+		}
+
+		if (ret < 0)
+			for_each_net(net)
+				cbs->exit(net);
+
+		module_put(cbs->owner);
+	}
+
+	up_read(&net_rwsem);
+	mutex_unlock(&vsock_net_callbacks_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__vsock_register_net_callbacks);
+
+void vsock_unregister_net_callbacks(void)
+{
+	struct vsock_net_callbacks *cbs;
+
+	mutex_lock(&vsock_net_callbacks_lock);
+
+	cbs = &vsock_net_callbacks;
+	cbs->init = NULL;
+	cbs->exit = NULL;
+	cbs->owner = NULL;
+
+	mutex_unlock(&vsock_net_callbacks_lock);
+}
+EXPORT_SYMBOL_GPL(vsock_unregister_net_callbacks);
+#endif /* CONFIG_VSOCKETS_LOOPBACK */
+
 module_init(vsock_init);
 module_exit(vsock_exit);
 
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 1b2fab73e0d0..f16d21711cb0 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -28,8 +28,19 @@ static u32 vsock_loopback_get_local_cid(void)
 
 static int vsock_loopback_send_pkt(struct sk_buff *skb)
 {
-	struct vsock_loopback *vsock = &the_vsock_loopback;
+	struct vsock_loopback *vsock;
 	int len = skb->len;
+	struct net *net;
+
+	if (skb->sk)
+		net = sock_net(skb->sk);
+	else
+		net = NULL;
+
+	if (net && net->vsock.mode == VSOCK_NET_MODE_LOCAL)
+		vsock = net->vsock.loopback;
+	else
+		vsock = &the_vsock_loopback;
 
 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
 	queue_work(vsock->workqueue, &vsock->pkt_work);
@@ -134,27 +145,72 @@ static void vsock_loopback_work(struct work_struct *work)
 	}
 }
 
-static int __init vsock_loopback_init(void)
+static int vsock_loopback_init_vsock(struct vsock_loopback *vsock)
 {
-	struct vsock_loopback *vsock = &the_vsock_loopback;
-	int ret;
-
 	vsock->workqueue = alloc_workqueue("vsock-loopback", 0, 0);
 	if (!vsock->workqueue)
 		return -ENOMEM;
 
 	skb_queue_head_init(&vsock->pkt_queue);
 	INIT_WORK(&vsock->pkt_work, vsock_loopback_work);
+	return 0;
+}
+
+static void vsock_loopback_deinit_vsock(struct vsock_loopback *vsock)
+{
+	if (vsock->workqueue)
+		destroy_workqueue(vsock->workqueue);
+}
+
+/* called with vsock_net_callbacks lock held */
+static int vsock_loopback_init_net(struct net *net)
+{
+	if (WARN_ON_ONCE(net->vsock.loopback))
+		return 0;
+
+	net->vsock.loopback = kmalloc(sizeof(*net->vsock.loopback), GFP_KERNEL);
+	if (!net->vsock.loopback)
+		return -ENOMEM;
+
+	return vsock_loopback_init_vsock(net->vsock.loopback);
+}
+
+/* called with vsock_net_callbacks lock held */
+static void vsock_loopback_exit_net(struct net *net)
+{
+	if (net->vsock.loopback) {
+		vsock_loopback_deinit_vsock(net->vsock.loopback);
+		kfree(net->vsock.loopback);
+	}
+}
+
+static int __init vsock_loopback_init(void)
+{
+	struct vsock_loopback *vsock = &the_vsock_loopback;
+	int ret;
+
+	ret = vsock_loopback_init_vsock(vsock);
+	if (ret < 0)
+		return ret;
+
+	ret = vsock_register_net_callbacks(vsock_loopback_init_net,
+					   vsock_loopback_exit_net);
+	if (ret < 0)
+		goto out_deinit_vsock;
 
 	ret = vsock_core_register(&loopback_transport.transport,
 				  VSOCK_TRANSPORT_F_LOCAL);
 	if (ret)
-		goto out_wq;
+		goto out_unregister_net;
+
 
 	return 0;
 
-out_wq:
-	destroy_workqueue(vsock->workqueue);
+out_unregister_net:
+	vsock_unregister_net_callbacks();
+
+out_deinit_vsock:
+	vsock_loopback_deinit_vsock(vsock);
 	return ret;
 }
 

-- 
2.47.3



* [PATCH net-next v5 5/9] vsock/virtio: add netns to virtio transport common
  2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
                   ` (3 preceding siblings ...)
  2025-08-28  0:31 ` [PATCH net-next v5 4/9] vsock/loopback: add netns support Bobby Eshleman
@ 2025-08-28  0:31 ` Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 6/9] vhost/vsock: add netns support Bobby Eshleman
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add support to the virtio-vsock common code for carrying net namespace
pointers on both the tx and rx paths. The series still requires
vhost/virtio transport support, which is added by future patches.
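
As a rough userspace illustration of the idea: the sender's namespace is
stamped onto the packet on tx and used to select the socket on rx. All
names below are invented for the sketch, and the lookup uses strict
namespace equality for simplicity rather than the mode-aware check used
by the vsock core.

```c
#include <stddef.h>

struct fake_net {
	int id;
};

struct fake_sock {
	unsigned int port;
	struct fake_net *net;	/* namespace the socket lives in */
};

struct pkt_info {
	unsigned int dst_port;
	struct fake_net *net;	/* namespace of the sender */
};

struct fake_pkt {
	unsigned int dst_port;
	struct fake_net *net;	/* carried from tx to rx */
};

/* tx side: copy the namespace from the send request onto the packet */
static struct fake_pkt make_pkt(const struct pkt_info *info)
{
	struct fake_pkt pkt = {
		.dst_port = info->dst_port,
		.net = info->net,
	};

	return pkt;
}

/* rx side: look up the destination socket using the packet's namespace */
static struct fake_sock *find_bound(struct fake_sock *socks, size_t n,
				    const struct fake_pkt *pkt)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (socks[i].port == pkt->dst_port &&
		    socks[i].net == pkt->net)
			return &socks[i];
	}

	return NULL;
}
```

Two sockets bound to the same port in different namespaces are then
distinguished purely by the namespace the packet carries.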

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport_common.c | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c547cda7196b..ce6d15eede9c 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -184,6 +184,7 @@ struct virtio_vsock_pkt_info {
 	u32 remote_cid, remote_port;
 	struct vsock_sock *vsk;
 	struct msghdr *msg;
+	struct net *net;
 	u32 pkt_len;
 	u16 type;
 	u16 op;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 9b3aa4f0395d..7b566c8f8082 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -314,6 +314,8 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
 					 info->flags,
 					 zcopy);
 
+	virtio_vsock_skb_set_net(skb, info->net);
+
 	return skb;
 out:
 	kfree_skb(skb);
@@ -525,6 +527,7 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk)
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1065,6 +1068,7 @@ int virtio_transport_connect(struct vsock_sock *vsk)
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_REQUEST,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1080,6 +1084,7 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
 			 (mode & SEND_SHUTDOWN ?
 			  VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1106,6 +1111,7 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
 		.msg = msg,
 		.pkt_len = len,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1143,6 +1149,7 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
 		.op = VIRTIO_VSOCK_OP_RST,
 		.reply = !!skb,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	/* Send RST only if the original pkt is not a RST pkt */
@@ -1163,6 +1170,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 		.op = VIRTIO_VSOCK_OP_RST,
 		.type = le16_to_cpu(hdr->type),
 		.reply = true,
+		.net = virtio_vsock_skb_net(skb),
 	};
 	struct sk_buff *reply;
 
@@ -1463,6 +1471,7 @@ virtio_transport_send_response(struct vsock_sock *vsk,
 		.remote_port = le32_to_cpu(hdr->src_port),
 		.reply = true,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1577,6 +1586,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 			       struct sk_buff *skb)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+	struct net *net = virtio_vsock_skb_net(skb);
 	struct sockaddr_vm src, dst;
 	struct vsock_sock *vsk;
 	struct sock *sk;
@@ -1604,9 +1614,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 	/* The socket must be in connected or bound table
 	 * otherwise send reset back
 	 */
-	sk = vsock_find_connected_socket(&src, &dst, vsock_global_dummy_net());
+	sk = vsock_find_connected_socket(&src, &dst, net);
 	if (!sk) {
-		sk = vsock_find_bound_socket(&dst, vsock_global_dummy_net());
+		sk = vsock_find_bound_socket(&dst, net);
 		if (!sk) {
 			(void)virtio_transport_reset_no_sock(t, skb);
 			goto free_pkt;

-- 
2.47.3



* [PATCH net-next v5 6/9] vhost/vsock: add netns support
  2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
                   ` (4 preceding siblings ...)
  2025-08-28  0:31 ` [PATCH net-next v5 5/9] vsock/virtio: add netns to virtio transport common Bobby Eshleman
@ 2025-08-28  0:31 ` Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 7/9] selftests/vsock: improve logging in vmtest.sh Bobby Eshleman
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add the ability to isolate vsock flows using namespaces.

The VM, via the vhost_vsock struct, inherits its namespace from the
process that opens the vhost-vsock device. The vhost_vsock lookup
functions are modified to take the mode into account (e.g., if the CIDs
match but the modes are not compatible, the lookup returns NULL).

vhost_vsock now acquires a reference to the namespace.
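
The lookup behaviour can be sketched in userspace along these lines. The
names are hypothetical, and the compatibility rule (same namespace, or
both namespaces in global mode) is assumed from the series description
rather than copied from the kernel source.

```c
#include <stdbool.h>
#include <stddef.h>

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

struct fake_net {
	enum ns_mode mode;
};

struct fake_vhost {
	unsigned int guest_cid;	/* 0 means no CID assigned yet */
	struct fake_net *net;	/* namespace of the opener */
};

/* Assumed rule: same namespace, or both namespaces in global mode. */
static bool ns_compatible(const struct fake_net *a, const struct fake_net *b)
{
	if (a == b)
		return true;

	return a->mode == NS_MODE_GLOBAL && b->mode == NS_MODE_GLOBAL;
}

/* Return the device with a matching CID that is visible from @net. */
static struct fake_vhost *vhost_get(struct fake_vhost *devs, size_t n,
				    unsigned int cid,
				    const struct fake_net *net)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].guest_cid == 0)
			continue;

		if (devs[i].guest_cid == cid &&
		    ns_compatible(net, devs[i].net))
			return &devs[i];
	}

	return NULL;
}
```

With this model, two VMs in different local-mode namespaces can hold the
same CID without colliding, while a lookup from an unrelated namespace
finds neither.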

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

---
Changes in v5:
- respect pid namespaces when assigning namespace to vhost_vsock
---
 drivers/vhost/vsock.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 34adf0cf9124..f7405bb27aab 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -46,6 +46,8 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
 struct vhost_vsock {
 	struct vhost_dev dev;
 	struct vhost_virtqueue vqs[2];
+	struct net *net;
+	netns_tracker ns_tracker;
 
 	/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
 	struct hlist_node hash;
@@ -67,7 +69,7 @@ static u32 vhost_transport_get_local_cid(void)
 /* Callers that dereference the return value must hold vhost_vsock_mutex or the
  * RCU read lock.
  */
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net)
 {
 	struct vhost_vsock *vsock;
 
@@ -78,9 +80,8 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
 		if (other_cid == 0)
 			continue;
 
-		if (other_cid == guest_cid)
+		if (other_cid == guest_cid && vsock_net_check_mode(net, vsock->net))
 			return vsock;
-
 	}
 
 	return NULL;
@@ -272,13 +273,14 @@ static int
 vhost_transport_send_pkt(struct sk_buff *skb)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+	struct net *net = virtio_vsock_skb_net(skb);
 	struct vhost_vsock *vsock;
 	int len = skb->len;
 
 	rcu_read_lock();
 
 	/* Find the vhost_vsock according to guest context id  */
-	vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid));
+	vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid), net);
 	if (!vsock) {
 		rcu_read_unlock();
 		kfree_skb(skb);
@@ -305,7 +307,7 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
 	rcu_read_lock();
 
 	/* Find the vhost_vsock according to guest context id  */
-	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk_vsock(vsk)));
 	if (!vsock)
 		goto out;
 
@@ -462,11 +464,12 @@ static struct virtio_transport vhost_transport = {
 
 static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
+	struct net *net = sock_net(sk_vsock(vsk));
 	struct vhost_vsock *vsock;
 	bool seqpacket_allow = false;
 
 	rcu_read_lock();
-	vsock = vhost_vsock_get(remote_cid);
+	vsock = vhost_vsock_get(remote_cid, net);
 
 	if (vsock)
 		seqpacket_allow = vsock->seqpacket_allow;
@@ -526,6 +529,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			continue;
 		}
 
+		virtio_vsock_skb_set_net(skb, vsock->net);
 		total_len += sizeof(*hdr) + skb->len;
 
 		/* Deliver to monitoring devices all received packets */
@@ -652,10 +656,14 @@ static void vhost_vsock_free(struct vhost_vsock *vsock)
 
 static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 {
+
 	struct vhost_virtqueue **vqs;
 	struct vhost_vsock *vsock;
+	struct net *net;
 	int ret;
 
+	net = current->nsproxy->net_ns;
+
 	/* This struct is large and allocation could fail, fall back to vmalloc
 	 * if there is no other way.
 	 */
@@ -669,6 +677,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 		goto out;
 	}
 
+	vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
 	vsock->guest_cid = 0; /* no CID assigned yet */
 	vsock->seqpacket_allow = false;
 
@@ -708,7 +717,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
 	 */
 
 	/* If the peer is still valid, no need to reset connection */
-	if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+	if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk)))
 		return;
 
 	/* If the close timeout is pending, let it expire.  This avoids races
@@ -753,6 +762,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
 	virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue);
 
 	vhost_dev_cleanup(&vsock->dev);
+	put_net_track(vsock->net, &vsock->ns_tracker);
 	kfree(vsock->dev.vqs);
 	vhost_vsock_free(vsock);
 	return 0;
@@ -779,7 +789,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
 
 	/* Refuse if CID is already in use */
 	mutex_lock(&vhost_vsock_mutex);
-	other = vhost_vsock_get(guest_cid);
+	other = vhost_vsock_get(guest_cid, vsock->net);
 	if (other && other != vsock) {
 		mutex_unlock(&vhost_vsock_mutex);
 		return -EADDRINUSE;

-- 
2.47.3



* [PATCH net-next v5 7/9] selftests/vsock: improve logging in vmtest.sh
  2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
                   ` (5 preceding siblings ...)
  2025-08-28  0:31 ` [PATCH net-next v5 6/9] vhost/vsock: add netns support Bobby Eshleman
@ 2025-08-28  0:31 ` Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 8/9] selftests/vsock: invoke vsock_test through helpers Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 9/9] selftests/vsock: add namespace tests Bobby Eshleman
  8 siblings, 0 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)

From: Bobby Eshleman <bobbyeshleman@meta.com>

Improve logging by adding configurable log levels. Additionally, improve
the usability of the logging functions. Remove the test name prefix from
the logging functions so that logging calls can be made deeper in the
call stack without passing down the test name or setting a global. Teach
the log function to accept a LOG_PREFIX variable to avoid unnecessary
argument shifting.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 tools/testing/selftests/vsock/vmtest.sh | 75 ++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 38 deletions(-)

diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index edacebfc1632..183647a86c8a 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -51,7 +51,12 @@ readonly TEST_DESCS=(
 	"Run vsock_test using the loopback transport in the VM."
 )
 
-VERBOSE=0
+readonly LOG_LEVEL_DEBUG=0
+readonly LOG_LEVEL_INFO=1
+readonly LOG_LEVEL_WARN=2
+readonly LOG_LEVEL_ERROR=3
+
+VERBOSE="${LOG_LEVEL_WARN}"
 
 usage() {
 	local name
@@ -196,7 +201,7 @@ vm_start() {
 
 	qemu=$(command -v "${QEMU}")
 
-	if [[ "${VERBOSE}" -eq 1 ]]; then
+	if [[ ${VERBOSE} -le ${LOG_LEVEL_DEBUG} ]]; then
 		verbose_opt="--verbose"
 		logfile=/dev/stdout
 	fi
@@ -271,60 +276,56 @@ EOF
 
 host_wait_for_listener() {
 	wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
-}
-
-__log_stdin() {
-	cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }'
-}
 
-__log_args() {
-	echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }'
 }
 
 log() {
-	local prefix="$1"
+	local redirect
+	local prefix
 
-	shift
-	local redirect=
-	if [[ ${VERBOSE} -eq 0 ]]; then
+	if [[ ${VERBOSE} -gt ${LOG_LEVEL_INFO} ]]; then
 		redirect=/dev/null
 	else
 		redirect=/dev/stdout
 	fi
 
+	prefix="${LOG_PREFIX:-}"
+
 	if [[ "$#" -eq 0 ]]; then
-		__log_stdin | tee -a "${LOG}" > ${redirect}
+		if [[ -n "${prefix}" ]]; then
+			cat | awk -v prefix="${prefix}" '{printf "%s: %s\n", prefix, $0}'
+		else
+			cat
+		fi
 	else
-		__log_args "$@" | tee -a "${LOG}" > ${redirect}
-	fi
+		if [[ -n "${prefix}" ]]; then
+			echo "${prefix}: " "$@"
+		else
+			echo "$@"
+		fi
+	fi | tee -a "${LOG}" > ${redirect}
 }
 
-log_setup() {
-	log "setup" "$@"
+log_host() {
+	LOG_PREFIX=host log $@
 }
 
-log_host() {
-	local testname=$1
+log_guest() {
+	LOG_PREFIX=guest log $@
+}
 
-	shift
-	log "test:${testname}:host" "$@"
 }
 
-log_guest() {
-	local testname=$1
 
-	shift
-	log "test:${testname}:guest" "$@"
 }
 
 test_vm_server_host_client() {
-	local testname="${FUNCNAME[0]#test_}"
 
 	vm_ssh -- "${VSOCK_TEST}" \
 		--mode=server \
 		--control-port="${TEST_GUEST_PORT}" \
 		--peer-cid=2 \
-		2>&1 | log_guest "${testname}" &
+		2>&1 | log_guest &
 
 	vm_wait_for_listener "${TEST_GUEST_PORT}"
 
@@ -332,18 +333,17 @@ test_vm_server_host_client() {
 		--mode=client \
 		--control-host=127.0.0.1 \
 		--peer-cid="${VSOCK_CID}" \
-		--control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}"
+		--control-port="${TEST_HOST_PORT}" 2>&1 | log_host
 
 	return $?
 }
 
 test_vm_client_host_server() {
-	local testname="${FUNCNAME[0]#test_}"
 
 	${VSOCK_TEST} \
 		--mode "server" \
 		--control-port "${TEST_HOST_PORT_LISTENER}" \
-		--peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" &
+		--peer-cid "${VSOCK_CID}" 2>&1 | log_host &
 
 	host_wait_for_listener
 
@@ -351,19 +351,18 @@ test_vm_client_host_server() {
 		--mode=client \
 		--control-host=10.0.2.2 \
 		--peer-cid=2 \
-		--control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}"
+		--control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest
 
 	return $?
 }
 
 test_vm_loopback() {
-	local testname="${FUNCNAME[0]#test_}"
 	local port=60000 # non-forwarded local port
 
 	vm_ssh -- "${VSOCK_TEST}" \
 		--mode=server \
 		--control-port="${port}" \
-		--peer-cid=1 2>&1 | log_guest "${testname}" &
+		--peer-cid=1 2>&1 | log_guest &
 
 	vm_wait_for_listener "${port}"
 
@@ -371,7 +370,7 @@ test_vm_loopback() {
 		--mode=client \
 		--control-host="127.0.0.1" \
 		--control-port="${port}" \
-		--peer-cid=1 2>&1 | log_guest "${testname}"
+		--peer-cid=1 2>&1 | log_guest
 
 	return $?
 }
@@ -429,7 +428,7 @@ QEMU="qemu-system-$(uname -m)"
 while getopts :hvsq:b o
 do
 	case $o in
-	v) VERBOSE=1;;
+	v) VERBOSE=$(( VERBOSE - 1 ));;
 	b) BUILD=1;;
 	q) QEMU=$OPTARG;;
 	h|*) usage;;
@@ -452,10 +451,10 @@ handle_build
 
 echo "1..${#ARGS[@]}"
 
-log_setup "Booting up VM"
+log_host "Booting up VM"
 vm_start
 vm_wait_for_ssh
-log_setup "VM booted up"
+log_host "VM booted up"
 
 cnt_pass=0
 cnt_fail=0

-- 
2.47.3



* [PATCH net-next v5 8/9] selftests/vsock: invoke vsock_test through helpers
  2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
                   ` (6 preceding siblings ...)
  2025-08-28  0:31 ` [PATCH net-next v5 7/9] selftests/vsock: improve logging in vmtest.sh Bobby Eshleman
@ 2025-08-28  0:31 ` Bobby Eshleman
  2025-08-28  0:31 ` [PATCH net-next v5 9/9] selftests/vsock: add namespace tests Bobby Eshleman
  8 siblings, 0 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add helper functions vm_vsock_test() and host_vsock_test() to invoke the
vsock_test binary. These encapsulate several pieces of repeated logic,
such as waiting for the server to reach the listening state and
enabling/disabling the bash pipefail option so that pipe-style logging
does not hide failures.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 tools/testing/selftests/vsock/vmtest.sh | 120 ++++++++++++++++++++++++++++----
 1 file changed, 108 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 183647a86c8a..5e36d1068f6f 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -248,6 +248,7 @@ wait_for_listener()
 	local port=$1
 	local interval=$2
 	local max_intervals=$3
+	local old_pipefail
 	local protocol=tcp
 	local pattern
 	local i
@@ -256,6 +257,13 @@ wait_for_listener()
 
 	# for tcp protocol additionally check the socket state
 	[ "${protocol}" = "tcp" ] && pattern="${pattern}0A"
+
+	# 'grep -q' exits on match, sending SIGPIPE to 'awk', which exits with
+	# an error, causing the if-condition to fail when pipefail is set.
+	# Instead, temporarily disable pipefail and restore it later.
+	old_pipefail=$(set -o | awk '/^pipefail[[:space:]]+(on|off)$/{print $2}')
+	set +o pipefail
+
 	for i in $(seq "${max_intervals}"); do
 		if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \
 		   grep -q "${pattern}"; then
@@ -263,6 +271,10 @@ wait_for_listener()
 		fi
 		sleep "${interval}"
 	done
+
+	if [[ "${old_pipefail}" == on ]]; then
+		set -o pipefail
+	fi
 }
 
 vm_wait_for_listener() {
@@ -314,28 +326,112 @@ log_guest() {
 	LOG_PREFIX=guest log $@
 }
 
+vm_vsock_test() {
+	local ns=$1
+	local mode=$2
+	local rc
+
+	set -o pipefail
+	if [[ "${mode}" == client ]]; then
+		local host=$3
+		local cid=$4
+		local port=$5
+
+		# log output and use pipefail to respect vsock_test errors
+		vm_ssh "${ns}" -- "${VSOCK_TEST}" \
+			--mode=client \
+			--control-host="${host}" \
+			--peer-cid="${cid}" \
+			--control-port="${port}" \
+			2>&1 | log_guest
+		rc=$?
+	else
+		local cid=$3
+		local port=$4
+
+		# log output and use pipefail to respect vsock_test errors
+		vm_ssh "${ns}" -- "${VSOCK_TEST}" \
+			--mode=server \
+			--peer-cid="${cid}" \
+			--control-port="${port}" \
+			2>&1 | log_guest &
+		rc=$?
+
+		if [[ $rc -ne 0 ]]; then
+			set +o pipefail
+			return $rc
+		fi
+
+		vm_wait_for_listener "${ns}" "${port}"
+		rc=$?
+	fi
+	set +o pipefail
+
+	return $rc
 }
 
+host_vsock_test() {
+	local ns=$1
+	local mode=$2
+	local cmd
+
+	if [[ "${ns}" == none ]]; then
+		cmd="${VSOCK_TEST}"
+	else
+		cmd="ip netns exec ${ns} ${VSOCK_TEST}"
+	fi
+
+	# log output and use pipefail to respect vsock_test errors
+	set -o pipefail
+	if [[ "${mode}" == client ]]; then
+		local host=$3
+		local cid=$4
+		local port=$5
+
+		${cmd} \
+			--mode="${mode}" \
+			--peer-cid="${cid}" \
+			--control-host="${host}" \
+			--control-port="${port}" 2>&1 | log_host
+		rc=$?
+	else
+		local cid=$3
+		local port=$4
+
+		${cmd} \
+			--mode="${mode}" \
+			--peer-cid="${cid}" \
+			--control-port="${port}" 2>&1 | log_host &
+		rc=$?
+
+		if [[ $rc -ne 0 ]]; then
+			return $rc
+		fi
+
+		host_wait_for_listener "${ns}" "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+		rc=$?
+	fi
+	set +o pipefail
 
+	return $rc
 }
 
 test_vm_server_host_client() {
+	vm_vsock_test "none" "server" 2 "${TEST_GUEST_PORT}"
+	host_vsock_test "none" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+}
 
-	vm_ssh -- "${VSOCK_TEST}" \
-		--mode=server \
-		--control-port="${TEST_GUEST_PORT}" \
-		--peer-cid=2 \
-		2>&1 | log_guest &
+test_vm_client_host_server() {
+	host_vsock_test "none" "server" "${VSOCK_CID}" "${TEST_HOST_PORT_LISTENER}"
+	vm_vsock_test "none" "client" "10.0.2.2" 2 "${TEST_HOST_PORT_LISTENER}"
+}
 
-	vm_wait_for_listener "${TEST_GUEST_PORT}"
+test_vm_loopback() {
+	vm_vsock_test "none" "server" 1 "${TEST_HOST_PORT_LISTENER}"
+	vm_vsock_test "none" "client" "127.0.0.1" 1 "${TEST_HOST_PORT_LISTENER}"
+}
 
-	${VSOCK_TEST} \
-		--mode=client \
-		--control-host=127.0.0.1 \
-		--peer-cid="${VSOCK_CID}" \
-		--control-port="${TEST_HOST_PORT}" 2>&1 | log_host
 
-	return $?
 }
 
 test_vm_client_host_server() {

-- 
2.47.3
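
In vm_vsock_test() and host_vsock_test() above, the server branch reads
$? right after starting the pipeline with "&". In bash that value is the
status of launching the job, which is 0, so the "-ne 0" check that
follows cannot fire. A minimal sketch of the difference for a single
command, using a hypothetical helper run_bg_and_wait that is not part of
this patch:

```bash
#!/usr/bin/env bash

# Hypothetical helper (not in vmtest.sh): start a command in the
# background, then collect its real exit status with wait.
run_bg_and_wait() {
	local pid rc

	"$@" &
	# Status of launching the background job, not of the job itself.
	rc=$?
	pid=$!
	echo "status right after '&': ${rc}"

	# wait returns the exit status of the job; the "|| rc=$?" form keeps
	# this safe when the caller runs with "set -e".
	rc=0
	wait "${pid}" || rc=$?
	echo "status from wait: ${rc}"

	return "${rc}"
}
```

Calling run_bg_and_wait false reports 0 for the first status and 1 for
the second. A long-running server cannot simply be waited on, which is
presumably why the helpers poll for a listener instead.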



* [PATCH net-next v5 9/9] selftests/vsock: add namespace tests
  2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
                   ` (7 preceding siblings ...)
  2025-08-28  0:31 ` [PATCH net-next v5 8/9] selftests/vsock: invoke vsock_test through helpers Bobby Eshleman
@ 2025-08-28  0:31 ` Bobby Eshleman
  2025-09-02 15:40   ` Stefano Garzarella
  8 siblings, 1 reply; 18+ messages in thread
From: Bobby Eshleman @ 2025-08-28  0:31 UTC (permalink / raw)
  To: Stefano Garzarella, Shuah Khan, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list
  Cc: virtualization, netdev, linux-kselftest, linux-kernel, kvm,
	linux-hyperv, Bobby Eshleman, berrange, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add tests for namespace support in vsock. Use socat for basic connection
failure tests and vsock_test for full functionality tests when
communication is expected to succeed. vsock_test is not used for failure
cases because in theory vsock_test could allow connection and some
traffic flow but fail on some other case (e.g., fail on MSG_ZEROCOPY).
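
The failure tests reduce to a sentinel check: a listener writes whatever
it receives to a file, the client sends a known string, and the test
passes only if that string did not arrive. A minimal sketch of the
verdict logic alone, with the socat transfer left out; sentinel_arrived
and expect_no_delivery are hypothetical names, not functions from
vmtest.sh:

```bash
#!/usr/bin/env bash

# kselftest exit codes
KSFT_PASS=0
KSFT_FAIL=1

# Hypothetical helper: succeed if the listener's output file holds exactly
# the sentinel string (command substitution drops the trailing newline).
sentinel_arrived() {
	local outfile=$1
	local sentinel=$2

	[[ "$(cat "${outfile}" 2>/dev/null)" == "${sentinel}" ]]
}

# Hypothetical helper for a "*_fails" test: delivery is the failure case.
expect_no_delivery() {
	if sentinel_arrived "$1" "$2"; then
		return "${KSFT_FAIL}"
	fi

	return "${KSFT_PASS}"
}
```

Comparing the received payload rather than socat's exit status keeps the
verdict tied to whether data crossed the namespace boundary.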

Tests cover all cases of clients and servers being in all variants of
local ns, global ns, host process, and VM process.

Legacy tests are retained and executed in the init ns.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

---
Changes in v5:
- use /proc/sys/net/vsock/ns_mode
- clarify logic of tests that reuse the same VM and tests that require
  netns setup
- fix unassigned BUILD bug
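
The split between tests that reuse the shared VM and tests that need
their own netns setup comes down to a membership check against a list of
names. A sketch of one generic helper that could serve both lists;
in_list is a hypothetical name, and the array below is a copy kept here
only so the example is self-contained:

```bash
#!/usr/bin/env bash

USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)

# Hypothetical helper: return 0 if the first argument equals any of the
# remaining arguments, 1 otherwise.
in_list() {
	local needle=$1
	local item

	shift
	for item in "$@"; do
		if [[ "${item}" == "${needle}" ]]; then
			return 0
		fi
	done

	return 1
}

for name in vm_loopback local_same_cid_ok; do
	if in_list "${name}" "${USE_SHARED_VM[@]}"; then
		echo "${name}: shared VM"
	else
		echo "${name}: isolated VM"
	fi
done
```

The same helper would work for the init-netns list by passing a
different array.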
---
 tools/testing/selftests/vsock/vmtest.sh | 913 ++++++++++++++++++++++++++++----
 1 file changed, 808 insertions(+), 105 deletions(-)

diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 5e36d1068f6f..9d830eb7e829 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -7,6 +7,7 @@
 #		* virtme-ng
 #		* busybox-static (used by virtme-ng)
 #		* qemu	(used by virtme-ng)
+#		* socat
 
 readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
 readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../)
@@ -23,7 +24,7 @@ readonly VSOCK_CID=1234
 readonly WAIT_PERIOD=3
 readonly WAIT_PERIOD_MAX=60
 readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX ))
-readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+readonly WAIT_QEMU=5
 
 # virtme-ng offers a netdev for ssh when using "--ssh", but we also need a
 # control port forwarded for vsock_test.  Because virtme-ng doesn't support
@@ -33,23 +34,125 @@ readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
 # add the kernel cmdline options that virtme-init uses to setup the interface.
 readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}"
 readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}"
-readonly QEMU_OPTS="\
-	 -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \
-	 -device virtio-net-pci,netdev=n0 \
-	 -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \
-	 --pidfile ${QEMU_PIDFILE} \
-"
 readonly KERNEL_CMDLINE="\
 	virtme.dhcp net.ifnames=0 biosdevname=0 \
 	virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \
 "
 readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log)
-readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly TEST_NAMES=(
+	vm_server_host_client
+	vm_client_host_server
+	vm_loopback
+	host_vsock_ns_mode_ok
+	host_vsock_ns_mode_write_once_ok
+	global_same_cid_fails
+	local_same_cid_ok
+	global_local_same_cid_ok
+	local_global_same_cid_ok
+	diff_ns_global_host_connect_to_global_vm_ok
+	diff_ns_global_host_connect_to_local_vm_fails
+	diff_ns_global_vm_connect_to_global_host_ok
+	diff_ns_global_vm_connect_to_local_host_fails
+	diff_ns_local_host_connect_to_local_vm_fails
+	diff_ns_local_vm_connect_to_local_host_fails
+	diff_ns_global_to_local_loopback_local_fails
+	diff_ns_local_to_global_loopback_fails
+	diff_ns_local_to_local_loopback_fails
+	diff_ns_global_to_global_loopback_ok
+	same_ns_local_loopback_ok
+	same_ns_local_host_connect_to_local_vm_ok
+	same_ns_local_vm_connect_to_local_host_ok
+)
+
 readonly TEST_DESCS=(
+	# vm_server_host_client
 	"Run vsock_test in server mode on the VM and in client mode on the host."
+
+	# vm_client_host_server
 	"Run vsock_test in client mode on the VM and in server mode on the host."
+
+	# vm_loopback
 	"Run vsock_test using the loopback transport in the VM."
+
+	# host_vsock_ns_mode_ok
+	"Check /proc/sys/net/vsock/ns_mode strings on the host."
+
+	# host_vsock_ns_mode_write_once_ok
+	"Check /proc/sys/net/vsock/ns_mode is write-once on the host."
+
+	# global_same_cid_fails
+	"Check QEMU fails to start two VMs with same CID in two different global namespaces."
+
+	# local_same_cid_ok
+	"Check QEMU successfully starts two VMs with same CID in two different local namespaces."
+
+	# global_local_same_cid_ok
+	"Check QEMU successfully starts one VM in a global ns and then another VM in a local ns with the same CID."
+
+	# local_global_same_cid_ok
+	"Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
+
+	# diff_ns_global_host_connect_to_global_vm_ok
+	"Run vsock_test client in global ns with server in VM in another global ns."
+
+	# diff_ns_global_host_connect_to_local_vm_fails
+	"Run socat to test a process in a global ns fails to connect to a VM in a local ns."
+
+	# diff_ns_global_vm_connect_to_global_host_ok
+	"Run vsock_test client in VM in a global ns with server in another global ns."
+
+	# diff_ns_global_vm_connect_to_local_host_fails
+	"Run socat to test a VM in a global ns fails to connect to a host process in a local ns."
+
+	# diff_ns_local_host_connect_to_local_vm_fails
+	"Run socat to test a host process in a local ns fails to connect to a VM in another local ns."
+
+	# diff_ns_local_vm_connect_to_local_host_fails
+	"Run socat to test a VM in a local ns fails to connect to a host process in another local ns."
+
+	# diff_ns_global_to_local_loopback_local_fails
+	"Run socat to test a loopback vsock in a global ns fails to connect to a vsock in a local ns."
+
+	# diff_ns_local_to_global_loopback_fails
+	"Run socat to test a loopback vsock in a local ns fails to connect to a vsock in a global ns."
+
+	# diff_ns_local_to_local_loopback_fails
+	"Run socat to test a loopback vsock in a local ns fails to connect to a vsock in another local ns."
+
+	# diff_ns_global_to_global_loopback_ok
+	"Run socat to test a loopback vsock in a global ns successfully connects to a vsock in another global ns."
+
+	# same_ns_local_loopback_ok
+	"Run socat to test a loopback vsock in a local ns successfully connects to a vsock in the same ns."
+
+	# same_ns_local_host_connect_to_local_vm_ok
+	"Run vsock_test client in a local ns with server in VM in same ns."
+
+	# same_ns_local_vm_connect_to_local_host_ok
+	"Run vsock_test client in VM in a local ns with server in same ns."
+)
+
+readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly USE_INIT_NETNS=(
+	global_same_cid_fails
+	local_same_cid_ok
+	global_local_same_cid_ok
+	local_global_same_cid_ok
+	diff_ns_global_host_connect_to_global_vm_ok
+	diff_ns_global_host_connect_to_local_vm_fails
+	diff_ns_global_vm_connect_to_global_host_ok
+	diff_ns_global_vm_connect_to_local_host_fails
+	diff_ns_local_host_connect_to_local_vm_fails
+	diff_ns_local_vm_connect_to_local_host_fails
+	diff_ns_global_to_local_loopback_local_fails
+	diff_ns_local_to_global_loopback_fails
+	diff_ns_local_to_local_loopback_fails
+	diff_ns_global_to_global_loopback_ok
+	same_ns_local_loopback_ok
+	same_ns_local_host_connect_to_local_vm_ok
+	same_ns_local_vm_connect_to_local_host_ok
 )
+readonly MODES=("local" "global")
 
 readonly LOG_LEVEL_DEBUG=0
 readonly LOG_LEVEL_INFO=1
@@ -58,6 +161,12 @@ readonly LOG_LEVEL_ERROR=3
 
 VERBOSE="${LOG_LEVEL_WARN}"
 
+# Test pass/fail counters
+cnt_pass=0
+cnt_fail=0
+cnt_skip=0
+cnt_total=0
+
 usage() {
 	local name
 	local desc
@@ -77,7 +186,7 @@ usage() {
 	for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
 		name=${TEST_NAMES[${i}]}
 		desc=${TEST_DESCS[${i}]}
-		printf "\t%-35s%-35s\n" "${name}" "${desc}"
+		printf "\t%-55s%-35s\n" "${name}" "${desc}"
 	done
 	echo
 
@@ -89,21 +198,87 @@ die() {
 	exit "${KSFT_FAIL}"
 }
 
+add_namespaces() {
+	# add namespaces local0, local1, global0, and global1
+	for mode in "${MODES[@]}"; do
+		ip netns add "${mode}0" 2>/dev/null
+		ip netns add "${mode}1" 2>/dev/null
+	done
+}
+
+init_namespaces() {
+	for mode in "${MODES[@]}"; do
+		ns_set_mode "${mode}0" "${mode}"
+		ns_set_mode "${mode}1" "${mode}"
+
+		log_host "set ns ${mode}0 to mode ${mode}"
+		log_host "set ns ${mode}1 to mode ${mode}"
+
+		# we need lo for qemu port forwarding
+		ip netns exec "${mode}0" ip link set dev lo up
+		ip netns exec "${mode}1" ip link set dev lo up
+	done
+}
+
+del_namespaces() {
+	for mode in "${MODES[@]}"; do
+		ip netns del "${mode}0"
+		ip netns del "${mode}1"
+		log_host "removed ns ${mode}0"
+		log_host "removed ns ${mode}1"
+	done &>/dev/null
+}
+
+ns_set_mode() {
+	local ns=$1
+	local mode=$2
+
+	echo "${mode}" | ip netns exec "${ns}" \
+		tee /proc/sys/net/vsock/ns_mode &>/dev/null
+}
+
 vm_ssh() {
-	ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
+	local ns_exec
+
+	if [[ "${1}" == none ]]; then
+		local ns_exec=""
+	else
+		local ns_exec="ip netns exec ${1}"
+	fi
+
+	shift
+
+	${ns_exec} ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost $*
+
 	return $?
 }
 
 cleanup() {
-	if [[ -s "${QEMU_PIDFILE}" ]]; then
-		pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1
-	fi
+	del_namespaces
+}
 
-	# If failure occurred during or before qemu start up, then we need
-	# to clean this up ourselves.
-	if [[ -e "${QEMU_PIDFILE}" ]]; then
-		rm "${QEMU_PIDFILE}"
-	fi
+terminate_pidfiles() {
+	local pidfile
+
+	for pidfile in "$@"; do
+		if [[ -s "${pidfile}" ]]; then
+			pkill -SIGTERM -F "${pidfile}" > /dev/null 2>&1
+		fi
+
+		# If failure occurred during or before qemu start up, then we need
+		# to clean this up ourselves.
+		if [[ -e "${pidfile}" ]]; then
+			rm -f "${pidfile}"
+		fi
+	done
+}
+
+terminate_pids() {
+	local pid
+
+	for pid in "$@"; do
+		kill -SIGTERM "${pid}" &>/dev/null || :
+	done
 }
 
 check_args() {
@@ -133,7 +308,7 @@ check_args() {
 }
 
 check_deps() {
-	for dep in vng ${QEMU} busybox pkill ssh; do
+	for dep in vng ${QEMU} busybox pkill ssh socat; do
 		if [[ ! -x $(command -v "${dep}") ]]; then
 			echo -e "skip:    dependency ${dep} not found!\n"
 			exit "${KSFT_SKIP}"
@@ -170,6 +345,20 @@ check_vng() {
 	fi
 }
 
+check_socat() {
+	local support_string
+
+	support_string="$(socat -V)"
+
+	if [[ "${support_string}" != *"WITH_VSOCK 1"* ]]; then
+		die "err: socat is missing vsock support"
+	fi
+
+	if [[ "${support_string}" != *"WITH_UNIX 1"* ]]; then
+		die "err: socat is missing unix support"
+	fi
+}
+
 handle_build() {
 	if [[ ! "${BUILD}" -eq 1 ]]; then
 		return
@@ -194,9 +383,14 @@ handle_build() {
 }
 
 vm_start() {
+	local cid=$1
+	local ns=$2
+	local pidfile=$3
 	local logfile=/dev/null
 	local verbose_opt=""
+	local qemu_opts=""
 	local kernel_opt=""
+	local ns_exec=""
 	local qemu
 
 	qemu=$(command -v "${QEMU}")
@@ -206,27 +400,37 @@ vm_start() {
 		logfile=/dev/stdout
 	fi
 
+	qemu_opts="\
+		 -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \
+		 -device virtio-net-pci,netdev=n0 \
+		${QEMU_OPTS} -device vhost-vsock-pci,guest-cid=${cid} \
+		--pidfile ${pidfile}
+	"
+
 	if [[ "${BUILD}" -eq 1 ]]; then
 		kernel_opt="${KERNEL_CHECKOUT}"
 	fi
 
-	vng \
+	if [[ "${ns}" != "none" ]]; then
+		ns_exec="ip netns exec ${ns}"
+	fi
+
+	${ns_exec} vng \
 		--run \
 		${kernel_opt} \
 		${verbose_opt} \
-		--qemu-opts="${QEMU_OPTS}" \
+		--qemu-opts="${qemu_opts}" \
 		--qemu="${qemu}" \
 		--user root \
 		--append "${KERNEL_CMDLINE}" \
 		--rw  &> ${logfile} &
 
-	if ! timeout ${WAIT_TOTAL} \
-		bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then
-		die "failed to boot VM"
-	fi
+	timeout "${WAIT_QEMU}" \
+		bash -c 'while [[ ! -s '"${pidfile}"' ]]; do sleep 1; done; exit 0'
 }
 
 vm_wait_for_ssh() {
+	local ns=$1
 	local i
 
 	i=0
@@ -234,7 +438,8 @@ vm_wait_for_ssh() {
 		if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then
 			die "Timed out waiting for guest ssh"
 		fi
-		if vm_ssh -- true; then
+
+		if vm_ssh "${ns}" -- true; then
 			break
 		fi
 		i=$(( i + 1 ))
@@ -269,6 +474,7 @@ wait_for_listener()
 		   grep -q "${pattern}"; then
 			break
 		fi
+
 		sleep "${interval}"
 	done
 
@@ -278,17 +484,29 @@ wait_for_listener()
 }
 
 vm_wait_for_listener() {
-	local port=$1
+	local ns=$1
+	local port=$2
+
+	log "Waiting for listener on port ${port} on vm"
 
-	vm_ssh <<EOF
+	vm_ssh "${ns}" <<EOF
 $(declare -f wait_for_listener)
 wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
 EOF
 }
 
 host_wait_for_listener() {
-	wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+	local ns=$1
+	local port=$2
 
+	if [[ "${ns}" == none ]]; then
+		wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+	else
+		ip netns exec "${ns}" bash <<-EOF
+			$(declare -f wait_for_listener)
+			wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
+		EOF
+	fi
 }
 
 log() {
@@ -427,51 +645,506 @@ test_vm_client_host_server() {
 }
 
 test_vm_loopback() {
+	vm_ssh "none" modprobe vsock_loopback || :
 	vm_vsock_test "none" "server" 1 "${TEST_HOST_PORT_LISTENER}"
 	vm_vsock_test "none" "client" "127.0.0.1" 1 "${TEST_HOST_PORT_LISTENER}"
 }
 
+test_host_vsock_ns_mode_ok() {
+	add_namespaces
+
+	for mode in "${MODES[@]}"; do
+		if ! ns_set_mode "${mode}0" "${mode}"; then
+			del_namespaces
+			return "${KSFT_FAIL}"
+		fi
+	done
 
+	del_namespaces
 }
 
-test_vm_client_host_server() {
+test_host_vsock_ns_mode_write_once_ok() {
+	add_namespaces
 
-	${VSOCK_TEST} \
-		--mode "server" \
-		--control-port "${TEST_HOST_PORT_LISTENER}" \
-		--peer-cid "${VSOCK_CID}" 2>&1 | log_host &
+	for mode in "${MODES[@]}"; do
+		local ns="${mode}0"
+		if ! ns_set_mode "${ns}" "${mode}"; then
+			del_namespaces
+			return "${KSFT_FAIL}"
+		fi
 
-	host_wait_for_listener
+		# try writing again and expect failure
+		if ns_set_mode "${ns}" "${mode}"; then
+			del_namespaces
+			return "${KSFT_FAIL}"
+		fi
+	done
 
-	vm_ssh -- "${VSOCK_TEST}" \
-		--mode=client \
-		--control-host=10.0.2.2 \
-		--peer-cid=2 \
-		--control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest
+	del_namespaces
 
-	return $?
+	return "${KSFT_PASS}"
 }
 
-test_vm_loopback() {
-	local port=60000 # non-forwarded local port
+namespaces_can_boot_same_cid() {
+	local ns0=$1
+	local ns1=$2
+	local pidfile1 pidfile2
+	local cid=20
+	readonly cid
+	local rc
 
-	vm_ssh -- "${VSOCK_TEST}" \
-		--mode=server \
-		--control-port="${port}" \
-		--peer-cid=1 2>&1 | log_guest &
+	pidfile1=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+	vm_start "${cid}" "${ns0}" "${pidfile1}"
 
-	vm_wait_for_listener "${port}"
+	pidfile2=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+	vm_start "${cid}" "${ns1}" "${pidfile2}"
 
-	vm_ssh -- "${VSOCK_TEST}" \
-		--mode=client \
-		--control-host="127.0.0.1" \
-		--control-port="${port}" \
-		--peer-cid=1 2>&1 | log_guest
+	rc=$?
+	terminate_pidfiles "${pidfile1}" "${pidfile2}"
 
-	return $?
+	return $rc
+}
+
+test_global_same_cid_fails() {
+	if namespaces_can_boot_same_cid "global0" "global1"; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+}
+
+test_local_global_same_cid_ok() {
+	if namespaces_can_boot_same_cid "local0" "global0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_global_local_same_cid_ok() {
+	if namespaces_can_boot_same_cid "global0" "local0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_local_same_cid_ok() {
+	if namespaces_can_boot_same_cid "local0" "local1"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_diff_ns_global_host_connect_to_global_vm_ok() {
+	local pids pid pidfile
+	local ns0 ns1 port
+	declare -a pids
+	local unixfile
+	ns0="global0"
+	ns1="global1"
+	port=1234
+	local rc
+
+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+
+	if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then
+		return "${KSFT_FAIL}"
+	fi
+
+	unixfile=$(mktemp -u /tmp/XXXX.sock)
+	ip netns exec "${ns1}" \
+		socat TCP-LISTEN:"${TEST_HOST_PORT}",fork \
+			UNIX-CONNECT:"${unixfile}" &
+	pids+=($!)
+	host_wait_for_listener "${ns1}" "${TEST_HOST_PORT}"
+
+	ip netns exec "${ns0}" socat UNIX-LISTEN:"${unixfile}",fork \
+		TCP-CONNECT:localhost:"${TEST_HOST_PORT}" &
+	pids+=($!)
+
+	vm_vsock_test "${ns0}" "server" 2 "${TEST_GUEST_PORT}"
+	vm_wait_for_listener "${ns0}" "${TEST_GUEST_PORT}"
+	host_vsock_test "${ns1}" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+	rc=$?
+
+	for pid in "${pids[@]}"; do
+		if [[ "$(jobs -p)" = *"${pid}"* ]]; then
+			kill -SIGTERM "${pid}" &>/dev/null
+		fi
+	done
+
+	terminate_pidfiles "${pidfile}"
+
+	if [[ $rc -ne 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+}
+
+test_diff_ns_global_host_connect_to_local_vm_fails() {
+	local ns0="global0"
+	local ns1="local0"
+	local port=12345
+	local pidfile
+	local result
+	local outfile
+
+	outfile=$(mktemp)
+
+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+	if ! vm_start "${VSOCK_CID}" "${ns1}" "${pidfile}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns1})"
+		return $KSFT_FAIL
+	fi
+
+	vm_wait_for_ssh "${ns1}"
+	vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+	echo TEST | ip netns exec "${ns0}" \
+		socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+
+	terminate_pidfiles "${pidfile}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" != TEST ]]; then
+		return $KSFT_PASS
+	fi
+
+	return $KSFT_FAIL
+}
+
+test_diff_ns_global_vm_connect_to_global_host_ok() {
+	local ns0="global0"
+	local ns1="global1"
+	local port=12345
+	local unixfile
+	local pidfile
+	local pids
+
+	declare -a pids
+
+	log_host "Setup socat bridge from ns ${ns0} to ns ${ns1} over port ${port}"
+
+	unixfile=$(mktemp -u /tmp/XXXX.sock)
+
+	ip netns exec "${ns0}" \
+		socat TCP-LISTEN:"${port}" UNIX-CONNECT:"${unixfile}" &
+	pids+=($!)
+
+	ip netns exec "${ns1}" \
+		socat UNIX-LISTEN:"${unixfile}" TCP-CONNECT:127.0.0.1:"${port}" &
+	pids+=($!)
+
+	log_host "Launching ${VSOCK_TEST} in ns ${ns1}"
+	host_vsock_test "${ns1}" "server" "${VSOCK_CID}" "${port}"
+
+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+	if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+		terminate_pids "${pids[@]}"
+		rm -f "${unixfile}"
+		return $KSFT_FAIL
+	fi
+
+	vm_wait_for_ssh "${ns0}"
+	vm_vsock_test "${ns0}" "client" "10.0.2.2" 2 "${port}"
+	rc=$?
+
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pids[@]}"
+	rm -f "${unixfile}"
+
+	if [[ ! $rc -eq 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+
+}
+
+test_diff_ns_global_vm_connect_to_local_host_fails() {
+	local ns0="global0"
+	local ns1="local0"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+
+	log_host "Launching socat in ns ${ns1}"
+	outfile=$(mktemp)
+	ip netns exec "${ns1}" socat VSOCK-LISTEN:${port} STDOUT &> "${outfile}" &
+	pid=$!
+
+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+	if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+		terminate_pids "${pid}"
+		rm -f "${outfile}"
+		return $KSFT_FAIL
+	fi
+
+	vm_wait_for_ssh "${ns0}"
+
+	vm_ssh "${ns0}" -- \
+		bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pid}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_diff_ns_local_host_connect_to_local_vm_fails() {
+	local ns0="local0"
+	local ns1="local1"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+
+	outfile=$(mktemp)
+
+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+	if ! vm_start "${VSOCK_CID}" "${ns1}" "${pidfile}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns1})"
+		return $KSFT_FAIL
+	fi
+
+	vm_wait_for_ssh "${ns1}"
+	vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+	echo TEST | ip netns exec "${ns0}" \
+		socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+
+	terminate_pidfiles "${pidfile}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" != TEST ]]; then
+		return $KSFT_PASS
+	fi
+
+	return $KSFT_FAIL
+}
+
+test_diff_ns_local_vm_connect_to_local_host_fails() {
+	local ns0="local0"
+	local ns1="local1"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+
+	log_host "Launching socat in ns ${ns1}"
+	outfile=$(mktemp)
+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" &
+	pid=$!
+
+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+	if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+		rm -f "${outfile}"
+		return "${KSFT_FAIL}"
+	fi
+
+	vm_wait_for_ssh "${ns0}"
+
+	vm_ssh "${ns0}" -- \
+		bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pid}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+__test_loopback_two_netns() {
+	local ns0=$1
+	local ns1=$2
+	local port=12345
+	local result
+	local pid
+
+	modprobe vsock_loopback &> /dev/null || :
+
+	log_host "Launching socat in ns ${ns1}"
+	outfile=$(mktemp)
+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" 2>/dev/null &
+	pid=$!
+
+	log_host "Launching socat in ns ${ns0}"
+	echo TEST | ip netns exec "${ns0}" socat STDIN VSOCK-CONNECT:1:"${port}" 2>/dev/null
+	terminate_pids "${pid}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" == TEST ]]; then
+		return 0
+	fi
+
+	return 1
+}
+
+test_diff_ns_global_to_local_loopback_local_fails() {
+	if ! __test_loopback_two_netns "global0" "local0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_diff_ns_local_to_global_loopback_fails() {
+	if ! __test_loopback_two_netns "local0" "global0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_diff_ns_local_to_local_loopback_fails() {
+	if ! __test_loopback_two_netns "local0" "local1"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_diff_ns_global_to_global_loopback_ok() {
+	if __test_loopback_two_netns "global0" "global1"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_same_ns_local_loopback_ok() {
+	if __test_loopback_two_netns "local0" "local0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_same_ns_local_host_connect_to_local_vm_ok() {
+	local ns="local0"
+	local port=1234
+	local pidfile
+	local rc
+
+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+
+	if ! vm_start "${VSOCK_CID}" "${ns}" "${pidfile}"; then
+		return "${KSFT_FAIL}"
+	fi
+
+	vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}"
+	host_vsock_test "${ns}" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+	rc=$?
+
+	terminate_pidfiles "${pidfile}"
+
+	if [[ $rc -ne 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+}
+
+test_same_ns_local_vm_connect_to_local_host_ok() {
+	local ns="local0"
+	local port=1234
+	local pidfile
+	local rc
+
+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+
+	if ! vm_start "${VSOCK_CID}" "${ns}" "${pidfile}"; then
+		return "${KSFT_FAIL}"
+	fi
+
+	host_vsock_test "${ns}" "server" "${VSOCK_CID}" "${port}"
+	vm_vsock_test "${ns}" "client" "10.0.2.2" 2 "${port}"
+	rc=$?
+
+	terminate_pidfiles "${pidfile}"
+
+	if [[ $rc -ne 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+}
+
+shared_vm_test() {
+	local tname
+
+	tname="${1}"
+
+	for testname in "${USE_SHARED_VM[@]}"; do
+		if [[ "${tname}" == "${testname}" ]]; then
+			return 0
+		fi
+	done
+
+	return 1
 }
 
-run_test() {
+
+init_netns_test() {
+	local tname
+
+	tname="${1}"
+
+	for testname in "${USE_INIT_NETNS[@]}"; do
+		if [[ "${tname}" == "${testname}" ]]; then
+			return 0
+		fi
+	done
+
+	return 1
+}
+
+check_result() {
+	local rc num
+
+	rc=$1
+	num=$(( cnt_total + 1 ))
+
+	if [[ ${rc} -eq $KSFT_PASS ]]; then
+		cnt_pass=$(( cnt_pass + 1 ))
+		echo "ok ${num} ${arg}"
+	elif [[ ${rc} -eq $KSFT_SKIP ]]; then
+		cnt_skip=$(( cnt_skip + 1 ))
+		echo "ok ${num} ${arg} # SKIP"
+	elif [[ ${rc} -eq $KSFT_FAIL ]]; then
+		cnt_fail=$(( cnt_fail + 1 ))
+		echo "not ok ${num} ${arg} # exit=$rc"
+	fi
+
+	cnt_total=$(( cnt_total + 1 ))
+}
+
+run_shared_vm_tests() {
+	local start_shared_vm pidfile
 	local host_oops_cnt_before
 	local host_warn_cnt_before
 	local vm_oops_cnt_before
@@ -483,42 +1156,93 @@ run_test() {
 	local name
 	local rc
 
-	host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
-	host_warn_cnt_before=$(dmesg --level=warn | wc -l)
-	vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
-	vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+	start_shared_vm=0
 
-	name=$(echo "${1}" | awk '{ print $1 }')
-	eval test_"${name}"
-	rc=$?
+	for arg in "${ARGS[@]}"; do
+		if shared_vm_test "${arg}"; then
+			start_shared_vm=1
+			break
+		fi
+	done
 
-	host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
-	if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
-		echo "FAIL: kernel oops detected on host" | log_host "${name}"
-		rc=$KSFT_FAIL
+	pidfile=""
+	if [[ "${start_shared_vm}" == 1 ]]; then
+		pidfile=$(mktemp $PIDFILE_TEMPLATE)
+		log_host "Booting up VM"
+		vm_start "${VSOCK_CID}" "none" "${pidfile}"
+		vm_wait_for_ssh "none"
+		log_host "VM booted up"
 	fi
 
-	host_warn_cnt_after=$(dmesg --level=warn | wc -l)
-	if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
-		echo "FAIL: kernel warning detected on host" | log_host "${name}"
-		rc=$KSFT_FAIL
-	fi
+	for arg in "${ARGS[@]}"; do
+		if ! shared_vm_test "${arg}"; then
+			continue
+		fi
 
-	vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
-	if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
-		echo "FAIL: kernel oops detected on vm" | log_host "${name}"
-		rc=$KSFT_FAIL
-	fi
+		host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
+		host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+		vm_oops_cnt_before=$(vm_ssh none -- dmesg | grep -c -i 'Oops')
+		vm_warn_cnt_before=$(vm_ssh none -- dmesg --level=warn | wc -l)
+
+		name=$(echo "${arg}" | awk '{ print $1 }')
+		log_host "Executing test_${name}"
+		eval test_"${name}"
+		rc=$?
+
+		host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
+		if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
+			echo "FAIL: kernel oops detected on host" | log_host "${name}"
+			rc=$KSFT_FAIL
+		fi
+
+		host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+		if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
+			echo "FAIL: kernel warning detected on host" | log_host "${name}"
+			rc=$KSFT_FAIL
+		fi
+
+		vm_oops_cnt_after=$(vm_ssh none -- dmesg | grep -i 'Oops' | wc -l)
+		if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
+			echo "FAIL: kernel oops detected on vm" | log_host "${name}"
+			rc=$KSFT_FAIL
+		fi
+
+		vm_warn_cnt_after=$(vm_ssh none -- dmesg --level=warn | wc -l)
+		if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
+			echo "FAIL: kernel warning detected on vm" | log_host "${name}"
+			rc=$KSFT_FAIL
+		fi
 
-	vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
-	if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
-		echo "FAIL: kernel warning detected on vm" | log_host "${name}"
-		rc=$KSFT_FAIL
+		check_result "${rc}"
+	done
+
+	if [[ -n "${pidfile}" ]]; then
+		log_host "VM terminate"
+		terminate_pidfiles "${pidfile}"
 	fi
+}
+
+run_isolated_vm_tests() {
+	for arg in "${ARGS[@]}"; do
+		if shared_vm_test "${arg}"; then
+			continue
+		fi
 
-	return "${rc}"
+		add_namespaces
+		if init_netns_test "${arg}"; then
+			init_namespaces
+		fi
+
+		name=$(echo "${arg}" | awk '{ print $1 }')
+		log_host "Executing test_${name}"
+		eval test_"${name}"
+		check_result $?
+
+		del_namespaces
+	done
 }
 
+BUILD=0
 QEMU="qemu-system-$(uname -m)"
 
 while getopts :hvsq:b o
@@ -543,34 +1267,13 @@ fi
 check_args "${ARGS[@]}"
 check_deps
 check_vng
+check_socat
 handle_build
 
 echo "1..${#ARGS[@]}"
 
-log_host "Booting up VM"
-vm_start
-vm_wait_for_ssh
-log_host "VM booted up"
-
-cnt_pass=0
-cnt_fail=0
-cnt_skip=0
-cnt_total=0
-for arg in "${ARGS[@]}"; do
-	run_test "${arg}"
-	rc=$?
-	if [[ ${rc} -eq $KSFT_PASS ]]; then
-		cnt_pass=$(( cnt_pass + 1 ))
-		echo "ok ${cnt_total} ${arg}"
-	elif [[ ${rc} -eq $KSFT_SKIP ]]; then
-		cnt_skip=$(( cnt_skip + 1 ))
-		echo "ok ${cnt_total} ${arg} # SKIP"
-	elif [[ ${rc} -eq $KSFT_FAIL ]]; then
-		cnt_fail=$(( cnt_fail + 1 ))
-		echo "not ok ${cnt_total} ${arg} # exit=$rc"
-	fi
-	cnt_total=$(( cnt_total + 1 ))
-done
+run_shared_vm_tests
+run_isolated_vm_tests
 
 echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
 echo "Log: ${LOG}"

-- 
2.47.3



* Re: [PATCH net-next v5 4/9] vsock/loopback: add netns support
  2025-08-28  0:31 ` [PATCH net-next v5 4/9] vsock/loopback: add netns support Bobby Eshleman
@ 2025-08-28 10:35   ` kernel test robot
  2025-09-02 15:39   ` Stefano Garzarella
  1 sibling, 0 replies; 18+ messages in thread
From: kernel test robot @ 2025-08-28 10:35 UTC (permalink / raw)
  To: Bobby Eshleman, Stefano Garzarella, Shuah Khan, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list
  Cc: oe-kbuild-all, netdev, virtualization, linux-kselftest,
	linux-kernel, kvm, linux-hyperv, Bobby Eshleman, berrange

Hi Bobby,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 242041164339594ca019481d54b4f68a7aaff64e]

url:    https://github.com/intel-lab-lkp/linux/commits/Bobby-Eshleman/vsock-a-per-net-vsock-NS-mode-state/20250828-083629
base:   242041164339594ca019481d54b4f68a7aaff64e
patch link:    https://lore.kernel.org/r/20250827-vsock-vmtest-v5-4-0ba580bede5b%40meta.com
patch subject: [PATCH net-next v5 4/9] vsock/loopback: add netns support
config: nios2-randconfig-001-20250828 (https://download.01.org/0day-ci/archive/20250828/202508281824.3XZiIgxs-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250828/202508281824.3XZiIgxs-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508281824.3XZiIgxs-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/vmw_vsock/af_vsock.c:137:35: warning: 'vsock_net_callbacks' defined but not used [-Wunused-variable]
    static struct vsock_net_callbacks vsock_net_callbacks;
                                      ^~~~~~~~~~~~~~~~~~~


vim +/vsock_net_callbacks +137 net/vmw_vsock/af_vsock.c

   136	
 > 137	static struct vsock_net_callbacks vsock_net_callbacks;
   138	static DEFINE_MUTEX(vsock_net_callbacks_lock);
   139	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH net-next v5 3/9] vsock: add netns to vsock core
  2025-08-28  0:31 ` [PATCH net-next v5 3/9] vsock: add netns to vsock core Bobby Eshleman
@ 2025-09-02 15:39   ` Stefano Garzarella
  2025-09-02 17:10     ` Bobby Eshleman
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2025-09-02 15:39 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, virtualization, netdev,
	linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
	Bobby Eshleman

On Wed, Aug 27, 2025 at 05:31:31PM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Add netns logic to vsock core. Additionally, modify transport hook
>prototypes to be used by later transport-specific patches (e.g.,
>*_seqpacket_allow()).
>
>Namespaces are supported primarily by changing socket lookup functions
>(e.g., vsock_find_connected_socket()) to take into account the socket
>namespace and the namespace mode before considering a candidate socket a
>"match".
>
>Introduce a dummy namespace struct, __vsock_global_dummy_net, to be
>used by transports that do not support namespacing. This dummy always
>has mode "global" to preserve previous CID behavior.
>
>This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
>accepts the "global" or "local" mode strings.
>
>The transports (besides vhost) are modified to use the global dummy.
>
>Add netns functionality (initialization, passing to transports, procfs,
>etc...) to the af_vsock socket layer. Later patches that add netns
>support to transports depend on this patch.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>
>---
>Changes in v5:
>- vsock_global_net() -> vsock_global_dummy_net()
>- update comments for new uAPI
>- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
>- add prototype changes so patch remains compilable
>---
> drivers/vhost/vsock.c                   |   4 +-
> include/net/af_vsock.h                  |  13 +-
> net/vmw_vsock/af_vsock.c                | 202 +++++++++++++++++++++++++++++---
> net/vmw_vsock/hyperv_transport.c        |   2 +-
> net/vmw_vsock/virtio_transport.c        |   5 +-
> net/vmw_vsock/virtio_transport_common.c |   4 +-
> net/vmw_vsock/vmci_transport.c          |   4 +-
> net/vmw_vsock/vsock_loopback.c          |   4 +-
> 8 files changed, 210 insertions(+), 28 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index ae01457ea2cd..34adf0cf9124 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
> 	return true;
> }
>
>-static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
> 	.transport = {
>@@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
> 	.send_pkt = vhost_transport_send_pkt,
> };
>
>-static bool vhost_transport_seqpacket_allow(u32 remote_cid)
>+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> {
> 	struct vhost_vsock *vsock;
> 	bool seqpacket_allow = false;
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index 5707514c30b6..83f873174ba3 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -144,7 +144,7 @@ struct vsock_transport {
> 				     int flags);
> 	int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
> 				 size_t len);
>-	bool (*seqpacket_allow)(u32 remote_cid);
>+	bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
> 	u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
>
> 	/* Notification. */
>@@ -214,9 +214,10 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected);
> void vsock_insert_connected(struct vsock_sock *vsk);
> void vsock_remove_bound(struct vsock_sock *vsk);
> void vsock_remove_connected(struct vsock_sock *vsk);
>-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
>+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net);
> struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>-					 struct sockaddr_vm *dst);
>+					 struct sockaddr_vm *dst,
>+					 struct net *net);
> void vsock_remove_sock(struct vsock_sock *vsk);
> void vsock_for_each_connected_socket(struct vsock_transport *transport,
> 				     void (*fn)(struct sock *sk));
>@@ -258,6 +259,12 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
> 	return t->msgzerocopy_allow && t->msgzerocopy_allow();
> }
>
>+extern struct net __vsock_global_dummy_net;
>+static inline struct net *vsock_global_dummy_net(void)
>+{
>+	return &__vsock_global_dummy_net;
>+}
>+
> static inline u8 vsock_net_mode(struct net *net)
> {
> 	enum vsock_net_mode ret;
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 0538948d5fd9..68a8875c8106 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -83,6 +83,24 @@
>  *   TCP_ESTABLISHED - connected
>  *   TCP_CLOSING - disconnecting
>  *   TCP_LISTEN - listening
>+ *
>+ * - Namespaces in vsock support two different modes configured
>+ *   through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
>+ *   Each mode defines how the namespace interacts with CIDs.
>+ *   /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
>+ *   and locked down by a namespace manager. The default is "global". The mode
>+ *   is set per-namespace.
>+ *
>+ *   The modes affect the allocation and accessibility of CIDs as follows:
>+ *   - global - aka fully public
>+ *      - CID allocation draws from the public pool
>+ *      - AF_VSOCK sockets may reach any CID allocated from the public pool
>+ *      - AF_VSOCK sockets may not reach CIDs allocated from private pools

Should we define what public and private pools are?

I found the allocation of CIDs difficult to understand: I had to reread
this two or three times before I (perhaps) understood it.

IIUC, netns with mode=global can only allocate public CIDs, while netns 
with mode=local can only allocate private CIDs, right?

Perhaps we should first better define how CIDs are allocated and then 
explain the interaction between them.
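
To make the reachability rule concrete, here is a small user-space
model. It is only a sketch: `struct model_net`, `enum model_mode` and
`model_check_mode()` are hypothetical stand-ins, and the rule itself
(same namespace, or both namespaces in global mode) is an assumption
based on how the series describes global and local mode, not a copy of
the kernel's vsock_net_check_mode().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-netns vsock state. */
enum model_mode { MODEL_MODE_GLOBAL, MODEL_MODE_LOCAL };

struct model_net {
	enum model_mode mode;
};

/*
 * Assumed rule: two sockets may see each other if they live in the same
 * namespace, or if both namespaces are in global mode.
 */
static bool model_check_mode(const struct model_net *n1,
			     const struct model_net *n2)
{
	assert(n1 && n2);

	if (n1 == n2)
		return true;

	return n1->mode == MODEL_MODE_GLOBAL &&
	       n2->mode == MODEL_MODE_GLOBAL;
}
```

Under this model a local-mode namespace reaches only itself, and a
global-mode namespace never reaches a local-mode one.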

>+ *
>+ *   - local - aka fully private
>+ *     - CID allocation draws only from the private pool, does not affect public pool
>+ *     - AF_VSOCK sockets may only reach CIDs from the private pool
>+ *     - AF_VSOCK sockets may not reach CIDs allocated from outside the pool

Why use "may"? I mean, can there be cases where this is not true?

>  */
>
> #include <linux/compat.h>
>@@ -100,6 +118,7 @@
> #include <linux/module.h>
> #include <linux/mutex.h>
> #include <linux/net.h>
>+#include <linux/proc_fs.h>
> #include <linux/poll.h>
> #include <linux/random.h>
> #include <linux/skbuff.h>
>@@ -111,6 +130,7 @@
> #include <linux/workqueue.h>
> #include <net/sock.h>
> #include <net/af_vsock.h>
>+#include <net/netns/vsock.h>
> #include <uapi/linux/vm_sockets.h>
> #include <uapi/asm-generic/ioctls.h>
>
>@@ -149,6 +169,9 @@ static const struct vsock_transport *transport_dgram;
> static const struct vsock_transport *transport_local;
> static DEFINE_MUTEX(vsock_register_mutex);
>
>+struct net __vsock_global_dummy_net;
>+EXPORT_SYMBOL_GPL(__vsock_global_dummy_net);
>+
> /**** UTILS ****/
>
> /* Each bound VSocket is stored in the bind hash table and each connected
>@@ -235,33 +258,42 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> 	sock_put(&vsk->sk);
> }
>
>-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr,
>+					      struct net *net)
> {
> 	struct vsock_sock *vsk;
>
> 	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
>+		struct sock *sk = sk_vsock(vsk);
>+
> 		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
>-			return sk_vsock(vsk);
>+			if (vsock_net_check_mode(net, sock_net(sk)))
>+				return sk;
>
> 		if (addr->svm_port == vsk->local_addr.svm_port &&
> 		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
>-		     addr->svm_cid == VMADDR_CID_ANY))
>-			return sk_vsock(vsk);
>+		     addr->svm_cid == VMADDR_CID_ANY) &&
>+		     vsock_net_check_mode(net, sock_net(sk)))
>+				return sk;
> 	}
>
> 	return NULL;
> }
>
> static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
>-						  struct sockaddr_vm *dst)
>+						  struct sockaddr_vm *dst,
>+						  struct net *net)
> {
> 	struct vsock_sock *vsk;
>
> 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
> 			    connected_table) {
>+		struct sock *sk = sk_vsock(vsk);
>+
> 		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
>-		    dst->svm_port == vsk->local_addr.svm_port) {
>-			return sk_vsock(vsk);
>+		    dst->svm_port == vsk->local_addr.svm_port &&
>+		    vsock_net_check_mode(net, sock_net(sk))) {
>+			return sk;
> 		}
> 	}
>
>@@ -304,12 +336,12 @@ void vsock_remove_connected(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(vsock_remove_connected);
>
>-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
>+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net)
> {
> 	struct sock *sk;
>
> 	spin_lock_bh(&vsock_table_lock);
>-	sk = __vsock_find_bound_socket(addr);
>+	sk = __vsock_find_bound_socket(addr, net);
> 	if (sk)
> 		sock_hold(sk);
>
>@@ -320,12 +352,13 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
>
> struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>-					 struct sockaddr_vm *dst)
>+					 struct sockaddr_vm *dst,
>+					 struct net *net)
> {
> 	struct sock *sk;
>
> 	spin_lock_bh(&vsock_table_lock);
>-	sk = __vsock_find_connected_socket(src, dst);
>+	sk = __vsock_find_connected_socket(src, dst, net);
> 	if (sk)
> 		sock_hold(sk);
>
>@@ -528,7 +561,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>
> 	if (sk->sk_type == SOCK_SEQPACKET) {
> 		if (!new_transport->seqpacket_allow ||
>-		    !new_transport->seqpacket_allow(remote_cid)) {
>+		    !new_transport->seqpacket_allow(vsk, remote_cid)) {
> 			module_put(new_transport->module);
> 			return -ESOCKTNOSUPPORT;
> 		}
>@@ -678,6 +711,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> {
> 	static u32 port;
> 	struct sockaddr_vm new_addr;
>+	struct net *net = sock_net(sk_vsock(vsk));
>
> 	if (!port)
> 		port = get_random_u32_above(LAST_RESERVED_PORT);
>@@ -695,7 +729,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>
> 			new_addr.svm_port = port++;
>
>-			if (!__vsock_find_bound_socket(&new_addr)) {
>+			if (!__vsock_find_bound_socket(&new_addr, net)) {
> 				found = true;
> 				break;
> 			}
>@@ -712,7 +746,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> 			return -EACCES;
> 		}
>
>-		if (__vsock_find_bound_socket(&new_addr))
>+		if (__vsock_find_bound_socket(&new_addr, net))
> 			return -EADDRINUSE;
> 	}
>
>@@ -2636,6 +2670,137 @@ static struct miscdevice vsock_device = {
> 	.fops		= &vsock_device_ops,
> };
>
>+#define VSOCK_NET_MODE_STRING_MAX 7
>+
>+static int vsock_net_mode_string(const struct ctl_table *table, int write,
>+				 void *buffer, size_t *lenp, loff_t *ppos)
>+{
>+	char buf[VSOCK_NET_MODE_STRING_MAX] = {0};

Can we rename `buf`?

I find it confusing to have both a `buffer` variable and a `buf` 
variable in the same function.

>+	enum vsock_net_mode mode;
>+	struct ctl_table tmp;
>+	struct net *net;
>+	const char *p;

Can we move the declaration of `p` into the `if (!write) {` block?

>+	int ret;
>+
>+	if (!table->data || !table->maxlen || !*lenp) {
>+		*lenp = 0;
>+		return 0;
>+	}
>+
>+	net = current->nsproxy->net_ns;
>+	tmp = *table;
>+	tmp.data = buf;
>+
>+	if (!write) {
>+		mode = vsock_net_mode(net);
>+
>+		if (mode == VSOCK_NET_MODE_GLOBAL) {
>+			p = "global";
>+		} else if (mode == VSOCK_NET_MODE_LOCAL) {
>+			p = "local";
>+		} else {
>+			WARN_ONCE(true, "netns has invalid vsock mode");
>+			*lenp = 0;
>+			return 0;
>+		}
>+
>+		strscpy(buf, p, sizeof(buf));
>+		tmp.maxlen = strlen(p);
>+	}
>+
>+	ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
>+	if (ret)
>+		return ret;
>+
>+	if (write) {
>+		if (!strncmp(buffer, "global", 6))

Are we sure that `buffer` is at least 6 bytes long and
NUL-terminated?

Maybe we can just check that `*lenp <= sizeof(buf)`...

Should we add macros for "global" and "local"?
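
As an illustration of both points, here is a user-space sketch of a
length-bounded parser. All names in it (`MODE_STR_GLOBAL`,
`MODE_STR_LOCAL`, `enum ns_mode`, `parse_ns_mode()`) are hypothetical
and not part of the patch. It assumes the caller knows how many bytes
were written and does not rely on a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macros for the two mode strings. */
#define MODE_STR_GLOBAL "global"
#define MODE_STR_LOCAL  "local"

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL, NS_MODE_INVALID };

/*
 * Parse exactly len bytes from buf. The buffer need not be
 * NUL-terminated; one trailing newline (as written by echo) is ignored.
 */
static enum ns_mode parse_ns_mode(const char *buf, size_t len)
{
	assert(buf != NULL);

	if (len > 0 && buf[len - 1] == '\n')
		len--;

	if (len == strlen(MODE_STR_GLOBAL) &&
	    !memcmp(buf, MODE_STR_GLOBAL, len))
		return NS_MODE_GLOBAL;

	if (len == strlen(MODE_STR_LOCAL) &&
	    !memcmp(buf, MODE_STR_LOCAL, len))
		return NS_MODE_LOCAL;

	return NS_MODE_INVALID;
}
```

Note that this is stricter than a prefix match with strncmp(): input
such as "globalxyz" is rejected here, which is a design choice of the
sketch rather than something the patch does.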


>+			mode = VSOCK_NET_MODE_GLOBAL;
>+		else if (!strncmp(buffer, "local", 5))
>+			mode = VSOCK_NET_MODE_LOCAL;
>+		else
>+			return -EINVAL;
>+
>+		if (!vsock_net_write_mode(net, mode))
>+			return -EPERM;
>+	}
>+
>+	return 0;
>+}
>+
>+static struct ctl_table vsock_table[] = {
>+	{
>+		.procname	= "ns_mode",
>+		.data		= &init_net.vsock.mode,
>+		.maxlen		= sizeof(u8),
>+		.mode		= 0644,
>+		.proc_handler	= vsock_net_mode_string
>+	},
>+};
>+
>+static int __net_init vsock_sysctl_register(struct net *net)
>+{
>+	struct ctl_table *table;
>+
>+	if (net_eq(net, &init_net)) {
>+		table = vsock_table;
>+	} else {
>+		table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL);
>+		if (!table)
>+			goto err_alloc;
>+
>+		table[0].data = &net->vsock.mode;
>+	}
>+
>+	net->vsock.vsock_hdr = register_net_sysctl_sz(net, "net/vsock", table,
>+						      ARRAY_SIZE(vsock_table));
>+	if (!net->vsock.vsock_hdr)
>+		goto err_reg;
>+
>+	return 0;
>+
>+err_reg:
>+	if (!net_eq(net, &init_net))
>+		kfree(table);
>+err_alloc:
>+	return -ENOMEM;
>+}
>+
>+static void vsock_sysctl_unregister(struct net *net)
>+{
>+	const struct ctl_table *table;
>+
>+	table = net->vsock.vsock_hdr->ctl_table_arg;
>+	unregister_net_sysctl_table(net->vsock.vsock_hdr);
>+	if (!net_eq(net, &init_net))
>+		kfree(table);
>+}
>+
>+static void vsock_net_init(struct net *net)
>+{
>+	spin_lock_init(&net->vsock.lock);
>+	net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
>+}
>+
>+static __net_init int vsock_sysctl_init_net(struct net *net)
>+{
>+	vsock_net_init(net);
>+
>+	if (vsock_sysctl_register(net))
>+		return -ENOMEM;
>+
>+	return 0;
>+}
>+
>+static __net_exit void vsock_sysctl_exit_net(struct net *net)
>+{
>+	vsock_sysctl_unregister(net);
>+}
>+
>+static struct pernet_operations vsock_sysctl_ops __net_initdata = {
>+	.init = vsock_sysctl_init_net,
>+	.exit = vsock_sysctl_exit_net,
>+};
>+
> static int __init vsock_init(void)
> {
> 	int err = 0;
>@@ -2663,10 +2828,19 @@ static int __init vsock_init(void)
> 		goto err_unregister_proto;
> 	}
>
>+	if (register_pernet_subsys(&vsock_sysctl_ops)) {
>+		err = -ENOMEM;
>+		goto err_unregister_sock;
>+	}
>+
>+	vsock_net_init(&init_net);
>+	vsock_net_init(vsock_global_dummy_net());
> 	vsock_bpf_build_proto();
>
> 	return 0;
>
>+err_unregister_sock:
>+	sock_unregister(AF_VSOCK);
> err_unregister_proto:
> 	proto_unregister(&vsock_proto);
> err_deregister_misc:
>diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>index 432fcbbd14d4..79bc55eeecb3 100644
>--- a/net/vmw_vsock/hyperv_transport.c
>+++ b/net/vmw_vsock/hyperv_transport.c
>@@ -313,7 +313,7 @@ static void hvs_open_connection(struct vmbus_channel *chan)
> 		return;
>
> 	hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
>-	sk = vsock_find_bound_socket(&addr);
>+	sk = vsock_find_bound_socket(&addr, vsock_global_dummy_net());
> 	if (!sk)
> 		return;
>
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index b6569b0ca2bb..af3e924fcc31 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -536,7 +536,7 @@ static bool virtio_transport_msgzerocopy_allow(void)
> 	return true;
> }
>
>-static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
> 	.transport = {
>@@ -593,7 +593,7 @@ static struct virtio_transport virtio_transport = {
> 	.can_msgzerocopy = virtio_transport_can_msgzerocopy,
> };
>
>-static bool virtio_transport_seqpacket_allow(u32 remote_cid)
>+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> {
> 	struct virtio_vsock *vsock;
> 	bool seqpacket_allow;
>@@ -659,6 +659,7 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 			if (payload_len)
> 				virtio_vsock_skb_put(skb, payload_len);
>
>+			virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
> 			virtio_transport_deliver_tap_pkt(skb);
> 			virtio_transport_recv_pkt(&virtio_transport, skb);
> 		}
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index fe92e5fa95b4..9b3aa4f0395d 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -1604,9 +1604,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 	/* The socket must be in connected or bound table
> 	 * otherwise send reset back
> 	 */
>-	sk = vsock_find_connected_socket(&src, &dst);
>+	sk = vsock_find_connected_socket(&src, &dst, vsock_global_dummy_net());
> 	if (!sk) {
>-		sk = vsock_find_bound_socket(&dst);
>+		sk = vsock_find_bound_socket(&dst, vsock_global_dummy_net());
> 		if (!sk) {
> 			(void)virtio_transport_reset_no_sock(t, skb);
> 			goto free_pkt;
>diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
>index 7eccd6708d66..fd600ad77d73 100644
>--- a/net/vmw_vsock/vmci_transport.c
>+++ b/net/vmw_vsock/vmci_transport.c
>@@ -703,9 +703,9 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
> 	vsock_addr_init(&src, pkt->dg.src.context, pkt->src_port);
> 	vsock_addr_init(&dst, pkt->dg.dst.context, pkt->dst_port);
>
>-	sk = vsock_find_connected_socket(&src, &dst);
>+	sk = vsock_find_connected_socket(&src, &dst, vsock_global_dummy_net());
> 	if (!sk) {
>-		sk = vsock_find_bound_socket(&dst);
>+		sk = vsock_find_bound_socket(&dst, vsock_global_dummy_net());
> 		if (!sk) {
> 			/* We could not find a socket for this specified
> 			 * address.  If this packet is a RST, we just drop it.
>diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>index 6e78927a598e..1b2fab73e0d0 100644
>--- a/net/vmw_vsock/vsock_loopback.c
>+++ b/net/vmw_vsock/vsock_loopback.c
>@@ -46,7 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
> 	return 0;
> }
>
>-static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
>+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
> static bool vsock_loopback_msgzerocopy_allow(void)
> {
> 	return true;
>@@ -106,7 +106,7 @@ static struct virtio_transport loopback_transport = {
> 	.send_pkt = vsock_loopback_send_pkt,
> };
>
>-static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
>+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> {
> 	return true;
> }
>
>-- 
>2.47.3
>



* Re: [PATCH net-next v5 4/9] vsock/loopback: add netns support
  2025-08-28  0:31 ` [PATCH net-next v5 4/9] vsock/loopback: add netns support Bobby Eshleman
  2025-08-28 10:35   ` kernel test robot
@ 2025-09-02 15:39   ` Stefano Garzarella
  2025-09-02 18:09     ` Bobby Eshleman
  1 sibling, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2025-09-02 15:39 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, virtualization, netdev,
	linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
	Bobby Eshleman

On Wed, Aug 27, 2025 at 05:31:32PM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Add NS support to vsock loopback. Sockets in a global mode netns
>communicate with each other, regardless of namespace. Sockets in a local
>mode netns may only communicate with other sockets within the same
>namespace.
>
>Add callbacks for transport to hook into the initialization and exit of
>net namespaces.
>
>The transport's init hook will be called once per netns init. Likewise
>for exit.
>
>When a set of init/exit callbacks is registered, the init callback is
>called on each already existing namespace.
>
>Only one callback registration is supported for now. Currently
>vsock_loopback is the only user.

Why?

In general, commit descriptions (and code comments) should focus on the
reason (the "why"), which also simplifies the review.

>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>
>---
>Changes in v5:
>- add callbacks code to avoid reverse dependency
>- add logic for handling vsock_loopback setup for already existing
>  namespaces
>---
> include/net/af_vsock.h         |  34 +++++++++++++
> include/net/netns/vsock.h      |   5 ++
> net/vmw_vsock/af_vsock.c       | 110 +++++++++++++++++++++++++++++++++++++++++
> net/vmw_vsock/vsock_loopback.c |  72 ++++++++++++++++++++++++---
> 4 files changed, 213 insertions(+), 8 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index 83f873174ba3..9333a98b9a1e 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -305,4 +305,38 @@ static inline bool vsock_net_check_mode(struct net *n1, struct net *n2)
> 	       (vsock_net_mode(n1) == VSOCK_NET_MODE_GLOBAL &&
> 		vsock_net_mode(n2) == VSOCK_NET_MODE_GLOBAL);
> }
>+
>+struct vsock_net_callbacks {
>+	int (*init)(struct net *net);
>+	void (*exit)(struct net *net);
>+	struct module *owner;
>+};
>+
>+#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
>+
>+#define vsock_register_net_callbacks(__init, __exit) \
>+	__vsock_register_net_callbacks((__init), (__exit), THIS_MODULE)
>+
>+int __vsock_register_net_callbacks(int (*init)(struct net *net),
>+				   void (*exit)(struct net *net),
>+				   struct module *owner);
>+void vsock_unregister_net_callbacks(void);
>+
>+#else
>+
>+#define vsock_register_net_callbacks(__init, __exit) do { } while (0)
>+
>+static inline int __vsock_register_net_callbacks(int (*init)(struct net *net),
>+						 void (*exit)(struct net *net),
>+						 struct module *owner)
>+{
>+	return 0;
>+}
>+
>+static inline void vsock_unregister_net_callbacks(void) {}
>+static inline int vsock_net_call_init(struct net *net) { return 0; }
>+static inline void vsock_net_call_exit(struct net *net) {}
>+
>+#endif /* CONFIG_VSOCKETS_LOOPBACK */
>+
> #endif /* __AF_VSOCK_H__ */
>diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>index d4593c0b8dc4..08d9a933c540 100644
>--- a/include/net/netns/vsock.h
>+++ b/include/net/netns/vsock.h
>@@ -9,6 +9,8 @@ enum vsock_net_mode {
> 	VSOCK_NET_MODE_LOCAL,
> };
>
>+struct vsock_loopback;
>+
> struct netns_vsock {
> 	struct ctl_table_header *vsock_hdr;
> 	spinlock_t lock;
>@@ -16,5 +18,8 @@ struct netns_vsock {
> 	/* protected by lock */
> 	enum vsock_net_mode mode;
> 	bool written;
>+#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
>+	struct vsock_loopback *loopback;

If this is not protected by `lock`, please leave an empty line, but 
maybe we should consider using locking (see comment later).

>+#endif
> };
> #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 68a8875c8106..5a73d9e1a96f 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -134,6 +134,9 @@
> #include <uapi/linux/vm_sockets.h>
> #include <uapi/asm-generic/ioctls.h>
>
>+static struct vsock_net_callbacks vsock_net_callbacks;
>+static DEFINE_MUTEX(vsock_net_callbacks_lock);
>+
> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> static void vsock_sk_destruct(struct sock *sk);
> static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
>@@ -2781,6 +2784,49 @@ static void vsock_net_init(struct net *net)
> 	net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
> }
>
>+#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
>+static int vsock_net_call_init(struct net *net)
>+{
>+	struct vsock_net_callbacks *cbs;
>+	int ret;
>+
>+	mutex_lock(&vsock_net_callbacks_lock);
>+	cbs = &vsock_net_callbacks;
>+
>+	ret = 0;
>+	if (!cbs->owner)
>+		goto out;
>+
>+	if (try_module_get(cbs->owner)) {
>+		ret = cbs->init(net);
>+		module_put(cbs->owner);
>+	}
>+
>+out:
>+	mutex_unlock(&vsock_net_callbacks_lock);
>+	return ret;
>+}
>+
>+static void vsock_net_call_exit(struct net *net)
>+{
>+	struct vsock_net_callbacks *cbs;
>+
>+	mutex_lock(&vsock_net_callbacks_lock);
>+	cbs = &vsock_net_callbacks;
>+
>+	if (!cbs->owner)
>+		goto out;
>+
>+	if (try_module_get(cbs->owner)) {
>+		cbs->exit(net);
>+		module_put(cbs->owner);
>+	}
>+
>+out:
>+	mutex_unlock(&vsock_net_callbacks_lock);
>+}
>+#endif /* CONFIG_VSOCKETS_LOOPBACK */
>+
> static __net_init int vsock_sysctl_init_net(struct net *net)
> {
> 	vsock_net_init(net);
>@@ -2788,12 +2834,20 @@ static __net_init int vsock_sysctl_init_net(struct net *net)
> 	if (vsock_sysctl_register(net))
> 		return -ENOMEM;
>
>+	if (vsock_net_call_init(net) < 0)
>+		goto err_sysctl;
>+
> 	return 0;
>+
>+err_sysctl:
>+	vsock_sysctl_unregister(net);
>+	return -ENOMEM;
> }
>
> static __net_exit void vsock_sysctl_exit_net(struct net *net)
> {
> 	vsock_sysctl_unregister(net);
>+	vsock_net_call_exit(net);
> }
>
> static struct pernet_operations vsock_sysctl_ops __net_initdata = {
>@@ -2938,6 +2992,62 @@ void vsock_core_unregister(const struct vsock_transport *t)
> }
> EXPORT_SYMBOL_GPL(vsock_core_unregister);
>
>+#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
>+int __vsock_register_net_callbacks(int (*init)(struct net *net),
>+				   void (*exit)(struct net *net),
>+				   struct module *owner)
>+{
>+	struct vsock_net_callbacks *cbs;
>+	struct net *net;
>+	int ret = 0;
>+
>+	mutex_lock(&vsock_net_callbacks_lock);
>+
>+	cbs = &vsock_net_callbacks;
>+	cbs->init = init;
>+	cbs->exit = exit;
>+	cbs->owner = owner;
>+
>+	/* call callbacks on any net previously created */
>+	down_read(&net_rwsem);
>+
>+	if (try_module_get(cbs->owner)) {
>+		for_each_net(net) {
>+			ret = cbs->init(net);
>+			if (ret < 0)
>+				break;
>+		}
>+
>+		if (ret < 0)
>+			for_each_net(net)
>+				cbs->exit(net);
>+
>+		module_put(cbs->owner);
>+	}
>+
>+	up_read(&net_rwsem);
>+	mutex_unlock(&vsock_net_callbacks_lock);
>+
>+	return ret;
>+}
>+EXPORT_SYMBOL_GPL(__vsock_register_net_callbacks);
>+
>+void vsock_unregister_net_callbacks(void)
>+{
>+	struct vsock_net_callbacks *cbs;
>+
>+	mutex_lock(&vsock_net_callbacks_lock);
>+
>+	cbs = &vsock_net_callbacks;
>+	cbs->init = NULL;
>+	cbs->exit = NULL;
>+	cbs->owner = NULL;
>+
>+	mutex_unlock(&vsock_net_callbacks_lock);
>+}
>+EXPORT_SYMBOL_GPL(vsock_unregister_net_callbacks);

IIUC this function is called only in the error path of
`vsock_loopback_init()`, but should we also call it in
`vsock_loopback_exit()`?

>+#endif /* CONFIG_VSOCKETS_LOOPBACK */
>+
> module_init(vsock_init);
> module_exit(vsock_exit);
>
>diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>index 1b2fab73e0d0..f16d21711cb0 100644
>--- a/net/vmw_vsock/vsock_loopback.c
>+++ b/net/vmw_vsock/vsock_loopback.c
>@@ -28,8 +28,19 @@ static u32 vsock_loopback_get_local_cid(void)
>
> static int vsock_loopback_send_pkt(struct sk_buff *skb)
> {
>-	struct vsock_loopback *vsock = &the_vsock_loopback;
>+	struct vsock_loopback *vsock;
> 	int len = skb->len;
>+	struct net *net;
>+
>+	if (skb->sk)
>+		net = sock_net(skb->sk);
>+	else
>+		net = NULL;

Why can't we use `virtio_vsock_skb_net` here?

>+
>+	if (net && net->vsock.mode == VSOCK_NET_MODE_LOCAL)
>+		vsock = net->vsock.loopback;
>+	else
>+		vsock = &the_vsock_loopback;
>
> 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
> 	queue_work(vsock->workqueue, &vsock->pkt_work);
>@@ -134,27 +145,72 @@ static void vsock_loopback_work(struct work_struct *work)
> 	}
> }
>
>-static int __init vsock_loopback_init(void)
>+static int vsock_loopback_init_vsock(struct vsock_loopback *vsock)
> {
>-	struct vsock_loopback *vsock = &the_vsock_loopback;
>-	int ret;
>-
> 	vsock->workqueue = alloc_workqueue("vsock-loopback", 0, 0);
> 	if (!vsock->workqueue)
> 		return -ENOMEM;
>
> 	skb_queue_head_init(&vsock->pkt_queue);
> 	INIT_WORK(&vsock->pkt_work, vsock_loopback_work);
>+	return 0;
>+}
>+
>+static void vsock_loopback_deinit_vsock(struct vsock_loopback *vsock)
>+{
>+	if (vsock->workqueue)
>+		destroy_workqueue(vsock->workqueue);
>+}
>+
>+/* called with vsock_net_callbacks lock held */
>+static int vsock_loopback_init_net(struct net *net)
>+{
>+	if (WARN_ON_ONCE(net->vsock.loopback))
>+		return 0;
>+

Do we need some kind of locking here? I mean when reading/setting 
`net->vsock.loopback`?

>+	net->vsock.loopback = kmalloc(sizeof(*net->vsock.loopback), GFP_KERNEL);
>+	if (!net->vsock.loopback)
>+		return -ENOMEM;
>+
>+	return vsock_loopback_init_vsock(net->vsock.loopback);
>+}
>+
>+/* called with vsock_net_callbacks lock held */
>+static void vsock_loopback_exit_net(struct net *net)
>+{
>+	if (net->vsock.loopback) {
>+		vsock_loopback_deinit_vsock(net->vsock.loopback);
>+		kfree(net->vsock.loopback);

Should we set `net->vsock.loopback` to NULL here?
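
A user-space sketch of what I mean, with hypothetical names
(`struct ns_state`, `struct loopback_state`, `ns_loopback_init()`,
`ns_loopback_exit()`) and calloc()/free() standing in for the kernel
allocators. Resetting the pointer makes the exit path safe to call
twice and keeps a later "already initialized" check honest.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct vsock_loopback and netns_vsock. */
struct loopback_state {
	int placeholder;
};

struct ns_state {
	struct loopback_state *loopback;
};

/* Allocate the per-namespace state once; a second call is a no-op. */
static int ns_loopback_init(struct ns_state *ns)
{
	assert(ns != NULL);

	if (ns->loopback)
		return 0;

	ns->loopback = calloc(1, sizeof(*ns->loopback));
	return ns->loopback ? 0 : -1;
}

/* Free the state and clear the pointer so no stale pointer remains. */
static void ns_loopback_exit(struct ns_state *ns)
{
	assert(ns != NULL);

	free(ns->loopback);	/* free(NULL) is a no-op */
	ns->loopback = NULL;
}
```

Without the reset, a second exit call (for example from a rollback
loop that walks every namespace) would free the same pointer again.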

>+	}
>+}
>+
>+static int __init vsock_loopback_init(void)
>+{
>+	struct vsock_loopback *vsock = &the_vsock_loopback;
>+	int ret;
>+
>+	ret = vsock_loopback_init_vsock(vsock);
>+	if (ret < 0)
>+		return ret;
>+
>+	ret = vsock_register_net_callbacks(vsock_loopback_init_net,
>+					   vsock_loopback_exit_net);

IIUC we need this only here because, for now, the only other supported
transport is vhost-vsock, where `struct vhost_vsock *` is handled with
a map instead of a static variable and `net` is assigned when
/dev/vhost-vsock is opened, right?

If in the future we need to support G2H transports, like
virtio-transport, we will need to do something similar, right?

BTW I think we really need to explain this better in the commit
description. It took me a while to get all of this (if it's correct).

Thanks,
Stefano

>+	if (ret < 0)
>+		goto out_deinit_vsock;
>
> 	ret = vsock_core_register(&loopback_transport.transport,
> 				  VSOCK_TRANSPORT_F_LOCAL);
> 	if (ret)
>-		goto out_wq;
>+		goto out_unregister_net;
>+
>
> 	return 0;
>
>-out_wq:
>-	destroy_workqueue(vsock->workqueue);
>+out_unregister_net:
>+	vsock_unregister_net_callbacks();
>+
>+out_deinit_vsock:
>+	vsock_loopback_deinit_vsock(vsock);
> 	return ret;
> }
>
>
>-- 
>2.47.3
>



* Re: [PATCH net-next v5 9/9] selftests/vsock: add namespace tests
  2025-08-28  0:31 ` [PATCH net-next v5 9/9] selftests/vsock: add namespace tests Bobby Eshleman
@ 2025-09-02 15:40   ` Stefano Garzarella
  2025-09-02 18:10     ` Bobby Eshleman
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2025-09-02 15:40 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, virtualization, netdev,
	linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
	Bobby Eshleman

On Wed, Aug 27, 2025 at 05:31:37PM -0700, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Add tests for namespace support in vsock. Use socat for basic connection

Are netns tests skipped if the kernel doesn't support it?
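
Something along these lines might work as a guard. This is only a rough
sketch, not tested against this series: `netns_tests_supported` and
`skip_unless_netns_supported` are made-up helper names, and I'm assuming
the ns_mode sysctl file is simply absent on kernels without namespace
support. The optional argument (an alternate proc root) is there only so
the check can be exercised without such a kernel.

```shell
# Hypothetical helper, not part of the patch: succeed only if the running
# kernel exposes the vsock ns_mode sysctl.
netns_tests_supported() {
	local proc_root="${1:-/proc}"

	[ -e "${proc_root}/sys/net/vsock/ns_mode" ]
}

# Hypothetical wrapper for use at the top of a netns test: print a skip
# message and return 4 (the kselftest skip code, KSFT_SKIP) when the
# sysctl is missing, otherwise return 0.
skip_unless_netns_supported() {
	if ! netns_tests_supported "$@"; then
		echo "skip: vsock ns_mode not supported by this kernel"
		return 4
	fi

	return 0
}
```

A netns test could then start with
`skip_unless_netns_supported || return $?` so that it is reported as
skipped rather than failed.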

Thanks,
Stefano

>failure tests and vsock_test for full functionality tests when
>communication is expected to succeed. vsock_test is not used for failure
>cases because in theory vsock_test could allow connection and some
>traffic flow but fail on some other case (e.g., fail on MSG_ZEROCOPY).
>
>Tests cover all cases of clients and servers being in all variants of
>local ns, global ns, host process, and VM process.
>
>Legacy tests are retained and executed in the init ns.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>
>---
>Changes in v5:
>- use /proc/sys/net/vsock/ns_mode
>- clarify logic of tests that reuse the same VM and tests that require
>  netns setup
>- fix unassigned BUILD bug
>---
> tools/testing/selftests/vsock/vmtest.sh | 913 ++++++++++++++++++++++++++++----
> 1 file changed, 808 insertions(+), 105 deletions(-)
>
>diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
>index 5e36d1068f6f..9d830eb7e829 100755
>--- a/tools/testing/selftests/vsock/vmtest.sh
>+++ b/tools/testing/selftests/vsock/vmtest.sh
>@@ -7,6 +7,7 @@
> #		* virtme-ng
> #		* busybox-static (used by virtme-ng)
> #		* qemu	(used by virtme-ng)
>+#		* socat
>
> readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
> readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../)
>@@ -23,7 +24,7 @@ readonly VSOCK_CID=1234
> readonly WAIT_PERIOD=3
> readonly WAIT_PERIOD_MAX=60
> readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX ))
>-readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+readonly WAIT_QEMU=5
>
> # virtme-ng offers a netdev for ssh when using "--ssh", but we also need a
> # control port forwarded for vsock_test.  Because virtme-ng doesn't support
>@@ -33,23 +34,125 @@ readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
> # add the kernel cmdline options that virtme-init uses to setup the interface.
> readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}"
> readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}"
>-readonly QEMU_OPTS="\
>-	 -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \
>-	 -device virtio-net-pci,netdev=n0 \
>-	 -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \
>-	 --pidfile ${QEMU_PIDFILE} \
>-"
> readonly KERNEL_CMDLINE="\
> 	virtme.dhcp net.ifnames=0 biosdevname=0 \
> 	virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \
> "
> readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log)
>-readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback)
>+readonly TEST_NAMES=(
>+	vm_server_host_client
>+	vm_client_host_server
>+	vm_loopback
>+	host_vsock_ns_mode_ok
>+	host_vsock_ns_mode_write_once_ok
>+	global_same_cid_fails
>+	local_same_cid_ok
>+	global_local_same_cid_ok
>+	local_global_same_cid_ok
>+	diff_ns_global_host_connect_to_global_vm_ok
>+	diff_ns_global_host_connect_to_local_vm_fails
>+	diff_ns_global_vm_connect_to_global_host_ok
>+	diff_ns_global_vm_connect_to_local_host_fails
>+	diff_ns_local_host_connect_to_local_vm_fails
>+	diff_ns_local_vm_connect_to_local_host_fails
>+	diff_ns_global_to_local_loopback_local_fails
>+	diff_ns_local_to_global_loopback_fails
>+	diff_ns_local_to_local_loopback_fails
>+	diff_ns_global_to_global_loopback_ok
>+	same_ns_local_loopback_ok
>+	same_ns_local_host_connect_to_local_vm_ok
>+	same_ns_local_vm_connect_to_local_host_ok
>+)
>+
> readonly TEST_DESCS=(
>+	# vm_server_host_client
> 	"Run vsock_test in server mode on the VM and in client mode on the host."
>+
>+	# vm_client_host_server
> 	"Run vsock_test in client mode on the VM and in server mode on the host."
>+
>+	# vm_loopback
> 	"Run vsock_test using the loopback transport in the VM."
>+
>+	# host_vsock_ns_mode_ok
>+	"Check /proc/sys/net/vsock/ns_mode strings on the host."
>+
>+	# host_vsock_ns_mode_write_once_ok
>+	"Check /proc/sys/net/vsock/ns_mode is write-once on the host."
>+
>+	# global_same_cid_fails
>+	"Check QEMU fails to start two VMs with same CID in two different global namespaces."
>+
>+	# local_same_cid_ok
>+	"Check QEMU successfully starts two VMs with same CID in two different local namespaces."
>+
>+	# global_local_same_cid_ok
>+	"Check QEMU successfully starts one VM in a global ns and then another VM in a local ns with the same CID."
>+
>+	# local_global_same_cid_ok
>+	"Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
>+
>+	# diff_ns_global_host_connect_to_global_vm_ok
>+	"Run vsock_test client in global ns with server in VM in another global ns."
>+
>+	# diff_ns_global_host_connect_to_local_vm_fails
>+	"Run socat to test a process in a global ns fails to connect to a VM in a local ns."
>+
>+	# diff_ns_global_vm_connect_to_global_host_ok
>+	"Run vsock_test client in VM in a global ns with server in another global ns."
>+
>+	# diff_ns_global_vm_connect_to_local_host_fails
>+	"Run socat to test a VM in a global ns fails to connect to a host process in a local ns."
>+
>+	# diff_ns_local_host_connect_to_local_vm_fails
>+	"Run socat to test a host process in a local ns fails to connect to a VM in another local ns."
>+
>+	# diff_ns_local_vm_connect_to_local_host_fails
>+	"Run socat to test a VM in a local ns fails to connect to a host process in another local ns."
>+
>+	# diff_ns_global_to_local_loopback_local_fails
>+	"Run socat to test a loopback vsock in a global ns fails to connect to a vsock in a local ns."
>+
>+	# diff_ns_local_to_global_loopback_fails
>+	"Run socat to test a loopback vsock in a local ns fails to connect to a vsock in a global ns."
>+
>+	# diff_ns_local_to_local_loopback_fails
>+	"Run socat to test a loopback vsock in a local ns fails to connect to a vsock in another local ns."
>+
>+	# diff_ns_global_to_global_loopback_ok
>+	"Run socat to test a loopback vsock in a global ns successfully connects to a vsock in another global ns."
>+
>+	# same_ns_local_loopback_ok
>+	"Run socat to test a loopback vsock in a local ns successfully connects to a vsock in the same ns."
>+
>+	# same_ns_local_host_connect_to_local_vm_ok
>+	"Run vsock_test client in a local ns with server in VM in same ns."
>+
>+	# same_ns_local_vm_connect_to_local_host_ok
>+	"Run vsock_test client in VM in a local ns with server in same ns."
>+)
>+
>+readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
>+readonly USE_INIT_NETNS=(
>+	global_same_cid_fails
>+	local_same_cid_ok
>+	global_local_same_cid_ok
>+	local_global_same_cid_ok
>+	diff_ns_global_host_connect_to_global_vm_ok
>+	diff_ns_global_host_connect_to_local_vm_fails
>+	diff_ns_global_vm_connect_to_global_host_ok
>+	diff_ns_global_vm_connect_to_local_host_fails
>+	diff_ns_local_host_connect_to_local_vm_fails
>+	diff_ns_local_vm_connect_to_local_host_fails
>+	diff_ns_global_to_local_loopback_local_fails
>+	diff_ns_local_to_global_loopback_fails
>+	diff_ns_local_to_local_loopback_fails
>+	diff_ns_global_to_global_loopback_ok
>+	same_ns_local_loopback_ok
>+	same_ns_local_host_connect_to_local_vm_ok
>+	same_ns_local_vm_connect_to_local_host_ok
> )
>+readonly MODES=("local" "global")
>
> readonly LOG_LEVEL_DEBUG=0
> readonly LOG_LEVEL_INFO=1
>@@ -58,6 +161,12 @@ readonly LOG_LEVEL_ERROR=3
>
> VERBOSE="${LOG_LEVEL_WARN}"
>
>+# Test pass/fail counters
>+cnt_pass=0
>+cnt_fail=0
>+cnt_skip=0
>+cnt_total=0
>+
> usage() {
> 	local name
> 	local desc
>@@ -77,7 +186,7 @@ usage() {
> 	for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
> 		name=${TEST_NAMES[${i}]}
> 		desc=${TEST_DESCS[${i}]}
>-		printf "\t%-35s%-35s\n" "${name}" "${desc}"
>+		printf "\t%-55s%-35s\n" "${name}" "${desc}"
> 	done
> 	echo
>
>@@ -89,21 +198,87 @@ die() {
> 	exit "${KSFT_FAIL}"
> }
>
>+add_namespaces() {
>+	# add namespaces local0, local1, global0, and global1
>+	for mode in "${MODES[@]}"; do
>+		ip netns add "${mode}0" 2>/dev/null
>+		ip netns add "${mode}1" 2>/dev/null
>+	done
>+}
>+
>+init_namespaces() {
>+	for mode in "${MODES[@]}"; do
>+		ns_set_mode "${mode}0" "${mode}"
>+		ns_set_mode "${mode}1" "${mode}"
>+
>+		log_host "set ns ${mode}0 to mode ${mode}"
>+		log_host "set ns ${mode}1 to mode ${mode}"
>+
>+		# we need lo for qemu port forwarding
>+		ip netns exec "${mode}0" ip link set dev lo up
>+		ip netns exec "${mode}1" ip link set dev lo up
>+	done
>+}
>+
>+del_namespaces() {
>+	for mode in "${MODES[@]}"; do
>+		ip netns del "${mode}0"
>+		ip netns del "${mode}1"
>+		log_host "removed ns ${mode}0"
>+		log_host "removed ns ${mode}1"
>+	done &>/dev/null
>+}
>+
>+ns_set_mode() {
>+	local ns=$1
>+	local mode=$2
>+
>+	echo "${mode}" | ip netns exec "${ns}" \
>+		tee /proc/sys/net/vsock/ns_mode &>/dev/null
>+}
>+
> vm_ssh() {
>-	ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
>+	local ns_exec
>+
>+	if [[ "${1}" == none ]]; then
>+		local ns_exec=""
>+	else
>+		local ns_exec="ip netns exec ${1}"
>+	fi
>+
>+	shift
>+
>+	${ns_exec} ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost $*
>+
> 	return $?
> }
>
> cleanup() {
>-	if [[ -s "${QEMU_PIDFILE}" ]]; then
>-		pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1
>-	fi
>+	del_namespaces
>+}
>
>-	# If failure occurred during or before qemu start up, then we need
>-	# to clean this up ourselves.
>-	if [[ -e "${QEMU_PIDFILE}" ]]; then
>-		rm "${QEMU_PIDFILE}"
>-	fi
>+terminate_pidfiles() {
>+	local pidfile
>+
>+	for pidfile in "$@"; do
>+		if [[ -s "${pidfile}" ]]; then
>+			pkill -SIGTERM -F "${pidfile}" 2>&1 > /dev/null
>+		fi
>+
>+		# If failure occurred during or before qemu start up, then we need
>+		# to clean this up ourselves.
>+		if [[ -e "${pidfile}" ]]; then
>+			rm -f "${pidfile}"
>+		fi
>+	done
>+}
>+
>+terminate_pids() {
>+	local pid
>+
>+	for pid in "$@"; do
>+		kill -SIGTERM "${pid}" &>/dev/null || :
>+	done
> }
>
> check_args() {
>@@ -133,7 +308,7 @@ check_args() {
> }
>
> check_deps() {
>-	for dep in vng ${QEMU} busybox pkill ssh; do
>+	for dep in vng ${QEMU} busybox pkill ssh socat; do
> 		if [[ ! -x $(command -v "${dep}") ]]; then
> 			echo -e "skip:    dependency ${dep} not found!\n"
> 			exit "${KSFT_SKIP}"
>@@ -170,6 +345,20 @@ check_vng() {
> 	fi
> }
>
>+check_socat() {
>+	local support_string
>+
>+	support_string="$(socat -V)"
>+
>+	if [[ "${support_string}" != *"WITH_VSOCK 1"* ]]; then
>+		die "err: socat is missing vsock support"
>+	fi
>+
>+	if [[ "${support_string}" != *"WITH_UNIX 1"* ]]; then
>+		die "err: socat is missing unix support"
>+	fi
>+}
>+
> handle_build() {
> 	if [[ ! "${BUILD}" -eq 1 ]]; then
> 		return
>@@ -194,9 +383,14 @@ handle_build() {
> }
>
> vm_start() {
>+	local cid=$1
>+	local ns=$2
>+	local pidfile=$3
> 	local logfile=/dev/null
> 	local verbose_opt=""
>+	local qemu_opts=""
> 	local kernel_opt=""
>+	local ns_exec=""
> 	local qemu
>
> 	qemu=$(command -v "${QEMU}")
>@@ -206,27 +400,37 @@ vm_start() {
> 		logfile=/dev/stdout
> 	fi
>
>+	qemu_opts="\
>+		 -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \
>+		 -device virtio-net-pci,netdev=n0 \
>+		${QEMU_OPTS} -device vhost-vsock-pci,guest-cid=${cid} \
>+		--pidfile ${pidfile}
>+	"
>+
> 	if [[ "${BUILD}" -eq 1 ]]; then
> 		kernel_opt="${KERNEL_CHECKOUT}"
> 	fi
>
>-	vng \
>+	if [[ "${ns}" != "none" ]]; then
>+		ns_exec="ip netns exec ${ns}"
>+	fi
>+
>+	${ns_exec} vng \
> 		--run \
> 		${kernel_opt} \
> 		${verbose_opt} \
>-		--qemu-opts="${QEMU_OPTS}" \
>+		--qemu-opts="${qemu_opts}" \
> 		--qemu="${qemu}" \
> 		--user root \
> 		--append "${KERNEL_CMDLINE}" \
> 		--rw  &> ${logfile} &
>
>-	if ! timeout ${WAIT_TOTAL} \
>-		bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then
>-		die "failed to boot VM"
>-	fi
>+	timeout "${WAIT_QEMU}" \
>+		bash -c 'while [[ ! -s '"${pidfile}"' ]]; do sleep 1; done; exit 0'
> }
>
> vm_wait_for_ssh() {
>+	local ns=$1
> 	local i
>
> 	i=0
>@@ -234,7 +438,8 @@ vm_wait_for_ssh() {
> 		if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then
> 			die "Timed out waiting for guest ssh"
> 		fi
>-		if vm_ssh -- true; then
>+
>+		if vm_ssh "${ns}" -- true; then
> 			break
> 		fi
> 		i=$(( i + 1 ))
>@@ -269,6 +474,7 @@ wait_for_listener()
> 		   grep -q "${pattern}"; then
> 			break
> 		fi
>+
> 		sleep "${interval}"
> 	done
>
>@@ -278,17 +484,29 @@ wait_for_listener()
> }
>
> vm_wait_for_listener() {
>-	local port=$1
>+	local ns=$1
>+	local port=$2
>+
>+	log "Waiting for listener on port ${port} on vm"
>
>-	vm_ssh <<EOF
>+	vm_ssh "${ns}" <<EOF
> $(declare -f wait_for_listener)
> wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
> EOF
> }
>
> host_wait_for_listener() {
>-	wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
>+	local ns=$1
>+	local port=$2
>
>+	if [[ "${ns}" == none ]]; then
>+		wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
>+	else
>+		ip netns exec "${ns}" bash <<-EOF
>+			$(declare -f wait_for_listener)
>+			wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
>+		EOF
>+	fi
> }
>
> log() {
>@@ -427,51 +645,506 @@ test_vm_client_host_server() {
> }
>
> test_vm_loopback() {
>+	vm_ssh "none" modprobe vsock_loopback || :
> 	vm_vsock_test "none" "server" 1 "${TEST_HOST_PORT_LISTENER}"
> 	vm_vsock_test "none" "client" "127.0.0.1" 1 "${TEST_HOST_PORT_LISTENER}"
> }
>
>+test_host_vsock_ns_mode_ok() {
>+	add_namespaces
>+
>+	for mode in "${MODES[@]}"; do
>+		if ! ns_set_mode "${mode}0" "${mode}"; then
>+			del_namespaces
>+			return "${KSFT_FAIL}"
>+		fi
>+	done
>
>+	del_namespaces
> }
>
>-test_vm_client_host_server() {
>+test_host_vsock_ns_mode_write_once_ok() {
>+	add_namespaces
>
>-	${VSOCK_TEST} \
>-		--mode "server" \
>-		--control-port "${TEST_HOST_PORT_LISTENER}" \
>-		--peer-cid "${VSOCK_CID}" 2>&1 | log_host &
>+	for mode in "${MODES[@]}"; do
>+		local ns="${mode}0"
>+		if ! ns_set_mode "${ns}" "${mode}"; then
>+			del_namespaces
>+			return "${KSFT_FAIL}"
>+		fi
>
>-	host_wait_for_listener
>+		# try writing again and expect failure
>+		if ns_set_mode "${ns}" "${mode}"; then
>+			del_namespaces
>+			return "${KSFT_FAIL}"
>+		fi
>+	done
>
>-	vm_ssh -- "${VSOCK_TEST}" \
>-		--mode=client \
>-		--control-host=10.0.2.2 \
>-		--peer-cid=2 \
>-		--control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest
>+	del_namespaces
>
>-	return $?
>+	return "${KSFT_PASS}"
> }
>
>-test_vm_loopback() {
>-	local port=60000 # non-forwarded local port
>+namespaces_can_boot_same_cid() {
>+	local ns0=$1
>+	local ns1=$2
>+	local pidfile1 pidfile2
>+	local cid=20
>+	readonly cid
>+	local rc
>
>-	vm_ssh -- "${VSOCK_TEST}" \
>-		--mode=server \
>-		--control-port="${port}" \
>-		--peer-cid=1 2>&1 | log_guest &
>+	pidfile1=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+	vm_start "${cid}" "${ns0}" "${pidfile1}"
>
>-	vm_wait_for_listener "${port}"
>+	pidfile2=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+	vm_start "${cid}" "${ns1}" "${pidfile2}"
>
>-	vm_ssh -- "${VSOCK_TEST}" \
>-		--mode=client \
>-		--control-host="127.0.0.1" \
>-		--control-port="${port}" \
>-		--peer-cid=1 2>&1 | log_guest
>+	rc=$?
>+	terminate_pidfiles "${pidfile1}" "${pidfile2}"
>
>-	return $?
>+	return $rc
>+}
>+
>+test_global_same_cid_fails() {
>+	if namespaces_can_boot_same_cid "global0" "global1"; then
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	return "${KSFT_PASS}"
>+}
>+
>+test_local_global_same_cid_ok() {
>+	if namespaces_can_boot_same_cid "local0" "global0"; then
>+		return "${KSFT_PASS}"
>+	fi
>+
>+	return "${KSFT_FAIL}"
>+}
>+
>+test_global_local_same_cid_ok() {
>+	if namespaces_can_boot_same_cid "global0" "local0"; then
>+		return "${KSFT_PASS}"
>+	fi
>+
>+	return "${KSFT_FAIL}"
>+}
>+
>+test_local_same_cid_ok() {
>+	if namespaces_can_boot_same_cid "local0" "local0"; then
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	return "${KSFT_PASS}"
>+}
>+
>+test_diff_ns_global_host_connect_to_global_vm_ok() {
>+	local pids pid pidfile
>+	local ns0 ns1 port
>+	declare -a pids
>+	local unixfile
>+	ns0="global0"
>+	ns1="global1"
>+	port=1234
>+	local rc
>+
>+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+
>+	if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	unixfile=$(mktemp -u /tmp/XXXX.sock)
>+	ip netns exec "${ns1}" \
>+		socat TCP-LISTEN:"${TEST_HOST_PORT}",fork \
>+			UNIX-CONNECT:"${unixfile}" &
>+	pids+=($!)
>+	host_wait_for_listener "${ns1}" "${TEST_HOST_PORT}"
>+
>+	ip netns exec "${ns0}" socat UNIX-LISTEN:"${unixfile}",fork \
>+		TCP-CONNECT:localhost:"${TEST_HOST_PORT}" &
>+	pids+=($!)
>+
>+	vm_vsock_test "${ns0}" "server" 2 "${TEST_GUEST_PORT}"
>+	vm_wait_for_listener "${ns0}" "${TEST_GUEST_PORT}"
>+	host_vsock_test "${ns1}" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
>+	rc=$?
>+
>+	for pid in "${pids[@]}"; do
>+		if [[ "$(jobs -p)" = *"${pid}"* ]]; then
>+			kill -SIGTERM "${pid}" &>/dev/null
>+		fi
>+	done
>+
>+	terminate_pidfiles "${pidfile}"
>+
>+	if [[ $rc -ne 0 ]]; then
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	return "${KSFT_PASS}"
>+}
>+
>+test_diff_ns_global_host_connect_to_local_vm_fails() {
>+	local ns0="global0"
>+	local ns1="local0"
>+	local port=12345
>+	local pidfile
>+	local result
>+	local pid
>+
>+	outfile=$(mktemp)
>+
>+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+	if ! vm_start "${VSOCK_CID}" "${ns1}" "${pidfile}"; then
>+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
>+		return $KSFT_FAIL
>+	fi
>+
>+	vm_wait_for_ssh "${ns1}"
>+	vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
>+	echo TEST | ip netns exec "${ns0}" \
>+		socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
>+
>+	terminate_pidfiles "${pidfile}"
>+
>+	result=$(cat "${outfile}")
>+	rm -f "${outfile}"
>+
>+	if [[ "${result}" != TEST ]]; then
>+		return $KSFT_PASS
>+	fi
>+
>+	return $KSFT_FAIL
>+}
>+
>+test_diff_ns_global_vm_connect_to_global_host_ok() {
>+	local ns0="global0"
>+	local ns1="global1"
>+	local port=12345
>+	local unixfile
>+	local pidfile
>+	local pids
>+
>+	declare -a pids
>+
>+	log_host "Setup socat bridge from ns ${ns0} to ns ${ns1} over port ${port}"
>+
>+	unixfile=$(mktemp -u /tmp/XXXX.sock)
>+
>+	ip netns exec "${ns0}" \
>+		socat TCP-LISTEN:"${port}" UNIX-CONNECT:"${unixfile}" &
>+	pids+=($!)
>+
>+	ip netns exec "${ns1}" \
>+		socat UNIX-LISTEN:"${unixfile}" TCP-CONNECT:127.0.0.1:"${port}" &
>+	pids+=($!)
>+
>+	log_host "Launching ${VSOCK_TEST} in ns ${ns1}"
>+	host_vsock_test "${ns1}" "server" "${VSOCK_CID}" "${port}"
>+
>+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+	if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then
>+		log_host "failed to start vm (cid=${cid}, ns=${ns0})"
>+		terminate_pids "${pids[@]}"
>+		rm -f "${unixfile}"
>+		return $KSFT_FAIL
>+	fi
>+
>+	vm_wait_for_ssh "${ns0}"
>+	vm_vsock_test "${ns0}" "client" "10.0.2.2" 2 "${port}"
>+	rc=$?
>+
>+	terminate_pidfiles "${pidfile}"
>+	terminate_pids "${pids[@]}"
>+	rm -f "${unixfile}"
>+
>+	if [[ ! $rc -eq 0 ]]; then
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	return "${KSFT_PASS}"
>+
>+}
>+
>+test_diff_ns_global_vm_connect_to_local_host_fails() {
>+	local ns0="global0"
>+	local ns1="local0"
>+	local port=12345
>+	local pidfile
>+	local result
>+	local pid
>+
>+	log_host "Launching socat in ns ${ns1}"
>+	outfile=$(mktemp)
>+	ip netns exec "${ns1}" socat VSOCK-LISTEN:${port} STDOUT &> "${outfile}" &
>+	pid=$!
>+
>+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+	if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then
>+		log_host "failed to start vm (cid=${cid}, ns=${ns0})"
>+		terminate_pids "${pid}"
>+		rm -f "${outfile}"
>+		return $KSFT_FAIL
>+	fi
>+
>+	vm_wait_for_ssh "${ns0}"
>+
>+	vm_ssh "${ns0}" -- \
>+		bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
>+
>+	terminate_pidfiles "${pidfile}"
>+	terminate_pids "${pid}"
>+
>+	result=$(cat "${outfile}")
>+	rm -f "${outfile}"
>+
>+	if [[ "${result}" != TEST ]]; then
>+		return "${KSFT_PASS}"
>+	fi
>+
>+	return "${KSFT_FAIL}"
>+}
>+
>+test_diff_ns_local_host_connect_to_local_vm_fails() {
>+	local ns0="local0"
>+	local ns1="local1"
>+	local port=12345
>+	local pidfile
>+	local result
>+	local pid
>+
>+	outfile=$(mktemp)
>+
>+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+	if ! vm_start "${VSOCK_CID}" "${ns1}" "${pidfile}"; then
>+		log_host "failed to start vm (cid=${cid}, ns=${ns0})"
>+		return $KSFT_FAIL
>+	fi
>+
>+	vm_wait_for_ssh "${ns1}"
>+	vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
>+	echo TEST | ip netns exec "${ns0}" \
>+		socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
>+
>+	terminate_pidfiles "${pidfile}"
>+
>+	result=$(cat "${outfile}")
>+	rm -f "${outfile}"
>+
>+	if [[ "${result}" != TEST ]]; then
>+		return $KSFT_PASS
>+	fi
>+
>+	return $KSFT_FAIL
>+}
>+
>+test_diff_ns_local_vm_connect_to_local_host_fails() {
>+	local ns0="local0"
>+	local ns1="local1"
>+	local port=12345
>+	local pidfile
>+	local result
>+	local pid
>+
>+	log_host "Launching socat in ns ${ns1}"
>+	outfile=$(mktemp)
>+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" &
>+	pid=$!
>+
>+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+	if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then
>+		log_host "failed to start vm (cid=${cid}, ns=${ns0})"
>+		rm -f "${outfile}"
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	vm_wait_for_ssh "${ns0}"
>+
>+	vm_ssh "${ns0}" -- \
>+		bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
>+
>+	terminate_pidfiles "${pidfile}"
>+	terminate_pids "${pid}"
>+
>+	result=$(cat "${outfile}")
>+	rm -f "${outfile}"
>+
>+	if [[ "${result}" != TEST ]]; then
>+		return "${KSFT_PASS}"
>+	fi
>+
>+	return "${KSFT_FAIL}"
>+}
>+
>+__test_loopback_two_netns() {
>+	local ns0=$1
>+	local ns1=$2
>+	local port=12345
>+	local result
>+	local pid
>+
>+	modprobe vsock_loopback &> /dev/null || :
>+
>+	log_host "Launching socat in ns ${ns1}"
>+	outfile=$(mktemp)
>+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" 2>/dev/null &
>+	pid=$!
>+
>+	log_host "Launching socat in ns ${ns0}"
>+	echo TEST | ip netns exec "${ns0}" socat STDIN VSOCK-CONNECT:1:"${port}" 2>/dev/null
>+	terminate_pids "${pid}"
>+
>+	result=$(cat "${outfile}")
>+	rm -f "${outfile}"
>+
>+	if [[ "${result}" == TEST ]]; then
>+		return 0
>+	fi
>+
>+	return 1
>+}
>+
>+test_diff_ns_global_to_local_loopback_local_fails() {
>+	if ! __test_loopback_two_netns "global0" "local0"; then
>+		return "${KSFT_PASS}"
>+	fi
>+
>+	return "${KSFT_FAIL}"
>+}
>+
>+test_diff_ns_local_to_global_loopback_fails() {
>+	if ! __test_loopback_two_netns "local0" "global0"; then
>+		return "${KSFT_PASS}"
>+	fi
>+
>+	return "${KSFT_FAIL}"
>+}
>+
>+test_diff_ns_local_to_local_loopback_fails() {
>+	if ! __test_loopback_two_netns "local0" "local1"; then
>+		return "${KSFT_PASS}"
>+	fi
>+
>+	return "${KSFT_FAIL}"
>+}
>+
>+test_diff_ns_global_to_global_loopback_ok() {
>+	if __test_loopback_two_netns "global0" "global1"; then
>+		return "${KSFT_PASS}"
>+	fi
>+
>+	return "${KSFT_FAIL}"
>+}
>+
>+test_same_ns_local_loopback_ok() {
>+	if __test_loopback_two_netns "local0" "local0"; then
>+		return "${KSFT_PASS}"
>+	fi
>+
>+	return "${KSFT_FAIL}"
>+}
>+
>+test_same_ns_local_host_connect_to_local_vm_ok() {
>+	local ns="local0"
>+	local port=1234
>+	local pidfile
>+	local rc
>+
>+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+
>+	if ! vm_start "${VSOCK_CID}" "${ns}" "${pidfile}"; then
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}"
>+	host_vsock_test "${ns}" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
>+	rc=$?
>+
>+	terminate_pidfiles "${pidfile}"
>+
>+	if [[ $rc -ne 0 ]]; then
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	return "${KSFT_PASS}"
>+}
>+
>+test_same_ns_local_vm_connect_to_local_host_ok() {
>+	local ns="local0"
>+	local port=1234
>+	local pidfile
>+	local rc
>+
>+	pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
>+
>+	if ! vm_start "${VSOCK_CID}" "${ns}" "${pidfile}"; then
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}"
>+	host_vsock_test "${ns}" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
>+	rc=$?
>+
>+	terminate_pidfiles "${pidfile}"
>+
>+	if [[ $rc -ne 0 ]]; then
>+		return "${KSFT_FAIL}"
>+	fi
>+
>+	return "${KSFT_PASS}"
>+}
>+
>+shared_vm_test() {
>+	local tname
>+
>+	tname="${1}"
>+
>+	for testname in "${USE_SHARED_VM[@]}"; do
>+		if [[ "${tname}" == "${testname}" ]]; then
>+			return 0
>+		fi
>+	done
>+
>+	return 1
> }
>
>-run_test() {
>+
>+init_netns_test() {
>+	local tname
>+
>+	tname="${1}"
>+
>+	for testname in "${USE_INIT_NETNS[@]}"; do
>+		if [[ "${tname}" == "${testname}" ]]; then
>+			return 0
>+		fi
>+	done
>+
>+	return 1
>+}
>+
>+check_result() {
>+	local rc num
>+
>+	rc=$1
>+	num=$(( cnt_total + 1 ))
>+
>+	if [[ ${rc} -eq $KSFT_PASS ]]; then
>+		cnt_pass=$(( cnt_pass + 1 ))
>+		echo "ok ${num} ${arg}"
>+	elif [[ ${rc} -eq $KSFT_SKIP ]]; then
>+		cnt_skip=$(( cnt_skip + 1 ))
>+		echo "ok ${num} ${arg} # SKIP"
>+	elif [[ ${rc} -eq $KSFT_FAIL ]]; then
>+		cnt_fail=$(( cnt_fail + 1 ))
>+		echo "not ok ${num} ${arg} # exit=$rc"
>+	fi
>+
>+	cnt_total=$(( cnt_total + 1 ))
>+}
>+
>+run_shared_vm_tests() {
>+	local start_shared_vm pidfile
> 	local host_oops_cnt_before
> 	local host_warn_cnt_before
> 	local vm_oops_cnt_before
>@@ -483,42 +1156,93 @@ run_test() {
> 	local name
> 	local rc
>
>-	host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
>-	host_warn_cnt_before=$(dmesg --level=warn | wc -l)
>-	vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
>-	vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
>+	start_shared_vm=0
>
>-	name=$(echo "${1}" | awk '{ print $1 }')
>-	eval test_"${name}"
>-	rc=$?
>+	for arg in "${ARGS[@]}"; do
>+		if shared_vm_test "${arg}"; then
>+			start_shared_vm=1
>+			break
>+		fi
>+	done
>
>-	host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
>-	if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
>-		echo "FAIL: kernel oops detected on host" | log_host "${name}"
>-		rc=$KSFT_FAIL
>+	pidfile=""
>+	if [[ "${start_shared_vm}" == 1 ]]; then
>+		pidfile=$(mktemp $PIDFILE_TEMPLATE)
>+		log_host "Booting up VM"
>+		vm_start "${VSOCK_CID}" "none" "${pidfile}"
>+		vm_wait_for_ssh "none"
>+		log_host "VM booted up"
> 	fi
>
>-	host_warn_cnt_after=$(dmesg --level=warn | wc -l)
>-	if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
>-		echo "FAIL: kernel warning detected on host" | log_host "${name}"
>-		rc=$KSFT_FAIL
>-	fi
>+	for arg in "${ARGS[@]}"; do
>+		if ! shared_vm_test "${arg}"; then
>+			continue
>+		fi
>
>-	vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
>-	if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
>-		echo "FAIL: kernel oops detected on vm" | log_host "${name}"
>-		rc=$KSFT_FAIL
>-	fi
>+		host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
>+		host_warn_cnt_before=$(dmesg --level=warn | wc -l)
>+		vm_oops_cnt_before=$(vm_ssh none -- dmesg | grep -c -i 'Oops')
>+		vm_warn_cnt_before=$(vm_ssh none -- dmesg --level=warn | wc -l)
>+
>+		name=$(echo "${arg}" | awk '{ print $1 }')
>+		log_host "Executing test_${name}"
>+		eval test_"${name}"
>+		rc=$?
>+
>+		host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
>+		if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
>+			echo "FAIL: kernel oops detected on host" | log_host "${name}"
>+			rc=$KSFT_FAIL
>+		fi
>+
>+		host_warn_cnt_after=$(dmesg --level=warn | wc -l)
>+		if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
>+			echo "FAIL: kernel warning detected on host" | log_host "${name}"
>+			rc=$KSFT_FAIL
>+		fi
>+
>+		vm_oops_cnt_after=$(vm_ssh none -- dmesg | grep -i 'Oops' | wc -l)
>+		if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
>+			echo "FAIL: kernel oops detected on vm" | log_host "${name}"
>+			rc=$KSFT_FAIL
>+		fi
>+
>+		vm_warn_cnt_after=$(vm_ssh none -- dmesg --level=warn | wc -l)
>+		if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
>+			echo "FAIL: kernel warning detected on vm" | log_host "${name}"
>+			rc=$KSFT_FAIL
>+		fi
>
>-	vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
>-	if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
>-		echo "FAIL: kernel warning detected on vm" | log_host "${name}"
>-		rc=$KSFT_FAIL
>+		check_result "${rc}"
>+	done
>+
>+	if [[ -n "${pidfile}" ]]; then
>+		log_host "VM terminate"
>+		terminate_pidfiles "${pidfile}"
> 	fi
>+}
>+
>+run_isolated_vm_tests() {
>+	for arg in "${ARGS[@]}"; do
>+		if shared_vm_test "${arg}"; then
>+			continue
>+		fi
>
>-	return "${rc}"
>+		add_namespaces
>+		if init_netns_test "${arg}"; then
>+			init_namespaces
>+		fi
>+
>+		name=$(echo "${arg}" | awk '{ print $1 }')
>+		log_host "Executing test_${name}"
>+		eval test_"${name}"
>+		check_result $?
>+
>+		del_namespaces
>+	done
> }
>
>+BUILD=0
> QEMU="qemu-system-$(uname -m)"
>
> while getopts :hvsq:b o
>@@ -543,34 +1267,13 @@ fi
> check_args "${ARGS[@]}"
> check_deps
> check_vng
>+check_socat
> handle_build
>
> echo "1..${#ARGS[@]}"
>
>-log_host "Booting up VM"
>-vm_start
>-vm_wait_for_ssh
>-log_host "VM booted up"
>-
>-cnt_pass=0
>-cnt_fail=0
>-cnt_skip=0
>-cnt_total=0
>-for arg in "${ARGS[@]}"; do
>-	run_test "${arg}"
>-	rc=$?
>-	if [[ ${rc} -eq $KSFT_PASS ]]; then
>-		cnt_pass=$(( cnt_pass + 1 ))
>-		echo "ok ${cnt_total} ${arg}"
>-	elif [[ ${rc} -eq $KSFT_SKIP ]]; then
>-		cnt_skip=$(( cnt_skip + 1 ))
>-		echo "ok ${cnt_total} ${arg} # SKIP"
>-	elif [[ ${rc} -eq $KSFT_FAIL ]]; then
>-		cnt_fail=$(( cnt_fail + 1 ))
>-		echo "not ok ${cnt_total} ${arg} # exit=$rc"
>-	fi
>-	cnt_total=$(( cnt_total + 1 ))
>-done
>+run_shared_vm_tests
>+run_isolated_vm_tests
>
> echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
> echo "Log: ${LOG}"
>
>-- 
>2.47.3
>



* Re: [PATCH net-next v5 3/9] vsock: add netns to vsock core
  2025-09-02 15:39   ` Stefano Garzarella
@ 2025-09-02 17:10     ` Bobby Eshleman
  0 siblings, 0 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-09-02 17:10 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, virtualization, netdev,
	linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
	Bobby Eshleman

On Tue, Sep 02, 2025 at 05:39:10PM +0200, Stefano Garzarella wrote:
> On Wed, Aug 27, 2025 at 05:31:31PM -0700, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>

...

> > {
> > 	enum vsock_net_mode ret;
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 0538948d5fd9..68a8875c8106 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -83,6 +83,24 @@
> >  *   TCP_ESTABLISHED - connected
> >  *   TCP_CLOSING - disconnecting
> >  *   TCP_LISTEN - listening
> > + *
> > + * - Namespaces in vsock support two different modes configured
> > + *   through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
> > + *   Each mode defines how the namespace interacts with CIDs.
> > + *   /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
> > + *   and locked down by a namespace manager. The default is "global". The mode
> > + *   is set per-namespace.
> > + *
> > + *   The modes affect the allocation and accessibility of CIDs as follows:
> > + *   - global - aka fully public
> > + *      - CID allocation draws from the public pool
> > + *      - AF_VSOCK sockets may reach any CID allocated from the public pool
> > + *      - AF_VSOCK sockets may not reach CIDs allocated from private pools
> 
> Should we define what public and private pools are?
> 
> What I found difficult to understand was the allocation of CIDs, meaning I
> had to reread it two or three times to perhaps understand it.
> 
> IIUC, netns with mode=global can only allocate public CIDs, while netns with
> mode=local can only allocate private CIDs, right?
> 

Correct.

> Perhaps we should first better define how CIDs are allocated and then
> explain the interaction between them.
> 

Makes sense, I'll clarify that.

> > + *
> > + *   - local - aka fully private
> > + *     - CID allocation draws only from the private pool, does not affect public pool
> > + *     - AF_VSOCK sockets may only reach CIDs from the private pool
> > + *     - AF_VSOCK sockets may not reach CIDs allocated from outside the pool
> 
> Why use "may"? I mean, can there be cases when this is not true?
> 


Good point, will change to stronger language since it is always true.
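
For illustration, the reachability rule described in the quoted comment can
be modeled in userspace as below. This is only a sketch, not code from the
series: `struct ns_model`, `ns_can_reach()` and the integer `id` are
hypothetical names, and it assumes that every local-mode namespace has its
own private CID pool while all global-mode namespaces share one public pool.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum ns_model_mode { NS_MODEL_GLOBAL, NS_MODEL_LOCAL };

struct ns_model {
	int id;                  /* stands in for namespace identity */
	enum ns_model_mode mode; /* per-namespace vsock mode */
};

/*
 * A socket in @src reaches a CID allocated in @dst only if both are the
 * same namespace, or both namespaces are in global mode (shared pool).
 */
static bool ns_can_reach(const struct ns_model *src,
			 const struct ns_model *dst)
{
	if (src->id == dst->id)
		return true;

	return src->mode == NS_MODEL_GLOBAL && dst->mode == NS_MODEL_GLOBAL;
}
```

Under this model the rule is unconditional, which is why "may" reads as too
weak: a local-mode namespace never reaches a CID outside itself.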

[...]

> > 
> > @@ -2636,6 +2670,137 @@ static struct miscdevice vsock_device = {
> > 	.fops		= &vsock_device_ops,
> > };
> > 
> > +#define VSOCK_NET_MODE_STRING_MAX 7
> > +
> > +static int vsock_net_mode_string(const struct ctl_table *table, int write,
> > +				 void *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +	char buf[VSOCK_NET_MODE_STRING_MAX] = {0};
> 
> Can we change `buf` name?
> 
> I find it confusing to have both a `buffer` variable and a `buf` variable in
> the same function.
> 

Makes sense, will do.

> > +	enum vsock_net_mode mode;
> > +	struct ctl_table tmp;
> > +	struct net *net;
> > +	const char *p;
> 
> Can we move the `p` declaration into the `if (!write) {` block?
> 

yes.

> > +	int ret;
> > +
> > +	if (!table->data || !table->maxlen || !*lenp) {
> > +		*lenp = 0;
> > +		return 0;
> > +	}
> > +
> > +	net = current->nsproxy->net_ns;
> > +	tmp = *table;
> > +	tmp.data = buf;
> > +
> > +	if (!write) {
> > +		mode = vsock_net_mode(net);
> > +
> > +		if (mode == VSOCK_NET_MODE_GLOBAL) {
> > +			p = "global";
> > +		} else if (mode == VSOCK_NET_MODE_LOCAL) {
> > +			p = "local";
> > +		} else {
> > +			WARN_ONCE(true, "netns has invalid vsock mode");
> > +			*lenp = 0;
> > +			return 0;
> > +		}
> > +
> > +		strscpy(buf, p, sizeof(buf));
> > +		tmp.maxlen = strlen(p);
> > +	}
> > +
> > +	ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (write) {
> > +		if (!strncmp(buffer, "global", 6))
> 
> Are we sure that the `buffer` is at least 6 bytes long and NUL-terminated?
> 
> Maybe we can just check that `*lenp <= sizeof(buf)`...
> 
> Should we add macros for "global" and "local" ?
> 

That all sounds reasonable. IIRC I tested with some garbage writes, but might
as well err on the side of caution.
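
A userspace sketch of the stricter parsing discussed above, for illustration
only. The names `MODE_STR_GLOBAL`, `MODE_STR_LOCAL`, `enum parsed_mode` and
`parse_mode()` are hypothetical and not part of the series. It assumes the
input arrives as a byte buffer with an explicit length and possibly one
trailing newline (as from `echo`). Note that a prefix comparison with
strncmp() also matches longer inputs that merely start with the keyword;
comparing the length first avoids that.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macros for the two keywords, as suggested in the review. */
#define MODE_STR_GLOBAL "global"
#define MODE_STR_LOCAL  "local"

enum parsed_mode { MODE_GLOBAL, MODE_LOCAL };

/*
 * Parse @len bytes at @buf, which need not be NUL-terminated.
 * One trailing newline is tolerated. Returns 0 and sets @out on an exact
 * keyword match, -1 otherwise.
 */
static int parse_mode(const char *buf, size_t len, enum parsed_mode *out)
{
	if (len > 0 && buf[len - 1] == '\n')
		len--;

	if (len == strlen(MODE_STR_GLOBAL) &&
	    !memcmp(buf, MODE_STR_GLOBAL, len)) {
		*out = MODE_GLOBAL;
		return 0;
	}

	if (len == strlen(MODE_STR_LOCAL) &&
	    !memcmp(buf, MODE_STR_LOCAL, len)) {
		*out = MODE_LOCAL;
		return 0;
	}

	return -1;
}
```

Because the length is checked before memcmp(), the function never reads past
the bytes it was given, and "globalfoo" or "glo" are rejected.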

> 
> > +			mode = VSOCK_NET_MODE_GLOBAL;
> > +		else if (!strncmp(buffer, "local", 5))
> > +			mode = VSOCK_NET_MODE_LOCAL;
> > +		else
> > +			return -EINVAL;
> > +
> > +		if (!vsock_net_write_mode(net, mode))
> > +			return -EPERM;
> > +	}
> > +
> > +	return 0;
> > +}
> > +

...


Thanks for the review!

Best,
Bobby


* Re: [PATCH net-next v5 4/9] vsock/loopback: add netns support
  2025-09-02 15:39   ` Stefano Garzarella
@ 2025-09-02 18:09     ` Bobby Eshleman
  2025-09-03 15:10       ` Stefano Garzarella
  0 siblings, 1 reply; 18+ messages in thread
From: Bobby Eshleman @ 2025-09-02 18:09 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, virtualization, netdev,
	linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
	Bobby Eshleman

On Tue, Sep 02, 2025 at 05:39:33PM +0200, Stefano Garzarella wrote:
> On Wed, Aug 27, 2025 at 05:31:32PM -0700, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > Add NS support to vsock loopback. Sockets in a global mode netns
> > communicate with each other, regardless of namespace. Sockets in a local
> > mode netns may only communicate with other sockets within the same
> > namespace.
> > 
> > Add callbacks for transport to hook into the initialization and exit of
> > net namespaces.
> > 
> > The transport's init hook will be called once per netns init. Likewise
> > for exit.
> > 
> > When a set of init/exit callbacks is registered, the init callback is
> > called on each already existing namespace.
> > 
> > Only one callback registration is supported for now. Currently
> > vsock_loopback is the only user.
> 
> Why?
> 
> In general, commit descriptions (and code comments) should focus on the
> reason (why?), which also simplifies the review.
> 

Sounds good, will improve the message/comments. I'm realizing as I type
this there may be a way to avoid the callbacks altogether with
pernet_operations, so I'll clarify that before next rev.

> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > ---
> > Changes in v5:
> > - add callbacks code to avoid reverse dependency
> > - add logic for handling vsock_loopback setup for already existing
> >  namespaces
> > ---
> > include/net/af_vsock.h         |  34 +++++++++++++
> > include/net/netns/vsock.h      |   5 ++
> > net/vmw_vsock/af_vsock.c       | 110 +++++++++++++++++++++++++++++++++++++++++
> > net/vmw_vsock/vsock_loopback.c |  72 ++++++++++++++++++++++++---
> > 4 files changed, 213 insertions(+), 8 deletions(-)
> > 
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 83f873174ba3..9333a98b9a1e 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -305,4 +305,38 @@ static inline bool vsock_net_check_mode(struct net *n1, struct net *n2)
> > 	       (vsock_net_mode(n1) == VSOCK_NET_MODE_GLOBAL &&
> > 		vsock_net_mode(n2) == VSOCK_NET_MODE_GLOBAL);
> > }
> > +
> > +struct vsock_net_callbacks {
> > +	int (*init)(struct net *net);
> > +	void (*exit)(struct net *net);
> > +	struct module *owner;
> > +};
> > +
> > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
> > +
> > +#define vsock_register_net_callbacks(__init, __exit) \
> > +	__vsock_register_net_callbacks((__init), (__exit), THIS_MODULE)
> > +
> > +int __vsock_register_net_callbacks(int (*init)(struct net *net),
> > +				   void (*exit)(struct net *net),
> > +				   struct module *owner);
> > +void vsock_unregister_net_callbacks(void);
> > +
> > +#else
> > +
> > +#define vsock_register_net_callbacks(__init, __exit) do { } while (0)
> > +
> > +static inline int __vsock_register_net_callbacks(int (*init)(struct net *net),
> > +						 void (*exit)(struct net *net),
> > +						 struct module *owner)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline void vsock_unregister_net_callbacks(void) {}
> > +static inline int vsock_net_call_init(struct net *net) { return 0; }
> > +static inline void vsock_net_call_exit(struct net *net) {}
> > +
> > +#endif /* CONFIG_VSOCKETS_LOOPBACK */
> > +
> > #endif /* __AF_VSOCK_H__ */
> > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
> > index d4593c0b8dc4..08d9a933c540 100644
> > --- a/include/net/netns/vsock.h
> > +++ b/include/net/netns/vsock.h
> > @@ -9,6 +9,8 @@ enum vsock_net_mode {
> > 	VSOCK_NET_MODE_LOCAL,
> > };
> > 
> > +struct vsock_loopback;
> > +
> > struct netns_vsock {
> > 	struct ctl_table_header *vsock_hdr;
> > 	spinlock_t lock;
> > @@ -16,5 +18,8 @@ struct netns_vsock {
> > 	/* protected by lock */
> > 	enum vsock_net_mode mode;
> > 	bool written;
> > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
> > +	struct vsock_loopback *loopback;
> 
> If this is not protected by `lock`, please leave an empty line, but maybe we
> should consider using locking (see comment later).
> 

Will do.

> > +#endif
> > };
> > #endif /* __NET_NET_NAMESPACE_VSOCK_H */
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 68a8875c8106..5a73d9e1a96f 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -134,6 +134,9 @@
> > #include <uapi/linux/vm_sockets.h>
> > #include <uapi/asm-generic/ioctls.h>
> > 
> > +static struct vsock_net_callbacks vsock_net_callbacks;
> > +static DEFINE_MUTEX(vsock_net_callbacks_lock);
> > +
> > static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> > static void vsock_sk_destruct(struct sock *sk);
> > static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
> > @@ -2781,6 +2784,49 @@ static void vsock_net_init(struct net *net)
> > 	net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
> > }
> > 
> > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
> > +static int vsock_net_call_init(struct net *net)
> > +{
> > +	struct vsock_net_callbacks *cbs;
> > +	int ret;
> > +
> > +	mutex_lock(&vsock_net_callbacks_lock);
> > +	cbs = &vsock_net_callbacks;
> > +
> > +	ret = 0;
> > +	if (!cbs->owner)
> > +		goto out;
> > +
> > +	if (try_module_get(cbs->owner)) {
> > +		ret = cbs->init(net);
> > +		module_put(cbs->owner);
> > +	}
> > +
> > +out:
> > +	mutex_unlock(&vsock_net_callbacks_lock);
> > +	return ret;
> > +}
> > +
> > +static void vsock_net_call_exit(struct net *net)
> > +{
> > +	struct vsock_net_callbacks *cbs;
> > +
> > +	mutex_lock(&vsock_net_callbacks_lock);
> > +	cbs = &vsock_net_callbacks;
> > +
> > +	if (!cbs->owner)
> > +		goto out;
> > +
> > +	if (try_module_get(cbs->owner)) {
> > +		cbs->exit(net);
> > +		module_put(cbs->owner);
> > +	}
> > +
> > +out:
> > +	mutex_unlock(&vsock_net_callbacks_lock);
> > +}
> > +#endif /* CONFIG_VSOCKETS_LOOPBACK */
> > +
> > static __net_init int vsock_sysctl_init_net(struct net *net)
> > {
> > 	vsock_net_init(net);
> > @@ -2788,12 +2834,20 @@ static __net_init int vsock_sysctl_init_net(struct net *net)
> > 	if (vsock_sysctl_register(net))
> > 		return -ENOMEM;
> > 
> > +	if (vsock_net_call_init(net) < 0)
> > +		goto err_sysctl;
> > +
> > 	return 0;
> > +
> > +err_sysctl:
> > +	vsock_sysctl_unregister(net);
> > +	return -ENOMEM;
> > }
> > 
> > static __net_exit void vsock_sysctl_exit_net(struct net *net)
> > {
> > 	vsock_sysctl_unregister(net);
> > +	vsock_net_call_exit(net);
> > }
> > 
> > static struct pernet_operations vsock_sysctl_ops __net_initdata = {
> > @@ -2938,6 +2992,62 @@ void vsock_core_unregister(const struct vsock_transport *t)
> > }
> > EXPORT_SYMBOL_GPL(vsock_core_unregister);
> > 
> > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
> > +int __vsock_register_net_callbacks(int (*init)(struct net *net),
> > +				   void (*exit)(struct net *net),
> > +				   struct module *owner)
> > +{
> > +	struct vsock_net_callbacks *cbs;
> > +	struct net *net;
> > +	int ret = 0;
> > +
> > +	mutex_lock(&vsock_net_callbacks_lock);
> > +
> > +	cbs = &vsock_net_callbacks;
> > +	cbs->init = init;
> > +	cbs->exit = exit;
> > +	cbs->owner = owner;
> > +
> > +	/* call callbacks on any net previously created */
> > +	down_read(&net_rwsem);
> > +
> > +	if (try_module_get(cbs->owner)) {
> > +		for_each_net(net) {
> > +			ret = cbs->init(net);
> > +			if (ret < 0)
> > +				break;
> > +		}
> > +
> > +		if (ret < 0)
> > +			for_each_net(net)
> > +				cbs->exit(net);
> > +
> > +		module_put(cbs->owner);
> > +	}
> > +
> > +	up_read(&net_rwsem);
> > +	mutex_unlock(&vsock_net_callbacks_lock);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(__vsock_register_net_callbacks);
> > +
> > +void vsock_unregister_net_callbacks(void)
> > +{
> > +	struct vsock_net_callbacks *cbs;
> > +
> > +	mutex_lock(&vsock_net_callbacks_lock);
> > +
> > +	cbs = &vsock_net_callbacks;
> > +	cbs->init = NULL;
> > +	cbs->exit = NULL;
> > +	cbs->owner = NULL;
> > +
> > +	mutex_unlock(&vsock_net_callbacks_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(vsock_unregister_net_callbacks);
> 
> IIUC this function is called only in the error path of
> `vsock_loopback_init()`, but should we also call it in
> `vsock_loopback_exit()`?
> 

Ah right, that needs to be done there too.

> > +#endif /* CONFIG_VSOCKETS_LOOPBACK */
> > +
> > module_init(vsock_init);
> > module_exit(vsock_exit);
> > 
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index 1b2fab73e0d0..f16d21711cb0 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -28,8 +28,19 @@ static u32 vsock_loopback_get_local_cid(void)
> > 
> > static int vsock_loopback_send_pkt(struct sk_buff *skb)
> > {
> > -	struct vsock_loopback *vsock = &the_vsock_loopback;
> > +	struct vsock_loopback *vsock;
> > 	int len = skb->len;
> > +	struct net *net;
> > +
> > +	if (skb->sk)
> > +		net = sock_net(skb->sk);
> > +	else
> > +		net = NULL;
> 
> Why can't we use `virtio_vsock_skb_net` here?
> 

No reason why not. Using it should make it more uniform.

> > +
> > +	if (net && net->vsock.mode == VSOCK_NET_MODE_LOCAL)
> > +		vsock = net->vsock.loopback;
> > +	else
> > +		vsock = &the_vsock_loopback;
> > 
> > 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
> > 	queue_work(vsock->workqueue, &vsock->pkt_work);
> > @@ -134,27 +145,72 @@ static void vsock_loopback_work(struct work_struct *work)
> > 	}
> > }
> > 
> > -static int __init vsock_loopback_init(void)
> > +static int vsock_loopback_init_vsock(struct vsock_loopback *vsock)
> > {
> > -	struct vsock_loopback *vsock = &the_vsock_loopback;
> > -	int ret;
> > -
> > 	vsock->workqueue = alloc_workqueue("vsock-loopback", 0, 0);
> > 	if (!vsock->workqueue)
> > 		return -ENOMEM;
> > 
> > 	skb_queue_head_init(&vsock->pkt_queue);
> > 	INIT_WORK(&vsock->pkt_work, vsock_loopback_work);
> > +	return 0;
> > +}
> > +
> > +static void vsock_loopback_deinit_vsock(struct vsock_loopback *vsock)
> > +{
> > +	if (vsock->workqueue)
> > +		destroy_workqueue(vsock->workqueue);
> > +}
> > +
> > +/* called with vsock_net_callbacks lock held */
> > +static int vsock_loopback_init_net(struct net *net)
> > +{
> > +	if (WARN_ON_ONCE(net->vsock.loopback))
> > +		return 0;
> > +
> 
> Do we need some kind of locking here? I mean when reading/setting
> `net->vsock.loopback`?
> 

I could be wrong here, but I think net->vsock.loopback being set before
vsock_core_register() prevents racing with net->vsock.loopback reads. We
could add a lock to make sure and to make the protection explicit
though.

> > +	net->vsock.loopback = kmalloc(sizeof(*net->vsock.loopback), GFP_KERNEL);
> > +	if (!net->vsock.loopback)
> > +		return -ENOMEM;
> > +
> > +	return vsock_loopback_init_vsock(net->vsock.loopback);
> > +}
> > +
> > +/* called with vsock_net_callbacks lock held */
> > +static void vsock_loopback_exit_net(struct net *net)
> > +{
> > +	if (net->vsock.loopback) {
> > +		vsock_loopback_deinit_vsock(net->vsock.loopback);
> > +		kfree(net->vsock.loopback);
> 
> Should we set `net->vsock.loopback` to NULL here?
> 

Yes.

> > +	}
> > +}
> > +
> > +static int __init vsock_loopback_init(void)
> > +{
> > +	struct vsock_loopback *vsock = &the_vsock_loopback;
> > +	int ret;
> > +
> > +	ret = vsock_loopback_init_vsock(vsock);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	ret = vsock_register_net_callbacks(vsock_loopback_init_net,
> > +					   vsock_loopback_exit_net);
> 
> IIUC we need this only here because for now the only other transport
> supported is vhost-vsock, and IIUC `struct vhost_vsock *` there is handled
> with a map instead of a static variable, and `net` assigned when
> /dev/vhost-vsock is opened, right?
> 

Correct. The vhost map lookup is modified to account for namespaces, but
vsock loopback doesn't have a map to do that. The callbacks are used to
hook into the netns initialization.

I wonder if it is possible to do this with just pernet_operations
though... when I wrote this I was pretty laser-focused on the
sysctl/procfs + netns init code, and may not have realized there may be
similar hooks that aren't bound to the sysctl/proc init. I'll clarify
this before the next rev.


> If in the future we will need to support G2H transports, like
> virtio-transport, we need to do something similar, right?
> 

My guess is that we'll be able to avoid using these callbacks unless
there is some per-net data we need to initialize. I'm guessing if we
follow a similar path as using ioctl to assign the dev netns, then we
won't need it. It might be moot if pernet_operations work to avoid the
module circular dependency.

> BTW I think we really need to explain this better in the commit description.
> It took me a while to get all of this (if it's correct)
> 

Roger that, I'll improve this going forward.

Best,
Bobby


* Re: [PATCH net-next v5 9/9] selftests/vsock: add namespace tests
  2025-09-02 15:40   ` Stefano Garzarella
@ 2025-09-02 18:10     ` Bobby Eshleman
  0 siblings, 0 replies; 18+ messages in thread
From: Bobby Eshleman @ 2025-09-02 18:10 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, virtualization, netdev,
	linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
	Bobby Eshleman

On Tue, Sep 02, 2025 at 05:40:38PM +0200, Stefano Garzarella wrote:
> On Wed, Aug 27, 2025 at 05:31:37PM -0700, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > Add tests for namespace support in vsock. Use socat for basic connection
> 
> Are netns tests skipped if the kernel doesn't support it?

No, will fix in next rev.

Thanks,
Bobby


* Re: [PATCH net-next v5 4/9] vsock/loopback: add netns support
  2025-09-02 18:09     ` Bobby Eshleman
@ 2025-09-03 15:10       ` Stefano Garzarella
  0 siblings, 0 replies; 18+ messages in thread
From: Stefano Garzarella @ 2025-09-03 15:10 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Shuah Khan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, virtualization, netdev,
	linux-kselftest, linux-kernel, kvm, linux-hyperv, berrange,
	Bobby Eshleman

On Tue, Sep 02, 2025 at 11:09:36AM -0700, Bobby Eshleman wrote:
>On Tue, Sep 02, 2025 at 05:39:33PM +0200, Stefano Garzarella wrote:
>> On Wed, Aug 27, 2025 at 05:31:32PM -0700, Bobby Eshleman wrote:
>> > From: Bobby Eshleman <bobbyeshleman@meta.com>
>> >
>> > Add NS support to vsock loopback. Sockets in a global mode netns
>> > communicate with each other, regardless of namespace. Sockets in a local
>> > mode netns may only communicate with other sockets within the same
>> > namespace.
>> >
>> > Add callbacks for transport to hook into the initialization and exit of
>> > net namespaces.
>> >
>> > The transport's init hook will be called once per netns init. Likewise
>> > for exit.
>> >
>> > When a set of init/exit callbacks is registered, the init callback is
>> > called on each already existing namespace.
>> >
>> > Only one callback registration is supported for now. Currently
>> > vsock_loopback is the only user.
>>
>> Why?
>>
>> In general, commit descriptions (and code comments) should focus on the
>> reason (why?), which also simplifies the review.
>>
>
>Sounds good, will improve the message/comments. I'm realizing as I type
>this there may be a way to avoid the callbacks altogether with
>pernet_operations, so I'll clarify that before next rev.

Yeah, that would be great.

>
>> >
>> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>> >
>> > ---
>> > Changes in v5:
>> > - add callbacks code to avoid reverse dependency
>> > - add logic for handling vsock_loopback setup for already existing
>> >  namespaces
>> > ---
>> > include/net/af_vsock.h         |  34 +++++++++++++
>> > include/net/netns/vsock.h      |   5 ++
>> > net/vmw_vsock/af_vsock.c       | 110 +++++++++++++++++++++++++++++++++++++++++
>> > net/vmw_vsock/vsock_loopback.c |  72 ++++++++++++++++++++++++---
>> > 4 files changed, 213 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> > index 83f873174ba3..9333a98b9a1e 100644
>> > --- a/include/net/af_vsock.h
>> > +++ b/include/net/af_vsock.h
>> > @@ -305,4 +305,38 @@ static inline bool vsock_net_check_mode(struct net *n1, struct net *n2)
>> > 	       (vsock_net_mode(n1) == VSOCK_NET_MODE_GLOBAL &&
>> > 		vsock_net_mode(n2) == VSOCK_NET_MODE_GLOBAL);
>> > }
>> > +
>> > +struct vsock_net_callbacks {
>> > +	int (*init)(struct net *net);
>> > +	void (*exit)(struct net *net);
>> > +	struct module *owner;
>> > +};
>> > +
>> > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
>> > +
>> > +#define vsock_register_net_callbacks(__init, __exit) \
>> > +	__vsock_register_net_callbacks((__init), (__exit), THIS_MODULE)
>> > +
>> > +int __vsock_register_net_callbacks(int (*init)(struct net *net),
>> > +				   void (*exit)(struct net *net),
>> > +				   struct module *owner);
>> > +void vsock_unregister_net_callbacks(void);
>> > +
>> > +#else
>> > +
>> > +#define vsock_register_net_callbacks(__init, __exit) do { } while (0)
>> > +
>> > +static inline int __vsock_register_net_callbacks(int (*init)(struct net *net),
>> > +						 void (*exit)(struct net *net),
>> > +						 struct module *owner)
>> > +{
>> > +	return 0;
>> > +}
>> > +
>> > +static inline void vsock_unregister_net_callbacks(void) {}
>> > +static inline int vsock_net_call_init(struct net *net) { return 0; }
>> > +static inline void vsock_net_call_exit(struct net *net) {}
>> > +
>> > +#endif /* CONFIG_VSOCKETS_LOOPBACK */
>> > +
>> > #endif /* __AF_VSOCK_H__ */
>> > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>> > index d4593c0b8dc4..08d9a933c540 100644
>> > --- a/include/net/netns/vsock.h
>> > +++ b/include/net/netns/vsock.h
>> > @@ -9,6 +9,8 @@ enum vsock_net_mode {
>> > 	VSOCK_NET_MODE_LOCAL,
>> > };
>> >
>> > +struct vsock_loopback;
>> > +
>> > struct netns_vsock {
>> > 	struct ctl_table_header *vsock_hdr;
>> > 	spinlock_t lock;
>> > @@ -16,5 +18,8 @@ struct netns_vsock {
>> > 	/* protected by lock */
>> > 	enum vsock_net_mode mode;
>> > 	bool written;
>> > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
>> > +	struct vsock_loopback *loopback;
>>
>> If this is not protected by `lock`, please leave an empty line, but maybe we
>> should consider using locking (see comment later).
>>
>
>Will do.
>
>> > +#endif
>> > };
>> > #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > index 68a8875c8106..5a73d9e1a96f 100644
>> > --- a/net/vmw_vsock/af_vsock.c
>> > +++ b/net/vmw_vsock/af_vsock.c
>> > @@ -134,6 +134,9 @@
>> > #include <uapi/linux/vm_sockets.h>
>> > #include <uapi/asm-generic/ioctls.h>
>> >
>> > +static struct vsock_net_callbacks vsock_net_callbacks;
>> > +static DEFINE_MUTEX(vsock_net_callbacks_lock);
>> > +
>> > static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
>> > static void vsock_sk_destruct(struct sock *sk);
>> > static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
>> > @@ -2781,6 +2784,49 @@ static void vsock_net_init(struct net *net)
>> > 	net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
>> > }
>> >
>> > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
>> > +static int vsock_net_call_init(struct net *net)
>> > +{
>> > +	struct vsock_net_callbacks *cbs;
>> > +	int ret;
>> > +
>> > +	mutex_lock(&vsock_net_callbacks_lock);
>> > +	cbs = &vsock_net_callbacks;
>> > +
>> > +	ret = 0;
>> > +	if (!cbs->owner)
>> > +		goto out;
>> > +
>> > +	if (try_module_get(cbs->owner)) {
>> > +		ret = cbs->init(net);
>> > +		module_put(cbs->owner);
>> > +	}
>> > +
>> > +out:
>> > +	mutex_unlock(&vsock_net_callbacks_lock);
>> > +	return ret;
>> > +}
>> > +
>> > +static void vsock_net_call_exit(struct net *net)
>> > +{
>> > +	struct vsock_net_callbacks *cbs;
>> > +
>> > +	mutex_lock(&vsock_net_callbacks_lock);
>> > +	cbs = &vsock_net_callbacks;
>> > +
>> > +	if (!cbs->owner)
>> > +		goto out;
>> > +
>> > +	if (try_module_get(cbs->owner)) {
>> > +		cbs->exit(net);
>> > +		module_put(cbs->owner);
>> > +	}
>> > +
>> > +out:
>> > +	mutex_unlock(&vsock_net_callbacks_lock);
>> > +}
>> > +#endif /* CONFIG_VSOCKETS_LOOPBACK */
>> > +
>> > static __net_init int vsock_sysctl_init_net(struct net *net)
>> > {
>> > 	vsock_net_init(net);
>> > @@ -2788,12 +2834,20 @@ static __net_init int vsock_sysctl_init_net(struct net *net)
>> > 	if (vsock_sysctl_register(net))
>> > 		return -ENOMEM;
>> >
>> > +	if (vsock_net_call_init(net) < 0)
>> > +		goto err_sysctl;
>> > +
>> > 	return 0;
>> > +
>> > +err_sysctl:
>> > +	vsock_sysctl_unregister(net);
>> > +	return -ENOMEM;
>> > }
>> >
>> > static __net_exit void vsock_sysctl_exit_net(struct net *net)
>> > {
>> > 	vsock_sysctl_unregister(net);
>> > +	vsock_net_call_exit(net);
>> > }
>> >
>> > static struct pernet_operations vsock_sysctl_ops __net_initdata = {
>> > @@ -2938,6 +2992,62 @@ void vsock_core_unregister(const struct vsock_transport *t)
>> > }
>> > EXPORT_SYMBOL_GPL(vsock_core_unregister);
>> >
>> > +#if IS_ENABLED(CONFIG_VSOCKETS_LOOPBACK)
>> > +int __vsock_register_net_callbacks(int (*init)(struct net *net),
>> > +				   void (*exit)(struct net *net),
>> > +				   struct module *owner)
>> > +{
>> > +	struct vsock_net_callbacks *cbs;
>> > +	struct net *net;
>> > +	int ret = 0;
>> > +
>> > +	mutex_lock(&vsock_net_callbacks_lock);
>> > +
>> > +	cbs = &vsock_net_callbacks;
>> > +	cbs->init = init;
>> > +	cbs->exit = exit;
>> > +	cbs->owner = owner;
>> > +
>> > +	/* call callbacks on any net previously created */
>> > +	down_read(&net_rwsem);
>> > +
>> > +	if (try_module_get(cbs->owner)) {
>> > +		for_each_net(net) {
>> > +			ret = cbs->init(net);
>> > +			if (ret < 0)
>> > +				break;
>> > +		}
>> > +
>> > +		if (ret < 0)
>> > +			for_each_net(net)
>> > +				cbs->exit(net);
>> > +
>> > +		module_put(cbs->owner);
>> > +	}
>> > +
>> > +	up_read(&net_rwsem);
>> > +	mutex_unlock(&vsock_net_callbacks_lock);
>> > +
>> > +	return ret;
>> > +}
>> > +EXPORT_SYMBOL_GPL(__vsock_register_net_callbacks);
>> > +
>> > +void vsock_unregister_net_callbacks(void)
>> > +{
>> > +	struct vsock_net_callbacks *cbs;
>> > +
>> > +	mutex_lock(&vsock_net_callbacks_lock);
>> > +
>> > +	cbs = &vsock_net_callbacks;
>> > +	cbs->init = NULL;
>> > +	cbs->exit = NULL;
>> > +	cbs->owner = NULL;
>> > +
>> > +	mutex_unlock(&vsock_net_callbacks_lock);
>> > +}
>> > +EXPORT_SYMBOL_GPL(vsock_unregister_net_callbacks);
>>
>> IIUC this function is called only in the error path of
>> `vsock_loopback_init()`, but should we also call it in
>> `vsock_loopback_exit()`?
>>
>
>Ah right, that needs to be done there too.
>
>> > +#endif /* CONFIG_VSOCKETS_LOOPBACK */
>> > +
>> > module_init(vsock_init);
>> > module_exit(vsock_exit);
>> >
>> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>> > index 1b2fab73e0d0..f16d21711cb0 100644
>> > --- a/net/vmw_vsock/vsock_loopback.c
>> > +++ b/net/vmw_vsock/vsock_loopback.c
>> > @@ -28,8 +28,19 @@ static u32 vsock_loopback_get_local_cid(void)
>> >
>> > static int vsock_loopback_send_pkt(struct sk_buff *skb)
>> > {
>> > -	struct vsock_loopback *vsock = &the_vsock_loopback;
>> > +	struct vsock_loopback *vsock;
>> > 	int len = skb->len;
>> > +	struct net *net;
>> > +
>> > +	if (skb->sk)
>> > +		net = sock_net(skb->sk);
>> > +	else
>> > +		net = NULL;
>>
>> Why can't we use `virtio_vsock_skb_net` here?
>>
>
>No reason why not. Using it should make it more uniform.
>
>> > +
>> > +	if (net && net->vsock.mode == VSOCK_NET_MODE_LOCAL)
>> > +		vsock = net->vsock.loopback;
>> > +	else
>> > +		vsock = &the_vsock_loopback;
>> >
>> > 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
>> > 	queue_work(vsock->workqueue, &vsock->pkt_work);
>> > @@ -134,27 +145,72 @@ static void vsock_loopback_work(struct work_struct *work)
>> > 	}
>> > }
>> >
>> > -static int __init vsock_loopback_init(void)
>> > +static int vsock_loopback_init_vsock(struct vsock_loopback *vsock)
>> > {
>> > -	struct vsock_loopback *vsock = &the_vsock_loopback;
>> > -	int ret;
>> > -
>> > 	vsock->workqueue = alloc_workqueue("vsock-loopback", 0, 0);
>> > 	if (!vsock->workqueue)
>> > 		return -ENOMEM;
>> >
>> > 	skb_queue_head_init(&vsock->pkt_queue);
>> > 	INIT_WORK(&vsock->pkt_work, vsock_loopback_work);
>> > +	return 0;
>> > +}
>> > +
>> > +static void vsock_loopback_deinit_vsock(struct vsock_loopback *vsock)
>> > +{
>> > +	if (vsock->workqueue)
>> > +		destroy_workqueue(vsock->workqueue);
>> > +}
>> > +
>> > +/* called with vsock_net_callbacks lock held */
>> > +static int vsock_loopback_init_net(struct net *net)
>> > +{
>> > +	if (WARN_ON_ONCE(net->vsock.loopback))
>> > +		return 0;
>> > +
>>
>> Do we need some kind of locking here? I mean when reading/setting
>> `net->vsock.loopback`?
>>
>
>I could be wrong here, but I think net->vsock.loopback being set before
>vsock_core_register() prevents racing with net->vsock.loopback reads. We
>could add a lock to make sure and to make the protection explicit
>though.

I see. Talking about vsock_core_register(), I was thinking about
extending it, maybe passing a struct with all parameters (e.g. transport
type, net callbacks, etc.). In this way we can easily check if the type
of transport is allowed to register net callbacks or not.

Also because currently we don't do any check in
__vsock_register_net_callbacks() about the transport type, or even about
overriding already registered callbacks.
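
A userspace sketch of that idea, purely illustrative: a single registration
call takes a parameter struct, rejects net callbacks from transport types
that are not allowed to have them, and refuses to override callbacks that
are already registered. All names here (`struct transport_reg`,
`transport_register()`, `enum transport_kind`, the demo callbacks) are
hypothetical, and the policy that only the loopback-style "local" transport
may register callbacks is an assumption taken from the current single user.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model; not the kernel's vsock_core_register() signature. */
enum transport_kind { TRANSPORT_LOCAL, TRANSPORT_H2G, TRANSPORT_G2H };

struct net_model; /* opaque stand-in for a network namespace */

struct transport_reg {
	enum transport_kind kind;
	int (*net_init)(struct net_model *net);  /* optional */
	void (*net_exit)(struct net_model *net); /* optional */
};

/* The one registration that currently owns the net callbacks, if any. */
static const struct transport_reg *net_cb_owner;

static int transport_register(const struct transport_reg *reg)
{
	if (reg->net_init || reg->net_exit) {
		/* Callbacks must come as a pair. */
		if (!reg->net_init || !reg->net_exit)
			return -EINVAL;
		/* Assumed policy: only the local transport may hook netns. */
		if (reg->kind != TRANSPORT_LOCAL)
			return -EOPNOTSUPP;
		/* Never silently override an existing registration. */
		if (net_cb_owner)
			return -EBUSY;
		net_cb_owner = reg;
	}

	return 0;
}

static void transport_unregister(const struct transport_reg *reg)
{
	if (net_cb_owner == reg)
		net_cb_owner = NULL;
}

/* Demo callbacks used only to exercise the sketch. */
static int demo_net_init(struct net_model *net) { (void)net; return 0; }
static void demo_net_exit(struct net_model *net) { (void)net; }
```

With one entry point, the type check and the override check live in one
place, so a transport cannot register callbacks through a side door.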

>
>> > +	net->vsock.loopback = kmalloc(sizeof(*net->vsock.loopback), GFP_KERNEL);
>> > +	if (!net->vsock.loopback)
>> > +		return -ENOMEM;
>> > +
>> > +	return vsock_loopback_init_vsock(net->vsock.loopback);
>> > +}
>> > +
>> > +/* called with vsock_net_callbacks lock held */
>> > +static void vsock_loopback_exit_net(struct net *net)
>> > +{
>> > +	if (net->vsock.loopback) {
>> > +		vsock_loopback_deinit_vsock(net->vsock.loopback);
>> > +		kfree(net->vsock.loopback);
>>
>> Should we set `net->vsock.loopback` to NULL here?
>>
>
>Yes.
>
>> > +	}
>> > +}
>> > +
>> > +static int __init vsock_loopback_init(void)
>> > +{
>> > +	struct vsock_loopback *vsock = &the_vsock_loopback;
>> > +	int ret;
>> > +
>> > +	ret = vsock_loopback_init_vsock(vsock);
>> > +	if (ret < 0)
>> > +		return ret;
>> > +
>> > +	ret = vsock_register_net_callbacks(vsock_loopback_init_net,
>> > +					   vsock_loopback_exit_net);
>>
>> IIUC we need this only here because for now the only other transport
>> supported is vhost-vsock, and IIUC `struct vhost_vsock *` there is handled
>> with a map instead of a static variable, and `net` assigned when
>> /dev/vhost-vsock is opened, right?
>>
>
>Correct. The vhost map lookup is modified to account for namespaces, but
>vsock loopback doesn't have a map to do that. The callbacks are used to
>hook into the netns initialization.
>
>I wonder if it is possible to do this with just pernet_operations
>though... when I wrote this I was pretty laser-focused on the
>sysctl/procfs + netns init code, and may not have realized there may be
>similar hooks that aren't bound to the sysctl/proc init. I'll clarify
>this before the next rev.

I like the idea of removing vsock_register_net_callbacks() if possible,
but if it's not possible I'd like to reuse vsock_core_register() as much
as possible, rather than adding a new registration function where it is
not clear when the transport needs to call it.

So, to be clear, I'd like to have a single registration function that
transports need to call (if possible).

>
>
>> If in the future we will need to support G2H transports, like
>> virtio-transport, we need to do something similar, right?
>>
>
>My guess is that we'll be able to avoid using these callbacks unless
>there is some per-net data we need to initialize. I'm guessing if we
>follow a similar path as using ioctl to assign the dev netns, then we
>won't need it. It might be moot if pernet_operations work to avoid the
>module circular dependency.

Cool!

>
>> BTW I think we really need to explain this better in the commit description.
>> It took me a while to get all of this (if it's correct)
>>
>
>Roger that, I'll improve this going forward.

Thanks,
Stefano



end of thread (~2025-09-03 15:10 UTC)

Thread overview: 18+ messages
2025-08-28  0:31 [PATCH net-next v5 0/9] vsock: add namespace support to vhost-vsock Bobby Eshleman
2025-08-28  0:31 ` [PATCH net-next v5 1/9] vsock: a per-net vsock NS mode state Bobby Eshleman
2025-08-28  0:31 ` [PATCH net-next v5 2/9] vsock: add net to vsock skb cb Bobby Eshleman
2025-08-28  0:31 ` [PATCH net-next v5 3/9] vsock: add netns to vsock core Bobby Eshleman
2025-09-02 15:39   ` Stefano Garzarella
2025-09-02 17:10     ` Bobby Eshleman
2025-08-28  0:31 ` [PATCH net-next v5 4/9] vsock/loopback: add netns support Bobby Eshleman
2025-08-28 10:35   ` kernel test robot
2025-09-02 15:39   ` Stefano Garzarella
2025-09-02 18:09     ` Bobby Eshleman
2025-09-03 15:10       ` Stefano Garzarella
2025-08-28  0:31 ` [PATCH net-next v5 5/9] vsock/virtio: add netns to virtio transport common Bobby Eshleman
2025-08-28  0:31 ` [PATCH net-next v5 6/9] vhost/vsock: add netns support Bobby Eshleman
2025-08-28  0:31 ` [PATCH net-next v5 7/9] selftests/vsock: improve logging in vmtest.sh Bobby Eshleman
2025-08-28  0:31 ` [PATCH net-next v5 8/9] selftests/vsock: invoke vsock_test through helpers Bobby Eshleman
2025-08-28  0:31 ` [PATCH net-next v5 9/9] selftests/vsock: add namespace tests Bobby Eshleman
2025-09-02 15:40   ` Stefano Garzarella
2025-09-02 18:10     ` Bobby Eshleman
