[PATCH net-next 0/3] vsock: support network namespace

* [PATCH net-next 0/3] vsock: support network namespace
@ 2020-01-16 17:24 Stefano Garzarella
  2020-01-16 17:24 ` [PATCH net-next 1/3] vsock: add network namespace support Stefano Garzarella
                   ` (5 more replies)
  0 siblings, 6 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-16 17:24 UTC (permalink / raw)
  To: davem, netdev
  Cc: linux-kernel, Jorgen Hansen, Jason Wang, kvm, Stefan Hajnoczi,
	Stefano Garzarella, virtualization, linux-hyperv,
	Michael S. Tsirkin, Dexuan Cui

RFC -> v1:
 * added 'netns' module param to vsock.ko to enable the
   network namespace support (disabled by default)
 * added 'vsock_net_eq()' to check the "net" assigned to a socket
   only when 'netns' support is enabled

RFC: https://patchwork.ozlabs.org/cover/1202235/

Now that we have multi-transport upstream, I started looking at
supporting network namespaces in vsock.

As we partially discussed in the multi-transport proposal [1], it could
be nice to support network namespace in vsock to reach the following
goals:
- isolate host applications from guest applications using the same ports
  with CID_ANY
- assign the same CID to VMs running in different network namespaces
- partition VMs between VMMs or at finer granularity

This new feature is disabled by default, because it changes vsock's
behavior with network namespaces and could break existing applications.
It can be enabled with the new 'netns' module parameter of vsock.ko.

This implementation provides the following behavior:
- packets received from the host (received by G2H transports) are
  assigned to the default netns (init_net)
- packets received from the guest (received by H2G - vhost-vsock) are
  assigned to the netns of the process that opens /dev/vhost-vsock
  (usually the VMM, qemu in my tests, opens the /dev/vhost-vsock)
    - for vmci I need some suggestions, because I don't know how to
      implement and test the same behavior in the vmci driver; for now
      vmci uses init_net
- loopback packets are exchanged only in the same netns
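
The delivery rules above can be summarized with a small user-space
model. This is only a sketch, not kernel code: 'struct netns_id',
'enum pkt_source' and 'rx_netns()' are hypothetical names invented for
illustration, and the model ignores vmci, which always uses init_net
in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net; only identity matters. */
struct netns_id { int id; };

/* Where a received packet came from (illustrative names, not kernel enums). */
enum pkt_source { FROM_HOST_G2H, FROM_GUEST_H2G, FROM_LOOPBACK };

/* Models the default namespace (init_net). */
struct netns_id init_netns = { 0 };

/*
 * Pick the namespace in which a received packet is delivered:
 *  - G2H (packet from the host):   default namespace
 *  - H2G (packet from a guest):    namespace of the /dev/vhost-vsock opener
 *  - loopback:                     namespace of the sending socket
 */
const struct netns_id *rx_netns(enum pkt_source src,
				const struct netns_id *opener_net,
				const struct netns_id *sender_net)
{
	switch (src) {
	case FROM_GUEST_H2G:
		assert(opener_net != NULL);
		return opener_net;
	case FROM_LOOPBACK:
		assert(sender_net != NULL);
		return sender_net;
	case FROM_HOST_G2H:
	default:
		return &init_netns;
	}
}
```

In the real series the same decision is spread across the transports,
each of which sets pkt->net before calling virtio_transport_recv_pkt().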

I tested the series in this way:
l0_host$ qemu-system-x86_64 -m 4G -M accel=kvm -smp 4 \
            -drive file=/tmp/vsockvm0.img,if=virtio --nographic \
            -device vhost-vsock-pci,guest-cid=3

l1_vm$ echo 1 > /sys/module/vsock/parameters/netns

l1_vm$ ip netns add ns1
l1_vm$ ip netns add ns2
 # same CID on different netns
l1_vm$ ip netns exec ns1 qemu-system-x86_64 -m 1G -M accel=kvm -smp 2 \
            -drive file=/tmp/vsockvm1.img,if=virtio --nographic \
            -device vhost-vsock-pci,guest-cid=4
l1_vm$ ip netns exec ns2 qemu-system-x86_64 -m 1G -M accel=kvm -smp 2 \
            -drive file=/tmp/vsockvm2.img,if=virtio --nographic \
            -device vhost-vsock-pci,guest-cid=4

 # all iperf3 listen on CID_ANY and port 5201, but in different netns
l1_vm$ ./iperf3 --vsock -s # connection from l0 or guests started
                           # on default netns (init_net)
l1_vm$ ip netns exec ns1 ./iperf3 --vsock -s
l1_vm$ ip netns exec ns2 ./iperf3 --vsock -s

l0_host$ ./iperf3 --vsock -c 3
l2_vm1$ ./iperf3 --vsock -c 2
l2_vm2$ ./iperf3 --vsock -c 2

[1] https://www.spinics.net/lists/netdev/msg575792.html

Stefano Garzarella (3):
  vsock: add network namespace support
  vsock/virtio_transport_common: handle netns of received packets
  vhost/vsock: use netns of process that opens the vhost-vsock device

 drivers/vhost/vsock.c                   | 29 ++++++++++++-----
 include/linux/virtio_vsock.h            |  2 ++
 include/net/af_vsock.h                  |  7 +++--
 net/vmw_vsock/af_vsock.c                | 41 +++++++++++++++++++------
 net/vmw_vsock/hyperv_transport.c        |  5 +--
 net/vmw_vsock/virtio_transport.c        |  2 ++
 net/vmw_vsock/virtio_transport_common.c | 12 ++++++--
 net/vmw_vsock/vmci_transport.c          |  5 +--
 8 files changed, 78 insertions(+), 25 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-16 17:24 [PATCH net-next 0/3] vsock: support network namespace Stefano Garzarella
@ 2020-01-16 17:24 ` Stefano Garzarella
  2020-01-20  9:06   ` David Miller
  2025-03-05  7:32   ` Michael S. Tsirkin
  2020-01-16 17:24 ` [PATCH net-next 2/3] vsock/virtio_transport_common: handle netns of received packets Stefano Garzarella
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-16 17:24 UTC (permalink / raw)
  To: davem, netdev
  Cc: linux-kernel, Jorgen Hansen, Jason Wang, kvm, Stefan Hajnoczi,
	Stefano Garzarella, virtualization, linux-hyperv,
	Michael S. Tsirkin, Dexuan Cui

This patch adds a check of the "net" assigned to a socket in
vsock_find_bound_socket() and vsock_find_connected_socket()
to support network namespaces, allowing the same address
(cid, port) to be shared across different network namespaces.

This patch adds 'netns' module param to enable this new feature
(disabled by default), because it changes vsock's behavior with
network namespaces and could break existing applications.

G2H transports will use the default network namespace (init_net).
H2G transports can use a different network namespace for each
VM.

This patch uses the default network namespace (init_net) in all
transports.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
RFC -> v1
 * added 'netns' module param
 * added 'vsock_net_eq()' to check the "net" assigned to a socket
   only when 'netns' support is enabled
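
To illustrate the effect of the 'vsock_net_eq()' check on the bound
socket lookup, here is a simplified user-space model. It is a sketch
under assumptions: 'struct net_model', 'struct bound_sock',
'net_eq_model()' and 'find_bound()' are hypothetical names, the table
is a flat array instead of the kernel hash table, and the exact-match
and CID_ANY checks are merged into one condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net; namespaces compare by pointer. */
struct net_model { int id; };

struct bound_sock {
	unsigned int cid;
	unsigned int port;
	const struct net_model *net;
};

/* Same value as VMADDR_CID_ANY (-1U). */
#define CID_ANY_MODEL 0xFFFFFFFFu

/* Models the 'netns' module parameter (disabled by default). */
static bool netns_enabled;

/* Mirrors vsock_net_eq(): always true while the feature is disabled. */
bool net_eq_model(const struct net_model *a, const struct net_model *b)
{
	return !netns_enabled || a == b;
}

/* Return the first socket bound to (cid, port) that is visible from 'net'. */
const struct bound_sock *find_bound(const struct bound_sock *tbl, size_t n,
				    unsigned int cid, unsigned int port,
				    const struct net_model *net)
{
	size_t i;

	assert(tbl != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct bound_sock *s = &tbl[i];

		if (s->port != port || !net_eq_model(net, s->net))
			continue;

		if (s->cid == cid || s->cid == CID_ANY_MODEL ||
		    cid == CID_ANY_MODEL)
			return s;
	}

	return NULL;
}
```

With the feature enabled, two listeners on CID_ANY and port 5201 in
different namespaces no longer collide: each lookup only sees the
listener in its own namespace. With it disabled, the first match wins
regardless of namespace, as before.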
---
 include/net/af_vsock.h                  |  7 +++--
 net/vmw_vsock/af_vsock.c                | 41 +++++++++++++++++++------
 net/vmw_vsock/hyperv_transport.c        |  5 +--
 net/vmw_vsock/virtio_transport_common.c |  5 +--
 net/vmw_vsock/vmci_transport.c          |  5 +--
 5 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index b1c717286993..015913601fad 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -193,13 +193,16 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected);
 void vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
-					 struct sockaddr_vm *dst);
+					 struct sockaddr_vm *dst,
+					 struct net *net);
 void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
 int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
 bool vsock_find_cid(unsigned int cid);
+bool vsock_net_eq(const struct net *net1, const struct net *net2);
+struct net *vsock_default_net(void);
 
 /**** TAP ****/
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 9c5b2a91baad..457ccd677756 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -140,6 +140,10 @@ static const struct vsock_transport *transport_dgram;
 static const struct vsock_transport *transport_local;
 static DEFINE_MUTEX(vsock_register_mutex);
 
+static bool netns;
+module_param(netns, bool, 0644);
+MODULE_PARM_DESC(netns, "Enable network namespace support");
+
 /**** UTILS ****/
 
 /* Each bound VSocket is stored in the bind hash table and each connected
@@ -226,15 +230,18 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
 	sock_put(&vsk->sk);
 }
 
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr,
+					      struct net *net)
 {
 	struct vsock_sock *vsk;
 
 	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
-		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
+		if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
+		    vsock_net_eq(net, sock_net(sk_vsock(vsk))))
 			return sk_vsock(vsk);
 
 		if (addr->svm_port == vsk->local_addr.svm_port &&
+		    vsock_net_eq(net, sock_net(sk_vsock(vsk))) &&
 		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
 		     addr->svm_cid == VMADDR_CID_ANY))
 			return sk_vsock(vsk);
@@ -244,13 +251,15 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
 }
 
 static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
-						  struct sockaddr_vm *dst)
+						  struct sockaddr_vm *dst,
+						  struct net *net)
 {
 	struct vsock_sock *vsk;
 
 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
 			    connected_table) {
 		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
+		    vsock_net_eq(net, sock_net(sk_vsock(vsk))) &&
 		    dst->svm_port == vsk->local_addr.svm_port) {
 			return sk_vsock(vsk);
 		}
@@ -295,12 +304,12 @@ void vsock_remove_connected(struct vsock_sock *vsk)
 }
 EXPORT_SYMBOL_GPL(vsock_remove_connected);
 
-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net)
 {
 	struct sock *sk;
 
 	spin_lock_bh(&vsock_table_lock);
-	sk = __vsock_find_bound_socket(addr);
+	sk = __vsock_find_bound_socket(addr, net);
 	if (sk)
 		sock_hold(sk);
 
@@ -311,12 +320,13 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
 EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
 
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
-					 struct sockaddr_vm *dst)
+					 struct sockaddr_vm *dst,
+					 struct net *net)
 {
 	struct sock *sk;
 
 	spin_lock_bh(&vsock_table_lock);
-	sk = __vsock_find_connected_socket(src, dst);
+	sk = __vsock_find_connected_socket(src, dst, net);
 	if (sk)
 		sock_hold(sk);
 
@@ -488,6 +498,18 @@ bool vsock_find_cid(unsigned int cid)
 }
 EXPORT_SYMBOL_GPL(vsock_find_cid);
 
+bool vsock_net_eq(const struct net *net1, const struct net *net2)
+{
+	return !netns || net_eq(net1, net2);
+}
+EXPORT_SYMBOL_GPL(vsock_net_eq);
+
+struct net *vsock_default_net(void)
+{
+	return &init_net;
+}
+EXPORT_SYMBOL_GPL(vsock_default_net);
+
 static struct sock *vsock_dequeue_accept(struct sock *listener)
 {
 	struct vsock_sock *vlistener;
@@ -586,6 +608,7 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
 {
 	static u32 port;
 	struct sockaddr_vm new_addr;
+	struct net *net = sock_net(sk_vsock(vsk));
 
 	if (!port)
 		port = LAST_RESERVED_PORT + 1 +
@@ -603,7 +626,7 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
 
 			new_addr.svm_port = port++;
 
-			if (!__vsock_find_bound_socket(&new_addr)) {
+			if (!__vsock_find_bound_socket(&new_addr, net)) {
 				found = true;
 				break;
 			}
@@ -620,7 +643,7 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
 			return -EACCES;
 		}
 
-		if (__vsock_find_bound_socket(&new_addr))
+		if (__vsock_find_bound_socket(&new_addr, net))
 			return -EADDRINUSE;
 	}
 
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index b3bdae74c243..237c53316d70 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -201,7 +201,8 @@ static void hvs_remote_addr_init(struct sockaddr_vm *remote,
 
 		remote->svm_port = host_ephemeral_port++;
 
-		sk = vsock_find_connected_socket(remote, local);
+		sk = vsock_find_connected_socket(remote, local,
+						 vsock_default_net());
 		if (!sk) {
 			/* Found an available ephemeral port */
 			return;
@@ -350,7 +351,7 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		return;
 
 	hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
-	sk = vsock_find_bound_socket(&addr);
+	sk = vsock_find_bound_socket(&addr, vsock_default_net());
 	if (!sk)
 		return;
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index d9f0c9c5425a..cecdfd91ed00 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1088,6 +1088,7 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
 void virtio_transport_recv_pkt(struct virtio_transport *t,
 			       struct virtio_vsock_pkt *pkt)
 {
+	struct net *net = vsock_default_net();
 	struct sockaddr_vm src, dst;
 	struct vsock_sock *vsk;
 	struct sock *sk;
@@ -1115,9 +1116,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 	/* The socket must be in connected or bound table
 	 * otherwise send reset back
 	 */
-	sk = vsock_find_connected_socket(&src, &dst);
+	sk = vsock_find_connected_socket(&src, &dst, net);
 	if (!sk) {
-		sk = vsock_find_bound_socket(&dst);
+		sk = vsock_find_bound_socket(&dst, net);
 		if (!sk) {
 			(void)virtio_transport_reset_no_sock(t, pkt);
 			goto free_pkt;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 4b8b1150a738..3ad15d51b30b 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -669,6 +669,7 @@ static bool vmci_transport_stream_allow(u32 cid, u32 port)
 
 static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
 {
+	struct net *net = vsock_default_net();
 	struct sock *sk;
 	struct sockaddr_vm dst;
 	struct sockaddr_vm src;
@@ -702,9 +703,9 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
 	vsock_addr_init(&src, pkt->dg.src.context, pkt->src_port);
 	vsock_addr_init(&dst, pkt->dg.dst.context, pkt->dst_port);
 
-	sk = vsock_find_connected_socket(&src, &dst);
+	sk = vsock_find_connected_socket(&src, &dst, net);
 	if (!sk) {
-		sk = vsock_find_bound_socket(&dst);
+		sk = vsock_find_bound_socket(&dst, net);
 		if (!sk) {
 			/* We could not find a socket for this specified
 			 * address.  If this packet is a RST, we just drop it.
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next 2/3] vsock/virtio_transport_common: handle netns of received packets
  2020-01-16 17:24 [PATCH net-next 0/3] vsock: support network namespace Stefano Garzarella
  2020-01-16 17:24 ` [PATCH net-next 1/3] vsock: add network namespace support Stefano Garzarella
@ 2020-01-16 17:24 ` Stefano Garzarella
  2020-01-16 17:24 ` [PATCH net-next 3/3] vhost/vsock: use netns of process that opens the vhost-vsock device Stefano Garzarella
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-16 17:24 UTC (permalink / raw)
  To: davem, netdev
  Cc: linux-kernel, Jorgen Hansen, Jason Wang, kvm, Stefan Hajnoczi,
	Stefano Garzarella, virtualization, linux-hyperv,
	Michael S. Tsirkin, Dexuan Cui

This patch allows transports that use virtio_transport_common
to specify the network namespace where a received packet is to
be delivered.

virtio_transport and vhost_transport, for now, use the default
network namespace.

vsock_loopback uses the same network namespace as the transmitter.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/vhost/vsock.c                   |  1 +
 include/linux/virtio_vsock.h            |  2 ++
 net/vmw_vsock/virtio_transport.c        |  2 ++
 net/vmw_vsock/virtio_transport_common.c | 13 ++++++++++---
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index c2d7d57e98cf..f1d39939d5e4 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -474,6 +474,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			continue;
 		}
 
+		pkt->net = vsock_default_net();
 		len = pkt->len;
 
 		/* Deliver to monitoring devices all received packets */
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 71c81e0dc8f2..d4fc93e6e03e 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -43,6 +43,7 @@ struct virtio_vsock_pkt {
 	struct list_head list;
 	/* socket refcnt not held, only use for cancellation */
 	struct vsock_sock *vsk;
+	struct net *net;
 	void *buf;
 	u32 buf_len;
 	u32 len;
@@ -54,6 +55,7 @@ struct virtio_vsock_pkt_info {
 	u32 remote_cid, remote_port;
 	struct vsock_sock *vsk;
 	struct msghdr *msg;
+	struct net *net;
 	u32 pkt_len;
 	u16 type;
 	u16 op;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index dfbaf6bd8b1c..fb03a1535c21 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -527,6 +527,8 @@ static void virtio_transport_rx_work(struct work_struct *work)
 			}
 
 			pkt->len = len - sizeof(pkt->hdr);
+			pkt->net = vsock_default_net();
+
 			virtio_transport_deliver_tap_pkt(pkt);
 			virtio_transport_recv_pkt(&virtio_transport, pkt);
 		}
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index cecdfd91ed00..6402dea62e45 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -63,6 +63,7 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 	pkt->hdr.len		= cpu_to_le32(len);
 	pkt->reply		= info->reply;
 	pkt->vsk		= info->vsk;
+	pkt->net		= info->net;
 
 	if (info->msg && len > 0) {
 		pkt->buf = kmalloc(len, GFP_KERNEL);
@@ -273,6 +274,7 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk,
 		.op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
 		.type = type,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -622,6 +624,7 @@ int virtio_transport_connect(struct vsock_sock *vsk)
 		.op = VIRTIO_VSOCK_OP_REQUEST,
 		.type = VIRTIO_VSOCK_TYPE_STREAM,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -638,6 +641,7 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
 			 (mode & SEND_SHUTDOWN ?
 			  VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -665,6 +669,7 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
 		.msg = msg,
 		.pkt_len = len,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -687,6 +692,7 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
 		.type = VIRTIO_VSOCK_TYPE_STREAM,
 		.reply = !!pkt,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	/* Send RST only if the original pkt is not a RST pkt */
@@ -707,6 +713,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 		.op = VIRTIO_VSOCK_OP_RST,
 		.type = le16_to_cpu(pkt->hdr.type),
 		.reply = true,
+		.net = pkt->net,
 	};
 
 	/* Send RST only if the original pkt is not a RST pkt */
@@ -991,6 +998,7 @@ virtio_transport_send_response(struct vsock_sock *vsk,
 		.remote_port = le32_to_cpu(pkt->hdr.src_port),
 		.reply = true,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1088,7 +1096,6 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
 void virtio_transport_recv_pkt(struct virtio_transport *t,
 			       struct virtio_vsock_pkt *pkt)
 {
-	struct net *net = vsock_default_net();
 	struct sockaddr_vm src, dst;
 	struct vsock_sock *vsk;
 	struct sock *sk;
@@ -1116,9 +1123,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 	/* The socket must be in connected or bound table
 	 * otherwise send reset back
 	 */
-	sk = vsock_find_connected_socket(&src, &dst, net);
+	sk = vsock_find_connected_socket(&src, &dst, pkt->net);
 	if (!sk) {
-		sk = vsock_find_bound_socket(&dst, net);
+		sk = vsock_find_bound_socket(&dst, pkt->net);
 		if (!sk) {
 			(void)virtio_transport_reset_no_sock(t, pkt);
 			goto free_pkt;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next 3/3] vhost/vsock: use netns of process that opens the vhost-vsock device
  2020-01-16 17:24 [PATCH net-next 0/3] vsock: support network namespace Stefano Garzarella
  2020-01-16 17:24 ` [PATCH net-next 1/3] vsock: add network namespace support Stefano Garzarella
  2020-01-16 17:24 ` [PATCH net-next 2/3] vsock/virtio_transport_common: handle netns of received packets Stefano Garzarella
@ 2020-01-16 17:24 ` Stefano Garzarella
  2020-01-21 15:50 ` [PATCH net-next 0/3] vsock: support network namespace Stefan Hajnoczi
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-16 17:24 UTC (permalink / raw)
  To: davem, netdev
  Cc: linux-kernel, Jorgen Hansen, Jason Wang, kvm, Stefan Hajnoczi,
	Stefano Garzarella, virtualization, linux-hyperv,
	Michael S. Tsirkin, Dexuan Cui

This patch assigns the network namespace of the process that opened
the vhost-vsock device (e.g. the VMM) to the packets coming from the
guest, allowing only host sockets in the same network namespace to
communicate with the guest.

This patch also allows having different VMs, running in different
network namespaces, with the same CID.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
RFC -> v1
 * used 'vsock_net_eq()' instead of 'net_eq()'
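
As an illustration of the per-namespace CID check, here is a minimal
user-space model of the set-CID path. It is a sketch, not the driver
code: 'struct vm_net', 'struct vhost_dev_model', 'dev_get()' and
'dev_set_cid()' are hypothetical names, devices live in a flat array
instead of the vhost_vsock_hash table, and namespace support is
assumed to be enabled (plain pointer comparison instead of
'vsock_net_eq()').

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net; namespaces compare by pointer. */
struct vm_net { int id; };

/* Simplified model of struct vhost_vsock: a guest CID plus its netns. */
struct vhost_dev_model {
	unsigned long long guest_cid;	/* 0 means "no CID assigned yet" */
	const struct vm_net *net;
};

/* Find the device that owns 'cid' inside namespace 'net', if any. */
struct vhost_dev_model *dev_get(struct vhost_dev_model *devs, size_t n,
				unsigned long long cid,
				const struct vm_net *net)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].guest_cid == 0)
			continue;	/* skip devices without a CID */

		if (devs[i].guest_cid == cid && devs[i].net == net)
			return &devs[i];
	}

	return NULL;
}

/* Refuse a CID only if another device in the same netns already uses it. */
int dev_set_cid(struct vhost_dev_model *devs, size_t n,
		struct vhost_dev_model *dev, unsigned long long cid)
{
	struct vhost_dev_model *other;

	assert(dev != NULL);

	other = dev_get(devs, n, cid, dev->net);
	if (other && other != dev)
		return -EADDRINUSE;

	dev->guest_cid = cid;
	return 0;
}
```

Two devices opened from different namespaces can both take CID 4,
while a second device in the same namespace gets -EADDRINUSE, which
matches the test setup described in the cover letter.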
---
 drivers/vhost/vsock.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index f1d39939d5e4..8b0169105559 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -40,6 +40,7 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
 struct vhost_vsock {
 	struct vhost_dev dev;
 	struct vhost_virtqueue vqs[2];
+	struct net *net;
 
 	/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
 	struct hlist_node hash;
@@ -61,7 +62,7 @@ static u32 vhost_transport_get_local_cid(void)
 /* Callers that dereference the return value must hold vhost_vsock_mutex or the
  * RCU read lock.
  */
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net)
 {
 	struct vhost_vsock *vsock;
 
@@ -72,7 +73,7 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
 		if (other_cid == 0)
 			continue;
 
-		if (other_cid == guest_cid)
+		if (other_cid == guest_cid && vsock_net_eq(net, vsock->net))
 			return vsock;
 
 	}
@@ -245,7 +246,7 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 	rcu_read_lock();
 
 	/* Find the vhost_vsock according to guest context id  */
-	vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
+	vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid), pkt->net);
 	if (!vsock) {
 		rcu_read_unlock();
 		virtio_transport_free_pkt(pkt);
@@ -277,7 +278,8 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
 	rcu_read_lock();
 
 	/* Find the vhost_vsock according to guest context id  */
-	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid,
+				sock_net(sk_vsock(vsk)));
 	if (!vsock)
 		goto out;
 
@@ -474,7 +476,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			continue;
 		}
 
-		pkt->net = vsock_default_net();
+		pkt->net = vsock->net;
 		len = pkt->len;
 
 		/* Deliver to monitoring devices all received packets */
@@ -608,7 +610,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
 	if (!vqs) {
 		ret = -ENOMEM;
-		goto out;
+		goto out_vsock;
+	}
+
+	/* Derive the network namespace from the pid opening the device */
+	vsock->net = get_net_ns_by_pid(current->pid);
+	if (IS_ERR(vsock->net)) {
+		ret = PTR_ERR(vsock->net);
+		goto out_vqs;
 	}
 
 	vsock->guest_cid = 0; /* no CID assigned yet */
@@ -630,7 +639,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
 	return 0;
 
-out:
+out_vqs:
+	kfree(vqs);
+out_vsock:
 	vhost_vsock_free(vsock);
 	return ret;
 }
@@ -655,7 +666,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
 	 */
 
 	/* If the peer is still valid, no need to reset connection */
-	if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+	if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk)))
 		return;
 
 	/* If the close timeout is pending, let it expire.  This avoids races
@@ -703,6 +714,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
 	spin_unlock_bh(&vsock->send_pkt_list_lock);
 
 	vhost_dev_cleanup(&vsock->dev);
+	put_net(vsock->net);
 	kfree(vsock->dev.vqs);
 	vhost_vsock_free(vsock);
 	return 0;
@@ -729,7 +741,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
 
 	/* Refuse if CID is already in use */
 	mutex_lock(&vhost_vsock_mutex);
-	other = vhost_vsock_get(guest_cid);
+	other = vhost_vsock_get(guest_cid, vsock->net);
 	if (other && other != vsock) {
 		mutex_unlock(&vhost_vsock_mutex);
 		return -EADDRINUSE;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-16 17:24 ` [PATCH net-next 1/3] vsock: add network namespace support Stefano Garzarella
@ 2020-01-20  9:06   ` David Miller
  2020-01-20 10:17     ` Stefano Garzarella
  2025-03-05  7:32   ` Michael S. Tsirkin
  1 sibling, 1 reply; 50+ messages in thread
From: David Miller @ 2020-01-20  9:06 UTC (permalink / raw)
  To: sgarzare
  Cc: netdev, linux-kernel, jhansen, jasowang, kvm, stefanha,
	virtualization, linux-hyperv, mst, decui

From: Stefano Garzarella <sgarzare@redhat.com>
Date: Thu, 16 Jan 2020 18:24:26 +0100

> This patch adds 'netns' module param to enable this new feature
> (disabled by default), because it changes vsock's behavior with
> network namespaces and could break existing applications.

Sorry, no.

I wonder if you can even design a legitimate, reasonable, use case
where these netns changes could break things.

I am totally against adding a module parameter for this, it's
incredibly confusing for users and will create a test scenario
that is strongly less likely to be covered.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-20  9:06   ` David Miller
@ 2020-01-20 10:17     ` Stefano Garzarella
  2020-01-20 12:03       ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-20 10:17 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-kernel, jhansen, jasowang, kvm, stefanha,
	virtualization, linux-hyperv, mst, decui

On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
> Date: Thu, 16 Jan 2020 18:24:26 +0100
> 
> > This patch adds 'netns' module param to enable this new feature
> > (disabled by default), because it changes vsock's behavior with
> > network namespaces and could break existing applications.
> 
> Sorry, no.
> 
> I wonder if you can even design a legitimate, reasonable, use case
> where these netns changes could break things.

I forgot to mention the use case.
I tried the RFC with Kata containers and we found that Kata shim-v1
doesn't work (Kata shim-v2 works as is) because there are the following
processes involved:
- kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
  passes it to qemu
- kata-shim (runs in a container) wants to talk with the guest but the
  vsock device is assigned to the init_netns and kata-shim runs in a
  different netns, so the communication is not allowed

But, as you said, this could be a wrong design, indeed they already
found a fix, but I was not sure if others could have the same issue.

In this case, do you think it is acceptable to make this change in
the vsock's behavior with netns and ask the user to change the design?

> 
> I am totally against adding a module parameter for this, it's
> incredibly confusing for users and will create a test scenario
> that is strongly less likely to be covered.
> 

Got it, I'll remove the module parameter!

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-20 10:17     ` Stefano Garzarella
@ 2020-01-20 12:03       ` Michael S. Tsirkin
  2020-01-20 13:58         ` Stefano Garzarella
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2020-01-20 12:03 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David Miller, netdev, linux-kernel, jhansen, jasowang, kvm,
	stefanha, virtualization, linux-hyperv, decui

On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > From: Stefano Garzarella <sgarzare@redhat.com>
> > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > 
> > > This patch adds 'netns' module param to enable this new feature
> > > (disabled by default), because it changes vsock's behavior with
> > > network namespaces and could break existing applications.
> > 
> > Sorry, no.
> > 
> > I wonder if you can even design a legitimate, reasonable, use case
> > where these netns changes could break things.
> 
> I forgot to mention the use case.
> I tried the RFC with Kata containers and we found that Kata shim-v1
> doesn't work (Kata shim-v2 works as is) because there are the following
> processes involved:
> - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
>   passes it to qemu
> - kata-shim (runs in a container) wants to talk with the guest but the
>   vsock device is assigned to the init_netns and kata-shim runs in a
>   different netns, so the communication is not allowed
> But, as you said, this could be a wrong design, indeed they already
> found a fix, but I was not sure if others could have the same issue.
> 
> In this case, do you think it is acceptable to make this change in
> the vsock's behavior with netns and ask the user to change the design?

David's question is what would be a usecase that's broken
(as opposed to fixed) by enabling this by default.

If it does exist, you need a way for userspace to opt-in,
module parameter isn't that.

> 
> > 
> > I am totally against adding a module parameter for this, it's
> > incredibly confusing for users and will create a test scenario
> > that is strongly less likely to be covered.
> > 
> 
> Got it, I'll remove the module parameter!
> 
> Thanks,
> Stefano


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-20 12:03       ` Michael S. Tsirkin
@ 2020-01-20 13:58         ` Stefano Garzarella
  2020-01-20 16:04           ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-20 13:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, netdev, linux-kernel, Jorgen Hansen, Jason Wang,
	kvm, Stefan Hajnoczi, virtualization, linux-hyperv, Dexuan Cui

On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > >
> > > > This patch adds 'netns' module param to enable this new feature
> > > > (disabled by default), because it changes vsock's behavior with
> > > > network namespaces and could break existing applications.
> > >
> > > Sorry, no.
> > >
> > > I wonder if you can even design a legitimate, reasonable, use case
> > > where these netns changes could break things.
> >
> > I forgot to mention the use case.
> > I tried the RFC with Kata containers and we found that Kata shim-v1
> > doesn't work (Kata shim-v2 works as is) because there are the following
> > processes involved:
> > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> >   passes it to qemu
> > - kata-shim (runs in a container) wants to talk with the guest but the
> >   vsock device is assigned to the init_netns and kata-shim runs in a
> >   different netns, so the communication is not allowed
> > But, as you said, this could be a wrong design, indeed they already
> > found a fix, but I was not sure if others could have the same issue.
> >
> > In this case, do you think it is acceptable to make this change in
> > the vsock's behavior with netns and ask the user to change the design?
>
> David's question is what would be a usecase that's broken
> (as opposed to fixed) by enabling this by default.

Yes, I got that. Thanks for clarifying.
I just reported an example that breaks but can be fixed with a different
design (it relies on the fact that, before this series, vsock devices
were accessible from every netns).

>
> If it does exist, you need a way for userspace to opt-in,
> module parameter isn't that.

Okay, but I honestly can't find a case that can't be solved.
So I don't know whether to add an option (ioctl, sysfs?) or wait for
a real case to come up.

I'll look more closely for any particular case where we need
to disable netns in vsock.

Thanks,
Stefano



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-20 13:58         ` Stefano Garzarella
@ 2020-01-20 16:04           ` Michael S. Tsirkin
  2020-01-20 16:53             ` Stefano Garzarella
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2020-01-20 16:04 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David Miller, netdev, linux-kernel, Jorgen Hansen, Jason Wang,
	kvm, Stefan Hajnoczi, virtualization, linux-hyperv, Dexuan Cui

On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > >
> > > > > This patch adds 'netns' module param to enable this new feature
> > > > > (disabled by default), because it changes vsock's behavior with
> > > > > network namespaces and could break existing applications.
> > > >
> > > > Sorry, no.
> > > >
> > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > where these netns changes could break things.
> > >
> > > I forgot to mention the use case.
> > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > processes involved:
> > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > >   passes it to qemu
> > > - kata-shim (runs in a container) wants to talk with the guest but the
> > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > >   different netns, so the communication is not allowed
> > > But, as you said, this could be a wrong design, indeed they already
> > > found a fix, but I was not sure if others could have the same issue.
> > >
> > > In this case, do you think it is acceptable to make this change in
> > > the vsock's behavior with netns and ask the user to change the design?
> >
> > David's question is what would be a usecase that's broken
> > (as opposed to fixed) by enabling this by default.
> 
> Yes, I got that. Thanks for clarifying.
> I just reported a broken example that can be fixed with a different
> design (due to the fact that before this series, vsock devices were
> accessible to all netns).
> 
> >
> > If it does exist, you need a way for userspace to opt-in,
> > module parameter isn't that.
> 
> Okay, but I honestly can't find a case that can't be solved.
> So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> a real case to come up.
> 
> I'll try to see better if there's any particular case where we need
> to disable netns in vsock.
> 
> Thanks,
> Stefano

Me neither. So what did you have in mind when you wrote:
"could break existing applications"?



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-20 16:04           ` Michael S. Tsirkin
@ 2020-01-20 16:53             ` Stefano Garzarella
  2020-01-20 22:02               ` Michael S. Tsirkin
  0 siblings, 1 reply; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-20 16:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, netdev, linux-kernel, Jorgen Hansen, Jason Wang,
	kvm, Stefan Hajnoczi, virtualization, linux-hyperv, Dexuan Cui

On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > >
> > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > network namespaces and could break existing applications.
> > > > >
> > > > > Sorry, no.
> > > > >
> > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > where these netns changes could break things.
> > > >
> > > > I forgot to mention the use case.
> > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > processes involved:
> > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > >   passes it to qemu
> > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > >   different netns, so the communication is not allowed
> > > > But, as you said, this could be a wrong design, indeed they already
> > > > found a fix, but I was not sure if others could have the same issue.
> > > >
> > > > In this case, do you think it is acceptable to make this change in
> > > > the vsock's behavior with netns and ask the user to change the design?
> > >
> > > David's question is what would be a usecase that's broken
> > > (as opposed to fixed) by enabling this by default.
> >
> > Yes, I got that. Thanks for clarifying.
> > I just reported a broken example that can be fixed with a different
> > design (due to the fact that before this series, vsock devices were
> > accessible to all netns).
> >
> > >
> > > If it does exist, you need a way for userspace to opt-in,
> > > module parameter isn't that.
> >
> > Okay, but I honestly can't find a case that can't be solved.
> > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > a real case to come up.
> >
> > I'll try to see better if there's any particular case where we need
> > to disable netns in vsock.
> >
> > Thanks,
> > Stefano
>
> Me neither. so what did you have in mind when you wrote:
> "could break existing applications"?

I had in mind:
1. the Kata case. It is fixable (the fix is not merged in Kata), but
   older versions will not work with newer Linux.

2. a single process running in init_netns that wants to communicate with
   VMs handled by VMMs running in different netns. This case can be
   solved by opening /dev/vhost-vsock in the same netns as the process
   that wants to communicate with the VMs (init_netns in this case), and
   passing it to the VMM.

These cases can work with vsock+netns, but they require changes because
I'm modifying the vsock behavior with netns.
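
To make case 2 concrete, here is a toy Python model of the rule this
series applies (the device belongs to the netns of whoever opens
/dev/vhost-vsock). It is only a sketch, not kernel code: `Process`,
`VhostVsockFd`, `open_vhost_vsock` and `can_reach_guest` are hypothetical
names made up for illustration, and the netns labels are plain strings.

```python
from dataclasses import dataclass

INIT_NETNS = "init_net"  # illustrative label for the initial namespace


@dataclass(frozen=True)
class Process:
    name: str
    netns: str


@dataclass(frozen=True)
class VhostVsockFd:
    """Stand-in for an open /dev/vhost-vsock file descriptor."""
    netns: str  # namespace of the process that called open()


def open_vhost_vsock(opener: Process) -> VhostVsockFd:
    # Assumption taken from the cover letter: the device is tied to the
    # netns of the process that opens it.
    return VhostVsockFd(netns=opener.netns)


def can_reach_guest(client: Process, fd: VhostVsockFd) -> bool:
    # Which process later *uses* the fd (e.g. the VMM) does not matter.
    return client.netns == fd.netns


manager = Process("manager", INIT_NETNS)
vmm = Process("vmm", "netns-a")

# The VMM opens the device itself: the manager in init_net is cut off.
fd_opened_by_vmm = open_vhost_vsock(vmm)
# The manager opens the device and passes the fd to the VMM.
fd_passed_to_vmm = open_vhost_vsock(manager)

print(can_reach_guest(manager, fd_opened_by_vmm))  # False
print(can_reach_guest(manager, fd_passed_to_vmm))  # True
```

The second call is the workaround described above: the guest stays
reachable from init_netns because the open() happened there, even though
the VMM runs elsewhere.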



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-20 16:53             ` Stefano Garzarella
@ 2020-01-20 22:02               ` Michael S. Tsirkin
  2020-01-21  9:07                 ` Stefano Garzarella
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2020-01-20 22:02 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David Miller, netdev, linux-kernel, Jorgen Hansen, Jason Wang,
	kvm, Stefan Hajnoczi, virtualization, linux-hyperv, Dexuan Cui

On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > >
> > > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > > network namespaces and could break existing applications.
> > > > > >
> > > > > > Sorry, no.
> > > > > >
> > > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > > where these netns changes could break things.
> > > > >
> > > > > I forgot to mention the use case.
> > > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > > processes involved:
> > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > >   passes it to qemu
> > > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > > >   different netns, so the communication is not allowed
> > > > > But, as you said, this could be a wrong design, indeed they already
> > > > > found a fix, but I was not sure if others could have the same issue.
> > > > >
> > > > > In this case, do you think it is acceptable to make this change in
> > > > > the vsock's behavior with netns and ask the user to change the design?
> > > >
> > > > David's question is what would be a usecase that's broken
> > > > (as opposed to fixed) by enabling this by default.
> > >
> > > Yes, I got that. Thanks for clarifying.
> > > I just reported a broken example that can be fixed with a different
> > > design (due to the fact that before this series, vsock devices were
> > > accessible to all netns).
> > >
> > > >
> > > > If it does exist, you need a way for userspace to opt-in,
> > > > module parameter isn't that.
> > >
> > > Okay, but I honestly can't find a case that can't be solved.
> > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > a real case to come up.
> > >
> > > I'll try to see better if there's any particular case where we need
> > > to disable netns in vsock.
> > >
> > > Thanks,
> > > Stefano
> >
> > Me neither. so what did you have in mind when you wrote:
> > "could break existing applications"?
> 
> I had in mind:
> 1. the Kata case. It is fixable (the fix is not merged on kata), but
>    older versions will not work with newer Linux.

meaning they will keep not working, right?

> 2. a single process running on init_netns that wants to communicate with
>    VMs handled by VMMs running in different netns, but this case can be
>    solved opening the /dev/vhost-vsock in the same netns of the process
>    that wants to communicate with the VMs (init_netns in this case), and
>    passing it to the VMM.

again right now they just don't work, right?

> These cases can work with vsock+netns, but they require changes because
> I'm modifying the vsock behavior with netns.



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-20 22:02               ` Michael S. Tsirkin
@ 2020-01-21  9:07                 ` Stefano Garzarella
  2020-01-21 11:14                   ` Michael S. Tsirkin
  2020-01-21 13:59                   ` Stefan Hajnoczi
  0 siblings, 2 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-21  9:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, netdev, linux-kernel, Jorgen Hansen, Jason Wang,
	kvm, Stefan Hajnoczi, virtualization, linux-hyperv, Dexuan Cui

On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > >
> > > > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > > > network namespaces and could break existing applications.
> > > > > > >
> > > > > > > Sorry, no.
> > > > > > >
> > > > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > > > where these netns changes could break things.
> > > > > >
> > > > > > I forgot to mention the use case.
> > > > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > > > processes involved:
> > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > > >   passes it to qemu
> > > > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > > > >   different netns, so the communication is not allowed
> > > > > > But, as you said, this could be a wrong design, indeed they already
> > > > > > found a fix, but I was not sure if others could have the same issue.
> > > > > >
> > > > > > In this case, do you think it is acceptable to make this change in
> > > > > > the vsock's behavior with netns and ask the user to change the design?
> > > > >
> > > > > David's question is what would be a usecase that's broken
> > > > > (as opposed to fixed) by enabling this by default.
> > > >
> > > > Yes, I got that. Thanks for clarifying.
> > > > I just reported a broken example that can be fixed with a different
> > > > design (due to the fact that before this series, vsock devices were
> > > > accessible to all netns).
> > > >
> > > > >
> > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > module parameter isn't that.
> > > >
> > > > Okay, but I honestly can't find a case that can't be solved.
> > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > > a real case to come up.
> > > >
> > > > I'll try to see better if there's any particular case where we need
> > > > to disable netns in vsock.
> > > >
> > > > Thanks,
> > > > Stefano
> > >
> > > Me neither. so what did you have in mind when you wrote:
> > > "could break existing applications"?
> >
> > I had in mind:
> > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> >    older versions will not work with newer Linux.
>
> meaning they will keep not working, right?

Right, I mean: without this series they work; with this series they work
only if netns support is disabled, or with a patch that has been proposed
but not merged in Kata.

>
> > 2. a single process running on init_netns that wants to communicate with
> >    VMs handled by VMMs running in different netns, but this case can be
> >    solved opening the /dev/vhost-vsock in the same netns of the process
> >    that wants to communicate with the VMs (init_netns in this case), and
> >    passing it to the VMM.
>
> again right now they just don't work, right?

Right, as above.

What do you recommend I do?

Thanks,
Stefano



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-21  9:07                 ` Stefano Garzarella
@ 2020-01-21 11:14                   ` Michael S. Tsirkin
  2020-01-21 13:13                     ` Stefano Garzarella
  2020-01-21 15:43                     ` Stefan Hajnoczi
  2020-01-21 13:59                   ` Stefan Hajnoczi
  1 sibling, 2 replies; 50+ messages in thread
From: Michael S. Tsirkin @ 2020-01-21 11:14 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David Miller, netdev, linux-kernel, Jorgen Hansen, Jason Wang,
	kvm, Stefan Hajnoczi, virtualization, linux-hyperv, Dexuan Cui

On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > >
> > > > > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > > > > network namespaces and could break existing applications.
> > > > > > > >
> > > > > > > > Sorry, no.
> > > > > > > >
> > > > > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > > > > where these netns changes could break things.
> > > > > > >
> > > > > > > I forgot to mention the use case.
> > > > > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > > > > processes involved:
> > > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > > > >   passes it to qemu
> > > > > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > > > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > > > > >   different netns, so the communication is not allowed
> > > > > > > But, as you said, this could be a wrong design, indeed they already
> > > > > > > found a fix, but I was not sure if others could have the same issue.
> > > > > > >
> > > > > > > In this case, do you think it is acceptable to make this change in
> > > > > > > the vsock's behavior with netns and ask the user to change the design?
> > > > > >
> > > > > > David's question is what would be a usecase that's broken
> > > > > > (as opposed to fixed) by enabling this by default.
> > > > >
> > > > > Yes, I got that. Thanks for clarifying.
> > > > > I just reported a broken example that can be fixed with a different
> > > > > design (due to the fact that before this series, vsock devices were
> > > > > accessible to all netns).
> > > > >
> > > > > >
> > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > module parameter isn't that.
> > > > >
> > > > > Okay, but I honestly can't find a case that can't be solved.
> > > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > > > a real case to come up.
> > > > >
> > > > > I'll try to see better if there's any particular case where we need
> > > > > to disable netns in vsock.
> > > > >
> > > > > Thanks,
> > > > > Stefano
> > > >
> > > > Me neither. so what did you have in mind when you wrote:
> > > > "could break existing applications"?
> > >
> > > I had in mind:
> > > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> > >    older versions will not work with newer Linux.
> >
> > meaning they will keep not working, right?
> 
> Right, I mean without this series they work, with this series they work
> only if the netns support is disabled or with a patch proposed but not
> merged in kata.
> 
> >
> > > 2. a single process running on init_netns that wants to communicate with
> > >    VMs handled by VMMs running in different netns, but this case can be
> > >    solved opening the /dev/vhost-vsock in the same netns of the process
> > >    that wants to communicate with the VMs (init_netns in this case), and
> > >    passing it to the VMM.
> >
> > again right now they just don't work, right?
> 
> Right, as above.
> 
> What do you recommend I do?
> 
> Thanks,
> Stefano

If this breaks userspace, then we need to maintain compatibility.
For example, have two devices, /dev/vhost-vsock and /dev/vhost-vsock-netns?

-- 
MST



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-21 11:14                   ` Michael S. Tsirkin
@ 2020-01-21 13:13                     ` Stefano Garzarella
  2020-01-21 15:43                     ` Stefan Hajnoczi
  1 sibling, 0 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-21 13:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, netdev, linux-kernel, Jorgen Hansen, Jason Wang,
	kvm, Stefan Hajnoczi, virtualization, linux-hyperv, Dexuan Cui

On Tue, Jan 21, 2020 at 06:14:48AM -0500, Michael S. Tsirkin wrote:
> On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> > On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > > >
> > > > > > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > > > > > network namespaces and could break existing applications.
> > > > > > > > >
> > > > > > > > > Sorry, no.
> > > > > > > > >
> > > > > > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > > > > > where these netns changes could break things.
> > > > > > > >
> > > > > > > > I forgot to mention the use case.
> > > > > > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > > > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > > > > > processes involved:
> > > > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > > > > >   passes it to qemu
> > > > > > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > > > > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > > > > > >   different netns, so the communication is not allowed
> > > > > > > > But, as you said, this could be a wrong design, indeed they already
> > > > > > > > found a fix, but I was not sure if others could have the same issue.
> > > > > > > >
> > > > > > > > In this case, do you think it is acceptable to make this change in
> > > > > > > > the vsock's behavior with netns and ask the user to change the design?
> > > > > > >
> > > > > > > David's question is what would be a usecase that's broken
> > > > > > > (as opposed to fixed) by enabling this by default.
> > > > > >
> > > > > > Yes, I got that. Thanks for clarifying.
> > > > > > I just reported a broken example that can be fixed with a different
> > > > > > design (due to the fact that before this series, vsock devices were
> > > > > > accessible to all netns).
> > > > > >
> > > > > > >
> > > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > > module parameter isn't that.
> > > > > >
> > > > > > Okay, but I honestly can't find a case that can't be solved.
> > > > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > > > > a real case to come up.
> > > > > >
> > > > > > I'll try to see better if there's any particular case where we need
> > > > > > to disable netns in vsock.
> > > > > >
> > > > > > Thanks,
> > > > > > Stefano
> > > > >
> > > > > Me neither. so what did you have in mind when you wrote:
> > > > > "could break existing applications"?
> > > >
> > > > I had in mind:
> > > > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> > > >    older versions will not work with newer Linux.
> > >
> > > meaning they will keep not working, right?
> > 
> > Right, I mean without this series they work, with this series they work
> > only if the netns support is disabled or with a patch proposed but not
> > merged in kata.
> > 
> > >
> > > > 2. a single process running on init_netns that wants to communicate with
> > > >    VMs handled by VMMs running in different netns, but this case can be
> > > >    solved opening the /dev/vhost-vsock in the same netns of the process
> > > >    that wants to communicate with the VMs (init_netns in this case), and
> > > >    passing it to the VMM.
> > >
> > > again right now they just don't work, right?
> > 
> > Right, as above.
> > 
> > What do you recommend I do?
> > 
> > Thanks,
> > Stefano
> 
> If this breaks userspace, then we need to maintain compatibility.
> For example, have two devices, /dev/vhost-vsock and /dev/vhost-vsock-netns?

Interesting!

So, VMs handled with /dev/vhost-vsock will be reachable from any netns (as
happens now) and VMs handled with /dev/vhost-vsock-netns will be
reachable only from the same netns as the process that opens it.

It requires more changes, but we will preserve the previous behavior
while adding the new feature!
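
As a rough sketch of how I read the proposal (a toy Python model, not an
implementation; `open_device` and `reachable` are hypothetical helpers,
and /dev/vhost-vsock-netns is only the name suggested above, it does not
exist today):

```python
from typing import Optional

LEGACY_DEV = "/dev/vhost-vsock"
NETNS_DEV = "/dev/vhost-vsock-netns"  # proposed name, assumption


def open_device(path: str, opener_netns: str) -> Optional[str]:
    """Return the netns the VM is confined to, or None if it is global."""
    if path == LEGACY_DEV:
        return None  # previous behavior: visible from every netns
    if path == NETNS_DEV:
        return opener_netns  # new behavior: tied to the opener's netns
    raise ValueError(f"unknown device node: {path}")


def reachable(device_netns: Optional[str], client_netns: str) -> bool:
    # A global device is reachable from anywhere; a confined one only
    # from the namespace it was opened in.
    return device_netns is None or device_netns == client_netns


legacy_vm = open_device(LEGACY_DEV, "netns-a")
isolated_vm = open_device(NETNS_DEV, "netns-a")

print(reachable(legacy_vm, "netns-b"))    # True
print(reachable(isolated_vm, "netns-b"))  # False
print(reachable(isolated_vm, "netns-a"))  # True
```

The point of the split is that the VMM chooses the behavior per VM simply
by picking which node to open, so existing users of /dev/vhost-vsock see
no change.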

Thanks a lot for this idea! I'll try to implement it!
Stefano



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-21  9:07                 ` Stefano Garzarella
  2020-01-21 11:14                   ` Michael S. Tsirkin
@ 2020-01-21 13:59                   ` Stefan Hajnoczi
  2020-01-21 14:31                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 50+ messages in thread
From: Stefan Hajnoczi @ 2020-01-21 13:59 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, David Miller, netdev, linux-kernel,
	Jorgen Hansen, Jason Wang, kvm, virtualization, linux-hyperv,
	Dexuan Cui

On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > >
> > > > > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > > > > network namespaces and could break existing applications.
> > > > > > > >
> > > > > > > > Sorry, no.
> > > > > > > >
> > > > > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > > > > where these netns changes could break things.
> > > > > > >
> > > > > > > I forgot to mention the use case.
> > > > > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > > > > processes involved:
> > > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > > > >   passes it to qemu
> > > > > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > > > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > > > > >   different netns, so the communication is not allowed
> > > > > > > But, as you said, this could be a wrong design, indeed they already
> > > > > > > found a fix, but I was not sure if others could have the same issue.
> > > > > > >
> > > > > > > In this case, do you think it is acceptable to make this change in
> > > > > > > the vsock's behavior with netns and ask the user to change the design?
> > > > > >
> > > > > > David's question is what would be a usecase that's broken
> > > > > > (as opposed to fixed) by enabling this by default.
> > > > >
> > > > > Yes, I got that. Thanks for clarifying.
> > > > > I just reported a broken example that can be fixed with a different
> > > > > design (due to the fact that before this series, vsock devices were
> > > > > accessible to all netns).
> > > > >
> > > > > >
> > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > module parameter isn't that.
> > > > >
> > > > > Okay, but I honestly can't find a case that can't be solved.
> > > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > > > a real case to come up.
> > > > >
> > > > > I'll try to see better if there's any particular case where we need
> > > > > to disable netns in vsock.
> > > > >
> > > > > Thanks,
> > > > > Stefano
> > > >
> > > > Me neither. so what did you have in mind when you wrote:
> > > > "could break existing applications"?
> > >
> > > I had in mind:
> > > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> > >    older versions will not work with newer Linux.
> >
> > meaning they will keep not working, right?
> 
> Right, I mean without this series they work, with this series they work
> only if the netns support is disabled or with a patch proposed but not
> merged in kata.
> 
> >
> > > 2. a single process running on init_netns that wants to communicate with
> > >    VMs handled by VMMs running in different netns, but this case can be
> > >    solved opening the /dev/vhost-vsock in the same netns of the process
> > >    that wants to communicate with the VMs (init_netns in this case), and
> > >    passing it to the VMM.
> >
> > again right now they just don't work, right?
> 
> Right, as above.
> 
> What do you recommend I do?

Existing userspace applications must continue to work.

Guests are fine because G2H transports are always in the initial network
namespace.

On the host side we have a real case where Kata Containers and other
vsock users break.  Existing applications run in other network
namespaces and assume they can communicate over vsock (it's only
available in the initial network namespace by default).

It seems we cannot isolate new network namespaces from the initial
network namespace by default because it will break existing
applications.  That's a bummer.

There is one solution that maintains compatibility:

Introduce a per-namespace vsock isolation flag that can only transition
from false to true.  Once it becomes true it cannot be reset to false
anymore (for security).

When vsock isolation is false the initial network namespace is used for
<CID, port> addressing.

When vsock isolation is true the current namespace is used for <CID,
port> addressing.

I guess the vsock isolation flag would be set via a rtnetlink message,
but I haven't checked.

The upshot is: existing software doesn't benefit from namespaces for
vsock isolation but it continues to work!  New software makes 1 special
call after creating the namespace to opt in to vsock isolation.

This approach is secure because whoever sets up namespaces can
transition the flag from false to true and know that it can never be
reset to false anymore.
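
To make the flag semantics concrete, here is a minimal user-space sketch
in C. It is only a model under stated assumptions: model_net,
model_init_net, model_vsock_isolate() and model_vsock_addr_net() are
hypothetical names, not existing kernel symbols, and how the flag would
really be set (rtnetlink or something else) is still open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace; not the kernel's struct net. */
struct model_net {
	const char *name;
	bool vsock_isolated;	/* one-way flag: may only go false -> true */
};

/* Stand-in for the initial network namespace. */
struct model_net model_init_net = { "init_net", false };

/* Opt in to isolation. There is deliberately no way to clear the flag. */
void model_vsock_isolate(struct model_net *net)
{
	assert(net != NULL);
	net->vsock_isolated = true;
}

/* Namespace whose <CID, port> table a socket created in 'net' would use. */
struct model_net *model_vsock_addr_net(struct model_net *net)
{
	assert(net != NULL);
	return net->vsock_isolated ? net : &model_init_net;
}
```

In this model a freshly created namespace resolves addresses in
model_init_net until someone calls model_vsock_isolate() on it, and
after that it always resolves them in its own namespace.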

Does this make sense to everyone?

Stefan


* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-21 13:59                   ` Stefan Hajnoczi
@ 2020-01-21 14:31                     ` Michael S. Tsirkin
  2020-01-21 15:44                       ` Stefan Hajnoczi
  0 siblings, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2020-01-21 14:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefano Garzarella, David Miller, netdev, linux-kernel,
	Jorgen Hansen, Jason Wang, kvm, virtualization, linux-hyperv,
	Dexuan Cui

On Tue, Jan 21, 2020 at 01:59:07PM +0000, Stefan Hajnoczi wrote:
> On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> > On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > > >
> > > > > > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > > > > > network namespaces and could break existing applications.
> > > > > > > > >
> > > > > > > > > Sorry, no.
> > > > > > > > >
> > > > > > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > > > > > where these netns changes could break things.
> > > > > > > >
> > > > > > > > I forgot to mention the use case.
> > > > > > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > > > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > > > > > processes involved:
> > > > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > > > > >   passes it to qemu
> > > > > > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > > > > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > > > > > >   different netns, so the communication is not allowed
> > > > > > > > But, as you said, this could be a wrong design, indeed they already
> > > > > > > > found a fix, but I was not sure if others could have the same issue.
> > > > > > > >
> > > > > > > > In this case, do you think it is acceptable to make this change in
> > > > > > > > the vsock's behavior with netns and ask the user to change the design?
> > > > > > >
> > > > > > > David's question is what would be a usecase that's broken
> > > > > > > (as opposed to fixed) by enabling this by default.
> > > > > >
> > > > > > Yes, I got that. Thanks for clarifying.
> > > > > > I just reported a broken example that can be fixed with a different
> > > > > > design (due to the fact that before this series, vsock devices were
> > > > > > accessible to all netns).
> > > > > >
> > > > > > >
> > > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > > module parameter isn't that.
> > > > > >
> > > > > > Okay, but I honestly can't find a case that can't be solved.
> > > > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > > > > a real case to come up.
> > > > > >
> > > > > > I'll try to see better if there's any particular case where we need
> > > > > > to disable netns in vsock.
> > > > > >
> > > > > > Thanks,
> > > > > > Stefano
> > > > >
> > > > > Me neither. so what did you have in mind when you wrote:
> > > > > "could break existing applications"?
> > > >
> > > > I had in mind:
> > > > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> > > >    older versions will not work with newer Linux.
> > >
> > > meaning they will keep not working, right?
> > 
> > Right, I mean without this series they work, with this series they work
> > only if the netns support is disabled or with a patch proposed but not
> > merged in kata.
> > 
> > >
> > > > 2. a single process running on init_netns that wants to communicate with
> > > >    VMs handled by VMMs running in different netns, but this case can be
> > > >    solved opening the /dev/vhost-vsock in the same netns of the process
> > > >    that wants to communicate with the VMs (init_netns in this case), and
> > > > >    passing it to the VMM.
> > >
> > > again right now they just don't work, right?
> > 
> > Right, as above.
> > 
> > What do you recommend I do?
> 
> Existing userspace applications must continue to work.
> 
> Guests are fine because G2H transports are always in the initial network
> namespace.
> 
> On the host side we have a real case where Kata Containers and other
> vsock users break.  Existing applications run in other network
> namespaces and assume they can communicate over vsock (it's only
> available in the initial network namespace by default).
> 
> It seems we cannot isolate new network namespaces from the initial
> network namespace by default because it will break existing
> applications.  That's a bummer.
> 
> There is one solution that maintains compatibility:
> 
> Introduce a per-namespace vsock isolation flag that can only transition
> from false to true.  Once it becomes true it cannot be reset to false
> anymore (for security).
> 
> When vsock isolation is false the initial network namespace is used for
> <CID, port> addressing.
> 
> When vsock isolation is true the current namespace is used for <CID,
> port> addressing.
> 
> I guess the vsock isolation flag would be set via a rtnetlink message,
> but I haven't checked.
> 
> The upshot is: existing software doesn't benefit from namespaces for
> vsock isolation but it continues to work!  New software makes 1 special
> call after creating the namespace to opt in to vsock isolation.
> 
> This approach is secure because whoever sets up namespaces can
> transition the flag from false to true and know that it can never be
> reset to false anymore.
> 
> Does this make sense to everyone?
> 
> Stefan

Anything wrong with a separate device? Whoever opens it decides
whether netns will work ...

-- 
MST



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-21 11:14                   ` Michael S. Tsirkin
  2020-01-21 13:13                     ` Stefano Garzarella
@ 2020-01-21 15:43                     ` Stefan Hajnoczi
  1 sibling, 0 replies; 50+ messages in thread
From: Stefan Hajnoczi @ 2020-01-21 15:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefano Garzarella, David Miller, netdev, linux-kernel,
	Jorgen Hansen, Jason Wang, kvm, virtualization, linux-hyperv,
	Dexuan Cui


On Tue, Jan 21, 2020 at 06:14:48AM -0500, Michael S. Tsirkin wrote:
> On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> > On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > > >
> > > > > > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > > > > > network namespaces and could break existing applications.
> > > > > > > > >
> > > > > > > > > Sorry, no.
> > > > > > > > >
> > > > > > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > > > > > where these netns changes could break things.
> > > > > > > >
> > > > > > > > I forgot to mention the use case.
> > > > > > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > > > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > > > > > processes involved:
> > > > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > > > > >   passes it to qemu
> > > > > > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > > > > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > > > > > >   different netns, so the communication is not allowed
> > > > > > > > But, as you said, this could be a wrong design, indeed they already
> > > > > > > > found a fix, but I was not sure if others could have the same issue.
> > > > > > > >
> > > > > > > > In this case, do you think it is acceptable to make this change in
> > > > > > > > the vsock's behavior with netns and ask the user to change the design?
> > > > > > >
> > > > > > > David's question is what would be a usecase that's broken
> > > > > > > (as opposed to fixed) by enabling this by default.
> > > > > >
> > > > > > Yes, I got that. Thanks for clarifying.
> > > > > > I just reported a broken example that can be fixed with a different
> > > > > > design (due to the fact that before this series, vsock devices were
> > > > > > accessible to all netns).
> > > > > >
> > > > > > >
> > > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > > module parameter isn't that.
> > > > > >
> > > > > > Okay, but I honestly can't find a case that can't be solved.
> > > > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > > > > a real case to come up.
> > > > > >
> > > > > > I'll try to see better if there's any particular case where we need
> > > > > > to disable netns in vsock.
> > > > > >
> > > > > > Thanks,
> > > > > > Stefano
> > > > >
> > > > > Me neither. so what did you have in mind when you wrote:
> > > > > "could break existing applications"?
> > > >
> > > > I had in mind:
> > > > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> > > >    older versions will not work with newer Linux.
> > >
> > > meaning they will keep not working, right?
> > 
> > Right, I mean without this series they work, with this series they work
> > only if the netns support is disabled or with a patch proposed but not
> > merged in kata.
> > 
> > >
> > > > 2. a single process running on init_netns that wants to communicate with
> > > >    VMs handled by VMMs running in different netns, but this case can be
> > > >    solved opening the /dev/vhost-vsock in the same netns of the process
> > > >    that wants to communicate with the VMs (init_netns in this case), and
> > > > >    passing it to the VMM.
> > >
> > > again right now they just don't work, right?
> > 
> > Right, as above.
> > 
> > What do you recommend I do?
> > 
> > Thanks,
> > Stefano
> 
> If this breaks userspace, then we need to maintain compatibility.
> For example, have two devices, /dev/vhost-vsock and /dev/vhost-vsock-netns?

/dev/vhost-vsock-netns is cleaner and simpler than my suggestion.  I
like it!

This is nice for containers (say you want to run QEMU inside a container
on the host) because you can allow only /dev/vhost-vsock-netns inside
containers.  This prevents them from opening /dev/vhost-vsock to get
access to the initial network namespace.
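
A small user-space sketch of the two-device idea, in C. Everything here
is a model and an assumption: the model_* names are hypothetical, and I
assume the legacy device keeps the pre-series behavior (guest reachable
from any netns) while the new device pins the guest to the netns of the
process that opened it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which character device the VMM opened (hypothetical model). */
enum model_vhost_dev {
	MODEL_VHOST_VSOCK,		/* /dev/vhost-vsock */
	MODEL_VHOST_VSOCK_NETNS,	/* /dev/vhost-vsock-netns */
};

struct model_net {
	const char *name;
};

struct model_vhost_vsock {
	/* NULL means legacy mode: the guest is reachable from every netns. */
	const struct model_net *net;
};

/* Whoever opens the device decides whether netns scoping applies. */
void model_vhost_vsock_open(struct model_vhost_vsock *vsock,
			    enum model_vhost_dev dev,
			    const struct model_net *opener_net)
{
	assert(vsock != NULL && opener_net != NULL);
	vsock->net = (dev == MODEL_VHOST_VSOCK_NETNS) ? opener_net : NULL;
}

/* Can a host process living in 'peer_net' talk to this guest? */
bool model_guest_reachable(const struct model_vhost_vsock *vsock,
			   const struct model_net *peer_net)
{
	return vsock->net == NULL || vsock->net == peer_net;
}
```

With this model a container that is only given /dev/vhost-vsock-netns
can create guests that are visible in its own netns and nowhere else.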

Stefan


* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-21 14:31                     ` Michael S. Tsirkin
@ 2020-01-21 15:44                       ` Stefan Hajnoczi
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Hajnoczi @ 2020-01-21 15:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefano Garzarella, David Miller, netdev, linux-kernel,
	Jorgen Hansen, Jason Wang, kvm, virtualization, linux-hyperv,
	Dexuan Cui


On Tue, Jan 21, 2020 at 09:31:42AM -0500, Michael S. Tsirkin wrote:
> On Tue, Jan 21, 2020 at 01:59:07PM +0000, Stefan Hajnoczi wrote:
> > On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> > > On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > > > >
> > > > > > > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > > > > > > network namespaces and could break existing applications.
> > > > > > > > > >
> > > > > > > > > > Sorry, no.
> > > > > > > > > >
> > > > > > > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > > > > > > where these netns changes could break things.
> > > > > > > > >
> > > > > > > > > I forgot to mention the use case.
> > > > > > > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > > > > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > > > > > > processes involved:
> > > > > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > > > > > >   passes it to qemu
> > > > > > > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > > > > > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > > > > > > >   different netns, so the communication is not allowed
> > > > > > > > > But, as you said, this could be a wrong design, indeed they already
> > > > > > > > > found a fix, but I was not sure if others could have the same issue.
> > > > > > > > >
> > > > > > > > > In this case, do you think it is acceptable to make this change in
> > > > > > > > > the vsock's behavior with netns and ask the user to change the design?
> > > > > > > >
> > > > > > > > David's question is what would be a usecase that's broken
> > > > > > > > (as opposed to fixed) by enabling this by default.
> > > > > > >
> > > > > > > Yes, I got that. Thanks for clarifying.
> > > > > > > I just reported a broken example that can be fixed with a different
> > > > > > > design (due to the fact that before this series, vsock devices were
> > > > > > > accessible to all netns).
> > > > > > >
> > > > > > > >
> > > > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > > > module parameter isn't that.
> > > > > > >
> > > > > > > Okay, but I honestly can't find a case that can't be solved.
> > > > > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > > > > > a real case to come up.
> > > > > > >
> > > > > > > I'll try to see better if there's any particular case where we need
> > > > > > > to disable netns in vsock.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Stefano
> > > > > >
> > > > > > Me neither. so what did you have in mind when you wrote:
> > > > > > "could break existing applications"?
> > > > >
> > > > > I had in mind:
> > > > > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> > > > >    older versions will not work with newer Linux.
> > > >
> > > > meaning they will keep not working, right?
> > > 
> > > Right, I mean without this series they work, with this series they work
> > > only if the netns support is disabled or with a patch proposed but not
> > > merged in kata.
> > > 
> > > >
> > > > > 2. a single process running on init_netns that wants to communicate with
> > > > >    VMs handled by VMMs running in different netns, but this case can be
> > > > >    solved opening the /dev/vhost-vsock in the same netns of the process
> > > > >    that wants to communicate with the VMs (init_netns in this case), and
> > > > > >    passing it to the VMM.
> > > >
> > > > again right now they just don't work, right?
> > > 
> > > Right, as above.
> > > 
> > > What do you recommend I do?
> > 
> > Existing userspace applications must continue to work.
> > 
> > Guests are fine because G2H transports are always in the initial network
> > namespace.
> > 
> > On the host side we have a real case where Kata Containers and other
> > vsock users break.  Existing applications run in other network
> > namespaces and assume they can communicate over vsock (it's only
> > available in the initial network namespace by default).
> > 
> > It seems we cannot isolate new network namespaces from the initial
> > network namespace by default because it will break existing
> > applications.  That's a bummer.
> > 
> > There is one solution that maintains compatibility:
> > 
> > Introduce a per-namespace vsock isolation flag that can only transition
> > from false to true.  Once it becomes true it cannot be reset to false
> > anymore (for security).
> > 
> > When vsock isolation is false the initial network namespace is used for
> > <CID, port> addressing.
> > 
> > When vsock isolation is true the current namespace is used for <CID,
> > port> addressing.
> > 
> > I guess the vsock isolation flag would be set via a rtnetlink message,
> > but I haven't checked.
> > 
> > The upshot is: existing software doesn't benefit from namespaces for
> > vsock isolation but it continues to work!  New software makes 1 special
> > call after creating the namespace to opt in to vsock isolation.
> > 
> > This approach is secure because whoever sets up namespaces can
> > transition the flag from false to true and know that it can never be
> > reset to false anymore.
> > 
> > Does this make sense to everyone?
> > 
> > Stefan
> 
> Anything wrong with a separate device? whoever opens it decides
> whether netns will work ...

Your idea is better.  I think a separate device is the way to go.

Stefan


* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-01-16 17:24 [PATCH net-next 0/3] vsock: support network namespace Stefano Garzarella
                   ` (2 preceding siblings ...)
  2020-01-16 17:24 ` [PATCH net-next 3/3] vhost/vsock: use netns of process that opens the vhost-vsock device Stefano Garzarella
@ 2020-01-21 15:50 ` Stefan Hajnoczi
  2020-01-22  9:13   ` Stefano Garzarella
  2020-04-27 14:25 ` Stefano Garzarella
  2025-03-05  0:06 ` Bobby Eshleman
  5 siblings, 1 reply; 50+ messages in thread
From: Stefan Hajnoczi @ 2020-01-21 15:50 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: davem, netdev, linux-kernel, Jorgen Hansen, Jason Wang, kvm,
	virtualization, linux-hyperv, Michael S. Tsirkin, Dexuan Cui


What should vsock_dev_do_ioctl() IOCTL_VM_SOCKETS_GET_LOCAL_CID return?
The answer is probably dependent on the caller's network namespace.

Ultimately we may need per-namespace transports.  Imagine assigning a
G2H transport to a specific network namespace.

vsock_stream_connect() needs to be namespace-aware so that other
namespaces cannot use the G2H transport to send a connection
establishment packet.


* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-01-21 15:50 ` [PATCH net-next 0/3] vsock: support network namespace Stefan Hajnoczi
@ 2020-01-22  9:13   ` Stefano Garzarella
  0 siblings, 0 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-01-22  9:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: davem, netdev, linux-kernel, Jorgen Hansen, Jason Wang, kvm,
	virtualization, linux-hyperv, Michael S. Tsirkin, Dexuan Cui

On Tue, Jan 21, 2020 at 03:50:53PM +0000, Stefan Hajnoczi wrote:
> What should vsock_dev_do_ioctl() IOCTL_VM_SOCKETS_GET_LOCAL_CID return?
> The answer is probably dependent on the caller's network namespace.

Right, and I'm not handling this case. I'll fix it!

> 
> Ultimately we may need per-namespace transports.  Imagine assigning a
> G2H transport to a specific network namespace.

Agree.

> 
> vsock_stream_connect() needs to be namespace-aware so that other
> namespaces cannot use the G2H transport to send a connection
> establishment packet.

Right, maybe I can change vsock_assign_transport() to check the
namespace before it assigns a transport to a socket.
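
Roughly, the check could look like the user-space sketch below. This is
a model only, not the real vsock_assign_transport(): the model_* types
and the -1 return value are assumptions for illustration, and a
transport with no namespace set stands for the current behavior
(usable from any netns).

```c
#include <assert.h>
#include <stddef.h>

struct model_net {
	const char *name;
};

struct model_transport {
	const char *name;
	/* NULL means the transport is not bound to any netns. */
	const struct model_net *net;
};

struct model_sock {
	const struct model_net *net;		/* netns of the socket */
	const struct model_transport *transport;
};

/*
 * Assign 't' to 'sk' only if the namespaces match.
 * Returns 0 on success and -1 on a namespace mismatch.
 */
int model_assign_transport(struct model_sock *sk,
			   const struct model_transport *t)
{
	assert(sk != NULL && t != NULL);
	if (t->net != NULL && t->net != sk->net)
		return -1;
	sk->transport = t;
	return 0;
}
```

A socket in another netns would then fail before any connection
establishment packet reaches the G2H transport.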

I'll send a v2 handling these cases and implementing Michael's idea
of a separate /dev/vhost-vsock-netns device.

Thanks,
Stefano



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-01-16 17:24 [PATCH net-next 0/3] vsock: support network namespace Stefano Garzarella
                   ` (3 preceding siblings ...)
  2020-01-21 15:50 ` [PATCH net-next 0/3] vsock: support network namespace Stefan Hajnoczi
@ 2020-04-27 14:25 ` Stefano Garzarella
  2020-04-27 14:31   ` Michael S. Tsirkin
  2020-04-28  8:13   ` Jason Wang
  2025-03-05  0:06 ` Bobby Eshleman
  5 siblings, 2 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-04-27 14:25 UTC (permalink / raw)
  To: davem, Stefan Hajnoczi, Michael S. Tsirkin
  Cc: linux-kernel, Jorgen Hansen, Jason Wang, kvm, virtualization,
	linux-hyperv, Dexuan Cui, netdev

Hi David, Michael, Stefan,
I'm resuming work on this topic because the Kata developers are
interested in it, especially on the guest side.

While working on the v2 I had a few doubts, and I'd like to have your
suggestions:

 1. netns assigned to the device inside the guest

   Currently I assign this device to 'init_net'. Maybe it is better
   if we allow the user to decide which netns to assign to the device,
   or to disable this new feature and keep the same behavior as before
   (host reachable from any netns).
   I think we can handle this in the vsock core and not in the
   individual transports.

   The simplest way I found is to add a new
   IOCTL_VM_SOCKETS_ASSIGN_G2H_NETNS to /dev/vsock that enables the
   feature and assigns the device to the netns of the process that
   issues the ioctl(), but I'm not sure it is clean enough.

   Maybe it is better to add new rtnetlink messages, but I'm not sure if
   it is feasible since we don't have a netdev device.

   What do you suggest?


 2. netns assigned in the host

    As Michael suggested, I added a new /dev/vhost-vsock-netns to allow
    userspace applications to use this new feature, leaving
    /dev/vhost-vsock with the previous behavior (guest reachable from
    any netns).

    I like this approach, but I have these doubts:

    - Do I need to allocate a new minor for that device (e.g.
      VHOST_VSOCK_NETNS_MINOR), or is there an alternative way that I
      can use?

    - It is vhost-vsock specific. Should we provide something handled in
      the vsock core, maybe centralizing the CID allocation and adding a
      new IOCTL or rtnetlink message like for the guest side?
      (Maybe that could be a second step, and for now we can continue
      with the new device.)


Thanks for the help,
Stefano


On Thu, Jan 16, 2020 at 06:24:25PM +0100, Stefano Garzarella wrote:
> RFC -> v1:
>  * added 'netns' module param to vsock.ko to enable the
>    network namespace support (disabled by default)
>  * added 'vsock_net_eq()' to check the "net" assigned to a socket
>    only when 'netns' support is enabled
> 
> RFC: https://patchwork.ozlabs.org/cover/1202235/
> 
> Now that we have multi-transport upstream, I started to take a look to
> support network namespace in vsock.
> 
> As we partially discussed in the multi-transport proposal [1], it could
> be nice to support network namespace in vsock to reach the following
> goals:
> - isolate host applications from guest applications using the same ports
>   with CID_ANY
> - assign the same CID of VMs running in different network namespaces
> - partition VMs between VMMs or at finer granularity
> 
> This new feature is disabled by default, because it changes vsock's
> behavior with network namespaces and could break existing applications.
> It can be enabled with the new 'netns' module parameter of vsock.ko.
> 
> This implementation provides the following behavior:
> - packets received from the host (received by G2H transports) are
>   assigned to the default netns (init_net)
> - packets received from the guest (received by H2G - vhost-vsock) are
>   assigned to the netns of the process that opens /dev/vhost-vsock
>   (usually the VMM, qemu in my tests, opens the /dev/vhost-vsock)
>     - for vmci I need some suggestions, because I don't know how to do
>       and test the same in the vmci driver, for now vmci uses the
>       init_net
> - loopback packets are exchanged only in the same netns
> 
> I tested the series in this way:
> l0_host$ qemu-system-x86_64 -m 4G -M accel=kvm -smp 4 \
>             -drive file=/tmp/vsockvm0.img,if=virtio --nographic \
>             -device vhost-vsock-pci,guest-cid=3
> 
> l1_vm$ echo 1 > /sys/module/vsock/parameters/netns
> 
> l1_vm$ ip netns add ns1
> l1_vm$ ip netns add ns2
>  # same CID on different netns
> l1_vm$ ip netns exec ns1 qemu-system-x86_64 -m 1G -M accel=kvm -smp 2 \
>             -drive file=/tmp/vsockvm1.img,if=virtio --nographic \
>             -device vhost-vsock-pci,guest-cid=4
> l1_vm$ ip netns exec ns2 qemu-system-x86_64 -m 1G -M accel=kvm -smp 2 \
>             -drive file=/tmp/vsockvm2.img,if=virtio --nographic \
>             -device vhost-vsock-pci,guest-cid=4
> 
>  # all iperf3 listen on CID_ANY and port 5201, but in different netns
> l1_vm$ ./iperf3 --vsock -s # connection from l0 or guests started
>                            # on default netns (init_net)
> l1_vm$ ip netns exec ns1 ./iperf3 --vsock -s
> l1_vm$ ip netns exec ns2 ./iperf3 --vsock -s
> 
> l0_host$ ./iperf3 --vsock -c 3
> l2_vm1$ ./iperf3 --vsock -c 2
> l2_vm2$ ./iperf3 --vsock -c 2
> 
> [1] https://www.spinics.net/lists/netdev/msg575792.html
> 
> Stefano Garzarella (3):
>   vsock: add network namespace support
>   vsock/virtio_transport_common: handle netns of received packets
>   vhost/vsock: use netns of process that opens the vhost-vsock device
> 
>  drivers/vhost/vsock.c                   | 29 ++++++++++++-----
>  include/linux/virtio_vsock.h            |  2 ++
>  include/net/af_vsock.h                  |  7 +++--
>  net/vmw_vsock/af_vsock.c                | 41 +++++++++++++++++++------
>  net/vmw_vsock/hyperv_transport.c        |  5 +--
>  net/vmw_vsock/virtio_transport.c        |  2 ++
>  net/vmw_vsock/virtio_transport_common.c | 12 ++++++--
>  net/vmw_vsock/vmci_transport.c          |  5 +--
>  8 files changed, 78 insertions(+), 25 deletions(-)
> 
> -- 
> 2.24.1
> 



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-04-27 14:25 ` Stefano Garzarella
@ 2020-04-27 14:31   ` Michael S. Tsirkin
  2020-04-27 15:21     ` Stefano Garzarella
  2020-04-28  8:13   ` Jason Wang
  1 sibling, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2020-04-27 14:31 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: davem, Stefan Hajnoczi, linux-kernel, Jorgen Hansen, Jason Wang,
	kvm, virtualization, linux-hyperv, Dexuan Cui, netdev

On Mon, Apr 27, 2020 at 04:25:18PM +0200, Stefano Garzarella wrote:
> Hi David, Michael, Stefan,
> I'm resuming work on this topic because the Kata developers are
> interested in it, especially on the guest side.
> 
> While working on the v2 I had a few doubts, and I'd like to have your
> suggestions:
> 
>  1. netns assigned to the device inside the guest
> 
>    Currently I assign this device to 'init_net'. Maybe it is better
>    if we allow the user to decide which netns to assign to the device,
>    or to disable this new feature and keep the same behavior as before
>    (host reachable from any netns).
>    I think we can handle this in the vsock core and not in the
>    individual transports.
> 
>    The simplest way I found is to add a new
>    IOCTL_VM_SOCKETS_ASSIGN_G2H_NETNS to /dev/vsock that enables the
>    feature and assigns the device to the netns of the process that
>    issues the ioctl(), but I'm not sure it is clean enough.
> 
>    Maybe it is better to add new rtnetlink messages, but I'm not sure if
>    it is feasible since we don't have a netdev device.
> 
>    What do you suggest?

Maybe /dev/vsock-netns here too, like in the host?


> 
>  2. netns assigned in the host
> 
>     As Michael suggested, I added a new /dev/vhost-vsock-netns to allow
>     userspace applications to use this new feature, leaving
>     /dev/vhost-vsock with the previous behavior (guest reachable from
>     any netns).
> 
>     I like this approach, but I have these doubts:
> 
>     - Do I need to allocate a new minor for that device (e.g.
>       VHOST_VSOCK_NETNS_MINOR), or is there an alternative way that I
>       can use?

Not that I see. I agree it's a bit annoying. I'll think about it a bit.

>     - It is vhost-vsock specific. Should we provide something handled in
>       the vsock core, maybe centralizing the CID allocation and adding a
>       new IOCTL or rtnetlink message like for the guest side?
>       (Maybe that could be a second step, and for now we can continue
>       with the new device.)
> 
> 
> Thanks for the help,
> Stefano
> 
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-04-27 14:31   ` Michael S. Tsirkin
@ 2020-04-27 15:21     ` Stefano Garzarella
  0 siblings, 0 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-04-27 15:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: davem, Stefan Hajnoczi, linux-kernel, Jorgen Hansen, Jason Wang,
	kvm, virtualization, linux-hyperv, Dexuan Cui, netdev

On Mon, Apr 27, 2020 at 10:31:57AM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 27, 2020 at 04:25:18PM +0200, Stefano Garzarella wrote:
> > Hi David, Michael, Stefan,
> > I'm restarting to work on this topic since Kata guys are interested to
> > have that, especially on the guest side.
> > 
> > While working on the v2 I had few doubts, and I'd like to have your
> > suggestions:
> > 
> >  1. netns assigned to the device inside the guest
> > 
> >    Currently I assigned this device to 'init_net'. Maybe it is better
> >    if we allow the user to decide which netns assign to the device
> >    or to disable this new feature to have the same behavior as before
> >    (host reachable from any netns).
> >    I think we can handle this in the vsock core and not in the single
> >    transports.
> > 
> >    The simplest way that I found, is to add a new
> >    IOCTL_VM_SOCKETS_ASSIGN_G2H_NETNS to /dev/vsock to enable the feature
> >    and assign the device to the same netns of the process that do the
> >    ioctl(), but I'm not sure it is clean enough.
> > 
> >    Maybe it is better to add new rtnetlink messages, but I'm not sure if
> >    it is feasible since we don't have a netdev device.
> > 
> >    What do you suggest?
> 
> Maybe /dev/vsock-netns here too, like in the host?
> 

I'm not sure I get it.

In the guest, /dev/vsock is only used to get the CID assigned to the
guest through an ioctl().

In the virtio-vsock case, the guest transport is loaded when it is discovered
on the PCI bus, so we need a way to "move" it to a netns or to specify
which netns should be used when it is probed.
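
For reference, a minimal user-space sketch of that guest-side use of
/dev/vsock. It assumes the IOCTL_VM_SOCKETS_GET_LOCAL_CID definition from
<linux/vm_sockets.h>; the helper name vsock_local_cid() is made up for
illustration and is not part of any existing API:

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: read the local CID through the vsock device node.
 * Returns 0 and stores the CID in *cid on success, -1 on failure. */
int vsock_local_cid(const char *path, unsigned int *cid)
{
	int fd = open(path, O_RDONLY);
	int ret;

	if (fd < 0)
		return -1;

	/* The ioctl only reports the CID; it carries no netns information. */
	ret = ioctl(fd, IOCTL_VM_SOCKETS_GET_LOCAL_CID, cid);
	close(fd);
	return ret < 0 ? -1 : 0;
}
```

Calling vsock_local_cid("/dev/vsock", &cid) inside a guest should yield the
guest CID. Nothing in this path selects a namespace, which is why some other
mechanism would be needed for the transport itself.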

> 
> > 
> >  2. netns assigned in the host
> > 
> >     As Michael suggested, I added a new /dev/vhost-vsock-netns to allow
> >     userspace application to use this new feature, leaving to
> >     /dev/vhost-vsock the previous behavior (guest reachable from any
> >     netns).
> > 
> >     I like this approach, but I had these doubts:
> > 
> >     - I need to allocate a new minor for that device (e.g.
> >       VHOST_VSOCK_NETNS_MINOR) or is there an alternative way that I can
> >       use?
> 
> Not that I see. I agree it's a bit annoying. I'll think about it a bit.
> 

Thanks for that!
One idea I had was to add a new ioctl to /dev/vhost-vsock to enable
netns support, but I'm not sure it is a clean approach.

> >     - It is vhost-vsock specific, should we provide something handled in
> >       the vsock core, maybe centralizing the CID allocation and adding a
> >       new IOCTL or rtnetlink message like for the guest side?
> >       (maybe it could be a second step, and for now we can continue with
> >       the new device)
> > 

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-04-27 14:25 ` Stefano Garzarella
  2020-04-27 14:31   ` Michael S. Tsirkin
@ 2020-04-28  8:13   ` Jason Wang
  2020-04-28 16:00     ` Stefano Garzarella
  1 sibling, 1 reply; 50+ messages in thread
From: Jason Wang @ 2020-04-28  8:13 UTC (permalink / raw)
  To: Stefano Garzarella, davem, Stefan Hajnoczi, Michael S. Tsirkin
  Cc: linux-kernel, Jorgen Hansen, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev


On 2020/4/27 10:25 PM, Stefano Garzarella wrote:
> Hi David, Michael, Stefan,
> I'm restarting to work on this topic since Kata guys are interested to
> have that, especially on the guest side.
>
> While working on the v2 I had few doubts, and I'd like to have your
> suggestions:
>
>   1. netns assigned to the device inside the guest
>
>     Currently I assigned this device to 'init_net'. Maybe it is better
>     if we allow the user to decide which netns assign to the device
>     or to disable this new feature to have the same behavior as before
>     (host reachable from any netns).
>     I think we can handle this in the vsock core and not in the single
>     transports.
>
>     The simplest way that I found, is to add a new
>     IOCTL_VM_SOCKETS_ASSIGN_G2H_NETNS to /dev/vsock to enable the feature
>     and assign the device to the same netns of the process that do the
>     ioctl(), but I'm not sure it is clean enough.
>
>     Maybe it is better to add new rtnetlink messages, but I'm not sure if
>     it is feasible since we don't have a netdev device.
>
>     What do you suggest?


As we've discussed, it should probably be a netdev on either the guest or
the host side. Namespace support would then be much simpler to implement,
and no new API would be needed.

Thanks


>
>
>   2. netns assigned in the host
>
>      As Michael suggested, I added a new /dev/vhost-vsock-netns to allow
>      userspace application to use this new feature, leaving to
>      /dev/vhost-vsock the previous behavior (guest reachable from any
>      netns).
>
>      I like this approach, but I had these doubts:
>
>      - I need to allocate a new minor for that device (e.g.
>        VHOST_VSOCK_NETNS_MINOR) or is there an alternative way that I can
>        use?
>
>      - It is vhost-vsock specific, should we provide something handled in
>        the vsock core, maybe centralizing the CID allocation and adding a
>        new IOCTL or rtnetlink message like for the guest side?
>        (maybe it could be a second step, and for now we can continue with
>        the new device)
>
>
> Thanks for the help,
> Stefano
>
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-04-28  8:13   ` Jason Wang
@ 2020-04-28 16:00     ` Stefano Garzarella
  2020-04-29  9:21       ` Jason Wang
  2025-03-05  0:39       ` Bobby Eshleman
  0 siblings, 2 replies; 50+ messages in thread
From: Stefano Garzarella @ 2020-04-28 16:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: davem, Stefan Hajnoczi, Michael S. Tsirkin, linux-kernel,
	Jorgen Hansen, kvm, virtualization, linux-hyperv, Dexuan Cui,
	netdev

On Tue, Apr 28, 2020 at 04:13:22PM +0800, Jason Wang wrote:
> 
> On 2020/4/27 10:25 PM, Stefano Garzarella wrote:
> > Hi David, Michael, Stefan,
> > I'm restarting to work on this topic since Kata guys are interested to
> > have that, especially on the guest side.
> > 
> > While working on the v2 I had few doubts, and I'd like to have your
> > suggestions:
> > 
> >   1. netns assigned to the device inside the guest
> > 
> >     Currently I assigned this device to 'init_net'. Maybe it is better
> >     if we allow the user to decide which netns assign to the device
> >     or to disable this new feature to have the same behavior as before
> >     (host reachable from any netns).
> >     I think we can handle this in the vsock core and not in the single
> >     transports.
> > 
> >     The simplest way that I found, is to add a new
> >     IOCTL_VM_SOCKETS_ASSIGN_G2H_NETNS to /dev/vsock to enable the feature
> >     and assign the device to the same netns of the process that do the
> >     ioctl(), but I'm not sure it is clean enough.
> > 
> >     Maybe it is better to add new rtnetlink messages, but I'm not sure if
> >     it is feasible since we don't have a netdev device.
> > 
> >     What do you suggest?
> 
> 
> As we've discussed, it should be a netdev probably in either guest or host
> side. And it would be much simpler if we want do implement namespace then.
> No new API is needed.
> 

Thanks Jason!

It would be cool, but I don't have much experience with netdev.
Do you see any particular obstacles?

I'll take a look to understand how to do it. Having the vsock device as a
netdev would surely be very useful in the guest, and maybe also in the host.

Stefano


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-04-28 16:00     ` Stefano Garzarella
@ 2020-04-29  9:21       ` Jason Wang
  2025-03-05  0:39       ` Bobby Eshleman
  1 sibling, 0 replies; 50+ messages in thread
From: Jason Wang @ 2020-04-29  9:21 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: davem, Stefan Hajnoczi, Michael S. Tsirkin, linux-kernel,
	Jorgen Hansen, kvm, virtualization, linux-hyperv, Dexuan Cui,
	netdev


On 2020/4/29 12:00 AM, Stefano Garzarella wrote:
> On Tue, Apr 28, 2020 at 04:13:22PM +0800, Jason Wang wrote:
>> On 2020/4/27 10:25 PM, Stefano Garzarella wrote:
>>> Hi David, Michael, Stefan,
>>> I'm restarting to work on this topic since Kata guys are interested to
>>> have that, especially on the guest side.
>>>
>>> While working on the v2 I had few doubts, and I'd like to have your
>>> suggestions:
>>>
>>>    1. netns assigned to the device inside the guest
>>>
>>>      Currently I assigned this device to 'init_net'. Maybe it is better
>>>      if we allow the user to decide which netns assign to the device
>>>      or to disable this new feature to have the same behavior as before
>>>      (host reachable from any netns).
>>>      I think we can handle this in the vsock core and not in the single
>>>      transports.
>>>
>>>      The simplest way that I found, is to add a new
>>>      IOCTL_VM_SOCKETS_ASSIGN_G2H_NETNS to /dev/vsock to enable the feature
>>>      and assign the device to the same netns of the process that do the
>>>      ioctl(), but I'm not sure it is clean enough.
>>>
>>>      Maybe it is better to add new rtnetlink messages, but I'm not sure if
>>>      it is feasible since we don't have a netdev device.
>>>
>>>      What do you suggest?
>> As we've discussed, it should be a netdev probably in either guest or host
>> side. And it would be much simpler if we want do implement namespace then.
>> No new API is needed.
>>
> Thanks Jason!
>
> It would be cool, but I don't have much experience on netdev.
> Do you see any particular obstacles?


I don't see any, but if there are, we can try to find a solution or ask
netdev experts for help. I have heard in the past from somebody who was
interested in having a netdev.


>
> I'll take a look to understand how to do it, surely in the guest would
> be very useful to have the vsock device as a netdev and maybe also in the host.


Yes, it's worth a try. We would then have a unified management
interface and would benefit from it in the future.

Starting from the guest is a good idea, since it should be less complicated
than the host.

Thanks


>
> Stefano
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-01-16 17:24 [PATCH net-next 0/3] vsock: support network namespace Stefano Garzarella
                   ` (4 preceding siblings ...)
  2020-04-27 14:25 ` Stefano Garzarella
@ 2025-03-05  0:06 ` Bobby Eshleman
  2025-03-05  9:42   ` Stefano Garzarella
  5 siblings, 1 reply; 50+ messages in thread
From: Bobby Eshleman @ 2025-03-05  0:06 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: davem, netdev, linux-kernel, Jorgen Hansen, Jason Wang, kvm,
	Stefan Hajnoczi, virtualization, linux-hyperv, Michael S. Tsirkin,
	Dexuan Cui, Jakub Kicinski

On Thu, Jan 16, 2020 at 06:24:25PM +0100, Stefano Garzarella wrote:
> RFC -> v1:
>  * added 'netns' module param to vsock.ko to enable the
>    network namespace support (disabled by default)
>  * added 'vsock_net_eq()' to check the "net" assigned to a socket
>    only when 'netns' support is enabled
> 
> RFC: https://patchwork.ozlabs.org/cover/1202235/
> 
> Now that we have multi-transport upstream, I started to take a look to
> support network namespace in vsock.
> 
> As we partially discussed in the multi-transport proposal [1], it could
> be nice to support network namespace in vsock to reach the following
> goals:
> - isolate host applications from guest applications using the same ports
>   with CID_ANY
> - assign the same CID of VMs running in different network namespaces
> - partition VMs between VMMs or at finer granularity
> 
> This new feature is disabled by default, because it changes vsock's
> behavior with network namespaces and could break existing applications.
> It can be enabled with the new 'netns' module parameter of vsock.ko.
> 
> This implementation provides the following behavior:
> - packets received from the host (received by G2H transports) are
>   assigned to the default netns (init_net)
> - packets received from the guest (received by H2G - vhost-vsock) are
>   assigned to the netns of the process that opens /dev/vhost-vsock
>   (usually the VMM, qemu in my tests, opens the /dev/vhost-vsock)
>     - for vmci I need some suggestions, because I don't know how to do
>       and test the same in the vmci driver, for now vmci uses the
>       init_net
> - loopback packets are exchanged only in the same netns


Hey Stefano,

I recently picked up this series and am hoping to help update it / get
it merged to address a known use case. I have some questions and
thoughts (in other parts of this thread) and would love some
suggestions!

I already have a local branch with this updated with skbs and using
/dev/vhost-vsock-netns to opt-in the VM as per the discussion in this
thread.

One question: what is the behavior we expect from guest namespaces?  In
v2, you mentioned prototyping a /dev/vsock ioctl() to define the
namespace for the virtio-vsock device. This would mean only one
namespace could use vsock in the guest? Do we want to make sure that our
design makes it possible to support multiple namespaces in the future if
the use case arrives?

More questions/comments in other parts of this thread.

Thanks!

- Bobby


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 0/3] vsock: support network namespace
  2020-04-28 16:00     ` Stefano Garzarella
  2020-04-29  9:21       ` Jason Wang
@ 2025-03-05  0:39       ` Bobby Eshleman
  2025-03-05  5:46         ` Jason Wang
  2025-03-05  7:27         ` Michael S. Tsirkin
  1 sibling, 2 replies; 50+ messages in thread
From: Bobby Eshleman @ 2025-03-05  0:39 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Jason Wang, davem, Stefan Hajnoczi, Michael S. Tsirkin,
	linux-kernel, Jorgen Hansen, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev, Jakub Kicinski

On Tue, Apr 28, 2020 at 06:00:52PM +0200, Stefano Garzarella wrote:
> On Tue, Apr 28, 2020 at 04:13:22PM +0800, Jason Wang wrote:
> > 
> > 
> > As we've discussed, it should be a netdev probably in either guest or host
> > side. And it would be much simpler if we want do implement namespace then.
> > No new API is needed.
> > 
> 
> Thanks Jason!
> 
> It would be cool, but I don't have much experience on netdev.
> Do you see any particular obstacles?
> 
> I'll take a look to understand how to do it, surely in the guest would
> be very useful to have the vsock device as a netdev and maybe also in the host.
> 

WRT netdev, do we foresee big gains beyond just leveraging the netdev's
namespace?

IIUC, the idea is that we could follow the tcp/ip model and introduce
vsock-supported netdevs. This would allow us to have a netdev associated
with the virtio-vsock device and create virtual netdev pairs (i.e.,
veth) that can bridge namespaces. Then, allocate CIDs or configure port
mappings for those namespaces?

I think it might be a lot of complexity to bring into the picture from
netdev, and I'm not sure there is a big win since the vsock device could
also have a vsock->net itself? I think the complexity will come from the
address translation, which I don't think netdev buys us because there
would still be all of the work to support vsock in netfilter?

Some other thoughts I had: netdev's flow control features would all have
to be ignored or disabled somehow (I think dev_direct_xmit()?), because
queueing introduces packet loss and the vsock protocol is unable to
survive packet loss. Netfilter's ability to drop packets would have to
be disabled too.

Best,
Bobby

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-05  0:39       ` Bobby Eshleman
@ 2025-03-05  5:46         ` Jason Wang
  2025-03-10 20:14           ` Bobby Eshleman
  2025-03-05  7:27         ` Michael S. Tsirkin
  1 sibling, 1 reply; 50+ messages in thread
From: Jason Wang @ 2025-03-05  5:46 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Stefano Garzarella, davem, Stefan Hajnoczi, Michael S. Tsirkin,
	linux-kernel, Jorgen Hansen, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev, Jakub Kicinski

On Wed, Mar 5, 2025 at 8:39 AM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> On Tue, Apr 28, 2020 at 06:00:52PM +0200, Stefano Garzarella wrote:
> > On Tue, Apr 28, 2020 at 04:13:22PM +0800, Jason Wang wrote:
> > >
> > >
> > > As we've discussed, it should be a netdev probably in either guest or host
> > > side. And it would be much simpler if we want do implement namespace then.
> > > No new API is needed.
> > >
> >
> > Thanks Jason!
> >
> > It would be cool, but I don't have much experience on netdev.
> > Do you see any particular obstacles?
> >
> > I'll take a look to understand how to do it, surely in the guest would
> > be very useful to have the vsock device as a netdev and maybe also in the host.
> >
>
> WRT netdev, do we foresee big gains beyond just leveraging the netdev's
> namespace?

It would leverage the whole network subsystem (netdevice, steering, uAPI,
tracing, and probably a lot more), not only its namespaces. That avoids
duplicating existing mechanisms in a vsock-specific way. If we
manage to do that, namespace support will be a "byproduct".

>
> IIUC, the idea is that we could follow the tcp/ip model and introduce
> vsock-supported netdevs. This would allow us to have a netdev associated
> with the virtio-vsock device and create virtual netdev pairs (i.e.,
> veth) that can bridge namespaces. Then, allocate CIDs or configure port
> mappings for those namespaces?

Probably.

>
> I think it might be a lot of complexity to bring into the picture from
> netdev, and I'm not sure there is a big win since the vsock device could
> also have a vsock->net itself?

Yes, it can. I think we need to evaluate both approaches (that's why I
raised the approach of reusing netdevice). Let's hear from others.

> I think the complexity will come from the
> address translation, which I don't think netdev buys us because there
> would still be all of the work to support vsock in netfilter?

Netfilter should not be involved, since vsock would behave as a protocol
separate from TCP/IP (e.g. ETH_P_VSOCK) if we implement it as a netdevice.

>
> Some other thoughts I had: netdev's flow control features would all have
> to be ignored or disabled somehow (I think dev_direct_xmit()?), because
> queueing introduces packet loss and the vsock protocol is unable to
> survive packet loss.

Or just allow it, and treat configuring a qdisc that may drop packets
as a misconfiguration.

> Netfilter's ability to drop packets would have to
> be disabled too.
>
> Best,
> Bobby
>

Thanks


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-05  0:39       ` Bobby Eshleman
  2025-03-05  5:46         ` Jason Wang
@ 2025-03-05  7:27         ` Michael S. Tsirkin
  2025-03-05  9:30           ` Stefano Garzarella
  1 sibling, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2025-03-05  7:27 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Stefano Garzarella, Jason Wang, davem, Stefan Hajnoczi,
	linux-kernel, Jorgen Hansen, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev, Jakub Kicinski

On Tue, Mar 04, 2025 at 04:39:02PM -0800, Bobby Eshleman wrote:
> I think it might be a lot of complexity to bring into the picture from
> netdev, and I'm not sure there is a big win since the vsock device could
> also have a vsock->net itself? I think the complexity will come from the
> address translation, which I don't think netdev buys us because there
> would still be all of the work to support vsock in netfilter?

Ugh.

Guys, let's remember what vsock is.

It's a replacement for the serial device with an interface
that's easier for userspace to consume, as you get
the demultiplexing by the port number.

The whole point of vsock is that people do not want
any firewalling, filtering, or management on it.

It needs to work with no configuration even if networking is
misconfigured or blocked.
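
To make the port demultiplexing concrete, here is a minimal user-space
sketch of a stream listener, assuming the AF_VSOCK definitions from
<linux/vm_sockets.h>. The helper names vsock_any_addr() and vsock_listen()
are hypothetical and only serve the illustration:

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: address matching any local CID on the given port. */
struct sockaddr_vm vsock_any_addr(unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = VMADDR_CID_ANY;
	addr.svm_port = port;
	return addr;
}

/* Hypothetical helper: stream listener on 'port'; returns an fd or -1. */
int vsock_listen(unsigned int port)
{
	struct sockaddr_vm addr = vsock_any_addr(port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	/* No interface, route or firewall setup: the port is the only key. */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A peer then reaches the service with just a (CID, port) pair, with no
network configuration in between.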

-- 
MST


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2020-01-16 17:24 ` [PATCH net-next 1/3] vsock: add network namespace support Stefano Garzarella
  2020-01-20  9:06   ` David Miller
@ 2025-03-05  7:32   ` Michael S. Tsirkin
  2025-03-05  9:23     ` Stefano Garzarella
  1 sibling, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2025-03-05  7:32 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: davem, netdev, linux-kernel, Jorgen Hansen, Jason Wang, kvm,
	Stefan Hajnoczi, virtualization, linux-hyperv, Dexuan Cui

On Thu, Jan 16, 2020 at 06:24:26PM +0100, Stefano Garzarella wrote:
> This patch adds a check of the "net" assigned to a socket during
> the vsock_find_bound_socket() and vsock_find_connected_socket()
> to support network namespace, allowing to share the same address
> (cid, port) across different network namespaces.
> 
> This patch adds 'netns' module param to enable this new feature
> (disabled by default), because it changes vsock's behavior with
> network namespaces and could break existing applications.
> G2H transports will use the default network namespace (init_net).
> H2G transports can use different network namespace for different
> VMs.
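
The lookup rule described above can be modelled in user space. The sketch
below is not kernel code: the toy_* types and functions are made up for
illustration, and only the matching logic (address match, or port match with
a wildcard CID, in both cases gated by the namespace check) is meant to
mirror what the patch does in __vsock_find_bound_socket() and
vsock_net_eq():

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel types; none of this is kernel code. */
struct toy_net { int id; };

struct toy_sock {
	unsigned int cid;
	unsigned int port;
	const struct toy_net *net;
};

#define TOY_CID_ANY 0xffffffffU	/* same value as VMADDR_CID_ANY (-1U) */

/* Mirrors the patch's vsock_net_eq(): namespaces only matter when the
 * 'netns' switch is on. */
static bool toy_net_eq(bool netns, const struct toy_net *a,
		       const struct toy_net *b)
{
	return !netns || a == b;
}

/* Return the first bound socket matching (cid, port) in namespace 'net'. */
const struct toy_sock *toy_find_bound(const struct toy_sock *tbl, size_t n,
				      unsigned int cid, unsigned int port,
				      const struct toy_net *net, bool netns)
{
	for (size_t i = 0; i < n; i++) {
		const struct toy_sock *s = &tbl[i];

		if (s->port != port || !toy_net_eq(netns, net, s->net))
			continue;
		if (s->cid == cid || s->cid == TOY_CID_ANY ||
		    cid == TOY_CID_ANY)
			return s;
	}
	return NULL;
}
```

With two listeners bound to the wildcard CID and the same port in different
namespaces, the model returns the listener of the caller's namespace when
'netns' is set, and simply the first one in the table when it is not.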


I'm not sure I understand the use case. Can you explain a bit more,
please?

> 
> This patch uses the default network namespace (init_net) in all
> transports.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> RFC -> v1
>  * added 'netns' module param
>  * added 'vsock_net_eq()' to check the "net" assigned to a socket
>    only when 'netns' support is enabled
> ---
>  include/net/af_vsock.h                  |  7 +++--
>  net/vmw_vsock/af_vsock.c                | 41 +++++++++++++++++++------
>  net/vmw_vsock/hyperv_transport.c        |  5 +--
>  net/vmw_vsock/virtio_transport_common.c |  5 +--
>  net/vmw_vsock/vmci_transport.c          |  5 +--
>  5 files changed, 46 insertions(+), 17 deletions(-)
> 
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index b1c717286993..015913601fad 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -193,13 +193,16 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected);
>  void vsock_insert_connected(struct vsock_sock *vsk);
>  void vsock_remove_bound(struct vsock_sock *vsk);
>  void vsock_remove_connected(struct vsock_sock *vsk);
> -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
> +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net);
>  struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> -					 struct sockaddr_vm *dst);
> +					 struct sockaddr_vm *dst,
> +					 struct net *net);
>  void vsock_remove_sock(struct vsock_sock *vsk);
>  void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
>  int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
>  bool vsock_find_cid(unsigned int cid);
> +bool vsock_net_eq(const struct net *net1, const struct net *net2);
> +struct net *vsock_default_net(void);
>  
>  /**** TAP ****/
>  
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 9c5b2a91baad..457ccd677756 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -140,6 +140,10 @@ static const struct vsock_transport *transport_dgram;
>  static const struct vsock_transport *transport_local;
>  static DEFINE_MUTEX(vsock_register_mutex);
>  
> +static bool netns;
> +module_param(netns, bool, 0644);
> +MODULE_PARM_DESC(netns, "Enable network namespace support");
> +
>  /**** UTILS ****/
>  
>  /* Each bound VSocket is stored in the bind hash table and each connected
> @@ -226,15 +230,18 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
>  	sock_put(&vsk->sk);
>  }
>  
> -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> +static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr,
> +					      struct net *net)
>  {
>  	struct vsock_sock *vsk;
>  
>  	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
> -		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> +		if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
> +		    vsock_net_eq(net, sock_net(sk_vsock(vsk))))
>  			return sk_vsock(vsk);
>  
>  		if (addr->svm_port == vsk->local_addr.svm_port &&
> +		    vsock_net_eq(net, sock_net(sk_vsock(vsk))) &&
>  		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
>  		     addr->svm_cid == VMADDR_CID_ANY))
>  			return sk_vsock(vsk);
> @@ -244,13 +251,15 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>  }
>  
>  static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> -						  struct sockaddr_vm *dst)
> +						  struct sockaddr_vm *dst,
> +						  struct net *net)
>  {
>  	struct vsock_sock *vsk;
>  
>  	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
>  			    connected_table) {
>  		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
> +		    vsock_net_eq(net, sock_net(sk_vsock(vsk))) &&
>  		    dst->svm_port == vsk->local_addr.svm_port) {
>  			return sk_vsock(vsk);
>  		}
> @@ -295,12 +304,12 @@ void vsock_remove_connected(struct vsock_sock *vsk)
>  }
>  EXPORT_SYMBOL_GPL(vsock_remove_connected);
>  
> -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net)
>  {
>  	struct sock *sk;
>  
>  	spin_lock_bh(&vsock_table_lock);
> -	sk = __vsock_find_bound_socket(addr);
> +	sk = __vsock_find_bound_socket(addr, net);
>  	if (sk)
>  		sock_hold(sk);
>  
> @@ -311,12 +320,13 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
>  EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
>  
>  struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> -					 struct sockaddr_vm *dst)
> +					 struct sockaddr_vm *dst,
> +					 struct net *net)
>  {
>  	struct sock *sk;
>  
>  	spin_lock_bh(&vsock_table_lock);
> -	sk = __vsock_find_connected_socket(src, dst);
> +	sk = __vsock_find_connected_socket(src, dst, net);
>  	if (sk)
>  		sock_hold(sk);
>  
> @@ -488,6 +498,18 @@ bool vsock_find_cid(unsigned int cid)
>  }
>  EXPORT_SYMBOL_GPL(vsock_find_cid);
>  
> +bool vsock_net_eq(const struct net *net1, const struct net *net2)
> +{
> +	return !netns || net_eq(net1, net2);
> +}
> +EXPORT_SYMBOL_GPL(vsock_net_eq);
> +
> +struct net *vsock_default_net(void)
> +{
> +	return &init_net;
> +}
> +EXPORT_SYMBOL_GPL(vsock_default_net);
> +
>  static struct sock *vsock_dequeue_accept(struct sock *listener)
>  {
>  	struct vsock_sock *vlistener;
> @@ -586,6 +608,7 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
>  {
>  	static u32 port;
>  	struct sockaddr_vm new_addr;
> +	struct net *net = sock_net(sk_vsock(vsk));
>  
>  	if (!port)
>  		port = LAST_RESERVED_PORT + 1 +
> @@ -603,7 +626,7 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
>  
>  			new_addr.svm_port = port++;
>  
> -			if (!__vsock_find_bound_socket(&new_addr)) {
> +			if (!__vsock_find_bound_socket(&new_addr, net)) {
>  				found = true;
>  				break;
>  			}
> @@ -620,7 +643,7 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
>  			return -EACCES;
>  		}
>  
> -		if (__vsock_find_bound_socket(&new_addr))
> +		if (__vsock_find_bound_socket(&new_addr, net))
>  			return -EADDRINUSE;
>  	}
>  
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index b3bdae74c243..237c53316d70 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -201,7 +201,8 @@ static void hvs_remote_addr_init(struct sockaddr_vm *remote,
>  
>  		remote->svm_port = host_ephemeral_port++;
>  
> -		sk = vsock_find_connected_socket(remote, local);
> +		sk = vsock_find_connected_socket(remote, local,
> +						 vsock_default_net());
>  		if (!sk) {
>  			/* Found an available ephemeral port */
>  			return;
> @@ -350,7 +351,7 @@ static void hvs_open_connection(struct vmbus_channel *chan)
>  		return;
>  
>  	hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
> -	sk = vsock_find_bound_socket(&addr);
> +	sk = vsock_find_bound_socket(&addr, vsock_default_net());
>  	if (!sk)
>  		return;
>  
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index d9f0c9c5425a..cecdfd91ed00 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1088,6 +1088,7 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
>  void virtio_transport_recv_pkt(struct virtio_transport *t,
>  			       struct virtio_vsock_pkt *pkt)
>  {
> +	struct net *net = vsock_default_net();
>  	struct sockaddr_vm src, dst;
>  	struct vsock_sock *vsk;
>  	struct sock *sk;
> @@ -1115,9 +1116,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  	/* The socket must be in connected or bound table
>  	 * otherwise send reset back
>  	 */
> -	sk = vsock_find_connected_socket(&src, &dst);
> +	sk = vsock_find_connected_socket(&src, &dst, net);
>  	if (!sk) {
> -		sk = vsock_find_bound_socket(&dst);
> +		sk = vsock_find_bound_socket(&dst, net);
>  		if (!sk) {
>  			(void)virtio_transport_reset_no_sock(t, pkt);
>  			goto free_pkt;
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index 4b8b1150a738..3ad15d51b30b 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -669,6 +669,7 @@ static bool vmci_transport_stream_allow(u32 cid, u32 port)
>  
>  static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
>  {
> +	struct net *net = vsock_default_net();
>  	struct sock *sk;
>  	struct sockaddr_vm dst;
>  	struct sockaddr_vm src;
> @@ -702,9 +703,9 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
>  	vsock_addr_init(&src, pkt->dg.src.context, pkt->src_port);
>  	vsock_addr_init(&dst, pkt->dg.dst.context, pkt->dst_port);
>  
> -	sk = vsock_find_connected_socket(&src, &dst);
> +	sk = vsock_find_connected_socket(&src, &dst, net);
>  	if (!sk) {
> -		sk = vsock_find_bound_socket(&dst);
> +		sk = vsock_find_bound_socket(&dst, net);
>  		if (!sk) {
>  			/* We could not find a socket for this specified
>  			 * address.  If this packet is a RST, we just drop it.
> -- 
> 2.24.1
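
As a rough illustration of the lookup change in the patch above, the
following userspace sketch models how the extra namespace check behaves.
All model_* names are hypothetical stand-ins, not kernel APIs: pointer
comparison stands in for net_eq(), and a flat array stands in for the
bind hash table. Only the exact-address branch is modeled; the
VMADDR_CID_ANY wildcard branch is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net. */
struct model_net {
	int id;
};

/* Hypothetical stand-in for a bound vsock socket. */
struct model_sock {
	unsigned int cid;
	unsigned int port;
	const struct model_net *net;
};

/* Models the 'netns' module parameter (disabled by default). */
static bool model_netns_enabled;

/* Mirrors the shape of vsock_net_eq(): when the feature is off,
 * every namespace compares equal, which preserves the old behavior.
 */
bool model_net_eq(const struct model_net *a, const struct model_net *b)
{
	return !model_netns_enabled || a == b;
}

/* Returns the first socket bound to (cid, port) that is visible from
 * 'net', or NULL if there is none.
 */
const struct model_sock *model_find_bound(const struct model_sock *table,
					  size_t n, unsigned int cid,
					  unsigned int port,
					  const struct model_net *net)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].cid == cid && table[i].port == port &&
		    model_net_eq(net, table[i].net))
			return &table[i];
	}
	return NULL;
}
```

With the flag off, two entries with the same (cid, port) cannot be told
apart and the first one in the table wins; with the flag on, a lookup
from each namespace resolves to that namespace's own socket.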



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2025-03-05  7:32   ` Michael S. Tsirkin
@ 2025-03-05  9:23     ` Stefano Garzarella
  2025-03-05  9:28       ` Michael S. Tsirkin
  2025-03-05 15:54       ` Bobby Eshleman
  0 siblings, 2 replies; 50+ messages in thread
From: Stefano Garzarella @ 2025-03-05  9:23 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jakub Kicinski
  Cc: davem, netdev, linux-kernel, Jorgen Hansen, Jason Wang, kvm,
	Stefan Hajnoczi, virtualization, linux-hyperv, Dexuan Cui,
	Bobby Eshleman

On Wed, 5 Mar 2025 at 08:32, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jan 16, 2020 at 06:24:26PM +0100, Stefano Garzarella wrote:
> > This patch adds a check of the "net" assigned to a socket during
> > the vsock_find_bound_socket() and vsock_find_connected_socket()
> > to support network namespace, allowing to share the same address
> > (cid, port) across different network namespaces.
> >
> > This patch adds 'netns' module param to enable this new feature
> > (disabled by default), because it changes vsock's behavior with
> > network namespaces and could break existing applications.
> > G2H transports will use the default network namespace (init_net).
> > H2G transports can use different network namespace for different
> > VMs.
>
>
> I'm not sure I understand the usecase. Can you explain a bit more,
> please?

It's been five years, but I'm trying!
We are tracking this RFE here [1].

I'm also adding Jakub to the thread, with whom I discussed a possible
restart of this effort last year; he could add more use cases.

The current problem with vsock on the host side is that if you launch a
VM with a virtio-vsock device (using vhost) inside a container (e.g.,
Kata), and therefore inside a network namespace, it is reachable from
any other container, whereas users would like some isolation. Also, the
CID space is shared by all containers, while users would like to reuse
the same CID in different namespaces.

This has been partially solved with vhost-user-vsock, but it is 
inconvenient to use sometimes because of the hybrid-vsock problem 
(host-side vsock is remapped to AF_UNIX).

Something from the cover letter of the series [2]:

  As we partially discussed in the multi-transport proposal, it could
  be nice to support network namespace in vsock to reach the following
  goals:
  - isolate host applications from guest applications using the same ports
    with CID_ANY
  - assign the same CID of VMs running in different network namespaces
  - partition VMs between VMMs or at finer granularity

Thanks,
Stefano

[1] https://gitlab.com/vsock/vsock/-/issues/2
[2] https://lore.kernel.org/virtualization/20200116172428.311437-1-sgarzare@redhat.com/



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2025-03-05  9:23     ` Stefano Garzarella
@ 2025-03-05  9:28       ` Michael S. Tsirkin
  2025-03-05  9:37         ` Stefano Garzarella
  2025-03-05 15:54       ` Bobby Eshleman
  1 sibling, 1 reply; 50+ messages in thread
From: Michael S. Tsirkin @ 2025-03-05  9:28 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Jakub Kicinski, davem, netdev, linux-kernel, Jorgen Hansen,
	Jason Wang, kvm, Stefan Hajnoczi, virtualization, linux-hyperv,
	Dexuan Cui, Bobby Eshleman

On Wed, Mar 05, 2025 at 10:23:08AM +0100, Stefano Garzarella wrote:
> On Wed, 5 Mar 2025 at 08:32, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jan 16, 2020 at 06:24:26PM +0100, Stefano Garzarella wrote:
> > > This patch adds a check of the "net" assigned to a socket during
> > > the vsock_find_bound_socket() and vsock_find_connected_socket()
> > > to support network namespace, allowing to share the same address
> > > (cid, port) across different network namespaces.
> > >
> > > This patch adds 'netns' module param to enable this new feature
> > > (disabled by default), because it changes vsock's behavior with
> > > network namespaces and could break existing applications.
> > > G2H transports will use the default network namespace (init_net).
> > > H2G transports can use different network namespace for different
> > > VMs.
> >
> >
> > I'm not sure I understand the usecase. Can you explain a bit more,
> > please?
> 
> It's been five years, but I'm trying!
> We are tracking this RFE here [1].
> 
> I also add Jakub in the thread with who I discussed last year a possible 
> restart of this effort, he could add more use cases.
> 
> The problem with vsock, host-side, currently is that if you launch a VM 
> with a virtio-vsock device (using vhost) inside a container (e.g., 
> Kata), so inside a network namespace, it is reachable from any other 
> container, whereas they would like some isolation. Also the CID is 
> shared among all, while they would like to reuse the same CID in 
> different namespaces.
> 
> This has been partially solved with vhost-user-vsock, but it is 
> inconvenient to use sometimes because of the hybrid-vsock problem 
> (host-side vsock is remapped to AF_UNIX).
> 
> Something from the cover letter of the series [2]:
> 
>   As we partially discussed in the multi-transport proposal, it could
>   be nice to support network namespace in vsock to reach the following
>   goals:
>   - isolate host applications from guest applications using the same ports
>     with CID_ANY
>   - assign the same CID of VMs running in different network namespaces
>   - partition VMs between VMMs or at finer granularity
> 
> Thanks,
> Stefano
> 
> [1] https://gitlab.com/vsock/vsock/-/issues/2
> [2] https://lore.kernel.org/virtualization/20200116172428.311437-1-sgarzare@redhat.com/


Ok so, host side. I get it. And the problem with your patches is that
they affect the guest side. Fix that, basically.

-- 
MST



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-05  7:27         ` Michael S. Tsirkin
@ 2025-03-05  9:30           ` Stefano Garzarella
  2025-03-05 16:09             ` Bobby Eshleman
  2025-03-06  0:16             ` Jason Wang
  0 siblings, 2 replies; 50+ messages in thread
From: Stefano Garzarella @ 2025-03-05  9:30 UTC (permalink / raw)
  To: Michael S. Tsirkin, Bobby Eshleman, Jakub Kicinski
  Cc: Jason Wang, davem, Stefan Hajnoczi, linux-kernel, Jorgen Hansen,
	kvm, virtualization, linux-hyperv, Dexuan Cui, netdev,
	Jakub Kicinski

On Wed, Mar 05, 2025 at 02:27:12AM -0500, Michael S. Tsirkin wrote:
>On Tue, Mar 04, 2025 at 04:39:02PM -0800, Bobby Eshleman wrote:
>> I think it might be a lot of complexity to bring into the picture from
>> netdev, and I'm not sure there is a big win since the vsock device could
>> also have a vsock->net itself? I think the complexity will come from the
>> address translation, which I don't think netdev buys us because there
>> would still be all of the work to support vsock in netfilter?
>
>Ugh.
>
>Guys, let's remember what vsock is.
>
>It's a replacement for the serial device with an interface
>that's easier for userspace to consume, as you get
>the demultiplexing by the port number.
>
>The whole point of vsock is that people do not want
>any firewalling, filtering, or management on it.
>
>It needs to work with no configuration even if networking is
>misconfigured or blocked.

I agree with Michael here.

It's been 5 years and my memory is bad, but using netdev seemed like a 
mess, especially because in vsock we don't have anything related to 
IP/Ethernet/ARP, etc.

I see vsock more as AF_UNIX than netdev.

I've CC'd Jakub, who was covering network namespaces; maybe he has some
advice for us on this. Context: [1].

Thanks,
Stefano

[1] https://lore.kernel.org/netdev/Z8edJjqAqAaV3Vkt@devvm6277.cco0.facebook.com/



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2025-03-05  9:28       ` Michael S. Tsirkin
@ 2025-03-05  9:37         ` Stefano Garzarella
  0 siblings, 0 replies; 50+ messages in thread
From: Stefano Garzarella @ 2025-03-05  9:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jakub Kicinski, davem, netdev, linux-kernel, Jorgen Hansen,
	Jason Wang, kvm, Stefan Hajnoczi, virtualization, linux-hyperv,
	Dexuan Cui, Bobby Eshleman

On Wed, 5 Mar 2025 at 10:29, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Mar 05, 2025 at 10:23:08AM +0100, Stefano Garzarella wrote:
> > On Wed, 5 Mar 2025 at 08:32, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jan 16, 2020 at 06:24:26PM +0100, Stefano Garzarella wrote:
> > > > This patch adds a check of the "net" assigned to a socket during
> > > > the vsock_find_bound_socket() and vsock_find_connected_socket()
> > > > to support network namespace, allowing to share the same address
> > > > (cid, port) across different network namespaces.
> > > >
> > > > This patch adds 'netns' module param to enable this new feature
> > > > (disabled by default), because it changes vsock's behavior with
> > > > network namespaces and could break existing applications.
> > > > G2H transports will use the default network namespace (init_net).
> > > > H2G transports can use different network namespace for different
> > > > VMs.
> > >
> > >
> > > I'm not sure I understand the usecase. Can you explain a bit more,
> > > please?
> >
> > It's been five years, but I'm trying!
> > We are tracking this RFE here [1].
> >
> > I also add Jakub in the thread with who I discussed last year a possible
> > restart of this effort, he could add more use cases.
> >
> > The problem with vsock, host-side, currently is that if you launch a VM
> > with a virtio-vsock device (using vhost) inside a container (e.g.,
> > Kata), so inside a network namespace, it is reachable from any other
> > container, whereas they would like some isolation. Also the CID is
> > shared among all, while they would like to reuse the same CID in
> > different namespaces.
> >
> > This has been partially solved with vhost-user-vsock, but it is
> > inconvenient to use sometimes because of the hybrid-vsock problem
> > (host-side vsock is remapped to AF_UNIX).
> >
> > Something from the cover letter of the series [2]:
> >
> >   As we partially discussed in the multi-transport proposal, it could
> >   be nice to support network namespace in vsock to reach the following
> >   goals:
> >   - isolate host applications from guest applications using the same ports
> >     with CID_ANY
> >   - assign the same CID of VMs running in different network namespaces
> >   - partition VMs between VMMs or at finer granularity
> >
> > Thanks,
> > Stefano
> >
> > [1] https://gitlab.com/vsock/vsock/-/issues/2
> > [2] https://lore.kernel.org/virtualization/20200116172428.311437-1-sgarzare@redhat.com/
>
>
> Ok so, host side. I get it.

Now that we're talking about it, I also remembered a guest-side use
case, again related to containers and possibly nested VMs.

If you launch a container in an L1 guest, for example to launch a
nested VM, you may not want it to communicate with the L0 host, so it
would be desirable to be able to isolate the virtio-vsock device from
it.

> And the problem with your patches is that
> they affect the guest side. Fix that, basically.

My main problem, IIRC, was making sure the old behavior was still
allowed (though we may have solved that with two devices,
/dev/vhost-vsock and /dev/vhost-vsock-netns).

The other problem was really in the guest: how to tell whether the
virtio-vsock device (and thus communication with the host) is reachable
from a given network namespace.

Stefano



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-05  0:06 ` Bobby Eshleman
@ 2025-03-05  9:42   ` Stefano Garzarella
  2025-03-05 17:14     ` Bobby Eshleman
  0 siblings, 1 reply; 50+ messages in thread
From: Stefano Garzarella @ 2025-03-05  9:42 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: davem, netdev, linux-kernel, Jorgen Hansen, Jason Wang, kvm,
	Stefan Hajnoczi, virtualization, linux-hyperv, Michael S. Tsirkin,
	Dexuan Cui, Jakub Kicinski

On Tue, Mar 04, 2025 at 04:06:02PM -0800, Bobby Eshleman wrote:
>On Thu, Jan 16, 2020 at 06:24:25PM +0100, Stefano Garzarella wrote:
>> RFC -> v1:
>>  * added 'netns' module param to vsock.ko to enable the
>>    network namespace support (disabled by default)
>>  * added 'vsock_net_eq()' to check the "net" assigned to a socket
>>    only when 'netns' support is enabled
>>
>> RFC: https://patchwork.ozlabs.org/cover/1202235/
>>
>> Now that we have multi-transport upstream, I started to take a look to
>> support network namespace in vsock.
>>
>> As we partially discussed in the multi-transport proposal [1], it could
>> be nice to support network namespace in vsock to reach the following
>> goals:
>> - isolate host applications from guest applications using the same ports
>>   with CID_ANY
>> - assign the same CID of VMs running in different network namespaces
>> - partition VMs between VMMs or at finer granularity
>>
>> This new feature is disabled by default, because it changes vsock's
>> behavior with network namespaces and could break existing applications.
>> It can be enabled with the new 'netns' module parameter of vsock.ko.
>>
>> This implementation provides the following behavior:
>> - packets received from the host (received by G2H transports) are
>>   assigned to the default netns (init_net)
>> - packets received from the guest (received by H2G - vhost-vsock) are
>>   assigned to the netns of the process that opens /dev/vhost-vsock
>>   (usually the VMM, qemu in my tests, opens the /dev/vhost-vsock)
>>     - for vmci I need some suggestions, because I don't know how to do
>>       and test the same in the vmci driver, for now vmci uses the
>>       init_net
>> - loopback packets are exchanged only in the same netns
>
>
>Hey Stefano,
>
>I recently picked up this series and am hoping to help update it / get
>it merged to address a known use case. I have some questions and
>thoughts (in other parts of this thread) and would love some
>suggestions!

Great!

>
>I already have a local branch with this updated with skbs and using
>/dev/vhost-vsock-netns to opt-in the VM as per the discussion in this
>thread.
>
>One question: what is the behavior we expect from guest namespaces?  In
>v2, you mentioned prototyping a /dev/vsock ioctl() to define the
>namespace for the virtio-vsock device. This would mean only one
>namespace could use vsock in the guest? Do we want to make sure that our
>design makes it possible to support multiple namespaces in the future if
>the use case arrives?

Yes, I guess it makes sense that multiple namespaces can communicate 
with the host and then use the virtio-vsock device!

IIRC, the main use case here was also nested VMs. So a netns could be 
used to isolate a nested VM in L1 and it may not need to talk to L0, so 
the software in the L1 netns can use vsock, but only to talk to L2.

>
>More questions/comments in other parts of this thread.

Sure, I'm happy to help with this effort with discussions/reviews!

Stefano



* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2025-03-05  9:23     ` Stefano Garzarella
  2025-03-05  9:28       ` Michael S. Tsirkin
@ 2025-03-05 15:54       ` Bobby Eshleman
  2025-03-05 16:07         ` Stefano Garzarella
  1 sibling, 1 reply; 50+ messages in thread
From: Bobby Eshleman @ 2025-03-05 15:54 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, Jakub Kicinski, davem, netdev, linux-kernel,
	Jorgen Hansen, Jason Wang, kvm, Stefan Hajnoczi, virtualization,
	linux-hyperv, Dexuan Cui

On Wed, Mar 05, 2025 at 10:23:08AM +0100, Stefano Garzarella wrote:
> On Wed, 5 Mar 2025 at 08:32, Michael S. Tsirkin <mst@redhat.com> wrote:
> >

[...]

> >
> >
> > I'm not sure I understand the usecase. Can you explain a bit more,
> > please?
> 
> It's been five years, but I'm trying!
> We are tracking this RFE here [1].
> 
> I also add Jakub in the thread with who I discussed last year a possible 
> restart of this effort, he could add more use cases.
> 
> The problem with vsock, host-side, currently is that if you launch a VM 
> with a virtio-vsock device (using vhost) inside a container (e.g., 
> Kata), so inside a network namespace, it is reachable from any other 
> container, whereas they would like some isolation. Also the CID is 
> shared among all, while they would like to reuse the same CID in 
> different namespaces.
> 
> This has been partially solved with vhost-user-vsock, but it is 
> inconvenient to use sometimes because of the hybrid-vsock problem 
> (host-side vsock is remapped to AF_UNIX).
> 
> Something from the cover letter of the series [2]:
> 
>   As we partially discussed in the multi-transport proposal, it could
>   be nice to support network namespace in vsock to reach the following
>   goals:
>   - isolate host applications from guest applications using the same ports
>     with CID_ANY
>   - assign the same CID of VMs running in different network namespaces
>   - partition VMs between VMMs or at finer granularity
> 
> Thanks,
> Stefano
> 

Do you know of any use cases for guest-side vsock netns?

Our use case is also host-side. vsock is used to communicate with a
host-side shim/proxy/debug console. Each vmm and these components share
a namespace and are isolated from other vmm + components. The VM
connects back to the host via vsock after startup and communicates its
port of choice out-of-band (fw_cfg).  The main problem is in security:
untrusted VM programs can potentially connect with and exploit the
host-side vsock services meant for other VMs. If vsock respected
namespaces, then these host-side services would be unreachable by other
VMs and protected.  Namespaces would also allow the vsock port to be
static across VMs, and avoid the need for the out-of-band mechanism for
communicating the port.
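
A minimal sketch of the kind of host-side service described above,
assuming a fixed port chosen by the deployment. The example_* helper
names are illustrative only. The code uses only the standard AF_VSOCK
userspace API (struct sockaddr_vm and VMADDR_CID_ANY from
<linux/vm_sockets.h>). Whether such a listener is reachable only from
VMs in the same namespace depends on the namespace support discussed in
this thread, which this sketch does not assume is available.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fill a vsock address for the given CID and port. */
void example_vsock_addr(struct sockaddr_vm *addr, unsigned int cid,
			unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
}

/* Create a stream socket listening on 'port' for any local CID.
 * Returns the listening fd, or -1 on failure (errno is set by the
 * failing call).
 */
int example_vsock_listen(unsigned int port)
{
	struct sockaddr_vm addr;
	int fd;

	fd = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	example_vsock_addr(&addr, VMADDR_CID_ANY, port);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

If every VMM ran in its own namespace, each host-side service could call
example_vsock_listen() with the same port number, which is the "static
port" property mentioned above.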

Jakub can jump in to add anything, but I think this is the same use case
/ user he was probably referring to.

Best,
Bobby


* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2025-03-05 15:54       ` Bobby Eshleman
@ 2025-03-05 16:07         ` Stefano Garzarella
  2025-03-05 20:19           ` Bobby Eshleman
  0 siblings, 1 reply; 50+ messages in thread
From: Stefano Garzarella @ 2025-03-05 16:07 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Michael S. Tsirkin, Jakub Kicinski, davem, netdev, linux-kernel,
	Jorgen Hansen, Jason Wang, kvm, Stefan Hajnoczi, virtualization,
	linux-hyperv, Dexuan Cui

On Wed, 5 Mar 2025 at 16:55, Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> On Wed, Mar 05, 2025 at 10:23:08AM +0100, Stefano Garzarella wrote:
> > On Wed, 5 Mar 2025 at 08:32, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
>
> [...]
>
> > >
> > >
> > > I'm not sure I understand the usecase. Can you explain a bit more,
> > > please?
> >
> > It's been five years, but I'm trying!
> > We are tracking this RFE here [1].
> >
> > I also add Jakub in the thread with who I discussed last year a possible
> > restart of this effort, he could add more use cases.
> >
> > The problem with vsock, host-side, currently is that if you launch a VM
> > with a virtio-vsock device (using vhost) inside a container (e.g.,
> > Kata), so inside a network namespace, it is reachable from any other
> > container, whereas they would like some isolation. Also the CID is
> > shared among all, while they would like to reuse the same CID in
> > different namespaces.
> >
> > This has been partially solved with vhost-user-vsock, but it is
> > inconvenient to use sometimes because of the hybrid-vsock problem
> > (host-side vsock is remapped to AF_UNIX).
> >
> > Something from the cover letter of the series [2]:
> >
> >   As we partially discussed in the multi-transport proposal, it could
> >   be nice to support network namespace in vsock to reach the following
> >   goals:
> >   - isolate host applications from guest applications using the same ports
> >     with CID_ANY
> >   - assign the same CID of VMs running in different network namespaces
> >   - partition VMs between VMMs or at finer granularity
> >
> > Thanks,
> > Stefano
> >
>
> Do you know of any use cases for guest-side vsock netns?

Yep, as I mentioned in another mail this morning, the use case is
nested VMs or containers running in the L1 guests.
Users (e.g. Kata) would like to hide the L0<->L1 vsock channel in the
container, so anything running there can't talk with the L0 host.

BTW we can do that incrementally if it's too complicated.

>
> Our use case is also host-side. vsock is used to communicate with a
> host-side shim/proxy/debug console. Each vmm and these components share
> a namespace and are isolated from other vmm + components. The VM
> connects back to the host via vsock after startup and communicates its
> port of choice out-of-band (fw_cfg).  The main problem is in security:
> untrusted VM programs can potentially connect with and exploit the
> host-side vsock services meant for other VMs. If vsock respected
> namespaces, then these host-side services would be unreachable by other
> VMs and protected.  Namespaces would also allow the vsock port to be
> static across VMs, and avoid the need for the out-of-band mechanism for
> communicating the port.

Yeah, I see.

Thanks,
Stefano

>
> Jakub can jump in to add anything, but I think this is the same use case
> / user he was probably referring to.
>
> Best,
> Bobby
>



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-05  9:30           ` Stefano Garzarella
@ 2025-03-05 16:09             ` Bobby Eshleman
  2025-03-06  0:16             ` Jason Wang
  1 sibling, 0 replies; 50+ messages in thread
From: Bobby Eshleman @ 2025-03-05 16:09 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, Jakub Kicinski, Jason Wang, davem,
	Stefan Hajnoczi, linux-kernel, Jorgen Hansen, kvm, virtualization,
	linux-hyperv, Dexuan Cui, netdev

On Wed, Mar 05, 2025 at 10:30:17AM +0100, Stefano Garzarella wrote:
> On Wed, Mar 05, 2025 at 02:27:12AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Mar 04, 2025 at 04:39:02PM -0800, Bobby Eshleman wrote:
> > > I think it might be a lot of complexity to bring into the picture from
> > > netdev, and I'm not sure there is a big win since the vsock device could
> > > also have a vsock->net itself? I think the complexity will come from the
> > > address translation, which I don't think netdev buys us because there
> > > would still be all of the work to support vsock in netfilter?
> > 
> > Ugh.
> > 
> > Guys, let's remember what vsock is.
> > 
> > It's a replacement for the serial device with an interface
> > that's easier for userspace to consume, as you get
> > the demultiplexing by the port number.
> > 
> > The whole point of vsock is that people do not want
> > any firewalling, filtering, or management on it.
> > 
> > It needs to work with no configuration even if networking is
> > misconfigured or blocked.
> 
> I agree with Michael here.
> 
> It's been 5 years and my memory is bad, but using netdev seemed like a mess,
> especially because in vsock we don't have anything related to
> IP/Ethernet/ARP, etc.
> 
> I see vsock more as AF_UNIX than netdev.
> 

+1, I also agree with this.

For reference, I added netdev to vsock before [1] in order to use qdisc,
and at least from the qdisc perspective the juice wasn't worth the squeeze
(tl;dr: only pfifo_fast worked, because vsock can't recover when other
qdiscs silently drop packets).

[1] https://lore.kernel.org/all/5a93c5aad99d79f028d349cb7e3c128c65d5d7e2.1660362668.git.bobby.eshleman@bytedance.com/

Best,
Bobby


* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-05  9:42   ` Stefano Garzarella
@ 2025-03-05 17:14     ` Bobby Eshleman
  0 siblings, 0 replies; 50+ messages in thread
From: Bobby Eshleman @ 2025-03-05 17:14 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: davem, netdev, linux-kernel, Jorgen Hansen, Jason Wang, kvm,
	Stefan Hajnoczi, virtualization, linux-hyperv, Michael S. Tsirkin,
	Dexuan Cui, Jakub Kicinski

On Wed, Mar 05, 2025 at 10:42:58AM +0100, Stefano Garzarella wrote:
> On Tue, Mar 04, 2025 at 04:06:02PM -0800, Bobby Eshleman wrote:
> > On Thu, Jan 16, 2020 at 06:24:25PM +0100, Stefano Garzarella wrote:
> > 
> > One question: what is the behavior we expect from guest namespaces?  In
> > v2, you mentioned prototyping a /dev/vsock ioctl() to define the
> > namespace for the virtio-vsock device. This would mean only one
> > namespace could use vsock in the guest? Do we want to make sure that our
> > design makes it possible to support multiple namespaces in the future if
> > the use case arrives?
> 
> Yes, I guess it makes sense that multiple namespaces can communicate with
> the host and then use the virtio-vsock device!
> 
> IIRC, the main use case here was also nested VMs. So a netns could be used
> to isolate a nested VM in L1 and it may not need to talk to L0, so the
> software in the L1 netns can use vsock, but only to talk to L2.
> 

Oh I see. The ioctl(IOCTL_VM_SOCKETS_ASSIGN_G2H_NETNS) makes sense here
and seems like the simplest approach. Maybe we don't want multiple
namespaces for virtio-vsock then? The problem I see is that users
might then expect non-colliding port spaces, which means there would need
to be some kind of port mapping, which would in turn require vsock users
to pass around their port mappings out-of-band...

It sounds like none of our known use cases requires non-colliding ports?

> > 
> > More questions/comments in other parts of this thread.
> 
> Sure, I'm happy to help with this effort with discussions/reviews!
> 

Awesome, thank you!

Best,
Bobby


* Re: [PATCH net-next 1/3] vsock: add network namespace support
  2025-03-05 16:07         ` Stefano Garzarella
@ 2025-03-05 20:19           ` Bobby Eshleman
  0 siblings, 0 replies; 50+ messages in thread
From: Bobby Eshleman @ 2025-03-05 20:19 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, Jakub Kicinski, davem, netdev, linux-kernel,
	Jorgen Hansen, Jason Wang, kvm, Stefan Hajnoczi, virtualization,
	linux-hyperv, Dexuan Cui

On Wed, Mar 05, 2025 at 05:07:13PM +0100, Stefano Garzarella wrote:
> On Wed, 5 Mar 2025 at 16:55, Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
> >
> > Do you know of any use cases for guest-side vsock netns?
> 
> Yep, as I mentioned in another mail this morning, the use case is
> nested VMs or containers running in the L1 guests.
> Users (e.g. Kata) would like to hide the L0<->L1 vsock channel in the
> container, so anything running there can't talk with the L0 host.
> 
> BTW we can do that incrementally if it's too complicated.
> 

Got it! I will try your solution with /dev/vsock-netns (unless there are
strong feelings otherwise), and if it becomes hairy maybe I'll omit it
in the next rev.

I don't think my earlier concern about port collisions in the G2H
scenario is worth worrying about without a real use case; non-colliding
ports don't sound like something any users expect right now.

Thanks,
Bobby


* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-05  9:30           ` Stefano Garzarella
  2025-03-05 16:09             ` Bobby Eshleman
@ 2025-03-06  0:16             ` Jason Wang
  2025-03-06  1:36               ` Lei Yang
  2025-03-10 14:14               ` Stefano Garzarella
  1 sibling, 2 replies; 50+ messages in thread
From: Jason Wang @ 2025-03-06  0:16 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, Bobby Eshleman, Jakub Kicinski, davem,
	Stefan Hajnoczi, linux-kernel, Jorgen Hansen, kvm, virtualization,
	linux-hyperv, Dexuan Cui, netdev

On Wed, Mar 5, 2025 at 5:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Mar 05, 2025 at 02:27:12AM -0500, Michael S. Tsirkin wrote:
> >On Tue, Mar 04, 2025 at 04:39:02PM -0800, Bobby Eshleman wrote:
> >> I think it might be a lot of complexity to bring into the picture from
> >> netdev, and I'm not sure there is a big win since the vsock device could
> >> also have a vsock->net itself? I think the complexity will come from the
> >> address translation, which I don't think netdev buys us because there
> >> would still be all of the work to support vsock in netfilter?
> >
> >Ugh.
> >
> >Guys, let's remember what vsock is.
> >
> >It's a replacement for the serial device with an interface
> >that's easier for userspace to consume, as you get
> >the demultiplexing by the port number.

Interesting, but at least the VSOCKETS Kconfig help says:

"""
config VSOCKETS
        tristate "Virtual Socket protocol"
        help
         Virtual Socket Protocol is a socket protocol similar to TCP/IP
          allowing communication between Virtual Machines and hypervisor
          or host.

          You should also select one or more hypervisor-specific transports
          below.

          To compile this driver as a module, choose M here: the module
          will be called vsock. If unsure, say N.
"""

This sounds exactly like networking stuff, and the spec says something similar:

"""
The virtio socket device is a zero-configuration socket communications
device. It facilitates data transfer between the guest and device
without using the Ethernet or IP protocols.
"""

> >
> >The whole point of vsock is that people do not want
> >any firewalling, filtering, or management on it.

We won't get these; they are mostly for Ethernet and TCP/IP.

> >
> >It needs to work with no configuration even if networking is
> >misconfigured or blocked.

I don't see any blockers that prevent us from zero configuration, or am
I missing something?

>
> I agree with Michael here.
>
> It's been 5 years and my memory is bad, but using netdev seemed like a
> mess, especially because in vsock we don't have anything related to
> IP/Ethernet/ARP, etc.

We don't need to bother with that; the kernel supports protocols other than TCP/IP.

>
> I see vsock more as AF_UNIX than netdev.

But you have a device in the guest, which differs from AF_UNIX.

>
> I put in CC Jakub who was covering network namespace, maybe he has some
> advice for us regarding this. Context [1].
>
> Thanks,
> Stefano
>
> [1] https://lore.kernel.org/netdev/Z8edJjqAqAaV3Vkt@devvm6277.cco0.facebook.com/
>

Thanks



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-06  0:16             ` Jason Wang
@ 2025-03-06  1:36               ` Lei Yang
  2025-03-06  8:23                 ` Stefano Garzarella
  2025-03-10 14:14               ` Stefano Garzarella
  1 sibling, 1 reply; 50+ messages in thread
From: Lei Yang @ 2025-03-06  1:36 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, Jakub Kicinski, davem, Stefan Hajnoczi,
	linux-kernel, Jorgen Hansen, kvm, virtualization, Bobby Eshleman,
	linux-hyperv, Dexuan Cui, netdev, Jason Wang

QE tested this patch series with virtio-net regression tests;
everything works fine.

Tested-by: Lei Yang <leiyang@redhat.com>

On Thu, Mar 6, 2025 at 8:17 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Mar 5, 2025 at 5:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Wed, Mar 05, 2025 at 02:27:12AM -0500, Michael S. Tsirkin wrote:
> > >On Tue, Mar 04, 2025 at 04:39:02PM -0800, Bobby Eshleman wrote:
> > >> I think it might be a lot of complexity to bring into the picture from
> > >> netdev, and I'm not sure there is a big win since the vsock device could
> > >> also have a vsock->net itself? I think the complexity will come from the
> > >> address translation, which I don't think netdev buys us because there
> > >> would still be all of the work to support vsock in netfilter?
> > >
> > >Ugh.
> > >
> > >Guys, let's remember what vsock is.
> > >
> > >It's a replacement for the serial device with an interface
> > >that's easier for userspace to consume, as you get
> > >the demultiplexing by the port number.
>
> Interesting, but at least VSOCKETS said:
>
> """
> config VSOCKETS
>         tristate "Virtual Socket protocol"
>         help
>          Virtual Socket Protocol is a socket protocol similar to TCP/IP
>           allowing communication between Virtual Machines and hypervisor
>           or host.
>
>           You should also select one or more hypervisor-specific transports
>           below.
>
>           To compile this driver as a module, choose M here: the module
>           will be called vsock. If unsure, say N.
> """
>
> This sounds exactly like networking stuff and spec also said something similar
>
> """
> The virtio socket device is a zero-configuration socket communications
> device. It facilitates data transfer between the guest and device
> without using the Ethernet or IP protocols.
> """
>
> > >
> > >The whole point of vsock is that people do not want
> > >any firewalling, filtering, or management on it.
>
> We won't get this, these are for ethernet and TCP/IP mostly.
>
> > >
> > >It needs to work with no configuration even if networking is
> > >misconfigured or blocked.
>
> I don't see any blockers that prevent us from zero configuration, or I
> miss something?
>
> >
> > I agree with Michael here.
> >
> > It's been 5 years and my memory is bad, but using netdev seemed like a
> > mess, especially because in vsock we don't have anything related to
> > IP/Ethernet/ARP, etc.
>
> We don't need to bother with that, kernel support protocols other than TCP/IP.
>
> >
> > I see vsock more as AF_UNIX than netdev.
>
> But you have a device in guest that differs from the AF_UNIX.
>
> >
> > I put in CC Jakub who was covering network namespace, maybe he has some
> > advice for us regarding this. Context [1].
> >
> > Thanks,
> > Stefano
> >
> > [1] https://lore.kernel.org/netdev/Z8edJjqAqAaV3Vkt@devvm6277.cco0.facebook.com/
> >
>
> Thanks
>
>



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-06  1:36               ` Lei Yang
@ 2025-03-06  8:23                 ` Stefano Garzarella
  0 siblings, 0 replies; 50+ messages in thread
From: Stefano Garzarella @ 2025-03-06  8:23 UTC (permalink / raw)
  To: Lei Yang
  Cc: Michael S. Tsirkin, Jakub Kicinski, davem, Stefan Hajnoczi,
	linux-kernel, Jorgen Hansen, kvm, virtualization, Bobby Eshleman,
	linux-hyperv, Dexuan Cui, netdev, Jason Wang

On Thu, 6 Mar 2025 at 02:37, Lei Yang <leiyang@redhat.com> wrote:
>
> QE tested this series patch with virtio-net regression tests,
> everything works fine.
>
> Tested-by: Lei Yang <leiyang@redhat.com>

Sorry, but this series doesn't involve virtio-net at all, so what is the
point of testing it with virtio-net?

Thanks,
Stefano

>
> On Thu, Mar 6, 2025 at 8:17 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Mar 5, 2025 at 5:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >
> > > On Wed, Mar 05, 2025 at 02:27:12AM -0500, Michael S. Tsirkin wrote:
> > > >On Tue, Mar 04, 2025 at 04:39:02PM -0800, Bobby Eshleman wrote:
> > > >> I think it might be a lot of complexity to bring into the picture from
> > > >> netdev, and I'm not sure there is a big win since the vsock device could
> > > >> also have a vsock->net itself? I think the complexity will come from the
> > > >> address translation, which I don't think netdev buys us because there
> > > >> would still be all of the work to support vsock in netfilter?
> > > >
> > > >Ugh.
> > > >
> > > >Guys, let's remember what vsock is.
> > > >
> > > >It's a replacement for the serial device with an interface
> > > >that's easier for userspace to consume, as you get
> > > >the demultiplexing by the port number.
> >
> > Interesting, but at least VSOCKETS said:
> >
> > """
> > config VSOCKETS
> >         tristate "Virtual Socket protocol"
> >         help
> >          Virtual Socket Protocol is a socket protocol similar to TCP/IP
> >           allowing communication between Virtual Machines and hypervisor
> >           or host.
> >
> >           You should also select one or more hypervisor-specific transports
> >           below.
> >
> >           To compile this driver as a module, choose M here: the module
> >           will be called vsock. If unsure, say N.
> > """
> >
> > This sounds exactly like networking stuff and spec also said something similar
> >
> > """
> > The virtio socket device is a zero-configuration socket communications
> > device. It facilitates data transfer between the guest and device
> > without using the Ethernet or IP protocols.
> > """
> >
> > > >
> > > >The whole point of vsock is that people do not want
> > > >any firewalling, filtering, or management on it.
> >
> > We won't get this, these are for ethernet and TCP/IP mostly.
> >
> > > >
> > > >It needs to work with no configuration even if networking is
> > > >misconfigured or blocked.
> >
> > I don't see any blockers that prevent us from zero configuration, or I
> > miss something?
> >
> > >
> > > I agree with Michael here.
> > >
> > > It's been 5 years and my memory is bad, but using netdev seemed like a
> > > mess, especially because in vsock we don't have anything related to
> > > IP/Ethernet/ARP, etc.
> >
> > We don't need to bother with that, kernel support protocols other than TCP/IP.
> >
> > >
> > > I see vsock more as AF_UNIX than netdev.
> >
> > But you have a device in guest that differs from the AF_UNIX.
> >
> > >
> > > I put in CC Jakub who was covering network namespace, maybe he has some
> > > advice for us regarding this. Context [1].
> > >
> > > Thanks,
> > > Stefano
> > >
> > > [1] https://lore.kernel.org/netdev/Z8edJjqAqAaV3Vkt@devvm6277.cco0.facebook.com/
> > >
> >
> > Thanks
> >
> >
>



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-06  0:16             ` Jason Wang
  2025-03-06  1:36               ` Lei Yang
@ 2025-03-10 14:14               ` Stefano Garzarella
  2025-03-11  0:54                 ` Jason Wang
  1 sibling, 1 reply; 50+ messages in thread
From: Stefano Garzarella @ 2025-03-10 14:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Bobby Eshleman, Jakub Kicinski, davem,
	Stefan Hajnoczi, linux-kernel, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev

On Thu, 6 Mar 2025 at 01:17, Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Mar 5, 2025 at 5:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Wed, Mar 05, 2025 at 02:27:12AM -0500, Michael S. Tsirkin wrote:
> > >On Tue, Mar 04, 2025 at 04:39:02PM -0800, Bobby Eshleman wrote:
> > >> I think it might be a lot of complexity to bring into the picture from
> > >> netdev, and I'm not sure there is a big win since the vsock device could
> > >> also have a vsock->net itself? I think the complexity will come from the
> > >> address translation, which I don't think netdev buys us because there
> > >> would still be all of the work to support vsock in netfilter?
> > >
> > >Ugh.
> > >
> > >Guys, let's remember what vsock is.
> > >
> > >It's a replacement for the serial device with an interface
> > >that's easier for userspace to consume, as you get
> > >the demultiplexing by the port number.
>
> Interesting, but at least VSOCKETS said:
>
> """
> config VSOCKETS
>         tristate "Virtual Socket protocol"
>         help
>          Virtual Socket Protocol is a socket protocol similar to TCP/IP
>           allowing communication between Virtual Machines and hypervisor
>           or host.
>
>           You should also select one or more hypervisor-specific transports
>           below.
>
>           To compile this driver as a module, choose M here: the module
>           will be called vsock. If unsure, say N.
> """
>
> This sounds exactly like networking stuff and spec also said something similar
>
> """
> The virtio socket device is a zero-configuration socket communications
> device. It facilitates data transfer between the guest and device
> without using the Ethernet or IP protocols.
> """
>
> > >
> > >The whole point of vsock is that people do not want
> > >any firewalling, filtering, or management on it.
>
> We won't get this, these are for ethernet and TCP/IP mostly.
>
> > >
> > >It needs to work with no configuration even if networking is
> > >misconfigured or blocked.
>
> I don't see any blockers that prevent us from zero configuration, or I
> miss something?
>
> >
> > I agree with Michael here.
> >
> > It's been 5 years and my memory is bad, but using netdev seemed like a
> > mess, especially because in vsock we don't have anything related to
> > IP/Ethernet/ARP, etc.
>
> We don't need to bother with that, kernel support protocols other than TCP/IP.

Do we have an example of any other non-Ethernet device that uses
netdev? Just to see what we should do.

I'm not completely against the idea, but from what I remember when I
looked at it five years ago, it wasn't that easy and straightforward
to use.

>
> >
> > I see vsock more as AF_UNIX than netdev.
>
> But you have a device in guest that differs from the AF_UNIX.

Yes, but the device is simply for carrying messages.
Another thing that makes me think of AF_UNIX is the hybrid-vsock
developed by Firecracker [1] that we also reused in vhost-user-vsock
[2], where the mapping between AF_VSOCK and AF_UNIX is actually
implemented.
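
As an illustration of that mapping: per the Firecracker document [1], a
host-initiated connection goes through the AF_UNIX socket with a small
text handshake, where the host sends "CONNECT <port>\n" and the reply on
success is "OK <host-side port>\n". Below is a minimal Python sketch of
just the message framing. The helper names `connect_request` and
`parse_ack` are made up for this example, and no real socket is opened.

```python
def connect_request(port: int) -> bytes:
    """Build the handshake line the host writes to the AF_UNIX socket."""
    if not 0 <= port <= 0xFFFFFFFF:  # vsock ports are 32-bit
        raise ValueError(f"port out of range: {port}")
    return f"CONNECT {port}\n".encode("ascii")


def parse_ack(line: bytes) -> int:
    """Parse the acknowledgement line and return the host-side port."""
    parts = line.decode("ascii").split()
    if len(parts) != 2 or parts[0] != "OK":
        raise ValueError(f"unexpected handshake reply: {line!r}")
    return int(parts[1])


request = connect_request(52)
host_port = parse_ack(b"OK 1073741824\n")
```

After the acknowledgement, the same AF_UNIX stream carries the payload, so
the host side never touches AF_VSOCK at all.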

Thanks,
Stefano

[1] https://github.com/firecracker-microvm/firecracker/blob/main/docs/vsock.md#firecracker-virtio-vsock-design
[2] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-05  5:46         ` Jason Wang
@ 2025-03-10 20:14           ` Bobby Eshleman
  2025-03-11  0:59             ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Bobby Eshleman @ 2025-03-10 20:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: Stefano Garzarella, davem, Stefan Hajnoczi, Michael S. Tsirkin,
	linux-kernel, Jorgen Hansen, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev, Jakub Kicinski

On Wed, Mar 05, 2025 at 01:46:54PM +0800, Jason Wang wrote:
> On Wed, Mar 5, 2025 at 8:39 AM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
> >
> > On Tue, Apr 28, 2020 at 06:00:52PM +0200, Stefano Garzarella wrote:
> > > On Tue, Apr 28, 2020 at 04:13:22PM +0800, Jason Wang wrote:
> >
> > WRT netdev, do we foresee big gains beyond just leveraging the netdev's
> > namespace?
> 
> It's a leverage of the network subsystem (netdevice, steering, uAPI,
> tracing, probably a lot of others), not only its namespace. It can
> avoid duplicating existing mechanisms in a vsock specific way. If we
> manage to do that, namespace support will be a "byproduct".
> 
[...]
> 
> Yes, it can. I think we need to evaluate both approaches (that's why I
> raise the approach of reusing netdevice). We can hear from others.
> 

I agree it is worth evaluating. If netdev is being considered, then it
is probably also worth considering your suggestion from a few years back
to add these capabilities by building vsock on top of virtio-net [1].

[1] https://lore.kernel.org/all/2747ac1f-390e-99f9-b24e-f179af79a9da@redhat.com/

Considering that the current vsock protocol will only ever be able to
enjoy a restricted feature set of these other net subsystems due to its
lack of tolerance for packet loss (e.g., no multiqueue steering, no
packet scheduling), I wonder if it would be best to a) wait until a user
requires these capabilities, and b) at that point extend vsock to tolerate
packet loss (add a seqnum)?
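
As a rough illustration of (b), here is a Python sketch of receiver-side
loss detection. It assumes a hypothetical per-connection `seqnum` carried
in every packet; the current virtio-vsock header has no such field, and
`SeqTracker` is a made-up name. Wraparound and retransmission are left out
on purpose.

```python
from dataclasses import dataclass


@dataclass
class SeqTracker:
    """Tracks the next expected sequence number for one connection."""
    expected: int = 0

    def receive(self, seqnum: int) -> str:
        """Classify an incoming packet by its (hypothetical) seqnum."""
        if seqnum == self.expected:
            self.expected += 1
            return "in-order"
        if seqnum < self.expected:
            return "duplicate"  # already delivered, safe to discard
        # A packet from the future: something in between was dropped.
        # Do not advance; the missing packet must be resent first.
        return "gap"


tracker = SeqTracker()
results = [tracker.receive(s) for s in (0, 1, 3, 2, 2)]
```

Without something like this, a packet dropped by a qdisc is silently
missing from the stream, which is why the protocol cannot survive loss
today.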

> >
> > Some other thoughts I had: netdev's flow control features would all have
> > to be ignored or disabled somehow (I think dev_direct_xmit()?), because
> > queueing introduces packet loss and the vsock protocol is unable to
> > survive packet loss.
> 
> Or just allow it and then configuring a qdisc that may drop packets
> could be treated as a misconfiguration.
> 

That is possible, but when I was playing with vsock qdisc the only one
that worked was pfifo_fast/pfifo, as the others that I tested drop
packets asynchronously.

Thanks,
Bobby


* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-10 14:14               ` Stefano Garzarella
@ 2025-03-11  0:54                 ` Jason Wang
  2025-03-11  1:01                   ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2025-03-11  0:54 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, Bobby Eshleman, Jakub Kicinski, davem,
	Stefan Hajnoczi, linux-kernel, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev

On Mon, Mar 10, 2025 at 10:15 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Thu, 6 Mar 2025 at 01:17, Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Mar 5, 2025 at 5:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >
> > > On Wed, Mar 05, 2025 at 02:27:12AM -0500, Michael S. Tsirkin wrote:
> > > >On Tue, Mar 04, 2025 at 04:39:02PM -0800, Bobby Eshleman wrote:
> > > >> I think it might be a lot of complexity to bring into the picture from
> > > >> netdev, and I'm not sure there is a big win since the vsock device could
> > > >> also have a vsock->net itself? I think the complexity will come from the
> > > >> address translation, which I don't think netdev buys us because there
> > > >> would still be all of the work to support vsock in netfilter?
> > > >
> > > >Ugh.
> > > >
> > > >Guys, let's remember what vsock is.
> > > >
> > > >It's a replacement for the serial device with an interface
> > > >that's easier for userspace to consume, as you get
> > > >the demultiplexing by the port number.
> >
> > Interesting, but at least VSOCKETS said:
> >
> > """
> > config VSOCKETS
> >         tristate "Virtual Socket protocol"
> >         help
> >          Virtual Socket Protocol is a socket protocol similar to TCP/IP
> >           allowing communication between Virtual Machines and hypervisor
> >           or host.
> >
> >           You should also select one or more hypervisor-specific transports
> >           below.
> >
> >           To compile this driver as a module, choose M here: the module
> >           will be called vsock. If unsure, say N.
> > """
> >
> > This sounds exactly like networking stuff and spec also said something similar
> >
> > """
> > The virtio socket device is a zero-configuration socket communications
> > device. It facilitates data transfer between the guest and device
> > without using the Ethernet or IP protocols.
> > """
> >
> > > >
> > > >The whole point of vsock is that people do not want
> > > >any firewalling, filtering, or management on it.
> >
> > We won't get this, these are for ethernet and TCP/IP mostly.
> >
> > > >
> > > >It needs to work with no configuration even if networking is
> > > >misconfigured or blocked.
> >
> > I don't see any blockers that prevent us from zero configuration, or I
> > miss something?
> >
> > >
> > > I agree with Michael here.
> > >
> > > It's been 5 years and my memory is bad, but using netdev seemed like a
> > > mess, especially because in vsock we don't have anything related to
> > > IP/Ethernet/ARP, etc.
> >
> > We don't need to bother with that, kernel support protocols other than TCP/IP.
>
> Do we have an example of any other non-Ethernet device that uses
> netdev? Just to see what we should do.

Yes, I think the CAN device is one example, and there should be others.

>
> I'm not completely against the idea, but from what I remember when I
> looked at it five years ago, it wasn't that easy and straightforward
> to use.

CAN just hooks the packets into its own stack; maybe vsock can do the same.

>
> >
> > >
> > > I see vsock more as AF_UNIX than netdev.
> >
> > But you have a device in guest that differs from the AF_UNIX.
>
> Yes, but the device is simply for carrying messages.
> Another thing that makes me think of AF_UNIX is the hybrid-vsock
> developed by Firecracker [1] that we also reused in vhost-user-vsock
> [2], where the mapping between AF_VSOCK and AF_UNIX is really
> implemented.

I see. But the main difference is that vsock can work across the
boundary of guest and host. This makes it hard to be a 100% socket
implementation in the guest.

Thanks

>
> Thanks,
> Stefano
>
> [1] https://github.com/firecracker-microvm/firecracker/blob/main/docs/vsock.md#firecracker-virtio-vsock-design
> [2] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-10 20:14           ` Bobby Eshleman
@ 2025-03-11  0:59             ` Jason Wang
  2025-03-12 22:29               ` Bobby Eshleman
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2025-03-11  0:59 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Stefano Garzarella, davem, Stefan Hajnoczi, Michael S. Tsirkin,
	linux-kernel, Jorgen Hansen, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev, Jakub Kicinski

On Tue, Mar 11, 2025 at 4:14 AM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> On Wed, Mar 05, 2025 at 01:46:54PM +0800, Jason Wang wrote:
> > On Wed, Mar 5, 2025 at 8:39 AM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
> > >
> > > On Tue, Apr 28, 2020 at 06:00:52PM +0200, Stefano Garzarella wrote:
> > > > On Tue, Apr 28, 2020 at 04:13:22PM +0800, Jason Wang wrote:
> > >
> > > WRT netdev, do we foresee big gains beyond just leveraging the netdev's
> > > namespace?
> >
> > It's a leverage of the network subsystem (netdevice, steering, uAPI,
> > tracing, probably a lot of others), not only its namespace. It can
> > avoid duplicating existing mechanisms in a vsock specific way. If we
> > manage to do that, namespace support will be a "byproduct".
> >
> [...]
> >
> > Yes, it can. I think we need to evaluate both approaches (that's why I
> > raise the approach of reusing netdevice). We can hear from others.
> >
>
> I agree it is worth evaluating. If netdev is being considered, then it
> is probably also worth considering your suggestion from a few years back
> to add these capabilities by building vsock on top of virtio-net [1].
>
> [1] https://lore.kernel.org/all/2747ac1f-390e-99f9-b24e-f179af79a9da@redhat.com/

Yes. I think having a dedicated netdev might be simpler than reusing
virtio-net.

>
> Considering that the current vsock protocol will only ever be able to
> enjoy a restricted feature set of these other net subsystems due to its
> lack of tolerance for packet loss (e.g., no multiqueue steering, no
> packet scheduling), I wonder if it would be best to a) wait until a user
> requires these capabilities, and b) at that point extend vsock to tolerate
> packet loss (add a seqnum)?

Maybe. A question back on this proposal: what's the plan for
userspace? For example, do we expect to extend iproute2 and other tools,
and how (e.g., with a new vsock-dedicated tool)?

>
> > >
> > > Some other thoughts I had: netdev's flow control features would all have
> > > to be ignored or disabled somehow (I think dev_direct_xmit()?), because
> > > queueing introduces packet loss and the vsock protocol is unable to
> > > survive packet loss.
> >
> > Or just allow it and then configuring a qdisc that may drop packets
> > could be treated as a misconfiguration.
> >
>
> That is possible, but when I was playing with vsock qdisc the only one
> that worked was pfifo_fast/pfifo, as the others that I tested async drop
> packets.

I guess it should work with qdiscs with TCQ_F_CAN_BYPASS. Or if it
turns out to be hard we can just bypass the qdisc layer as you said.

Thanks

>
> Thanks,
> Bobby
>



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-11  0:54                 ` Jason Wang
@ 2025-03-11  1:01                   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2025-03-11  1:01 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, Bobby Eshleman, Jakub Kicinski, davem,
	Stefan Hajnoczi, linux-kernel, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev

On Tue, Mar 11, 2025 at 8:54 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Mar 10, 2025 at 10:15 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Thu, 6 Mar 2025 at 01:17, Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Mar 5, 2025 at 5:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > >
> > > > On Wed, Mar 05, 2025 at 02:27:12AM -0500, Michael S. Tsirkin wrote:
> > > > >On Tue, Mar 04, 2025 at 04:39:02PM -0800, Bobby Eshleman wrote:
> > > > >> I think it might be a lot of complexity to bring into the picture from
> > > > >> netdev, and I'm not sure there is a big win since the vsock device could
> > > > >> also have a vsock->net itself? I think the complexity will come from the
> > > > >> address translation, which I don't think netdev buys us because there
> > > > >> would still be all of the work to support vsock in netfilter?
> > > > >
> > > > >Ugh.
> > > > >
> > > > >Guys, let's remember what vsock is.
> > > > >
> > > > >It's a replacement for the serial device with an interface
> > > > >that's easier for userspace to consume, as you get
> > > > >the demultiplexing by the port number.
> > >
> > > Interesting, but at least VSOCKETS said:
> > >
> > > """
> > > config VSOCKETS
> > >         tristate "Virtual Socket protocol"
> > >         help
> > >          Virtual Socket Protocol is a socket protocol similar to TCP/IP
> > >           allowing communication between Virtual Machines and hypervisor
> > >           or host.
> > >
> > >           You should also select one or more hypervisor-specific transports
> > >           below.
> > >
> > >           To compile this driver as a module, choose M here: the module
> > >           will be called vsock. If unsure, say N.
> > > """
> > >
> > > This sounds exactly like networking stuff and spec also said something similar
> > >
> > > """
> > > The virtio socket device is a zero-configuration socket communications
> > > device. It facilitates data transfer between the guest and device
> > > without using the Ethernet or IP protocols.
> > > """
> > >
> > > > >
> > > > >The whole point of vsock is that people do not want
> > > > >any firewalling, filtering, or management on it.
> > >
> > > We won't get this, these are for ethernet and TCP/IP mostly.
> > >
> > > > >
> > > > >It needs to work with no configuration even if networking is
> > > > >misconfigured or blocked.
> > >
> > > I don't see any blockers that prevent us from zero configuration, or I
> > > miss something?
> > >
> > > >
> > > > I agree with Michael here.
> > > >
> > > > It's been 5 years and my memory is bad, but using netdev seemed like a
> > > > mess, especially because in vsock we don't have anything related to
> > > > IP/Ethernet/ARP, etc.
> > >
> > > We don't need to bother with that, kernel support protocols other than TCP/IP.
> >
> > Do we have an example of any other non-Ethernet device that uses
> > netdev? Just to see what we should do.
>
> Yes, I think can device is one example and it should have others.
>
> >
> > I'm not completely against the idea, but from what I remember when I
> > looked at it five years ago, it wasn't that easy and straightforward
> > to use.
>
> Can just hook the packets into its own stack, maybe vsock can do the same.
>
> >
> > >
> > > >
> > > > I see vsock more as AF_UNIX than netdev.
> > >
> > > But you have a device in guest that differs from the AF_UNIX.
> >
> > Yes, but the device is simply for carrying messages.
> > Another thing that makes me think of AF_UNIX is the hybrid-vsock
> > developed by Firecracker [1] that we also reused in vhost-user-vsock
> > [2], where the mapping between AF_VSOCK and AF_UNIX is really
> > implemented.
>
> I see. But the main difference is that vsock can work across the
> boundary of guest and host. This makes it hard to be a 100% socket
> implementation in the guest.

Or invent a protocol so that vsock can be transported over Ethernet
(not sure whether this is possible).

Thanks

>
> Thanks
>
> >
> > Thanks,
> > Stefano
> >
> > [1] https://github.com/firecracker-microvm/firecracker/blob/main/docs/vsock.md#firecracker-virtio-vsock-design
> > [2] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
> >



* Re: [PATCH net-next 0/3] vsock: support network namespace
  2025-03-11  0:59             ` Jason Wang
@ 2025-03-12 22:29               ` Bobby Eshleman
  0 siblings, 0 replies; 50+ messages in thread
From: Bobby Eshleman @ 2025-03-12 22:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: Stefano Garzarella, davem, Stefan Hajnoczi, Michael S. Tsirkin,
	linux-kernel, Jorgen Hansen, kvm, virtualization, linux-hyperv,
	Dexuan Cui, netdev, Jakub Kicinski

On Tue, Mar 11, 2025 at 08:59:44AM +0800, Jason Wang wrote:
> On Tue, Mar 11, 2025 at 4:14 AM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
> >
> > On Wed, Mar 05, 2025 at 01:46:54PM +0800, Jason Wang wrote:
> > > On Wed, Mar 5, 2025 at 8:39 AM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
> > > >
> > > > On Tue, Apr 28, 2020 at 06:00:52PM +0200, Stefano Garzarella wrote:
> > > > > On Tue, Apr 28, 2020 at 04:13:22PM +0800, Jason Wang wrote:
> > > >
> > > > WRT netdev, do we foresee big gains beyond just leveraging the netdev's
> > > > namespace?
> > >
> > > It's a leverage of the network subsystem (netdevice, steering, uAPI,
> > > tracing, probably a lot of others), not only its namespace. It can
> > > avoid duplicating existing mechanisms in a vsock specific way. If we
> > > manage to do that, namespace support will be a "byproduct".
> > >
> > [...]
> > >
> > > Yes, it can. I think we need to evaluate both approaches (that's why I
> > > raise the approach of reusing netdevice). We can hear from others.
> > >
> >
> > I agree it is worth evaluating. If netdev is being considered, then it
> > is probably also worth considering your suggestion from a few years back
> > to add these capabilities by building vsock on top of virtio-net [1].
> >
> > [1] https://lore.kernel.org/all/2747ac1f-390e-99f9-b24e-f179af79a9da@redhat.com/
> 
> Yes. I think having a dedicated netdev might be simpler than reusing
> virtio-net.
> 
> >
> > Considering that the current vsock protocol will only ever be able to
> > enjoy a restricted feature set of these other net subsystems due to its
> > lack of tolerance for packet loss (e.g., no multiqueue steering, no
> > packet scheduling), I wonder if it would be best to a) wait until a user
> > requires these capabilities, and b) at that point extend vsock to tolerate
> > packet loss (add a seqnum)?
> 
> Maybe, a question back to this proposal. What's the plan for the
> userspace? For example, do we expect to extend iproute2 and other
> tools, and how (e.g., having a new vsock-dedicated tool)?
> 

If we were going to add a seqnum and start bringing in other systems, we
would probably want to add support to iproute2. For example, when I
played with qdisc, using ip seemed like the best option from the user
side. The iproute2 changes weren't bad at all [1]. We'd probably need the
device to carry a new feature bit too.

That said, all of this still creates the problem of adding new
system-level ways to disrupt AF_VSOCK users. I think we could offer this
in a way that is orthogonal to existing vsock, possibly AF_VSOCK2, a
sockopt, or an ioctl to opt in to using net features, so that we aren't
violating the commitment to existing users that vsock should work
regardless of network configuration. Letting the user that holds the fd
of the socket make the choice might be the best way to safeguard the
contract?
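
To make the opt-in idea concrete, here is a toy model, not kernel code.
Every name in it (NetConfig, VsockSocket, use_net_features, deliver) is
hypothetical and exists only for illustration; no such sockopt or ioctl
is defined today. It assumes a system-level policy that drops traffic
and shows that only sockets whose owner opted in are subject to it:

```python
from dataclasses import dataclass, field


@dataclass
class NetConfig:
    """Stand-in for system-level network configuration (hypothetical)."""
    drop_all: bool = False  # e.g. a policy that discards every packet


@dataclass
class VsockSocket:
    """Toy socket; use_net_features models a hypothetical per-socket opt-in."""
    use_net_features: bool = False
    received: list = field(default_factory=list)


def deliver(sock: VsockSocket, payload: bytes, cfg: NetConfig) -> bool:
    """Deliver payload; the network policy applies only to opted-in sockets."""
    if sock.use_net_features and cfg.drop_all:
        return False  # opted-in socket is subject to the configured policy
    sock.received.append(payload)  # legacy sockets bypass the policy entirely
    return True


cfg = NetConfig(drop_all=True)
legacy = VsockSocket()                          # never opted in
opted_in = VsockSocket(use_net_features=True)   # fd holder chose net features

legacy_ok = deliver(legacy, b"hello", cfg)
opted_ok = deliver(opted_in, b"hello", cfg)
```

In this model the legacy socket still receives its payload even though
the system-wide policy drops everything, which is the contract the
opt-in is meant to protect.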

[1]:	https://github.com/beshleman/iproute2/commit/55fd8a6c133335cda4ede6f8928eb3cea54534b8

Best,
Bobby


