All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Wei Liu" <wei.liu@kernel.org>,
	"Dexuan Cui" <decui@microsoft.com>,
	"Bryan Tan" <bryan-bt.tan@broadcom.com>,
	"Vishnu Dasa" <vishnu.dasa@broadcom.com>,
	"Broadcom internal kernel review list"
	<bcm-kernel-feedback-list@broadcom.com>,
	"Shuah Khan" <shuah@kernel.org>, "Long Li" <longli@microsoft.com>,
	linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
	netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kselftest@vger.kernel.org,
	berrange@redhat.com, "Sargun Dhillon" <sargun@sargun.me>,
	"Bobby Eshleman" <bobbyeshleman@meta.com>
Subject: Re: [PATCH net-next v14 01/12] vsock: add netns to vsock core
Date: Tue, 13 Jan 2026 12:24:21 -0800	[thread overview]
Message-ID: <aWap9cOdhMAg2KPZ@devvm11784.nha0.facebook.com> (raw)
In-Reply-To: <aWYrwU__c895PymJ@sgarzare-redhat>

On Tue, Jan 13, 2026 at 03:37:15PM +0100, Stefano Garzarella wrote:
> On Mon, Jan 12, 2026 at 07:11:10PM -0800, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > Add netns logic to vsock core. Additionally, modify transport hook
> > prototypes to be used by later transport-specific patches (e.g.,
> > *_seqpacket_allow()).
> > 
> > Namespaces are supported primarily by changing socket lookup functions
> > (e.g., vsock_find_connected_socket()) to take into account the socket
> > namespace and the namespace mode before considering a candidate socket a
> > "match".
> > 
> > This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode to
> > report the mode and /proc/sys/net/vsock/child_ns_mode to set the mode
> > for new namespaces.
> > 
> > Add netns functionality (initialization, passing to transports, procfs,
> > etc...) to the af_vsock socket layer. Later patches that add netns
> > support to transports depend on this patch.
> > 
> > dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are
> > modified to take a vsk in order to perform logic on namespace modes. In
> > future patches, the net will also be used for socket
> > lookups in these functions.
> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > Changes in v14:
> > - include linux/sysctl.h in af_vsock.c
> > - squash patch 'vsock: add per-net vsock NS mode state' into this patch
> >  (prior version can be found here):
> >  https://lore.kernel.org/all/20251223-vsock-vmtest-v13-1-9d6db8e7c80b@meta.com/)
> > 
> > Changes in v13:
> > - remove net_mode and replace with direct accesses to net->vsock.mode,
> >  since this is now immutable.
> > - update comments about mode behavior and mutability, and sysctl API
> > - only pass NULL for net when wanting global, instead of net_mode ==
> >  VSOCK_NET_MODE_GLOBAL. This reflects the new logic
> >  of vsock_net_check_mode() that only requires net pointers (not
> >  net_mode).
> > - refactor sysctl string code into a re-usable function, because
> >  child_ns_mode and ns_mode both handle the same strings.
> > - remove redundant vsock_net_init(&init_net) call in module init because
> >  pernet registration calls the callback on the init_net too
> > 
> > Changes in v12:
> > - return true in dgram_allow(), stream_allow(), and seqpacket_allow()
> >  only if net_mode == VSOCK_NET_MODE_GLOBAL (Stefano)
> > - document bind(VMADDR_CID_ANY) case in af_vsock.c (Stefano)
> > - change order of stream_allow() call in vmci so we can pass vsk
> >  to it
> > 
> > Changes in v10:
> > - add file-level comment about what happens to sockets/devices
> >  when the namespace mode changes (Stefano)
> > - change the 'if (write)' boolean in vsock_net_mode_string() to
> >  if (!write), this simplifies a later patch which adds "goto"
> >  for mutex unlocking on function exit.
> > 
> > Changes in v9:
> > - remove virtio_vsock_alloc_rx_skb() (Stefano)
> > - remove vsock_global_dummy_net, not needed as net=NULL +
> >  net_mode=VSOCK_NET_MODE_GLOBAL achieves identical result
> > 
> > Changes in v7:
> > - hv_sock: fix hyperv build error
> > - explain why vhost does not use the dummy
> > - explain usage of __vsock_global_dummy_net
> > - explain why VSOCK_NET_MODE_STR_MAX is 8 characters
> > - use switch-case in vsock_net_mode_string()
> > - avoid changing transports as much as possible
> > - add vsock_find_{bound,connected}_socket_net()
> > - rename `vsock_hdr` to `sysctl_hdr`
> > - add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
> >  global mode for virtio-vsock, move skb->cb zero-ing into wrapper
> > - explain seqpacket_allow() change
> > - move net setting to __vsock_create() instead of vsock_create() so
> >  that child sockets also have their net assigned upon accept()
> > 
> > Changes in v6:
> > - unregister sysctl ops in vsock_exit()
> > - af_vsock: clarify description of CID behavior
> > - af_vsock: fix buf vs buffer naming, and length checking
> > - af_vsock: fix length checking w/ correct ctl_table->maxlen
> > 
> > Changes in v5:
> > - vsock_global_net() -> vsock_global_dummy_net()
> > - update comments for new uAPI
> > - use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
> > - add prototype changes so patch remains compilable
> > ---
> > MAINTAINERS                             |   1 +
> > drivers/vhost/vsock.c                   |   6 +-
> > include/linux/virtio_vsock.h            |   4 +-
> > include/net/af_vsock.h                  |  53 +++++-
> > include/net/net_namespace.h             |   4 +
> > include/net/netns/vsock.h               |  17 ++
> > net/vmw_vsock/af_vsock.c                | 297 +++++++++++++++++++++++++++++---
> > net/vmw_vsock/hyperv_transport.c        |   7 +-
> > net/vmw_vsock/virtio_transport.c        |   9 +-
> > net/vmw_vsock/virtio_transport_common.c |   6 +-
> > net/vmw_vsock/vmci_transport.c          |  26 ++-
> > net/vmw_vsock/vsock_loopback.c          |   8 +-
> > 12 files changed, 394 insertions(+), 44 deletions(-)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 6737aad729d6..f4aa476427c8 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -27522,6 +27522,7 @@ L:	netdev@vger.kernel.org
> > S:	Maintained
> > F:	drivers/vhost/vsock.c
> > F:	include/linux/virtio_vsock.h
> > +F:	include/net/netns/vsock.h
> > F:	include/uapi/linux/virtio_vsock.h
> > F:	net/vmw_vsock/virtio_transport.c
> > F:	net/vmw_vsock/virtio_transport_common.c
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index 552cfb53498a..647ded6f6ea5 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -407,7 +407,8 @@ static bool vhost_transport_msgzerocopy_allow(void)
> > 	return true;
> > }
> > 
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
> > +					    u32 remote_cid);
> > 
> > static struct virtio_transport vhost_transport = {
> > 	.transport = {
> > @@ -463,7 +464,8 @@ static struct virtio_transport vhost_transport = {
> > 	.send_pkt = vhost_transport_send_pkt,
> > };
> > 
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
> > +					    u32 remote_cid)
> > {
> > 	struct vhost_vsock *vsock;
> > 	bool seqpacket_allow = false;
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 0c67543a45c8..1845e8d4f78d 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -256,10 +256,10 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > 
> > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > -bool virtio_transport_stream_allow(u32 cid, u32 port);
> > +bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port);
> > int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > 				struct sockaddr_vm *addr);
> > -bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > +bool virtio_transport_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port);
> > 
> > int virtio_transport_connect(struct vsock_sock *vsk);
> > 
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index d40e978126e3..10c2846fcc58 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -10,6 +10,7 @@
> > 
> > #include <linux/kernel.h>
> > #include <linux/workqueue.h>
> > +#include <net/netns/vsock.h>
> > #include <net/sock.h>
> > #include <uapi/linux/vm_sockets.h>
> > 
> > @@ -124,7 +125,7 @@ struct vsock_transport {
> > 			     size_t len, int flags);
> > 	int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> > 			     struct msghdr *, size_t len);
> > -	bool (*dgram_allow)(u32 cid, u32 port);
> > +	bool (*dgram_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
> > 
> > 	/* STREAM. */
> > 	/* TODO: stream_bind() */
> > @@ -136,14 +137,14 @@ struct vsock_transport {
> > 	s64 (*stream_has_space)(struct vsock_sock *);
> > 	u64 (*stream_rcvhiwat)(struct vsock_sock *);
> > 	bool (*stream_is_active)(struct vsock_sock *);
> > -	bool (*stream_allow)(u32 cid, u32 port);
> > +	bool (*stream_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
> > 
> > 	/* SEQ_PACKET. */
> > 	ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> > 				     int flags);
> > 	int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
> > 				 size_t len);
> > -	bool (*seqpacket_allow)(u32 remote_cid);
> > +	bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
> > 	u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
> > 
> > 	/* Notification. */
> > @@ -216,6 +217,11 @@ void vsock_remove_connected(struct vsock_sock *vsk);
> > struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
> > struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > 					 struct sockaddr_vm *dst);
> > +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> > +					 struct net *net);
> > +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
> > +					     struct sockaddr_vm *dst,
> > +					     struct net *net);
> > void vsock_remove_sock(struct vsock_sock *vsk);
> > void vsock_for_each_connected_socket(struct vsock_transport *transport,
> > 				     void (*fn)(struct sock *sk));
> > @@ -256,4 +262,45 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
> > {
> > 	return t->msgzerocopy_allow && t->msgzerocopy_allow();
> > }
> > +
> > +static inline enum vsock_net_mode vsock_net_mode(struct net *net)
> > +{
> 
> What about moving here the check about NULL namespace?
> (adding the comment we have later)
> 
> I mean just adding a
> 	if (!net)
> 		return VSOCK_NET_MODE_GLOBAL;
> 
> Or using the ternary operator, as you prefer.
> 

SGTM!

> 
> > +	return READ_ONCE(net->vsock.mode);
> > +}
> > +
> > +static inline void vsock_net_set_child_mode(struct net *net,
> > +					    enum vsock_net_mode mode)
> > +{
> > +	WRITE_ONCE(net->vsock.child_ns_mode, mode);
> > +}
> > +
> > +static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
> > +{
> > +	return READ_ONCE(net->vsock.child_ns_mode);
> > +}
> > +
> > +/* Return true if two namespaces pass the mode rules. Otherwise, return false.
> > + *
> > + * A NULL namespace is treated as VSOCK_NET_MODE_GLOBAL.
> > + *
> > + * Read more about modes in the comment header of net/vmw_vsock/af_vsock.c.
> > + */
> > +static inline bool vsock_net_check_mode(struct net *ns0, struct net *ns1)
> > +{
> > +	enum vsock_net_mode mode0, mode1;
> > +
> > +	/* Any vsocks within the same network namespace are always reachable,
> > +	 * regardless of the mode.
> > +	 */
> > +	if (net_eq(ns0, ns1))
> > +		return true;
> > +
> > +	mode0 = ns0 ? vsock_net_mode(ns0) : VSOCK_NET_MODE_GLOBAL;
> > +	mode1 = ns1 ? vsock_net_mode(ns1) : VSOCK_NET_MODE_GLOBAL;
> 
> So we can avoid duplicating code here and maybe in other places where
> vsock_net_mode() is used.
> 
> About them, in all transports we use that in this way:
> 
> net/vmw_vsock/hyperv_transport.c:       if (vsock_net_mode(sock_net(sk_vsock(vsk))) != VSOCK_NET_MODE_GLOBAL)
> net/vmw_vsock/virtio_transport.c:       return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> net/vmw_vsock/virtio_transport.c:       if (vsock_net_mode(sock_net(sk_vsock(vsk))) != VSOCK_NET_MODE_GLOBAL)
> net/vmw_vsock/vmci_transport.c: if (vsock_net_mode(sock_net(sk_vsock(vsk))) != VSOCK_NET_MODE_GLOBAL)
> net/vmw_vsock/vmci_transport.c: if (vsock_net_mode(sock_net(sk_vsock(vsk))) != VSOCK_NET_MODE_GLOBAL)
> net/vmw_vsock/vsock_loopback.c: return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> 
> So, I'm thinking if we should provide some `vsock_net_mode_global(vsk)` that
> can be used there, but not a strong opinion at all.

Sounds good!

> 
> > +
> > +	/* Different namespaces are only reachable if they are both
> > +	 * global mode.
> > +	 */
> > +	return mode0 == VSOCK_NET_MODE_GLOBAL && mode0 == mode1;
> > +}
> > #endif /* __AF_VSOCK_H__ */
> > diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> > index cb664f6e3558..66d3de1d935f 100644
> > --- a/include/net/net_namespace.h
> > +++ b/include/net/net_namespace.h
> > @@ -37,6 +37,7 @@
> > #include <net/netns/smc.h>
> > #include <net/netns/bpf.h>
> > #include <net/netns/mctp.h>
> > +#include <net/netns/vsock.h>
> > #include <net/net_trackers.h>
> > #include <linux/ns_common.h>
> > #include <linux/idr.h>
> > @@ -196,6 +197,9 @@ struct net {
> > 	/* Move to a better place when the config guard is removed. */
> > 	struct mutex		rtnl_mutex;
> > #endif
> > +#if IS_ENABLED(CONFIG_VSOCKETS)
> > +	struct netns_vsock	vsock;
> > +#endif
> > } __randomize_layout;
> > 
> > #include <linux/seq_file_net.h>
> > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
> > new file mode 100644
> > index 000000000000..e2325e2d6ec5
> > --- /dev/null
> > +++ b/include/net/netns/vsock.h
> > @@ -0,0 +1,17 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __NET_NET_NAMESPACE_VSOCK_H
> > +#define __NET_NET_NAMESPACE_VSOCK_H
> > +
> > +#include <linux/types.h>
> > +
> > +enum vsock_net_mode {
> > +	VSOCK_NET_MODE_GLOBAL,
> > +	VSOCK_NET_MODE_LOCAL,
> > +};
> > +
> > +struct netns_vsock {
> > +	struct ctl_table_header *sysctl_hdr;
> > +	enum vsock_net_mode mode;
> > +	enum vsock_net_mode child_ns_mode;
> > +};
> > +#endif /* __NET_NET_NAMESPACE_VSOCK_H */
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index a3505a4dcee0..9d614e4a4fa5 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -83,6 +83,42 @@
> >  *   TCP_ESTABLISHED - connected
> >  *   TCP_CLOSING - disconnecting
> >  *   TCP_LISTEN - listening
> > + *
> > + * - Namespaces in vsock support two different modes configured
> > + *   through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
> 
> IIUC `/proc/sys/net/vsock/ns_mode` is read-only so "two different modes
> configured through /proc/sys/net/vsock/ns_mode" is not really clear IMO.

Oh good call, this is stale.

> 
> > + *   Each mode defines how the namespace interacts with CIDs.
> > + *   /proc/sys/net/vsock/ns_mode is read-only and inherited from the
> > + *   parent namespace's /proc/sys/net/vsock/child_ns_mode at creation
> > + *   time and is immutable thereafter. The default is "global".
> 
> Here too, it is not clear to me that child_ns_mode is used to manage new
> children. What do you think about clarifying the situation a little in this
> way:
> 
>    * - Namespaces in vsock support two different modes: "local" and "global".
>    *   Each mode defines how the namespace interacts with CIDs.
>    *   Each namespace exposes two sysctl files:
>    *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
>    *     mode, which is set at namespace creation and immutable thereafter.
>    *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
>    *     child namespaces will inherit when created. The default is "global".
>    *   Changing child_ns_mode only affects newly created namespaces, not the
>    *   current namespace or existing children. At namespace creation, ns_mode
>    *   is inherited from the parent's child_ns_mode.
> 

That sounds good to me, a good clarification.

> > + *
> > + *   The modes affect the allocation and accessibility of CIDs as follows:
> > + *
> > + *   - global - access and allocation are all system-wide
> > + *      - all CID allocation from global namespaces draw from the same
> > + *        system-wide pool.
> > + *      - if one global namespace has already allocated some CID, another
> > + *        global namespace will not be able to allocate the same CID.
> > + *      - global mode AF_VSOCK sockets can reach any VM or socket in any global
> > + *        namespace, they are not contained to only their own namespace.
> > + *      - AF_VSOCK sockets in a global mode namespace cannot reach VMs or
> > + *        sockets in any local mode namespace.
> > + *   - local - access and allocation are contained within the namespace
> > + *     - CID allocation draws only from a private pool local only to the
> > + *       namespace, and does not affect the CIDs available for allocation in any
> > + *       other namespace (global or local).
> > + *     - VMs in a local namespace do not collide with CIDs in any other local
> > + *       namespace or any global namespace. For example, if a VM in a local mode
> > + *       namespace is given CID 10, then CID 10 is still available for
> > + *       allocation in any other namespace, but not in the same namespace.
> > + *     - AF_VSOCK sockets in a local mode namespace can connect only to VMs or
> > + *       other sockets within their own namespace.
> > + *     - sockets bound to VMADDR_CID_ANY in local namespaces will never resolve
> > + *       to any transport that is not compatible with local mode. There is no
> > + *       error that propagates to the user (as there is for connection attempts)
> > + *       because it is possible for some packet to reach this socket from
> > + *       a different transport that *does* support local mode. For
> > + *       example, virtio-vsock may not support local mode, but the socket
> > + *       may still accept a connection from vhost-vsock which does.
> >  */
> 
> So, compared to the previous implementation, now there is no way to specify
> the mode of `init_net` right?
> 
> At this point, it will always be global IIUC.
> 
> Should we provide something for init_net? Module parameter, kernel cmdline.
> I mean something that can be decided before af_vsock is loaded, so we don't
> have the previous problem of having sockets with different modes within the
> same namespace.

True, I hadn't considered inet_net. Those two options sound good to me,
but I'll wait for Paolo's thoughts.

> 
> @Paolo any suggestion?
> 
> > 
> > #include <linux/compat.h>
> > @@ -100,20 +136,31 @@
> > #include <linux/module.h>
> > #include <linux/mutex.h>
> > #include <linux/net.h>
> > +#include <linux/proc_fs.h>
> > #include <linux/poll.h>
> > #include <linux/random.h>
> > #include <linux/skbuff.h>
> > #include <linux/smp.h>
> > #include <linux/socket.h>
> > #include <linux/stddef.h>
> > +#include <linux/sysctl.h>
> > #include <linux/unistd.h>
> > #include <linux/wait.h>
> > #include <linux/workqueue.h>
> > #include <net/sock.h>
> > #include <net/af_vsock.h>
> > +#include <net/netns/vsock.h>
> > #include <uapi/linux/vm_sockets.h>
> > #include <uapi/asm-generic/ioctls.h>
> > 
> > +#define VSOCK_NET_MODE_STR_GLOBAL "global"
> > +#define VSOCK_NET_MODE_STR_LOCAL "local"
> > +
> > +/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'.
> > + * The newline is added by proc_dostring() for read operations.
> > + */
> > +#define VSOCK_NET_MODE_STR_MAX 8
> > +
> > static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> > static void vsock_sk_destruct(struct sock *sk);
> > static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
> > @@ -235,33 +282,42 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> > 	sock_put(&vsk->sk);
> > }
> > 
> > -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> > +						  struct net *net)
> > {
> > 	struct vsock_sock *vsk;
> > 
> > 	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
> > -		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> > -			return sk_vsock(vsk);
> > +		struct sock *sk = sk_vsock(vsk);
> > +
> > +		if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
> > +		    vsock_net_check_mode(sock_net(sk), net))
> > +			return sk;
> > 
> > 		if (addr->svm_port == vsk->local_addr.svm_port &&
> > 		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
> > -		     addr->svm_cid == VMADDR_CID_ANY))
> > -			return sk_vsock(vsk);
> > +		     addr->svm_cid == VMADDR_CID_ANY) &&
> > +		     vsock_net_check_mode(sock_net(sk), net))
> > +			return sk;
> > 	}
> > 
> > 	return NULL;
> > }
> > 
> > -static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> > -						  struct sockaddr_vm *dst)
> > +static struct sock *
> > +__vsock_find_connected_socket_net(struct sockaddr_vm *src,
> > +				  struct sockaddr_vm *dst, struct net *net)
> > {
> > 	struct vsock_sock *vsk;
> > 
> > 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
> > 			    connected_table) {
> > +		struct sock *sk = sk_vsock(vsk);
> > +
> > 		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
> > -		    dst->svm_port == vsk->local_addr.svm_port) {
> > -			return sk_vsock(vsk);
> > +		    dst->svm_port == vsk->local_addr.svm_port &&
> > +		    vsock_net_check_mode(sock_net(sk), net)) {
> > +			return sk;
> > 		}
> > 	}
> > 
> > @@ -304,12 +360,13 @@ void vsock_remove_connected(struct vsock_sock *vsk)
> > }
> > EXPORT_SYMBOL_GPL(vsock_remove_connected);
> > 
> > -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> > +					 struct net *net)
> > {
> > 	struct sock *sk;
> > 
> > 	spin_lock_bh(&vsock_table_lock);
> > -	sk = __vsock_find_bound_socket(addr);
> > +	sk = __vsock_find_bound_socket_net(addr, net);
> > 	if (sk)
> > 		sock_hold(sk);
> > 
> > @@ -317,15 +374,22 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > 
> > 	return sk;
> > }
> > +EXPORT_SYMBOL_GPL(vsock_find_bound_socket_net);
> > +
> 
> Can we document this and the _net version to make it clear to transports
> when using `vsock_find_bound_socket_net()` or `vsock_find_bound_socket()`?
> 
> > +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +{
> > +	return vsock_find_bound_socket_net(addr, NULL);
> > +}
> > EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
> > 
> > -struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > -					 struct sockaddr_vm *dst)
> > +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
> > +					     struct sockaddr_vm *dst,
> > +					     struct net *net)
> > {
> > 	struct sock *sk;
> > 
> > 	spin_lock_bh(&vsock_table_lock);
> > -	sk = __vsock_find_connected_socket(src, dst);
> > +	sk = __vsock_find_connected_socket_net(src, dst, net);
> > 	if (sk)
> > 		sock_hold(sk);
> > 
> > @@ -333,6 +397,13 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > 
> > 	return sk;
> > }
> > +EXPORT_SYMBOL_GPL(vsock_find_connected_socket_net);
> > +
> 
> Ditto about docuementation.
> 
> > +struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > +					 struct sockaddr_vm *dst)
> > +{
> > +	return vsock_find_connected_socket_net(src, dst, NULL);
> > +}
> > EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
> > 
> > void vsock_remove_sock(struct vsock_sock *vsk)
> > @@ -528,7 +599,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> > 
> > 	if (sk->sk_type == SOCK_SEQPACKET) {
> > 		if (!new_transport->seqpacket_allow ||
> > -		    !new_transport->seqpacket_allow(remote_cid)) {
> > +		    !new_transport->seqpacket_allow(vsk, remote_cid)) {
> > 			module_put(new_transport->module);
> > 			return -ESOCKTNOSUPPORT;
> > 		}
> > @@ -676,6 +747,7 @@ static void vsock_pending_work(struct work_struct *work)
> > static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > 				    struct sockaddr_vm *addr)
> > {
> > +	struct net *net = sock_net(sk_vsock(vsk));
> > 	static u32 port;
> > 	struct sockaddr_vm new_addr;
> > 
> > @@ -695,7 +767,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > 
> > 			new_addr.svm_port = port++;
> > 
> > -			if (!__vsock_find_bound_socket(&new_addr)) {
> > +			if (!__vsock_find_bound_socket_net(&new_addr, net)) {
> > 				found = true;
> > 				break;
> > 			}
> > @@ -712,7 +784,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > 			return -EACCES;
> > 		}
> > 
> > -		if (__vsock_find_bound_socket(&new_addr))
> > +		if (__vsock_find_bound_socket_net(&new_addr, net))
> > 			return -EADDRINUSE;
> > 	}
> > 
> > @@ -1314,7 +1386,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > 		goto out;
> > 	}
> > 
> > -	if (!transport->dgram_allow(remote_addr->svm_cid,
> > +	if (!transport->dgram_allow(vsk, remote_addr->svm_cid,
> > 				    remote_addr->svm_port)) {
> > 		err = -EINVAL;
> > 		goto out;
> > @@ -1355,7 +1427,7 @@ static int vsock_dgram_connect(struct socket *sock,
> > 	if (err)
> > 		goto out;
> > 
> > -	if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> > +	if (!vsk->transport->dgram_allow(vsk, remote_addr->svm_cid,
> > 					 remote_addr->svm_port)) {
> > 		err = -EINVAL;
> > 		goto out;
> > @@ -1585,7 +1657,7 @@ static int vsock_connect(struct socket *sock, struct sockaddr_unsized *addr,
> > 		 * endpoints.
> > 		 */
> > 		if (!transport ||
> > -		    !transport->stream_allow(remote_addr->svm_cid,
> > +		    !transport->stream_allow(vsk, remote_addr->svm_cid,
> > 					     remote_addr->svm_port)) {
> > 			err = -ENETUNREACH;
> > 			goto out;
> > @@ -2662,6 +2734,183 @@ static struct miscdevice vsock_device = {
> > 	.fops		= &vsock_device_ops,
> > };
> > 
> > +static int __vsock_net_mode_string(const struct ctl_table *table, int write,
> > +				   void *buffer, size_t *lenp, loff_t *ppos,
> > +				   enum vsock_net_mode mode,
> > +				   enum vsock_net_mode *new_mode)
> > +{
> > +	char data[VSOCK_NET_MODE_STR_MAX] = {0};
> > +	struct ctl_table tmp;
> > +	int ret;
> > +
> > +	if (!table->data || !table->maxlen || !*lenp) {
> > +		*lenp = 0;
> > +		return 0;
> > +	}
> > +
> > +	tmp = *table;
> > +	tmp.data = data;
> > +
> > +	if (!write) {
> > +		const char *p;
> > +
> > +		switch (mode) {
> > +		case VSOCK_NET_MODE_GLOBAL:
> > +			p = VSOCK_NET_MODE_STR_GLOBAL;
> > +			break;
> > +		case VSOCK_NET_MODE_LOCAL:
> > +			p = VSOCK_NET_MODE_STR_LOCAL;
> > +			break;
> > +		default:
> > +			WARN_ONCE(true, "netns has invalid vsock mode");
> > +			*lenp = 0;
> > +			return 0;
> > +		}
> > +
> > +		strscpy(data, p, sizeof(data));
> > +		tmp.maxlen = strlen(p);
> > +	}
> > +
> > +	ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
> 
> nit:
> 	if (ret || !write)
> 		return ret;
> 
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (!write)
> > +		return 0;
> > +
> > +	if (*lenp >= sizeof(data))
> > +		return -EINVAL;
> > +
> > +	if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
> > +		*new_mode = VSOCK_NET_MODE_GLOBAL;
> > +	else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
> > +		*new_mode = VSOCK_NET_MODE_LOCAL;
> > +	else
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> > +
> > +static int vsock_net_mode_string(const struct ctl_table *table, int write,
> > +				 void *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +	struct net *net;
> > +
> > +	if (write)
> > +		return -EPERM;
> > +
> > +	net = current->nsproxy->net_ns;
> > +
> > +	return __vsock_net_mode_string(table, write, buffer, lenp, ppos,
> > +				       vsock_net_mode(net), NULL);
> > +}
> > +
> > +static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> > +				       void *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +	enum vsock_net_mode new_mode;
> > +	struct net *net;
> > +	int ret;
> > +
> > +	net = current->nsproxy->net_ns;
> > +
> > +	ret = __vsock_net_mode_string(table, write, buffer, lenp, ppos,
> > +				      vsock_net_child_mode(net), &new_mode);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (write)
> > +		vsock_net_set_child_mode(net, new_mode);
> > +
> > +	return 0;
> > +}
> > +
> > +static struct ctl_table vsock_table[] = {
> > +	{
> > +		.procname	= "ns_mode",
> > +		.data		= &init_net.vsock.mode,
> > +		.maxlen		= VSOCK_NET_MODE_STR_MAX,
> > +		.mode		= 0444,
> > +		.proc_handler	= vsock_net_mode_string
> > +	},
> > +	{
> > +		.procname	= "child_ns_mode",
> > +		.data		= &init_net.vsock.child_ns_mode,
> > +		.maxlen		= VSOCK_NET_MODE_STR_MAX,
> > +		.mode		= 0644,
> > +		.proc_handler	= vsock_net_child_mode_string
> > +	},
> > +};
> > +
> > +static int __net_init vsock_sysctl_register(struct net *net)
> > +{
> > +	struct ctl_table *table;
> > +
> > +	if (net_eq(net, &init_net)) {
> > +		table = vsock_table;
> > +	} else {
> > +		table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL);
> > +		if (!table)
> > +			goto err_alloc;
> > +
> > +		table[0].data = &net->vsock.mode;
> > +		table[1].data = &net->vsock.child_ns_mode;
> > +	}
> > +
> > +	net->vsock.sysctl_hdr = register_net_sysctl_sz(net, "net/vsock", table,
> > +						       ARRAY_SIZE(vsock_table));
> > +	if (!net->vsock.sysctl_hdr)
> > +		goto err_reg;
> > +
> > +	return 0;
> > +
> > +err_reg:
> > +	if (!net_eq(net, &init_net))
> > +		kfree(table);
> > +err_alloc:
> > +	return -ENOMEM;
> > +}
> > +
> > +static void vsock_sysctl_unregister(struct net *net)
> > +{
> > +	const struct ctl_table *table;
> > +
> > +	table = net->vsock.sysctl_hdr->ctl_table_arg;
> > +	unregister_net_sysctl_table(net->vsock.sysctl_hdr);
> > +	if (!net_eq(net, &init_net))
> > +		kfree(table);
> > +}
> > +
> > +static void vsock_net_init(struct net *net)
> > +{
> > +	if (net_eq(net, &init_net))
> > +		net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
> 
> Right here, I mean, do we always have to use VSOCK_NET_MODE_GLOBAL or make
> it configurable somehow?

I think I prefer kernel cmdline. Will wait for further comment from Paolo
before next rev though.

> 
> The rest LGTM.
> 
> Stefano
> 

Thanks!

-Bobby

  reply	other threads:[~2026-01-13 21:50 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-13  3:11 [PATCH net-next v14 00/12] vsock: add namespace support to vhost-vsock and loopback Bobby Eshleman
2026-01-13  3:11 ` [PATCH net-next v14 01/12] vsock: add netns to vsock core Bobby Eshleman
2026-01-13  7:45   ` Michael S. Tsirkin
2026-01-13 20:28     ` Bobby Eshleman
2026-01-13 13:53   ` kernel test robot
2026-01-13 14:37   ` Stefano Garzarella
2026-01-13 20:24     ` Bobby Eshleman [this message]
2026-01-13 23:12   ` kernel test robot
2026-01-14 15:54     ` Stefano Garzarella
2026-01-14 17:21       ` Bobby Eshleman
2026-01-13  3:11 ` [PATCH net-next v14 02/12] virtio: set skb owner of virtio_transport_reset_no_sock() reply Bobby Eshleman
2026-01-13  3:11 ` [PATCH net-next v14 03/12] vsock: add netns support to virtio transports Bobby Eshleman
2026-01-13 14:37   ` Stefano Garzarella
2026-01-13  3:11 ` [PATCH net-next v14 04/12] selftests/vsock: increase timeout to 1200 Bobby Eshleman
2026-01-13 15:42   ` Stefano Garzarella
2026-01-13  3:11 ` [PATCH net-next v14 05/12] selftests/vsock: add namespace helpers to vmtest.sh Bobby Eshleman
2026-01-13 15:44   ` Stefano Garzarella
2026-01-13 15:53   ` Stefano Garzarella
2026-01-13  3:11 ` [PATCH net-next v14 06/12] selftests/vsock: prepare vm management helpers for namespaces Bobby Eshleman
2026-01-13  3:11 ` [PATCH net-next v14 07/12] selftests/vsock: add vm_dmesg_{warn,oops}_count() helpers Bobby Eshleman
2026-01-13  3:11 ` [PATCH net-next v14 08/12] selftests/vsock: use ss to wait for listeners instead of /proc/net Bobby Eshleman
2026-01-13  3:11 ` [PATCH net-next v14 09/12] selftests/vsock: add tests for proc sys vsock ns_mode Bobby Eshleman
2026-01-13 16:50   ` Stefano Garzarella
2026-01-13  3:11 ` [PATCH net-next v14 10/12] selftests/vsock: add namespace tests for CID collisions Bobby Eshleman
2026-01-13  3:11 ` [PATCH net-next v14 11/12] selftests/vsock: add tests for host <-> vm connectivity with namespaces Bobby Eshleman
2026-01-13  3:11 ` [PATCH net-next v14 12/12] selftests/vsock: add tests for namespace deletion Bobby Eshleman
2026-01-13 17:16   ` Stefano Garzarella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aWap9cOdhMAg2KPZ@devvm11784.nha0.facebook.com \
    --to=bobbyeshleman@gmail.com \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=berrange@redhat.com \
    --cc=bobbyeshleman@meta.com \
    --cc=bryan-bt.tan@broadcom.com \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=edumazet@google.com \
    --cc=eperezma@redhat.com \
    --cc=haiyangz@microsoft.com \
    --cc=horms@kernel.org \
    --cc=jasowang@redhat.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sargun@sargun.me \
    --cc=sgarzare@redhat.com \
    --cc=shuah@kernel.org \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=vishnu.dasa@broadcom.com \
    --cc=wei.liu@kernel.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.