Date: Tue, 13 Jan 2026 02:45:32 -0500
From: "Michael S. Tsirkin"
To: Bobby Eshleman
Cc: Stefano Garzarella, "David S. Miller", Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	Jason Wang, Eugenio Pérez, Xuan Zhuo, "K. Y. Srinivasan",
	Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Shuah Khan, Long Li,
	linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
	netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kselftest@vger.kernel.org,
	berrange@redhat.com, Sargun Dhillon, Bobby Eshleman
Subject: Re: [PATCH net-next v14 01/12] vsock: add netns to vsock core
Message-ID: <20260113024503-mutt-send-email-mst@kernel.org>
References: <20260112-vsock-vmtest-v14-0-a5c332db3e2b@meta.com>
	<20260112-vsock-vmtest-v14-1-a5c332db3e2b@meta.com>
In-Reply-To: <20260112-vsock-vmtest-v14-1-a5c332db3e2b@meta.com>

On Mon, Jan 12, 2026 at 07:11:10PM -0800, Bobby Eshleman wrote:
> From: Bobby Eshleman
>
> Add netns logic to vsock core. Additionally, modify transport hook
> prototypes to be used by later transport-specific patches (e.g.,
> *_seqpacket_allow()).
>
> Namespaces are supported primarily by changing socket lookup functions
> (e.g., vsock_find_connected_socket()) to take into account the socket
> namespace and the namespace mode before considering a candidate socket a
> "match".
>
> This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode to
> report the mode and /proc/sys/net/vsock/child_ns_mode to set the mode
> for new namespaces.
>
> Add netns functionality (initialization, passing to transports, procfs,
> etc...) to the af_vsock socket layer. Later patches that add netns
> support to transports depend on this patch.
>
> dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are
> modified to take a vsk in order to perform logic on namespace modes. In
In > future patches, the net will also be used for socket > lookups in these functions. > > Signed-off-by: Bobby Eshleman > --- > Changes in v14: > - include linux/sysctl.h in af_vsock.c > - squash patch 'vsock: add per-net vsock NS mode state' into this patch > (prior version can be found here): > https://lore.kernel.org/all/20251223-vsock-vmtest-v13-1-9d6db8e7c80b@meta.com/) So, about the static port, are you going to address it in the next version then? > Changes in v13: > - remove net_mode and replace with direct accesses to net->vsock.mode, > since this is now immutable. > - update comments about mode behavior and mutability, and sysctl API > - only pass NULL for net when wanting global, instead of net_mode == > VSOCK_NET_MODE_GLOBAL. This reflects the new logic > of vsock_net_check_mode() that only requires net pointers (not > net_mode). > - refactor sysctl string code into a re-usable function, because > child_ns_mode and ns_mode both handle the same strings. > - remove redundant vsock_net_init(&init_net) call in module init because > pernet registration calls the callback on the init_net too > > Changes in v12: > - return true in dgram_allow(), stream_allow(), and seqpacket_allow() > only if net_mode == VSOCK_NET_MODE_GLOBAL (Stefano) > - document bind(VMADDR_CID_ANY) case in af_vsock.c (Stefano) > - change order of stream_allow() call in vmci so we can pass vsk > to it > > Changes in v10: > - add file-level comment about what happens to sockets/devices > when the namespace mode changes (Stefano) > - change the 'if (write)' boolean in vsock_net_mode_string() to > if (!write), this simplifies a later patch which adds "goto" > for mutex unlocking on function exit. 
>
> Changes in v9:
> - remove virtio_vsock_alloc_rx_skb() (Stefano)
> - remove vsock_global_dummy_net, not needed as net=NULL +
>   net_mode=VSOCK_NET_MODE_GLOBAL achieves identical result
>
> Changes in v7:
> - hv_sock: fix hyperv build error
> - explain why vhost does not use the dummy
> - explain usage of __vsock_global_dummy_net
> - explain why VSOCK_NET_MODE_STR_MAX is 8 characters
> - use switch-case in vsock_net_mode_string()
> - avoid changing transports as much as possible
> - add vsock_find_{bound,connected}_socket_net()
> - rename `vsock_hdr` to `sysctl_hdr`
> - add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
>   global mode for virtio-vsock, move skb->cb zero-ing into wrapper
> - explain seqpacket_allow() change
> - move net setting to __vsock_create() instead of vsock_create() so
>   that child sockets also have their net assigned upon accept()
>
> Changes in v6:
> - unregister sysctl ops in vsock_exit()
> - af_vsock: clarify description of CID behavior
> - af_vsock: fix buf vs buffer naming, and length checking
> - af_vsock: fix length checking w/ correct ctl_table->maxlen
>
> Changes in v5:
> - vsock_global_net() -> vsock_global_dummy_net()
> - update comments for new uAPI
> - use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
> - add prototype changes so patch remains compilable
> ---
>  MAINTAINERS                             |   1 +
>  drivers/vhost/vsock.c                   |   6 +-
>  include/linux/virtio_vsock.h            |   4 +-
>  include/net/af_vsock.h                  |  53 +++++-
>  include/net/net_namespace.h             |   4 +
>  include/net/netns/vsock.h               |  17 ++
>  net/vmw_vsock/af_vsock.c                | 297 +++++++++++++++++++++++++++++---
>  net/vmw_vsock/hyperv_transport.c        |   7 +-
>  net/vmw_vsock/virtio_transport.c        |   9 +-
>  net/vmw_vsock/virtio_transport_common.c |   6 +-
>  net/vmw_vsock/vmci_transport.c          |  26 ++-
>  net/vmw_vsock/vsock_loopback.c          |   8 +-
>  12 files changed, 394 insertions(+), 44 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6737aad729d6..f4aa476427c8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -27522,6 +27522,7 @@ L: netdev@vger.kernel.org
> S: Maintained
> F: drivers/vhost/vsock.c
> F: include/linux/virtio_vsock.h
> +F: include/net/netns/vsock.h
> F: include/uapi/linux/virtio_vsock.h
> F: net/vmw_vsock/virtio_transport.c
> F: net/vmw_vsock/virtio_transport_common.c
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 552cfb53498a..647ded6f6ea5 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -407,7 +407,8 @@ static bool vhost_transport_msgzerocopy_allow(void)
> return true;
> }
>
> -static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
> + u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
> .transport = {
> @@ -463,7 +464,8 @@ static struct virtio_transport vhost_transport = {
> .send_pkt = vhost_transport_send_pkt,
> };
>
> -static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
> + u32 remote_cid)
> {
> struct vhost_vsock *vsock;
> bool seqpacket_allow = false;
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 0c67543a45c8..1845e8d4f78d 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -256,10 +256,10 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
>
> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> -bool virtio_transport_stream_allow(u32 cid, u32 port);
> +bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port);
> int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> struct sockaddr_vm *addr);
> -bool virtio_transport_dgram_allow(u32 cid, u32 port);
> +bool virtio_transport_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port);
>
> int virtio_transport_connect(struct vsock_sock *vsk);
>
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index d40e978126e3..10c2846fcc58 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -10,6 +10,7 @@
>
> #include
> #include
> +#include
> #include
> #include
>
> @@ -124,7 +125,7 @@ struct vsock_transport {
> size_t len, int flags);
> int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> struct msghdr *, size_t len);
> - bool (*dgram_allow)(u32 cid, u32 port);
> + bool (*dgram_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
>
> /* STREAM. */
> /* TODO: stream_bind() */
> @@ -136,14 +137,14 @@ struct vsock_transport {
> s64 (*stream_has_space)(struct vsock_sock *);
> u64 (*stream_rcvhiwat)(struct vsock_sock *);
> bool (*stream_is_active)(struct vsock_sock *);
> - bool (*stream_allow)(u32 cid, u32 port);
> + bool (*stream_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
>
> /* SEQ_PACKET. */
> ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> int flags);
> int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
> size_t len);
> - bool (*seqpacket_allow)(u32 remote_cid);
> + bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
> u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
>
> /* Notification. */
> @@ -216,6 +217,11 @@ void vsock_remove_connected(struct vsock_sock *vsk);
> struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
> struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> struct sockaddr_vm *dst);
> +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> + struct net *net);
> +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
> + struct sockaddr_vm *dst,
> + struct net *net);
> void vsock_remove_sock(struct vsock_sock *vsk);
> void vsock_for_each_connected_socket(struct vsock_transport *transport,
> void (*fn)(struct sock *sk));
> @@ -256,4 +262,45 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
> {
> return t->msgzerocopy_allow && t->msgzerocopy_allow();
> }
> +
> +static inline enum vsock_net_mode vsock_net_mode(struct net *net)
> +{
> + return READ_ONCE(net->vsock.mode);
> +}
> +
> +static inline void vsock_net_set_child_mode(struct net *net,
> + enum vsock_net_mode mode)
> +{
> + WRITE_ONCE(net->vsock.child_ns_mode, mode);
> +}
> +
> +static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
> +{
> + return READ_ONCE(net->vsock.child_ns_mode);
> +}
> +
> +/* Return true if two namespaces pass the mode rules. Otherwise, return false.
> + *
> + * A NULL namespace is treated as VSOCK_NET_MODE_GLOBAL.
> + *
> + * Read more about modes in the comment header of net/vmw_vsock/af_vsock.c.
> + */
> +static inline bool vsock_net_check_mode(struct net *ns0, struct net *ns1)
> +{
> + enum vsock_net_mode mode0, mode1;
> +
> + /* Any vsocks within the same network namespace are always reachable,
> + * regardless of the mode.
> + */
> + if (net_eq(ns0, ns1))
> + return true;
> +
> + mode0 = ns0 ? vsock_net_mode(ns0) : VSOCK_NET_MODE_GLOBAL;
> + mode1 = ns1 ? vsock_net_mode(ns1) : VSOCK_NET_MODE_GLOBAL;
> +
> + /* Different namespaces are only reachable if they are both
> + * global mode.
> + */
> + return mode0 == VSOCK_NET_MODE_GLOBAL && mode0 == mode1;
> +}
> #endif /* __AF_VSOCK_H__ */
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index cb664f6e3558..66d3de1d935f 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -37,6 +37,7 @@
> #include
> #include
> #include
> +#include
> #include
> #include
> #include
> @@ -196,6 +197,9 @@ struct net {
> /* Move to a better place when the config guard is removed. */
> struct mutex rtnl_mutex;
> #endif
> +#if IS_ENABLED(CONFIG_VSOCKETS)
> + struct netns_vsock vsock;
> +#endif
> } __randomize_layout;
>
> #include
> diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
> new file mode 100644
> index 000000000000..e2325e2d6ec5
> --- /dev/null
> +++ b/include/net/netns/vsock.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __NET_NET_NAMESPACE_VSOCK_H
> +#define __NET_NET_NAMESPACE_VSOCK_H
> +
> +#include
> +
> +enum vsock_net_mode {
> + VSOCK_NET_MODE_GLOBAL,
> + VSOCK_NET_MODE_LOCAL,
> +};
> +
> +struct netns_vsock {
> + struct ctl_table_header *sysctl_hdr;
> + enum vsock_net_mode mode;
> + enum vsock_net_mode child_ns_mode;
> +};
> +#endif /* __NET_NET_NAMESPACE_VSOCK_H */
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index a3505a4dcee0..9d614e4a4fa5 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -83,6 +83,42 @@
> * TCP_ESTABLISHED - connected
> * TCP_CLOSING - disconnecting
> * TCP_LISTEN - listening
> + *
> + * - Namespaces in vsock support two different modes configured
> + * through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
> + * Each mode defines how the namespace interacts with CIDs.
> + * /proc/sys/net/vsock/ns_mode is read-only and inherited from the
> + * parent namespace's /proc/sys/net/vsock/child_ns_mode at creation
> + * time and is immutable thereafter. The default is "global".
> + *
> + * The modes affect the allocation and accessibility of CIDs as follows:
> + *
> + * - global - access and allocation are all system-wide
> + * - all CID allocation from global namespaces draw from the same
> + * system-wide pool.
> + * - if one global namespace has already allocated some CID, another
> + * global namespace will not be able to allocate the same CID.
> + * - global mode AF_VSOCK sockets can reach any VM or socket in any global
> + * namespace, they are not contained to only their own namespace.
> + * - AF_VSOCK sockets in a global mode namespace cannot reach VMs or
> + * sockets in any local mode namespace.
> + * - local - access and allocation are contained within the namespace
> + * - CID allocation draws only from a private pool local only to the
> + * namespace, and does not affect the CIDs available for allocation in any
> + * other namespace (global or local).
> + * - VMs in a local namespace do not collide with CIDs in any other local
> + * namespace or any global namespace. For example, if a VM in a local mode
> + * namespace is given CID 10, then CID 10 is still available for
> + * allocation in any other namespace, but not in the same namespace.
> + * - AF_VSOCK sockets in a local mode namespace can connect only to VMs or
> + * other sockets within their own namespace.
> + * - sockets bound to VMADDR_CID_ANY in local namespaces will never resolve
> + * to any transport that is not compatible with local mode. There is no
> + * error that propagates to the user (as there is for connection attempts)
> + * because it is possible for some packet to reach this socket from
> + * a different transport that *does* support local mode. For
> + * example, virtio-vsock may not support local mode, but the socket
> + * may still accept a connection from vhost-vsock which does.
> */
>
> #include
> @@ -100,20 +136,31 @@
> #include
> #include
> #include
> +#include
> #include
> #include
> #include
> #include
> #include
> #include
> +#include
> #include
> #include
> #include
> #include
> #include
> +#include
> #include
> #include
>
> +#define VSOCK_NET_MODE_STR_GLOBAL "global"
> +#define VSOCK_NET_MODE_STR_LOCAL "local"
> +
> +/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'.
> + * The newline is added by proc_dostring() for read operations.
> + */
> +#define VSOCK_NET_MODE_STR_MAX 8
> +
> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> static void vsock_sk_destruct(struct sock *sk);
> static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
> @@ -235,33 +282,42 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> sock_put(&vsk->sk);
> }
>
> -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> +static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> + struct net *net)
> {
> struct vsock_sock *vsk;
>
> list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
> - if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> - return sk_vsock(vsk);
> + struct sock *sk = sk_vsock(vsk);
> +
> + if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
> + vsock_net_check_mode(sock_net(sk), net))
> + return sk;
>
> if (addr->svm_port == vsk->local_addr.svm_port &&
> (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
> - addr->svm_cid == VMADDR_CID_ANY))
> - return sk_vsock(vsk);
> + addr->svm_cid == VMADDR_CID_ANY) &&
> + vsock_net_check_mode(sock_net(sk), net))
> + return sk;
> }
>
> return NULL;
> }
>
> -static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> - struct sockaddr_vm *dst)
> +static struct sock *
> +__vsock_find_connected_socket_net(struct sockaddr_vm *src,
> + struct sockaddr_vm *dst, struct net *net)
> {
> struct vsock_sock *vsk;
>
> list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
> connected_table) {
> + struct sock *sk = sk_vsock(vsk);
> +
> if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
> - dst->svm_port == vsk->local_addr.svm_port) {
> - return sk_vsock(vsk);
> + dst->svm_port == vsk->local_addr.svm_port &&
> + vsock_net_check_mode(sock_net(sk), net)) {
> + return sk;
> }
> }
>
> @@ -304,12 +360,13 @@ void vsock_remove_connected(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(vsock_remove_connected);
>
> -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> + struct net *net)
> {
> struct sock *sk;
>
> spin_lock_bh(&vsock_table_lock);
> - sk = __vsock_find_bound_socket(addr);
> + sk = __vsock_find_bound_socket_net(addr, net);
> if (sk)
> sock_hold(sk);
>
> @@ -317,15 +374,22 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
>
> return sk;
> }
> +EXPORT_SYMBOL_GPL(vsock_find_bound_socket_net);
> +
> +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> +{
> + return vsock_find_bound_socket_net(addr, NULL);
> +}
> EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
>
> -struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> - struct sockaddr_vm *dst)
> +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
> + struct sockaddr_vm *dst,
> + struct net *net)
> {
> struct sock *sk;
>
> spin_lock_bh(&vsock_table_lock);
> - sk = __vsock_find_connected_socket(src, dst);
> + sk = __vsock_find_connected_socket_net(src, dst, net);
> if (sk)
> sock_hold(sk);
>
> @@ -333,6 +397,13 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>
> return sk;
> }
> +EXPORT_SYMBOL_GPL(vsock_find_connected_socket_net);
> +
> +struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> + struct sockaddr_vm *dst)
> +{
> + return vsock_find_connected_socket_net(src, dst, NULL);
> +}
> EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
>
> void vsock_remove_sock(struct vsock_sock *vsk)
> @@ -528,7 +599,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>
> if (sk->sk_type == SOCK_SEQPACKET) {
> if (!new_transport->seqpacket_allow ||
> - !new_transport->seqpacket_allow(remote_cid)) {
> + !new_transport->seqpacket_allow(vsk, remote_cid)) {
> module_put(new_transport->module);
> return -ESOCKTNOSUPPORT;
> }
> @@ -676,6 +747,7 @@ static void vsock_pending_work(struct work_struct *work)
> static int __vsock_bind_connectible(struct vsock_sock *vsk,
> struct sockaddr_vm *addr)
> {
> + struct net *net = sock_net(sk_vsock(vsk));
> static u32 port;
> struct sockaddr_vm new_addr;
>
> @@ -695,7 +767,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>
> new_addr.svm_port = port++;
>
> - if (!__vsock_find_bound_socket(&new_addr)) {
> + if (!__vsock_find_bound_socket_net(&new_addr, net)) {
> found = true;
> break;
> }
> @@ -712,7 +784,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> return -EACCES;
> }
>
> - if (__vsock_find_bound_socket(&new_addr))
> + if (__vsock_find_bound_socket_net(&new_addr, net))
> return -EADDRINUSE;
> }
>
> @@ -1314,7 +1386,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> goto out;
> }
>
> - if (!transport->dgram_allow(remote_addr->svm_cid,
> + if (!transport->dgram_allow(vsk, remote_addr->svm_cid,
> remote_addr->svm_port)) {
> err = -EINVAL;
> goto out;
> @@ -1355,7 +1427,7 @@ static int vsock_dgram_connect(struct socket *sock,
> if (err)
> goto out;
>
> - if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> + if (!vsk->transport->dgram_allow(vsk, remote_addr->svm_cid,
> remote_addr->svm_port)) {
> err = -EINVAL;
> goto out;
> @@ -1585,7 +1657,7 @@ static int vsock_connect(struct socket *sock, struct sockaddr_unsized *addr,
> * endpoints.
> */
> if (!transport ||
> - !transport->stream_allow(remote_addr->svm_cid,
> + !transport->stream_allow(vsk, remote_addr->svm_cid,
> remote_addr->svm_port)) {
> err = -ENETUNREACH;
> goto out;
> @@ -2662,6 +2734,183 @@ static struct miscdevice vsock_device = {
> .fops = &vsock_device_ops,
> };
>
> +static int __vsock_net_mode_string(const struct ctl_table *table, int write,
> + void *buffer, size_t *lenp, loff_t *ppos,
> + enum vsock_net_mode mode,
> + enum vsock_net_mode *new_mode)
> +{
> + char data[VSOCK_NET_MODE_STR_MAX] = {0};
> + struct ctl_table tmp;
> + int ret;
> +
> + if (!table->data || !table->maxlen || !*lenp) {
> + *lenp = 0;
> + return 0;
> + }
> +
> + tmp = *table;
> + tmp.data = data;
> +
> + if (!write) {
> + const char *p;
> +
> + switch (mode) {
> + case VSOCK_NET_MODE_GLOBAL:
> + p = VSOCK_NET_MODE_STR_GLOBAL;
> + break;
> + case VSOCK_NET_MODE_LOCAL:
> + p = VSOCK_NET_MODE_STR_LOCAL;
> + break;
> + default:
> + WARN_ONCE(true, "netns has invalid vsock mode");
> + *lenp = 0;
> + return 0;
> + }
> +
> + strscpy(data, p, sizeof(data));
> + tmp.maxlen = strlen(p);
> + }
> +
> + ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
> + if (ret)
> + return ret;
> +
> + if (!write)
> + return 0;
> +
> + if (*lenp >= sizeof(data))
> + return -EINVAL;
> +
> + if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
> + *new_mode = VSOCK_NET_MODE_GLOBAL;
> + else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
> + *new_mode = VSOCK_NET_MODE_LOCAL;
> + else
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +static int vsock_net_mode_string(const struct ctl_table *table, int write,
> + void *buffer, size_t *lenp, loff_t *ppos)
> +{
> + struct net *net;
> +
> + if (write)
> + return -EPERM;
> +
> + net = current->nsproxy->net_ns;
> +
> + return __vsock_net_mode_string(table, write, buffer, lenp, ppos,
> + vsock_net_mode(net), NULL);
> +}
> +
> +static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> + void *buffer, size_t *lenp, loff_t *ppos)
> +{
> + enum vsock_net_mode new_mode;
> + struct net *net;
> + int ret;
> +
> + net = current->nsproxy->net_ns;
> +
> + ret = __vsock_net_mode_string(table, write, buffer, lenp, ppos,
> + vsock_net_child_mode(net), &new_mode);
> + if (ret)
> + return ret;
> +
> + if (write)
> + vsock_net_set_child_mode(net, new_mode);
> +
> + return 0;
> +}
> +
> +static struct ctl_table vsock_table[] = {
> + {
> + .procname = "ns_mode",
> + .data = &init_net.vsock.mode,
> + .maxlen = VSOCK_NET_MODE_STR_MAX,
> + .mode = 0444,
> + .proc_handler = vsock_net_mode_string
> + },
> + {
> + .procname = "child_ns_mode",
> + .data = &init_net.vsock.child_ns_mode,
> + .maxlen = VSOCK_NET_MODE_STR_MAX,
> + .mode = 0644,
> + .proc_handler = vsock_net_child_mode_string
> + },
> +};
> +
> +static int __net_init vsock_sysctl_register(struct net *net)
> +{
> + struct ctl_table *table;
> +
> + if (net_eq(net, &init_net)) {
> + table = vsock_table;
> + } else {
> + table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL);
> + if (!table)
> + goto err_alloc;
> +
> + table[0].data = &net->vsock.mode;
> + table[1].data = &net->vsock.child_ns_mode;
> + }
> +
> + net->vsock.sysctl_hdr = register_net_sysctl_sz(net, "net/vsock", table,
> + ARRAY_SIZE(vsock_table));
> + if (!net->vsock.sysctl_hdr)
> + goto err_reg;
> +
> + return 0;
> +
> +err_reg:
> + if (!net_eq(net, &init_net))
> + kfree(table);
> +err_alloc:
> + return -ENOMEM;
> +}
> +
> +static void vsock_sysctl_unregister(struct net *net)
> +{
> + const struct ctl_table *table;
> +
> + table = net->vsock.sysctl_hdr->ctl_table_arg;
> + unregister_net_sysctl_table(net->vsock.sysctl_hdr);
> + if (!net_eq(net, &init_net))
> + kfree(table);
> +}
> +
> +static void vsock_net_init(struct net *net)
> +{
> + if (net_eq(net, &init_net))
> + net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
> + else
> + net->vsock.mode = vsock_net_child_mode(current->nsproxy->net_ns);
> +
> + net->vsock.child_ns_mode = VSOCK_NET_MODE_GLOBAL;
> +}
> +
> +static __net_init int vsock_sysctl_init_net(struct net *net)
> +{
> + vsock_net_init(net);
> +
> + if (vsock_sysctl_register(net))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static __net_exit void vsock_sysctl_exit_net(struct net *net)
> +{
> + vsock_sysctl_unregister(net);
> +}
> +
> +static struct pernet_operations vsock_sysctl_ops __net_initdata = {
> + .init = vsock_sysctl_init_net,
> + .exit = vsock_sysctl_exit_net,
> +};
> +
> static int __init vsock_init(void)
> {
> int err = 0;
> @@ -2689,10 +2938,17 @@ static int __init vsock_init(void)
> goto err_unregister_proto;
> }
>
> + if (register_pernet_subsys(&vsock_sysctl_ops)) {
> + err = -ENOMEM;
> + goto err_unregister_sock;
> + }
> +
> vsock_bpf_build_proto();
>
> return 0;
>
> +err_unregister_sock:
> + sock_unregister(AF_VSOCK);
> err_unregister_proto:
> proto_unregister(&vsock_proto);
> err_deregister_misc:
> @@ -2706,6 +2962,7 @@ static void __exit vsock_exit(void)
> misc_deregister(&vsock_device);
> sock_unregister(AF_VSOCK);
> proto_unregister(&vsock_proto);
> + unregister_pernet_subsys(&vsock_sysctl_ops);
> }
>
> const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk)
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index 432fcbbd14d4..4d6d7807f152 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -570,7 +570,7 @@ static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> return -EOPNOTSUPP;
> }
>
> -static bool hvs_dgram_allow(u32 cid, u32 port)
> +static bool hvs_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port)
> {
> return false;
> }
> @@ -745,8 +745,11 @@ static bool hvs_stream_is_active(struct vsock_sock *vsk)
> return hvs->chan != NULL;
> }
>
> -static bool hvs_stream_allow(u32 cid, u32 port)
> +static bool hvs_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port)
> {
> + if (vsock_net_mode(sock_net(sk_vsock(vsk))) != VSOCK_NET_MODE_GLOBAL)
> + return false;
> +
> if (cid == VMADDR_CID_HOST)
> return true;
>
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 8c867023a2e5..37eeefddb48c 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -536,7 +536,8 @@ static bool virtio_transport_msgzerocopy_allow(void)
> return true;
> }
>
> -static bool virtio_transport_seqpacket_allow(u32 remote_cid);
> +static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk,
> + u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
> .transport = {
> @@ -593,11 +594,15 @@ static struct virtio_transport virtio_transport = {
> .can_msgzerocopy = virtio_transport_can_msgzerocopy,
> };
>
> -static bool virtio_transport_seqpacket_allow(u32 remote_cid)
> +static bool
> +virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> {
> struct virtio_vsock *vsock;
> bool seqpacket_allow;
>
> + if (vsock_net_mode(sock_net(sk_vsock(vsk))) != VSOCK_NET_MODE_GLOBAL)
> + return false;
> +
> seqpacket_allow = false;
> rcu_read_lock();
> vsock = rcu_dereference(the_virtio_vsock);
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index dcc8a1d5851e..fdb8f5b3fa60 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1043,9 +1043,9 @@ bool virtio_transport_stream_is_active(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_is_active);
>
> -bool virtio_transport_stream_allow(u32 cid, u32 port)
> +bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port)
> {
> - return true;
> + return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>
> @@ -1056,7 +1056,7 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>
> -bool virtio_transport_dgram_allow(u32 cid, u32 port)
> +bool virtio_transport_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port)
> {
> return false;
> }
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index 7eccd6708d66..d5ce39ea5a1b 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -646,13 +646,17 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> return VMCI_SUCCESS;
> }
>
> -static bool vmci_transport_stream_allow(u32 cid, u32 port)
> +static bool vmci_transport_stream_allow(struct vsock_sock *vsk, u32 cid,
> + u32 port)
> {
> static const u32 non_socket_contexts[] = {
> VMADDR_CID_LOCAL,
> };
> int i;
>
> + if (vsock_net_mode(sock_net(sk_vsock(vsk))) != VSOCK_NET_MODE_GLOBAL)
> + return false;
> +
> BUILD_BUG_ON(sizeof(cid) != sizeof(*non_socket_contexts));
>
> for (i = 0; i < ARRAY_SIZE(non_socket_contexts); i++) {
> @@ -682,12 +686,10 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
> err = VMCI_SUCCESS;
> bh_process_pkt = false;
>
> - /* Ignore incoming packets from contexts without sockets, or resources
> - * that aren't vsock implementations.
> + /* Ignore incoming packets from resources that aren't vsock
> + * implementations.
> */
> -
> - if (!vmci_transport_stream_allow(dg->src.context, -1)
> - || vmci_transport_peer_rid(dg->src.context) != dg->src.resource)
> + if (vmci_transport_peer_rid(dg->src.context) != dg->src.resource)
> return VMCI_ERROR_NO_ACCESS;
>
> if (VMCI_DG_SIZE(dg) < sizeof(*pkt))
> @@ -749,6 +751,12 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
> goto out;
> }
>
> + /* Ignore incoming packets from contexts without sockets. */
> + if (!vmci_transport_stream_allow(vsk, dg->src.context, -1)) {
> + err = VMCI_ERROR_NO_ACCESS;
> + goto out;
> + }
> +
> /* We do most everything in a work queue, but let's fast path the
> * notification of reads and writes to help data transfer performance.
>  	 * We can only do this if there is no process context code executing
> @@ -1784,8 +1792,12 @@ static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
>  	return err;
>  }
>
> -static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> +static bool vmci_transport_dgram_allow(struct vsock_sock *vsk, u32 cid,
> +				       u32 port)
>  {
> +	if (vsock_net_mode(sock_net(sk_vsock(vsk))) != VSOCK_NET_MODE_GLOBAL)
> +		return false;
> +
>  	if (cid == VMADDR_CID_HYPERVISOR) {
>  		/* Registrations of PBRPC Servers do not modify VMX/Hypervisor
>  		 * state and are allowed.
> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> index bc2ff918b315..378a96dcb666 100644
> --- a/net/vmw_vsock/vsock_loopback.c
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -46,7 +46,8 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
>  	return 0;
>  }
>
> -static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
> +static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk,
> +					   u32 remote_cid);
>  static bool vsock_loopback_msgzerocopy_allow(void)
>  {
>  	return true;
> @@ -106,9 +107,10 @@ static struct virtio_transport loopback_transport = {
>  	.send_pkt = vsock_loopback_send_pkt,
>  };
>
> -static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
> +static bool
> +vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
>  {
> -	return true;
> +	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
>  }
>
>  static void vsock_loopback_work(struct work_struct *work)
>
> --
> 2.47.3