netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: "Shuah Khan" <shuah@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Wei Liu" <wei.liu@kernel.org>,
	"Dexuan Cui" <decui@microsoft.com>,
	"Bryan Tan" <bryan-bt.tan@broadcom.com>,
	"Vishnu Dasa" <vishnu.dasa@broadcom.com>,
	"Broadcom internal kernel review list"
	<bcm-kernel-feedback-list@broadcom.com>,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-hyperv@vger.kernel.org,
	"Sargun Dhillon" <sargun@sargun.me>,
	berrange@redhat.com, "Bobby Eshleman" <bobbyeshleman@meta.com>
Subject: Re: [PATCH net-next v9 08/14] vsock: reject bad VSOCK_NET_MODE_LOCAL configuration for G2H
Date: Wed, 12 Nov 2025 10:36:50 -0800	[thread overview]
Message-ID: <aRTTwuuXSz5CvNjt@devvm11784.nha0.facebook.com> (raw)
In-Reply-To: <ureyl5b2tneivmlce4fdtmuoxgayfxwgewoypb6oyxeh7ozt3i@chygpr2uvtcp>

On Wed, Nov 12, 2025 at 03:21:39PM +0100, Stefano Garzarella wrote:
> On Tue, Nov 11, 2025 at 10:54:50PM -0800, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > Reject setting VSOCK_NET_MODE_LOCAL with -EOPNOTSUPP if a G2H transport
> > is operational. Additionally, reject G2H transport registration if there
> > already exists a namespace in local mode.
> > 
> > G2H sockets break in local mode because the G2H transports don't support
> > namespacing yet. The current approach is to coerce packets coming out of
> > G2H transports into VSOCK_NET_MODE_GLOBAL mode, but it is not possible
> > to coerce sockets in the same way because it cannot be deduced which
> > transport will be used by the socket. Specifically, when bound to
> > VMADDR_CID_ANY in a nested VM (both G2H and H2G available), it is not
> > until a packet is received and matched to the bound socket that we
> > assign the transport. This presents a chicken-and-egg problem, because
> > we need the namespace to lookup the socket and resolve the transport,
> > but we need the transport to know how to use the namespace during
> > lookup.
> > 
> > For that reason, this patch prevents VSOCK_NET_MODE_LOCAL from being
> > used on systems that support G2H, even nested systems that also have H2G
> > transports.
> > 
> > Local mode is blocked based on detecting the presence of G2H devices
> > (when possible, as hyperv is special). This means that a host kernel
> > with G2H support compiled in (or has the module loaded), will still
> > support local mode because there is no G2H (e.g., virtio-vsock) device
> > detected. This enables using the same kernel in the host and in the
> > guest, as we do in kselftest.
> > 
> > Systems with only namespace-aware transports (vhost-vsock, loopback) can
> > still use both VSOCK_NET_MODE_GLOBAL and VSOCK_NET_MODE_LOCAL modes as
> > intended.
> > 
> > The hyperv transport must be treated specially. Other G2H transports can
> > can report presence of a device using get_local_cid(). When a device is
> > present it returns a valid CID; otherwise, it returns VMADDR_CID_ANY.
> > THe hyperv transport's get_local_cid() always returns VMADDR_CID_ANY,
> > however, even when a device is present.
> > 
> > For that reason, this patch adds an always_block_local_mode flag to
> > struct vsock_transport. When set to true, VSOCK_NET_MODE_LOCAL is
> > blocked unconditionally whenever the transport is registered, regardless
> > of device presence. When false, LOCAL mode is only blocked when
> > get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
> > 
> > The hyperv transport sets this flag to true to unconditionally block
> > local mode. Other G2H transports (virtio-vsock, vmci-vsock) leave it
> > false and continue using device detection via get_local_cid() to block
> > local mode.
> > 
> > These restrictions can be lifted in a future patch series when G2H
> > transports gain namespace support.
> 
> IMO this commit should be before supporting namespaces in any device,
> so we are sure we don't have commits where this can happen.

sgtm!

> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > include/net/af_vsock.h           |  8 +++++++
> > net/vmw_vsock/af_vsock.c         | 45 +++++++++++++++++++++++++++++++++++++---
> > net/vmw_vsock/hyperv_transport.c |  1 +
> > 3 files changed, 51 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index cfd121bb5ab7..089c61105dda 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -108,6 +108,14 @@ struct vsock_transport_send_notify_data {
> > 
> > struct vsock_transport {
> > 	struct module *module;
> > +	/* If true, block VSOCK_NET_MODE_LOCAL unconditionally when this G2H
> > +	 * transport is registered. If false, only block LOCAL mode when
> > +	 * get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
> > +	 * Hyperv sets this true because it doesn't offer a callback that
> > +	 * detects device presence. This only applies to G2H transports; H2G
> > +	 * transports are unaffected.
> > +	 */
> > +	bool always_block_local_mode;
> > 
> > 	/* Initialize/tear-down socket. */
> > 	int (*init)(struct vsock_sock *, struct vsock_sock *);
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index c0b5946bdc95..a2da1810b802 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -91,6 +91,11 @@
> >  *   and locked down by a namespace manager. The default is "global". The mode
> >  *   is set per-namespace.
> >  *
> > + *   Note: LOCAL mode is only supported when using namespace-aware transports
> > + *   (vhost-vsock, loopback). If a guest-to-host transport (virtio-vsock,
> > + *   hyperv-vsock, vmci-vsock) is loaded, attempts to set LOCAL mode will fail
> > + *   with EOPNOTSUPP, as these transports do not support per-namespace
> > isolation.
> > + *
> >  *   The modes affect the allocation and accessibility of CIDs as follows:
> >  *
> >  *   - global - access and allocation are all system-wide
> > @@ -2757,12 +2762,30 @@ static int vsock_net_mode_string(const struct ctl_table *table, int write,
> > 		if (*lenp >= sizeof(data))
> > 			return -EINVAL;
> > 
> > -		if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
> > +		if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data))) {
> > 			mode = VSOCK_NET_MODE_GLOBAL;
> > -		else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
> > +		} else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data))) {
> > +			/* LOCAL mode is not supported when G2H transports
> > +			 * (virtio-vsock, hyperv, vmci) are active, because
> > +			 * these transports don't support namespaces. We must
> > +			 * stay in GLOBAL mode to avoid bind/lookup mismatches.
> > +			 *
> > +			 * Check if G2H transport is present and either:
> > +			 * 1. Has always_block_local_mode set (hyperv), OR
> > +			 * 2. Has an actual device present (get_local_cid() != VMADDR_CID_ANY)
> > +			 */
> > +			mutex_lock(&vsock_register_mutex);
> > +			if (transport_g2h &&
> > +			    (transport_g2h->always_block_local_mode ||
> > +			     transport_g2h->get_local_cid() != VMADDR_CID_ANY)) {
> 
> This seems almost like a hack. What about adding a new function in the
> transports that tells us whether a device is present or not and implement it
> in all of them?
> 
> Or a more specific function to check if the transport can work with local
> mode (e.g.  netns_local_aware() or something like that - I'm not great with
> nameming xD)

That sounds good to me, I probably prefer option 2 because I think it'll
be simpler for the hyperv case.

> 
> > +				mutex_unlock(&vsock_register_mutex);
> > +				return -EOPNOTSUPP;
> > +			}
> > +			mutex_unlock(&vsock_register_mutex);
> 
> What happen if the G2H is loaded here, just after we release the mutex?
> 
> I suspect there could be a race that we may fix postponing the unlock after
> the vsock_net_write_mode() call.
> 
> WDYT?

Oh good eye, yeah I think you are right. Writing the net mode should
definitely be in the critical section.

> 
> > 			mode = VSOCK_NET_MODE_LOCAL;
> > -		else
> > +		} else {
> > 			return -EINVAL;
> > +		}
> > 
> > 		if (!vsock_net_write_mode(net, mode))
> > 			return -EPERM;
> > @@ -2909,6 +2932,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > {
> > 	const struct vsock_transport *t_h2g, *t_g2h, *t_dgram, *t_local;
> > 	int err = mutex_lock_interruptible(&vsock_register_mutex);
> > +	struct net *net;
> > 
> > 	if (err)
> > 		return err;
> > @@ -2931,6 +2955,21 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > 			err = -EBUSY;
> > 			goto err_busy;
> > 		}
> > +
> > +		/* G2H sockets break in LOCAL mode namespaces because G2H transports
> > +		 * don't support them yet. Block registering new G2H transports if we
> > +		 * already have local mode namespaces on the system.
> > +		 */
> > +		rcu_read_lock();
> > +		for_each_net_rcu(net) {
> > +			if (vsock_net_mode(net) == VSOCK_NET_MODE_LOCAL) {
> > +				rcu_read_unlock();
> > +				err = -EOPNOTSUPP;
> > +				goto err_busy;
> > +			}
> > +		}
> > +		rcu_read_unlock();
> > +
> > 		t_g2h = t;
> > 	}
> > 
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index 432fcbbd14d4..ed48dd1ff19b 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -835,6 +835,7 @@ int hvs_notify_set_rcvlowat(struct vsock_sock *vsk, int val)
> > 
> > static struct vsock_transport hvs_transport = {
> > 	.module                   = THIS_MODULE,
> > +	.always_block_local_mode  = true,
> > 
> > 	.get_local_cid            = hvs_get_local_cid,
> > 
> > 
> > -- 
> > 2.47.3
> > 
> 

  reply	other threads:[~2025-11-12 18:36 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-12  6:54 [PATCH net-next v9 00/14] vsock: add namespace support to vhost-vsock and loopback Bobby Eshleman
2025-11-12  6:54 ` [PATCH net-next v9 01/14] vsock: a per-net vsock NS mode state Bobby Eshleman
2025-11-12 14:13   ` Stefano Garzarella
2025-11-12  6:54 ` [PATCH net-next v9 02/14] vsock: add netns to vsock core Bobby Eshleman
2025-11-12 14:14   ` Stefano Garzarella
2025-11-12  6:54 ` [PATCH net-next v9 03/14] vsock/virtio: add netns support to virtio transport and virtio common Bobby Eshleman
2025-11-12 14:18   ` Stefano Garzarella
2025-11-12 16:13     ` Bobby Eshleman
2025-11-12 17:39       ` Stefano Garzarella
2025-11-12 19:32         ` Bobby Eshleman
2025-11-13 15:31           ` Stefano Garzarella
2025-11-12  6:54 ` [PATCH net-next v9 04/14] vsock/virtio: pack struct virtio_vsock_skb_cb Bobby Eshleman
2025-11-12  6:54 ` [PATCH net-next v9 05/14] vsock: add netns and netns_tracker to vsock skb cb Bobby Eshleman
2025-11-12  6:54 ` [PATCH net-next v9 06/14] vsock/loopback: add netns support Bobby Eshleman
2025-11-12 14:19   ` Stefano Garzarella
2025-11-12 18:27     ` Bobby Eshleman
2025-11-13 15:24       ` Stefano Garzarella
2025-11-13 18:26         ` Bobby Eshleman
2025-11-14  9:33           ` Stefano Garzarella
2025-11-14 22:13             ` Bobby Eshleman
2025-11-17  9:27               ` Stefano Garzarella
2025-11-12  6:54 ` [PATCH net-next v9 07/14] vhost/vsock: " Bobby Eshleman
2025-11-12  6:54 ` [PATCH net-next v9 08/14] vsock: reject bad VSOCK_NET_MODE_LOCAL configuration for G2H Bobby Eshleman
2025-11-12 14:21   ` Stefano Garzarella
2025-11-12 18:36     ` Bobby Eshleman [this message]
2025-11-12  6:54 ` [PATCH net-next v9 09/14] selftests/vsock: add namespace helpers to vmtest.sh Bobby Eshleman
2025-11-12  6:54 ` [PATCH net-next v9 10/14] selftests/vsock: prepare vm management helpers for namespaces Bobby Eshleman
2025-11-12 14:23   ` Stefano Garzarella
2025-11-12  6:54 ` [PATCH net-next v9 11/14] selftests/vsock: add tests for proc sys vsock ns_mode Bobby Eshleman
2025-11-12 14:38   ` Stefano Garzarella
2025-11-12  6:54 ` [PATCH net-next v9 12/14] selftests/vsock: add namespace tests for CID collisions Bobby Eshleman
2025-11-12  6:54 ` [PATCH net-next v9 13/14] selftests/vsock: add tests for host <-> vm connectivity with namespaces Bobby Eshleman
2025-11-12 14:41   ` Stefano Garzarella
2025-11-12  6:54 ` [PATCH net-next v9 14/14] selftests/vsock: add tests for namespace deletion and mode changes Bobby Eshleman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aRTTwuuXSz5CvNjt@devvm11784.nha0.facebook.com \
    --to=bobbyeshleman@gmail.com \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=berrange@redhat.com \
    --cc=bobbyeshleman@meta.com \
    --cc=bryan-bt.tan@broadcom.com \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=edumazet@google.com \
    --cc=eperezma@redhat.com \
    --cc=haiyangz@microsoft.com \
    --cc=horms@kernel.org \
    --cc=jasowang@redhat.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sargun@sargun.me \
    --cc=sgarzare@redhat.com \
    --cc=shuah@kernel.org \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=vishnu.dasa@broadcom.com \
    --cc=wei.liu@kernel.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).