public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Stefano Garzarella <sgarzare@redhat.com>
To: Bobby Eshleman <bobbyeshleman@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	 Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
	 Stefan Hajnoczi <stefanha@redhat.com>,
	Shuah Khan <shuah@kernel.org>,
	 Bobby Eshleman <bobbyeshleman@meta.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	 virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,  kvm@vger.kernel.org,
	linux-kselftest@vger.kernel.org,
	 Daan De Meyer <daan.j.demeyer@gmail.com>
Subject: Re: [PATCH net] vsock: lock down child_ns_mode as write-once
Date: Wed, 18 Feb 2026 11:02:02 +0100	[thread overview]
Message-ID: <aZV6HRAIsf_rNRM2@sgarzare-redhat> (raw)
In-Reply-To: <20260217-vsock-ns-write-once-v1-1-a1fb30f289a9@meta.com>

On Tue, Feb 17, 2026 at 05:45:10PM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>To improve the security posture of vsock namespacing, this patch locks
>down the vsock child_ns_mode sysctl setting with a write-once policy.
>The user may write to child_ns_mode only once in each namespace, making
>changes to either local or global mode be irreversible.
>
>This avoids security breaches where a process in a local namespace may
>attempt to jailbreak into the global vsock ns space by setting
>child_ns_mode to "global", creating a new namespace, and accessing the
>global space through the new namespace.

Commit 6a997f38bdf8 ("vsock: prevent child netns mode switch from local 
to global") should avoid exactly that, so I don't get this. Can you 
elaborate more how this can happen without this patch?

I think here we should talk more about what we described in 
https://lore.kernel.org/netdev/aZNNBc390y6V09qO@sgarzare-redhat/ which 
is that two administrator processes could compete in setting 
`child_ns_mode` and end up creating a namespace with an `ns_mode` 
different from the one set in `child_ns_mode`. But I would also explain 
that this can still be detected by the process by looking at `ns_mode` 
and trying again.  With this patch, we avoid this and allow the 
namespace manager to set it once and be sure that it cannot be changed 
again.

>
>Additionally, fix the test functions that this change would otherwise
>break by adding "global-parent" and "local-parent" namespaces and using
>them as intermediaries to spawn namespaces in the given modes. This
>avoids the need to change "child_ns_mode" in the init_ns. nsenter must
>be used because ip netns unshares the mount namespace so nested "ip
>netns add" breaks exec calls from the init ns.

I'm not sure what the policy is in netdev, but I would prefer to have 
selftest changes in another patch (I think earlier in the series so as 
not to break the bisection), in order to simplify backporting (e.g. in 
CentOS Stream, to keep the backport small, I didn't backport the dozens 
of patches for selftest that we did previously).

Obviously, if it's not possible and breaks the bisection, I can safely 
skip these changes during the backport.

>
>Test run:
>
>1..25
>ok 1 vm_server_host_client
>ok 2 vm_client_host_server
>ok 3 vm_loopback
>ok 4 ns_host_vsock_ns_mode_ok
>ok 5 ns_host_vsock_child_ns_mode_ok
>ok 6 ns_global_same_cid_fails
>ok 7 ns_local_same_cid_ok
>ok 8 ns_global_local_same_cid_ok
>ok 9 ns_local_global_same_cid_ok
>ok 10 ns_diff_global_host_connect_to_global_vm_ok
>ok 11 ns_diff_global_host_connect_to_local_vm_fails
>ok 12 ns_diff_global_vm_connect_to_global_host_ok
>ok 13 ns_diff_global_vm_connect_to_local_host_fails
>ok 14 ns_diff_local_host_connect_to_local_vm_fails
>ok 15 ns_diff_local_vm_connect_to_local_host_fails
>ok 16 ns_diff_global_to_local_loopback_local_fails
>ok 17 ns_diff_local_to_global_loopback_fails
>ok 18 ns_diff_local_to_local_loopback_fails
>ok 19 ns_diff_global_to_global_loopback_ok
>ok 20 ns_same_local_loopback_ok
>ok 21 ns_same_local_host_connect_to_local_vm_ok
>ok 22 ns_same_local_vm_connect_to_local_host_ok
>ok 23 ns_delete_vm_ok
>ok 24 ns_delete_host_ok
>ok 25 ns_delete_both_ok
>SUMMARY: PASS=25 SKIP=0 FAIL=0

IMO this can be removed from the commit message, doesn't add much value 
other than say that all test passes.

>
>Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>---
> include/net/af_vsock.h                  |  6 +++++-
> include/net/netns/vsock.h               |  1 +
> net/vmw_vsock/af_vsock.c                | 10 ++++++----
> tools/testing/selftests/vsock/vmtest.sh | 35 
> +++++++++++++++------------------
> 4 files changed, 28 insertions(+), 24 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index d3ff48a2fbe0..c7de33039907 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -276,10 +276,14 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
> 	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> }
>
>-static inline void vsock_net_set_child_mode(struct net *net,
>+static inline bool vsock_net_set_child_mode(struct net *net,
> 					    enum vsock_net_mode mode)
> {
>+	if (xchg(&net->vsock.child_ns_mode_locked, 1))
>+		return false;
>+
> 	WRITE_ONCE(net->vsock.child_ns_mode, mode);
>+	return true;
> }
>
> static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
>diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>index b34d69a22fa8..8c855fff8039 100644
>--- a/include/net/netns/vsock.h
>+++ b/include/net/netns/vsock.h
>@@ -17,5 +17,6 @@ struct netns_vsock {
>
> 	enum vsock_net_mode mode;
> 	enum vsock_net_mode child_ns_mode;
>+	int child_ns_mode_locked;
> };
> #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 9880756d9eff..35e097f4fde8 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -90,14 +90,15 @@
>  *
>  *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
>  *     mode, which is set at namespace creation and immutable thereafter.
>- *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
>+ *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
>  *     child namespaces will inherit when created. The initial value matches
>  *     the namespace's own ns_mode.
>  *
>  *   Changing child_ns_mode only affects newly created namespaces, not the
>  *   current namespace or existing children. A "local" namespace cannot set
>- *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
>- *   from the parent's child_ns_mode.
>+ *   child_ns_mode to "global". child_ns_mode is write-once, so that it may
>+ *   be configured and locked down by a namespace manager. At namespace
>+ *   creation, ns_mode is inherited from the parent's child_ns_mode.

We just merged commit a07c33c6f2fc ("vsock: document namespace mode 
sysctls") in the net tree, so we should update also 
Documentation/admin-guide/sysctl/net.rst

>  *
>  *   The init_net mode is "global" and cannot be modified.

Maybe we should also emphasise in the documentation and in the commit 
description that `child_ns_mode` in `init_net` also is write-once, so
writing `local` to that one by the init process (e.g. systemd), 
essentially will make all the new namespaces in `local` mode. This could 
be useful for container/namespace managers.

>  *
>@@ -2853,7 +2854,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> 		    new_mode == VSOCK_NET_MODE_GLOBAL)
> 			return -EPERM;
>
>-		vsock_net_set_child_mode(net, new_mode);
>+		if (!vsock_net_set_child_mode(net, new_mode))
>+			return -EPERM;

So, if `child_ns_mode` is set to `local` but locked, writing `local` 
again will return -EPERM, is this really what we want?

I'm not sure if we can relax it a bit, but then we may race between 
reader and writer, so maybe it's fine like it is in this patch, but we 
should document better that any writes (even same value) after the first 
one will return -EPERM.

About that, should we return something different, like -EBUSY ?
Not a strong opinion, just to differentiate with the other check before.

Thanks,
Stefano


  reply	other threads:[~2026-02-18 10:02 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-18  1:45 [PATCH net] vsock: lock down child_ns_mode as write-once Bobby Eshleman
2026-02-18 10:02 ` Stefano Garzarella [this message]
2026-02-18 10:43   ` Stefano Garzarella
2026-02-18 16:15     ` Bobby Eshleman
2026-02-19  0:41       ` Jakub Kicinski
2026-02-19 16:21         ` Bobby Eshleman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZV6HRAIsf_rNRM2@sgarzare-redhat \
    --to=sgarzare@redhat.com \
    --cc=bobbyeshleman@gmail.com \
    --cc=bobbyeshleman@meta.com \
    --cc=daan.j.demeyer@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=shuah@kernel.org \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox