From: Stefano Garzarella <sgarzare@redhat.com>
To: Bobby Eshleman <bobbyeshleman@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Stefan Hajnoczi <stefanha@redhat.com>,
Shuah Khan <shuah@kernel.org>,
Bobby Eshleman <bobbyeshleman@meta.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
virtualization@lists.linux.dev, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-kselftest@vger.kernel.org,
Daan De Meyer <daan.j.demeyer@gmail.com>
Subject: Re: [PATCH net] vsock: lock down child_ns_mode as write-once
Date: Wed, 18 Feb 2026 11:43:42 +0100 [thread overview]
Message-ID: <aZWUmbiH11Eh3Y4v@sgarzare-redhat> (raw)
In-Reply-To: <aZV6HRAIsf_rNRM2@sgarzare-redhat>
On Wed, Feb 18, 2026 at 11:02:02AM +0100, Stefano Garzarella wrote:
>On Tue, Feb 17, 2026 at 05:45:10PM -0800, Bobby Eshleman wrote:
>>From: Bobby Eshleman <bobbyeshleman@meta.com>
>>
>>To improve the security posture of vsock namespacing, this patch locks
>>down the vsock child_ns_mode sysctl setting with a write-once policy.
>>The user may write to child_ns_mode only once in each namespace, making
>>changes to either local or global mode be irreversible.
>>
>>This avoids security breaches where a process in a local namespace may
>>attempt to jailbreak into the global vsock ns space by setting
>>child_ns_mode to "global", creating a new namespace, and accessing the
>>global space through the new namespace.
>
>Commit 6a997f38bdf8 ("vsock: prevent child netns mode switch from
>local to global") should avoid exactly that, so I don't get this. Can
>you elaborate more how this can happen without this patch?
>
>I think here we should talk more about what we described in
>https://lore.kernel.org/netdev/aZNNBc390y6V09qO@sgarzare-redhat/ which
>is that two administrator processes could compete in setting
>`child_ns_mode` and end up creating a namespace with an `ns_mode`
>different from the one set in `child_ns_mode`. But I would also
>explain that this can still be detected by the process by looking at
>`ns_mode` and trying again. With this patch, we avoid this and allow
>the namespace manager to set it once and be sure that it cannot be
>changed again.
>
>>
>>Additionally, fix the test functions that this change would otherwise
>>break by adding "global-parent" and "local-parent" namespaces and using
>>them as intermediaries to spawn namespaces in the given modes. This
>>avoids the need to change "child_ns_mode" in the init_ns. nsenter must
>>be used because ip netns unshares the mount namespace so nested "ip
>>netns add" breaks exec calls from the init ns.
>
>I'm not sure what the policy is in netdev, but I would prefer to have
>selftest changes in another patch (I think earlier in the series so as
>not to break the bisection), in order to simplify backporting (e.g. in
>CentOS Stream, to keep the backport small, I didn't backport the
>dozens of patches for selftest that we did previously).
>
>Obviously, if it's not possible and breaks the bisection, I can safely
>skip these changes during the backport.
>
>>
>>Test run:
>>
>>1..25
>>ok 1 vm_server_host_client
>>ok 2 vm_client_host_server
>>ok 3 vm_loopback
>>ok 4 ns_host_vsock_ns_mode_ok
>>ok 5 ns_host_vsock_child_ns_mode_ok
>>ok 6 ns_global_same_cid_fails
>>ok 7 ns_local_same_cid_ok
>>ok 8 ns_global_local_same_cid_ok
>>ok 9 ns_local_global_same_cid_ok
>>ok 10 ns_diff_global_host_connect_to_global_vm_ok
>>ok 11 ns_diff_global_host_connect_to_local_vm_fails
>>ok 12 ns_diff_global_vm_connect_to_global_host_ok
>>ok 13 ns_diff_global_vm_connect_to_local_host_fails
>>ok 14 ns_diff_local_host_connect_to_local_vm_fails
>>ok 15 ns_diff_local_vm_connect_to_local_host_fails
>>ok 16 ns_diff_global_to_local_loopback_local_fails
>>ok 17 ns_diff_local_to_global_loopback_fails
>>ok 18 ns_diff_local_to_local_loopback_fails
>>ok 19 ns_diff_global_to_global_loopback_ok
>>ok 20 ns_same_local_loopback_ok
>>ok 21 ns_same_local_host_connect_to_local_vm_ok
>>ok 22 ns_same_local_vm_connect_to_local_host_ok
>>ok 23 ns_delete_vm_ok
>>ok 24 ns_delete_host_ok
>>ok 25 ns_delete_both_ok
>>SUMMARY: PASS=25 SKIP=0 FAIL=0
>
>IMO this can be removed from the commit message, doesn't add much
>value other than say that all test passes.
>
>>
>>Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
>>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>>Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
>>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>>---
>>include/net/af_vsock.h | 6 +++++-
>>include/net/netns/vsock.h | 1 +
>>net/vmw_vsock/af_vsock.c | 10 ++++++----
>>tools/testing/selftests/vsock/vmtest.sh | 35
>>+++++++++++++++------------------
>>4 files changed, 28 insertions(+), 24 deletions(-)
>>
>>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>>index d3ff48a2fbe0..c7de33039907 100644
>>--- a/include/net/af_vsock.h
>>+++ b/include/net/af_vsock.h
>>@@ -276,10 +276,14 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
>> return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
>>}
>>
>>-static inline void vsock_net_set_child_mode(struct net *net,
>>+static inline bool vsock_net_set_child_mode(struct net *net,
>> enum vsock_net_mode mode)
>>{
>>+ if (xchg(&net->vsock.child_ns_mode_locked, 1))
>>+ return false;
>>+
>> WRITE_ONCE(net->vsock.child_ns_mode, mode);
>>+ return true;
>>}
>>
>>static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
>>diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>>index b34d69a22fa8..8c855fff8039 100644
>>--- a/include/net/netns/vsock.h
>>+++ b/include/net/netns/vsock.h
>>@@ -17,5 +17,6 @@ struct netns_vsock {
>>
>> enum vsock_net_mode mode;
>> enum vsock_net_mode child_ns_mode;
>>+ int child_ns_mode_locked;
>>};
>>#endif /* __NET_NET_NAMESPACE_VSOCK_H */
>>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>index 9880756d9eff..35e097f4fde8 100644
>>--- a/net/vmw_vsock/af_vsock.c
>>+++ b/net/vmw_vsock/af_vsock.c
>>@@ -90,14 +90,15 @@
>> *
>> * - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
>> * mode, which is set at namespace creation and immutable thereafter.
>>- * - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
>>+ * - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
>> * child namespaces will inherit when created. The initial value matches
>> * the namespace's own ns_mode.
>> *
>> * Changing child_ns_mode only affects newly created namespaces, not the
>> * current namespace or existing children. A "local" namespace cannot set
>>- * child_ns_mode to "global". At namespace creation, ns_mode is inherited
>>- * from the parent's child_ns_mode.
>>+ * child_ns_mode to "global". child_ns_mode is write-once, so that it may
>>+ * be configured and locked down by a namespace manager. At namespace
>>+ * creation, ns_mode is inherited from the parent's child_ns_mode.
>
>We just merged commit a07c33c6f2fc ("vsock: document namespace mode
>sysctls") in the net tree, so we should update also
>Documentation/admin-guide/sysctl/net.rst
>
>> *
>> * The init_net mode is "global" and cannot be modified.
>
>Maybe we should also emphasise in the documentation and in the commit
>description that `child_ns_mode` in `init_net` also is write-once, so
>writing `local` to that one by the init process (e.g. systemd),
>essentially will make all the new namespaces in `local` mode. This
>could be useful for container/namespace managers.
>
>> *
>>@@ -2853,7 +2854,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
>> new_mode == VSOCK_NET_MODE_GLOBAL)
>> return -EPERM;
>>
>>- vsock_net_set_child_mode(net, new_mode);
>>+ if (!vsock_net_set_child_mode(net, new_mode))
>>+ return -EPERM;
>
>So, if `child_ns_mode` is set to `local` but locked, writing `local`
>again will return -EPERM, is this really what we want?
>
>I'm not sure if we can relax it a bit, but then we may race between
>reader and writer, so maybe it's fine like it is in this patch, but we
>should document better that any writes (even same value) after the
>first one will return -EPERM.
I think we can try in this way:
static inline bool vsock_net_set_child_mode(struct net *net,
enum vsock_net_mode mode)
{
int new_locked = mode + 1;
int old_locked = 0;
if (try_cmpxchg(&net->vsock.child_ns_mode_locked,
&old_locked, new_locked)) {
WRITE_ONCE(net->vsock.child_ns_mode, mode);
return true;
}
return old_locked == new_locked;
}
With a comment like this near child_ns_mode_locked in struct
netns_vsock:
/* 0 = unlocked
* 1 = locked to GLOBAL (VSOCK_NET_MODE_GLOBAL + 1)
* 2 = locked to LOCAL (VSOCK_NET_MODE_LOCAL + 1)
*/
While writing that, I thought that we can even remove
`child_ns_mode_locked` and use a single variable where encode the value
and the state, but maybe it's an unnecessary extra complexity.
Stefano
>
>About that, should we return something different, like -EBUSY ?
>Not a strong opinion, just to differentiate with the other check before.
>
>Thanks,
>Stefano
next prev parent reply other threads:[~2026-02-18 10:43 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-18 1:45 [PATCH net] vsock: lock down child_ns_mode as write-once Bobby Eshleman
2026-02-18 10:02 ` Stefano Garzarella
2026-02-18 10:43 ` Stefano Garzarella [this message]
2026-02-18 16:15 ` Bobby Eshleman
2026-02-19 0:41 ` Jakub Kicinski
2026-02-19 16:21 ` Bobby Eshleman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aZWUmbiH11Eh3Y4v@sgarzare-redhat \
--to=sgarzare@redhat.com \
--cc=bobbyeshleman@gmail.com \
--cc=bobbyeshleman@meta.com \
--cc=daan.j.demeyer@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shuah@kernel.org \
--cc=stefanha@redhat.com \
--cc=virtualization@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox