public inbox for virtualization@lists.linux-foundation.org
 help / color / mirror / Atom feed
* [PATCH net] vsock: lock down child_ns_mode as write-once
@ 2026-02-18  1:45 Bobby Eshleman
  2026-02-18 10:02 ` Stefano Garzarella
  0 siblings, 1 reply; 6+ messages in thread
From: Bobby Eshleman @ 2026-02-18  1:45 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Shuah Khan,
	Bobby Eshleman, Michael S. Tsirkin
  Cc: virtualization, netdev, linux-kernel, kvm, linux-kselftest,
	Daan De Meyer

From: Bobby Eshleman <bobbyeshleman@meta.com>

To improve the security posture of vsock namespacing, this patch locks
down the vsock child_ns_mode sysctl setting with a write-once policy.
The user may write to child_ns_mode only once in each namespace, making
changes to either local or global mode be irreversible.

This avoids security breaches where a process in a local namespace may
attempt to jailbreak into the global vsock ns space by setting
child_ns_mode to "global", creating a new namespace, and accessing the
global space through the new namespace.

Additionally, fix the test functions that this change would otherwise
break by adding "global-parent" and "local-parent" namespaces and using
them as intermediaries to spawn namespaces in the given modes. This
avoids the need to change "child_ns_mode" in the init_ns. nsenter must
be used because ip netns unshares the mount namespace so nested "ip
netns add" breaks exec calls from the init ns.

Test run:

1..25
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_host_vsock_ns_mode_ok
ok 5 ns_host_vsock_child_ns_mode_ok
ok 6 ns_global_same_cid_fails
ok 7 ns_local_same_cid_ok
ok 8 ns_global_local_same_cid_ok
ok 9 ns_local_global_same_cid_ok
ok 10 ns_diff_global_host_connect_to_global_vm_ok
ok 11 ns_diff_global_host_connect_to_local_vm_fails
ok 12 ns_diff_global_vm_connect_to_global_host_ok
ok 13 ns_diff_global_vm_connect_to_local_host_fails
ok 14 ns_diff_local_host_connect_to_local_vm_fails
ok 15 ns_diff_local_vm_connect_to_local_host_fails
ok 16 ns_diff_global_to_local_loopback_local_fails
ok 17 ns_diff_local_to_global_loopback_fails
ok 18 ns_diff_local_to_local_loopback_fails
ok 19 ns_diff_global_to_global_loopback_ok
ok 20 ns_same_local_loopback_ok
ok 21 ns_same_local_host_connect_to_local_vm_ok
ok 22 ns_same_local_vm_connect_to_local_host_ok
ok 23 ns_delete_vm_ok
ok 24 ns_delete_host_ok
ok 25 ns_delete_both_ok
SUMMARY: PASS=25 SKIP=0 FAIL=0

Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/net/af_vsock.h                  |  6 +++++-
 include/net/netns/vsock.h               |  1 +
 net/vmw_vsock/af_vsock.c                | 10 ++++++----
 tools/testing/selftests/vsock/vmtest.sh | 35 +++++++++++++++------------------
 4 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d3ff48a2fbe0..c7de33039907 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -276,10 +276,14 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
 	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
 }
 
-static inline void vsock_net_set_child_mode(struct net *net,
+static inline bool vsock_net_set_child_mode(struct net *net,
 					    enum vsock_net_mode mode)
 {
+	if (xchg(&net->vsock.child_ns_mode_locked, 1))
+		return false;
+
 	WRITE_ONCE(net->vsock.child_ns_mode, mode);
+	return true;
 }
 
 static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
index b34d69a22fa8..8c855fff8039 100644
--- a/include/net/netns/vsock.h
+++ b/include/net/netns/vsock.h
@@ -17,5 +17,6 @@ struct netns_vsock {
 
 	enum vsock_net_mode mode;
 	enum vsock_net_mode child_ns_mode;
+	int child_ns_mode_locked;
 };
 #endif /* __NET_NET_NAMESPACE_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 9880756d9eff..35e097f4fde8 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -90,14 +90,15 @@
  *
  *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
  *     mode, which is set at namespace creation and immutable thereafter.
- *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
+ *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
  *     child namespaces will inherit when created. The initial value matches
  *     the namespace's own ns_mode.
  *
  *   Changing child_ns_mode only affects newly created namespaces, not the
  *   current namespace or existing children. A "local" namespace cannot set
- *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
- *   from the parent's child_ns_mode.
+ *   child_ns_mode to "global". child_ns_mode is write-once, so that it may
+ *   be configured and locked down by a namespace manager. At namespace
+ *   creation, ns_mode is inherited from the parent's child_ns_mode.
  *
  *   The init_net mode is "global" and cannot be modified.
  *
@@ -2853,7 +2854,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
 		    new_mode == VSOCK_NET_MODE_GLOBAL)
 			return -EPERM;
 
-		vsock_net_set_child_mode(net, new_mode);
+		if (!vsock_net_set_child_mode(net, new_mode))
+			return -EPERM;
 	}
 
 	return 0;
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index dc8dbe74a6d0..e1e78b295e41 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -210,16 +210,17 @@ check_result() {
 }
 
 add_namespaces() {
-	local orig_mode
-	orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
+	ip netns add "global-parent" 2>/dev/null
+	echo "global" | ip netns exec "global-parent" \
+		tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
+	ip netns add "local-parent" 2>/dev/null
+	echo "local" | ip netns exec "local-parent" \
+		tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
 
-	for mode in "${NS_MODES[@]}"; do
-		echo "${mode}" > /proc/sys/net/vsock/child_ns_mode
-		ip netns add "${mode}0" 2>/dev/null
-		ip netns add "${mode}1" 2>/dev/null
-	done
-
-	echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
+	nsenter --net=/var/run/netns/global-parent ip netns add "global0" 2>/dev/null
+	nsenter --net=/var/run/netns/global-parent ip netns add "global1" 2>/dev/null
+	nsenter --net=/var/run/netns/local-parent ip netns add "local0" 2>/dev/null
+	nsenter --net=/var/run/netns/local-parent ip netns add "local1" 2>/dev/null
 }
 
 init_namespaces() {
@@ -237,6 +238,8 @@ del_namespaces() {
 		log_host "removed ns ${mode}0"
 		log_host "removed ns ${mode}1"
 	done
+	ip netns del "global-parent" &>/dev/null
+	ip netns del "local-parent" &>/dev/null
 }
 
 vm_ssh() {
@@ -287,7 +290,7 @@ check_args() {
 }
 
 check_deps() {
-	for dep in vng ${QEMU} busybox pkill ssh ss socat; do
+	for dep in vng ${QEMU} busybox pkill ssh ss socat nsenter; do
 		if [[ ! -x $(command -v "${dep}") ]]; then
 			echo -e "skip:    dependency ${dep} not found!\n"
 			exit "${KSFT_SKIP}"
@@ -1231,12 +1234,8 @@ test_ns_local_same_cid_ok() {
 }
 
 test_ns_host_vsock_child_ns_mode_ok() {
-	local orig_mode
-	local rc
-
-	orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
+	local rc="${KSFT_PASS}"
 
-	rc="${KSFT_PASS}"
 	for mode in "${NS_MODES[@]}"; do
 		local ns="${mode}0"
 
@@ -1246,15 +1245,13 @@ test_ns_host_vsock_child_ns_mode_ok() {
 			continue
 		fi
 
-		if ! echo "${mode}" > /proc/sys/net/vsock/child_ns_mode; then
-			log_host "child_ns_mode should be writable to ${mode}"
+		if ! echo "${mode}" | ip netns exec "${ns}" \
+			tee /proc/sys/net/vsock/child_ns_mode &>/dev/null; then
 			rc="${KSFT_FAIL}"
 			continue
 		fi
 	done
 
-	echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
-
 	return "${rc}"
 }
 

---
base-commit: 77c5e3fdd2793f478e6fdae55c9ea85b21d06f8f
change-id: 20260217-vsock-ns-write-once-8834d684e0a2

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] vsock: lock down child_ns_mode as write-once
  2026-02-18  1:45 [PATCH net] vsock: lock down child_ns_mode as write-once Bobby Eshleman
@ 2026-02-18 10:02 ` Stefano Garzarella
  2026-02-18 10:43   ` Stefano Garzarella
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Garzarella @ 2026-02-18 10:02 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, virtualization, netdev, linux-kernel, kvm,
	linux-kselftest, Daan De Meyer

On Tue, Feb 17, 2026 at 05:45:10PM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>To improve the security posture of vsock namespacing, this patch locks
>down the vsock child_ns_mode sysctl setting with a write-once policy.
>The user may write to child_ns_mode only once in each namespace, making
>changes to either local or global mode be irreversible.
>
>This avoids security breaches where a process in a local namespace may
>attempt to jailbreak into the global vsock ns space by setting
>child_ns_mode to "global", creating a new namespace, and accessing the
>global space through the new namespace.

Commit 6a997f38bdf8 ("vsock: prevent child netns mode switch from local 
to global") should avoid exactly that, so I don't get this. Can you 
elaborate more how this can happen without this patch?

I think here we should talk more about what we described in 
https://lore.kernel.org/netdev/aZNNBc390y6V09qO@sgarzare-redhat/ which 
is that two administrator processes could compete in setting 
`child_ns_mode` and end up creating a namespace with an `ns_mode` 
different from the one set in `child_ns_mode`. But I would also explain 
that this can still be detected by the process by looking at `ns_mode` 
and trying again.  With this patch, we avoid this and allow the 
namespace manager to set it once and be sure that it cannot be changed 
again.

>
>Additionally, fix the test functions that this change would otherwise
>break by adding "global-parent" and "local-parent" namespaces and using
>them as intermediaries to spawn namespaces in the given modes. This
>avoids the need to change "child_ns_mode" in the init_ns. nsenter must
>be used because ip netns unshares the mount namespace so nested "ip
>netns add" breaks exec calls from the init ns.

I'm not sure what the policy is in netdev, but I would prefer to have 
selftest changes in another patch (I think earlier in the series so as 
not to break the bisection), in order to simplify backporting (e.g. in 
CentOS Stream, to keep the backport small, I didn't backport the dozens 
of patches for selftest that we did previously).

Obviously, if it's not possible and breaks the bisection, I can safely 
skip these changes during the backport.

>
>Test run:
>
>1..25
>ok 1 vm_server_host_client
>ok 2 vm_client_host_server
>ok 3 vm_loopback
>ok 4 ns_host_vsock_ns_mode_ok
>ok 5 ns_host_vsock_child_ns_mode_ok
>ok 6 ns_global_same_cid_fails
>ok 7 ns_local_same_cid_ok
>ok 8 ns_global_local_same_cid_ok
>ok 9 ns_local_global_same_cid_ok
>ok 10 ns_diff_global_host_connect_to_global_vm_ok
>ok 11 ns_diff_global_host_connect_to_local_vm_fails
>ok 12 ns_diff_global_vm_connect_to_global_host_ok
>ok 13 ns_diff_global_vm_connect_to_local_host_fails
>ok 14 ns_diff_local_host_connect_to_local_vm_fails
>ok 15 ns_diff_local_vm_connect_to_local_host_fails
>ok 16 ns_diff_global_to_local_loopback_local_fails
>ok 17 ns_diff_local_to_global_loopback_fails
>ok 18 ns_diff_local_to_local_loopback_fails
>ok 19 ns_diff_global_to_global_loopback_ok
>ok 20 ns_same_local_loopback_ok
>ok 21 ns_same_local_host_connect_to_local_vm_ok
>ok 22 ns_same_local_vm_connect_to_local_host_ok
>ok 23 ns_delete_vm_ok
>ok 24 ns_delete_host_ok
>ok 25 ns_delete_both_ok
>SUMMARY: PASS=25 SKIP=0 FAIL=0

IMO this can be removed from the commit message, doesn't add much value 
other than say that all test passes.

>
>Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>---
> include/net/af_vsock.h                  |  6 +++++-
> include/net/netns/vsock.h               |  1 +
> net/vmw_vsock/af_vsock.c                | 10 ++++++----
> tools/testing/selftests/vsock/vmtest.sh | 35 
> +++++++++++++++------------------
> 4 files changed, 28 insertions(+), 24 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index d3ff48a2fbe0..c7de33039907 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -276,10 +276,14 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
> 	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> }
>
>-static inline void vsock_net_set_child_mode(struct net *net,
>+static inline bool vsock_net_set_child_mode(struct net *net,
> 					    enum vsock_net_mode mode)
> {
>+	if (xchg(&net->vsock.child_ns_mode_locked, 1))
>+		return false;
>+
> 	WRITE_ONCE(net->vsock.child_ns_mode, mode);
>+	return true;
> }
>
> static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
>diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>index b34d69a22fa8..8c855fff8039 100644
>--- a/include/net/netns/vsock.h
>+++ b/include/net/netns/vsock.h
>@@ -17,5 +17,6 @@ struct netns_vsock {
>
> 	enum vsock_net_mode mode;
> 	enum vsock_net_mode child_ns_mode;
>+	int child_ns_mode_locked;
> };
> #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 9880756d9eff..35e097f4fde8 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -90,14 +90,15 @@
>  *
>  *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
>  *     mode, which is set at namespace creation and immutable thereafter.
>- *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
>+ *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
>  *     child namespaces will inherit when created. The initial value matches
>  *     the namespace's own ns_mode.
>  *
>  *   Changing child_ns_mode only affects newly created namespaces, not the
>  *   current namespace or existing children. A "local" namespace cannot set
>- *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
>- *   from the parent's child_ns_mode.
>+ *   child_ns_mode to "global". child_ns_mode is write-once, so that it may
>+ *   be configured and locked down by a namespace manager. At namespace
>+ *   creation, ns_mode is inherited from the parent's child_ns_mode.

We just merged commit a07c33c6f2fc ("vsock: document namespace mode 
sysctls") in the net tree, so we should update also 
Documentation/admin-guide/sysctl/net.rst

>  *
>  *   The init_net mode is "global" and cannot be modified.

Maybe we should also emphasise in the documentation and in the commit 
description that `child_ns_mode` in `init_net` also is write-once, so
writing `local` to that one by the init process (e.g. systemd), 
essentially will make all the new namespaces in `local` mode. This could 
be useful for container/namespace managers.

>  *
>@@ -2853,7 +2854,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> 		    new_mode == VSOCK_NET_MODE_GLOBAL)
> 			return -EPERM;
>
>-		vsock_net_set_child_mode(net, new_mode);
>+		if (!vsock_net_set_child_mode(net, new_mode))
>+			return -EPERM;

So, if `child_ns_mode` is set to `local` but locked, writing `local` 
again will return -EPERM, is this really what we want?

I'm not sure if we can relax it a bit, but then we may race between 
reader and writer, so maybe it's fine like it is in this patch, but we 
should document better that any writes (even same value) after the first 
one will return -EPERM.

About that, should we return something different, like -EBUSY ?
Not a strong opinion, just to differentiate with the other check before.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] vsock: lock down child_ns_mode as write-once
  2026-02-18 10:02 ` Stefano Garzarella
@ 2026-02-18 10:43   ` Stefano Garzarella
  2026-02-18 16:15     ` Bobby Eshleman
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Garzarella @ 2026-02-18 10:43 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, virtualization, netdev, linux-kernel, kvm,
	linux-kselftest, Daan De Meyer

On Wed, Feb 18, 2026 at 11:02:02AM +0100, Stefano Garzarella wrote:
>On Tue, Feb 17, 2026 at 05:45:10PM -0800, Bobby Eshleman wrote:
>>From: Bobby Eshleman <bobbyeshleman@meta.com>
>>
>>To improve the security posture of vsock namespacing, this patch locks
>>down the vsock child_ns_mode sysctl setting with a write-once policy.
>>The user may write to child_ns_mode only once in each namespace, making
>>changes to either local or global mode be irreversible.
>>
>>This avoids security breaches where a process in a local namespace may
>>attempt to jailbreak into the global vsock ns space by setting
>>child_ns_mode to "global", creating a new namespace, and accessing the
>>global space through the new namespace.
>
>Commit 6a997f38bdf8 ("vsock: prevent child netns mode switch from 
>local to global") should avoid exactly that, so I don't get this. Can 
>you elaborate more how this can happen without this patch?
>
>I think here we should talk more about what we described in 
>https://lore.kernel.org/netdev/aZNNBc390y6V09qO@sgarzare-redhat/ which 
>is that two administrator processes could compete in setting 
>`child_ns_mode` and end up creating a namespace with an `ns_mode` 
>different from the one set in `child_ns_mode`. But I would also 
>explain that this can still be detected by the process by looking at 
>`ns_mode` and trying again.  With this patch, we avoid this and allow 
>the namespace manager to set it once and be sure that it cannot be 
>changed again.
>
>>
>>Additionally, fix the test functions that this change would otherwise
>>break by adding "global-parent" and "local-parent" namespaces and using
>>them as intermediaries to spawn namespaces in the given modes. This
>>avoids the need to change "child_ns_mode" in the init_ns. nsenter must
>>be used because ip netns unshares the mount namespace so nested "ip
>>netns add" breaks exec calls from the init ns.
>
>I'm not sure what the policy is in netdev, but I would prefer to have 
>selftest changes in another patch (I think earlier in the series so as 
>not to break the bisection), in order to simplify backporting (e.g. in 
>CentOS Stream, to keep the backport small, I didn't backport the 
>dozens of patches for selftest that we did previously).
>
>Obviously, if it's not possible and breaks the bisection, I can safely 
>skip these changes during the backport.
>
>>
>>Test run:
>>
>>1..25
>>ok 1 vm_server_host_client
>>ok 2 vm_client_host_server
>>ok 3 vm_loopback
>>ok 4 ns_host_vsock_ns_mode_ok
>>ok 5 ns_host_vsock_child_ns_mode_ok
>>ok 6 ns_global_same_cid_fails
>>ok 7 ns_local_same_cid_ok
>>ok 8 ns_global_local_same_cid_ok
>>ok 9 ns_local_global_same_cid_ok
>>ok 10 ns_diff_global_host_connect_to_global_vm_ok
>>ok 11 ns_diff_global_host_connect_to_local_vm_fails
>>ok 12 ns_diff_global_vm_connect_to_global_host_ok
>>ok 13 ns_diff_global_vm_connect_to_local_host_fails
>>ok 14 ns_diff_local_host_connect_to_local_vm_fails
>>ok 15 ns_diff_local_vm_connect_to_local_host_fails
>>ok 16 ns_diff_global_to_local_loopback_local_fails
>>ok 17 ns_diff_local_to_global_loopback_fails
>>ok 18 ns_diff_local_to_local_loopback_fails
>>ok 19 ns_diff_global_to_global_loopback_ok
>>ok 20 ns_same_local_loopback_ok
>>ok 21 ns_same_local_host_connect_to_local_vm_ok
>>ok 22 ns_same_local_vm_connect_to_local_host_ok
>>ok 23 ns_delete_vm_ok
>>ok 24 ns_delete_host_ok
>>ok 25 ns_delete_both_ok
>>SUMMARY: PASS=25 SKIP=0 FAIL=0
>
>IMO this can be removed from the commit message, doesn't add much 
>value other than say that all test passes.
>
>>
>>Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
>>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>>Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
>>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>>---
>>include/net/af_vsock.h                  |  6 +++++-
>>include/net/netns/vsock.h               |  1 +
>>net/vmw_vsock/af_vsock.c                | 10 ++++++----
>>tools/testing/selftests/vsock/vmtest.sh | 35 
>>+++++++++++++++------------------
>>4 files changed, 28 insertions(+), 24 deletions(-)
>>
>>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>>index d3ff48a2fbe0..c7de33039907 100644
>>--- a/include/net/af_vsock.h
>>+++ b/include/net/af_vsock.h
>>@@ -276,10 +276,14 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
>>	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
>>}
>>
>>-static inline void vsock_net_set_child_mode(struct net *net,
>>+static inline bool vsock_net_set_child_mode(struct net *net,
>>					    enum vsock_net_mode mode)
>>{
>>+	if (xchg(&net->vsock.child_ns_mode_locked, 1))
>>+		return false;
>>+
>>	WRITE_ONCE(net->vsock.child_ns_mode, mode);
>>+	return true;
>>}
>>
>>static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
>>diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>>index b34d69a22fa8..8c855fff8039 100644
>>--- a/include/net/netns/vsock.h
>>+++ b/include/net/netns/vsock.h
>>@@ -17,5 +17,6 @@ struct netns_vsock {
>>
>>	enum vsock_net_mode mode;
>>	enum vsock_net_mode child_ns_mode;
>>+	int child_ns_mode_locked;
>>};
>>#endif /* __NET_NET_NAMESPACE_VSOCK_H */
>>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>index 9880756d9eff..35e097f4fde8 100644
>>--- a/net/vmw_vsock/af_vsock.c
>>+++ b/net/vmw_vsock/af_vsock.c
>>@@ -90,14 +90,15 @@
>> *
>> *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
>> *     mode, which is set at namespace creation and immutable thereafter.
>>- *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
>>+ *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
>> *     child namespaces will inherit when created. The initial value matches
>> *     the namespace's own ns_mode.
>> *
>> *   Changing child_ns_mode only affects newly created namespaces, not the
>> *   current namespace or existing children. A "local" namespace cannot set
>>- *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
>>- *   from the parent's child_ns_mode.
>>+ *   child_ns_mode to "global". child_ns_mode is write-once, so that it may
>>+ *   be configured and locked down by a namespace manager. At namespace
>>+ *   creation, ns_mode is inherited from the parent's child_ns_mode.
>
>We just merged commit a07c33c6f2fc ("vsock: document namespace mode 
>sysctls") in the net tree, so we should update also 
>Documentation/admin-guide/sysctl/net.rst
>
>> *
>> *   The init_net mode is "global" and cannot be modified.
>
>Maybe we should also emphasise in the documentation and in the commit 
>description that `child_ns_mode` in `init_net` also is write-once, so
>writing `local` to that one by the init process (e.g. systemd), 
>essentially will make all the new namespaces in `local` mode. This 
>could be useful for container/namespace managers.
>
>> *
>>@@ -2853,7 +2854,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
>>		    new_mode == VSOCK_NET_MODE_GLOBAL)
>>			return -EPERM;
>>
>>-		vsock_net_set_child_mode(net, new_mode);
>>+		if (!vsock_net_set_child_mode(net, new_mode))
>>+			return -EPERM;
>
>So, if `child_ns_mode` is set to `local` but locked, writing `local` 
>again will return -EPERM, is this really what we want?
>
>I'm not sure if we can relax it a bit, but then we may race between 
>reader and writer, so maybe it's fine like it is in this patch, but we 
>should document better that any writes (even same value) after the 
>first one will return -EPERM.

I think we can try in this way:

static inline bool vsock_net_set_child_mode(struct net *net,
					    enum vsock_net_mode mode)
{
	int new_locked = mode + 1;
	int old_locked = 0;

	if (try_cmpxchg(&net->vsock.child_ns_mode_locked,
			&old_locked, new_locked)) {
		WRITE_ONCE(net->vsock.child_ns_mode, mode);
		return true;
	}

	return old_locked == new_locked;
}

With a comment like this near child_ns_mode_locked in struct 
netns_vsock:
/* 0 = unlocked
  * 1 = locked to GLOBAL (VSOCK_NET_MODE_GLOBAL + 1)
  * 2 = locked to LOCAL  (VSOCK_NET_MODE_LOCAL + 1)
  */

While writing that, I thought that we can even remove 
`child_ns_mode_locked` and use a single variable where encode the value 
and the state, but maybe it's an unnecessary extra complexity.

Stefano

>
>About that, should we return something different, like -EBUSY ?
>Not a strong opinion, just to differentiate with the other check before.
>
>Thanks,
>Stefano


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] vsock: lock down child_ns_mode as write-once
  2026-02-18 10:43   ` Stefano Garzarella
@ 2026-02-18 16:15     ` Bobby Eshleman
  2026-02-19  0:41       ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Bobby Eshleman @ 2026-02-18 16:15 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, virtualization, netdev, linux-kernel, kvm,
	linux-kselftest, Daan De Meyer

On Wed, Feb 18, 2026 at 11:43:42AM +0100, Stefano Garzarella wrote:
> On Wed, Feb 18, 2026 at 11:02:02AM +0100, Stefano Garzarella wrote:
> > On Tue, Feb 17, 2026 at 05:45:10PM -0800, Bobby Eshleman wrote:
> > > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > > 
> > > To improve the security posture of vsock namespacing, this patch locks
> > > down the vsock child_ns_mode sysctl setting with a write-once policy.
> > > The user may write to child_ns_mode only once in each namespace, making
> > > changes to either local or global mode be irreversible.
> > > 
> > > This avoids security breaches where a process in a local namespace may
> > > attempt to jailbreak into the global vsock ns space by setting
> > > child_ns_mode to "global", creating a new namespace, and accessing the
> > > global space through the new namespace.
> > 
> > Commit 6a997f38bdf8 ("vsock: prevent child netns mode switch from local
> > to global") should avoid exactly that, so I don't get this. Can you
> > elaborate more how this can happen without this patch?
> > 
> > I think here we should talk more about what we described in
> > https://lore.kernel.org/netdev/aZNNBc390y6V09qO@sgarzare-redhat/ which
> > is that two administrator processes could compete in setting
> > `child_ns_mode` and end up creating a namespace with an `ns_mode`
> > different from the one set in `child_ns_mode`. But I would also explain
> > that this can still be detected by the process by looking at `ns_mode`
> > and trying again.  With this patch, we avoid this and allow the
> > namespace manager to set it once and be sure that it cannot be changed
> > again.

Oh that's right, I lost track of the original intent when writing this.

> > 
> > > 
> > > Additionally, fix the test functions that this change would otherwise
> > > break by adding "global-parent" and "local-parent" namespaces and using
> > > them as intermediaries to spawn namespaces in the given modes. This
> > > avoids the need to change "child_ns_mode" in the init_ns. nsenter must
> > > be used because ip netns unshares the mount namespace so nested "ip
> > > netns add" breaks exec calls from the init ns.
> > 
> > I'm not sure what the policy is in netdev, but I would prefer to have
> > selftest changes in another patch (I think earlier in the series so as
> > not to break the bisection), in order to simplify backporting (e.g. in
> > CentOS Stream, to keep the backport small, I didn't backport the dozens
> > of patches for selftest that we did previously).

Sounds good! I wasn't sure if breakage so tightly coupled should be in
the same patch or not, I'm happy to split it up to ease backporting.

> > 
> > Obviously, if it's not possible and breaks the bisection, I can safely
> > skip these changes during the backport.
> > 
> > > 
> > > Test run:
> > > 
> > > 1..25
> > > ok 1 vm_server_host_client
> > > ok 2 vm_client_host_server
> > > ok 3 vm_loopback
> > > ok 4 ns_host_vsock_ns_mode_ok
> > > ok 5 ns_host_vsock_child_ns_mode_ok
> > > ok 6 ns_global_same_cid_fails
> > > ok 7 ns_local_same_cid_ok
> > > ok 8 ns_global_local_same_cid_ok
> > > ok 9 ns_local_global_same_cid_ok
> > > ok 10 ns_diff_global_host_connect_to_global_vm_ok
> > > ok 11 ns_diff_global_host_connect_to_local_vm_fails
> > > ok 12 ns_diff_global_vm_connect_to_global_host_ok
> > > ok 13 ns_diff_global_vm_connect_to_local_host_fails
> > > ok 14 ns_diff_local_host_connect_to_local_vm_fails
> > > ok 15 ns_diff_local_vm_connect_to_local_host_fails
> > > ok 16 ns_diff_global_to_local_loopback_local_fails
> > > ok 17 ns_diff_local_to_global_loopback_fails
> > > ok 18 ns_diff_local_to_local_loopback_fails
> > > ok 19 ns_diff_global_to_global_loopback_ok
> > > ok 20 ns_same_local_loopback_ok
> > > ok 21 ns_same_local_host_connect_to_local_vm_ok
> > > ok 22 ns_same_local_vm_connect_to_local_host_ok
> > > ok 23 ns_delete_vm_ok
> > > ok 24 ns_delete_host_ok
> > > ok 25 ns_delete_both_ok
> > > SUMMARY: PASS=25 SKIP=0 FAIL=0
> > 
> > IMO this can be removed from the commit message, doesn't add much value
> > other than say that all test passes.

sgtm!

> > 
> > > 
> > > Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
> > > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > > Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
> > > Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > > include/net/af_vsock.h                  |  6 +++++-
> > > include/net/netns/vsock.h               |  1 +
> > > net/vmw_vsock/af_vsock.c                | 10 ++++++----
> > > tools/testing/selftests/vsock/vmtest.sh | 35
> > > +++++++++++++++------------------
> > > 4 files changed, 28 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > index d3ff48a2fbe0..c7de33039907 100644
> > > --- a/include/net/af_vsock.h
> > > +++ b/include/net/af_vsock.h
> > > @@ -276,10 +276,14 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
> > > 	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> > > }
> > > 
> > > -static inline void vsock_net_set_child_mode(struct net *net,
> > > +static inline bool vsock_net_set_child_mode(struct net *net,
> > > 					    enum vsock_net_mode mode)
> > > {
> > > +	if (xchg(&net->vsock.child_ns_mode_locked, 1))
> > > +		return false;
> > > +
> > > 	WRITE_ONCE(net->vsock.child_ns_mode, mode);
> > > +	return true;
> > > }
> > > 
> > > static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
> > > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
> > > index b34d69a22fa8..8c855fff8039 100644
> > > --- a/include/net/netns/vsock.h
> > > +++ b/include/net/netns/vsock.h
> > > @@ -17,5 +17,6 @@ struct netns_vsock {
> > > 
> > > 	enum vsock_net_mode mode;
> > > 	enum vsock_net_mode child_ns_mode;
> > > +	int child_ns_mode_locked;
> > > };
> > > #endif /* __NET_NET_NAMESPACE_VSOCK_H */
> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > index 9880756d9eff..35e097f4fde8 100644
> > > --- a/net/vmw_vsock/af_vsock.c
> > > +++ b/net/vmw_vsock/af_vsock.c
> > > @@ -90,14 +90,15 @@
> > > *
> > > *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
> > > *     mode, which is set at namespace creation and immutable thereafter.
> > > - *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
> > > + *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
> > > *     child namespaces will inherit when created. The initial value matches
> > > *     the namespace's own ns_mode.
> > > *
> > > *   Changing child_ns_mode only affects newly created namespaces, not the
> > > *   current namespace or existing children. A "local" namespace cannot set
> > > - *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
> > > - *   from the parent's child_ns_mode.
> > > + *   child_ns_mode to "global". child_ns_mode is write-once, so that it may
> > > + *   be configured and locked down by a namespace manager. At namespace
> > > + *   creation, ns_mode is inherited from the parent's child_ns_mode.
> > 
> > We just merged commit a07c33c6f2fc ("vsock: document namespace mode
> > sysctls") in the net tree, so we should update also
> > Documentation/admin-guide/sysctl/net.rst

Indeed.

> > 
> > > *
> > > *   The init_net mode is "global" and cannot be modified.
> > 
> > Maybe we should also emphasise in the documentation and in the commit
> > description that `child_ns_mode` in `init_net` also is write-once, so
> > writing `local` to that one by the init process (e.g. systemd),
> > essentially will make all the new namespaces in `local` mode. This could
> > be useful for container/namespace managers.
> > 

Sounds good.

> > > *
> > > @@ -2853,7 +2854,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> > > 		    new_mode == VSOCK_NET_MODE_GLOBAL)
> > > 			return -EPERM;
> > > 
> > > -		vsock_net_set_child_mode(net, new_mode);
> > > +		if (!vsock_net_set_child_mode(net, new_mode))
> > > +			return -EPERM;
> > 
> > So, if `child_ns_mode` is set to `local` but locked, writing `local`
> > again will return -EPERM, is this really what we want?
> > 
> > I'm not sure if we can relax it a bit, but then we may race between
> > reader and writer, so maybe it's fine like it is in this patch, but we
> > should document better that any writes (even same value) after the first
> > one will return -EPERM.
> 
> I think we can try in this way:
> 
> static inline bool vsock_net_set_child_mode(struct net *net,
> 					    enum vsock_net_mode mode)
> {
> 	int new_locked = mode + 1;
> 	int old_locked = 0;
> 
> 	if (try_cmpxchg(&net->vsock.child_ns_mode_locked,
> 			&old_locked, new_locked)) {
> 		WRITE_ONCE(net->vsock.child_ns_mode, mode);
> 		return true;
> 	}
> 
> 	return old_locked == new_locked;
> }
> 
> With a comment like this near child_ns_mode_locked in struct netns_vsock:
> /* 0 = unlocked
>  * 1 = locked to GLOBAL (VSOCK_NET_MODE_GLOBAL + 1)
>  * 2 = locked to LOCAL  (VSOCK_NET_MODE_LOCAL + 1)
>  */

This looks good to me. Will change.

> 
> While writing that, I thought that we can even remove `child_ns_mode_locked`
> and use a single variable where encode the value and the state, but maybe
> it's an unnecessary extra complexity.
> 
> Stefano
> 
> > 
> > About that, should we return something different, like -EBUSY ?
> > Not a strong opinion, just to differentiate with the other check before.
> > 
> > Thanks,
> > Stefano
> 

Differentiating with -EBUSY sounds useful. Will add that and document.

Best,
Bobby

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] vsock: lock down child_ns_mode as write-once
  2026-02-18 16:15     ` Bobby Eshleman
@ 2026-02-19  0:41       ` Jakub Kicinski
  2026-02-19 16:21         ` Bobby Eshleman
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2026-02-19  0:41 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Stefano Garzarella, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, virtualization, netdev, linux-kernel, kvm,
	linux-kselftest, Daan De Meyer

On Wed, 18 Feb 2026 08:15:38 -0800 Bobby Eshleman wrote:
> > > I'm not sure what the policy is in netdev, but I would prefer to have
> > > selftest changes in another patch (I think earlier in the series so as
> > > not to break the bisection), in order to simplify backporting (e.g. in
> > > CentOS Stream, to keep the backport small, I didn't backport the dozens
> > > of patches for selftest that we did previously).  
> 
> Sounds good! I wasn't sure if breakage so tightly coupled should be in
> the same patch or not, I'm happy to split it up to ease backporting.

FWIW the netdev recommendation is indeed to split the selftests out.
Also to bungle selftest patches with the fix into one series targeting
net, even tho the selftests patches won't have a Fixes tag.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] vsock: lock down child_ns_mode as write-once
  2026-02-19  0:41       ` Jakub Kicinski
@ 2026-02-19 16:21         ` Bobby Eshleman
  0 siblings, 0 replies; 6+ messages in thread
From: Bobby Eshleman @ 2026-02-19 16:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Stefano Garzarella, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, virtualization, netdev, linux-kernel, kvm,
	linux-kselftest, Daan De Meyer

On Wed, Feb 18, 2026 at 04:41:19PM -0800, Jakub Kicinski wrote:
> On Wed, 18 Feb 2026 08:15:38 -0800 Bobby Eshleman wrote:
> > > > I'm not sure what the policy is in netdev, but I would prefer to have
> > > > selftest changes in another patch (I think earlier in the series so as
> > > > not to break the bisection), in order to simplify backporting (e.g. in
> > > > CentOS Stream, to keep the backport small, I didn't backport the dozens
> > > > of patches for selftest that we did previously).  
> > 
> > Sounds good! I wasn't sure if breakage so tightly coupled should be in
> > the same patch or not, I'm happy to split it up to ease backporting.
> 
> FWIW the netdev recommendation is indeed to split the selftests out.
> Also to bungle selftest patches with the fix into one series targeting
> net, even tho the selftests patches won't have a Fixes tag.

Duly noted, thanks.

-Bobby

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-02-19 16:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-18  1:45 [PATCH net] vsock: lock down child_ns_mode as write-once Bobby Eshleman
2026-02-18 10:02 ` Stefano Garzarella
2026-02-18 10:43   ` Stefano Garzarella
2026-02-18 16:15     ` Bobby Eshleman
2026-02-19  0:41       ` Jakub Kicinski
2026-02-19 16:21         ` Bobby Eshleman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox