public inbox for netdev@vger.kernel.org
* [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode
@ 2026-02-18 18:10 Bobby Eshleman
  2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-18 18:10 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Shuah Khan,
	Bobby Eshleman, Michael S. Tsirkin, Jonathan Corbet, Shuah Khan
  Cc: virtualization, netdev, linux-kernel, kvm, linux-kselftest,
	linux-doc, Daan De Meyer

Two administrator processes may race when setting child_ns_mode: one
sets it to "local" and creates a namespace, but another changes it to
"global" in between. The first process ends up with a namespace in the
wrong mode. Make child_ns_mode write-once so that a namespace manager
can set it once, check the value, and be guaranteed it won't change
before creating its namespaces. Writing a different value after the
first write returns -EBUSY.

One patch for the implementation, one for docs, and one for tests.

---
Changes in v2:
- break docs, tests, and implementation into separate patches
- clarify commit message
- only use child_ns_mode, do not add additional child_ns_mode_locked
  variable
- add documentation to Documentation/
- Link to v1: https://lore.kernel.org/r/20260217-vsock-ns-write-once-v1-1-a1fb30f289a9@meta.com

---
Bobby Eshleman (3):
      selftests/vsock: change tests to respect write-once child ns mode
      vsock: lock down child_ns_mode as write-once
      vsock: document write-once behavior of the child_ns_mode sysctl

 Documentation/admin-guide/sysctl/net.rst | 10 ++++++---
 include/net/af_vsock.h                   | 20 +++++++++++++++---
 include/net/netns/vsock.h                |  9 +++++++-
 net/vmw_vsock/af_vsock.c                 | 15 +++++++++-----
 tools/testing/selftests/vsock/vmtest.sh  | 35 +++++++++++++++-----------------
 5 files changed, 58 insertions(+), 31 deletions(-)
---
base-commit: ccd8e87748ad083047d6c8544c5809b7f96cc8df
change-id: 20260217-vsock-ns-write-once-8834d684e0a2

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>



* [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode
  2026-02-18 18:10 [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode Bobby Eshleman
@ 2026-02-18 18:10 ` Bobby Eshleman
  2026-02-19 10:35   ` Stefano Garzarella
  2026-02-18 18:10 ` [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once Bobby Eshleman
  2026-02-18 18:10 ` [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl Bobby Eshleman
  2 siblings, 1 reply; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-18 18:10 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Shuah Khan,
	Bobby Eshleman, Michael S. Tsirkin, Jonathan Corbet, Shuah Khan
  Cc: virtualization, netdev, linux-kernel, kvm, linux-kselftest,
	linux-doc

From: Bobby Eshleman <bobbyeshleman@meta.com>

The child_ns_mode sysctl parameter becomes write-once in a future patch
in this series, which breaks existing tests. This patch updates the
tests to respect this new policy. No additional tests are added.

Add "global-parent" and "local-parent" namespaces as intermediaries to
spawn namespaces in the given modes. This avoids the need to change
"child_ns_mode" in the init_ns. nsenter must be used because ip netns
unshares the mount namespace so nested "ip netns add" breaks exec calls
from the init ns. Adds nsenter to the deps check.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 tools/testing/selftests/vsock/vmtest.sh | 35 +++++++++++++++------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index dc8dbe74a6d0..e1e78b295e41 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -210,16 +210,17 @@ check_result() {
 }
 
 add_namespaces() {
-	local orig_mode
-	orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
+	ip netns add "global-parent" 2>/dev/null
+	echo "global" | ip netns exec "global-parent" \
+		tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
+	ip netns add "local-parent" 2>/dev/null
+	echo "local" | ip netns exec "local-parent" \
+		tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
 
-	for mode in "${NS_MODES[@]}"; do
-		echo "${mode}" > /proc/sys/net/vsock/child_ns_mode
-		ip netns add "${mode}0" 2>/dev/null
-		ip netns add "${mode}1" 2>/dev/null
-	done
-
-	echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
+	nsenter --net=/var/run/netns/global-parent ip netns add "global0" 2>/dev/null
+	nsenter --net=/var/run/netns/global-parent ip netns add "global1" 2>/dev/null
+	nsenter --net=/var/run/netns/local-parent ip netns add "local0" 2>/dev/null
+	nsenter --net=/var/run/netns/local-parent ip netns add "local1" 2>/dev/null
 }
 
 init_namespaces() {
@@ -237,6 +238,8 @@ del_namespaces() {
 		log_host "removed ns ${mode}0"
 		log_host "removed ns ${mode}1"
 	done
+	ip netns del "global-parent" &>/dev/null
+	ip netns del "local-parent" &>/dev/null
 }
 
 vm_ssh() {
@@ -287,7 +290,7 @@ check_args() {
 }
 
 check_deps() {
-	for dep in vng ${QEMU} busybox pkill ssh ss socat; do
+	for dep in vng ${QEMU} busybox pkill ssh ss socat nsenter; do
 		if [[ ! -x $(command -v "${dep}") ]]; then
 			echo -e "skip:    dependency ${dep} not found!\n"
 			exit "${KSFT_SKIP}"
@@ -1231,12 +1234,8 @@ test_ns_local_same_cid_ok() {
 }
 
 test_ns_host_vsock_child_ns_mode_ok() {
-	local orig_mode
-	local rc
-
-	orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
+	local rc="${KSFT_PASS}"
 
-	rc="${KSFT_PASS}"
 	for mode in "${NS_MODES[@]}"; do
 		local ns="${mode}0"
 
@@ -1246,15 +1245,13 @@ test_ns_host_vsock_child_ns_mode_ok() {
 			continue
 		fi
 
-		if ! echo "${mode}" > /proc/sys/net/vsock/child_ns_mode; then
-			log_host "child_ns_mode should be writable to ${mode}"
+		if ! echo "${mode}" | ip netns exec "${ns}" \
+			tee /proc/sys/net/vsock/child_ns_mode &>/dev/null; then
 			rc="${KSFT_FAIL}"
 			continue
 		fi
 	done
 
-	echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
-
 	return "${rc}"
 }
 

-- 
2.47.3



* [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once
  2026-02-18 18:10 [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode Bobby Eshleman
  2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
@ 2026-02-18 18:10 ` Bobby Eshleman
  2026-02-19 10:35   ` Stefano Garzarella
  2026-02-18 18:10 ` [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl Bobby Eshleman
  2 siblings, 1 reply; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-18 18:10 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Shuah Khan,
	Bobby Eshleman, Michael S. Tsirkin, Jonathan Corbet, Shuah Khan
  Cc: virtualization, netdev, linux-kernel, kvm, linux-kselftest,
	linux-doc, Daan De Meyer

From: Bobby Eshleman <bobbyeshleman@meta.com>

Two administrator processes may race when setting child_ns_mode as one
process sets child_ns_mode to "local" and then creates a namespace, but
another process changes child_ns_mode to "global" between the write and
the namespace creation. The first process ends up with a namespace in
"global" mode instead of "local". While this can be detected after the
fact by reading ns_mode and retrying, it is fragile and error-prone.

Make child_ns_mode write-once so that a namespace manager can set it
once and be sure it won't change. Writing a different value after the
first write returns -EBUSY. This applies to all namespaces, including
init_net, where an init process can write "local" to lock all future
namespaces into local mode.

Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/net/af_vsock.h    | 20 +++++++++++++++++---
 include/net/netns/vsock.h |  9 ++++++++-
 net/vmw_vsock/af_vsock.c  | 15 ++++++++++-----
 3 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d3ff48a2fbe0..9bd42147626d 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -276,15 +276,29 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
 	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
 }
 
-static inline void vsock_net_set_child_mode(struct net *net,
+static inline bool vsock_net_set_child_mode(struct net *net,
 					    enum vsock_net_mode mode)
 {
-	WRITE_ONCE(net->vsock.child_ns_mode, mode);
+	int locked = mode + VSOCK_NET_MODE_LOCKED;
+	int cur;
+
+	cur = READ_ONCE(net->vsock.child_ns_mode);
+	if (cur == locked)
+		return true;
+	if (cur >= VSOCK_NET_MODE_LOCKED)
+		return false;
+
+	if (try_cmpxchg(&net->vsock.child_ns_mode, &cur, locked))
+		return true;
+
+	return cur == locked;
 }
 
 static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
 {
-	return READ_ONCE(net->vsock.child_ns_mode);
+	int mode = READ_ONCE(net->vsock.child_ns_mode);
+
+	return mode & (VSOCK_NET_MODE_LOCKED - 1);
 }
 
 /* Return true if two namespaces pass the mode rules. Otherwise, return false.
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
index b34d69a22fa8..d20ab6269342 100644
--- a/include/net/netns/vsock.h
+++ b/include/net/netns/vsock.h
@@ -7,6 +7,7 @@
 enum vsock_net_mode {
 	VSOCK_NET_MODE_GLOBAL,
 	VSOCK_NET_MODE_LOCAL,
+	VSOCK_NET_MODE_LOCKED,
 };
 
 struct netns_vsock {
@@ -16,6 +17,12 @@ struct netns_vsock {
 	u32 port;
 
 	enum vsock_net_mode mode;
-	enum vsock_net_mode child_ns_mode;
+
+	/* 0 (GLOBAL)
+	 * 1 (LOCAL)
+	 * 2 (GLOBAL + LOCKED)
+	 * 3 (LOCAL + LOCKED)
+	 */
+	int child_ns_mode;
 };
 #endif /* __NET_NET_NAMESPACE_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 9880756d9eff..50044a838c89 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -90,16 +90,20 @@
  *
  *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
  *     mode, which is set at namespace creation and immutable thereafter.
- *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
+ *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
  *     child namespaces will inherit when created. The initial value matches
  *     the namespace's own ns_mode.
  *
  *   Changing child_ns_mode only affects newly created namespaces, not the
  *   current namespace or existing children. A "local" namespace cannot set
- *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
- *   from the parent's child_ns_mode.
+ *   child_ns_mode to "global". child_ns_mode is write-once, so that it may be
+ *   configured and locked down by a namespace manager. Writing a different
+ *   value after the first write returns -EBUSY. At namespace creation, ns_mode
+ *   is inherited from the parent's child_ns_mode.
  *
- *   The init_net mode is "global" and cannot be modified.
+ *   The init_net mode is "global" and cannot be modified. The init_net
+ *   child_ns_mode is also write-once, so an init process (e.g. systemd) can
+ *   set it to "local" to ensure all new namespaces inherit local mode.
  *
  *   The modes affect the allocation and accessibility of CIDs as follows:
  *
@@ -2853,7 +2857,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
 		    new_mode == VSOCK_NET_MODE_GLOBAL)
 			return -EPERM;
 
-		vsock_net_set_child_mode(net, new_mode);
+		if (!vsock_net_set_child_mode(net, new_mode))
+			return -EBUSY;
 	}
 
 	return 0;

-- 
2.47.3



* [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl
  2026-02-18 18:10 [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode Bobby Eshleman
  2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
  2026-02-18 18:10 ` [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once Bobby Eshleman
@ 2026-02-18 18:10 ` Bobby Eshleman
  2026-02-19 10:36   ` Stefano Garzarella
  2 siblings, 1 reply; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-18 18:10 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Shuah Khan,
	Bobby Eshleman, Michael S. Tsirkin, Jonathan Corbet, Shuah Khan
  Cc: virtualization, netdev, linux-kernel, kvm, linux-kselftest,
	linux-doc

From: Bobby Eshleman <bobbyeshleman@meta.com>

Update the vsock child_ns_mode documentation to include the new the
write-once semantics of setting child_ns_mode. The semantics are
implemented in a different patch in this series.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 Documentation/admin-guide/sysctl/net.rst | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index c10530624f1e..976a176fb451 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -581,9 +581,9 @@ The init_net mode is always ``global``.
 child_ns_mode
 -------------
 
-Controls what mode newly created child namespaces will inherit. At namespace
-creation, ``ns_mode`` is inherited from the parent's ``child_ns_mode``. The
-initial value matches the namespace's own ``ns_mode``.
+Write-once. Controls what mode newly created child namespaces will inherit. At
+namespace creation, ``ns_mode`` is inherited from the parent's
+``child_ns_mode``. The initial value matches the namespace's own ``ns_mode``.
 
 Values:
 
@@ -594,6 +594,10 @@ Values:
 	  their sockets will only be able to connect within their own
 	  namespace.
 
+``child_ns_mode`` can only be written once per namespace. Writing the same
+value that is already set succeeds. Writing a different value after the first
+write returns ``-EBUSY``.
+
 Changing ``child_ns_mode`` only affects namespaces created after the change;
 it does not modify the current namespace or any existing children.
 

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode
  2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
@ 2026-02-19 10:35   ` Stefano Garzarella
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Garzarella @ 2026-02-19 10:35 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
	netdev, linux-kernel, kvm, linux-kselftest, linux-doc

On Wed, Feb 18, 2026 at 10:10:36AM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>The child_ns_mode sysctl parameter becomes write-once in a future patch
>in this series, which breaks existing tests. This patch updates the
>tests to respect this new policy. No additional tests are added.
>
>Add "global-parent" and "local-parent" namespaces as intermediaries to
>spawn namespaces in the given modes. This avoids the need to change
>"child_ns_mode" in the init_ns. nsenter must be used because ip netns
>unshares the mount namespace so nested "ip netns add" breaks exec calls
>from the init ns. Adds nsenter to the deps check.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
> tools/testing/selftests/vsock/vmtest.sh | 35 +++++++++++++++------------------
> 1 file changed, 16 insertions(+), 19 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

>
>diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
>index dc8dbe74a6d0..e1e78b295e41 100755
>--- a/tools/testing/selftests/vsock/vmtest.sh
>+++ b/tools/testing/selftests/vsock/vmtest.sh
>@@ -210,16 +210,17 @@ check_result() {
> }
>
> add_namespaces() {
>-	local orig_mode
>-	orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
>+	ip netns add "global-parent" 2>/dev/null
>+	echo "global" | ip netns exec "global-parent" \
>+		tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
>+	ip netns add "local-parent" 2>/dev/null
>+	echo "local" | ip netns exec "local-parent" \
>+		tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
>
>-	for mode in "${NS_MODES[@]}"; do
>-		echo "${mode}" > /proc/sys/net/vsock/child_ns_mode
>-		ip netns add "${mode}0" 2>/dev/null
>-		ip netns add "${mode}1" 2>/dev/null
>-	done
>-
>-	echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
>+	nsenter --net=/var/run/netns/global-parent ip netns add "global0" 2>/dev/null
>+	nsenter --net=/var/run/netns/global-parent ip netns add "global1" 2>/dev/null
>+	nsenter --net=/var/run/netns/local-parent ip netns add "local0" 2>/dev/null
>+	nsenter --net=/var/run/netns/local-parent ip netns add "local1" 2>/dev/null
> }
>
> init_namespaces() {
>@@ -237,6 +238,8 @@ del_namespaces() {
> 		log_host "removed ns ${mode}0"
> 		log_host "removed ns ${mode}1"
> 	done
>+	ip netns del "global-parent" &>/dev/null
>+	ip netns del "local-parent" &>/dev/null
> }
>
> vm_ssh() {
>@@ -287,7 +290,7 @@ check_args() {
> }
>
> check_deps() {
>-	for dep in vng ${QEMU} busybox pkill ssh ss socat; do
>+	for dep in vng ${QEMU} busybox pkill ssh ss socat nsenter; do
> 		if [[ ! -x $(command -v "${dep}") ]]; then
> 			echo -e "skip:    dependency ${dep} not found!\n"
> 			exit "${KSFT_SKIP}"
>@@ -1231,12 +1234,8 @@ test_ns_local_same_cid_ok() {
> }
>
> test_ns_host_vsock_child_ns_mode_ok() {
>-	local orig_mode
>-	local rc
>-
>-	orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
>+	local rc="${KSFT_PASS}"
>
>-	rc="${KSFT_PASS}"
> 	for mode in "${NS_MODES[@]}"; do
> 		local ns="${mode}0"
>
>@@ -1246,15 +1245,13 @@ test_ns_host_vsock_child_ns_mode_ok() {
> 			continue
> 		fi
>
>-		if ! echo "${mode}" > /proc/sys/net/vsock/child_ns_mode; then
>-			log_host "child_ns_mode should be writable to ${mode}"
>+		if ! echo "${mode}" | ip netns exec "${ns}" \
>+			tee /proc/sys/net/vsock/child_ns_mode &>/dev/null; then
> 			rc="${KSFT_FAIL}"
> 			continue
> 		fi
> 	done
>
>-	echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
>-
> 	return "${rc}"
> }
>
>
>-- 
>2.47.3
>



* Re: [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once
  2026-02-18 18:10 ` [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once Bobby Eshleman
@ 2026-02-19 10:35   ` Stefano Garzarella
  2026-02-19 16:20     ` Bobby Eshleman
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Garzarella @ 2026-02-19 10:35 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
	netdev, linux-kernel, kvm, linux-kselftest, linux-doc,
	Daan De Meyer

On Wed, Feb 18, 2026 at 10:10:37AM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Two administrator processes may race when setting child_ns_mode as one
>process sets child_ns_mode to "local" and then creates a namespace, but
>another process changes child_ns_mode to "global" between the write and
>the namespace creation. The first process ends up with a namespace in
>"global" mode instead of "local". While this can be detected after the
>fact by reading ns_mode and retrying, it is fragile and error-prone.
>
>Make child_ns_mode write-once so that a namespace manager can set it
>once and be sure it won't change. Writing a different value after the
>first write returns -EBUSY. This applies to all namespaces, including
>init_net, where an init process can write "local" to lock all future
>namespaces into local mode.
>
>Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>

nit: the author's S-o-b usually comes last when sending a patch.

>---
> include/net/af_vsock.h    | 20 +++++++++++++++++---
> include/net/netns/vsock.h |  9 ++++++++-
> net/vmw_vsock/af_vsock.c  | 15 ++++++++++-----
> 3 files changed, 35 insertions(+), 9 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index d3ff48a2fbe0..9bd42147626d 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -276,15 +276,29 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
> 	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> }
>
>-static inline void vsock_net_set_child_mode(struct net *net,
>+static inline bool vsock_net_set_child_mode(struct net *net,
> 					    enum vsock_net_mode mode)
> {
>-	WRITE_ONCE(net->vsock.child_ns_mode, mode);
>+	int locked = mode + VSOCK_NET_MODE_LOCKED;
>+	int cur;
>+
>+	cur = READ_ONCE(net->vsock.child_ns_mode);
>+	if (cur == locked)
>+		return true;
>+	if (cur >= VSOCK_NET_MODE_LOCKED)
>+		return false;
>+
>+	if (try_cmpxchg(&net->vsock.child_ns_mode, &cur, locked))
>+		return true;
>+
>+	return cur == locked;

Sorry, it took me a while to fully understand it :-(
This overcomplication is exactly what I wanted to avoid when I proposed 
the change in v1: 
https://lore.kernel.org/netdev/aZWUmbiH11Eh3Y4v@sgarzare-redhat/


> }
>
> static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
> {
>-	return READ_ONCE(net->vsock.child_ns_mode);
>+	int mode = READ_ONCE(net->vsock.child_ns_mode);
>+
>+	return mode & (VSOCK_NET_MODE_LOCKED - 1);

This only works because VSOCK_NET_MODE_LOCKED == 2, so IMO the value
should at least be set explicitly in the enum and documented above
vsock_net_mode.

> }
>
> /* Return true if two namespaces pass the mode rules. Otherwise, return false.
>diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>index b34d69a22fa8..d20ab6269342 100644
>--- a/include/net/netns/vsock.h
>+++ b/include/net/netns/vsock.h
>@@ -7,6 +7,7 @@
> enum vsock_net_mode {
> 	VSOCK_NET_MODE_GLOBAL,
> 	VSOCK_NET_MODE_LOCAL,
>+	VSOCK_NET_MODE_LOCKED,

This is not really a mode, so IMO it should not be part of `enum
vsock_net_mode`. If you really want it, maybe we can add both
VSOCK_NET_MODE_GLOBAL_LOCKED and VSOCK_NET_MODE_LOCAL_LOCKED, which would
be less error-prone if we ever touch this enum.

> };
>
> struct netns_vsock {
>@@ -16,6 +17,12 @@ struct netns_vsock {
> 	u32 port;
>
> 	enum vsock_net_mode mode;
>-	enum vsock_net_mode child_ns_mode;
>+
>+	/* 0 (GLOBAL)
>+	 * 1 (LOCAL)
>+	 * 2 (GLOBAL + LOCKED)
>+	 * 3 (LOCAL + LOCKED)
>+	 */
>+	int child_ns_mode;

Sorry, I don't like this too much, since it seems too complicated to
read and to maintain. If we really want to use just one variable, maybe
we can use -1 as UNSET for child_ns_mode. If it is UNSET,
vsock_net_child_mode() can just return `mode`, since that is the default
we also documented; if it is set, the value is locked to what was
written.

Maybe it is easier with code; I mean something like this:

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d3ff48a2fbe0..fcd5b538df35 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -276,15 +276,25 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
 	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
 }
 
-static inline void vsock_net_set_child_mode(struct net *net,
+static inline bool vsock_net_set_child_mode(struct net *net,
 					    enum vsock_net_mode mode)
 {
-	WRITE_ONCE(net->vsock.child_ns_mode, mode);
+	int old = VSOCK_NET_CHILD_NS_UNSET;
+
+	if (try_cmpxchg(&net->vsock.child_ns_mode, &old, mode))
+		return true;
+
+	return old == mode;
 }
 
 static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
  {
-	return READ_ONCE(net->vsock.child_ns_mode);
+	int mode = READ_ONCE(net->vsock.child_ns_mode);
+
+	if (mode == VSOCK_NET_CHILD_NS_UNSET)
+		return net->vsock.mode;
+
+	return mode;
 }
 
 /* Return true if two namespaces pass the mode rules. Otherwise, return false.
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
index b34d69a22fa8..bf52baf7d7a7 100644
--- a/include/net/netns/vsock.h
+++ b/include/net/netns/vsock.h
@@ -9,6 +9,8 @@ enum vsock_net_mode {
 	VSOCK_NET_MODE_LOCAL,
 };
 
+#define VSOCK_NET_CHILD_NS_UNSET (-1)
+
 struct netns_vsock {
 	struct ctl_table_header *sysctl_hdr;
 
@@ -16,6 +18,13 @@ struct netns_vsock {
 	u32 port;
 
 	enum vsock_net_mode mode;
-	enum vsock_net_mode child_ns_mode;
+
+	/* Write-once child namespace mode, must be initialized to
+	 * VSOCK_NET_CHILD_NS_UNSET. Transitions once from UNSET to a
+	 * vsock_net_mode value via try_cmpxchg on first sysctl write.
+	 * While UNSET, vsock_net_child_mode() returns the namespace's
+	 * own mode since it's the default.
+	 */
+	int child_ns_mode;
 };
 #endif /* __NET_NET_NAMESPACE_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 9880756d9eff..f0cb7c6a8212 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2853,7 +2853,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
 		    new_mode == VSOCK_NET_MODE_GLOBAL)
 			return -EPERM;
 
-		vsock_net_set_child_mode(net, new_mode);
+		if (!vsock_net_set_child_mode(net, new_mode))
+			return -EBUSY;
 	}
 
 	return 0;
@@ -2922,7 +2923,7 @@ static void vsock_net_init(struct net *net)
 	else
 		net->vsock.mode = vsock_net_child_mode(current->nsproxy->net_ns);
 
-	net->vsock.child_ns_mode = net->vsock.mode;
+	net->vsock.child_ns_mode = VSOCK_NET_CHILD_NS_UNSET;
 }
 
 static __net_init int vsock_sysctl_init_net(struct net *net)

If you like it, please add my Co-developed-by and S-o-b.

BTW, let's discuss it more here and agree before sending a new
version, so that others can also comment.

Thanks,
Stefano

> };
> #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 9880756d9eff..50044a838c89 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -90,16 +90,20 @@
>  *
>  *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
>  *     mode, which is set at namespace creation and immutable thereafter.
>- *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
>+ *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
>  *     child namespaces will inherit when created. The initial value matches
>  *     the namespace's own ns_mode.
>  *
>  *   Changing child_ns_mode only affects newly created namespaces, not the
>  *   current namespace or existing children. A "local" namespace cannot set
>- *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
>- *   from the parent's child_ns_mode.
>+ *   child_ns_mode to "global". child_ns_mode is write-once, so that it may be
>+ *   configured and locked down by a namespace manager. Writing a different
>+ *   value after the first write returns -EBUSY. At namespace creation, ns_mode
>+ *   is inherited from the parent's child_ns_mode.
>  *
>- *   The init_net mode is "global" and cannot be modified.
>+ *   The init_net mode is "global" and cannot be modified. The init_net
>+ *   child_ns_mode is also write-once, so an init process (e.g. systemd) can
>+ *   set it to "local" to ensure all new namespaces inherit local mode.
>  *
>  *   The modes affect the allocation and accessibility of CIDs as follows:
>  *
>@@ -2853,7 +2857,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> 		    new_mode == VSOCK_NET_MODE_GLOBAL)
> 			return -EPERM;
>
>-		vsock_net_set_child_mode(net, new_mode);
>+		if (!vsock_net_set_child_mode(net, new_mode))
>+			return -EBUSY;
> 	}
>
> 	return 0;
>
>-- 
>2.47.3
>



* Re: [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl
  2026-02-18 18:10 ` [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl Bobby Eshleman
@ 2026-02-19 10:36   ` Stefano Garzarella
  2026-02-19 16:06     ` Bobby Eshleman
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Garzarella @ 2026-02-19 10:36 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
	netdev, linux-kernel, kvm, linux-kselftest, linux-doc

On Wed, Feb 18, 2026 at 10:10:38AM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Update the vsock child_ns_mode documentation to include the new the

nit: s/the new the/the new

>write-once semantics of setting child_ns_mode. The semantics are
>implemented in a different patch in this series.

s/different/preceding ?

IMO this can be squashed with the previous patch, but not sure netdev 
policy about that. Not a strong opinion, it's fine also in this way.

>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
> Documentation/admin-guide/sysctl/net.rst | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
>diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
>index c10530624f1e..976a176fb451 100644
>--- a/Documentation/admin-guide/sysctl/net.rst
>+++ b/Documentation/admin-guide/sysctl/net.rst
>@@ -581,9 +581,9 @@ The init_net mode is always ``global``.
> child_ns_mode
> -------------
>
>-Controls what mode newly created child namespaces will inherit. At namespace
>-creation, ``ns_mode`` is inherited from the parent's ``child_ns_mode``. The
>-initial value matches the namespace's own ``ns_mode``.
>+Write-once. Controls what mode newly created child namespaces will inherit. At
>+namespace creation, ``ns_mode`` is inherited from the parent's
>+``child_ns_mode``. The initial value matches the namespace's own ``ns_mode``.
>
> Values:
>
>@@ -594,6 +594,10 @@ Values:
> 	  their sockets will only be able to connect within their own
> 	  namespace.
>
>+``child_ns_mode`` can only be written once per namespace. Writing the same
>+value that is already set succeeds. Writing a different value after the first
>+write returns ``-EBUSY``.

nit: instead of saying that it can only be written once, we could say 
that the first write locks the value, to be closer to the actual 
behavior, something like this: 

   The first write to ``child_ns_mode`` locks its value. Subsequent
   writes of the same value succeed, but writing a different value
   returns ``-EBUSY``.


Thanks,
Stefano

>+
> Changing ``child_ns_mode`` only affects namespaces created after the change;
> it does not modify the current namespace or any existing children.
>
>
>-- 
>2.47.3
>



* Re: [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl
  2026-02-19 10:36   ` Stefano Garzarella
@ 2026-02-19 16:06     ` Bobby Eshleman
  0 siblings, 0 replies; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-19 16:06 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
	netdev, linux-kernel, kvm, linux-kselftest, linux-doc

On Thu, Feb 19, 2026 at 11:36:40AM +0100, Stefano Garzarella wrote:
> On Wed, Feb 18, 2026 at 10:10:38AM -0800, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > Update the vsock child_ns_mode documentation to include the new the
> 
> nit: s/the new the/the new
> 
> > write-once semantics of setting child_ns_mode. The semantics are
> > implemented in a different patch in this series.
> 
> s/different/preceding ?
> 
> IMO this can be squashed with the previous patch, but I'm not sure about the
> netdev policy on that. Not a strong opinion; it's also fine this way.
> 
> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > Documentation/admin-guide/sysctl/net.rst | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
> > index c10530624f1e..976a176fb451 100644
> > --- a/Documentation/admin-guide/sysctl/net.rst
> > +++ b/Documentation/admin-guide/sysctl/net.rst
> > @@ -581,9 +581,9 @@ The init_net mode is always ``global``.
> > child_ns_mode
> > -------------
> > 
> > -Controls what mode newly created child namespaces will inherit. At namespace
> > -creation, ``ns_mode`` is inherited from the parent's ``child_ns_mode``. The
> > -initial value matches the namespace's own ``ns_mode``.
> > +Write-once. Controls what mode newly created child namespaces will inherit. At
> > +namespace creation, ``ns_mode`` is inherited from the parent's
> > +``child_ns_mode``. The initial value matches the namespace's own ``ns_mode``.
> > 
> > Values:
> > 
> > @@ -594,6 +594,10 @@ Values:
> > 	  their sockets will only be able to connect within their own
> > 	  namespace.
> > 
> > +``child_ns_mode`` can only be written once per namespace. Writing the same
> > +value that is already set succeeds. Writing a different value after the first
> > +write returns ``-EBUSY``.
> 
> nit: instead of saying that it can only be written once, we could say that
> the first write locks the value, to be closer to the actual behavior,
> something like this:
> 
>   The first write to ``child_ns_mode`` locks its value. Subsequent
>   writes of the same value succeed, but writing a different value
>   returns ``-EBUSY``.
> 
> 
> Thanks,
> Stefano

Sounds good! I agree that is more clear. I'll also remove the change
above that adds "Write-once" at the beginning of the paragraph, since
this clause does a better job explaining how it actually works.

> 
> > +
> > Changing ``child_ns_mode`` only affects namespaces created after the change;
> > it does not modify the current namespace or any existing children.
> > 
> > 
> > -- 
> > 2.47.3
> > 
> 


* Re: [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once
  2026-02-19 10:35   ` Stefano Garzarella
@ 2026-02-19 16:20     ` Bobby Eshleman
  2026-02-19 16:36       ` Stefano Garzarella
  0 siblings, 1 reply; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-19 16:20 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
	netdev, linux-kernel, kvm, linux-kselftest, linux-doc,
	Daan De Meyer

On Thu, Feb 19, 2026 at 11:35:52AM +0100, Stefano Garzarella wrote:
> On Wed, Feb 18, 2026 at 10:10:37AM -0800, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > Two administrator processes may race when setting child_ns_mode as one
> > process sets child_ns_mode to "local" and then creates a namespace, but
> > another process changes child_ns_mode to "global" between the write and
> > the namespace creation. The first process ends up with a namespace in
> > "global" mode instead of "local". While this can be detected after the
> > fact by reading ns_mode and retrying, it is fragile and error-prone.
> > 
> > Make child_ns_mode write-once so that a namespace manager can set it
> > once and be sure it won't change. Writing a different value after the
> > first write returns -EBUSY. This applies to all namespaces, including
> > init_net, where an init process can write "local" to lock all future
> > namespaces into local mode.
> > 
> > Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
> > Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
> 
> nit: usually the S-o-b of the author is the last when sending a patch.

Ah good to know, thanks. Will change.

> 
> > ---
> > include/net/af_vsock.h    | 20 +++++++++++++++++---
> > include/net/netns/vsock.h |  9 ++++++++-
> > net/vmw_vsock/af_vsock.c  | 15 ++++++++++-----
> > 3 files changed, 35 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index d3ff48a2fbe0..9bd42147626d 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -276,15 +276,29 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
> > 	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> > }
> > 
> > -static inline void vsock_net_set_child_mode(struct net *net,
> > +static inline bool vsock_net_set_child_mode(struct net *net,
> > 					    enum vsock_net_mode mode)
> > {
> > -	WRITE_ONCE(net->vsock.child_ns_mode, mode);
> > +	int locked = mode + VSOCK_NET_MODE_LOCKED;
> > +	int cur;
> > +
> > +	cur = READ_ONCE(net->vsock.child_ns_mode);
> > +	if (cur == locked)
> > +		return true;
> > +	if (cur >= VSOCK_NET_MODE_LOCKED)
> > +		return false;
> > +
> > +	if (try_cmpxchg(&net->vsock.child_ns_mode, &cur, locked))
> > +		return true;
> > +
> > +	return cur == locked;
> 
> Sorry, it took me a while to get it entirely :-(
> This overcomplication is exactly what I wanted to avoid when I proposed the
> change in v1:
> https://lore.kernel.org/netdev/aZWUmbiH11Eh3Y4v@sgarzare-redhat/

Glad you thought so too, because I actually think your original proposed
snippet in that thread is the best/simplest so far.

> 
> 
> > }
> > 
> > static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
> > {
> > -	return READ_ONCE(net->vsock.child_ns_mode);
> > +	int mode = READ_ONCE(net->vsock.child_ns_mode);
> > +
> > +	return mode & (VSOCK_NET_MODE_LOCKED - 1);
> 
> This only works because VSOCK_NET_MODE_LOCKED == 2, so IMO this should
> at least be set as an explicit value in the enum and documented on top of
> vsock_net_mode.
> 
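To make the point about the mask concrete, here is a userspace model of the v2 encoding, with C11 atomics standing in for READ_ONCE() and try_cmpxchg() (in both, a failed compare-exchange writes the current value back into the expected-value variable). The `_Static_assert` is an addition that turns the hidden assumption into a build failure: if another mode were ever added before VSOCK_NET_MODE_LOCKED, the mask `LOCKED - 1` would no longer recover the mode. Apart from the enum constants, the names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum vsock_net_mode {
	VSOCK_NET_MODE_GLOBAL,
	VSOCK_NET_MODE_LOCAL,
	VSOCK_NET_MODE_LOCKED,	/* not a mode: offset added once locked */
};

/* The decode mask (LOCKED - 1) only works while LOCKED is a power of two. */
_Static_assert((VSOCK_NET_MODE_LOCKED & (VSOCK_NET_MODE_LOCKED - 1)) == 0,
	       "VSOCK_NET_MODE_LOCKED must be a power of two");

/* One namespace: 0 GLOBAL, 1 LOCAL, 2 GLOBAL+LOCKED, 3 LOCAL+LOCKED */
static atomic_int child_ns_mode = VSOCK_NET_MODE_GLOBAL;

static bool set_child_mode(enum vsock_net_mode mode)
{
	int locked = mode + VSOCK_NET_MODE_LOCKED;
	int cur = atomic_load(&child_ns_mode);

	if (cur == locked)
		return true;	/* same value already locked */
	if (cur >= VSOCK_NET_MODE_LOCKED)
		return false;	/* a different value is locked */
	if (atomic_compare_exchange_strong(&child_ns_mode, &cur, locked))
		return true;
	return cur == locked;	/* lost a race: fine only if the winner wrote the same mode */
}

static enum vsock_net_mode get_child_mode(void)
{
	return (enum vsock_net_mode)(atomic_load(&child_ns_mode) &
				     (VSOCK_NET_MODE_LOCKED - 1));
}
```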
> > }
> > 
> > /* Return true if two namespaces pass the mode rules. Otherwise, return false.
> > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
> > index b34d69a22fa8..d20ab6269342 100644
> > --- a/include/net/netns/vsock.h
> > +++ b/include/net/netns/vsock.h
> > @@ -7,6 +7,7 @@
> > enum vsock_net_mode {
> > 	VSOCK_NET_MODE_GLOBAL,
> > 	VSOCK_NET_MODE_LOCAL,
> > +	VSOCK_NET_MODE_LOCKED,
> 
> This is not really a mode, so IMO it should not be part of `enum
> vsock_net_mode`. If you really want it, maybe we can add both
> VSOCK_NET_MODE_GLOBAL_LOCKED and VSOCK_NET_MODE_LOCAL_LOCKED, which would be
> less error-prone if we ever touch this enum.
> 
> > };
> > 
> > struct netns_vsock {
> > @@ -16,6 +17,12 @@ struct netns_vsock {
> > 	u32 port;
> > 
> > 	enum vsock_net_mode mode;
> > -	enum vsock_net_mode child_ns_mode;
> > +
> > +	/* 0 (GLOBAL)
> > +	 * 1 (LOCAL)
> > +	 * 2 (GLOBAL + LOCKED)
> > +	 * 3 (LOCAL + LOCKED)
> > +	 */
> > +	int child_ns_mode;
> 
> Sorry, I don't like this too much, since it seems too complicated to read
> and to maintain. If we really want to use just one variable, maybe we can
> use -1 as UNSET for child_ns_mode. If it is UNSET, vsock_net_child_mode()
> can just return `mode`, since that is the default we also documented; if
> it's set, it means that it is locked with the specified value.
> 
> Maybe with code is easier, I mean something like this:
> 
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index d3ff48a2fbe0..fcd5b538df35 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -276,15 +276,25 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
>  	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
>  }
> -static inline void vsock_net_set_child_mode(struct net *net,
> +static inline bool vsock_net_set_child_mode(struct net *net,
>  					    enum vsock_net_mode mode)
>  {
> -	WRITE_ONCE(net->vsock.child_ns_mode, mode);
> +	int old = VSOCK_NET_CHILD_NS_UNSET;
> +
> +	if (try_cmpxchg(&net->vsock.child_ns_mode, &old, mode))
> +		return true;
> +
> +	return old == mode;
>  }
>  static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
>  {
> -	return READ_ONCE(net->vsock.child_ns_mode);
> +	int mode = READ_ONCE(net->vsock.child_ns_mode);
> +
> +	if (mode == VSOCK_NET_CHILD_NS_UNSET)
> +		return net->vsock.mode;
> +
> +	return mode;
>  }
>  /* Return true if two namespaces pass the mode rules. Otherwise, return false.
> diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
> index b34d69a22fa8..bf52baf7d7a7 100644
> --- a/include/net/netns/vsock.h
> +++ b/include/net/netns/vsock.h
> @@ -9,6 +9,8 @@ enum vsock_net_mode {
>  	VSOCK_NET_MODE_LOCAL,
>  };
> +#define VSOCK_NET_CHILD_NS_UNSET (-1)
> +
>  struct netns_vsock {
>  	struct ctl_table_header *sysctl_hdr;
> @@ -16,6 +18,13 @@ struct netns_vsock {
>  	u32 port;
>  	enum vsock_net_mode mode;
> -	enum vsock_net_mode child_ns_mode;
> +
> +	/* Write-once child namespace mode, must be initialized to
> +	 * VSOCK_NET_CHILD_NS_UNSET. Transitions once from UNSET to a
> +	 * vsock_net_mode value via try_cmpxchg on first sysctl write.
> +	 * While UNSET, vsock_net_child_mode() returns the namespace's
> +	 * own mode since it's the default.
> +	 */
> +	int child_ns_mode;
>  };
>  #endif /* __NET_NET_NAMESPACE_VSOCK_H */
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 9880756d9eff..f0cb7c6a8212 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -2853,7 +2853,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
>  		    new_mode == VSOCK_NET_MODE_GLOBAL)
>  			return -EPERM;
> -		vsock_net_set_child_mode(net, new_mode);
> +		if (!vsock_net_set_child_mode(net, new_mode))
> +			return -EBUSY;
>  	}
>  	return 0;
> @@ -2922,7 +2923,7 @@ static void vsock_net_init(struct net *net)
>  	else
>  		net->vsock.mode = vsock_net_child_mode(current->nsproxy->net_ns);
> -	net->vsock.child_ns_mode = net->vsock.mode;
> +	net->vsock.child_ns_mode = VSOCK_NET_CHILD_NS_UNSET;
>  }
>  static __net_init int vsock_sysctl_init_net(struct net *net)
> 
> If you like it, please add my Co-developed-by and S-o-b.

Will do!

> 
> BTW, let's discuss it more here and agree before sending a new
> version; this should also give others a chance to comment.
> 
> Thanks,
> Stefano
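A userspace model of the UNSET proposal, as a sketch: C11 atomics stand in for READ_ONCE() and try_cmpxchg() (a failed compare-exchange writes the current value back into `old` in both), and `struct ns_model_unset` plus the `model_*` helpers are hypothetical names. Only the UNSET-to-mode transition can ever succeed, so the reader has to fall back to the namespace's own mode while the field is still UNSET.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };
#define VSOCK_NET_CHILD_NS_UNSET (-1)

/* Hypothetical stand-in for the relevant fields of struct netns_vsock. */
struct ns_model_unset {
	enum vsock_net_mode mode;	/* the namespace's own, immutable mode */
	atomic_int child_ns_mode;	/* UNSET until the first sysctl write */
};

static void model_init(struct ns_model_unset *ns, enum vsock_net_mode mode)
{
	ns->mode = mode;
	atomic_init(&ns->child_ns_mode, VSOCK_NET_CHILD_NS_UNSET);
}

static bool model_set_child_mode(struct ns_model_unset *ns,
				 enum vsock_net_mode mode)
{
	int old = VSOCK_NET_CHILD_NS_UNSET;

	/* Succeeds only for the UNSET -> mode transition; on failure `old`
	 * receives the value that is already locked in. */
	if (atomic_compare_exchange_strong(&ns->child_ns_mode, &old, mode))
		return true;
	return old == (int)mode;
}

static enum vsock_net_mode model_child_mode(struct ns_model_unset *ns)
{
	int mode = atomic_load(&ns->child_ns_mode);

	/* While UNSET, children inherit the namespace's own mode. */
	return mode == VSOCK_NET_CHILD_NS_UNSET ? ns->mode
						: (enum vsock_net_mode)mode;
}
```

Note that any first write locks the field, including one that merely restates the default (for example, writing "global" in a global namespace).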

Tbh, I like your original proposal from v1 best (copied below). I like
that the whole locking mechanism is self-contained there in one place,
and doesn't ripple out elsewhere into the code (e.g.,
vsock_net_child_mode() carrying logic around UNSET). Wdyt?

static inline bool vsock_net_set_child_mode(struct net *net,
					    enum vsock_net_mode mode)
{
	int new_locked = mode + 1;
	int old_locked = 0;

	if (try_cmpxchg(&net->vsock.child_ns_mode_locked,
			&old_locked, new_locked)) {
		WRITE_ONCE(net->vsock.child_ns_mode, mode);
		return true;
	}

	return old_locked == new_locked;
}


Best,
Bobby
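For comparison, a userspace model of the v1-style snippet with a separate lock word. The struct name and the use of C11 atomics (atomic_store() standing in for WRITE_ONCE()) are assumptions of this sketch. The `+ 1` keeps GLOBAL (0) distinguishable from the unlocked state, which is the bare `0` in `old_locked`.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };

/* Hypothetical stand-in for the two fields of struct netns_vsock. */
struct netns_vsock_v1 {
	atomic_int child_ns_mode;		/* value children inherit */
	atomic_int child_ns_mode_locked;	/* 0 = unlocked, else mode + 1 */
};

static bool v1_set_child_mode(struct netns_vsock_v1 *ns,
			      enum vsock_net_mode mode)
{
	int new_locked = mode + 1;	/* +1 so GLOBAL (0) differs from "unlocked" */
	int old_locked = 0;

	/* Only the first writer moves the lock word away from 0. */
	if (atomic_compare_exchange_strong(&ns->child_ns_mode_locked,
					   &old_locked, new_locked)) {
		atomic_store(&ns->child_ns_mode, mode);
		return true;
	}

	/* old_locked now holds the winner's value: the same mode is not an error. */
	return old_locked == new_locked;
}

static enum vsock_net_mode v1_child_mode(struct netns_vsock_v1 *ns)
{
	/* The reader is unchanged: it never looks at the lock word. */
	return (enum vsock_net_mode)atomic_load(&ns->child_ns_mode);
}
```

The locking stays self-contained in the setter, at the cost of keeping the mode in two fields.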

> 
> > };
> > #endif /* __NET_NET_NAMESPACE_VSOCK_H */
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 9880756d9eff..50044a838c89 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -90,16 +90,20 @@
> >  *
> >  *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
> >  *     mode, which is set at namespace creation and immutable thereafter.
> > - *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
> > + *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
> >  *     child namespaces will inherit when created. The initial value matches
> >  *     the namespace's own ns_mode.
> >  *
> >  *   Changing child_ns_mode only affects newly created namespaces, not the
> >  *   current namespace or existing children. A "local" namespace cannot set
> > - *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
> > - *   from the parent's child_ns_mode.
> > + *   child_ns_mode to "global". child_ns_mode is write-once, so that it may be
> > + *   configured and locked down by a namespace manager. Writing a different
> > + *   value after the first write returns -EBUSY. At namespace creation, ns_mode
> > + *   is inherited from the parent's child_ns_mode.
> >  *
> > - *   The init_net mode is "global" and cannot be modified.
> > + *   The init_net mode is "global" and cannot be modified. The init_net
> > + *   child_ns_mode is also write-once, so an init process (e.g. systemd) can
> > + *   set it to "local" to ensure all new namespaces inherit local mode.
> >  *
> >  *   The modes affect the allocation and accessibility of CIDs as follows:
> >  *
> > @@ -2853,7 +2857,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> > 		    new_mode == VSOCK_NET_MODE_GLOBAL)
> > 			return -EPERM;
> > 
> > -		vsock_net_set_child_mode(net, new_mode);
> > +		if (!vsock_net_set_child_mode(net, new_mode))
> > +			return -EBUSY;
> > 	}
> > 
> > 	return 0;
> > 
> > -- 
> > 2.47.3
> > 
> 
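The updated file comment combines three rules: ns_mode is inherited from the parent's child_ns_mode, a "local" namespace cannot set child_ns_mode to "global" (-EPERM), and a second, different write fails (-EBUSY). The following hypothetical userspace model puts them together, reusing the UNSET representation from Stefano's proposal. It assumes that the condition elided before `return -EPERM` checks that the namespace itself is local, as the comment states, and it does not model string parsing. All names are illustrative.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };
#define CHILD_NS_UNSET (-1)	/* hypothetical, mirrors the UNSET proposal */

struct ns_model {
	enum vsock_net_mode mode;	/* fixed at creation */
	atomic_int child_ns_mode;	/* CHILD_NS_UNSET until the first write */
};

static enum vsock_net_mode child_mode(struct ns_model *ns)
{
	int m = atomic_load(&ns->child_ns_mode);

	return m == CHILD_NS_UNSET ? ns->mode : (enum vsock_net_mode)m;
}

/* A new namespace inherits its parent's child_ns_mode; init_net (no
 * parent) is always global. */
static void ns_create(struct ns_model *ns, struct ns_model *parent)
{
	ns->mode = parent ? child_mode(parent) : VSOCK_NET_MODE_GLOBAL;
	atomic_init(&ns->child_ns_mode, CHILD_NS_UNSET);
}

/* Same check order as the patched sysctl handler: -EPERM, then -EBUSY. */
static int child_ns_mode_write(struct ns_model *ns,
			       enum vsock_net_mode new_mode)
{
	int old = CHILD_NS_UNSET;

	if (ns->mode == VSOCK_NET_MODE_LOCAL &&
	    new_mode == VSOCK_NET_MODE_GLOBAL)
		return -EPERM;
	if (atomic_compare_exchange_strong(&ns->child_ns_mode, &old, new_mode))
		return 0;
	return old == (int)new_mode ? 0 : -EBUSY;
}
```

In this model, an init process that writes "local" to init_net makes every later namespace local, and those namespaces can never switch their children back to global.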


* Re: [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once
  2026-02-19 16:20     ` Bobby Eshleman
@ 2026-02-19 16:36       ` Stefano Garzarella
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Garzarella @ 2026-02-19 16:36 UTC (permalink / raw)
  To: Bobby Eshleman, Jakub Kicinski, Paolo Abeni
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
	Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
	netdev, linux-kernel, kvm, linux-kselftest, linux-doc,
	Daan De Meyer

On Thu, Feb 19, 2026 at 08:20:54AM -0800, Bobby Eshleman wrote:
>On Thu, Feb 19, 2026 at 11:35:52AM +0100, Stefano Garzarella wrote:
>> On Wed, Feb 18, 2026 at 10:10:37AM -0800, Bobby Eshleman wrote:
>> > From: Bobby Eshleman <bobbyeshleman@meta.com>
>> >
>> > Two administrator processes may race when setting child_ns_mode as one
>> > process sets child_ns_mode to "local" and then creates a namespace, but
>> > another process changes child_ns_mode to "global" between the write and
>> > the namespace creation. The first process ends up with a namespace in
>> > "global" mode instead of "local". While this can be detected after the
>> > fact by reading ns_mode and retrying, it is fragile and error-prone.
>> >
>> > Make child_ns_mode write-once so that a namespace manager can set it
>> > once and be sure it won't change. Writing a different value after the
>> > first write returns -EBUSY. This applies to all namespaces, including
>> > init_net, where an init process can write "local" to lock all future
>> > namespaces into local mode.
>> >
>> > Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
>> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>> > Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
>> > Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>>
>> nit: usually the S-o-b of the author is the last when sending a patch.
>
>Ah good to know, thanks. Will change.
>
>>
>> > ---
>> > include/net/af_vsock.h    | 20 +++++++++++++++++---
>> > include/net/netns/vsock.h |  9 ++++++++-
>> > net/vmw_vsock/af_vsock.c  | 15 ++++++++++-----
>> > 3 files changed, 35 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> > index d3ff48a2fbe0..9bd42147626d 100644
>> > --- a/include/net/af_vsock.h
>> > +++ b/include/net/af_vsock.h
>> > @@ -276,15 +276,29 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
>> > 	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
>> > }
>> >
>> > -static inline void vsock_net_set_child_mode(struct net *net,
>> > +static inline bool vsock_net_set_child_mode(struct net *net,
>> > 					    enum vsock_net_mode mode)
>> > {
>> > -	WRITE_ONCE(net->vsock.child_ns_mode, mode);
>> > +	int locked = mode + VSOCK_NET_MODE_LOCKED;
>> > +	int cur;
>> > +
>> > +	cur = READ_ONCE(net->vsock.child_ns_mode);
>> > +	if (cur == locked)
>> > +		return true;
>> > +	if (cur >= VSOCK_NET_MODE_LOCKED)
>> > +		return false;
>> > +
>> > +	if (try_cmpxchg(&net->vsock.child_ns_mode, &cur, locked))
>> > +		return true;
>> > +
>> > +	return cur == locked;
>>
>> Sorry, it took me a while to get it entirely :-(
>> This overcomplication is exactly what I wanted to avoid when I proposed the
>> change in v1:
>> https://lore.kernel.org/netdev/aZWUmbiH11Eh3Y4v@sgarzare-redhat/
>
>Glad you thought so too, because I actually think your original proposed
>snippet in that thread is the best/simplest so far.
>
>>
>>
>> > }
>> >
>> > static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
>> > {
>> > -	return READ_ONCE(net->vsock.child_ns_mode);
>> > +	int mode = READ_ONCE(net->vsock.child_ns_mode);
>> > +
>> > +	return mode & (VSOCK_NET_MODE_LOCKED - 1);
>>
>> This only works because VSOCK_NET_MODE_LOCKED == 2, so IMO this should
>> at least be set as an explicit value in the enum and documented on top of
>> vsock_net_mode.
>>
>> > }
>> >
>> > /* Return true if two namespaces pass the mode rules. Otherwise, return false.
>> > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>> > index b34d69a22fa8..d20ab6269342 100644
>> > --- a/include/net/netns/vsock.h
>> > +++ b/include/net/netns/vsock.h
>> > @@ -7,6 +7,7 @@
>> > enum vsock_net_mode {
>> > 	VSOCK_NET_MODE_GLOBAL,
>> > 	VSOCK_NET_MODE_LOCAL,
>> > +	VSOCK_NET_MODE_LOCKED,
>>
>> This is not really a mode, so IMO it should not be part of `enum
>> vsock_net_mode`. If you really want it, maybe we can add both
>> VSOCK_NET_MODE_GLOBAL_LOCKED and VSOCK_NET_MODE_LOCAL_LOCKED, which would be
>> less error-prone if we ever touch this enum.
>>
>> > };
>> >
>> > struct netns_vsock {
>> > @@ -16,6 +17,12 @@ struct netns_vsock {
>> > 	u32 port;
>> >
>> > 	enum vsock_net_mode mode;
>> > -	enum vsock_net_mode child_ns_mode;
>> > +
>> > +	/* 0 (GLOBAL)
>> > +	 * 1 (LOCAL)
>> > +	 * 2 (GLOBAL + LOCKED)
>> > +	 * 3 (LOCAL + LOCKED)
>> > +	 */
>> > +	int child_ns_mode;
>>
>> Sorry, I don't like this too much, since it seems too complicated to read
>> and to maintain. If we really want to use just one variable, maybe we can
>> use -1 as UNSET for child_ns_mode. If it is UNSET, vsock_net_child_mode()
>> can just return `mode`, since that is the default we also documented; if
>> it's set, it means that it is locked with the specified value.
>>
>> Maybe with code is easier, I mean something like this:
>>
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index d3ff48a2fbe0..fcd5b538df35 100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -276,15 +276,25 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
>>  	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
>>  }
>> -static inline void vsock_net_set_child_mode(struct net *net,
>> +static inline bool vsock_net_set_child_mode(struct net *net,
>>  					    enum vsock_net_mode mode)
>>  {
>> -	WRITE_ONCE(net->vsock.child_ns_mode, mode);
>> +	int old = VSOCK_NET_CHILD_NS_UNSET;
>> +
>> +	if (try_cmpxchg(&net->vsock.child_ns_mode, &old, mode))
>> +		return true;
>> +
>> +	return old == mode;
>>  }
>>  static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
>>  {
>> -	return READ_ONCE(net->vsock.child_ns_mode);
>> +	int mode = READ_ONCE(net->vsock.child_ns_mode);
>> +
>> +	if (mode == VSOCK_NET_CHILD_NS_UNSET)
>> +		return net->vsock.mode;
>> +
>> +	return mode;
>>  }
>>  /* Return true if two namespaces pass the mode rules. Otherwise, return false.
>> diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>> index b34d69a22fa8..bf52baf7d7a7 100644
>> --- a/include/net/netns/vsock.h
>> +++ b/include/net/netns/vsock.h
>> @@ -9,6 +9,8 @@ enum vsock_net_mode {
>>  	VSOCK_NET_MODE_LOCAL,
>>  };
>> +#define VSOCK_NET_CHILD_NS_UNSET (-1)
>> +
>>  struct netns_vsock {
>>  	struct ctl_table_header *sysctl_hdr;
>> @@ -16,6 +18,13 @@ struct netns_vsock {
>>  	u32 port;
>>  	enum vsock_net_mode mode;
>> -	enum vsock_net_mode child_ns_mode;
>> +
>> +	/* Write-once child namespace mode, must be initialized to
>> +	 * VSOCK_NET_CHILD_NS_UNSET. Transitions once from UNSET to a
>> +	 * vsock_net_mode value via try_cmpxchg on first sysctl write.
>> +	 * While UNSET, vsock_net_child_mode() returns the namespace's
>> +	 * own mode since it's the default.
>> +	 */
>> +	int child_ns_mode;
>>  };
>>  #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index 9880756d9eff..f0cb7c6a8212 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -2853,7 +2853,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
>>  		    new_mode == VSOCK_NET_MODE_GLOBAL)
>>  			return -EPERM;
>> -		vsock_net_set_child_mode(net, new_mode);
>> +		if (!vsock_net_set_child_mode(net, new_mode))
>> +			return -EBUSY;
>>  	}
>>  	return 0;
>> @@ -2922,7 +2923,7 @@ static void vsock_net_init(struct net *net)
>>  	else
>>  		net->vsock.mode = vsock_net_child_mode(current->nsproxy->net_ns);
>> -	net->vsock.child_ns_mode = net->vsock.mode;
>> +	net->vsock.child_ns_mode = VSOCK_NET_CHILD_NS_UNSET;
>>  }
>>  static __net_init int vsock_sysctl_init_net(struct net *net)
>>
>> If you like it, please add my Co-developed-by and S-o-b.
>
>Will do!
>
>>
>> BTW, let's discuss it more here and agree before sending a new
>> version; this should also give others a chance to comment.
>>
>> Thanks,
>> Stefano
>
>Tbh, I like your original proposal from v1 best (copied below). I like
>that the whole locking mechanism is self-contained there in one place,
>and doesn't ripple out elsewhere into the code (e.g.,
>vsock_net_child_mode() carrying logic around UNSET). Wdyt?

Initially, yes, I liked that one too, especially because, being a patch 
for net, it remains very small and clear to read. But now, after 
spending some time on how to reuse `child_ns_mode` for that, I also like 
the last version I sent using UNSET so that we don't have the same 
information in two variables.

I'm truly conflicted, but I don't have a strong preference, so if you
prefer the one with `child_ns_mode_locked`, let's go with that; we can
always change it in the future.

Jakub, Paolo, any preference?

>
>static inline bool vsock_net_set_child_mode(struct net *net,
>					    enum vsock_net_mode mode)
>{
>	int new_locked = mode + 1;
>	int old_locked = 0;

If we are going to use this one, maybe a macro for 0, or a comment here
and on top of child_ns_mode_locked, would be better.

Thanks,
Stefano

>
>	if (try_cmpxchg(&net->vsock.child_ns_mode_locked,
>			&old_locked, new_locked)) {
>		WRITE_ONCE(net->vsock.child_ns_mode, mode);
>		return true;
>	}
>
>	return old_locked == new_locked;
>}
>
>
>Best,
>Bobby
>
>>
>> > };
>> > #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > index 9880756d9eff..50044a838c89 100644
>> > --- a/net/vmw_vsock/af_vsock.c
>> > +++ b/net/vmw_vsock/af_vsock.c
>> > @@ -90,16 +90,20 @@
>> >  *
>> >  *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
>> >  *     mode, which is set at namespace creation and immutable thereafter.
>> > - *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
>> > + *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
>> >  *     child namespaces will inherit when created. The initial value matches
>> >  *     the namespace's own ns_mode.
>> >  *
>> >  *   Changing child_ns_mode only affects newly created namespaces, not the
>> >  *   current namespace or existing children. A "local" namespace cannot set
>> > - *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
>> > - *   from the parent's child_ns_mode.
>> > + *   child_ns_mode to "global". child_ns_mode is write-once, so that it may be
>> > + *   configured and locked down by a namespace manager. Writing a different
>> > + *   value after the first write returns -EBUSY. At namespace creation, ns_mode
>> > + *   is inherited from the parent's child_ns_mode.
>> >  *
>> > - *   The init_net mode is "global" and cannot be modified.
>> > + *   The init_net mode is "global" and cannot be modified. The init_net
>> > + *   child_ns_mode is also write-once, so an init process (e.g. systemd) can
>> > + *   set it to "local" to ensure all new namespaces inherit local mode.
>> >  *
>> >  *   The modes affect the allocation and accessibility of CIDs as follows:
>> >  *
>> > @@ -2853,7 +2857,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
>> > 		    new_mode == VSOCK_NET_MODE_GLOBAL)
>> > 			return -EPERM;
>> >
>> > -		vsock_net_set_child_mode(net, new_mode);
>> > +		if (!vsock_net_set_child_mode(net, new_mode))
>> > +			return -EBUSY;
>> > 	}
>> >
>> > 	return 0;
>> >
>> > --
>> > 2.47.3
>> >
>>
>



end of thread

Thread overview: 10+ messages
2026-02-18 18:10 [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode Bobby Eshleman
2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
2026-02-19 10:35   ` Stefano Garzarella
2026-02-18 18:10 ` [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once Bobby Eshleman
2026-02-19 10:35   ` Stefano Garzarella
2026-02-19 16:20     ` Bobby Eshleman
2026-02-19 16:36       ` Stefano Garzarella
2026-02-18 18:10 ` [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl Bobby Eshleman
2026-02-19 10:36   ` Stefano Garzarella
2026-02-19 16:06     ` Bobby Eshleman
