* [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode
@ 2026-02-18 18:10 Bobby Eshleman
2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-18 18:10 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Shuah Khan,
Bobby Eshleman, Michael S. Tsirkin, Jonathan Corbet, Shuah Khan
Cc: virtualization, netdev, linux-kernel, kvm, linux-kselftest,
linux-doc, Daan De Meyer
Two administrator processes may race when setting child_ns_mode: one
sets it to "local" and creates a namespace, but another changes it to
"global" in between. The first process ends up with a namespace in the
wrong mode. Make child_ns_mode write-once so that a namespace manager
can set it once, check the value, and be guaranteed it won't change
before creating its namespaces. Writing a different value after the
first write returns -EBUSY.
One patch for the implementation, one for docs, and one for tests.
---
Changes in v2:
- break docs, tests, and implementation into separate patches
- clarify commit message
- only use child_ns_mode, do not add additional child_ns_mode_locked
variable
- add documentation to Documentation/
- Link to v1: https://lore.kernel.org/r/20260217-vsock-ns-write-once-v1-1-a1fb30f289a9@meta.com
---
Bobby Eshleman (3):
selftests/vsock: change tests to respect write-once child ns mode
vsock: lock down child_ns_mode as write-once
vsock: document write-once behavior of the child_ns_mode sysctl
Documentation/admin-guide/sysctl/net.rst | 10 ++++++---
include/net/af_vsock.h | 20 +++++++++++++++---
include/net/netns/vsock.h | 9 +++++++-
net/vmw_vsock/af_vsock.c | 15 +++++++++-----
tools/testing/selftests/vsock/vmtest.sh | 35 +++++++++++++++-----------------
5 files changed, 58 insertions(+), 31 deletions(-)
---
base-commit: ccd8e87748ad083047d6c8544c5809b7f96cc8df
change-id: 20260217-vsock-ns-write-once-8834d684e0a2
Best regards,
--
Bobby Eshleman <bobbyeshleman@meta.com>
* [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode
2026-02-18 18:10 [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode Bobby Eshleman
@ 2026-02-18 18:10 ` Bobby Eshleman
2026-02-19 10:35 ` Stefano Garzarella
2026-02-18 18:10 ` [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once Bobby Eshleman
2026-02-18 18:10 ` [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl Bobby Eshleman
2 siblings, 1 reply; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-18 18:10 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Shuah Khan,
Bobby Eshleman, Michael S. Tsirkin, Jonathan Corbet, Shuah Khan
Cc: virtualization, netdev, linux-kernel, kvm, linux-kselftest,
linux-doc
From: Bobby Eshleman <bobbyeshleman@meta.com>
The child_ns_mode sysctl parameter becomes write-once in a future patch
in this series, which breaks existing tests. This patch updates the
tests to respect this new policy. No additional tests are added.
Add "global-parent" and "local-parent" namespaces as intermediaries to
spawn namespaces in the given modes. This avoids the need to change
"child_ns_mode" in the init_ns. nsenter must be used because ip netns
unshares the mount namespace so nested "ip netns add" breaks exec calls
from the init ns. Adds nsenter to the deps check.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 35 +++++++++++++++------------------
1 file changed, 16 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index dc8dbe74a6d0..e1e78b295e41 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -210,16 +210,17 @@ check_result() {
}
add_namespaces() {
- local orig_mode
- orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
+ ip netns add "global-parent" 2>/dev/null
+ echo "global" | ip netns exec "global-parent" \
+ tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
+ ip netns add "local-parent" 2>/dev/null
+ echo "local" | ip netns exec "local-parent" \
+ tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
- for mode in "${NS_MODES[@]}"; do
- echo "${mode}" > /proc/sys/net/vsock/child_ns_mode
- ip netns add "${mode}0" 2>/dev/null
- ip netns add "${mode}1" 2>/dev/null
- done
-
- echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
+ nsenter --net=/var/run/netns/global-parent ip netns add "global0" 2>/dev/null
+ nsenter --net=/var/run/netns/global-parent ip netns add "global1" 2>/dev/null
+ nsenter --net=/var/run/netns/local-parent ip netns add "local0" 2>/dev/null
+ nsenter --net=/var/run/netns/local-parent ip netns add "local1" 2>/dev/null
}
init_namespaces() {
@@ -237,6 +238,8 @@ del_namespaces() {
log_host "removed ns ${mode}0"
log_host "removed ns ${mode}1"
done
+ ip netns del "global-parent" &>/dev/null
+ ip netns del "local-parent" &>/dev/null
}
vm_ssh() {
@@ -287,7 +290,7 @@ check_args() {
}
check_deps() {
- for dep in vng ${QEMU} busybox pkill ssh ss socat; do
+ for dep in vng ${QEMU} busybox pkill ssh ss socat nsenter; do
if [[ ! -x $(command -v "${dep}") ]]; then
echo -e "skip: dependency ${dep} not found!\n"
exit "${KSFT_SKIP}"
@@ -1231,12 +1234,8 @@ test_ns_local_same_cid_ok() {
}
test_ns_host_vsock_child_ns_mode_ok() {
- local orig_mode
- local rc
-
- orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
+ local rc="${KSFT_PASS}"
- rc="${KSFT_PASS}"
for mode in "${NS_MODES[@]}"; do
local ns="${mode}0"
@@ -1246,15 +1245,13 @@ test_ns_host_vsock_child_ns_mode_ok() {
continue
fi
- if ! echo "${mode}" > /proc/sys/net/vsock/child_ns_mode; then
- log_host "child_ns_mode should be writable to ${mode}"
+ if ! echo "${mode}" | ip netns exec "${ns}" \
+ tee /proc/sys/net/vsock/child_ns_mode &>/dev/null; then
rc="${KSFT_FAIL}"
continue
fi
done
- echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
-
return "${rc}"
}
--
2.47.3
* [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once
2026-02-18 18:10 [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode Bobby Eshleman
2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
@ 2026-02-18 18:10 ` Bobby Eshleman
2026-02-19 10:35 ` Stefano Garzarella
2026-02-18 18:10 ` [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl Bobby Eshleman
2 siblings, 1 reply; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-18 18:10 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Shuah Khan,
Bobby Eshleman, Michael S. Tsirkin, Jonathan Corbet, Shuah Khan
Cc: virtualization, netdev, linux-kernel, kvm, linux-kselftest,
linux-doc, Daan De Meyer
From: Bobby Eshleman <bobbyeshleman@meta.com>
Two administrator processes may race when setting child_ns_mode as one
process sets child_ns_mode to "local" and then creates a namespace, but
another process changes child_ns_mode to "global" between the write and
the namespace creation. The first process ends up with a namespace in
"global" mode instead of "local". While this can be detected after the
fact by reading ns_mode and retrying, it is fragile and error-prone.
Make child_ns_mode write-once so that a namespace manager can set it
once and be sure it won't change. Writing a different value after the
first write returns -EBUSY. This applies to all namespaces, including
init_net, where an init process can write "local" to lock all future
namespaces into local mode.
Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
---
include/net/af_vsock.h | 20 +++++++++++++++++---
include/net/netns/vsock.h | 9 ++++++++-
net/vmw_vsock/af_vsock.c | 15 ++++++++++-----
3 files changed, 35 insertions(+), 9 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d3ff48a2fbe0..9bd42147626d 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -276,15 +276,29 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
}
-static inline void vsock_net_set_child_mode(struct net *net,
+static inline bool vsock_net_set_child_mode(struct net *net,
enum vsock_net_mode mode)
{
- WRITE_ONCE(net->vsock.child_ns_mode, mode);
+ int locked = mode + VSOCK_NET_MODE_LOCKED;
+ int cur;
+
+ cur = READ_ONCE(net->vsock.child_ns_mode);
+ if (cur == locked)
+ return true;
+ if (cur >= VSOCK_NET_MODE_LOCKED)
+ return false;
+
+ if (try_cmpxchg(&net->vsock.child_ns_mode, &cur, locked))
+ return true;
+
+ return cur == locked;
}
static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
{
- return READ_ONCE(net->vsock.child_ns_mode);
+ int mode = READ_ONCE(net->vsock.child_ns_mode);
+
+ return mode & (VSOCK_NET_MODE_LOCKED - 1);
}
/* Return true if two namespaces pass the mode rules. Otherwise, return false.
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
index b34d69a22fa8..d20ab6269342 100644
--- a/include/net/netns/vsock.h
+++ b/include/net/netns/vsock.h
@@ -7,6 +7,7 @@
enum vsock_net_mode {
VSOCK_NET_MODE_GLOBAL,
VSOCK_NET_MODE_LOCAL,
+ VSOCK_NET_MODE_LOCKED,
};
struct netns_vsock {
@@ -16,6 +17,12 @@ struct netns_vsock {
u32 port;
enum vsock_net_mode mode;
- enum vsock_net_mode child_ns_mode;
+
+ /* 0 (GLOBAL)
+ * 1 (LOCAL)
+ * 2 (GLOBAL + LOCKED)
+ * 3 (LOCAL + LOCKED)
+ */
+ int child_ns_mode;
};
#endif /* __NET_NET_NAMESPACE_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 9880756d9eff..50044a838c89 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -90,16 +90,20 @@
*
* - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
* mode, which is set at namespace creation and immutable thereafter.
- * - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
+ * - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
* child namespaces will inherit when created. The initial value matches
* the namespace's own ns_mode.
*
* Changing child_ns_mode only affects newly created namespaces, not the
* current namespace or existing children. A "local" namespace cannot set
- * child_ns_mode to "global". At namespace creation, ns_mode is inherited
- * from the parent's child_ns_mode.
+ * child_ns_mode to "global". child_ns_mode is write-once, so that it may be
+ * configured and locked down by a namespace manager. Writing a different
+ * value after the first write returns -EBUSY. At namespace creation, ns_mode
+ * is inherited from the parent's child_ns_mode.
*
- * The init_net mode is "global" and cannot be modified.
+ * The init_net mode is "global" and cannot be modified. The init_net
+ * child_ns_mode is also write-once, so an init process (e.g. systemd) can
+ * set it to "local" to ensure all new namespaces inherit local mode.
*
* The modes affect the allocation and accessibility of CIDs as follows:
*
@@ -2853,7 +2857,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
new_mode == VSOCK_NET_MODE_GLOBAL)
return -EPERM;
- vsock_net_set_child_mode(net, new_mode);
+ if (!vsock_net_set_child_mode(net, new_mode))
+ return -EBUSY;
}
return 0;
--
2.47.3
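As a rough aid for reading the patch above, the v2 encoding (mode 0 or 1, plus a LOCKED offset of 2 once the first write lands) can be modelled in userspace with C11 atomics. This is only a sketch: the `v2_*` names are made up for illustration, and `atomic_compare_exchange_strong()` stands in for the kernel's `try_cmpxchg()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of the v2 encoding; all names here are illustrative. */
enum v2_mode {
	V2_MODE_GLOBAL = 0,
	V2_MODE_LOCAL  = 1,
	V2_MODE_LOCKED = 2,	/* offset added once the value is locked */
};

/* Holds 0..3: the mode, plus V2_MODE_LOCKED after the first write. */
struct v2_ns {
	atomic_int child_ns_mode;
};

/* Returns true if the value is now (or already was) locked to 'mode'. */
static bool v2_set_child_mode(struct v2_ns *ns, enum v2_mode mode)
{
	int locked = (int)mode + V2_MODE_LOCKED;
	int cur = atomic_load(&ns->child_ns_mode);

	if (cur == locked)
		return true;		/* same value written again */
	if (cur >= V2_MODE_LOCKED)
		return false;		/* already locked to the other mode */

	/* On failure, 'cur' is updated to the value that won the race. */
	if (atomic_compare_exchange_strong(&ns->child_ns_mode, &cur, locked))
		return true;

	return cur == locked;
}

/* Strip the lock offset to recover the plain mode. */
static enum v2_mode v2_child_mode(struct v2_ns *ns)
{
	return atomic_load(&ns->child_ns_mode) & (V2_MODE_LOCKED - 1);
}
```

In this model a caller would map a `false` return to -EBUSY, as the sysctl handler in the patch does.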
* [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl
2026-02-18 18:10 [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode Bobby Eshleman
2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
2026-02-18 18:10 ` [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once Bobby Eshleman
@ 2026-02-18 18:10 ` Bobby Eshleman
2026-02-19 10:36 ` Stefano Garzarella
2 siblings, 1 reply; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-18 18:10 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Shuah Khan,
Bobby Eshleman, Michael S. Tsirkin, Jonathan Corbet, Shuah Khan
Cc: virtualization, netdev, linux-kernel, kvm, linux-kselftest,
linux-doc
From: Bobby Eshleman <bobbyeshleman@meta.com>
Update the vsock child_ns_mode documentation to include the new the
write-once semantics of setting child_ns_mode. The semantics are
implemented in a different patch in this series.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Documentation/admin-guide/sysctl/net.rst | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index c10530624f1e..976a176fb451 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -581,9 +581,9 @@ The init_net mode is always ``global``.
child_ns_mode
-------------
-Controls what mode newly created child namespaces will inherit. At namespace
-creation, ``ns_mode`` is inherited from the parent's ``child_ns_mode``. The
-initial value matches the namespace's own ``ns_mode``.
+Write-once. Controls what mode newly created child namespaces will inherit. At
+namespace creation, ``ns_mode`` is inherited from the parent's
+``child_ns_mode``. The initial value matches the namespace's own ``ns_mode``.
Values:
@@ -594,6 +594,10 @@ Values:
their sockets will only be able to connect within their own
namespace.
+``child_ns_mode`` can only be written once per namespace. Writing the same
+value that is already set succeeds. Writing a different value after the first
+write returns ``-EBUSY``.
+
Changing ``child_ns_mode`` only affects namespaces created after the change;
it does not modify the current namespace or any existing children.
--
2.47.3
* Re: [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode
2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
@ 2026-02-19 10:35 ` Stefano Garzarella
0 siblings, 0 replies; 10+ messages in thread
From: Stefano Garzarella @ 2026-02-19 10:35 UTC (permalink / raw)
To: Bobby Eshleman
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
netdev, linux-kernel, kvm, linux-kselftest, linux-doc
On Wed, Feb 18, 2026 at 10:10:36AM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>The child_ns_mode sysctl parameter becomes write-once in a future patch
>in this series, which breaks existing tests. This patch updates the
>tests to respect this new policy. No additional tests are added.
>
>Add "global-parent" and "local-parent" namespaces as intermediaries to
>spawn namespaces in the given modes. This avoids the need to change
>"child_ns_mode" in the init_ns. nsenter must be used because ip netns
>unshares the mount namespace so nested "ip netns add" breaks exec calls
>from the init ns. Adds nsenter to the deps check.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
> tools/testing/selftests/vsock/vmtest.sh | 35 +++++++++++++++------------------
> 1 file changed, 16 insertions(+), 19 deletions(-)
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
>diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
>index dc8dbe74a6d0..e1e78b295e41 100755
>--- a/tools/testing/selftests/vsock/vmtest.sh
>+++ b/tools/testing/selftests/vsock/vmtest.sh
>@@ -210,16 +210,17 @@ check_result() {
> }
>
> add_namespaces() {
>- local orig_mode
>- orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
>+ ip netns add "global-parent" 2>/dev/null
>+ echo "global" | ip netns exec "global-parent" \
>+ tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
>+ ip netns add "local-parent" 2>/dev/null
>+ echo "local" | ip netns exec "local-parent" \
>+ tee /proc/sys/net/vsock/child_ns_mode &>/dev/null
>
>- for mode in "${NS_MODES[@]}"; do
>- echo "${mode}" > /proc/sys/net/vsock/child_ns_mode
>- ip netns add "${mode}0" 2>/dev/null
>- ip netns add "${mode}1" 2>/dev/null
>- done
>-
>- echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
>+ nsenter --net=/var/run/netns/global-parent ip netns add "global0" 2>/dev/null
>+ nsenter --net=/var/run/netns/global-parent ip netns add "global1" 2>/dev/null
>+ nsenter --net=/var/run/netns/local-parent ip netns add "local0" 2>/dev/null
>+ nsenter --net=/var/run/netns/local-parent ip netns add "local1" 2>/dev/null
> }
>
> init_namespaces() {
>@@ -237,6 +238,8 @@ del_namespaces() {
> log_host "removed ns ${mode}0"
> log_host "removed ns ${mode}1"
> done
>+ ip netns del "global-parent" &>/dev/null
>+ ip netns del "local-parent" &>/dev/null
> }
>
> vm_ssh() {
>@@ -287,7 +290,7 @@ check_args() {
> }
>
> check_deps() {
>- for dep in vng ${QEMU} busybox pkill ssh ss socat; do
>+ for dep in vng ${QEMU} busybox pkill ssh ss socat nsenter; do
> if [[ ! -x $(command -v "${dep}") ]]; then
> echo -e "skip: dependency ${dep} not found!\n"
> exit "${KSFT_SKIP}"
>@@ -1231,12 +1234,8 @@ test_ns_local_same_cid_ok() {
> }
>
> test_ns_host_vsock_child_ns_mode_ok() {
>- local orig_mode
>- local rc
>-
>- orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
>+ local rc="${KSFT_PASS}"
>
>- rc="${KSFT_PASS}"
> for mode in "${NS_MODES[@]}"; do
> local ns="${mode}0"
>
>@@ -1246,15 +1245,13 @@ test_ns_host_vsock_child_ns_mode_ok() {
> continue
> fi
>
>- if ! echo "${mode}" > /proc/sys/net/vsock/child_ns_mode; then
>- log_host "child_ns_mode should be writable to ${mode}"
>+ if ! echo "${mode}" | ip netns exec "${ns}" \
>+ tee /proc/sys/net/vsock/child_ns_mode &>/dev/null; then
> rc="${KSFT_FAIL}"
> continue
> fi
> done
>
>- echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
>-
> return "${rc}"
> }
>
>
>--
>2.47.3
>
* Re: [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once
2026-02-18 18:10 ` [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once Bobby Eshleman
@ 2026-02-19 10:35 ` Stefano Garzarella
2026-02-19 16:20 ` Bobby Eshleman
0 siblings, 1 reply; 10+ messages in thread
From: Stefano Garzarella @ 2026-02-19 10:35 UTC (permalink / raw)
To: Bobby Eshleman
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
netdev, linux-kernel, kvm, linux-kselftest, linux-doc,
Daan De Meyer
On Wed, Feb 18, 2026 at 10:10:37AM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Two administrator processes may race when setting child_ns_mode as one
>process sets child_ns_mode to "local" and then creates a namespace, but
>another process changes child_ns_mode to "global" between the write and
>the namespace creation. The first process ends up with a namespace in
>"global" mode instead of "local". While this can be detected after the
>fact by reading ns_mode and retrying, it is fragile and error-prone.
>
>Make child_ns_mode write-once so that a namespace manager can set it
>once and be sure it won't change. Writing a different value after the
>first write returns -EBUSY. This applies to all namespaces, including
>init_net, where an init process can write "local" to lock all future
>namespaces into local mode.
>
>Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
nit: usually the author's S-o-b is the last tag when sending a patch.
>---
> include/net/af_vsock.h | 20 +++++++++++++++++---
> include/net/netns/vsock.h | 9 ++++++++-
> net/vmw_vsock/af_vsock.c | 15 ++++++++++-----
> 3 files changed, 35 insertions(+), 9 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index d3ff48a2fbe0..9bd42147626d 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -276,15 +276,29 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
> return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> }
>
>-static inline void vsock_net_set_child_mode(struct net *net,
>+static inline bool vsock_net_set_child_mode(struct net *net,
> enum vsock_net_mode mode)
> {
>- WRITE_ONCE(net->vsock.child_ns_mode, mode);
>+ int locked = mode + VSOCK_NET_MODE_LOCKED;
>+ int cur;
>+
>+ cur = READ_ONCE(net->vsock.child_ns_mode);
>+ if (cur == locked)
>+ return true;
>+ if (cur >= VSOCK_NET_MODE_LOCKED)
>+ return false;
>+
>+ if (try_cmpxchg(&net->vsock.child_ns_mode, &cur, locked))
>+ return true;
>+
>+ return cur == locked;
Sorry, it took me a while to get it entirely :-(
This overcomplication is exactly what I wanted to avoid when I proposed
the change in v1:
https://lore.kernel.org/netdev/aZWUmbiH11Eh3Y4v@sgarzare-redhat/
> }
>
> static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
> {
>- return READ_ONCE(net->vsock.child_ns_mode);
>+ int mode = READ_ONCE(net->vsock.child_ns_mode);
>+
>+ return mode & (VSOCK_NET_MODE_LOCKED - 1);
This works only because VSOCK_NET_MODE_LOCKED == 2, so IMO the value
should at least be set explicitly in the enum and documented on top of
vsock_net_mode.
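To make the concern concrete, here is a small standalone sketch (hypothetical helper names, not kernel code) of the same mask arithmetic. It assumes that adding a third mode to the enum would push the implicit LOCKED value from 2 to 3, at which point `LOCKED - 1` is no longer a mask covering the mode values.

```c
/* Illustrative helpers only; 'locked' plays the role of VSOCK_NET_MODE_LOCKED. */

/* Encode a locked value the way v2 does: mode plus the LOCKED offset. */
static int encode_locked(int mode, int locked)
{
	return mode + locked;
}

/* Decode the way v2 does: mask with (LOCKED - 1). */
static int decode_with_mask(int encoded, int locked)
{
	return encoded & (locked - 1);
}
```

With `locked == 2` the round trip is exact for modes 0 and 1. With `locked == 3`, mode 1 encodes to 4, and `4 & 2` decodes to 0, so a locked "local" would read back as "global".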
> }
>
> /* Return true if two namespaces pass the mode rules. Otherwise, return false.
>diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>index b34d69a22fa8..d20ab6269342 100644
>--- a/include/net/netns/vsock.h
>+++ b/include/net/netns/vsock.h
>@@ -7,6 +7,7 @@
> enum vsock_net_mode {
> VSOCK_NET_MODE_GLOBAL,
> VSOCK_NET_MODE_LOCAL,
>+ VSOCK_NET_MODE_LOCKED,
This is not really a mode, so IMO it should not be part of `enum
vsock_net_mode`. If you really want it, maybe we can add both
VSOCK_NET_MODE_GLOBAL_LOCKED and VSOCK_NET_MODE_LOCAL_LOCKED, which would
be less error prone if we touch this enum one day.
> };
>
> struct netns_vsock {
>@@ -16,6 +17,12 @@ struct netns_vsock {
> u32 port;
>
> enum vsock_net_mode mode;
>- enum vsock_net_mode child_ns_mode;
>+
>+ /* 0 (GLOBAL)
>+ * 1 (LOCAL)
>+ * 2 (GLOBAL + LOCKED)
>+ * 3 (LOCAL + LOCKED)
>+ */
>+ int child_ns_mode;
Sorry, I don't like this too much, since it seems too complicated to
read and to maintain. If we really want to use just one variable, maybe
we can use -1 as UNSET for child_ns_mode. If it is UNSET,
vsock_net_child_mode() can just return `mode`, since that is the default
we also documented; if it is set, it means that it is locked to the
value specified.
Maybe it is easier with code; I mean something like this:
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d3ff48a2fbe0..fcd5b538df35 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -276,15 +276,25 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
}
-static inline void vsock_net_set_child_mode(struct net *net,
+static inline bool vsock_net_set_child_mode(struct net *net,
enum vsock_net_mode mode)
{
- WRITE_ONCE(net->vsock.child_ns_mode, mode);
+ int old = VSOCK_NET_CHILD_NS_UNSET;
+
+ if (try_cmpxchg(&net->vsock.child_ns_mode, &old, mode))
+ return true;
+
+ return old == mode;
}
static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
{
- return READ_ONCE(net->vsock.child_ns_mode);
+ int mode = READ_ONCE(net->vsock.child_ns_mode);
+
+ if (mode == VSOCK_NET_CHILD_NS_UNSET)
+ return net->vsock.mode;
+
+ return mode;
}
/* Return true if two namespaces pass the mode rules. Otherwise, return false.
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
index b34d69a22fa8..bf52baf7d7a7 100644
--- a/include/net/netns/vsock.h
+++ b/include/net/netns/vsock.h
@@ -9,6 +9,8 @@ enum vsock_net_mode {
VSOCK_NET_MODE_LOCAL,
};
+#define VSOCK_NET_CHILD_NS_UNSET (-1)
+
struct netns_vsock {
struct ctl_table_header *sysctl_hdr;
@@ -16,6 +18,13 @@ struct netns_vsock {
u32 port;
enum vsock_net_mode mode;
- enum vsock_net_mode child_ns_mode;
+
+ /* Write-once child namespace mode, must be initialized to
+ * VSOCK_NET_CHILD_NS_UNSET. Transitions once from UNSET to a
+ * vsock_net_mode value via try_cmpxchg on first sysctl write.
+ * While UNSET, vsock_net_child_mode() returns the namespace's
+ * own mode since it's the default.
+ */
+ int child_ns_mode;
};
#endif /* __NET_NET_NAMESPACE_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 9880756d9eff..f0cb7c6a8212 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2853,7 +2853,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
new_mode == VSOCK_NET_MODE_GLOBAL)
return -EPERM;
- vsock_net_set_child_mode(net, new_mode);
+ if (!vsock_net_set_child_mode(net, new_mode))
+ return -EBUSY;
}
return 0;
@@ -2922,7 +2923,7 @@ static void vsock_net_init(struct net *net)
else
net->vsock.mode = vsock_net_child_mode(current->nsproxy->net_ns);
- net->vsock.child_ns_mode = net->vsock.mode;
+ net->vsock.child_ns_mode = VSOCK_NET_CHILD_NS_UNSET;
}
static __net_init int vsock_sysctl_init_net(struct net *net)
If you like it, please add my Co-developed-by and S-o-b.
BTW, let's discuss it more here and agree before sending a new
version; this should also give others a chance to comment.
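For readers who want to experiment with the UNSET idea outside the kernel, a minimal userspace model might look like the following. It is only a sketch under stated assumptions: the `unset_*` names are invented here, and C11 `atomic_compare_exchange_strong()` is used in place of `try_cmpxchg()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of the sentinel approach; all names are illustrative. */
enum unset_mode { UNSET_MODE_GLOBAL = 0, UNSET_MODE_LOCAL = 1 };

#define UNSET_CHILD_NS_UNSET (-1)

struct unset_ns {
	enum unset_mode mode;		/* namespace's own mode, fixed at creation */
	atomic_int child_ns_mode;	/* UNSET until the first write */
};

static void unset_ns_init(struct unset_ns *ns, enum unset_mode mode)
{
	ns->mode = mode;
	atomic_init(&ns->child_ns_mode, UNSET_CHILD_NS_UNSET);
}

/* First writer wins; later writers succeed only if they agree with it. */
static bool unset_set_child_mode(struct unset_ns *ns, enum unset_mode mode)
{
	int old = UNSET_CHILD_NS_UNSET;

	if (atomic_compare_exchange_strong(&ns->child_ns_mode, &old, (int)mode))
		return true;

	/* 'old' now holds the value that was already stored. */
	return old == (int)mode;
}

/* While UNSET, fall back to the namespace's own mode (the default). */
static enum unset_mode unset_child_mode(struct unset_ns *ns)
{
	int mode = atomic_load(&ns->child_ns_mode);

	if (mode == UNSET_CHILD_NS_UNSET)
		return ns->mode;

	return (enum unset_mode)mode;
}
```

The model has a single transition, from UNSET to a mode, so no lock offset or mask is needed.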
Thanks,
Stefano
> };
> #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 9880756d9eff..50044a838c89 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -90,16 +90,20 @@
> *
> * - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
> * mode, which is set at namespace creation and immutable thereafter.
>- * - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
>+ * - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
> * child namespaces will inherit when created. The initial value matches
> * the namespace's own ns_mode.
> *
> * Changing child_ns_mode only affects newly created namespaces, not the
> * current namespace or existing children. A "local" namespace cannot set
>- * child_ns_mode to "global". At namespace creation, ns_mode is inherited
>- * from the parent's child_ns_mode.
>+ * child_ns_mode to "global". child_ns_mode is write-once, so that it may be
>+ * configured and locked down by a namespace manager. Writing a different
>+ * value after the first write returns -EBUSY. At namespace creation, ns_mode
>+ * is inherited from the parent's child_ns_mode.
> *
>- * The init_net mode is "global" and cannot be modified.
>+ * The init_net mode is "global" and cannot be modified. The init_net
>+ * child_ns_mode is also write-once, so an init process (e.g. systemd) can
>+ * set it to "local" to ensure all new namespaces inherit local mode.
> *
> * The modes affect the allocation and accessibility of CIDs as follows:
> *
>@@ -2853,7 +2857,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> new_mode == VSOCK_NET_MODE_GLOBAL)
> return -EPERM;
>
>- vsock_net_set_child_mode(net, new_mode);
>+ if (!vsock_net_set_child_mode(net, new_mode))
>+ return -EBUSY;
> }
>
> return 0;
>
>--
>2.47.3
>
* Re: [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl
2026-02-18 18:10 ` [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl Bobby Eshleman
@ 2026-02-19 10:36 ` Stefano Garzarella
2026-02-19 16:06 ` Bobby Eshleman
0 siblings, 1 reply; 10+ messages in thread
From: Stefano Garzarella @ 2026-02-19 10:36 UTC (permalink / raw)
To: Bobby Eshleman
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
netdev, linux-kernel, kvm, linux-kselftest, linux-doc
On Wed, Feb 18, 2026 at 10:10:38AM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Update the vsock child_ns_mode documentation to include the new the
nit: s/the new the/the new
>write-once semantics of setting child_ns_mode. The semantics are
>implemented in a different patch in this series.
s/different/preceding ?
IMO this can be squashed with the previous patch, but I'm not sure what
the netdev policy is about that. Not a strong opinion; it's also fine this way.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
> Documentation/admin-guide/sysctl/net.rst | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
>diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
>index c10530624f1e..976a176fb451 100644
>--- a/Documentation/admin-guide/sysctl/net.rst
>+++ b/Documentation/admin-guide/sysctl/net.rst
>@@ -581,9 +581,9 @@ The init_net mode is always ``global``.
> child_ns_mode
> -------------
>
>-Controls what mode newly created child namespaces will inherit. At namespace
>-creation, ``ns_mode`` is inherited from the parent's ``child_ns_mode``. The
>-initial value matches the namespace's own ``ns_mode``.
>+Write-once. Controls what mode newly created child namespaces will inherit. At
>+namespace creation, ``ns_mode`` is inherited from the parent's
>+``child_ns_mode``. The initial value matches the namespace's own ``ns_mode``.
>
> Values:
>
>@@ -594,6 +594,10 @@ Values:
> their sockets will only be able to connect within their own
> namespace.
>
>+``child_ns_mode`` can only be written once per namespace. Writing the same
>+value that is already set succeeds. Writing a different value after the first
>+write returns ``-EBUSY``.
nit: instead of saying that it can only be written once, we could say
that the first write locks the value, to be closer to the actual
behavior, something like this:
The first write to ``child_ns_mode`` locks its value. Subsequent
writes of the same value succeed, but writing a different value
returns ``-EBUSY``.
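From a namespace manager's point of view, the behavior described above could be consumed roughly as follows. This is a hedged sketch: `write_child_ns_mode()` is a hypothetical helper, and the caller is assumed to pass /proc/sys/net/vsock/child_ns_mode as the path.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'mode' ("global" or "local") to the sysctl
 * file at 'path'. Returns 0 on success or a negative errno value.
 */
static int write_child_ns_mode(const char *path, const char *mode)
{
	size_t len = strlen(mode);
	ssize_t n;
	int err = 0;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, mode, len);
	if (n < 0)
		err = -errno;		/* e.g. -EBUSY or -EPERM from the sysctl */
	else if ((size_t)n != len)
		err = -EIO;		/* short write, treated as an error here */

	close(fd);
	return err;
}
```

A caller would treat 0 as "the value is locked to what I asked for", -EBUSY as "a different value was locked first", and -EPERM as the existing rule that a "local" namespace cannot set "global".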
Thanks,
Stefano
>+
> Changing ``child_ns_mode`` only affects namespaces created after the change;
> it does not modify the current namespace or any existing children.
>
>
>--
>2.47.3
>
* Re: [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl
2026-02-19 10:36 ` Stefano Garzarella
@ 2026-02-19 16:06 ` Bobby Eshleman
0 siblings, 0 replies; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-19 16:06 UTC (permalink / raw)
To: Stefano Garzarella
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
netdev, linux-kernel, kvm, linux-kselftest, linux-doc
On Thu, Feb 19, 2026 at 11:36:40AM +0100, Stefano Garzarella wrote:
> On Wed, Feb 18, 2026 at 10:10:38AM -0800, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> >
> > Update the vsock child_ns_mode documentation to include the new the
>
> nit: s/the new the/the new
>
> > write-once semantics of setting child_ns_mode. The semantics are
> > implemented in a different patch in this series.
>
> s/different/preceding ?
>
> IMO this can be squashed with the previous patch, but not sure netdev policy
> about that. Not a strong opinion, it's fine also in this way.
>
> >
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > Documentation/admin-guide/sysctl/net.rst | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
> > index c10530624f1e..976a176fb451 100644
> > --- a/Documentation/admin-guide/sysctl/net.rst
> > +++ b/Documentation/admin-guide/sysctl/net.rst
> > @@ -581,9 +581,9 @@ The init_net mode is always ``global``.
> > child_ns_mode
> > -------------
> >
> > -Controls what mode newly created child namespaces will inherit. At namespace
> > -creation, ``ns_mode`` is inherited from the parent's ``child_ns_mode``. The
> > -initial value matches the namespace's own ``ns_mode``.
> > +Write-once. Controls what mode newly created child namespaces will inherit. At
> > +namespace creation, ``ns_mode`` is inherited from the parent's
> > +``child_ns_mode``. The initial value matches the namespace's own ``ns_mode``.
> >
> > Values:
> >
> > @@ -594,6 +594,10 @@ Values:
> > their sockets will only be able to connect within their own
> > namespace.
> >
> > +``child_ns_mode`` can only be written once per namespace. Writing the same
> > +value that is already set succeeds. Writing a different value after the first
> > +write returns ``-EBUSY``.
>
> nit: instead of saying that it can only be written once, we could say that
> the first write locks the value, to be closer to the actual behavior,
> something like this:
>
> The first write to ``child_ns_mode`` locks its value. Subsequent
> writes of the same value succeed, but writing a different value
> returns ``-EBUSY``.
>
>
> Thanks,
> Stefano
Sounds good! I agree that is more clear. I'll also remove the change
above that adds "Write-once" at the beginning of the paragraph, since
this clause does a better job explaining how it actually works.
>
> > +
> > Changing ``child_ns_mode`` only affects namespaces created after the change;
> > it does not modify the current namespace or any existing children.
> >
> >
> > --
> > 2.47.3
> >
>
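For readers following along from userspace, here is a minimal sketch of how a namespace manager might write the sysctl and interpret the result under the semantics proposed in this thread. The helper name set_child_ns_mode() and the macro are hypothetical and only for illustration; the path is the one quoted in the patch, and the -EBUSY behavior is what the series proposes, not something this snippet can verify.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* sysctl path as quoted in the patch; the helper takes a path argument
 * so it can be exercised against any file. */
#define VSOCK_CHILD_NS_MODE_PATH "/proc/sys/net/vsock/child_ns_mode"

/*
 * Hypothetical helper: write `mode` ("local" or "global") to `path`.
 * Returns 0 on success or a negative errno value. Under the proposed
 * semantics, -EBUSY would mean a different value was already locked in.
 */
static int set_child_ns_mode(const char *path, const char *mode)
{
	size_t len = strlen(mode);
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, mode, len);
	if (n < 0)
		err = -errno;		/* e.g. -EBUSY or -EPERM from the sysctl handler */
	else if ((size_t)n != len)
		err = -EIO;		/* short write: treat as a failure */

	close(fd);
	return err;
}
```

A caller would pass VSOCK_CHILD_NS_MODE_PATH and "local", create its namespaces only when the return value is 0, and treat -EBUSY as "someone else already chose a different mode".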
* Re: [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once
2026-02-19 10:35 ` Stefano Garzarella
@ 2026-02-19 16:20 ` Bobby Eshleman
2026-02-19 16:36 ` Stefano Garzarella
0 siblings, 1 reply; 10+ messages in thread
From: Bobby Eshleman @ 2026-02-19 16:20 UTC (permalink / raw)
To: Stefano Garzarella
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
netdev, linux-kernel, kvm, linux-kselftest, linux-doc,
Daan De Meyer
On Thu, Feb 19, 2026 at 11:35:52AM +0100, Stefano Garzarella wrote:
> On Wed, Feb 18, 2026 at 10:10:37AM -0800, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> >
> > Two administrator processes may race when setting child_ns_mode as one
> > process sets child_ns_mode to "local" and then creates a namespace, but
> > another process changes child_ns_mode to "global" between the write and
> > the namespace creation. The first process ends up with a namespace in
> > "global" mode instead of "local". While this can be detected after the
> > fact by reading ns_mode and retrying, it is fragile and error-prone.
> >
> > Make child_ns_mode write-once so that a namespace manager can set it
> > once and be sure it won't change. Writing a different value after the
> > first write returns -EBUSY. This applies to all namespaces, including
> > init_net, where an init process can write "local" to lock all future
> > namespaces into local mode.
> >
> > Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
> > Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>
> nit: usually the S-o-b of the author is the last when sending a patch.
Ah good to know, thanks. Will change.
>
> > ---
> > include/net/af_vsock.h | 20 +++++++++++++++++---
> > include/net/netns/vsock.h | 9 ++++++++-
> > net/vmw_vsock/af_vsock.c | 15 ++++++++++-----
> > 3 files changed, 35 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index d3ff48a2fbe0..9bd42147626d 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -276,15 +276,29 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
> > return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> > }
> >
> > -static inline void vsock_net_set_child_mode(struct net *net,
> > +static inline bool vsock_net_set_child_mode(struct net *net,
> > enum vsock_net_mode mode)
> > {
> > - WRITE_ONCE(net->vsock.child_ns_mode, mode);
> > + int locked = mode + VSOCK_NET_MODE_LOCKED;
> > + int cur;
> > +
> > + cur = READ_ONCE(net->vsock.child_ns_mode);
> > + if (cur == locked)
> > + return true;
> > + if (cur >= VSOCK_NET_MODE_LOCKED)
> > + return false;
> > +
> > + if (try_cmpxchg(&net->vsock.child_ns_mode, &cur, locked))
> > + return true;
> > +
> > + return cur == locked;
>
> Sorry, it took me a while to get it entirely :-(
> This overcomplication is exactly what I wanted to avoid when I proposed the
> change in v1:
> https://lore.kernel.org/netdev/aZWUmbiH11Eh3Y4v@sgarzare-redhat/
Glad you thought so too, because I actually think your original proposed
snippet in that thread is the best/simplest so far.
>
>
> > }
> >
> > static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
> > {
> > - return READ_ONCE(net->vsock.child_ns_mode);
> > + int mode = READ_ONCE(net->vsock.child_ns_mode);
> > +
> > + return mode & (VSOCK_NET_MODE_LOCKED - 1);
>
> This is working just because VSOCK_NET_MODE_LOCKED == 2, so IMO this should
> at least set as value in the enum and documented on top of vsock_net_mode.
>
> > }
> >
> > /* Return true if two namespaces pass the mode rules. Otherwise, return false.
> > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
> > index b34d69a22fa8..d20ab6269342 100644
> > --- a/include/net/netns/vsock.h
> > +++ b/include/net/netns/vsock.h
> > @@ -7,6 +7,7 @@
> > enum vsock_net_mode {
> > VSOCK_NET_MODE_GLOBAL,
> > VSOCK_NET_MODE_LOCAL,
> > + VSOCK_NET_MODE_LOCKED,
>
> This is not really a mode, so IMO should not be part of `enum
> vsock_net_mode`. If you really want it, maybe we can add both
> VSOCK_NET_MODE_GLOBAL_LOCKED and VSOCK_NET_MODE_LOCAL_LOCKED, which can be
> less error prone if we will touch this enum one day.
>
> > };
> >
> > struct netns_vsock {
> > @@ -16,6 +17,12 @@ struct netns_vsock {
> > u32 port;
> >
> > enum vsock_net_mode mode;
> > - enum vsock_net_mode child_ns_mode;
> > +
> > + /* 0 (GLOBAL)
> > + * 1 (LOCAL)
> > + * 2 (GLOBAL + LOCKED)
> > + * 3 (LOCAL + LOCKED)
> > + */
> > + int child_ns_mode;
>
> Sorry, I don't like this too much, since it seems too complicated to read
> and to maintain, If we really want to use just one variable, maybe we can
> use -1 as UNSET for child_ns_mode. If it is UNSET, vsock_net_child_mode()
> can just return `mode` since it's the default that we also documented, if
> it's set, it means that is locked with the value specified.
>
> Maybe with code is easier, I mean something like this:
>
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index d3ff48a2fbe0..fcd5b538df35 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -276,15 +276,25 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
> return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
> }
> -static inline void vsock_net_set_child_mode(struct net *net,
> +static inline bool vsock_net_set_child_mode(struct net *net,
> enum vsock_net_mode mode)
> {
> - WRITE_ONCE(net->vsock.child_ns_mode, mode);
> + int old = VSOCK_NET_CHILD_NS_UNSET;
> +
> + if (try_cmpxchg(&net->vsock.child_ns_mode, &old, mode))
> + return true;
> +
> + return old == mode;
> }
> static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
> {
> - return READ_ONCE(net->vsock.child_ns_mode);
> + int mode = READ_ONCE(net->vsock.child_ns_mode);
> +
> + if (mode == VSOCK_NET_CHILD_NS_UNSET)
> + return net->vsock.mode;
> +
> + return mode;
> }
> /* Return true if two namespaces pass the mode rules. Otherwise, return false.
> diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
> index b34d69a22fa8..bf52baf7d7a7 100644
> --- a/include/net/netns/vsock.h
> +++ b/include/net/netns/vsock.h
> @@ -9,6 +9,8 @@ enum vsock_net_mode {
> VSOCK_NET_MODE_LOCAL,
> };
> +#define VSOCK_NET_CHILD_NS_UNSET (-1)
> +
> struct netns_vsock {
> struct ctl_table_header *sysctl_hdr;
> @@ -16,6 +18,13 @@ struct netns_vsock {
> u32 port;
> enum vsock_net_mode mode;
> - enum vsock_net_mode child_ns_mode;
> +
> + /* Write-once child namespace mode, must be initialized to
> + * VSOCK_NET_CHILD_NS_UNSET. Transitions once from UNSET to a
> + * vsock_net_mode value via try_cmpxchg on first sysctl write.
> + * While UNSET, vsock_net_child_mode() returns the namespace's
> + * own mode since it's the default.
> + */
> + int child_ns_mode;
> };
> #endif /* __NET_NET_NAMESPACE_VSOCK_H */
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 9880756d9eff..f0cb7c6a8212 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -2853,7 +2853,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> new_mode == VSOCK_NET_MODE_GLOBAL)
> return -EPERM;
> - vsock_net_set_child_mode(net, new_mode);
> + if (!vsock_net_set_child_mode(net, new_mode))
> + return -EBUSY;
> }
> return 0;
> @@ -2922,7 +2923,7 @@ static void vsock_net_init(struct net *net)
> else
> net->vsock.mode = vsock_net_child_mode(current->nsproxy->net_ns);
> - net->vsock.child_ns_mode = net->vsock.mode;
> + net->vsock.child_ns_mode = VSOCK_NET_CHILD_NS_UNSET;
> }
> static __net_init int vsock_sysctl_init_net(struct net *net)
>
> If you like it, please add my Co-developed-by and S-o-b.
Will do!
>
> BTW, let's discuss here more about it and agree before sending a new
> version, so this should also allow other to comment eventually.
>
> Thanks,
> Stefano
Tbh, I like your original proposal from v1 best (copied below). I like
that the whole locking mechanism is self-contained there in one place,
and doesn't ripple out elsewhere into the code (e.g.,
vsock_net_child_mode() carrying logic around UNSET). Wdyt?

static inline bool vsock_net_set_child_mode(struct net *net,
					    enum vsock_net_mode mode)
{
	int new_locked = mode + 1;
	int old_locked = 0;

	if (try_cmpxchg(&net->vsock.child_ns_mode_locked,
			&old_locked, new_locked)) {
		WRITE_ONCE(net->vsock.child_ns_mode, mode);
		return true;
	}

	return old_locked == new_locked;
}

Best,
Bobby
>
> > };
> > #endif /* __NET_NET_NAMESPACE_VSOCK_H */
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 9880756d9eff..50044a838c89 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -90,16 +90,20 @@
> > *
> > * - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
> > * mode, which is set at namespace creation and immutable thereafter.
> > - * - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
> > + * - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
> > * child namespaces will inherit when created. The initial value matches
> > * the namespace's own ns_mode.
> > *
> > * Changing child_ns_mode only affects newly created namespaces, not the
> > * current namespace or existing children. A "local" namespace cannot set
> > - * child_ns_mode to "global". At namespace creation, ns_mode is inherited
> > - * from the parent's child_ns_mode.
> > + * child_ns_mode to "global". child_ns_mode is write-once, so that it may be
> > + * configured and locked down by a namespace manager. Writing a different
> > + * value after the first write returns -EBUSY. At namespace creation, ns_mode
> > + * is inherited from the parent's child_ns_mode.
> > *
> > - * The init_net mode is "global" and cannot be modified.
> > + * The init_net mode is "global" and cannot be modified. The init_net
> > + * child_ns_mode is also write-once, so an init process (e.g. systemd) can
> > + * set it to "local" to ensure all new namespaces inherit local mode.
> > *
> > * The modes affect the allocation and accessibility of CIDs as follows:
> > *
> > @@ -2853,7 +2857,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
> > new_mode == VSOCK_NET_MODE_GLOBAL)
> > return -EPERM;
> >
> > - vsock_net_set_child_mode(net, new_mode);
> > + if (!vsock_net_set_child_mode(net, new_mode))
> > + return -EBUSY;
> > }
> >
> > return 0;
> >
> > --
> > 2.47.3
> >
>
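To make the two-variable variant easier to reason about outside the kernel, here is a userspace sketch using C11 atomics. It is only a model: struct ns_model, model_init() and model_set_child_mode() are hypothetical names, and atomic_compare_exchange_strong() stands in for the kernel's try_cmpxchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };

/* Hypothetical userspace stand-in for the relevant netns_vsock fields. */
struct ns_model {
	_Atomic int child_ns_mode;		/* mode that children inherit */
	_Atomic int child_ns_mode_locked;	/* 0 = unlocked, mode + 1 = locked */
};

static void model_init(struct ns_model *ns, enum vsock_net_mode mode)
{
	atomic_init(&ns->child_ns_mode, (int)mode);
	atomic_init(&ns->child_ns_mode_locked, 0);
}

/* Returns true if `mode` is (now) the locked value, false if a
 * different value was locked first. */
static bool model_set_child_mode(struct ns_model *ns, enum vsock_net_mode mode)
{
	int new_locked = (int)mode + 1;
	int old_locked = 0;

	/* Only the first caller moves the lock word from 0 to mode + 1. */
	if (atomic_compare_exchange_strong(&ns->child_ns_mode_locked,
					   &old_locked, new_locked)) {
		atomic_store(&ns->child_ns_mode, (int)mode);
		return true;
	}

	/* On failure old_locked holds the value that won the race. */
	return old_locked == new_locked;
}
```

One detail this model makes visible: the lock word and the mode are updated in two steps, so a second caller passing the same value can see success before the store to child_ns_mode has happened. Whether that matters in the kernel depends on how readers are ordered; it is an observation about this sketch, not a claim about the final patch.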
* Re: [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once
2026-02-19 16:20 ` Bobby Eshleman
@ 2026-02-19 16:36 ` Stefano Garzarella
0 siblings, 0 replies; 10+ messages in thread
From: Stefano Garzarella @ 2026-02-19 16:36 UTC (permalink / raw)
To: Bobby Eshleman, Jakub Kicinski, Paolo Abeni
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Stefan Hajnoczi, Shuah Khan, Bobby Eshleman,
Michael S. Tsirkin, Jonathan Corbet, Shuah Khan, virtualization,
netdev, linux-kernel, kvm, linux-kselftest, linux-doc,
Daan De Meyer
On Thu, Feb 19, 2026 at 08:20:54AM -0800, Bobby Eshleman wrote:
>On Thu, Feb 19, 2026 at 11:35:52AM +0100, Stefano Garzarella wrote:
>> On Wed, Feb 18, 2026 at 10:10:37AM -0800, Bobby Eshleman wrote:
>> > From: Bobby Eshleman <bobbyeshleman@meta.com>
>> >
>> > Two administrator processes may race when setting child_ns_mode as one
>> > process sets child_ns_mode to "local" and then creates a namespace, but
>> > another process changes child_ns_mode to "global" between the write and
>> > the namespace creation. The first process ends up with a namespace in
>> > "global" mode instead of "local". While this can be detected after the
>> > fact by reading ns_mode and retrying, it is fragile and error-prone.
>> >
>> > Make child_ns_mode write-once so that a namespace manager can set it
>> > once and be sure it won't change. Writing a different value after the
>> > first write returns -EBUSY. This applies to all namespaces, including
>> > init_net, where an init process can write "local" to lock all future
>> > namespaces into local mode.
>> >
>> > Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
>> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>> > Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
>> > Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>>
>> nit: usually the S-o-b of the author is the last when sending a patch.
>
>Ah good to know, thanks. Will change.
>
>>
>> > ---
>> > include/net/af_vsock.h | 20 +++++++++++++++++---
>> > include/net/netns/vsock.h | 9 ++++++++-
>> > net/vmw_vsock/af_vsock.c | 15 ++++++++++-----
>> > 3 files changed, 35 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> > index d3ff48a2fbe0..9bd42147626d 100644
>> > --- a/include/net/af_vsock.h
>> > +++ b/include/net/af_vsock.h
>> > @@ -276,15 +276,29 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
>> > return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
>> > }
>> >
>> > -static inline void vsock_net_set_child_mode(struct net *net,
>> > +static inline bool vsock_net_set_child_mode(struct net *net,
>> > enum vsock_net_mode mode)
>> > {
>> > - WRITE_ONCE(net->vsock.child_ns_mode, mode);
>> > + int locked = mode + VSOCK_NET_MODE_LOCKED;
>> > + int cur;
>> > +
>> > + cur = READ_ONCE(net->vsock.child_ns_mode);
>> > + if (cur == locked)
>> > + return true;
>> > + if (cur >= VSOCK_NET_MODE_LOCKED)
>> > + return false;
>> > +
>> > + if (try_cmpxchg(&net->vsock.child_ns_mode, &cur, locked))
>> > + return true;
>> > +
>> > + return cur == locked;
>>
>> Sorry, it took me a while to get it entirely :-(
>> This overcomplication is exactly what I wanted to avoid when I proposed the
>> change in v1:
>> https://lore.kernel.org/netdev/aZWUmbiH11Eh3Y4v@sgarzare-redhat/
>
>Glad you thought so too, because I actually think your original proposed
>snippet in that thread is the best/simplest so far.
>
>>
>>
>> > }
>> >
>> > static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
>> > {
>> > - return READ_ONCE(net->vsock.child_ns_mode);
>> > + int mode = READ_ONCE(net->vsock.child_ns_mode);
>> > +
>> > + return mode & (VSOCK_NET_MODE_LOCKED - 1);
>>
>> This is working just because VSOCK_NET_MODE_LOCKED == 2, so IMO this should
>> at least set as value in the enum and documented on top of vsock_net_mode.
>>
>> > }
>> >
>> > /* Return true if two namespaces pass the mode rules. Otherwise, return false.
>> > diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>> > index b34d69a22fa8..d20ab6269342 100644
>> > --- a/include/net/netns/vsock.h
>> > +++ b/include/net/netns/vsock.h
>> > @@ -7,6 +7,7 @@
>> > enum vsock_net_mode {
>> > VSOCK_NET_MODE_GLOBAL,
>> > VSOCK_NET_MODE_LOCAL,
>> > + VSOCK_NET_MODE_LOCKED,
>>
>> This is not really a mode, so IMO should not be part of `enum
>> vsock_net_mode`. If you really want it, maybe we can add both
>> VSOCK_NET_MODE_GLOBAL_LOCKED and VSOCK_NET_MODE_LOCAL_LOCKED, which can be
>> less error prone if we will touch this enum one day.
>>
>> > };
>> >
>> > struct netns_vsock {
>> > @@ -16,6 +17,12 @@ struct netns_vsock {
>> > u32 port;
>> >
>> > enum vsock_net_mode mode;
>> > - enum vsock_net_mode child_ns_mode;
>> > +
>> > + /* 0 (GLOBAL)
>> > + * 1 (LOCAL)
>> > + * 2 (GLOBAL + LOCKED)
>> > + * 3 (LOCAL + LOCKED)
>> > + */
>> > + int child_ns_mode;
>>
>> Sorry, I don't like this too much, since it seems too complicated to read
>> and to maintain, If we really want to use just one variable, maybe we can
>> use -1 as UNSET for child_ns_mode. If it is UNSET, vsock_net_child_mode()
>> can just return `mode` since it's the default that we also documented, if
>> it's set, it means that is locked with the value specified.
>>
>> Maybe with code is easier, I mean something like this:
>>
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index d3ff48a2fbe0..fcd5b538df35 100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -276,15 +276,25 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
>> return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
>> }
>> -static inline void vsock_net_set_child_mode(struct net *net,
>> +static inline bool vsock_net_set_child_mode(struct net *net,
>> enum vsock_net_mode mode)
>> {
>> - WRITE_ONCE(net->vsock.child_ns_mode, mode);
>> + int old = VSOCK_NET_CHILD_NS_UNSET;
>> +
>> + if (try_cmpxchg(&net->vsock.child_ns_mode, &old, mode))
>> + return true;
>> +
>> + return old == mode;
>> }
>> static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
>> {
>> - return READ_ONCE(net->vsock.child_ns_mode);
>> + int mode = READ_ONCE(net->vsock.child_ns_mode);
>> +
>> + if (mode == VSOCK_NET_CHILD_NS_UNSET)
>> + return net->vsock.mode;
>> +
>> + return mode;
>> }
>> /* Return true if two namespaces pass the mode rules. Otherwise, return false.
>> diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
>> index b34d69a22fa8..bf52baf7d7a7 100644
>> --- a/include/net/netns/vsock.h
>> +++ b/include/net/netns/vsock.h
>> @@ -9,6 +9,8 @@ enum vsock_net_mode {
>> VSOCK_NET_MODE_LOCAL,
>> };
>> +#define VSOCK_NET_CHILD_NS_UNSET (-1)
>> +
>> struct netns_vsock {
>> struct ctl_table_header *sysctl_hdr;
>> @@ -16,6 +18,13 @@ struct netns_vsock {
>> u32 port;
>> enum vsock_net_mode mode;
>> - enum vsock_net_mode child_ns_mode;
>> +
>> + /* Write-once child namespace mode, must be initialized to
>> + * VSOCK_NET_CHILD_NS_UNSET. Transitions once from UNSET to a
>> + * vsock_net_mode value via try_cmpxchg on first sysctl write.
>> + * While UNSET, vsock_net_child_mode() returns the namespace's
>> + * own mode since it's the default.
>> + */
>> + int child_ns_mode;
>> };
>> #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index 9880756d9eff..f0cb7c6a8212 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -2853,7 +2853,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
>> new_mode == VSOCK_NET_MODE_GLOBAL)
>> return -EPERM;
>> - vsock_net_set_child_mode(net, new_mode);
>> + if (!vsock_net_set_child_mode(net, new_mode))
>> + return -EBUSY;
>> }
>> return 0;
>> @@ -2922,7 +2923,7 @@ static void vsock_net_init(struct net *net)
>> else
>> net->vsock.mode = vsock_net_child_mode(current->nsproxy->net_ns);
>> - net->vsock.child_ns_mode = net->vsock.mode;
>> + net->vsock.child_ns_mode = VSOCK_NET_CHILD_NS_UNSET;
>> }
>> static __net_init int vsock_sysctl_init_net(struct net *net)
>>
>> If you like it, please add my Co-developed-by and S-o-b.
>
>Will do!
>
>>
>> BTW, let's discuss here more about it and agree before sending a new
>> version, so this should also allow other to comment eventually.
>>
>> Thanks,
>> Stefano
>
>Tbh, I like your original proposal from v1 best (copied below). I like
>that the whole locking mechanism is self-contained there in one place,
>and doesn't ripple out elsewhere into the code (e.g.,
>vsock_net_child_mode() carrying logic around UNSET). Wdyt?
Initially, yes, I liked that one too, especially because, as a patch
for net, it stays very small and clear to read. But now, after
spending some time on how to reuse `child_ns_mode` for that, I also like
the last version I sent using UNSET, so that we don't keep the same
information in two variables.

I'm truly conflicted, but I don't have a strong preference, so if you
prefer the one with `child_ns_mode_locked`, let's go with that; we can
always change it in the future.

Jakub, Paolo, any preference?
>
>static inline bool vsock_net_set_child_mode(struct net *net,
> enum vsock_net_mode mode)
>{
> int new_locked = mode + 1;
> int old_locked = 0;
If we are going to use this one, a macro for 0, or a comment here and
on top of child_ns_mode_locked, would be better.
Thanks,
Stefano
>
> if (try_cmpxchg(&net->vsock.child_ns_mode_locked,
> &old_locked, new_locked)) {
> WRITE_ONCE(net->vsock.child_ns_mode, mode);
> return true;
> }
>
> return old_locked == new_locked;
>}
>
>
>Best,
>Bobby
>
>>
>> > };
>> > #endif /* __NET_NET_NAMESPACE_VSOCK_H */
>> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > index 9880756d9eff..50044a838c89 100644
>> > --- a/net/vmw_vsock/af_vsock.c
>> > +++ b/net/vmw_vsock/af_vsock.c
>> > @@ -90,16 +90,20 @@
>> > *
>> > * - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
>> > * mode, which is set at namespace creation and immutable thereafter.
>> > - * - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
>> > + * - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
>> > * child namespaces will inherit when created. The initial value matches
>> > * the namespace's own ns_mode.
>> > *
>> > * Changing child_ns_mode only affects newly created namespaces, not the
>> > * current namespace or existing children. A "local" namespace cannot set
>> > - * child_ns_mode to "global". At namespace creation, ns_mode is inherited
>> > - * from the parent's child_ns_mode.
>> > + * child_ns_mode to "global". child_ns_mode is write-once, so that it may be
>> > + * configured and locked down by a namespace manager. Writing a different
>> > + * value after the first write returns -EBUSY. At namespace creation, ns_mode
>> > + * is inherited from the parent's child_ns_mode.
>> > *
>> > - * The init_net mode is "global" and cannot be modified.
>> > + * The init_net mode is "global" and cannot be modified. The init_net
>> > + * child_ns_mode is also write-once, so an init process (e.g. systemd) can
>> > + * set it to "local" to ensure all new namespaces inherit local mode.
>> > *
>> > * The modes affect the allocation and accessibility of CIDs as follows:
>> > *
>> > @@ -2853,7 +2857,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
>> > new_mode == VSOCK_NET_MODE_GLOBAL)
>> > return -EPERM;
>> >
>> > - vsock_net_set_child_mode(net, new_mode);
>> > + if (!vsock_net_set_child_mode(net, new_mode))
>> > + return -EBUSY;
>> > }
>> >
>> > return 0;
>> >
>> > --
>> > 2.47.3
>> >
>>
>
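For comparison with the two-variable snippet quoted above, the single-variable UNSET approach can be modeled in userspace in the same way. Again this is a sketch under assumptions: struct ns_unset_model and the unset_model_*() functions are hypothetical names, C11 atomic_compare_exchange_strong() stands in for try_cmpxchg(), and only VSOCK_NET_CHILD_NS_UNSET and the enum values are taken from the proposed diff.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };

#define VSOCK_NET_CHILD_NS_UNSET (-1)

/* Hypothetical userspace stand-in for the relevant netns_vsock fields. */
struct ns_unset_model {
	enum vsock_net_mode mode;	/* the namespace's own, immutable mode */
	_Atomic int child_ns_mode;	/* UNSET until the first write */
};

static void unset_model_init(struct ns_unset_model *ns,
			     enum vsock_net_mode mode)
{
	ns->mode = mode;
	atomic_init(&ns->child_ns_mode, VSOCK_NET_CHILD_NS_UNSET);
}

/* The first write moves UNSET to `mode`; later writes succeed only if
 * they ask for the value that is already stored. */
static bool unset_model_set_child_mode(struct ns_unset_model *ns,
				       enum vsock_net_mode mode)
{
	int old = VSOCK_NET_CHILD_NS_UNSET;

	if (atomic_compare_exchange_strong(&ns->child_ns_mode, &old, (int)mode))
		return true;

	return old == (int)mode;
}

/* While UNSET, children inherit the namespace's own mode (the default). */
static enum vsock_net_mode unset_model_child_mode(struct ns_unset_model *ns)
{
	int mode = atomic_load(&ns->child_ns_mode);

	if (mode == VSOCK_NET_CHILD_NS_UNSET)
		return ns->mode;

	return (enum vsock_net_mode)mode;
}
```

In this model the stored value and the lock state are the same word, so there is no window where one is updated and the other is not. The cost, as noted in the discussion, is that the reader has to know about UNSET.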
end of thread, other threads:[~2026-02-19 16:37 UTC | newest]
Thread overview: 10+ messages
2026-02-18 18:10 [PATCH net v2 0/3] vsock: add write-once semantics to child_ns_mode Bobby Eshleman
2026-02-18 18:10 ` [PATCH net v2 1/3] selftests/vsock: change tests to respect write-once child ns mode Bobby Eshleman
2026-02-19 10:35 ` Stefano Garzarella
2026-02-18 18:10 ` [PATCH net v2 2/3] vsock: lock down child_ns_mode as write-once Bobby Eshleman
2026-02-19 10:35 ` Stefano Garzarella
2026-02-19 16:20 ` Bobby Eshleman
2026-02-19 16:36 ` Stefano Garzarella
2026-02-18 18:10 ` [PATCH net v2 3/3] vsock: document write-once behavior of the child_ns_mode sysctl Bobby Eshleman
2026-02-19 10:36 ` Stefano Garzarella
2026-02-19 16:06 ` Bobby Eshleman