* [PATCH 0/4] nstree: Fix spurious ENOENT in listns pagination during grace period
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
To: Christian Brauner, Shuah Khan; +Cc: Yohei Kojima, linux-kernel, linux-kselftest
This series fixes a spurious ENOENT returned by listns() when (1)
pagination is used and (2) listns() tries to resume enumeration at a
destroyed or inactive namespace.
The Cause of the Bug
====================
The bug is in lookup_ns_id_at(kls->last_ns_id + 1, ...), which
do_listns() calls to find the namespace to resume from. It returns NULL
if the first namespace at or after the given ns id was destroyed or
inactivated before the call:
  A: active namespace
  D: destroyed (or inactive) namespace
         +-----+-----+-----+-----+-----+-----+-----+-----+
  state: |  A  |  A  |  A  |  D  |  D  |  A  |  A  |  A  |
  ns_id: |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |
         +-----+-----+-----+-----+-----+-----+-----+-----+
                        |     |
                        |     +-- (kls->last_ns_id + 1)
                        +-- req.ns_id = 3
listns() cannot distinguish this case from an empty nstree, so it
returns -ENOENT although three namespaces remain in the tree.
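The failure can be modelled in userspace. The sketch below is only an illustration under stated assumptions: struct toy_ns, toy_table and toy_lookup_buggy() are hypothetical names, and a sorted array stands in for the kernel's rbtree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a namespace entry; not the kernel's struct ns_common. */
struct toy_ns {
	uint64_t id;
	bool active;
};

/* The table from the diagram: ids 1..8, with 4 and 5 destroyed. */
static const struct toy_ns toy_table[] = {
	{1, true}, {2, true}, {3, true}, {4, false},
	{5, false}, {6, true}, {7, true}, {8, true},
};

/*
 * Models the pre-fix behaviour: find the first entry with id >= ns_id
 * (entries are sorted by id) and give up if that one entry is inactive.
 */
static const struct toy_ns *toy_lookup_buggy(const struct toy_ns *tbl,
					     size_t n, uint64_t ns_id)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].id >= ns_id)
			return tbl[i].active ? &tbl[i] : NULL;
	}
	return NULL;
}
```

Resuming after id 3 calls toy_lookup_buggy(toy_table, 8, 4), which returns NULL even though ids 6, 7 and 8 are still active. That is the same result as asking for id 9, where nothing is left.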
Solution
========
The bug is fixed by iterating over the nstree's internal list until it
reaches the first active namespace.
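As a rough userspace model of that idea (not the kernel code itself): struct ns_slot and slot_lookup_skip_inactive() are hypothetical names, and a sorted array again replaces the rbtree and its list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a namespace entry, sorted by id in an array. */
struct ns_slot {
	uint64_t id;
	bool active;
};

/*
 * Return the first active entry with id >= ns_id, or NULL if no such
 * entry exists.
 */
static const struct ns_slot *slot_lookup_skip_inactive(const struct ns_slot *tbl,
						       size_t n, uint64_t ns_id)
{
	size_t i = 0;

	assert(tbl != NULL || n == 0);

	/* Step 1: smallest id >= ns_id (the part the tree descent does). */
	while (i < n && tbl[i].id < ns_id)
		i++;

	/* Step 2: walk forward past inactive entries. */
	while (i < n && !tbl[i].active)
		i++;

	return i < n ? &tbl[i] : NULL;
}
```

With the table from the diagram, a lookup for id 4 now lands on id 6. NULL is returned only when every remaining entry is inactive or the table is exhausted.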
Patches Sequence
================
Patches 1 and 2 fix existing issues in the namespace selftests.
Patch 3 fixes the spurious ENOENT bug. Patch 4 adds a regression test
for this bug.
Disclaimer on Reproduction
==========================
Unfortunately, I couldn't reproduce this bug in a VM environment,
perhaps because the test I added relies on timing-sensitive RCU
behavior. I did confirm that the bug reproduces on my bare-metal machine
equipped with an i7-14700K, and that all namespace selftests pass after
applying this series.
Yohei Kojima (4):
selftests/namespace: fix selftest hang-up caused by zombie processes
selftests/namespace: fix unintentional skip in ns_active_ref_test.c
nstree: Fix spurious ENOENT in listns pagination during grace period
selftests/namespace: test spurious ENOENT bug in listns pagination
kernel/nstree.c | 68 ++++--
.../namespaces/listns_pagination_bug.c | 200 ++++++++++++++++++
.../selftests/namespaces/ns_active_ref_test.c | 4 +
.../testing/selftests/namespaces/nsid_test.c | 8 +
4 files changed, 258 insertions(+), 22 deletions(-)
--
2.52.0
* [PATCH 1/4] selftests/namespace: fix selftest hang-up caused by zombie processes
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
To: Christian Brauner, Shuah Khan; +Cc: Yohei Kojima, linux-kselftest, linux-kernel
Clean up the grandchild processes spawned by the timens_separate and
pidns_separate tests in nsid_test.c, which were otherwise left behind.
This also prevents kselftest from hanging after running these tests.
Signed-off-by: Yohei Kojima <yk@y-koj.net>
---
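The teardown change follows a kill-then-reap pattern. A minimal standalone sketch of that pattern is below; spawn_blocked_child() and kill_and_reap() are hypothetical helpers, not part of the selftest. It uses a direct child, because waitpid() only reaps the caller's own children.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that blocks forever, like a helper waiting on a pipe. */
static pid_t spawn_blocked_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}

/*
 * SIGKILL the process and reap it. Returns the terminating signal on
 * success and -1 on any failure. The pid <= 0 guard matters: kill(0)
 * and kill(-1) would signal whole groups of processes.
 */
static int kill_and_reap(pid_t pid)
{
	int status;

	if (pid <= 0)
		return -1;
	if (kill(pid, SIGKILL) < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A fixture would record the pid when the process is created and call a helper like kill_and_reap() from teardown, so that the process is gone even if an assertion fails midway through the test.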
tools/testing/selftests/namespaces/nsid_test.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/tools/testing/selftests/namespaces/nsid_test.c b/tools/testing/selftests/namespaces/nsid_test.c
index b4a14c6693a5..1e2669372b7b 100644
--- a/tools/testing/selftests/namespaces/nsid_test.c
+++ b/tools/testing/selftests/namespaces/nsid_test.c
@@ -25,10 +25,12 @@
/* Fixture for tests that create child processes */
FIXTURE(nsid) {
pid_t child_pid;
+ pid_t grandchild_pid;
};
FIXTURE_SETUP(nsid) {
self->child_pid = 0;
+ self->grandchild_pid = 0;
}
FIXTURE_TEARDOWN(nsid) {
@@ -37,6 +39,10 @@ FIXTURE_TEARDOWN(nsid) {
kill(self->child_pid, SIGKILL);
waitpid(self->child_pid, NULL, 0);
}
+ if (self->grandchild_pid > 0) {
+ kill(self->grandchild_pid, SIGKILL);
+ waitpid(self->grandchild_pid, NULL, 0);
+ }
}
TEST(nsid_mntns_basic)
@@ -677,6 +683,7 @@ TEST_F(nsid, timens_separate)
pid_t grandchild_pid;
ASSERT_EQ(read(pipefd[0], &grandchild_pid, sizeof(grandchild_pid)), sizeof(grandchild_pid));
close(pipefd[0]);
+ self->grandchild_pid = grandchild_pid;
/* Open grandchild's time namespace */
char path[256];
@@ -798,6 +805,7 @@ TEST_F(nsid, pidns_separate)
pid_t grandchild_pid;
ASSERT_EQ(read(pipefd[0], &grandchild_pid, sizeof(grandchild_pid)), sizeof(grandchild_pid));
close(pipefd[0]);
+ self->grandchild_pid = grandchild_pid;
/* Open grandchild's PID namespace */
char path[256];
--
2.52.0
* [PATCH 2/4] selftests/namespace: fix unintentional skip in ns_active_ref_test.c
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
To: Christian Brauner, Shuah Khan; +Cc: Yohei Kojima, linux-kselftest, linux-kernel
Fix ESTALE from open_by_handle_at() in ns_multiple_children_same_parent,
which occurs when the child process exits before the parent calls it.
Signed-off-by: Yohei Kojima <yk@y-koj.net>
---
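The patch keeps the child alive with a fixed usleep(). For comparison, a deterministic variant, which is not what the patch does, would hold the child until the parent explicitly releases it. The sketch below assumes hypothetical names (struct held_child, spawn_held_child()) and uses a second pipe as the release signal.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct held_child {
	pid_t pid;
	int report_fd;	/* parent reads the child's report here */
	int release_fd;	/* parent closes this to let the child exit */
};

/* Fork a child that reports one byte and then waits to be released. */
static int spawn_held_child(struct held_child *hc)
{
	int report[2], release[2];
	char c = 'R';

	assert(hc != NULL);

	if (pipe(report) < 0)
		return -1;
	if (pipe(release) < 0) {
		close(report[0]);
		close(report[1]);
		return -1;
	}

	hc->pid = fork();
	if (hc->pid < 0) {
		close(report[0]);
		close(report[1]);
		close(release[0]);
		close(release[1]);
		return -1;
	}

	if (hc->pid == 0) {
		close(report[0]);
		close(release[1]);
		if (write(report[1], &c, 1) != 1)
			_exit(1);
		/* Block until the parent closes release_fd (read() hits EOF). */
		while (read(release[0], &c, 1) > 0)
			;
		_exit(0);
	}

	close(report[1]);
	close(release[0]);
	hc->report_fd = report[0];
	hc->release_fd = release[1];
	return 0;
}
```

The parent reads the report, does its work while the child is guaranteed to be alive, and then calls close(hc.release_fd) followed by waitpid(). The cost is an extra pipe per child, which may be more plumbing than a selftest wants.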
tools/testing/selftests/namespaces/ns_active_ref_test.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/namespaces/ns_active_ref_test.c b/tools/testing/selftests/namespaces/ns_active_ref_test.c
index 093268f0efaa..29d96a6e8100 100644
--- a/tools/testing/selftests/namespaces/ns_active_ref_test.c
+++ b/tools/testing/selftests/namespaces/ns_active_ref_test.c
@@ -1193,6 +1193,10 @@ TEST(ns_multiple_children_same_parent)
write(pipefd[1], &c1_id, sizeof(c1_id));
write(pipefd[1], &c2_id, sizeof(c2_id));
close(pipefd[1]);
+
+ /* Give the parent time to run open_by_handle_at() */
+ usleep(10000);
+
exit(0);
}
--
2.52.0
* [PATCH 3/4] nstree: Fix spurious ENOENT in listns pagination during grace period
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
To: Christian Brauner; +Cc: Yohei Kojima, linux-kernel
Fix a spurious ENOENT returned from listns() when (1) pagination is used
(req.ns_id != 0) and (2) listns() tries to resume enumeration at a
destroyed or inactive namespace.
The cause is that lookup_ns_id_at(kls->last_ns_id + 1, ...) returns
NULL if the first namespace at or after the given ns id has been
destroyed or inactivated, as shown below. (The nstree can be viewed as a
list here, since it is an rbtree sorted by ns id.)
  A: active namespace
  D: destroyed (or inactive) namespace
         +-----+-----+-----+-----+-----+-----+-----+-----+
  state: |  A  |  A  |  A  |  D  |  D  |  A  |  A  |  A  |
  ns_id: |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |
         +-----+-----+-----+-----+-----+-----+-----+-----+
                        |     |
                        |     +-- (kls->last_ns_id + 1)
                        +-- req.ns_id = 3
In this case, lookup_ns_id_at() returns NULL, so do_listns() returns
-ENOENT although three namespaces remain in the nstree.
The bug is fixed by iterating over the nstree's internal list until it
reaches the first active namespace.
Fixes: 76b6f5dfb3fd ("nstree: add listns()")
Signed-off-by: Yohei Kojima <yk@y-koj.net>
---
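The new loop stops either at the first namespace it can take a reference on or when the walk reaches the list head. A small userspace sketch of that termination rule follows; struct node_link, struct ns_entry, entry_of() and first_active_from() are hypothetical names, and a plain flag check replaces ns_get_unless_inactive(). RCU and reference counting are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal circular list link, in the style of the kernel's struct list_head. */
struct node_link {
	struct node_link *next;
};

/* Hypothetical entry type; the kernel's struct ns_common is far richer. */
struct ns_entry {
	uint64_t id;
	bool active;
	struct node_link link;
};

/* Recover the entry that embeds a given link (a simplified container_of). */
#define entry_of(ptr) \
	((struct ns_entry *)((char *)(ptr) - offsetof(struct ns_entry, link)))

/*
 * Starting at 'start', return the first active entry, or NULL once the
 * walk reaches 'head'. The head is a sentinel, not a real entry, so it
 * must be compared by address before it is ever converted to an entry.
 */
static struct ns_entry *first_active_from(struct ns_entry *start,
					  const struct node_link *head)
{
	struct node_link *pos;

	assert(start != NULL && head != NULL);

	for (pos = &start->link; pos != head; pos = pos->next) {
		struct ns_entry *e = entry_of(pos);

		if (e->active)
			return e;
	}
	return NULL;
}
```

Checking for the head before dereferencing is what keeps the walk from treating the sentinel as a namespace, which is the job ns_common_is_head() does in the patch.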
kernel/nstree.c | 68 +++++++++++++++++++++++++++++++++----------------
1 file changed, 46 insertions(+), 22 deletions(-)
diff --git a/kernel/nstree.c b/kernel/nstree.c
index 6d12e5900ac0..476d22203ee0 100644
--- a/kernel/nstree.c
+++ b/kernel/nstree.c
@@ -618,14 +618,32 @@ static ssize_t do_listns_userns(struct klistns *kls)
return ret;
}
+static inline struct ns_common *next_ns_common(struct ns_common *ns,
+ struct ns_tree_root *ns_tree)
+{
+ if (ns_tree)
+ return list_entry_rcu(ns->ns_tree_node.ns_list_entry.next, struct ns_common, ns_tree_node.ns_list_entry);
+ return list_entry_rcu(ns->ns_unified_node.ns_list_entry.next, struct ns_common, ns_unified_node.ns_list_entry);
+}
+
+static inline bool ns_common_is_head(struct ns_common *ns,
+ const struct list_head *head,
+ struct ns_tree_root *ns_tree)
+{
+ if (ns_tree)
+ return &ns->ns_tree_node.ns_list_entry == head;
+ return &ns->ns_unified_node.ns_list_entry == head;
+}
+
/*
* Lookup a namespace with id >= ns_id in either the unified tree or a type-specific tree.
* Returns the namespace with the smallest id that is >= ns_id.
*/
static struct ns_common *lookup_ns_id_at(u64 ns_id, int ns_type)
{
- struct ns_common *ret = NULL;
+ struct ns_common *min = NULL, *ret = NULL;
struct ns_tree_root *ns_tree = NULL;
+ struct list_head *head;
struct rb_node *node;
if (ns_type) {
@@ -651,9 +669,9 @@ static struct ns_common *lookup_ns_id_at(u64 ns_id, int ns_type)
if (ns_id <= ns->ns_id) {
if (ns_type)
- ret = node_to_ns(node);
+ min = node_to_ns(node);
else
- ret = node_to_ns_unified(node);
+ min = node_to_ns_unified(node);
if (ns_id == ns->ns_id)
break;
node = node->rb_left;
@@ -662,8 +680,31 @@ static struct ns_common *lookup_ns_id_at(u64 ns_id, int ns_type)
}
}
- if (ret)
- ret = ns_get_unless_inactive(ret);
+ if (!min)
+ return NULL;
+ /*
+ * Now min->ns_id is the minimum id where min->ns_id >= ns_id holds,
+ * but min could be inactive or destroyed here, therefore
+ * ns_get_unless_inactive(min) could return NULL.
+ *
+ * To handle this case, try acquiring the next ns until it reaches the
+ * first valid ns.
+ */
+ if (ns_tree)
+ head = &ns_tree->ns_list_head;
+ else
+ head = &ns_unified_root.ns_list_head;
+
+ while (!ns_common_is_head(min, head, ns_tree)) {
+ ret = ns_get_unless_inactive(min);
+ if (ret)
+ break;
+
+ rcu_read_lock();
+ min = next_ns_common(min, ns_tree);
+ rcu_read_unlock();
+ }
+
return ret;
}
@@ -675,23 +716,6 @@ static inline struct ns_common *first_ns_common(const struct list_head *head,
return list_entry_rcu(head->next, struct ns_common, ns_unified_node.ns_list_entry);
}
-static inline struct ns_common *next_ns_common(struct ns_common *ns,
- struct ns_tree_root *ns_tree)
-{
- if (ns_tree)
- return list_entry_rcu(ns->ns_tree_node.ns_list_entry.next, struct ns_common, ns_tree_node.ns_list_entry);
- return list_entry_rcu(ns->ns_unified_node.ns_list_entry.next, struct ns_common, ns_unified_node.ns_list_entry);
-}
-
-static inline bool ns_common_is_head(struct ns_common *ns,
- const struct list_head *head,
- struct ns_tree_root *ns_tree)
-{
- if (ns_tree)
- return &ns->ns_tree_node.ns_list_entry == head;
- return &ns->ns_unified_node.ns_list_entry == head;
-}
-
static ssize_t do_listns(struct klistns *kls)
{
u64 __user *ns_ids = kls->uns_ids;
--
2.52.0
* [PATCH 4/4] selftests/namespace: test spurious ENOENT bug in listns pagination
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
To: Christian Brauner, Shuah Khan; +Cc: Yohei Kojima, linux-kselftest, linux-kernel
Add a test for the spurious ENOENT that occurs when listns() tries to
resume pagination at an inactivated or destroyed namespace. The new test
is almost identical to pagination_with_type_filter, except that it first
calls run_noisy_children(), which creates and lists namespaces to
disturb the nstree.
As far as the author has tested, this bug reproduces only on a
bare-metal environment, probably because the test relies on RCU timing
and the kernel behaves differently in a VM.
Signed-off-by: Yohei Kojima <yk@y-koj.net>
---
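The test resumes pagination by passing the last id of the previous batch as the next req.ns_id. A userspace sketch of that cursor loop is below, with mock_list_ids() as a hypothetical stand-in for listns() over a sorted array; it does not call the real system call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for listns(): copy up to 'cap' ids that are
 * greater than 'after' from the sorted array 'all' into 'out'.
 */
static ssize_t mock_list_ids(const uint64_t *all, size_t n, uint64_t after,
			     uint64_t *out, size_t cap)
{
	size_t count = 0;

	for (size_t i = 0; i < n && count < cap; i++) {
		if (all[i] > after)
			out[count++] = all[i];
	}
	return (ssize_t)count;
}

/*
 * Page through every id in batches of 'batch' (1..16), storing them in
 * 'seen', which must have room for n ids. Returns the number stored.
 */
static size_t paginate_all(const uint64_t *all, size_t n, size_t batch,
			   uint64_t *seen)
{
	uint64_t page[16];
	uint64_t cursor = 0;	/* 0 means "start from the beginning" */
	size_t total = 0;
	ssize_t got;

	assert(batch >= 1 && batch <= 16);

	while ((got = mock_list_ids(all, n, cursor, page, batch)) > 0) {
		for (ssize_t i = 0; i < got; i++)
			seen[total++] = page[i];
		cursor = page[got - 1];	/* resume after the last id seen */
	}
	return total;
}
```

In this model a gap in the id space (ids 4 and 5 missing) is harmless, and every remaining id is visited exactly once. The regression test checks that the second real listns() call behaves the same way instead of failing with ENOENT.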
.../namespaces/listns_pagination_bug.c | 200 ++++++++++++++++++
1 file changed, 200 insertions(+)
diff --git a/tools/testing/selftests/namespaces/listns_pagination_bug.c b/tools/testing/selftests/namespaces/listns_pagination_bug.c
index da7d33f96397..f71d8f4d64bb 100644
--- a/tools/testing/selftests/namespaces/listns_pagination_bug.c
+++ b/tools/testing/selftests/namespaces/listns_pagination_bug.c
@@ -135,4 +135,204 @@ TEST(pagination_with_type_filter)
}
}
+static void run_noisy_children(int num_workers)
+{
+ struct ns_id_req req = {
+ .size = sizeof(req),
+ .spare = 0,
+ .ns_id = 0,
+ .ns_type = CLONE_NEWUSER, /* Filter by user namespace */
+ .spare2 = 0,
+ .user_ns_id = 0,
+ };
+ pid_t pids[num_workers];
+ int num_forked = 0;
+ int i;
+
+ /*
+ * Create worker processes that do concurrent operations;
+ * most of this part is borrowed from concurrent_namespace_operations
+ * test in stress_test.c
+ */
+ for (i = 0; i < num_workers; i++) {
+ pids[i] = fork();
+ if (pids[i] < 0)
+ goto failure;
+ if (pids[i] > 0)
+ num_forked++;
+
+ if (pids[i] == 0) {
+ /* Each worker: create namespaces, list them, repeat */
+ int iterations;
+
+ for (iterations = 0; iterations < 10; iterations++) {
+ int userns_fd;
+ __u64 temp_ns_ids[100];
+ ssize_t ret;
+
+ /* Create a user namespace */
+ userns_fd = get_userns_fd(0, getuid(), 1);
+ if (userns_fd < 0)
+ continue;
+
+ /* List namespaces */
+ ret = sys_listns(&req, temp_ns_ids, ARRAY_SIZE(temp_ns_ids), 0);
+ (void)ret;
+
+ close(userns_fd);
+
+ /* Small delay */
+ usleep(1000);
+ }
+
+ exit(0);
+ }
+ }
+
+ /*
+ * Return after waiting for the children; this is enough for
+ * reproduction and helps keep the test code simple.
+ */
+ for (i = 0; i < num_forked; i++)
+ waitpid(pids[i], NULL, 0);
+
+ return;
+
+failure:
+ for (i = 0; i < num_forked; i++)
+ kill(pids[i], SIGKILL);
+ for (i = 0; i < num_forked; i++)
+ waitpid(pids[i], NULL, 0);
+}
+
+/*
+ * A test case to reproduce spurious ENOENT in listns pagination
+ *
+ * The bug occurs when the namespace at which pagination resumes is
+ * inactivated or destroyed before listns is called (or while it runs).
+ *
+ * This test is almost identical to pagination_with_type_filter test
+ * except that this calls run_noisy_children().
+ */
+TEST(pagination_during_grace_period)
+{
+ struct ns_id_req req = {
+ .size = sizeof(req),
+ .spare = 0,
+ .ns_id = 0,
+ .ns_type = CLONE_NEWUSER, /* Filter by user namespace */
+ .spare2 = 0,
+ .user_ns_id = 0,
+ };
+ pid_t pids[10];
+ int num_children = 10;
+ const int num_noisy_children = 10;
+ int i;
+ int sv[2];
+ __u64 first_batch[3];
+ ssize_t ret;
+
+ ASSERT_EQ(socketpair(AF_UNIX, SOCK_STREAM, 0, sv), 0);
+
+ run_noisy_children(num_noisy_children);
+
+ /* Create children with user namespaces */
+ for (i = 0; i < num_children; i++) {
+ pids[i] = fork();
+ ASSERT_GE(pids[i], 0);
+
+ if (pids[i] == 0) {
+ char c;
+
+ close(sv[0]);
+
+ if (setup_userns() < 0) {
+ close(sv[1]);
+ exit(1);
+ }
+
+ /* Signal parent we're ready */
+ if (write(sv[1], &c, 1) != 1) {
+ close(sv[1]);
+ exit(1);
+ }
+
+ /* Wait for parent signal to exit */
+ if (read(sv[1], &c, 1) != 1) {
+ close(sv[1]);
+ exit(1);
+ }
+
+ close(sv[1]);
+ exit(0);
+ }
+ }
+
+ close(sv[1]);
+
+ /* Wait for all children to signal ready */
+ for (i = 0; i < num_children; i++) {
+ char c;
+
+ if (read(sv[0], &c, 1) != 1) {
+ close(sv[0]);
+ for (int j = 0; j < num_children; j++)
+ kill(pids[j], SIGKILL);
+ for (int j = 0; j < num_children; j++)
+ waitpid(pids[j], NULL, 0);
+ ASSERT_TRUE(false);
+ }
+ }
+
+ /* First batch - this should work */
+ ret = sys_listns(&req, first_batch, 3, 0);
+ if (ret < 0) {
+ if (errno == ENOSYS) {
+ close(sv[0]);
+ for (i = 0; i < num_children; i++)
+ kill(pids[i], SIGKILL);
+ for (i = 0; i < num_children; i++)
+ waitpid(pids[i], NULL, 0);
+ SKIP(return, "listns() not supported");
+ }
+ ASSERT_GE(ret, 0);
+ }
+
+ TH_LOG("First batch returned %zd entries", ret);
+
+ if (ret == 3) {
+ __u64 second_batch[3];
+
+ /* Second batch - pagination triggers the bug */
+ req.ns_id = first_batch[2]; /* Continue from last ID */
+ ret = sys_listns(&req, second_batch, 3, 0);
+
+ TH_LOG("Second batch returned %zd entries", ret);
+ ASSERT_GE(ret, 0);
+ }
+
+ /* Signal all children to exit */
+ for (i = 0; i < num_children; i++) {
+ char c = 'X';
+
+ if (write(sv[0], &c, 1) != 1) {
+ close(sv[0]);
+ for (int j = i; j < num_children; j++)
+ kill(pids[j], SIGKILL);
+ for (int j = 0; j < num_children; j++)
+ waitpid(pids[j], NULL, 0);
+ ASSERT_TRUE(false);
+ }
+ }
+
+ close(sv[0]);
+
+ /* Cleanup */
+ for (i = 0; i < num_children; i++) {
+ int status;
+
+ waitpid(pids[i], &status, 0);
+ }
+}
+
TEST_HARNESS_MAIN
--
2.52.0