public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/4] nstree: Fix spurious ENOENT in listns pagination during grace period
@ 2026-04-05 16:50 Yohei Kojima
  2026-04-05 16:50 ` [PATCH 1/4] selftests/namespace: fix selftest hang-up caused by zombie processes Yohei Kojima
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
  To: Christian Brauner, Shuah Khan; +Cc: Yohei Kojima, linux-kernel, linux-kselftest

This series fixes the spurious ENOENT returned by listns() when (1)
pagination is used and (2) listns() tries to start enumeration from a
destroyed or inactive namespace.

The Cause of the Bug
====================
This bug was caused by lookup_ns_id_at(kls->last_ns_id + 1, ...), which
is called by do_listns(). This function returned NULL if the first
namespace after the given ns id was destroyed or inactivated before the
function was called:

A: active namespace
D: destroyed (or inactive) namespace

         +-----+-----+-----+-----+-----+-----+-----+-----+
state:   |  A  |  A  |  A  |  D  |  D  |  A  |  A  |  A  |
ns_id:   |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |
         +-----+-----+-----+-----+-----+-----+-----+-----+
	                |     |
		        |     +-- (kls->last_ns_id + 1)
		        +-- req.ns_id = 3

listns() has no way to distinguish this case from the case where the
nstree is empty, so it returns -ENOENT although three namespaces
remain in the tree.

Solution
========
The bug is fixed by iterating over the nstree's internal list until it
reaches the first active namespace.

Patches Sequence
================
Patch 1 and Patch 2 fix existing issues in the namespace selftests.
Patch 3 fixes the spurious ENOENT bug. Patch 4 adds a regression test
for this bug.

Disclaimer on Reproduction
==========================
Unfortunately, I couldn't reproduce this bug in a VM environment, perhaps
because the test I added relies on timing-sensitive RCU behavior. I did
confirm that the bug reproduces on my bare-metal machine equipped with an
i7-14700K, and that all namespace tests pass after applying this series.


Yohei Kojima (4):
  selftests/namespace: fix selftest hang-up caused by zombie processes
  selftests/namespace: fix unintentional skip in ns_active_ref_test.c
  nstree: Fix spurious ENOENT in listns pagination during grace period
  selftests/namespace: test spurious ENOENT bug in listns pagination

 kernel/nstree.c                               |  68 ++++--
 .../namespaces/listns_pagination_bug.c        | 200 ++++++++++++++++++
 .../selftests/namespaces/ns_active_ref_test.c |   4 +
 .../testing/selftests/namespaces/nsid_test.c  |   8 +
 4 files changed, 258 insertions(+), 22 deletions(-)

-- 
2.52.0



* [PATCH 1/4] selftests/namespace: fix selftest hang-up caused by zombie processes
  2026-04-05 16:50 [PATCH 0/4] nstree: Fix spurious ENOENT in listns pagination during grace period Yohei Kojima
@ 2026-04-05 16:50 ` Yohei Kojima
  2026-04-05 16:50 ` [PATCH 2/4] selftests/namespace: fix unintentional skip in ns_active_ref_test.c Yohei Kojima
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
  To: Christian Brauner, Shuah Khan; +Cc: Yohei Kojima, linux-kselftest, linux-kernel

Fix zombie grandchild processes spawned by timens_separate and
pidns_separate tests in nsid_test.c. This also prevents kselftest from
hanging up after running these tests.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
---
 tools/testing/selftests/namespaces/nsid_test.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/namespaces/nsid_test.c b/tools/testing/selftests/namespaces/nsid_test.c
index b4a14c6693a5..1e2669372b7b 100644
--- a/tools/testing/selftests/namespaces/nsid_test.c
+++ b/tools/testing/selftests/namespaces/nsid_test.c
@@ -25,10 +25,12 @@
 /* Fixture for tests that create child processes */
 FIXTURE(nsid) {
 	pid_t child_pid;
+	pid_t grandchild_pid;
 };
 
 FIXTURE_SETUP(nsid) {
 	self->child_pid = 0;
+	self->grandchild_pid = 0;
 }
 
 FIXTURE_TEARDOWN(nsid) {
@@ -37,6 +39,10 @@ FIXTURE_TEARDOWN(nsid) {
 		kill(self->child_pid, SIGKILL);
 		waitpid(self->child_pid, NULL, 0);
 	}
+	if (self->grandchild_pid > 0) {
+		kill(self->grandchild_pid, SIGKILL);
+		waitpid(self->grandchild_pid, NULL, 0);
+	}
 }
 
 TEST(nsid_mntns_basic)
@@ -677,6 +683,7 @@ TEST_F(nsid, timens_separate)
 	pid_t grandchild_pid;
 	ASSERT_EQ(read(pipefd[0], &grandchild_pid, sizeof(grandchild_pid)), sizeof(grandchild_pid));
 	close(pipefd[0]);
+	self->grandchild_pid = grandchild_pid;
 
 	/* Open grandchild's time namespace */
 	char path[256];
@@ -798,6 +805,7 @@ TEST_F(nsid, pidns_separate)
 	pid_t grandchild_pid;
 	ASSERT_EQ(read(pipefd[0], &grandchild_pid, sizeof(grandchild_pid)), sizeof(grandchild_pid));
 	close(pipefd[0]);
+	self->grandchild_pid = grandchild_pid;
 
 	/* Open grandchild's PID namespace */
 	char path[256];
-- 
2.52.0



* [PATCH 2/4] selftests/namespace: fix unintentional skip in ns_active_ref_test.c
  2026-04-05 16:50 [PATCH 0/4] nstree: Fix spurious ENOENT in listns pagination during grace period Yohei Kojima
  2026-04-05 16:50 ` [PATCH 1/4] selftests/namespace: fix selftest hang-up caused by zombie processes Yohei Kojima
@ 2026-04-05 16:50 ` Yohei Kojima
  2026-04-05 16:50 ` [PATCH 3/4] nstree: Fix spurious ENOENT in listns pagination during grace period Yohei Kojima
  2026-04-05 16:50 ` [PATCH 4/4] selftests/namespace: test spurious ENOENT bug in listns pagination Yohei Kojima
  3 siblings, 0 replies; 5+ messages in thread
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
  To: Christian Brauner, Shuah Khan; +Cc: Yohei Kojima, linux-kselftest, linux-kernel

Fix ESTALE from open_by_handle_at() in ns_multiple_children_same_parent
when the child processes exit before the parent runs it.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
---
 tools/testing/selftests/namespaces/ns_active_ref_test.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/namespaces/ns_active_ref_test.c b/tools/testing/selftests/namespaces/ns_active_ref_test.c
index 093268f0efaa..29d96a6e8100 100644
--- a/tools/testing/selftests/namespaces/ns_active_ref_test.c
+++ b/tools/testing/selftests/namespaces/ns_active_ref_test.c
@@ -1193,6 +1193,10 @@ TEST(ns_multiple_children_same_parent)
 		write(pipefd[1], &c1_id, sizeof(c1_id));
 		write(pipefd[1], &c2_id, sizeof(c2_id));
 		close(pipefd[1]);
+
+		/* give the parent time to run open_by_handle_at() */
+		usleep(10000);
+
 		exit(0);
 	}
 
-- 
2.52.0



* [PATCH 3/4] nstree: Fix spurious ENOENT in listns pagination during grace period
  2026-04-05 16:50 [PATCH 0/4] nstree: Fix spurious ENOENT in listns pagination during grace period Yohei Kojima
  2026-04-05 16:50 ` [PATCH 1/4] selftests/namespace: fix selftest hang-up caused by zombie processes Yohei Kojima
  2026-04-05 16:50 ` [PATCH 2/4] selftests/namespace: fix unintentional skip in ns_active_ref_test.c Yohei Kojima
@ 2026-04-05 16:50 ` Yohei Kojima
  2026-04-05 16:50 ` [PATCH 4/4] selftests/namespace: test spurious ENOENT bug in listns pagination Yohei Kojima
  3 siblings, 0 replies; 5+ messages in thread
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Yohei Kojima, linux-kernel

Fix false ENOENT returned from listns when (1) pagination is used
(req.ns_id != 0) and (2) listns tries to start enumeration from a
destroyed or inactive namespace.

The cause was that lookup_ns_id_at(kls->last_ns_id + 1, ...) returned
NULL if the first namespace after ns_id was destroyed or inactivated,
as shown below. (Note that the nstree can be viewed as a list, since it
is an rbtree sorted by ns id.)

A: active namespace
D: destroyed (or inactive) namespace

         +-----+-----+-----+-----+-----+-----+-----+-----+
state:   |  A  |  A  |  A  |  D  |  D  |  A  |  A  |  A  |
ns_id:   |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |
         +-----+-----+-----+-----+-----+-----+-----+-----+
	                |     |
		        |     +-- (kls->last_ns_id + 1)
		        +-- req.ns_id = 3

In this case, lookup_ns_id_at() returns NULL, which results in -ENOENT
being returned from do_listns() although three namespaces remain in the
nstree.

The bug is fixed by iterating over the nstree's internal list until it
reaches the first active namespace.

Fixes: 76b6f5dfb3fd ("nstree: add listns()")
Signed-off-by: Yohei Kojima <yk@y-koj.net>
---
 kernel/nstree.c | 68 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 46 insertions(+), 22 deletions(-)

diff --git a/kernel/nstree.c b/kernel/nstree.c
index 6d12e5900ac0..476d22203ee0 100644
--- a/kernel/nstree.c
+++ b/kernel/nstree.c
@@ -618,14 +618,32 @@ static ssize_t do_listns_userns(struct klistns *kls)
 	return ret;
 }
 
+static inline struct ns_common *next_ns_common(struct ns_common *ns,
+					       struct ns_tree_root *ns_tree)
+{
+	if (ns_tree)
+		return list_entry_rcu(ns->ns_tree_node.ns_list_entry.next, struct ns_common, ns_tree_node.ns_list_entry);
+	return list_entry_rcu(ns->ns_unified_node.ns_list_entry.next, struct ns_common, ns_unified_node.ns_list_entry);
+}
+
+static inline bool ns_common_is_head(struct ns_common *ns,
+				     const struct list_head *head,
+				     struct ns_tree_root *ns_tree)
+{
+	if (ns_tree)
+		return &ns->ns_tree_node.ns_list_entry == head;
+	return &ns->ns_unified_node.ns_list_entry == head;
+}
+
 /*
  * Lookup a namespace with id >= ns_id in either the unified tree or a type-specific tree.
  * Returns the namespace with the smallest id that is >= ns_id.
  */
 static struct ns_common *lookup_ns_id_at(u64 ns_id, int ns_type)
 {
-	struct ns_common *ret = NULL;
+	struct ns_common *min = NULL, *ret = NULL;
 	struct ns_tree_root *ns_tree = NULL;
+	struct list_head *head;
 	struct rb_node *node;
 
 	if (ns_type) {
@@ -651,9 +669,9 @@ static struct ns_common *lookup_ns_id_at(u64 ns_id, int ns_type)
 
 		if (ns_id <= ns->ns_id) {
 			if (ns_type)
-				ret = node_to_ns(node);
+				min = node_to_ns(node);
 			else
-				ret = node_to_ns_unified(node);
+				min = node_to_ns_unified(node);
 			if (ns_id == ns->ns_id)
 				break;
 			node = node->rb_left;
@@ -662,8 +680,31 @@ static struct ns_common *lookup_ns_id_at(u64 ns_id, int ns_type)
 		}
 	}
 
-	if (ret)
-		ret = ns_get_unless_inactive(ret);
+	if (!min)
+		return NULL;
+	/*
+	 * Now min->ns_id is the minimum id where min->ns_id >= ns_id holds,
+	 * but min could be inactive or destroyed here, therefore
+	 * ns_get_unless_inactive(min) could return NULL.
+	 *
+	 * To handle this case, try acquiring the next ns until it reaches the
+	 * first valid ns.
+	 */
+	if (ns_tree)
+		head = &ns_tree->ns_list_head;
+	else
+		head = &ns_unified_root.ns_list_head;
+
+	while (!ns_common_is_head(min, head, ns_tree)) {
+		ret = ns_get_unless_inactive(min);
+		if (ret)
+			break;
+
+		rcu_read_lock();
+		min = next_ns_common(min, ns_tree);
+		rcu_read_unlock();
+	}
+
 	return ret;
 }
 
@@ -675,23 +716,6 @@ static inline struct ns_common *first_ns_common(const struct list_head *head,
 	return list_entry_rcu(head->next, struct ns_common, ns_unified_node.ns_list_entry);
 }
 
-static inline struct ns_common *next_ns_common(struct ns_common *ns,
-					       struct ns_tree_root *ns_tree)
-{
-	if (ns_tree)
-		return list_entry_rcu(ns->ns_tree_node.ns_list_entry.next, struct ns_common, ns_tree_node.ns_list_entry);
-	return list_entry_rcu(ns->ns_unified_node.ns_list_entry.next, struct ns_common, ns_unified_node.ns_list_entry);
-}
-
-static inline bool ns_common_is_head(struct ns_common *ns,
-				     const struct list_head *head,
-				     struct ns_tree_root *ns_tree)
-{
-	if (ns_tree)
-		return &ns->ns_tree_node.ns_list_entry == head;
-	return &ns->ns_unified_node.ns_list_entry == head;
-}
-
 static ssize_t do_listns(struct klistns *kls)
 {
 	u64 __user *ns_ids = kls->uns_ids;
-- 
2.52.0



* [PATCH 4/4] selftests/namespace: test spurious ENOENT bug in listns pagination
  2026-04-05 16:50 [PATCH 0/4] nstree: Fix spurious ENOENT in listns pagination during grace period Yohei Kojima
                   ` (2 preceding siblings ...)
  2026-04-05 16:50 ` [PATCH 3/4] nstree: Fix spurious ENOENT in listns pagination during grace period Yohei Kojima
@ 2026-04-05 16:50 ` Yohei Kojima
  3 siblings, 0 replies; 5+ messages in thread
From: Yohei Kojima @ 2026-04-05 16:50 UTC (permalink / raw)
  To: Christian Brauner, Shuah Khan; +Cc: Yohei Kojima, linux-kselftest, linux-kernel

Test the spurious ENOENT that occurs when listns tries to start pagination
from an inactivated or destroyed namespace. The new test is almost identical
to pagination_with_type_filter, except that it calls run_noisy_children(),
which creates and lists namespaces to disturb the nstree.

As far as the author tested, this bug only reproduced in a bare-metal
environment, probably because the test relies on RCU behavior and
the kernel behaves differently in a VM.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
---
 .../namespaces/listns_pagination_bug.c        | 200 ++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/tools/testing/selftests/namespaces/listns_pagination_bug.c b/tools/testing/selftests/namespaces/listns_pagination_bug.c
index da7d33f96397..f71d8f4d64bb 100644
--- a/tools/testing/selftests/namespaces/listns_pagination_bug.c
+++ b/tools/testing/selftests/namespaces/listns_pagination_bug.c
@@ -135,4 +135,204 @@ TEST(pagination_with_type_filter)
 	}
 }
 
+static void run_noisy_children(int num_workers)
+{
+	struct ns_id_req req = {
+		.size = sizeof(req),
+		.spare = 0,
+		.ns_id = 0,
+		.ns_type = CLONE_NEWUSER,  /* Filter by user namespace */
+		.spare2 = 0,
+		.user_ns_id = 0,
+	};
+	pid_t pids[num_workers];
+	int num_forked = 0;
+	int i;
+
+	/*
+	 * Create worker processes that do concurrent operations;
+	 * most of this part is borrowed from concurrent_namespace_operations
+	 * test in stress_test.c
+	 */
+	for (i = 0; i < num_workers; i++) {
+		pids[i] = fork();
+		if (pids[i] < 0)
+			goto failure;
+		if (pids[i] > 0)
+			num_forked++;
+
+		if (pids[i] == 0) {
+			/* Each worker: create namespaces, list them, repeat */
+			int iterations;
+
+			for (iterations = 0; iterations < 10; iterations++) {
+				int userns_fd;
+				__u64 temp_ns_ids[100];
+				ssize_t ret;
+
+				/* Create a user namespace */
+				userns_fd = get_userns_fd(0, getuid(), 1);
+				if (userns_fd < 0)
+					continue;
+
+				/* List namespaces */
+				ret = sys_listns(&req, temp_ns_ids, ARRAY_SIZE(temp_ns_ids), 0);
+				(void)ret;
+
+				close(userns_fd);
+
+				/* Small delay */
+				usleep(1000);
+			}
+
+			exit(0);
+		}
+	}
+
+	/*
+	 * Return after waiting for children; this is enough for
+	 * reproduction and helps keep the test code simple.
+	 */
+	for (i = 0; i < num_forked; i++)
+		waitpid(pids[i], NULL, 0);
+
+	return;
+
+failure:
+	for (i = 0; i < num_forked; i++)
+		kill(pids[i], SIGKILL);
+	for (i = 0; i < num_forked; i++)
+		waitpid(pids[i], NULL, 0);
+}
+
+/*
+ * A test case to reproduce spurious ENOENT in listns pagination
+ *
+ * The bug occurs when the namespace to start pagination from is inactivated
+ * or destroyed before listns is called (or while listns is running).
+ *
+ * This test is almost identical to pagination_with_type_filter test
+ * except that this calls run_noisy_children().
+ */
+TEST(pagination_during_grace_period)
+{
+	struct ns_id_req req = {
+		.size = sizeof(req),
+		.spare = 0,
+		.ns_id = 0,
+		.ns_type = CLONE_NEWUSER,  /* Filter by user namespace */
+		.spare2 = 0,
+		.user_ns_id = 0,
+	};
+	pid_t pids[10];
+	int num_children = 10;
+	const int num_noisy_children = 10;
+	int i;
+	int sv[2];
+	__u64 first_batch[3];
+	ssize_t ret;
+
+	ASSERT_EQ(socketpair(AF_UNIX, SOCK_STREAM, 0, sv), 0);
+
+	run_noisy_children(num_noisy_children);
+
+	/* Create children with user namespaces */
+	for (i = 0; i < num_children; i++) {
+		pids[i] = fork();
+		ASSERT_GE(pids[i], 0);
+
+		if (pids[i] == 0) {
+			char c = 0;
+
+			close(sv[0]);
+
+			if (setup_userns() < 0) {
+				close(sv[1]);
+				exit(1);
+			}
+
+			/* Signal parent we're ready */
+			if (write(sv[1], &c, 1) != 1) {
+				close(sv[1]);
+				exit(1);
+			}
+
+			/* Wait for parent signal to exit */
+			if (read(sv[1], &c, 1) != 1) {
+				close(sv[1]);
+				exit(1);
+			}
+
+			close(sv[1]);
+			exit(0);
+		}
+	}
+
+	close(sv[1]);
+
+	/* Wait for all children to signal ready */
+	for (i = 0; i < num_children; i++) {
+		char c;
+
+		if (read(sv[0], &c, 1) != 1) {
+			close(sv[0]);
+			for (int j = 0; j < num_children; j++)
+				kill(pids[j], SIGKILL);
+			for (int j = 0; j < num_children; j++)
+				waitpid(pids[j], NULL, 0);
+			ASSERT_TRUE(false);
+		}
+	}
+
+	/* First batch - this should work */
+	ret = sys_listns(&req, first_batch, 3, 0);
+	if (ret < 0) {
+		if (errno == ENOSYS) {
+			close(sv[0]);
+			for (i = 0; i < num_children; i++)
+				kill(pids[i], SIGKILL);
+			for (i = 0; i < num_children; i++)
+				waitpid(pids[i], NULL, 0);
+			SKIP(return, "listns() not supported");
+		}
+		ASSERT_GE(ret, 0);
+	}
+
+	TH_LOG("First batch returned %zd entries", ret);
+
+	if (ret == 3) {
+		__u64 second_batch[3];
+
+		/* Second batch - pagination triggers the bug */
+		req.ns_id = first_batch[2];  /* Continue from last ID */
+		ret = sys_listns(&req, second_batch, 3, 0);
+
+		TH_LOG("Second batch returned %zd entries", ret);
+		ASSERT_GE(ret, 0);
+	}
+
+	/* Signal all children to exit */
+	for (i = 0; i < num_children; i++) {
+		char c = 'X';
+
+		if (write(sv[0], &c, 1) != 1) {
+			close(sv[0]);
+			for (int j = i; j < num_children; j++)
+				kill(pids[j], SIGKILL);
+			for (int j = 0; j < num_children; j++)
+				waitpid(pids[j], NULL, 0);
+			ASSERT_TRUE(false);
+		}
+	}
+
+	close(sv[0]);
+
+	/* Cleanup */
+	for (i = 0; i < num_children; i++) {
+		int status;
+
+		waitpid(pids[i], &status, 0);
+	}
+}
+
 TEST_HARNESS_MAIN
-- 
2.52.0

