From: Aleksa Sarai <cyphar@cyphar.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
	 Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	 Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	 linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
	 linux-kselftest@vger.kernel.org,
	Aleksa Sarai <cyphar@cyphar.com>
Subject: [PATCH RFC 3/4] procfs: add PROCFS_GET_PID_NAMESPACE ioctl
Date: Mon, 21 Jul 2025 18:44:13 +1000
Message-ID: <20250721-procfs-pidns-api-v1-3-5cd9007e512d@cyphar.com>
In-Reply-To: <20250721-procfs-pidns-api-v1-0-5cd9007e512d@cyphar.com>

/proc has historically had very opaque semantics about PID namespaces,
which is a little unfortunate for container runtimes and other programs
that deal with switching namespaces very often. One common issue is that
of converting between PIDs in the process's namespace and PIDs in the
namespace of /proc.

In principle, it is possible to do this today by opening a pidfd with
pidfd_open(2) and then looking at /proc/self/fdinfo/$n (which will
contain a PID value translated to the pid namespace associated with that
procfs superblock).
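
As a rough illustration of that existing workaround, here is a minimal
userspace sketch in C. The helper names (parse_fdinfo_pid,
pid_in_procfs_ns) are hypothetical and not part of this patch. It
assumes the pidfd fdinfo text carries a "Pid:" field and that "self"
resolves inside the target procfs instance, which will not be the case
if the caller is not visible in that instance's pid namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Extract the value of the "Pid:" field from pidfd fdinfo text. */
static int parse_fdinfo_pid(const char *text, long *pid_out)
{
	assert(text != NULL && pid_out != NULL);

	for (const char *line = text; line && *line; ) {
		if (strncmp(line, "Pid:", 4) == 0) {
			char *end;

			errno = 0;
			long val = strtol(line + 4, &end, 10);
			if (errno || end == line + 4)
				return -1;
			*pid_out = val;
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Translate @pid (as seen by the caller) into the pid namespace of the
 * procfs instance mounted at @procfs_root. Returns the "Pid:" value
 * reported by that procfs, or -1 on error.
 */
static long pid_in_procfs_ns(const char *procfs_root, pid_t pid)
{
	char path[4096], buf[1024];
	long translated = -1;
	size_t len;
	FILE *f;
	int pidfd;

	/* One new file is allocated per PID: the cost this patch avoids. */
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/self/fdinfo/%d", procfs_root, pidfd);
	f = fopen(path, "r");
	if (f) {
		len = fread(buf, 1, sizeof(buf) - 1, f);
		buf[len] = '\0';
		fclose(f);
		if (parse_fdinfo_pid(buf, &translated) < 0)
			translated = -1;
	}
	close(pidfd);
	return translated;
}
```

A caller scanning many processes would have to repeat the
pidfd_open(2), read and close(2) sequence for every PID.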

However, allocating a new file for each PID to be converted is less than
ideal for programs that may need to scan procfs, and it is generally
useful for userspace to be able to ask procfs directly which pid
namespace it uses. Add a PROCFS_GET_PID_NAMESPACE ioctl on the procfs
root directory that returns a file descriptor for the pid namespace used
by that procfs instance. This also acts as a sister feature to the
pidns= mount option, giving userspace full control of the pid namespaces
associated with /proc instances.
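
For illustration only, a userspace caller might use the new ioctl
roughly as follows. This is a sketch, not part of the patch: the helper
name procfs_get_pidns is hypothetical, and the fallback definitions
simply mirror the values this patch adds to include/uapi/linux/fs.h, for
building against headers that do not have them yet.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Fallbacks mirroring the uapi additions in this patch. */
#ifndef PROCFS_IOCTL_MAGIC
#define PROCFS_IOCTL_MAGIC 'f'
#endif
#ifndef PROCFS_GET_PID_NAMESPACE
#define PROCFS_GET_PID_NAMESPACE _IO(PROCFS_IOCTL_MAGIC, 1)
#endif

/*
 * Open the root directory of the procfs instance at @procfs_root and
 * ask for a handle to its pid namespace. Returns a new file descriptor
 * on success, or -1 with errno set on failure.
 */
static int procfs_get_pidns(const char *procfs_root)
{
	int rootfd, nsfd, saved_errno;

	assert(procfs_root != NULL);

	rootfd = open(procfs_root, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (rootfd < 0)
		return -1;

	/* The ioctl takes no argument; the result is the namespace fd. */
	nsfd = ioctl(rootfd, PROCFS_GET_PID_NAMESPACE);

	/* Preserve the ioctl's errno across close(). */
	saved_errno = errno;
	close(rootfd);
	errno = saved_errno;
	return nsfd;
}
```

On a kernel without this patch the ioctl is expected to fail (most
likely with ENOTTY), so callers would need a fallback path.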

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 Documentation/filesystems/proc.rst |  4 +++
 fs/proc/root.c                     | 52 ++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/fs.h            |  3 +++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index c520b9f8a3fd..506383273c9d 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -2398,6 +2398,10 @@ pidns= specifies a pid namespace (either as a string path to something like
 will be used by the procfs instance when translating pids. By default, procfs
 will use the calling process's active pid namespace.
 
+Processes can check which pid namespace is used by a procfs instance by using
+the ``PROCFS_GET_PID_NAMESPACE`` ioctl() on the root directory of the procfs
+instance.
+
 Chapter 5: Filesystem behavior
 ==============================
 
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 10ca94be0eef..ee90749ccd8e 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -23,8 +23,10 @@
 #include <linux/cred.h>
 #include <linux/magic.h>
 #include <linux/slab.h>
+#include <linux/ptrace.h>
 
 #include "internal.h"
+#include "../internal.h"
 
 struct proc_fs_context {
 	struct pid_namespace	*pid_ns;
@@ -408,15 +410,61 @@ static int proc_root_readdir(struct file *file, struct dir_context *ctx)
 	return proc_pid_readdir(file, ctx);
 }
 
+static long int proc_root_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	switch (cmd) {
+	case PROCFS_GET_PID_NAMESPACE: {
+		struct pid_namespace *active = task_active_pid_ns(current);
+		struct pid_namespace *ns = proc_pid_ns(file_inode(filp)->i_sb);
+		bool can_access_pidns = false;
+
+		/*
+		 * If we are in an ancestor of the pidns, or have join
+		 * privileges (CAP_SYS_ADMIN), then it makes sense that we
+		 * would be able to grab a handle to the pidns.
+		 *
+		 * Otherwise, if there is a root process, then being able to
+		 * access /proc/$pid/ns/pid is equivalent to this ioctl and so
+		 * we should probably match the permission model. For empty
+		 * namespaces it seems unlikely for there to be a downside to
+		 * allowing unprivileged users to open a handle to it (setns
+		 * will fail for unprivileged users anyway).
+		 */
+		can_access_pidns = pidns_is_ancestor(ns, active) ||
+				   ns_capable(ns->user_ns, CAP_SYS_ADMIN);
+		if (!can_access_pidns) {
+			bool cannot_ptrace_pid1 = false;
+
+			read_lock(&tasklist_lock);
+			if (ns->child_reaper)
+				cannot_ptrace_pid1 = !ptrace_may_access(ns->child_reaper,
+									PTRACE_MODE_READ_FSCREDS);
+			read_unlock(&tasklist_lock);
+			can_access_pidns = !cannot_ptrace_pid1;
+		}
+		if (!can_access_pidns)
+			return -EPERM;
+
+		/* open_namespace() unconditionally consumes the reference. */
+		get_pid_ns(ns);
+		return open_namespace(to_ns_common(ns));
+	}
+	default:
+		return -ENOIOCTLCMD;
+	}
+}
+
 /*
  * The root /proc directory is special, as it has the
  * <pid> directories. Thus we don't use the generic
  * directory handling functions for that..
  */
 static const struct file_operations proc_root_operations = {
-	.read		 = generic_read_dir,
-	.iterate_shared	 = proc_root_readdir,
+	.read		= generic_read_dir,
+	.iterate_shared	= proc_root_readdir,
 	.llseek		= generic_file_llseek,
+	.unlocked_ioctl = proc_root_ioctl,
+	.compat_ioctl   = compat_ptr_ioctl,
 };
 
 /*
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 0bd678a4a10e..aa642cb48feb 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -437,6 +437,9 @@ typedef int __bitwise __kernel_rwf_t;
 
 #define PROCFS_IOCTL_MAGIC 'f'
 
+/* procfs root ioctls */
+#define PROCFS_GET_PID_NAMESPACE	_IO(PROCFS_IOCTL_MAGIC, 1)
+
 /* Pagemap ioctl */
 #define PAGEMAP_SCAN	_IOWR(PROCFS_IOCTL_MAGIC, 16, struct pm_scan_arg)
 

-- 
2.50.0


Thread overview: 8+ messages
2025-07-21  8:44 [PATCH RFC 0/4] procfs: make reference pidns more user-visible Aleksa Sarai
2025-07-21  8:44 ` [PATCH RFC 1/4] pidns: move is-ancestor logic to helper Aleksa Sarai
2025-07-21  8:44 ` [PATCH RFC 2/4] procfs: add pidns= mount option Aleksa Sarai
2025-07-21  8:44 ` Aleksa Sarai [this message]
2025-07-21  8:44 ` [PATCH RFC 4/4] selftests/proc: add tests for new pidns APIs Aleksa Sarai
2025-07-21 14:54 ` [PATCH RFC 0/4] procfs: make reference pidns more user-visible Andy Lutomirski
2025-07-21 15:19   ` Aleksa Sarai
2025-07-23 23:55     ` Aleksa Sarai
