From: Christian Brauner <brauner@kernel.org>
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH RFC v2 3/4] procfs: add PROCFS_GET_PID_NAMESPACE ioctl
Date: Thu, 24 Jul 2025 09:34:59 +0200 [thread overview]
Message-ID: <20250724-beobachten-verfassen-9a39c0318341@brauner> (raw)
In-Reply-To: <20250723-procfs-pidns-api-v2-3-621e7edd8e40@cyphar.com>
On Wed, Jul 23, 2025 at 09:18:53AM +1000, Aleksa Sarai wrote:
> /proc has historically had very opaque semantics about PID namespaces,
> which is a little unfortunate for container runtimes and other programs
> that deal with switching namespaces very often. One common issue is that
> of converting between PIDs in the process's namespace and PIDs in the
> namespace of /proc.
>
> In principle, it is possible to do this today by opening a pidfd with
> pidfd_open(2) and then looking at /proc/self/fdinfo/$n (which will
> contain a PID value translated to the pid namespace associated with that
> procfs superblock). However, allocating a new file for each PID to be
> converted is less than ideal for programs that may need to scan procfs,
> and it is generally useful for userspace to be able to finally get this
> information from procfs.
>
> So, add a new API for this in the form of an ioctl(2) you can call on
> the root directory of procfs. The returned file descriptor will have
> O_CLOEXEC set. This acts as a sister feature to the new "pidns" mount
> option, finally allowing userspace full control of the pid namespaces
> associated with procfs instances.
>
> The permission model for this is a bit looser than that of the "pidns"
> mount option, but this is mainly because /proc/1/ns/pid provides the
> same information, so as long as you have access to that magic-link (or
> something equivalently reasonable such as privileges with CAP_SYS_ADMIN
> or being in an ancestor pid namespace) it makes sense to allow userspace
> to grab a handle. setns(2) will still have their own permission checks,
> so being able to open a pidns handle doesn't really provide too many
> other capabilities.
>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---
> Documentation/filesystems/proc.rst | 4 +++
> fs/proc/root.c | 54 ++++++++++++++++++++++++++++++++++++--
> include/uapi/linux/fs.h | 3 +++
> 3 files changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index c520b9f8a3fd..506383273c9d 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -2398,6 +2398,10 @@ pidns= specifies a pid namespace (either as a string path to something like
> will be used by the procfs instance when translating pids. By default, procfs
> will use the calling process's active pid namespace.
>
> +Processes can check which pid namespace is used by a procfs instance by using
> +the `PROCFS_GET_PID_NAMESPACE` ioctl() on the root directory of the procfs
> +instance.
> +
> Chapter 5: Filesystem behavior
> ==============================
>
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 057c8a125c6e..548a57ec2152 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -23,8 +23,10 @@
> #include <linux/cred.h>
> #include <linux/magic.h>
> #include <linux/slab.h>
> +#include <linux/ptrace.h>
>
> #include "internal.h"
> +#include "../internal.h"
>
> struct proc_fs_context {
> struct pid_namespace *pid_ns;
> @@ -418,15 +420,63 @@ static int proc_root_readdir(struct file *file, struct dir_context *ctx)
> return proc_pid_readdir(file, ctx);
> }
>
> +static long int proc_root_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> +{
> + switch (cmd) {
> +#ifdef CONFIG_PID_NS
> + case PROCFS_GET_PID_NAMESPACE: {
> + struct pid_namespace *active = task_active_pid_ns(current);
> + struct pid_namespace *ns = proc_pid_ns(file_inode(filp)->i_sb);
> + bool can_access_pidns = false;
> +
> + /*
> + * If we are in an ancestors of the pidns, or have join
> + * privileges (CAP_SYS_ADMIN), then it makes sense that we
> + * would be able to grab a handle to the pidns.
> + *
> + * Otherwise, if there is a root process, then being able to
> + * access /proc/$pid/ns/pid is equivalent to this ioctl and so
> + * we should probably match the permission model. For empty
> + * namespaces it seems unlikely for there to be a downside to
> + * allowing unprivileged users to open a handle to it (setns
> + * will fail for unprivileged users anyway).
> + */
> + can_access_pidns = pidns_is_ancestor(ns, active) ||
> + ns_capable(ns->user_ns, CAP_SYS_ADMIN);
This seems to imply that if @ns is a descendant of @active that the
caller holds privileges over it. Is that actually always true?
IOW, why is the check different from the previous pidns= mount option
check. I would've expected:
ns_capable(_no_audit)(ns->user_ns) && pidns_is_ancestor(ns, active)
and then the ptrace check as a fallback.
> + if (!can_access_pidns) {
> + bool cannot_ptrace_pid1 = false;
> +
> + read_lock(&tasklist_lock);
> + if (ns->child_reaper)
> + cannot_ptrace_pid1 = ptrace_may_access(ns->child_reaper,
> + PTRACE_MODE_READ_FSCREDS);
> + read_unlock(&tasklist_lock);
> + can_access_pidns = !cannot_ptrace_pid1;
> + }
> + if (!can_access_pidns)
> + return -EPERM;
> +
> + /* open_namespace() unconditionally consumes the reference. */
> + get_pid_ns(ns);
> + return open_namespace(to_ns_common(ns));
> + }
> +#endif /* CONFIG_PID_NS */
> + default:
> + return -ENOIOCTLCMD;
> + }
> +}
> +
> /*
> * The root /proc directory is special, as it has the
> * <pid> directories. Thus we don't use the generic
> * directory handling functions for that..
> */
> static const struct file_operations proc_root_operations = {
> - .read = generic_read_dir,
> - .iterate_shared = proc_root_readdir,
> + .read = generic_read_dir,
> + .iterate_shared = proc_root_readdir,
> .llseek = generic_file_llseek,
> + .unlocked_ioctl = proc_root_ioctl,
> + .compat_ioctl = compat_ptr_ioctl,
> };
>
> /*
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 0bd678a4a10e..aa642cb48feb 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -437,6 +437,9 @@ typedef int __bitwise __kernel_rwf_t;
>
> #define PROCFS_IOCTL_MAGIC 'f'
>
> +/* procfs root ioctls */
> +#define PROCFS_GET_PID_NAMESPACE _IO(PROCFS_IOCTL_MAGIC, 1)
> +
> /* Pagemap ioctl */
> #define PAGEMAP_SCAN _IOWR(PROCFS_IOCTL_MAGIC, 16, struct pm_scan_arg)
>
>
> --
> 2.50.0
>
next prev parent reply other threads:[~2025-07-24 7:35 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-22 23:18 [PATCH RFC v2 0/4] procfs: make reference pidns more user-visible Aleksa Sarai
2025-07-22 23:18 ` [PATCH RFC v2 1/4] pidns: move is-ancestor logic to helper Aleksa Sarai
2025-07-24 7:06 ` Christian Brauner
2025-07-22 23:18 ` [PATCH RFC v2 2/4] procfs: add "pidns" mount option Aleksa Sarai
2025-07-24 7:25 ` Christian Brauner
2025-07-25 2:13 ` Aleksa Sarai
2025-07-22 23:18 ` [PATCH RFC v2 3/4] procfs: add PROCFS_GET_PID_NAMESPACE ioctl Aleksa Sarai
2025-07-24 7:34 ` Christian Brauner [this message]
2025-07-25 2:24 ` Aleksa Sarai
2025-07-31 10:31 ` Christian Brauner
2025-07-31 14:21 ` Aleksa Sarai
2025-07-22 23:18 ` [PATCH RFC v2 4/4] selftests/proc: add tests for new pidns APIs Aleksa Sarai
2025-07-24 7:36 ` [PATCH RFC v2 0/4] procfs: make reference pidns more user-visible Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250724-beobachten-verfassen-9a39c0318341@brauner \
--to=brauner@kernel.org \
--cc=corbet@lwn.net \
--cc=cyphar@cyphar.com \
--cc=jack@suse.cz \
--cc=linux-api@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=shuah@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox