* [PATCH RFC 1/4] file: add take_fd() cleanup helper
2024-06-27 14:11 [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors Christian Brauner
@ 2024-06-27 14:11 ` Christian Brauner
2024-06-27 17:24 ` Josef Bacik
2024-06-27 14:11 ` [PATCH RFC 2/4] nsproxy: add a cleanup helper for nsproxy Christian Brauner
` (5 subsequent siblings)
6 siblings, 1 reply; 10+ messages in thread
From: Christian Brauner @ 2024-06-27 14:11 UTC (permalink / raw)
To: linux-fsdevel
Cc: Josef Bacik, Seth Forshee, Stephane Graber, Jeff Layton,
Aleksa Sarai, Alexander Mikhalitsyn, Christian Brauner
Add a helper that returns the file descriptor and ensures that the old
variable contains a negative value. This makes it easy to rely on
CLASS(get_unused_fd).
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
include/linux/cleanup.h | 13 ++++++++-----
include/linux/file.h | 2 ++
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index c2d09bc4f976..80c4181e194a 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -63,17 +63,20 @@
#define __free(_name) __cleanup(__free_##_name)
-#define __get_and_null_ptr(p) \
- ({ __auto_type __ptr = &(p); \
- __auto_type __val = *__ptr; \
- *__ptr = NULL; __val; })
+#define __get_and_null(p, nullvalue) \
+ ({ \
+ __auto_type __ptr = &(p); \
+ __auto_type __val = *__ptr; \
+ *__ptr = nullvalue; \
+ __val; \
+ })
static inline __must_check
const volatile void * __must_check_fn(const volatile void *val)
{ return val; }
#define no_free_ptr(p) \
- ((typeof(p)) __must_check_fn(__get_and_null_ptr(p)))
+ ((typeof(p)) __must_check_fn(__get_and_null(p, NULL)))
#define return_ptr(p) return no_free_ptr(p)
diff --git a/include/linux/file.h b/include/linux/file.h
index 45d0f4800abd..3ea7f2452f20 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -97,6 +97,8 @@ extern void put_unused_fd(unsigned int fd);
DEFINE_CLASS(get_unused_fd, int, if (_T >= 0) put_unused_fd(_T),
get_unused_fd_flags(flags), unsigned flags)
+#define take_fd(fd) __get_and_null(fd, -EBADF)
+
extern void fd_install(unsigned int fd, struct file *file);
int receive_fd(struct file *file, int __user *ufd, unsigned int o_flags);
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH RFC 1/4] file: add take_fd() cleanup helper
2024-06-27 14:11 ` [PATCH RFC 1/4] file: add take_fd() cleanup helper Christian Brauner
@ 2024-06-27 17:24 ` Josef Bacik
2024-06-28 8:37 ` Christian Brauner
0 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2024-06-27 17:24 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, Seth Forshee, Stephane Graber, Jeff Layton,
Aleksa Sarai, Alexander Mikhalitsyn
On Thu, Jun 27, 2024 at 04:11:39PM +0200, Christian Brauner wrote:
> Add a helper that returns the file descriptor and ensures that the old
> variable contains a negative value. This makes it easy to rely on
> CLASS(get_unused_fd).
>
Can we get an extra bit of explanation here, because I had to go read a bunch of
code to figure out what exactly was happening here. Something like
This makes it easy to rely on CLASS(get_unused_fd) for success, as the fd will
be returned and the cleanup will not occur.
Or something like this. Some of us are dumb and have a hard time with these new
cleanup uses. Thanks,
Josef
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 1/4] file: add take_fd() cleanup helper
2024-06-27 17:24 ` Josef Bacik
@ 2024-06-28 8:37 ` Christian Brauner
0 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2024-06-28 8:37 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-fsdevel, Seth Forshee, Stephane Graber, Jeff Layton,
Aleksa Sarai, Alexander Mikhalitsyn
On Thu, Jun 27, 2024 at 01:24:48PM GMT, Josef Bacik wrote:
> On Thu, Jun 27, 2024 at 04:11:39PM +0200, Christian Brauner wrote:
> > Add a helper that returns the file descriptor and ensures that the old
> > variable contains a negative value. This makes it easy to rely on
> > CLASS(get_unused_fd).
> >
>
> Can we get an extra bit of explanation here, because I had to go read a bunch of
> code to figure out what exactly was happening here. Something like
>
> This makes it easy to rely on CLASS(get_unused_fd) for success, as the fd will
> be returned and the cleanup will not occur.
Yes, absolutely.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH RFC 2/4] nsproxy: add a cleanup helper for nsproxy
2024-06-27 14:11 [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors Christian Brauner
2024-06-27 14:11 ` [PATCH RFC 1/4] file: add take_fd() cleanup helper Christian Brauner
@ 2024-06-27 14:11 ` Christian Brauner
2024-06-27 14:11 ` [PATCH RFC 3/4] nsfs: add open_namespace() Christian Brauner
` (4 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2024-06-27 14:11 UTC (permalink / raw)
To: linux-fsdevel
Cc: Josef Bacik, Seth Forshee, Stephane Graber, Jeff Layton,
Aleksa Sarai, Alexander Mikhalitsyn, Christian Brauner
Add a simple cleanup helper for nsproxy.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
include/linux/nsproxy.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 5601d14e2886..e6bec522b139 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -112,4 +112,6 @@ static inline void get_nsproxy(struct nsproxy *ns)
refcount_inc(&ns->count);
}
+DEFINE_FREE(put_nsproxy, struct nsproxy *, if (_T) put_nsproxy(_T))
+
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH RFC 3/4] nsfs: add open_namespace()
2024-06-27 14:11 [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors Christian Brauner
2024-06-27 14:11 ` [PATCH RFC 1/4] file: add take_fd() cleanup helper Christian Brauner
2024-06-27 14:11 ` [PATCH RFC 2/4] nsproxy: add a cleanup helper for nsproxy Christian Brauner
@ 2024-06-27 14:11 ` Christian Brauner
2024-06-27 14:11 ` [PATCH RFC 4/4] pidfs: allow retrieval of namespace file descriptors Christian Brauner
` (3 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2024-06-27 14:11 UTC (permalink / raw)
To: linux-fsdevel
Cc: Josef Bacik, Seth Forshee, Stephane Graber, Jeff Layton,
Aleksa Sarai, Alexander Mikhalitsyn, Christian Brauner
and call it from open_related_ns().
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
fs/internal.h | 1 +
fs/nsfs.c | 55 +++++++++++++++++++++++++++++++------------------------
2 files changed, 32 insertions(+), 24 deletions(-)
diff --git a/fs/internal.h b/fs/internal.h
index ab2225136f60..1ece6a3d34cb 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -321,3 +321,4 @@ struct stashed_operations {
int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
struct path *path);
void stashed_dentry_prune(struct dentry *dentry);
+int open_namespace(struct ns_common *ns);
diff --git a/fs/nsfs.c b/fs/nsfs.c
index af352dadffe1..c35bc3ca466f 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -83,40 +83,47 @@ int ns_get_path(struct path *path, struct task_struct *task,
return ns_get_path_cb(path, ns_get_path_task, &args);
}
-int open_related_ns(struct ns_common *ns,
- struct ns_common *(*get_ns)(struct ns_common *ns))
+/**
+ * open_namespace - open a namespace
+ * @ns: the namespace to open
+ *
+ * This will consume a reference to @ns indendent of success or failure.
+ *
+ * Return: A file descriptor on success or a negative error code on failure.
+ */
+int open_namespace(struct ns_common *ns)
{
- struct path path = {};
- struct ns_common *relative;
+ struct path path __free(path_put) = {};
struct file *f;
int err;
- int fd;
- fd = get_unused_fd_flags(O_CLOEXEC);
+ /* call first to consume reference */
+ err = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path);
+ if (err < 0)
+ return err;
+
+ CLASS(get_unused_fd, fd)(O_CLOEXEC);
if (fd < 0)
return fd;
+ f = dentry_open(&path, O_RDONLY, current_cred());
+ if (IS_ERR(f))
+ return PTR_ERR(f);
+
+ fd_install(fd, f);
+ return take_fd(fd);
+}
+
+int open_related_ns(struct ns_common *ns,
+ struct ns_common *(*get_ns)(struct ns_common *ns))
+{
+ struct ns_common *relative;
+
relative = get_ns(ns);
- if (IS_ERR(relative)) {
- put_unused_fd(fd);
+ if (IS_ERR(relative))
return PTR_ERR(relative);
- }
- err = path_from_stashed(&relative->stashed, nsfs_mnt, relative, &path);
- if (err < 0) {
- put_unused_fd(fd);
- return err;
- }
-
- f = dentry_open(&path, O_RDONLY, current_cred());
- path_put(&path);
- if (IS_ERR(f)) {
- put_unused_fd(fd);
- fd = PTR_ERR(f);
- } else
- fd_install(fd, f);
-
- return fd;
+ return open_namespace(relative);
}
EXPORT_SYMBOL_GPL(open_related_ns);
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH RFC 4/4] pidfs: allow retrieval of namespace file descriptors
2024-06-27 14:11 [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors Christian Brauner
` (2 preceding siblings ...)
2024-06-27 14:11 ` [PATCH RFC 3/4] nsfs: add open_namespace() Christian Brauner
@ 2024-06-27 14:11 ` Christian Brauner
2024-06-27 16:51 ` [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors Jeff Layton
` (2 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2024-06-27 14:11 UTC (permalink / raw)
To: linux-fsdevel
Cc: Josef Bacik, Seth Forshee, Stephane Graber, Jeff Layton,
Aleksa Sarai, Alexander Mikhalitsyn, Christian Brauner
For users that hold a reference to a pidfd procfs might not even be
available nor is it desirable to parse through procfs just for the sake
of getting namespace file descriptors for a process.
Make it possible to directly retrieve namespace file descriptors from a
pidfd. Pidfds already can be used with setns() to change a set of
namespaces atomically.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
fs/pidfs.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++
include/uapi/linux/pidfd.h | 14 +++++++
2 files changed, 106 insertions(+)
diff --git a/fs/pidfs.c b/fs/pidfs.c
index dbb9d854d1c5..957284e8b2dd 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -11,10 +11,16 @@
#include <linux/proc_fs.h>
#include <linux/proc_ns.h>
#include <linux/pseudo_fs.h>
+#include <linux/ptrace.h>
#include <linux/seq_file.h>
#include <uapi/linux/pidfd.h>
+#include <linux/ipc_namespace.h>
+#include <linux/time_namespace.h>
+#include <linux/utsname.h>
+#include <net/net_namespace.h>
#include "internal.h"
+#include "mount.h"
#ifdef CONFIG_PROC_FS
/**
@@ -108,11 +114,97 @@ static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
return poll_flags;
}
+static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct task_struct *task __free(put_task) = NULL;
+ struct nsproxy *nsp __free(put_nsproxy) = NULL;
+ struct user_namespace *user_ns = NULL;
+ struct pid_namespace *pid_ns = NULL;
+ struct pid *pid = pidfd_pid(file);
+ struct ns_common *ns_common;
+
+ if (arg)
+ return -EINVAL;
+
+ task = get_pid_task(pid, PIDTYPE_PID);
+ if (!task)
+ return -ESRCH;
+
+ scoped_guard(task_lock, task) {
+ nsp = task->nsproxy;
+ if (nsp)
+ get_nsproxy(nsp);
+ }
+ if (!nsp)
+ return -ESRCH; /* just pretend it didn't exist */
+
+ /*
+ * We're trying to open a file descriptor to the namespace so perform a
+ * filesystem cred ptrace check. Also, we mirror nsfs behavior.
+ */
+ if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
+ return -EACCES;
+
+ switch (cmd) {
+ case PIDFD_GET_CGROUP_NAMESPACE:
+ get_cgroup_ns(nsp->cgroup_ns);
+ ns_common = &nsp->cgroup_ns->ns;
+ break;
+ case PIDFD_GET_IPC_NAMESPACE:
+ get_ipc_ns(nsp->ipc_ns);
+ ns_common = &nsp->ipc_ns->ns;
+ break;
+ case PIDFD_GET_MNT_NAMESPACE:
+ get_mnt_ns(nsp->mnt_ns);
+ ns_common = &nsp->mnt_ns->ns;
+ break;
+ case PIDFD_GET_NET_NAMESPACE:
+ ns_common = &nsp->net_ns->ns;
+ get_net_ns(ns_common);
+ break;
+ case PIDFD_GET_PID_NAMESPACE:
+ rcu_read_lock();
+ pid_ns = get_pid_ns(task_active_pid_ns(task));
+ rcu_read_unlock();
+ ns_common = &pid_ns->ns;
+ break;
+ case PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE:
+ get_pid_ns(nsp->pid_ns_for_children);
+ ns_common = &nsp->pid_ns_for_children->ns;
+ break;
+ case PIDFD_GET_TIME_NAMESPACE:
+ get_time_ns(nsp->time_ns);
+ ns_common = &nsp->time_ns->ns;
+ break;
+ case PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE:
+ get_time_ns(nsp->time_ns_for_children);
+ ns_common = &nsp->time_ns_for_children->ns;
+ break;
+ case PIDFD_GET_USER_NAMESPACE:
+ rcu_read_lock();
+ user_ns = get_user_ns(task_cred_xxx(task, user_ns));
+ rcu_read_unlock();
+ ns_common = &user_ns->ns;
+ break;
+ case PIDFD_GET_UTS_NAMESPACE:
+ get_uts_ns(nsp->uts_ns);
+ ns_common = &nsp->uts_ns->ns;
+ break;
+ default:
+ return -ENOIOCTLCMD;
+ }
+
+ /* open_namespace() unconditionally consumes the reference */
+ return open_namespace(ns_common);
+}
+
static const struct file_operations pidfs_file_operations = {
.poll = pidfd_poll,
#ifdef CONFIG_PROC_FS
.show_fdinfo = pidfd_show_fdinfo,
#endif
+ .unlocked_ioctl = pidfd_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
};
struct pid *pidfd_pid(const struct file *file)
diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
index 72ec000a97cd..565fc0629fff 100644
--- a/include/uapi/linux/pidfd.h
+++ b/include/uapi/linux/pidfd.h
@@ -5,6 +5,7 @@
#include <linux/types.h>
#include <linux/fcntl.h>
+#include <linux/ioctl.h>
/* Flags for pidfd_open(). */
#define PIDFD_NONBLOCK O_NONBLOCK
@@ -15,4 +16,17 @@
#define PIDFD_SIGNAL_THREAD_GROUP (1UL << 1)
#define PIDFD_SIGNAL_PROCESS_GROUP (1UL << 2)
+#define PIDFS_IOCTL_MAGIC 0xFF
+
+#define PIDFD_GET_CGROUP_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 1)
+#define PIDFD_GET_IPC_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 2)
+#define PIDFD_GET_MNT_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 3)
+#define PIDFD_GET_NET_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 4)
+#define PIDFD_GET_PID_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 5)
+#define PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 6)
+#define PIDFD_GET_TIME_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 7)
+#define PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 8)
+#define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9)
+#define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10)
+
#endif /* _UAPI_LINUX_PIDFD_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors
2024-06-27 14:11 [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors Christian Brauner
` (3 preceding siblings ...)
2024-06-27 14:11 ` [PATCH RFC 4/4] pidfs: allow retrieval of namespace file descriptors Christian Brauner
@ 2024-06-27 16:51 ` Jeff Layton
2024-06-27 17:26 ` Josef Bacik
2024-06-27 20:05 ` Alexander Mikhalitsyn
6 siblings, 0 replies; 10+ messages in thread
From: Jeff Layton @ 2024-06-27 16:51 UTC (permalink / raw)
To: Christian Brauner, linux-fsdevel
Cc: Josef Bacik, Seth Forshee, Stephane Graber, Aleksa Sarai,
Alexander Mikhalitsyn
On Thu, 2024-06-27 at 16:11 +0200, Christian Brauner wrote:
> In recent discussions it became clear that having the ability to go
> from
> pidfd to namespace file descriptor is desirable. Not just because it
> is
> already possible to use pidfds with setns() to switch namespaces
> atomically but also because it makes it possible to interact with
> namespaces without having procfs mounted solely relying on the pidfd.
>
> This adds support from deriving a namespace file descriptor from a
> pidfd for all namespace types.
>
> Thanks!
> Christian
>
> ---
> ---
> base-commit: 2a79498f76350570427af72da04b1c7d0e24149e
> change-id: 20240627-work-pidfs-fd415f4d3cd1
>
Neat. I'm not too familiar with all of the CLASS() macro stuff so it
took me a minute to unwind, but this all looks pretty straightforward.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors
2024-06-27 14:11 [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors Christian Brauner
` (4 preceding siblings ...)
2024-06-27 16:51 ` [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors Jeff Layton
@ 2024-06-27 17:26 ` Josef Bacik
2024-06-27 20:05 ` Alexander Mikhalitsyn
6 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2024-06-27 17:26 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, Seth Forshee, Stephane Graber, Jeff Layton,
Aleksa Sarai, Alexander Mikhalitsyn
On Thu, Jun 27, 2024 at 04:11:38PM +0200, Christian Brauner wrote:
> In recent discussions it became clear that having the ability to go from
> pidfd to namespace file descriptor is desirable. Not just because it is
> already possible to use pidfds with setns() to switch namespaces
> atomically but also because it makes it possible to interact with
> namespaces without having procfs mounted solely relying on the pidfd.
>
> This adds support from deriving a namespace file descriptor from a
> pidfd for all namespace types.
>
I love this, just have the one comment about documenting how the take_fd() thing
works, but then you can add
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors
2024-06-27 14:11 [PATCH RFC 0/4] pidfs: allow retrieval of namespace descriptors Christian Brauner
` (5 preceding siblings ...)
2024-06-27 17:26 ` Josef Bacik
@ 2024-06-27 20:05 ` Alexander Mikhalitsyn
6 siblings, 0 replies; 10+ messages in thread
From: Alexander Mikhalitsyn @ 2024-06-27 20:05 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, Josef Bacik, Seth Forshee, Stephane Graber,
Jeff Layton, Aleksa Sarai
Am Do., 27. Juni 2024 um 16:11 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> In recent discussions it became clear that having the ability to go from
> pidfd to namespace file descriptor is desirable. Not just because it is
> already possible to use pidfds with setns() to switch namespaces
> atomically but also because it makes it possible to interact with
> namespaces without having procfs mounted solely relying on the pidfd.
>
> This adds support from deriving a namespace file descriptor from a
> pidfd for all namespace types.
Extremely useful API. Thanks!
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>
> Thanks!
> Christian
>
> ---
> ---
> base-commit: 2a79498f76350570427af72da04b1c7d0e24149e
> change-id: 20240627-work-pidfs-fd415f4d3cd1
>
^ permalink raw reply [flat|nested] 10+ messages in thread