linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] fs: introduce dedicated idmap type for mounts
@ 2022-10-28 11:10 Christian Brauner
  2022-10-28 11:10 ` [PATCH 2/2] acl: conver higher-level helpers to rely on mnt_idmap Christian Brauner
  2022-10-31 14:32 ` [PATCH 1/2] fs: introduce dedicated idmap type for mounts Seth Forshee
  0 siblings, 2 replies; 4+ messages in thread
From: Christian Brauner @ 2022-10-28 11:10 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christian Brauner, Al Viro, Christoph Hellwig, Seth Forshee

Last cycle we've already made the interaction with idmapped mounts more
robust and type safe by introducing the vfs{g,u}id_t type. This cycle we
concluded the conversion and removed the legacy helpers.

Currently we still pass around the plain namespace that was attached to
a mount. This is in general pretty convenient but it makes it easy to
conflate filesystem and mount namespaces and what different roles they
have to play. Especially for filesystem developers without much
experience in this area this is an easy source for bugs.

Instead of passing the plain namespace we introduce a dedicated type
struct mnt_idmap and replace the pointer with a pointer to a struct
mnt_idmap. There are no semantic or size changes for the mount struct
caused by this.

We then start converting all places aware of idmapped mounts to rely on
struct mnt_idmap. Once the conversion is done all helpers down to the
really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take a
struct mnt_idmap argument instead of two namespace arguments. This way
it becomes impossible to conflate the two removing and thus eliminating
the possibility of any bugs. Fwiw, I fixed some issues in that area a
while ago in ntfs3 and ksmbd in the past. Afterwards only low-level code
can ultimately use the associated namespace for any permission checks.
Even most of the vfs can be completely obivious about this ultimately
and filesystems will never interact with it in any form in the future.

A struct mnt_idmap currently encompasses a simple refcount a pointer to
the relevant namespace the mount is idmapped to. If a mount isn't
idmapped then it will point to a static nop_mnt_idmap and if it doesn't
that it is idmapped. As usual there are no allocations or anything
happening for non-idmapped mounts. Everthing is carefully written to be
a nop for non-idmapped mounts as has always been the case.

If an idmapped mount is created a struct mnt_idmap is allocated and a
reference taken on the relevant namespace. Each mount that gets idmapped
or inherits the idmap simply bumps the reference count on struct
mnt_idmap. Just a reminder that we only allow a mount to change it's
idmapping a single time and only if it hasn't already been attached to
the filesystems and has no active writers.

The actual changes are fairly straightforward but this will have huge
benefits for maintenance and security in the long run even if it causes
some churn. I'm aware that there's some cost for all of you. And I'll
commit to doing this work and make this as painless as I can.

Note that this also makes it possible to extend struct mount_idmap in
the future. For example, it would be possible to place the namespace
pointer in an anonymous union together with an idmapping struct. This
would allow us to expose an api to userspace that would let it specify
idmappings directly instead of having to go through the detour of
setting up namespaces at all.

This adds the infrastructure.

Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/namespace.c                | 136 ++++++++++++++++++++++++++--------
 include/linux/fs.h            |  10 ++-
 include/linux/mnt_idmapping.h |   7 ++
 include/linux/mount.h         |  10 ++-
 4 files changed, 127 insertions(+), 36 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index df137ba19d37..788479fc784b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -75,6 +75,11 @@ static DECLARE_RWSEM(namespace_sem);
 static HLIST_HEAD(unmounted);	/* protected by namespace_sem */
 static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
 
+struct mnt_idmap nop_mnt_idmap = {
+	.owner = &init_user_ns,
+	.count	 = REFCOUNT_INIT(1),
+};
+
 struct mount_kattr {
 	unsigned int attr_set;
 	unsigned int attr_clr;
@@ -82,6 +87,7 @@ struct mount_kattr {
 	unsigned int lookup_flags;
 	bool recurse;
 	struct user_namespace *mnt_userns;
+	struct mnt_idmap *mnt_idmap;
 };
 
 /* /sys/fs */
@@ -193,6 +199,83 @@ int mnt_get_count(struct mount *mnt)
 #endif
 }
 
+/**
+ * mnt_user_ns - retrieve owner of the mount's idmapping
+ * @mnt: the relevant vfsmount
+ *
+ * This helper will go away once the conversion to use struct mnt_idmap
+ * everywhere has finished at which point the helper will be unexported and
+ * move to fs/internal.h. Only code that needs to perform permission checks
+ * based on the owner of the idmapping will get access to it. All other code
+ * will solely rely on idmappings. This will get us type safety so it's
+ * impossible to conflate filesystems idmappings with mount idmappings.
+ *
+ * Return: The owner of the idmapping.
+ */
+struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
+{
+	struct mnt_idmap *idmap = mnt_idmapping(mnt);
+
+	/* Return the actual owner of the filesystem instead of the nop. */
+	if (idmap == &nop_mnt_idmap &&
+	    !initial_idmapping(mnt->mnt_sb->s_user_ns))
+		return mnt->mnt_sb->s_user_ns;
+	return mnt_idmapping(mnt)->owner;
+}
+EXPORT_SYMBOL_GPL(mnt_user_ns);
+
+/**
+ * alloc_mnt_idmap - allocate a new idmapping for the mount
+ * @mnt_userns: owning userns of the idmapping
+ *
+ * Allocate a new struct mnt_idmap which carries the idmapping of the mount.
+ *
+ * Return: On success a new idmap, on error an error pointer is returned.
+ */
+static struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns)
+{
+	struct mnt_idmap *idmap;
+
+	idmap = kzalloc(sizeof(struct mnt_idmap), GFP_KERNEL_ACCOUNT);
+	if (!idmap)
+		return ERR_PTR(-ENOMEM);
+
+	idmap->owner = get_user_ns(mnt_userns);
+	refcount_set(&idmap->count, 1);
+	return idmap;
+}
+
+/**
+ * mnt_idmap_get - get a reference to an idmapping
+ * @idmap: the idmap to bump the reference on
+ *
+ * If @idmap is not the @nop_mnt_idmap bump the reference count.
+ *
+ * Return: @idmap with reference count bumped if @not_mnt_idmap isn't passed.
+ */
+static inline struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap)
+{
+	if (idmap != &nop_mnt_idmap)
+		refcount_inc(&idmap->count);
+
+	return idmap;
+}
+
+/**
+ * mnt_idmap_put - put a reference to an idmapping
+ * @idmap: the idmap to put the reference on
+ *
+ * If this is a non-initial idmapping, put the reference count when a mount is
+ * released and free it if we're the last user.
+ */
+static inline void mnt_idmap_put(struct mnt_idmap *idmap)
+{
+	if (idmap != &nop_mnt_idmap && refcount_dec_and_test(&idmap->count)) {
+		put_user_ns(idmap->owner);
+		kfree(idmap);
+	}
+}
+
 static struct mount *alloc_vfsmnt(const char *name)
 {
 	struct mount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL);
@@ -232,7 +315,7 @@ static struct mount *alloc_vfsmnt(const char *name)
 		INIT_HLIST_NODE(&mnt->mnt_mp_list);
 		INIT_LIST_HEAD(&mnt->mnt_umounting);
 		INIT_HLIST_HEAD(&mnt->mnt_stuck_children);
-		mnt->mnt.mnt_userns = &init_user_ns;
+		mnt->mnt.mnt_idmap = &nop_mnt_idmap;
 	}
 	return mnt;
 
@@ -602,11 +685,10 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 
 static void free_vfsmnt(struct mount *mnt)
 {
-	struct user_namespace *mnt_userns;
+	struct mnt_idmap *idmap;
 
-	mnt_userns = mnt_user_ns(&mnt->mnt);
-	if (!initial_idmapping(mnt_userns))
-		put_user_ns(mnt_userns);
+	idmap = mnt_idmapping(&mnt->mnt);
+	mnt_idmap_put(idmap);
 	kfree_const(mnt->mnt_devname);
 #ifdef CONFIG_SMP
 	free_percpu(mnt->mnt_pcp);
@@ -1009,7 +1091,6 @@ static struct mount *skip_mnt_tree(struct mount *p)
 struct vfsmount *vfs_create_mount(struct fs_context *fc)
 {
 	struct mount *mnt;
-	struct user_namespace *fs_userns;
 
 	if (!fc->root)
 		return ERR_PTR(-EINVAL);
@@ -1027,10 +1108,6 @@ struct vfsmount *vfs_create_mount(struct fs_context *fc)
 	mnt->mnt_mountpoint	= mnt->mnt.mnt_root;
 	mnt->mnt_parent		= mnt;
 
-	fs_userns = mnt->mnt.mnt_sb->s_user_ns;
-	if (!initial_idmapping(fs_userns))
-		mnt->mnt.mnt_userns = get_user_ns(fs_userns);
-
 	lock_mount_hash();
 	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
 	unlock_mount_hash();
@@ -1120,9 +1197,8 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL);
 
 	atomic_inc(&sb->s_active);
-	mnt->mnt.mnt_userns = mnt_user_ns(&old->mnt);
-	if (!initial_idmapping(mnt->mnt.mnt_userns))
-		mnt->mnt.mnt_userns = get_user_ns(mnt->mnt.mnt_userns);
+	mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmapping(&old->mnt));
+
 	mnt->mnt.mnt_sb = sb;
 	mnt->mnt.mnt_root = dget(root);
 	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
@@ -4082,27 +4158,18 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 
 static void do_idmap_mount(const struct mount_kattr *kattr, struct mount *mnt)
 {
-	struct user_namespace *mnt_userns, *old_mnt_userns;
-
 	if (!kattr->mnt_userns)
 		return;
 
 	/*
-	 * We're the only ones able to change the mount's idmapping. So
-	 * mnt->mnt.mnt_userns is stable and we can retrieve it directly.
-	 */
-	old_mnt_userns = mnt->mnt.mnt_userns;
-
-	mnt_userns = get_user_ns(kattr->mnt_userns);
-	/* Pairs with smp_load_acquire() in mnt_user_ns(). */
-	smp_store_release(&mnt->mnt.mnt_userns, mnt_userns);
-
-	/*
-	 * If this is an idmapped filesystem drop the reference we've taken
-	 * in vfs_create_mount() before.
+	 * Pairs with smp_load_acquire() in mnt_idmapping().
+	 *
+	 * Since we only allow a mount to change the idmapping once and
+	 * verified this in can_idmap_mount() we know that the mount has
+	 * @nop_mnt_idmap attached to it. So there's no need to drop any
+	 * references.
 	 */
-	if (!initial_idmapping(old_mnt_userns))
-		put_user_ns(old_mnt_userns);
+	smp_store_release(&mnt->mnt.mnt_idmap, mnt_idmap_get(kattr->mnt_idmap));
 }
 
 static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
@@ -4131,11 +4198,19 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
 static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
 {
 	struct mount *mnt = real_mount(path->mnt);
+	struct mnt_idmap *mnt_idmap;
 	int err = 0;
 
 	if (path->dentry != mnt->mnt.mnt_root)
 		return -EINVAL;
 
+	if (kattr->mnt_userns) {
+		mnt_idmap = alloc_mnt_idmap(kattr->mnt_userns);
+		if (IS_ERR(mnt_idmap))
+			return PTR_ERR(mnt_idmap);
+		kattr->mnt_idmap = mnt_idmap;
+	}
+
 	if (kattr->propagation) {
 		/*
 		 * Only take namespace_lock() if we're actually changing
@@ -4323,6 +4398,9 @@ static void finish_mount_kattr(struct mount_kattr *kattr)
 {
 	put_user_ns(kattr->mnt_userns);
 	kattr->mnt_userns = NULL;
+
+	if (kattr->mnt_idmap)
+		mnt_idmap_put(kattr->mnt_idmap);
 }
 
 SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cffc402c2451..fc03099bb895 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2700,18 +2700,22 @@ static inline struct user_namespace *file_mnt_user_ns(struct file *file)
 	return mnt_user_ns(file->f_path.mnt);
 }
 
+static inline struct mnt_idmap *file_mnt_idmap(struct file *file)
+{
+	return mnt_idmapping(file->f_path.mnt);
+}
+
 /**
  * is_idmapped_mnt - check whether a mount is mapped
  * @mnt: the mount to check
  *
- * If @mnt has an idmapping attached different from the
- * filesystem's idmapping then @mnt is mapped.
+ * If @mnt has an non @nop_mnt_idmap attached to it then @mnt is mapped.
  *
  * Return: true if mount is mapped, false if not.
  */
 static inline bool is_idmapped_mnt(const struct vfsmount *mnt)
 {
-	return mnt_user_ns(mnt) != mnt->mnt_sb->s_user_ns;
+	return mnt_idmapping(mnt) != &nop_mnt_idmap;
 }
 
 extern long vfs_truncate(const struct path *, loff_t);
diff --git a/include/linux/mnt_idmapping.h b/include/linux/mnt_idmapping.h
index c8002294a72d..a918facee0cf 100644
--- a/include/linux/mnt_idmapping.h
+++ b/include/linux/mnt_idmapping.h
@@ -13,6 +13,13 @@ struct user_namespace;
  */
 extern struct user_namespace init_user_ns;
 
+struct mnt_idmap {
+	struct user_namespace *owner;
+	refcount_t count;
+};
+
+extern struct mnt_idmap nop_mnt_idmap;
+
 typedef struct {
 	uid_t val;
 } vfsuid_t;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 55a4abaf6715..0df15a02d460 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -16,6 +16,7 @@
 struct super_block;
 struct dentry;
 struct user_namespace;
+struct mnt_idmap;
 struct file_system_type;
 struct fs_context;
 struct file;
@@ -70,13 +71,14 @@ struct vfsmount {
 	struct dentry *mnt_root;	/* root of the mounted tree */
 	struct super_block *mnt_sb;	/* pointer to superblock */
 	int mnt_flags;
-	struct user_namespace *mnt_userns;
+	struct mnt_idmap *mnt_idmap;
 } __randomize_layout;
 
-static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
+struct user_namespace *mnt_user_ns(const struct vfsmount *mnt);
+static inline struct mnt_idmap *mnt_idmapping(const struct vfsmount *mnt)
 {
-	/* Pairs with smp_store_release() in do_idmap_mount(). */
-	return smp_load_acquire(&mnt->mnt_userns);
+	/* Pairs with smp_rmb() in do_idmap_mount(). */
+	return smp_load_acquire(&mnt->mnt_idmap);
 }
 
 extern int mnt_want_write(struct vfsmount *mnt);

base-commit: f7adeea9ebdbf73454f083c21de57579e982f2a1
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] acl: conver higher-level helpers to rely on mnt_idmap
  2022-10-28 11:10 [PATCH 1/2] fs: introduce dedicated idmap type for mounts Christian Brauner
@ 2022-10-28 11:10 ` Christian Brauner
  2022-10-31 14:33   ` Seth Forshee
  2022-10-31 14:32 ` [PATCH 1/2] fs: introduce dedicated idmap type for mounts Seth Forshee
  1 sibling, 1 reply; 4+ messages in thread
From: Christian Brauner @ 2022-10-28 11:10 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christian Brauner, Al Viro, Christoph Hellwig, Seth Forshee

Convert an initial portion to rely on struct mnt_idmap by converting the
high level xattr helpers.

Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/internal.h    | 12 ++++++------
 fs/posix_acl.c   | 15 ++++++++-------
 fs/xattr.c       | 39 ++++++++++++++++++++-------------------
 io_uring/xattr.c |  6 +++---
 4 files changed, 37 insertions(+), 35 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 0c8812fe7ca4..a803cc3cf716 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -227,28 +227,28 @@ struct xattr_ctx {
 };
 
 
-ssize_t do_getxattr(struct user_namespace *mnt_userns,
+ssize_t do_getxattr(struct mnt_idmap *idmap,
 		    struct dentry *d,
 		    struct xattr_ctx *ctx);
 
 int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
-int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+int do_setxattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		struct xattr_ctx *ctx);
 int may_write_xattr(struct user_namespace *mnt_userns, struct inode *inode);
 
 #ifdef CONFIG_FS_POSIX_ACL
-int do_set_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
+int do_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
 	       const char *acl_name, const void *kvalue, size_t size);
-ssize_t do_get_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
+ssize_t do_get_acl(struct mnt_idmap *idmap, struct dentry *dentry,
 		   const char *acl_name, void *kvalue, size_t size);
 #else
-static inline int do_set_acl(struct user_namespace *mnt_userns,
+static inline int do_set_acl(struct mnt_idmap *idmap,
 			     struct dentry *dentry, const char *acl_name,
 			     const void *kvalue, size_t size)
 {
 	return -EOPNOTSUPP;
 }
-static inline ssize_t do_get_acl(struct user_namespace *mnt_userns,
+static inline ssize_t do_get_acl(struct mnt_idmap *idmap,
 				 struct dentry *dentry, const char *acl_name,
 				 void *kvalue, size_t size)
 {
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 989bbf280bfe..47b5263ba92e 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -871,7 +871,7 @@ EXPORT_SYMBOL (posix_acl_to_xattr);
 
 /**
  * vfs_posix_acl_to_xattr - convert from kernel to userspace representation
- * @mnt_userns: user namespace of the mount
+ * @idmap: idmap of the mount
  * @inode: inode the posix acls are set on
  * @acl: the posix acls as represented by the vfs
  * @buffer: the buffer into which to convert @acl
@@ -884,7 +884,7 @@ EXPORT_SYMBOL (posix_acl_to_xattr);
  * Return: On success, the size of the stored uapi posix acls, on error a
  * negative errno.
  */
-static ssize_t vfs_posix_acl_to_xattr(struct user_namespace *mnt_userns,
+static ssize_t vfs_posix_acl_to_xattr(struct mnt_idmap *idmap,
 				      struct inode *inode,
 				      const struct posix_acl *acl, void *buffer,
 				      size_t size)
@@ -893,6 +893,7 @@ static ssize_t vfs_posix_acl_to_xattr(struct user_namespace *mnt_userns,
 	struct posix_acl_xattr_header *ext_acl = buffer;
 	struct posix_acl_xattr_entry *ext_entry;
 	struct user_namespace *fs_userns, *caller_userns;
+	struct user_namespace *mnt_userns = idmap->owner;
 	ssize_t real_size, n;
 	vfsuid_t vfsuid;
 	vfsgid_t vfsgid;
@@ -1227,7 +1228,7 @@ int vfs_remove_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
 }
 EXPORT_SYMBOL_GPL(vfs_remove_acl);
 
-int do_set_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
+int do_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
 	       const char *acl_name, const void *kvalue, size_t size)
 {
 	int error;
@@ -1243,22 +1244,22 @@ int do_set_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
 			return PTR_ERR(acl);
 	}
 
-	error = vfs_set_acl(mnt_userns, dentry, acl_name, acl);
+	error = vfs_set_acl(idmap->owner, dentry, acl_name, acl);
 	posix_acl_release(acl);
 	return error;
 }
 
-ssize_t do_get_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
+ssize_t do_get_acl(struct mnt_idmap *idmap, struct dentry *dentry,
 		   const char *acl_name, void *kvalue, size_t size)
 {
 	ssize_t error;
 	struct posix_acl *acl;
 
-	acl = vfs_get_acl(mnt_userns, dentry, acl_name);
+	acl = vfs_get_acl(idmap->owner, dentry, acl_name);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
 
-	error = vfs_posix_acl_to_xattr(mnt_userns, d_inode(dentry),
+	error = vfs_posix_acl_to_xattr(idmap, d_inode(dentry),
 				       acl, kvalue, size);
 	posix_acl_release(acl);
 	return error;
diff --git a/fs/xattr.c b/fs/xattr.c
index df3af9fa8c77..1c01cca472ea 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -597,19 +597,19 @@ int setxattr_copy(const char __user *name, struct xattr_ctx *ctx)
 	return error;
 }
 
-int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+int do_setxattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		struct xattr_ctx *ctx)
 {
 	if (is_posix_acl_xattr(ctx->kname->name))
-		return do_set_acl(mnt_userns, dentry, ctx->kname->name,
+		return do_set_acl(idmap, dentry, ctx->kname->name,
 				  ctx->kvalue, ctx->size);
 
-	return vfs_setxattr(mnt_userns, dentry, ctx->kname->name,
+	return vfs_setxattr(idmap->owner, dentry, ctx->kname->name,
 			ctx->kvalue, ctx->size, ctx->flags);
 }
 
 static long
-setxattr(struct user_namespace *mnt_userns, struct dentry *d,
+setxattr(struct mnt_idmap *idmap, struct dentry *d,
 	const char __user *name, const void __user *value, size_t size,
 	int flags)
 {
@@ -627,7 +627,7 @@ setxattr(struct user_namespace *mnt_userns, struct dentry *d,
 	if (error)
 		return error;
 
-	error = do_setxattr(mnt_userns, d, &ctx);
+	error = do_setxattr(idmap, d, &ctx);
 
 	kvfree(ctx.kvalue);
 	return error;
@@ -646,7 +646,7 @@ static int path_setxattr(const char __user *pathname,
 		return error;
 	error = mnt_want_write(path.mnt);
 	if (!error) {
-		error = setxattr(mnt_user_ns(path.mnt), path.dentry, name,
+		error = setxattr(mnt_idmapping(path.mnt), path.dentry, name,
 				 value, size, flags);
 		mnt_drop_write(path.mnt);
 	}
@@ -683,7 +683,7 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 	audit_file(f.file);
 	error = mnt_want_write_file(f.file);
 	if (!error) {
-		error = setxattr(file_mnt_user_ns(f.file),
+		error = setxattr(file_mnt_idmap(f.file),
 				 f.file->f_path.dentry, name,
 				 value, size, flags);
 		mnt_drop_write_file(f.file);
@@ -696,7 +696,7 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
  * Extended attribute GET operations
  */
 ssize_t
-do_getxattr(struct user_namespace *mnt_userns, struct dentry *d,
+do_getxattr(struct mnt_idmap *idmap, struct dentry *d,
 	struct xattr_ctx *ctx)
 {
 	ssize_t error;
@@ -711,9 +711,9 @@ do_getxattr(struct user_namespace *mnt_userns, struct dentry *d,
 	}
 
 	if (is_posix_acl_xattr(ctx->kname->name))
-		error = do_get_acl(mnt_userns, d, kname, ctx->kvalue, ctx->size);
+		error = do_get_acl(idmap, d, kname, ctx->kvalue, ctx->size);
 	else
-		error = vfs_getxattr(mnt_userns, d, kname, ctx->kvalue, ctx->size);
+		error = vfs_getxattr(idmap->owner, d, kname, ctx->kvalue, ctx->size);
 	if (error > 0) {
 		if (ctx->size && copy_to_user(ctx->value, ctx->kvalue, error))
 			error = -EFAULT;
@@ -727,7 +727,7 @@ do_getxattr(struct user_namespace *mnt_userns, struct dentry *d,
 }
 
 static ssize_t
-getxattr(struct user_namespace *mnt_userns, struct dentry *d,
+getxattr(struct mnt_idmap *idmap, struct dentry *d,
 	 const char __user *name, void __user *value, size_t size)
 {
 	ssize_t error;
@@ -746,7 +746,7 @@ getxattr(struct user_namespace *mnt_userns, struct dentry *d,
 	if (error < 0)
 		return error;
 
-	error =  do_getxattr(mnt_userns, d, &ctx);
+	error =  do_getxattr(idmap, d, &ctx);
 
 	kvfree(ctx.kvalue);
 	return error;
@@ -762,7 +762,8 @@ static ssize_t path_getxattr(const char __user *pathname,
 	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
 	if (error)
 		return error;
-	error = getxattr(mnt_user_ns(path.mnt), path.dentry, name, value, size);
+	error = getxattr(mnt_idmapping(path.mnt), path.dentry,
+			 name, value, size);
 	path_put(&path);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
@@ -792,7 +793,7 @@ SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
 	if (!f.file)
 		return error;
 	audit_file(f.file);
-	error = getxattr(file_mnt_user_ns(f.file), f.file->f_path.dentry,
+	error = getxattr(file_mnt_idmap(f.file), f.file->f_path.dentry,
 			 name, value, size);
 	fdput(f);
 	return error;
@@ -877,7 +878,7 @@ SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size)
  * Extended attribute REMOVE operations
  */
 static long
-removexattr(struct user_namespace *mnt_userns, struct dentry *d,
+removexattr(struct mnt_idmap *idmap, struct dentry *d,
 	    const char __user *name)
 {
 	int error;
@@ -890,9 +891,9 @@ removexattr(struct user_namespace *mnt_userns, struct dentry *d,
 		return error;
 
 	if (is_posix_acl_xattr(kname))
-		return vfs_remove_acl(mnt_userns, d, kname);
+		return vfs_remove_acl(idmap->owner, d, kname);
 
-	return vfs_removexattr(mnt_userns, d, kname);
+	return vfs_removexattr(idmap->owner, d, kname);
 }
 
 static int path_removexattr(const char __user *pathname,
@@ -906,7 +907,7 @@ static int path_removexattr(const char __user *pathname,
 		return error;
 	error = mnt_want_write(path.mnt);
 	if (!error) {
-		error = removexattr(mnt_user_ns(path.mnt), path.dentry, name);
+		error = removexattr(mnt_idmapping(path.mnt), path.dentry, name);
 		mnt_drop_write(path.mnt);
 	}
 	path_put(&path);
@@ -939,7 +940,7 @@ SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
 	audit_file(f.file);
 	error = mnt_want_write_file(f.file);
 	if (!error) {
-		error = removexattr(file_mnt_user_ns(f.file),
+		error = removexattr(file_mnt_idmap(f.file),
 				    f.file->f_path.dentry, name);
 		mnt_drop_write_file(f.file);
 	}
diff --git a/io_uring/xattr.c b/io_uring/xattr.c
index 99df641594d7..1da0f06f3634 100644
--- a/io_uring/xattr.c
+++ b/io_uring/xattr.c
@@ -112,7 +112,7 @@ int io_fgetxattr(struct io_kiocb *req, unsigned int issue_flags)
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
-	ret = do_getxattr(mnt_user_ns(req->file->f_path.mnt),
+	ret = do_getxattr(mnt_idmapping(req->file->f_path.mnt),
 			req->file->f_path.dentry,
 			&ix->ctx);
 
@@ -133,7 +133,7 @@ int io_getxattr(struct io_kiocb *req, unsigned int issue_flags)
 retry:
 	ret = filename_lookup(AT_FDCWD, ix->filename, lookup_flags, &path, NULL);
 	if (!ret) {
-		ret = do_getxattr(mnt_user_ns(path.mnt),
+		ret = do_getxattr(mnt_idmapping(path.mnt),
 				path.dentry,
 				&ix->ctx);
 
@@ -213,7 +213,7 @@ static int __io_setxattr(struct io_kiocb *req, unsigned int issue_flags,
 
 	ret = mnt_want_write(path->mnt);
 	if (!ret) {
-		ret = do_setxattr(mnt_user_ns(path->mnt), path->dentry, &ix->ctx);
+		ret = do_setxattr(mnt_idmapping(path->mnt), path->dentry, &ix->ctx);
 		mnt_drop_write(path->mnt);
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] fs: introduce dedicated idmap type for mounts
  2022-10-28 11:10 [PATCH 1/2] fs: introduce dedicated idmap type for mounts Christian Brauner
  2022-10-28 11:10 ` [PATCH 2/2] acl: conver higher-level helpers to rely on mnt_idmap Christian Brauner
@ 2022-10-31 14:32 ` Seth Forshee
  1 sibling, 0 replies; 4+ messages in thread
From: Seth Forshee @ 2022-10-31 14:32 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, Al Viro, Christoph Hellwig

On Fri, Oct 28, 2022 at 01:10:41PM +0200, Christian Brauner wrote:
> Last cycle we've already made the interaction with idmapped mounts more
> robust and type safe by introducing the vfs{g,u}id_t type. This cycle we
> concluded the conversion and removed the legacy helpers.
> 
> Currently we still pass around the plain namespace that was attached to
> a mount. This is in general pretty convenient but it makes it easy to
> conflate filesystem and mount namespaces and what different roles they
> have to play. Especially for filesystem developers without much
> experience in this area this is an easy source for bugs.
> 
> Instead of passing the plain namespace we introduce a dedicated type
> struct mnt_idmap and replace the pointer with a pointer to a struct
> mnt_idmap. There are no semantic or size changes for the mount struct
> caused by this.
> 
> We then start converting all places aware of idmapped mounts to rely on
> struct mnt_idmap. Once the conversion is done all helpers down to the
> really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take a
> struct mnt_idmap argument instead of two namespace arguments. This way
> it becomes impossible to conflate the two removing and thus eliminating
> the possibility of any bugs. Fwiw, I fixed some issues in that area a
> while ago in ntfs3 and ksmbd in the past. Afterwards only low-level code
> can ultimately use the associated namespace for any permission checks.
> Even most of the vfs can be completely obivious about this ultimately
> and filesystems will never interact with it in any form in the future.
> 
> A struct mnt_idmap currently encompasses a simple refcount a pointer to
> the relevant namespace the mount is idmapped to. If a mount isn't
> idmapped then it will point to a static nop_mnt_idmap and if it doesn't
> that it is idmapped. As usual there are no allocations or anything
> happening for non-idmapped mounts. Everthing is carefully written to be
> a nop for non-idmapped mounts as has always been the case.
> 
> If an idmapped mount is created a struct mnt_idmap is allocated and a
> reference taken on the relevant namespace. Each mount that gets idmapped
> or inherits the idmap simply bumps the reference count on struct
> mnt_idmap. Just a reminder that we only allow a mount to change it's
> idmapping a single time and only if it hasn't already been attached to
> the filesystems and has no active writers.
> 
> The actual changes are fairly straightforward but this will have huge
> benefits for maintenance and security in the long run even if it causes
> some churn. I'm aware that there's some cost for all of you. And I'll
> commit to doing this work and make this as painless as I can.
> 
> Note that this also makes it possible to extend struct mount_idmap in
> the future. For example, it would be possible to place the namespace
> pointer in an anonymous union together with an idmapping struct. This
> would allow us to expose an api to userspace that would let it specify
> idmappings directly instead of having to go through the detour of
> setting up namespaces at all.
> 
> This adds the infrastructure.
> 
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

I like this for avoiding confusion about which namespace is which, and
for the clarity provided by nop_mnt_idmapping.

Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] acl: conver higher-level helpers to rely on mnt_idmap
  2022-10-28 11:10 ` [PATCH 2/2] acl: conver higher-level helpers to rely on mnt_idmap Christian Brauner
@ 2022-10-31 14:33   ` Seth Forshee
  0 siblings, 0 replies; 4+ messages in thread
From: Seth Forshee @ 2022-10-31 14:33 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, Al Viro, Christoph Hellwig

On Fri, Oct 28, 2022 at 01:10:42PM +0200, Christian Brauner wrote:
> Convert an initial portion to rely on struct mnt_idmap by converting the
> high level xattr helpers.
> 
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-10-31 14:33 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-28 11:10 [PATCH 1/2] fs: introduce dedicated idmap type for mounts Christian Brauner
2022-10-28 11:10 ` [PATCH 2/2] acl: conver higher-level helpers to rely on mnt_idmap Christian Brauner
2022-10-31 14:33   ` Seth Forshee
2022-10-31 14:32 ` [PATCH 1/2] fs: introduce dedicated idmap type for mounts Seth Forshee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).