linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO
@ 2023-08-09  4:28 Hugh Dickins
  2023-08-09  4:30 ` [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed Hugh Dickins
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Hugh Dickins @ 2023-08-09  4:28 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

This series enables and limits user extended attributes on tmpfs,
and independently provides a trivial direct IO stub for tmpfs.

It is here based on the vfs.tmpfs branch in vfs.git in next-20230808
but with a cherry-pick of v6.5-rc4's commit
253e5df8b8f0 ("tmpfs: fix Documentation of noswap and huge mount options")
first: since the vfs.tmpfs branch is based on v6.5-rc1, but 3/5 in this
series updates tmpfs.rst in a way which depends on that commit.

IIUC the right thing to do would be to cherry-pick 253e5df8b8f0 into
vfs.tmpfs before applying this series.  I'm sorry that the series as
posted does not apply cleanly to any known tree! but I think posting
it against v6.5-rc5 or next-20230808 would be even less helpful.

There is one "conflict" between this series and the final next-20230808:
Jeff Layton's vfs.ctime mods update a line of shmem_xattr_handler_set(),
where neighbouring lines are modified by 1/5 and 3/5 here: easily
resolved in the merge commit, I hope.

1/5 xattr: simple_xattr_set() return old_xattr to be freed
2/5 tmpfs: track free_ispace instead of free_inodes
3/5 tmpfs,xattr: enable limited user extended attributes
4/5 tmpfs: trivial support for direct IO
5/5 mm: invalidation check mapping before folio_contains

 Documentation/filesystems/tmpfs.rst |   7 +-
 fs/Kconfig                          |   4 +-
 fs/kernfs/dir.c                     |   2 +-
 fs/kernfs/inode.c                   |  46 +++++++----
 fs/xattr.c                          |  79 +++++++++++-------
 include/linux/shmem_fs.h            |   2 +-
 include/linux/xattr.h               |  10 ++-
 mm/shmem.c                          | 130 +++++++++++++++++++++++-------
 mm/truncate.c                       |   4 +-
 9 files changed, 197 insertions(+), 87 deletions(-)

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed
  2023-08-09  4:28 [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Hugh Dickins
@ 2023-08-09  4:30 ` Hugh Dickins
  2023-08-09  9:21   ` Jan Kara
                     ` (2 more replies)
  2023-08-09  4:32 ` [PATCH vfs.tmpfs 2/5] tmpfs: track free_ispace instead of free_inodes Hugh Dickins
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 32+ messages in thread
From: Hugh Dickins @ 2023-08-09  4:30 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

tmpfs wants to support limited user extended attributes, but kernfs
(or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR)
already supports user extended attributes through simple xattrs: but
limited by a policy (128KiB per inode) too liberal to be used on tmpfs.

To allow a different limiting policy for tmpfs, without affecting the
policy for kernfs, change simple_xattr_set() to return the replaced or
removed xattr (if any), leaving the caller to update their accounting
then free the xattr (by simple_xattr_free(), renamed from the static
free_simple_xattr()).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 fs/kernfs/inode.c     | 46 +++++++++++++++++++++++++---------------
 fs/xattr.c            | 51 +++++++++++++++++++--------------------------
 include/linux/xattr.h |  7 ++++---
 mm/shmem.c            | 10 +++++----
 4 files changed, 61 insertions(+), 53 deletions(-)

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index b22b74d1a115..fec5d5f78f07 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -306,11 +306,17 @@ int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
 int kernfs_xattr_set(struct kernfs_node *kn, const char *name,
 		     const void *value, size_t size, int flags)
 {
+	struct simple_xattr *old_xattr;
 	struct kernfs_iattrs *attrs = kernfs_iattrs(kn);
 	if (!attrs)
 		return -ENOMEM;
 
-	return simple_xattr_set(&attrs->xattrs, name, value, size, flags, NULL);
+	old_xattr = simple_xattr_set(&attrs->xattrs, name, value, size, flags);
+	if (IS_ERR(old_xattr))
+		return PTR_ERR(old_xattr);
+
+	simple_xattr_free(old_xattr);
+	return 0;
 }
 
 static int kernfs_vfs_xattr_get(const struct xattr_handler *handler,
@@ -342,7 +348,7 @@ static int kernfs_vfs_user_xattr_add(struct kernfs_node *kn,
 {
 	atomic_t *sz = &kn->iattr->user_xattr_size;
 	atomic_t *nr = &kn->iattr->nr_user_xattrs;
-	ssize_t removed_size;
+	struct simple_xattr *old_xattr;
 	int ret;
 
 	if (atomic_inc_return(nr) > KERNFS_MAX_USER_XATTRS) {
@@ -355,13 +361,18 @@ static int kernfs_vfs_user_xattr_add(struct kernfs_node *kn,
 		goto dec_size_out;
 	}
 
-	ret = simple_xattr_set(xattrs, full_name, value, size, flags,
-			       &removed_size);
-
-	if (!ret && removed_size >= 0)
-		size = removed_size;
-	else if (!ret)
+	old_xattr = simple_xattr_set(xattrs, full_name, value, size, flags);
+	if (!old_xattr)
 		return 0;
+
+	if (IS_ERR(old_xattr)) {
+		ret = PTR_ERR(old_xattr);
+		goto dec_size_out;
+	}
+
+	ret = 0;
+	size = old_xattr->size;
+	simple_xattr_free(old_xattr);
 dec_size_out:
 	atomic_sub(size, sz);
 dec_count_out:
@@ -376,18 +387,19 @@ static int kernfs_vfs_user_xattr_rm(struct kernfs_node *kn,
 {
 	atomic_t *sz = &kn->iattr->user_xattr_size;
 	atomic_t *nr = &kn->iattr->nr_user_xattrs;
-	ssize_t removed_size;
-	int ret;
+	struct simple_xattr *old_xattr;
 
-	ret = simple_xattr_set(xattrs, full_name, value, size, flags,
-			       &removed_size);
+	old_xattr = simple_xattr_set(xattrs, full_name, value, size, flags);
+	if (!old_xattr)
+		return 0;
 
-	if (removed_size >= 0) {
-		atomic_sub(removed_size, sz);
-		atomic_dec(nr);
-	}
+	if (IS_ERR(old_xattr))
+		return PTR_ERR(old_xattr);
 
-	return ret;
+	atomic_sub(old_xattr->size, sz);
+	atomic_dec(nr);
+	simple_xattr_free(old_xattr);
+	return 0;
 }
 
 static int kernfs_vfs_user_xattr_set(const struct xattr_handler *handler,
diff --git a/fs/xattr.c b/fs/xattr.c
index e7bbb7f57557..ba37a8f5cfd1 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -1040,12 +1040,12 @@ const char *xattr_full_name(const struct xattr_handler *handler,
 EXPORT_SYMBOL(xattr_full_name);
 
 /**
- * free_simple_xattr - free an xattr object
+ * simple_xattr_free - free an xattr object
  * @xattr: the xattr object
  *
  * Free the xattr object. Can handle @xattr being NULL.
  */
-static inline void free_simple_xattr(struct simple_xattr *xattr)
+void simple_xattr_free(struct simple_xattr *xattr)
 {
 	if (xattr)
 		kfree(xattr->name);
@@ -1164,7 +1164,6 @@ int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
  * @value: the value to store along the xattr
  * @size: the size of @value
  * @flags: the flags determining how to set the xattr
- * @removed_size: the size of the removed xattr
  *
  * Set a new xattr object.
  * If @value is passed a new xattr object will be allocated. If XATTR_REPLACE
@@ -1181,29 +1180,27 @@ int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
  * nothing if XATTR_CREATE is specified in @flags or @flags is zero. For
  * XATTR_REPLACE we fail as mentioned above.
  *
- * Return: On success zero and on error a negative error code is returned.
+ * Return: On success, the removed or replaced xattr is returned, to be freed
+ * by the caller; or NULL if none. On failure a negative error code is returned.
  */
-int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
-		     const void *value, size_t size, int flags,
-		     ssize_t *removed_size)
+struct simple_xattr *simple_xattr_set(struct simple_xattrs *xattrs,
+				      const char *name, const void *value,
+				      size_t size, int flags)
 {
-	struct simple_xattr *xattr = NULL, *new_xattr = NULL;
+	struct simple_xattr *old_xattr = NULL, *new_xattr = NULL;
 	struct rb_node *parent = NULL, **rbp;
 	int err = 0, ret;
 
-	if (removed_size)
-		*removed_size = -1;
-
 	/* value == NULL means remove */
 	if (value) {
 		new_xattr = simple_xattr_alloc(value, size);
 		if (!new_xattr)
-			return -ENOMEM;
+			return ERR_PTR(-ENOMEM);
 
 		new_xattr->name = kstrdup(name, GFP_KERNEL);
 		if (!new_xattr->name) {
-			free_simple_xattr(new_xattr);
-			return -ENOMEM;
+			simple_xattr_free(new_xattr);
+			return ERR_PTR(-ENOMEM);
 		}
 	}
 
@@ -1217,12 +1214,12 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
 		else if (ret > 0)
 			rbp = &(*rbp)->rb_right;
 		else
-			xattr = rb_entry(*rbp, struct simple_xattr, rb_node);
-		if (xattr)
+			old_xattr = rb_entry(*rbp, struct simple_xattr, rb_node);
+		if (old_xattr)
 			break;
 	}
 
-	if (xattr) {
+	if (old_xattr) {
 		/* Fail if XATTR_CREATE is requested and the xattr exists. */
 		if (flags & XATTR_CREATE) {
 			err = -EEXIST;
@@ -1230,12 +1227,10 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
 		}
 
 		if (new_xattr)
-			rb_replace_node(&xattr->rb_node, &new_xattr->rb_node,
-					&xattrs->rb_root);
+			rb_replace_node(&old_xattr->rb_node,
+					&new_xattr->rb_node, &xattrs->rb_root);
 		else
-			rb_erase(&xattr->rb_node, &xattrs->rb_root);
-		if (!err && removed_size)
-			*removed_size = xattr->size;
+			rb_erase(&old_xattr->rb_node, &xattrs->rb_root);
 	} else {
 		/* Fail if XATTR_REPLACE is requested but no xattr is found. */
 		if (flags & XATTR_REPLACE) {
@@ -1260,12 +1255,10 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
 
 out_unlock:
 	write_unlock(&xattrs->lock);
-	if (err)
-		free_simple_xattr(new_xattr);
-	else
-		free_simple_xattr(xattr);
-	return err;
-
+	if (!err)
+		return old_xattr;
+	simple_xattr_free(new_xattr);
+	return ERR_PTR(err);
 }
 
 static bool xattr_is_trusted(const char *name)
@@ -1386,7 +1379,7 @@ void simple_xattrs_free(struct simple_xattrs *xattrs)
 		rbp_next = rb_next(rbp);
 		xattr = rb_entry(rbp, struct simple_xattr, rb_node);
 		rb_erase(&xattr->rb_node, &xattrs->rb_root);
-		free_simple_xattr(xattr);
+		simple_xattr_free(xattr);
 		rbp = rbp_next;
 	}
 }
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index d591ef59aa98..e37fe667ae04 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -116,11 +116,12 @@ struct simple_xattr {
 void simple_xattrs_init(struct simple_xattrs *xattrs);
 void simple_xattrs_free(struct simple_xattrs *xattrs);
 struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
+void simple_xattr_free(struct simple_xattr *xattr);
 int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
 		     void *buffer, size_t size);
-int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
-		     const void *value, size_t size, int flags,
-		     ssize_t *removed_size);
+struct simple_xattr *simple_xattr_set(struct simple_xattrs *xattrs,
+				      const char *name, const void *value,
+				      size_t size, int flags);
 ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs,
 			  char *buffer, size_t size);
 void simple_xattr_add(struct simple_xattrs *xattrs,
diff --git a/mm/shmem.c b/mm/shmem.c
index 0f83d86fd8b4..df3cabf54206 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3595,15 +3595,17 @@ static int shmem_xattr_handler_set(const struct xattr_handler *handler,
 				   size_t size, int flags)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	int err;
+	struct simple_xattr *old_xattr;
 
 	name = xattr_full_name(handler, name);
-	err = simple_xattr_set(&info->xattrs, name, value, size, flags, NULL);
-	if (!err) {
+	old_xattr = simple_xattr_set(&info->xattrs, name, value, size, flags);
+	if (!IS_ERR(old_xattr)) {
+		simple_xattr_free(old_xattr);
+		old_xattr = NULL;
 		inode->i_ctime = current_time(inode);
 		inode_inc_iversion(inode);
 	}
-	return err;
+	return PTR_ERR(old_xattr);
 }
 
 static const struct xattr_handler shmem_security_xattr_handler = {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH vfs.tmpfs 2/5] tmpfs: track free_ispace instead of free_inodes
  2023-08-09  4:28 [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Hugh Dickins
  2023-08-09  4:30 ` [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed Hugh Dickins
@ 2023-08-09  4:32 ` Hugh Dickins
  2023-08-09  9:33   ` Jan Kara
  2023-08-09 13:29   ` Carlos Maiolino
  2023-08-09  4:33 ` [PATCH vfs.tmpfs 3/5] tmpfs,xattr: enable limited user extended attributes Hugh Dickins
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 32+ messages in thread
From: Hugh Dickins @ 2023-08-09  4:32 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

In preparation for assigning some inode space to extended attributes,
keep track of free_ispace instead of number of free_inodes: as if one
tmpfs inode (and accompanying dentry) occupies very approximately 1KiB.

Unsigned long is large enough for free_ispace, on 64-bit and on 32-bit:
but take care to enforce the maximum.  And fix the nr_blocks maximum on
32-bit: S64_MAX would be too big for it there, so say LONG_MAX instead.

Delete the incorrect limited<->unlimited blocks/inodes comment above
shmem_reconfigure(): leave it to the error messages below to describe.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/shmem_fs.h |  2 +-
 mm/shmem.c               | 33 +++++++++++++++++----------------
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 9b2d2faff1d0..6b0c626620f5 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -54,7 +54,7 @@ struct shmem_sb_info {
 	unsigned long max_blocks;   /* How many blocks are allowed */
 	struct percpu_counter used_blocks;  /* How many are allocated */
 	unsigned long max_inodes;   /* How many inodes are allowed */
-	unsigned long free_inodes;  /* How many are left for allocation */
+	unsigned long free_ispace;  /* How much ispace left for allocation */
 	raw_spinlock_t stat_lock;   /* Serialize shmem_sb_info changes */
 	umode_t mode;		    /* Mount mode for root directory */
 	unsigned char huge;	    /* Whether to try for hugepages */
diff --git a/mm/shmem.c b/mm/shmem.c
index df3cabf54206..c39471384168 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -90,6 +90,9 @@ static struct vfsmount *shm_mnt;
 /* Pretend that each entry is of this size in directory's i_size */
 #define BOGO_DIRENT_SIZE 20
 
+/* Pretend that one inode + its dentry occupy this much memory */
+#define BOGO_INODE_SIZE 1024
+
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
@@ -137,7 +140,8 @@ static unsigned long shmem_default_max_inodes(void)
 {
 	unsigned long nr_pages = totalram_pages();
 
-	return min(nr_pages - totalhigh_pages(), nr_pages / 2);
+	return min3(nr_pages - totalhigh_pages(), nr_pages / 2,
+			ULONG_MAX / BOGO_INODE_SIZE);
 }
 #endif
 
@@ -331,11 +335,11 @@ static int shmem_reserve_inode(struct super_block *sb, ino_t *inop)
 	if (!(sb->s_flags & SB_KERNMOUNT)) {
 		raw_spin_lock(&sbinfo->stat_lock);
 		if (sbinfo->max_inodes) {
-			if (!sbinfo->free_inodes) {
+			if (sbinfo->free_ispace < BOGO_INODE_SIZE) {
 				raw_spin_unlock(&sbinfo->stat_lock);
 				return -ENOSPC;
 			}
-			sbinfo->free_inodes--;
+			sbinfo->free_ispace -= BOGO_INODE_SIZE;
 		}
 		if (inop) {
 			ino = sbinfo->next_ino++;
@@ -394,7 +398,7 @@ static void shmem_free_inode(struct super_block *sb)
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 	if (sbinfo->max_inodes) {
 		raw_spin_lock(&sbinfo->stat_lock);
-		sbinfo->free_inodes++;
+		sbinfo->free_ispace += BOGO_INODE_SIZE;
 		raw_spin_unlock(&sbinfo->stat_lock);
 	}
 }
@@ -3155,7 +3159,7 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
 	}
 	if (sbinfo->max_inodes) {
 		buf->f_files = sbinfo->max_inodes;
-		buf->f_ffree = sbinfo->free_inodes;
+		buf->f_ffree = sbinfo->free_ispace / BOGO_INODE_SIZE;
 	}
 	/* else leave those fields 0 like simple_statfs */
 
@@ -3815,13 +3819,13 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		break;
 	case Opt_nr_blocks:
 		ctx->blocks = memparse(param->string, &rest);
-		if (*rest || ctx->blocks > S64_MAX)
+		if (*rest || ctx->blocks > LONG_MAX)
 			goto bad_value;
 		ctx->seen |= SHMEM_SEEN_BLOCKS;
 		break;
 	case Opt_nr_inodes:
 		ctx->inodes = memparse(param->string, &rest);
-		if (*rest)
+		if (*rest || ctx->inodes > ULONG_MAX / BOGO_INODE_SIZE)
 			goto bad_value;
 		ctx->seen |= SHMEM_SEEN_INODES;
 		break;
@@ -4002,21 +4006,17 @@ static int shmem_parse_options(struct fs_context *fc, void *data)
 
 /*
  * Reconfigure a shmem filesystem.
- *
- * Note that we disallow change from limited->unlimited blocks/inodes while any
- * are in use; but we must separately disallow unlimited->limited, because in
- * that case we have no record of how much is already in use.
  */
 static int shmem_reconfigure(struct fs_context *fc)
 {
 	struct shmem_options *ctx = fc->fs_private;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(fc->root->d_sb);
-	unsigned long inodes;
+	unsigned long used_isp;
 	struct mempolicy *mpol = NULL;
 	const char *err;
 
 	raw_spin_lock(&sbinfo->stat_lock);
-	inodes = sbinfo->max_inodes - sbinfo->free_inodes;
+	used_isp = sbinfo->max_inodes * BOGO_INODE_SIZE - sbinfo->free_ispace;
 
 	if ((ctx->seen & SHMEM_SEEN_BLOCKS) && ctx->blocks) {
 		if (!sbinfo->max_blocks) {
@@ -4034,7 +4034,7 @@ static int shmem_reconfigure(struct fs_context *fc)
 			err = "Cannot retroactively limit inodes";
 			goto out;
 		}
-		if (ctx->inodes < inodes) {
+		if (ctx->inodes * BOGO_INODE_SIZE < used_isp) {
 			err = "Too few inodes for current use";
 			goto out;
 		}
@@ -4080,7 +4080,7 @@ static int shmem_reconfigure(struct fs_context *fc)
 		sbinfo->max_blocks  = ctx->blocks;
 	if (ctx->seen & SHMEM_SEEN_INODES) {
 		sbinfo->max_inodes  = ctx->inodes;
-		sbinfo->free_inodes = ctx->inodes - inodes;
+		sbinfo->free_ispace = ctx->inodes * BOGO_INODE_SIZE - used_isp;
 	}
 
 	/*
@@ -4211,7 +4211,8 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_flags |= SB_NOUSER;
 #endif
 	sbinfo->max_blocks = ctx->blocks;
-	sbinfo->free_inodes = sbinfo->max_inodes = ctx->inodes;
+	sbinfo->max_inodes = ctx->inodes;
+	sbinfo->free_ispace = sbinfo->max_inodes * BOGO_INODE_SIZE;
 	if (sb->s_flags & SB_KERNMOUNT) {
 		sbinfo->ino_batch = alloc_percpu(ino_t);
 		if (!sbinfo->ino_batch)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH vfs.tmpfs 3/5] tmpfs,xattr: enable limited user extended attributes
  2023-08-09  4:28 [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Hugh Dickins
  2023-08-09  4:30 ` [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed Hugh Dickins
  2023-08-09  4:32 ` [PATCH vfs.tmpfs 2/5] tmpfs: track free_ispace instead of free_inodes Hugh Dickins
@ 2023-08-09  4:33 ` Hugh Dickins
  2023-08-09  9:50   ` Jan Kara
  2023-08-09 13:52   ` Carlos Maiolino
  2023-08-09  4:34 ` [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO Hugh Dickins
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 32+ messages in thread
From: Hugh Dickins @ 2023-08-09  4:33 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

Enable "user." extended attributes on tmpfs, limiting them by tracking
the space they occupy, and deducting that space from the limited ispace
(unless tmpfs mounted with nr_inodes=0 to leave that ispace unlimited).

tmpfs inodes and simple xattrs are both unswappable, and have to be in
lowmem on a 32-bit highmem kernel: so the ispace limit is appropriate
for xattrs, without any need for a further mount option.

Add simple_xattr_space() to give approximate but deterministic estimate
of the space taken up by each xattr: with simple_xattrs_free() outputting
the space freed if required (but kernfs and even some tmpfs usages do not
require that, so don't waste time on strlen'ing if not needed).

Security and trusted xattrs were already supported: for consistency and
simplicity, account them from the same pool; though there's a small risk
that a tmpfs with enough space before would now be considered too small.

When extended attributes are used, "df -i" does show more IUsed and less
IFree than can be explained by the inodes: document that (manpage later).

xfstests tests/generic which were not run on tmpfs before but now pass:
020 037 062 070 077 097 103 117 337 377 454 486 523 533 611 618 728
with no new failures.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 Documentation/filesystems/tmpfs.rst |  7 ++-
 fs/Kconfig                          |  4 +-
 fs/kernfs/dir.c                     |  2 +-
 fs/xattr.c                          | 28 ++++++++++-
 include/linux/xattr.h               |  3 +-
 mm/shmem.c                          | 78 +++++++++++++++++++++++++++----
 6 files changed, 106 insertions(+), 16 deletions(-)

diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index 67422ee10e03..56a26c843dbe 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -21,8 +21,8 @@ explained further below, some of which can be reconfigured dynamically on the
 fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
 filesystem can be resized but it cannot be resized to a size below its current
 usage. tmpfs also supports POSIX ACLs, and extended attributes for the
-trusted.* and security.* namespaces. ramfs does not use swap and you cannot
-modify any parameter for a ramfs filesystem. The size limit of a ramfs
+trusted.*, security.* and user.* namespaces. ramfs does not use swap and you
+cannot modify any parameter for a ramfs filesystem. The size limit of a ramfs
 filesystem is how much memory you have available, and so care must be taken if
 used so to not run out of memory.
 
@@ -97,6 +97,9 @@ mount with such options, since it allows any user with write access to
 use up all the memory on the machine; but enhances the scalability of
 that instance in a system with many CPUs making intensive use of it.
 
+If nr_inodes is not 0, that limited space for inodes is also used up by
+extended attributes: "df -i"'s IUsed and IUse% increase, IFree decreases.
+
 tmpfs blocks may be swapped out, when there is a shortage of memory.
 tmpfs has a mount option to disable its use of swap:
 
diff --git a/fs/Kconfig b/fs/Kconfig
index 8218a71933f9..7da21f563192 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -205,8 +205,8 @@ config TMPFS_XATTR
 	  Extended attributes are name:value pairs associated with inodes by
 	  the kernel or by users (see the attr(5) manual page for details).
 
-	  Currently this enables support for the trusted.* and
-	  security.* namespaces.
+	  This enables support for the trusted.*, security.* and user.*
+	  namespaces.
 
 	  You need this for POSIX ACL support on tmpfs.
 
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 5a1a4af9d3d2..660995856a04 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -556,7 +556,7 @@ void kernfs_put(struct kernfs_node *kn)
 	kfree_const(kn->name);
 
 	if (kn->iattr) {
-		simple_xattrs_free(&kn->iattr->xattrs);
+		simple_xattrs_free(&kn->iattr->xattrs, NULL);
 		kmem_cache_free(kernfs_iattrs_cache, kn->iattr);
 	}
 	spin_lock(&kernfs_idr_lock);
diff --git a/fs/xattr.c b/fs/xattr.c
index ba37a8f5cfd1..2d607542281b 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -1039,6 +1039,26 @@ const char *xattr_full_name(const struct xattr_handler *handler,
 }
 EXPORT_SYMBOL(xattr_full_name);
 
+/**
+ * simple_xattr_space - estimate the memory used by a simple xattr
+ * @name: the full name of the xattr
+ * @size: the size of its value
+ *
+ * This takes no account of how much larger the two slab objects actually are:
+ * that would depend on the slab implementation, when what is required is a
+ * deterministic number, which grows with name length and size and quantity.
+ *
+ * Return: The approximate number of bytes of memory used by such an xattr.
+ */
+size_t simple_xattr_space(const char *name, size_t size)
+{
+	/*
+	 * Use "40" instead of sizeof(struct simple_xattr), to return the
+	 * same result on 32-bit and 64-bit, and even if simple_xattr grows.
+	 */
+	return 40 + size + strlen(name);
+}
+
 /**
  * simple_xattr_free - free an xattr object
  * @xattr: the xattr object
@@ -1363,14 +1383,17 @@ void simple_xattrs_init(struct simple_xattrs *xattrs)
 /**
  * simple_xattrs_free - free xattrs
  * @xattrs: xattr header whose xattrs to destroy
+ * @freed_space: approximate number of bytes of memory freed from @xattrs
  *
  * Destroy all xattrs in @xattr. When this is called no one can hold a
  * reference to any of the xattrs anymore.
  */
-void simple_xattrs_free(struct simple_xattrs *xattrs)
+void simple_xattrs_free(struct simple_xattrs *xattrs, size_t *freed_space)
 {
 	struct rb_node *rbp;
 
+	if (freed_space)
+		*freed_space = 0;
 	rbp = rb_first(&xattrs->rb_root);
 	while (rbp) {
 		struct simple_xattr *xattr;
@@ -1379,6 +1402,9 @@ void simple_xattrs_free(struct simple_xattrs *xattrs)
 		rbp_next = rb_next(rbp);
 		xattr = rb_entry(rbp, struct simple_xattr, rb_node);
 		rb_erase(&xattr->rb_node, &xattrs->rb_root);
+		if (freed_space)
+			*freed_space += simple_xattr_space(xattr->name,
+							   xattr->size);
 		simple_xattr_free(xattr);
 		rbp = rbp_next;
 	}
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index e37fe667ae04..d20051865800 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -114,7 +114,8 @@ struct simple_xattr {
 };
 
 void simple_xattrs_init(struct simple_xattrs *xattrs);
-void simple_xattrs_free(struct simple_xattrs *xattrs);
+void simple_xattrs_free(struct simple_xattrs *xattrs, size_t *freed_space);
+size_t simple_xattr_space(const char *name, size_t size);
 struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
 void simple_xattr_free(struct simple_xattr *xattr);
 int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
diff --git a/mm/shmem.c b/mm/shmem.c
index c39471384168..7420b510a9f3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -393,12 +393,12 @@ static int shmem_reserve_inode(struct super_block *sb, ino_t *inop)
 	return 0;
 }
 
-static void shmem_free_inode(struct super_block *sb)
+static void shmem_free_inode(struct super_block *sb, size_t freed_ispace)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 	if (sbinfo->max_inodes) {
 		raw_spin_lock(&sbinfo->stat_lock);
-		sbinfo->free_ispace += BOGO_INODE_SIZE;
+		sbinfo->free_ispace += BOGO_INODE_SIZE + freed_ispace;
 		raw_spin_unlock(&sbinfo->stat_lock);
 	}
 }
@@ -1232,6 +1232,7 @@ static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
+	size_t freed;
 
 	if (shmem_mapping(inode->i_mapping)) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -1258,9 +1259,9 @@ static void shmem_evict_inode(struct inode *inode)
 		}
 	}
 
-	simple_xattrs_free(&info->xattrs);
+	simple_xattrs_free(&info->xattrs, sbinfo->max_inodes ? &freed : NULL);
+	shmem_free_inode(inode->i_sb, freed);
 	WARN_ON(inode->i_blocks);
-	shmem_free_inode(inode->i_sb);
 	clear_inode(inode);
 #ifdef CONFIG_TMPFS_QUOTA
 	dquot_free_inode(inode);
@@ -2440,7 +2441,7 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
 	inode = new_inode(sb);
 
 	if (!inode) {
-		shmem_free_inode(sb);
+		shmem_free_inode(sb, 0);
 		return ERR_PTR(-ENOSPC);
 	}
 
@@ -3281,7 +3282,7 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
 	ret = simple_offset_add(shmem_get_offset_ctx(dir), dentry);
 	if (ret) {
 		if (inode->i_nlink)
-			shmem_free_inode(inode->i_sb);
+			shmem_free_inode(inode->i_sb, 0);
 		goto out;
 	}
 
@@ -3301,7 +3302,7 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 	struct inode *inode = d_inode(dentry);
 
 	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
-		shmem_free_inode(inode->i_sb);
+		shmem_free_inode(inode->i_sb, 0);
 
 	simple_offset_remove(shmem_get_offset_ctx(dir), dentry);
 
@@ -3554,21 +3555,40 @@ static int shmem_initxattrs(struct inode *inode,
 			    void *fs_info)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	const struct xattr *xattr;
 	struct simple_xattr *new_xattr;
+	size_t ispace = 0;
 	size_t len;
 
+	if (sbinfo->max_inodes) {
+		for (xattr = xattr_array; xattr->name != NULL; xattr++) {
+			ispace += simple_xattr_space(xattr->name,
+				xattr->value_len + XATTR_SECURITY_PREFIX_LEN);
+		}
+		if (ispace) {
+			raw_spin_lock(&sbinfo->stat_lock);
+			if (sbinfo->free_ispace < ispace)
+				ispace = 0;
+			else
+				sbinfo->free_ispace -= ispace;
+			raw_spin_unlock(&sbinfo->stat_lock);
+			if (!ispace)
+				return -ENOSPC;
+		}
+	}
+
 	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
 		new_xattr = simple_xattr_alloc(xattr->value, xattr->value_len);
 		if (!new_xattr)
-			return -ENOMEM;
+			break;
 
 		len = strlen(xattr->name) + 1;
 		new_xattr->name = kmalloc(XATTR_SECURITY_PREFIX_LEN + len,
 					  GFP_KERNEL);
 		if (!new_xattr->name) {
 			kvfree(new_xattr);
-			return -ENOMEM;
+			break;
 		}
 
 		memcpy(new_xattr->name, XATTR_SECURITY_PREFIX,
@@ -3579,6 +3599,16 @@ static int shmem_initxattrs(struct inode *inode,
 		simple_xattr_add(&info->xattrs, new_xattr);
 	}
 
+	if (xattr->name != NULL) {
+		if (ispace) {
+			raw_spin_lock(&sbinfo->stat_lock);
+			sbinfo->free_ispace += ispace;
+			raw_spin_unlock(&sbinfo->stat_lock);
+		}
+		simple_xattrs_free(&info->xattrs, NULL);
+		return -ENOMEM;
+	}
+
 	return 0;
 }
 
@@ -3599,16 +3629,39 @@ static int shmem_xattr_handler_set(const struct xattr_handler *handler,
 				   size_t size, int flags)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	struct simple_xattr *old_xattr;
+	size_t ispace = 0;
 
 	name = xattr_full_name(handler, name);
+	if (value && sbinfo->max_inodes) {
+		ispace = simple_xattr_space(name, size);
+		raw_spin_lock(&sbinfo->stat_lock);
+		if (sbinfo->free_ispace < ispace)
+			ispace = 0;
+		else
+			sbinfo->free_ispace -= ispace;
+		raw_spin_unlock(&sbinfo->stat_lock);
+		if (!ispace)
+			return -ENOSPC;
+	}
+
 	old_xattr = simple_xattr_set(&info->xattrs, name, value, size, flags);
 	if (!IS_ERR(old_xattr)) {
+		ispace = 0;
+		if (old_xattr && sbinfo->max_inodes)
+			ispace = simple_xattr_space(old_xattr->name,
+						    old_xattr->size);
 		simple_xattr_free(old_xattr);
 		old_xattr = NULL;
 		inode->i_ctime = current_time(inode);
 		inode_inc_iversion(inode);
 	}
+	if (ispace) {
+		raw_spin_lock(&sbinfo->stat_lock);
+		sbinfo->free_ispace += ispace;
+		raw_spin_unlock(&sbinfo->stat_lock);
+	}
 	return PTR_ERR(old_xattr);
 }
 
@@ -3624,9 +3677,16 @@ static const struct xattr_handler shmem_trusted_xattr_handler = {
 	.set = shmem_xattr_handler_set,
 };
 
+static const struct xattr_handler shmem_user_xattr_handler = {
+	.prefix = XATTR_USER_PREFIX,
+	.get = shmem_xattr_handler_get,
+	.set = shmem_xattr_handler_set,
+};
+
 static const struct xattr_handler *shmem_xattr_handlers[] = {
 	&shmem_security_xattr_handler,
 	&shmem_trusted_xattr_handler,
+	&shmem_user_xattr_handler,
 	NULL
 };
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO
  2023-08-09  4:28 [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Hugh Dickins
                   ` (2 preceding siblings ...)
  2023-08-09  4:33 ` [PATCH vfs.tmpfs 3/5] tmpfs,xattr: enable limited user extended attributes Hugh Dickins
@ 2023-08-09  4:34 ` Hugh Dickins
  2023-08-09  9:54   ` Jan Kara
                     ` (2 more replies)
  2023-08-09  4:36 ` [PATCH vfs.tmpfs 5/5] mm: invalidation check mapping before folio_contains Hugh Dickins
  2023-08-09  6:45 ` [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Christian Brauner
  5 siblings, 3 replies; 32+ messages in thread
From: Hugh Dickins @ 2023-08-09  4:34 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

Depending upon your philosophical viewpoint, either tmpfs always does
direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to
stand in for a more sophisticated filesystem, it can be helpful for tmpfs
to support O_DIRECT.  So, give tmpfs a shmem_direct_IO() method, of the
simplest kind: by just returning 0 done, it leaves all the work to the
buffered fallback (and everything else just happens to work out okay -
in particular, its dirty pages don't get lost to invalidation).

xfstests auto generic which were not run on tmpfs before but now pass:
036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263
323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729
with no new failures.

LTP dio tests which were not run on tmpfs before but now pass:
dio01 through dio30, except for dio04 and dio10, which fail because
tmpfs dio read and write allow odd count: tmpfs could be made stricter,
but would that be an improvement?

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/shmem.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 7420b510a9f3..4d5599e566df 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2720,6 +2720,16 @@ shmem_write_end(struct file *file, struct address_space *mapping,
 	return copied;
 }
 
+static ssize_t shmem_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+{
+	/*
+	 * Just leave all the work to the buffered fallback.
+	 * Some LTP tests may expect us to enforce alignment restrictions,
+	 * but the fallback works just fine with any alignment, so allow it.
+	 */
+	return 0;
+}
+
 static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
@@ -4421,6 +4431,7 @@ const struct address_space_operations shmem_aops = {
 #ifdef CONFIG_TMPFS
 	.write_begin	= shmem_write_begin,
 	.write_end	= shmem_write_end,
+	.direct_IO	= shmem_direct_IO,
 #endif
 #ifdef CONFIG_MIGRATION
 	.migrate_folio	= migrate_folio,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH vfs.tmpfs 5/5] mm: invalidation check mapping before folio_contains
  2023-08-09  4:28 [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Hugh Dickins
                   ` (3 preceding siblings ...)
  2023-08-09  4:34 ` [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO Hugh Dickins
@ 2023-08-09  4:36 ` Hugh Dickins
  2023-08-09  9:27   ` Jan Kara
  2023-08-09  6:45 ` [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Christian Brauner
  5 siblings, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2023-08-09  4:36 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

Enabling tmpfs "direct IO" exposes it to invalidate_inode_pages2_range(),
which when swapping can hit the VM_BUG_ON_FOLIO(!folio_contains()): the
folio has been moved from page cache to swap cache (with folio->mapping
reset to NULL), but the folio_index() embedded in folio_contains() sees
swapcache, and so returns the swapcache_index() - whereas folio->index
would be the right one to check against the index from mapping's xarray.

There are different ways to fix this, but my preference is just to order
the checks in invalidate_inode_pages2_range() the same way that they are
in __filemap_get_folio() and find_lock_entries() and filemap_fault():
check folio->mapping before folio_contains().

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/truncate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 95d1291d269b..c3320e66d6ea 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -657,11 +657,11 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 			}
 
 			folio_lock(folio);
-			VM_BUG_ON_FOLIO(!folio_contains(folio, indices[i]), folio);
-			if (folio->mapping != mapping) {
+			if (unlikely(folio->mapping != mapping)) {
 				folio_unlock(folio);
 				continue;
 			}
+			VM_BUG_ON_FOLIO(!folio_contains(folio, indices[i]), folio);
 			folio_wait_writeback(folio);
 
 			if (folio_mapped(folio))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO
  2023-08-09  4:28 [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Hugh Dickins
                   ` (4 preceding siblings ...)
  2023-08-09  4:36 ` [PATCH vfs.tmpfs 5/5] mm: invalidation check mapping before folio_contains Hugh Dickins
@ 2023-08-09  6:45 ` Christian Brauner
  2023-08-09 11:33   ` Christian Brauner
  5 siblings, 1 reply; 32+ messages in thread
From: Christian Brauner @ 2023-08-09  6:45 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Tue, Aug 08, 2023 at 09:28:08PM -0700, Hugh Dickins wrote:
> This series enables and limits user extended attributes on tmpfs,
> and independently provides a trivial direct IO stub for tmpfs.
> 
> It is here based on the vfs.tmpfs branch in vfs.git in next-20230808
> but with a cherry-pick of v6.5-rc4's commit
> 253e5df8b8f0 ("tmpfs: fix Documentation of noswap and huge mount options")
> first: since the vfs.tmpfs branch is based on v6.5-rc1, but 3/5 in this
> series updates tmpfs.rst in a way which depends on that commit.
> 
> IIUC the right thing to do would be to cherry-pick 253e5df8b8f0 into
> vfs.tmpfs before applying this series.  I'm sorry that the series as
> posted does not apply cleanly to any known tree! but I think posting
> it against v6.5-rc5 or next-20230808 would be even less helpful.

No worries, I'll sort that out.

> 
> There is one "conflict" between this series and the final next-20230808:
> Jeff Layton's vfs.ctime mods update a line of shmem_xattr_handler_set(),
> where neighbouring lines are modified by 1/5 and 3/5 here: easily
> resolved in the merge commit, I hope.

Yeah, git rerere is our friend here as well so reassembling the tree
isn't really that much pain.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed
  2023-08-09  4:30 ` [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed Hugh Dickins
@ 2023-08-09  9:21   ` Jan Kara
  2023-08-09 11:37   ` Christian Brauner
  2023-08-09 13:19   ` Carlos Maiolino
  2 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2023-08-09  9:21 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Tue 08-08-23 21:30:59, Hugh Dickins wrote:
> tmpfs wants to support limited user extended attributes, but kernfs
> (or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR)
> already supports user extended attributes through simple xattrs: but
> limited by a policy (128KiB per inode) too liberal to be used on tmpfs.
> 
> To allow a different limiting policy for tmpfs, without affecting the
> policy for kernfs, change simple_xattr_set() to return the replaced or
> removed xattr (if any), leaving the caller to update their accounting
> then free the xattr (by simple_xattr_free(), renamed from the static
> free_simple_xattr()).
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/kernfs/inode.c     | 46 +++++++++++++++++++++++++---------------
>  fs/xattr.c            | 51 +++++++++++++++++++--------------------------
>  include/linux/xattr.h |  7 ++++---
>  mm/shmem.c            | 10 +++++----
>  4 files changed, 61 insertions(+), 53 deletions(-)
> 
> diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> index b22b74d1a115..fec5d5f78f07 100644
> --- a/fs/kernfs/inode.c
> +++ b/fs/kernfs/inode.c
> @@ -306,11 +306,17 @@ int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
>  int kernfs_xattr_set(struct kernfs_node *kn, const char *name,
>  		     const void *value, size_t size, int flags)
>  {
> +	struct simple_xattr *old_xattr;
>  	struct kernfs_iattrs *attrs = kernfs_iattrs(kn);
>  	if (!attrs)
>  		return -ENOMEM;
>  
> -	return simple_xattr_set(&attrs->xattrs, name, value, size, flags, NULL);
> +	old_xattr = simple_xattr_set(&attrs->xattrs, name, value, size, flags);
> +	if (IS_ERR(old_xattr))
> +		return PTR_ERR(old_xattr);
> +
> +	simple_xattr_free(old_xattr);
> +	return 0;
>  }
>  
>  static int kernfs_vfs_xattr_get(const struct xattr_handler *handler,
> @@ -342,7 +348,7 @@ static int kernfs_vfs_user_xattr_add(struct kernfs_node *kn,
>  {
>  	atomic_t *sz = &kn->iattr->user_xattr_size;
>  	atomic_t *nr = &kn->iattr->nr_user_xattrs;
> -	ssize_t removed_size;
> +	struct simple_xattr *old_xattr;
>  	int ret;
>  
>  	if (atomic_inc_return(nr) > KERNFS_MAX_USER_XATTRS) {
> @@ -355,13 +361,18 @@ static int kernfs_vfs_user_xattr_add(struct kernfs_node *kn,
>  		goto dec_size_out;
>  	}
>  
> -	ret = simple_xattr_set(xattrs, full_name, value, size, flags,
> -			       &removed_size);
> -
> -	if (!ret && removed_size >= 0)
> -		size = removed_size;
> -	else if (!ret)
> +	old_xattr = simple_xattr_set(xattrs, full_name, value, size, flags);
> +	if (!old_xattr)
>  		return 0;
> +
> +	if (IS_ERR(old_xattr)) {
> +		ret = PTR_ERR(old_xattr);
> +		goto dec_size_out;
> +	}
> +
> +	ret = 0;
> +	size = old_xattr->size;
> +	simple_xattr_free(old_xattr);
>  dec_size_out:
>  	atomic_sub(size, sz);
>  dec_count_out:
> @@ -376,18 +387,19 @@ static int kernfs_vfs_user_xattr_rm(struct kernfs_node *kn,
>  {
>  	atomic_t *sz = &kn->iattr->user_xattr_size;
>  	atomic_t *nr = &kn->iattr->nr_user_xattrs;
> -	ssize_t removed_size;
> -	int ret;
> +	struct simple_xattr *old_xattr;
>  
> -	ret = simple_xattr_set(xattrs, full_name, value, size, flags,
> -			       &removed_size);
> +	old_xattr = simple_xattr_set(xattrs, full_name, value, size, flags);
> +	if (!old_xattr)
> +		return 0;
>  
> -	if (removed_size >= 0) {
> -		atomic_sub(removed_size, sz);
> -		atomic_dec(nr);
> -	}
> +	if (IS_ERR(old_xattr))
> +		return PTR_ERR(old_xattr);
>  
> -	return ret;
> +	atomic_sub(old_xattr->size, sz);
> +	atomic_dec(nr);
> +	simple_xattr_free(old_xattr);
> +	return 0;
>  }
>  
>  static int kernfs_vfs_user_xattr_set(const struct xattr_handler *handler,
> diff --git a/fs/xattr.c b/fs/xattr.c
> index e7bbb7f57557..ba37a8f5cfd1 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -1040,12 +1040,12 @@ const char *xattr_full_name(const struct xattr_handler *handler,
>  EXPORT_SYMBOL(xattr_full_name);
>  
>  /**
> - * free_simple_xattr - free an xattr object
> + * simple_xattr_free - free an xattr object
>   * @xattr: the xattr object
>   *
>   * Free the xattr object. Can handle @xattr being NULL.
>   */
> -static inline void free_simple_xattr(struct simple_xattr *xattr)
> +void simple_xattr_free(struct simple_xattr *xattr)
>  {
>  	if (xattr)
>  		kfree(xattr->name);
> @@ -1164,7 +1164,6 @@ int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
>   * @value: the value to store along the xattr
>   * @size: the size of @value
>   * @flags: the flags determining how to set the xattr
> - * @removed_size: the size of the removed xattr
>   *
>   * Set a new xattr object.
>   * If @value is passed a new xattr object will be allocated. If XATTR_REPLACE
> @@ -1181,29 +1180,27 @@ int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
>   * nothing if XATTR_CREATE is specified in @flags or @flags is zero. For
>   * XATTR_REPLACE we fail as mentioned above.
>   *
> - * Return: On success zero and on error a negative error code is returned.
> + * Return: On success, the removed or replaced xattr is returned, to be freed
> + * by the caller; or NULL if none. On failure a negative error code is returned.
>   */
> -int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
> -		     const void *value, size_t size, int flags,
> -		     ssize_t *removed_size)
> +struct simple_xattr *simple_xattr_set(struct simple_xattrs *xattrs,
> +				      const char *name, const void *value,
> +				      size_t size, int flags)
>  {
> -	struct simple_xattr *xattr = NULL, *new_xattr = NULL;
> +	struct simple_xattr *old_xattr = NULL, *new_xattr = NULL;
>  	struct rb_node *parent = NULL, **rbp;
>  	int err = 0, ret;
>  
> -	if (removed_size)
> -		*removed_size = -1;
> -
>  	/* value == NULL means remove */
>  	if (value) {
>  		new_xattr = simple_xattr_alloc(value, size);
>  		if (!new_xattr)
> -			return -ENOMEM;
> +			return ERR_PTR(-ENOMEM);
>  
>  		new_xattr->name = kstrdup(name, GFP_KERNEL);
>  		if (!new_xattr->name) {
> -			free_simple_xattr(new_xattr);
> -			return -ENOMEM;
> +			simple_xattr_free(new_xattr);
> +			return ERR_PTR(-ENOMEM);
>  		}
>  	}
>  
> @@ -1217,12 +1214,12 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
>  		else if (ret > 0)
>  			rbp = &(*rbp)->rb_right;
>  		else
> -			xattr = rb_entry(*rbp, struct simple_xattr, rb_node);
> -		if (xattr)
> +			old_xattr = rb_entry(*rbp, struct simple_xattr, rb_node);
> +		if (old_xattr)
>  			break;
>  	}
>  
> -	if (xattr) {
> +	if (old_xattr) {
>  		/* Fail if XATTR_CREATE is requested and the xattr exists. */
>  		if (flags & XATTR_CREATE) {
>  			err = -EEXIST;
> @@ -1230,12 +1227,10 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
>  		}
>  
>  		if (new_xattr)
> -			rb_replace_node(&xattr->rb_node, &new_xattr->rb_node,
> -					&xattrs->rb_root);
> +			rb_replace_node(&old_xattr->rb_node,
> +					&new_xattr->rb_node, &xattrs->rb_root);
>  		else
> -			rb_erase(&xattr->rb_node, &xattrs->rb_root);
> -		if (!err && removed_size)
> -			*removed_size = xattr->size;
> +			rb_erase(&old_xattr->rb_node, &xattrs->rb_root);
>  	} else {
>  		/* Fail if XATTR_REPLACE is requested but no xattr is found. */
>  		if (flags & XATTR_REPLACE) {
> @@ -1260,12 +1255,10 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
>  
>  out_unlock:
>  	write_unlock(&xattrs->lock);
> -	if (err)
> -		free_simple_xattr(new_xattr);
> -	else
> -		free_simple_xattr(xattr);
> -	return err;
> -
> +	if (!err)
> +		return old_xattr;
> +	simple_xattr_free(new_xattr);
> +	return ERR_PTR(err);
>  }
>  
>  static bool xattr_is_trusted(const char *name)
> @@ -1386,7 +1379,7 @@ void simple_xattrs_free(struct simple_xattrs *xattrs)
>  		rbp_next = rb_next(rbp);
>  		xattr = rb_entry(rbp, struct simple_xattr, rb_node);
>  		rb_erase(&xattr->rb_node, &xattrs->rb_root);
> -		free_simple_xattr(xattr);
> +		simple_xattr_free(xattr);
>  		rbp = rbp_next;
>  	}
>  }
> diff --git a/include/linux/xattr.h b/include/linux/xattr.h
> index d591ef59aa98..e37fe667ae04 100644
> --- a/include/linux/xattr.h
> +++ b/include/linux/xattr.h
> @@ -116,11 +116,12 @@ struct simple_xattr {
>  void simple_xattrs_init(struct simple_xattrs *xattrs);
>  void simple_xattrs_free(struct simple_xattrs *xattrs);
>  struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
> +void simple_xattr_free(struct simple_xattr *xattr);
>  int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
>  		     void *buffer, size_t size);
> -int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
> -		     const void *value, size_t size, int flags,
> -		     ssize_t *removed_size);
> +struct simple_xattr *simple_xattr_set(struct simple_xattrs *xattrs,
> +				      const char *name, const void *value,
> +				      size_t size, int flags);
>  ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs,
>  			  char *buffer, size_t size);
>  void simple_xattr_add(struct simple_xattrs *xattrs,
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 0f83d86fd8b4..df3cabf54206 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3595,15 +3595,17 @@ static int shmem_xattr_handler_set(const struct xattr_handler *handler,
>  				   size_t size, int flags)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> -	int err;
> +	struct simple_xattr *old_xattr;
>  
>  	name = xattr_full_name(handler, name);
> -	err = simple_xattr_set(&info->xattrs, name, value, size, flags, NULL);
> -	if (!err) {
> +	old_xattr = simple_xattr_set(&info->xattrs, name, value, size, flags);
> +	if (!IS_ERR(old_xattr)) {
> +		simple_xattr_free(old_xattr);
> +		old_xattr = NULL;
>  		inode->i_ctime = current_time(inode);
>  		inode_inc_iversion(inode);
>  	}
> -	return err;
> +	return PTR_ERR(old_xattr);
>  }
>  
>  static const struct xattr_handler shmem_security_xattr_handler = {
> -- 
> 2.35.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 5/5] mm: invalidation check mapping before folio_contains
  2023-08-09  4:36 ` [PATCH vfs.tmpfs 5/5] mm: invalidation check mapping before folio_contains Hugh Dickins
@ 2023-08-09  9:27   ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2023-08-09  9:27 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Tue 08-08-23 21:36:12, Hugh Dickins wrote:
> Enabling tmpfs "direct IO" exposes it to invalidate_inode_pages2_range(),
> which when swapping can hit the VM_BUG_ON_FOLIO(!folio_contains()): the
> folio has been moved from page cache to swap cache (with folio->mapping
> reset to NULL), but the folio_index() embedded in folio_contains() sees
> swapcache, and so returns the swapcache_index() - whereas folio->index
> would be the right one to check against the index from mapping's xarray.
> 
> There are different ways to fix this, but my preference is just to order
> the checks in invalidate_inode_pages2_range() the same way that they are
> in __filemap_get_folio() and find_lock_entries() and filemap_fault():
> check folio->mapping before folio_contains().
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Makes sense. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/truncate.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 95d1291d269b..c3320e66d6ea 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -657,11 +657,11 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
>  			}
>  
>  			folio_lock(folio);
> -			VM_BUG_ON_FOLIO(!folio_contains(folio, indices[i]), folio);
> -			if (folio->mapping != mapping) {
> +			if (unlikely(folio->mapping != mapping)) {
>  				folio_unlock(folio);
>  				continue;
>  			}
> +			VM_BUG_ON_FOLIO(!folio_contains(folio, indices[i]), folio);
>  			folio_wait_writeback(folio);
>  
>  			if (folio_mapped(folio))
> -- 
> 2.35.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 2/5] tmpfs: track free_ispace instead of free_inodes
  2023-08-09  4:32 ` [PATCH vfs.tmpfs 2/5] tmpfs: track free_ispace instead of free_inodes Hugh Dickins
@ 2023-08-09  9:33   ` Jan Kara
  2023-08-09 13:29   ` Carlos Maiolino
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kara @ 2023-08-09  9:33 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Tue 08-08-23 21:32:21, Hugh Dickins wrote:
> In preparation for assigning some inode space to extended attributes,
> keep track of free_ispace instead of number of free_inodes: as if one
> tmpfs inode (and accompanying dentry) occupies very approximately 1KiB.
> 
> Unsigned long is large enough for free_ispace, on 64-bit and on 32-bit:
> but take care to enforce the maximum.  And fix the nr_blocks maximum on
> 32-bit: S64_MAX would be too big for it there, so say LONG_MAX instead.
> 
> Delete the incorrect limited<->unlimited blocks/inodes comment above
> shmem_reconfigure(): leave it to the error messages below to describe.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/linux/shmem_fs.h |  2 +-
>  mm/shmem.c               | 33 +++++++++++++++++----------------
>  2 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 9b2d2faff1d0..6b0c626620f5 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -54,7 +54,7 @@ struct shmem_sb_info {
>  	unsigned long max_blocks;   /* How many blocks are allowed */
>  	struct percpu_counter used_blocks;  /* How many are allocated */
>  	unsigned long max_inodes;   /* How many inodes are allowed */
> -	unsigned long free_inodes;  /* How many are left for allocation */
> +	unsigned long free_ispace;  /* How much ispace left for allocation */
>  	raw_spinlock_t stat_lock;   /* Serialize shmem_sb_info changes */
>  	umode_t mode;		    /* Mount mode for root directory */
>  	unsigned char huge;	    /* Whether to try for hugepages */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index df3cabf54206..c39471384168 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -90,6 +90,9 @@ static struct vfsmount *shm_mnt;
>  /* Pretend that each entry is of this size in directory's i_size */
>  #define BOGO_DIRENT_SIZE 20
>  
> +/* Pretend that one inode + its dentry occupy this much memory */
> +#define BOGO_INODE_SIZE 1024
> +
>  /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
>  #define SHORT_SYMLINK_LEN 128
>  
> @@ -137,7 +140,8 @@ static unsigned long shmem_default_max_inodes(void)
>  {
>  	unsigned long nr_pages = totalram_pages();
>  
> -	return min(nr_pages - totalhigh_pages(), nr_pages / 2);
> +	return min3(nr_pages - totalhigh_pages(), nr_pages / 2,
> +			ULONG_MAX / BOGO_INODE_SIZE);
>  }
>  #endif
>  
> @@ -331,11 +335,11 @@ static int shmem_reserve_inode(struct super_block *sb, ino_t *inop)
>  	if (!(sb->s_flags & SB_KERNMOUNT)) {
>  		raw_spin_lock(&sbinfo->stat_lock);
>  		if (sbinfo->max_inodes) {
> -			if (!sbinfo->free_inodes) {
> +			if (sbinfo->free_ispace < BOGO_INODE_SIZE) {
>  				raw_spin_unlock(&sbinfo->stat_lock);
>  				return -ENOSPC;
>  			}
> -			sbinfo->free_inodes--;
> +			sbinfo->free_ispace -= BOGO_INODE_SIZE;
>  		}
>  		if (inop) {
>  			ino = sbinfo->next_ino++;
> @@ -394,7 +398,7 @@ static void shmem_free_inode(struct super_block *sb)
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
>  	if (sbinfo->max_inodes) {
>  		raw_spin_lock(&sbinfo->stat_lock);
> -		sbinfo->free_inodes++;
> +		sbinfo->free_ispace += BOGO_INODE_SIZE;
>  		raw_spin_unlock(&sbinfo->stat_lock);
>  	}
>  }
> @@ -3155,7 +3159,7 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	}
>  	if (sbinfo->max_inodes) {
>  		buf->f_files = sbinfo->max_inodes;
> -		buf->f_ffree = sbinfo->free_inodes;
> +		buf->f_ffree = sbinfo->free_ispace / BOGO_INODE_SIZE;
>  	}
>  	/* else leave those fields 0 like simple_statfs */
>  
> @@ -3815,13 +3819,13 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
>  		break;
>  	case Opt_nr_blocks:
>  		ctx->blocks = memparse(param->string, &rest);
> -		if (*rest || ctx->blocks > S64_MAX)
> +		if (*rest || ctx->blocks > LONG_MAX)
>  			goto bad_value;
>  		ctx->seen |= SHMEM_SEEN_BLOCKS;
>  		break;
>  	case Opt_nr_inodes:
>  		ctx->inodes = memparse(param->string, &rest);
> -		if (*rest)
> +		if (*rest || ctx->inodes > ULONG_MAX / BOGO_INODE_SIZE)
>  			goto bad_value;
>  		ctx->seen |= SHMEM_SEEN_INODES;
>  		break;
> @@ -4002,21 +4006,17 @@ static int shmem_parse_options(struct fs_context *fc, void *data)
>  
>  /*
>   * Reconfigure a shmem filesystem.
> - *
> - * Note that we disallow change from limited->unlimited blocks/inodes while any
> - * are in use; but we must separately disallow unlimited->limited, because in
> - * that case we have no record of how much is already in use.
>   */
>  static int shmem_reconfigure(struct fs_context *fc)
>  {
>  	struct shmem_options *ctx = fc->fs_private;
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(fc->root->d_sb);
> -	unsigned long inodes;
> +	unsigned long used_isp;
>  	struct mempolicy *mpol = NULL;
>  	const char *err;
>  
>  	raw_spin_lock(&sbinfo->stat_lock);
> -	inodes = sbinfo->max_inodes - sbinfo->free_inodes;
> +	used_isp = sbinfo->max_inodes * BOGO_INODE_SIZE - sbinfo->free_ispace;
>  
>  	if ((ctx->seen & SHMEM_SEEN_BLOCKS) && ctx->blocks) {
>  		if (!sbinfo->max_blocks) {
> @@ -4034,7 +4034,7 @@ static int shmem_reconfigure(struct fs_context *fc)
>  			err = "Cannot retroactively limit inodes";
>  			goto out;
>  		}
> -		if (ctx->inodes < inodes) {
> +		if (ctx->inodes * BOGO_INODE_SIZE < used_isp) {
>  			err = "Too few inodes for current use";
>  			goto out;
>  		}
> @@ -4080,7 +4080,7 @@ static int shmem_reconfigure(struct fs_context *fc)
>  		sbinfo->max_blocks  = ctx->blocks;
>  	if (ctx->seen & SHMEM_SEEN_INODES) {
>  		sbinfo->max_inodes  = ctx->inodes;
> -		sbinfo->free_inodes = ctx->inodes - inodes;
> +		sbinfo->free_ispace = ctx->inodes * BOGO_INODE_SIZE - used_isp;
>  	}
>  
>  	/*
> @@ -4211,7 +4211,8 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
>  	sb->s_flags |= SB_NOUSER;
>  #endif
>  	sbinfo->max_blocks = ctx->blocks;
> -	sbinfo->free_inodes = sbinfo->max_inodes = ctx->inodes;
> +	sbinfo->max_inodes = ctx->inodes;
> +	sbinfo->free_ispace = sbinfo->max_inodes * BOGO_INODE_SIZE;
>  	if (sb->s_flags & SB_KERNMOUNT) {
>  		sbinfo->ino_batch = alloc_percpu(ino_t);
>  		if (!sbinfo->ino_batch)
> -- 
> 2.35.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 3/5] tmpfs,xattr: enable limited user extended attributes
  2023-08-09  4:33 ` [PATCH vfs.tmpfs 3/5] tmpfs,xattr: enable limited user extended attributes Hugh Dickins
@ 2023-08-09  9:50   ` Jan Kara
  2023-08-09 13:52   ` Carlos Maiolino
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kara @ 2023-08-09  9:50 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Tue 08-08-23 21:33:56, Hugh Dickins wrote:
> Enable "user." extended attributes on tmpfs, limiting them by tracking
> the space they occupy, and deducting that space from the limited ispace
> (unless tmpfs mounted with nr_inodes=0 to leave that ispace unlimited).
> 
> tmpfs inodes and simple xattrs are both unswappable, and have to be in
> lowmem on a 32-bit highmem kernel: so the ispace limit is appropriate
> for xattrs, without any need for a further mount option.
> 
> Add simple_xattr_space() to give approximate but deterministic estimate
> of the space taken up by each xattr: with simple_xattrs_free() outputting
> the space freed if required (but kernfs and even some tmpfs usages do not
> require that, so don't waste time on strlen'ing if not needed).
> 
> Security and trusted xattrs were already supported: for consistency and
> simplicity, account them from the same pool; though there's a small risk
> that a tmpfs with enough space before would now be considered too small.
> 
> When extended attributes are used, "df -i" does show more IUsed and less
> IFree than can be explained by the inodes: document that (manpage later).
> 
> xfstests tests/generic which were not run on tmpfs before but now pass:
> 020 037 062 070 077 097 103 117 337 377 454 486 523 533 611 618 728
> with no new failures.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  Documentation/filesystems/tmpfs.rst |  7 ++-
>  fs/Kconfig                          |  4 +-
>  fs/kernfs/dir.c                     |  2 +-
>  fs/xattr.c                          | 28 ++++++++++-
>  include/linux/xattr.h               |  3 +-
>  mm/shmem.c                          | 78 +++++++++++++++++++++++++++----
>  6 files changed, 106 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 67422ee10e03..56a26c843dbe 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -21,8 +21,8 @@ explained further below, some of which can be reconfigured dynamically on the
>  fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
>  filesystem can be resized but it cannot be resized to a size below its current
>  usage. tmpfs also supports POSIX ACLs, and extended attributes for the
> -trusted.* and security.* namespaces. ramfs does not use swap and you cannot
> -modify any parameter for a ramfs filesystem. The size limit of a ramfs
> +trusted.*, security.* and user.* namespaces. ramfs does not use swap and you
> +cannot modify any parameter for a ramfs filesystem. The size limit of a ramfs
>  filesystem is how much memory you have available, and so care must be taken if
>  used so to not run out of memory.
>  
> @@ -97,6 +97,9 @@ mount with such options, since it allows any user with write access to
>  use up all the memory on the machine; but enhances the scalability of
>  that instance in a system with many CPUs making intensive use of it.
>  
> +If nr_inodes is not 0, that limited space for inodes is also used up by
> +extended attributes: "df -i"'s IUsed and IUse% increase, IFree decreases.
> +
>  tmpfs blocks may be swapped out, when there is a shortage of memory.
>  tmpfs has a mount option to disable its use of swap:
>  
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 8218a71933f9..7da21f563192 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -205,8 +205,8 @@ config TMPFS_XATTR
>  	  Extended attributes are name:value pairs associated with inodes by
>  	  the kernel or by users (see the attr(5) manual page for details).
>  
> -	  Currently this enables support for the trusted.* and
> -	  security.* namespaces.
> +	  This enables support for the trusted.*, security.* and user.*
> +	  namespaces.
>  
>  	  You need this for POSIX ACL support on tmpfs.
>  
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 5a1a4af9d3d2..660995856a04 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -556,7 +556,7 @@ void kernfs_put(struct kernfs_node *kn)
>  	kfree_const(kn->name);
>  
>  	if (kn->iattr) {
> -		simple_xattrs_free(&kn->iattr->xattrs);
> +		simple_xattrs_free(&kn->iattr->xattrs, NULL);
>  		kmem_cache_free(kernfs_iattrs_cache, kn->iattr);
>  	}
>  	spin_lock(&kernfs_idr_lock);
> diff --git a/fs/xattr.c b/fs/xattr.c
> index ba37a8f5cfd1..2d607542281b 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -1039,6 +1039,26 @@ const char *xattr_full_name(const struct xattr_handler *handler,
>  }
>  EXPORT_SYMBOL(xattr_full_name);
>  
> +/**
> + * simple_xattr_space - estimate the memory used by a simple xattr
> + * @name: the full name of the xattr
> + * @size: the size of its value
> + *
> + * This takes no account of how much larger the two slab objects actually are:
> + * that would depend on the slab implementation, when what is required is a
> + * deterministic number, which grows with name length and size and quantity.
> + *
> + * Return: The approximate number of bytes of memory used by such an xattr.
> + */
> +size_t simple_xattr_space(const char *name, size_t size)
> +{
> +	/*
> +	 * Use "40" instead of sizeof(struct simple_xattr), to return the
> +	 * same result on 32-bit and 64-bit, and even if simple_xattr grows.
> +	 */
> +	return 40 + size + strlen(name);
> +}
> +
>  /**
>   * simple_xattr_free - free an xattr object
>   * @xattr: the xattr object
> @@ -1363,14 +1383,17 @@ void simple_xattrs_init(struct simple_xattrs *xattrs)
>  /**
>   * simple_xattrs_free - free xattrs
>   * @xattrs: xattr header whose xattrs to destroy
> + * @freed_space: approximate number of bytes of memory freed from @xattrs
>   *
>   * Destroy all xattrs in @xattr. When this is called no one can hold a
>   * reference to any of the xattrs anymore.
>   */
> -void simple_xattrs_free(struct simple_xattrs *xattrs)
> +void simple_xattrs_free(struct simple_xattrs *xattrs, size_t *freed_space)
>  {
>  	struct rb_node *rbp;
>  
> +	if (freed_space)
> +		*freed_space = 0;
>  	rbp = rb_first(&xattrs->rb_root);
>  	while (rbp) {
>  		struct simple_xattr *xattr;
> @@ -1379,6 +1402,9 @@ void simple_xattrs_free(struct simple_xattrs *xattrs)
>  		rbp_next = rb_next(rbp);
>  		xattr = rb_entry(rbp, struct simple_xattr, rb_node);
>  		rb_erase(&xattr->rb_node, &xattrs->rb_root);
> +		if (freed_space)
> +			*freed_space += simple_xattr_space(xattr->name,
> +							   xattr->size);
>  		simple_xattr_free(xattr);
>  		rbp = rbp_next;
>  	}
> diff --git a/include/linux/xattr.h b/include/linux/xattr.h
> index e37fe667ae04..d20051865800 100644
> --- a/include/linux/xattr.h
> +++ b/include/linux/xattr.h
> @@ -114,7 +114,8 @@ struct simple_xattr {
>  };
>  
>  void simple_xattrs_init(struct simple_xattrs *xattrs);
> -void simple_xattrs_free(struct simple_xattrs *xattrs);
> +void simple_xattrs_free(struct simple_xattrs *xattrs, size_t *freed_space);
> +size_t simple_xattr_space(const char *name, size_t size);
>  struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
>  void simple_xattr_free(struct simple_xattr *xattr);
>  int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
> diff --git a/mm/shmem.c b/mm/shmem.c
> index c39471384168..7420b510a9f3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -393,12 +393,12 @@ static int shmem_reserve_inode(struct super_block *sb, ino_t *inop)
>  	return 0;
>  }
>  
> -static void shmem_free_inode(struct super_block *sb)
> +static void shmem_free_inode(struct super_block *sb, size_t freed_ispace)
>  {
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
>  	if (sbinfo->max_inodes) {
>  		raw_spin_lock(&sbinfo->stat_lock);
> -		sbinfo->free_ispace += BOGO_INODE_SIZE;
> +		sbinfo->free_ispace += BOGO_INODE_SIZE + freed_ispace;
>  		raw_spin_unlock(&sbinfo->stat_lock);
>  	}
>  }
> @@ -1232,6 +1232,7 @@ static void shmem_evict_inode(struct inode *inode)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> +	size_t freed;
>  
>  	if (shmem_mapping(inode->i_mapping)) {
>  		shmem_unacct_size(info->flags, inode->i_size);
> @@ -1258,9 +1259,9 @@ static void shmem_evict_inode(struct inode *inode)
>  		}
>  	}
>  
> -	simple_xattrs_free(&info->xattrs);
> +	simple_xattrs_free(&info->xattrs, sbinfo->max_inodes ? &freed : NULL);
> +	shmem_free_inode(inode->i_sb, freed);
>  	WARN_ON(inode->i_blocks);
> -	shmem_free_inode(inode->i_sb);
>  	clear_inode(inode);
>  #ifdef CONFIG_TMPFS_QUOTA
>  	dquot_free_inode(inode);
> @@ -2440,7 +2441,7 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>  	inode = new_inode(sb);
>  
>  	if (!inode) {
> -		shmem_free_inode(sb);
> +		shmem_free_inode(sb, 0);
>  		return ERR_PTR(-ENOSPC);
>  	}
>  
> @@ -3281,7 +3282,7 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
>  	ret = simple_offset_add(shmem_get_offset_ctx(dir), dentry);
>  	if (ret) {
>  		if (inode->i_nlink)
> -			shmem_free_inode(inode->i_sb);
> +			shmem_free_inode(inode->i_sb, 0);
>  		goto out;
>  	}
>  
> @@ -3301,7 +3302,7 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
>  	struct inode *inode = d_inode(dentry);
>  
>  	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
> -		shmem_free_inode(inode->i_sb);
> +		shmem_free_inode(inode->i_sb, 0);
>  
>  	simple_offset_remove(shmem_get_offset_ctx(dir), dentry);
>  
> @@ -3554,21 +3555,40 @@ static int shmem_initxattrs(struct inode *inode,
>  			    void *fs_info)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	const struct xattr *xattr;
>  	struct simple_xattr *new_xattr;
> +	size_t ispace = 0;
>  	size_t len;
>  
> +	if (sbinfo->max_inodes) {
> +		for (xattr = xattr_array; xattr->name != NULL; xattr++) {
> +			ispace += simple_xattr_space(xattr->name,
> +				xattr->value_len + XATTR_SECURITY_PREFIX_LEN);
> +		}
> +		if (ispace) {
> +			raw_spin_lock(&sbinfo->stat_lock);
> +			if (sbinfo->free_ispace < ispace)
> +				ispace = 0;
> +			else
> +				sbinfo->free_ispace -= ispace;
> +			raw_spin_unlock(&sbinfo->stat_lock);
> +			if (!ispace)
> +				return -ENOSPC;
> +		}
> +	}
> +
>  	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
>  		new_xattr = simple_xattr_alloc(xattr->value, xattr->value_len);
>  		if (!new_xattr)
> -			return -ENOMEM;
> +			break;
>  
>  		len = strlen(xattr->name) + 1;
>  		new_xattr->name = kmalloc(XATTR_SECURITY_PREFIX_LEN + len,
>  					  GFP_KERNEL);
>  		if (!new_xattr->name) {
>  			kvfree(new_xattr);
> -			return -ENOMEM;
> +			break;
>  		}
>  
>  		memcpy(new_xattr->name, XATTR_SECURITY_PREFIX,
> @@ -3579,6 +3599,16 @@ static int shmem_initxattrs(struct inode *inode,
>  		simple_xattr_add(&info->xattrs, new_xattr);
>  	}
>  
> +	if (xattr->name != NULL) {
> +		if (ispace) {
> +			raw_spin_lock(&sbinfo->stat_lock);
> +			sbinfo->free_ispace += ispace;
> +			raw_spin_unlock(&sbinfo->stat_lock);
> +		}
> +		simple_xattrs_free(&info->xattrs, NULL);
> +		return -ENOMEM;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -3599,16 +3629,39 @@ static int shmem_xattr_handler_set(const struct xattr_handler *handler,
>  				   size_t size, int flags)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	struct simple_xattr *old_xattr;
> +	size_t ispace = 0;
>  
>  	name = xattr_full_name(handler, name);
> +	if (value && sbinfo->max_inodes) {
> +		ispace = simple_xattr_space(name, size);
> +		raw_spin_lock(&sbinfo->stat_lock);
> +		if (sbinfo->free_ispace < ispace)
> +			ispace = 0;
> +		else
> +			sbinfo->free_ispace -= ispace;
> +		raw_spin_unlock(&sbinfo->stat_lock);
> +		if (!ispace)
> +			return -ENOSPC;
> +	}
> +
>  	old_xattr = simple_xattr_set(&info->xattrs, name, value, size, flags);
>  	if (!IS_ERR(old_xattr)) {
> +		ispace = 0;
> +		if (old_xattr && sbinfo->max_inodes)
> +			ispace = simple_xattr_space(old_xattr->name,
> +						    old_xattr->size);
>  		simple_xattr_free(old_xattr);
>  		old_xattr = NULL;
>  		inode->i_ctime = current_time(inode);
>  		inode_inc_iversion(inode);
>  	}
> +	if (ispace) {
> +		raw_spin_lock(&sbinfo->stat_lock);
> +		sbinfo->free_ispace += ispace;
> +		raw_spin_unlock(&sbinfo->stat_lock);
> +	}
>  	return PTR_ERR(old_xattr);
>  }
>  
> @@ -3624,9 +3677,16 @@ static const struct xattr_handler shmem_trusted_xattr_handler = {
>  	.set = shmem_xattr_handler_set,
>  };
>  
> +static const struct xattr_handler shmem_user_xattr_handler = {
> +	.prefix = XATTR_USER_PREFIX,
> +	.get = shmem_xattr_handler_get,
> +	.set = shmem_xattr_handler_set,
> +};
> +
>  static const struct xattr_handler *shmem_xattr_handlers[] = {
>  	&shmem_security_xattr_handler,
>  	&shmem_trusted_xattr_handler,
> +	&shmem_user_xattr_handler,
>  	NULL
>  };
>  
> -- 
> 2.35.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO
  2023-08-09  4:34 ` [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO Hugh Dickins
@ 2023-08-09  9:54   ` Jan Kara
  2023-08-09 13:41   ` Christoph Hellwig
  2023-08-11  6:27   ` [PATCH vfs.tmpfs v2 " Hugh Dickins
  2 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2023-08-09  9:54 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Tue 08-08-23 21:34:54, Hugh Dickins wrote:
> Depending upon your philosophical viewpoint, either tmpfs always does
> direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to
> stand in for a more sophisticated filesystem, it can be helpful for tmpfs
> to support O_DIRECT.  So, give tmpfs a shmem_direct_IO() method, of the
> simplest kind: by just returning 0 done, it leaves all the work to the
> buffered fallback (and everything else just happens to work out okay -
> in particular, its dirty pages don't get lost to invalidation).
> 
> xfstests auto generic which were not run on tmpfs before but now pass:
> 036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263
> 323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729
> with no new failures.
> 
> LTP dio tests which were not run on tmpfs before but now pass:
> dio01 through dio30, except for dio04 and dio10, which fail because
> tmpfs dio read and write allow odd count: tmpfs could be made stricter,
> but would that be an improvement?
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Yeah, we are not quite consistent about whether it is better to silently
fallback to buffered IO or return error among filesystems. So I guess
whatever you like. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/shmem.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 7420b510a9f3..4d5599e566df 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2720,6 +2720,16 @@ shmem_write_end(struct file *file, struct address_space *mapping,
>  	return copied;
>  }
>  
> +static ssize_t shmem_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> +{
> +	/*
> +	 * Just leave all the work to the buffered fallback.
> +	 * Some LTP tests may expect us to enforce alignment restrictions,
> +	 * but the fallback works just fine with any alignment, so allow it.
> +	 */
> +	return 0;
> +}
> +
>  static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  {
>  	struct file *file = iocb->ki_filp;
> @@ -4421,6 +4431,7 @@ const struct address_space_operations shmem_aops = {
>  #ifdef CONFIG_TMPFS
>  	.write_begin	= shmem_write_begin,
>  	.write_end	= shmem_write_end,
> +	.direct_IO	= shmem_direct_IO,
>  #endif
>  #ifdef CONFIG_MIGRATION
>  	.migrate_folio	= migrate_folio,
> -- 
> 2.35.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO
  2023-08-09  6:45 ` [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Christian Brauner
@ 2023-08-09 11:33   ` Christian Brauner
  2023-08-10  5:50     ` Hugh Dickins
  0 siblings, 1 reply; 32+ messages in thread
From: Christian Brauner @ 2023-08-09 11:33 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Wed, Aug 09, 2023 at 08:45:57AM +0200, Christian Brauner wrote:
> On Tue, Aug 08, 2023 at 09:28:08PM -0700, Hugh Dickins wrote:
> > This series enables and limits user extended attributes on tmpfs,
> > and independently provides a trivial direct IO stub for tmpfs.
> > 
> > It is here based on the vfs.tmpfs branch in vfs.git in next-20230808
> > but with a cherry-pick of v6.5-rc4's commit
> > 253e5df8b8f0 ("tmpfs: fix Documentation of noswap and huge mount options")
> > first: since the vfs.tmpfs branch is based on v6.5-rc1, but 3/5 in this
> > series updates tmpfs.rst in a way which depends on that commit.
> > 
> > IIUC the right thing to do would be to cherry-pick 253e5df8b8f0 into
> > vfs.tmpfs before applying this series.  I'm sorry that the series as
> > posted does not apply cleanly to any known tree! but I think posting
> > it against v6.5-rc5 or next-20230808 would be even less helpful.
> 
> No worries, I'll sort that out.

So, I hemmed and hawed but decided to rebase vfs.tmpfs onto v6.5-rc4
which includes that fix as cherry picking is odd.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed
  2023-08-09  4:30 ` [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed Hugh Dickins
  2023-08-09  9:21   ` Jan Kara
@ 2023-08-09 11:37   ` Christian Brauner
  2023-08-09 13:19   ` Carlos Maiolino
  2 siblings, 0 replies; 32+ messages in thread
From: Christian Brauner @ 2023-08-09 11:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Tue, Aug 08, 2023 at 09:30:59PM -0700, Hugh Dickins wrote:
> tmpfs wants to support limited user extended attributes, but kernfs
> (or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR)
> already supports user extended attributes through simple xattrs: but
> limited by a policy (128KiB per inode) too liberal to be used on tmpfs.
> 
> To allow a different limiting policy for tmpfs, without affecting the
> policy for kernfs, change simple_xattr_set() to return the replaced or
> removed xattr (if any), leaving the caller to update their accounting
> then free the xattr (by simple_xattr_free(), renamed from the static
> free_simple_xattr()).
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---

Seems good enough,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed
  2023-08-09  4:30 ` [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed Hugh Dickins
  2023-08-09  9:21   ` Jan Kara
  2023-08-09 11:37   ` Christian Brauner
@ 2023-08-09 13:19   ` Carlos Maiolino
  2 siblings, 0 replies; 32+ messages in thread
From: Carlos Maiolino @ 2023-08-09 13:19 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Jeff Layton, Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu,
	Chris Down, Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox,
	Christoph Hellwig, Pete Zaitcev, Helge Deller, Topi Miettinen,
	Yu Kuai, linux-fsdevel, linux-mm

On Tue, Aug 08, 2023 at 09:30:59PM -0700, Hugh Dickins wrote:
> tmpfs wants to support limited user extended attributes, but kernfs
> (or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR)
> already supports user extended attributes through simple xattrs: but
> limited by a policy (128KiB per inode) too liberal to be used on tmpfs.
> 
> To allow a different limiting policy for tmpfs, without affecting the
> policy for kernfs, change simple_xattr_set() to return the replaced or
> removed xattr (if any), leaving the caller to update their accounting
> then free the xattr (by simple_xattr_free(), renamed from the static
> free_simple_xattr()).
> 

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  fs/kernfs/inode.c     | 46 +++++++++++++++++++++++++---------------
>  fs/xattr.c            | 51 +++++++++++++++++++--------------------------
>  include/linux/xattr.h |  7 ++++---
>  mm/shmem.c            | 10 +++++----
>  4 files changed, 61 insertions(+), 53 deletions(-)
> 
> diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> index b22b74d1a115..fec5d5f78f07 100644
> --- a/fs/kernfs/inode.c
> +++ b/fs/kernfs/inode.c
> @@ -306,11 +306,17 @@ int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
>  int kernfs_xattr_set(struct kernfs_node *kn, const char *name,
>  		     const void *value, size_t size, int flags)
>  {
> +	struct simple_xattr *old_xattr;
>  	struct kernfs_iattrs *attrs = kernfs_iattrs(kn);
>  	if (!attrs)
>  		return -ENOMEM;
> 
> -	return simple_xattr_set(&attrs->xattrs, name, value, size, flags, NULL);
> +	old_xattr = simple_xattr_set(&attrs->xattrs, name, value, size, flags);
> +	if (IS_ERR(old_xattr))
> +		return PTR_ERR(old_xattr);
> +
> +	simple_xattr_free(old_xattr);
> +	return 0;
>  }
> 
>  static int kernfs_vfs_xattr_get(const struct xattr_handler *handler,
> @@ -342,7 +348,7 @@ static int kernfs_vfs_user_xattr_add(struct kernfs_node *kn,
>  {
>  	atomic_t *sz = &kn->iattr->user_xattr_size;
>  	atomic_t *nr = &kn->iattr->nr_user_xattrs;
> -	ssize_t removed_size;
> +	struct simple_xattr *old_xattr;
>  	int ret;
> 
>  	if (atomic_inc_return(nr) > KERNFS_MAX_USER_XATTRS) {
> @@ -355,13 +361,18 @@ static int kernfs_vfs_user_xattr_add(struct kernfs_node *kn,
>  		goto dec_size_out;
>  	}
> 
> -	ret = simple_xattr_set(xattrs, full_name, value, size, flags,
> -			       &removed_size);
> -
> -	if (!ret && removed_size >= 0)
> -		size = removed_size;
> -	else if (!ret)
> +	old_xattr = simple_xattr_set(xattrs, full_name, value, size, flags);
> +	if (!old_xattr)
>  		return 0;
> +
> +	if (IS_ERR(old_xattr)) {
> +		ret = PTR_ERR(old_xattr);
> +		goto dec_size_out;
> +	}
> +
> +	ret = 0;
> +	size = old_xattr->size;
> +	simple_xattr_free(old_xattr);
>  dec_size_out:
>  	atomic_sub(size, sz);
>  dec_count_out:
> @@ -376,18 +387,19 @@ static int kernfs_vfs_user_xattr_rm(struct kernfs_node *kn,
>  {
>  	atomic_t *sz = &kn->iattr->user_xattr_size;
>  	atomic_t *nr = &kn->iattr->nr_user_xattrs;
> -	ssize_t removed_size;
> -	int ret;
> +	struct simple_xattr *old_xattr;
> 
> -	ret = simple_xattr_set(xattrs, full_name, value, size, flags,
> -			       &removed_size);
> +	old_xattr = simple_xattr_set(xattrs, full_name, value, size, flags);
> +	if (!old_xattr)
> +		return 0;
> 
> -	if (removed_size >= 0) {
> -		atomic_sub(removed_size, sz);
> -		atomic_dec(nr);
> -	}
> +	if (IS_ERR(old_xattr))
> +		return PTR_ERR(old_xattr);
> 
> -	return ret;
> +	atomic_sub(old_xattr->size, sz);
> +	atomic_dec(nr);
> +	simple_xattr_free(old_xattr);
> +	return 0;
>  }
> 
>  static int kernfs_vfs_user_xattr_set(const struct xattr_handler *handler,
> diff --git a/fs/xattr.c b/fs/xattr.c
> index e7bbb7f57557..ba37a8f5cfd1 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -1040,12 +1040,12 @@ const char *xattr_full_name(const struct xattr_handler *handler,
>  EXPORT_SYMBOL(xattr_full_name);
> 
>  /**
> - * free_simple_xattr - free an xattr object
> + * simple_xattr_free - free an xattr object
>   * @xattr: the xattr object
>   *
>   * Free the xattr object. Can handle @xattr being NULL.
>   */
> -static inline void free_simple_xattr(struct simple_xattr *xattr)
> +void simple_xattr_free(struct simple_xattr *xattr)
>  {
>  	if (xattr)
>  		kfree(xattr->name);
> @@ -1164,7 +1164,6 @@ int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
>   * @value: the value to store along the xattr
>   * @size: the size of @value
>   * @flags: the flags determining how to set the xattr
> - * @removed_size: the size of the removed xattr
>   *
>   * Set a new xattr object.
>   * If @value is passed a new xattr object will be allocated. If XATTR_REPLACE
> @@ -1181,29 +1180,27 @@ int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
>   * nothing if XATTR_CREATE is specified in @flags or @flags is zero. For
>   * XATTR_REPLACE we fail as mentioned above.
>   *
> - * Return: On success zero and on error a negative error code is returned.
> + * Return: On success, the removed or replaced xattr is returned, to be freed
> + * by the caller; or NULL if none. On failure a negative error code is returned.
>   */
> -int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
> -		     const void *value, size_t size, int flags,
> -		     ssize_t *removed_size)
> +struct simple_xattr *simple_xattr_set(struct simple_xattrs *xattrs,
> +				      const char *name, const void *value,
> +				      size_t size, int flags)
>  {
> -	struct simple_xattr *xattr = NULL, *new_xattr = NULL;
> +	struct simple_xattr *old_xattr = NULL, *new_xattr = NULL;
>  	struct rb_node *parent = NULL, **rbp;
>  	int err = 0, ret;
> 
> -	if (removed_size)
> -		*removed_size = -1;
> -
>  	/* value == NULL means remove */
>  	if (value) {
>  		new_xattr = simple_xattr_alloc(value, size);
>  		if (!new_xattr)
> -			return -ENOMEM;
> +			return ERR_PTR(-ENOMEM);
> 
>  		new_xattr->name = kstrdup(name, GFP_KERNEL);
>  		if (!new_xattr->name) {
> -			free_simple_xattr(new_xattr);
> -			return -ENOMEM;
> +			simple_xattr_free(new_xattr);
> +			return ERR_PTR(-ENOMEM);
>  		}
>  	}
> 
> @@ -1217,12 +1214,12 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
>  		else if (ret > 0)
>  			rbp = &(*rbp)->rb_right;
>  		else
> -			xattr = rb_entry(*rbp, struct simple_xattr, rb_node);
> -		if (xattr)
> +			old_xattr = rb_entry(*rbp, struct simple_xattr, rb_node);
> +		if (old_xattr)
>  			break;
>  	}
> 
> -	if (xattr) {
> +	if (old_xattr) {
>  		/* Fail if XATTR_CREATE is requested and the xattr exists. */
>  		if (flags & XATTR_CREATE) {
>  			err = -EEXIST;
> @@ -1230,12 +1227,10 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
>  		}
> 
>  		if (new_xattr)
> -			rb_replace_node(&xattr->rb_node, &new_xattr->rb_node,
> -					&xattrs->rb_root);
> +			rb_replace_node(&old_xattr->rb_node,
> +					&new_xattr->rb_node, &xattrs->rb_root);
>  		else
> -			rb_erase(&xattr->rb_node, &xattrs->rb_root);
> -		if (!err && removed_size)
> -			*removed_size = xattr->size;
> +			rb_erase(&old_xattr->rb_node, &xattrs->rb_root);
>  	} else {
>  		/* Fail if XATTR_REPLACE is requested but no xattr is found. */
>  		if (flags & XATTR_REPLACE) {
> @@ -1260,12 +1255,10 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
> 
>  out_unlock:
>  	write_unlock(&xattrs->lock);
> -	if (err)
> -		free_simple_xattr(new_xattr);
> -	else
> -		free_simple_xattr(xattr);
> -	return err;
> -
> +	if (!err)
> +		return old_xattr;
> +	simple_xattr_free(new_xattr);
> +	return ERR_PTR(err);
>  }
> 
>  static bool xattr_is_trusted(const char *name)
> @@ -1386,7 +1379,7 @@ void simple_xattrs_free(struct simple_xattrs *xattrs)
>  		rbp_next = rb_next(rbp);
>  		xattr = rb_entry(rbp, struct simple_xattr, rb_node);
>  		rb_erase(&xattr->rb_node, &xattrs->rb_root);
> -		free_simple_xattr(xattr);
> +		simple_xattr_free(xattr);
>  		rbp = rbp_next;
>  	}
>  }
> diff --git a/include/linux/xattr.h b/include/linux/xattr.h
> index d591ef59aa98..e37fe667ae04 100644
> --- a/include/linux/xattr.h
> +++ b/include/linux/xattr.h
> @@ -116,11 +116,12 @@ struct simple_xattr {
>  void simple_xattrs_init(struct simple_xattrs *xattrs);
>  void simple_xattrs_free(struct simple_xattrs *xattrs);
>  struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
> +void simple_xattr_free(struct simple_xattr *xattr);
>  int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
>  		     void *buffer, size_t size);
> -int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
> -		     const void *value, size_t size, int flags,
> -		     ssize_t *removed_size);
> +struct simple_xattr *simple_xattr_set(struct simple_xattrs *xattrs,
> +				      const char *name, const void *value,
> +				      size_t size, int flags);
>  ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs,
>  			  char *buffer, size_t size);
>  void simple_xattr_add(struct simple_xattrs *xattrs,
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 0f83d86fd8b4..df3cabf54206 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3595,15 +3595,17 @@ static int shmem_xattr_handler_set(const struct xattr_handler *handler,
>  				   size_t size, int flags)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> -	int err;
> +	struct simple_xattr *old_xattr;
> 
>  	name = xattr_full_name(handler, name);
> -	err = simple_xattr_set(&info->xattrs, name, value, size, flags, NULL);
> -	if (!err) {
> +	old_xattr = simple_xattr_set(&info->xattrs, name, value, size, flags);
> +	if (!IS_ERR(old_xattr)) {
> +		simple_xattr_free(old_xattr);
> +		old_xattr = NULL;
>  		inode->i_ctime = current_time(inode);
>  		inode_inc_iversion(inode);
>  	}
> -	return err;
> +	return PTR_ERR(old_xattr);
>  }
> 
>  static const struct xattr_handler shmem_security_xattr_handler = {
> --
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 2/5] tmpfs: track free_ispace instead of free_inodes
  2023-08-09  4:32 ` [PATCH vfs.tmpfs 2/5] tmpfs: track free_ispace instead of free_inodes Hugh Dickins
  2023-08-09  9:33   ` Jan Kara
@ 2023-08-09 13:29   ` Carlos Maiolino
  1 sibling, 0 replies; 32+ messages in thread
From: Carlos Maiolino @ 2023-08-09 13:29 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Jeff Layton, Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu,
	Chris Down, Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox,
	Christoph Hellwig, Pete Zaitcev, Helge Deller, Topi Miettinen,
	Yu Kuai, linux-fsdevel, linux-mm

On Tue, Aug 08, 2023 at 09:32:21PM -0700, Hugh Dickins wrote:
> In preparation for assigning some inode space to extended attributes,
> keep track of free_ispace instead of number of free_inodes: as if one
> tmpfs inode (and accompanying dentry) occupies very approximately 1KiB.
> 
> Unsigned long is large enough for free_ispace, on 64-bit and on 32-bit:
> but take care to enforce the maximum.  And fix the nr_blocks maximum on
> 32-bit: S64_MAX would be too big for it there, so say LONG_MAX instead.
> 
> Delete the incorrect limited<->unlimited blocks/inodes comment above
> shmem_reconfigure(): leave it to the error messages below to describe.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Looks fine, feel free to add:
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

> ---
>  include/linux/shmem_fs.h |  2 +-
>  mm/shmem.c               | 33 +++++++++++++++++----------------
>  2 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 9b2d2faff1d0..6b0c626620f5 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -54,7 +54,7 @@ struct shmem_sb_info {
>  	unsigned long max_blocks;   /* How many blocks are allowed */
>  	struct percpu_counter used_blocks;  /* How many are allocated */
>  	unsigned long max_inodes;   /* How many inodes are allowed */
> -	unsigned long free_inodes;  /* How many are left for allocation */
> +	unsigned long free_ispace;  /* How much ispace left for allocation */
>  	raw_spinlock_t stat_lock;   /* Serialize shmem_sb_info changes */
>  	umode_t mode;		    /* Mount mode for root directory */
>  	unsigned char huge;	    /* Whether to try for hugepages */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index df3cabf54206..c39471384168 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -90,6 +90,9 @@ static struct vfsmount *shm_mnt;
>  /* Pretend that each entry is of this size in directory's i_size */
>  #define BOGO_DIRENT_SIZE 20
> 
> +/* Pretend that one inode + its dentry occupy this much memory */
> +#define BOGO_INODE_SIZE 1024
> +
>  /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
>  #define SHORT_SYMLINK_LEN 128
> 
> @@ -137,7 +140,8 @@ static unsigned long shmem_default_max_inodes(void)
>  {
>  	unsigned long nr_pages = totalram_pages();
> 
> -	return min(nr_pages - totalhigh_pages(), nr_pages / 2);
> +	return min3(nr_pages - totalhigh_pages(), nr_pages / 2,
> +			ULONG_MAX / BOGO_INODE_SIZE);
>  }
>  #endif
> 
> @@ -331,11 +335,11 @@ static int shmem_reserve_inode(struct super_block *sb, ino_t *inop)
>  	if (!(sb->s_flags & SB_KERNMOUNT)) {
>  		raw_spin_lock(&sbinfo->stat_lock);
>  		if (sbinfo->max_inodes) {
> -			if (!sbinfo->free_inodes) {
> +			if (sbinfo->free_ispace < BOGO_INODE_SIZE) {
>  				raw_spin_unlock(&sbinfo->stat_lock);
>  				return -ENOSPC;
>  			}
> -			sbinfo->free_inodes--;
> +			sbinfo->free_ispace -= BOGO_INODE_SIZE;
>  		}
>  		if (inop) {
>  			ino = sbinfo->next_ino++;
> @@ -394,7 +398,7 @@ static void shmem_free_inode(struct super_block *sb)
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
>  	if (sbinfo->max_inodes) {
>  		raw_spin_lock(&sbinfo->stat_lock);
> -		sbinfo->free_inodes++;
> +		sbinfo->free_ispace += BOGO_INODE_SIZE;
>  		raw_spin_unlock(&sbinfo->stat_lock);
>  	}
>  }
> @@ -3155,7 +3159,7 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	}
>  	if (sbinfo->max_inodes) {
>  		buf->f_files = sbinfo->max_inodes;
> -		buf->f_ffree = sbinfo->free_inodes;
> +		buf->f_ffree = sbinfo->free_ispace / BOGO_INODE_SIZE;
>  	}
>  	/* else leave those fields 0 like simple_statfs */
> 
> @@ -3815,13 +3819,13 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
>  		break;
>  	case Opt_nr_blocks:
>  		ctx->blocks = memparse(param->string, &rest);
> -		if (*rest || ctx->blocks > S64_MAX)
> +		if (*rest || ctx->blocks > LONG_MAX)
>  			goto bad_value;
>  		ctx->seen |= SHMEM_SEEN_BLOCKS;
>  		break;
>  	case Opt_nr_inodes:
>  		ctx->inodes = memparse(param->string, &rest);
> -		if (*rest)
> +		if (*rest || ctx->inodes > ULONG_MAX / BOGO_INODE_SIZE)
>  			goto bad_value;
>  		ctx->seen |= SHMEM_SEEN_INODES;
>  		break;
> @@ -4002,21 +4006,17 @@ static int shmem_parse_options(struct fs_context *fc, void *data)
> 
>  /*
>   * Reconfigure a shmem filesystem.
> - *
> - * Note that we disallow change from limited->unlimited blocks/inodes while any
> - * are in use; but we must separately disallow unlimited->limited, because in
> - * that case we have no record of how much is already in use.
>   */
>  static int shmem_reconfigure(struct fs_context *fc)
>  {
>  	struct shmem_options *ctx = fc->fs_private;
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(fc->root->d_sb);
> -	unsigned long inodes;
> +	unsigned long used_isp;
>  	struct mempolicy *mpol = NULL;
>  	const char *err;
> 
>  	raw_spin_lock(&sbinfo->stat_lock);
> -	inodes = sbinfo->max_inodes - sbinfo->free_inodes;
> +	used_isp = sbinfo->max_inodes * BOGO_INODE_SIZE - sbinfo->free_ispace;
> 
>  	if ((ctx->seen & SHMEM_SEEN_BLOCKS) && ctx->blocks) {
>  		if (!sbinfo->max_blocks) {
> @@ -4034,7 +4034,7 @@ static int shmem_reconfigure(struct fs_context *fc)
>  			err = "Cannot retroactively limit inodes";
>  			goto out;
>  		}
> -		if (ctx->inodes < inodes) {
> +		if (ctx->inodes * BOGO_INODE_SIZE < used_isp) {
>  			err = "Too few inodes for current use";
>  			goto out;
>  		}
> @@ -4080,7 +4080,7 @@ static int shmem_reconfigure(struct fs_context *fc)
>  		sbinfo->max_blocks  = ctx->blocks;
>  	if (ctx->seen & SHMEM_SEEN_INODES) {
>  		sbinfo->max_inodes  = ctx->inodes;
> -		sbinfo->free_inodes = ctx->inodes - inodes;
> +		sbinfo->free_ispace = ctx->inodes * BOGO_INODE_SIZE - used_isp;
>  	}
> 
>  	/*
> @@ -4211,7 +4211,8 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
>  	sb->s_flags |= SB_NOUSER;
>  #endif
>  	sbinfo->max_blocks = ctx->blocks;
> -	sbinfo->free_inodes = sbinfo->max_inodes = ctx->inodes;
> +	sbinfo->max_inodes = ctx->inodes;
> +	sbinfo->free_ispace = sbinfo->max_inodes * BOGO_INODE_SIZE;
>  	if (sb->s_flags & SB_KERNMOUNT) {
>  		sbinfo->ino_batch = alloc_percpu(ino_t);
>  		if (!sbinfo->ino_batch)
> --
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO
  2023-08-09  4:34 ` [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO Hugh Dickins
  2023-08-09  9:54   ` Jan Kara
@ 2023-08-09 13:41   ` Christoph Hellwig
  2023-08-10 23:41     ` Darrick J. Wong
  2023-08-11  6:08     ` Hugh Dickins
  2023-08-11  6:27   ` [PATCH vfs.tmpfs v2 " Hugh Dickins
  2 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2023-08-09 13:41 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

Please do not add a new ->direct_IO method.  I'm currently working hard
on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in
your read_iter/write_iter methods.

But if we just start claiming direct I/O support for file systems that
don't actually support it, I'm starting to seriously wonder why we
bother with the flag at all and don't just allow O_DIRECT opens
to always succeed..


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 3/5] tmpfs,xattr: enable limited user extended attributes
  2023-08-09  4:33 ` [PATCH vfs.tmpfs 3/5] tmpfs,xattr: enable limited user extended attributes Hugh Dickins
  2023-08-09  9:50   ` Jan Kara
@ 2023-08-09 13:52   ` Carlos Maiolino
  1 sibling, 0 replies; 32+ messages in thread
From: Carlos Maiolino @ 2023-08-09 13:52 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Jeff Layton, Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu,
	Chris Down, Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox,
	Christoph Hellwig, Pete Zaitcev, Helge Deller, Topi Miettinen,
	Yu Kuai, linux-fsdevel, linux-mm

On Tue, Aug 08, 2023 at 09:33:56PM -0700, Hugh Dickins wrote:
> Enable "user." extended attributes on tmpfs, limiting them by tracking
> the space they occupy, and deducting that space from the limited ispace
> (unless tmpfs mounted with nr_inodes=0 to leave that ispace unlimited).
> 
> tmpfs inodes and simple xattrs are both unswappable, and have to be in
> lowmem on a 32-bit highmem kernel: so the ispace limit is appropriate
> for xattrs, without any need for a further mount option.
> 
> Add simple_xattr_space() to give approximate but deterministic estimate
> of the space taken up by each xattr: with simple_xattrs_free() outputting
> the space freed if required (but kernfs and even some tmpfs usages do not
> require that, so don't waste time on strlen'ing if not needed).
> 
> Security and trusted xattrs were already supported: for consistency and
> simplicity, account them from the same pool; though there's a small risk
> that a tmpfs with enough space before would now be considered too small.
> 
> When extended attributes are used, "df -i" does show more IUsed and less
> IFree than can be explained by the inodes: document that (manpage later).
> 
> xfstests tests/generic which were not run on tmpfs before but now pass:
> 020 037 062 070 077 097 103 117 337 377 454 486 523 533 611 618 728
> with no new failures.
> 

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  Documentation/filesystems/tmpfs.rst |  7 ++-
>  fs/Kconfig                          |  4 +-
>  fs/kernfs/dir.c                     |  2 +-
>  fs/xattr.c                          | 28 ++++++++++-
>  include/linux/xattr.h               |  3 +-
>  mm/shmem.c                          | 78 +++++++++++++++++++++++++++----
>  6 files changed, 106 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 67422ee10e03..56a26c843dbe 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -21,8 +21,8 @@ explained further below, some of which can be reconfigured dynamically on the
>  fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
>  filesystem can be resized but it cannot be resized to a size below its current
>  usage. tmpfs also supports POSIX ACLs, and extended attributes for the
> -trusted.* and security.* namespaces. ramfs does not use swap and you cannot
> -modify any parameter for a ramfs filesystem. The size limit of a ramfs
> +trusted.*, security.* and user.* namespaces. ramfs does not use swap and you
> +cannot modify any parameter for a ramfs filesystem. The size limit of a ramfs
>  filesystem is how much memory you have available, and so care must be taken if
>  used so to not run out of memory.
> 
> @@ -97,6 +97,9 @@ mount with such options, since it allows any user with write access to
>  use up all the memory on the machine; but enhances the scalability of
>  that instance in a system with many CPUs making intensive use of it.
> 
> +If nr_inodes is not 0, that limited space for inodes is also used up by
> +extended attributes: "df -i"'s IUsed and IUse% increase, IFree decreases.
> +
>  tmpfs blocks may be swapped out, when there is a shortage of memory.
>  tmpfs has a mount option to disable its use of swap:
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 8218a71933f9..7da21f563192 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -205,8 +205,8 @@ config TMPFS_XATTR
>  	  Extended attributes are name:value pairs associated with inodes by
>  	  the kernel or by users (see the attr(5) manual page for details).
> 
> -	  Currently this enables support for the trusted.* and
> -	  security.* namespaces.
> +	  This enables support for the trusted.*, security.* and user.*
> +	  namespaces.
> 
>  	  You need this for POSIX ACL support on tmpfs.
> 
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 5a1a4af9d3d2..660995856a04 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -556,7 +556,7 @@ void kernfs_put(struct kernfs_node *kn)
>  	kfree_const(kn->name);
> 
>  	if (kn->iattr) {
> -		simple_xattrs_free(&kn->iattr->xattrs);
> +		simple_xattrs_free(&kn->iattr->xattrs, NULL);
>  		kmem_cache_free(kernfs_iattrs_cache, kn->iattr);
>  	}
>  	spin_lock(&kernfs_idr_lock);
> diff --git a/fs/xattr.c b/fs/xattr.c
> index ba37a8f5cfd1..2d607542281b 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -1039,6 +1039,26 @@ const char *xattr_full_name(const struct xattr_handler *handler,
>  }
>  EXPORT_SYMBOL(xattr_full_name);
> 
> +/**
> + * simple_xattr_space - estimate the memory used by a simple xattr
> + * @name: the full name of the xattr
> + * @size: the size of its value
> + *
> + * This takes no account of how much larger the two slab objects actually are:
> + * that would depend on the slab implementation, when what is required is a
> + * deterministic number, which grows with name length and size and quantity.
> + *
> + * Return: The approximate number of bytes of memory used by such an xattr.
> + */
> +size_t simple_xattr_space(const char *name, size_t size)
> +{
> +	/*
> +	 * Use "40" instead of sizeof(struct simple_xattr), to return the
> +	 * same result on 32-bit and 64-bit, and even if simple_xattr grows.
> +	 */
> +	return 40 + size + strlen(name);
> +}
> +
>  /**
>   * simple_xattr_free - free an xattr object
>   * @xattr: the xattr object
> @@ -1363,14 +1383,17 @@ void simple_xattrs_init(struct simple_xattrs *xattrs)
>  /**
>   * simple_xattrs_free - free xattrs
>   * @xattrs: xattr header whose xattrs to destroy
> + * @freed_space: approximate number of bytes of memory freed from @xattrs
>   *
>   * Destroy all xattrs in @xattr. When this is called no one can hold a
>   * reference to any of the xattrs anymore.
>   */
> -void simple_xattrs_free(struct simple_xattrs *xattrs)
> +void simple_xattrs_free(struct simple_xattrs *xattrs, size_t *freed_space)
>  {
>  	struct rb_node *rbp;
> 
> +	if (freed_space)
> +		*freed_space = 0;
>  	rbp = rb_first(&xattrs->rb_root);
>  	while (rbp) {
>  		struct simple_xattr *xattr;
> @@ -1379,6 +1402,9 @@ void simple_xattrs_free(struct simple_xattrs *xattrs)
>  		rbp_next = rb_next(rbp);
>  		xattr = rb_entry(rbp, struct simple_xattr, rb_node);
>  		rb_erase(&xattr->rb_node, &xattrs->rb_root);
> +		if (freed_space)
> +			*freed_space += simple_xattr_space(xattr->name,
> +							   xattr->size);
>  		simple_xattr_free(xattr);
>  		rbp = rbp_next;
>  	}
> diff --git a/include/linux/xattr.h b/include/linux/xattr.h
> index e37fe667ae04..d20051865800 100644
> --- a/include/linux/xattr.h
> +++ b/include/linux/xattr.h
> @@ -114,7 +114,8 @@ struct simple_xattr {
>  };
> 
>  void simple_xattrs_init(struct simple_xattrs *xattrs);
> -void simple_xattrs_free(struct simple_xattrs *xattrs);
> +void simple_xattrs_free(struct simple_xattrs *xattrs, size_t *freed_space);
> +size_t simple_xattr_space(const char *name, size_t size);
>  struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
>  void simple_xattr_free(struct simple_xattr *xattr);
>  int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
> diff --git a/mm/shmem.c b/mm/shmem.c
> index c39471384168..7420b510a9f3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -393,12 +393,12 @@ static int shmem_reserve_inode(struct super_block *sb, ino_t *inop)
>  	return 0;
>  }
> 
> -static void shmem_free_inode(struct super_block *sb)
> +static void shmem_free_inode(struct super_block *sb, size_t freed_ispace)
>  {
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
>  	if (sbinfo->max_inodes) {
>  		raw_spin_lock(&sbinfo->stat_lock);
> -		sbinfo->free_ispace += BOGO_INODE_SIZE;
> +		sbinfo->free_ispace += BOGO_INODE_SIZE + freed_ispace;
>  		raw_spin_unlock(&sbinfo->stat_lock);
>  	}
>  }
> @@ -1232,6 +1232,7 @@ static void shmem_evict_inode(struct inode *inode)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> +	size_t freed;
> 
>  	if (shmem_mapping(inode->i_mapping)) {
>  		shmem_unacct_size(info->flags, inode->i_size);
> @@ -1258,9 +1259,9 @@ static void shmem_evict_inode(struct inode *inode)
>  		}
>  	}
> 
> -	simple_xattrs_free(&info->xattrs);
> +	simple_xattrs_free(&info->xattrs, sbinfo->max_inodes ? &freed : NULL);
> +	shmem_free_inode(inode->i_sb, freed);
>  	WARN_ON(inode->i_blocks);
> -	shmem_free_inode(inode->i_sb);
>  	clear_inode(inode);
>  #ifdef CONFIG_TMPFS_QUOTA
>  	dquot_free_inode(inode);
> @@ -2440,7 +2441,7 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>  	inode = new_inode(sb);
> 
>  	if (!inode) {
> -		shmem_free_inode(sb);
> +		shmem_free_inode(sb, 0);
>  		return ERR_PTR(-ENOSPC);
>  	}
> 
> @@ -3281,7 +3282,7 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
>  	ret = simple_offset_add(shmem_get_offset_ctx(dir), dentry);
>  	if (ret) {
>  		if (inode->i_nlink)
> -			shmem_free_inode(inode->i_sb);
> +			shmem_free_inode(inode->i_sb, 0);
>  		goto out;
>  	}
> 
> @@ -3301,7 +3302,7 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
>  	struct inode *inode = d_inode(dentry);
> 
>  	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
> -		shmem_free_inode(inode->i_sb);
> +		shmem_free_inode(inode->i_sb, 0);
> 
>  	simple_offset_remove(shmem_get_offset_ctx(dir), dentry);
> 
> @@ -3554,21 +3555,40 @@ static int shmem_initxattrs(struct inode *inode,
>  			    void *fs_info)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	const struct xattr *xattr;
>  	struct simple_xattr *new_xattr;
> +	size_t ispace = 0;
>  	size_t len;
> 
> +	if (sbinfo->max_inodes) {
> +		for (xattr = xattr_array; xattr->name != NULL; xattr++) {
> +			ispace += simple_xattr_space(xattr->name,
> +				xattr->value_len + XATTR_SECURITY_PREFIX_LEN);
> +		}
> +		if (ispace) {
> +			raw_spin_lock(&sbinfo->stat_lock);
> +			if (sbinfo->free_ispace < ispace)
> +				ispace = 0;
> +			else
> +				sbinfo->free_ispace -= ispace;
> +			raw_spin_unlock(&sbinfo->stat_lock);
> +			if (!ispace)
> +				return -ENOSPC;
> +		}
> +	}
> +
>  	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
>  		new_xattr = simple_xattr_alloc(xattr->value, xattr->value_len);
>  		if (!new_xattr)
> -			return -ENOMEM;
> +			break;
> 
>  		len = strlen(xattr->name) + 1;
>  		new_xattr->name = kmalloc(XATTR_SECURITY_PREFIX_LEN + len,
>  					  GFP_KERNEL);
>  		if (!new_xattr->name) {
>  			kvfree(new_xattr);
> -			return -ENOMEM;
> +			break;
>  		}
> 
>  		memcpy(new_xattr->name, XATTR_SECURITY_PREFIX,
> @@ -3579,6 +3599,16 @@ static int shmem_initxattrs(struct inode *inode,
>  		simple_xattr_add(&info->xattrs, new_xattr);
>  	}
> 
> +	if (xattr->name != NULL) {
> +		if (ispace) {
> +			raw_spin_lock(&sbinfo->stat_lock);
> +			sbinfo->free_ispace += ispace;
> +			raw_spin_unlock(&sbinfo->stat_lock);
> +		}
> +		simple_xattrs_free(&info->xattrs, NULL);
> +		return -ENOMEM;
> +	}
> +
>  	return 0;
>  }
> 
> @@ -3599,16 +3629,39 @@ static int shmem_xattr_handler_set(const struct xattr_handler *handler,
>  				   size_t size, int flags)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	struct simple_xattr *old_xattr;
> +	size_t ispace = 0;
> 
>  	name = xattr_full_name(handler, name);
> +	if (value && sbinfo->max_inodes) {
> +		ispace = simple_xattr_space(name, size);
> +		raw_spin_lock(&sbinfo->stat_lock);
> +		if (sbinfo->free_ispace < ispace)
> +			ispace = 0;
> +		else
> +			sbinfo->free_ispace -= ispace;
> +		raw_spin_unlock(&sbinfo->stat_lock);
> +		if (!ispace)
> +			return -ENOSPC;
> +	}
> +
>  	old_xattr = simple_xattr_set(&info->xattrs, name, value, size, flags);
>  	if (!IS_ERR(old_xattr)) {
> +		ispace = 0;
> +		if (old_xattr && sbinfo->max_inodes)
> +			ispace = simple_xattr_space(old_xattr->name,
> +						    old_xattr->size);
>  		simple_xattr_free(old_xattr);
>  		old_xattr = NULL;
>  		inode->i_ctime = current_time(inode);
>  		inode_inc_iversion(inode);
>  	}
> +	if (ispace) {
> +		raw_spin_lock(&sbinfo->stat_lock);
> +		sbinfo->free_ispace += ispace;
> +		raw_spin_unlock(&sbinfo->stat_lock);
> +	}
>  	return PTR_ERR(old_xattr);
>  }
> 
> @@ -3624,9 +3677,16 @@ static const struct xattr_handler shmem_trusted_xattr_handler = {
>  	.set = shmem_xattr_handler_set,
>  };
> 
> +static const struct xattr_handler shmem_user_xattr_handler = {
> +	.prefix = XATTR_USER_PREFIX,
> +	.get = shmem_xattr_handler_get,
> +	.set = shmem_xattr_handler_set,
> +};
> +
>  static const struct xattr_handler *shmem_xattr_handlers[] = {
>  	&shmem_security_xattr_handler,
>  	&shmem_trusted_xattr_handler,
> +	&shmem_user_xattr_handler,
>  	NULL
>  };
> 
> --
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO
  2023-08-09 11:33   ` Christian Brauner
@ 2023-08-10  5:50     ` Hugh Dickins
  2023-08-10 10:07       ` Christian Brauner
  2023-08-10 23:23       ` [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Pete Zaitcev
  0 siblings, 2 replies; 32+ messages in thread
From: Hugh Dickins @ 2023-08-10  5:50 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Hugh Dickins, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Wed, 9 Aug 2023, Christian Brauner wrote:
> On Wed, Aug 09, 2023 at 08:45:57AM +0200, Christian Brauner wrote:
> > On Tue, Aug 08, 2023 at 09:28:08PM -0700, Hugh Dickins wrote:
> > > This series enables and limits user extended attributes on tmpfs,
> > > and independently provides a trivial direct IO stub for tmpfs.
> > > 
> > > It is here based on the vfs.tmpfs branch in vfs.git in next-20230808
> > > but with a cherry-pick of v6.5-rc4's commit
> > > 253e5df8b8f0 ("tmpfs: fix Documentation of noswap and huge mount options")
> > > first: since the vfs.tmpfs branch is based on v6.5-rc1, but 3/5 in this
> > > series updates tmpfs.rst in a way which depends on that commit.
> > > 
> > > IIUC the right thing to do would be to cherry-pick 253e5df8b8f0 into
> > > vfs.tmpfs before applying this series.  I'm sorry that the series as
> > > posted does not apply cleanly to any known tree! but I think posting
> > > it against v6.5-rc5 or next-20230808 would be even less helpful.
> > 
> > No worries, I'll sort that out.
> 
> So, I hemmed and hawed but decided to rebase vfs.tmpfs onto v6.5-rc4
> which includes that fix as cherry picking is odd.

Even better, thanks.

And big thank you to you and Jan and Carlos for the very quick and
welcoming reviews.  If only Hugh were able to respond like that...

Needing "freed = 0" in shmem_evict_inode(), as reported by robot:
that was stupid of me (though it happens not to matter what the value
is in the uninitialized case): I'll send you the fixup to 3/5 tomorrow
(unless it turns out that you've typed in the " = 0" yourself already).

And I'll send a replacement for 4/5, the direct IO one, following
Christoph's guidance: but I'm wilting, and just didn't get to it today.

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO
  2023-08-10  5:50     ` Hugh Dickins
@ 2023-08-10 10:07       ` Christian Brauner
  2023-08-21 17:39         ` [PATCH vfs.tmpfs] tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs Hugh Dickins
  2023-08-10 23:23       ` [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Pete Zaitcev
  1 sibling, 1 reply; 32+ messages in thread
From: Christian Brauner @ 2023-08-10 10:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	linux-fsdevel, linux-mm

On Wed, Aug 09, 2023 at 10:50:39PM -0700, Hugh Dickins wrote:
> On Wed, 9 Aug 2023, Christian Brauner wrote:
> > On Wed, Aug 09, 2023 at 08:45:57AM +0200, Christian Brauner wrote:
> > > On Tue, Aug 08, 2023 at 09:28:08PM -0700, Hugh Dickins wrote:
> > > > This series enables and limits user extended attributes on tmpfs,
> > > > and independently provides a trivial direct IO stub for tmpfs.
> > > > 
> > > > It is here based on the vfs.tmpfs branch in vfs.git in next-20230808
> > > > but with a cherry-pick of v6.5-rc4's commit
> > > > 253e5df8b8f0 ("tmpfs: fix Documentation of noswap and huge mount options")
> > > > first: since the vfs.tmpfs branch is based on v6.5-rc1, but 3/5 in this
> > > > series updates tmpfs.rst in a way which depends on that commit.
> > > > 
> > > > IIUC the right thing to do would be to cherry-pick 253e5df8b8f0 into
> > > > vfs.tmpfs before applying this series.  I'm sorry that the series as
> > > > posted does not apply cleanly to any known tree! but I think posting
> > > > it against v6.5-rc5 or next-20230808 would be even less helpful.
> > > 
> > > No worries, I'll sort that out.
> > 
> > So, I hemmed and hawed but decided to rebase vfs.tmpfs onto v6.5-rc4
> > which includes that fix as cherry picking is odd.
> 
> Even better, thanks.
> 
> And big thank you to you and Jan and Carlos for the very quick and
> welcoming reviews.

Happy to.

> Needing "freed = 0" in shmem_evict_inode(), as reported by robot:

Fixed that.

> And I'll send a replacement for 4/5, the direct IO one, following

Ah great, thanks!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO
  2023-08-10  5:50     ` Hugh Dickins
  2023-08-10 10:07       ` Christian Brauner
@ 2023-08-10 23:23       ` Pete Zaitcev
  1 sibling, 0 replies; 32+ messages in thread
From: Pete Zaitcev @ 2023-08-10 23:23 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Helge Deller, Topi Miettinen, Yu Kuai, linux-fsdevel, linux-mm

On Wed, 9 Aug 2023 22:50:39 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:

> And big thank you to you and Jan and Carlos for the very quick and
> welcoming reviews.  If only Hugh were able to respond like that...

No negative self-talk. My own naive submission was sitting in my
mailbox since February 2020, so at least you're doing it! Seriously,
thanks a lot for finding the cycles.

-- Pete


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO
  2023-08-09 13:41   ` Christoph Hellwig
@ 2023-08-10 23:41     ` Darrick J. Wong
  2023-08-11  6:16       ` Hugh Dickins
  2023-08-11  6:08     ` Hugh Dickins
  1 sibling, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2023-08-10 23:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Hugh Dickins, Christian Brauner, Andrew Morton,
	Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton, Chuck Lever,
	Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Pete Zaitcev, Helge Deller,
	Topi Miettinen, Yu Kuai, linux-fsdevel, linux-mm

On Wed, Aug 09, 2023 at 06:41:17AM -0700, Christoph Hellwig wrote:
> Please do not add a new ->direct_IO method.  I'm currently working hard
> on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in
> your read_iter/write_iter methods.
> 
> But if we just start claiming direct I/O support for file systems that
> don't actually support it, I'm starting to seriously wonder why we
> bother with the flag at all and don't just allow O_DIRECT opens
> to always succeed..

I see it differently -- you can do byte-aligned directio to S_DAX files
on persistent memory, so I don't see why you can't do that for tmpfs
files too.

(I'm not advocating for letting *disk* based filesystems allow O_DIRECT
even if read and writes are always going to go through the page cache
and get flushed to disk.  If programs wanted that, they'd use O_SYNC.)

/mnt is a pmem filesystem, /mnt/on/file has S_DAX set, and /mnt/off/file
does not:

# xfs_io -c statx /mnt/{on,off}/file
fd.path = "/mnt/on/file"
fd.flags = non-sync,non-direct,read-write
stat.ino = 132
stat.type = regular file
stat.size = 1048576
stat.blocks = 2048
fsxattr.xflags = 0x8002 [-p------------x--]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.cowextsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
fd.path = "/mnt/off/file"
fd.flags = non-sync,non-direct,read-write
stat.ino = 8388737
stat.type = regular file
stat.size = 1048576
stat.blocks = 2048
fsxattr.xflags = 0x2 [-p---------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.cowextsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136

And now we try a byte-aligned direct write:

# xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/off/file
pwrite: Invalid argument
# xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/on/file
wrote 1/1 bytes at offset 47
1.000000 bytes, 1 ops; 0.0001 sec (5.194 KiB/sec and 5319.1489 ops/sec)

--D

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO
  2023-08-09 13:41   ` Christoph Hellwig
  2023-08-10 23:41     ` Darrick J. Wong
@ 2023-08-11  6:08     ` Hugh Dickins
  1 sibling, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2023-08-11  6:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Hugh Dickins, Christian Brauner, Andrew Morton,
	Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton, Chuck Lever,
	Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Pete Zaitcev, Helge Deller,
	Topi Miettinen, Yu Kuai, linux-fsdevel, linux-mm

On Wed, 9 Aug 2023, Christoph Hellwig wrote:

> Please do not add a new ->direct_IO method.  I'm currently working hard
> on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in
> your read_iter/write_iter methods.

Thanks for the input, I'd missed that FMODE_CAN_ODIRECT development.
I can see why you would surely prefer not to have a .direct_IO added.

But whether that's right for tmpfs at this time, I'll let you and all
decide: I've tried and tested the v2 patch now, and will send it out
shortly; but it has to add a shmem_file_write_iter(), where shmem was
doing fine with generic_file_write_iter() + direct_IO() stub before.

So my own feeling is that the v1 patch with shmem_direct_IO() was better,
duplicating less code; but whatever, you can all decide between them.

> 
> But if we just start claiming direct I/O support for file systems that
> don't actually support it, I'm starting to seriously wonder why we
> bother with the flag at all and don't just allow O_DIRECT opens
> to always succeed..

Yes, I've wondered that way too, but don't have a strong opinion on it.

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO
  2023-08-10 23:41     ` Darrick J. Wong
@ 2023-08-11  6:16       ` Hugh Dickins
  2023-08-11  8:34         ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2023-08-11  6:16 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Hugh Dickins, Christian Brauner, Andrew Morton,
	Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton, Chuck Lever,
	Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Pete Zaitcev, Helge Deller,
	Topi Miettinen, Yu Kuai, linux-fsdevel, linux-mm

On Thu, 10 Aug 2023, Darrick J. Wong wrote:
> On Wed, Aug 09, 2023 at 06:41:17AM -0700, Christoph Hellwig wrote:
> > Please do not add a new ->direct_IO method.  I'm currently working hard
> > on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in
> > your read_iter/write_iter methods.
> > 
> > But if we just start claiming direct I/O support for file systems that
> > don't actually support it, I'm starting to seriously wonder why we
> > bother with the flag at all and don't just allow O_DIRECT opens
> > to always succeed..
> 
> I see it differently -- you can do byte-aligned directio to S_DAX files
> on persistent memory, so I don't see why you can't do that for tmpfs
> files too.

Helpful support, thanks.  But I didn't read Christoph as unhappy with
the granularity issue: just giving me directIOn to FMODE_CAN_ODIRECT,
and rightly wondering why we ever fail O_DIRECTs.

Hugh

> 
> (I'm not advocating for letting *disk* based filesystems allow O_DIRECT
> even if read and writes are always going to go through the page cache
> and get flushed to disk.  If programs wanted that, they'd use O_SYNC.)
> 
> /mnt is a pmem filesystem, /mnt/on/file has S_DAX set, and /mnt/off/file
> does not:
> 
> # xfs_io -c statx /mnt/{on,off}/file
> fd.path = "/mnt/on/file"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 132
> stat.type = regular file
> stat.size = 1048576
> stat.blocks = 2048
> fsxattr.xflags = 0x8002 [-p------------x--]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.cowextsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> fd.path = "/mnt/off/file"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 8388737
> stat.type = regular file
> stat.size = 1048576
> stat.blocks = 2048
> fsxattr.xflags = 0x2 [-p---------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.cowextsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> 
> And now we try a byte-aligned direct write:
> 
> # xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/off/file
> pwrite: Invalid argument
> # xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/on/file
> wrote 1/1 bytes at offset 47
> 1.000000 bytes, 1 ops; 0.0001 sec (5.194 KiB/sec and 5319.1489 ops/sec)
> 
> --D

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH vfs.tmpfs v2 4/5] tmpfs: trivial support for direct IO
  2023-08-09  4:34 ` [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO Hugh Dickins
  2023-08-09  9:54   ` Jan Kara
  2023-08-09 13:41   ` Christoph Hellwig
@ 2023-08-11  6:27   ` Hugh Dickins
  2023-08-11  8:35     ` Christoph Hellwig
                       ` (2 more replies)
  2 siblings, 3 replies; 32+ messages in thread
From: Hugh Dickins @ 2023-08-11  6:27 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	Darrick J. Wong, linux-fsdevel, linux-mm

Depending upon your philosophical viewpoint, either tmpfs always does
direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to
stand in for a more sophisticated filesystem, it can be helpful for tmpfs
to support O_DIRECT.  So, give tmpfs a shmem_file_open() method, to set
the FMODE_CAN_ODIRECT flag: then unchanged shmem_file_read_iter() and new
shmem_file_write_iter() do the work (without any shmem_direct_IO() stub).

Perhaps later, once the direct_IO method has been eliminated from all
filesystems, generic_file_write_iter() will be such that tmpfs can again
use it, even for O_DIRECT.

xfstests auto generic which were not run on tmpfs before but now pass:
036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263
323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729
with no new failures.

LTP dio tests which were not run on tmpfs before but now pass:
dio01 through dio30, except for dio04 and dio10, which fail because
tmpfs dio read and write allow odd count: tmpfs could be made stricter,
but would that be an improvement?

Signed-off-by: Hugh Dickins <hughd@google.com>
---
Thanks for your earlier review, Jan: I've certainly not copied that
into this entirely different version.  I prefer the v1, but fine if
people prefer this v2.

 mm/shmem.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ca43fb256b8e..b782edeb69aa 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2388,6 +2388,12 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
+static int shmem_file_open(struct inode *inode, struct file *file)
+{
+	file->f_mode |= FMODE_CAN_ODIRECT;
+	return generic_file_open(inode, file);
+}
+
 #ifdef CONFIG_TMPFS_XATTR
 static int shmem_initxattrs(struct inode *, const struct xattr *, void *);
 
@@ -2839,6 +2845,28 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	return retval ? retval : error;
 }
 
+static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	ssize_t ret;
+
+	inode_lock(inode);
+	ret = generic_write_checks(iocb, from);
+	if (ret <= 0)
+		goto unlock;
+	ret = file_remove_privs(file);
+	if (ret)
+		goto unlock;
+	ret = file_update_time(file);
+	if (ret)
+		goto unlock;
+	ret = generic_perform_write(iocb, from);
+unlock:
+	inode_unlock(inode);
+	return ret;
+}
+
 static bool zero_pipe_buf_get(struct pipe_inode_info *pipe,
 			      struct pipe_buffer *buf)
 {
@@ -4434,12 +4462,12 @@ EXPORT_SYMBOL(shmem_aops);
 
 static const struct file_operations shmem_file_operations = {
 	.mmap		= shmem_mmap,
-	.open		= generic_file_open,
+	.open		= shmem_file_open,
 	.get_unmapped_area = shmem_get_unmapped_area,
 #ifdef CONFIG_TMPFS
 	.llseek		= shmem_file_llseek,
 	.read_iter	= shmem_file_read_iter,
-	.write_iter	= generic_file_write_iter,
+	.write_iter	= shmem_file_write_iter,
 	.fsync		= noop_fsync,
 	.splice_read	= shmem_file_splice_read,
 	.splice_write	= iter_file_splice_write,
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO
  2023-08-11  6:16       ` Hugh Dickins
@ 2023-08-11  8:34         ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2023-08-11  8:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Darrick J. Wong, Christoph Hellwig, Christian Brauner,
	Andrew Morton, Oleksandr Tymoshenko, Carlos Maiolino, Jeff Layton,
	Chuck Lever, Jan Kara, Miklos Szeredi, Daniel Xu, Chris Down,
	Tejun Heo, Greg Kroah-Hartman, Matthew Wilcox, Pete Zaitcev,
	Helge Deller, Topi Miettinen, Yu Kuai, linux-fsdevel, linux-mm

On Thu, Aug 10, 2023 at 11:16:20PM -0700, Hugh Dickins wrote:
> Helpful support, thanks.  But I didn't read Christoph as unhappy with
> the granularity issue: just giving me directIOn to FMODE_CAN_ODIRECT,
> and rightly wondering why we ever fail O_DIRECTs.

Exactly.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs v2 4/5] tmpfs: trivial support for direct IO
  2023-08-11  6:27   ` [PATCH vfs.tmpfs v2 " Hugh Dickins
@ 2023-08-11  8:35     ` Christoph Hellwig
  2023-08-11  8:56     ` Jan Kara
  2023-08-11 11:00     ` (subset) " Christian Brauner
  2 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2023-08-11  8:35 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	Darrick J. Wong, linux-fsdevel, linux-mm

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs v2 4/5] tmpfs: trivial support for direct IO
  2023-08-11  6:27   ` [PATCH vfs.tmpfs v2 " Hugh Dickins
  2023-08-11  8:35     ` Christoph Hellwig
@ 2023-08-11  8:56     ` Jan Kara
  2023-08-11 11:00     ` (subset) " Christian Brauner
  2 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2023-08-11  8:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	Darrick J. Wong, linux-fsdevel, linux-mm

On Thu 10-08-23 23:27:07, Hugh Dickins wrote:
> Depending upon your philosophical viewpoint, either tmpfs always does
> direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to
> stand in for a more sophisticated filesystem, it can be helpful for tmpfs
> to support O_DIRECT.  So, give tmpfs a shmem_file_open() method, to set
> the FMODE_CAN_ODIRECT flag: then unchanged shmem_file_read_iter() and new
> shmem_file_write_iter() do the work (without any shmem_direct_IO() stub).
> 
> Perhaps later, once the direct_IO method has been eliminated from all
> filesystems, generic_file_write_iter() will be such that tmpfs can again
> use it, even for O_DIRECT.
> 
> xfstests auto generic which were not run on tmpfs before but now pass:
> 036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263
> 323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729
> with no new failures.
> 
> LTP dio tests which were not run on tmpfs before but now pass:
> dio01 through dio30, except for dio04 and dio10, which fail because
> tmpfs dio read and write allow odd count: tmpfs could be made stricter,
> but would that be an improvement?
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> Thanks for your earlier review, Jan: I've certainly not copied that
> into this entirely different version.  I prefer the v1, but fine if
> people prefer this v2.

Yeah, this solution is also fine with me so feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

I agree the previous version has less code duplication but once .direct_IO
is gone shmem_file_write_iter() will be actually how some generic helper
will look like so we can deduplicate the code then.

								Honza
 
>  mm/shmem.c | 32 ++++++++++++++++++++++++++++++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ca43fb256b8e..b782edeb69aa 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2388,6 +2388,12 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
>  	return 0;
>  }
>  
> +static int shmem_file_open(struct inode *inode, struct file *file)
> +{
> +	file->f_mode |= FMODE_CAN_ODIRECT;
> +	return generic_file_open(inode, file);
> +}
> +
>  #ifdef CONFIG_TMPFS_XATTR
>  static int shmem_initxattrs(struct inode *, const struct xattr *, void *);
>  
> @@ -2839,6 +2845,28 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	return retval ? retval : error;
>  }
>  
> +static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> +{
> +	struct file *file = iocb->ki_filp;
> +	struct inode *inode = file->f_mapping->host;
> +	ssize_t ret;
> +
> +	inode_lock(inode);
> +	ret = generic_write_checks(iocb, from);
> +	if (ret <= 0)
> +		goto unlock;
> +	ret = file_remove_privs(file);
> +	if (ret)
> +		goto unlock;
> +	ret = file_update_time(file);
> +	if (ret)
> +		goto unlock;
> +	ret = generic_perform_write(iocb, from);
> +unlock:
> +	inode_unlock(inode);
> +	return ret;
> +}
> +
>  static bool zero_pipe_buf_get(struct pipe_inode_info *pipe,
>  			      struct pipe_buffer *buf)
>  {
> @@ -4434,12 +4462,12 @@ EXPORT_SYMBOL(shmem_aops);
>  
>  static const struct file_operations shmem_file_operations = {
>  	.mmap		= shmem_mmap,
> -	.open		= generic_file_open,
> +	.open		= shmem_file_open,
>  	.get_unmapped_area = shmem_get_unmapped_area,
>  #ifdef CONFIG_TMPFS
>  	.llseek		= shmem_file_llseek,
>  	.read_iter	= shmem_file_read_iter,
> -	.write_iter	= generic_file_write_iter,
> +	.write_iter	= shmem_file_write_iter,
>  	.fsync		= noop_fsync,
>  	.splice_read	= shmem_file_splice_read,
>  	.splice_write	= iter_file_splice_write,
> -- 
> 2.35.3
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: (subset) [PATCH vfs.tmpfs v2 4/5] tmpfs: trivial support for direct IO
  2023-08-11  6:27   ` [PATCH vfs.tmpfs v2 " Hugh Dickins
  2023-08-11  8:35     ` Christoph Hellwig
  2023-08-11  8:56     ` Jan Kara
@ 2023-08-11 11:00     ` Christian Brauner
  2 siblings, 0 replies; 32+ messages in thread
From: Christian Brauner @ 2023-08-11 11:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	Darrick J. Wong, linux-fsdevel, linux-mm

On Thu, 10 Aug 2023 23:27:07 -0700, Hugh Dickins wrote:
> Depending upon your philosophical viewpoint, either tmpfs always does
> direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to
> stand in for a more sophisticated filesystem, it can be helpful for tmpfs
> to support O_DIRECT.  So, give tmpfs a shmem_file_open() method, to set
> the FMODE_CAN_ODIRECT flag: then unchanged shmem_file_read_iter() and new
> shmem_file_write_iter() do the work (without any shmem_direct_IO() stub).
> 
> [...]

I've dropped the previous version and applied this one. Thank!

---

Applied to the vfs.tmpfs branch of the vfs/vfs.git tree.
Patches in the vfs.tmpfs branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.tmpfs

[4/5] tmpfs: trivial support for direct IO
      https://git.kernel.org/vfs/vfs/c/6b55d273ec5b

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH vfs.tmpfs] tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs
  2023-08-10 10:07       ` Christian Brauner
@ 2023-08-21 17:39         ` Hugh Dickins
  2023-08-21 17:57           ` Jan Kara
  2023-08-22  8:58           ` (subset) " Christian Brauner
  0 siblings, 2 replies; 32+ messages in thread
From: Hugh Dickins @ 2023-08-21 17:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Hugh Dickins, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	Franklin Mathieu, linux-fsdevel, linux-mm

It is particularly important for the userns mount case (when a sensible
nr_inodes maximum may not be enforced) that tmpfs user xattrs be subject
to memory cgroup limiting.  Leave temporary buffer allocations as is,
but change the persistent simple xattr allocations from GFP_KERNEL to
GFP_KERNEL_ACCOUNT.  This limits kernfs's cgroupfs too, but that's good.

(I had intended to send this change earlier, but had been confused by
shmem_alloc_inode() using GFP_KERNEL, and thought a discussion would be
needed to change that too: no, I was forgetting the SLAB_ACCOUNT on that
kmem_cache, which implicitly adds __GFP_ACCOUNT to all its allocations.)

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 fs/xattr.c | 4 ++--
 mm/shmem.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 2d607542281b..efd4736bc94b 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -1093,7 +1093,7 @@ struct simple_xattr *simple_xattr_alloc(const void *value, size_t size)
 	if (len < sizeof(*new_xattr))
 		return NULL;
 
-	new_xattr = kvmalloc(len, GFP_KERNEL);
+	new_xattr = kvmalloc(len, GFP_KERNEL_ACCOUNT);
 	if (!new_xattr)
 		return NULL;
 
@@ -1217,7 +1217,7 @@ struct simple_xattr *simple_xattr_set(struct simple_xattrs *xattrs,
 		if (!new_xattr)
 			return ERR_PTR(-ENOMEM);
 
-		new_xattr->name = kstrdup(name, GFP_KERNEL);
+		new_xattr->name = kstrdup(name, GFP_KERNEL_ACCOUNT);
 		if (!new_xattr->name) {
 			simple_xattr_free(new_xattr);
 			return ERR_PTR(-ENOMEM);
diff --git a/mm/shmem.c b/mm/shmem.c
index b782edeb69aa..11298c797cdc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3616,7 +3616,7 @@ static int shmem_initxattrs(struct inode *inode,
 
 		len = strlen(xattr->name) + 1;
 		new_xattr->name = kmalloc(XATTR_SECURITY_PREFIX_LEN + len,
-					  GFP_KERNEL);
+					  GFP_KERNEL_ACCOUNT);
 		if (!new_xattr->name) {
 			kvfree(new_xattr);
 			break;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH vfs.tmpfs] tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs
  2023-08-21 17:39         ` [PATCH vfs.tmpfs] tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs Hugh Dickins
@ 2023-08-21 17:57           ` Jan Kara
  2023-08-22  8:58           ` (subset) " Christian Brauner
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kara @ 2023-08-21 17:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	Franklin Mathieu, linux-fsdevel, linux-mm

On Mon 21-08-23 10:39:20, Hugh Dickins wrote:
> It is particularly important for the userns mount case (when a sensible
> nr_inodes maximum may not be enforced) that tmpfs user xattrs be subject
> to memory cgroup limiting.  Leave temporary buffer allocations as is,
> but change the persistent simple xattr allocations from GFP_KERNEL to
> GFP_KERNEL_ACCOUNT.  This limits kernfs's cgroupfs too, but that's good.
> 
> (I had intended to send this change earlier, but had been confused by
> shmem_alloc_inode() using GFP_KERNEL, and thought a discussion would be
> needed to change that too: no, I was forgetting the SLAB_ACCOUNT on that
> kmem_cache, which implicitly adds __GFP_ACCOUNT to all its allocations.)
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

So I've checked and also kernfs is affected by these changes. I guess that
makes sense as well. So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/xattr.c | 4 ++--
>  mm/shmem.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 2d607542281b..efd4736bc94b 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -1093,7 +1093,7 @@ struct simple_xattr *simple_xattr_alloc(const void *value, size_t size)
>  	if (len < sizeof(*new_xattr))
>  		return NULL;
>  
> -	new_xattr = kvmalloc(len, GFP_KERNEL);
> +	new_xattr = kvmalloc(len, GFP_KERNEL_ACCOUNT);
>  	if (!new_xattr)
>  		return NULL;
>  
> @@ -1217,7 +1217,7 @@ struct simple_xattr *simple_xattr_set(struct simple_xattrs *xattrs,
>  		if (!new_xattr)
>  			return ERR_PTR(-ENOMEM);
>  
> -		new_xattr->name = kstrdup(name, GFP_KERNEL);
> +		new_xattr->name = kstrdup(name, GFP_KERNEL_ACCOUNT);
>  		if (!new_xattr->name) {
>  			simple_xattr_free(new_xattr);
>  			return ERR_PTR(-ENOMEM);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index b782edeb69aa..11298c797cdc 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3616,7 +3616,7 @@ static int shmem_initxattrs(struct inode *inode,
>  
>  		len = strlen(xattr->name) + 1;
>  		new_xattr->name = kmalloc(XATTR_SECURITY_PREFIX_LEN + len,
> -					  GFP_KERNEL);
> +					  GFP_KERNEL_ACCOUNT);
>  		if (!new_xattr->name) {
>  			kvfree(new_xattr);
>  			break;
> -- 
> 2.35.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: (subset) [PATCH vfs.tmpfs] tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs
  2023-08-21 17:39         ` [PATCH vfs.tmpfs] tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs Hugh Dickins
  2023-08-21 17:57           ` Jan Kara
@ 2023-08-22  8:58           ` Christian Brauner
  1 sibling, 0 replies; 32+ messages in thread
From: Christian Brauner @ 2023-08-22  8:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christian Brauner, Andrew Morton, Oleksandr Tymoshenko,
	Carlos Maiolino, Jeff Layton, Chuck Lever, Jan Kara,
	Miklos Szeredi, Daniel Xu, Chris Down, Tejun Heo,
	Greg Kroah-Hartman, Matthew Wilcox, Christoph Hellwig,
	Pete Zaitcev, Helge Deller, Topi Miettinen, Yu Kuai,
	Franklin Mathieu, linux-fsdevel, linux-mm

On Mon, 21 Aug 2023 10:39:20 -0700, Hugh Dickins wrote:
> It is particularly important for the userns mount case (when a sensible
> nr_inodes maximum may not be enforced) that tmpfs user xattrs be subject
> to memory cgroup limiting.  Leave temporary buffer allocations as is,
> but change the persistent simple xattr allocations from GFP_KERNEL to
> GFP_KERNEL_ACCOUNT.  This limits kernfs's cgroupfs too, but that's good.
> 
> (I had intended to send this change earlier, but had been confused by
> shmem_alloc_inode() using GFP_KERNEL, and thought a discussion would be
> needed to change that too: no, I was forgetting the SLAB_ACCOUNT on that
> kmem_cache, which implicitly adds __GFP_ACCOUNT to all its allocations.)
> 
> [...]

Applied to the vfs.tmpfs branch of the vfs/vfs.git tree.
Patches in the vfs.tmpfs branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.tmpfs

[1/1] tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs
      https://git.kernel.org/vfs/vfs/c/572a3d1e5d3a

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-08-22  8:59 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-09  4:28 [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Hugh Dickins
2023-08-09  4:30 ` [PATCH vfs.tmpfs 1/5] xattr: simple_xattr_set() return old_xattr to be freed Hugh Dickins
2023-08-09  9:21   ` Jan Kara
2023-08-09 11:37   ` Christian Brauner
2023-08-09 13:19   ` Carlos Maiolino
2023-08-09  4:32 ` [PATCH vfs.tmpfs 2/5] tmpfs: track free_ispace instead of free_inodes Hugh Dickins
2023-08-09  9:33   ` Jan Kara
2023-08-09 13:29   ` Carlos Maiolino
2023-08-09  4:33 ` [PATCH vfs.tmpfs 3/5] tmpfs,xattr: enable limited user extended attributes Hugh Dickins
2023-08-09  9:50   ` Jan Kara
2023-08-09 13:52   ` Carlos Maiolino
2023-08-09  4:34 ` [PATCH vfs.tmpfs 4/5] tmpfs: trivial support for direct IO Hugh Dickins
2023-08-09  9:54   ` Jan Kara
2023-08-09 13:41   ` Christoph Hellwig
2023-08-10 23:41     ` Darrick J. Wong
2023-08-11  6:16       ` Hugh Dickins
2023-08-11  8:34         ` Christoph Hellwig
2023-08-11  6:08     ` Hugh Dickins
2023-08-11  6:27   ` [PATCH vfs.tmpfs v2 " Hugh Dickins
2023-08-11  8:35     ` Christoph Hellwig
2023-08-11  8:56     ` Jan Kara
2023-08-11 11:00     ` (subset) " Christian Brauner
2023-08-09  4:36 ` [PATCH vfs.tmpfs 5/5] mm: invalidation check mapping before folio_contains Hugh Dickins
2023-08-09  9:27   ` Jan Kara
2023-08-09  6:45 ` [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Christian Brauner
2023-08-09 11:33   ` Christian Brauner
2023-08-10  5:50     ` Hugh Dickins
2023-08-10 10:07       ` Christian Brauner
2023-08-21 17:39         ` [PATCH vfs.tmpfs] tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs Hugh Dickins
2023-08-21 17:57           ` Jan Kara
2023-08-22  8:58           ` (subset) " Christian Brauner
2023-08-10 23:23       ` [PATCH vfs.tmpfs 0/5] tmpfs: user xattrs and direct IO Pete Zaitcev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).