* [PATCH 01/20] mm: move mapping_set_update out of <linux/swap.h>
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:34 ` [PATCH 02/20] shmem: move shmem_mapping out of line Christoph Hellwig
` (18 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm, Matthew Wilcox
mapping_set_update is only used inside mm/. Move mapping_set_update to
mm/internal.h and turn it into an inline function instead of a macro.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
include/linux/swap.h | 10 ----------
mm/filemap.c | 9 +++++++++
mm/internal.h | 4 ++++
mm/workingset.c | 1 +
4 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4db00ddad26169..755fc64ba48ded 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -350,16 +350,6 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg);
void workingset_refault(struct folio *folio, void *shadow);
void workingset_activation(struct folio *folio);
-/* Only track the nodes of mappings with shadow entries */
-void workingset_update_node(struct xa_node *node);
-extern struct list_lru shadow_nodes;
-#define mapping_set_update(xas, mapping) do { \
- if (!dax_mapping(mapping) && !shmem_mapping(mapping)) { \
- xas_set_update(xas, workingset_update_node); \
- xas_set_lru(xas, &shadow_nodes); \
- } \
-} while (0)
-
/* linux/mm/page_alloc.c */
extern unsigned long totalreserve_pages;
diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779c23db74..6c8b089f00d26a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -124,6 +124,15 @@
* ->private_lock (zap_pte_range->block_dirty_folio)
*/
+static void mapping_set_update(struct xa_state *xas,
+ struct address_space *mapping)
+{
+ if (dax_mapping(mapping) || shmem_mapping(mapping))
+ return;
+ xas_set_update(xas, workingset_update_node);
+ xas_set_lru(xas, &shadow_nodes);
+}
+
static void page_cache_delete(struct address_space *mapping,
struct folio *folio, void *shadow)
{
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50fb6..4398f572485f00 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1266,4 +1266,8 @@ static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry,
}
#endif /* CONFIG_SHRINKER_DEBUG */
+/* Only track the nodes of mappings with shadow entries */
+void workingset_update_node(struct xa_node *node);
+extern struct list_lru shadow_nodes;
+
#endif /* __MM_INTERNAL_H */
diff --git a/mm/workingset.c b/mm/workingset.c
index 2260129743282d..f2a0ecaf708d76 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -16,6 +16,7 @@
#include <linux/dax.h>
#include <linux/fs.h>
#include <linux/mm.h>
+#include "internal.h"
/*
* Double CLOCK lists
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 02/20] shmem: move shmem_mapping out of line
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
2024-01-29 14:34 ` [PATCH 01/20] mm: move mapping_set_update out of <linux/swap.h> Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-02-16 6:03 ` Hugh Dickins
2024-01-29 14:34 ` [PATCH 03/20] shmem: set a_ops earlier in shmem_symlink Christoph Hellwig
` (17 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm, Matthew Wilcox
shmem_aops really should not be exported to the world. Move
shmem_mapping and export it as internal for the one semi-legitimate
modular user in udmabuf.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
include/linux/shmem_fs.h | 6 +-----
mm/shmem.c | 11 ++++++++---
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 2caa6b86106aa3..6b96a87e4bc80a 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -97,11 +97,7 @@ extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
extern int shmem_lock(struct file *file, int lock, struct ucounts *ucounts);
#ifdef CONFIG_SHMEM
-extern const struct address_space_operations shmem_aops;
-static inline bool shmem_mapping(struct address_space *mapping)
-{
- return mapping->a_ops == &shmem_aops;
-}
+bool shmem_mapping(struct address_space *mapping);
#else
static inline bool shmem_mapping(struct address_space *mapping)
{
diff --git a/mm/shmem.c b/mm/shmem.c
index d7c84ff621860b..f607b0cab7e4e2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -254,7 +254,7 @@ static void shmem_inode_unacct_blocks(struct inode *inode, long pages)
}
static const struct super_operations shmem_ops;
-const struct address_space_operations shmem_aops;
+static const struct address_space_operations shmem_aops;
static const struct file_operations shmem_file_operations;
static const struct inode_operations shmem_inode_operations;
static const struct inode_operations shmem_dir_inode_operations;
@@ -263,6 +263,12 @@ static const struct vm_operations_struct shmem_vm_ops;
static const struct vm_operations_struct shmem_anon_vm_ops;
static struct file_system_type shmem_fs_type;
+bool shmem_mapping(struct address_space *mapping)
+{
+ return mapping->a_ops == &shmem_aops;
+}
+EXPORT_SYMBOL_GPL(shmem_mapping);
+
bool vma_is_anon_shmem(struct vm_area_struct *vma)
{
return vma->vm_ops == &shmem_anon_vm_ops;
@@ -4466,7 +4472,7 @@ static int shmem_error_remove_folio(struct address_space *mapping,
return 0;
}
-const struct address_space_operations shmem_aops = {
+static const struct address_space_operations shmem_aops = {
.writepage = shmem_writepage,
.dirty_folio = noop_dirty_folio,
#ifdef CONFIG_TMPFS
@@ -4478,7 +4484,6 @@ const struct address_space_operations shmem_aops = {
#endif
.error_remove_folio = shmem_error_remove_folio,
};
-EXPORT_SYMBOL(shmem_aops);
static const struct file_operations shmem_file_operations = {
.mmap = shmem_mmap,
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH 02/20] shmem: move shmem_mapping out of line
2024-01-29 14:34 ` [PATCH 02/20] shmem: move shmem_mapping out of line Christoph Hellwig
@ 2024-02-16 6:03 ` Hugh Dickins
0 siblings, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2024-02-16 6:03 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
Hui Su, Vlastimil Babka, linux-xfs, linux-mm, Matthew Wilcox
On Mon, 29 Jan 2024, Christoph Hellwig wrote:
> shmem_aops really should not be exported to the world. Move
> shmem_mapping and export it as internal for the one semi-legitimate
> modular user in udmabuf.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
I'm okay with this change, but let's be polite and Cc the author
of the change which this reverts - v5.11's commit 30e6a51dbb05
("mm/shmem.c: make shmem_mapping() inline").
Interesting to notice that commit did EXPORT_SYMBOL(shmem_aops),
whereas shmem_mapping() had not been exported before; you're now
exporting it _GPL(), like other examples in mm/shmem.c: agreed.
Hugh
> ---
> include/linux/shmem_fs.h | 6 +-----
> mm/shmem.c | 11 ++++++++---
> 2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 2caa6b86106aa3..6b96a87e4bc80a 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -97,11 +97,7 @@ extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr,
> unsigned long len, unsigned long pgoff, unsigned long flags);
> extern int shmem_lock(struct file *file, int lock, struct ucounts *ucounts);
> #ifdef CONFIG_SHMEM
> -extern const struct address_space_operations shmem_aops;
> -static inline bool shmem_mapping(struct address_space *mapping)
> -{
> - return mapping->a_ops == &shmem_aops;
> -}
> +bool shmem_mapping(struct address_space *mapping);
> #else
> static inline bool shmem_mapping(struct address_space *mapping)
> {
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d7c84ff621860b..f607b0cab7e4e2 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -254,7 +254,7 @@ static void shmem_inode_unacct_blocks(struct inode *inode, long pages)
> }
>
> static const struct super_operations shmem_ops;
> -const struct address_space_operations shmem_aops;
> +static const struct address_space_operations shmem_aops;
> static const struct file_operations shmem_file_operations;
> static const struct inode_operations shmem_inode_operations;
> static const struct inode_operations shmem_dir_inode_operations;
> @@ -263,6 +263,12 @@ static const struct vm_operations_struct shmem_vm_ops;
> static const struct vm_operations_struct shmem_anon_vm_ops;
> static struct file_system_type shmem_fs_type;
>
> +bool shmem_mapping(struct address_space *mapping)
> +{
> + return mapping->a_ops == &shmem_aops;
> +}
> +EXPORT_SYMBOL_GPL(shmem_mapping);
> +
> bool vma_is_anon_shmem(struct vm_area_struct *vma)
> {
> return vma->vm_ops == &shmem_anon_vm_ops;
> @@ -4466,7 +4472,7 @@ static int shmem_error_remove_folio(struct address_space *mapping,
> return 0;
> }
>
> -const struct address_space_operations shmem_aops = {
> +static const struct address_space_operations shmem_aops = {
> .writepage = shmem_writepage,
> .dirty_folio = noop_dirty_folio,
> #ifdef CONFIG_TMPFS
> @@ -4478,7 +4484,6 @@ const struct address_space_operations shmem_aops = {
> #endif
> .error_remove_folio = shmem_error_remove_folio,
> };
> -EXPORT_SYMBOL(shmem_aops);
>
> static const struct file_operations shmem_file_operations = {
> .mmap = shmem_mmap,
> --
> 2.39.2
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 03/20] shmem: set a_ops earlier in shmem_symlink
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
2024-01-29 14:34 ` [PATCH 01/20] mm: move mapping_set_update out of <linux/swap.h> Christoph Hellwig
2024-01-29 14:34 ` [PATCH 02/20] shmem: move shmem_mapping out of line Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:34 ` [PATCH 04/20] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
` (16 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm, Matthew Wilcox
Set the a_ops in shmem_symlink before reading a folio from the mapping
to prepare for asserting that shmem_get_folio is only called on shmem
mappings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/shmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index f607b0cab7e4e2..1900916aa84d13 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3506,10 +3506,10 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
inode->i_op = &shmem_short_symlink_operations;
} else {
inode_nohighmem(inode);
+ inode->i_mapping->a_ops = &shmem_aops;
error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
if (error)
goto out_remove_offset;
- inode->i_mapping->a_ops = &shmem_aops;
inode->i_op = &shmem_symlink_inode_operations;
memcpy(folio_address(folio), symname, len);
folio_mark_uptodate(folio);
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 04/20] shmem: move the shmem_mapping assert into shmem_get_folio_gfp
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (2 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 03/20] shmem: set a_ops earlier in shmem_symlink Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:34 ` [PATCH 05/20] shmem: export shmem_get_folio Christoph Hellwig
` (15 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm, Matthew Wilcox
Move the check that the inode really is a shmemfs one from
shmem_read_folio_gfp to shmem_get_folio_gfp given that shmem_get_folio
can also be called from outside of shmem.c. Also turn it into a
WARN_ON_ONCE and error return instead of BUG_ON to be less severe.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
mm/shmem.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 1900916aa84d13..ad533b2f0721a7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1972,6 +1972,9 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
int error;
bool alloced;
+ if (WARN_ON_ONCE(!shmem_mapping(inode->i_mapping)))
+ return -EINVAL;
+
if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
return -EFBIG;
repeat:
@@ -4915,7 +4918,6 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
struct folio *folio;
int error;
- BUG_ON(!shmem_mapping(mapping));
error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
gfp, NULL, NULL);
if (error)
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 05/20] shmem: export shmem_get_folio
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (3 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 04/20] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-02-16 6:07 ` Hugh Dickins
2024-02-16 13:53 ` Matthew Wilcox
2024-01-29 14:34 ` [PATCH 06/20] shmem: export shmem_kernel_file_setup Christoph Hellwig
` (14 subsequent siblings)
19 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm, Matthew Wilcox
Export shmem_get_folio as a slightly lower-level variant of
shmem_read_folio_gfp. This will be useful for XFS xfile use cases
that want to pass SGP_NOALLOC or get a locked page, which the thin
shmem_read_folio_gfp wrapper can't provide.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
mm/shmem.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index ad533b2f0721a7..dae684cd3c99fb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2137,12 +2137,27 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
return error;
}
+/**
+ * shmem_get_folio - find and get a reference to a shmem folio.
+ * @inode: inode to search
+ * @index: the page index.
+ * @foliop: pointer to the found folio if one was found
+ * @sgp: SGP_* flags to control behavior
+ *
+ * Looks up the page cache entry at @inode & @index.
+ *
+ * If this function returns a folio, it is returned with an increased refcount.
+ *
+ * Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
+ * and no folio was found at @index, or an ERR_PTR() otherwise.
+ */
int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
enum sgp_type sgp)
{
return shmem_get_folio_gfp(inode, index, foliop, sgp,
mapping_gfp_mask(inode->i_mapping), NULL, NULL);
}
+EXPORT_SYMBOL_GPL(shmem_get_folio);
/*
* This is like autoremove_wake_function, but it removes the wait queue
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH 05/20] shmem: export shmem_get_folio
2024-01-29 14:34 ` [PATCH 05/20] shmem: export shmem_get_folio Christoph Hellwig
@ 2024-02-16 6:07 ` Hugh Dickins
2024-02-16 13:53 ` Matthew Wilcox
1 sibling, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2024-02-16 6:07 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
linux-xfs, linux-mm, Matthew Wilcox
On Mon, 29 Jan 2024, Christoph Hellwig wrote:
> Export shmem_get_folio as a slightly lower-level variant of
> shmem_read_folio_gfp. This will be useful for XFS xfile use cases
> that want to pass SGP_NOALLOC or get a locked page, which the thin
> shmem_read_folio_gfp wrapper can't provide.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
> mm/shmem.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad533b2f0721a7..dae684cd3c99fb 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2137,12 +2137,27 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> return error;
> }
>
> +/**
> + * shmem_get_folio - find and get a reference to a shmem folio.
> + * @inode: inode to search
> + * @index: the page index.
> + * @foliop: pointer to the found folio if one was found
> + * @sgp: SGP_* flags to control behavior
> + *
> + * Looks up the page cache entry at @inode & @index.
> + *
> + * If this function returns a folio, it is returned with an increased refcount.
If this function returns a folio, it is returned locked, with an increased
refcount.
(Important to mention that it's locked, since that differs from
shmem_read_folio_gfp().)
Hugh
> + *
> + * Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
> + * and no folio was found at @index, or an ERR_PTR() otherwise.
> + */
> int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
> enum sgp_type sgp)
> {
> return shmem_get_folio_gfp(inode, index, foliop, sgp,
> mapping_gfp_mask(inode->i_mapping), NULL, NULL);
> }
> +EXPORT_SYMBOL_GPL(shmem_get_folio);
>
> /*
> * This is like autoremove_wake_function, but it removes the wait queue
> --
> 2.39.2
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH 05/20] shmem: export shmem_get_folio
2024-01-29 14:34 ` [PATCH 05/20] shmem: export shmem_get_folio Christoph Hellwig
2024-02-16 6:07 ` Hugh Dickins
@ 2024-02-16 13:53 ` Matthew Wilcox
2024-02-19 6:25 ` Christoph Hellwig
1 sibling, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2024-02-16 13:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
linux-xfs, linux-mm
On Mon, Jan 29, 2024 at 03:34:47PM +0100, Christoph Hellwig wrote:
> +/**
> + * shmem_get_folio - find and get a reference to a shmem folio.
> + * @inode: inode to search
> + * @index: the page index.
> + * @foliop: pointer to the found folio if one was found
> + * @sgp: SGP_* flags to control behavior
> + *
> + * Looks up the page cache entry at @inode & @index.
> + *
> + * If this function returns a folio, it is returned with an increased refcount.
> + *
> + * Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
> + * and no folio was found at @index, or an ERR_PTR() otherwise.
I know I gave an R-b on this earlier, but Hugh made me look again, and
this comment clearly does not reflect what the function does.
Presumably it returns an errno and sets foliop if it returns 0?
Also, should this function be called shmem_lock_folio() to mirror
filemap_lock_folio()?
> + */
> int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
> enum sgp_type sgp)
> {
> return shmem_get_folio_gfp(inode, index, foliop, sgp,
> mapping_gfp_mask(inode->i_mapping), NULL, NULL);
> }
> +EXPORT_SYMBOL_GPL(shmem_get_folio);
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH 05/20] shmem: export shmem_get_folio
2024-02-16 13:53 ` Matthew Wilcox
@ 2024-02-19 6:25 ` Christoph Hellwig
0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-02-19 6:25 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, Hugh Dickins,
Andrew Morton, linux-xfs, linux-mm
On Fri, Feb 16, 2024 at 01:53:09PM +0000, Matthew Wilcox wrote:
> I know I gave an R-b on this earlier, but Hugh made me look again, and
> this comment clearly does not reflect what the function does.
> Presumably it returns an errno and sets foliop if it returns 0?
Almost. With SGP_READ it can set *foliop to NULL and still return 0.
> Also, should this function be called shmem_lock_folio() to mirror
> filemap_lock_folio()?
shmem_get_folio can also allocate (and sometimes zero) a new folio.
Except for the different calling conventions that closest filemap
equivalent is __filemap_get_folio. For now I'd like to avoid the
bikeshedding on the name and just get the work done.
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 06/20] shmem: export shmem_kernel_file_setup
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (4 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 05/20] shmem: export shmem_get_folio Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 16:24 ` Darrick J. Wong
2024-01-29 14:34 ` [PATCH 07/20] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
` (13 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm, Matthew Wilcox
XFS wants to use this for its internal in-memory data structures and
currently duplicates the functionality. Export shmem_kernel_file_setup
to allow XFS to switch over to using the proper kernel API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/shmem.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index dae684cd3c99fb..e89fb5eccb0c0a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4856,6 +4856,7 @@ struct file *shmem_kernel_file_setup(const char *name, loff_t size, unsigned lon
{
return __shmem_file_setup(shm_mnt, name, size, flags, S_PRIVATE);
}
+EXPORT_SYMBOL_GPL(shmem_kernel_file_setup);
/**
* shmem_file_setup - get an unlinked file living in tmpfs
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH 06/20] shmem: export shmem_kernel_file_setup
2024-01-29 14:34 ` [PATCH 06/20] shmem: export shmem_kernel_file_setup Christoph Hellwig
@ 2024-01-29 16:24 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-01-29 16:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chandan Babu R, Hugh Dickins, Andrew Morton, linux-xfs, linux-mm,
Matthew Wilcox
On Mon, Jan 29, 2024 at 03:34:48PM +0100, Christoph Hellwig wrote:
> XFS wants to use this for its internal in-memory data structures and
> currently duplicates the functionality. Export shmem_kernel_file_setup
> to allow XFS to switch over to using the proper kernel API.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Looks ok,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> mm/shmem.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index dae684cd3c99fb..e89fb5eccb0c0a 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -4856,6 +4856,7 @@ struct file *shmem_kernel_file_setup(const char *name, loff_t size, unsigned lon
> {
> return __shmem_file_setup(shm_mnt, name, size, flags, S_PRIVATE);
> }
> +EXPORT_SYMBOL_GPL(shmem_kernel_file_setup);
>
> /**
> * shmem_file_setup - get an unlinked file living in tmpfs
> --
> 2.39.2
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 07/20] shmem: document how to "persist" data when using shmem_*file_setup
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (5 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 06/20] shmem: export shmem_kernel_file_setup Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-30 6:06 ` Matthew Wilcox
2024-01-29 14:34 ` [PATCH 08/20] xfs: remove xfile_stat Christoph Hellwig
` (12 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
Add a blurb that simply dirtying the folio will persist data for in-kernel
shmem files. This is what most of the callers already do.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
mm/shmem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index e89fb5eccb0c0a..f73dead491b11d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2148,6 +2148,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
*
* If this function returns a folio, it is returned with an increased refcount.
*
+ * If the caller modifies data in the folio, it must call folio_mark_dirty()
+ * before unlocking the folio to ensure that the folio is not reclaimed.
+ * There is no need to reserve space before calling folio_mark_dirty().
+ *
* Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
* and no folio was found at @index, or an ERR_PTR() otherwise.
*/
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH 07/20] shmem: document how to "persist" data when using shmem_*file_setup
2024-01-29 14:34 ` [PATCH 07/20] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
@ 2024-01-30 6:06 ` Matthew Wilcox
2024-01-30 8:04 ` Christoph Hellwig
2024-01-30 8:05 ` Christoph Hellwig
0 siblings, 2 replies; 32+ messages in thread
From: Matthew Wilcox @ 2024-01-30 6:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
linux-xfs, linux-mm
On Mon, Jan 29, 2024 at 03:34:49PM +0100, Christoph Hellwig wrote:
> Add a blurb that simply dirtying the folio will persist data for in-kernel
> shmem files. This is what most of the callers already do.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
I noticed today that drivers/gpu/drm/i915/gem/i915_gem_shmem.c is
going to a lot of effort to call write_begin and write_end. I'll
take that out as part of the shmem conversion to buffered_write_ops.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 07/20] shmem: document how to "persist" data when using shmem_*file_setup
2024-01-30 6:06 ` Matthew Wilcox
@ 2024-01-30 8:04 ` Christoph Hellwig
2024-01-30 8:05 ` Christoph Hellwig
1 sibling, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-30 8:04 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, Hugh Dickins,
Andrew Morton, linux-xfs, linux-mm
On Tue, Jan 30, 2024 at 06:06:12AM +0000, Matthew Wilcox wrote:
> On Mon, Jan 29, 2024 at 03:34:49PM +0100, Christoph Hellwig wrote:
> > Add a blurb that simply dirtying the folio will persist data for in-kernel
> > shmem files. This is what most of the callers already do.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>
> I noticed today that drivers/gpu/drm/i915/gem/i915_gem_shmem.c is
> going to a lot of effort to call write_begin and write_end. I'll
> take that out as part of the shmem conversion to buffered_write_ops.
I was planning to look into that, but I'll happily leave it to you.
It also calls into ->writepage in a somewhat gross way.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 07/20] shmem: document how to "persist" data when using shmem_*file_setup
2024-01-30 6:06 ` Matthew Wilcox
2024-01-30 8:04 ` Christoph Hellwig
@ 2024-01-30 8:05 ` Christoph Hellwig
1 sibling, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-30 8:05 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, Hugh Dickins,
Andrew Morton, linux-xfs, linux-mm
On Tue, Jan 30, 2024 at 06:06:12AM +0000, Matthew Wilcox wrote:
> take that out as part of the shmem conversion to buffered_write_ops.
Oh, and I think shmem is actually a good candidate for
iomap_file_buffered_write. I think it just needs to
plug shmem_get_folio into iomap_folio_ops ->get_folio and provide
an identity mapped iomap like the block device.
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 08/20] xfs: remove xfile_stat
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (6 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 07/20] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:34 ` [PATCH 09/20] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
` (11 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
vfs_getattr is needed to query inode attributes for unknown underlying
file systems. But shmemfs is well known for users of shmem_file_setup
and shmem_read_mapping_page_gfp that rely on it not needing specific
inode revalidation and having a normal mapping. Remove the detour
through the getattr method and an extra wrapper, and just read the
inode size and i_bytes directly in the scrub tracing code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/trace.h | 34 ++++++++++------------------------
fs/xfs/scrub/xfile.c | 19 -------------------
fs/xfs/scrub/xfile.h | 7 -------
3 files changed, 10 insertions(+), 50 deletions(-)
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 6bbb4e8639dca6..260b8fe0a80296 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -861,18 +861,11 @@ TRACE_EVENT(xfile_destroy,
__field(loff_t, size)
),
TP_fast_assign(
- struct xfile_stat statbuf;
- int ret;
-
- ret = xfile_stat(xf, &statbuf);
- if (!ret) {
- __entry->bytes = statbuf.bytes;
- __entry->size = statbuf.size;
- } else {
- __entry->bytes = -1;
- __entry->size = -1;
- }
- __entry->ino = file_inode(xf->file)->i_ino;
+ struct inode *inode = file_inode(xf->file);
+
+ __entry->ino = inode->i_ino;
+ __entry->bytes = inode->i_blocks << SECTOR_SHIFT;
+ __entry->size = i_size_read(inode);
),
TP_printk("xfino 0x%lx mem_bytes 0x%llx isize 0x%llx",
__entry->ino,
@@ -891,19 +884,12 @@ DECLARE_EVENT_CLASS(xfile_class,
__field(unsigned long long, bytecount)
),
TP_fast_assign(
- struct xfile_stat statbuf;
- int ret;
-
- ret = xfile_stat(xf, &statbuf);
- if (!ret) {
- __entry->bytes_used = statbuf.bytes;
- __entry->size = statbuf.size;
- } else {
- __entry->bytes_used = -1;
- __entry->size = -1;
- }
- __entry->ino = file_inode(xf->file)->i_ino;
+ struct inode *inode = file_inode(xf->file);
+
+ __entry->ino = inode->i_ino;
+ __entry->bytes_used = inode->i_blocks << SECTOR_SHIFT;
__entry->pos = pos;
+ __entry->size = i_size_read(inode);
__entry->bytecount = bytecount;
),
TP_printk("xfino 0x%lx mem_bytes 0x%llx pos 0x%llx bytecount 0x%llx isize 0x%llx",
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 090c3ead43fdf1..87654cdd5ac6f9 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -291,25 +291,6 @@ xfile_seek_data(
return ret;
}
-/* Query stat information for an xfile. */
-int
-xfile_stat(
- struct xfile *xf,
- struct xfile_stat *statbuf)
-{
- struct kstat ks;
- int error;
-
- error = vfs_getattr_nosec(&xf->file->f_path, &ks,
- STATX_SIZE | STATX_BLOCKS, AT_STATX_DONT_SYNC);
- if (error)
- return error;
-
- statbuf->size = ks.size;
- statbuf->bytes = ks.blocks << SECTOR_SHIFT;
- return 0;
-}
-
/*
* Grab the (locked) page for a memory object. The object cannot span a page
* boundary. Returns 0 (and a locked page) if successful, -ENOTBLK if we
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index d56643b0f429e1..c602d11560d8ee 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -63,13 +63,6 @@ xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
-struct xfile_stat {
- loff_t size;
- unsigned long long bytes;
-};
-
-int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf);
-
int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
struct xfile_page *xbuf);
int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 09/20] xfs: remove the xfile_pread/pwrite APIs
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (7 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 08/20] xfs: remove xfile_stat Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:34 ` [PATCH 10/20] xfs: don't try to handle non-uptodate pages in xfile_obj_load Christoph Hellwig
` (10 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
All current and pending xfile users use the xfile_obj_load
and xfile_obj_store API, so make those the actual implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
.../xfs/xfs-online-fsck-design.rst | 23 +++-----
fs/xfs/scrub/rtsummary.c | 6 +--
fs/xfs/scrub/trace.h | 4 +-
fs/xfs/scrub/xfarray.c | 18 +++----
fs/xfs/scrub/xfile.c | 54 +++++++++----------
fs/xfs/scrub/xfile.h | 32 +----------
6 files changed, 48 insertions(+), 89 deletions(-)
diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 352516feef6ffe..216c99ce511f7c 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -1915,19 +1915,13 @@ four of those five higher level data structures.
The fifth use case is discussed in the :ref:`realtime summary <rtsummary>` case
study.
-The most general storage interface supported by the xfile enables the reading
-and writing of arbitrary quantities of data at arbitrary offsets in the xfile.
-This capability is provided by ``xfile_pread`` and ``xfile_pwrite`` functions,
-which behave similarly to their userspace counterparts.
XFS is very record-based, which suggests that the ability to load and store
complete records is important.
-To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store``
-functions are provided to read and persist objects into an xfile.
-They are internally the same as pread and pwrite, except that they treat any
-error as an out of memory error.
-For online repair, squashing error conditions in this manner is an acceptable
-behavior because the only reaction is to abort the operation back to userspace.
-All five xfile usecases can be serviced by these four functions.
+To support these cases, a pair of ``xfile_load`` and ``xfile_store``
+functions are provided to read and persist objects into an xfile that treat any
+error as an out of memory error. For online repair, squashing error conditions
+in this manner is an acceptable behavior because the only reaction is to abort
+the operation back to userspace.
However, no discussion of file access idioms is complete without answering the
question, "But what about mmap?"
@@ -1939,10 +1933,9 @@ tmpfs can only push a pagecache folio to the swap cache if the folio is neither
pinned nor locked, which means the xfile must not pin too many folios.
Short term direct access to xfile contents is done by locking the pagecache
-folio and mapping it into kernel address space.
-Programmatic access (e.g. pread and pwrite) uses this mechanism.
-Folio locks are not supposed to be held for long periods of time, so long
-term direct access to xfile contents is done by bumping the folio refcount,
+folio and mapping it into kernel address space. Object load and store uses this
+mechanism. Folio locks are not supposed to be held for long periods of time, so
+long term direct access to xfile contents is done by bumping the folio refcount,
mapping it into kernel address space, and dropping the folio lock.
These long term users *must* be responsive to memory reclaim by hooking into
the shrinker infrastructure to know when to release folios.
diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c
index fabd0ed9dfa676..30b5a3952513f8 100644
--- a/fs/xfs/scrub/rtsummary.c
+++ b/fs/xfs/scrub/rtsummary.c
@@ -118,7 +118,7 @@ xfsum_load(
xfs_rtsumoff_t sumoff,
union xfs_suminfo_raw *rawinfo)
{
- return xfile_obj_load(sc->xfile, rawinfo,
+ return xfile_load(sc->xfile, rawinfo,
sizeof(union xfs_suminfo_raw),
sumoff << XFS_WORDLOG);
}
@@ -129,7 +129,7 @@ xfsum_store(
xfs_rtsumoff_t sumoff,
const union xfs_suminfo_raw rawinfo)
{
- return xfile_obj_store(sc->xfile, &rawinfo,
+ return xfile_store(sc->xfile, &rawinfo,
sizeof(union xfs_suminfo_raw),
sumoff << XFS_WORDLOG);
}
@@ -141,7 +141,7 @@ xfsum_copyout(
union xfs_suminfo_raw *rawinfo,
unsigned int nr_words)
{
- return xfile_obj_load(sc->xfile, rawinfo, nr_words << XFS_WORDLOG,
+ return xfile_load(sc->xfile, rawinfo, nr_words << XFS_WORDLOG,
sumoff << XFS_WORDLOG);
}
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 260b8fe0a80296..0327cab606b070 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -903,8 +903,8 @@ DECLARE_EVENT_CLASS(xfile_class,
DEFINE_EVENT(xfile_class, name, \
TP_PROTO(struct xfile *xf, loff_t pos, unsigned long long bytecount), \
TP_ARGS(xf, pos, bytecount))
-DEFINE_XFILE_EVENT(xfile_pread);
-DEFINE_XFILE_EVENT(xfile_pwrite);
+DEFINE_XFILE_EVENT(xfile_load);
+DEFINE_XFILE_EVENT(xfile_store);
DEFINE_XFILE_EVENT(xfile_seek_data);
DEFINE_XFILE_EVENT(xfile_get_page);
DEFINE_XFILE_EVENT(xfile_put_page);
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index f0f532c10a5acc..95ac14bceeadd6 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -136,7 +136,7 @@ xfarray_load(
if (idx >= array->nr)
return -ENODATA;
- return xfile_obj_load(array->xfile, ptr, array->obj_size,
+ return xfile_load(array->xfile, ptr, array->obj_size,
xfarray_pos(array, idx));
}
@@ -152,7 +152,7 @@ xfarray_is_unset(
if (array->unset_slots == 0)
return false;
- error = xfile_obj_load(array->xfile, temp, array->obj_size, pos);
+ error = xfile_load(array->xfile, temp, array->obj_size, pos);
if (!error && xfarray_element_is_null(array, temp))
return true;
@@ -184,7 +184,7 @@ xfarray_unset(
return 0;
memset(temp, 0, array->obj_size);
- error = xfile_obj_store(array->xfile, temp, array->obj_size, pos);
+ error = xfile_store(array->xfile, temp, array->obj_size, pos);
if (error)
return error;
@@ -209,7 +209,7 @@ xfarray_store(
ASSERT(!xfarray_element_is_null(array, ptr));
- ret = xfile_obj_store(array->xfile, ptr, array->obj_size,
+ ret = xfile_store(array->xfile, ptr, array->obj_size,
xfarray_pos(array, idx));
if (ret)
return ret;
@@ -245,12 +245,12 @@ xfarray_store_anywhere(
for (pos = 0;
pos < endpos && array->unset_slots > 0;
pos += array->obj_size) {
- error = xfile_obj_load(array->xfile, temp, array->obj_size,
+ error = xfile_load(array->xfile, temp, array->obj_size,
pos);
if (error || !xfarray_element_is_null(array, temp))
continue;
- error = xfile_obj_store(array->xfile, ptr, array->obj_size,
+ error = xfile_store(array->xfile, ptr, array->obj_size,
pos);
if (error)
return error;
@@ -552,7 +552,7 @@ xfarray_isort(
trace_xfarray_isort(si, lo, hi);
xfarray_sort_bump_loads(si);
- error = xfile_obj_load(si->array->xfile, scratch, len, lo_pos);
+ error = xfile_load(si->array->xfile, scratch, len, lo_pos);
if (error)
return error;
@@ -560,7 +560,7 @@ xfarray_isort(
sort(scratch, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
xfarray_sort_bump_stores(si);
- return xfile_obj_store(si->array->xfile, scratch, len, lo_pos);
+ return xfile_store(si->array->xfile, scratch, len, lo_pos);
}
/* Grab a page for sorting records. */
@@ -858,7 +858,7 @@ xfarray_sort_load_cached(
if (xfarray_sort_terminated(si, &error))
return error;
- return xfile_obj_load(si->array->xfile, ptr,
+ return xfile_load(si->array->xfile, ptr,
si->array->obj_size, idx_pos);
}
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 87654cdd5ac6f9..d65681372a7458 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -118,13 +118,11 @@ xfile_destroy(
}
/*
- * Read a memory object directly from the xfile's page cache. Unlike regular
- * pread, we return -E2BIG and -EFBIG for reads that are too large or at too
- * high an offset, instead of truncating the read. Otherwise, we return
- * bytes read or an error code, like regular pread.
+ * Load an object. Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
*/
-ssize_t
-xfile_pread(
+int
+xfile_load(
struct xfile *xf,
void *buf,
size_t count,
@@ -133,16 +131,15 @@ xfile_pread(
struct inode *inode = file_inode(xf->file);
struct address_space *mapping = inode->i_mapping;
struct page *page = NULL;
- ssize_t read = 0;
unsigned int pflags;
int error = 0;
if (count > MAX_RW_COUNT)
- return -E2BIG;
+ return -ENOMEM;
if (inode->i_sb->s_maxbytes - pos < count)
- return -EFBIG;
+ return -ENOMEM;
- trace_xfile_pread(xf, pos, count);
+ trace_xfile_load(xf, pos, count);
pflags = memalloc_nofs_save();
while (count > 0) {
@@ -160,8 +157,10 @@ xfile_pread(
__GFP_NOWARN);
if (IS_ERR(page)) {
error = PTR_ERR(page);
- if (error != -ENOMEM)
+ if (error != -ENOMEM) {
+ error = -ENOMEM;
break;
+ }
memset(buf, 0, len);
goto advance;
@@ -185,23 +184,18 @@ xfile_pread(
count -= len;
pos += len;
buf += len;
- read += len;
}
memalloc_nofs_restore(pflags);
- if (read > 0)
- return read;
return error;
}
/*
- * Write a memory object directly to the xfile's page cache. Unlike regular
- * pwrite, we return -E2BIG and -EFBIG for writes that are too large or at too
- * high an offset, instead of truncating the write. Otherwise, we return
- * bytes written or an error code, like regular pwrite.
+ * Store an object. Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
*/
-ssize_t
-xfile_pwrite(
+int
+xfile_store(
struct xfile *xf,
const void *buf,
size_t count,
@@ -211,16 +205,15 @@ xfile_pwrite(
struct address_space *mapping = inode->i_mapping;
const struct address_space_operations *aops = mapping->a_ops;
struct page *page = NULL;
- ssize_t written = 0;
unsigned int pflags;
int error = 0;
if (count > MAX_RW_COUNT)
- return -E2BIG;
+ return -ENOMEM;
if (inode->i_sb->s_maxbytes - pos < count)
- return -EFBIG;
+ return -ENOMEM;
- trace_xfile_pwrite(xf, pos, count);
+ trace_xfile_store(xf, pos, count);
pflags = memalloc_nofs_save();
while (count > 0) {
@@ -239,8 +232,10 @@ xfile_pwrite(
*/
error = aops->write_begin(NULL, mapping, pos, len, &page,
&fsdata);
- if (error)
+ if (error) {
+ error = -ENOMEM;
break;
+ }
/*
* xfile pages must never be mapped into userspace, so we skip
@@ -259,13 +254,14 @@ xfile_pwrite(
ret = aops->write_end(NULL, mapping, pos, len, len, page,
fsdata);
if (ret < 0) {
- error = ret;
+ error = -ENOMEM;
break;
}
- written += ret;
- if (ret != len)
+ if (ret != len) {
+ error = -ENOMEM;
break;
+ }
count -= ret;
pos += ret;
@@ -273,8 +269,6 @@ xfile_pwrite(
}
memalloc_nofs_restore(pflags);
- if (written > 0)
- return written;
return error;
}
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index c602d11560d8ee..465b10f492b66d 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -29,38 +29,10 @@ struct xfile {
int xfile_create(const char *description, loff_t isize, struct xfile **xfilep);
void xfile_destroy(struct xfile *xf);
-ssize_t xfile_pread(struct xfile *xf, void *buf, size_t count, loff_t pos);
-ssize_t xfile_pwrite(struct xfile *xf, const void *buf, size_t count,
+int xfile_load(struct xfile *xf, void *buf, size_t count, loff_t pos);
+int xfile_store(struct xfile *xf, const void *buf, size_t count,
loff_t pos);
-/*
- * Load an object. Since we're treating this file as "memory", any error or
- * short IO is treated as a failure to allocate memory.
- */
-static inline int
-xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos)
-{
- ssize_t ret = xfile_pread(xf, buf, count, pos);
-
- if (ret < 0 || ret != count)
- return -ENOMEM;
- return 0;
-}
-
-/*
- * Store an object. Since we're treating this file as "memory", any error or
- * short IO is treated as a failure to allocate memory.
- */
-static inline int
-xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
-{
- ssize_t ret = xfile_pwrite(xf, buf, count, pos);
-
- if (ret < 0 || ret != count)
- return -ENOMEM;
- return 0;
-}
-
loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 10/20] xfs: don't try to handle non-uptodate pages in xfile_obj_load
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (8 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 09/20] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:34 ` [PATCH 11/20] xfs: shmem_file_setup can't return NULL Christoph Hellwig
` (9 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
shmem_read_mapping_page_gfp always returns an uptodate page or an
ERR_PTR. Remove the code that tries to handle a non-uptodate page.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/xfile.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index d65681372a7458..71c4102f3305fe 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -166,18 +166,14 @@ xfile_load(
goto advance;
}
- if (PageUptodate(page)) {
- /*
- * xfile pages must never be mapped into userspace, so
- * we skip the dcache flush.
- */
- kaddr = kmap_local_page(page);
- p = kaddr + offset_in_page(pos);
- memcpy(buf, p, len);
- kunmap_local(kaddr);
- } else {
- memset(buf, 0, len);
- }
+ /*
+ * xfile pages must never be mapped into userspace, so
+ * we skip the dcache flush.
+ */
+ kaddr = kmap_local_page(page);
+ p = kaddr + offset_in_page(pos);
+ memcpy(buf, p, len);
+ kunmap_local(kaddr);
put_page(page);
advance:
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 11/20] xfs: shmem_file_setup can't return NULL
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (9 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 10/20] xfs: don't try to handle non-uptodate pages in xfile_obj_load Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:34 ` [PATCH 12/20] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
` (8 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
shmem_file_setup always returns a struct file pointer or an ERR_PTR,
so remove the code to check for a NULL return.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/xfile.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 71c4102f3305fe..7785afacf21809 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -62,15 +62,13 @@ xfile_create(
{
struct inode *inode;
struct xfile *xf;
- int error = -ENOMEM;
+ int error;
xf = kmalloc(sizeof(struct xfile), XCHK_GFP_FLAGS);
if (!xf)
return -ENOMEM;
xf->file = shmem_file_setup(description, isize, 0);
- if (!xf->file)
- goto out_xfile;
if (IS_ERR(xf->file)) {
error = PTR_ERR(xf->file);
goto out_xfile;
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 12/20] xfs: don't modify file and inode flags for shmem files
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (10 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 11/20] xfs: shmem_file_setup can't return NULL Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-02-16 6:55 ` Hugh Dickins
2024-01-29 14:34 ` [PATCH 13/20] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
` (7 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
shmem_file_setup is explicitly intended for a file that can be
fully read and written by kernel users without restrictions. Don't
poke into internals to change random flags in the file or inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/xfile.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 7785afacf21809..7e915385ef0011 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -68,28 +68,13 @@ xfile_create(
if (!xf)
return -ENOMEM;
- xf->file = shmem_file_setup(description, isize, 0);
+ xf->file = shmem_kernel_file_setup(description, isize, 0);
if (IS_ERR(xf->file)) {
error = PTR_ERR(xf->file);
goto out_xfile;
}
- /*
- * We want a large sparse file that we can pread, pwrite, and seek.
- * xfile users are responsible for keeping the xfile hidden away from
- * all other callers, so we skip timestamp updates and security checks.
- * Make the inode only accessible by root, just in case the xfile ever
- * escapes.
- */
- xf->file->f_mode |= FMODE_PREAD | FMODE_PWRITE | FMODE_NOCMTIME |
- FMODE_LSEEK;
- xf->file->f_flags |= O_RDWR | O_LARGEFILE | O_NOATIME;
inode = file_inode(xf->file);
- inode->i_flags |= S_PRIVATE | S_NOCMTIME | S_NOATIME;
- inode->i_mode &= ~0177;
- inode->i_uid = GLOBAL_ROOT_UID;
- inode->i_gid = GLOBAL_ROOT_GID;
-
lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
trace_xfile_create(xf);
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH 12/20] xfs: don't modify file and inode flags for shmem files
2024-01-29 14:34 ` [PATCH 12/20] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
@ 2024-02-16 6:55 ` Hugh Dickins
0 siblings, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2024-02-16 6:55 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
linux-xfs, linux-mm
On Mon, 29 Jan 2024, Christoph Hellwig wrote:
> shmem_file_setup is explicitly intended for a file that can be
> fully read and written by kernel users without restrictions. Don't
> poke into internals to change random flags in the file or inode.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
> fs/xfs/scrub/xfile.c | 17 +----------------
> 1 file changed, 1 insertion(+), 16 deletions(-)
>
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 7785afacf21809..7e915385ef0011 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -68,28 +68,13 @@ xfile_create(
> if (!xf)
> return -ENOMEM;
>
> - xf->file = shmem_file_setup(description, isize, 0);
> + xf->file = shmem_kernel_file_setup(description, isize, 0);
I think you probably want to say VM_NORESERVE there, instead of 0.
I see from the (deleted) comment below that "We want a large sparse file",
and the VM_NORESERVE does become important on large sparse files. It is
how normal flles and memfds are set up (whereas SysV SHM says 0 to reserve
size at setup). It affects the Committed_AS seen in /proc/meminfo, and how
__vm_enough_memory() behaves (depending on /proc/sys/vm/overcommit_memory).
Hmm, I think mm/shmem.c is not prepared for the case of flags 0 there,
and then the file growing bigger than isize later on - I suspect that
Committed_AS will end up underflowing. shmem.c ought to be defensive
against that, sure, but I don't think such a case has come up before.
I see you have two calls to xfile_create(), one with what looks like
a known fixed non-0 isize, but one with isize 0 which will grow later.
> if (IS_ERR(xf->file)) {
> error = PTR_ERR(xf->file);
> goto out_xfile;
> }
>
> - /*
> - * We want a large sparse file that we can pread, pwrite, and seek.
> - * xfile users are responsible for keeping the xfile hidden away from
> - * all other callers, so we skip timestamp updates and security checks.
> - * Make the inode only accessible by root, just in case the xfile ever
> - * escapes.
> - */
> - xf->file->f_mode |= FMODE_PREAD | FMODE_PWRITE | FMODE_NOCMTIME |
> - FMODE_LSEEK;
> - xf->file->f_flags |= O_RDWR | O_LARGEFILE | O_NOATIME;
I've forgotten offhand how O_LARGEFILE propagates down to where a file
gets grown, and I realize that 32-bit is not your primary concern: but
please double-check that it works correctly without O_LARGEFILE now
(maybe the new folio ops avoid everywhere that largefile would be checked).
Hugh
> inode = file_inode(xf->file);
> - inode->i_flags |= S_PRIVATE | S_NOCMTIME | S_NOATIME;
> - inode->i_mode &= ~0177;
> - inode->i_uid = GLOBAL_ROOT_UID;
> - inode->i_gid = GLOBAL_ROOT_GID;
> -
> lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
>
> trace_xfile_create(xf);
> --
> 2.39.2
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 13/20] xfs: don't allow highmem pages in xfile mappings
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (11 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 12/20] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-02-16 7:13 ` Hugh Dickins
2024-01-29 14:34 ` [PATCH 14/20] xfs: use shmem_get_folio in xfile_obj_store Christoph Hellwig
` (6 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
XFS is generally used on 64-bit, non-highmem platforms and xfile
mappings are accessed all the time. Reduce our pain by not allowing
any highmem mappings in the xfile page cache and remove all the kmap
calls for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/xfarray.c | 3 +--
fs/xfs/scrub/xfile.c | 21 +++++++++------------
2 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 95ac14bceeadd6..d0f98a43b2ba0a 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -580,7 +580,7 @@ xfarray_sort_get_page(
* xfile pages must never be mapped into userspace, so we skip the
* dcache flush when mapping the page.
*/
- si->page_kaddr = kmap_local_page(si->xfpage.page);
+ si->page_kaddr = page_address(si->xfpage.page);
return 0;
}
@@ -592,7 +592,6 @@ xfarray_sort_put_page(
if (!si->page_kaddr)
return 0;
- kunmap_local(si->page_kaddr);
si->page_kaddr = NULL;
return xfile_put_page(si->array->xfile, &si->xfpage);
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 7e915385ef0011..a669ebbbc02d1d 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -77,6 +77,12 @@ xfile_create(
inode = file_inode(xf->file);
lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
+ /*
+ * We don't want to bother with kmapping data during repair, so don't
+ * allow highmem pages to back this mapping.
+ */
+ mapping_set_gfp_mask(inode->i_mapping, GFP_KERNEL);
+
trace_xfile_create(xf);
*xfilep = xf;
@@ -126,7 +132,6 @@ xfile_load(
pflags = memalloc_nofs_save();
while (count > 0) {
- void *p, *kaddr;
unsigned int len;
len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
@@ -153,10 +158,7 @@ xfile_load(
* xfile pages must never be mapped into userspace, so
* we skip the dcache flush.
*/
- kaddr = kmap_local_page(page);
- p = kaddr + offset_in_page(pos);
- memcpy(buf, p, len);
- kunmap_local(kaddr);
+ memcpy(buf, page_address(page) + offset_in_page(pos), len);
put_page(page);
advance:
@@ -221,14 +223,13 @@ xfile_store(
* the dcache flush. If the page is not uptodate, zero it
* before writing data.
*/
- kaddr = kmap_local_page(page);
+ kaddr = page_address(page);
if (!PageUptodate(page)) {
memset(kaddr, 0, PAGE_SIZE);
SetPageUptodate(page);
}
p = kaddr + offset_in_page(pos);
memcpy(p, buf, len);
- kunmap_local(kaddr);
ret = aops->write_end(NULL, mapping, pos, len, len, page,
fsdata);
@@ -314,11 +315,7 @@ xfile_get_page(
* to the caller and make sure the backing store will hold on to them.
*/
if (!PageUptodate(page)) {
- void *kaddr;
-
- kaddr = kmap_local_page(page);
- memset(kaddr, 0, PAGE_SIZE);
- kunmap_local(kaddr);
+ memset(page_address(page), 0, PAGE_SIZE);
SetPageUptodate(page);
}
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH 13/20] xfs: don't allow highmem pages in xfile mappings
2024-01-29 14:34 ` [PATCH 13/20] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
@ 2024-02-16 7:13 ` Hugh Dickins
0 siblings, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2024-02-16 7:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
Matthew Wilcox, linux-xfs, linux-mm
On Mon, 29 Jan 2024, Christoph Hellwig wrote:
> XFS is generally used on 64-bit, non-highmem platforms and xfile
> mappings are accessed all the time. Reduce our pain by not allowing
> any highmem mappings in the xfile page cache and remove all the kmap
> calls for it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
> fs/xfs/scrub/xfarray.c | 3 +--
> fs/xfs/scrub/xfile.c | 21 +++++++++------------
> 2 files changed, 10 insertions(+), 14 deletions(-)
>
> diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
> index 95ac14bceeadd6..d0f98a43b2ba0a 100644
> --- a/fs/xfs/scrub/xfarray.c
> +++ b/fs/xfs/scrub/xfarray.c
> @@ -580,7 +580,7 @@ xfarray_sort_get_page(
> * xfile pages must never be mapped into userspace, so we skip the
> * dcache flush when mapping the page.
> */
> - si->page_kaddr = kmap_local_page(si->xfpage.page);
> + si->page_kaddr = page_address(si->xfpage.page);
> return 0;
> }
>
> @@ -592,7 +592,6 @@ xfarray_sort_put_page(
> if (!si->page_kaddr)
> return 0;
>
> - kunmap_local(si->page_kaddr);
> si->page_kaddr = NULL;
>
> return xfile_put_page(si->array->xfile, &si->xfpage);
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 7e915385ef0011..a669ebbbc02d1d 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -77,6 +77,12 @@ xfile_create(
> inode = file_inode(xf->file);
> lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
>
> + /*
> + * We don't want to bother with kmapping data during repair, so don't
> + * allow highmem pages to back this mapping.
> + */
> + mapping_set_gfp_mask(inode->i_mapping, GFP_KERNEL);
I had been going to point you to inode_nohighmem(inode), which is how
that is usually done. But I share Matthew's uncertainty about cpusets
and the __GFP_HARDWALL inside GFP_USER: you may well be right to stick
with GFP_KERNEL, just as you've done it here. I suppose it depends on
whether you see this as a system operation or a user-invoked operation.
I did also think that perhaps you could mask off __GFP_FS here, to
avoid having to remember those memalloc_nofs_save()s elsewhere; but
that's probably a bad idea, I doubt anywhere else does it that way.
So, no change required, but I wanted to think about it.
Hugh
> +
> trace_xfile_create(xf);
>
> *xfilep = xf;
> @@ -126,7 +132,6 @@ xfile_load(
>
> pflags = memalloc_nofs_save();
> while (count > 0) {
> - void *p, *kaddr;
> unsigned int len;
>
> len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
> @@ -153,10 +158,7 @@ xfile_load(
> * xfile pages must never be mapped into userspace, so
> * we skip the dcache flush.
> */
> - kaddr = kmap_local_page(page);
> - p = kaddr + offset_in_page(pos);
> - memcpy(buf, p, len);
> - kunmap_local(kaddr);
> + memcpy(buf, page_address(page) + offset_in_page(pos), len);
> put_page(page);
>
> advance:
> @@ -221,14 +223,13 @@ xfile_store(
> * the dcache flush. If the page is not uptodate, zero it
> * before writing data.
> */
> - kaddr = kmap_local_page(page);
> + kaddr = page_address(page);
> if (!PageUptodate(page)) {
> memset(kaddr, 0, PAGE_SIZE);
> SetPageUptodate(page);
> }
> p = kaddr + offset_in_page(pos);
> memcpy(p, buf, len);
> - kunmap_local(kaddr);
>
> ret = aops->write_end(NULL, mapping, pos, len, len, page,
> fsdata);
> @@ -314,11 +315,7 @@ xfile_get_page(
> * to the caller and make sure the backing store will hold on to them.
> */
> if (!PageUptodate(page)) {
> - void *kaddr;
> -
> - kaddr = kmap_local_page(page);
> - memset(kaddr, 0, PAGE_SIZE);
> - kunmap_local(kaddr);
> + memset(page_address(page), 0, PAGE_SIZE);
> SetPageUptodate(page);
> }
>
> --
> 2.39.2
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 14/20] xfs: use shmem_get_folio in xfile_obj_store
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (12 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 13/20] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-02-16 7:40 ` Hugh Dickins
2024-01-29 14:34 ` [PATCH 15/20] xfs: use shmem_get_folio in xfile_load Christoph Hellwig
` (5 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
Switch to using shmem_get_folio and manually dirtying the page instead
of abusing aops->write_begin and aops->write_end in xfile_get_page.
This simplifies the code by not doing indirect calls of not actually
exported interfaces that don't really fit the use case very well, and
happens to get us large folio support for free.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/xfile.c | 73 ++++++++++++++++----------------------------
1 file changed, 27 insertions(+), 46 deletions(-)
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index a669ebbbc02d1d..2b4819902b4cc3 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -183,11 +183,7 @@ xfile_store(
loff_t pos)
{
struct inode *inode = file_inode(xf->file);
- struct address_space *mapping = inode->i_mapping;
- const struct address_space_operations *aops = mapping->a_ops;
- struct page *page = NULL;
unsigned int pflags;
- int error = 0;
if (count > MAX_RW_COUNT)
return -ENOMEM;
@@ -196,60 +192,45 @@ xfile_store(
trace_xfile_store(xf, pos, count);
+ /*
+ * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
+ * actually allocates a folio instead of erroring out.
+ */
+ if (pos + count > i_size_read(inode))
+ i_size_write(inode, pos + count);
+
pflags = memalloc_nofs_save();
while (count > 0) {
- void *fsdata = NULL;
- void *p, *kaddr;
+ struct folio *folio;
unsigned int len;
- int ret;
+ unsigned int offset;
- len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
-
- /*
- * We call write_begin directly here to avoid all the freezer
- * protection lock-taking that happens in the normal path.
- * shmem doesn't support fs freeze, but lockdep doesn't know
- * that and will trip over that.
- */
- error = aops->write_begin(NULL, mapping, pos, len, &page,
- &fsdata);
- if (error) {
- error = -ENOMEM;
+ if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+ SGP_CACHE) < 0)
break;
- }
-
- /*
- * xfile pages must never be mapped into userspace, so we skip
- * the dcache flush. If the page is not uptodate, zero it
- * before writing data.
- */
- kaddr = page_address(page);
- if (!PageUptodate(page)) {
- memset(kaddr, 0, PAGE_SIZE);
- SetPageUptodate(page);
- }
- p = kaddr + offset_in_page(pos);
- memcpy(p, buf, len);
-
- ret = aops->write_end(NULL, mapping, pos, len, len, page,
- fsdata);
- if (ret < 0) {
- error = -ENOMEM;
+ if (filemap_check_wb_err(inode->i_mapping, 0)) {
+ folio_unlock(folio);
+ folio_put(folio);
break;
}
- if (ret != len) {
- error = -ENOMEM;
- break;
- }
+ offset = offset_in_folio(folio, pos);
+ len = min_t(ssize_t, count, folio_size(folio) - offset);
+ memcpy(folio_address(folio) + offset, buf, len);
+
+ folio_mark_dirty(folio);
+ folio_unlock(folio);
+ folio_put(folio);
- count -= ret;
- pos += ret;
- buf += ret;
+ count -= len;
+ pos += len;
+ buf += len;
}
memalloc_nofs_restore(pflags);
- return error;
+ if (count)
+ return -ENOMEM;
+ return 0;
}
/* Find the next written area in the xfile data for a given offset. */
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH 14/20] xfs: use shmem_get_folio in xfile_obj_store
2024-01-29 14:34 ` [PATCH 14/20] xfs: use shmem_get_folio in xfile_obj_store Christoph Hellwig
@ 2024-02-16 7:40 ` Hugh Dickins
0 siblings, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2024-02-16 7:40 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
Matthew Wilcox, linux-xfs, linux-mm
On Mon, 29 Jan 2024, Christoph Hellwig wrote:
> Switch to using shmem_get_folio and manually dirtying the page instead
> of abusing aops->write_begin and aops->write_end in xfile_get_page.
>
> This simplifies the code by not doing indirect calls of not actually
> exported interfaces that don't really fit the use case very well, and
> happens to get us large folio support for free.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
> fs/xfs/scrub/xfile.c | 73 ++++++++++++++++----------------------------
> 1 file changed, 27 insertions(+), 46 deletions(-)
>
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index a669ebbbc02d1d..2b4819902b4cc3 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -183,11 +183,7 @@ xfile_store(
> loff_t pos)
> {
> struct inode *inode = file_inode(xf->file);
> - struct address_space *mapping = inode->i_mapping;
> - const struct address_space_operations *aops = mapping->a_ops;
> - struct page *page = NULL;
> unsigned int pflags;
> - int error = 0;
>
> if (count > MAX_RW_COUNT)
> return -ENOMEM;
> @@ -196,60 +192,45 @@ xfile_store(
>
> trace_xfile_store(xf, pos, count);
>
> + /*
> + * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
> + * actually allocates a folio instead of erroring out.
> + */
> + if (pos + count > i_size_read(inode))
> + i_size_write(inode, pos + count);
> +
> pflags = memalloc_nofs_save();
> while (count > 0) {
> - void *fsdata = NULL;
> - void *p, *kaddr;
> + struct folio *folio;
> unsigned int len;
> - int ret;
> + unsigned int offset;
>
> - len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
> -
> - /*
> - * We call write_begin directly here to avoid all the freezer
> - * protection lock-taking that happens in the normal path.
> - * shmem doesn't support fs freeze, but lockdep doesn't know
> - * that and will trip over that.
> - */
> - error = aops->write_begin(NULL, mapping, pos, len, &page,
> - &fsdata);
> - if (error) {
> - error = -ENOMEM;
> + if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> + SGP_CACHE) < 0)
SGP_CACHE is the safest choice, yes. It will tend to do an unnecessary
clear_highpage() which you immediately overwrite with the actual data;
but saves calculating exactly what needs to be zeroed above and below
the data - not worth bothering with, unless shaving off CPU cycles.
> break;
> - }
> -
> - /*
> - * xfile pages must never be mapped into userspace, so we skip
> - * the dcache flush. If the page is not uptodate, zero it
> - * before writing data.
> - */
> - kaddr = page_address(page);
> - if (!PageUptodate(page)) {
> - memset(kaddr, 0, PAGE_SIZE);
> - SetPageUptodate(page);
> - }
> - p = kaddr + offset_in_page(pos);
> - memcpy(p, buf, len);
> -
> - ret = aops->write_end(NULL, mapping, pos, len, len, page,
> - fsdata);
> - if (ret < 0) {
> - error = -ENOMEM;
> + if (filemap_check_wb_err(inode->i_mapping, 0)) {
Hah. I was sceptical what that could ever achieve on shmem (the "wb"
is misleading); but it's an ingenious suggestion from Matthew, to avoid
our current horrid folio+page HWPoison handling. I followed it up a bit,
and it does look as if this filemap_check_wb_err() technique should work
for that; but it's not something I've tried at all.
And that's all I've got to say on the series (I read on, but certainly
did not delve into the folio sorting stuff): looks good, but the
VM_NORESERVE question probably needs attention (and that docbook
comment to mention "locked").
XFS tree still seems to me the right home for it all.
Hugh
> + folio_unlock(folio);
> + folio_put(folio);
> break;
> }
>
> - if (ret != len) {
> - error = -ENOMEM;
> - break;
> - }
> + offset = offset_in_folio(folio, pos);
> + len = min_t(ssize_t, count, folio_size(folio) - offset);
> + memcpy(folio_address(folio) + offset, buf, len);
> +
> + folio_mark_dirty(folio);
> + folio_unlock(folio);
> + folio_put(folio);
>
> - count -= ret;
> - pos += ret;
> - buf += ret;
> + count -= len;
> + pos += len;
> + buf += len;
> }
> memalloc_nofs_restore(pflags);
>
> - return error;
> + if (count)
> + return -ENOMEM;
> + return 0;
> }
>
> /* Find the next written area in the xfile data for a given offset. */
> --
> 2.39.2
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 15/20] xfs: use shmem_get_folio in in xfile_load
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (13 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 14/20] xfs: use shmem_get_folio in xfile_obj_store Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:34 ` [PATCH 16/20] xfs: add file_{get,put}_folio Christoph Hellwig
` (4 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
Switch to using shmem_get_folio in xfile_load instead of using
shmem_read_mapping_page_gfp. This gets us support for large folios
and also optimized reading from unallocated space, as
shmem_get_folio with SGP_READ won't allocate a page for them just
to zero the content.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/xfile.c | 61 +++++++++++++++++++-------------------------
1 file changed, 26 insertions(+), 35 deletions(-)
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 2b4819902b4cc3..c524fe9980fea2 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -34,13 +34,6 @@
* xfiles assume that the caller will handle all required concurrency
* management; standard vfs locks (freezer and inode) are not taken. Reads
* and writes are satisfied directly from the page cache.
- *
- * NOTE: The current shmemfs implementation has a quirk that in-kernel reads
- * of a hole cause a page to be mapped into the file. If you are going to
- * create a sparse xfile, please be careful about reading from uninitialized
- * parts of the file. These pages are !Uptodate and will eventually be
- * reclaimed if not written, but in the short term this boosts memory
- * consumption.
*/
/*
@@ -118,10 +111,7 @@ xfile_load(
loff_t pos)
{
struct inode *inode = file_inode(xf->file);
- struct address_space *mapping = inode->i_mapping;
- struct page *page = NULL;
unsigned int pflags;
- int error = 0;
if (count > MAX_RW_COUNT)
return -ENOMEM;
@@ -132,43 +122,44 @@ xfile_load(
pflags = memalloc_nofs_save();
while (count > 0) {
+ struct folio *folio;
unsigned int len;
+ unsigned int offset;
- len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
-
- /*
- * In-kernel reads of a shmem file cause it to allocate a page
- * if the mapping shows a hole. Therefore, if we hit ENOMEM
- * we can continue by zeroing the caller's buffer.
- */
- page = shmem_read_mapping_page_gfp(mapping, pos >> PAGE_SHIFT,
- __GFP_NOWARN);
- if (IS_ERR(page)) {
- error = PTR_ERR(page);
- if (error != -ENOMEM) {
- error = -ENOMEM;
+ if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+ SGP_READ) < 0)
+ break;
+ if (!folio) {
+ /*
+ * No data stored at this offset, just zero the output
+ * buffer until the next page boundary.
+ */
+ len = min_t(ssize_t, count,
+ PAGE_SIZE - offset_in_page(pos));
+ memset(buf, 0, len);
+ } else {
+ if (filemap_check_wb_err(inode->i_mapping, 0)) {
+ folio_unlock(folio);
+ folio_put(folio);
break;
}
- memset(buf, 0, len);
- goto advance;
- }
-
- /*
- * xfile pages must never be mapped into userspace, so
- * we skip the dcache flush.
- */
- memcpy(buf, page_address(page) + offset_in_page(pos), len);
- put_page(page);
+ offset = offset_in_folio(folio, pos);
+ len = min_t(ssize_t, count, folio_size(folio) - offset);
+ memcpy(buf, folio_address(folio) + offset, len);
-advance:
+ folio_unlock(folio);
+ folio_put(folio);
+ }
count -= len;
pos += len;
buf += len;
}
memalloc_nofs_restore(pflags);
- return error;
+ if (count)
+ return -ENOMEM;
+ return 0;
}
/*
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 16/20] xfs: add file_{get,put}_folio
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (14 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 15/20] xfs: use shmem_get_folio in in xfile_load Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:34 ` [PATCH 17/20] xfs: remove xfarray_sortinfo.page_kaddr Christoph Hellwig
` (3 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm, Kent Overstreet
From: "Darrick J. Wong" <djwong@kernel.org>
Add helper similar to file_{get,set}_page, but which deal with folios
and don't allocate new folio unless explicitly asked to, which map
to shmem_get_folio instead of calling into the aops.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/xfs/scrub/trace.h | 2 ++
fs/xfs/scrub/xfile.c | 74 ++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/xfile.h | 7 +++++
3 files changed, 83 insertions(+)
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 0327cab606b070..c61fa7a95ef522 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -908,6 +908,8 @@ DEFINE_XFILE_EVENT(xfile_store);
DEFINE_XFILE_EVENT(xfile_seek_data);
DEFINE_XFILE_EVENT(xfile_get_page);
DEFINE_XFILE_EVENT(xfile_put_page);
+DEFINE_XFILE_EVENT(xfile_get_folio);
+DEFINE_XFILE_EVENT(xfile_put_folio);
TRACE_EVENT(xfarray_create,
TP_PROTO(struct xfarray *xfa, unsigned long long required_capacity),
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index c524fe9980fea2..2d59407909de35 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -340,3 +340,77 @@ xfile_put_page(
return -EIO;
return 0;
}
+
+/*
+ * Grab the (locked) folio for a memory object. The object cannot span a folio
+ * boundary. Returns the locked folio if successful, NULL if there was no
+ * folio or it didn't cover the range requested, or an ERR_PTR on failure.
+ */
+struct folio *
+xfile_get_folio(
+ struct xfile *xf,
+ loff_t pos,
+ size_t len,
+ unsigned int flags)
+{
+ struct inode *inode = file_inode(xf->file);
+ struct folio *folio = NULL;
+ unsigned int pflags;
+ int error;
+
+ if (inode->i_sb->s_maxbytes - pos < len)
+ return ERR_PTR(-ENOMEM);
+
+ trace_xfile_get_folio(xf, pos, len);
+
+ /*
+ * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
+ * actually allocates a folio instead of erroring out.
+ */
+ if ((flags & XFILE_ALLOC) && pos + len > i_size_read(inode))
+ i_size_write(inode, pos + len);
+
+ pflags = memalloc_nofs_save();
+ error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+ (flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
+ memalloc_nofs_restore(pflags);
+ if (error)
+ return ERR_PTR(error);
+
+ if (!folio)
+ return NULL;
+
+ if (len > folio_size(folio) - offset_in_folio(folio, pos)) {
+ folio_unlock(folio);
+ folio_put(folio);
+ return NULL;
+ }
+
+ if (filemap_check_wb_err(inode->i_mapping, 0)) {
+ folio_unlock(folio);
+ folio_put(folio);
+ return ERR_PTR(-EIO);
+ }
+
+ /*
+ * Mark the folio dirty so that it won't be reclaimed once we drop the
+ * (potentially last) reference in xfile_put_folio.
+ */
+ if (flags & XFILE_ALLOC)
+ folio_set_dirty(folio);
+ return folio;
+}
+
+/*
+ * Release the (locked) folio for a memory object.
+ */
+void
+xfile_put_folio(
+ struct xfile *xf,
+ struct folio *folio)
+{
+ trace_xfile_put_folio(xf, folio_pos(folio), folio_size(folio));
+
+ folio_unlock(folio);
+ folio_put(folio);
+}
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index 465b10f492b66d..afb75e9fbaf265 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -39,4 +39,11 @@ int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
struct xfile_page *xbuf);
int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
+#define XFILE_MAX_FOLIO_SIZE (PAGE_SIZE << MAX_PAGECACHE_ORDER)
+
+#define XFILE_ALLOC (1 << 0) /* allocate folio if not present */
+struct folio *xfile_get_folio(struct xfile *xf, loff_t offset, size_t len,
+ unsigned int flags);
+void xfile_put_folio(struct xfile *xf, struct folio *folio);
+
#endif /* __XFS_SCRUB_XFILE_H__ */
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 17/20] xfs: remove xfarray_sortinfo.page_kaddr
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (15 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 16/20] xfs: add file_{get,put}_folio Christoph Hellwig
@ 2024-01-29 14:34 ` Christoph Hellwig
2024-01-29 14:35 ` [PATCH 18/20] xfs: fix a comment in xfarray.c Christoph Hellwig
` (2 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:34 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
Now that xfile pages don't need kmapping, there is no need to cache
the kernel virtual address for them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/xfarray.c | 22 ++++------------------
fs/xfs/scrub/xfarray.h | 1 -
2 files changed, 4 insertions(+), 19 deletions(-)
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index d0f98a43b2ba0a..82b2a35a8e8630 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -570,18 +570,7 @@ xfarray_sort_get_page(
loff_t pos,
uint64_t len)
{
- int error;
-
- error = xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
- if (error)
- return error;
-
- /*
- * xfile pages must never be mapped into userspace, so we skip the
- * dcache flush when mapping the page.
- */
- si->page_kaddr = page_address(si->xfpage.page);
- return 0;
+ return xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
}
/* Release a page we grabbed for sorting records. */
@@ -589,11 +578,8 @@ static inline int
xfarray_sort_put_page(
struct xfarray_sortinfo *si)
{
- if (!si->page_kaddr)
+ if (!xfile_page_cached(&si->xfpage))
return 0;
-
- si->page_kaddr = NULL;
-
return xfile_put_page(si->array->xfile, &si->xfpage);
}
@@ -636,7 +622,7 @@ xfarray_pagesort(
return error;
xfarray_sort_bump_heapsorts(si);
- startp = si->page_kaddr + offset_in_page(lo_pos);
+ startp = page_address(si->xfpage.page) + offset_in_page(lo_pos);
sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
xfarray_sort_bump_stores(si);
@@ -883,7 +869,7 @@ xfarray_sort_load_cached(
return error;
}
- memcpy(ptr, si->page_kaddr + offset_in_page(idx_pos),
+ memcpy(ptr, page_address(si->xfpage.page) + offset_in_page(idx_pos),
si->array->obj_size);
return 0;
}
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
index 62b9c506fdd1b7..6f2862054e194d 100644
--- a/fs/xfs/scrub/xfarray.h
+++ b/fs/xfs/scrub/xfarray.h
@@ -107,7 +107,6 @@ struct xfarray_sortinfo {
/* Cache a page here for faster access. */
struct xfile_page xfpage;
- void *page_kaddr;
#ifdef DEBUG
/* Performance statistics. */
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 18/20] xfs: fix a comment in xfarray.c
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (16 preceding siblings ...)
2024-01-29 14:34 ` [PATCH 17/20] xfs: remove xfarray_sortinfo.page_kaddr Christoph Hellwig
@ 2024-01-29 14:35 ` Christoph Hellwig
2024-01-29 14:35 ` [PATCH 19/20] xfs: convert xfarray_pagesort to deal with large folios Christoph Hellwig
2024-01-29 14:35 ` [PATCH 20/20] xfs: remove xfile_{get,put}_page Christoph Hellwig
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:35 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
xfiles are shmem files, not memfds.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/scrub/xfarray.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 82b2a35a8e8630..379e1db22269c7 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -16,7 +16,7 @@
* Large Arrays of Fixed-Size Records
* ==================================
*
- * This memory array uses an xfile (which itself is a memfd "file") to store
+ * This memory array uses an xfile (which itself is a shmem file) to store
* large numbers of fixed-size records in memory that can be paged out. This
* puts less stress on the memory reclaim algorithms during an online repair
* because we don't have to pin so much memory. However, array access is less
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 19/20] xfs: convert xfarray_pagesort to deal with large folios
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (17 preceding siblings ...)
2024-01-29 14:35 ` [PATCH 18/20] xfs: fix a comment in xfarray.c Christoph Hellwig
@ 2024-01-29 14:35 ` Christoph Hellwig
2024-01-29 14:35 ` [PATCH 20/20] xfs: remove xfile_{get,put}_page Christoph Hellwig
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:35 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm, Kent Overstreet
From: "Darrick J. Wong" <djwong@kernel.org>
Convert xfarray_pagesort to handle large folios by introducing a new
xfile_get_folio routine that can return a folio of arbitrary size, and
using heapsort on the full folio. This also corrects an off-by-one bug
in the calculation of len in xfarray_pagesort that was papered over by
xfarray_want_pagesort.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/xfs/scrub/trace.h | 43 ++++++++-
fs/xfs/scrub/xfarray.c | 201 +++++++++++++++++++----------------------
fs/xfs/scrub/xfarray.h | 10 +-
3 files changed, 143 insertions(+), 111 deletions(-)
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index c61fa7a95ef522..3a1a827828dcb9 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -956,7 +956,7 @@ TRACE_EVENT(xfarray_isort,
__entry->hi - __entry->lo)
);
-TRACE_EVENT(xfarray_pagesort,
+TRACE_EVENT(xfarray_foliosort,
TP_PROTO(struct xfarray_sortinfo *si, uint64_t lo, uint64_t hi),
TP_ARGS(si, lo, hi),
TP_STRUCT__entry(
@@ -1027,6 +1027,47 @@ TRACE_EVENT(xfarray_sort,
__entry->bytes)
);
+TRACE_EVENT(xfarray_sort_scan,
+ TP_PROTO(struct xfarray_sortinfo *si, unsigned long long idx),
+ TP_ARGS(si, idx),
+ TP_STRUCT__entry(
+ __field(unsigned long, ino)
+ __field(unsigned long long, nr)
+ __field(size_t, obj_size)
+ __field(unsigned long long, idx)
+ __field(unsigned long long, folio_pos)
+ __field(unsigned long, folio_bytes)
+ __field(unsigned long long, first_idx)
+ __field(unsigned long long, last_idx)
+ ),
+ TP_fast_assign(
+ __entry->nr = si->array->nr;
+ __entry->obj_size = si->array->obj_size;
+ __entry->ino = file_inode(si->array->xfile->file)->i_ino;
+ __entry->idx = idx;
+ if (si->folio) {
+ __entry->folio_pos = folio_pos(si->folio);
+ __entry->folio_bytes = folio_size(si->folio);
+ __entry->first_idx = si->first_folio_idx;
+ __entry->last_idx = si->last_folio_idx;
+ } else {
+ __entry->folio_pos = 0;
+ __entry->folio_bytes = 0;
+ __entry->first_idx = 0;
+ __entry->last_idx = 0;
+ }
+ ),
+ TP_printk("xfino 0x%lx nr %llu objsz %zu idx %llu folio_pos 0x%llx folio_bytes 0x%lx first_idx %llu last_idx %llu",
+ __entry->ino,
+ __entry->nr,
+ __entry->obj_size,
+ __entry->idx,
+ __entry->folio_pos,
+ __entry->folio_bytes,
+ __entry->first_idx,
+ __entry->last_idx)
+);
+
TRACE_EVENT(xfarray_sort_stats,
TP_PROTO(struct xfarray_sortinfo *si, int error),
TP_ARGS(si, error),
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 379e1db22269c7..17c982a4821d47 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -563,70 +563,42 @@ xfarray_isort(
return xfile_store(si->array->xfile, scratch, len, lo_pos);
}
-/* Grab a page for sorting records. */
-static inline int
-xfarray_sort_get_page(
- struct xfarray_sortinfo *si,
- loff_t pos,
- uint64_t len)
-{
- return xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
-}
-
-/* Release a page we grabbed for sorting records. */
-static inline int
-xfarray_sort_put_page(
- struct xfarray_sortinfo *si)
-{
- if (!xfile_page_cached(&si->xfpage))
- return 0;
- return xfile_put_page(si->array->xfile, &si->xfpage);
-}
-
-/* Decide if these records are eligible for in-page sorting. */
-static inline bool
-xfarray_want_pagesort(
- struct xfarray_sortinfo *si,
- xfarray_idx_t lo,
- xfarray_idx_t hi)
-{
- pgoff_t lo_page;
- pgoff_t hi_page;
- loff_t end_pos;
-
- /* We can only map one page at a time. */
- lo_page = xfarray_pos(si->array, lo) >> PAGE_SHIFT;
- end_pos = xfarray_pos(si->array, hi) + si->array->obj_size - 1;
- hi_page = end_pos >> PAGE_SHIFT;
-
- return lo_page == hi_page;
-}
-
-/* Sort a bunch of records that all live in the same memory page. */
+/*
+ * Sort the records from lo to hi (inclusive) if they are all backed by the
+ * same memory folio. Returns 1 if it sorted, 0 if it did not, or a negative
+ * errno.
+ */
STATIC int
-xfarray_pagesort(
+xfarray_foliosort(
struct xfarray_sortinfo *si,
xfarray_idx_t lo,
xfarray_idx_t hi)
{
+ struct folio *folio;
void *startp;
loff_t lo_pos = xfarray_pos(si->array, lo);
- uint64_t len = xfarray_pos(si->array, hi - lo);
- int error = 0;
+ uint64_t len = xfarray_pos(si->array, hi - lo + 1);
- trace_xfarray_pagesort(si, lo, hi);
+ /* No single folio could back this many records. */
+ if (len > XFILE_MAX_FOLIO_SIZE)
+ return 0;
xfarray_sort_bump_loads(si);
- error = xfarray_sort_get_page(si, lo_pos, len);
- if (error)
- return error;
+ folio = xfile_get_folio(si->array->xfile, lo_pos, len, XFILE_ALLOC);
+ if (IS_ERR(folio))
+ return PTR_ERR(folio);
+ if (!folio)
+ return 0;
+
+ trace_xfarray_foliosort(si, lo, hi);
xfarray_sort_bump_heapsorts(si);
- startp = page_address(si->xfpage.page) + offset_in_page(lo_pos);
+ startp = folio_address(folio) + offset_in_folio(folio, lo_pos);
sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
xfarray_sort_bump_stores(si);
- return xfarray_sort_put_page(si);
+ xfile_put_folio(si->array->xfile, folio);
+ return 1;
}
/* Return a pointer to the xfarray pivot record within the sortinfo struct. */
@@ -814,63 +786,78 @@ xfarray_qsort_push(
return 0;
}
+static inline void
+xfarray_sort_scan_done(
+ struct xfarray_sortinfo *si)
+{
+ if (si->folio)
+ xfile_put_folio(si->array->xfile, si->folio);
+ si->folio = NULL;
+}
+
/*
- * Load an element from the array into the first scratchpad and cache the page,
- * if possible.
+ * Cache the folio backing the start of the given array element. If the array
+ * element is contained entirely within the folio, return a pointer to the
+ * cached folio. Otherwise, load the element into the scratchpad and return a
+ * pointer to the scratchpad.
*/
static inline int
-xfarray_sort_load_cached(
+xfarray_sort_scan(
struct xfarray_sortinfo *si,
xfarray_idx_t idx,
- void *ptr)
+ void **ptrp)
{
loff_t idx_pos = xfarray_pos(si->array, idx);
- pgoff_t startpage;
- pgoff_t endpage;
int error = 0;
- /*
- * If this load would split a page, release the cached page, if any,
- * and perform a traditional read.
- */
- startpage = idx_pos >> PAGE_SHIFT;
- endpage = (idx_pos + si->array->obj_size - 1) >> PAGE_SHIFT;
- if (startpage != endpage) {
- error = xfarray_sort_put_page(si);
- if (error)
- return error;
+ if (xfarray_sort_terminated(si, &error))
+ return error;
- if (xfarray_sort_terminated(si, &error))
- return error;
+ trace_xfarray_sort_scan(si, idx);
- return xfile_load(si->array->xfile, ptr,
- si->array->obj_size, idx_pos);
- }
+ /* If the cached folio doesn't cover this index, release it. */
+ if (si->folio &&
+ (idx < si->first_folio_idx || idx > si->last_folio_idx))
+ xfarray_sort_scan_done(si);
- /* If the cached page is not the one we want, release it. */
- if (xfile_page_cached(&si->xfpage) &&
- xfile_page_index(&si->xfpage) != startpage) {
- error = xfarray_sort_put_page(si);
- if (error)
- return error;
+ /* Grab the first folio that backs this array element. */
+ if (!si->folio) {
+ loff_t next_pos;
+
+ si->folio = xfile_get_folio(si->array->xfile, idx_pos,
+ si->array->obj_size, XFILE_ALLOC);
+ if (IS_ERR(si->folio))
+ return PTR_ERR(si->folio);
+
+ si->first_folio_idx = xfarray_idx(si->array,
+ folio_pos(si->folio) + si->array->obj_size - 1);
+
+ next_pos = folio_pos(si->folio) + folio_size(si->folio);
+ si->last_folio_idx = xfarray_idx(si->array, next_pos - 1);
+ if (xfarray_pos(si->array, si->last_folio_idx + 1) > next_pos)
+ si->last_folio_idx--;
+
+ trace_xfarray_sort_scan(si, idx);
}
/*
- * If we don't have a cached page (and we know the load is contained
- * in a single page) then grab it.
+ * If this folio still doesn't cover the desired element, it must cross
+ * a folio boundary. Read into the scratchpad and we're done.
*/
- if (!xfile_page_cached(&si->xfpage)) {
- if (xfarray_sort_terminated(si, &error))
- return error;
+ if (idx < si->first_folio_idx || idx > si->last_folio_idx) {
+ void *temp = xfarray_scratch(si->array);
- error = xfarray_sort_get_page(si, startpage << PAGE_SHIFT,
- PAGE_SIZE);
+ error = xfile_load(si->array->xfile, temp, si->array->obj_size,
+ idx_pos);
if (error)
return error;
+
+ *ptrp = temp;
+ return 0;
}
- memcpy(ptr, page_address(si->xfpage.page) + offset_in_page(idx_pos),
- si->array->obj_size);
+ /* Otherwise return a pointer to the array element in the folio. */
+ *ptrp = folio_address(si->folio) + offset_in_folio(si->folio, idx_pos);
return 0;
}
@@ -937,6 +924,8 @@ xfarray_sort(
pivot = xfarray_sortinfo_pivot(si);
while (si->stack_depth >= 0) {
+ int ret;
+
lo = si_lo[si->stack_depth];
hi = si_hi[si->stack_depth];
@@ -949,13 +938,13 @@ xfarray_sort(
}
/*
- * If directly mapping the page and sorting can solve our
+ * If directly mapping the folio and sorting can solve our
* problems, we're done.
*/
- if (xfarray_want_pagesort(si, lo, hi)) {
- error = xfarray_pagesort(si, lo, hi);
- if (error)
- goto out_free;
+ ret = xfarray_foliosort(si, lo, hi);
+ if (ret < 0)
+ goto out_free;
+ if (ret == 1) {
si->stack_depth--;
continue;
}
@@ -980,25 +969,24 @@ xfarray_sort(
* than the pivot is on the right side of the range.
*/
while (lo < hi) {
+ void *p;
+
/*
* Decrement hi until it finds an a[hi] less than the
* pivot value.
*/
- error = xfarray_sort_load_cached(si, hi, scratch);
+ error = xfarray_sort_scan(si, hi, &p);
if (error)
goto out_free;
- while (xfarray_sort_cmp(si, scratch, pivot) >= 0 &&
- lo < hi) {
+ while (xfarray_sort_cmp(si, p, pivot) >= 0 && lo < hi) {
hi--;
- error = xfarray_sort_load_cached(si, hi,
- scratch);
+ error = xfarray_sort_scan(si, hi, &p);
if (error)
goto out_free;
}
- error = xfarray_sort_put_page(si);
- if (error)
- goto out_free;
-
+ if (p != scratch)
+ memcpy(scratch, p, si->array->obj_size);
+ xfarray_sort_scan_done(si);
if (xfarray_sort_terminated(si, &error))
goto out_free;
@@ -1013,21 +1001,18 @@ xfarray_sort(
* Increment lo until it finds an a[lo] greater than
* the pivot value.
*/
- error = xfarray_sort_load_cached(si, lo, scratch);
+ error = xfarray_sort_scan(si, lo, &p);
if (error)
goto out_free;
- while (xfarray_sort_cmp(si, scratch, pivot) <= 0 &&
- lo < hi) {
+ while (xfarray_sort_cmp(si, p, pivot) <= 0 && lo < hi) {
lo++;
- error = xfarray_sort_load_cached(si, lo,
- scratch);
+ error = xfarray_sort_scan(si, lo, &p);
if (error)
goto out_free;
}
- error = xfarray_sort_put_page(si);
- if (error)
- goto out_free;
-
+ if (p != scratch)
+ memcpy(scratch, p, si->array->obj_size);
+ xfarray_sort_scan_done(si);
if (xfarray_sort_terminated(si, &error))
goto out_free;
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
index 6f2862054e194d..ec643cc9fc1432 100644
--- a/fs/xfs/scrub/xfarray.h
+++ b/fs/xfs/scrub/xfarray.h
@@ -105,8 +105,14 @@ struct xfarray_sortinfo {
/* XFARRAY_SORT_* flags; see below. */
unsigned int flags;
- /* Cache a page here for faster access. */
- struct xfile_page xfpage;
+ /* Cache a folio here for faster scanning for pivots */
+ struct folio *folio;
+
+ /* First array index in folio that is completely readable */
+ xfarray_idx_t first_folio_idx;
+
+ /* Last array index in folio that is completely readable */
+ xfarray_idx_t last_folio_idx;
#ifdef DEBUG
/* Performance statistics. */
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 20/20] xfs: remove xfile_{get,put}_page
2024-01-29 14:34 put the xfs xfile abstraction on a diet v3 Christoph Hellwig
` (18 preceding siblings ...)
2024-01-29 14:35 ` [PATCH 19/20] xfs: convert xfarray_pagesort to deal with large folios Christoph Hellwig
@ 2024-01-29 14:35 ` Christoph Hellwig
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-01-29 14:35 UTC (permalink / raw)
To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
Cc: linux-xfs, linux-mm
From: "Darrick J. Wong" <djwong@kernel.org>
These functions aren't used anymore, so get rid of them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
.../xfs/xfs-online-fsck-design.rst | 2 +-
fs/xfs/scrub/trace.h | 2 -
fs/xfs/scrub/xfile.c | 104 ------------------
fs/xfs/scrub/xfile.h | 20 ----
4 files changed, 1 insertion(+), 127 deletions(-)
diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 216c99ce511f7c..d03480266e9b18 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -1940,7 +1940,7 @@ mapping it into kernel address space, and dropping the folio lock.
These long term users *must* be responsive to memory reclaim by hooking into
the shrinker infrastructure to know when to release folios.
-The ``xfile_get_page`` and ``xfile_put_page`` functions are provided to
+The ``xfile_get_folio`` and ``xfile_put_folio`` functions are provided to
retrieve the (locked) folio that backs part of an xfile and to release it.
The only code to use these folio lease functions are the xfarray
:ref:`sorting<xfarray_sort>` algorithms and the :ref:`in-memory
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 3a1a827828dcb9..ae6b2385a8cbe5 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -906,8 +906,6 @@ DEFINE_EVENT(xfile_class, name, \
DEFINE_XFILE_EVENT(xfile_load);
DEFINE_XFILE_EVENT(xfile_store);
DEFINE_XFILE_EVENT(xfile_seek_data);
-DEFINE_XFILE_EVENT(xfile_get_page);
-DEFINE_XFILE_EVENT(xfile_put_page);
DEFINE_XFILE_EVENT(xfile_get_folio);
DEFINE_XFILE_EVENT(xfile_put_folio);
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 2d59407909de35..4716869147e3c0 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -237,110 +237,6 @@ xfile_seek_data(
return ret;
}
-/*
- * Grab the (locked) page for a memory object. The object cannot span a page
- * boundary. Returns 0 (and a locked page) if successful, -ENOTBLK if we
- * cannot grab the page, or the usual negative errno.
- */
-int
-xfile_get_page(
- struct xfile *xf,
- loff_t pos,
- unsigned int len,
- struct xfile_page *xfpage)
-{
- struct inode *inode = file_inode(xf->file);
- struct address_space *mapping = inode->i_mapping;
- const struct address_space_operations *aops = mapping->a_ops;
- struct page *page = NULL;
- void *fsdata = NULL;
- loff_t key = round_down(pos, PAGE_SIZE);
- unsigned int pflags;
- int error;
-
- if (inode->i_sb->s_maxbytes - pos < len)
- return -ENOMEM;
- if (len > PAGE_SIZE - offset_in_page(pos))
- return -ENOTBLK;
-
- trace_xfile_get_page(xf, pos, len);
-
- pflags = memalloc_nofs_save();
-
- /*
- * We call write_begin directly here to avoid all the freezer
- * protection lock-taking that happens in the normal path. shmem
- * doesn't support fs freeze, but lockdep doesn't know that and will
- * trip over that.
- */
- error = aops->write_begin(NULL, mapping, key, PAGE_SIZE, &page,
- &fsdata);
- if (error)
- goto out_pflags;
-
- /* We got the page, so make sure we push out EOF. */
- if (i_size_read(inode) < pos + len)
- i_size_write(inode, pos + len);
-
- /*
- * If the page isn't up to date, fill it with zeroes before we hand it
- * to the caller and make sure the backing store will hold on to them.
- */
- if (!PageUptodate(page)) {
- memset(page_address(page), 0, PAGE_SIZE);
- SetPageUptodate(page);
- }
-
- /*
- * Mark each page dirty so that the contents are written to some
- * backing store when we drop this buffer, and take an extra reference
- * to prevent the xfile page from being swapped or removed from the
- * page cache by reclaim if the caller unlocks the page.
- */
- set_page_dirty(page);
- get_page(page);
-
- xfpage->page = page;
- xfpage->fsdata = fsdata;
- xfpage->pos = key;
-out_pflags:
- memalloc_nofs_restore(pflags);
- return error;
-}
-
-/*
- * Release the (locked) page for a memory object. Returns 0 or a negative
- * errno.
- */
-int
-xfile_put_page(
- struct xfile *xf,
- struct xfile_page *xfpage)
-{
- struct inode *inode = file_inode(xf->file);
- struct address_space *mapping = inode->i_mapping;
- const struct address_space_operations *aops = mapping->a_ops;
- unsigned int pflags;
- int ret;
-
- trace_xfile_put_page(xf, xfpage->pos, PAGE_SIZE);
-
- /* Give back the reference that we took in xfile_get_page. */
- put_page(xfpage->page);
-
- pflags = memalloc_nofs_save();
- ret = aops->write_end(NULL, mapping, xfpage->pos, PAGE_SIZE, PAGE_SIZE,
- xfpage->page, xfpage->fsdata);
- memalloc_nofs_restore(pflags);
- memset(xfpage, 0, sizeof(struct xfile_page));
-
- if (ret < 0)
- return ret;
- if (ret != PAGE_SIZE)
- return -EIO;
- return 0;
-}
-
/*
* Grab the (locked) folio for a memory object. The object cannot span a folio
* boundary. Returns the locked folio if successful, NULL if there was no
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index afb75e9fbaf265..76d78dba7e3478 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -6,22 +6,6 @@
#ifndef __XFS_SCRUB_XFILE_H__
#define __XFS_SCRUB_XFILE_H__
-struct xfile_page {
- struct page *page;
- void *fsdata;
- loff_t pos;
-};
-
-static inline bool xfile_page_cached(const struct xfile_page *xfpage)
-{
- return xfpage->page != NULL;
-}
-
-static inline pgoff_t xfile_page_index(const struct xfile_page *xfpage)
-{
- return xfpage->page->index;
-}
-
struct xfile {
struct file *file;
};
@@ -35,10 +19,6 @@ int xfile_store(struct xfile *xf, const void *buf, size_t count,
loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
-int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
- struct xfile_page *xbuf);
-int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
-
#define XFILE_MAX_FOLIO_SIZE (PAGE_SIZE << MAX_PAGECACHE_ORDER)
#define XFILE_ALLOC (1 << 0) /* allocate folio if not present */
--
2.39.2
^ permalink raw reply related [flat|nested] 32+ messages in thread