Re: [PATCH 0/3] Create large folios in iomap buffered write path

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Wang Yugui <wangyugui@e16-tech.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J . Wong" <djwong@kernel.org>
Subject: Re: [PATCH 0/3] Create large folios in iomap buffered write path
Date: Wed, 31 May 2023 12:34:51 +0800	[thread overview]
Message-ID: <20230531123450.7DC8.409509F4@e16-tech.com> (raw)
In-Reply-To: <20230521194053.8CD1.409509F4@e16-tech.com>

[-- Attachment #1: Type: text/plain, Size: 1927 bytes --]

Hi,

> Hi,
> 
> > Wang Yugui has a workload which would be improved by using large folios.
> > Until now, we've only created large folios in the readahead path,
> > but this workload writes without reading.  The decision of what size
> > folio to create is based purely on the size of the write() call (unlike
> > readahead where we keep history and can choose to create larger folios
> > based on that history even if individual reads are small).
> > 
> > The third patch looks like it's an optional extra but is actually needed
> > for the first two patches to work in the write path, otherwise it limits
> > the length that iomap_get_folio() sees to PAGE_SIZE.
> > 
> > Matthew Wilcox (Oracle) (3):
> >   filemap: Allow __filemap_get_folio to allocate large folios
> >   iomap: Create large folios in the buffered write path
> >   iomap: Copy larger chunks from userspace
> 
> The [PATCH 2/3] is a little difficult to backport to 6.1.y(LTS),
> it need some patches as depency.
> 
> so please provide those patches based on 6.1.y(LTS)  and depency list?
> then we can do more test on 6.1.y(LTS) too.

I selected 8 patches as depency
    d7b64041164c :Dave Chinner: iomap: write iomap validity checks

    7a70a5085ed0 :Andreas Gruenbacher: iomap: Add __iomap_put_folio helper
    80baab88bb93 :Andreas Gruenbacher: iomap/gfs2: Unlock and put folio in page_done handler
    40405dddd98a :Andreas Gruenbacher: iomap: Rename page_done handler to put_folio

    98321b5139f9 :Andreas Gruenbacher: iomap: Add iomap_get_folio helper

    9060bc4d3aca :Andreas Gruenbacher: iomap/gfs2: Get page in page_prepare handler
    07c22b56685d :Andreas Gruenbacher: iomap: Add __iomap_get_folio helper
    c82abc239464 :Andreas Gruenbacher: iomap: Rename page_prepare handler to get_folio

then rebased path 1, 2 ( see attachment files).

Now we can test patch 1,2,3 on 5.1.31.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/05/31


[-- Attachment #2: 0001-filemap-Allow-__filemap_get_folio-to-allocate-large-.patch --]
[-- Type: application/octet-stream, Size: 6398 bytes --]

From 7cd08ed97e73d2c2946fb2fc353b8e6da12a7e95 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Sat, 20 May 2023 17:36:01 +0100
Subject: [PATCH] filemap: Allow __filemap_get_folio to allocate large folios

Allow callers of __filemap_get_folio() to specify a preferred folio
order in the FGP flags.  This is only honoured in the FGP_CREATE path;
if there is already a folio in the page cache that covers the index,
we will return it, no matter what its order is.  No create-around is
attempted; we will only create folios which start at the specified index.
Unmodified callers will continue to allocate order 0 folios.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 29 ++++++++++++++++++++++++---
 mm/filemap.c            | 44 ++++++++++++++++++++++++++++-------------
 mm/folio-compat.c       |  2 +-
 mm/readahead.c          | 13 ------------
 4 files changed, 57 insertions(+), 31 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a4..f4d05beb64eb 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -466,6 +466,19 @@ static inline void *detach_page_private(struct page *page)
 	return folio_detach_private(page_folio(page));
 }
 
+/*
+ * There are some parts of the kernel which assume that PMD entries
+ * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
+ * limit the maximum allocation order to PMD size.  I'm not aware of any
+ * assumptions about maximum order if THP are disabled, but 8 seems like
+ * a good order (that's 1MB if you're using 4kB pages)
+ */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
+#else
+#define MAX_PAGECACHE_ORDER	8
+#endif
+
 #ifdef CONFIG_NUMA
 struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order);
 #else
@@ -507,11 +520,21 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 #define FGP_HEAD		0x00000080
 #define FGP_ENTRY		0x00000100
 #define FGP_STABLE		0x00000200
+#define FGP_ORDER(fgp)		((fgp) >> 26)	/* top 6 bits */
+
+static inline unsigned fgp_order(size_t size)
+{
+	unsigned int shift = ilog2(size);
+
+	if (shift <= PAGE_SHIFT)
+		return 0;
+	return (shift - PAGE_SHIFT) << 26;
+}
 
 struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
-		int fgp_flags, gfp_t gfp);
+		unsigned fgp_flags, gfp_t gfp);
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
-		int fgp_flags, gfp_t gfp);
+		unsigned fgp_flags, gfp_t gfp);
 
 /**
  * filemap_get_folio - Find and get a folio.
@@ -586,7 +609,7 @@ static inline struct page *find_get_page(struct address_space *mapping,
 }
 
 static inline struct page *find_get_page_flags(struct address_space *mapping,
-					pgoff_t offset, int fgp_flags)
+					pgoff_t offset, unsigned fgp_flags)
 {
 	return pagecache_get_page(mapping, offset, fgp_flags, 0);
 }
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e..5935c7aac388 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1910,7 +1910,7 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
  * Return: The found folio or %NULL otherwise.
  */
 struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
-		int fgp_flags, gfp_t gfp)
+		unsigned fgp_flags, gfp_t gfp)
 {
 	struct folio *folio;
 
@@ -1952,7 +1952,9 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		folio_wait_stable(folio);
 no_page:
 	if (!folio && (fgp_flags & FGP_CREAT)) {
+		unsigned order = FGP_ORDER(fgp_flags);
 		int err;
+
 		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
 			gfp |= __GFP_WRITE;
 		if (fgp_flags & FGP_NOFS)
@@ -1961,26 +1963,40 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			gfp &= ~GFP_KERNEL;
 			gfp |= GFP_NOWAIT | __GFP_NOWARN;
 		}
-
-		folio = filemap_alloc_folio(gfp, 0);
-		if (!folio)
-			return NULL;
-
 		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
 			fgp_flags |= FGP_LOCK;
 
-		/* Init accessed so avoid atomic mark_page_accessed later */
-		if (fgp_flags & FGP_ACCESSED)
-			__folio_set_referenced(folio);
+		if (!mapping_large_folio_support(mapping))
+			order = 0;
+		if (order > MAX_PAGECACHE_ORDER)
+			order = MAX_PAGECACHE_ORDER;
+		/* If we're not aligned, allocate a smaller folio */
+		if (index & ((1UL << order) - 1))
+			order = __ffs(index);
 
-		err = filemap_add_folio(mapping, folio, index, gfp);
-		if (unlikely(err)) {
+		do {
+			err = -ENOMEM;
+			if (order == 1)
+				order = 0;
+			folio = filemap_alloc_folio(gfp, order);
+			if (!folio)
+				continue;
+
+			/* Init accessed so avoid atomic mark_page_accessed later */
+			if (fgp_flags & FGP_ACCESSED)
+				__folio_set_referenced(folio);
+
+			err = filemap_add_folio(mapping, folio, index, gfp);
+			if (!err)
+				break;
 			folio_put(folio);
 			folio = NULL;
-			if (err == -EEXIST)
-				goto repeat;
-		}
+		} while (order-- > 0);
 
+		if (err == -EEXIST)
+			goto repeat;
+		if (err)
+			return ERR_PTR(err);
 		/*
 		 * filemap_add_folio locks the page, and for mmap
 		 * we expect an unlocked page.
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index c6f056c20503..c96e88d9a262 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -92,7 +92,7 @@ EXPORT_SYMBOL(add_to_page_cache_lru);
 
 noinline
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
-		int fgp_flags, gfp_t gfp)
+		unsigned fgp_flags, gfp_t gfp)
 {
 	struct folio *folio;
 
diff --git a/mm/readahead.c b/mm/readahead.c
index 47afbca1d122..59a071badb90 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -462,19 +462,6 @@ static int try_context_readahead(struct address_space *mapping,
 	return 1;
 }
 
-/*
- * There are some parts of the kernel which assume that PMD entries
- * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
- * limit the maximum allocation order to PMD size.  I'm not aware of any
- * assumptions about maximum order if THP are disabled, but 8 seems like
- * a good order (that's 1MB if you're using 4kB pages)
- */
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
-#else
-#define MAX_PAGECACHE_ORDER	8
-#endif
-
 static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
 		pgoff_t mark, unsigned int order, gfp_t gfp)
 {
-- 
2.36.2


[-- Attachment #3: 0002-iomap-Create-large-folios-in-the-buffered-write-path.patch --]
[-- Type: application/octet-stream, Size: 3523 bytes --]

From 3528f43f2936ea78700e5650e8e0fcf572358a1c Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Sat, 20 May 2023 17:36:02 +0100
Subject: [PATCH] iomap: Create large folios in the buffered write path

Use the size of the write as a hint for the size of the folio to create.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/gfs2/bmap.c         |  2 +-
 fs/iomap/buffered-io.c | 10 ++++++----
 include/linux/iomap.h  |  2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index c739b258a2d9..3702e5e47b0f 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -971,7 +971,7 @@ gfs2_iomap_get_folio(struct iomap_iter *iter, loff_t pos, unsigned len)
 	if (status)
 		return ERR_PTR(status);
 
-	folio = iomap_get_folio(iter, pos);
+	folio = iomap_get_folio(iter, pos, len);
 	if (IS_ERR(folio))
 		gfs2_trans_end(sdp);
 	return folio;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f4..651af2d424ac 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -461,17 +461,19 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
  * iomap_get_folio - get a folio reference for writing
  * @iter: iteration structure
  * @pos: start offset of write
+ * @len: length of write
  *
  * Returns a locked reference to the folio at @pos, or an error pointer if the
  * folio could not be obtained.
  */
-struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
+struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
 {
 	unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
 	struct folio *folio;
 
 	if (iter->flags & IOMAP_NOWAIT)
 		fgp |= FGP_NOWAIT;
+	fgp |= fgp_order(len);
 
 	folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
 			fgp, mapping_gfp_mask(iter->inode->i_mapping));
@@ -510,8 +512,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
 		iomap_page_release(folio);
 	} else if (folio_test_large(folio)) {
 		/* Must release the iop so the page can be split */
-		WARN_ON_ONCE(!folio_test_uptodate(folio) &&
-			     folio_test_dirty(folio));
+		VM_WARN_ON_ONCE_FOLIO(!folio_test_uptodate(folio) &&
+				folio_test_dirty(folio), folio);
 		iomap_page_release(folio);
 	}
 }
@@ -603,7 +605,7 @@ static struct folio *__iomap_get_folio(struct iomap_iter *iter, loff_t pos,
 	if (page_ops && page_ops->get_folio)
 		return page_ops->get_folio(iter, pos, len);
 	else
-		return iomap_get_folio(iter, pos);
+		return iomap_get_folio(iter, pos, len);
 }
 
 static void __iomap_put_folio(struct iomap_iter *iter, loff_t pos, size_t ret,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index e2b836c2e119..80facb9c9e5b 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -261,7 +261,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
 void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
 bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
-struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos);
+struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
 void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
-- 
2.36.2

     prev parent reply	other threads:[~2023-05-31  4:35 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-20 16:36 [PATCH 0/3] Create large folios in iomap buffered write path Matthew Wilcox (Oracle)
2023-05-20 16:36 ` [PATCH 1/3] filemap: Allow __filemap_get_folio to allocate large folios Matthew Wilcox (Oracle)
2023-05-21  1:02   ` Wang Yugui
2023-05-21  1:56     ` Matthew Wilcox
2023-05-21  2:04       ` Wang Yugui
2023-05-21  3:39         ` Matthew Wilcox
2023-05-21  2:13   ` Dave Chinner
2023-05-21  3:38     ` Matthew Wilcox
2023-05-23  5:59   ` Christoph Hellwig
2023-05-23 12:17     ` Matthew Wilcox
2023-05-20 16:36 ` [PATCH 2/3] iomap: Create large folios in the buffered write path Matthew Wilcox (Oracle)
2023-05-20 16:36 ` [PATCH 3/3] iomap: Copy larger chunks from userspace Matthew Wilcox (Oracle)
2023-05-20 19:11   ` kernel test robot
2023-05-20 19:25   ` kernel test robot
2023-05-21  0:49 ` [PATCH 0/3] Create large folios in iomap buffered write path Wang Yugui
2023-05-21  0:59   ` Matthew Wilcox
2023-05-21  1:38     ` Wang Yugui
2023-05-21  2:49 ` Dave Chinner
2023-05-21 11:40 ` Wang Yugui
2023-05-31  4:34   ` Wang Yugui [this message]

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:a56308a9d1a dfblob:f4d05beb64e dfblob:b4c9bd368b7
dfblob:5935c7aac38 dfblob:c6f056c2050 dfblob:c96e88d9a26
dfblob:47afbca1d12 dfblob:59a071badb9 dfblob:c739b258a2d
dfblob:3702e5e47b0 dfblob:063133ec77f dfblob:651af2d424a
dfblob:e2b836c2e11 dfblob:80facb9c9e5 )
 OR (
bs:"iomap: Create large folios in the buffered write path" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230531123450.7DC8.409509F4@e16-tech.com \
    --to=wangyugui@e16-tech.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.