From: Christoph Hellwig <hch@lst.de>
To: Carlos Maiolino <cem@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	Dave Chinner <dchinner@redhat.com>,
	linux-xfs@vger.kernel.org
Subject: [PATCH 07/12] xfs: convert buffer cache to use high order folios
Date: Wed, 26 Feb 2025 07:51:35 -0800	[thread overview]
Message-ID: <20250226155245.513494-8-hch@lst.de> (raw)
In-Reply-To: <20250226155245.513494-1-hch@lst.de>

Now that the buffer cache uses the folio API, we can extend the use of
folios to allocate high order folios for multi-page buffers rather than
an array of single pages that are then vmapped into a contiguous range.
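
As an illustration only, a condensed sketch of the two strategies (this
is not the patch code below; alloc_backing_sketch() is a hypothetical
name, the fixed-size pages array and missing error/overflow handling
are simplifications; helpers as declared in <linux/gfp.h>, <linux/mm.h>,
<linux/numa.h> and <linux/vmalloc.h>):

static void *alloc_backing_sketch(size_t size, gfp_t gfp_mask)
{
	unsigned int nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
	struct page *pages[16];	/* hypothetical fixed-size array */
	struct folio *folio;
	unsigned int i;

	/*
	 * Fast path: one physically contiguous high order folio that
	 * is directly addressable, no mapping step required.
	 */
	folio = folio_alloc(gfp_mask, get_order(size));
	if (folio)
		return folio_address(folio);

	/*
	 * Old-style path: 0-order pages stitched into a virtually
	 * contiguous range with vm_map_ram().
	 */
	for (i = 0; i < nr_pages; i++)
		pages[i] = alloc_page(gfp_mask);
	return vm_map_ram(pages, nr_pages, NUMA_NO_NODE);
}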

This creates a new type of single folio buffer that can have arbitrary
order, in addition to the existing multi-folio buffers made up of many
single page folios that get vmapped.  The single folio is for now
stashed into the existing b_pages array, but that array will go away
entirely later in the series, which removes the temporary page vs folio
typing issues that only work because the two structures can currently
be used largely interchangeably.
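
To make the temporary stashing concrete, the round-trip looks like this
(a minimal sketch mirroring the hunks below; bp fields as declared in
fs/xfs/xfs_buf.h):

	/* Allocation side: stash the folio's head page in the legacy
	 * page array so the existing page-based code keeps working. */
	bp->b_page_array[0] = &folio->page;
	bp->b_pages = bp->b_page_array;
	bp->b_page_count = 1;

	/* Freeing side: recover the folio from the stashed page with
	 * page_folio(), which also handles plain 0-order pages. */
	folio_put(page_folio(bp->b_pages[0]));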

The code that allocates buffers will optimistically attempt a high
order folio allocation as a fast path if the buffer size is a power of
two and thus fits into a folio.  If this high order allocation fails,
we fall back to the existing multi-folio allocation code.  This now
forms the slow allocation path, and hopefully will be largely unused in
normal conditions except for buffers whose size is not a power of two,
such as larger remote xattrs.
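
Summarized, the path selection works out as below; backing_mem_path()
is a hypothetical helper for illustration only, not part of the patch
(is_power_of_2() from <linux/log2.h>):

static const char *backing_mem_path(size_t size)
{
	if (size < PAGE_SIZE && is_power_of_2(size))
		return "kmalloc";		/* small metadata buffers */
	if (size > PAGE_SIZE && !is_power_of_2(size))
		return "page array + vmap";	/* e.g. larger remote xattrs */
	return "high order folio, page array on allocation failure";
}

For example, a 64k directory block on a 4k PAGE_SIZE system attempts a
single order-4 folio allocation, while an odd-sized remote xattr buffer
goes straight to the page array path.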

This should improve the performance of large buffer operations (e.g.
large directory block sizes), as we now mostly avoid the expense of
vmapping large buffers (and the vmap lock contention that can occur)
as well as the runtime pressure that frequently accessing vmapped
kernel pages puts on the TLB.

Based on a patch from Dave Chinner <dchinner@redhat.com>, but mutilated
beyond recognition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c | 58 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 52 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f327bf5b04c0..3c582eaa656d 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -261,9 +261,10 @@ xfs_buf_free_pages(
 
 	for (i = 0; i < bp->b_page_count; i++) {
 		if (bp->b_pages[i])
-			__free_page(bp->b_pages[i]);
+			folio_put(page_folio(bp->b_pages[i]));
 	}
-	mm_account_reclaimed_pages(bp->b_page_count);
+	mm_account_reclaimed_pages(
+			DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE));
 
 	if (bp->b_pages != bp->b_page_array)
 		kfree(bp->b_pages);
@@ -336,12 +337,17 @@ xfs_buf_alloc_kmem(
  * For tmpfs-backed buffers used by in-memory btrees this directly maps the
  * tmpfs page cache folios.
  *
- * For real file system buffers there are two different kinds backing memory:
+ * For real file system buffers there are three different kinds of backing memory:
  *
  * The first type backs the buffer by a kmalloc allocation.  This is done for
  * less than PAGE_SIZE allocations to avoid wasting memory.
  *
- * The second type of buffer is the multi-page buffer. These are always made
+ * The second type is a single folio buffer - this may be a high order folio or
+ * just a single page sized folio, but either way it gets treated the same way
+ * by the rest of the code - the buffer memory spans a single contiguous memory
+ * region that we can access directly, without having to map and unmap it.
+ *
+ * The third type of buffer is the multi-page buffer. These are always made
  * up of single pages so that they can be fed to vmap_ram() to return a
  * contiguous memory region we can access the data through, or mark it as
  * XBF_UNMAPPED and access the data directly through individual page_address()
@@ -354,6 +360,7 @@ xfs_buf_alloc_backing_mem(
 {
 	size_t		size = BBTOB(bp->b_length);
 	gfp_t		gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN;
+	struct folio	*folio;
 	long		filled = 0;
 
 	if (xfs_buftarg_is_mem(bp->b_target))
@@ -375,7 +382,46 @@ xfs_buf_alloc_backing_mem(
 	if (size < PAGE_SIZE && is_power_of_2(size))
 		return xfs_buf_alloc_kmem(bp, size, gfp_mask);
 
-	/* Make sure that we have a page list */
+	/* Ensure a zeroed buffer for non-read cases. */
+	if (!(flags & XBF_READ))
+		gfp_mask |= __GFP_ZERO;
+
+	/*
+	 * Don't bother with the retry loop for single PAGE allocations; there
+	 * is little chance this can do better than the VM version.
+	 */
+	if (size <= PAGE_SIZE)
+		gfp_mask |= __GFP_NOFAIL;
+
+	/*
+	 * Optimistically attempt a single high order folio allocation for
+	 * larger than PAGE_SIZE buffers.
+	 *
+	 * Allocating a high order folio makes the assumption that buffers are a
+	 * power-of-2 size, matching the power-of-2 folio sizes available.
+	 *
+	 * The exception here is user xattr data buffers, which can be
+	 * arbitrarily sized up to 64kB plus structure metadata; skip
+	 * straight to the vmalloc path for them instead of wasting
+	 * memory here.
+	 */
+	if (size > PAGE_SIZE && !is_power_of_2(size))
+		goto fallback;
+	folio = folio_alloc(gfp_mask, get_order(size));
+	if (!folio) {
+		if (size <= PAGE_SIZE)
+			return -ENOMEM;
+		goto fallback;
+	}
+	bp->b_addr = folio_address(folio);
+	bp->b_page_array[0] = &folio->page;
+	bp->b_pages = bp->b_page_array;
+	bp->b_page_count = 1;
+	bp->b_flags |= _XBF_PAGES;
+	return 0;
+
+fallback:
+	/* Fall back to allocating an array of single page folios. */
 	bp->b_page_count = DIV_ROUND_UP(size, PAGE_SIZE);
 	if (bp->b_page_count <= XB_PAGES) {
 		bp->b_pages = bp->b_page_array;
@@ -1529,7 +1575,7 @@ xfs_buf_submit_bio(
 	bio->bi_private = bp;
 	bio->bi_end_io = xfs_buf_bio_end_io;
 
-	if (bp->b_flags & _XBF_KMEM) {
+	if (bp->b_page_count == 1) {
 		__bio_add_page(bio, virt_to_page(bp->b_addr), size,
 				offset_in_page(bp->b_addr));
 	} else {
-- 
2.45.2


Thread overview: 32+ messages
2025-02-26 15:51 use folios and vmalloc for buffer cache backing memory Christoph Hellwig
2025-02-26 15:51 ` [PATCH 01/12] xfs: unmapped buffer item size straddling mismatch Christoph Hellwig
2025-02-26 15:51 ` [PATCH 02/12] xfs: add a fast path to xfs_buf_zero when b_addr is set Christoph Hellwig
2025-02-26 17:00   ` Darrick J. Wong
2025-02-26 15:51 ` [PATCH 03/12] xfs: remove xfs_buf.b_offset Christoph Hellwig
2025-02-26 17:00   ` Darrick J. Wong
2025-02-26 15:51 ` [PATCH 04/12] xfs: remove xfs_buf_is_vmapped Christoph Hellwig
2025-02-26 17:02   ` Darrick J. Wong
2025-02-26 15:51 ` [PATCH 05/12] xfs: refactor backing memory allocations for buffers Christoph Hellwig
2025-02-26 17:08   ` Darrick J. Wong
2025-02-26 15:51 ` [PATCH 06/12] xfs: remove the kmalloc to page allocator fallback Christoph Hellwig
2025-02-26 17:22   ` Darrick J. Wong
2025-03-04 14:05     ` Christoph Hellwig
2025-02-26 15:51 ` Christoph Hellwig [this message]
2025-02-26 17:33   ` [PATCH 07/12] xfs: convert buffer cache to use high order folios Darrick J. Wong
2025-03-04 14:06     ` Christoph Hellwig
2025-02-26 15:51 ` [PATCH 08/12] xfs: kill XBF_UNMAPPED Christoph Hellwig
2025-02-26 15:51 ` [PATCH 09/12] xfs: buffer items don't straddle pages anymore Christoph Hellwig
2025-02-26 15:51 ` [PATCH 10/12] xfs: use vmalloc instead of vm_map_area for buffer backing memory Christoph Hellwig
2025-02-26 18:02   ` Darrick J. Wong
2025-03-04 14:10     ` Christoph Hellwig
2025-02-26 15:51 ` [PATCH 11/12] xfs: cleanup mapping tmpfs folios into the buffer cache Christoph Hellwig
2025-02-26 17:39   ` Darrick J. Wong
2025-03-04 14:11     ` Christoph Hellwig
2025-02-26 15:51 ` [PATCH 12/12] xfs: trace what memory backs a buffer Christoph Hellwig
2025-02-26 16:45   ` Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2025-03-05 14:05 use folios and vmalloc for buffer cache backing memory v2 Christoph Hellwig
2025-03-05 14:05 ` [PATCH 07/12] xfs: convert buffer cache to use high order folios Christoph Hellwig
2025-03-05 18:20   ` Darrick J. Wong
2025-03-05 20:50   ` Dave Chinner
2025-03-05 23:33     ` Christoph Hellwig
2025-03-10 13:18     ` Christoph Hellwig
2025-03-10 13:19 use folios and vmalloc for buffer cache backing memory v3 Christoph Hellwig
2025-03-10 13:19 ` [PATCH 07/12] xfs: convert buffer cache to use high order folios Christoph Hellwig
