From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Carlos Maiolino <cem@kernel.org>,
Dave Chinner <dchinner@redhat.com>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 07/12] xfs: convert buffer cache to use high order folios
Date: Wed, 5 Mar 2025 10:20:09 -0800
Message-ID: <20250305182009.GJ2803749@frogsfrogsfrogs>
In-Reply-To: <20250305140532.158563-8-hch@lst.de>

On Wed, Mar 05, 2025 at 07:05:24AM -0700, Christoph Hellwig wrote:
> Now that we have the buffer cache using the folio API, we can extend
> the use of folios to allocate high order folios for multi-page
> buffers rather than an array of single pages that are then vmapped
> into a contiguous range.
>
> This creates a new type of single folio buffers that can have arbitrary
> order in addition to the existing multi-folio buffers made up of many
> single page folios that get vmapped. The single folio is for now
> stashed into the existing b_pages array, but that will go away entirely
> later in the series and remove the temporary page vs folio typing issues
> that only work because the two structures currently can be used largely
> interchangeably.
>
> The code that allocates buffers will optimistically attempt a high
> order folio allocation as a fast path if the buffer size is a power
> of two and thus fits into a folio. If this high order allocation
> fails, then we fall back to the existing multi-folio allocation
> code. This now forms the slow allocation path, and hopefully will be
> largely unused in normal conditions except for buffers with sizes
> that are not a power of two, like larger remote xattrs.
>
> This should improve performance of large buffer operations (e.g.
> large directory block sizes) as we should now mostly avoid the
> expense of vmapping large buffers (and the vmap lock contention that
> can occur) as well as avoid the runtime pressure that frequently
> accessing kernel vmapped pages puts on the TLBs.
>
> Based on a patch from Dave Chinner <dchinner@redhat.com>, but mutilated
> beyond recognition.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good now!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
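For readers following the commit message above, here is a minimal userspace sketch of how the backing memory type is chosen. It is not the kernel code. The names pick_backing, is_pow2, SKETCH_PAGE_SIZE and the enum values are hypothetical, a 4kB page size is assumed, and the folio_alloc_ok flag merely models whether the high order allocation succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4kB pages */

/* Hypothetical labels for the backing memory kinds described in the patch. */
enum backing {
	BACKING_KMEM,		/* kmalloc, sub-page power-of-2 sizes */
	BACKING_SINGLE_FOLIO,	/* one folio, possibly high order */
	BACKING_MULTI_PAGE,	/* array of single pages, vmapped */
	BACKING_NOMEM,		/* allocation failed with no fallback */
};

static bool is_pow2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* folio_alloc_ok models the result of the optimistic folio allocation. */
static enum backing pick_backing(size_t size, bool folio_alloc_ok)
{
	assert(size > 0);

	/* Small power-of-2 buffers come from kmalloc to avoid waste. */
	if (size < SKETCH_PAGE_SIZE && is_pow2(size))
		return BACKING_KMEM;

	/* Larger non-power-of-2 sizes skip the folio attempt entirely. */
	if (size > SKETCH_PAGE_SIZE && !is_pow2(size))
		return BACKING_MULTI_PAGE;

	if (folio_alloc_ok)
		return BACKING_SINGLE_FOLIO;

	/* At or below one page there is nothing to fall back to. */
	if (size <= SKETCH_PAGE_SIZE)
		return BACKING_NOMEM;

	return BACKING_MULTI_PAGE;
}
```

Under these assumptions a 16kB buffer gets a single folio when the allocation succeeds and the multi-page path when it fails, while a 12kB buffer always takes the multi-page path.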
> ---
> fs/xfs/xfs_buf.c | 52 ++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 073246d4352f..f0666ef57bd2 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -203,9 +203,9 @@ xfs_buf_free_pages(
>
> for (i = 0; i < bp->b_page_count; i++) {
> if (bp->b_pages[i])
> - __free_page(bp->b_pages[i]);
> + folio_put(page_folio(bp->b_pages[i]));
> }
> - mm_account_reclaimed_pages(bp->b_page_count);
> + mm_account_reclaimed_pages(howmany(BBTOB(bp->b_length), PAGE_SIZE));
>
> if (bp->b_pages != bp->b_page_array)
> kfree(bp->b_pages);
> @@ -277,12 +277,17 @@ xfs_buf_alloc_kmem(
> * For tmpfs-backed buffers used by in-memory btrees this directly maps the
> * tmpfs page cache folios.
> *
> - * For real file system buffers there are two different kinds backing memory:
> + * Real file system buffers use one of three kinds of backing memory:
> *
> * The first type backs the buffer by a kmalloc allocation. This is done for
> * less than PAGE_SIZE allocations to avoid wasting memory.
> *
> - * The second type of buffer is the multi-page buffer. These are always made
> + * The second type is a single folio buffer - this may be a high order folio or
> + * just a single page sized folio, but either way they get treated the same way
> + * by the rest of the code - the buffer memory spans a single contiguous memory
> + * region that we don't have to map and unmap to access the data directly.
> + *
> + * The third type of buffer is the multi-page buffer. These are always made
> * up of single pages so that they can be fed to vmap_ram() to return a
> * contiguous memory region we can access the data through, or mark it as
> * XBF_UNMAPPED and access the data directly through individual page_address()
> @@ -295,6 +300,7 @@ xfs_buf_alloc_backing_mem(
> {
> size_t size = BBTOB(bp->b_length);
> gfp_t gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN;
> + struct folio *folio;
> long filled = 0;
>
> if (xfs_buftarg_is_mem(bp->b_target))
> @@ -316,7 +322,41 @@ xfs_buf_alloc_backing_mem(
> if (size < PAGE_SIZE && is_power_of_2(size))
> return xfs_buf_alloc_kmem(bp, size, gfp_mask);
>
> - /* Make sure that we have a page list */
> + /*
> + * Don't bother with the retry loop for single PAGE allocations: vmalloc
> + * won't do any better.
> + */
> + if (size <= PAGE_SIZE)
> + gfp_mask |= __GFP_NOFAIL;
> +
> + /*
> + * Optimistically attempt a single high order folio allocation for
> + * larger than PAGE_SIZE buffers.
> + *
> + * Allocating a high order folio makes the assumption that buffers are a
> + * power-of-2 size, matching the power-of-2 folio sizes available.
> + *
> + * The exception here is user xattr data buffers, which can be arbitrarily
> + * sized up to 64kB plus structure metadata. Skip straight to the vmalloc
> + * path for them instead of wasting memory here.
> + */
> + if (size > PAGE_SIZE && !is_power_of_2(size))
> + goto fallback;
> + folio = folio_alloc(gfp_mask, get_order(size));
> + if (!folio) {
> + if (size <= PAGE_SIZE)
> + return -ENOMEM;
> + goto fallback;
> + }
> + bp->b_addr = folio_address(folio);
> + bp->b_page_array[0] = &folio->page;
> + bp->b_pages = bp->b_page_array;
> + bp->b_page_count = 1;
> + bp->b_flags |= _XBF_PAGES;
> + return 0;
> +
> +fallback:
> + /* Fall back to allocating an array of single page folios. */
> bp->b_page_count = DIV_ROUND_UP(size, PAGE_SIZE);
> if (bp->b_page_count <= XB_PAGES) {
> bp->b_pages = bp->b_page_array;
> @@ -1474,7 +1514,7 @@ xfs_buf_submit_bio(
> bio->bi_private = bp;
> bio->bi_end_io = xfs_buf_bio_end_io;
>
> - if (bp->b_flags & _XBF_KMEM) {
> + if (bp->b_page_count == 1) {
> __bio_add_page(bio, virt_to_page(bp->b_addr), size,
> offset_in_page(bp->b_addr));
> } else {
> --
> 2.45.2
>
>
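The arithmetic in the patch above can be illustrated with a small userspace sketch. It shows how a buffer length in 512-byte basic blocks maps to a folio order and to the page count passed to mm_account_reclaimed_pages(). With a single high order folio, b_page_count is 1 even though several pages are freed, which appears to be why the accounting is derived from the buffer length instead. All sketch_* names are hypothetical stand-ins for BBTOB(), get_order() and howmany(), and 4kB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4kB pages */
#define SKETCH_PAGE_SIZE	((size_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_BBSHIFT		9	/* 512-byte basic blocks */

/* Stand-in for BBTOB(): basic blocks to bytes. */
static size_t sketch_bbtob(size_t bbs)
{
	return bbs << SKETCH_BBSHIFT;
}

/* Stand-in for get_order(): smallest order whose folio covers size. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Stand-in for howmany(): integer division rounding up. */
static size_t sketch_howmany(size_t x, size_t y)
{
	return (x + y - 1) / y;
}

/* Pages to account as reclaimed for a buffer of b_length basic blocks. */
static size_t sketch_reclaimed_pages(size_t b_length)
{
	return sketch_howmany(sketch_bbtob(b_length), SKETCH_PAGE_SIZE);
}
```

For example, a 64kB buffer (128 basic blocks) would be one order-4 folio, yet it still accounts for 16 reclaimed pages when freed.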
Thread overview: 34+ messages
2025-03-05 14:05 use folios and vmalloc for buffer cache backing memory v2 Christoph Hellwig
2025-03-05 14:05 ` [PATCH 01/12] xfs: unmapped buffer item size straddling mismatch Christoph Hellwig
2025-03-05 14:05 ` [PATCH 02/12] xfs: add a fast path to xfs_buf_zero when b_addr is set Christoph Hellwig
2025-03-05 14:05 ` [PATCH 03/12] xfs: remove xfs_buf.b_offset Christoph Hellwig
2025-03-05 14:05 ` [PATCH 04/12] xfs: remove xfs_buf_is_vmapped Christoph Hellwig
2025-03-05 14:05 ` [PATCH 05/12] xfs: refactor backing memory allocations for buffers Christoph Hellwig
2025-03-05 14:05 ` [PATCH 06/12] xfs: remove the kmalloc to page allocator fallback Christoph Hellwig
2025-03-05 18:18 ` Darrick J. Wong
2025-03-05 23:32 ` Christoph Hellwig
2025-03-05 21:02 ` Dave Chinner
2025-03-05 23:38 ` Christoph Hellwig
2025-03-05 14:05 ` [PATCH 07/12] xfs: convert buffer cache to use high order folios Christoph Hellwig
2025-03-05 18:20 ` Darrick J. Wong [this message]
2025-03-05 20:50 ` Dave Chinner
2025-03-05 23:33 ` Christoph Hellwig
2025-03-10 13:18 ` Christoph Hellwig
2025-03-05 14:05 ` [PATCH 08/12] xfs: kill XBF_UNMAPPED Christoph Hellwig
2025-03-05 14:05 ` [PATCH 09/12] xfs: buffer items don't straddle pages anymore Christoph Hellwig
2025-03-05 14:05 ` [PATCH 10/12] xfs: use vmalloc instead of vm_map_area for buffer backing memory Christoph Hellwig
2025-03-05 18:22 ` Darrick J. Wong
2025-03-05 21:20 ` Dave Chinner
2025-03-05 22:54 ` Darrick J. Wong
2025-03-05 23:28 ` Dave Chinner
2025-03-05 23:45 ` Christoph Hellwig
2025-03-05 23:35 ` Christoph Hellwig
2025-03-06 0:57 ` Dave Chinner
2025-03-06 1:40 ` Christoph Hellwig
2025-03-05 14:05 ` [PATCH 11/12] xfs: cleanup mapping tmpfs folios into the buffer cache Christoph Hellwig
2025-03-05 18:34 ` Darrick J. Wong
2025-03-05 14:05 ` [PATCH 12/12] xfs: trace what memory backs a buffer Christoph Hellwig
-- strict thread matches above, loose matches on Subject: below --
2025-03-10 13:19 use folios and vmalloc for buffer cache backing memory v3 Christoph Hellwig
2025-03-10 13:19 ` [PATCH 07/12] xfs: convert buffer cache to use high order folios Christoph Hellwig
2025-02-26 15:51 use folios and vmalloc for buffer cache backing memory Christoph Hellwig
2025-02-26 15:51 ` [PATCH 07/12] xfs: convert buffer cache to use high order folios Christoph Hellwig
2025-02-26 17:33 ` Darrick J. Wong
2025-03-04 14:06 ` Christoph Hellwig