From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Carlos Maiolino <cem@kernel.org>,
Dave Chinner <dchinner@redhat.com>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 07/12] xfs: convert buffer cache to use high order folios
Date: Wed, 26 Feb 2025 09:33:33 -0800
Message-ID: <20250226173333.GR6242@frogsfrogsfrogs>
In-Reply-To: <20250226155245.513494-8-hch@lst.de>

On Wed, Feb 26, 2025 at 07:51:35AM -0800, Christoph Hellwig wrote:
> Now that we have the buffer cache using the folio API, we can extend
> the use of folios to allocate high order folios for multi-page
> buffers rather than an array of single pages that are then vmapped
> into a contiguous range.
>
> This creates a new type of single folio buffers that can have arbitrary
> order in addition to the existing multi-folio buffers made up of many
> single page folios that get vmapped. The single folio is for now
> stashed into the existing b_pages array, but that will go away entirely
> later in the series and remove the temporary page vs folio typing issues
> that only work because the two structures currently can be used largely
> interchangeable.
>
> The code that allocates buffers will optimistically attempt a high
> order folio allocation as a fast path if the buffer size is a power
> of two and thus fits into a folio. If this high order allocation
> fails, then we fall back to the existing multi-folio allocation
> code. This now forms the slow allocation path, and hopefully will be
> largely unused in normal conditions except for buffers with size
> that are not a power of two like larger remote xattrs.
>
> This should improve performance of large buffer operations (e.g.
> large directory block sizes) as we should now mostly avoid the
> expense of vmapping large buffers (and the vmap lock contention that
> can occur) as well as avoid the runtime pressure that frequently
> accessing kernel vmapped pages put on the TLBs.
>
> Based on a patch from Dave Chinner <dchinner@redhat.com>, but mutilated
> beyond recognition.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_buf.c | 58 +++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 52 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index f327bf5b04c0..3c582eaa656d 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -261,9 +261,10 @@ xfs_buf_free_pages(
>
>  	for (i = 0; i < bp->b_page_count; i++) {
>  		if (bp->b_pages[i])
> -			__free_page(bp->b_pages[i]);
> +			folio_put(page_folio(bp->b_pages[i]));
>  	}
> -	mm_account_reclaimed_pages(bp->b_page_count);
> +	mm_account_reclaimed_pages(
> +			DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE));

Why do we round the number of bytes in the buffer up to base page size?
Don't we want howmany(BBTOB(bp->b_length), PAGE_SIZE) here?

Oh wait, howmany *is* DIV_ROUND_UP.  Never mind...

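As a quick sanity check of the equivalence above, here is a minimal
userspace sketch.  The SKETCH_* names and the 4 KiB page size are
assumptions made for illustration only; the macro bodies follow the usual
kernel and BSD definitions, and BBTOB is modelled as a shift by 9
(512-byte basic blocks).

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-style round-up division. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* BSD-style howmany(), normally provided by <sys/param.h>. */
#ifndef howmany
#define howmany(x, y)		(((x) + ((y) - 1)) / (y))
#endif

#define SKETCH_PAGE_SIZE	4096UL	/* assumption: 4 KiB base pages */
#define SKETCH_BBSHIFT		9	/* 512-byte basic blocks */
#define SKETCH_BBTOB(bbs)	((size_t)(bbs) << SKETCH_BBSHIFT)

/*
 * Number of base pages covered by a buffer of `bbs` basic blocks.
 * Both macros are evaluated and asserted to agree before returning.
 */
static size_t sketch_reclaimed_pages(size_t bbs)
{
	size_t a = DIV_ROUND_UP(SKETCH_BBTOB(bbs), SKETCH_PAGE_SIZE);
	size_t b = howmany(SKETCH_BBTOB(bbs), SKETCH_PAGE_SIZE);

	assert(a == b);
	return a;
}
```

Under these assumptions an 8-block (4096-byte) buffer accounts for one
page and a 9-block (4608-byte) buffer for two, regardless of which macro
is used.
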
>  	if (bp->b_pages != bp->b_page_array)
>  		kfree(bp->b_pages);
> @@ -336,12 +337,17 @@ xfs_buf_alloc_kmem(
>   * For tmpfs-backed buffers used by in-memory btrees this directly maps the
>   * tmpfs page cache folios.
>   *
> - * For real file system buffers there are two different kinds backing memory:
> + * For real file system buffers there are three different kinds backing memory:
>   *
>   * The first type backs the buffer by a kmalloc allocation. This is done for
>   * less than PAGE_SIZE allocations to avoid wasting memory.
>   *
> - * The second type of buffer is the multi-page buffer. These are always made
> + * The second type is a single folio buffer - this may be a high order folio or
> + * just a single page sized folio, but either way they get treated the same way
> + * by the rest of the code - the buffer memory spans a single contiguous memory
> + * region that we don't have to map and unmap to access the data directly.
> + *
> + * The third type of buffer is the multi-page buffer. These are always made
>   * up of single pages so that they can be fed to vmap_ram() to return a
>   * contiguous memory region we can access the data through, or mark it as
>   * XBF_UNMAPPED and access the data directly through individual page_address()
> @@ -354,6 +360,7 @@ xfs_buf_alloc_backing_mem(
>  {
>  	size_t size = BBTOB(bp->b_length);
>  	gfp_t gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN;
> +	struct folio *folio;
>  	long filled = 0;
>
>  	if (xfs_buftarg_is_mem(bp->b_target))
> @@ -375,7 +382,46 @@ xfs_buf_alloc_backing_mem(
>  	if (size < PAGE_SIZE && is_power_of_2(size))
>  		return xfs_buf_alloc_kmem(bp, size, gfp_mask);
>
> -	/* Make sure that we have a page list */
> +	/* Assure zeroed buffer for non-read cases. */
> +	if (!(flags & XBF_READ))
> +		gfp_mask |= __GFP_ZERO;

Didn't this get added ten lines up in "xfs: remove the kmalloc to page
allocator fallback"?

> +
> +	/*
> +	 * Don't bother with the retry loop for single PAGE allocations, there
> +	 * is litte changes this can be better than the VM version.

Er... I don't understand the second half of this sentence; is this what
you're trying to communicate?

"Don't bother with the retry loop for single-page allocations; vmalloc
won't do any better."

> +	 */
> +	if (size <= PAGE_SIZE)
> +		gfp_mask |= __GFP_NOFAIL;
> +
> +	/*
> +	 * Optimistically attempt a single high order folio allocation for
> +	 * larger than PAGE_SIZE buffers.
> +	 *
> +	 * Allocating a high order folio makes the assumption that buffers are a
> +	 * power-of-2 size, matching the power-of-2 folios sizes available.
> +	 *
> +	 * The exception here are user xattr data buffers, which can be arbitrarily
> +	 * sized up to 64kB plus structure metadata, skip straight to the vmalloc
> +	 * path for them instead of wasting memory.
> +	 * here.

Nit: "...path for them instead of wasting memory here."

--D

> +	 */
> +	if (size > PAGE_SIZE && !is_power_of_2(size))
> +		goto fallback;
> +	folio = folio_alloc(gfp_mask, get_order(size));
> +	if (!folio) {
> +		if (size <= PAGE_SIZE)
> +			return -ENOMEM;
> +		goto fallback;
> +	}
> +	bp->b_addr = folio_address(folio);
> +	bp->b_page_array[0] = &folio->page;
> +	bp->b_pages = bp->b_page_array;
> +	bp->b_page_count = 1;
> +	bp->b_flags |= _XBF_PAGES;
> +	return 0;
> +
> +fallback:
> +	/* Fall back to allocating an array of single page folios. */
>  	bp->b_page_count = DIV_ROUND_UP(size, PAGE_SIZE);
>  	if (bp->b_page_count <= XB_PAGES) {
>  		bp->b_pages = bp->b_page_array;
> @@ -1529,7 +1575,7 @@ xfs_buf_submit_bio(
>  	bio->bi_private = bp;
>  	bio->bi_end_io = xfs_buf_bio_end_io;
>
> -	if (bp->b_flags & _XBF_KMEM) {
> +	if (bp->b_page_count == 1) {
>  		__bio_add_page(bio, virt_to_page(bp->b_addr), size,
>  				offset_in_page(bp->b_addr));
>  	} else {
> --
> 2.45.2
>
>
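
For readers following along, the allocation policy described in the
quoted patch can be modelled roughly in userspace as below.  This is only
a sketch: everything prefixed sketch_/SKETCH_ is hypothetical, the 4 KiB
page size is an assumption, and the model covers only which backing type
is tried first.  It ignores the tmpfs case and the runtime fallback taken
when folio_alloc() fails for a larger-than-page buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096UL	/* assumption: 4 KiB base pages */

enum sketch_backing {
	SKETCH_KMEM,		/* kmalloc, sub-page power-of-2 sizes */
	SKETCH_FOLIO,		/* one folio, possibly high order */
	SKETCH_VMAP_PAGES,	/* array of single pages, vmapped */
};

static bool sketch_is_power_of_2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest order with (PAGE_SIZE << order) >= size, like get_order(). */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* First-choice backing type for a buffer of `size` bytes. */
static enum sketch_backing sketch_pick_backing(size_t size)
{
	if (size < SKETCH_PAGE_SIZE && sketch_is_power_of_2(size))
		return SKETCH_KMEM;
	if (size > SKETCH_PAGE_SIZE && !sketch_is_power_of_2(size))
		return SKETCH_VMAP_PAGES;
	return SKETCH_FOLIO;
}

/* Bytes left unused if `size` were backed by a single folio. */
static size_t sketch_folio_waste(size_t size)
{
	return (SKETCH_PAGE_SIZE << sketch_get_order(size)) - size;
}
```

With these assumptions, a 66048-byte buffer (64 KiB plus one 512-byte
block) would need an order-5 folio and leave 65024 bytes unused, which
illustrates why non-power-of-2 sizes skip straight to the fallback path.
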