Re: [PATCH 07/12] xfs: convert buffer cache to use high order folios

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Carlos Maiolino <cem@kernel.org>,
	Dave Chinner <dchinner@redhat.com>,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 07/12] xfs: convert buffer cache to use high order folios
Date: Wed, 26 Feb 2025 09:33:33 -0800	[thread overview]
Message-ID: <20250226173333.GR6242@frogsfrogsfrogs> (raw)
In-Reply-To: <20250226155245.513494-8-hch@lst.de>

On Wed, Feb 26, 2025 at 07:51:35AM -0800, Christoph Hellwig wrote:
> Now that we have the buffer cache using the folio API, we can extend
> the use of folios to allocate high order folios for multi-page
> buffers rather than an array of single pages that are then vmapped
> into a contiguous range.
> 
> This creates a new type of single folio buffers that can have arbitrary
> order in addition to the existing multi-folio buffers made up of many
> single page folios that get vmapped.  The single folio is for now
> stashed into the existing b_pages array, but that will go away entirely
> later in the series and remove the temporary page vs folio typing issues
> that only work because the two structures currently can be used largely
> interchangeable.
> 
> The code that allocates buffers will optimistically attempt a high
> order folio allocation as a fast path if the buffer size is a power
> of two and thus fits into a folio. If this high order allocation
> fails, then we fall back to the existing multi-folio allocation
> code. This now forms the slow allocation path, and hopefully will be
> largely unused in normal conditions except for buffers with size
> that are not a power of two like larger remote xattrs.
> 
> This should improve performance of large buffer operations (e.g.
> large directory block sizes) as we should now mostly avoid the
> expense of vmapping large buffers (and the vmap lock contention that
> can occur) as well as avoid the runtime pressure that frequently
> accessing kernel vmapped pages put on the TLBs.
> 
> Based on a patch from Dave Chinner <dchinner@redhat.com>, but mutilated
> beyond recognition.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_buf.c | 58 +++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index f327bf5b04c0..3c582eaa656d 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -261,9 +261,10 @@ xfs_buf_free_pages(
>  
>  	for (i = 0; i < bp->b_page_count; i++) {
>  		if (bp->b_pages[i])
> -			__free_page(bp->b_pages[i]);
> +			folio_put(page_folio(bp->b_pages[i]));
>  	}
> -	mm_account_reclaimed_pages(bp->b_page_count);
> +	mm_account_reclaimed_pages(
> +			DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE));

Why do we round the number of bytes in the buffer up to base page size?

Don't we want howmany(BBTOB(bp->b_length), PAGE_SIZE) here?

Oh wait, howmany *is* DIV_ROUND_UP.  Never mind...

>  	if (bp->b_pages != bp->b_page_array)
>  		kfree(bp->b_pages);
> @@ -336,12 +337,17 @@ xfs_buf_alloc_kmem(
>   * For tmpfs-backed buffers used by in-memory btrees this directly maps the
>   * tmpfs page cache folios.
>   *
> - * For real file system buffers there are two different kinds backing memory:
> + * For real file system buffers there are three different kinds backing memory:
>   *
>   * The first type backs the buffer by a kmalloc allocation.  This is done for
>   * less than PAGE_SIZE allocations to avoid wasting memory.
>   *
> - * The second type of buffer is the multi-page buffer. These are always made
> + * The second type is a single folio buffer - this may be a high order folio or
> + * just a single page sized folio, but either way they get treated the same way
> + * by the rest of the code - the buffer memory spans a single contiguous memory
> + * region that we don't have to map and unmap to access the data directly.
> + *
> + * The third type of buffer is the multi-page buffer. These are always made
>   * up of single pages so that they can be fed to vmap_ram() to return a
>   * contiguous memory region we can access the data through, or mark it as
>   * XBF_UNMAPPED and access the data directly through individual page_address()
> @@ -354,6 +360,7 @@ xfs_buf_alloc_backing_mem(
>  {
>  	size_t		size = BBTOB(bp->b_length);
>  	gfp_t		gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN;
> +	struct folio	*folio;
>  	long		filled = 0;
>  
>  	if (xfs_buftarg_is_mem(bp->b_target))
> @@ -375,7 +382,46 @@ xfs_buf_alloc_backing_mem(
>  	if (size < PAGE_SIZE && is_power_of_2(size))
>  		return xfs_buf_alloc_kmem(bp, size, gfp_mask);
>  
> -	/* Make sure that we have a page list */
> +	/* Assure zeroed buffer for non-read cases. */
> +	if (!(flags & XBF_READ))
> +		gfp_mask |= __GFP_ZERO;

Didn't this get added ten lines up in "xfs: remove the kmalloc to page
allocator fallback"?

> +
> +	/*
> +	 * Don't bother with the retry loop for single PAGE allocations, there
> +	 * is litte changes this can be better than the VM version.

Er... I don't understand the second half of this sentence; is this what
you're trying to communicate?:

"Don't bother with the retry loop for single-page allocations; vmalloc
won't do any better."

> +	 */
> +	if (size <= PAGE_SIZE)
> +		gfp_mask |= __GFP_NOFAIL;
> +
> +	/*
> +	 * Optimistically attempt a single high order folio allocation for
> +	 * larger than PAGE_SIZE buffers.
> +	 *
> +	 * Allocating a high order folio makes the assumption that buffers are a
> +	 * power-of-2 size, matching the power-of-2 folios sizes available.
> +	 *
> +	 * The exception here are user xattr data buffers, which can be arbitrarily
> +	 * sized up to 64kB plus structure metadata, skip straight to the vmalloc
> +	 * path for them instead of wasting memory.
> +	 * here.

Nit: "...path for them instead of wasting memory here."

--D

> +	 */
> +	if (size > PAGE_SIZE && !is_power_of_2(size))
> +		goto fallback;
> +	folio = folio_alloc(gfp_mask, get_order(size));
> +	if (!folio) {
> +		if (size <= PAGE_SIZE)
> +			return -ENOMEM;
> +		goto fallback;
> +	}
> +	bp->b_addr = folio_address(folio);
> +	bp->b_page_array[0] = &folio->page;
> +	bp->b_pages = bp->b_page_array;
> +	bp->b_page_count = 1;
> +	bp->b_flags |= _XBF_PAGES;
> +	return 0;
> +
> +fallback:
> +	/* Fall back to allocating an array of single page folios. */
>  	bp->b_page_count = DIV_ROUND_UP(size, PAGE_SIZE);
>  	if (bp->b_page_count <= XB_PAGES) {
>  		bp->b_pages = bp->b_page_array;
> @@ -1529,7 +1575,7 @@ xfs_buf_submit_bio(
>  	bio->bi_private = bp;
>  	bio->bi_end_io = xfs_buf_bio_end_io;
>  
> -	if (bp->b_flags & _XBF_KMEM) {
> +	if (bp->b_page_count == 1) {
>  		__bio_add_page(bio, virt_to_page(bp->b_addr), size,
>  				offset_in_page(bp->b_addr));
>  	} else {
> -- 
> 2.45.2
> 
>

next prev parent reply	other threads:[~2025-02-26 17:33 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-26 15:51 use folios and vmalloc for buffer cache backing memory Christoph Hellwig
2025-02-26 15:51 ` [PATCH 01/12] xfs: unmapped buffer item size straddling mismatch Christoph Hellwig
2025-02-26 15:51 ` [PATCH 02/12] xfs: add a fast path to xfs_buf_zero when b_addr is set Christoph Hellwig
2025-02-26 17:00   ` Darrick J. Wong
2025-02-26 15:51 ` [PATCH 03/12] xfs: remove xfs_buf.b_offset Christoph Hellwig
2025-02-26 17:00   ` Darrick J. Wong
2025-02-26 15:51 ` [PATCH 04/12] xfs: remove xfs_buf_is_vmapped Christoph Hellwig
2025-02-26 17:02   ` Darrick J. Wong
2025-02-26 15:51 ` [PATCH 05/12] xfs: refactor backing memory allocations for buffers Christoph Hellwig
2025-02-26 17:08   ` Darrick J. Wong
2025-02-26 15:51 ` [PATCH 06/12] xfs: remove the kmalloc to page allocator fallback Christoph Hellwig
2025-02-26 17:22   ` Darrick J. Wong
2025-03-04 14:05     ` Christoph Hellwig
2025-02-26 15:51 ` [PATCH 07/12] xfs: convert buffer cache to use high order folios Christoph Hellwig
2025-02-26 17:33   ` Darrick J. Wong [this message]
2025-03-04 14:06     ` Christoph Hellwig
2025-02-26 15:51 ` [PATCH 08/12] xfs: kill XBF_UNMAPPED Christoph Hellwig
2025-02-26 15:51 ` [PATCH 09/12] xfs: buffer items don't straddle pages anymore Christoph Hellwig
2025-02-26 15:51 ` [PATCH 10/12] xfs: use vmalloc instead of vm_map_area for buffer backing memory Christoph Hellwig
2025-02-26 18:02   ` Darrick J. Wong
2025-03-04 14:10     ` Christoph Hellwig
2025-02-26 15:51 ` [PATCH 11/12] xfs: cleanup mapping tmpfs folios into the buffer cache Christoph Hellwig
2025-02-26 17:39   ` Darrick J. Wong
2025-03-04 14:11     ` Christoph Hellwig
2025-02-26 15:51 ` [PATCH 12/12] xfs: trace what memory backs a buffer Christoph Hellwig
2025-02-26 16:45   ` Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2025-03-05 14:05 use folios and vmalloc for buffer cache backing memory v2 Christoph Hellwig
2025-03-05 14:05 ` [PATCH 07/12] xfs: convert buffer cache to use high order folios Christoph Hellwig
2025-03-05 18:20   ` Darrick J. Wong
2025-03-05 20:50   ` Dave Chinner
2025-03-05 23:33     ` Christoph Hellwig
2025-03-10 13:18     ` Christoph Hellwig
2025-03-10 13:19 use folios and vmalloc for buffer cache backing memory v3 Christoph Hellwig
2025-03-10 13:19 ` [PATCH 07/12] xfs: convert buffer cache to use high order folios Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250226173333.GR6242@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=cem@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox