Re: [PATCH v9] btrfs: prefer to allocate larger folio for metadata

From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v9] btrfs: prefer to allocate larger folio for metadata
Date: Fri, 2 Aug 2024 12:00:31 -0400	[thread overview]
Message-ID: <20240802160031.GC6306@perftesting> (raw)
In-Reply-To: <ef421f88bfa5cf4fd1d4293a8f27cfc97d5d10e4.1722557590.git.wqu@suse.com>

On Fri, Aug 02, 2024 at 09:48:00AM +0930, Qu Wenruo wrote:
> Since btrfs metadata always has a fixed size (nodesize, determined at
> mkfs time, defaulting to 16K), and btrfs has full control of the folios
> (reads are triggered internally, with no read/readahead callbacks), it's
> the best place to experiment with larger folios inside btrfs.
> 
> To enable larger folios, btrfs has to meet the following conditions:
> 
> - The extent buffer start is aligned to nodesize
>   This should be the common case for any btrfs in the last 5 years.
> 
> - The nodesize is larger than page size
> 
> - The MM layer can fulfill our larger folio allocation
>   The larger folio will cover exactly the metadata size (nodesize).
> 
> If any of the conditions is not met, we just fall back to page-sized
> folios and proceed as usual.
> This means we can have mixed orders for btrfs metadata.
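A minimal user-space sketch of the eligibility rules listed above, assuming page size and nodesize are powers of two and sizes are in bytes. The helper name eb_folio_order() is hypothetical and not taken from the patch; it returns the folio order that would cover exactly one nodesize, or 0 (page-sized folios) when a condition fails. The third condition (the MM layer actually satisfying the allocation) is not modelled here, since it is only known when the allocation is attempted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): folio order for one extent
 * buffer. Returns 0 when a larger folio is not allowed, so the caller
 * falls back to page-sized folios.
 */
static unsigned int eb_folio_order(uint64_t eb_start, uint32_t nodesize,
				   uint32_t page_size)
{
	unsigned int order = 0;

	assert(nodesize > 0 && page_size > 0);

	/* Condition: nodesize must be larger than the page size. */
	if (nodesize <= page_size)
		return 0;
	/* Condition: the extent buffer start must be nodesize aligned. */
	if (eb_start % nodesize != 0)
		return 0;
	/* One folio of 2^order pages covers exactly nodesize bytes. */
	while ((page_size << order) < nodesize)
		order++;
	return order;
}
```

With the default 16K nodesize on 4K pages, an aligned extent buffer gets an order 2 folio (4 pages), while an unaligned one stays at order 0.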
> 
> Thus there are several new corner cases with the mixed orders:
> 
> 1) New filemap_add_folio() -EEXIST failure cases
>    For mixed order cases, filemap_add_folio() can return -EEXIST
>    while filemap_lock_folio() returns -ENOENT.
>    In this case there are 2 possible reasons:
>    * The folio got reclaimed between add and lock
>    * The larger folio conflicts with smaller ones in the range
> 
>    We have no way to distinguish them, so for the larger folio case we
>    fall back to order 0 and retry, as that rules out the folio conflict
>    case.
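The decision in case 1) can be sketched as a pure function over the two results described above, under the assumption that the add step returns 0 or a negative errno (as filemap_add_folio() does) and that the lookup result is reduced to "found a folio or not". The enum and function names are hypothetical. The order 0 branch reflects the reasoning in the text: once the conflict case is ruled out, the only remaining explanation for -EEXIST followed by -ENOENT is reclaim, so retrying at the same order is the assumed behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical next step after one insertion attempt. */
enum next_step {
	STEP_INSERTED,     /* our folio went into the page cache */
	STEP_USE_EXISTING, /* a folio is already there: compare sizes next */
	STEP_RETRY_SAME,   /* order 0, nothing found: it was reclaimed */
	STEP_RETRY_ORDER0, /* larger folio: reclaim vs. conflict is ambiguous */
	STEP_FAIL,         /* any other error goes back to the caller */
};

/*
 * add_ret:    0 or a negative errno, modelling filemap_add_folio()
 * lock_found: true if the follow-up lookup found and locked a folio,
 *             false if it reported -ENOENT
 * order:      order of the folio we tried to add
 */
static enum next_step next_step_after_add(int add_ret, bool lock_found,
					  unsigned int order)
{
	assert(add_ret <= 0);

	if (add_ret == 0)
		return STEP_INSERTED;
	if (add_ret != -EEXIST)
		return STEP_FAIL;
	if (lock_found)
		return STEP_USE_EXISTING;
	/*
	 * -EEXIST but nothing to lock: reclaim or a conflict with smaller
	 * folios. Retrying at order 0 rules out the conflict case.
	 */
	return order > 0 ? STEP_RETRY_ORDER0 : STEP_RETRY_SAME;
}
```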
> 
> 2) The existing folio size may differ from the one we allocated
>    This check happens after the existing eb checks.
> 
> 2.1) The existing folio is larger than the allocated one
>      We need to free all allocated folios and use the existing larger
>      folio instead.
> 
> 2.2) The existing folio has the same size
>      Free the allocated one and reuse the page cache.
>      This is the existing path.
> 
> 2.3) The existing folio is smaller than the allocated one
>      Fall back to re-allocating order 0 folios instead.
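Cases 2.1) to 2.3) reduce to a three-way comparison between the size of the folio already in the page cache and the size of the one just allocated. The sketch below is a hypothetical model (the enum and function names are illustrative, not from the patch) that only picks the action; freeing folios and re-allocating are left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action when a folio already exists at the index. */
enum existing_action {
	REUSE_EXISTING_LARGER, /* 2.1: free all allocated folios, use existing */
	REUSE_EXISTING_SAME,   /* 2.2: free the allocated one, reuse the cache */
	REALLOC_ORDER0,        /* 2.3: re-allocate order 0 folios and retry */
};

static enum existing_action on_existing_folio(size_t existing_size,
					      size_t allocated_size)
{
	assert(existing_size > 0 && allocated_size > 0);

	if (existing_size > allocated_size)
		return REUSE_EXISTING_LARGER;
	if (existing_size == allocated_size)
		return REUSE_EXISTING_SAME;
	return REALLOC_ORDER0;
}
```

For example, finding a 4K folio where a 16K folio was just allocated selects REALLOC_ORDER0, since the larger folio cannot be inserted over the smaller one.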
> 
> Otherwise, all the needed infrastructure is already in place; we only
> need to try allocating a larger folio as our first attempt in
> alloc_eb_folio_array().
> 
> For now, the higher order allocation is only attempted, as a preference,
> on debug builds, until we have enough test coverage to push it to end
> users.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
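The allocation order described in the patch (try one larger folio first, otherwise fall back to page-sized folios) can be modelled in user space as below. Everything here is hypothetical: try_alloc_fn stands in for the page allocator, alloc_always() and alloc_order0_only() are sample allocators for illustration, and prefer_large stands in for whatever debug-build check the patch uses. This is a sketch of the control flow, not the body of alloc_eb_folio_array().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator: true if 2^order pages were obtained. */
typedef bool (*try_alloc_fn)(unsigned int order);

/* Sample allocators, for illustration only. */
static bool alloc_always(unsigned int order)
{
	(void)order;
	return true;
}

static bool alloc_order0_only(unsigned int order)
{
	return order == 0;
}

/*
 * nr_pages:     nodesize / page size (a power of two)
 * eligible:     alignment and size conditions hold
 * prefer_large: stand-in for "debug build"
 * Returns the number of folios allocated (0 on failure); *order_out gets
 * their order.
 */
static size_t alloc_eb_folios_model(size_t nr_pages, bool eligible,
				    bool prefer_large, try_alloc_fn try_alloc,
				    unsigned int *order_out)
{
	unsigned int order = 0;

	assert(nr_pages > 0 && (nr_pages & (nr_pages - 1)) == 0);
	while (((size_t)1 << order) < nr_pages)
		order++;

	/* First try: one folio covering the whole node. */
	if (prefer_large && eligible && order > 0 && try_alloc(order)) {
		*order_out = order;
		return 1;
	}

	/* Fallback: one order 0 folio per page, as usual. */
	for (size_t i = 0; i < nr_pages; i++) {
		if (!try_alloc(0))
			return 0; /* real code would release what it got */
	}
	*order_out = 0;
	return nr_pages;
}
```

With a 16K nodesize on 4K pages (nr_pages = 4), a successful first try yields a single order 2 folio; if the larger allocation fails or is not preferred, four order 0 folios are allocated instead.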

I think this is as close as we're going to get without testing it and finding
the sharp edges. You can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

Thread overview: 2+ messages
2024-08-02  0:18 [PATCH v9] btrfs: prefer to allocate larger folio for metadata Qu Wenruo
2024-08-02 16:00 ` Josef Bacik [this message]
