Linux Btrfs filesystem development
From: David Sterba <dsterba@suse.cz>
To: "JP Kobryn (Meta)" <jp.kobryn@linux.dev>
Cc: mark@harmstone.com, boris@bur.io, wqu@suse.com, dsterba@suse.com,
	clm@fb.com, linux-btrfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-team@meta.com
Subject: Re: [RESEND PATCH v2] btrfs: prevent direct reclaim during compressed readahead
Date: Mon, 30 Mar 2026 21:49:47 +0200
Message-ID: <20260330194947.GA5735@twin.jikos.cz>
In-Reply-To: <20260328214619.114790-1-jp.kobryn@linux.dev>

On Sat, Mar 28, 2026 at 02:46:19PM -0700, JP Kobryn (Meta) wrote:
> Under memory pressure, direct reclaim can kick in during compressed
> readahead. This puts the associated task into D-state. Then shrink_lruvec()
> disables interrupts when acquiring the LRU lock. Under heavy pressure,
> we've observed reclaim can run long enough that the CPU becomes prone to
> CSD lock stalls since it cannot service incoming IPIs. Although the CSD
> lock stalls are the worst case scenario, we have found many more subtle
> occurrences of this latency on the order of seconds, over a minute in some
> cases.
> 
> Prevent direct reclaim during compressed readahead. This is achieved by
> using different GFP flags at key points when the bio is marked for
> readahead.
> 
> There are two functions that allocate during compressed readahead:
> btrfs_alloc_compr_folio() and add_ra_bio_pages(). Both currently use
> GFP_NOFS which includes __GFP_DIRECT_RECLAIM.
> 
> For the internal API call btrfs_alloc_compr_folio(), the signature changes
> to accept an additional gfp_t parameter. At the readahead call site, it
> gets flags similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM.
> __GFP_NOWARN is added since these allocations are allowed to fail. Demand
> reads still use full GFP_NOFS and will enter reclaim if needed. All other
> existing call sites of btrfs_alloc_compr_folio() now explicitly pass
> GFP_NOFS to retain their current behavior.
> 
> add_ra_bio_pages() gains a bool parameter which allows callers to specify
> if they want to allow direct reclaim or not. In either case, the
> __GFP_NOWARN flag was added unconditionally since the allocations are
> speculative.
> 
> There has been some previous work done on calling add_ra_bio_pages() [0].
> This patch is complementary: where that patch reduces call frequency, this
> patch reduces the latency associated with those calls.
> 
> [0] https://lore.kernel.org/linux-btrfs/656838ec1232314a2657716e59f4f15a8eadba64.1751492111.git.boris@bur.io/
> 
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
> Reviewed-by: Mark Harmstone <mark@harmstone.com>
> ---
> v2:
>  - dropped patch 1/2, squashed into single patch based on David's feedback
>  - changed btrfs_alloc_compr_folio() signature instead of new _gfp variant
>  - update other existing callers to pass GFP_NOFS explicitly
> 
> v1: https://lore.kernel.org/linux-btrfs/20260320073445.80218-1-jp.kobryn@linux.dev/
> 
>  fs/btrfs/compression.c | 42 +++++++++++++++++++++++++++++++++++-------
>  fs/btrfs/compression.h |  2 +-
>  fs/btrfs/inode.c       |  2 +-
>  fs/btrfs/lzo.c         |  6 +++---
>  fs/btrfs/zlib.c        |  6 +++---
>  fs/btrfs/zstd.c        |  6 +++---
>  6 files changed, 46 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index e897342bece1f..8f33ef48b501e 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -180,7 +180,7 @@ static unsigned long btrfs_compr_pool_scan(struct shrinker *sh, struct shrink_co
>  /*
>   * Common wrappers for page allocation from compression wrappers
>   */
> -struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
> +struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info, gfp_t gfp)
>  {
>  	struct folio *folio = NULL;
>  
> @@ -200,7 +200,7 @@ struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
>  		return folio;
>  
>  alloc:
> -	return folio_alloc(GFP_NOFS, fs_info->block_min_order);
> +	return folio_alloc(gfp, fs_info->block_min_order);
>  }
>  
>  void btrfs_free_compr_folio(struct folio *folio)
> @@ -368,7 +368,8 @@ struct compressed_bio *btrfs_alloc_compressed_write(struct btrfs_inode *inode,
>  static noinline int add_ra_bio_pages(struct inode *inode,
>  				     u64 compressed_end,
>  				     struct compressed_bio *cb,
> -				     int *memstall, unsigned long *pflags)
> +				     int *memstall, unsigned long *pflags,
> +				     bool direct_reclaim)
>  {
>  	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
>  	pgoff_t end_index;
> @@ -376,6 +377,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>  	u64 cur = cb->orig_bbio->file_offset + orig_bio->bi_iter.bi_size;
>  	u64 isize = i_size_read(inode);
>  	int ret;
> +	gfp_t constraint_gfp, cache_gfp;
>  	struct folio *folio;
>  	struct extent_map *em;
>  	struct address_space *mapping = inode->i_mapping;
> @@ -405,6 +407,19 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>  
>  	end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
>  
> +	/*
> +	 * Avoid direct reclaim when the caller does not allow it.
> +	 * Since add_ra_bio_pages is always speculative, suppress
> +	 * allocation warnings in either case.
> +	 */
> +	if (!direct_reclaim) {
> +		constraint_gfp = ~(__GFP_FS | __GFP_DIRECT_RECLAIM);
> +		cache_gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> +	} else {
> +		constraint_gfp = ~__GFP_FS;
> +		cache_gfp = GFP_NOFS | __GFP_NOWARN;
> +	}
> +
>  	while (cur < compressed_end) {
>  		pgoff_t page_end;
>  		pgoff_t pg_index = cur >> PAGE_SHIFT;
> @@ -434,12 +449,13 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>  			continue;
>  		}
>  
> -		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping, ~__GFP_FS),
> +		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping,
> +					    constraint_gfp) | __GFP_NOWARN,

IMHO it would be better to fold __GFP_NOWARN into the definition of
constraint_gfp so it's all done in one go.
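
To make the flag arithmetic in the hunk above easier to follow, here is a
minimal userspace sketch of how the two masks come together. It is only a
model, not kernel code: the bit values are made up for illustration,
model_gfp_constraint(), ra_cache_gfp() and ra_folio_gfp() are hypothetical
helper names, and model_gfp_constraint() assumes that
mapping_gfp_constraint() simply ANDs the mapping's mask with the constraint.

```c
/*
 * Userspace model only. The bit values are invented for this sketch and
 * do not match the kernel's; only the relationships between flags matter.
 */
#include <stdbool.h>

typedef unsigned int gfp_t;

#define __GFP_IO             0x01u
#define __GFP_FS             0x02u
#define __GFP_DIRECT_RECLAIM 0x04u
#define __GFP_KSWAPD_RECLAIM 0x08u
#define __GFP_NOWARN         0x10u

#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define GFP_NOFS      (__GFP_RECLAIM | __GFP_IO)

/* Assumed behaviour of mapping_gfp_constraint(): AND with the mapping mask. */
static gfp_t model_gfp_constraint(gfp_t mapping_gfp, gfp_t constraint)
{
	return mapping_gfp & constraint;
}

/* Mask used for the page cache insertion, as in the quoted cache_gfp. */
static gfp_t ra_cache_gfp(bool direct_reclaim)
{
	gfp_t gfp = GFP_NOFS | __GFP_NOWARN;

	if (!direct_reclaim)
		gfp &= ~__GFP_DIRECT_RECLAIM;
	return gfp;
}

/* Mask used for the folio allocation, as in the quoted constraint_gfp. */
static gfp_t ra_folio_gfp(gfp_t mapping_gfp, bool direct_reclaim)
{
	gfp_t constraint = ~__GFP_FS;

	if (!direct_reclaim)
		constraint &= ~__GFP_DIRECT_RECLAIM;
	/* __GFP_NOWARN is ORed in after the AND, so it is always set. */
	return model_gfp_constraint(mapping_gfp, constraint) | __GFP_NOWARN;
}
```

One consequence of the AND, if that assumption holds: a bit set in the
constraint mask only survives when the mapping's own mask already has it, so
folding __GFP_NOWARN into constraint_gfp would presumably still need the flag
to be ORed into the final result somewhere.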
