public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <wqu@suse.com>
To: "JP Kobryn (Meta)" <jp.kobryn@linux.dev>,
	boris@bur.io, mark@harmstone.com, clm@fb.com, dsterba@suse.com,
	linux-btrfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-team@meta.com
Subject: Re: [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead
Date: Fri, 20 Mar 2026 20:42:38 +1030	[thread overview]
Message-ID: <0585bdae-de5b-4aaa-bb0c-4bbe0c2fca89@suse.com> (raw)
In-Reply-To: <20260320073445.80218-3-jp.kobryn@linux.dev>



On 2026/3/20 18:04, JP Kobryn (Meta) wrote:
> Prevent direct reclaim during compressed readahead. This is achieved by
> passing specific GFP flags whenever the bio is marked for readahead. The
> flags are similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM. Also,
> __GFP_NOWARN is added since these allocations are allowed to fail. Demand
> reads still use full GFP_NOFS and will enter reclaim if needed.

I believe it would be more convincing to explain why the current gfp 
flags cause the problem you mentioned in the cover letter.

> 
> btrfs_submit_compressed_read() now makes use of the new gfp_t API for
> allocations within. Since non-readahead code may call this function, the
> bio flags are inspected to determine whether direct reclaim should be
> restricted or not.
> 
> add_ra_bio_pages() gains a bool parameter which allows callers to specify
> if they want to allow direct reclaim or not. In either case, the NOWARN
> flag was added unconditionally since the allocations are speculative.

After reading the code, I feel that we shouldn't act on behalf of the 
MM layer to add the next few folios into the page cache.

On the other hand, with the incoming large folio support, we will skip 
this readahead completely for large folios.

I know this is not optimal as the next few folios may still belong to 
the same compressed extent and will cause re-read and re-decompression.

Thus I'm wondering, for your specific workload, would disabling 
compressed readahead completely and fully relying on large folios help?

If the performance is acceptable, I'd prefer to disable compressed 
readahead completely and rely on large folios instead.

(Now I understand why other fses with compression support rely 
completely on a fixed IO size.)

Thanks,
Qu


> 
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
> ---
>   fs/btrfs/compression.c | 33 ++++++++++++++++++++++++++++-----
>   1 file changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index ae9cb5b7676c..f32cfc933bee 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -372,7 +372,8 @@ struct compressed_bio *btrfs_alloc_compressed_write(struct btrfs_inode *inode,
>   static noinline int add_ra_bio_pages(struct inode *inode,
>   				     u64 compressed_end,
>   				     struct compressed_bio *cb,
> -				     int *memstall, unsigned long *pflags)
> +				     int *memstall, unsigned long *pflags,
> +				     bool direct_reclaim)
>   {
>   	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
>   	pgoff_t end_index;
> @@ -380,6 +381,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   	u64 cur = cb->orig_bbio->file_offset + orig_bio->bi_iter.bi_size;
>   	u64 isize = i_size_read(inode);
>   	int ret;
> +	gfp_t constraint_gfp, cache_gfp;
>   	struct folio *folio;
>   	struct extent_map *em;
>   	struct address_space *mapping = inode->i_mapping;
> @@ -409,6 +411,14 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   
>   	end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
>   
> +	if (!direct_reclaim) {
> +		constraint_gfp = ~(__GFP_FS | __GFP_DIRECT_RECLAIM);
> +		cache_gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> +	} else {
> +		constraint_gfp = ~__GFP_FS;
> +		cache_gfp = GFP_NOFS | __GFP_NOWARN;
> +	}
> +
>   	while (cur < compressed_end) {
>   		pgoff_t page_end;
>   		pgoff_t pg_index = cur >> PAGE_SHIFT;
> @@ -438,12 +448,13 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   			continue;
>   		}
>   
> -		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping, ~__GFP_FS),
> +		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping,
> +					    constraint_gfp) | __GFP_NOWARN,
>   					    0, NULL);
>   		if (!folio)
>   			break;
>   
> -		if (filemap_add_folio(mapping, folio, pg_index, GFP_NOFS)) {
> +		if (filemap_add_folio(mapping, folio, pg_index, cache_gfp)) {
>   			/* There is already a page, skip to page end */
>   			cur += folio_size(folio);
>   			folio_put(folio);
> @@ -536,6 +547,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   	unsigned int compressed_len;
>   	const u32 min_folio_size = btrfs_min_folio_size(fs_info);
>   	u64 file_offset = bbio->file_offset;
> +	gfp_t gfp;
>   	u64 em_len;
>   	u64 em_start;
>   	struct extent_map *em;
> @@ -543,6 +555,17 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   	int memstall = 0;
>   	int ret;
>   
> +	/*
> +	 * If this is a readahead bio, prevent direct reclaim. This is done to
> +	 * avoid stalling on speculative allocations when memory pressure is
> +	 * high. The demand fault will retry with GFP_NOFS and enter direct
> +	 * reclaim if needed.
> +	 */
> +	if (bbio->bio.bi_opf & REQ_RAHEAD)
> +		gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> +	else
> +		gfp = GFP_NOFS;
> +
>   	/* we need the actual starting offset of this extent in the file */
>   	read_lock(&em_tree->lock);
>   	em = btrfs_lookup_extent_mapping(em_tree, file_offset, fs_info->sectorsize);
> @@ -573,7 +596,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   		struct folio *folio;
>   		u32 cur_len = min(compressed_len - i * min_folio_size, min_folio_size);
>   
> -		folio = btrfs_alloc_compr_folio(fs_info);
> +		folio = btrfs_alloc_compr_folio_gfp(fs_info, gfp);
>   		if (!folio) {
>   			ret = -ENOMEM;
>   			goto out_free_bio;
> @@ -589,7 +612,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   	ASSERT(cb->bbio.bio.bi_iter.bi_size == compressed_len);
>   
>   	add_ra_bio_pages(&inode->vfs_inode, em_start + em_len, cb, &memstall,
> -			 &pflags);
> +			 &pflags, !(bbio->bio.bi_opf & REQ_RAHEAD));
>   
>   	cb->len = bbio->bio.bi_iter.bi_size;
>   	cb->bbio.bio.bi_iter.bi_sector = bbio->bio.bi_iter.bi_sector;


Thread overview: 11+ messages
2026-03-20  7:34 [PATCH 0/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
2026-03-20  7:34 ` [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios JP Kobryn (Meta)
2026-03-20 11:11   ` Mark Harmstone
2026-03-20 17:55   ` David Sterba
2026-03-20  7:34 ` [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
2026-03-20  7:36   ` Christoph Hellwig
2026-03-20 16:17     ` JP Kobryn (Meta)
2026-03-20 10:12   ` Qu Wenruo [this message]
2026-03-20 11:14     ` Mark Harmstone
2026-03-20 11:11   ` Mark Harmstone
2026-03-20 11:10 ` [PATCH 0/2] " Mark Harmstone
