public inbox for linux-btrfs@vger.kernel.org
* [PATCH 0/2] btrfs: prevent direct reclaim during compressed readahead
@ 2026-03-20  7:34 JP Kobryn (Meta)
  2026-03-20  7:34 ` [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios JP Kobryn (Meta)
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-20  7:34 UTC (permalink / raw)
  To: boris, mark, clm, wqu, dsterba, linux-btrfs; +Cc: linux-kernel, linux-team

We're finding that, under memory pressure, direct reclaim kicks in during
compressed readahead, putting the associated task into D-state.
shrink_lruvec() then disables interrupts while acquiring the LRU lock.
Under heavy pressure, reclaim can run long enough that the CPU becomes
prone to CSD lock stalls, since it cannot service incoming IPIs. Although
CSD lock stalls are the worst-case scenario, we have found many more
subtle occurrences of this latency, on the order of seconds and over a
minute in some cases.

Prevent direct reclaim during compressed readahead. This is achieved by
using different GFP flags whenever the bio is marked for readahead. The
flags are similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM.  Also,
__GFP_NOWARN is added since these allocations are allowed to fail. Demand
reads still use full GFP_NOFS and will enter reclaim if needed.

There has been previous work to reduce the frequency of calling
add_ra_bio_pages() [0]. This series is complementary in that it reduces
the latency associated with those calls.

[0] https://lore.kernel.org/linux-btrfs/656838ec1232314a2657716e59f4f15a8eadba64.1751492111.git.boris@bur.io/

JP Kobryn (Meta) (2):
  btrfs: additional gfp api for allocating compressed folios
  btrfs: prevent direct reclaim during compressed readahead

 fs/btrfs/compression.c | 44 ++++++++++++++++++++++++++++++++++--------
 fs/btrfs/compression.h |  1 +
 2 files changed, 37 insertions(+), 8 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios
  2026-03-20  7:34 [PATCH 0/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
@ 2026-03-20  7:34 ` JP Kobryn (Meta)
  2026-03-20 11:11   ` Mark Harmstone
  2026-03-20 17:55   ` David Sterba
  2026-03-20  7:34 ` [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
  2026-03-20 11:10 ` [PATCH 0/2] " Mark Harmstone
  2 siblings, 2 replies; 11+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-20  7:34 UTC (permalink / raw)
  To: boris, mark, clm, wqu, dsterba, linux-btrfs; +Cc: linux-kernel, linux-team

btrfs_alloc_compr_folio() assumes all callers want GFP_NOFS. This is fine
for most cases, but some call sites would benefit from different flags.
One such case is preventing direct reclaim from occurring during readahead
allocations: with unbounded reclaim during this time, noticeable latency
occurs under high memory pressure.

Provide an additional API that accepts a gfp_t parameter, giving callers
flexibility over the characteristics of their allocation.

Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
---
 fs/btrfs/compression.c | 9 +++++++--
 fs/btrfs/compression.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 192f133d9eb5..ae9cb5b7676c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -180,7 +180,7 @@ static unsigned long btrfs_compr_pool_scan(struct shrinker *sh, struct shrink_co
 /*
  * Common wrappers for page allocation from compression wrappers
  */
-struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
+struct folio *btrfs_alloc_compr_folio_gfp(struct btrfs_fs_info *fs_info, gfp_t gfp)
 {
 	struct folio *folio = NULL;
 
@@ -200,7 +200,12 @@ struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
 		return folio;
 
 alloc:
-	return folio_alloc(GFP_NOFS, fs_info->block_min_order);
+	return folio_alloc(gfp, fs_info->block_min_order);
+}
+
+struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
+{
+	return btrfs_alloc_compr_folio_gfp(fs_info, GFP_NOFS);
 }
 
 void btrfs_free_compr_folio(struct folio *folio)
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 973530e9ce6c..6131c128dd21 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -99,6 +99,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio);
 int btrfs_compress_str2level(unsigned int type, const char *str, int *level_ret);
 
 struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info);
+struct folio *btrfs_alloc_compr_folio_gfp(struct btrfs_fs_info *fs_info, gfp_t gfp);
 void btrfs_free_compr_folio(struct folio *folio);
 
 struct workspace_manager {
-- 
2.52.0



* [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead
  2026-03-20  7:34 [PATCH 0/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
  2026-03-20  7:34 ` [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios JP Kobryn (Meta)
@ 2026-03-20  7:34 ` JP Kobryn (Meta)
  2026-03-20  7:36   ` Christoph Hellwig
                     ` (2 more replies)
  2026-03-20 11:10 ` [PATCH 0/2] " Mark Harmstone
  2 siblings, 3 replies; 11+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-20  7:34 UTC (permalink / raw)
  To: boris, mark, clm, wqu, dsterba, linux-btrfs; +Cc: linux-kernel, linux-team

Prevent direct reclaim during compressed readahead. This is achieved by
passing specific GFP flags whenever the bio is marked for readahead. The
flags are similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM. Also,
__GFP_NOWARN is added since these allocations are allowed to fail. Demand
reads still use full GFP_NOFS and will enter reclaim if needed.

btrfs_submit_compressed_read() now makes use of the new gfp_t API for
allocations within. Since non-readahead code may call this function, the
bio flags are inspected to determine whether direct reclaim should be
restricted or not.

add_ra_bio_pages() gains a bool parameter that lets callers specify
whether to allow direct reclaim. In either case, __GFP_NOWARN is added
unconditionally since the allocations are speculative.

Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
---
 fs/btrfs/compression.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ae9cb5b7676c..f32cfc933bee 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -372,7 +372,8 @@ struct compressed_bio *btrfs_alloc_compressed_write(struct btrfs_inode *inode,
 static noinline int add_ra_bio_pages(struct inode *inode,
 				     u64 compressed_end,
 				     struct compressed_bio *cb,
-				     int *memstall, unsigned long *pflags)
+				     int *memstall, unsigned long *pflags,
+				     bool direct_reclaim)
 {
 	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
 	pgoff_t end_index;
@@ -380,6 +381,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 	u64 cur = cb->orig_bbio->file_offset + orig_bio->bi_iter.bi_size;
 	u64 isize = i_size_read(inode);
 	int ret;
+	gfp_t constraint_gfp, cache_gfp;
 	struct folio *folio;
 	struct extent_map *em;
 	struct address_space *mapping = inode->i_mapping;
@@ -409,6 +411,14 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 
 	end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
 
+	if (!direct_reclaim) {
+		constraint_gfp = ~(__GFP_FS | __GFP_DIRECT_RECLAIM);
+		cache_gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
+	} else {
+		constraint_gfp = ~__GFP_FS;
+		cache_gfp = GFP_NOFS | __GFP_NOWARN;
+	}
+
 	while (cur < compressed_end) {
 		pgoff_t page_end;
 		pgoff_t pg_index = cur >> PAGE_SHIFT;
@@ -438,12 +448,13 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			continue;
 		}
 
-		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping, ~__GFP_FS),
+		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping,
+					    constraint_gfp) | __GFP_NOWARN,
 					    0, NULL);
 		if (!folio)
 			break;
 
-		if (filemap_add_folio(mapping, folio, pg_index, GFP_NOFS)) {
+		if (filemap_add_folio(mapping, folio, pg_index, cache_gfp)) {
 			/* There is already a page, skip to page end */
 			cur += folio_size(folio);
 			folio_put(folio);
@@ -536,6 +547,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
 	unsigned int compressed_len;
 	const u32 min_folio_size = btrfs_min_folio_size(fs_info);
 	u64 file_offset = bbio->file_offset;
+	gfp_t gfp;
 	u64 em_len;
 	u64 em_start;
 	struct extent_map *em;
@@ -543,6 +555,17 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
 	int memstall = 0;
 	int ret;
 
+	/*
+	 * If this is a readahead bio, prevent direct reclaim. This is done to
+	 * avoid stalling on speculative allocations when memory pressure is
+	 * high. The demand fault will retry with GFP_NOFS and enter direct
+	 * reclaim if needed.
+	 */
+	if (bbio->bio.bi_opf & REQ_RAHEAD)
+		gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
+	else
+		gfp = GFP_NOFS;
+
 	/* we need the actual starting offset of this extent in the file */
 	read_lock(&em_tree->lock);
 	em = btrfs_lookup_extent_mapping(em_tree, file_offset, fs_info->sectorsize);
@@ -573,7 +596,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
 		struct folio *folio;
 		u32 cur_len = min(compressed_len - i * min_folio_size, min_folio_size);
 
-		folio = btrfs_alloc_compr_folio(fs_info);
+		folio = btrfs_alloc_compr_folio_gfp(fs_info, gfp);
 		if (!folio) {
 			ret = -ENOMEM;
 			goto out_free_bio;
@@ -589,7 +612,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
 	ASSERT(cb->bbio.bio.bi_iter.bi_size == compressed_len);
 
 	add_ra_bio_pages(&inode->vfs_inode, em_start + em_len, cb, &memstall,
-			 &pflags);
+			 &pflags, !(bbio->bio.bi_opf & REQ_RAHEAD));
 
 	cb->len = bbio->bio.bi_iter.bi_size;
 	cb->bbio.bio.bi_iter.bi_sector = bbio->bio.bi_iter.bi_sector;
-- 
2.52.0



* Re: [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead
  2026-03-20  7:34 ` [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
@ 2026-03-20  7:36   ` Christoph Hellwig
  2026-03-20 16:17     ` JP Kobryn (Meta)
  2026-03-20 10:12   ` Qu Wenruo
  2026-03-20 11:11   ` Mark Harmstone
  2 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2026-03-20  7:36 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: boris, mark, clm, wqu, dsterba, linux-btrfs, linux-kernel,
	linux-team

On Fri, Mar 20, 2026 at 12:34:45AM -0700, JP Kobryn (Meta) wrote:
> Prevent direct reclaim during compressed readahead.

This completely fails to explain why you want that.



* Re: [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead
  2026-03-20  7:34 ` [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
  2026-03-20  7:36   ` Christoph Hellwig
@ 2026-03-20 10:12   ` Qu Wenruo
  2026-03-20 11:14     ` Mark Harmstone
  2026-03-20 11:11   ` Mark Harmstone
  2 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2026-03-20 10:12 UTC (permalink / raw)
  To: JP Kobryn (Meta), boris, mark, clm, dsterba, linux-btrfs
  Cc: linux-kernel, linux-team



On 2026/3/20 18:04, JP Kobryn (Meta) wrote:
> Prevent direct reclaim during compressed readahead. This is achieved by
> passing specific GFP flags whenever the bio is marked for readahead. The
> flags are similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM. Also,
> __GFP_NOWARN is added since these allocations are allowed to fail. Demand
> reads still use full GFP_NOFS and will enter reclaim if needed.

I believe it would be more convincing to explain why the current gfp
flags cause the problem you mentioned in the cover letter.

> 
> btrfs_submit_compressed_read() now makes use of the new gfp_t API for
> allocations within. Since non-readahead code may call this function, the
> bio flags are inspected to determine whether direct reclaim should be
> restricted or not.
> 
> add_ra_bio_pages() gains a bool parameter that lets callers specify
> whether to allow direct reclaim. In either case, __GFP_NOWARN is added
> unconditionally since the allocations are speculative.

After reading the code, I have a feeling that we shouldn't act on behalf
of the MM layer by adding the next few folios into the page cache.

On the other hand, with the incoming large folio support, we will
completely skip this readahead for large folios.

I know this is not optimal as the next few folios may still belong to 
the same compressed extent and will cause re-read and re-decompression.

Thus I'm wondering, for your specific workload, will disabling
compressed readahead completely and relying fully on large folios help?

If the performance is acceptable, I'd prefer to disable compressed 
readahead completely and rely on large folios instead.

(Now I understand why other filesystems with compression support rely
completely on a fixed IO size.)

Thanks,
Qu


> 
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
> ---
>   fs/btrfs/compression.c | 33 ++++++++++++++++++++++++++++-----
>   1 file changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index ae9cb5b7676c..f32cfc933bee 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -372,7 +372,8 @@ struct compressed_bio *btrfs_alloc_compressed_write(struct btrfs_inode *inode,
>   static noinline int add_ra_bio_pages(struct inode *inode,
>   				     u64 compressed_end,
>   				     struct compressed_bio *cb,
> -				     int *memstall, unsigned long *pflags)
> +				     int *memstall, unsigned long *pflags,
> +				     bool direct_reclaim)
>   {
>   	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
>   	pgoff_t end_index;
> @@ -380,6 +381,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   	u64 cur = cb->orig_bbio->file_offset + orig_bio->bi_iter.bi_size;
>   	u64 isize = i_size_read(inode);
>   	int ret;
> +	gfp_t constraint_gfp, cache_gfp;
>   	struct folio *folio;
>   	struct extent_map *em;
>   	struct address_space *mapping = inode->i_mapping;
> @@ -409,6 +411,14 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   
>   	end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
>   
> +	if (!direct_reclaim) {
> +		constraint_gfp = ~(__GFP_FS | __GFP_DIRECT_RECLAIM);
> +		cache_gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> +	} else {
> +		constraint_gfp = ~__GFP_FS;
> +		cache_gfp = GFP_NOFS | __GFP_NOWARN;
> +	}
> +
>   	while (cur < compressed_end) {
>   		pgoff_t page_end;
>   		pgoff_t pg_index = cur >> PAGE_SHIFT;
> @@ -438,12 +448,13 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   			continue;
>   		}
>   
> -		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping, ~__GFP_FS),
> +		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping,
> +					    constraint_gfp) | __GFP_NOWARN,
>   					    0, NULL);
>   		if (!folio)
>   			break;
>   
> -		if (filemap_add_folio(mapping, folio, pg_index, GFP_NOFS)) {
> +		if (filemap_add_folio(mapping, folio, pg_index, cache_gfp)) {
>   			/* There is already a page, skip to page end */
>   			cur += folio_size(folio);
>   			folio_put(folio);
> @@ -536,6 +547,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   	unsigned int compressed_len;
>   	const u32 min_folio_size = btrfs_min_folio_size(fs_info);
>   	u64 file_offset = bbio->file_offset;
> +	gfp_t gfp;
>   	u64 em_len;
>   	u64 em_start;
>   	struct extent_map *em;
> @@ -543,6 +555,17 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   	int memstall = 0;
>   	int ret;
>   
> +	/*
> +	 * If this is a readahead bio, prevent direct reclaim. This is done to
> +	 * avoid stalling on speculative allocations when memory pressure is
> +	 * high. The demand fault will retry with GFP_NOFS and enter direct
> +	 * reclaim if needed.
> +	 */
> +	if (bbio->bio.bi_opf & REQ_RAHEAD)
> +		gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> +	else
> +		gfp = GFP_NOFS;
> +
>   	/* we need the actual starting offset of this extent in the file */
>   	read_lock(&em_tree->lock);
>   	em = btrfs_lookup_extent_mapping(em_tree, file_offset, fs_info->sectorsize);
> @@ -573,7 +596,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   		struct folio *folio;
>   		u32 cur_len = min(compressed_len - i * min_folio_size, min_folio_size);
>   
> -		folio = btrfs_alloc_compr_folio(fs_info);
> +		folio = btrfs_alloc_compr_folio_gfp(fs_info, gfp);
>   		if (!folio) {
>   			ret = -ENOMEM;
>   			goto out_free_bio;
> @@ -589,7 +612,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   	ASSERT(cb->bbio.bio.bi_iter.bi_size == compressed_len);
>   
>   	add_ra_bio_pages(&inode->vfs_inode, em_start + em_len, cb, &memstall,
> -			 &pflags);
> +			 &pflags, !(bbio->bio.bi_opf & REQ_RAHEAD));
>   
>   	cb->len = bbio->bio.bi_iter.bi_size;
>   	cb->bbio.bio.bi_iter.bi_sector = bbio->bio.bi_iter.bi_sector;



* Re: [PATCH 0/2] btrfs: prevent direct reclaim during compressed readahead
  2026-03-20  7:34 [PATCH 0/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
  2026-03-20  7:34 ` [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios JP Kobryn (Meta)
  2026-03-20  7:34 ` [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
@ 2026-03-20 11:10 ` Mark Harmstone
  2 siblings, 0 replies; 11+ messages in thread
From: Mark Harmstone @ 2026-03-20 11:10 UTC (permalink / raw)
  To: JP Kobryn (Meta), boris, clm, wqu, dsterba, linux-btrfs
  Cc: linux-kernel, linux-team

On 20/03/2026 7.34 am, JP Kobryn (Meta) wrote:
> We're finding that, under memory pressure, direct reclaim kicks in during
> compressed readahead, putting the associated task into D-state.
> shrink_lruvec() then disables interrupts while acquiring the LRU lock.
> Under heavy pressure, reclaim can run long enough that the CPU becomes
> prone to CSD lock stalls, since it cannot service incoming IPIs. Although
> CSD lock stalls are the worst-case scenario, we have found many more
> subtle occurrences of this latency, on the order of seconds and over a
> minute in some cases.
> 
> Prevent direct reclaim during compressed readahead. This is achieved by
> using different GFP flags whenever the bio is marked for readahead. The
> flags are similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM.  Also,
> __GFP_NOWARN is added since these allocations are allowed to fail. Demand
> reads still use full GFP_NOFS and will enter reclaim if needed.

This seems a sensible change to me. Read-ahead is speculative, so it's 
better for it to fail rather than cause problems elsewhere.

> There has been previous work to reduce the frequency of calling
> add_ra_bio_pages() [0]. This series is complementary in that it reduces
> the latency associated with those calls.
> 
> [0] https://lore.kernel.org/linux-btrfs/656838ec1232314a2657716e59f4f15a8eadba64.1751492111.git.boris@bur.io/
> 
> JP Kobryn (Meta) (2):
>    btrfs: additional gfp api for allocating compressed folios
>    btrfs: prevent direct reclaim during compressed readahead
> 
>   fs/btrfs/compression.c | 44 ++++++++++++++++++++++++++++++++++--------
>   fs/btrfs/compression.h |  1 +
>   2 files changed, 37 insertions(+), 8 deletions(-)
> 



* Re: [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios
  2026-03-20  7:34 ` [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios JP Kobryn (Meta)
@ 2026-03-20 11:11   ` Mark Harmstone
  2026-03-20 17:55   ` David Sterba
  1 sibling, 0 replies; 11+ messages in thread
From: Mark Harmstone @ 2026-03-20 11:11 UTC (permalink / raw)
  To: JP Kobryn (Meta), boris, clm, wqu, dsterba, linux-btrfs
  Cc: linux-kernel, linux-team

Reviewed-by: Mark Harmstone <mark@harmstone.com>

On 20/03/2026 7.34 am, JP Kobryn (Meta) wrote:
> btrfs_alloc_compr_folio() assumes all callers want GFP_NOFS. This is fine
> for most cases, but some call sites would benefit from different flags.
> One such case is preventing direct reclaim from occurring during readahead
> allocations: with unbounded reclaim during this time, noticeable latency
> occurs under high memory pressure.
> 
> Provide an additional API that accepts a gfp_t parameter, giving callers
> flexibility over the characteristics of their allocation.
> 
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
> ---
>   fs/btrfs/compression.c | 9 +++++++--
>   fs/btrfs/compression.h | 1 +
>   2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 192f133d9eb5..ae9cb5b7676c 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -180,7 +180,7 @@ static unsigned long btrfs_compr_pool_scan(struct shrinker *sh, struct shrink_co
>   /*
>    * Common wrappers for page allocation from compression wrappers
>    */
> -struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
> +struct folio *btrfs_alloc_compr_folio_gfp(struct btrfs_fs_info *fs_info, gfp_t gfp)
>   {
>   	struct folio *folio = NULL;
>   
> @@ -200,7 +200,12 @@ struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
>   		return folio;
>   
>   alloc:
> -	return folio_alloc(GFP_NOFS, fs_info->block_min_order);
> +	return folio_alloc(gfp, fs_info->block_min_order);
> +}
> +
> +struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
> +{
> +	return btrfs_alloc_compr_folio_gfp(fs_info, GFP_NOFS);
>   }
>   
>   void btrfs_free_compr_folio(struct folio *folio)
> diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
> index 973530e9ce6c..6131c128dd21 100644
> --- a/fs/btrfs/compression.h
> +++ b/fs/btrfs/compression.h
> @@ -99,6 +99,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio);
>   int btrfs_compress_str2level(unsigned int type, const char *str, int *level_ret);
>   
>   struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info);
> +struct folio *btrfs_alloc_compr_folio_gfp(struct btrfs_fs_info *fs_info, gfp_t gfp);
>   void btrfs_free_compr_folio(struct folio *folio);
>   
>   struct workspace_manager {



* Re: [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead
  2026-03-20  7:34 ` [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
  2026-03-20  7:36   ` Christoph Hellwig
  2026-03-20 10:12   ` Qu Wenruo
@ 2026-03-20 11:11   ` Mark Harmstone
  2 siblings, 0 replies; 11+ messages in thread
From: Mark Harmstone @ 2026-03-20 11:11 UTC (permalink / raw)
  To: JP Kobryn (Meta), boris, clm, wqu, dsterba, linux-btrfs
  Cc: linux-kernel, linux-team

Reviewed-by: Mark Harmstone <mark@harmstone.com>

On 20/03/2026 7.34 am, JP Kobryn (Meta) wrote:
> Prevent direct reclaim during compressed readahead. This is achieved by
> passing specific GFP flags whenever the bio is marked for readahead. The
> flags are similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM. Also,
> __GFP_NOWARN is added since these allocations are allowed to fail. Demand
> reads still use full GFP_NOFS and will enter reclaim if needed.
> 
> btrfs_submit_compressed_read() now makes use of the new gfp_t API for
> allocations within. Since non-readahead code may call this function, the
> bio flags are inspected to determine whether direct reclaim should be
> restricted or not.
> 
> add_ra_bio_pages() gains a bool parameter that lets callers specify
> whether to allow direct reclaim. In either case, __GFP_NOWARN is added
> unconditionally since the allocations are speculative.
> 
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
> ---
>   fs/btrfs/compression.c | 33 ++++++++++++++++++++++++++++-----
>   1 file changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index ae9cb5b7676c..f32cfc933bee 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -372,7 +372,8 @@ struct compressed_bio *btrfs_alloc_compressed_write(struct btrfs_inode *inode,
>   static noinline int add_ra_bio_pages(struct inode *inode,
>   				     u64 compressed_end,
>   				     struct compressed_bio *cb,
> -				     int *memstall, unsigned long *pflags)
> +				     int *memstall, unsigned long *pflags,
> +				     bool direct_reclaim)
>   {
>   	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
>   	pgoff_t end_index;
> @@ -380,6 +381,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   	u64 cur = cb->orig_bbio->file_offset + orig_bio->bi_iter.bi_size;
>   	u64 isize = i_size_read(inode);
>   	int ret;
> +	gfp_t constraint_gfp, cache_gfp;
>   	struct folio *folio;
>   	struct extent_map *em;
>   	struct address_space *mapping = inode->i_mapping;
> @@ -409,6 +411,14 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   
>   	end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
>   
> +	if (!direct_reclaim) {
> +		constraint_gfp = ~(__GFP_FS | __GFP_DIRECT_RECLAIM);
> +		cache_gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> +	} else {
> +		constraint_gfp = ~__GFP_FS;
> +		cache_gfp = GFP_NOFS | __GFP_NOWARN;
> +	}
> +
>   	while (cur < compressed_end) {
>   		pgoff_t page_end;
>   		pgoff_t pg_index = cur >> PAGE_SHIFT;
> @@ -438,12 +448,13 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   			continue;
>   		}
>   
> -		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping, ~__GFP_FS),
> +		folio = filemap_alloc_folio(mapping_gfp_constraint(mapping,
> +					    constraint_gfp) | __GFP_NOWARN,
>   					    0, NULL);
>   		if (!folio)
>   			break;
>   
> -		if (filemap_add_folio(mapping, folio, pg_index, GFP_NOFS)) {
> +		if (filemap_add_folio(mapping, folio, pg_index, cache_gfp)) {
>   			/* There is already a page, skip to page end */
>   			cur += folio_size(folio);
>   			folio_put(folio);
> @@ -536,6 +547,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   	unsigned int compressed_len;
>   	const u32 min_folio_size = btrfs_min_folio_size(fs_info);
>   	u64 file_offset = bbio->file_offset;
> +	gfp_t gfp;
>   	u64 em_len;
>   	u64 em_start;
>   	struct extent_map *em;
> @@ -543,6 +555,17 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   	int memstall = 0;
>   	int ret;
>   
> +	/*
> +	 * If this is a readahead bio, prevent direct reclaim. This is done to
> +	 * avoid stalling on speculative allocations when memory pressure is
> +	 * high. The demand fault will retry with GFP_NOFS and enter direct
> +	 * reclaim if needed.
> +	 */
> +	if (bbio->bio.bi_opf & REQ_RAHEAD)
> +		gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> +	else
> +		gfp = GFP_NOFS;
> +
>   	/* we need the actual starting offset of this extent in the file */
>   	read_lock(&em_tree->lock);
>   	em = btrfs_lookup_extent_mapping(em_tree, file_offset, fs_info->sectorsize);
> @@ -573,7 +596,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   		struct folio *folio;
>   		u32 cur_len = min(compressed_len - i * min_folio_size, min_folio_size);
>   
> -		folio = btrfs_alloc_compr_folio(fs_info);
> +		folio = btrfs_alloc_compr_folio_gfp(fs_info, gfp);
>   		if (!folio) {
>   			ret = -ENOMEM;
>   			goto out_free_bio;
> @@ -589,7 +612,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>   	ASSERT(cb->bbio.bio.bi_iter.bi_size == compressed_len);
>   
>   	add_ra_bio_pages(&inode->vfs_inode, em_start + em_len, cb, &memstall,
> -			 &pflags);
> +			 &pflags, !(bbio->bio.bi_opf & REQ_RAHEAD));
>   
>   	cb->len = bbio->bio.bi_iter.bi_size;
>   	cb->bbio.bio.bi_iter.bi_sector = bbio->bio.bi_iter.bi_sector;



* Re: [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead
  2026-03-20 10:12   ` Qu Wenruo
@ 2026-03-20 11:14     ` Mark Harmstone
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Harmstone @ 2026-03-20 11:14 UTC (permalink / raw)
  To: Qu Wenruo, JP Kobryn (Meta), boris, clm, dsterba, linux-btrfs
  Cc: linux-kernel, linux-team

On 20/03/2026 10.12 am, Qu Wenruo wrote:
> 
> 
> On 2026/3/20 18:04, JP Kobryn (Meta) wrote:
>> Prevent direct reclaim during compressed readahead. This is achieved by
>> passing specific GFP flags whenever the bio is marked for readahead. The
>> flags are similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM. Also,
>> __GFP_NOWARN is added since these allocations are allowed to fail. Demand
>> reads still use full GFP_NOFS and will enter reclaim if needed.
> 
> I believe it would be more convincing to explain why the current gfp
> flags cause the problem you mentioned in the cover letter.
> 
>>
>> btrfs_submit_compressed_read() now makes use of the new gfp_t API for
>> allocations within. Since non-readahead code may call this function, the
>> bio flags are inspected to determine whether direct reclaim should be
>> restricted or not.
>>
>> add_ra_bio_pages() gains a bool parameter that lets callers specify
>> whether to allow direct reclaim. In either case, __GFP_NOWARN is added
>> unconditionally since the allocations are speculative.
> 
> After reading the code, I have a feeling that we shouldn't act on behalf
> of the MM layer by adding the next few folios into the page cache.

Your idea might have merit, but this is a quick fix for a problem that
JP has seen in production. Reworking the whole thing might be the
ultimate solution, but that's much riskier and will need more testing
than the proposed change.

> On the other hand, with the incoming large folio support, we will
> completely skip this readahead for large folios.
> 
> I know this is not optimal as the next few folios may still belong to 
> the same compressed extent and will cause re-read and re-decompression.
> 
> Thus I'm wondering, for your specific workload, will disabling
> compressed readahead completely and relying fully on large folios help?
> 
> If the performance is acceptable, I'd prefer to disable compressed 
> readahead completely and rely on large folios instead.
> 
> (Now I understand why other filesystems with compression support rely
> completely on a fixed IO size.)
> 
> Thanks,
> Qu
> 
> 
>>
>> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>
>> ---
>>   fs/btrfs/compression.c | 33 ++++++++++++++++++++++++++++-----
>>   1 file changed, 28 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
>> index ae9cb5b7676c..f32cfc933bee 100644
>> --- a/fs/btrfs/compression.c
>> +++ b/fs/btrfs/compression.c
>> @@ -372,7 +372,8 @@ struct compressed_bio *btrfs_alloc_compressed_write(struct btrfs_inode *inode,
>>   static noinline int add_ra_bio_pages(struct inode *inode,
>>                        u64 compressed_end,
>>                        struct compressed_bio *cb,
>> -                     int *memstall, unsigned long *pflags)
>> +                     int *memstall, unsigned long *pflags,
>> +                     bool direct_reclaim)
>>   {
>>       struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
>>       pgoff_t end_index;
>> @@ -380,6 +381,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>>       u64 cur = cb->orig_bbio->file_offset + orig_bio->bi_iter.bi_size;
>>       u64 isize = i_size_read(inode);
>>       int ret;
>> +    gfp_t constraint_gfp, cache_gfp;
>>       struct folio *folio;
>>       struct extent_map *em;
>>       struct address_space *mapping = inode->i_mapping;
>> @@ -409,6 +411,14 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>>       end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
>> +    if (!direct_reclaim) {
>> +        constraint_gfp = ~(__GFP_FS | __GFP_DIRECT_RECLAIM);
>> +        cache_gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
>> +    } else {
>> +        constraint_gfp = ~__GFP_FS;
>> +        cache_gfp = GFP_NOFS | __GFP_NOWARN;
>> +    }
>> +
>>       while (cur < compressed_end) {
>>           pgoff_t page_end;
>>           pgoff_t pg_index = cur >> PAGE_SHIFT;
>> @@ -438,12 +448,13 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>>               continue;
>>           }
>> -        folio = filemap_alloc_folio(mapping_gfp_constraint(mapping, ~__GFP_FS),
>> +        folio = filemap_alloc_folio(mapping_gfp_constraint(mapping,
>> +                        constraint_gfp) | __GFP_NOWARN,
>>                           0, NULL);
>>           if (!folio)
>>               break;
>> -        if (filemap_add_folio(mapping, folio, pg_index, GFP_NOFS)) {
>> +        if (filemap_add_folio(mapping, folio, pg_index, cache_gfp)) {
>>               /* There is already a page, skip to page end */
>>               cur += folio_size(folio);
>>               folio_put(folio);
>> @@ -536,6 +547,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>>       unsigned int compressed_len;
>>       const u32 min_folio_size = btrfs_min_folio_size(fs_info);
>>       u64 file_offset = bbio->file_offset;
>> +    gfp_t gfp;
>>       u64 em_len;
>>       u64 em_start;
>>       struct extent_map *em;
>> @@ -543,6 +555,17 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>>       int memstall = 0;
>>       int ret;
>> +    /*
>> +     * If this is a readahead bio, prevent direct reclaim. This is done to
>> +     * avoid stalling on speculative allocations when memory pressure is
>> +     * high. The demand fault will retry with GFP_NOFS and enter direct
>> +     * reclaim if needed.
>> +     */
>> +    if (bbio->bio.bi_opf & REQ_RAHEAD)
>> +        gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
>> +    else
>> +        gfp = GFP_NOFS;
>> +
>>       /* we need the actual starting offset of this extent in the file */
>>       read_lock(&em_tree->lock);
>>       em = btrfs_lookup_extent_mapping(em_tree, file_offset, fs_info->sectorsize);
>> @@ -573,7 +596,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>>           struct folio *folio;
>>           u32 cur_len = min(compressed_len - i * min_folio_size, min_folio_size);
>> -        folio = btrfs_alloc_compr_folio(fs_info);
>> +        folio = btrfs_alloc_compr_folio_gfp(fs_info, gfp);
>>           if (!folio) {
>>               ret = -ENOMEM;
>>               goto out_free_bio;
>> @@ -589,7 +612,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
>>       ASSERT(cb->bbio.bio.bi_iter.bi_size == compressed_len);
>>       add_ra_bio_pages(&inode->vfs_inode, em_start + em_len, cb, &memstall,
>> -             &pflags);
>> +             &pflags, !(bbio->bio.bi_opf & REQ_RAHEAD));
>>       cb->len = bbio->bio.bi_iter.bi_size;
>>       cb->bbio.bio.bi_iter.bi_sector = bbio->bio.bi_iter.bi_sector;
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead
  2026-03-20  7:36   ` Christoph Hellwig
@ 2026-03-20 16:17     ` JP Kobryn (Meta)
  0 siblings, 0 replies; 11+ messages in thread
From: JP Kobryn (Meta) @ 2026-03-20 16:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: boris, mark, clm, wqu, dsterba, linux-btrfs, linux-kernel,
	linux-team

On 3/20/26 12:36 AM, Christoph Hellwig wrote:
> On Fri, Mar 20, 2026 at 12:34:45AM -0700, JP Kobryn (Meta) wrote:
>> Prevent direct reclaim during compressed readahead.
> 
> This completely fails to explain why you want that.

I see that now.

Qu also had a good point about including info on why the current flags
are an issue. Some of the cover letter text would help; I'll bring
that over and expand on it so the commit message has the relevant details.


* Re: [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios
  2026-03-20  7:34 ` [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios JP Kobryn (Meta)
  2026-03-20 11:11   ` Mark Harmstone
@ 2026-03-20 17:55   ` David Sterba
  1 sibling, 0 replies; 11+ messages in thread
From: David Sterba @ 2026-03-20 17:55 UTC (permalink / raw)
  To: JP Kobryn (Meta)
  Cc: boris, mark, clm, wqu, dsterba, linux-btrfs, linux-kernel,
	linux-team

On Fri, Mar 20, 2026 at 12:34:44AM -0700, JP Kobryn (Meta) wrote:
> btrfs_alloc_compr_folio() assumes all callers want GFP_NOFS. This is fine
> for most cases; however, some call sites would benefit from different
> flags. One such case is preventing direct reclaim from occurring
> during readahead allocations. With unbounded reclaim during this time,
> noticeable latency will occur under high memory pressure.
> 
> Provide an additional API that accepts one additional gfp_t parameter,
> giving callers flexibility over the characteristics of their allocation.

If you still need to set the gfp flags in v2, please drop this patch and
extend btrfs_alloc_compr_folio() instead; the API is internal and we
don't need it fine-grained.



Thread overview: 11+ messages
2026-03-20  7:34 [PATCH 0/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
2026-03-20  7:34 ` [PATCH 1/2] btrfs: additional gfp api for allocating compressed folios JP Kobryn (Meta)
2026-03-20 11:11   ` Mark Harmstone
2026-03-20 17:55   ` David Sterba
2026-03-20  7:34 ` [PATCH 2/2] btrfs: prevent direct reclaim during compressed readahead JP Kobryn (Meta)
2026-03-20  7:36   ` Christoph Hellwig
2026-03-20 16:17     ` JP Kobryn (Meta)
2026-03-20 10:12   ` Qu Wenruo
2026-03-20 11:14     ` Mark Harmstone
2026-03-20 11:11   ` Mark Harmstone
2026-03-20 11:10 ` [PATCH 0/2] " Mark Harmstone
