* [PATCH v2 0/4] btrfs: prepare compression for bs > ps support
From: Qu Wenruo @ 2025-09-10 5:18 UTC
To: linux-btrfs
[CHANGELOG]
v2:
- Fix a missed call site inside btrfs_compress_file_range() which only
zeros the range inside the first page.
The folio_zero_range() of the last compressed folio should cover the
full folio, not only the first page.
This is the compression part of the bs > ps support.
The main trick involved is the handling of compressed folios; the main
changes are:
- Compressed folios now need to follow the minimal order
This is a requirement of the recently added btrfs_for_each_block*()
helpers, and it keeps our code from handling sub-block sized ranges
(see the sketch after this list).
- No cached compression folios for bs > ps cases
Those folios are large and not sharable with other filesystems, and
most btrfs filesystems will use 4K anyway (until storage with a 16K
block size becomes popular).
- Extra rejection of HIGHMEM systems for bs > ps support
Unfortunately HIGHMEM requires us to map large folios page by page,
which breaks our principle of no sub-block handling.
Considering HIGHMEM is always a pain in the backend and is already
planned for deprecation, it's best for everyone to just reject bs > ps
btrfses on HIGHMEM systems.
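As a hedged illustration of the first point (a minimal sketch reusing the
field and helper names from the patches below, not the exact patch code):

    /*
     * With a 16K block size on a 4K page size system,
     * block_min_order = ilog2(16K / 4K) = 2, so every compressed
     * folio is allocated at order 2 (16K) and never smaller.
     */
    struct folio *folio = folio_alloc(GFP_NOFS, fs_info->block_min_order);
    if (!folio)
        return -ENOMEM;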
Please keep in mind that raid56, scrub and encoded write do not yet
support bs > ps cases.
For now I have only done basic read/write/balance/offline data check
tests on bs > ps cases with all 4 compression settings (none, lzo, zlib,
zstd); so far so good.
If someone wants to play with the incomplete bs > ps cases, the
following simple diff will enable the work:
--- a/fs/btrfs/fs.c
+++ b/fs/btrfs/fs.c
@@ -96,8 +96,7 @@ bool __attribute_const__ btrfs_supported_blocksize(u32 blocksize)
*/
if (IS_ENABLED(CONFIG_HIGHMEM) && blocksize > PAGE_SIZE)
return false;
- if (blocksize <= PAGE_SIZE)
- return true;
+ return true;
#endif
return false;
}
The remaining features and their roadmaps are:
- Encoded writes
This should be the simplest part.
- RAID56
Needs its page usage converted to folios first.
- Scrub
This relies on some RAID56 interfaces for parity handling.
Otherwise, much like RAID56, we need to convert the page usage to
folios first.
Qu Wenruo (4):
btrfs: prepare compression folio alloc/free for bs > ps cases
btrfs: prepare zstd to support bs > ps cases
btrfs: prepare lzo to support bs > ps cases
btrfs: prepare zlib to support bs > ps cases
fs/btrfs/compression.c | 38 +++++++++++++++++++-------
fs/btrfs/compression.h | 2 +-
fs/btrfs/extent_io.c | 7 +++--
fs/btrfs/extent_io.h | 3 ++-
fs/btrfs/fs.c | 17 ++++++++++++
fs/btrfs/fs.h | 6 +++++
fs/btrfs/inode.c | 16 ++++++-----
fs/btrfs/lzo.c | 59 ++++++++++++++++++++++-------------------
fs/btrfs/zlib.c | 60 +++++++++++++++++++++++++++---------------
fs/btrfs/zstd.c | 44 +++++++++++++++++--------------
10 files changed, 163 insertions(+), 89 deletions(-)
--
2.50.1
* [PATCH v2 1/4] btrfs: prepare compression folio alloc/free for bs > ps cases
From: Qu Wenruo @ 2025-09-10 5:18 UTC
To: linux-btrfs
This includes the following preparations for bs > ps cases:
- Always alloc/free the folio directly if bs > ps
This adds a new @fs_info parameter to btrfs_alloc_compr_folio(), thus
affecting all compression algorithms.
btrfs_free_compr_folio() needs no new parameter for now, as we can
use the folio size to skip the caching part.
For now the change just passes a @fs_info into the function;
all folio size assumptions are still based on page size.
- Properly zero the last folio in compress_file_range()
Since the compressed folios can be larger than a page, we need to
properly zero the whole folio.
- Use correct folio size for btrfs_add_compressed_bio_folios()
Instead of page size, use the correct folio size.
- Use correct folio size/shift for btrfs_compress_filemap_get_folio()
As we are no longer only using simple page-sized folios.
- Skip readahead for compressed pages
Similar to subpage cases.
- Make btrfs_alloc_folio_array() accept a new @order parameter
- Add a helper to calculate the minimal folio size
All those changes should not affect the existing bs <= ps handling.
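As a quick hedged sanity check of the new btrfs_min_folio_size() helper
(the values below assume a 16K block size on 4K pages; bs <= ps keeps the
old behavior):

    /*
     * btrfs_min_folio_size() just expands the minimal order:
     *   min folio size = 1 << (PAGE_SHIFT + block_min_order)
     * e.g. PAGE_SHIFT = 12 and block_min_order = 2 gives 1 << 14 = 16K,
     * while the usual block_min_order = 0 keeps it at PAGE_SIZE (4K).
     */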
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/compression.c | 38 ++++++++++++++++++++++++++++----------
fs/btrfs/compression.h | 2 +-
fs/btrfs/extent_io.c | 7 +++++--
fs/btrfs/extent_io.h | 3 ++-
fs/btrfs/fs.h | 6 ++++++
fs/btrfs/inode.c | 16 +++++++++-------
fs/btrfs/lzo.c | 18 ++++++++++--------
fs/btrfs/zlib.c | 13 +++++++------
fs/btrfs/zstd.c | 15 ++++++++-------
9 files changed, 76 insertions(+), 42 deletions(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 068339e86123..969c79fbaa1e 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -223,10 +223,14 @@ static unsigned long btrfs_compr_pool_scan(struct shrinker *sh, struct shrink_co
/*
* Common wrappers for page allocation from compression wrappers
*/
-struct folio *btrfs_alloc_compr_folio(void)
+struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
{
struct folio *folio = NULL;
+ /* For bs > ps cases, no cached folio pool for now. */
+ if (fs_info->block_min_order)
+ goto alloc;
+
spin_lock(&compr_pool.lock);
if (compr_pool.count > 0) {
folio = list_first_entry(&compr_pool.list, struct folio, lru);
@@ -238,13 +242,18 @@ struct folio *btrfs_alloc_compr_folio(void)
if (folio)
return folio;
- return folio_alloc(GFP_NOFS, 0);
+alloc:
+ return folio_alloc(GFP_NOFS, fs_info->block_min_order);
}
void btrfs_free_compr_folio(struct folio *folio)
{
bool do_free = false;
+ /* The folio is from bs > ps fs, no cached pool for now. */
+ if (folio_order(folio))
+ goto free;
+
spin_lock(&compr_pool.lock);
if (compr_pool.count > compr_pool.thresh) {
do_free = true;
@@ -257,6 +266,7 @@ void btrfs_free_compr_folio(struct folio *folio)
if (!do_free)
return;
+free:
ASSERT(folio_ref_count(folio) == 1);
folio_put(folio);
}
@@ -344,16 +354,19 @@ static void end_bbio_compressed_write(struct btrfs_bio *bbio)
static void btrfs_add_compressed_bio_folios(struct compressed_bio *cb)
{
+ struct btrfs_fs_info *fs_info = cb->bbio.fs_info;
struct bio *bio = &cb->bbio.bio;
u32 offset = 0;
while (offset < cb->compressed_len) {
+ struct folio *folio;
int ret;
- u32 len = min_t(u32, cb->compressed_len - offset, PAGE_SIZE);
+ u32 len = min_t(u32, cb->compressed_len - offset,
+ btrfs_min_folio_size(fs_info));
+ folio = cb->compressed_folios[offset >> (PAGE_SHIFT + fs_info->block_min_order)];
/* Maximum compressed extent is smaller than bio size limit. */
- ret = bio_add_folio(bio, cb->compressed_folios[offset >> PAGE_SHIFT],
- len, 0);
+ ret = bio_add_folio(bio, folio, len, 0);
ASSERT(ret);
offset += len;
}
@@ -443,6 +456,10 @@ static noinline int add_ra_bio_pages(struct inode *inode,
if (fs_info->sectorsize < PAGE_SIZE)
return 0;
+ /* For bs > ps cases, we don't support readahead for compressed folios for now. */
+ if (fs_info->block_min_order)
+ return 0;
+
end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
while (cur < compressed_end) {
@@ -606,14 +623,15 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
btrfs_free_extent_map(em);
- cb->nr_folios = DIV_ROUND_UP(compressed_len, PAGE_SIZE);
+ cb->nr_folios = DIV_ROUND_UP(compressed_len, btrfs_min_folio_size(fs_info));
cb->compressed_folios = kcalloc(cb->nr_folios, sizeof(struct folio *), GFP_NOFS);
if (!cb->compressed_folios) {
status = BLK_STS_RESOURCE;
goto out_free_bio;
}
- ret = btrfs_alloc_folio_array(cb->nr_folios, cb->compressed_folios);
+ ret = btrfs_alloc_folio_array(cb->nr_folios, fs_info->block_min_order,
+ cb->compressed_folios);
if (ret) {
status = BLK_STS_RESOURCE;
goto out_free_compressed_pages;
@@ -1033,12 +1051,12 @@ int btrfs_compress_filemap_get_folio(struct address_space *mapping, u64 start,
* - compression algo are 0-3
* - the level are bits 4-7
*
- * @out_pages is an in/out parameter, holds maximum number of pages to allocate
- * and returns number of actually allocated pages
+ * @out_folios is an in/out parameter, holds maximum number of folios to allocate
+ * and returns number of actually allocated folios
*
* @total_in is used to return the number of bytes actually read. It
* may be smaller than the input length if we had to exit early because we
- * ran out of room in the pages array or because we cross the
+ * ran out of room in the folios array or because we cross the
* max_out threshold.
*
* @total_out is an in/out parameter, must be set to the input length and will
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 760d4aac74e6..41d93a977d69 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -112,7 +112,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio);
int btrfs_compress_str2level(unsigned int type, const char *str);
-struct folio *btrfs_alloc_compr_folio(void);
+struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info);
void btrfs_free_compr_folio(struct folio *folio);
struct workspace_manager {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ca7174fa0240..258658856195 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -618,6 +618,7 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
* Populate every free slot in a provided array with folios using GFP_NOFS.
*
* @nr_folios: number of folios to allocate
+ * @order: the order of the folios to be allocated
* @folio_array: the array to fill with folios; any existing non-NULL entries in
* the array will be skipped
*
@@ -625,12 +626,13 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
* -ENOMEM otherwise, the partially allocated folios would be freed and
* the array slots zeroed
*/
-int btrfs_alloc_folio_array(unsigned int nr_folios, struct folio **folio_array)
+int btrfs_alloc_folio_array(unsigned int nr_folios, unsigned int order,
+ struct folio **folio_array)
{
for (int i = 0; i < nr_folios; i++) {
if (folio_array[i])
continue;
- folio_array[i] = folio_alloc(GFP_NOFS, 0);
+ folio_array[i] = folio_alloc(GFP_NOFS, order);
if (!folio_array[i])
goto error;
}
@@ -639,6 +641,7 @@ int btrfs_alloc_folio_array(unsigned int nr_folios, struct folio **folio_array)
for (int i = 0; i < nr_folios; i++) {
if (folio_array[i])
folio_put(folio_array[i]);
+ folio_array[i] = NULL;
}
return -ENOMEM;
}
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 61130786b9a3..5fcbfe44218c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -366,7 +366,8 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
bool nofail);
-int btrfs_alloc_folio_array(unsigned int nr_folios, struct folio **folio_array);
+int btrfs_alloc_folio_array(unsigned int nr_folios, unsigned int order,
+ struct folio **folio_array);
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
bool find_lock_delalloc_range(struct inode *inode,
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index bf4a1b75b0cf..4a48b19ea863 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -921,6 +921,12 @@ static inline gfp_t btrfs_alloc_write_mask(struct address_space *mapping)
return mapping_gfp_constraint(mapping, ~__GFP_FS);
}
+/* Return the minimal folio size of the fs. */
+static inline unsigned int btrfs_min_folio_size(struct btrfs_fs_info *fs_info)
+{
+ return 1 << (PAGE_SHIFT + fs_info->block_min_order);
+}
+
static inline u64 btrfs_get_fs_generation(const struct btrfs_fs_info *fs_info)
{
return READ_ONCE(fs_info->generation);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ad876779289e..6b52ab164f45 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -854,6 +854,8 @@ static void compress_file_range(struct btrfs_work *work)
struct btrfs_inode *inode = async_chunk->inode;
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct address_space *mapping = inode->vfs_inode.i_mapping;
+ const u32 min_folio_shift = PAGE_SHIFT + fs_info->block_min_order;
+ const u32 min_folio_size = btrfs_min_folio_size(fs_info);
u64 blocksize = fs_info->sectorsize;
u64 start = async_chunk->start;
u64 end = async_chunk->end;
@@ -864,7 +866,7 @@ static void compress_file_range(struct btrfs_work *work)
unsigned long nr_folios;
unsigned long total_compressed = 0;
unsigned long total_in = 0;
- unsigned int poff;
+ unsigned int loff;
int i;
int compress_type = fs_info->compress_type;
int compress_level = fs_info->compress_level;
@@ -902,8 +904,8 @@ static void compress_file_range(struct btrfs_work *work)
actual_end = min_t(u64, i_size, end + 1);
again:
folios = NULL;
- nr_folios = (end >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
- nr_folios = min_t(unsigned long, nr_folios, BTRFS_MAX_COMPRESSED_PAGES);
+ nr_folios = (end >> min_folio_shift) - (start >> min_folio_shift) + 1;
+ nr_folios = min_t(unsigned long, nr_folios, BTRFS_MAX_COMPRESSED >> min_folio_shift);
/*
* we don't want to send crud past the end of i_size through
@@ -965,12 +967,12 @@ static void compress_file_range(struct btrfs_work *work)
goto mark_incompressible;
/*
- * Zero the tail end of the last page, as we might be sending it down
+ * Zero the tail end of the last folio, as we might be sending it down
* to disk.
*/
- poff = offset_in_page(total_compressed);
- if (poff)
- folio_zero_range(folios[nr_folios - 1], poff, PAGE_SIZE - poff);
+ loff = total_compressed & (min_folio_size - 1);
+ if (loff)
+ folio_zero_range(folios[nr_folios - 1], loff, min_folio_size - loff);
/*
* Try to create an inline extent.
diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 047d90e216f6..c5a25fd872bd 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -132,13 +132,14 @@ static inline size_t read_compress_length(const char *buf)
*
* Will allocate new pages when needed.
*/
-static int copy_compressed_data_to_page(char *compressed_data,
+static int copy_compressed_data_to_page(struct btrfs_fs_info *fs_info,
+ char *compressed_data,
size_t compressed_size,
struct folio **out_folios,
unsigned long max_nr_folio,
- u32 *cur_out,
- const u32 sectorsize)
+ u32 *cur_out)
{
+ const u32 sectorsize = fs_info->sectorsize;
u32 sector_bytes_left;
u32 orig_out;
struct folio *cur_folio;
@@ -156,7 +157,7 @@ static int copy_compressed_data_to_page(char *compressed_data,
cur_folio = out_folios[*cur_out / PAGE_SIZE];
/* Allocate a new page */
if (!cur_folio) {
- cur_folio = btrfs_alloc_compr_folio();
+ cur_folio = btrfs_alloc_compr_folio(fs_info);
if (!cur_folio)
return -ENOMEM;
out_folios[*cur_out / PAGE_SIZE] = cur_folio;
@@ -182,7 +183,7 @@ static int copy_compressed_data_to_page(char *compressed_data,
cur_folio = out_folios[*cur_out / PAGE_SIZE];
/* Allocate a new page */
if (!cur_folio) {
- cur_folio = btrfs_alloc_compr_folio();
+ cur_folio = btrfs_alloc_compr_folio(fs_info);
if (!cur_folio)
return -ENOMEM;
out_folios[*cur_out / PAGE_SIZE] = cur_folio;
@@ -217,8 +218,9 @@ int lzo_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
u64 start, struct folio **folios, unsigned long *out_folios,
unsigned long *total_in, unsigned long *total_out)
{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct workspace *workspace = list_entry(ws, struct workspace, list);
- const u32 sectorsize = inode->root->fs_info->sectorsize;
+ const u32 sectorsize = fs_info->sectorsize;
struct address_space *mapping = inode->vfs_inode.i_mapping;
struct folio *folio_in = NULL;
char *sizes_ptr;
@@ -268,9 +270,9 @@ int lzo_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
goto out;
}
- ret = copy_compressed_data_to_page(workspace->cbuf, out_len,
+ ret = copy_compressed_data_to_page(fs_info, workspace->cbuf, out_len,
folios, max_nr_folio,
- &cur_out, sectorsize);
+ &cur_out);
if (ret < 0)
goto out;
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index d72566a87afa..ccf77a0fa96c 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -136,6 +136,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
u64 start, struct folio **folios, unsigned long *out_folios,
unsigned long *total_in, unsigned long *total_out)
{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct workspace *workspace = list_entry(ws, struct workspace, list);
struct address_space *mapping = inode->vfs_inode.i_mapping;
int ret;
@@ -147,7 +148,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
unsigned long len = *total_out;
unsigned long nr_dest_folios = *out_folios;
const unsigned long max_out = nr_dest_folios * PAGE_SIZE;
- const u32 blocksize = inode->root->fs_info->sectorsize;
+ const u32 blocksize = fs_info->sectorsize;
const u64 orig_end = start + len;
*out_folios = 0;
@@ -156,7 +157,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
ret = zlib_deflateInit(&workspace->strm, workspace->level);
if (unlikely(ret != Z_OK)) {
- btrfs_err(inode->root->fs_info,
+ btrfs_err(fs_info,
"zlib compression init failed, error %d root %llu inode %llu offset %llu",
ret, btrfs_root_id(inode->root), btrfs_ino(inode), start);
ret = -EIO;
@@ -166,7 +167,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
workspace->strm.total_in = 0;
workspace->strm.total_out = 0;
- out_folio = btrfs_alloc_compr_folio();
+ out_folio = btrfs_alloc_compr_folio(fs_info);
if (out_folio == NULL) {
ret = -ENOMEM;
goto out;
@@ -224,7 +225,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
if (unlikely(ret != Z_OK)) {
- btrfs_warn(inode->root->fs_info,
+ btrfs_warn(fs_info,
"zlib compression failed, error %d root %llu inode %llu offset %llu",
ret, btrfs_root_id(inode->root), btrfs_ino(inode),
start);
@@ -249,7 +250,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
ret = -E2BIG;
goto out;
}
- out_folio = btrfs_alloc_compr_folio();
+ out_folio = btrfs_alloc_compr_folio(fs_info);
if (out_folio == NULL) {
ret = -ENOMEM;
goto out;
@@ -285,7 +286,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
ret = -E2BIG;
goto out;
}
- out_folio = btrfs_alloc_compr_folio();
+ out_folio = btrfs_alloc_compr_folio(fs_info);
if (out_folio == NULL) {
ret = -ENOMEM;
goto out;
diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
index b11a87842cda..28e2e99a2463 100644
--- a/fs/btrfs/zstd.c
+++ b/fs/btrfs/zstd.c
@@ -400,6 +400,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
u64 start, struct folio **folios, unsigned long *out_folios,
unsigned long *total_in, unsigned long *total_out)
{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct workspace *workspace = list_entry(ws, struct workspace, list);
struct address_space *mapping = inode->vfs_inode.i_mapping;
zstd_cstream *stream;
@@ -412,7 +413,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
unsigned long len = *total_out;
const unsigned long nr_dest_folios = *out_folios;
const u64 orig_end = start + len;
- const u32 blocksize = inode->root->fs_info->sectorsize;
+ const u32 blocksize = fs_info->sectorsize;
unsigned long max_out = nr_dest_folios * PAGE_SIZE;
unsigned int cur_len;
@@ -425,7 +426,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
stream = zstd_init_cstream(&workspace->params, len, workspace->mem,
workspace->size);
if (unlikely(!stream)) {
- btrfs_err(inode->root->fs_info,
+ btrfs_err(fs_info,
"zstd compression init level %d failed, root %llu inode %llu offset %llu",
workspace->req_level, btrfs_root_id(inode->root),
btrfs_ino(inode), start);
@@ -443,7 +444,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
workspace->in_buf.size = cur_len;
/* Allocate and map in the output buffer */
- out_folio = btrfs_alloc_compr_folio();
+ out_folio = btrfs_alloc_compr_folio(fs_info);
if (out_folio == NULL) {
ret = -ENOMEM;
goto out;
@@ -459,7 +460,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
ret2 = zstd_compress_stream(stream, &workspace->out_buf,
&workspace->in_buf);
if (unlikely(zstd_is_error(ret2))) {
- btrfs_warn(inode->root->fs_info,
+ btrfs_warn(fs_info,
"zstd compression level %d failed, error %d root %llu inode %llu offset %llu",
workspace->req_level, zstd_get_error_code(ret2),
btrfs_root_id(inode->root), btrfs_ino(inode),
@@ -491,7 +492,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
ret = -E2BIG;
goto out;
}
- out_folio = btrfs_alloc_compr_folio();
+ out_folio = btrfs_alloc_compr_folio(fs_info);
if (out_folio == NULL) {
ret = -ENOMEM;
goto out;
@@ -532,7 +533,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
ret2 = zstd_end_stream(stream, &workspace->out_buf);
if (unlikely(zstd_is_error(ret2))) {
- btrfs_err(inode->root->fs_info,
+ btrfs_err(fs_info,
"zstd compression end level %d failed, error %d root %llu inode %llu offset %llu",
workspace->req_level, zstd_get_error_code(ret2),
btrfs_root_id(inode->root), btrfs_ino(inode),
@@ -556,7 +557,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
ret = -E2BIG;
goto out;
}
- out_folio = btrfs_alloc_compr_folio();
+ out_folio = btrfs_alloc_compr_folio(fs_info);
if (out_folio == NULL) {
ret = -ENOMEM;
goto out;
--
2.50.1
* [PATCH v2 2/4] btrfs: prepare zstd to support bs > ps cases
From: Qu Wenruo @ 2025-09-10 5:18 UTC
To: linux-btrfs
This involves converting the following functions to use proper folio
sizes/shifts:
- zstd_compress_folios()
- zstd_decompress_bio()
The function zstd_decompress() already uses the block size correctly
without relying on page size, thus it needs no modification.
And since zstd compression calls kmap_local_folio(), the existing
code cannot handle large folios on HIGHMEM, as kmap_local_folio()
requires us to handle one page at a time.
I do not think it is worth spending time on a feature that will
eventually be deprecated.
So just add an explicit rejection of bs > ps on kernels with HIGHMEM
enabled.
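For reference, correctly handling a large folio on HIGHMEM would need a
page-by-page loop along the lines of the hedged sketch below (not part of
this series), which is exactly the kind of sub-block handling we want to
avoid:

    for (size_t off = 0; off < folio_size(folio); off += PAGE_SIZE) {
        /* kmap_local_folio() can only map one page at a time on HIGHMEM */
        char *kaddr = kmap_local_folio(folio, off);
        /* ... process PAGE_SIZE bytes ... */
        kunmap_local(kaddr);
    }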
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/fs.c | 17 +++++++++++++++++
fs/btrfs/zstd.c | 29 ++++++++++++++++-------------
2 files changed, 33 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/fs.c b/fs/btrfs/fs.c
index 014fb8b12f96..35084b4e498b 100644
--- a/fs/btrfs/fs.c
+++ b/fs/btrfs/fs.c
@@ -79,6 +79,23 @@ bool __attribute_const__ btrfs_supported_blocksize(u32 blocksize)
if (blocksize == PAGE_SIZE || blocksize == SZ_4K || blocksize == BTRFS_MIN_BLOCKSIZE)
return true;
#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ /*
+ * For bs > ps support it's done by specifying a minimal folio order
+ * for filemap, thus implying large data folios.
+ * For HIGHMEM systems, we cannot always access the content of a (large)
+ * folio in one go, but have to go through it page by page.
+ *
+ * A lot of features don't implement a proper PAGE sized loop for large
+ * folios, this includes:
+ * - compression
+ * - verity
+ * - encoded write
+ *
+ * Considering HIGHMEM is such a pain in the backend and it's going
+ * to be deprecated eventually, just reject HIGHMEM && bs > ps cases.
+ */
+ if (IS_ENABLED(CONFIG_HIGHMEM) && blocksize > PAGE_SIZE)
+ return false;
if (blocksize <= PAGE_SIZE)
return true;
#endif
diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
index 28e2e99a2463..2f1593ddef4a 100644
--- a/fs/btrfs/zstd.c
+++ b/fs/btrfs/zstd.c
@@ -414,7 +414,8 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
const unsigned long nr_dest_folios = *out_folios;
const u64 orig_end = start + len;
const u32 blocksize = fs_info->sectorsize;
- unsigned long max_out = nr_dest_folios * PAGE_SIZE;
+ const u32 min_folio_size = btrfs_min_folio_size(fs_info);
+ unsigned long max_out = nr_dest_folios * min_folio_size;
unsigned int cur_len;
workspace->params = zstd_get_btrfs_parameters(workspace->req_level, len);
@@ -452,7 +453,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
folios[nr_folios++] = out_folio;
workspace->out_buf.dst = folio_address(out_folio);
workspace->out_buf.pos = 0;
- workspace->out_buf.size = min_t(size_t, max_out, PAGE_SIZE);
+ workspace->out_buf.size = min_t(size_t, max_out, min_folio_size);
while (1) {
size_t ret2;
@@ -486,8 +487,8 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
/* Check if we need more output space */
if (workspace->out_buf.pos == workspace->out_buf.size) {
- tot_out += PAGE_SIZE;
- max_out -= PAGE_SIZE;
+ tot_out += min_folio_size;
+ max_out -= min_folio_size;
if (nr_folios == nr_dest_folios) {
ret = -E2BIG;
goto out;
@@ -501,7 +502,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
workspace->out_buf.dst = folio_address(out_folio);
workspace->out_buf.pos = 0;
workspace->out_buf.size = min_t(size_t, max_out,
- PAGE_SIZE);
+ min_folio_size);
}
/* We've reached the end of the input */
@@ -551,8 +552,8 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
goto out;
}
- tot_out += PAGE_SIZE;
- max_out -= PAGE_SIZE;
+ tot_out += min_folio_size;
+ max_out -= min_folio_size;
if (nr_folios == nr_dest_folios) {
ret = -E2BIG;
goto out;
@@ -565,7 +566,7 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
folios[nr_folios++] = out_folio;
workspace->out_buf.dst = folio_address(out_folio);
workspace->out_buf.pos = 0;
- workspace->out_buf.size = min_t(size_t, max_out, PAGE_SIZE);
+ workspace->out_buf.size = min_t(size_t, max_out, min_folio_size);
}
if (tot_out >= tot_in) {
@@ -587,14 +588,16 @@ int zstd_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
{
+ struct btrfs_fs_info *fs_info = cb_to_fs_info(cb);
struct workspace *workspace = list_entry(ws, struct workspace, list);
struct folio **folios_in = cb->compressed_folios;
size_t srclen = cb->compressed_len;
zstd_dstream *stream;
int ret = 0;
- const u32 blocksize = cb_to_fs_info(cb)->sectorsize;
+ const u32 blocksize = fs_info->sectorsize;
+ const unsigned int min_folio_size = btrfs_min_folio_size(fs_info);
unsigned long folio_in_index = 0;
- unsigned long total_folios_in = DIV_ROUND_UP(srclen, PAGE_SIZE);
+ unsigned long total_folios_in = DIV_ROUND_UP(srclen, min_folio_size);
unsigned long buf_start;
unsigned long total_out = 0;
@@ -612,7 +615,7 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
workspace->in_buf.src = kmap_local_folio(folios_in[folio_in_index], 0);
workspace->in_buf.pos = 0;
- workspace->in_buf.size = min_t(size_t, srclen, PAGE_SIZE);
+ workspace->in_buf.size = min_t(size_t, srclen, min_folio_size);
workspace->out_buf.dst = workspace->buf;
workspace->out_buf.pos = 0;
@@ -657,11 +660,11 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
ret = -EIO;
goto done;
}
- srclen -= PAGE_SIZE;
+ srclen -= min_folio_size;
workspace->in_buf.src =
kmap_local_folio(folios_in[folio_in_index], 0);
workspace->in_buf.pos = 0;
- workspace->in_buf.size = min_t(size_t, srclen, PAGE_SIZE);
+ workspace->in_buf.size = min_t(size_t, srclen, min_folio_size);
}
}
ret = 0;
--
2.50.1
* [PATCH v2 3/4] btrfs: prepare lzo to support bs > ps cases
From: Qu Wenruo @ 2025-09-10 5:18 UTC
To: linux-btrfs
This involves converting the following functions to use correct folio
sizes/shifts:
- copy_compressed_data_to_page()
- lzo_compress_folios()
- lzo_decompress_bio()
Just like zstd, lzo has some extra incorrect usage of kmap_local_folio()
where the offset is always 0.
This will not handle HIGHMEM large folios correctly, but those cases are
already rejected explicitly, so it should not cause problems when bs > ps
support is enabled.
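As a hedged worked example of the folio index math used in the hunks
below (assuming 4K pages and block_min_order = 2):

    /*
     * min_folio_shift = PAGE_SHIFT + block_min_order = 12 + 2 = 14,
     * so byte offset 40960 lands in compressed folio index
     * 40960 >> 14 = 2, while the old cur_in / PAGE_SIZE would have
     * indexed a folio 10 that no longer exists in the smaller array.
     */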
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/lzo.c | 41 ++++++++++++++++++++++-------------------
1 file changed, 22 insertions(+), 19 deletions(-)
diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index c5a25fd872bd..bc0890f3c2bb 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -140,12 +140,13 @@ static int copy_compressed_data_to_page(struct btrfs_fs_info *fs_info,
u32 *cur_out)
{
const u32 sectorsize = fs_info->sectorsize;
+ const u32 min_folio_shift = PAGE_SHIFT + fs_info->block_min_order;
u32 sector_bytes_left;
u32 orig_out;
struct folio *cur_folio;
char *kaddr;
- if ((*cur_out / PAGE_SIZE) >= max_nr_folio)
+ if ((*cur_out >> min_folio_shift) >= max_nr_folio)
return -E2BIG;
/*
@@ -154,18 +155,17 @@ static int copy_compressed_data_to_page(struct btrfs_fs_info *fs_info,
*/
ASSERT((*cur_out / sectorsize) == (*cur_out + LZO_LEN - 1) / sectorsize);
- cur_folio = out_folios[*cur_out / PAGE_SIZE];
+ cur_folio = out_folios[*cur_out >> min_folio_shift];
/* Allocate a new page */
if (!cur_folio) {
cur_folio = btrfs_alloc_compr_folio(fs_info);
if (!cur_folio)
return -ENOMEM;
- out_folios[*cur_out / PAGE_SIZE] = cur_folio;
+ out_folios[*cur_out >> min_folio_shift] = cur_folio;
}
- kaddr = kmap_local_folio(cur_folio, 0);
- write_compress_length(kaddr + offset_in_page(*cur_out),
- compressed_size);
+ kaddr = kmap_local_folio(cur_folio, offset_in_folio(cur_folio, *cur_out));
+ write_compress_length(kaddr, compressed_size);
*cur_out += LZO_LEN;
orig_out = *cur_out;
@@ -177,20 +177,20 @@ static int copy_compressed_data_to_page(struct btrfs_fs_info *fs_info,
kunmap_local(kaddr);
- if ((*cur_out / PAGE_SIZE) >= max_nr_folio)
+ if ((*cur_out >> min_folio_shift) >= max_nr_folio)
return -E2BIG;
- cur_folio = out_folios[*cur_out / PAGE_SIZE];
+ cur_folio = out_folios[*cur_out >> min_folio_shift];
/* Allocate a new page */
if (!cur_folio) {
cur_folio = btrfs_alloc_compr_folio(fs_info);
if (!cur_folio)
return -ENOMEM;
- out_folios[*cur_out / PAGE_SIZE] = cur_folio;
+ out_folios[*cur_out >> min_folio_shift] = cur_folio;
}
kaddr = kmap_local_folio(cur_folio, 0);
- memcpy(kaddr + offset_in_page(*cur_out),
+ memcpy(kaddr + offset_in_folio(cur_folio, *cur_out),
compressed_data + *cur_out - orig_out, copy_len);
*cur_out += copy_len;
@@ -221,6 +221,7 @@ int lzo_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct workspace *workspace = list_entry(ws, struct workspace, list);
const u32 sectorsize = fs_info->sectorsize;
+ const u32 min_folio_size = btrfs_min_folio_size(fs_info);
struct address_space *mapping = inode->vfs_inode.i_mapping;
struct folio *folio_in = NULL;
char *sizes_ptr;
@@ -287,8 +288,8 @@ int lzo_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
goto out;
}
- /* Check if we have reached page boundary */
- if (PAGE_ALIGNED(cur_in)) {
+ /* Check if we have reached folio boundary */
+ if (IS_ALIGNED(cur_in, min_folio_size)) {
folio_put(folio_in);
folio_in = NULL;
}
@@ -305,7 +306,7 @@ int lzo_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
out:
if (folio_in)
folio_put(folio_in);
- *out_folios = DIV_ROUND_UP(cur_out, PAGE_SIZE);
+ *out_folios = DIV_ROUND_UP(cur_out, min_folio_size);
return ret;
}
@@ -317,15 +318,16 @@ int lzo_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
static void copy_compressed_segment(struct compressed_bio *cb,
char *dest, u32 len, u32 *cur_in)
{
+ struct btrfs_fs_info *fs_info = cb_to_fs_info(cb);
+ const u32 min_folio_shift = PAGE_SHIFT + fs_info->block_min_order;
u32 orig_in = *cur_in;
while (*cur_in < orig_in + len) {
- struct folio *cur_folio;
- u32 copy_len = min_t(u32, PAGE_SIZE - offset_in_page(*cur_in),
- orig_in + len - *cur_in);
+ struct folio *cur_folio = cb->compressed_folios[*cur_in >> min_folio_shift];
+ u32 copy_len = min_t(u32, orig_in + len - *cur_in,
+ folio_size(cur_folio) - offset_in_folio(cur_folio, *cur_in));
ASSERT(copy_len);
- cur_folio = cb->compressed_folios[*cur_in / PAGE_SIZE];
memcpy_from_folio(dest + *cur_in - orig_in, cur_folio,
offset_in_folio(cur_folio, *cur_in), copy_len);
@@ -339,6 +341,7 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
struct workspace *workspace = list_entry(ws, struct workspace, list);
const struct btrfs_fs_info *fs_info = cb->bbio.inode->root->fs_info;
const u32 sectorsize = fs_info->sectorsize;
+ const u32 min_folio_shift = PAGE_SHIFT + fs_info->block_min_order;
char *kaddr;
int ret;
/* Compressed data length, can be unaligned */
@@ -385,10 +388,10 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
*/
ASSERT(cur_in / sectorsize ==
(cur_in + LZO_LEN - 1) / sectorsize);
- cur_folio = cb->compressed_folios[cur_in / PAGE_SIZE];
+ cur_folio = cb->compressed_folios[cur_in >> min_folio_shift];
ASSERT(cur_folio);
kaddr = kmap_local_folio(cur_folio, 0);
- seg_len = read_compress_length(kaddr + offset_in_page(cur_in));
+ seg_len = read_compress_length(kaddr + offset_in_folio(cur_folio, cur_in));
kunmap_local(kaddr);
cur_in += LZO_LEN;
--
2.50.1
* [PATCH v2 4/4] btrfs: prepare zlib to support bs > ps cases
From: Qu Wenruo @ 2025-09-10 5:18 UTC
To: linux-btrfs
This involves converting the following functions to use correct folio
sizes/shifts:
- zlib_compress_folios()
- zlib_decompress_bio()
There is special handling for s390 hardware acceleration.
With bs > ps support, we can go with a 16K block size on s390 (which
uses a fixed 4K page size).
In that case we do not need to do the buffer copy, as our folio is large
enough for hardware acceleration.
So extract the s390-specific check and the folio size check into a
helper, need_special_buffer().
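A hedged worked example of the new helper's decision (assuming
ZLIB_DFLTCC_BUF_SIZE is four pages, i.e. 16K with 4K pages):

    /*
     * On s390 with a 16K block size, btrfs_min_folio_size() returns
     * 16K, which already meets the DFLTCC buffer requirement, so
     * need_special_buffer() returns false and the intermediate copy
     * is skipped.  With a 4K block size it returns true and the old
     * bounce buffer path is kept.
     */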
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/zlib.c | 47 ++++++++++++++++++++++++++++++++---------------
1 file changed, 32 insertions(+), 15 deletions(-)
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index ccf77a0fa96c..889af188a924 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -53,6 +53,22 @@ void zlib_free_workspace(struct list_head *ws)
kfree(workspace);
}
+/*
+ * For s390 hardware acceleration, the buffer size should be at least
+ * ZLIB_DFLTCC_BUF_SIZE to achieve the best performance.
+ *
+ * But with bs > ps we can have folios large enough to meet the s390
+ * hardware requirement.
+ */
+static bool need_special_buffer(struct btrfs_fs_info *fs_info)
+{
+ if (!zlib_deflate_dfltcc_enabled())
+ return false;
+ if (btrfs_min_folio_size(fs_info) >= ZLIB_DFLTCC_BUF_SIZE)
+ return false;
+ return true;
+}
+
struct list_head *zlib_alloc_workspace(struct btrfs_fs_info *fs_info, unsigned int level)
{
const u32 blocksize = fs_info->sectorsize;
@@ -68,11 +84,7 @@ struct list_head *zlib_alloc_workspace(struct btrfs_fs_info *fs_info, unsigned i
workspace->strm.workspace = kvzalloc(workspacesize, GFP_KERNEL | __GFP_NOWARN);
workspace->level = level;
workspace->buf = NULL;
- /*
- * In case of s390 zlib hardware support, allocate lager workspace
- * buffer. If allocator fails, fall back to a single page buffer.
- */
- if (zlib_deflate_dfltcc_enabled()) {
+ if (need_special_buffer(fs_info)) {
workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
__GFP_NOMEMALLOC | __GFP_NORETRY |
__GFP_NOWARN | GFP_NOIO);
@@ -139,6 +151,8 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct workspace *workspace = list_entry(ws, struct workspace, list);
struct address_space *mapping = inode->vfs_inode.i_mapping;
+ const u32 min_folio_shift = PAGE_SHIFT + fs_info->block_min_order;
+ const u32 min_folio_size = btrfs_min_folio_size(fs_info);
int ret;
char *data_in = NULL;
char *cfolio_out;
@@ -147,7 +161,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
struct folio *out_folio = NULL;
unsigned long len = *total_out;
unsigned long nr_dest_folios = *out_folios;
- const unsigned long max_out = nr_dest_folios * PAGE_SIZE;
+ const unsigned long max_out = nr_dest_folios << min_folio_shift;
const u32 blocksize = fs_info->sectorsize;
const u64 orig_end = start + len;
@@ -179,7 +193,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
workspace->strm.next_in = workspace->buf;
workspace->strm.avail_in = 0;
workspace->strm.next_out = cfolio_out;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = min_folio_size;
while (workspace->strm.total_in < len) {
/*
@@ -191,10 +205,11 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
unsigned int copy_length = min(bytes_left, workspace->buf_size);
/*
- * This can only happen when hardware zlib compression is
- * enabled.
+ * For s390 hardware accelerated zlib, when our folio is smaller
+ * than the copy_length, we need to fill the buffer so that
+ * we can take full advantage of hardware acceleration.
*/
- if (copy_length > PAGE_SIZE) {
+ if (need_special_buffer(fs_info)) {
ret = copy_data_into_buffer(mapping, workspace,
start, copy_length);
if (ret < 0)
@@ -258,7 +273,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
cfolio_out = folio_address(out_folio);
folios[nr_folios] = out_folio;
nr_folios++;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = min_folio_size;
workspace->strm.next_out = cfolio_out;
}
/* we're all done */
@@ -294,7 +309,7 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
cfolio_out = folio_address(out_folio);
folios[nr_folios] = out_folio;
nr_folios++;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = min_folio_size;
workspace->strm.next_out = cfolio_out;
}
}
@@ -320,20 +335,22 @@ int zlib_compress_folios(struct list_head *ws, struct btrfs_inode *inode,
int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
{
+ struct btrfs_fs_info *fs_info = cb_to_fs_info(cb);
struct workspace *workspace = list_entry(ws, struct workspace, list);
+ const u32 min_folio_size = btrfs_min_folio_size(fs_info);
int ret = 0, ret2;
int wbits = MAX_WBITS;
char *data_in;
size_t total_out = 0;
unsigned long folio_in_index = 0;
size_t srclen = cb->compressed_len;
- unsigned long total_folios_in = DIV_ROUND_UP(srclen, PAGE_SIZE);
+ unsigned long total_folios_in = DIV_ROUND_UP(srclen, min_folio_size);
unsigned long buf_start;
struct folio **folios_in = cb->compressed_folios;
data_in = kmap_local_folio(folios_in[folio_in_index], 0);
workspace->strm.next_in = data_in;
- workspace->strm.avail_in = min_t(size_t, srclen, PAGE_SIZE);
+ workspace->strm.avail_in = min_t(size_t, srclen, min_folio_size);
workspace->strm.total_in = 0;
workspace->strm.total_out = 0;
@@ -394,7 +411,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
data_in = kmap_local_folio(folios_in[folio_in_index], 0);
workspace->strm.next_in = data_in;
tmp = srclen - workspace->strm.total_in;
- workspace->strm.avail_in = min(tmp, PAGE_SIZE);
+ workspace->strm.avail_in = min(tmp, min_folio_size);
}
}
if (unlikely(ret != Z_STREAM_END)) {
--
2.50.1
* Re: [PATCH v2 0/4] btrfs: prepare compression for bs > ps support
From: Qu Wenruo @ 2025-09-15 6:53 UTC
To: linux-btrfs
On 2025/9/10 14:48, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Fix a missed call site inside btrfs_compress_file_range() which only
> zeros the range inside the first page.
> The folio_zero_range() of the last compressed folio should cover the
> full folio, not only the first page.
And there is a missed call site in btrfs_decompress() which uses
ASSERT() to check against PAGE_SIZE, not the folio size, and can crash
btrfs/056 during tests.
(Yep, I'm already testing bs > ps with fstests now, and it can reach
btrfs/056, apart from a weird crash in btrfs/004 that I'm still debugging.)
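Presumably the fix is to bound the check by the folio size instead of
PAGE_SIZE; a hedged, hypothetical sketch (not the actual fix, parameter
names taken from btrfs_decompress()):

    /* hypothetical: bound by the (possibly large) folio, not one page */
    ASSERT(dest_pgoff + destlen <= folio_size(dest_folio));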
Will update the series when no more compression bugs are exposed by
default fstests runs.
Thanks,
Qu
>
> This is the compression part of the bs > ps support.
>
> The main trick involved is the handling of compressed folios; the main
> changes are:
>
> - Compressed folios now need to follow the minimal order
> This is a requirement of the recently added btrfs_for_each_block*()
> helpers, and it keeps our code from handling sub-block sized ranges.
>
> - No cached compression folios for bs > ps cases
> Those folios are large and not sharable with other filesystems, and
> most btrfs filesystems will use 4K anyway (until storage with a 16K
> block size becomes popular).
>
> - Extra rejection of HIGHMEM systems for bs > ps support
> Unfortunately HIGHMEM requires us to map large folios page by page,
> which breaks our principle of no sub-block handling.
>
> Considering HIGHMEM is always a pain in the backend and is already
> planned for deprecation, it's best for everyone to just reject bs > ps
> btrfses on HIGHMEM systems.
>
> Please keep in mind that raid56, scrub and encoded write do not yet
> support bs > ps cases.
>
> For now I have only done basic read/write/balance/offline data check
> tests on bs > ps cases with all 4 compression settings (none, lzo, zlib,
> zstd); so far so good.
>
> If someone wants to play with the incomplete bs > ps cases, the
> following simple diff will enable the work:
>
> --- a/fs/btrfs/fs.c
> +++ b/fs/btrfs/fs.c
> @@ -96,8 +96,7 @@ bool __attribute_const__ btrfs_supported_blocksize(u32 blocksize)
> */
> if (IS_ENABLED(CONFIG_HIGHMEM) && blocksize > PAGE_SIZE)
> return false;
> - if (blocksize <= PAGE_SIZE)
> - return true;
> + return true;
> #endif
> return false;
> }
>
> The remaining features and their roadmaps are:
>
> - Encoded writes
> This should be the simplest part.
>
> - RAID56
> Needs its page usage converted to folios first.
>
> - Scrub
> This relies on some RAID56 interfaces for parity handling.
> Otherwise, much like RAID56, we need to convert the page usage to
> folios first.
>
> Qu Wenruo (4):
> btrfs: prepare compression folio alloc/free for bs > ps cases
> btrfs: prepare zstd to support bs > ps cases
> btrfs: prepare lzo to support bs > ps cases
> btrfs: prepare zlib to support bs > ps cases
>
> fs/btrfs/compression.c | 38 +++++++++++++++++++-------
> fs/btrfs/compression.h | 2 +-
> fs/btrfs/extent_io.c | 7 +++--
> fs/btrfs/extent_io.h | 3 ++-
> fs/btrfs/fs.c | 17 ++++++++++++
> fs/btrfs/fs.h | 6 +++++
> fs/btrfs/inode.c | 16 ++++++-----
> fs/btrfs/lzo.c | 59 ++++++++++++++++++++++-------------------
> fs/btrfs/zlib.c | 60 +++++++++++++++++++++++++++---------------
> fs/btrfs/zstd.c | 44 +++++++++++++++++--------------
> 10 files changed, 163 insertions(+), 89 deletions(-)
>