* [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
2024-01-08 9:08 [PATCH 0/3] btrfs: fix and simplify the inline extent decompression path for subpage Qu Wenruo
@ 2024-01-08 9:08 ` Qu Wenruo
2024-01-09 3:02 ` kernel test robot
2024-01-08 9:08 ` [PATCH 2/3] btrfs: lzo: " Qu Wenruo
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2024-01-08 9:08 UTC (permalink / raw)
To: linux-btrfs
[BUG]
If we have a filesystem with 4k sectorsize, and an inlined compressed
extent created like this:
item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
generation 8 transid 8 size 4096 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
index 2 namelen 14 name: source_inlined
item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
generation 8 type 0 (inline)
inline extent data size 48 ram_bytes 4096 compression 1 (zlib)
Which has an inline compressed extent at file offset 0, and its
decompressed size is 4K, allowing us to reflink that 4K range to another
location (which will not be compressed).
If we do such reflink on a subpage system, it would fail like this:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
XFS_IOC_CLONE_RANGE: Input/output error
[CAUSE]
In zlib_decompress(), we didn't treat @start_byte as just a page offset,
but also use it as an indicator on whether we should switch our output
buffer.
In reality, for subpage cases, although @start_byte can be non-zero,
we should never switch input/output buffer, since the whole input/output
buffer should never exceed one sector.
(The above assumption is only not true if we're going to support
multi-page sectorsize)
Thus the current code using @start_byte as a condition to switch
input/output buffer or finish the decompression is completely incorrect.
[FIX]
The fix involves several modification:
- Rename @start_byte to @dest_pgoff to properly express its meaning
- Add extra ASSERT() inside btrfs_decompress() to make sure the
input/output size never exceed one sector.
- Use Z_FINISH flag to make sure the decompression happens in one go
- Remove all the loop needed to switch input/output buffer
- Use correct destination offset inside the destination page
- Consider early end as an error
After the fix, even on 64K page sized aarch64, above reflink now
works as expected:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
linked 4096/4096 bytes at offset 61440
And resulted a correct file layout:
item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
generation 10 transid 10 size 65536 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
index 3 namelen 4 name: dest
item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
location key (0 UNKNOWN.0 0) type XATTR
transid 10 data_len 37 name_len 16
name: security.selinux
data unconfined_u:object_r:unlabeled_t:s0
item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
generation 10 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/compression.c | 23 +++++++++----
fs/btrfs/compression.h | 2 +-
fs/btrfs/super.h | 3 ++
fs/btrfs/zlib.c | 73 +++++++++++-------------------------------
4 files changed, 39 insertions(+), 62 deletions(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 3d8fc2ad0f42..9cf7d38dc66c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -141,16 +141,16 @@ static int compression_decompress_bio(struct list_head *ws,
}
static int compression_decompress(int type, struct list_head *ws,
- const u8 *data_in, struct page *dest_page,
- unsigned long start_byte, size_t srclen, size_t destlen)
+ const u8 *data_in, struct page *dest_page,
+ unsigned long dest_pgoff, size_t srclen, size_t destlen)
{
switch (type) {
case BTRFS_COMPRESS_ZLIB: return zlib_decompress(ws, data_in, dest_page,
- start_byte, srclen, destlen);
+ dest_pgoff, srclen, destlen);
case BTRFS_COMPRESS_LZO: return lzo_decompress(ws, data_in, dest_page,
- start_byte, srclen, destlen);
+ dest_pgoff, srclen, destlen);
case BTRFS_COMPRESS_ZSTD: return zstd_decompress(ws, data_in, dest_page,
- start_byte, srclen, destlen);
+ dest_pgoff, srclen, destlen);
case BTRFS_COMPRESS_NONE:
default:
/*
@@ -1037,14 +1037,23 @@ static int btrfs_decompress_bio(struct compressed_bio *cb)
* start_byte tells us the offset into the compressed data we're interested in
*/
int btrfs_decompress(int type, const u8 *data_in, struct page *dest_page,
- unsigned long start_byte, size_t srclen, size_t destlen)
+ unsigned long dest_pgoff, size_t srclen, size_t destlen)
{
+ struct btrfs_fs_info *fs_info = btrfs_sb(dest_page->mapping->host->i_sb);
struct list_head *workspace;
+ const u32 sectorsize = fs_info->sectorsize;
int ret;
+ /*
+ * The full destination page range should not exceed the page size.
+ * And the @destlen should not exceed sectorsize, as this is only called for
+ * inline file extents, which should not exceed sectorsize.
+ */
+ ASSERT(dest_pgoff + destlen <= PAGE_SIZE && destlen <= sectorsize);
+
workspace = get_workspace(type, 0);
ret = compression_decompress(type, workspace, data_in, dest_page,
- start_byte, srclen, destlen);
+ dest_pgoff, srclen, destlen);
put_workspace(type, workspace);
return ret;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 93cc92974dee..2b4dfb1b010c 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -148,7 +148,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
unsigned long *total_in, unsigned long *total_out);
int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb);
int zlib_decompress(struct list_head *ws, const u8 *data_in,
- struct page *dest_page, unsigned long start_byte, size_t srclen,
+ struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
size_t destlen);
struct list_head *zlib_alloc_workspace(unsigned int level);
void zlib_free_workspace(struct list_head *ws);
diff --git a/fs/btrfs/super.h b/fs/btrfs/super.h
index f18253ca280d..0dc3dd713cbb 100644
--- a/fs/btrfs/super.h
+++ b/fs/btrfs/super.h
@@ -3,6 +3,9 @@
#ifndef BTRFS_SUPER_H
#define BTRFS_SUPER_H
+#include <linux/fs.h>
+#include "fs.h"
+
bool btrfs_check_options(struct btrfs_fs_info *info, unsigned long *mount_opt,
unsigned long flags);
int btrfs_sync_fs(struct super_block *sb, int wait);
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index 36cf1f0e338e..58cd23f2ca1c 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -354,18 +354,13 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
}
int zlib_decompress(struct list_head *ws, const u8 *data_in,
- struct page *dest_page, unsigned long start_byte, size_t srclen,
+ struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
size_t destlen)
{
struct workspace *workspace = list_entry(ws, struct workspace, list);
int ret = 0;
int wbits = MAX_WBITS;
- unsigned long bytes_left;
- unsigned long total_out = 0;
- unsigned long pg_offset = 0;
-
- destlen = min_t(unsigned long, destlen, PAGE_SIZE);
- bytes_left = destlen;
+ unsigned long to_copy;
workspace->strm.next_in = data_in;
workspace->strm.avail_in = srclen;
@@ -390,60 +385,30 @@ int zlib_decompress(struct list_head *ws, const u8 *data_in,
return -EIO;
}
- while (bytes_left > 0) {
- unsigned long buf_start;
- unsigned long buf_offset;
- unsigned long bytes;
+ /*
+ * Everything (in/out buf) should be at most one sector, there should
+ * be no need to switch any input/output buffer.
+ */
+ ret = zlib_inflate(&workspace->strm, Z_FINISH);
+ to_copy = min(workspace->strm.total_out, destlen);
+ if (ret != Z_STREAM_END)
+ goto out;
- ret = zlib_inflate(&workspace->strm, Z_NO_FLUSH);
- if (ret != Z_OK && ret != Z_STREAM_END)
- break;
+ memcpy_to_page(dest_page, dest_pgoff, workspace->buf, to_copy);
- buf_start = total_out;
- total_out = workspace->strm.total_out;
-
- if (total_out == buf_start) {
- ret = -EIO;
- break;
- }
-
- if (total_out <= start_byte)
- goto next;
-
- if (total_out > start_byte && buf_start < start_byte)
- buf_offset = start_byte - buf_start;
- else
- buf_offset = 0;
-
- bytes = min(PAGE_SIZE - pg_offset,
- PAGE_SIZE - (buf_offset % PAGE_SIZE));
- bytes = min(bytes, bytes_left);
-
- memcpy_to_page(dest_page, pg_offset,
- workspace->buf + buf_offset, bytes);
-
- pg_offset += bytes;
- bytes_left -= bytes;
-next:
- workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = workspace->buf_size;
- }
-
- if (ret != Z_STREAM_END && bytes_left != 0)
+out:
+ if (unlikely(to_copy != destlen)) {
+ pr_warn_ratelimited("BTRFS: infalte failed, decompressed=%lu expected=%lu\n",
+ to_copy, destlen);
ret = -EIO;
- else
+ } else {
ret = 0;
+ }
zlib_inflateEnd(&workspace->strm);
- /*
- * this should only happen if zlib returned fewer bytes than we
- * expected. btrfs_get_block is responsible for zeroing from the
- * end of the inline extent (destlen) to the end of the page
- */
- if (pg_offset < destlen) {
- memzero_page(dest_page, pg_offset, destlen - pg_offset);
- }
+ if (unlikely(to_copy < destlen))
+ memzero_page(dest_page, dest_pgoff + to_copy, destlen - to_copy);
return ret;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
2024-01-08 9:08 ` [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression Qu Wenruo
@ 2024-01-09 3:02 ` kernel test robot
2024-01-10 1:59 ` David Sterba
0 siblings, 1 reply; 12+ messages in thread
From: kernel test robot @ 2024-01-09 3:02 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs; +Cc: llvm, oe-kbuild-all
Hi Qu,
kernel test robot noticed the following build warnings:
[auto build test WARNING on kdave/for-next]
[also build test WARNING on next-20240108]
[cannot apply to linus/master v6.7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-zlib-fix-and-simplify-the-inline-extent-decompression/20240108-171206
base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/r/29b7793e53e1cdd559ad212ee69cec211a3b4db2.1704704328.git.wqu%40suse.com
patch subject: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/config)
compiler: ClangBuiltLinux clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401091012.pLm6PcKG-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> fs/btrfs/zlib.c:402:15: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
401 | pr_warn_ratelimited("BTRFS: infalte failed, decompressed=%lu expected=%lu\n",
| ~~~
| %zu
402 | to_copy, destlen);
| ^~~~~~~
include/linux/printk.h:665:49: note: expanded from macro 'pr_warn_ratelimited'
665 | printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
| ~~~ ^~~~~~~~~~~
include/linux/printk.h:649:17: note: expanded from macro 'printk_ratelimited'
649 | printk(fmt, ##__VA_ARGS__); \
| ~~~ ^~~~~~~~~~~
include/linux/printk.h:455:60: note: expanded from macro 'printk'
455 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
| ~~~ ^~~~~~~~~~~
include/linux/printk.h:427:19: note: expanded from macro 'printk_index_wrap'
427 | _p_func(_fmt, ##__VA_ARGS__); \
| ~~~~ ^~~~~~~~~~~
1 warning generated.
vim +402 fs/btrfs/zlib.c
355
356 int zlib_decompress(struct list_head *ws, const u8 *data_in,
357 struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
358 size_t destlen)
359 {
360 struct workspace *workspace = list_entry(ws, struct workspace, list);
361 int ret = 0;
362 int wbits = MAX_WBITS;
363 unsigned long to_copy;
364
365 workspace->strm.next_in = data_in;
366 workspace->strm.avail_in = srclen;
367 workspace->strm.total_in = 0;
368
369 workspace->strm.next_out = workspace->buf;
370 workspace->strm.avail_out = workspace->buf_size;
371 workspace->strm.total_out = 0;
372 /* If it's deflate, and it's got no preset dictionary, then
373 we can tell zlib to skip the adler32 check. */
374 if (srclen > 2 && !(data_in[1] & PRESET_DICT) &&
375 ((data_in[0] & 0x0f) == Z_DEFLATED) &&
376 !(((data_in[0]<<8) + data_in[1]) % 31)) {
377
378 wbits = -((data_in[0] >> 4) + 8);
379 workspace->strm.next_in += 2;
380 workspace->strm.avail_in -= 2;
381 }
382
383 if (Z_OK != zlib_inflateInit2(&workspace->strm, wbits)) {
384 pr_warn("BTRFS: inflateInit failed\n");
385 return -EIO;
386 }
387
388 /*
389 * Everything (in/out buf) should be at most one sector, there should
390 * be no need to switch any input/output buffer.
391 */
392 ret = zlib_inflate(&workspace->strm, Z_FINISH);
393 to_copy = min(workspace->strm.total_out, destlen);
394 if (ret != Z_STREAM_END)
395 goto out;
396
397 memcpy_to_page(dest_page, dest_pgoff, workspace->buf, to_copy);
398
399 out:
400 if (unlikely(to_copy != destlen)) {
401 pr_warn_ratelimited("BTRFS: infalte failed, decompressed=%lu expected=%lu\n",
> 402 to_copy, destlen);
403 ret = -EIO;
404 } else {
405 ret = 0;
406 }
407
408 zlib_inflateEnd(&workspace->strm);
409
410 if (unlikely(to_copy < destlen))
411 memzero_page(dest_page, dest_pgoff + to_copy, destlen - to_copy);
412 return ret;
413 }
414
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
2024-01-09 3:02 ` kernel test robot
@ 2024-01-10 1:59 ` David Sterba
2024-01-10 2:03 ` Qu Wenruo
0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2024-01-10 1:59 UTC (permalink / raw)
To: kernel test robot; +Cc: Qu Wenruo, linux-btrfs, llvm, oe-kbuild-all
On Tue, Jan 09, 2024 at 11:02:54AM +0800, kernel test robot wrote:
> Hi Qu,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on kdave/for-next]
> [also build test WARNING on next-20240108]
> [cannot apply to linus/master v6.7]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-zlib-fix-and-simplify-the-inline-extent-decompression/20240108-171206
> base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
> patch link: https://lore.kernel.org/r/29b7793e53e1cdd559ad212ee69cec211a3b4db2.1704704328.git.wqu%40suse.com
> patch subject: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
> config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/config)
> compiler: ClangBuiltLinux clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202401091012.pLm6PcKG-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> >> fs/btrfs/zlib.c:402:15: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
> 401 | pr_warn_ratelimited("BTRFS: infalte failed, decompressed=%lu expected=%lu\n",
> | ~~~
> | %zu
> 402 | to_copy, destlen);
> | ^~~~~~~
Valid report but I can't reproduce it. Built with clang 17 and
explicitly enabled -Wformat. We have additional warnings enabled per
directory fs/btrfs/ so we can add -Wformat, I'd like to know what I'm
missing, we've had fixups for the size_t printk format so it would make
sense to catch it early.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
2024-01-10 1:59 ` David Sterba
@ 2024-01-10 2:03 ` Qu Wenruo
2024-01-10 2:26 ` David Sterba
0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2024-01-10 2:03 UTC (permalink / raw)
To: dsterba, kernel test robot; +Cc: linux-btrfs, llvm, oe-kbuild-all
[-- Attachment #1.1.1: Type: text/plain, Size: 2866 bytes --]
On 2024/1/10 12:29, David Sterba wrote:
> On Tue, Jan 09, 2024 at 11:02:54AM +0800, kernel test robot wrote:
>> Hi Qu,
>>
>> kernel test robot noticed the following build warnings:
>>
>> [auto build test WARNING on kdave/for-next]
>> [also build test WARNING on next-20240108]
>> [cannot apply to linus/master v6.7]
>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>> And when submitting patch, we suggest to use '--base' as documented in
>> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>>
>> url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-zlib-fix-and-simplify-the-inline-extent-decompression/20240108-171206
>> base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
>> patch link: https://lore.kernel.org/r/29b7793e53e1cdd559ad212ee69cec211a3b4db2.1704704328.git.wqu%40suse.com
>> patch subject: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
>> config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/config)
>> compiler: ClangBuiltLinux clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
>> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/reproduce)
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <lkp@intel.com>
>> | Closes: https://lore.kernel.org/oe-kbuild-all/202401091012.pLm6PcKG-lkp@intel.com/
>>
>> All warnings (new ones prefixed by >>):
>>
>>>> fs/btrfs/zlib.c:402:15: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
>> 401 | pr_warn_ratelimited("BTRFS: infalte failed, decompressed=%lu expected=%lu\n",
>> | ~~~
>> | %zu
>> 402 | to_copy, destlen);
>> | ^~~~~~~
>
> Valid report but I can't reproduce it. Built with clang 17 and
> explicitly enabled -Wformat. We have additional warnings enabled per
> directory fs/btrfs/ so we can add -Wformat, I'd like to know what I'm
> missing, we've had fixups for the size_t printk format so it would make
> sense to catch it early.
I guess it's due to the platform? (The report is from 32bit system).
Otherwise it's indeed my bad, for now I don't even have a 32bit VM to
verify, thus my LSP doesn't warn me about the format.
Thanks,
Qu
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 7027 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
2024-01-10 2:03 ` Qu Wenruo
@ 2024-01-10 2:26 ` David Sterba
2024-01-10 2:34 ` Qu Wenruo
0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2024-01-10 2:26 UTC (permalink / raw)
To: Qu Wenruo; +Cc: dsterba, kernel test robot, linux-btrfs, llvm, oe-kbuild-all
On Wed, Jan 10, 2024 at 12:33:17PM +1030, Qu Wenruo wrote:
>
>
> On 2024/1/10 12:29, David Sterba wrote:
> > On Tue, Jan 09, 2024 at 11:02:54AM +0800, kernel test robot wrote:
> >> Hi Qu,
> >>
> >> kernel test robot noticed the following build warnings:
> >>
> >> [auto build test WARNING on kdave/for-next]
> >> [also build test WARNING on next-20240108]
> >> [cannot apply to linus/master v6.7]
> >> [If your patch is applied to the wrong git tree, kindly drop us a note.
> >> And when submitting patch, we suggest to use '--base' as documented in
> >> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> >>
> >> url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-zlib-fix-and-simplify-the-inline-extent-decompression/20240108-171206
> >> base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
> >> patch link: https://lore.kernel.org/r/29b7793e53e1cdd559ad212ee69cec211a3b4db2.1704704328.git.wqu%40suse.com
> >> patch subject: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
> >> config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/config)
> >> compiler: ClangBuiltLinux clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> >> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/reproduce)
> >>
> >> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >> the same patch/commit), kindly add following tags
> >> | Reported-by: kernel test robot <lkp@intel.com>
> >> | Closes: https://lore.kernel.org/oe-kbuild-all/202401091012.pLm6PcKG-lkp@intel.com/
> >>
> >> All warnings (new ones prefixed by >>):
> >>
> >>>> fs/btrfs/zlib.c:402:15: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
> >> 401 | pr_warn_ratelimited("BTRFS: infalte failed, decompressed=%lu expected=%lu\n",
> >> | ~~~
> >> | %zu
> >> 402 | to_copy, destlen);
> >> | ^~~~~~~
> >
> > Valid report but I can't reproduce it. Built with clang 17 and
> > explicitly enabled -Wformat. We have additional warnings enabled per
> > directory fs/btrfs/ so we can add -Wformat, I'd like to know what I'm
> > missing, we've had fixups for the size_t printk format so it would make
> > sense to catch it early.
>
> I guess it's due to the platform? (The report is from 32bit system).
Ah I see, I build on 64bit platform but should the Wformat warning also
point out mismatch there? The size_t type is an alias of unsigned long
so it is not an error but when size_t and %zu don't match could be a
platform-independent error, no? This would save us reports and followup
fixups roundtrips.
> Otherwise it's indeed my bad, for now I don't even have a 32bit VM to
> verify, thus my LSP doesn't warn me about the format.
Yeah, it could/should though.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
2024-01-10 2:26 ` David Sterba
@ 2024-01-10 2:34 ` Qu Wenruo
2024-01-10 2:42 ` David Sterba
0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2024-01-10 2:34 UTC (permalink / raw)
To: dsterba; +Cc: kernel test robot, linux-btrfs, llvm, oe-kbuild-all
[-- Attachment #1.1.1: Type: text/plain, Size: 4317 bytes --]
On 2024/1/10 12:56, David Sterba wrote:
> On Wed, Jan 10, 2024 at 12:33:17PM +1030, Qu Wenruo wrote:
>>
>>
>> On 2024/1/10 12:29, David Sterba wrote:
>>> On Tue, Jan 09, 2024 at 11:02:54AM +0800, kernel test robot wrote:
>>>> Hi Qu,
>>>>
>>>> kernel test robot noticed the following build warnings:
>>>>
>>>> [auto build test WARNING on kdave/for-next]
>>>> [also build test WARNING on next-20240108]
>>>> [cannot apply to linus/master v6.7]
>>>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>>>> And when submitting patch, we suggest to use '--base' as documented in
>>>> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>>>>
>>>> url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-zlib-fix-and-simplify-the-inline-extent-decompression/20240108-171206
>>>> base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
>>>> patch link: https://lore.kernel.org/r/29b7793e53e1cdd559ad212ee69cec211a3b4db2.1704704328.git.wqu%40suse.com
>>>> patch subject: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
>>>> config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/config)
>>>> compiler: ClangBuiltLinux clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
>>>> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240109/202401091012.pLm6PcKG-lkp@intel.com/reproduce)
>>>>
>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>>> the same patch/commit), kindly add following tags
>>>> | Reported-by: kernel test robot <lkp@intel.com>
>>>> | Closes: https://lore.kernel.org/oe-kbuild-all/202401091012.pLm6PcKG-lkp@intel.com/
>>>>
>>>> All warnings (new ones prefixed by >>):
>>>>
>>>>>> fs/btrfs/zlib.c:402:15: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
>>>> 401 | pr_warn_ratelimited("BTRFS: infalte failed, decompressed=%lu expected=%lu\n",
>>>> | ~~~
>>>> | %zu
>>>> 402 | to_copy, destlen);
>>>> | ^~~~~~~
>>>
>>> Valid report but I can't reproduce it. Built with clang 17 and
>>> explicitly enabled -Wformat. We have additional warnings enabled per
>>> directory fs/btrfs/ so we can add -Wformat, I'd like to know what I'm
>>> missing, we've had fixups for the size_t printk format so it would make
>>> sense to catch it early.
>>
>> I guess it's due to the platform? (The report is from 32bit system).
>
> Ah I see, I build on 64bit platform but should the Wformat warning also
> point out mismatch there? The size_t type is an alias of unsigned long
> so it is not an error but when size_t and %zu don't match could be a
> platform-independent error, no? This would save us reports and followup
> fixups roundtrips.
size_t is defined differently, in include/uapi/asm-generic/posix_types.h:
/*
* Most 32 bit architectures use "unsigned int" size_t,
* and all 64 bit architectures use "unsigned long" size_t.
*/
#ifndef __kernel_size_t
#if __BITS_PER_LONG != 64
typedef unsigned int __kernel_size_t;
typedef int __kernel_ssize_t;
typedef int __kernel_ptrdiff_t;
#else
typedef __kernel_ulong_t __kernel_size_t;
typedef __kernel_long_t __kernel_ssize_t;
typedef __kernel_long_t __kernel_ptrdiff_t;
#endif
#endif
Thus this is the reason why we need @zu for size_t to handle the
difference, and since for 64bit it's just unsigned long, thus compiler
won't give any warning.
(That's also why I tend to not use size_t at all, and why I like rust's
explicit sized type, and we may want to go that path to prefer
u8/u16/u32/u64 when possible)
Thanks,
Qu
>
>> Otherwise it's indeed my bad, for now I don't even have a 32bit VM to
>> verify, thus my LSP doesn't warn me about the format.
>
> Yeah, it could/should though.
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 7027 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression
2024-01-10 2:34 ` Qu Wenruo
@ 2024-01-10 2:42 ` David Sterba
0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2024-01-10 2:42 UTC (permalink / raw)
To: Qu Wenruo; +Cc: dsterba, kernel test robot, linux-btrfs, llvm, oe-kbuild-all
On Wed, Jan 10, 2024 at 01:04:06PM +1030, Qu Wenruo wrote:
> >>>>
> >>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >>>> the same patch/commit), kindly add following tags
> >>>> | Reported-by: kernel test robot <lkp@intel.com>
> >>>> | Closes: https://lore.kernel.org/oe-kbuild-all/202401091012.pLm6PcKG-lkp@intel.com/
> >>>>
> >>>> All warnings (new ones prefixed by >>):
> >>>>
> >>>>>> fs/btrfs/zlib.c:402:15: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
> >>>> 401 | pr_warn_ratelimited("BTRFS: infalte failed, decompressed=%lu expected=%lu\n",
> >>>> | ~~~
> >>>> | %zu
> >>>> 402 | to_copy, destlen);
> >>>> | ^~~~~~~
> >>>
> >>> Valid report but I can't reproduce it. Built with clang 17 and
> >>> explicitly enabled -Wformat. We have additional warnings enabled per
> >>> directory fs/btrfs/ so we can add -Wformat, I'd like to know what I'm
> >>> missing, we've had fixups for the size_t printk format so it would make
> >>> sense to catch it early.
> >>
> >> I guess it's due to the platform? (The report is from 32bit system).
> >
> > Ah I see, I build on 64bit platform but should the Wformat warning also
> > point out mismatch there? The size_t type is an alias of unsigned long
> > so it is not an error but when size_t and %zu don't match could be a
> > platform-independent error, no? This would save us reports and followup
> > fixups roundtrips.
>
> size_t is defined differently, in include/uapi/asm-generic/posix_types.h:
>
> /*
> * Most 32 bit architectures use "unsigned int" size_t,
> * and all 64 bit architectures use "unsigned long" size_t.
> */
> #ifndef __kernel_size_t
> #if __BITS_PER_LONG != 64
> typedef unsigned int __kernel_size_t;
> typedef int __kernel_ssize_t;
> typedef int __kernel_ptrdiff_t;
> #else
> typedef __kernel_ulong_t __kernel_size_t;
> typedef __kernel_long_t __kernel_ssize_t;
> typedef __kernel_long_t __kernel_ptrdiff_t;
> #endif
> #endif
>
> Thus this is the reason why we need @zu for size_t to handle the
> difference, and since for 64bit it's just unsigned long, thus compiler
> won't give any warning.
So it's the int/long difference and kernel does it in a special way due
to absence of the standard libc headers.
> (That's also why I tend to not use size_t at all, and why I like rust's
> explicit sized type, and we may want to go that path to prefer
> u8/u16/u32/u64 when possible)
Yeah, from what I've seen so far is that size_t brings us only problems,
I would not mind conversions to explicit types like u32 or u64. We do
that in many places, I think it's namely on the interface boundaries
where we provide a callback with a size_t type, but from that down it
could be u64.
Quick grep for size_t returns 300+ lines and 100+ of that is ssize_t,
that's still a lot so the conversions should be targeted.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/3] btrfs: lzo: fix and simplify the inline extent decompression
2024-01-08 9:08 [PATCH 0/3] btrfs: fix and simplify the inline extent decompression path for subpage Qu Wenruo
2024-01-08 9:08 ` [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression Qu Wenruo
@ 2024-01-08 9:08 ` Qu Wenruo
2024-01-08 9:08 ` [PATCH 3/3] btrfs: zstd: " Qu Wenruo
2024-01-10 3:29 ` [PATCH 0/3] btrfs: fix and simplify the inline extent decompression path for subpage David Sterba
3 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2024-01-08 9:08 UTC (permalink / raw)
To: linux-btrfs
[BUG]
If we have a filesystem with 4k sectorsize, and an inlined compressed
extent created like this:
item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
generation 8 transid 8 size 4096 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
index 2 namelen 14 name: source_inlined
item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
generation 8 type 0 (inline)
inline extent data size 48 ram_bytes 4096 compression 2 (lzo)
Then trying to reflink that extent in an aarch64 system with 64K page
size, the reflink would just fail:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
XFS_IOC_CLONE_RANGE: Input/output error
[CAUSE]
In zlib_decompress(), we didn't treat @start_byte as just a page offset,
but also use it as an indicator on whether we should error out, without
any proper explanation (this is from the very beginning of btrfs).
In reality, for subpage cases, although @start_byte can be non-zero,
we should never switch input/output buffer nor error out, since the whole
input/output buffer should never exceed one sector.
(The above assumption is only not true if we're going to support
multi-page sectorsize)
Thus the current code using @start_byte as a condition to switch
input/output buffer or finish the decompression is completely incorrect.
[FIX]
The fix involves several modification:
- Rename @start_byte to @dest_pgoff to properly express its meaning
- Use @sectorsize other than PAGE_SIZE to properly initialize the
output buffer size
- Use correct destination offset inside the destination page
- Use memcpy_to_page() to copy the contents to the destination page
- Use memzero_page() to zero out the tailing part
- Consider early end as an error
After the fix, even on 64K page sized aarch64, above reflink now
works as expected:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
linked 4096/4096 bytes at offset 61440
And results the correct file layout:
item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
generation 10 transid 10 size 65536 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
index 3 namelen 4 name: dest
item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
location key (0 UNKNOWN.0 0) type XATTR
transid 10 data_len 37 name_len 16
name: security.selinux
data unconfined_u:object_r:unlabeled_t:s0
item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
generation 10 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/compression.h | 2 +-
fs/btrfs/lzo.c | 34 +++++++++-------------------------
2 files changed, 10 insertions(+), 26 deletions(-)
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 2b4dfb1b010c..afd7e50d073d 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -159,7 +159,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
unsigned long *total_in, unsigned long *total_out);
int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb);
int lzo_decompress(struct list_head *ws, const u8 *data_in,
- struct page *dest_page, unsigned long start_byte, size_t srclen,
+ struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
size_t destlen);
struct list_head *lzo_alloc_workspace(unsigned int level);
void lzo_free_workspace(struct list_head *ws);
diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 1131d5a29d61..e43bc0fdc74e 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -425,16 +425,16 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
}
int lzo_decompress(struct list_head *ws, const u8 *data_in,
- struct page *dest_page, unsigned long start_byte, size_t srclen,
+ struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
size_t destlen)
{
struct workspace *workspace = list_entry(ws, struct workspace, list);
+ struct btrfs_fs_info *fs_info = btrfs_sb(dest_page->mapping->host->i_sb);
+ const u32 sectorsize = fs_info->sectorsize;
size_t in_len;
size_t out_len;
size_t max_segment_len = WORKSPACE_BUF_LENGTH;
int ret = 0;
- char *kaddr;
- unsigned long bytes;
if (srclen < LZO_LEN || srclen > max_segment_len + LZO_LEN * 2)
return -EUCLEAN;
@@ -451,7 +451,7 @@ int lzo_decompress(struct list_head *ws, const u8 *data_in,
}
data_in += LZO_LEN;
- out_len = PAGE_SIZE;
+ out_len = sectorsize;
ret = lzo1x_decompress_safe(data_in, in_len, workspace->buf, &out_len);
if (ret != LZO_E_OK) {
pr_warn("BTRFS: decompress failed!\n");
@@ -459,29 +459,13 @@ int lzo_decompress(struct list_head *ws, const u8 *data_in,
goto out;
}
- if (out_len < start_byte) {
+ ASSERT(out_len <= sectorsize);
+ memcpy_to_page(dest_page, dest_pgoff, workspace->buf, out_len);
+ /* Early end, considered as an error. */
+ if (unlikely(out_len < destlen)) {
ret = -EIO;
- goto out;
+ memzero_page(dest_page, dest_pgoff + out_len, destlen - out_len);
}
-
- /*
- * the caller is already checking against PAGE_SIZE, but lets
- * move this check closer to the memcpy/memset
- */
- destlen = min_t(unsigned long, destlen, PAGE_SIZE);
- bytes = min_t(unsigned long, destlen, out_len - start_byte);
-
- kaddr = kmap_local_page(dest_page);
- memcpy(kaddr, workspace->buf + start_byte, bytes);
-
- /*
- * btrfs_getblock is doing a zero on the tail of the page too,
- * but this will cover anything missing from the decompressed
- * data.
- */
- if (bytes < destlen)
- memset(kaddr+bytes, 0, destlen-bytes);
- kunmap_local(kaddr);
out:
return ret;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH 3/3] btrfs: zstd: fix and simplify the inline extent decompression
2024-01-08 9:08 [PATCH 0/3] btrfs: fix and simplify the inline extent decompression path for subpage Qu Wenruo
2024-01-08 9:08 ` [PATCH 1/3] btrfs: zlib: fix and simplify the inline extent decompression Qu Wenruo
2024-01-08 9:08 ` [PATCH 2/3] btrfs: lzo: " Qu Wenruo
@ 2024-01-08 9:08 ` Qu Wenruo
2024-01-10 3:29 ` [PATCH 0/3] btrfs: fix and simplify the inline extent decompression path for subpage David Sterba
3 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2024-01-08 9:08 UTC (permalink / raw)
To: linux-btrfs
[BUG]
If we have a filesystem with 4k sectorsize, and an inlined compressed
extent created like this:
item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
generation 8 transid 8 size 4096 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
index 2 namelen 14 name: source_inlined
item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
generation 8 type 0 (inline)
inline extent data size 48 ram_bytes 4096 compression 3 (zstd)
Then trying to reflink that extent in an aarch64 system with 64K page
size, the reflink would just fail:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
XFS_IOC_CLONE_RANGE: Input/output error
[CAUSE]
In zstd_decompress(), we didn't treat @start_byte as just a page offset,
but also use it as an indicator on whether we should error out, without
any proper explanation (this is copied from other decompression code).
In reality, for subpage cases, although @start_byte can be non-zero,
we should never switch input/output buffer nor error out, since the whole
input/output buffer should never exceed one sector, thus we should not
need to do any buffer switch.
Thus the current code using @start_byte as a condition to switch
input/output buffer or finish the decompression is completely incorrect.
[FIX]
The fix involves several modification:
- Rename @start_byte to @dest_pgoff to properly express its meaning
- Use @sectorsize other than PAGE_SIZE to properly initialize the
output buffer size
- Use correct destination offset inside the destination page
- Simplify the main loop
Since the input/output buffer should never switch, we only need one
zstd_decompress_stream() call.
- Consider early end as an error
After the fix, even on 64K page sized aarch64, above reflink now
works as expected:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
linked 4096/4096 bytes at offset 61440
And results the correct file layout:
item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
generation 10 transid 10 size 65536 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
index 3 namelen 4 name: dest
item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
location key (0 UNKNOWN.0 0) type XATTR
transid 10 data_len 37 name_len 16
name: security.selinux
data unconfined_u:object_r:unlabeled_t:s0
item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
generation 10 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/compression.h | 2 +-
fs/btrfs/zstd.c | 74 +++++++++++++-----------------------------
2 files changed, 23 insertions(+), 53 deletions(-)
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index afd7e50d073d..97fe3ebf11a2 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -169,7 +169,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
unsigned long *total_in, unsigned long *total_out);
int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb);
int zstd_decompress(struct list_head *ws, const u8 *data_in,
- struct page *dest_page, unsigned long start_byte, size_t srclen,
+ struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
size_t destlen);
void zstd_init_workspace_manager(void);
void zstd_cleanup_workspace_manager(void);
diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
index 0d66db8bc1d4..812e3bd43889 100644
--- a/fs/btrfs/zstd.c
+++ b/fs/btrfs/zstd.c
@@ -20,6 +20,7 @@
#include "misc.h"
#include "compression.h"
#include "ctree.h"
+#include "super.h"
#define ZSTD_BTRFS_MAX_WINDOWLOG 17
#define ZSTD_BTRFS_MAX_INPUT (1 << ZSTD_BTRFS_MAX_WINDOWLOG)
@@ -618,80 +619,49 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
}
int zstd_decompress(struct list_head *ws, const u8 *data_in,
- struct page *dest_page, unsigned long start_byte, size_t srclen,
+ struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
size_t destlen)
{
struct workspace *workspace = list_entry(ws, struct workspace, list);
+ struct btrfs_fs_info *fs_info = btrfs_sb(dest_page->mapping->host->i_sb);
+ const u32 sectorsize = fs_info->sectorsize;
zstd_dstream *stream;
int ret = 0;
- size_t ret2;
- unsigned long total_out = 0;
- unsigned long pg_offset = 0;
+ unsigned long to_copy = 0;
stream = zstd_init_dstream(
ZSTD_BTRFS_MAX_INPUT, workspace->mem, workspace->size);
if (!stream) {
pr_warn("BTRFS: zstd_init_dstream failed\n");
- ret = -EIO;
goto finish;
}
- destlen = min_t(size_t, destlen, PAGE_SIZE);
-
workspace->in_buf.src = data_in;
workspace->in_buf.pos = 0;
workspace->in_buf.size = srclen;
workspace->out_buf.dst = workspace->buf;
workspace->out_buf.pos = 0;
- workspace->out_buf.size = PAGE_SIZE;
+ workspace->out_buf.size = sectorsize;
- ret2 = 1;
- while (pg_offset < destlen
- && workspace->in_buf.pos < workspace->in_buf.size) {
- unsigned long buf_start;
- unsigned long buf_offset;
- unsigned long bytes;
-
- /* Check if the frame is over and we still need more input */
- if (ret2 == 0) {
- pr_debug("BTRFS: zstd_decompress_stream ended early\n");
- ret = -EIO;
- goto finish;
- }
- ret2 = zstd_decompress_stream(stream, &workspace->out_buf,
- &workspace->in_buf);
- if (zstd_is_error(ret2)) {
- pr_debug("BTRFS: zstd_decompress_stream returned %d\n",
- zstd_get_error_code(ret2));
- ret = -EIO;
- goto finish;
- }
-
- buf_start = total_out;
- total_out += workspace->out_buf.pos;
- workspace->out_buf.pos = 0;
-
- if (total_out <= start_byte)
- continue;
-
- if (total_out > start_byte && buf_start < start_byte)
- buf_offset = start_byte - buf_start;
- else
- buf_offset = 0;
-
- bytes = min_t(unsigned long, destlen - pg_offset,
- workspace->out_buf.size - buf_offset);
-
- memcpy_to_page(dest_page, pg_offset,
- workspace->out_buf.dst + buf_offset, bytes);
-
- pg_offset += bytes;
+ /*
+ * Since both input and output buffer should not exceed one sector,
+ * One call should end the decompression.
+ */
+ ret = zstd_decompress_stream(stream, &workspace->out_buf, &workspace->in_buf);
+ if (zstd_is_error(ret)) {
+ pr_warn_ratelimited("BTRFS: zstd_decompress_stream return %d\n",
+ zstd_get_error_code(ret));
+ goto finish;
}
- ret = 0;
+ to_copy = workspace->out_buf.pos;
+ memcpy_to_page(dest_page, dest_pgoff + to_copy, workspace->out_buf.dst,
+ to_copy);
finish:
- if (pg_offset < destlen) {
- memzero_page(dest_page, pg_offset, destlen - pg_offset);
+ /* Error or early end. */
+ if (to_copy < destlen) {
+ ret = -EIO;
+ memzero_page(dest_page, dest_pgoff + to_copy, destlen - to_copy);
}
return ret;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH 0/3] btrfs: fix and simplify the inline extent decompression path for subpage
2024-01-08 9:08 [PATCH 0/3] btrfs: fix and simplify the inline extent decompression path for subpage Qu Wenruo
` (2 preceding siblings ...)
2024-01-08 9:08 ` [PATCH 3/3] btrfs: zstd: " Qu Wenruo
@ 2024-01-10 3:29 ` David Sterba
2024-01-10 4:18 ` Qu Wenruo
3 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2024-01-10 3:29 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Mon, Jan 08, 2024 at 07:38:43PM +1030, Qu Wenruo wrote:
> There is a long existing bug in subpage inline extent reflinking to
> another location.
>
> The bug is caused by an existing bad code, which is from the beginning
> of btrfs.
> The bad code is never properly explained and got further copied into new
> compression code.
I think there was an idea to let an inline compressed extent to span
more than one page, but it never materialized. There could be code
handling it but due to the invariants (like that inline extent is always
smaller than a sector) it's never executed.
> The bad condition never got properly triggered by different reasons for
> different platforms:
>
> - On 4K page sized system, the @start_byte is always 0
> Thus the existing checks are all dead code, thus never triggered.
>
> - For subpage (4K sectorsize 64K page size) cases, inline extent
> creation is disable for a different reason
> Since no inline extent can be created, there is no way to reflink
> any inlined extent thus no way to trigger it.
>
> The fixes are mostly going to rework the decompression loop, making sure
> the input and output buffer are always large enough for inline extent.
> Thus no need for any loop, but a single decompression call.
Yeah, as long as we have the page == sectorsize it's not necessary.
> But the difficulty lies in how to properly test the bug.
> For now I'm only doing cross-platform tests, using image created on
> x86_64, and do the reflink on aarch64.
> Not sure if it's possible to upload a binary image for fstests, or I
> don't have any good way to test the bug.
We had this discussion year ago (no crafted images) but the maintainer
of fstests changed meanwhile so you can try again. You can always add
that to progs but it won't get the same level of testing due to its
primary purpose to test the userspace code.
> Qu Wenruo (3):
> btrfs: zlib: fix and simplify the inline extent decompression
> btrfs: lzo: fix and simplify the inline extent decompression
> btrfs: zstd: fix and simplify the inline extent decompression
Added to misc-next, thanks.
There are new error messages when the decompressed length is not
expected, this could be still improved (and made more consistent). The
name of the file or inode number and filesystem identification are
missing, but that's how the current code works.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 0/3] btrfs: fix and simplify the inline extent decompression path for subpage
2024-01-10 3:29 ` [PATCH 0/3] btrfs: fix and simplify the inline extent decompression path for subpage David Sterba
@ 2024-01-10 4:18 ` Qu Wenruo
0 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2024-01-10 4:18 UTC (permalink / raw)
To: dsterba, Qu Wenruo; +Cc: linux-btrfs
On 2024/1/10 13:59, David Sterba wrote:
> On Mon, Jan 08, 2024 at 07:38:43PM +1030, Qu Wenruo wrote:
>> There is a long existing bug in subpage inline extent reflinking to
>> another location.
>>
>> The bug is caused by an existing bad code, which is from the beginning
>> of btrfs.
>> The bad code is never properly explained and got further copied into new
>> compression code.
>
> I think there was an idea to let an inline compressed extent to span
> more than one page, but it never materialized. There could be code
> handling it but due to the invariants (like that inline extent is always
> smaller than a sector) it's never executed.
Unfortunately it doesn't look like this.
For multiple page cases, we should not call `offset_in_page()` for that
@start_byte, and even in that case, we still don't need that @start_byte
parameter.
To me, that parameter is just for subpage, but not properly documented
and since it's mostly dead code, we copy them around without properly
knowing the reason.
>
>> The bad condition never got properly triggered by different reasons for
>> different platforms:
>>
>> - On 4K page sized system, the @start_byte is always 0
>> Thus the existing checks are all dead code, thus never triggered.
>>
>> - For subpage (4K sectorsize 64K page size) cases, inline extent
>> creation is disable for a different reason
>> Since no inline extent can be created, there is no way to reflink
>> any inlined extent thus no way to trigger it.
>>
>> The fixes are mostly going to rework the decompression loop, making sure
>> the input and output buffer are always large enough for inline extent.
>> Thus no need for any loop, but a single decompression call.
>
> Yeah, as long as we have the page == sectorsize it's not necessary.
>
>> But the difficulty lies in how to properly test the bug.
>> For now I'm only doing cross-platform tests, using image created on
>> x86_64, and do the reflink on aarch64.
>> Not sure if it's possible to upload a binary image for fstests, or I
>> don't have any good way to test the bug.
>
> We had this discussion year ago (no crafted images) but the maintainer
> of fstests changed meanwhile so you can try again.
It looks like that's the best way to go for now.
Thanks,
Qu
> You can always add
> that to progs but it won't get the same level of testing due to its
> primary purpose to test the userspace code.
>
>> Qu Wenruo (3):
>> btrfs: zlib: fix and simplify the inline extent decompression
>> btrfs: lzo: fix and simplify the inline extent decompression
>> btrfs: zstd: fix and simplify the inline extent decompression
>
> Added to misc-next, thanks.
>
> There are new error messages when the decompressed length is not
> expected, this could be still improved (and made more consistent). The
> name of the file or inode number and filesystem identification are
> missing, but that's how the current code works.
>
^ permalink raw reply [flat|nested] 12+ messages in thread