* [PATCH 0/2] btrfs-progs: check for device item between super
@ 2025-08-02 0:26 Qu Wenruo
2025-08-02 0:26 ` [PATCH 1/2] btrfs-progs: fsck-tests: make the warning check more specific for 057 Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-08-02 0:26 UTC (permalink / raw)
To: linux-btrfs
Mark has submitted a check enhancement for progs to detect the device
item mismatch between super blocks and the items inside chunk tree.
However there is a long existing problem that it will break CI.
The root cause is that the CI kernel lacks the needed backports, that on
a lot cases the kernel can lead to such mismatch and being caught by the
newer progs.
So to merge this long existing fsck enhancement, this series refresh and
workaround the problem by:
- Only reports warnings when such mismatch is detected
Such mismatch is not a huge deal, as we always trust the device item in
chunk tree more than the super block one.
So it won't cause data loss or whatever.
So even if the CI kernel doesn't have the fix, self test cases won't
report them as a failure.
- Workaround fsck/057 to avoid failure
Test case fsck/057 is a special case, where we manually check the
output for warning messages.
This is originally to detect problems related seed device, but now it
will also detect device item mismatch cause by the older CI kernel.
Workaround it by making the keyword more specific to the original
problem, not just checking for warnings.
With those done, we can finally make CI to accept the new checks even
the kernel is not uptodate.
Mark Harmstone (1):
btrfs-progs: check that device byte values in superblock match those
in chunk root
Qu Wenruo (1):
btrfs-progs: fsck-tests: make the warning check more specific for 057
check/main.c | 35 +++++++++++++++++++
.../fsck-tests/057-seed-false-alerts/test.sh | 6 ++--
2 files changed, 38 insertions(+), 3 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 1/2] btrfs-progs: fsck-tests: make the warning check more specific for 057 2025-08-02 0:26 [PATCH 0/2] btrfs-progs: check for device item between super Qu Wenruo @ 2025-08-02 0:26 ` Qu Wenruo 2025-08-02 0:26 ` [PATCH 2/2] btrfs-progs: check that device byte values in superblock match those in chunk root Qu Wenruo 2025-08-05 19:34 ` [PATCH 0/2] btrfs-progs: check for device item between super Boris Burkov 2 siblings, 0 replies; 6+ messages in thread From: Qu Wenruo @ 2025-08-02 0:26 UTC (permalink / raw) To: linux-btrfs [BUG] On older kernels without the fix ae4477f93756 ("btrfs: update superblock's device bytes_used when dropping chunk"), the test case 057 will result super block device item to mismatch with the chunk one. Normally this is not a big deal, but newer btrfs-progs will check for such mismatch. Although newer btrfs-progs won't report this as an error, the test case fsck/057 will manually check for any warnings, and fail the test case: ====== RUN CHECK /home/runner/work/btrfs-progs/btrfs-progs/btrfs check /dev/loop1 [1/8] checking log skipped (none written) [2/8] checking root items [3/8] checking extents WARNING: device 2's bytes_used was 503316480 in tree but 570425344 in superblock [4/8] checking free space tree [5/8] checking fs roots [6/8] checking only csums items (without verifying data) [7/8] checking root refs [8/8] checking quota groups skipped (not enabled on this FS) Opening filesystem to check... Checking filesystem on /dev/loop1 UUID: efd3e3b9-2fac-4a6f-b378-34694dc2d446 found 147456 bytes used, no error found total csum bytes: 0 total tree bytes: 147456 total fs tree bytes: 32768 total extent tree bytes: 16384 btree space waste bytes: 139992 file data blocks allocated: 0 referenced 0 That WARNING line will fail the test case, even if we didn't error out. [CAUSE] The test case itself is a test case for btrfs-progs commit 0dc8b8b6a414 ("btrfs-progs: check: fix wrong total bytes check for seed device"), which will report minor warning like the following: [2/7] checking extents WARNING: minor unaligned/mismatch device size detected WARNING: recommended to use 'btrfs rescue fix-device-size' to fix it But in this particular case, 057 should only check for the related wanrings, not something caused by the kernel. [FIX] Make the warning check in fsck/057 more specific, instead of "WARNING" use "fix-device-size" as the keyword. This is an unfortunate workaround for the CI kernels, which doesn't have fast enough backport of upstream kernel fixes. Signed-off-by: Qu Wenruo <wqu@suse.com> --- tests/fsck-tests/057-seed-false-alerts/test.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/fsck-tests/057-seed-false-alerts/test.sh b/tests/fsck-tests/057-seed-false-alerts/test.sh index edf160fab348..2d491e1fbaf5 100755 --- a/tests/fsck-tests/057-seed-false-alerts/test.sh +++ b/tests/fsck-tests/057-seed-false-alerts/test.sh @@ -35,13 +35,13 @@ sprouted_output=$(_mktemp btrfs-progs-sprouted-check-stdout.XXXXXX) run_check_stdout $SUDO_HELPER "$TOP/btrfs" check "$dev1" >> "$seed_output" run_check_stdout $SUDO_HELPER "$TOP/btrfs" check "$dev2" >> "$sprouted_output" -# There should be no warning for both seed and sprouted fs -if grep -q "WARNING" "$seed_output"; then +# There should be no warning about the device size for both seed and sprouted fs +if grep -q "fix-device-size" "$seed_output"; then cleanup_loopdevs rm -f -- "$seed_output" "$sprouted_output" _fail "false alerts detected for seed fs" fi -if grep -q "WARNING" "$sprouted_output"; then +if grep -q "fix-device-size" "$sprouted_output"; then cleanup_loopdevs rm -f -- "$seed_output" "$sprouted_output" _fail "false alerts detected for sprouted fs" -- 2.50.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] btrfs-progs: check that device byte values in superblock match those in chunk root 2025-08-02 0:26 [PATCH 0/2] btrfs-progs: check for device item between super Qu Wenruo 2025-08-02 0:26 ` [PATCH 1/2] btrfs-progs: fsck-tests: make the warning check more specific for 057 Qu Wenruo @ 2025-08-02 0:26 ` Qu Wenruo 2025-08-05 19:34 ` [PATCH 0/2] btrfs-progs: check for device item between super Boris Burkov 2 siblings, 0 replies; 6+ messages in thread From: Qu Wenruo @ 2025-08-02 0:26 UTC (permalink / raw) To: linux-btrfs; +Cc: Mark Harmstone From: Mark Harmstone <maharmstone@fb.com> The superblock of each device contains a copy of the corresponding struct btrfs_dev_item that lives in the chunk root. Add a check that the total_bytes and bytes_used values of these two copies match. Signed-off-by: Mark Harmstone <maharmstone@fb.com> [ Change the error to warning ] Signed-off-by: Qu Wenruo <wqu@suse.com> --- check/main.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/check/main.c b/check/main.c index b78eb59d0c50..5449bf016f39 100644 --- a/check/main.c +++ b/check/main.c @@ -8593,6 +8593,7 @@ static int check_device_used(struct device_record *dev_rec, if (opt_check_repair) { ret = repair_dev_item_bytes_used(gfs_info, dev_rec->devid, total_byte); + dev_rec->byte_used = total_byte; } return ret; } else { @@ -8650,6 +8651,28 @@ static bool is_super_size_valid(void) return true; } +static int check_super_dev_item(struct device_record *dev_rec) +{ + struct btrfs_dev_item *super_di = &gfs_info->super_copy->dev_item; + int ret = 0; + + if (btrfs_stack_device_total_bytes(super_di) != dev_rec->total_byte) { + warning("device %llu's total_bytes was %llu in tree but %llu in superblock", + dev_rec->devid, dev_rec->total_byte, + btrfs_stack_device_total_bytes(super_di)); + ret = 1; + } + + if (btrfs_stack_device_bytes_used(super_di) != dev_rec->byte_used) { + warning("device %llu's bytes_used was %llu in tree but %llu in superblock", + dev_rec->devid, dev_rec->byte_used, + btrfs_stack_device_bytes_used(super_di)); + ret = 1; + } + + return ret; +} + /* check btrfs_dev_item -> btrfs_dev_extent */ static int check_devices(struct rb_root *dev_cache, struct device_extent_tree *dev_extent_cache) @@ -8671,6 +8694,18 @@ static int check_devices(struct rb_root *dev_cache, gfs_info->sectorsize); if (dev_rec->bad_block_dev_size && !ret) ret = 1; + + if (dev_rec->devid == gfs_info->super_copy->dev_item.devid) { + /* + * This dev item mismatch between super and chunk tree + * is not a criticl problem, and CI kernels do not receive + * needed backport so they will cause mismatch during RW mounts. + * + * SO here we didn't record the mismatch as an error. + */ + check_super_dev_item(dev_rec); + } + dev_node = rb_next(dev_node); } list_for_each_entry(dext_rec, &dev_extent_cache->no_device_orphans, -- 2.50.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] btrfs-progs: check for device item between super 2025-08-02 0:26 [PATCH 0/2] btrfs-progs: check for device item between super Qu Wenruo 2025-08-02 0:26 ` [PATCH 1/2] btrfs-progs: fsck-tests: make the warning check more specific for 057 Qu Wenruo 2025-08-02 0:26 ` [PATCH 2/2] btrfs-progs: check that device byte values in superblock match those in chunk root Qu Wenruo @ 2025-08-05 19:34 ` Boris Burkov 2025-08-05 22:53 ` Qu Wenruo 2 siblings, 1 reply; 6+ messages in thread From: Boris Burkov @ 2025-08-05 19:34 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Sat, Aug 02, 2025 at 09:56:18AM +0930, Qu Wenruo wrote: > Mark has submitted a check enhancement for progs to detect the device > item mismatch between super blocks and the items inside chunk tree. > > However there is a long existing problem that it will break CI. > > The root cause is that the CI kernel lacks the needed backports, that on > a lot cases the kernel can lead to such mismatch and being caught by the > newer progs. Can you explain why we are contorting around out of date "continuous" integration? Shouldn't we just get the CI on a new kernel? > > So to merge this long existing fsck enhancement, this series refresh and > workaround the problem by: Thanks for putting in the effort to get the enhancement in, by the way! > > - Only reports warnings when such mismatch is detected > Such mismatch is not a huge deal, as we always trust the device item in > chunk tree more than the super block one. > So it won't cause data loss or whatever. That makes sense to me. > > So even if the CI kernel doesn't have the fix, self test cases won't > report them as a failure. > > - Workaround fsck/057 to avoid failure > Test case fsck/057 is a special case, where we manually check the > output for warning messages. > > This is originally to detect problems related seed device, but now it > will also detect device item mismatch cause by the older CI kernel. > > Workaround it by making the keyword more specific to the original > problem, not just checking for warnings. I'm a little skeptical about reducing even the incidental coverage of the test except for a good reason. Thanks, Boris > > With those done, we can finally make CI to accept the new checks even > the kernel is not uptodate. > > Mark Harmstone (1): > btrfs-progs: check that device byte values in superblock match those > in chunk root > > Qu Wenruo (1): > btrfs-progs: fsck-tests: make the warning check more specific for 057 > > check/main.c | 35 +++++++++++++++++++ > .../fsck-tests/057-seed-false-alerts/test.sh | 6 ++-- > 2 files changed, 38 insertions(+), 3 deletions(-) > > -- > 2.50.1 > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] btrfs-progs: check for device item between super 2025-08-05 19:34 ` [PATCH 0/2] btrfs-progs: check for device item between super Boris Burkov @ 2025-08-05 22:53 ` Qu Wenruo 2025-08-05 23:26 ` Boris Burkov 0 siblings, 1 reply; 6+ messages in thread From: Qu Wenruo @ 2025-08-05 22:53 UTC (permalink / raw) To: Boris Burkov, Qu Wenruo; +Cc: linux-btrfs 在 2025/8/6 05:04, Boris Burkov 写道: > On Sat, Aug 02, 2025 at 09:56:18AM +0930, Qu Wenruo wrote: >> Mark has submitted a check enhancement for progs to detect the device >> item mismatch between super blocks and the items inside chunk tree. >> >> However there is a long existing problem that it will break CI. >> >> The root cause is that the CI kernel lacks the needed backports, that on >> a lot cases the kernel can lead to such mismatch and being caught by the >> newer progs. > > Can you explain why we are contorting around out of date "continuous" > integration? Shouldn't we just get the CI on a new kernel? Because the distro used in github CI (ubuntu LTS?) doesn't have a proper way to install the latest upstream kernel. Thus it means a lot of btrfs-progs self tests are out of our control. I have a crazy idea that we include some read-write fuse implementation into btrfs-progs, and use that fuse implement to replace kernel btrfs, but that will be a super huge project. Thus we have to workaround the problem for now, and I believe the github CI is still a great tool. > >> >> So to merge this long existing fsck enhancement, this series refresh and >> workaround the problem by: > > Thanks for putting in the effort to get the enhancement in, by the way! > >> >> - Only reports warnings when such mismatch is detected >> Such mismatch is not a huge deal, as we always trust the device item in >> chunk tree more than the super block one. >> So it won't cause data loss or whatever. > > That makes sense to me. > >> >> So even if the CI kernel doesn't have the fix, self test cases won't >> report them as a failure. >> >> - Workaround fsck/057 to avoid failure >> Test case fsck/057 is a special case, where we manually check the >> output for warning messages. >> >> This is originally to detect problems related seed device, but now it >> will also detect device item mismatch cause by the older CI kernel. >> >> Workaround it by making the keyword more specific to the original >> problem, not just checking for warnings. > > I'm a little skeptical about reducing even the incidental coverage of > the test except for a good reason. Yep, that's the biggest problem, and I do not have any better solution. We either: - Find a way to use upstream kernel for github CI But it will still cause check errors for newer progs on older kernels. And I failed to find a way for that. - Further reduce the severity of the dev item mismatch That's possible and workaround the fsck/057 by not outputting "WARNING:" at all. But that also further reduce the need of dev item check in the first place. - Workaround fsck/057 The way I choose. - Use fuse from btrfs-progs instead of kernel The ultimate but the most time consuming solution. Thanks, Qu > > Thanks, > Boris > >> >> With those done, we can finally make CI to accept the new checks even >> the kernel is not uptodate. >> >> Mark Harmstone (1): >> btrfs-progs: check that device byte values in superblock match those >> in chunk root >> >> Qu Wenruo (1): >> btrfs-progs: fsck-tests: make the warning check more specific for 057 >> >> check/main.c | 35 +++++++++++++++++++ >> .../fsck-tests/057-seed-false-alerts/test.sh | 6 ++-- >> 2 files changed, 38 insertions(+), 3 deletions(-) >> >> -- >> 2.50.1 >> > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] btrfs-progs: check for device item between super 2025-08-05 22:53 ` Qu Wenruo @ 2025-08-05 23:26 ` Boris Burkov 0 siblings, 0 replies; 6+ messages in thread From: Boris Burkov @ 2025-08-05 23:26 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs On Wed, Aug 06, 2025 at 08:23:33AM +0930, Qu Wenruo wrote: > > > 在 2025/8/6 05:04, Boris Burkov 写道: > > On Sat, Aug 02, 2025 at 09:56:18AM +0930, Qu Wenruo wrote: > > > Mark has submitted a check enhancement for progs to detect the device > > > item mismatch between super blocks and the items inside chunk tree. > > > > > > However there is a long existing problem that it will break CI. > > > > > > The root cause is that the CI kernel lacks the needed backports, that on > > > a lot cases the kernel can lead to such mismatch and being caught by the > > > newer progs. > > > > Can you explain why we are contorting around out of date "continuous" > > integration? Shouldn't we just get the CI on a new kernel? > > Because the distro used in github CI (ubuntu LTS?) doesn't have a proper way > to install the latest upstream kernel. > > Thus it means a lot of btrfs-progs self tests are out of our control. > > I have a crazy idea that we include some read-write fuse implementation into > btrfs-progs, and use that fuse implement to replace kernel btrfs, but that > will be a super huge project. > > Thus we have to workaround the problem for now, and I believe the github CI > is still a great tool. > Sounds, good, thanks for explanation. You can add Reviewed-by: Boris Burkov <boris@bur.io> to the series. > > > > > > > > So to merge this long existing fsck enhancement, this series refresh and > > > workaround the problem by: > > > > Thanks for putting in the effort to get the enhancement in, by the way! > > > > > > > > - Only reports warnings when such mismatch is detected > > > Such mismatch is not a huge deal, as we always trust the device item in > > > chunk tree more than the super block one. > > > So it won't cause data loss or whatever. > > > > That makes sense to me. > > > > > > > > So even if the CI kernel doesn't have the fix, self test cases won't > > > report them as a failure. > > > > > > - Workaround fsck/057 to avoid failure > > > Test case fsck/057 is a special case, where we manually check the > > > output for warning messages. > > > > > > This is originally to detect problems related seed device, but now it > > > will also detect device item mismatch cause by the older CI kernel. > > > > > > Workaround it by making the keyword more specific to the original > > > problem, not just checking for warnings. > > > > I'm a little skeptical about reducing even the incidental coverage of > > the test except for a good reason. > > Yep, that's the biggest problem, and I do not have any better solution. > > We either: > > - Find a way to use upstream kernel for github CI > But it will still cause check errors for newer progs on older > kernels. > > And I failed to find a way for that. > > > - Further reduce the severity of the dev item mismatch > That's possible and workaround the fsck/057 by not outputting > "WARNING:" at all. > > But that also further reduce the need of dev item check in the first > place. > > - Workaround fsck/057 > The way I choose. > > > - Use fuse from btrfs-progs instead of kernel > The ultimate but the most time consuming solution. > > Thanks, > Qu > > > > > Thanks, > > Boris > > > > > > > > With those done, we can finally make CI to accept the new checks even > > > the kernel is not uptodate. > > > > > > Mark Harmstone (1): > > > btrfs-progs: check that device byte values in superblock match those > > > in chunk root > > > > > > Qu Wenruo (1): > > > btrfs-progs: fsck-tests: make the warning check more specific for 057 > > > > > > check/main.c | 35 +++++++++++++++++++ > > > .../fsck-tests/057-seed-false-alerts/test.sh | 6 ++-- > > > 2 files changed, 38 insertions(+), 3 deletions(-) > > > > > > -- > > > 2.50.1 > > > > > > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-08-05 23:25 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-02 0:26 [PATCH 0/2] btrfs-progs: check for device item between super Qu Wenruo 2025-08-02 0:26 ` [PATCH 1/2] btrfs-progs: fsck-tests: make the warning check more specific for 057 Qu Wenruo 2025-08-02 0:26 ` [PATCH 2/2] btrfs-progs: check that device byte values in superblock match those in chunk root Qu Wenruo 2025-08-05 19:34 ` [PATCH 0/2] btrfs-progs: check for device item between super Boris Burkov 2025-08-05 22:53 ` Qu Wenruo 2025-08-05 23:26 ` Boris Burkov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox