From: Qu Wenruo <wqu@suse.com>
To: kernel test robot <oliver.sang@intel.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 4/4] btrfs: introduce btrfs_bio::async_csum
Date: Tue, 28 Oct 2025 20:45:15 +1030
Message-ID: <7e73f89d-ecc4-4adc-a151-1eca9f199c8f@suse.com>
In-Reply-To: <124eef27-79d1-40ec-9f54-f94509f904fb@suse.com>
On 2025/10/28 18:36, Qu Wenruo wrote:
>
>
> On 2025/10/28 17:49, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed "xfstests.btrfs.026.fail" on:
>>
>> commit: d72352d1c3a3a201dcd3684b05987f281b1d66aa ("[PATCH 4/4] btrfs:
>> introduce btrfs_bio::async_csum")
>> url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-
>> make-sure-all-btrfs_bio-end_io-is-called-in-task-context/20251024-185435
>> base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git
>> for-next
>> patch link: https://lore.kernel.org/
>> all/44a1532190aee561c2a8ae7af9f84fc1e092ae9e.1761302592.git.wqu@suse.com/
>> patch subject: [PATCH 4/4] btrfs: introduce btrfs_bio::async_csum
>>
>> in testcase: xfstests
>> version: xfstests-x86_64-2cba4b54-1_20251020
>> with following parameters:
>>
>> disk: 6HDD
>> fs: btrfs
>> test: btrfs-026
>>
>>
>>
>> config: x86_64-rhel-9.4-func
>> compiler: gcc-14
>> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @
>> 3.40GHz (Haswell) with 8G memory
>>
>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>
>>
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new
>> version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202510281522.d23994ae-
>> lkp@intel.com
>
> Unfortunately I'm unable to reproduce the failure here, even after
> 100+ runs.
>
> Thus I guess it may be some incompatibility between the series and the
> base (which is the for-next branch, not the btrfs for-next branch).
>
> Just to rule out that possibility, would you mind re-testing using my
> branch directly?
>
> https://github.com/adam900710/linux/tree/async_csum
>
>
> From the assets, the dmesg shows that all data checksums are zero:
>
> [ 62.192305][ T269] BTRFS warning (device sdb2): csum failed root 5
> ino 258 off 275644416 csum 0xa54e4c94 expected csum 0x00000000 mirror 1
> [ 62.192397][ T12] BTRFS warning (device sdb2): csum failed root 5
> ino 258 off 275775488 csum 0xa54e4c94 expected csum 0x00000000 mirror 1
> [ 62.192470][ T5037] BTRFS warning (device sdb2): csum failed root 5
> ino 258 off 275906560 csum 0xa54e4c94 expected csum 0x00000000 mirror 1
>
> This means we're running the end_io() functions before the csum is fully
> calculated.
>
> If that's the case, it will eventually hit a use-after-free in other
> tests, which I hit too many times during development due to incorrect
> wait timing.

My bad. After looking into my local runs, there are some tests that
fail with the same dmesg errors.

It turns out that there is a race on bi_iter, where csum_one_bio_work()
and the lower storage layer can both try to use the same bi_iter.

On much newer hardware like Zen 5, the checksum calculation is much
faster than the IO, so in most cases csum_one_bio_work() reads bi_iter
before the lower layer advances it.

But if the lower layer advances it before csum_one_bio_work() runs, then
csum_one_bio_work() skips the calculation completely, leaving the csums
all zero.

Thanks for detecting this bug. I'll update the patch to add a new
saved_iter so that csum_one_bio_work() can always grab the correct iter.
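
The race described above can be sketched in userspace C. This is only an
illustration under stated assumptions, not the kernel code: struct toy_bio,
struct toy_iter and every toy_*() helper are hypothetical stand-ins, the
checksum is a placeholder rather than crc32c, and the race is modelled as a
fixed ordering of calls instead of real concurrency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an iterator: current position and bytes left. */
struct toy_iter {
	size_t offset;	/* bytes already consumed */
	size_t size;	/* bytes remaining */
};

/* Hypothetical stand-in for a bio; NOT the kernel's struct bio. */
struct toy_bio {
	const uint8_t *data;
	struct toy_iter iter;		/* live iterator, advanced by the lower layer */
	struct toy_iter saved_iter;	/* snapshot taken before submission */
};

static void toy_bio_init(struct toy_bio *b, const uint8_t *data, size_t len)
{
	assert(data != NULL);
	b->data = data;
	b->iter.offset = 0;
	b->iter.size = len;
	/* Snapshot the iterator before anybody else can advance it. */
	b->saved_iter = b->iter;
}

/* Models the lower storage layer consuming part (or all) of the bio. */
static void toy_lower_layer_advance(struct toy_bio *b, size_t bytes)
{
	if (bytes > b->iter.size)
		bytes = b->iter.size;
	b->iter.offset += bytes;
	b->iter.size -= bytes;
}

/* Placeholder checksum over the range described by a private copy of it. */
static uint32_t toy_csum_range(const struct toy_bio *b, struct toy_iter it)
{
	uint32_t csum = 0;

	while (it.size) {
		csum = csum * 31 + b->data[it.offset];
		it.offset++;
		it.size--;
	}
	return csum;
}

/* Racy variant: walks the live iterator, which may already be exhausted. */
static uint32_t toy_csum_racy(const struct toy_bio *b)
{
	return toy_csum_range(b, b->iter);
}

/* Fixed variant: walks the snapshot, independent of the lower layer. */
static uint32_t toy_csum_fixed(const struct toy_bio *b)
{
	return toy_csum_range(b, b->saved_iter);
}
```

If toy_lower_layer_advance() consumes the whole bio first, toy_csum_racy()
sees zero remaining bytes and returns 0, which matches the all-zero csums in
the dmesg, while toy_csum_fixed() still covers the full range.
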
Thanks,
Qu
>
> But I'm only seeing this single report, which is pretty weird.
>
> I hope it's just some bad code base.
>
> Thanks,
> Qu
>>
>> 2025-10-27 23:06:41 cd /lkp/benchmarks/xfstests
>> 2025-10-27 23:06:42 export TEST_DIR=/fs/sda1
>> 2025-10-27 23:06:42 export TEST_DEV=/dev/sda1
>> 2025-10-27 23:06:42 export FSTYP=btrfs
>> 2025-10-27 23:06:42 export SCRATCH_MNT=/fs/scratch
>> 2025-10-27 23:06:42 mkdir /fs/scratch -p
>> 2025-10-27 23:06:42 export SCRATCH_DEV_POOL="/dev/sda2 /dev/sda3 /dev/
>> sda4 /dev/sda5 /dev/sda6"
>> 2025-10-27 23:06:42 echo btrfs/026
>> 2025-10-27 23:06:42 ./check btrfs/026
>> FSTYP -- btrfs
>> PLATFORM -- Linux/x86_64 lkp-hsw-d01 6.18.0-rc1-00265-
>> gd72352d1c3a3 #1 SMP PREEMPT_DYNAMIC Tue Oct 28 06:52:50 CST 2025
>> MKFS_OPTIONS -- /dev/sda2
>> MOUNT_OPTIONS -- /dev/sda2 /fs/scratch
>>
>> btrfs/026 - output mismatch (see /lkp/benchmarks/xfstests/
>> results//btrfs/026.out.bad)
>> --- tests/btrfs/026.out 2025-10-20 16:48:15.000000000 +0000
>> +++ /lkp/benchmarks/xfstests/results//btrfs/026.out.bad
>> 2025-10-27 23:06:53.540513519 +0000
>> @@ -12,4 +12,4 @@
>> 5876dba1217b4c2915cda86f4c67640e SCRATCH_MNT/bar
>> File digests after remounting the file system:
>> 647d815906324ccdf288c7681f900ec0 SCRATCH_MNT/foo
>> -5876dba1217b4c2915cda86f4c67640e SCRATCH_MNT/bar
>> +md5sum: /fs/scratch/bar: Input/output error
>> ...
>> (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/026.out /lkp/
>> benchmarks/xfstests/results//btrfs/026.out.bad' to see the entire diff)
>> Ran: btrfs/026
>> Failures: btrfs/026
>> Failed 1 of 1 tests
>>
>>
>>
>>
>> The kernel config and materials to reproduce are available at:
>> https://download.01.org/0day-ci/
>> archive/20251028/202510281522.d23994ae-lkp@intel.com
>>
>>
>>
>
>