public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <wqu@suse.com>
To: kernel test robot <oliver.sang@intel.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 4/4] btrfs: introduce btrfs_bio::async_csum
Date: Tue, 28 Oct 2025 20:45:15 +1030
Message-ID: <7e73f89d-ecc4-4adc-a151-1eca9f199c8f@suse.com>
In-Reply-To: <124eef27-79d1-40ec-9f54-f94509f904fb@suse.com>



On 2025/10/28 18:36, Qu Wenruo wrote:
> 
> 
> On 2025/10/28 17:49, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed "xfstests.btrfs.026.fail" on:
>>
>> commit: d72352d1c3a3a201dcd3684b05987f281b1d66aa ("[PATCH 4/4] btrfs: introduce btrfs_bio::async_csum")
>> url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-make-sure-all-btrfs_bio-end_io-is-called-in-task-context/20251024-185435
>> base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
>> patch link: https://lore.kernel.org/all/44a1532190aee561c2a8ae7af9f84fc1e092ae9e.1761302592.git.wqu@suse.com/
>> patch subject: [PATCH 4/4] btrfs: introduce btrfs_bio::async_csum
>>
>> in testcase: xfstests
>> version: xfstests-x86_64-2cba4b54-1_20251020
>> with following parameters:
>>
>>     disk: 6HDD
>>     fs: btrfs
>>     test: btrfs-026
>>
>>
>>
>> config: x86_64-rhel-9.4-func
>> compiler: gcc-14
>> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell) with 8G memory
>>
>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>
>>
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add the following tags
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202510281522.d23994ae-lkp@intel.com
> 
> Unfortunately I'm unable to reproduce the failure here:
> 100+ runs without a single reproduction.
>
> Thus I guess it may be some incompatibility between the series and the base
> (which is the for-next branch, not the btrfs for-next branch).
>
> Just to rule out that possibility, would you mind re-testing using my branch directly?
> 
> https://github.com/adam900710/linux/tree/async_csum
> 
> 
> From the assets, the dmesg shows that all expected data checksums are zeros:
> 
> [   62.192305][  T269] BTRFS warning (device sdb2): csum failed root 5 ino 258 off 275644416 csum 0xa54e4c94 expected csum 0x00000000 mirror 1
> [   62.192397][   T12] BTRFS warning (device sdb2): csum failed root 5 ino 258 off 275775488 csum 0xa54e4c94 expected csum 0x00000000 mirror 1
> [   62.192470][ T5037] BTRFS warning (device sdb2): csum failed root 5 ino 258 off 275906560 csum 0xa54e4c94 expected csum 0x00000000 mirror 1
> 
> This means we're running the end_io() functions before the csum is fully
> calculated.
>
> If that's the case, it will eventually hit some use-after-free in other
> tests, which I hit too many times during development due to incorrect
> wait timing.

My bad: after looking into my local runs, some tests do fail with the
same dmesg errors.

It turns out that there is a race on bi_iter: csum_one_bio_work() and the
lower storage layer both try to use the same bi_iter.

On much newer hardware like Zen 5, checksum calculation is far faster
than IO, so in most cases csum_one_bio_work() reads bi_iter before the
lower layer advances it.

But if the lower layer advances it before csum_one_bio_work() runs,
csum_one_bio_work() skips the calculation completely, leaving all the
csums zero.
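A minimal userspace sketch of the race described above, assuming a simplified model: fake_iter, fake_bio, lower_layer_submit(), csum_worker(), submit_racy() and toy_csum() are all hypothetical stand-ins, not the real struct bvec_iter, btrfs_bio or csum_one_bio_work(), and toy_csum() is FNV-1a rather than the crc32c btrfs uses by default. The bad interleaving is forced by running the lower layer first, so the checksum loop sees zero bytes remaining.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTORSIZE 4096u /* hypothetical sector size */
#define MAX_SECTORS 8    /* hypothetical upper bound on sectors per bio */

/* Hypothetical stand-in for an IO iterator: position plus bytes left. */
struct fake_iter {
	size_t offset;
	size_t remaining;
};

/* Hypothetical bio whose iterator is shared with the lower layer. */
struct fake_bio {
	const uint8_t *data;
	struct fake_iter iter;       /* advanced by the lower layer during IO */
	uint32_t csums[MAX_SECTORS]; /* stays zero unless the worker fills it */
};

/* Toy per-sector checksum (FNV-1a), standing in for crc32c. */
static uint32_t toy_csum(const uint8_t *p, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* The lower layer consumes the whole shared iterator as it issues the IO. */
static void lower_layer_submit(struct fake_bio *bio)
{
	bio->iter.offset += bio->iter.remaining;
	bio->iter.remaining = 0;
}

/* Checksums one sector per step from @it; returns the number of sectors done. */
static size_t csum_worker(struct fake_bio *bio, struct fake_iter it)
{
	size_t n = 0;

	while (it.remaining >= SECTORSIZE) {
		assert(n < MAX_SECTORS);
		bio->csums[n++] = toy_csum(bio->data + it.offset, SECTORSIZE);
		it.offset += SECTORSIZE;
		it.remaining -= SECTORSIZE;
	}
	return n;
}

/* Racy ordering: the worker reads the shared iterator after the lower layer advanced it. */
static size_t submit_racy(struct fake_bio *bio)
{
	lower_layer_submit(bio);
	return csum_worker(bio, bio->iter); /* remaining == 0: loop never runs */
}
```

In this model, calling submit_racy() on a two-sector bio returns 0 and leaves csums[] all zero, which is the shape of the "expected csum 0x00000000" lines in the dmesg.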

Thanks for detecting this bug. I'll update the fix to add a new
saved_iter so that csum_one_bio_work() always uses the correct iter.
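A sketch of the saved_iter idea under the same kind of hypothetical model: all fake_* names, lower_layer_submit(), csum_worker(), submit_with_saved_iter() and toy_csum() are illustrative, and where the real patch stores saved_iter is an assumption here. The submitter snapshots the iterator before handing the bio to the lower layer, and the checksum worker only reads that private copy, so it no longer matters which side runs first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTORSIZE 4096u /* hypothetical sector size */
#define MAX_SECTORS 8    /* hypothetical upper bound on sectors per bio */

/* Hypothetical stand-in for an IO iterator: position plus bytes left. */
struct fake_iter {
	size_t offset;
	size_t remaining;
};

/* Hypothetical bio carrying a private iterator snapshot for checksumming. */
struct fake_bio {
	const uint8_t *data;
	struct fake_iter iter;       /* consumed by the lower layer */
	struct fake_iter saved_iter; /* snapshot taken before submission */
	uint32_t csums[MAX_SECTORS];
};

/* Toy per-sector checksum (FNV-1a), standing in for crc32c. */
static uint32_t toy_csum(const uint8_t *p, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* The lower layer consumes the whole shared iterator as it issues the IO. */
static void lower_layer_submit(struct fake_bio *bio)
{
	bio->iter.offset += bio->iter.remaining;
	bio->iter.remaining = 0;
}

/* Walks a local copy of saved_iter; never looks at bio->iter. */
static size_t csum_worker(struct fake_bio *bio)
{
	struct fake_iter it = bio->saved_iter;
	size_t n = 0;

	while (it.remaining >= SECTORSIZE) {
		assert(n < MAX_SECTORS);
		bio->csums[n++] = toy_csum(bio->data + it.offset, SECTORSIZE);
		it.offset += SECTORSIZE;
		it.remaining -= SECTORSIZE;
	}
	return n;
}

static size_t submit_with_saved_iter(struct fake_bio *bio)
{
	bio->saved_iter = bio->iter; /* snapshot BEFORE the bio leaves our hands */
	lower_layer_submit(bio);     /* worst case: the lower layer runs first */
	return csum_worker(bio);     /* still sees the full original range */
}
```

Even with the lower layer deliberately run first, submit_with_saved_iter() checksums both sectors of a two-sector bio. A plain struct copy is enough here because this model's iterator is just an offset and a length; the sketch does not model page vectors or the completion handling the real code needs.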

Thanks,
Qu

> 
> But I'm only seeing this single report, which is pretty weird.
> 
> I hope it's just a bad code base.
> 
> Thanks,
> Qu
>>
>> 2025-10-27 23:06:41 cd /lkp/benchmarks/xfstests
>> 2025-10-27 23:06:42 export TEST_DIR=/fs/sda1
>> 2025-10-27 23:06:42 export TEST_DEV=/dev/sda1
>> 2025-10-27 23:06:42 export FSTYP=btrfs
>> 2025-10-27 23:06:42 export SCRATCH_MNT=/fs/scratch
>> 2025-10-27 23:06:42 mkdir /fs/scratch -p
>> 2025-10-27 23:06:42 export SCRATCH_DEV_POOL="/dev/sda2 /dev/sda3 /dev/sda4 /dev/sda5 /dev/sda6"
>> 2025-10-27 23:06:42 echo btrfs/026
>> 2025-10-27 23:06:42 ./check btrfs/026
>> FSTYP         -- btrfs
>> PLATFORM      -- Linux/x86_64 lkp-hsw-d01 6.18.0-rc1-00265-gd72352d1c3a3 #1 SMP PREEMPT_DYNAMIC Tue Oct 28 06:52:50 CST 2025
>> MKFS_OPTIONS  -- /dev/sda2
>> MOUNT_OPTIONS -- /dev/sda2 /fs/scratch
>>
>> btrfs/026       - output mismatch (see /lkp/benchmarks/xfstests/results//btrfs/026.out.bad)
>>      --- tests/btrfs/026.out    2025-10-20 16:48:15.000000000 +0000
>>      +++ /lkp/benchmarks/xfstests/results//btrfs/026.out.bad    2025-10-27 23:06:53.540513519 +0000
>>      @@ -12,4 +12,4 @@
>>       5876dba1217b4c2915cda86f4c67640e  SCRATCH_MNT/bar
>>       File digests after remounting the file system:
>>       647d815906324ccdf288c7681f900ec0  SCRATCH_MNT/foo
>>      -5876dba1217b4c2915cda86f4c67640e  SCRATCH_MNT/bar
>>      +md5sum: /fs/scratch/bar: Input/output error
>>      ...
>>      (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/026.out /lkp/benchmarks/xfstests/results//btrfs/026.out.bad'  to see the entire diff)
>> Ran: btrfs/026
>> Failures: btrfs/026
>> Failed 1 of 1 tests
>>
>>
>>
>>
>> The kernel config and materials to reproduce are available at:
>> https://download.01.org/0day-ci/archive/20251028/202510281522.d23994ae-lkp@intel.com
>>
>>
>>
> 
> 



Thread overview: 14+ messages
2025-10-24 10:49 [PATCH 0/4] btrfs: introduce async_csum feature Qu Wenruo
2025-10-24 10:49 ` [PATCH 1/4] btrfs: make sure all btrfs_bio::end_io is called in task context Qu Wenruo
2025-10-24 10:49 ` [PATCH 2/4] btrfs: remove btrfs_fs_info::compressed_write_workers Qu Wenruo
2025-10-24 10:49 ` [PATCH 3/4] btrfs: relax btrfs_inode::ordered_tree_lock Qu Wenruo
2025-10-24 10:49 ` [PATCH 4/4] btrfs: introduce btrfs_bio::async_csum Qu Wenruo
2025-10-24 10:58   ` Christoph Hellwig
2025-10-24 22:15     ` Qu Wenruo
2025-10-24 22:51       ` Eric Biggers
2025-10-24 23:13         ` Qu Wenruo
2025-10-24 14:51   ` Boris Burkov
2025-10-24 21:40     ` Qu Wenruo
2025-10-28  7:19   ` kernel test robot
2025-10-28  8:06     ` Qu Wenruo
2025-10-28 10:15       ` Qu Wenruo [this message]
