Linux Btrfs filesystem development
From: Qu Wenruo <wqu@suse.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
Date: Tue, 20 Jun 2023 09:07:37 +0800	[thread overview]
Message-ID: <3ea45213-fd95-4027-8a95-066b07fdecc9@suse.com> (raw)
In-Reply-To: <CAL3q7H6VhahmpcWZGpfauyUmrG76kqAyZAMQb8_5T74Xyg6U+A@mail.gmail.com>



On 2023/6/20 00:13, Filipe Manana wrote:
> On Thu, Jun 8, 2023 at 1:02 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> There is a chance that btrfs/266 fails on aarch64 with a 64K page
>> size, regardless of whether the sector size is 4K or 64K.
>>
>> The failure indicates that one or more mirrors are not properly fixed.
>>
>> [CAUSE]
>> I have added some trace events into the btrfs IO paths, including
>> __btrfs_submit_bio() and __end_bio_extent_readpage().
>>
>> When the test failed, the trace looks like this:
>>
>>   112819.764977: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=196608 pid=33663
>>                                      ^^^ Initial read on the full 192K file
>>   112819.766604: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=2 len=65536 pid=21806
>>                                      ^^^ Repair on the first 64K block
>>                                          Which would succeed
>>   112819.766760: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=2 len=65536 pid=21806
>>                                      ^^^ Repair on the second 64K block
>>                                          Which would fail
>>   112819.767625: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=3 len=65536 pid=21806
>>                                      ^^^ Repair retry on the second 64K
>>                                          block, this time using mirror 3
>>                                          Which would succeed
>>   112819.770999: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=1 status=0
>>                                           ^^^ The repair succeeded, the
>>                                               full 192K read finished.
>>
>>   112819.864851: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=3 len=196608 pid=33665
>>   112819.874854: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=65536 pid=31369
>>   112819.875029: __btrfs_submit_bio: r/i=5/257 fileoff=131072 mirror=1 len=65536 pid=31369
>>   112819.876926: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=3 status=0
>>
>> But the above reads only happen for mirror 1 and mirror 3; mirror 2 is
>> not involved.
>> This means that somehow the read on mirror 2 didn't happen, most
>> likely because something went wrong during the drop_caches call.
>>
>> It may be an async operation or some hardware-specific behavior.
>>
>> On the other hand, the test case doing the same operation but using
>> direct IO, btrfs/267, never fails, as we never populate the page
>> cache and thus always read from disk.
>>
>> [WORKAROUND]
>> The root cause is in the "echo 3 > drop_caches", which I'm still
>> debugging.
>>
>> But at the same time, we can work around the problem by forcing a
>> cycle mount of the scratch device inside _btrfs_buffered_read_on_mirror().
>>
>> This ensures all page caches are dropped regardless of whether
>> drop_caches is working correctly.
>>
>> With this patch, I no longer hit the failure on aarch64 with 64K page
>> size, while before it the test case would always fail during a -I 10
>> run.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> RFC:
>> The root cause is still inside that "echo 3 > drop_caches", but I
>> don't have a better solution when such a fundamental debug facility is
>> not working as expected.
>>
>> Thus this is only a workaround until I can pin down the root cause of
>> that drop_caches hiccup.
>> ---
>>   common/btrfs | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/common/btrfs b/common/btrfs
>> index 344509ce..1d522c33 100644
>> --- a/common/btrfs
>> +++ b/common/btrfs
>> @@ -599,6 +599,11 @@ _btrfs_buffered_read_on_mirror()
>>          local size=$5
>>
>>          echo 3 > /proc/sys/vm/drop_caches
>> +       # The above drop_caches doesn't seem to drop every page on aarch64
>> +       # with 64K page size.
>> +       # So, as a workaround, cycle mount the SCRATCH_MNT to ensure the
>> +       # caches are dropped.
>> +       _scratch_cycle_mount
> 
> Btw, I'm getting some tests failing because of this change.
> 
> For e.g.:
> 
> ./check btrfs/143
> FSTYP         -- btrfs
> PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP
> PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
> MKFS_OPTIONS  -- /dev/sdc
> MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
> 
> btrfs/143 6s ... [failed, exit status 1]- output mismatch (see
> /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad)
>      --- tests/btrfs/143.out 2020-06-10 19:29:03.818519162 +0100
>      +++ /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad
> 2023-06-19 17:04:00.575033899 +0100
>      @@ -1,37 +1,6 @@
>       QA output created by 143
>       wrote 131072/131072 bytes
>       XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>      -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
>      -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
>      -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
>      -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
>      ...
>      (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/143.out
> /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad'  to see
> the entire diff)
> Ran: btrfs/143
> Failures: btrfs/143
> Failed 1 of 1 tests
> 
> The problem is this test is using dm-dust, and _scratch_cycle_mount()
> doesn't work in that case.
> It should be _unmount_dust() followed by _mount_dust() for this particular case.

Any idea on a better solution?

My current idea is to grab the device currently mounted at SCRATCH_MNT,
then unmount and re-mount that device, instead of always using the
scratch device.

But that may be a little over-complicated and would affect too many test
cases.

Or maybe we can detect whether a dust device is in use inside
_btrfs_buffered_read_on_mirror() instead?
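E.g. keying off the dm table, something like this (untested; the helper
name and the "dust-test" device name are assumptions):

```shell
# Hypothetical check: given one line of "dmsetup table" output, in the
# format "<start> <length> <target_type> <args...>", tell whether the
# target is a dm-dust one.
_table_is_dust()
{
	[ "$(echo "$1" | awk '{ print $3 }')" = "dust" ]
}

# In _btrfs_buffered_read_on_mirror() we could then do something like:
#	if _table_is_dust "$(dmsetup table dust-test 2>/dev/null)"; then
#		_unmount_dust && _mount_dust
#	else
#		_scratch_cycle_mount
#	fi
```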

Thanks,
Qu
> 
> 
>>          while [[ -z $( (( BASHPID % nr_mirrors == mirror )) &&
>>                  exec $XFS_IO_PROG \
>>                          -c "pread -b $size $offset $size" $file) ]]; do
>> --
>> 2.39.1
>>
