* [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-08 11:48 Qu Wenruo
2023-06-08 12:15 ` Christoph Hellwig
2023-06-19 16:13 ` Filipe Manana
0 siblings, 2 replies; 8+ messages in thread
From: Qu Wenruo @ 2023-06-08 11:48 UTC (permalink / raw)
To: linux-btrfs, fstests
[BUG]
There is a chance that btrfs/266 fails on aarch64 with a 64K page
size, no matter whether the sector size is 4K or 64K.
The failure indicates that one or more mirrors are not properly fixed.
[CAUSE]
I have added some trace events into the btrfs IO paths, including
__btrfs_submit_bio() and __end_bio_extent_readpage().
When the test failed, the trace looks like this:
112819.764977: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=196608 pid=33663
               ^^^ Initial read of the full 192K file
112819.766604: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=2 len=65536 pid=21806
               ^^^ Repair of the first 64K block from mirror 2,
                   which succeeds
112819.766760: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=2 len=65536 pid=21806
               ^^^ Repair of the second 64K block from mirror 2,
                   which fails
112819.767625: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=3 len=65536 pid=21806
               ^^^ Retry of the second 64K block from mirror 3,
                   which succeeds
112819.770999: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=1 status=0
               ^^^ The repair succeeded and the full 192K read finished
112819.864851: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=3 len=196608 pid=33665
112819.874854: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=65536 pid=31369
112819.875029: __btrfs_submit_bio: r/i=5/257 fileoff=131072 mirror=1 len=65536 pid=31369
112819.876926: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=3 status=0
However, the reads above only involve mirror 1 and mirror 3; mirror 2
never appears.
This means that, somehow, the read from mirror 2 didn't happen, most
likely because something went wrong in the drop_caches call.
It may be an async operation or some hardware specific behavior.
On the other hand, the test case doing the same operation but using
direct IO, btrfs/267, never fails, as direct IO never populates the
page cache and thus always reads from disk.
[WORKAROUND]
The root cause is in the "echo 3 > drop_caches" call, which I'm still
debugging.
In the meantime, we can work around the problem by forcing a cycle
mount of the scratch device inside _btrfs_buffered_read_on_mirror().
This ensures all page caches are dropped regardless of whether
drop_caches works correctly.
With this patch I no longer hit the failure on aarch64 with 64K page
size, while before it the test case would always fail during a
-I 10 run.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
RFC:
The root cause is still inside that "echo 3 > drop_caches", but I don't
have a better solution if such a fundamental debug facility is not
working as expected.
Thus this is only a workaround until I can pin down the root cause of
the drop_caches hiccup.
---
common/btrfs | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/common/btrfs b/common/btrfs
index 344509ce..1d522c33 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -599,6 +599,11 @@ _btrfs_buffered_read_on_mirror()
 	local size=$5
 
 	echo 3 > /proc/sys/vm/drop_caches
+	# The above drop_caches doesn't seem to drop every page on aarch64
+	# with 64K page size.
+	# So, as a workaround, cycle mount SCRATCH_MNT to ensure the page
+	# cache is really dropped.
+	_scratch_cycle_mount
 	while [[ -z $( (( BASHPID % nr_mirrors == mirror )) &&
 		       exec $XFS_IO_PROG \
 				-c "pread -b $size $offset $size" $file) ]]; do
--
2.39.1
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-08 12:15 ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2023-06-08 12:15 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests

On Thu, Jun 08, 2023 at 07:48:36PM +0800, Qu Wenruo wrote:
> 	echo 3 > /proc/sys/vm/drop_caches
> +	# The above drop_caches doesn't seem to drop every page on aarch64
> +	# with 64K page size.
> +	# So, as a workaround, cycle mount SCRATCH_MNT to ensure the page
> +	# cache is really dropped.
> +	_scratch_cycle_mount

The _scratch_cycle_mount looks ok to me, but if we do that, there is
really no point in doing the drop_caches as well.
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-08 23:57 ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2023-06-08 23:57 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-btrfs, fstests

On 2023/6/8 20:15, Christoph Hellwig wrote:
> On Thu, Jun 08, 2023 at 07:48:36PM +0800, Qu Wenruo wrote:
>> 	echo 3 > /proc/sys/vm/drop_caches
>> [...]
>> +	_scratch_cycle_mount
>
> The _scratch_cycle_mount looks ok to me, but if we do that, there is
> really no point in doing the drop_caches as well.

Yep, we can remove that line. It's mostly a reminder for me to dig out
why that drop_caches doesn't work as expected.

Thanks,
Qu
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-10  6:46 ` Zorro Lang
  0 siblings, 1 reply; 8+ messages in thread
From: Zorro Lang @ 2023-06-10 6:46 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests

On Fri, Jun 09, 2023 at 07:57:37AM +0800, Qu Wenruo wrote:
> [...]
> Yep, we can remove that line. It's mostly a reminder for me to dig out
> why that drop_caches doesn't work as expected.

How about leaving the reminder/explanation in a comment and removing
the actual drop_caches line? If there is no objection, I'll merge this
patch with that change.

Thanks,
Zorro
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-10  7:25 ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2023-06-10 7:25 UTC (permalink / raw)
To: Zorro Lang, Qu Wenruo; +Cc: linux-btrfs, fstests

On 2023/6/10 14:46, Zorro Lang wrote:
> [...]
> How about leaving the reminder/explanation in a comment and removing
> the actual drop_caches line? If there is no objection, I'll merge this
> patch with that change.

We can reuse the comment in the patch, which already mentions that
drop_caches doesn't always work, at least on aarch64.

I'm fine with dropping the "echo 3 > drop_caches" line.

Thanks,
Qu
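Folding the agreement above into the patch, the hunk would end up looking roughly like this (a sketch of the follow-up revision, not the committed version):

```diff
@@ -599,6 +599,9 @@ _btrfs_buffered_read_on_mirror()
 	local size=$5
 
-	echo 3 > /proc/sys/vm/drop_caches
+	# "echo 3 > /proc/sys/vm/drop_caches" doesn't reliably drop every
+	# page on aarch64 with 64K page size, so cycle mount SCRATCH_MNT
+	# instead to ensure the page cache is really dropped.
+	_scratch_cycle_mount
 	while [[ -z $( (( BASHPID % nr_mirrors == mirror )) &&
```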
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-19 16:13 ` Filipe Manana
  1 sibling, 1 reply; 8+ messages in thread
From: Filipe Manana @ 2023-06-19 16:13 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests

On Thu, Jun 8, 2023 at 1:02 PM Qu Wenruo <wqu@suse.com> wrote:
>
> [...]
>
> 	echo 3 > /proc/sys/vm/drop_caches
> +	# The above drop_caches doesn't seem to drop every page on aarch64
> +	# with 64K page size.
> +	# So, as a workaround, cycle mount SCRATCH_MNT to ensure the page
> +	# cache is really dropped.
> +	_scratch_cycle_mount

Btw, I'm getting some tests failing because of this change.

For e.g.:

./check btrfs/143
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS  -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1

btrfs/143 6s ... [failed, exit status 1] - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad)
    --- tests/btrfs/143.out	2020-06-10 19:29:03.818519162 +0100
    +++ /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad	2023-06-19 17:04:00.575033899 +0100
    @@ -1,37 +1,6 @@
     QA output created by 143
     wrote 131072/131072 bytes
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    ...
    (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/143.out /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad' to see the entire diff)
Ran: btrfs/143
Failures: btrfs/143
Failed 1 of 1 tests

The problem is that this test uses dm-dust, and _scratch_cycle_mount()
doesn't work in that case.
It should be _unmount_dust() followed by _mount_dust() for this
particular case.
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-20  1:07 ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2023-06-20 1:07 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs, fstests

On 2023/6/20 00:13, Filipe Manana wrote:
> [...]
>
> Btw, I'm getting some tests failing because of this change.
>
> [...]
>
> The problem is that this test uses dm-dust, and _scratch_cycle_mount()
> doesn't work in that case.
> It should be _unmount_dust() followed by _mount_dust() for this
> particular case.

Any idea on a better solution?

My current idea is to grab the device mounted at SCRATCH_MNT, then
unmount and mount that grabbed device, instead of always using the
scratch device.

But that may look a little over-complicated and would affect too many
test cases.

Or maybe we can detect whether a dust device is in use inside
_btrfs_buffered_read_on_mirror() instead?

Thanks,
Qu
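A minimal sketch of the first idea above (hypothetical helper names, not existing fstests functions): resolve whatever device is actually mounted at a given mountpoint, then cycle mount that device. This would keep the workaround correct even when SCRATCH_MNT is backed by a dm target such as dm-dust rather than SCRATCH_DEV itself.

```shell
#!/bin/bash
# Print the source device of the filesystem mounted at $1, or nothing
# if $1 is not a mountpoint.  /proc/self/mounts fields are:
# device, mountpoint, fstype, options, ...
_mounted_device()
{
	awk -v mnt="$1" '$2 == mnt { dev = $1 } END { if (dev) print dev }' \
		/proc/self/mounts
}

# Unmount and re-mount whatever currently backs $1, preserving the
# mounted device (which may be SCRATCH_DEV or a dm target) instead of
# assuming it is always SCRATCH_DEV.
_cycle_mount_of()
{
	local mnt="$1"
	local dev

	dev=$(_mounted_device "$mnt")
	[ -n "$dev" ] || return 1
	umount "$mnt" && mount "$dev" "$mnt"
}
```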
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-20  9:13 ` Filipe Manana
  0 siblings, 0 replies; 8+ messages in thread
From: Filipe Manana @ 2023-06-20 9:13 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests

On Tue, Jun 20, 2023 at 2:07 AM Qu Wenruo <wqu@suse.com> wrote:
> [...]
>
> Any idea on a better solution?
>
> My current idea is to grab the device mounted at SCRATCH_MNT, then
> unmount and mount that grabbed device, instead of always using the
> scratch device.

Maybe that, yes.

> But that may look a little over-complicated and would affect too many
> test cases.
>
> Or maybe we can detect whether a dust device is in use inside
> _btrfs_buffered_read_on_mirror() instead?

Well, that sounds fragile. Maybe there are other tests using dm-flakey
instead of dm-dust, or some other target. Or maybe not now, but in the
future...