* [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-08 11:48 Qu Wenruo
2023-06-08 12:15 ` Christoph Hellwig
2023-06-19 16:13 ` Filipe Manana
0 siblings, 2 replies; 8+ messages in thread
From: Qu Wenruo @ 2023-06-08 11:48 UTC (permalink / raw)
To: linux-btrfs, fstests
[BUG]
There is a chance that btrfs/266 fails on aarch64 with a 64K page
size, no matter whether the sector size is 4K or 64K.
The failure indicates that one or more mirrors are not properly fixed.
[CAUSE]
I have added some trace events into the btrfs IO paths, including
__btrfs_submit_bio() and __end_bio_extent_readpage().
When the test failed, the trace looks like this:
112819.764977: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=196608 pid=33663
               ^^^ Initial read of the full 192K file
112819.766604: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=2 len=65536 pid=21806
               ^^^ Repair of the first 64K block from mirror 2,
                   which succeeds
112819.766760: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=2 len=65536 pid=21806
               ^^^ Repair of the second 64K block from mirror 2,
                   which fails
112819.767625: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=3 len=65536 pid=21806
               ^^^ Retry of the second 64K block from mirror 3,
                   which succeeds
112819.770999: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=1 status=0
               ^^^ The repair succeeded and the full 192K read finished
112819.864851: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=3 len=196608 pid=33665
112819.874854: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=65536 pid=31369
112819.875029: __btrfs_submit_bio: r/i=5/257 fileoff=131072 mirror=1 len=65536 pid=31369
112819.876926: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=3 status=0
However, the reads above only involve mirror 1 and mirror 3; mirror 2
never appears.
This means that, somehow, the read from mirror 2 didn't happen, most
likely because something went wrong in the drop_caches call.
It may be an async operation or some hardware specific behavior.
On the other hand, the test case doing the same operation but using
direct IO, btrfs/267, never fails, as direct IO never populates the
page cache and thus always reads from disk.
[WORKAROUND]
The root cause is in the "echo 3 > drop_caches" call, which I'm still
debugging.
In the meantime, we can work around the problem by forcing a cycle
mount of the scratch device inside _btrfs_buffered_read_on_mirror().
This ensures all page caches are dropped regardless of whether
drop_caches works correctly.
With this patch I no longer hit the failure on aarch64 with 64K page
size, while before it the test case would always fail during a
-I 10 run.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
RFC:
The root cause is still inside that "echo 3 > drop_caches", but I don't
have a better solution if such a fundamental debug facility is not
working as expected.
Thus this is only a workaround until I can pin down the root cause of
the drop_caches hiccup.
---
common/btrfs | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/common/btrfs b/common/btrfs
index 344509ce..1d522c33 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -599,6 +599,11 @@ _btrfs_buffered_read_on_mirror()
 	local size=$5
 
 	echo 3 > /proc/sys/vm/drop_caches
+	# The above drop_caches doesn't seem to drop every page on aarch64
+	# with 64K page size.
+	# So, as a workaround, cycle mount SCRATCH_MNT to ensure the page
+	# cache is really dropped.
+	_scratch_cycle_mount
 	while [[ -z $( (( BASHPID % nr_mirrors == mirror )) &&
 		       exec $XFS_IO_PROG \
 				-c "pread -b $size $offset $size" $file) ]]; do
--
2.39.1
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-08 12:15 ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2023-06-08 12:15 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests

On Thu, Jun 08, 2023 at 07:48:36PM +0800, Qu Wenruo wrote:
> 	echo 3 > /proc/sys/vm/drop_caches
> +	# The above drop_caches doesn't seem to drop every page on aarch64
> +	# with 64K page size.
> +	# So, as a workaround, cycle mount SCRATCH_MNT to ensure the page
> +	# cache is really dropped.
> +	_scratch_cycle_mount

The _scratch_cycle_mount looks ok to me, but if we do that, there is
really no point in doing the drop_caches as well.
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-08 23:57 ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2023-06-08 23:57 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-btrfs, fstests

On 2023/6/8 20:15, Christoph Hellwig wrote:
> On Thu, Jun 08, 2023 at 07:48:36PM +0800, Qu Wenruo wrote:
>> 	echo 3 > /proc/sys/vm/drop_caches
>> [...]
>> +	_scratch_cycle_mount
>
> The _scratch_cycle_mount looks ok to me, but if we do that, there is
> really no point in doing the drop_caches as well.

Yep, we can remove that line. It's mostly a reminder for me to dig out
why that drop_caches doesn't work as expected.

Thanks,
Qu
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-10  6:46 ` Zorro Lang
  0 siblings, 1 reply; 8+ messages in thread
From: Zorro Lang @ 2023-06-10 6:46 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests

On Fri, Jun 09, 2023 at 07:57:37AM +0800, Qu Wenruo wrote:
> [...]
> Yep, we can remove that line. It's mostly a reminder for me to dig out
> why that drop_caches doesn't work as expected.

How about leaving the reminder/explanation in a comment and removing
the actual drop_caches line? If there is no objection, I'll merge this
patch with that change.

Thanks,
Zorro
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-10  7:25 ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2023-06-10 7:25 UTC (permalink / raw)
To: Zorro Lang, Qu Wenruo; +Cc: linux-btrfs, fstests

On 2023/6/10 14:46, Zorro Lang wrote:
> [...]
> How about leaving the reminder/explanation in a comment and removing
> the actual drop_caches line? If there is no objection, I'll merge this
> patch with that change.

We can reuse the comment in the patch, which already mentions that
drop_caches doesn't always work, at least on aarch64.

I'm fine with dropping the "echo 3 > drop_caches" line.

Thanks,
Qu
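Folding the agreement above into the patch, the hunk would end up looking roughly like this (a sketch of the follow-up revision, not the committed version):

```diff
@@ -599,6 +599,9 @@ _btrfs_buffered_read_on_mirror()
 	local size=$5
 
-	echo 3 > /proc/sys/vm/drop_caches
+	# "echo 3 > /proc/sys/vm/drop_caches" doesn't reliably drop every
+	# page on aarch64 with 64K page size, so cycle mount SCRATCH_MNT
+	# instead to ensure the page cache is really dropped.
+	_scratch_cycle_mount
 	while [[ -z $( (( BASHPID % nr_mirrors == mirror )) &&
```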
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-19 16:13 ` Filipe Manana
  1 sibling, 1 reply; 8+ messages in thread
From: Filipe Manana @ 2023-06-19 16:13 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests

On Thu, Jun 8, 2023 at 1:02 PM Qu Wenruo <wqu@suse.com> wrote:
>
> [...]
>
> 	echo 3 > /proc/sys/vm/drop_caches
> +	# The above drop_caches doesn't seem to drop every page on aarch64
> +	# with 64K page size.
> +	# So, as a workaround, cycle mount SCRATCH_MNT to ensure the page
> +	# cache is really dropped.
> +	_scratch_cycle_mount

Btw, I'm getting some tests failing because of this change.

For e.g.:

./check btrfs/143
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS  -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1

btrfs/143 6s ... [failed, exit status 1] - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad)
    --- tests/btrfs/143.out	2020-06-10 19:29:03.818519162 +0100
    +++ /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad	2023-06-19 17:04:00.575033899 +0100
    @@ -1,37 +1,6 @@
     QA output created by 143
     wrote 131072/131072 bytes
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    ...
    (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/143.out /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad' to see the entire diff)
Ran: btrfs/143
Failures: btrfs/143
Failed 1 of 1 tests

The problem is that this test uses dm-dust, and _scratch_cycle_mount()
doesn't work in that case.
It should be _unmount_dust() followed by _mount_dust() for this
particular case.
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-20  1:07 ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2023-06-20 1:07 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs, fstests

On 2023/6/20 00:13, Filipe Manana wrote:
> [...]
>
> Btw, I'm getting some tests failing because of this change.
>
> [...]
>
> The problem is that this test uses dm-dust, and _scratch_cycle_mount()
> doesn't work in that case.
> It should be _unmount_dust() followed by _mount_dust() for this
> particular case.

Any idea on a better solution?

My current idea is to grab the device mounted at SCRATCH_MNT, then
unmount and mount that grabbed device, instead of always using the
scratch device.

But that may look a little over-complicated and would affect too many
test cases.

Or maybe we can detect whether a dust device is in use inside
_btrfs_buffered_read_on_mirror() instead?

Thanks,
Qu
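A minimal sketch of the first idea above (hypothetical helper names, not existing fstests functions): resolve whatever device is actually mounted at a given mountpoint, then cycle mount that device. This would keep the workaround correct even when SCRATCH_MNT is backed by a dm target such as dm-dust rather than SCRATCH_DEV itself.

```shell
#!/bin/bash
# Print the source device of the filesystem mounted at $1, or nothing
# if $1 is not a mountpoint.  /proc/self/mounts fields are:
# device, mountpoint, fstype, options, ...
_mounted_device()
{
	awk -v mnt="$1" '$2 == mnt { dev = $1 } END { if (dev) print dev }' \
		/proc/self/mounts
}

# Unmount and re-mount whatever currently backs $1, preserving the
# mounted device (which may be SCRATCH_DEV or a dm target) instead of
# assuming it is always SCRATCH_DEV.
_cycle_mount_of()
{
	local mnt="$1"
	local dev

	dev=$(_mounted_device "$mnt")
	[ -n "$dev" ] || return 1
	umount "$mnt" && mount "$dev" "$mnt"
}
```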
* Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped
@ 2023-06-20  9:13 ` Filipe Manana
  0 siblings, 0 replies; 8+ messages in thread
From: Filipe Manana @ 2023-06-20 9:13 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests

On Tue, Jun 20, 2023 at 2:07 AM Qu Wenruo <wqu@suse.com> wrote:
> [...]
>
> Any idea on a better solution?
>
> My current idea is to grab the device mounted at SCRATCH_MNT, then
> unmount and mount that grabbed device, instead of always using the
> scratch device.

Maybe that, yes.

> But that may look a little over-complicated and would affect too many
> test cases.
>
> Or maybe we can detect whether a dust device is in use inside
> _btrfs_buffered_read_on_mirror() instead?

Well, that sounds fragile. Maybe there are other tests using dm-flakey
instead of dm-dust, or some other target. Or maybe not now, but in the
future...