Uncorrectable error during multiple scrub (raid5 recovery).

Linux Btrfs filesystem development

* Uncorrectable error during multiple scrub (raid5 recovery).
@ 2022-08-14  3:10 Wang Yugui
  2022-08-14  4:46 ` Qu Wenruo
  2022-08-14  5:31 ` Wang Yugui
  0 siblings, 2 replies; 5+ messages in thread
From: Wang Yugui @ 2022-08-14  3:10 UTC (permalink / raw)
  To: linux-btrfs

Hi,

Uncorrectable error during multiple scrub (raid5 recovery).

This reproducer is based on an earlier reproducer [1],
but it seems to be a new problem, so I am opening a new thread.

reproducer:

mkfs.btrfs -f -draid5 -mraid1 ${SCRATCH_DEV_POOL}
SCRATCH_DEV_ARRAY=($SCRATCH_DEV_POOL)
mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime

/bin/cp -a /usr/bin $SCRATCH_MNT/
#(OK)dd if=/dev/urandom bs=1M count=1K of=$SCRATCH_MNT/1G.img
du -sh $SCRATCH_MNT

for((i=1;i<=15;++i)); do

	#(OK)umount $SCRATCH_MNT; mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime
	sync; sleep 5; sync; sleep 5; sync; sleep 25;

	# change the device to discard in every loop
	j=$(( i % ${#SCRATCH_DEV_ARRAY[@]} ))
	/usr/sbin/blkdiscard -f ${SCRATCH_DEV_ARRAY[$j]} # --offset 2M

	btrfs scrub start -Bd $SCRATCH_MNT | grep 'summary\|Uncorrectable'

done

This problem does not happen if we change the test data to something simpler.
# i.e. replace the roughly 220M of data from '/usr/bin' with a single 1G file

This problem does not happen if we clear the cache with 'umount; mount'
between loop iterations.
# i.e. replace 'sync; sleep 5; ...' with 'umount; mount'

So it seems that some in-memory information is wrong after RAID5 recovery?
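
The two loop variants compared above can be written side by side. This is
only a sketch: the `run` wrapper, the `between_iterations` function and the
`DRY_RUN` and `VARIANT` variables are illustrative additions, not part of
the original reproducer. With `DRY_RUN=1` (the default here) the commands
are printed rather than executed.

```shell
# Sketch: the step between scrub iterations, in both forms described above.
# DRY_RUN, VARIANT, run() and between_iterations() are hypothetical helpers.
DRY_RUN="${DRY_RUN:-1}"
VARIANT="${VARIANT:-sleep}"   # "sleep": reported to fail; "remount": reported OK

# Print the command in dry-run mode, otherwise execute it.
run() {
	if [ "$DRY_RUN" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

# $1 is the device to mount from (used only by the "remount" variant).
between_iterations() {
	case "$VARIANT" in
	sleep)
		# Original form: flush and wait, but keep the filesystem mounted.
		run sync; run sleep 5; run sync; run sleep 5; run sync; run sleep 25
		;;
	remount)
		# Alternative form: drop in-memory state by remounting.
		run umount "$SCRATCH_MNT"
		run mount "$1" "$SCRATCH_MNT"
		;;
	*)
		echo "unknown VARIANT: $VARIANT" >&2
		return 1
		;;
	esac
}
```

Calling `between_iterations "${SCRATCH_DEV_ARRAY[0]}"` at the top of the
loop body, once with each `VARIANT`, keeps everything else in the
reproducer identical between the two runs.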

[1]
Subject: misc-next and for-next: kernel BUG at fs/btrfs/extent_io.c:2350!
during raid5 recovery
https://lore.kernel.org/linux-btrfs/9dfb0b60-9178-7bbe-6ba1-10d056a7e84c@gmx.com/T/#t

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/08/14


* Re: Uncorrectable error during multiple scrub (raid5 recovery).
  2022-08-14  3:10 Uncorrectable error during multiple scrub (raid5 recovery) Wang Yugui
@ 2022-08-14  4:46 ` Qu Wenruo
  2022-08-14  4:58   ` Wang Yugui
  2022-08-14  5:31 ` Wang Yugui
  1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2022-08-14  4:46 UTC (permalink / raw)
  To: Wang Yugui, linux-btrfs



On 2022/8/14 11:10, Wang Yugui wrote:
> Hi,
>
> Uncorrectable error during multiple scrub (raid5 recovery).
>
> This reproducer is based on some reproducer [1],
> but it seems a new problem, so I open a new thread.
>
> reproducer:
>
> mkfs.btrfs -f -draid5 -mraid1 ${SCRATCH_DEV_POOL}
> SCRATCH_DEV_ARRAY=($SCRATCH_DEV_POOL)
> mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime

Please remove unnecessary comments if they are not explaining anything.

They just make the test case much harder to read.
>
> /bin/cp -a /usr/bin $SCRATCH_MNT/
> #(OK)dd if=/dev/urandom bs=1M count=1K of=$SCRATCH_MNT/1G.img
> du -sh $SCRATCH_MNT
>
> for((i=1;i<=15;++i)); do
>
> 	#(OK)umount $SCRATCH_MNT; mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime
> 	sync; sleep 5; sync; sleep 5; sync; sleep 25;

If you really want to provide a reproducer, explain why this is needed;
otherwise it will just waste the time of everyone who tries this test.

>
> 	# change the device to discard in every loop
> 	j=$(( i % ${#SCRATCH_DEV_ARRAY[@]} ))
> 	/usr/sbin/blkdiscard -f ${SCRATCH_DEV_ARRAY[$j]} # --offset 2M
>
> 	btrfs scrub start -Bd $SCRATCH_MNT | grep 'summary\|Uncorrectable'
>
> done
>
> This problem will not happen if we change the test data to simpler one.
> # about 220M data of '/usr/bin' to single 1G file
>
> This problem will not happen if we clear cache with 'umount; mount'
> between multiple loop.
> # 'sync; sleep 5; ...' to  'umount; mount'
>
> so it seems that some info in memory is wrong after RAID5 recovery?
>
> [1]
> Subject: misc-next and for-next: kernel BUG at fs/btrfs/extent_io.c:2350!
> during raid5 recovery
> https://lore.kernel.org/linux-btrfs/9dfb0b60-9178-7bbe-6ba1-10d056a7e84c@gmx.com/T/#t

That case is in fact not related to RAID56, and we already have the fix
for it:
https://lore.kernel.org/linux-btrfs/1d9b69af6ce0a79e54fbaafcc65ead8f71b54b60.1660377678.git.wqu@suse.com/

Thanks,
Qu

>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/08/14
>
>


* Re: Uncorrectable error during multiple scrub (raid5 recovery).
  2022-08-14  4:46 ` Qu Wenruo
@ 2022-08-14  4:58   ` Wang Yugui
  2022-08-14  5:08     ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2022-08-14  4:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hi,

> On 2022/8/14 11:10, Wang Yugui wrote:
> > Hi,
> >
> > Uncorrectable error during multiple scrub (raid5 recovery).
> >
> > This reproducer is based on some reproducer [1],
> > but it seems a new problem, so I open a new thread.
> >
> > reproducer:
> >
> > mkfs.btrfs -f -draid5 -mraid1 ${SCRATCH_DEV_POOL}
> > SCRATCH_DEV_ARRAY=($SCRATCH_DEV_POOL)
> > mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime
> 
> Please remove unnecessary comments if it's not explaining anything.
> 
> It just makes the test case much harder to read.
> >
> > /bin/cp -a /usr/bin $SCRATCH_MNT/
> > #(OK)dd if=/dev/urandom bs=1M count=1K of=$SCRATCH_MNT/1G.img
> > du -sh $SCRATCH_MNT
> >
> > for((i=1;i<=15;++i)); do
> >
> > 	#(OK)umount $SCRATCH_MNT; mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime
> > 	sync; sleep 5; sync; sleep 5; sync; sleep 25;
> 
> If you really want to provide a reproducer, either explain why this is
> needed or it will just waste time of everyone who is trying this test.
> 
> >
> > 	# change the device to discard in every loop
> > 	j=$(( i % ${#SCRATCH_DEV_ARRAY[@]} ))
> > 	/usr/sbin/blkdiscard -f ${SCRATCH_DEV_ARRAY[$j]} # --offset 2M
> >
> > 	btrfs scrub start -Bd $SCRATCH_MNT | grep 'summary\|Uncorrectable'
> >
> > done
> >
> > This problem will not happen if we change the test data to simpler one.
> > # about 220M data of '/usr/bin' to single 1G file
> >
> > This problem will not happen if we clear cache with 'umount; mount'
> > between multiple loop.
> > # 'sync; sleep 5; ...' to  'umount; mount'
> >
> > so it seems that some info in memory is wrong after RAID5 recovery?
> >
> > [1]
> > Subject: misc-next and for-next: kernel BUG at fs/btrfs/extent_io.c:2350!
> > during raid5 recovery
> > https://lore.kernel.org/linux-btrfs/9dfb0b60-9178-7bbe-6ba1-10d056a7e84c@gmx.com/T/#t
> 
> That case is in fact not related to RAID56, and we already have the fix
> for it:
> https://lore.kernel.org/linux-btrfs/1d9b69af6ce0a79e54fbaafcc65ead8f71b54b60.1660377678.git.wqu@suse.com/

This problem still happens on Linux 5.20 (20220812) with the following patches.

Subject: btrfs: scrub: properly report super block errors in system log
Subject: btrfs: scrub: try to fix super block errors
(v2)Subject: btrfs: don't merge pages into bio if their page offset is not

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/08/14




* Re: Uncorrectable error during multiple scrub (raid5 recovery).
  2022-08-14  4:58   ` Wang Yugui
@ 2022-08-14  5:08     ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-08-14  5:08 UTC (permalink / raw)
  To: Wang Yugui; +Cc: linux-btrfs



On 2022/8/14 12:58, Wang Yugui wrote:
> Hi,
>
>> On 2022/8/14 11:10, Wang Yugui wrote:
>>> Hi,
>>>
>>> Uncorrectable error during multiple scrub (raid5 recovery).
>>>
>>> This reproducer is based on some reproducer [1],
>>> but it seems a new problem, so I open a new thread.
>>>
>>> reproducer:
>>>
>>> mkfs.btrfs -f -draid5 -mraid1 ${SCRATCH_DEV_POOL}
>>> SCRATCH_DEV_ARRAY=($SCRATCH_DEV_POOL)
>>> mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime
>>
>> Please remove unnecessary comments if it's not explaining anything.
>>
>> It just makes the test case much harder to read.
>>>
>>> /bin/cp -a /usr/bin $SCRATCH_MNT/
>>> #(OK)dd if=/dev/urandom bs=1M count=1K of=$SCRATCH_MNT/1G.img
>>> du -sh $SCRATCH_MNT
>>>
>>> for((i=1;i<=15;++i)); do
>>>
>>> 	#(OK)umount $SCRATCH_MNT; mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime
>>> 	sync; sleep 5; sync; sleep 5; sync; sleep 25;
>>
>> If you really want to provide a reproducer, either explain why this is
>> needed or it will just waste time of everyone who is trying this test.
>>
>>>
>>> 	# change the device to discard in every loop
>>> 	j=$(( i % ${#SCRATCH_DEV_ARRAY[@]} ))
>>> 	/usr/sbin/blkdiscard -f ${SCRATCH_DEV_ARRAY[$j]} # --offset 2M
>>>
>>> 	btrfs scrub start -Bd $SCRATCH_MNT | grep 'summary\|Uncorrectable'
>>>
>>> done
>>>
>>> This problem will not happen if we change the test data to simpler one.
>>> # about 220M data of '/usr/bin' to single 1G file
>>>
>>> This problem will not happen if we clear cache with 'umount; mount'
>>> between multiple loop.
>>> # 'sync; sleep 5; ...' to  'umount; mount'
>>>
>>> so it seems that some info in memory is wrong after RAID5 recovery?
>>>
>>> [1]
>>> Subject: misc-next and for-next: kernel BUG at fs/btrfs/extent_io.c:2350!
>>> during raid5 recovery
>>> https://lore.kernel.org/linux-btrfs/9dfb0b60-9178-7bbe-6ba1-10d056a7e84c@gmx.com/T/#t
>>
>> That case is in fact not related to RAID56, and we already have the fix
>> for it:
>> https://lore.kernel.org/linux-btrfs/1d9b69af6ce0a79e54fbaafcc65ead8f71b54b60.1660377678.git.wqu@suse.com/
>
> This problem still happen on  linux 5.20(20220812) with the flowing patches.

Then please provide a human-readable reproducer.

In the current one, it is not really clear which parts are commented out.

>
> Subject: btrfs: scrub: properly report super block errors in system log
> Subject: btrfs: scrub: try to fix super block errors
> (v2)Subject: btrfs: don't merge pages into bio if their page offset is not
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/08/14
>
>


* Re: Uncorrectable error during multiple scrub (raid5 recovery).
  2022-08-14  3:10 Uncorrectable error during multiple scrub (raid5 recovery) Wang Yugui
  2022-08-14  4:46 ` Qu Wenruo
@ 2022-08-14  5:31 ` Wang Yugui
  1 sibling, 0 replies; 5+ messages in thread
From: Wang Yugui @ 2022-08-14  5:31 UTC (permalink / raw)
  To: linux-btrfs

Hi,

> Uncorrectable error during multiple scrub (raid5 recovery).
> 
> This reproducer is based on some reproducer [1],
> but it seems a new problem, so I open a new thread.
> 
> reproducer:
> 
> mkfs.btrfs -f -draid5 -mraid1 ${SCRATCH_DEV_POOL}
> SCRATCH_DEV_ARRAY=($SCRATCH_DEV_POOL)
> mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime
> 
> /bin/cp -a /usr/bin $SCRATCH_MNT/
> #(OK)dd if=/dev/urandom bs=1M count=1K of=$SCRATCH_MNT/1G.img
> du -sh $SCRATCH_MNT
> 
> for((i=1;i<=15;++i)); do
> 
> 	#(OK)umount $SCRATCH_MNT; mount ${SCRATCH_DEV_ARRAY[0]} $SCRATCH_MNT # -o compress=zstd,noatime
> 	sync; sleep 5; sync; sleep 5; sync; sleep 25;
> 
> 	# change the device to discard in every loop
> 	j=$(( i % ${#SCRATCH_DEV_ARRAY[@]} ))
> 	/usr/sbin/blkdiscard -f ${SCRATCH_DEV_ARRAY[$j]} # --offset 2M
> 
> 	btrfs scrub start -Bd $SCRATCH_MNT | grep 'summary\|Uncorrectable'
> 
> done


Here is some information about the logic of this reproducer.

With btrfs RAID5, if only one disk is corrupted at a time (via blkdiscard
in this reproducer), we can recover it with 'btrfs scrub'.

Once that recovery has finished, if another disk is then corrupted
(again via blkdiscard), we can recover it with 'btrfs scrub' as well.

We run multiple RAID5 recoveries (btrfs scrub), each on a different disk,
as a way to try to detect some mismatched internal information.
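
The rotation described above can be sketched as a dry run. The device
names below are hypothetical placeholders (not from the original report),
`pick_device` is an illustrative helper, and nothing is discarded: the loop
only prints which pool member the reproducer's `i % N` index would select
in each iteration.

```shell
# Dry-run sketch: show which device each loop iteration would discard.
# The default device names are hypothetical; set SCRATCH_DEV_POOL to your own.
SCRATCH_DEV_POOL="${SCRATCH_DEV_POOL:-/dev/sdx1 /dev/sdx2 /dev/sdx3}"

# pick_device ITERATION DEV...
# Same index arithmetic as the reproducer: iteration i targets device i % N.
pick_device() {
	iter=$1
	shift
	[ "$#" -gt 0 ] || return 1      # refuse an empty device list
	idx=$(( iter % $# ))
	shift "$idx"                    # drop the first idx devices
	echo "$1"
}

i=1
while [ "$i" -le 15 ]; do
	# SCRATCH_DEV_POOL is left unquoted on purpose so it splits into words.
	echo "iteration $i: would discard $(pick_device "$i" $SCRATCH_DEV_POOL)"
	i=$(( i + 1 ))
done
```

With three devices, iterations 1, 2 and 3 select the second, third and
first pool member respectively, so over 15 iterations each device is
discarded five times, and every scrub has to rebuild a different device
from the one the previous scrub repaired.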

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/08/14


> 
> This problem will not happen if we change the test data to simpler one.
> # about 220M data of '/usr/bin' to single 1G file
> 
> This problem will not happen if we clear cache with 'umount; mount'
> between multiple loop.
> # 'sync; sleep 5; ...' to  'umount; mount'
> 
> so it seems that some info in memory is wrong after RAID5 recovery?
> 
> [1]
> Subject: misc-next and for-next: kernel BUG at fs/btrfs/extent_io.c:2350!
> during raid5 recovery
> https://lore.kernel.org/linux-btrfs/9dfb0b60-9178-7bbe-6ba1-10d056a7e84c@gmx.com/T/#t
> 
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/08/14
> 




