* Bug in reshape+discard?
@ 2023-03-04 21:36 Benjamin Sonntag
2023-03-14 7:35 ` Guoqing Jiang
0 siblings, 1 reply; 2+ messages in thread
From: Benjamin Sonntag @ 2023-03-04 21:36 UTC (permalink / raw)
To: linux-raid
Hi everyone,
I think we found a bug in the mdadm code here at Octopuce. I'm reporting it here, please tell me if that's not the right place to report it, or if you need any other information.
This bug "hangs" processes in the uninterruptible sleep (D) state forever, until we reboot. It has been tested on both Debian 5.10 and 6.0 Linux kernels.
How to trigger the bug:
- create a raid5 or raid6 block device using mdadm
mdadm --create /dev/md0 -l 5 -n 3 /dev/sd{a,b,c}2
- create a filesystem on it and mount it USING DISCARD/TRIM (important; the underlying devices must support TRIM)
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt -o discard
- create a few files
for i in $( seq 1 1000 ) ; do dd if=/dev/zero of=/mnt/$i bs=10M count=1 ; done
- expand the raid by adding a new drive
mdadm --add /dev/md0 /dev/sdd2
mdadm --grow /dev/md0 -n 4
- the last command will start a "reshape" operation on md0
- DURING THE RESHAPE (it's important), delete some files (this works fine)
rm /mnt/* -rf
- THEN, still during the reshape (important) try to sync or fsync
sync
- the sync process gets stuck in the D state, and there is no way to kill it short of a reboot
(in fact, any process that does sync during the reshape after some files were deleted will get stuck, such as mariadbd or rsyslog...)
- If you don't mount the filesystem with discard, 'sync' works properly
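The two conditions that matter in the steps above (filesystem mounted with discard, reshape in progress) can be checked before running sync. The following is only a sketch: the function names mount_has_discard and reshape_running are made up for illustration, and it assumes the mount options appear in /proc/mounts and that the array state is exposed in /sys/block/<md>/md/sync_action. The optional file arguments exist only so the helpers can be tried against sample data.

```shell
#!/bin/sh
# Hypothetical helpers, not part of mdadm or of the original report.

# mount_has_discard MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed in MOUNTS_FILE (default: /proc/mounts)
# with "discard" among its mount options. If the mount point is listed
# more than once, the last entry wins.
mount_has_discard() {
    awk -v mp="$1" '
        $2 == mp {
            found = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "discard") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# reshape_running MD_NAME [SYNC_ACTION_FILE]
# Succeeds if the sync_action file of the array contains "reshape".
reshape_running() {
    [ "$(cat "${2:-/sys/block/$1/md/sync_action}" 2>/dev/null)" = "reshape" ]
}
```

For example, "mount_has_discard /mnt && reshape_running md0 && echo 'sync may hang'" would warn when both conditions from the report hold. Whether the member devices support TRIM at all can be checked separately with "lsblk --discard".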
An easy way to circumvent this problem:
- before reshaping, remount without discard
mount /mnt -o remount,nodiscard
- after the reshaping ends, remount with discard
mount /mnt -o remount,discard
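The workaround can be wrapped so that discard is only switched back on once the reshape has finished. This is a sketch under assumptions, not a tested procedure: the names run_cmd and grow_without_discard and the DRY_RUN variable are invented here, and it relies on "mdadm --wait" blocking until the reshape completes.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround; names are illustrative only.

# run_cmd COMMAND [ARGS...]
# Runs the command, or only prints it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# grow_without_discard MOUNTPOINT MD_DEVICE NEW_MEMBER RAID_DEVICES
# Remounts without discard, adds the new member, grows the array, waits
# for the reshape, then remounts with discard. If any step fails, the
# function returns early and discard stays off, which is the safe side
# for this bug. Note: mdadm --wait is documented to return failure when
# it did not actually wait, so a failure here needs a manual check.
grow_without_discard() {
    mnt=$1 md=$2 new=$3 count=$4
    run_cmd mount "$mnt" -o remount,nodiscard || return 1
    run_cmd mdadm --add "$md" "$new" || return 1
    run_cmd mdadm --grow "$md" -n "$count" || return 1
    run_cmd mdadm --wait "$md" || return 1
    run_cmd mount "$mnt" -o remount,discard
}
```

With the devices from the report, "DRY_RUN=1 grow_without_discard /mnt /dev/md0 /dev/sdd2 4" prints the five commands without touching the array; dropping DRY_RUN runs them as root.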
We don't really know how to start searching for a solution, since it requires knowing the internals of MD & discard pretty well :/ (and I'm definitely not a kernel coder ;) )
thanks for your help on this issue,
cheers,
Benjamin Sonntag
Octopuce, Paris.
* Re: Bug in reshape+discard?
From: Guoqing Jiang @ 2023-03-14 7:35 UTC (permalink / raw)
To: Benjamin Sonntag, linux-raid
Hi,
On 3/5/23 05:36, Benjamin Sonntag wrote:
> Hi everyone,
>
> I think we found a bug in the mdadm code here at Octopuce.
Probably something wrong inside md raid.
> I'm reporting it here, please tell me if that's not the right place to report it, or if you need any other information.
>
> This bug "hangs" processes in the uninterruptible sleep (D) state forever, until we reboot. It has been tested on both Debian 5.10 and 6.0 Linux kernels.
Do you mean it happened on both kernel versions? Could you share
the relevant stacks from "cat /proc/${pid of D state process}/stack"?
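One possible way to collect those stacks for every stuck process at once is sketched below. The function names d_state_pids and dump_d_state_stacks are made up for this example; it assumes a procps-style ps and that /proc/<pid>/stack is readable, which normally requires root and a kernel built with stack trace support.

```shell
#!/bin/sh
# Hypothetical helpers for gathering the requested stacks.

# d_state_pids
# Reads "PID STAT COMMAND" lines on stdin and prints the PIDs whose
# state starts with D (uninterruptible sleep).
d_state_pids() {
    awk '$2 ~ /^D/ { print $1 }'
}

# dump_d_state_stacks
# Prints the kernel stack of every process currently in D state.
dump_d_state_stacks() {
    ps -eo pid=,stat=,comm= | d_state_pids | while read -r pid; do
        echo "== PID $pid ($(cat "/proc/$pid/comm" 2>/dev/null)) =="
        cat "/proc/$pid/stack" 2>/dev/null ||
            echo "(stack not readable; try as root)"
    done
}
```

Running "dump_d_state_stacks > stacks.txt" as root while sync is hung would give a file that can be attached to a reply.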
> How to trigger the bug:
>
> - create a raid5 or raid6 block device using mdadm
> mdadm --create /dev/md0 -l 5 -n 3 /dev/sd{a,b,c}2
>
> - create a filesystem on it and mount it USING DISCARD/TRIM (important; the underlying devices must support TRIM)
> mkfs.ext4 /dev/md0
> mount /dev/md0 /mnt -o discard
>
> - create a few files
> for i in $( seq 1 1000 ) ; do dd if=/dev/zero of=/mnt/$i bs=10M count=1 ; done
>
> - expand the raid by adding a new drive
> mdadm --add /dev/md0 /dev/sdd2
> mdadm --grow /dev/md0 -n 4
>
> - the last command will start a "reshape" operation on md0
> - DURING THE RESHAPE (it's important), delete some files (this works fine)
> rm /mnt/* -rf
>
> - THEN, still during the reshape (important) try to sync or fsync
> sync
>
> - the sync process gets stuck in the D state, and there is no way to kill it short of a reboot
> (in fact, any process that does sync during the reshape after some files were deleted will get stuck, such as mariadbd or rsyslog...)
>
> - If you don't mount the filesystem with discard, 'sync' works properly
>
>
> An easy way to circumvent this problem:
>
> - before reshaping, remount without discard
> mount /mnt -o remount,nodiscard
>
> - after the reshaping ends, remount with discard
> mount /mnt -o remount,discard
>
>
> We don't really know how to start searching for a solution, since it requires knowing the internals of MD & discard pretty well :/ (and I'm definitely not a kernel coder ;) )
>
> thanks for your help on this issue,
Assuming reshape + discard worked with a previous kernel version, maybe
you can try to bisect the kernel tree to see which commit caused
the bug.
Thanks,
Guoqing