linux-raid.vger.kernel.org archive mirror
* MD: drive removal hangs with freshly created partition
@ 2024-07-02 14:57 Mateusz Kusiak
  2024-07-03  8:09 ` Mariusz Tkaczyk
  2024-07-04  9:42 ` Mateusz Kusiak
  0 siblings, 2 replies; 3+ messages in thread
From: Mateusz Kusiak @ 2024-07-02 14:57 UTC (permalink / raw)
  To: linux-raid

Hello,
I'm back with another regression found in SLES15SP6.

The scenario is as follows:
1. Create a RAID 1 volume with native metadata.
# mdadm -CR /dev/md126 -l1 -n2 /dev/nvme[0-1]n1 --assume-clean --size=5G

2. Create a partition and a filesystem on the RAID volume.
# parted -a optimal /dev/md126 mktable gpt mkpart primary ext4 0% 100% -s
# mkfs.ext4 -F /dev/md126p1

3. Remove a device via "--incremental --fail".
# mdadm -If nvme0n1

Result:
mdadm hangs and hung task info from multiple components starts appearing on the serial console.
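
For anyone who wants to script the steps above, here is a minimal sketch. It is not something from the original report: the variable names (MD_DEV, FAIL_DEV, HANG_SECS, DRY_RUN) and the helpers run() and run_with_deadline() are made up for illustration, and the 60 second threshold is an arbitrary assumption. It defaults to a dry run that only prints the commands. The last step is polled instead of wrapped in timeout(1), on the assumption that the hung mdadm cannot be killed.

```shell
#!/usr/bin/env bash
# Reproduction sketch. All names below are illustrative assumptions.
set -u

MD_DEV=${MD_DEV:-/dev/md126}
FAIL_DEV=${FAIL_DEV:-nvme0n1}
HANG_SECS=${HANG_SECS:-60}   # assumption: seconds before we call it a hang
DRY_RUN=${DRY_RUN:-1}        # 1 = only print commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Start a command in the background and poll until it exits.
# Returns the command's exit status, or 124 if it is still running
# after $1 seconds (the command is left running in that case).
run_with_deadline() {
    local secs=$1; shift
    "$@" &
    local pid=$! waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$secs" ]; then
            echo "still running after ${secs}s: $*" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}

reproduce() {
    # Step 1: RAID 1 volume with native metadata.
    run mdadm -CR "$MD_DEV" -l1 -n2 /dev/nvme0n1 /dev/nvme1n1 --assume-clean --size=5G
    # Step 2: fresh partition table, partition and filesystem.
    run parted -a optimal "$MD_DEV" mktable gpt mkpart primary ext4 0% 100% -s
    run mkfs.ext4 -F "${MD_DEV}p1"
    # Step 3: incremental fail; this is the command that hangs.
    if [ "$DRY_RUN" = 1 ]; then
        run mdadm -If "$FAIL_DEV"
    else
        run_with_deadline "$HANG_SECS" mdadm -If "$FAIL_DEV"
    fi
}

reproduce
```

Run it as root with DRY_RUN=0 only on a machine whose NVMe drives can be wiped. An exit status of 124 means the removal did not return within HANG_SECS.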

A few notes:
* The issue does not reproduce without creating the partition and filesystem.
* If the array is stopped and reassembled before step 3, the issue does not reproduce.
* If the partition is "reused" (metadata cleared, new RAID volume created, partition left intact
and not recreated), the issue does not reproduce.
* If "--set-faulty" and then "--remove" are used (instead of "--incremental --fail"), "--set-faulty"
succeeds and "--remove" hangs.
* I verified this is not an mdadm issue by installing mdadm-4.2 (SLES15SP6 has mdadm-4.3 inbox) and
rerunning the test. The outcome is the same.
* Writing "remove" to sysfs directly has the same result.

Thanks,
Mateusz


* Re: MD: drive removal hangs with freshly created partition
From: Mariusz Tkaczyk @ 2024-07-03  8:09 UTC (permalink / raw)
  To: Mateusz Kusiak; +Cc: linux-raid

On Tue, 2 Jul 2024 16:57:38 +0200
Mateusz Kusiak <mateusz.kusiak@linux.intel.com> wrote:

> Hello,
> I'm back with another regression found in SLES15SP6.
> 
> The scenario is as follows:
> 1. Create RAID 1 volume with native metadata.
> # mdadm -CR /dev/md126 -l1 -n2 /dev/nvme[0-1]n1 --assume-clean --size=5G
> 
> 2. Create partition and filesystem on raid volume.
> # parted -a optimal /dev/md126 mktable gpt mkpart primary ext4 0% 100% -s
> # mkfs.ext4 -F /dev/md126p1
> 
> 3. Remove device via "--incremental --fail".
> # mdadm -If nvme0n1
> 
> Result:
> Mdadm hangs and hung task info from multiple components starts appearing on
> serial.
> 
> Few notes:
> * Issue does not reproduce without creating partition and filesystem.
> * If array is stopped and reassembled before step 3, the issue does not
> reproduce.
> * If partition is "reused" (metadata was cleared, new raid volume created,
> partition left intact, no recreating partition) the issue does not reproduce.
> * If "--set-faulty" and then "--remove" used (instead of "--incremental
> --fail") "--set-faulty" succeeds, "--remove" hangs.
> * I verified this is not mdadm issue by installing mdadm-4.2 (SLES15SP6 has
> mdadm-4.3 inbox) and rerunning the test. Outcome is the same.
> * Writing "remove" to sysfs directly has same result.
> 
> Thanks,
> Mateusz
> 

More info:
As Mateusz said, echo "remove" >/sys/block/md126/md/rd0/state hangs. The same
hang is observed with the HOT_REMOVE_DISK ioctl. The scenario can be simulated with:

echo "faulty" >/sys/block/md126/md/rd0/state
echo "remove" >/sys/block/md126/md/rd0/state

It is really interesting that this happens only with partitions, and only
right after they are created.
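
For scripted testing, the two sysfs writes above can be wrapped so that a blocked write is reported instead of wedging the test shell. This is only a sketch: write_state() and fail_and_remove() are hypothetical helper names, and the 30 second default is an arbitrary assumption. Each write runs in a background subshell and is polled, because a writer stuck in the kernel may not respond to signals.

```shell
#!/usr/bin/env bash
# Sketch only; helper names and timeouts are illustrative assumptions.
set -u

# Write one keyword to a member's sysfs "state" file in the background.
# Returns the write's exit status, or 124 if it is still blocked after
# $3 seconds (the writer is left running in that case).
write_state() {
    local state_file=$1 keyword=$2 secs=$3
    ( echo "$keyword" > "$state_file" ) &
    local pid=$! waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$secs" ]; then
            echo "write of '$keyword' still blocked after ${secs}s" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}

# Mark the member faulty, then try to remove it.
fail_and_remove() {
    local state_file=$1 secs=${2:-30}
    write_state "$state_file" faulty "$secs" || return
    write_state "$state_file" remove "$secs"
}
```

Usage on the array from this thread would be: fail_and_remove /sys/block/md126/md/rd0/state. A return value of 124 from the second write corresponds to the hang described here.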

Mariusz


* Re: MD: drive removal hangs with freshly created partition
From: Mateusz Kusiak @ 2024-07-04  9:42 UTC (permalink / raw)
  To: linux-raid

On 02.07.2024 16:57, Mateusz Kusiak wrote:
> Hello,
> I'm back with another regression found in SLES15SP6.
> 
> The scenario is as follows:
> 1. Create RAID 1 volume with native metadata.
> # mdadm -CR /dev/md126 -l1 -n2 /dev/nvme[0-1]n1 --assume-clean --size=5G
> 
> 2. Create partition and filesystem on raid volume.
> # parted -a optimal /dev/md126 mktable gpt mkpart primary ext4 0% 100% -s
> # mkfs.ext4 -F /dev/md126p1
> 
> 3. Remove device via "--incremental --fail".
> # mdadm -If nvme0n1
> 
> Result:
> Mdadm hangs and hung task info from multiple components starts appearing on serial.
> 

FYI, this is fixed by the following kernel commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=855678ed8534518e2b428bcbcec695de9ba248e8

More info: https://github.com/advisories/GHSA-93q3-24jj-x39c

Thanks,
Mateusz
