* MD: Long delay for container drive removal
From: Mateusz Kusiak @ 2024-06-18 14:24 UTC
To: linux-raid
Hi all,
we have an issue submitted for SLES15SP6 that is caused by huge delays when trying to remove a drive
from a container.
The scenario is as follows:
1. Create a two-drive IMSM container:
# mdadm --create --run /dev/md/imsm --metadata=imsm --raid-devices=2 /dev/nvme[0-1]n1
2. Remove a single drive from the container:
# mdadm /dev/md127 --remove /dev/nvme0n1
The problem is that drive removal may take up to 7 seconds, which causes timeouts for other
components that depend on mdadm.
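For reference, a quick way to observe the delay (assuming the same device names as in the steps
above) is to simply time the removal:

# time mdadm /dev/md127 --remove /dev/nvme0n1

On the affected setups this reports several seconds of real time instead of returning almost
instantly.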
We narrowed it down to being MD-related. We tested this with the inbox mdadm-4.3 and with mdadm-4.2
on SP6, and the delay is pretty much the same. SP5 is free of this issue.
I also tried RHEL 8.9 and drive removal is almost instant.
Is this the default behavior now, or should we treat it as an issue?
Thanks,
Mateusz
* Re: MD: Long delay for container drive removal
From: Mateusz Kusiak @ 2024-06-20 12:43 UTC
To: linux-raid
On 18.06.2024 16:24, Mateusz Kusiak wrote:
> Hi all,
> we have an issue submitted for SLES15SP6 that is caused by huge delays when trying to remove drive
> from a container.
>
> The scenario is as follows:
> 1. Create two drive imsm container
> # mdadm --create --run /dev/md/imsm --metadata=imsm --raid-devices=2 /dev/nvme[0-1]n1
> 2. Remove single drive from container
> # mdadm /dev/md127 --remove /dev/nvme0n1
>
> The problem is that drive removal may take up to 7 seconds, which causes timeouts for other
> components that are mdadm dependent.
>
> We narrowed it down to be MD related. We tested this with inbox mdadm-4.3 and mdadm-4.2 on SP6 and
> delay time is pretty much the same. SP5 is free of this issue.
>
> I also tried RHEL 8.9 and drive removal is almost instant.
>
> Is it default behavior now, or should we treat this as an issue?
>
> Thanks,
> Mateusz
>
I dug into this more. I retested this on:
- Ubuntu 24.04 with inbox kernel 6.6.0: No reproduction
- RHEL 9.4 with upstream kernel 6.9.5-1: Got reproduction
(Note that SLES15SP6 comes with 6.8.0-rc4 inbox)
I plugged into mdadm with gdb and found out that the ioctl call in hot_remove_disk() fails, and
that is what causes the delay. The function looks as follows:
int hot_remove_disk(int mdfd, unsigned long dev, int force)
{
	int cnt = force ? 500 : 5;
	int ret;

	/* HOT_REMOVE_DISK can fail with EBUSY if there are
	 * outstanding IO requests to the device.
	 * In this case, it can be helpful to wait a little while,
	 * up to 5 seconds if 'force' is set, or 50 msec if not.
	 */
	while ((ret = ioctl(mdfd, HOT_REMOVE_DISK, dev)) == -1 &&
	       errno == EBUSY &&
	       cnt-- > 0)
		sleep_for(0, MSEC_TO_NSEC(10), true);

	return ret;
}
... if the ioctl fails, mdadm then falls back to removing the drive via a sysfs write.
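For completeness, that sysfs fallback boils down to a write like the one below (a sketch assuming
the member shows up as dev-nvme0n1 under the container's md directory):

# echo remove > /sys/block/md127/md/dev-nvme0n1/state

That path is what eventually removes the drive once the ioctl attempts have failed.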
Looks like a kernel ioctl issue...
Thanks,
Mateusz
* Re: MD: Long delay for container drive removal
From: Mariusz Tkaczyk @ 2024-06-24 7:14 UTC
To: Mateusz Kusiak; +Cc: linux-raid
On Thu, 20 Jun 2024 14:43:50 +0200
Mateusz Kusiak <mateusz.kusiak@linux.intel.com> wrote:
> On 18.06.2024 16:24, Mateusz Kusiak wrote:
> > Hi all,
> > we have an issue submitted for SLES15SP6 that is caused by huge delays when
> > trying to remove drive from a container.
> >
> > The scenario is as follows:
> > 1. Create two drive imsm container
> > # mdadm --create --run /dev/md/imsm --metadata=imsm --raid-devices=2 /dev/nvme[0-1]n1
> > 2. Remove single drive from container
> > # mdadm /dev/md127 --remove /dev/nvme0n1
> >
> > The problem is that drive removal may take up to 7 seconds, which causes
> > timeouts for other components that are mdadm dependent.
> >
> > We narrowed it down to be MD related. We tested this with inbox mdadm-4.3
> > and mdadm-4.2 on SP6 and delay time is pretty much the same. SP5 is free of
> > this issue.
> >
> > I also tried RHEL 8.9 and drive removal is almost instant.
> >
> > Is it default behavior now, or should we treat this as an issue?
> >
> > Thanks,
> > Mateusz
> >
>
> I dug into this more. I retested this on:
> - Ubuntu 24.04 with inbox kernel 6.6.0: No reproduction
> - RHEL 9.4 with upstream kernel 6.9.5-1: Got reproduction
> (Note that SLES15SP6 comes with 6.8.0-rc4 inbox)
>
> I plugged into mdadm with gdb and found out that ioctl call in
> hot_remove_disk() fails and it's causing a delay. The function looks as
> follows:
>
> int hot_remove_disk(int mdfd, unsigned long dev, int force)
> {
> 	int cnt = force ? 500 : 5;
> 	int ret;
>
> 	/* HOT_REMOVE_DISK can fail with EBUSY if there are
> 	 * outstanding IO requests to the device.
> 	 * In this case, it can be helpful to wait a little while,
> 	 * up to 5 seconds if 'force' is set, or 50 msec if not.
> 	 */
> 	while ((ret = ioctl(mdfd, HOT_REMOVE_DISK, dev)) == -1 &&
> 	       errno == EBUSY &&
> 	       cnt-- > 0)
> 		sleep_for(0, MSEC_TO_NSEC(10), true);
>
> 	return ret;
> }
> ... if it fails, then it defaults to removing drive via sysfs call.
>
> Looks like a kernel ioctl issue...
>
Hello,
I investigated this. Looks like the HOT_REMOVE_DISK ioctl almost always fails for
an array with no raid personality. At some point it was allowed, but it was blocked
6 years ago in c42a0e2675 (this id points to a merge commit, so quoting the title: "md:
fix NULL dereference of mddev->pers in remove_and_add_spares()").
And that explains the outdated comment in mdadm:

	if (err && errno == ENODEV) {
		/* Old kernels rejected this if no personality
		 * is registered */
I'm working on a fix in mdadm (for kernels affected by this delay); I will
remove the ioctl call for external containers:
https://github.com/md-raid-utilities/mdadm/pull/31
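Not the PR's actual code, just a sketch of the idea behind it: a container advertises external
metadata via sysfs, so it can be detected up front and the ioctl skipped in favour of the sysfs
remove path, e.g.:

	if grep -q '^external:' /sys/block/md127/md/metadata_version; then
		# container: don't bother with HOT_REMOVE_DISK, remove via sysfs
		echo remove > /sys/block/md127/md/dev-nvme0n1/state
	fi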
On the HOT_REMOVE_DISK ioctl path, there is a wait for the MD_RECOVERY_NEEDED
flag to clear, with a timeout of 5 seconds. When I disabled this wait for arrays
with no personality, it fixed the issue. However, I'm not sure it is the right fix; I
would expect MD_RECOVERY_NEEDED not to be set for arrays with no MD personality.
Kuai and Song, could you please advise?
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c0426a6d2fd1..bd1cedeb105b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7827,7 +7827,7 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 		return get_bitmap_file(mddev, argp);
 	}
 
-	if (cmd == HOT_REMOVE_DISK)
+	if (cmd == HOT_REMOVE_DISK && mddev->pers)
 		/* need to ensure recovery thread has run */
 		wait_event_interruptible_timeout(mddev->sb_wait,
 						 !test_bit(MD_RECOVERY_NEEDED,
Thanks,
Mariusz
* Re: MD: Long delay for container drive removal
From: Yu Kuai @ 2024-06-25 9:25 UTC
To: Mariusz Tkaczyk, Mateusz Kusiak
Cc: linux-raid, yukuai (C), yangerkun@huawei.com
Hi,
On 2024/06/24 15:14, Mariusz Tkaczyk wrote:
> On Thu, 20 Jun 2024 14:43:50 +0200
> Mateusz Kusiak <mateusz.kusiak@linux.intel.com> wrote:
>
>> On 18.06.2024 16:24, Mateusz Kusiak wrote:
>>> Hi all,
>>> we have an issue submitted for SLES15SP6 that is caused by huge delays when
>>> trying to remove drive from a container.
>>>
>>> The scenario is as follows:
>>> 1. Create two drive imsm container
>>> # mdadm --create --run /dev/md/imsm --metadata=imsm --raid-devices=2 /dev/nvme[0-1]n1
>>> 2. Remove single drive from container
>>> # mdadm /dev/md127 --remove /dev/nvme0n1
>>>
>>> The problem is that drive removal may take up to 7 seconds, which causes
>>> timeouts for other components that are mdadm dependent.
>>>
>>> We narrowed it down to be MD related. We tested this with inbox mdadm-4.3
>>> and mdadm-4.2 on SP6 and delay time is pretty much the same. SP5 is free of
>>> this issue.
>>>
>>> I also tried RHEL 8.9 and drive removal is almost instant.
>>>
>>> Is it default behavior now, or should we treat this as an issue?
>>>
>>> Thanks,
>>> Mateusz
>>>
>>
>> I dug into this more. I retested this on:
>> - Ubuntu 24.04 with inbox kernel 6.6.0: No reproduction
>> - RHEL 9.4 with upstream kernel 6.9.5-1: Got reproduction
>> (Note that SLES15SP6 comes with 6.8.0-rc4 inbox)
>>
>> I plugged into mdadm with gdb and found out that ioctl call in
>> hot_remove_disk() fails and it's causing a delay. The function looks as
>> follows:
>>
>> int hot_remove_disk(int mdfd, unsigned long dev, int force)
>> {
>> 	int cnt = force ? 500 : 5;
>> 	int ret;
>>
>> 	/* HOT_REMOVE_DISK can fail with EBUSY if there are
>> 	 * outstanding IO requests to the device.
>> 	 * In this case, it can be helpful to wait a little while,
>> 	 * up to 5 seconds if 'force' is set, or 50 msec if not.
>> 	 */
>> 	while ((ret = ioctl(mdfd, HOT_REMOVE_DISK, dev)) == -1 &&
>> 	       errno == EBUSY &&
>> 	       cnt-- > 0)
>> 		sleep_for(0, MSEC_TO_NSEC(10), true);
>>
>> 	return ret;
>> }
>> ... if it fails, then it defaults to removing drive via sysfs call.
>>
>> Looks like a kernel ioctl issue...
>>
>
> Hello,
> I investigated this. Looks like the HOT_REMOVE_DISK ioctl almost always fails for
> an array with no raid personality. At some point it was allowed, but it was blocked
> 6 years ago in c42a0e2675 (this id points to a merge commit, so quoting the title: "md:
> fix NULL dereference of mddev->pers in remove_and_add_spares()").
>
> And that explains the outdated comment in mdadm:
>
> 	if (err && errno == ENODEV) {
> 		/* Old kernels rejected this if no personality
> 		 * is registered */
>
> I'm working on a fix in mdadm (for kernels affected by this delay); I will
> remove the ioctl call for external containers:
> https://github.com/md-raid-utilities/mdadm/pull/31
>
> On the HOT_REMOVE_DISK ioctl path, there is a wait for the MD_RECOVERY_NEEDED
> flag to clear, with a timeout of 5 seconds. When I disabled this wait for arrays
> with no personality, it fixed the issue. However, I'm not sure it is the right fix; I
> would expect MD_RECOVERY_NEEDED not to be set for arrays with no MD personality.
> Kuai and Song, could you please advise?
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c0426a6d2fd1..bd1cedeb105b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7827,7 +7827,7 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
>  		return get_bitmap_file(mddev, argp);
>  	}
>
> -	if (cmd == HOT_REMOVE_DISK)
> +	if (cmd == HOT_REMOVE_DISK && mddev->pers)
This patch will work; however, I'm afraid it can't fix the problem
thoroughly, because this check is done without 'reconfig_mutex', and
mddev->pers can be set later, before hot_remove_disk() is called.

I took a look at commit 90f5f7ad4f38, which introduced the waiting: it
was trying to wait for a failed device to be removed by
md_check_recovery(). However, this doesn't make sense now, because
remove_and_add_spares() is called directly from hot_remove_disk(),
hence a failed device can be removed directly from the ioctl, without
md_check_recovery().
The only thing that would prevent a failed device from being removed from the array
is MD_RECOVERY_RUNNING; however, we can't wait for that flag to be
cleared, hence I'd suggest reverting that commit (90f5f7ad4f38) instead.
Thanks,
Kuai
>  		/* need to ensure recovery thread has run */
>  		wait_event_interruptible_timeout(mddev->sb_wait,
>  						 !test_bit(MD_RECOVERY_NEEDED,
>
>
> Thanks,
> Mariusz
>
> .
>