* MD: Long delay for container drive removal
From: Mateusz Kusiak @ 2024-06-18 14:24 UTC
To: linux-raid

Hi all,
we have an issue submitted for SLES15SP6 that is caused by huge delays when
trying to remove a drive from a container.

The scenario is as follows:
1. Create a two-drive IMSM container:
# mdadm --create --run /dev/md/imsm --metadata=imsm --raid-devices=2 /dev/nvme[0-1]n1
2. Remove a single drive from the container:
# mdadm /dev/md127 --remove /dev/nvme0n1

The problem is that drive removal may take up to 7 seconds, which causes
timeouts in other components that depend on mdadm.

We narrowed it down to being MD related. We tested this with inbox mdadm-4.3
and mdadm-4.2 on SP6, and the delay is pretty much the same. SP5 is free of
this issue.

I also tried RHEL 8.9, and drive removal is almost instant.

Is this the default behavior now, or should we treat it as an issue?

Thanks,
Mateusz

* Re: MD: Long delay for container drive removal
From: Mateusz Kusiak @ 2024-06-20 12:43 UTC
To: linux-raid

On 18.06.2024 16:24, Mateusz Kusiak wrote:
> Hi all,
> we have an issue submitted for SLES15SP6 that is caused by huge delays when
> trying to remove a drive from a container.
>
> The scenario is as follows:
> 1. Create a two-drive IMSM container:
> # mdadm --create --run /dev/md/imsm --metadata=imsm --raid-devices=2 /dev/nvme[0-1]n1
> 2. Remove a single drive from the container:
> # mdadm /dev/md127 --remove /dev/nvme0n1
>
> The problem is that drive removal may take up to 7 seconds, which causes
> timeouts in other components that depend on mdadm.
>
> We narrowed it down to being MD related. We tested this with inbox mdadm-4.3
> and mdadm-4.2 on SP6, and the delay is pretty much the same. SP5 is free of
> this issue.
>
> I also tried RHEL 8.9, and drive removal is almost instant.
>
> Is this the default behavior now, or should we treat it as an issue?
>
> Thanks,
> Mateusz
>

I dug into this more. I retested this on:
- Ubuntu 24.04 with inbox kernel 6.6.0: No reproduction
- RHEL 9.4 with upstream kernel 6.9.5-1: Got reproduction
(Note that SLES15SP6 comes with 6.8.0-rc4 inbox)

I attached gdb to mdadm and found that the ioctl call in hot_remove_disk()
fails, and that this is what causes the delay. The function looks as follows:

int hot_remove_disk(int mdfd, unsigned long dev, int force)
{
	int cnt = force ? 500 : 5;
	int ret;

	/* HOT_REMOVE_DISK can fail with EBUSY if there are
	 * outstanding IO requests to the device.
	 * In this case, it can be helpful to wait a little while,
	 * up to 5 seconds if 'force' is set, or 50 msec if not.
	 */
	while ((ret = ioctl(mdfd, HOT_REMOVE_DISK, dev)) == -1 &&
	       errno == EBUSY &&
	       cnt-- > 0)
		sleep_for(0, MSEC_TO_NSEC(10), true);

	return ret;
}

... if it fails, mdadm falls back to removing the drive via a sysfs call.

Looks like a kernel ioctl issue...

Thanks,
Mateusz
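
The retry budget of hot_remove_disk() quoted above can be looked at in
isolation. The following is a minimal sketch, not mdadm code: retry_while_busy()
and busy_n_times() are hypothetical names, the callback stands in for the
HOT_REMOVE_DISK ioctl, and nanosleep() replaces mdadm's sleep_for() helper. It
assumes the same 10 ms pause and the same "retry only on EBUSY" rule as the
quoted function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback standing in for ioctl(mdfd, HOT_REMOVE_DISK, dev). */
typedef int (*remove_fn)(void *ctx);

/*
 * Call op(); while it fails with EBUSY, sleep 10 ms and try again, at most
 * max_retries more times. Returns the last result of op() and stores the
 * number of calls made in *attempts.
 */
static int retry_while_busy(remove_fn op, void *ctx, int max_retries,
			    int *attempts)
{
	const struct timespec delay = { 0, 10 * 1000 * 1000 }; /* 10 ms */
	int ret;

	assert(op != NULL && attempts != NULL);
	*attempts = 0;
	for (;;) {
		ret = op(ctx);
		(*attempts)++;
		/* Stop on success, on any error other than EBUSY, or when
		 * the retry budget is used up. */
		if (ret != -1 || errno != EBUSY || max_retries-- <= 0)
			break;
		nanosleep(&delay, NULL);
	}
	return ret;
}

/* Stub device: reports EBUSY until the counter in ctx reaches zero. */
static int busy_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EBUSY;
		return -1;
	}
	return 0;
}
```

With a budget of 5 the loop makes at most 6 calls and sleeps at most 5 times
(about 50 ms in total); with a budget of 500 the sleeps add up to about
5 seconds. Any time spent inside the call itself comes on top of that. The
thread does not say which budget applies in the reported scenario.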

* Re: MD: Long delay for container drive removal
From: Mariusz Tkaczyk @ 2024-06-24 7:14 UTC
To: Mateusz Kusiak; +Cc: linux-raid

On Thu, 20 Jun 2024 14:43:50 +0200
Mateusz Kusiak <mateusz.kusiak@linux.intel.com> wrote:

> On 18.06.2024 16:24, Mateusz Kusiak wrote:
> > Hi all,
> > we have an issue submitted for SLES15SP6 that is caused by huge delays when
> > trying to remove a drive from a container.
> >
> > The scenario is as follows:
> > 1. Create a two-drive IMSM container:
> > # mdadm --create --run /dev/md/imsm --metadata=imsm --raid-devices=2 /dev/nvme[0-1]n1
> > 2. Remove a single drive from the container:
> > # mdadm /dev/md127 --remove /dev/nvme0n1
> >
> > The problem is that drive removal may take up to 7 seconds, which causes
> > timeouts in other components that depend on mdadm.
> >
> > We narrowed it down to being MD related. We tested this with inbox mdadm-4.3
> > and mdadm-4.2 on SP6, and the delay is pretty much the same. SP5 is free of
> > this issue.
> >
> > I also tried RHEL 8.9, and drive removal is almost instant.
> >
> > Is this the default behavior now, or should we treat it as an issue?
> >
> > Thanks,
> > Mateusz
> >
>
> I dug into this more. I retested this on:
> - Ubuntu 24.04 with inbox kernel 6.6.0: No reproduction
> - RHEL 9.4 with upstream kernel 6.9.5-1: Got reproduction
> (Note that SLES15SP6 comes with 6.8.0-rc4 inbox)
>
> I attached gdb to mdadm and found that the ioctl call in hot_remove_disk()
> fails, and that this is what causes the delay. The function looks as follows:
>
> int hot_remove_disk(int mdfd, unsigned long dev, int force)
> {
> 	int cnt = force ? 500 : 5;
> 	int ret;
>
> 	/* HOT_REMOVE_DISK can fail with EBUSY if there are
> 	 * outstanding IO requests to the device.
> 	 * In this case, it can be helpful to wait a little while,
> 	 * up to 5 seconds if 'force' is set, or 50 msec if not.
> 	 */
> 	while ((ret = ioctl(mdfd, HOT_REMOVE_DISK, dev)) == -1 &&
> 	       errno == EBUSY &&
> 	       cnt-- > 0)
> 		sleep_for(0, MSEC_TO_NSEC(10), true);
>
> 	return ret;
> }
>
> ... if it fails, mdadm falls back to removing the drive via a sysfs call.
>
> Looks like a kernel ioctl issue...
>

Hello,
I investigated this. It looks like the HOT_REMOVE_DISK ioctl has almost always
failed for arrays with no RAID personality. At some point it was allowed, but
it was blocked 6 years ago in c42a0e2675 (this ID leads to a merge commit, so
here is the title: "md: fix NULL dereference of mddev->pers in
remove_and_add_spares()").

That explains why we have an outdated comment in mdadm:

	if (err && errno == ENODEV) {
		/* Old kernels rejected this if no personality
		 * is registered */

I'm working on a fix in mdadm (for kernels with this hang): I will remove the
ioctl call for external containers:
https://github.com/md-raid-utilities/mdadm/pull/31

On the HOT_REMOVE_DISK ioctl path, there is a wait for the MD_RECOVERY_NEEDED
flag to clear, with the timeout set to 5 seconds. Disabling this wait for
arrays with no personality fixes the issue. However, I'm not sure it is the
right fix: I would expect MD_RECOVERY_NEEDED not to be set for arrays with no
MD personality. Kuai and Song, could you please advise?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c0426a6d2fd1..bd1cedeb105b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7827,7 +7827,7 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 		return get_bitmap_file(mddev, argp);
 	}
 
-	if (cmd == HOT_REMOVE_DISK)
+	if (cmd == HOT_REMOVE_DISK && mddev->pers)
 		/* need to ensure recovery thread has run */
 		wait_event_interruptible_timeout(mddev->sb_wait,
 						 !test_bit(MD_RECOVERY_NEEDED,

Thanks,
Mariusz
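
The effect of the proposed check can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code: struct
model, recovery_needed(), wait_flag_clear() and hot_remove_wait() are
hypothetical names, time is a simulated millisecond counter, and the model
assumes that for a container without a personality the flag stays set for the
whole wait. wait_flag_clear() follows the documented return convention of
wait_event_interruptible_timeout() (0 on timeout, otherwise the remaining time,
at least 1) and leaves out signal handling.

```c
#include <stdbool.h>

/* Hypothetical model of an md device; times are simulated milliseconds. */
struct model {
	long now;		/* current simulated time */
	long clears_at;		/* time the flag clears, or -1 for "never" */
	bool has_personality;	/* stands in for mddev->pers != NULL */
};

/* Stands in for test_bit(MD_RECOVERY_NEEDED, ...). */
static bool recovery_needed(const struct model *m)
{
	return m->clears_at < 0 || m->now < m->clears_at;
}

/*
 * Advance simulated time until the flag clears or the timeout expires.
 * Returns 0 on timeout, otherwise the remaining time (at least 1).
 */
static long wait_flag_clear(struct model *m, long timeout)
{
	long remaining = timeout;

	while (recovery_needed(m)) {
		if (remaining == 0)
			return 0;
		m->now++;
		remaining--;
	}
	return remaining > 0 ? remaining : 1;
}

/*
 * Time spent waiting before the removal is attempted. With
 * skip_without_pers set, the wait is skipped when there is no personality,
 * which is the behavior of the proposed one-line change.
 */
static long hot_remove_wait(struct model *m, long timeout,
			    bool skip_without_pers)
{
	long start = m->now;

	if (!(skip_without_pers && !m->has_personality))
		wait_flag_clear(m, timeout);
	return m->now - start;
}
```

In this model a container whose flag never clears spends the full 5000 ms in
the wait without the check and 0 ms with it, while an array with a personality
waits only until the flag clears in both cases.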

* Re: MD: Long delay for container drive removal
From: Yu Kuai @ 2024-06-25 9:25 UTC
To: Mariusz Tkaczyk, Mateusz Kusiak
Cc: linux-raid, yukuai (C), yangerkun@huawei.com

Hi,

On 2024/06/24 15:14, Mariusz Tkaczyk wrote:
> On Thu, 20 Jun 2024 14:43:50 +0200
> Mateusz Kusiak <mateusz.kusiak@linux.intel.com> wrote:
>
>> On 18.06.2024 16:24, Mateusz Kusiak wrote:
>>> Hi all,
>>> we have an issue submitted for SLES15SP6 that is caused by huge delays when
>>> trying to remove a drive from a container.
>>>
>>> The scenario is as follows:
>>> 1. Create a two-drive IMSM container:
>>> # mdadm --create --run /dev/md/imsm --metadata=imsm --raid-devices=2 /dev/nvme[0-1]n1
>>> 2. Remove a single drive from the container:
>>> # mdadm /dev/md127 --remove /dev/nvme0n1
>>>
>>> The problem is that drive removal may take up to 7 seconds, which causes
>>> timeouts in other components that depend on mdadm.
>>>
>>> We narrowed it down to being MD related. We tested this with inbox mdadm-4.3
>>> and mdadm-4.2 on SP6, and the delay is pretty much the same. SP5 is free of
>>> this issue.
>>>
>>> I also tried RHEL 8.9, and drive removal is almost instant.
>>>
>>> Is this the default behavior now, or should we treat it as an issue?
>>>
>>> Thanks,
>>> Mateusz
>>>
>>
>> I dug into this more. I retested this on:
>> - Ubuntu 24.04 with inbox kernel 6.6.0: No reproduction
>> - RHEL 9.4 with upstream kernel 6.9.5-1: Got reproduction
>> (Note that SLES15SP6 comes with 6.8.0-rc4 inbox)
>>
>> I attached gdb to mdadm and found that the ioctl call in hot_remove_disk()
>> fails, and that this is what causes the delay. The function looks as follows:
>>
>> int hot_remove_disk(int mdfd, unsigned long dev, int force)
>> {
>> 	int cnt = force ? 500 : 5;
>> 	int ret;
>>
>> 	/* HOT_REMOVE_DISK can fail with EBUSY if there are
>> 	 * outstanding IO requests to the device.
>> 	 * In this case, it can be helpful to wait a little while,
>> 	 * up to 5 seconds if 'force' is set, or 50 msec if not.
>> 	 */
>> 	while ((ret = ioctl(mdfd, HOT_REMOVE_DISK, dev)) == -1 &&
>> 	       errno == EBUSY &&
>> 	       cnt-- > 0)
>> 		sleep_for(0, MSEC_TO_NSEC(10), true);
>>
>> 	return ret;
>> }
>>
>> ... if it fails, mdadm falls back to removing the drive via a sysfs call.
>>
>> Looks like a kernel ioctl issue...
>>
>
> Hello,
> I investigated this. It looks like the HOT_REMOVE_DISK ioctl has almost always
> failed for arrays with no RAID personality. At some point it was allowed, but
> it was blocked 6 years ago in c42a0e2675 (this ID leads to a merge commit, so
> here is the title: "md: fix NULL dereference of mddev->pers in
> remove_and_add_spares()").
>
> That explains why we have an outdated comment in mdadm:
>
> 	if (err && errno == ENODEV) {
> 		/* Old kernels rejected this if no personality
> 		 * is registered */
>
> I'm working on a fix in mdadm (for kernels with this hang): I will remove the
> ioctl call for external containers:
> https://github.com/md-raid-utilities/mdadm/pull/31
>
> On the HOT_REMOVE_DISK ioctl path, there is a wait for the MD_RECOVERY_NEEDED
> flag to clear, with the timeout set to 5 seconds. Disabling this wait for
> arrays with no personality fixes the issue. However, I'm not sure it is the
> right fix: I would expect MD_RECOVERY_NEEDED not to be set for arrays with no
> MD personality. Kuai and Song, could you please advise?
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c0426a6d2fd1..bd1cedeb105b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7827,7 +7827,7 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
>  		return get_bitmap_file(mddev, argp);
>  	}
>
> -	if (cmd == HOT_REMOVE_DISK)
> +	if (cmd == HOT_REMOVE_DISK && mddev->pers)

This patch will work; however, I'm afraid it can't fix the problem thoroughly,
because this check is done without 'reconfig_mutex' held, and mddev->pers can
be set later, before hot_remove_disk() runs.

I took a look at commit 90f5f7ad4f38, which introduced the wait: it was trying
to wait for a failed device to be removed by md_check_recovery(). However, this
doesn't make sense now, because remove_and_add_spares() is called directly from
hot_remove_disk(), so a failed device can be removed directly from the ioctl,
without md_check_recovery().

The only thing that would prevent a failed device from being removed from the
array is MD_RECOVERY_RUNNING; however, we can't wait for that flag to be
cleared, so I suggest reverting that commit.

Thanks,
Kuai

> 		/* need to ensure recovery thread has run */
> 		wait_event_interruptible_timeout(mddev->sb_wait,
> 						 !test_bit(MD_RECOVERY_NEEDED,
>
>
> Thanks,
> Mariusz
>
> .
>
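
The concern about checking mddev->pers without 'reconfig_mutex' is a
check-then-act race, and it can be made concrete with a deterministic,
single-threaded model. This is a sketch only: struct fake_mddev, ioctl_model(),
start_array() and dummy_personality are hypothetical names, and the callback
simply marks the point where another context could start the array between the
unlocked check and the removal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mddev; only the pers pointer matters. */
struct fake_mddev {
	const void *pers;
};

/* Callback that runs in the window between the check and the removal. */
typedef void (*interleave_fn)(struct fake_mddev *md);

struct outcome {
	bool waited;		/* was the recovery wait performed? */
	bool pers_at_remove;	/* was a personality set when removing? */
};

static struct outcome ioctl_model(struct fake_mddev *md, interleave_fn between)
{
	struct outcome o;

	/* Unlocked check, as in the proposed one-line patch. */
	o.waited = (md->pers != NULL);

	/* Nothing is locked yet, so another context may run here. */
	if (between)
		between(md);

	/* The real code would take reconfig_mutex and call
	 * hot_remove_disk() at this point. */
	o.pers_at_remove = (md->pers != NULL);
	return o;
}

static const int dummy_personality = 1;

/* Models another context starting the array, which sets pers. */
static void start_array(struct fake_mddev *md)
{
	md->pers = &dummy_personality;
}
```

When start_array() runs in the window, the model ends up with waited == false
and pers_at_remove == true: the wait was skipped even though a personality was
present by the time the removal ran, which is the case the unlocked check
cannot rule out.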