* Re: Problem w/ commit ac8fa4196d20 on older, slower hardware
@ 2015-10-09 0:13 Neil Brown
2015-11-12 22:28 ` Joshua Kinard
0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2015-10-09 0:13 UTC (permalink / raw)
To: Joshua Kinard; +Cc: linux-raid
> Per commit ac8fa4196d20:
>
> > md: allow resync to go faster when there is competing IO.
> >
> > When md notices non-sync IO happening while it is trying to resync (or
> > reshape or recover) it slows down to the set minimum.
> >
> > The default minimum might have made sense many years ago but the drives have
> > become faster. Changing the default to match the times isn't really a long
> > term solution.
>
> This holds true for modern hardware, but this commit is causing problems on
> older hardware, like SGI MIPS platforms, that use mdraid. Namely, while trying
> to chase down an unrelated hardlock bug on an Onyx2, one of the arrays got out
> of sync, so on the next reboot, mdraid's attempt to resync at full speed
> absolutely murdered interactivity. It took close to 30mins for the system to
> finally reach the login prompt.
>
> Reverting this patch was working to mitigate the problem at first, but it appears
> that in recent kernels, this is no longer the case, and reverting this commit
> has no noticeable effect anymore. I assume I'd have to hunt down newer commits
> to revert, but it's probably saner to just highlight the problem and test any
> proposed solutions.
>
> Is there some way to resolve this in such a way that old hardware maintains
> some level of interactivity during a resync, but that won't inconvenience the
> more modern systems?
>
> http://git.linux-mips.org/cgit/ralf/linux.git/commit/?id=ac8fa4196d20
>
> Thanks!,
>
Hmmm... this change shouldn't have that effect.
It should allow resync to soak up a bit more of the idle time, but when
there is any other IO, resync should still back off.
I wonder if there is some other change which has confused the event
counting for the particular hardware you are using.
How did you identify this commit as a possible cause?
The fact that reverting it no longer helps strongly suggests that some
other change is implicated. I don't think there have been other changes
in md which could affect this.
Have you tried adjusting /proc/sys/dev/raid/speed_limit_m{ax,in} ??
Did that have any noticeable effect?
NeilBrown
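
For anyone wanting to try the two tunables named above, here is a minimal sketch, assuming a POSIX shell and root privileges. The function names and the RAID_SYSCTL_DIR override are made up for illustration; only the two files under /proc/sys/dev/raid come from the thread. Per md(4), the values are per-device rates in KiB/s.

```shell
# Directory holding the md speed-limit knobs. The RAID_SYSCTL_DIR override is
# a hypothetical convenience of this sketch (for dry runs), not an md feature.
RAID_SYSCTL_DIR="${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}"

# Print both limits as knob=value lines.
show_limits() {
    for knob in speed_limit_min speed_limit_max; do
        printf '%s=%s\n' "$knob" "$(cat "$RAID_SYSCTL_DIR/$knob")"
    done
}

# Set one limit. $1 is "min" or "max", $2 is the new value (digits only).
set_limit() {
    case "$1" in
        min|max) ;;
        *) echo "usage: set_limit min|max VALUE" >&2; return 1 ;;
    esac
    case "$2" in
        ''|*[!0-9]*) echo "value must be a non-negative integer" >&2; return 1 ;;
    esac
    printf '%s\n' "$2" > "$RAID_SYSCTL_DIR/speed_limit_$1"
}
```

Typical use would be `show_limits` before and after something like `set_limit min 1000`, while watching /proc/mdstat to see whether the resync rate follows. The same knobs are reachable through `sysctl dev.raid.speed_limit_min`.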
* Re: Problem w/ commit ac8fa4196d20 on older, slower hardware
2015-10-09 0:13 Problem w/ commit ac8fa4196d20 on older, slower hardware Neil Brown
@ 2015-11-12 22:28 ` Joshua Kinard
2015-11-13 0:03 ` Andreas Klauer
0 siblings, 1 reply; 5+ messages in thread
From: Joshua Kinard @ 2015-11-12 22:28 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

On 10/08/2015 20:13, Neil Brown wrote:
>
>> Per commit ac8fa4196d20:
>>
>>> md: allow resync to go faster when there is competing IO.
>>>
>>> When md notices non-sync IO happening while it is trying to resync (or
>>> reshape or recover) it slows down to the set minimum.
>>>
>>> The default minimum might have made sense many years ago but the drives have
>>> become faster. Changing the default to match the times isn't really a long
>>> term solution.
>>
>> This holds true for modern hardware, but this commit is causing problems on
>> older hardware, like SGI MIPS platforms, that use mdraid. Namely, while trying
>> to chase down an unrelated hardlock bug on an Onyx2, one of the arrays got out
>> of sync, so on the next reboot, mdraid's attempt to resync at full speed
>> absolutely murdered interactivity. It took close to 30mins for the system to
>> finally reach the login prompt.
>>
>> Reverting this patch was working to mitigate the problem at first, but it appears
>> that in recent kernels, this is no longer the case, and reverting this commit
>> has no noticeable effect anymore. I assume I'd have to hunt down newer commits
>> to revert, but it's probably saner to just highlight the problem and test any
>> proposed solutions.
>>
>> Is there some way to resolve this in such a way that old hardware maintains
>> some level of interactivity during a resync, but that won't inconvenience the
>> more modern systems?
>>
>> http://git.linux-mips.org/cgit/ralf/linux.git/commit/?id=ac8fa4196d20
>>
>> Thanks!,
>>
>
> Hmmm... this change shouldn't have that effect.
> It should allow resync to soak up a bit more of the idle time, but when
> there is any other IO, resync should still back off.
>
> I wonder if there is some other change which has confused the event
> counting for the particular hardware you are using.
>
> How did you identify this commit as a possible cause?

Sorry for the late response.

I pinned down this particular commit as the cause on an SGI Onyx2 (IP27),
which is a MIPS big-endian platform that supports ccNUMA. The SCSI chip is a
QLogic ISP1040B. It's been supported in the mainline kernel for a long time,
but has suffered from bit-rot over the years. There's an unidentified bug
somewhere in the architecture code such that, under heavy disk I/O or memory
operations (I am not sure which, yet), the machine will completely lock up
hard.

I have three ~50GB SCA SCSI drives plugged into it, running MD RAID5 and the
XFS filesystem. I have /, /home, /usr, /var, and /tmp on separate partitions,
each a RAID5 setup.

After one of these hard lockups, on the next reboot, the kernel detected that
my largest partition, /usr, needed to be rebuilt, so it launched a background
resync. The other partitions were fine.

I noticed after several minutes that the kernel had still not proceeded to
execute /init, and that XFS hadn't even mounted the rootfs yet. I thought the
machine had hardlocked again. The lockup bug normally does not happen with a
resync (which takes place entirely within the kernel), but more so when
running commands from userspace.

Physically checking the machine, the disk lights were showing drive activity,
so I let it sit for a good half-hour, and when I later checked the serial
console out, it had gotten most of the way through the bootup process and was
still bringing up runlevel 3 services. Logging into the root console several
minutes later showed the resync was almost complete, but interactivity
remained very sluggish until the resync finished.

So I dug into gitweb on linux-mips.org and looked for any recent commits to
md.c that might have something to do with resync operations, and this one
stood out the most. Reverting it, then forcing the lockup bug to happen
several times until another background resync took place showed
drastically-improved bootup speed. The machine was able to boot to userland
within ~4-6 mins with the background resync happening on /usr. I think this
was on 3.19 or 4.0 (I forget).

It was on the next version up that I noticed the revert was no longer having
an effect, and a resync slowed I/O down enough that booting to userland was
back into the ~30min range. I have also noticed that the lockup bug is now
happening, randomly, during a resync too. I suspect whatever issue is causing
the lockup is getting worse.

The last kernel I booted on this platform was a 4.2-rcX release. I have not
had time to test 4.3.x out.

I have also reproduced the same issue on an SGI Octane (IP30), which needs
out-of-tree patches to work. It's basically the smaller cousin of an
Origin/Onyx2, using the same CPU, SCSI chip, same partition layout, same
filesystem. Only the disks, 3x 73GB SCA SCSI disks, and some internal
hardware architecture, are different between the two. It does not suffer from
any lockup bugs whatsoever, and I only triggered a background resync when I
got frustrated at an unrelated issue and powered the machine off out of
annoyance.

Per hdparm -tT, the average I/O speed is ~160MB/sec reading from cache, and
~18.3MB/sec reading from the /dev/mdX devices. Reading from the individual
/dev/sdX drives is slightly faster at ~18.5MB/sec. This is true for both
machines.

> The fact that reverting it no longer helps strongly suggests that some
> other change is implicated. I don't think there have been other changes
> in md which could affect this.

The changes to the code that this commit affected seem to play some role in
the issue, but I agree that it does not appear to be the sole participant
anymore.

> Have you tried adjusting /proc/sys/dev/raid/speed_limit_m{ax,in} ??
> Did that have any noticeable effect?

Hard to do when your kernel takes 30+ minutes to boot up :)

Once I got to userland in one instance, though, I did touch one of the /proc
parameters (I forget which one, but it had something to do w/ the minimum
background I/O speed) and dropped it down to 1,000K/sec, and the machine's
responsiveness improved dramatically.

The real issue of what's causing the lockups in the first place ultimately
needs to be chased down, but I lack the debugging skills necessary to do
that. I tend to stop for the night when the resync needs to take place and
power the machine down, as it drinks ~700W+, and I save the long resync for a
day when utility rates are low.

--J
* Re: Problem w/ commit ac8fa4196d20 on older, slower hardware
2015-11-12 22:28 ` Joshua Kinard
@ 2015-11-13 0:03 ` Andreas Klauer
2015-12-21 0:43 ` NeilBrown
0 siblings, 1 reply; 5+ messages in thread
From: Andreas Klauer @ 2015-11-13 0:03 UTC (permalink / raw)
To: Joshua Kinard; +Cc: Neil Brown, linux-raid

On Thu, Nov 12, 2015 at 05:28:41PM -0500, Joshua Kinard wrote:
> running MD RAID5 and the XFS filesystem. I have /, /home, /usr, /var,
> and /tmp on separate partitions, each a RAID5 setup.

Hi, sorry for butting in,

I have the same issue, on a regular consumer Haswell i5 box,
with a setup very very similar to yours:

7x2TB disks, multiple partitions, for each: RAID-5, LUKS, LVM, XFS.

The issue occurs during regular RAID check which I run daily
(different partition/RAID each day, so it's more like an
evenly distributed weekly check).

I have an application that uses `find -size +100M` on a directory
tree with ~3k subdirs and ~6k files in total. It doesn't do anything
with the find result, it's purely informational. So no big data involved,
even though the files themselves aren't small.

Yet, it's slooow. The following tests were on a completely idle box,
apart from a running RAID check on the same /dev/mdX device.

Kernel 4.2.3, unpatched:

real    0m53.555s
user    0m0.013s
sys     0m0.037s

real    1m3.777s
user    0m0.013s
sys     0m0.037s

real    1m3.453s
user    0m0.014s
sys     0m0.036s

Kernel 4.2.3, reverted ac8fa4196d20:

real    0m3.206s
user    0m0.010s
sys     0m0.030s

real    0m0.450s
user    0m0.003s
sys     0m0.014s

real    0m0.375s
user    0m0.003s
sys     0m0.012s

I did echo 3 > /proc/sys/vm/drop_caches between each find.
For some reason, subsequent calls in the reverted kernel are
considerably faster regardless. In the original kernel it
stays slow... if I don't drop_caches, the time is 0.006s.

I don't normally reboot (while a RAID sync or check is
running) but while switching between kernels I noticed
the shutdown was very slow also in the original kernel.

Are small requests getting delayed a lot or something?

Regards
Andreas Klauer
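
The measurement above is easy to repeat on another box. A rough sketch, assuming a POSIX shell, root privileges, and a RAID check or resync already running on the array that holds the directory. The function name cold_find_bench and the DROP_CACHES override are invented for this sketch; the find invocation and the drop_caches write are the ones from the message.

```shell
# File used to drop the page cache, dentries and inodes. The override exists
# only so this sketch can be dry-run without root; it is not a kernel feature.
DROP_CACHES="${DROP_CACHES:-/proc/sys/vm/drop_caches}"

# Run `find DIR -size +100M` RUNS times (default 3) with cold caches and
# print the elapsed wall-clock time of each run in whole seconds.
cold_find_bench() {
    dir=$1
    runs=${2:-3}
    i=0
    while [ "$i" -lt "$runs" ]; do
        sync                                  # flush dirty pages first
        echo 3 > "$DROP_CACHES" || return 1   # then drop the caches
        start=$(date +%s)
        find "$dir" -size +100M > /dev/null
        end=$(date +%s)
        i=$((i + 1))
        echo "run $i: $((end - start))s"
    done
}
```

Whole seconds are enough to tell a one-minute run from a sub-second one. For finer resolution, drop the caches by hand and wrap the find in the shell's `time`, as in the figures above.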
* Re: Problem w/ commit ac8fa4196d20 on older, slower hardware
2015-11-13 0:03 ` Andreas Klauer
@ 2015-12-21 0:43 ` NeilBrown
0 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2015-12-21 0:43 UTC (permalink / raw)
To: Andreas Klauer, Joshua Kinard; +Cc: linux-raid

On Fri, Nov 13 2015, Andreas Klauer wrote:

> On Thu, Nov 12, 2015 at 05:28:41PM -0500, Joshua Kinard wrote:
>> running MD RAID5 and the XFS filesystem. I have /, /home, /usr, /var,
>> and /tmp on separate partitions, each a RAID5 setup.
>
> Hi, sorry for butting in,
>
> I have the same issue, on a regular consumer Haswell i5 box,
> with a setup very very similar to yours:
>
> 7x2TB disks, multiple partitions, for each: RAID-5, LUKS, LVM, XFS.
>
> The issue occurs during regular RAID check which I run daily
> (different partition/RAID each day, so it's more like an
> evenly distributed weekly check).
>
> I have an application that uses `find -size +100M` on a directory
> tree with ~3k subdirs and ~6k files in total. It doesn't do anything
> with the find result, it's purely informational. So no big data involved,
> even though the files themselves aren't small.
>
> Yet, it's slooow. The following tests were on a completely idle box,
> apart from a running RAID check on the same /dev/mdX device.
>
> Kernel 4.2.3, unpatched:
>
> real    0m53.555s
> user    0m0.013s
> sys     0m0.037s
>
> real    1m3.777s
> user    0m0.013s
> sys     0m0.037s
>
> real    1m3.453s
> user    0m0.014s
> sys     0m0.036s
>
> Kernel 4.2.3, reverted ac8fa4196d20:
>
> real    0m3.206s
> user    0m0.010s
> sys     0m0.030s
>
> real    0m0.450s
> user    0m0.003s
> sys     0m0.014s
>
> real    0m0.375s
> user    0m0.003s
> sys     0m0.012s
>
> I did echo 3 > /proc/sys/vm/drop_caches between each find.
> For some reason, subsequent calls in the reverted kernel are
> considerably faster regardless. In the original kernel it
> stays slow... if I don't drop_caches, the time is 0.006s.
>
> I don't normally reboot (while a RAID sync or check is
> running) but while switching between kernels I noticed
> the shutdown was very slow also in the original kernel.
>
> Are small requests getting delayed a lot or something?

Thanks for all the details and sorry for the delay.

Are (either of) you able to test with this small incremental patch?

When the md resync notices there is other IO pending, the old code would
cause the resync to wait at least 500msec and possibly longer to get the
overall resync speed below a threshold. Having the threshold fixed
doesn't make sense when devices have such a wide range of speeds.

The problem patch changes it to only wait until pending resync requests
have finished. This means the wait is proportional to the speed of the
devices, which makes more sense.
The hope was that this would allow quite a few regular IO requests to slip
in the gap between resync requests so that regular IO would proceed
reasonably quickly. Sometimes that worked, but obviously not for you.

This patch adds an extra delay, still proportional to the speed of the
devices, but with (hopefully) a lot more room for regular IO requests to
get queued and handled.

Thanks,
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c0c3e6dec248..8a25cf6087ed 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8070,8 +8070,10 @@ void md_do_sync(struct md_thread *thread)
 				 * Give other IO more of a chance.
 				 * The faster the devices, the less we wait.
 				 */
+				unsigned long start = jiffies;
 				wait_event(mddev->recovery_wait,
 					   !atomic_read(&mddev->recovery_active));
+				msleep(jiffies_to_msecs(jiffies - start));
 			}
 		}
 	}
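
To make the three behaviours described in this message concrete, here is a toy model in shell. It is not kernel code. It assumes a hypothetical drain time (how long the in-flight resync requests take to finish once md sees competing IO) and ignores jiffies rounding and msleep overshoot. All function names are made up for illustration.

```shell
# Toy model (not kernel code) of how long resync pauses on competing IO.
# drain_ms is a hypothetical figure: the time in-flight resync requests
# need to complete on a given set of devices.

# Old code, as described above: wait at least 500 ms whatever the device speed.
pause_old_min_ms() { echo 500; }

# ac8fa4196d20: wait only until pending resync requests have finished.
pause_commit_ms() { echo "$1"; }            # $1 = drain_ms

# Incremental patch: the same wait, then sleep again for as long as it took.
pause_patched_ms() { echo $(( $1 * 2 )); }  # $1 = drain_ms

# Percentage of a pause during which no resync request is in flight.
idle_share_pct() {                          # $1 = drain_ms, $2 = pause in ms
    [ "$2" -gt 0 ] || { echo 0; return; }
    echo $(( ($2 - $1) * 100 / $2 ))
}
```

With a drain time of 40 ms, the model gives a 40 ms pause with none of it free of resync IO under ac8fa4196d20, and an 80 ms pause with half of it free under the incremental patch. Slower devices get a proportionally longer gap, which is the property the patch is after.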
* Problem w/ commit ac8fa4196d20 on older, slower hardware
@ 2015-10-05 7:41 Joshua Kinard
0 siblings, 0 replies; 5+ messages in thread
From: Joshua Kinard @ 2015-10-05 7:41 UTC (permalink / raw)
To: linux-raid
Per commit ac8fa4196d20:
> md: allow resync to go faster when there is competing IO.
>
> When md notices non-sync IO happening while it is trying to resync (or
> reshape or recover) it slows down to the set minimum.
>
> The default minimum might have made sense many years ago but the drives have
> become faster. Changing the default to match the times isn't really a long
> term solution.
This holds true for modern hardware, but this commit is causing problems on
older hardware, like SGI MIPS platforms, that use mdraid. Namely, while trying
to chase down an unrelated hardlock bug on an Onyx2, one of the arrays got out
of sync, so on the next reboot, mdraid's attempt to resync at full speed
absolutely murdered interactivity. It took close to 30mins for the system to
finally reach the login prompt.
Reverting this patch was working to mitigate the problem at first, but it appears
that in recent kernels, this is no longer the case, and reverting this commit
has no noticeable effect anymore. I assume I'd have to hunt down newer commits
to revert, but it's probably saner to just highlight the problem and test any
proposed solutions.
Is there some way to resolve this in such a way that old hardware maintains
some level of interactivity during a resync, but that won't inconvenience the
more modern systems?
http://git.linux-mips.org/cgit/ralf/linux.git/commit/?id=ac8fa4196d20
Thanks!,
--J