Problem w/ commit ac8fa4196d20 on older, slower hardware


* Problem w/ commit ac8fa4196d20 on older, slower hardware
@ 2015-10-05  7:41 Joshua Kinard
  0 siblings, 0 replies; 5+ messages in thread
From: Joshua Kinard @ 2015-10-05  7:41 UTC (permalink / raw)
  To: linux-raid

Per commit ac8fa4196d20:

> md: allow resync to go faster when there is competing IO.
> 
> When md notices non-sync IO happening while it is trying to resync (or
> reshape or recover) it slows down to the set minimum.
> 
> The default minimum might have made sense many years ago but the drives have
> become faster. Changing the default to match the times isn't really a long
> term solution.

This holds true for modern hardware, but this commit is causing problems on
older hardware, like SGI MIPS platforms, that use mdraid.  Namely, while trying
to chase down an unrelated hardlock bug on an Onyx2, one of the arrays got out
of sync, so on the next reboot, mdraid's attempt to resync at full speed
absolutely murdered interactivity.  It took close to 30mins for the system to
finally reach the login prompt.

Reverting this patch mitigated the problem at first, but in recent kernels
this is no longer the case: reverting the commit has no noticeable effect
anymore.  I assume I'd have to hunt down newer commits
to revert, but it's probably saner to just highlight the problem and test any
proposed solutions.

Is there some way to resolve this in such a way that old hardware maintains
some level of interactivity during a resync, but that won't inconvenience the
more modern systems?

http://git.linux-mips.org/cgit/ralf/linux.git/commit/?id=ac8fa4196d20

Thanks!,

--J


* Re: Problem w/ commit ac8fa4196d20 on older, slower hardware
@ 2015-10-09  0:13 Neil Brown
  2015-11-12 22:28 ` Joshua Kinard
  0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2015-10-09  0:13 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: linux-raid



> Per commit ac8fa4196d20:
> 
> > md: allow resync to go faster when there is competing IO.
> > 
> > When md notices non-sync IO happening while it is trying to resync (or
> > reshape or recover) it slows down to the set minimum.
> > 
> > The default minimum might have made sense many years ago but the drives have
> > become faster. Changing the default to match the times isn't really a long
> > term solution.
> 
> This holds true for modern hardware, but this commit is causing problems on
> older hardware, like SGI MIPS platforms, that use mdraid.  Namely, while trying
> to chase down an unrelated hardlock bug on an Onyx2, one of the arrays got out
> of sync, so on the next reboot, mdraid's attempt to resync at full speed
> absolutely murdered interactivity.  It took close to 30mins for the system to
> finally reach the login prompt.
> 
> Reverting this patch mitigated the problem at first, but in recent kernels
> this is no longer the case: reverting the commit has no noticeable effect
> anymore.  I assume I'd have to hunt down newer commits
> to revert, but it's probably saner to just highlight the problem and test any
> proposed solutions.
> 
> Is there some way to resolve this in such a way that old hardware maintains
> some level of interactivity during a resync, but that won't inconvenience the
> more modern systems?
> 
> http://git.linux-mips.org/cgit/ralf/linux.git/commit/?id=ac8fa4196d20
> 
> Thanks!,
>

Hmmm... this change shouldn't have that effect.
It should allow resync to soak up a bit more of the idle time, but when
there is any other IO, resync should still back off.

I wonder if there is some other change which has confused the event
counting for the particular hardware you are using.

How did you identify this commit as a possible cause?

The fact that reverting it no longer helps strongly suggests that some
other change is implicated.  I don't think there have been other changes
in md which could affect this.

Have you tried adjusting /proc/sys/dev/raid/speed_limit_m{ax,in} ??
Did that have any noticeable effect?
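
For reference, a minimal sketch of reading and changing those two limits from
a root shell.  The two files are the ones named above; the helper names and the
RAID_SYSCTL_DIR override are made up for illustration, and the values are, as
far as I know, in K/sec per device.

```shell
# Sketch: inspect and adjust the system-wide md resync speed limits.
# RAID_SYSCTL_DIR is a hypothetical override so the helpers can be pointed at
# a scratch directory; on a real system it is /proc/sys/dev/raid.
RAID_SYSCTL_DIR=${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}

# Print the current minimum and maximum limits.
show_resync_limits() {
    printf 'min=%s max=%s\n' \
        "$(cat "$RAID_SYSCTL_DIR/speed_limit_min")" \
        "$(cat "$RAID_SYSCTL_DIR/speed_limit_max")"
}

# set_resync_limit min|max VALUE  (needs root on a real system)
set_resync_limit() {
    case $1 in
        min|max) ;;
        *) echo "usage: set_resync_limit min|max VALUE" >&2; return 2 ;;
    esac
    case $2 in
        ''|*[!0-9]*) echo "VALUE must be a non-negative integer" >&2; return 2 ;;
    esac
    printf '%s\n' "$2" > "$RAID_SYSCTL_DIR/speed_limit_$1"
}
```

Something like `show_resync_limits; set_resync_limit max 5000` would then cap
the resync rate for every array until the next reboot; 5000 is only an example
value, not a recommendation.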

NeilBrown


* Re: Problem w/ commit ac8fa4196d20 on older, slower hardware
  2015-10-09  0:13 Problem w/ commit ac8fa4196d20 on older, slower hardware Neil Brown
@ 2015-11-12 22:28 ` Joshua Kinard
  2015-11-13  0:03   ` Andreas Klauer
  0 siblings, 1 reply; 5+ messages in thread
From: Joshua Kinard @ 2015-11-12 22:28 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On 10/08/2015 20:13, Neil Brown wrote:
> 
>> Per commit ac8fa4196d20:
>>
>>> md: allow resync to go faster when there is competing IO.
>>>
>>> When md notices non-sync IO happening while it is trying to resync (or
>>> reshape or recover) it slows down to the set minimum.
>>>
>>> The default minimum might have made sense many years ago but the drives have
>>> become faster. Changing the default to match the times isn't really a long
>>> term solution.
>>
>> This holds true for modern hardware, but this commit is causing problems on
>> older hardware, like SGI MIPS platforms, that use mdraid.  Namely, while trying
>> to chase down an unrelated hardlock bug on an Onyx2, one of the arrays got out
>> of sync, so on the next reboot, mdraid's attempt to resync at full speed
>> absolutely murdered interactivity.  It took close to 30mins for the system to
>> finally reach the login prompt.
>>
>> Reverting this patch mitigated the problem at first, but in recent kernels
>> this is no longer the case: reverting the commit has no noticeable effect
>> anymore.  I assume I'd have to hunt down newer commits
>> to revert, but it's probably saner to just highlight the problem and test any
>> proposed solutions.
>>
>> Is there some way to resolve this in such a way that old hardware maintains
>> some level of interactivity during a resync, but that won't inconvenience the
>> more modern systems?
>>
>> http://git.linux-mips.org/cgit/ralf/linux.git/commit/?id=ac8fa4196d20
>>
>> Thanks!,
>>
> 
> Hmmm... this change shouldn't have that effect.
> It should allow resync to soak up a bit more of the idle time, but when
> there is any other IO, resync should still back off.
> 
> I wonder if there is some other change which has confused the event
> counting for the particular hardware you are using.
> 
> How did you identify this commit as a possible cause?

Sorry for the late response.  I pinned down this particular commit as the cause
on an SGI Onyx2 (IP27), which is a MIPS big-endian platform that supports
ccNUMA.  The SCSI chip is a QLogic ISP1040B.  It's been supported in the
mainline kernel for a long time, but has suffered from bit-rot over the years.
There's an unidentified bug somewhere in the architecture code that, under
heavy disk I/O or memory operations (I am not sure which, yet), locks the
machine up hard.

I have three ~50GB SCA SCSI drives plugged into it, running MD RAID5 and the
XFS filesystem.  I have /, /home, /usr, /var, and /tmp on separate partitions,
each a RAID5 setup.  After one of these hard lockups, on the next reboot, the
kernel detected that my largest partition, /usr, needed to be rebuilt, so it
launched a background resync.  The other partitions were fine.

I noticed after several minutes that the kernel had still not proceeded to
execute /init, and that XFS hadn't even mounted the rootfs yet.  I thought the
machine had hardlocked again.  The lockup bug normally does not happen with a
resync (which takes place entirely within the kernel), but more so when running
commands from userspace.  Physically checking the machine, the disk lights were
showing drive activity, so I let it sit for a good half-hour, and when I later
checked the serial console out, it had gotten most of the way through the
bootup process and was still bringing up runlevel 3 services.

Logging into the root console several minutes later showed the resync was
almost complete, but interactivity remained very sluggish until the resync
finished.  So I dug into gitweb on linux-mips.org and looked for any recent
commits to md.c that might have something to do with resync operations, and
this one stood out the most.  Reverting it, then forcing the lockup bug to
happen several times until another background resync took place showed
drastically-improved bootup speed.  The machine was able to boot to userland
within ~4-6 mins with the background resync happening on /usr.

I think this was on 3.19 or 4.0 (I forget).  It was on the next version up that
I noticed the revert was no longer having an effect, and a resync slowed I/O
down enough that booting to userland was back into the ~30min range.  I have
also noticed that the lockup bug is also happening, randomly, during a resync
now too.  I suspect whatever issue is causing the lockup is getting worse.

The last kernel I booted on this platform was a 4.2-rcX release.  I have not
had time to test 4.3.x out.

I have also reproduced the same issue on an SGI Octane (IP30), which needs
out-of-tree patches to work.  It's basically the smaller cousin of an
Origin/Onyx2, using the same CPU, SCSI chip, same partition layout, same
filesystem.  Only the disks, 3x 73GB SCA SCSI disks, and some internal hardware
architecture, are different between the two.  It does not suffer from any
lockup bugs whatsoever, and I only triggered a background resync when I got
frustrated at an unrelated issue and powered the machine off out of annoyance.

Per hdparm -tT, the average I/O speed is ~160MB/sec reading from cache, and
~18.3MB/sec reading from the /dev/mdX devices.  Reading from the individual
/dev/sdX drives is slightly faster at ~18.5MB/sec.  This is true for both machines.

> The fact that reverting it no longer helps strongly suggests that some
> other change is implicated.  I don't think there have been other changes
> in md which could affect this.

The changes to the code that this commit affected seem to play some role in
the issue, but I agree that it does not appear to be the sole participant anymore.

> Have you tried adjusting /proc/sys/dev/raid/speed_limit_m{ax,in} ??
> Did that have any noticeable effect?

Hard to do when your kernel takes 30+ minutes to boot up :)  Once I got to
userland in one instance, though, I did touch one of the /proc parameters (I
forget which one, but it had something to do w/ the minimum background I/O
speed).  After I dropped it down to 1,000K/sec, the machine's responsiveness
improved dramatically.
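
As a rough sketch of how this could be narrowed to just the array that is
resyncing: md also exposes per-array limits and progress under
/sys/block/mdX/md/.  The sysfs file names are the real ones as far as I know;
the helper names, the MD_SYSFS_ROOT override, and the array name in the usage
note are assumptions for illustration.

```shell
# Sketch: per-array resync status and throttling through sysfs.
# MD_SYSFS_ROOT is a hypothetical override for dry runs; normally /sys/block.
MD_SYSFS_ROOT=${MD_SYSFS_ROOT:-/sys/block}

# md_sync_status ARRAY: print the current action, speed and per-array limits.
md_sync_status() {
    dir=$MD_SYSFS_ROOT/$1/md
    for f in sync_action sync_speed sync_speed_min sync_speed_max; do
        printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
    done
}

# md_set_sync_speed ARRAY min|max VALUE  (needs root on a real system)
# VALUE is a number, or "system" to fall back to the system-wide limit.
md_set_sync_speed() {
    case $2 in
        min|max) ;;
        *) echo "usage: md_set_sync_speed ARRAY min|max VALUE" >&2; return 2 ;;
    esac
    case $3 in
        system) ;;
        ''|*[!0-9]*) echo "VALUE must be a number or 'system'" >&2; return 2 ;;
    esac
    printf '%s\n' "$3" > "$MD_SYSFS_ROOT/$1/md/sync_speed_$2"
}
```

For example, `md_set_sync_speed md2 max 1000` followed by `md_sync_status md2`
would throttle only md2 (a made-up array name) and show whether the resync
speed actually dropped.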

The real issue of what's causing the lockups in the first place ultimately
needs to be chased down, but I lack the debugging skills necessary to do that.
I tend to stop for the night when the resync needs to take place and power the
machine down, as it drinks ~700W+, and I save the long resync for a day when
utility rates are low.

--J


* Re: Problem w/ commit ac8fa4196d20 on older, slower hardware
  2015-11-12 22:28 ` Joshua Kinard
@ 2015-11-13  0:03   ` Andreas Klauer
  2015-12-21  0:43     ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Klauer @ 2015-11-13  0:03 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: Neil Brown, linux-raid

On Thu, Nov 12, 2015 at 05:28:41PM -0500, Joshua Kinard wrote:
> running MD RAID5 and the XFS filesystem.  I have /, /home, /usr, /var,
> and /tmp on separate partitions, each a RAID5 setup.

Hi, sorry for butting in,

I have the same issue, on a regular consumer Haswell i5 box, 
with a setup very very similar to yours:

7x2TB disks, multiple partitions, for each: RAID-5, LUKS, LVM, XFS.

The issue occurs during regular RAID check which I run daily 
(different partition/RAID each day, so it's more like an
evenly distributed weekly check).
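
A minimal sketch of what such a rotation could look like, assuming the check
is started by writing "check" to the array's sync_action file in sysfs.  The
helper names, the MD_SYSFS_ROOT override, and the array names are made up for
illustration; this is not necessarily how the setup described here does it.

```shell
# Sketch: start a check on a different md array each day of the week.
# MD_SYSFS_ROOT is a hypothetical override for dry runs; normally /sys/block.
MD_SYSFS_ROOT=${MD_SYSFS_ROOT:-/sys/block}

# pick_array DAY ARRAY...  (DAY is 1..7, e.g. from `date +%u`)
pick_array() {
    day=$1
    shift
    [ "$#" -gt 0 ] || return 1
    shift $(( (day - 1) % $# ))        # rotate through the list of arrays
    echo "$1"
}

# start_check ARRAY: ask md to begin a check (needs root on a real system)
start_check() {
    echo check > "$MD_SYSFS_ROOT/$1/md/sync_action"
}

# Example daily cron body, with made-up array names:
#   start_check "$(pick_array "$(date +%u)" md1 md2 md3 md4 md5 md6 md7)"
```

With seven arrays in the list, each one gets checked once a week.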

I have an application that uses `find -size +100M` on a directory 
tree with ~3k subdirs and ~6k files in total. It doesn't do anything 
with the find result; it's purely informational. So no big data involved,
even though the files themselves aren't small.

Yet, it's slooow. The following tests were on a completely idle box, 
apart from a running RAID check on the same /dev/mdX device.

Kernel 4.2.3, unpatched:

real	0m53.555s
user	0m0.013s
sys	0m0.037s

real	1m3.777s
user	0m0.013s
sys	0m0.037s

real	1m3.453s
user	0m0.014s
sys	0m0.036s

Kernel 4.2.3, reverted ac8fa4196d20:

real	0m3.206s
user	0m0.010s
sys	0m0.030s

real	0m0.450s
user	0m0.003s
sys	0m0.014s

real	0m0.375s
user	0m0.003s
sys	0m0.012s

I did echo 3 > /proc/sys/vm/drop_caches between each find. 
For some reason, subsequent calls in the reverted kernel are 
considerably faster regardless. In the original kernel it 
stays slow... if I don't drop_caches, the time is 0.006s.
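
For anyone who wants to repeat the measurement, a minimal sketch of the
cold-cache timing loop.  It assumes GNU date (for %N) and root access to
drop_caches; the function name, the DROP_CACHES_FILE override, and the extra
sync before each run are additions for illustration, not part of the test as
it was run above.

```shell
# Sketch: time `find` with a cold cache, a few runs in a row.
# DROP_CACHES_FILE is a hypothetical override for dry runs; the real file
# needs root.
DROP_CACHES_FILE=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}

# time_cold_find DIR [RUNS]: print one "run N: M ms" line per run
time_cold_find() {
    dir=$1
    runs=${2:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        sync                              # flush dirty pages first
        echo 3 > "$DROP_CACHES_FILE"      # drop page cache, dentries and inodes
        start=$(date +%s%N)               # nanoseconds since the epoch (GNU date)
        find "$dir" -size +100M > /dev/null
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}
```

Running `time_cold_find /path/to/tree 3` once during a RAID check and once
without one, on both kernels, should give numbers comparable to the ones above.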

I don't normally reboot (while a RAID sync or check is 
running) but while switching between kernels I noticed 
the shutdown was very slow also in the original kernel.

Are small requests getting delayed a lot or something?

Regards
Andreas Klauer


* Re: Problem w/ commit ac8fa4196d20 on older, slower hardware
  2015-11-13  0:03   ` Andreas Klauer
@ 2015-12-21  0:43     ` NeilBrown
  0 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2015-12-21  0:43 UTC (permalink / raw)
  To: Andreas Klauer, Joshua Kinard; +Cc: linux-raid


On Fri, Nov 13 2015, Andreas Klauer wrote:

> On Thu, Nov 12, 2015 at 05:28:41PM -0500, Joshua Kinard wrote:
>> running MD RAID5 and the XFS filesystem.  I have /, /home, /usr, /var,
>> and /tmp on separate partitions, each a RAID5 setup.
>
> Hi, sorry for butting in,
>
> I have the same issue, on a regular consumer Haswell i5 box, 
> with a setup very very similar to yours:
>
> 7x2TB disks, multiple partitions, for each: RAID-5, LUKS, LVM, XFS.
>
> The issue occurs during regular RAID check which I run daily 
> (different partition/RAID each day, so it's more like an
> evenly distributed weekly check).
>
> I have an application that uses `find -size +100M` on a directory 
> tree with ~3k subdirs and ~6k files in total. It doesn't do anything 
> with the find result; it's purely informational. So no big data involved,
> even though the files themselves aren't small.
>
> Yet, it's slooow. The following tests were on a completely idle box, 
> apart from a running RAID check on the same /dev/mdX device.
>
> Kernel 4.2.3, unpatched:
>
> real	0m53.555s
> user	0m0.013s
> sys	0m0.037s
>
> real	1m3.777s
> user	0m0.013s
> sys	0m0.037s
>
> real	1m3.453s
> user	0m0.014s
> sys	0m0.036s
>
> Kernel 4.2.3, reverted ac8fa4196d20:
>
> real	0m3.206s
> user	0m0.010s
> sys	0m0.030s
>
> real	0m0.450s
> user	0m0.003s
> sys	0m0.014s
>
> real	0m0.375s
> user	0m0.003s
> sys	0m0.012s
>
> I did echo 3 > /proc/sys/vm/drop_caches between each find. 
> For some reason, subsequent calls in the reverted kernel are 
> considerably faster regardless. In the original kernel it 
> stays slow... if I don't drop_caches, the time is 0.006s.
>
> I don't normally reboot (while a RAID sync or check is 
> running) but while switching between kernels I noticed 
> the shutdown was very slow also in the original kernel.
>
> Are small requests getting delayed a lot or something?

Thanks for all the details and sorry for the delay.

Are (either of) you able to test with this small incremental patch?

When the md resync notices there is other IO pending, the old code would
cause the resync to wait at least 500msec and possibly longer to get the
overall resync speed below a threshold.
Having the threshold fixed doesn't make sense when devices have such a
wide range of speeds.

The problem patch changes it to only wait until pending resync requests
have finished.  This means the wait is proportional to the speed of the
devices, which makes more sense.  The hope was that this would allow
quite a few regular IO requests to slip in the gap between resync requests
so that regular IO would proceed reasonably quickly.  Sometimes that
worked, but obviously not for you.

This patch adds an extra delay, still proportional to the speed of the
devices, but with (hopefully) a lot more room for regular IO requests to
get queued and handled.
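
To put rough numbers on that: with the extra delay, each pause is the time the
pending resync requests took to finish, plus the same amount again.  The
sketch below only does that arithmetic; the function name and the sample
drain times are made up, and jiffies rounding is ignored.

```shell
# Sketch: length of one throttle pause with the incremental patch applied.
# drain_ms = how long the wait for pending resync requests took.
# Jiffies rounding is ignored, so treat the numbers as approximate.
throttle_pause_ms() {
    drain_ms=$1
    sleep_ms=$drain_ms      # the added sleep lasts as long as the wait did
    echo "wait=${drain_ms}ms sleep=${sleep_ms}ms total=$((drain_ms + sleep_ms))ms"
}

# Sample drain times for a fast, a middling and a slow device (made up).
for d in 5 50 500; do
    throttle_pause_ms "$d"
done
```

So a device that needs 50 ms to drain its resync requests would leave roughly
another 50 ms for regular IO before the resync continues, and a slower device
gets a proportionally longer gap.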

Thanks,
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c0c3e6dec248..8a25cf6087ed 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8070,8 +8070,10 @@ void md_do_sync(struct md_thread *thread)
 				 * Give other IO more of a chance.
 				 * The faster the devices, the less we wait.
 				 */
+				unsigned long start = jiffies;
 				wait_event(mddev->recovery_wait,
 					   !atomic_read(&mddev->recovery_active));
+				msleep(jiffies_to_msecs(jiffies - start));
 			}
 		}
 	}

