linux-raid.vger.kernel.org archive mirror
* Re: Problem w/ commit ac8fa4196d20 on older, slower hardware
@ 2015-10-09  0:13 Neil Brown
  2015-11-12 22:28 ` Joshua Kinard
  0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2015-10-09  0:13 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: linux-raid



> Per commit ac8fa4196d20:
> 
> > md: allow resync to go faster when there is competing IO.
> > 
> > When md notices non-sync IO happening while it is trying to resync (or
> > reshape or recover) it slows down to the set minimum.
> > 
> > The default minimum might have made sense many years ago but the drives have
> > become faster. Changing the default to match the times isn't really a long
> > term solution.
> 
> This holds true for modern hardware, but this commit is causing problems on
> older hardware, like SGI MIPS platforms, that use mdraid.  Namely, while trying
> to chase down an unrelated hardlock bug on an Onyx2, one of the arrays got out
> of sync, so on the next reboot, mdraid's attempt to resync at full speed
> absolutely murdered interactivity.  It took close to 30 minutes for the system to
> finally reach the login prompt.
> 
> Reverting this patch mitigated the problem at first, but in recent kernels
> that no longer seems to be the case: reverting this commit has no noticeable
> effect anymore.  I assume I'd have to hunt down newer commits to revert, but
> it's probably saner to just highlight the problem and test any proposed
> solutions.
> 
> Is there some way to resolve this in such a way that old hardware maintains
> some level of interactivity during a resync, but that won't inconvenience the
> more modern systems?
> 
> http://git.linux-mips.org/cgit/ralf/linux.git/commit/?id=ac8fa4196d20
> 
> Thanks!,
>

Hmmm... this change shouldn't have that effect.
It should allow resync to soak up a bit more of the idle time, but when
there is any other IO, resync should still back off.

I wonder if there is some other change which has confused the event
counting for the particular hardware you are using.
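
The "event counting" mentioned here can be illustrated with a toy model. As far as we understand it, md decides whether a member device is idle by comparing the device's overall sector counter with the number of sectors md itself submitted for resync; only the difference counts as competing IO. The sketch below is a hypothetical Python model of that idea (the class name, the 64-sector threshold and the sample numbers are assumptions for illustration, not md's actual code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleDetector:
    """Toy model of md-style idle detection (hypothetical, not kernel code).

    A device counts as busy when sectors moved by something other than
    resync grew by more than `threshold` since the last busy sample.
    """

    threshold: int = 64                 # assumed slack, in sectors
    last_events: Optional[int] = None   # foreign-IO counter at last busy sample

    def is_idle(self, total_sectors: int, sync_sectors: int) -> bool:
        # Foreign IO = everything the device moved minus what resync submitted.
        events = total_sectors - sync_sectors
        if self.last_events is None or events - self.last_events > self.threshold:
            # First sample, or real competing IO: remember the counter, report busy.
            self.last_events = events
            return False
        return True
```

In this model, if a driver's statistics did not match what md believes it submitted (for example, sectors not accounted at all, so `total_sectors` stays flat while `sync_sectors` grows), `events` never rises past the threshold and the device looks permanently idle, so resync would never back off for interactive IO.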

How did you identify this commit as a possible cause?

The fact that reverting it no longer helps strongly suggests that some
other change is implicated.  I don't think there have been other changes
in md which could affect this.

Have you tried adjusting /proc/sys/dev/raid/speed_limit_m{ax,in} ??
Did that have any noticeable effect?
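
For reference, the two tunables mentioned above are plain text files holding a per-device rate in KiB/s (the documented defaults are 1000 for speed_limit_min and 200000 for speed_limit_max). A minimal Python sketch for reading and lowering them follows; the helper names and example values are hypothetical, writing requires root, and the `root` parameter exists only so the functions can be exercised against a scratch directory:

```python
from pathlib import Path
from typing import Tuple

# Standard location of the global md resync limits on Linux.
RAID_SYSCTL_DIR = Path("/proc/sys/dev/raid")


def read_limits(root: Path = RAID_SYSCTL_DIR) -> Tuple[int, int]:
    """Return (speed_limit_min, speed_limit_max), both in KiB/s per device."""
    low = int((root / "speed_limit_min").read_text().strip())
    high = int((root / "speed_limit_max").read_text().strip())
    return low, high


def set_limits(min_kib: int, max_kib: int, root: Path = RAID_SYSCTL_DIR) -> None:
    """Write new limits (hypothetical helper; needs root on a real system)."""
    if not 0 < min_kib <= max_kib:
        raise ValueError("expected 0 < min_kib <= max_kib")
    (root / "speed_limit_min").write_text(f"{min_kib}\n")
    (root / "speed_limit_max").write_text(f"{max_kib}\n")
```

On a slow machine, something like `set_limits(1000, 5000)` would cap resync at roughly 5 MiB/s per device while keeping the default minimum. The same effect is available from the shell with `sysctl -w dev.raid.speed_limit_max=5000`, or per array through `/sys/block/mdX/md/sync_speed_min` and `sync_speed_max`.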

NeilBrown


* Problem w/ commit ac8fa4196d20 on older, slower hardware
@ 2015-10-05  7:41 Joshua Kinard
  0 siblings, 0 replies; 5+ messages in thread
From: Joshua Kinard @ 2015-10-05  7:41 UTC (permalink / raw)
  To: linux-raid

Per commit ac8fa4196d20:

> md: allow resync to go faster when there is competing IO.
> 
> When md notices non-sync IO happening while it is trying to resync (or
> reshape or recover) it slows down to the set minimum.
> 
> The default minimum might have made sense many years ago but the drives have
> become faster. Changing the default to match the times isn't really a long
> term solution.
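
The throttling rule described in that commit message can be written down as a small decision function. This is a simplified Python model under our own assumptions (the function, the `Action` enum and the example numbers are hypothetical, and the real logic in md has more moving parts): below the minimum, resync always proceeds; above it, resync pauses if it exceeds the maximum or if competing IO was seen.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"  # issue the next resync request right away
    PAUSE = "pause"      # back off before issuing more resync IO


def throttle_decision(cur_kib: int, min_kib: int, max_kib: int, idle: bool) -> Action:
    """Simplified pre-ac8fa4196d20 rule (hypothetical model, not md's code)."""
    if cur_kib <= min_kib:
        # The minimum acts as a floor: resync is never slowed below it.
        return Action.PROCEED
    if cur_kib > max_kib or not idle:
        # Too fast, or something else is using the disks: fall back toward the minimum.
        return Action.PAUSE
    return Action.PROCEED
```

With the default limits (1000 and 200000 KiB/s), `throttle_decision(5000, 1000, 200000, idle=False)` returns `Action.PAUSE`, so in this model any competing IO holds resync near 1000 KiB/s per device; as its title says, the commit aimed to let resync go faster in exactly that case.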

This holds true for modern hardware, but this commit is causing problems on
older hardware, like SGI MIPS platforms, that use mdraid.  Namely, while trying
to chase down an unrelated hardlock bug on an Onyx2, one of the arrays got out
of sync, so on the next reboot, mdraid's attempt to resync at full speed
absolutely murdered interactivity.  It took close to 30 minutes for the system to
finally reach the login prompt.

Reverting this patch mitigated the problem at first, but in recent kernels
that no longer seems to be the case: reverting this commit has no noticeable
effect anymore.  I assume I'd have to hunt down newer commits to revert, but
it's probably saner to just highlight the problem and test any proposed
solutions.

Is there some way to resolve this in such a way that old hardware maintains
some level of interactivity during a resync, but that won't inconvenience the
more modern systems?

http://git.linux-mips.org/cgit/ralf/linux.git/commit/?id=ac8fa4196d20

Thanks!,

--J


end of thread

Thread overview: 5+ messages
2015-10-09  0:13 Problem w/ commit ac8fa4196d20 on older, slower hardware Neil Brown
2015-11-12 22:28 ` Joshua Kinard
2015-11-13  0:03   ` Andreas Klauer
2015-12-21  0:43     ` NeilBrown
  -- strict thread matches above, loose matches on Subject: below --
2015-10-05  7:41 Joshua Kinard
