* Is there any way to delay reconstruction
From: danci @ 2005-05-25 6:16 UTC (permalink / raw)
To: linux-raid
Hi,
I have 800+ machines all using Linux SW RAID-1. Recent kernels use modules
for IDE (piix) and before those modules are loaded, I cannot turn on DMA.
So I do this using hdparm from an rc.boot script.
The problem is that this usually fails with 'hdX: lost interrupt' if the disks
are busy due to RAID reconstruction - which happens a lot as some of the
800+ machines get rebooted for various reasons...
Of course I could be running without DMA (that's what I did on most
critical machines), but that is painfully slow and it takes forever just
to finish RAID reconstruction.
So I need to know if there is any way (kernel parameter would be ideal) to
delay reconstruction for X seconds/minutes?
Any other suggestions to deal with the problem?
Thanks, Danilo
* Re: Is there any way to delay reconstruction
From: Catalin(ux aka Dino) BOIE @ 2005-05-25 6:26 UTC (permalink / raw)
To: danci; +Cc: linux-raid
On Wed, 25 May 2005 danci@agenda.si wrote:
> Hi,
>
> I have 800+ machines all using Linux SW RAID-1. Recent kernels use modules
> for IDE (piix) and before those modules are loaded, I cannot turn on DMA.
> So I do this using hdparm from an rc.boot script.
>
> The problem is that this usually fails with 'hdX: lost interrupt' if the disks
> are busy due to RAID reconstruction - which happens a lot as some of the
> 800+ machines get rebooted for various reasons...
>
> Of course I could be running without DMA (that's what I did on most
> critical machines), but that is painfully slow and it takes forever just
> to finish RAID reconstruction.
>
> So I need to know if there is any way (kernel parameter would be ideal) to
> delay reconstruction for X seconds/minutes?
>
> Any other suggestions to deal with the problem?
>
> Thanks, Danilo
You can hot-remove the previously failed disk (it has only just started to
resync anyway). Then activate DMA and hot-add the disk back to the array.
Hope it helps.
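A minimal sketch of that sequence, assuming mdadm is available (on 2.4
systems that use raidtools, raidsetfaulty, raidhotremove and raidhotadd play
the same roles). The array /dev/md0, the member /dev/hdc1 and the disk
/dev/hdc are hypothetical placeholders, not names from this thread. A member
that is still active normally has to be marked faulty before it can be
removed, so the sketch does both in one call. By default it only prints the
commands.

```shell
#!/bin/sh
# Sketch only: all device names below are placeholders.
MD=${MD:-/dev/md0}        # hypothetical array
PART=${PART:-/dev/hdc1}   # hypothetical member partition that is resyncing
DISK=${DISK:-/dev/hdc}    # hypothetical whole disk behind PART
RUN=${RUN-echo}           # default: print commands; set RUN= (empty) to execute

readd_with_dma() {
    # Mark the member faulty and take it out of the array.
    $RUN mdadm "$MD" --fail "$PART" --remove "$PART"
    # Enable DMA while the disk is idle.
    $RUN hdparm -d1 "$DISK"
    # Put the member back; md then starts a fresh recovery onto it.
    $RUN mdadm "$MD" --add "$PART"
}
```

Calling readd_with_dma with the defaults prints the three commands in order,
which makes it easy to review them before running with RUN= as root.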
---
Catalin(ux aka Dino) BOIE
catab at deuroconsult.ro
http://kernel.umbrella.ro/
* Re: Is there any way to delay reconstruction
From: danci @ 2005-05-25 8:25 UTC (permalink / raw)
To: linux-raid
On Wed, 25 May 2005, Catalin(ux aka Dino) BOIE wrote:
> > So I need to know if there is any way (kernel parameter would be
> > ideal) to delay reconstruction for X seconds/minutes?
> >
> > Any other suggestions to deal with the problem?
>
> You can hot-remove the previously failed disk (it has only just started to
> resync anyway). Then activate DMA and hot-add the disk back to the array.
>
> Hope it helps.
Thanks for the suggestion - this may be the last resort (I'd like to keep
the init-scripts as clean as possible).
I forgot to mention that I'm using 2.4 kernels (2.4.27 at the moment,
2.4.30 on the test machine - but it does the same thing).
There is a module 'piix.o' that needs to be loaded before I can use DMA at
all. It's loaded via init scripts - I will try putting it in the initrd.
D.
* Re: Is there any way to delay reconstruction
From: Tim Moore @ 2005-05-25 15:09 UTC (permalink / raw)
To: linux-raid
Recompile with piix in the kernel.
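One way to confirm how a given kernel was configured is to look at its
.config. The sketch below assumes the relevant 2.4 option is named
CONFIG_BLK_DEV_PIIX, and piix_status is a hypothetical helper name; check
both against your own tree before relying on the result.

```shell
#!/bin/sh
# Sketch only: the option name CONFIG_BLK_DEV_PIIX is an assumption.
piix_status() {
    # $1: path to a kernel .config file
    if grep -q '^CONFIG_BLK_DEV_PIIX=y' "$1"; then
        echo builtin    # driver compiled into the kernel image
    elif grep -q '^CONFIG_BLK_DEV_PIIX=m' "$1"; then
        echo module     # driver built as piix.o, loaded later
    else
        echo absent     # option not set, or not present in this file
    fi
}
```

For example, piix_status /usr/src/linux/.config (path is illustrative) should
report "builtin" for a kernel rebuilt as suggested here.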
danci@agenda.si wrote:
> Hi,
>
> I have 800+ machines all using Linux SW RAID-1. Recent kernels use modules
> for IDE (piix) and before those modules are loaded, I cannot turn on DMA.
> So I do this using hdparm from an rc.boot script.
>
> The problem is that this usually fails with 'hdX: lost interrupt' if the disks
> are busy due to RAID reconstruction - which happens a lot as some of the
> 800+ machines get rebooted for various reasons...
>
> Of course I could be running without DMA (that's what I did on most
> critical machines), but that is painfully slow and it takes forever just
> to finish RAID reconstruction.
>
> So I need to know if there is any way (kernel parameter would be ideal) to
> delay reconstruction for X seconds/minutes?
>
> Any other suggestions to deal with the problem?
>
> Thanks, Danilo
* Re: Is there any way to delay reconstruction
From: Mike Hardy @ 2005-05-25 15:12 UTC (permalink / raw)
To: danci; +Cc: linux-raid
I've had this problem. I turned the raid speed limit max to 0, slept for
a second (just because), did my hdparm commands, slept another second
(again, just because - may not be necessary), then turned the raid speed
limit back to something that was good for background reconstruction.
That got rid of my dropped interrupts.
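The steps above can be sketched roughly as follows, assuming the limit lives
in /proc/sys/dev/raid/speed_limit_max. The helper name dma_on_quietly is
hypothetical, and restoring the previous value (rather than picking a new
one) is a choice made for this sketch. By default the hdparm calls are only
printed.

```shell
#!/bin/sh
# Sketch only: pause md resync while DMA is switched on, then resume it.
LIMIT=${LIMIT:-/proc/sys/dev/raid/speed_limit_max}  # resync speed cap
RUN=${RUN-echo}    # default: print hdparm calls; set RUN= (empty) to execute

dma_on_quietly() {
    # Arguments: the disks to enable DMA on, e.g. /dev/hda /dev/hdc
    old=$(cat "$LIMIT") || return 1   # remember the current cap
    echo 0 > "$LIMIT"                 # throttle reconstruction
    sleep 1
    for disk in "$@"; do
        $RUN hdparm -d1 "$disk"
    done
    sleep 1
    echo "$old" > "$LIMIT"            # let reconstruction continue
}
```

Writing to the real /proc file needs root; pointing LIMIT at a scratch file
is a convenient way to dry-run the logic first.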
-Mike
danci@agenda.si wrote:
> On Wed, 25 May 2005, Catalin(ux aka Dino) BOIE wrote:
>
>
>>>So I need to know if there is any way (kernel parameter would be
>>>ideal) to delay reconstruction for X seconds/minutes?
>>>
>>>Any other suggestions to deal with the problem?
>>
>>You can hot-remove the previously failed disk (it has only just started to
>>resync anyway). Then activate DMA and hot-add the disk back to the array.
>>
>>Hope it helps.
>
>
> Thanks for the suggestion - this may be the last resort (I'd like to keep
> the init-scripts as clean as possible).
>
> I forgot to mention that I'm using 2.4 kernels (2.4.27 at the moment,
> 2.4.30 on the test machine - but it does the same thing).
>
> There is a module 'piix.o' that needs to be loaded before I can use DMA at
> all. It's loaded via init scripts - I will try putting it in the initrd.
>
> D.
* Re: Is there any way to delay reconstruction
From: Derek Piper @ 2005-05-25 15:25 UTC (permalink / raw)
To: linux-raid
Why would rebooting the machines cause RAID reconstruction? That
sounds pretty bad to need to do that. Shouldn't that be addressed
first? Then you might not need to worry about reconstruction so much.
Derek
On 5/25/05, danci@agenda.si <danci@agenda.si> wrote:
> The problem is that this usually fails with 'hdX: lost interrupt' if the disks
> are busy due to RAID reconstruction - which happens a lot as some of the
> 800+ machines get rebooted for various reasons...
>
--
Derek Piper - derek.piper@gmail.com
http://doofer.org/
* Re: Is there any way to delay reconstruction
From: Mike Hardy @ 2005-05-25 16:58 UTC (permalink / raw)
To: Derek Piper, linux-raid
He mentioned he was on Linux 2.4 - an unclean reboot will almost always
cause reconstruction there, as opposed to the aggressively clean 2.6,
which almost never reconstructs.
I'd imagine with 800+ boxen, there are a few reboots a day no matter what
you're doing.
-Mike
Derek Piper wrote:
> Why would rebooting the machines cause RAID reconstruction? That
> sounds pretty bad to need to do that. Shouldn't that be addressed
> first? Then you might not need to worry about reconstruction so much.
>
> Derek
>
> On 5/25/05, danci@agenda.si <danci@agenda.si> wrote:
>
>
>>The problem is that this usually fails with 'hdX: lost interrupt' if the disks
>>are busy due to RAID reconstruction - which happens a lot as some of the
>>800+ machines get rebooted for various reasons...
* Re: Is there any way to delay reconstruction
From: danci @ 2005-05-26 6:41 UTC (permalink / raw)
To: Tim Moore; +Cc: linux-raid
On Wed, 25 May 2005, Tim Moore wrote:
> Recompile with piix in the kernel.
I did - it works now! The test machine had approx. 550 reboots overnight
- no problem so far.
D.
* Re: Is there any way to delay reconstruction
From: danci @ 2005-05-26 6:48 UTC (permalink / raw)
To: Mike Hardy; +Cc: linux-raid
On Wed, 25 May 2005, Mike Hardy wrote:
> I've had this problem. I turned the raid speed limit max to 0, slept for
> a second (just because), did my hdparm commands, slept another second
> (again, just because - may not be necessary), then turned the raid speed
> limit back to something that was good for background reconstruction.
>
> That got rid of my dropped interrupts.
That's a good idea - why didn't I think of that! :)
Anyway, I've recompiled the kernel with piix NOT as a module and it
works.
But maybe it would be easier to change the rc.boot script than to reinstall
the kernel - on the 800+ machines!?
Thanks for the idea!
D.
* Re: Is there any way to delay reconstruction
From: danci @ 2005-05-26 6:59 UTC (permalink / raw)
To: Derek Piper; +Cc: linux-raid
On Wed, 25 May 2005, Derek Piper wrote:
> Why would rebooting the machines cause RAID reconstruction? That
> sounds pretty bad to need to do that. Shouldn't that be addressed
> first? Then you might not need to worry about reconstruction so much.
Rebooting the 'clean' way (CTRL-ALT-DEL or 'shutdown -r now') is no
problem - it doesn't require reconstruction.
It's 'cold' (or hardware) resets (such as power outages, silly users,
etc.) that cause that - I don't think there is much you can do about that.
D.
* Re: Is there any way to delay reconstruction
From: Mike Hardy @ 2005-05-26 16:07 UTC (permalink / raw)
To: danci, linux-raid
danci@agenda.si wrote:
> Rebooting the 'clean' way (CTRL-ALT-DEL or 'shutdown -r now') is no
> problem - it doesn't require reconstruction.
>
> It's 'cold' (or hardware) resets (such as power outages, silly users,
> etc.) that cause that - I don't think there is much you can do about that.
If/when you upgrade to 2.6.x you'll notice that even on the vast
majority of abnormal reboots, it still won't reconstruct. The code in
2.6.x marks the array as "clean" very quickly after writes stop so
unless the array is actually being written to when it goes down, it's
probably okay. On a huge array, that's nearly worth the upgrade right
there...
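To see whether a particular reboot actually triggered a reconstruction, one
option is to look for a resync or recovery line in /proc/mdstat shortly after
boot. This is a rough sketch; resync_running is a hypothetical helper name,
and the match is a plain text search rather than a real parser.

```shell
#!/bin/sh
# Sketch only: succeeds if the given mdstat text mentions resync or recovery.
resync_running() {
    # $1: file in /proc/mdstat format (defaults to /proc/mdstat itself)
    grep -Eq 'resync|recovery' "${1:-/proc/mdstat}"
}
```

A boot script could log the result, for example with
"resync_running && echo 'md resync in progress'", to count how often
abnormal reboots really lead to reconstruction on each kernel.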
-Mike