linux-raid.vger.kernel.org archive mirror
* Is there any way to delay reconstruction
@ 2005-05-25  6:16 danci
  2005-05-25  6:26 ` Catalin(ux aka Dino) BOIE
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: danci @ 2005-05-25  6:16 UTC
  To: linux-raid

Hi,

I have 800+ machines all using Linux SW RAID-1. Recent kernels use modules 
for IDE (piix) and before those modules are loaded, I cannot turn on DMA. 
So I do this using hdparm from an rc.boot script. 

The problem is that this usually fails with 'hdX: lost interrupt' if the disks 
are busy due to RAID reconstruction - which happens a lot as some of the 
800+ machines get rebooted for various reasons...

Of course I could be running without DMA (that's what I did on most 
critical machines), but that is painfully slow and it takes forever just 
to finish RAID reconstruction.

So I need to know if there is any way (kernel parameter would be ideal) to 
delay reconstruction for X seconds/minutes?

Any other suggestions to deal with the problem?

 Thanks, Danilo

* Re: Is there any way to delay reconstruction
  2005-05-25  6:16 Is there any way to delay reconstruction danci
@ 2005-05-25  6:26 ` Catalin(ux aka Dino) BOIE
  2005-05-25  8:25   ` danci
  2005-05-25 15:09 ` Tim Moore
  2005-05-25 15:25 ` Derek Piper
  2 siblings, 1 reply; 11+ messages in thread
From: Catalin(ux aka Dino) BOIE @ 2005-05-25  6:26 UTC
  To: danci; +Cc: linux-raid

On Wed, 25 May 2005 danci@agenda.si wrote:

> Hi,
>
> I have 800+ machines all using Linux SW RAID-1. Recent kernels use modules
> for IDE (piix) and before those modules are loaded, I cannot turn on DMA.
> So I do this using hdparm from an rc.boot script.
>
> The problem is that this usually fails with 'hdX: lost interrupt' if the disks
> are busy due to RAID reconstruction - which happens a lot as some of the
> 800+ machines get rebooted for various reasons...
>
> Of course I could be running without DMA (that's what I did on most
> critical machines), but that is painfully slow and it takes forever just
> to finish RAID reconstruction.
>
> So I need to know if there is any way (kernel parameter would be ideal) to
> delay reconstruction for X seconds/minutes?
>
> Any other suggestions to deal with the problem?
>
> Thanks, Danilo

You can hot-remove the previously failed disk (it has only just started
to resync anyway). Then activate DMA, then hot-add the disk back to the array.

Hope it helps.
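The sequence above could be scripted roughly as follows. This is a sketch that assumes mdadm is installed (2.4-era systems may instead have raidtools' raidhotremove/raidhotadd), and /dev/md0, /dev/hdc1, /dev/hda and /dev/hdc are hypothetical names. Failing one member of a two-disk mirror should stop the resync because only one working disk remains; by default the commands are only printed.

```python
import subprocess
from typing import List, Sequence


def rebuild_plan(array: str, member: str, disks: Sequence[str]) -> List[List[str]]:
    """Commands to pull the resyncing member, enable DMA, then re-add the member."""
    plan = [
        ["mdadm", array, "--fail", member],    # mark the member faulty; resync stops
        ["mdadm", array, "--remove", member],  # hot-remove it; the array runs degraded
    ]
    # With no reconstruction I/O in flight, switch every IDE disk to DMA.
    plan += [["hdparm", "-d1", disk] for disk in disks]
    # Hot-add the member back; md then rebuilds onto it.
    plan.append(["mdadm", array, "--add", member])
    return plan


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or execute them, stopping at the first failure."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

For example, `run_plan(rebuild_plan("/dev/md0", "/dev/hdc1", ["/dev/hda", "/dev/hdc"]))` prints the five commands. The trade-off is that the mirror has no redundancy until the re-added disk finishes rebuilding.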

---
Catalin(ux aka Dino) BOIE
catab at deuroconsult.ro
http://kernel.umbrella.ro/

* Re: Is there any way to delay reconstruction
  2005-05-25  6:26 ` Catalin(ux aka Dino) BOIE
@ 2005-05-25  8:25   ` danci
  2005-05-25 15:12     ` Mike Hardy
  0 siblings, 1 reply; 11+ messages in thread
From: danci @ 2005-05-25  8:25 UTC
  To: linux-raid

On Wed, 25 May 2005, Catalin(ux aka Dino) BOIE wrote:

> > So I need to know if there is any way (kernel parameter would be 
> > ideal) to delay reconstruction for X seconds/minutes?
> > 
> > Any other suggestions to deal with the problem?
> 
> You can hot-remove the previously failed disk (it has only just started
> to resync anyway). Then activate DMA, then hot-add the disk back to the array.
> 
> Hope it helps.

Thanks for the suggestion - this may be the last resort (I'd like to keep 
the init-scripts as clean as possible).

I forgot to mention that I'm using 2.4 kernels (2.4.27 at the moment, 
2.4.30 on the test machine - but it does the same thing).

There is a module 'piix.o' that needs to be loaded before I can use DMA at 
all. It's loaded via init scripts - I will try putting it in the initrd.

 D.


* Re: Is there any way to delay reconstruction
  2005-05-25  6:16 Is there any way to delay reconstruction danci
  2005-05-25  6:26 ` Catalin(ux aka Dino) BOIE
@ 2005-05-25 15:09 ` Tim Moore
  2005-05-26  6:41   ` danci
  2005-05-25 15:25 ` Derek Piper
  2 siblings, 1 reply; 11+ messages in thread
From: Tim Moore @ 2005-05-25 15:09 UTC
  To: linux-raid

Recompile with piix in the kernel.
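Before rolling a rebuilt kernel out to every machine, it may help to check how the PIIX driver is configured in the kernel's .config. The sketch below assumes the option is named CONFIG_BLK_DEV_PIIX (verify this against your tree's IDE configuration); `=y` means built in, `=m` means the piix.o module.

```python
def config_state(config_text: str, symbol: str) -> str:
    """Classify a .config symbol as 'builtin' (=y), 'module' (=m), 'unset' or 'other'."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {symbol} is not set":
            return "unset"
        if line.startswith(symbol + "="):
            value = line.split("=", 1)[1]
            return {"y": "builtin", "m": "module"}.get(value, "other")
    return "unset"  # symbol absent from the file entirely
```

A deployment script could read the .config text and refuse to continue unless `config_state(text, "CONFIG_BLK_DEV_PIIX") == "builtin"`.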

danci@agenda.si wrote:
> Hi,
> 
> I have 800+ machines all using Linux SW RAID-1. Recent kernels use modules 
> for IDE (piix) and before those modules are loaded, I cannot turn on DMA. 
> So I do this using hdparm from an rc.boot script. 
> 
> The problem is that this usually fails with 'hdX: lost interrupt' if the disks 
> are busy due to RAID reconstruction - which happens a lot as some of the 
> 800+ machines get rebooted for various reasons...
> 
> Of course I could be running without DMA (that's what I did on most 
> critical machines), but that is painfully slow and it takes forever just 
> to finish RAID reconstruction.
> 
> So I need to know if there is any way (kernel parameter would be ideal) to 
> delay reconstruction for X seconds/minutes?
> 
> Any other suggestions to deal with the problem?
> 
>  Thanks, Danilo

* Re: Is there any way to delay reconstruction
  2005-05-25  8:25   ` danci
@ 2005-05-25 15:12     ` Mike Hardy
  2005-05-26  6:48       ` danci
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Hardy @ 2005-05-25 15:12 UTC
  To: danci; +Cc: linux-raid


I've had this problem. I turned the raid speed limit max to 0, slept for
a second (just because), did my hdparm commands, slept another second
(again, just because - may not be necessary), then turned the raid speed
limit back to something that was good for background reconstruction.

That got rid of my dropped interrupts.
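The workaround above can be wrapped around the hdparm calls. This sketch assumes the knob in question is /proc/sys/dev/raid/speed_limit_max, the md sysctl for the resync speed ceiling, and that the script runs as root; device names are hypothetical. As far as I know md still lets resync proceed at roughly speed_limit_min, so a ceiling of 0 slows the rebuild to a trickle rather than stopping it outright.

```python
import contextlib
import subprocess
import time
from pathlib import Path

# md sysctl for the resync speed ceiling; writing it requires root.
SPEED_LIMIT_MAX = Path("/proc/sys/dev/raid/speed_limit_max")


@contextlib.contextmanager
def throttled_resync(limit_file: Path = SPEED_LIMIT_MAX, value: int = 0, settle: float = 1.0):
    """Lower the md resync speed ceiling for the duration of the block, then restore it."""
    original = limit_file.read_text().strip()
    limit_file.write_text(f"{value}\n")
    try:
        time.sleep(settle)  # let in-flight resync I/O drain ("just because")
        yield
        time.sleep(settle)  # second pause before resync may speed up again
    finally:
        limit_file.write_text(original + "\n")  # restore the previous ceiling


def enable_dma(disks, run=subprocess.run):
    """Turn on DMA for each IDE disk with hdparm -d1; `run` is injectable for testing."""
    for disk in disks:
        run(["hdparm", "-d1", disk], check=True)
```

From a boot script this would be used as `with throttled_resync(): enable_dma(["/dev/hda", "/dev/hdc"])`. The `finally` clause restores the old ceiling even if hdparm fails, so a failed DMA switch does not leave reconstruction throttled.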

-Mike

danci@agenda.si wrote:
> On Wed, 25 May 2005, Catalin(ux aka Dino) BOIE wrote:
> 
> 
>>>So I need to know if there is any way (kernel parameter would be 
>>>ideal) to delay reconstruction for X seconds/minutes?
>>>
>>>Any other suggestions to deal with the problem?
>>
>>You can hot-remove the previously failed disk (it has only just started
>>to resync anyway). Then activate DMA, then hot-add the disk back to the array.
>>
>>Hope it helps.
> 
> 
> Thanks for the suggestion - this may be the last resort (I'd like to keep 
> the init-scripts as clean as possible).
> 
> I forgot to mention that I'm using 2.4 kernels (2.4.27 at the moment, 
> 2.4.30 on the test machine - but it does the same thing).
> 
> There is a module 'piix.o' that needs to be loaded before I can use DMA at 
> all. It's loaded via init scripts - I will try putting it in the initrd.
> 
>  D.

* Re: Is there any way to delay reconstruction
  2005-05-25  6:16 Is there any way to delay reconstruction danci
  2005-05-25  6:26 ` Catalin(ux aka Dino) BOIE
  2005-05-25 15:09 ` Tim Moore
@ 2005-05-25 15:25 ` Derek Piper
  2005-05-25 16:58   ` Mike Hardy
  2005-05-26  6:59   ` danci
  2 siblings, 2 replies; 11+ messages in thread
From: Derek Piper @ 2005-05-25 15:25 UTC
  To: linux-raid

Why would rebooting the machines cause raid reconstruction? That
sounds pretty bad to need to do that. Shouldn't that be addressed
first? Then you might not need to worry about reconstruction so much.

Derek

On 5/25/05, danci@agenda.si <danci@agenda.si> wrote:

> The problem is that this usually fails with 'hdX: lost interrupt' if the disks
> are busy due to RAID reconstruction - which happens a lot as some of the
> 800+ machines get rebooted for various reasons...
> 


-- 
Derek Piper - derek.piper@gmail.com
http://doofer.org/

* Re: Is there any way to delay reconstruction
  2005-05-25 15:25 ` Derek Piper
@ 2005-05-25 16:58   ` Mike Hardy
  2005-05-26  6:59   ` danci
  1 sibling, 0 replies; 11+ messages in thread
From: Mike Hardy @ 2005-05-25 16:58 UTC
  To: Derek Piper, linux-raid


He mentioned he was on Linux 2.4 - an unclean reboot will almost always
cause reconstruction there, whereas 2.6 aggressively marks arrays clean
and almost never reconstructs.

I'd imagine that with 800+ boxen there are a few reboots a day no matter
what you're doing.

-Mike

Derek Piper wrote:
> Why would rebooting the machines cause raid reconstruction? That
> sounds pretty bad to need to do that. Shouldn't that be addressed
> first? Then you might not need to worry about reconstruction so much.
> 
> Derek
> 
> On 5/25/05, danci@agenda.si <danci@agenda.si> wrote:
> 
> 
>>The problem is that this usually fails with 'hdX: lost interrupt' if the disks
>>are busy due to RAID reconstruction - which happens a lot as some of the
>>800+ machines get rebooted for various reasons...
>>
> 
> 
> 

* Re: Is there any way to delay reconstruction
  2005-05-25 15:09 ` Tim Moore
@ 2005-05-26  6:41   ` danci
  0 siblings, 0 replies; 11+ messages in thread
From: danci @ 2005-05-26  6:41 UTC
  To: Tim Moore; +Cc: linux-raid

On Wed, 25 May 2005, Tim Moore wrote:

> Recompile with piix in the kernel.

I did - it works now! The test machine had approx. 550 reboots overnight
- no problem so far.

 D.


* Re: Is there any way to delay reconstruction
  2005-05-25 15:12     ` Mike Hardy
@ 2005-05-26  6:48       ` danci
  0 siblings, 0 replies; 11+ messages in thread
From: danci @ 2005-05-26  6:48 UTC
  To: Mike Hardy; +Cc: linux-raid

On Wed, 25 May 2005, Mike Hardy wrote:

> I've had this problem. I turned the raid speed limit max to 0, slept for
> a second (just because), did my hdparm commands, slept another second
> (again, just because - may not be necessary), then turned the raid speed
> limit back to something that was good for background reconstruction.
> 
> That got rid of my dropped interrupts.

That's a good idea - why didn't I think of that! :)

Anyway, I've recompiled the kernel with piix NOT as a module and it 
works.

But maybe it would be easier to change the rc.boot script than reinstall 
the kernel - on the 800+ machines!?

Thanks for the idea!

 D.

* Re: Is there any way to delay reconstruction
  2005-05-25 15:25 ` Derek Piper
  2005-05-25 16:58   ` Mike Hardy
@ 2005-05-26  6:59   ` danci
  2005-05-26 16:07     ` Mike Hardy
  1 sibling, 1 reply; 11+ messages in thread
From: danci @ 2005-05-26  6:59 UTC
  To: Derek Piper; +Cc: linux-raid

On Wed, 25 May 2005, Derek Piper wrote:

> Why would rebooting the machines cause raid reconstruction? That
> sounds pretty bad to need to do that. Shouldn't that be addressed
> first? Then you might not need to worry about reconstruction so much.

Rebooting the 'clean' way (CTRL-ALT-DEL or 'shutdown -r now') is no 
problem - it doesn't require reconstruction.

It's 'cold' (or hardware) resets (such as power outages, silly users, 
etc.) that cause that - I don't think there is much you can do about that. 

 D.


* Re: Is there any way to delay reconstruction
  2005-05-26  6:59   ` danci
@ 2005-05-26 16:07     ` Mike Hardy
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Hardy @ 2005-05-26 16:07 UTC
  To: danci, linux-raid



danci@agenda.si wrote:

> Rebooting the 'clean' way (CTRL-ALT-DEL or 'shutdown -r now') is no 
> problem - it doesn't require reconstruction.
> 
> It's 'cold' (or hardware) resets (such as power outages, silly users, 
> etc.) that cause that - I don't think there is much you can do about that. 

If/when you upgrade to 2.6.x, you'll notice that even after the vast
majority of abnormal reboots it won't reconstruct. The code in 2.6.x
marks the array as "clean" very quickly after writes stop, so unless
the array is actually being written to when it goes down, it's
probably okay. On a huge array, that's nearly worth the upgrade right
there...
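The difference can be illustrated with a toy model. `ToyMirror` is a hypothetical class, not md code: it keeps a single dirty flag, sets it on every write and, in the 2.6-like configuration, clears it once writes have been idle for a short delay (the 0.2 s used below is an assumption for illustration). A crash forces a resync only if the flag is still set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMirror:
    """Hypothetical model of a RAID-1 clean/dirty flag; not real md code."""
    # Seconds without writes before the flag is cleared again.
    # None = 2.4-like: the flag is only cleared by an orderly stop/shutdown.
    clean_after_idle: Optional[float]
    dirty: bool = False
    last_write: float = 0.0

    def write(self, now: float) -> None:
        self.dirty = True  # set before data reaches the disks
        self.last_write = now

    def orderly_stop(self) -> None:
        self.dirty = False  # clean shutdown: mirrors known to agree

    def poll(self, now: float) -> None:
        idle_for = now - self.last_write
        if self.dirty and self.clean_after_idle is not None and idle_for >= self.clean_after_idle:
            self.dirty = False  # writes stopped long enough: mark clean

    def crash_needs_resync(self, now: float) -> bool:
        """Power loss at time `now`: a set flag means a full resync at next boot."""
        self.poll(now)
        return self.dirty
```

In this model a 2.4-like mirror that saw one write an hour ago still needs a resync after a power cut, while the 2.6-like one needs it only if power is lost within the idle delay after a write.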

-Mike
