* about faulty spare-disk interrupts synchronization
From: spren.gm @ 2009-12-23 11:51 UTC (permalink / raw)
To: linux-raid@vger.kernel.org
Hi,
Is it intended that synchronization is interrupted when a spare disk becomes faulty (either
detached from the array or really faulty)? We hit this case several days ago with kernel 2.6.24:
after we unplugged a spare disk of a raid5 array which had a bitmap and was recovering, the spare
disk's status became faulty and synchronization restarted from 0%.
Looking into the md code, I find that md_error() in md/md.c doesn't distinguish between
spare disks and normal disks. Should a faulty spare disk be prevented from interrupting raid synchronization?
Disks nowadays have become much larger, and recovering one disk may take several hours or even longer.
--------------
spren.gm@gmail.com
2009-12-23
* Re: about faulty spare-disk interrupts synchronization
From: Neil Brown @ 2009-12-23 23:31 UTC (permalink / raw)
To: spren.gm@gmail.com; +Cc: linux-raid@vger.kernel.org
On Wed, 23 Dec 2009 19:51:33 +0800
"spren.gm@gmail.com" <spren.gm@gmail.com> wrote:
> Hi,
> Is it intended that synchronization is interrupted when a spare disk becomes faulty (either
> detached from the array or really faulty)? We hit this case several days ago with kernel 2.6.24:
> after we unplugged a spare disk of a raid5 array which had a bitmap and was recovering, the spare
> disk's status became faulty and synchronization restarted from 0%.
>
> Looking into the md code, I find that md_error() in md/md.c doesn't distinguish between
> spare disks and normal disks. Should a faulty spare disk be prevented from interrupting raid synchronization?
> Disks nowadays have become much larger, and recovering one disk may take several hours or even longer.
>
Yes, it is intended that any synchronisation is interrupted when any device
fails.
However, if the device was just an inactive spare, then the synchronisation
should start again from the same place that it was up to, or at least it
should repeat the already-done part very quickly.
Can you test on a more recent kernel?
Can you give precise details of steps and kernel log messages and mdstat
output?
NeilBrown
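
For anyone gathering the mdstat output requested above, the following is a
minimal sketch (not part of the original thread) of pulling the rebuild
progress out of /proc/mdstat text, so that two snapshots taken before and
after unplugging the spare can be compared. The SAMPLE text, the function
name parse_progress, and the device names are illustrative assumptions, not
data from the reporter's system; the regular expression assumes the usual
"recovery = N% (done/total)" progress line.

```python
import re

# Assumed shape of the md progress line in /proc/mdstat, for example:
#   [==>..................]  recovery = 12.6% (37043392/292945152) finish=...
PROGRESS_RE = re.compile(
    r"(?P<action>recovery|resync)\s*=\s*"
    r"(?P<percent>\d+(?:\.\d+)?)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)"
)

# Illustrative snapshot only; the values are invented for the example.
SAMPLE = """\
md0 : active raid5 sdd1[4] sdc1[2] sdb1[1] sda1[0]
      878835456 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [==>..................]  recovery = 12.6% (37043392/292945152) finish=127.5min speed=33440K/sec
      bitmap: 0/140 pages [0KB], 1024KB chunk
"""


def parse_progress(mdstat_text):
    """Return a list of (action, percent, done, total) tuples, one per progress line."""
    return [
        (m.group("action"), float(m.group("percent")),
         int(m.group("done")), int(m.group("total")))
        for m in PROGRESS_RE.finditer(mdstat_text)
    ]


if __name__ == "__main__":
    # On a real system, read the live file instead of SAMPLE:
    #   with open("/proc/mdstat") as f: text = f.read()
    for action, percent, done, total in parse_progress(SAMPLE):
        print(f"{action}: {percent}% ({done}/{total})")
```

If the "done" counter in a snapshot taken after the spare fails is lower than
in the one taken before, the recovery really did go backwards rather than
resuming from where it was up to.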