* Problem with md: Not rebuilding rai5
@ 2006-08-29 9:12 Nico Schottelius
2006-08-29 9:26 ` Neil Brown
0 siblings, 1 reply; 4+ messages in thread
From: Nico Schottelius @ 2006-08-29 9:12 UTC (permalink / raw)
To: LKML
Hello!
I created a degraded raid5 on top of md1 and hde1. Then I moved the data
from /dev/hdk to the mounted raid5, and then added hdk1 (repartitioned)
to the array. The sync began, but after that hde1 was faulty.
I removed it, readded it, but now I've a raid5 with only one active
disk (which should not be possible imho, a raid5 always needs 2 disks)
AND what's even stranger for me, I've two spare disks.
Is there a way to force rebuilding the array?
I am a bit confused, because I thought Linux would rebuild the array
automatically when it has spare disks.
The output of mdadm and some debug can be found at
http://home.schottelius.org/~nico/linux/debug/raid/raid5.strange
I am happy for any hint, because I did not find any documentation that
refers to having two spare disks with raid5. And as far as I know from the
Software-RAID HOWTO, Linux should rebuild it automatically, shouldn't it?
I am currently not rebooting the system or
unmounting the path, because I do not know whether the array will come up
again.
Nico
P.S.: Please CC, I am not subscribed.
--
``...if there's one thing about Linux users, they're do-ers, not whiners.''
(A quotation of Andy Patrizio I completely agree with)
* Re: Problem with md: Not rebuilding rai5
2006-08-29 9:12 Problem with md: Not rebuilding rai5 Nico Schottelius
@ 2006-08-29 9:26 ` Neil Brown
2006-08-29 9:40 ` Nico Schottelius
0 siblings, 1 reply; 4+ messages in thread
From: Neil Brown @ 2006-08-29 9:26 UTC (permalink / raw)
To: Nico Schottelius; +Cc: LKML
On Tuesday August 29, nico-kernel20060829@schottelius.org wrote:
> Hello!
>
> I created a degraded raid5 on top of md1 and hde1. Then I moved the data
> from /dev/hdk to the mounted raid5, and then added hdk1 (repartitioned)
> to the array. The sync began, but after that hde1 was faulty.
So you created a raid5 containing one drive that was already faulty.
That is unfortunate!
>
> I removed it, readded it, but now I've a raid5 with only one active
> disk (which should not be possible imho, a raid5 always needs 2 disks)
> AND what's even stranger for me, I've two spare disks.
If you have a raid5 with 2 working drives and one fails, how many
working drives do you expect to be left? 1. So the raid is no longer
fully functional. You might be able to read some data, but you won't
be able to write.
What did you expect to happen when hde1 failed?
>
> Is there a way to force rebuilding the array?
Well, you can create the array over md1 and hde1 again, and your data
should still be there, but it will just fail again whenever it tries
to access the block on hde1 which is bad.
I suggest you:
- recreate the array over md1 and hde1
- copy the data back to hdk
- stop the array
- replace hde1
- make the array.
- read the entire array (dd > /dev/null) to make sure it is safe
- copy data back from hdk
NeilBrown
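The read check in the list above (dd > /dev/null) can be sketched in Python. This is only a minimal sketch, assuming the array is exposed as a readable block device; the function name scan_readable and its chunk_size parameter are hypothetical and not part of mdadm or the md driver.

```python
def scan_readable(path, chunk_size=1024 * 1024):
    """Read `path` sequentially and discard the data, like dd to /dev/null.

    Returns (bytes_read, error). `error` is None when the whole file or
    device was read, otherwise the OSError that stopped the scan.
    """
    total = 0
    try:
        # Unbuffered binary read so each request goes to the device.
        with open(path, "rb", buffering=0) as dev:
            while True:
                chunk = dev.read(chunk_size)
                if not chunk:
                    break  # end of file or device
                total += len(chunk)
    except OSError as exc:
        # A bad block on a member with no redundancy left would
        # typically surface here as an I/O error.
        return total, exc
    return total, None
```

Calling scan_readable("/dev/md2") as root would walk the whole array; the byte count returned with an error gives a rough idea of where the read failed.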
* Re: Problem with md: Not rebuilding rai5
2006-08-29 9:26 ` Neil Brown
@ 2006-08-29 9:40 ` Nico Schottelius
2006-08-29 10:28 ` Neil Brown
0 siblings, 1 reply; 4+ messages in thread
From: Nico Schottelius @ 2006-08-29 9:40 UTC (permalink / raw)
To: Neil Brown; +Cc: LKML
Neil Brown [Tue, Aug 29, 2006 at 07:26:52PM +1000]:
> On Tuesday August 29, nico-kernel20060829@schottelius.org wrote:
> > Hello!
> >
> > I created a degraded raid5 on top of md1 and hde1. Then I moved the data
> > from /dev/hdk to the mounted raid5, and then added hdk1 (repartitioned)
> > to the array. The sync began, but after that hde1 was faulty.
>
> So you created a raid5 containing one drive that was already faulty.
> That is unfortunate!
And reported in the manpage of mdadm to be usable (simply specify "missing"
as keyword).
> > I removed it, readded it, but now I've a raid5 with only one active
> > disk (which should not be possible imho, a raid5 always needs 2 disks)
> > AND what's even stranger for me, I've two spare disks.
>
> If you have a raid5 with 2 working drives and one fails, how many
> working drives do you expect to be left? 1. So the raid is no longer
> fully functional.
That is what I also thought. But there are some points that make me
wonder:
a) why is hdk1 marked as spare? I added it this morning and the
rebuild began. Then something happened (I do not know what)
that kept hdk1 from being in the array. (dmesg output
is now available at
http://home.schottelius.org/~nico/linux/debug/raid/raid5.strange.dmesg)
b) why does Linux not mark md2 as faulty?
> You might be able to read some data, but you won't
> be able to write.
> What did you expect to happen when hde1 failed?
I expected hdk1 and md1 to work.
> > Is there a way to force rebuilding the array?
>
> Well, you can create the array over md1 and hde1 again, and your data
> should still be there, but it will just fail again whenever it tries
> to access the block on hde1 which is bad.
That's clear.
> I suggest you:
> - recreate the array over md1 and hde1
> - copy the data back to hdk
> - stop the array
> - replace hde1
> - make the array.
> - read the entire array (dd > /dev/null) to make sure it is safe
> - copy data back from hdk
Will Linux detect that md1 and hde1 are from the same array,
and will it see which hard disk is XORed with which one?
Perhaps this is the only way to go. I hope I did not lose too much
data with my 'moving to raid5' experiment.
Thanks for the suggestions so far!
Nico
--
``...if there's one thing about Linux users, they're do-ers, not whiners.''
(A quotation of Andy Patrizio I completely agree with)
* Re: Problem with md: Not rebuilding rai5
2006-08-29 9:40 ` Nico Schottelius
@ 2006-08-29 10:28 ` Neil Brown
0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2006-08-29 10:28 UTC (permalink / raw)
To: Nico Schottelius; +Cc: LKML
On Tuesday August 29, nico-kernel20060829@schottelius.org wrote:
> Neil Brown [Tue, Aug 29, 2006 at 07:26:52PM +1000]:
> > On Tuesday August 29, nico-kernel20060829@schottelius.org wrote:
> > > Hello!
> > >
> > > I created a degraded raid5 on top of md1 and hde1. Then I moved the data
> > > from /dev/hdk to the mounted raid5, and then added hdk1 (repartitioned)
> > > to the array. The sync began, but after that hde1 was faulty.
> >
> > So you created a raid5 containing one drive that was already faulty.
> > That is unfortunate!
>
> And reported in the manpage of mdadm to be usable (simply specify "missing"
> as keyword).
That's not what I meant.
Creating a 3 drive raid5 with 2 working drives and one 'missing' drive
is fine.
Creating a 3 drive raid5 with one good drive, one faulty drive and one
missing drive is unfortunate. But then doing anything with a faulty
drive is unfortunate.
>
> > > I removed it, readded it, but now I've a raid5 with only one active
> > > disk (which should not be possible imho, a raid5 always needs 2 disks)
> > > AND what's even stranger for me, I've two spare disks.
> >
> > If you have a raid5 with 2 working drives and one fails, how many
> > working drives do you expect to be left? 1. So the raid is no longer
> > fully functional.
>
> That is what I also thought. But there are some points that make me
> wonder:
> a) why is hdk1 marked as spare? I added it this morning and the
> rebuild began. Then something happened (I do not know what)
> that kept hdk1 from being in the array. (dmesg output
> is now available at
> http://home.schottelius.org/~nico/linux/debug/raid/raid5.strange.dmesg)
You added hdk1 as a spare. md noticed that the array was degraded and
a spare was available so it started recovery of the missing drive onto
the spare. It remains a 'spare' until recovery is complete.
Then it becomes a full member.
However recovery didn't complete due to a read error on hde. So hdk1
is still a spare.
Then you removed and re-added hde1, so it was a spare too. md didn't
try to reconstruct onto either spare as it didn't have enough working
drives to perform a reconstruction.
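The rule described above can be written down as a toy Python model. It is a sketch of the reasoning only, not the md driver's actual logic, and the name can_start_recovery and its parameters are hypothetical.

```python
def can_start_recovery(raid_devices, active, spares):
    """Toy rule for a single-parity (raid5) array.

    raid_devices: number of member slots the array was created with
    active:       members that are currently in sync
    spares:       spare devices attached to the array
    """
    missing = raid_devices - active
    # Nothing to rebuild, or nothing to rebuild onto.
    if missing == 0 or spares == 0:
        return False
    # Single parity can regenerate exactly one missing member, and it
    # needs every other member in sync to do so.
    return missing == 1
```

Applied to this thread, with a 3-device array: after hdk1 is added there are 2 active members and 1 spare, so recovery can start. Once hde1 fails and is re-added there is 1 active member and 2 spares, and the function returns False however many spares are attached.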
>
> b) why does Linux not mark md2 as faulty?
>
There is no sense in which an array is marked faulty. There is no
place to put the marking.
If you write to md2, it will fail.
If you read from md2, it might succeed or it might not, depending on
whether the data you try to read is stored on the working device or on
a failed device.
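Why a read might or might not succeed can be illustrated with a small XOR parity sketch in Python. It assumes a simplified stripe of two data chunks plus one parity chunk in fixed positions; real raid5 rotates parity across the members, and the names xor_bytes and read_chunk are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def read_chunk(stripe, index):
    """Return the chunk at `index` of one stripe.

    `stripe` is a list of chunks, one per member, with None standing in
    for a failed or not-yet-recovered member. A missing chunk is rebuilt
    by XORing all the others; if any of those is missing too, the read
    fails.
    """
    if stripe[index] is not None:
        return stripe[index]  # data sits on a working member
    others = [c for i, c in enumerate(stripe) if i != index]
    if any(c is None for c in others):
        raise OSError("not enough working members to reconstruct")
    result = others[0]
    for chunk in others[1:]:
        result = xor_bytes(result, chunk)
    return result
```

With two of three members present, every chunk is either read directly or rebuilt from the other two. With only one member present, only the chunks stored on that member can be read, which matches the behaviour described above.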
> > You might be able to read some data, but you won't
> > be able to write.
> > What did you expect to happen when hde1 failed?
>
> I expected hdk1 and md1 to work.
However md hasn't completed the recovery onto hdk1, so there is
nothing that can be done.
>
> > I suggest you:
> > - recreate the array over md1 and hde1
> > - copy the data back to hdk
> > - stop the array
> > - replace hde1
> > - make the array.
> > - read the entire array (dd > /dev/null) to make sure it is safe
> > - copy data back from hdk
>
> Will Linux detect that md1 and hde1 are from the same array,
> and will it see which hard disk is XORed with which one?
Linux won't detect anything. But if you create the array in the same
way that you did before, with md1 and hde1, the data will still be
where it was. And when you then read from the new md2, you will get
the data that was there before hde1 failed (as long as you don't read
from the block on hde1 that is faulty).
Good luck.
NeilBrown