linux-raid.vger.kernel.org archive mirror
* Recreate raid 10 array
@ 2009-04-06 19:49 LCID Fire
  2009-04-07  6:13 ` Goswin von Brederlow
  0 siblings, 1 reply; 11+ messages in thread
From: LCID Fire @ 2009-04-06 19:49 UTC (permalink / raw)
  To: linux-raid

On my raid10 array (4 drives) I've had 2 drives (the same model) get 
disconnected by the kernel (almost at the same time). It seems both 
formed one raid1 pair, so one half of the raid0 is missing.
Afterwards I tried to re-add the drives (my bad), so now I'm stuck 
with one half of the raid0 present and valid and the other (2 
drives) marked as spare.
The thing I'd like to do is:
- Take 2 new drives (different manufacturers)
- Clone one of the valid and one of the spare drives to the new drives (see the sketch below)
- Try to reassemble the array.
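
Roughly, the cloning step would look like this (device names here are 
placeholders; plain dd works, GNU ddrescue copes better with failing 
media):

  # conv=noerror,sync keeps going past read errors, padding bad sectors
  dd if=/dev/old_disk1 of=/dev/new_disk1 bs=1M conv=noerror,sync
  dd if=/dev/old_disk2 of=/dev/new_disk2 bs=1M conv=noerror,sync

  # or, with ddrescue (the log file lets an interrupted copy resume):
  # ddrescue -f /dev/old_disk1 /dev/new_disk1 /root/old_disk1.log
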
Problem is - is this even possible?
How do I tell mdadm to not care about the spare state?
Does the kernel write bogus data to the still valid drives if the other 
2 fail?

Would be great if someone could enlighten me ;)

P.S.: Does someone know how to easily report the different sata errors 
which the kernel encounters?



* Re: Recreate raid 10 array
  2009-04-06 19:49 Recreate raid 10 array LCID Fire
@ 2009-04-07  6:13 ` Goswin von Brederlow
  2009-04-08 21:47   ` Bill Davidsen
  0 siblings, 1 reply; 11+ messages in thread
From: Goswin von Brederlow @ 2009-04-07  6:13 UTC (permalink / raw)
  To: LCID Fire; +Cc: linux-raid

LCID Fire <lcid-fire@gmx.net> writes:

> On my raid10 array (4 drives) I've had 2 drives (the same model) get
> disconnected by the kernel (almost at the same time). It seems both
> formed one raid1 pair, so one half of the raid0 is missing.
> Afterwards I tried to re-add the drives (my bad), so now I'm stuck
> with one half of the raid0 present and valid and the other (2
> drives) marked as spare.
> The thing I'd like to do is:
> - Take 2 new drives (different manufacturers)
> - Clone one of the valid and one of the spare drives to the new drives
> - Try to reassemble the array.
> Problem is - is this even possible?
> How do I tell mdadm to not care about the spare state?
> Does the kernel write bogus data to the still valid drives if the
> other 2 fail?
>
> Would be great if someone could enlighten me ;)
>
> P.S.: Does someone know how to easily report the different sata errors
> which the kernel encounters?

mdadm --create --assume-clean -l 10 -n 4 /dev/mdX /dev/copied_disk_1 /dev/copied_disk2 missing missing

You need to match the create parameters exactly with the ones you
initially used (near/offset/far copies? chunk size? ...), and the order
of devices is relevant, so you might have to shuffle the disk
arguments. Just try different orders till the result can be mounted
or fscked. With the wrong options the mount/fsck could screw up the
data, but then you copy the disk again for the next try. It should be
reasonably obvious when mount/fsck goes wrong, as it should find tons
of errors. Mostly I would expect mount/fsck to just fail with the
wrong mdadm args, though.
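
A sketch of how that could look (same placeholder device names as
above; the layout and chunk values are examples only - use whatever a
surviving superblock reports):

  # read the original parameters off a surviving member
  mdadm --examine /dev/copied_disk_1

  # recreate with exactly those values (n2/64 are example values)
  mdadm --create /dev/mdX --assume-clean --level=10 --raid-devices=4 \
        --layout=n2 --chunk=64 \
        /dev/copied_disk_1 /dev/copied_disk_2 missing missing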

MfG
        Goswin


* Re: Recreate raid 10 array
  2009-04-07  6:13 ` Goswin von Brederlow
@ 2009-04-08 21:47   ` Bill Davidsen
  2009-04-08 21:57     ` Andrew Burgess
  0 siblings, 1 reply; 11+ messages in thread
From: Bill Davidsen @ 2009-04-08 21:47 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: LCID Fire, linux-raid

Goswin von Brederlow wrote:
> LCID Fire <lcid-fire@gmx.net> writes:
>
>   
>> On my raid10 array (4 drives) I've had 2 drives (the same model) get
>> disconnected by the kernel (almost at the same time). It seems both
>> formed one raid1 pair, so one half of the raid0 is missing.
>> Afterwards I tried to re-add the drives (my bad), so now I'm stuck
>> with one half of the raid0 present and valid and the other (2
>> drives) marked as spare.
>> The thing I'd like to do is:
>> - Take 2 new drives (different manufacturers)
>> - Clone one of the valid and one of the spare drives to the new drives
>> - Try to reassemble the array.
>> Problem is - is this even possible?
>> How do I tell mdadm to not care about the spare state?
>> Does the kernel write bogus data to the still valid drives if the
>> other 2 fail?
>>
>> Would be great if someone could enlighten me ;)
>>
>> P.S.: Does someone know how to easily report the different sata errors
>> which the kernel encounters?
>>     
>
> mdadm --create --assume-clean -l 10 -n 4 /dev/mdX /dev/copied_disk_1 /dev/copied_disk2 missing missing
>
> You need to match the create parameters exactly with the ones you
> initially used (near/offset/far copies? chunk size? ...), and the order
> of devices is relevant, so you might have to shuffle the disk
> arguments. Just try different orders till the result can be mounted
> or fscked. With the wrong options the mount/fsck could screw up the
> data, but then you copy the disk again for the next try. It should be
> reasonably obvious when mount/fsck goes wrong, as it should find tons
> of errors. Mostly I would expect mount/fsck to just fail with the
> wrong mdadm args, though.
>   

May I say that this makes a great case for saving the contents of some 
files to a safe place while the system is up and running right? Maybe 
all of /etc, and at least a "tree /sys" and /proc/mdstat would be 
useful, preferably on something readable like a CD or USB flash drive, 
so you have a chance of reading it if you can't boot.

Of course a rescue flash drive is pretty useful as well, so that's 
probably the way to go.
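
For instance (a minimal sketch - /mnt/rescue stands in for wherever
the CD image or flash drive is staged):

  cat /proc/mdstat                > /mnt/rescue/mdstat.txt
  mdadm --detail --scan --verbose > /mnt/rescue/mdadm-detail.txt
  mdadm --examine /dev/sd[a-d]1   > /mnt/rescue/mdadm-examine.txt
  tree /sys                       > /mnt/rescue/sys-tree.txt
  cp -a /etc /mnt/rescue/etc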

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.




* Re: Recreate raid 10 array
  2009-04-08 21:47   ` Bill Davidsen
@ 2009-04-08 21:57     ` Andrew Burgess
  2009-04-08 22:13       ` Goswin von Brederlow
  2009-04-08 22:14       ` LCID Fire
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Burgess @ 2009-04-08 21:57 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Goswin von Brederlow, LCID Fire, linux-raid

On Wed, 2009-04-08 at 17:47 -0400, Bill Davidsen wrote:
> Goswin von Brederlow wrote:
> > mdadm --create --assume-clean -l 10 -n 4 /dev/mdX /dev/copied_disk_1 /dev/copied_disk2 missing missing
> >
> > You need to match the create parameters exactly with the ones you
> > initially used (near/offset/far copies? chunk size? ...), and the order
> > of devices is relevant, so you might have to shuffle the disk
> > arguments. Just try different orders till the result can be mounted
> > or fscked. With the wrong options the mount/fsck could screw up the
> > data, but then you copy the disk again for the next try. It should be
> > reasonably obvious when mount/fsck goes wrong, as it should find tons
> > of errors. Mostly I would expect mount/fsck to just fail with the
> > wrong mdadm args, though.

Most fscks can be told to run read-only, so they won't write to the
device, and also interactive, so they ask before writing; that way you
should be able to avoid recopying. The ext3 journal recovery violates at
least one of these IIRC (or used to), so if it's ext3 find an option to
tell it to ignore the journal.
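
Concretely, something along these lines (e2fsck's -n opens the
filesystem read-only and answers "no" to every question; ext3's noload
mount option skips the journal replay):

  e2fsck -n -f /dev/mdX
  mount -t ext3 -o ro,noload /dev/mdX /mnt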

> May I say that this makes a great case for saving the contents of some 
> files to a safe place while the system is up and running right? Maybe 
> all of /etc, and at least a "tree /sys" and /proc/mdstat would be 
> useful, preferably on something readable like a CD or USB flash drive, 
> so you have a chance of reading it if you can't boot.
> 
> Of course a rescue flash drive is pretty useful as well, so that's 
> probably the way to go.

It's a good idea.

It also seems like mdadm could be enhanced to figure stuff like this out
given intact device superblocks (I suggest --wild-ass-guess as the
option name).



* Re: Recreate raid 10 array
  2009-04-08 21:57     ` Andrew Burgess
@ 2009-04-08 22:13       ` Goswin von Brederlow
  2009-04-08 22:14       ` LCID Fire
  1 sibling, 0 replies; 11+ messages in thread
From: Goswin von Brederlow @ 2009-04-08 22:13 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: Bill Davidsen, Goswin von Brederlow, LCID Fire, linux-raid

Andrew Burgess <aab@cichlid.com> writes:

> It also seems like mdadm could be enhanced to figure stuff like this out
> given intact device superblocks (I suggest --wild-ass-guess as the
> option name)

Like (an imaginary)

mdadm --recreate --assume-clean /dev/mdX /dev/disk1 /dev/disk2

figuring out all the right parameters, the number of missing disks and
the positions of the existing ones, and then writing a new superblock?
The ultimate --assemble --force option.
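
The closest thing that actually exists today guesses nothing and
rewrites no parameters, but does assemble past stale event counts:

  mdadm --assemble --force /dev/mdX /dev/disk1 /dev/disk2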

MfG
        Goswin


* Re: Recreate raid 10 array
  2009-04-08 21:57     ` Andrew Burgess
  2009-04-08 22:13       ` Goswin von Brederlow
@ 2009-04-08 22:14       ` LCID Fire
  2009-04-09 10:47         ` Andrew Burgess
                           ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: LCID Fire @ 2009-04-08 22:14 UTC (permalink / raw)
  Cc: linux-raid

First off, the good news: I'm currently running on my raid10 again - 
with only a little data loss.

Andrew Burgess wrote:
> On Wed, 2009-04-08 at 17:47 -0400, Bill Davidsen wrote:
>> Goswin von Brederlow wrote:
>>> mdadm --create --assume-clean -l 10 -n 4 /dev/mdX /dev/copied_disk_1 /dev/copied_disk2 missing missing
>>>
>>> You need to match the create parameters exactly with the ones you
>>> initially used (near/offset/far copies? chunk size? ...), and the order
>>> of devices is relevant, so you might have to shuffle the disk
>>> arguments. Just try different orders till the result can be mounted
>>> or fscked. With the wrong options the mount/fsck could screw up the
>>> data, but then you copy the disk again for the next try. It should be
>>> reasonably obvious when mount/fsck goes wrong, as it should find tons
>>> of errors. Mostly I would expect mount/fsck to just fail with the
>>> wrong mdadm args, though.
> 
> Most fscks can be told to run read-only, so they won't write to the
> device, and also interactive, so they ask before writing; that way you
> should be able to avoid recopying. The ext3 journal recovery violates at
> least one of these IIRC (or used to), so if it's ext3 find an option to
> tell it to ignore the journal.
Too late. The journal recovery did complain quite a bit and I didn't 
know better than to have it fix the things it liked to fix.
It also showed up the problem with many apps using sqlite these days 
- they don't cope well when the database file is corrupted.

>> May I say that this makes a great case for saving the contents of some 
>> files to a safe place while the system is up and running right? Maybe 
>> all of /etc, and at least a "tree /sys" and /proc/mdstat would be 
>> useful, preferably on something readable like a CD or USB flash drive, 
>> so you have a chance of reading it if you can't boot.
>>
>> Of course a rescue flash drive is pretty useful as well, so that's 
>> probably the way to go.
Quite frankly I don't really care about / - as long as my /home is safe 
- because I can set up my machine again - but losing my work means losing 
far more time.

> It also seems like mdadm could be enhanced to figure stuff like this out
> given intact device superblocks (I suggest --wild-ass-guess as the
> option name)
That would be great (not that I'm eager to run into that again).

As a note, I did a binary comparison between the raid1 halves and got 
quite a shock. The corrupted one differed in around 1,000,000 bytes - 
something I would expect - but even the valid mirror differed in around 
20,000 bytes - which I can't explain to myself as easily.
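
For reference, a comparison along those lines could be done like this
(partition names are placeholders; cmp -l lists each differing byte,
wc -l counts them):

  cmp -l /dev/raid1_half_a /dev/raid1_half_b | wc -l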

Anyway - thanks guys for the great help.


* Re: Recreate raid 10 array
  2009-04-08 22:14       ` LCID Fire
@ 2009-04-09 10:47         ` Andrew Burgess
  2009-04-10  1:41           ` Goswin von Brederlow
  2009-04-09 22:38         ` Bill Davidsen
  2009-04-10 11:01         ` LCID Fire
  2 siblings, 1 reply; 11+ messages in thread
From: Andrew Burgess @ 2009-04-09 10:47 UTC (permalink / raw)
  To: LCID Fire; +Cc: linux-raid

On Thu, 2009-04-09 at 00:14 +0200, LCID Fire wrote:

> > Most fscks can be told to run read-only, so they won't write to the
> > device, and also interactive, so they ask before writing; that way you
> > should be able to avoid recopying. The ext3 journal recovery violates at
> > least one of these IIRC (or used to), so if it's ext3 find an option to
> > tell it to ignore the journal.

> Too late. The journal recovery did complain quite a bit and I didn't 
> know better than to have it fix the things it liked to fix.

Sorry to hear that. Actually our advice should have been to assemble the
array read-only, then all manner of fs bugs/features could not write to
it.
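
For instance (placeholder devices):

  # assemble read-only so neither md nor the filesystem can write
  mdadm --assemble --readonly /dev/mdX /dev/disk1 /dev/disk2

  # or flip an already-running array to read-only
  mdadm --readonly /dev/mdX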

If anyone ever implements the --wild-ass-guess/--recreate mdadm option
perhaps that should automatically set read-only?



* Re: Recreate raid 10 array
  2009-04-08 22:14       ` LCID Fire
  2009-04-09 10:47         ` Andrew Burgess
@ 2009-04-09 22:38         ` Bill Davidsen
  2009-04-10 11:01         ` LCID Fire
  2 siblings, 0 replies; 11+ messages in thread
From: Bill Davidsen @ 2009-04-09 22:38 UTC (permalink / raw)
  To: LCID Fire; +Cc: linux-raid

LCID Fire wrote:
> First off, the good news: I'm currently running on my raid10 again - 
> with only a little data loss.
>
> Andrew Burgess wrote:
>> On Wed, 2009-04-08 at 17:47 -0400, Bill Davidsen wrote:
>>> Goswin von Brederlow wrote:
>>>> mdadm --create --assume-clean -l 10 -n 4 /dev/mdX 
>>>> /dev/copied_disk_1 /dev/copied_disk2 missing missing
>>>>
>>>> You need to match the create parameters exactly with the ones you
>>>> initially used (near/offset/far copies? chunk size? ...), and the order
>>>> of devices is relevant, so you might have to shuffle the disk
>>>> arguments. Just try different orders till the result can be mounted
>>>> or fscked. With the wrong options the mount/fsck could screw up the
>>>> data, but then you copy the disk again for the next try. It should be
>>>> reasonably obvious when mount/fsck goes wrong, as it should find tons
>>>> of errors. Mostly I would expect mount/fsck to just fail with the
>>>> wrong mdadm args, though.
>>
>> Most fscks can be told to run read-only, so they won't write to the
>> device, and also interactive, so they ask before writing; that way you
>> should be able to avoid recopying. The ext3 journal recovery violates at
>> least one of these IIRC (or used to), so if it's ext3 find an option to
>> tell it to ignore the journal.
> Too late. The journal recovery did complain quite a bit and I didn't 
> know better than to have it fix the things it liked to fix.
> It also showed up the problem with many apps using sqlite these 
> days - they don't cope well when the database file is corrupted.
>
>>> May I say that this makes a great case for saving the contents of 
>>> some files to a safe place while the system is up and running right? 
>>> Maybe all of /etc, and at least a "tree /sys" and /proc/mdstat would 
>>> be useful, preferably on something readable like a CD or USB flash 
>>> drive, so you have a chance of reading it if you can't boot.
>>>
>>> Of course a rescue flash drive is pretty useful as well, so that's 
>>> probably the way to go.
> Quite frankly I don't really care about / - as long as my /home is 
> safe - because I can set up my machine again - but losing my work means 
> losing far more time.
>
>> It also seems like mdadm could be enhanced to figure stuff like this out
>> given intact device superblocks (I suggest --wild-ass-guess as the
>> option name)
> That would be great (not that I'm eager to run into that again).
>
> As a note, I did a binary comparison between the raid1 halves and got 
> quite a shock. The corrupted one differed in around 1,000,000 bytes 
> - something I would expect - but even the valid mirror differed in 
> around 20,000 bytes - which I can't explain to myself as easily.

My personal explanation for mismatches on raid1 swap is this: because 
software raid isn't real (hardware) raid, you don't get a mirror by 
sending the data to a controller once and letting it write the same 
page to two different drives. What you get is the kernel sending the 
same page to two drives without locking the page, so it can change as 
it's being written. That's a gross over-simplification, but I think it 
addresses the heart of the matter. Preventing that would require a 
flag which causes COW until the page had been handed to the controller 
for the last time.

The alternative I've seen proposed is that somewhere between writing the 
copies the page is deallocated, so the second (or Nth, for a more than 
two-way mirror) write is abandoned.

In any case this appears to happen because the page in memory changes 
between writes and/or not all writes are mirrored.

The interesting thing is that some systems seem to have lots of 
mismatches and some almost always none.
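
A way to measure this on a given system - md exposes a per-array
counter through sysfs (md0 is a placeholder):

  echo check > /sys/block/md0/md/sync_action   # run a consistency check
  cat /sys/block/md0/md/mismatch_cnt           # read the result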

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.




* Re: Recreate raid 10 array
  2009-04-09 10:47         ` Andrew Burgess
@ 2009-04-10  1:41           ` Goswin von Brederlow
  0 siblings, 0 replies; 11+ messages in thread
From: Goswin von Brederlow @ 2009-04-10  1:41 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: LCID Fire, linux-raid

Andrew Burgess <aab@cichlid.com> writes:

> On Thu, 2009-04-09 at 00:14 +0200, LCID Fire wrote:
>
>> > Most fscks can be told to run read-only, so they won't write to the
>> > device, and also interactive, so they ask before writing; that way you
>> > should be able to avoid recopying. The ext3 journal recovery violates at
>> > least one of these IIRC (or used to), so if it's ext3 find an option to
>> > tell it to ignore the journal.
>
>> Too late. The journal recovery did complain quite a bit and I didn't 
>> know better than to have it fix the things it liked to fix.
>
> Sorry to hear that. Actually our advice should have been to assemble the
> array read-only, then all manner of fs bugs/features could not write to
> it.

But then you couldn't mount it at all, since ext3 needs the journal
replay to happen first.

> If anyone ever implements the --wild-ass-guess/--recreate mdadm option
> perhaps that should automatically set read-only?

MfG
        Goswin


* Re: Recreate raid 10 array
  2009-04-08 22:14       ` LCID Fire
  2009-04-09 10:47         ` Andrew Burgess
  2009-04-09 22:38         ` Bill Davidsen
@ 2009-04-10 11:01         ` LCID Fire
  2009-04-10 14:25           ` LCID Fire
  2 siblings, 1 reply; 11+ messages in thread
From: LCID Fire @ 2009-04-10 11:01 UTC (permalink / raw)
  To: linux-raid

LCID Fire wrote:
> First off, the good news: I'm currently running on my raid10 again - 
> with only a little data loss.
And now some bad news again :(
Apparently something is wrong.
mount /dev/md0 /mnt
complains
mount: unknown filesystem type 'linux_raid_member'

On older kernels it worked when specifying ext3 as the filesystem type 
- but with a recent 2.6.28 kernel/mdadm it doesn't anymore.


* Re: Recreate raid 10 array
  2009-04-10 11:01         ` LCID Fire
@ 2009-04-10 14:25           ` LCID Fire
  0 siblings, 0 replies; 11+ messages in thread
From: LCID Fire @ 2009-04-10 14:25 UTC (permalink / raw)
  To: linux-raid

LCID Fire wrote:
> And now some bad news again :(
> Apparently something is wrong.
> mount /dev/md0 /mnt
> complains
> mount: unknown filesystem type 'linux_raid_member'
> 
> On older kernels it worked when specifying ext3 as the filesystem type 
> - but with a recent 2.6.28 kernel/mdadm it doesn't anymore.
Seems like I had accidentally added /dev/md0 to another (temporary) raid 
device. After removing the superblock it works fine now.
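
For the record, the diagnosis and fix would look roughly like this
(the stale member superblock is what made mount see the wrong type):

  blkid /dev/md0                    # reported TYPE="linux_raid_member"
  mdadm --zero-superblock /dev/md0  # drop the stale member superblock
  mount /dev/md0 /mnt               # detected as ext3 again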


end of thread

Thread overview: 11+ messages
2009-04-06 19:49 Recreate raid 10 array LCID Fire
2009-04-07  6:13 ` Goswin von Brederlow
2009-04-08 21:47   ` Bill Davidsen
2009-04-08 21:57     ` Andrew Burgess
2009-04-08 22:13       ` Goswin von Brederlow
2009-04-08 22:14       ` LCID Fire
2009-04-09 10:47         ` Andrew Burgess
2009-04-10  1:41           ` Goswin von Brederlow
2009-04-09 22:38         ` Bill Davidsen
2009-04-10 11:01         ` LCID Fire
2009-04-10 14:25           ` LCID Fire
