linux-raid.vger.kernel.org archive mirror
* Please help me save my data
@ 2006-09-08 13:36 martin.kihlgren
  0 siblings, 0 replies; 7+ messages in thread
From: martin.kihlgren @ 2006-09-08 13:36 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb


Hello list.

I have a spot of trouble with a RAID5 array of mine, and I thought maybe you could help me.

This is the story so far:

 * I bought 10 external USB drives. This seemed like a good idea, they are
   cheap, they are hot-pluggable and they are fast enough.
 * I set them up in two RAID5 arrays, which I set up as LVM pv's. Then I
   created an LVM vg out of these and an LVM lv out of the vg.
 * I encrypted this lv and formatted it with an xfs fs.

This all worked perfectly fine, until I realized how badly these drives and this USB controller work with ehci_hcd.

In short, the devices get reset all the time. And each time they get reset, everything stops for a while. Nothing strange and no showstopper here.

But the really bad thing is when they reset in some extra-bad way, and get dropped completely. What happens then is that the RAID5 system drops them, and they get reincarnated with a new device name. /dev/sdi suddenly becomes /dev/sdn or something similarly horrible.

And since I (up until yesterday) didn't know about write-intent bitmaps, each resync took around 10 hours. Plenty of time for ANOTHER disk to fail and get dropped.

This I usually solved by doing mdadm -S and then mdadm -A -f.

Yesterday, however, I was feeling extra clever, and I just did mdadm -a /dev/md1 /dev/sdn1.

This was a huge mistake.

What had happened, I now realized, was this:

 * /dev/md1 is fine
 * /dev/sdX1 drops, and /dev/md1 is degraded
 * I re-add /dev/sdX1 in its new guise, and /dev/md1 is resyncing with
   4 working drives and one spare
 * /dev/sdY1 drops, and /dev/md1 stops
 * I re-add /dev/sdY1 in its new guise, and mdadm marks it as a SPARE.
 * I suddenly have an array with 3 working drives and 2 spares where
   I know that one spare is in fact synced and ready to go, since
   the array stopped the moment it failed.

Also, I no longer know WHERE in the array the synced but
spare-marked drive should go. I know that the working drives are 0, 2 and
4, but not where the synced spare drive should go.

So, what I want to do is:

 * Mark the synced spare drive as working and in position 1
 * Assemble the array without the unsynced spare and check if this
   provides consistent data
 * If it doesn't, I want to mark the synced spare as working and in
   position 3, and try the same thing again
 * When I have it working, I just want to add the unsynced spare and
   let it sync normally
 * Then I will create a write-intent bitmap to avoid the dangerously
   long sync times, and also buy a new USB controller hoping that it
   will solve my problems

So, do you guys have any idea how I can do this? mdadm doesn't support changing the superblock in such a freehand manner...

Please help me save this data :/ It is precious to me :.(

regards,
//Martin Kihlgren


* Re: Please help me save my data
@ 2006-09-11 12:16 martin.kihlgren
  2006-09-11 13:10 ` Patrick Hoover
  0 siblings, 1 reply; 7+ messages in thread
From: martin.kihlgren @ 2006-09-11 12:16 UTC (permalink / raw)
  To: linux-raid

> On 9/8/06, martin.kihlgren@adocca.com <martin.kihlgren@adocca.com> wrote:
>> So, what I want to do is:
>>
>>  * Mark the synced spare drive as working and in position 1
>>  * Assemble the array without the unsynced spare and check if this
>>    provides consistent data
>>  * If it didnt, I want to mark the synced spare as working and in
>>    position 3, and try the same thing again
>>  * When I have it working, I just want to add the unsynced spare and
>>    let it sync normally
>>  * Then I will create a write-intent bitmap to avoid the dangerously
>>    long sync times, and also buy a new USB controller hoping that it
>>    will solve my problems
>
> You can recreate the raid array with 1 missing disk, like this:
>
> mdadm -C /dev/md1 /dev/sdn1 /dev/sdX1 /dev/sdn1 /dev/sdn1 missing
>
> The ordering is relevant, raid-disks 0,1,2,3,4 or so. Beware, you have
> to have block size and symmetry correct, so better back up mdadm
> --examine and --detail output beforehand.
>
> This create op causes no sync (no danger of data overwrites), as there is
> still the one drive missing, but raid-superblocks are rewritten.
>
> (On a sidenote, I'm uncertain if a bitmap helps in the case of
> single-device remove-add cycle? I thought it was only for crashes, at
> least for now..)
>

Thanks for your help! Your advice is good, and I will use it next time.

This time I found an old USB memory stick to experiment with, and managed
to do pretty much the same thing with:

mdadm -C -l 5 -n 5 -f -e 1.2 --assume-clean /dev/md1 /dev....
mdadm -f /dev/md1 /dev/borken_device
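
For anyone following along, here is a minimal dry-run sketch of trying the synced spare in both candidate slots. It only prints the commands and never runs mdadm. The options are the long forms of the ones used above, and the `missing` keyword comes from the quoted advice. The device names (/dev/sdA1 and so on) and the helper name candidate_cmds are hypothetical placeholders. Chunk size, layout and metadata version would have to match the original array, as the quoted reply warns, so treat this as a sketch under those assumptions rather than a recipe.

```shell
#!/bin/sh
# Dry run: print (never execute) one "mdadm --create" command per candidate
# slot for the synced-but-spare-marked drive.
# All device names used here are hypothetical placeholders.

# candidate_cmds MD SLOT0 SLOT2 SLOT4 SYNCED
# The known-good drives sit in slots 0, 2 and 4. SYNCED is tried in slot 1,
# then in slot 3. The other open slot is given as "missing", so the array
# would come up degraded and no resync would be started.
candidate_cmds() {
    md=$1 d0=$2 d2=$3 d4=$4 synced=$5
    for slot in 1 3; do
        if [ "$slot" -eq 1 ]; then
            s1=$synced s3=missing
        else
            s1=missing s3=$synced
        fi
        echo "mdadm --create $md --level=5 --raid-devices=5 --metadata=1.2 --assume-clean $d0 $s1 $d2 $s3 $d4"
    done
}

candidate_cmds /dev/md1 /dev/sdA1 /dev/sdB1 /dev/sdC1 /dev/sdS1
```

Each printed command would be tried in turn, followed by a read-only filesystem check to see whether that ordering gives consistent data. Only once an ordering checks out would the unsynced drive be added back.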

And yes, the ordering was very relevant. An xfs_check showed me which
ordering was correct, however. But I still have a problem with not easily
knowing what physical drive is what raid device, since USB devices get
ordered in some random way.
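
One possible way to keep track of which drive is which, assuming a udev-based system that populates /dev/disk/by-id: the names there are normally built from the drive's model and serial number, so they stay the same when the kernel name changes from /dev/sdi to /dev/sdn. A minimal sketch follows (the function name list_stable_names is made up for illustration, and readlink -f assumes GNU coreutils):

```shell
#!/bin/sh
# Print "current-kernel-device<TAB>persistent-name" for every symlink in a
# by-id style directory. Defaults to /dev/disk/by-id, which is assumed to be
# populated by udev on the system in question.
list_stable_names() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (this also covers the case
        # where the glob matched nothing and stayed literal).
        [ -L "$link" ] || continue
        printf '%s\t%s\n' "$(readlink -f "$link")" "${link##*/}"
    done
}

list_stable_names /dev/disk/by-id | sort
```

The persistent name can then be noted next to the raid device number reported by mdadm --examine for that member, so the mapping survives the next USB reset.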

And no, the bitmap didn't help in this case (it has happened again but with
only one disk)... I wish my USB worked better, but I guess it's a question
of time and kernel development.

Thanks anyhow!
regards,
//Martin



end of thread, other threads:[~2006-09-16 23:53 UTC | newest]

Thread overview: 7+ messages
2006-09-08 13:36 Please help me save my data martin.kihlgren
  -- strict thread matches above, loose matches on Subject: below --
2006-09-11 12:16 martin.kihlgren
2006-09-11 13:10 ` Patrick Hoover
2006-09-16 13:14   ` Molle Bestefich
2006-09-16 23:53     ` martin.kihlgren
2006-09-08 13:26 martin.kihlgren
2006-09-09 16:00 ` Tuomas Leikola
