HELP! my raid5 ate my data!

linux-raid.vger.kernel.org archive mirror

* HELP! my raid5 ate my data!
From: Morgan Wahl @ 2008-10-21 16:15 UTC
  To: linux-raid

Yesterday I had a problem with the SATA bus one of my drives is on,
which I've since fixed. I have a raid5 on four 20GB partitions (each
on its own drive, of course). The one that failed was sdc1. When I
rebooted, the raid was no longer active (according to `mdadm -Q
/dev/md0`). I ran `mdadm --examine` on each drive. For sdc1 it said
everything was fine, but on the other three drives sdc1 was marked as
failed. I've never recovered from a drive failure before and I don't
think I did it correctly. I removed sdc1 from the raid and then
incrementally added it again. (I now realize I should've started the
raid with it removed and done a backup.) I started the raid again with
all four drives, put it in read-only mode and tried to mount it,
but it presumably still isn't set up right, since it refuses to mount
and `e2fsck -n` returns myriad errors.

Did starting the raid with the bad disk destroy everything? Did it
only destroy a little (assuming fsck can get the filesystem back into
a usable state)? How can I even find out what's wrong? I do have a
separate terabyte disk that I've copied images of the disks to, so I
can perform experiments on them, but I'm not sure what to do.

Help! Please!

   -Morgan

* Re: HELP! my raid5 ate my data!
From: Raz @ 2008-10-21 16:31 UTC
  To: Morgan Wahl; +Cc: linux-raid

Please provide the exact sequence of operations. What mdadm command
did you use at each phase? Did you try to mount it rw?

* Re: HELP! my raid5 ate my data!
From: Morgan Wahl @ 2008-10-21 16:40 UTC
  To: linux-raid

These lines are the only uncommented ones in mdadm.conf:
DEVICE /dev/sd[abcd]1
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1

mdadm -Q /dev/md0
mdadm --examine /dev/sda1
mdadm --examine /dev/sdb1
mdadm --examine /dev/sdc1
mdadm --examine /dev/sdd1

mdadm /dev/md0 --remove /dev/sdc1

mdadm -A /dev/md0

mdadm --readonly /dev/md0

mount /dev/md0 /mnt/temp
(this returns an error)
e2fsck -n /dev/md0
(lots of errors)

* Re: HELP! my raid5 ate my data!
From: Raz @ 2008-10-21 16:44 UTC
  To: Morgan Wahl; +Cc: linux-raid

What does `cat /proc/mdstat` show?

* Re: HELP! my raid5 ate my data!
From: Raz @ 2008-10-21 16:54 UTC
  To: Morgan Wahl; +Cc: Linux RAID Mailing List

When I have these problems I:
1. Remove the failed hard drive from the computer.
2. Boot.
3. If the raid assembles, I mount it. If it is not assembled, I mount manually.
I never use ro, only rw.

On Tue, Oct 21, 2008 at 6:46 PM, Morgan Wahl <morgy.wahl@gmail.com> wrote:
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
>      58604736 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
>
> unused devices: <none>
>
> The raid still seems to activate whenever I start my computer; that's
> why it's active right now.
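
A minimal sketch for reading the status line quoted above, assuming a
POSIX shell with awk. `md_status` is a hypothetical helper name, not an
mdadm command. In `[4/4] [UUUU]` the first pair is the number of member
devices the array should have and the number currently in use; each `U`
is a member that is up, and a `_` would mark a missing one.

```shell
# Sketch only: md_status is a hypothetical helper, not part of mdadm.
# Reads /proc/mdstat-style text on stdin and prints one line per array:
#   <array> <total>/<in-use> <member-map>
md_status() {
    awk '
        # Remember the array name from lines such as "md0 : active raid5 ..."
        /^md[0-9]+ :/ { array = $1 }
        # The following status line ends with e.g. "[4/4] [UUUU]".
        array != "" && match($0, /\[[0-9]+\/[0-9]+\] \[[U_]+\]/) {
            split(substr($0, RSTART, RLENGTH), f, " ")
            gsub(/\[|\]/, "", f[1])
            gsub(/\[|\]/, "", f[2])
            print array, f[1], f[2]
            array = ""
        }
    '
}
```

Run as `md_status < /proc/mdstat`; for the output quoted above it would
print `md0 4/4 UUUU`. That only says all four members are in use. It
says nothing about whether the data on them is consistent.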

* Re: HELP! my raid5 ate my data!
From: Neil Brown @ 2008-10-25  6:49 UTC
  To: Morgan Wahl; +Cc: linux-raid

On Tuesday October 21, morgy.wahl@gmail.com wrote:
> these lines are the only uncommented ones in mdadm.conf:
> DEVICE /dev/sd[abcd]1
> ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1
> 
> mdadm -Q /dev/md0
> mdadm --examine /dev/sda1
> mdadm --examine /dev/sdb1
> mdadm --examine /dev/sdc1
> mdadm --examine /dev/sdd1
> 
> mdadm /dev/md0 --remove /dev/sdc1

If the array was not running (as I think was implied in your other
email) then this will have done nothing.

> 
> mdadm -A /dev/md0

This should have worked.  What do you get from

  mdadm -Av /dev/md0
??

> 
> mdadm --readonly /dev/md0

This is a sensible precaution.

> 
> mount /dev/md0 /mnt/temp
> (this return an error)
> e2fsck -n /dev/md0
> (lots of errors)

It does sound like it is a mess, but there is nothing obvious in
your description which would have caused it.
Maybe if you could include the output of all the "mdadm --examine"
commands?

NeilBrown
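
For anyone gathering the output requested here, a small sketch, assuming
a POSIX shell and grep. `examine_summary` is a hypothetical helper name,
not part of mdadm. It trims an `mdadm --examine` report to the update
time, state and event count, which are the fields most likely to show
where the members disagree. The full, unfiltered output is still what
the list needs to see.

```shell
# Sketch only: examine_summary is a hypothetical helper, not part of mdadm.
# Filters `mdadm --examine` output on stdin down to the update time,
# state and event-count lines.
examine_summary() {
    grep -E '^[[:space:]]*(Update Time|State|Events) :'
}

# Usage (needs root; device names are the ones from the mdadm.conf above):
#   for dev in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
#       echo "== $dev"
#       mdadm --examine "$dev" | examine_summary
#   done
```

A member whose event count lags behind the others is the one md treats
as stale.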
