* Please help- raid1 recovery after disk failure
@ 2004-10-18 22:56 Konstantin Olchanski
2004-10-18 23:22 ` Mike Tran
2004-10-18 23:28 ` Guy
0 siblings, 2 replies; 5+ messages in thread
From: Konstantin Olchanski @ 2004-10-18 22:56 UTC (permalink / raw)
To: linux-raid
Dear Linux raiders- I ran into a problem with raid1 recovery after
a disk failure (running Fedora2, kernel 2.6.8-1.521smp).
1) I had a raid1 filesystem mirrored across /dev/hda2 and /dev/hdc2.
2) Disk hda died (unreadable sectors, failing SMART tests).
3) A new blank hda was installed and partitioned exactly like hdc.
4) I cannot restart and rebuild the raid1 volume because hdc2 is
in a funny "spare" state (see below).
How do I mark hdc2 as "active"?
Once it is "active", I assume I will then be able to restart md0 and
hot-add /dev/hda2 as usual. (And the mirror will resync and rebuild
itself? Hopefully?)
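(Concretely, assuming md0 can be assembled degraded from hdc2 alone,
I expect the sequence to look something like:

  mdadm --assemble --run /dev/md0 /dev/hdc2   # start the degraded mirror from the good half
  mdadm /dev/md0 -a /dev/hda2                 # hot-add the new partition; resync should start

but right now the "spare" state gets in the way.)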
[root@tw04 root]# mdadm -E /dev/hdc2
/dev/hdc2:
Magic : a92b4efc
Version : 00.90.00
UUID : aade8782:20122089:4f496788:228d85b9
Creation Time : Fri Oct 8 17:12:56 2004
Raid Level : raid1
Device Size : 124158208 (118.41 GiB 127.14 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Mon Oct 18 06:04:35 2004
State : clean, no-errors
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
Checksum : 7041db42 - correct
Events : 0.312429
Number Major Minor RaidDevice State
this 2 22 2 2 spare /dev/hdc2
0 0 3 2 0 active sync /dev/hda2
1 1 0 0 1 faulty removed
2 2 22 2 2 spare /dev/hdc2
[root@tw04 root]#
--
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
* Re: Please help- raid1 recovery after disk failure
2004-10-18 22:56 Please help- raid1 recovery after disk failure Konstantin Olchanski
@ 2004-10-18 23:22 ` Mike Tran
2004-10-27 4:20 ` Konstantin Olchanski
2004-10-18 23:28 ` Guy
1 sibling, 1 reply; 5+ messages in thread
From: Mike Tran @ 2004-10-18 23:22 UTC (permalink / raw)
To: linux-raid
I would re-create the md0 array with a missing disk, as follows:
mdadm -C /dev/md0 -l 1 -n 2 /dev/hdc2 missing
Later you can hot add a disk to make it a normal 2-way mirror array.
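The hot-add step would be something like (assuming the replacement
comes back as hda):

  mdadm /dev/md0 -a /dev/hda2   # resync onto the new disk starts automatically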
--
Regards,
Mike T.
* RE: Please help- raid1 recovery after disk failure
2004-10-18 22:56 Please help- raid1 recovery after disk failure Konstantin Olchanski
2004-10-18 23:22 ` Mike Tran
@ 2004-10-18 23:28 ` Guy
2004-10-18 23:31 ` Guy
1 sibling, 1 reply; 5+ messages in thread
From: Guy @ 2004-10-18 23:28 UTC (permalink / raw)
To: 'Konstantin Olchanski', linux-raid
You must remove the bad disk first:
mdadm -r /dev/md0 /dev/hda2
Then add the new disk:
mdadm -a /dev/md0 /dev/hda2
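You can watch the rebuild afterwards with, e.g.:

  cat /proc/mdstat   # shows [U_] while resyncing, [UU] once the mirror is whole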
Why are you using the obsolete raidtools, which includes hot-add?
Is Red Hat still using the old stuff?
Guy
* RE: Please help- raid1 recovery after disk failure
2004-10-18 23:28 ` Guy
@ 2004-10-18 23:31 ` Guy
0 siblings, 0 replies; 5+ messages in thread
From: Guy @ 2004-10-18 23:31 UTC (permalink / raw)
To: 'Guy', 'Konstantin Olchanski', linux-raid
Oops! hdc2!
Hopefully you know which disk was replaced! :)
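If in doubt, mdadm -E on each partition shows which slot and state the
device itself claims, e.g.:

  mdadm -E /dev/hda2   # a freshly partitioned disk has no md superblock, which itself marks it as the replacement
  mdadm -E /dev/hdc2   # the "this" line shows the slot/state this device claims for itself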
* Re: Please help- raid1 recovery after disk failure
2004-10-18 23:22 ` Mike Tran
@ 2004-10-27 4:20 ` Konstantin Olchanski
0 siblings, 0 replies; 5+ messages in thread
From: Konstantin Olchanski @ 2004-10-27 4:20 UTC (permalink / raw)
To: Mike Tran; +Cc: linux-raid
On Mon, Oct 18, 2004 at 06:22:05PM -0500, Mike Tran wrote:
> I would re-create md0 array with a missing disk as follows:
> mdadm -C /dev/md0 -l 1 -n 2 /dev/hdc2 missing
> Later you can hot add a disk to make it a normal 2-way mirror array.
Thanks for all the responses and suggestions- I was able to rebuild
my raid1 (mirror) array without losing any data. For the record,
this is what I did:
0) /dev/hdc2 (half-mirror) mounted as "/"
1) mdadm -C /dev/md0 ... /dev/hdc2 ...
did not work; it says "hdc2" is busy. Maybe it was for the better.
2) reboot into the Fedora2 rescue CD, get the rescue root shell
3) make sure hdc2 is not mounted, md0 is not active (they are not)
4) mdadm -C /dev/md0 -l1 -n2 -c256 /dev/hdc2 missing
(may have warned about something)
5) mdadm --run /dev/md0, cat /proc/mdstat, mount /dev/md0 /mnt/tmp,
edit grub.conf (root=/dev/hdc2 -> /dev/md0), edit fstab (hdc2 -> md0).
Note: /proc/mdstat shows two device slots with status [U_].
6) umount /dev/md0, mdadm --stop /dev/md0
7) reset, remove rescue CD
8) boot from the hard disk, note: md0 started, mounted as "/".
9) mdadm /dev/md0 -a /dev/hda2, note: resync starts automatically
10) wait for the resync to complete; 160 GBytes took about 90 minutes
11) /proc/mdstat shows status [UU].
12) shutdown, move loose disks into enclosures, close the box
13) boot: md0 comes up, status [UU], I am in business until the next
spurious read error (I am too lazy to roll a custom patched kernel,
I would rather wait until Red Hat applies the raid1 patches fixing
bug 136485).
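For the archives, the recipe condenses to roughly this (device names
as in my setup):

  # from the rescue CD, with nothing mounted and md0 stopped:
  mdadm -C /dev/md0 -l1 -n2 -c256 /dev/hdc2 missing   # re-create degraded; data on hdc2 survives
  mount /dev/md0 /mnt/tmp       # then point grub.conf and fstab at md0 instead of hdc2
  # reboot from the hard disk, then:
  mdadm /dev/md0 -a /dev/hda2   # hot-add the new disk; resync starts by itself
  cat /proc/mdstat              # wait for [UU]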
K.O.
--
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada