linux-raid.vger.kernel.org archive mirror
* Problems recovering from a raid1 failure
From: Jonathan Gordon @ 2010-03-12  7:51 UTC
  To: linux-raid

Upon reboot, my machine began recovering from a raid1 failure.
Querying mdadm yielded the following:

jgordon@kubuntu:~$ sudo mdadm --detail /dev/md0
[sudo] password for jgordon:
/dev/md0:
       Version : 00.90
 Creation Time : Mon Sep 11 06:35:17 2006
    Raid Level : raid1
    Array Size : 242187776 (230.97 GiB 248.00 GB)
 Used Dev Size : 242187776 (230.97 GiB 248.00 GB)
  Raid Devices : 2
 Total Devices : 2
Preferred Minor : 0
   Persistence : Superblock is persistent

   Update Time : Thu Mar 11 18:09:25 2010
         State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
 Spare Devices : 1

 Rebuild Status : 26% complete

          UUID : 7fd22081:c39cb3e4:21109eec:10ecdf10
        Events : 0.5260272

   Number   Major   Minor   RaidDevice State
      2       8        1        0      spare rebuilding   /dev/sda1
      1       8       17        1      active sync   /dev/sdb1

After some time, the rebuild seemed to complete, but the State kept
alternating between "active, degraded" and "clean, degraded".
Additionally, /dev/sda1 appears to be stuck in the "spare
rebuilding" state. This is the current output:

jgordon@kubuntu:~$ sudo mdadm -D /dev/md0
[sudo] password for jgordon:
/dev/md0:
       Version : 00.90
 Creation Time : Mon Sep 11 06:35:17 2006
    Raid Level : raid1
    Array Size : 242187776 (230.97 GiB 248.00 GB)
 Used Dev Size : 242187776 (230.97 GiB 248.00 GB)
  Raid Devices : 2
 Total Devices : 2
Preferred Minor : 0
   Persistence : Superblock is persistent

   Update Time : Thu Mar 11 23:07:59 2010
         State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
 Spare Devices : 1

          UUID : 7fd22081:c39cb3e4:21109eec:10ecdf10
        Events : 0.5273340

   Number   Major   Minor   RaidDevice State
      2       8        1        0      spare rebuilding   /dev/sda1
      1       8       17        1      active sync   /dev/sdb1

Additionally, /var/log/kern.log is getting filled with the following:

Mar 11 19:19:14 jigme kernel: [ 6596.236366] ata4: EH complete
Mar 11 19:19:16 jigme kernel: [ 6598.104676] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 11 19:19:16 jigme kernel: [ 6598.104683] ata4.00: BMDMA stat 0x24
Mar 11 19:19:16 jigme kernel: [ 6598.104692] ata4.00: cmd 25/00:08:ff:b0:e0/00:00:15:00:00/e0 tag 0 dma 4096 in
Mar 11 19:19:16 jigme kernel: [ 6598.104694]          res 51/40:00:04:b1:e0/40:00:15:00:00/e0 Emask 0x9 (media error)
Mar 11 19:19:16 jigme kernel: [ 6598.104698] ata4.00: status: { DRDY ERR }
Mar 11 19:19:16 jigme kernel: [ 6598.104702] ata4.00: error: { UNC }
Mar 11 19:19:16 jigme kernel: [ 6598.120352] ata4.00: configured for UDMA/133
Mar 11 19:19:16 jigme kernel: [ 6598.120371] sd 3:0:0:0: [sdb] Unhandled sense code
Mar 11 19:19:16 jigme kernel: [ 6598.120375] sd 3:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 11 19:19:16 jigme kernel: [ 6598.120380] sd 3:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Mar 11 19:19:16 jigme kernel: [ 6598.120388] Descriptor sense data with sense descriptors (in hex):
Mar 11 19:19:16 jigme kernel: [ 6598.120392]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Mar 11 19:19:16 jigme kernel: [ 6598.120412]         15 e0 b1 04
Mar 11 19:19:16 jigme kernel: [ 6598.120420] sd 3:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
Mar 11 19:19:16 jigme kernel: [ 6598.120428] end_request: I/O error, dev sdb, sector 367046916
Mar 11 19:19:16 jigme kernel: [ 6598.120446] ata4: EH complete
Mar 11 19:19:16 jigme kernel: [ 6598.120744] raid1: sdb: unrecoverable I/O read error for block 367046784
Mar 11 19:19:17 jigme kernel: [ 6599.164052] md: md0: recovery done.
Mar 11 19:19:17 jigme kernel: [ 6599.460124] RAID1 conf printout:
Mar 11 19:19:17 jigme kernel: [ 6599.460145]  --- wd:1 rd:2
Mar 11 19:19:17 jigme kernel: [ 6599.460160]  disk 0, wo:1, o:1, dev:sda1
Mar 11 19:19:17 jigme kernel: [ 6599.460170]  disk 1, wo:0, o:1, dev:sdb1
Mar 11 19:19:17 jigme kernel: [ 6599.460178] RAID1 conf printout:
Mar 11 19:19:17 jigme kernel: [ 6599.460185]  --- wd:1 rd:2
Mar 11 19:19:17 jigme kernel: [ 6599.460195]  disk 0, wo:1, o:1, dev:sda1
Mar 11 19:19:17 jigme kernel: [ 6599.460204]  disk 1, wo:0, o:1, dev:sdb1
Mar 11 19:19:22 jigme kernel: [ 6604.165111] RAID1 conf printout:
Mar 11 19:19:22 jigme kernel: [ 6604.165117]  --- wd:1 rd:2
Mar 11 19:19:22 jigme kernel: [ 6604.165122]  disk 0, wo:1, o:1, dev:sda1
Mar 11 19:19:22 jigme kernel: [ 6604.165125]  disk 1, wo:0, o:1, dev:sdb1
Mar 11 19:19:22 jigme kernel: [ 6604.165128] RAID1 conf printout:
Mar 11 19:19:22 jigme kernel: [ 6604.165131]  --- wd:1 rd:2
Mar 11 19:19:22 jigme kernel: [ 6604.165134]  disk 0, wo:1, o:1, dev:sda1
Mar 11 19:19:22 jigme kernel: [ 6604.165137]  disk 1, wo:0, o:1, dev:sdb1
...
Mar 11 23:16:28 jigme kernel: [20830.889380] RAID1 conf printout:
Mar 11 23:16:28 jigme kernel: [20830.889386]  --- wd:1 rd:2
Mar 11 23:16:28 jigme kernel: [20830.889391]  disk 0, wo:1, o:1, dev:sda1
Mar 11 23:16:28 jigme kernel: [20830.889394]  disk 1, wo:0, o:1, dev:sdb1
Mar 11 23:16:28 jigme kernel: [20830.889397] RAID1 conf printout:
Mar 11 23:16:28 jigme kernel: [20830.889399]  --- wd:1 rd:2
Mar 11 23:16:28 jigme kernel: [20830.889403]  disk 0, wo:1, o:1, dev:sda1
Mar 11 23:16:28 jigme kernel: [20830.889406]  disk 1, wo:0, o:1, dev:sdb1

The "RAID1 conf printout:" messages appear every few seconds or so.

Machine info:

jgordon@kubuntu:~$ uname -a
Linux kubuntu 2.6.31-20-386 #57-Ubuntu SMP Mon Feb 8 11:42:49 UTC 2010 i686 GNU/Linux

Any idea what I can do to resolve this?

Thanks!


* Re: Problems recovering from a raid1 failure
From: Michael Evans @ 2010-03-12  8:17 UTC
  To: Jonathan Gordon; +Cc: linux-raid

On Thu, Mar 11, 2010 at 11:51 PM, Jonathan Gordon
<jonathan.kinobe@gmail.com> wrote:
> Upon reboot, my machine began recovering from a raid1 failure.
> Querying mdadm yielded the following:
> [...]
> Any idea what I can do to resolve this?

Replace your failing disk; from the look of the kernel log and the
description of the issue I'd say your drive is out of spare sectors
and would fail a S.M.A.R.T. test.

If you require more proof, read up on how to use the smartctl
command from the smartmontools package (the package name may be
spelled with dashes etc. in your package manager).

http://sourceforge.net/apps/trac/smartmontools/wiki/TocDoc
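
For example (a minimal sketch; I'm assuming the failing drive is
/dev/sdb, as in your kernel log):

  # overall health verdict plus the attribute and error tables
  sudo smartctl -H -a /dev/sdb

  # queue a long offline self-test; read the result later with -a
  sudo smartctl -t long /dev/sdb

Non-zero raw values for Reallocated_Sector_Ct and
Current_Pending_Sector would line up with the "auto reallocate
failed" errors in your log.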


* Re: Problems recovering from a raid1 failure
From: Michael Evans @ 2010-03-12  8:22 UTC
  To: Jonathan Gordon; +Cc: linux-raid

On Fri, Mar 12, 2010 at 12:17 AM, Michael Evans <mjevans1983@gmail.com> wrote:
> On Thu, Mar 11, 2010 at 11:51 PM, Jonathan Gordon
> <jonathan.kinobe@gmail.com> wrote:
>> Upon reboot, my machine began recovering from a raid1 failure.
>> Querying mdadm yielded the following:
>> [...]
>> Any idea what I can do to resolve this?
>
> Replace your failing disk; from the look of the kernel log and the
> description of the issue I'd say your drive is out of spare sectors
> and would fail a S.M.A.R.T. test.
> [...]

Reading more carefully, I notice you're in a most unfortunate
situation: the drive that's failing is /dev/sdb, your only ACTIVE
disk, and it's failing on READs.

You should buy two NEW drives (one to receive the copy, and one to
rebuild the mirror onto afterwards) and attempt to copy the current
active member of the array to a NEW drive using something like GNU
ddrescue:
http://www.gnu.org/software/ddrescue/ddrescue.html
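
Something like this, for instance (a sketch only; I'm assuming the
new disk shows up as /dev/sdc with a partition /dev/sdc1 at least as
large as sdb1):

  # first pass: copy everything readable, skipping the bad areas
  sudo ddrescue -f -n /dev/sdb1 /dev/sdc1 sdb1-rescue.log

  # second pass: go back and retry just the bad sectors
  sudo ddrescue -f -r3 /dev/sdb1 /dev/sdc1 sdb1-rescue.log

The logfile lets ddrescue resume where it left off, and it records
exactly which sectors could not be recovered.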

For anything that won't copy, you MIGHT try the other member of the
array (/dev/sda1) if it used to be in place; chances are good that
stale data from a sector MAY be better than no data at all. You
MUST perform an fsck prior to mounting the newly re-created array,
and you should try to determine which files occupy any 'recovered'
sectors and either verify they are intact or replace them from
other copies.
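
Roughly, once the copy is done (again just a sketch; I'm assuming
the rescued copy lives on /dev/sdc1 and the filesystem is ext3):

  # the v0.90 superblock sits near the end of the partition, so a
  # full-partition copy brings it along; start the array degraded
  sudo mdadm --assemble --run /dev/md0 /dev/sdc1

  # force a full filesystem check before mounting anything
  sudo e2fsck -f /dev/md0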

How to do that depends very much on what filesystem you have on the array.
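
For ext2/ext3, for instance, debugfs can do the mapping (the numbers
below are made up; take the unrecovered byte offsets from the
ddrescue logfile and divide by the filesystem block size to get
block numbers):

  # which inode owns filesystem block 45880864?
  sudo debugfs -R "icheck 45880864" /dev/md0

  # which pathname does that inode (say 123456) correspond to?
  sudo debugfs -R "ncheck 123456" /dev/md0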

