From: Michael Evans <mjevans1983@gmail.com>
To: Jonathan Gordon <jonathan.kinobe@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Problems recovering from a raid1 failure
Date: Fri, 12 Mar 2010 00:17:08 -0800
Message-ID: <4877c76c1003120017n34cb170awac2260d1abf1a7a@mail.gmail.com>
In-Reply-To: <910019881003112351y2fb108abw55fef9da81a5435f@mail.gmail.com>
On Thu, Mar 11, 2010 at 11:51 PM, Jonathan Gordon
<jonathan.kinobe@gmail.com> wrote:
> Upon reboot, my machine began recovering from a raid1 failure.
> Querying mdadm yielded the following:
>
> jgordon@kubuntu:~$ sudo mdadm --detail /dev/md0
> [sudo] password for jgordon:
> /dev/md0:
> Version : 00.90
> Creation Time : Mon Sep 11 06:35:17 2006
> Raid Level : raid1
> Array Size : 242187776 (230.97 GiB 248.00 GB)
> Used Dev Size : 242187776 (230.97 GiB 248.00 GB)
> Raid Devices : 2
> Total Devices : 2
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Thu Mar 11 18:09:25 2010
> State : clean, degraded, recovering
> Active Devices : 1
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 1
>
> Rebuild Status : 26% complete
>
> UUID : 7fd22081:c39cb3e4:21109eec:10ecdf10
> Events : 0.5260272
>
> Number Major Minor RaidDevice State
> 2 8 1 0 spare rebuilding /dev/sda1
> 1 8 17 1 active sync /dev/sdb1
>
> After some time, the rebuild seemed to complete, but the State seemed
> to switch alternately between "active, degraded" and "clean,
> degraded". Additionally, the state for /dev/sda1 seems to continue to
> stay in "spare rebuilding". This is the current output:
>
> jgordon@kubuntu:~$ sudo mdadm -D /dev/md0
> [sudo] password for jgordon:
> /dev/md0:
> Version : 00.90
> Creation Time : Mon Sep 11 06:35:17 2006
> Raid Level : raid1
> Array Size : 242187776 (230.97 GiB 248.00 GB)
> Used Dev Size : 242187776 (230.97 GiB 248.00 GB)
> Raid Devices : 2
> Total Devices : 2
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Thu Mar 11 23:07:59 2010
> State : clean, degraded
> Active Devices : 1
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 1
>
> UUID : 7fd22081:c39cb3e4:21109eec:10ecdf10
> Events : 0.5273340
>
> Number Major Minor RaidDevice State
> 2 8 1 0 spare rebuilding /dev/sda1
> 1 8 17 1 active sync /dev/sdb1
>
> Additionally, /var/log/kern.log is getting filled with the following:
>
> Mar 11 19:19:14 jigme kernel: [ 6596.236366] ata4: EH complete
> Mar 11 19:19:16 jigme kernel: [ 6598.104676] ata4.00: exception Emask
> 0x0 SAct 0x0 SErr 0x0 action 0x0
> Mar 11 19:19:16 jigme kernel: [ 6598.104683] ata4.00: BMDMA stat 0x24
> Mar 11 19:19:16 jigme kernel: [ 6598.104692] ata4.00: cmd
> 25/00:08:ff:b0:e0/00:00:15:00:00/e0 tag 0 dma 4096 in
> Mar 11 19:19:16 jigme kernel: [ 6598.104694] res
> 51/40:00:04:b1:e0/40:00:15:00:00/e0 Emask 0x9 (media error)
> Mar 11 19:19:16 jigme kernel: [ 6598.104698] ata4.00: status: { DRDY ERR }
> Mar 11 19:19:16 jigme kernel: [ 6598.104702] ata4.00: error: { UNC }
> Mar 11 19:19:16 jigme kernel: [ 6598.120352] ata4.00: configured for UDMA/133
> Mar 11 19:19:16 jigme kernel: [ 6598.120371] sd 3:0:0:0: [sdb]
> Unhandled sense code
> Mar 11 19:19:16 jigme kernel: [ 6598.120375] sd 3:0:0:0: [sdb] Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Mar 11 19:19:16 jigme kernel: [ 6598.120380] sd 3:0:0:0: [sdb] Sense
> Key : Medium Error [current] [descriptor]
> Mar 11 19:19:16 jigme kernel: [ 6598.120388] Descriptor sense data
> with sense descriptors (in hex):
> Mar 11 19:19:16 jigme kernel: [ 6598.120392] 72 03 11 04 00 00
> 00 0c 00 0a 80 00 00 00 00 00
> Mar 11 19:19:16 jigme kernel: [ 6598.120412] 15 e0 b1 04
> Mar 11 19:19:16 jigme kernel: [ 6598.120420] sd 3:0:0:0: [sdb] Add.
> Sense: Unrecovered read error - auto reallocate failed
> Mar 11 19:19:16 jigme kernel: [ 6598.120428] end_request: I/O error,
> dev sdb, sector 367046916
> Mar 11 19:19:16 jigme kernel: [ 6598.120446] ata4: EH complete
> Mar 11 19:19:16 jigme kernel: [ 6598.120744] raid1: sdb: unrecoverable
> I/O read error for block 367046784
> Mar 11 19:19:17 jigme kernel: [ 6599.164052] md: md0: recovery done.
> Mar 11 19:19:17 jigme kernel: [ 6599.460124] RAID1 conf printout:
> Mar 11 19:19:17 jigme kernel: [ 6599.460145] --- wd:1 rd:2
> Mar 11 19:19:17 jigme kernel: [ 6599.460160] disk 0, wo:1, o:1, dev:sda1
> Mar 11 19:19:17 jigme kernel: [ 6599.460170] disk 1, wo:0, o:1, dev:sdb1
> Mar 11 19:19:17 jigme kernel: [ 6599.460178] RAID1 conf printout:
> Mar 11 19:19:17 jigme kernel: [ 6599.460185] --- wd:1 rd:2
> Mar 11 19:19:17 jigme kernel: [ 6599.460195] disk 0, wo:1, o:1, dev:sda1
> Mar 11 19:19:17 jigme kernel: [ 6599.460204] disk 1, wo:0, o:1, dev:sdb1
> Mar 11 19:19:22 jigme kernel: [ 6604.165111] RAID1 conf printout:
> Mar 11 19:19:22 jigme kernel: [ 6604.165117] --- wd:1 rd:2
> Mar 11 19:19:22 jigme kernel: [ 6604.165122] disk 0, wo:1, o:1, dev:sda1
> Mar 11 19:19:22 jigme kernel: [ 6604.165125] disk 1, wo:0, o:1, dev:sdb1
> Mar 11 19:19:22 jigme kernel: [ 6604.165128] RAID1 conf printout:
> Mar 11 19:19:22 jigme kernel: [ 6604.165131] --- wd:1 rd:2
> Mar 11 19:19:22 jigme kernel: [ 6604.165134] disk 0, wo:1, o:1, dev:sda1
> Mar 11 19:19:22 jigme kernel: [ 6604.165137] disk 1, wo:0, o:1, dev:sdb1
> ...
> Mar 11 23:16:28 jigme kernel: [20830.889380] RAID1 conf printout:
> Mar 11 23:16:28 jigme kernel: [20830.889386] --- wd:1 rd:2
> Mar 11 23:16:28 jigme kernel: [20830.889391] disk 0, wo:1, o:1, dev:sda1
> Mar 11 23:16:28 jigme kernel: [20830.889394] disk 1, wo:0, o:1, dev:sdb1
> Mar 11 23:16:28 jigme kernel: [20830.889397] RAID1 conf printout:
> Mar 11 23:16:28 jigme kernel: [20830.889399] --- wd:1 rd:2
> Mar 11 23:16:28 jigme kernel: [20830.889403] disk 0, wo:1, o:1, dev:sda1
> Mar 11 23:16:28 jigme kernel: [20830.889406] disk 1, wo:0, o:1, dev:sdb1
>
> The "RAID1 conf printout:" messages appear every few seconds or so.
>
> Machine info:
>
> jgordon@kubuntu:~$ uname -a
> Linux kubuntu 2.6.31-20-386 #57-Ubuntu SMP Mon Feb 8 11:42:49 UTC 2010
> i686 GNU/Linux
>
> Any idea what I can do to resolve this?
>
> Thanks!
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
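Replace your failing disk. The kernel log shows sdb returning
unrecovered read errors (UNC, "auto reallocate failed") at sector
367046916, which would explain why sda1 never leaves "spare
rebuilding". From that and your description of the issue, I'd say sdb
is out of spare sectors and would fail a S.M.A.R.T. self-test.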
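As a rough cross-check on the kernel log (a sketch only): the last four
bytes of the descriptor sense data, 15 e0 b1 04, appear to be the failing
LBA. Read as one big-endian hex number, they give the same sector that
end_request reports for sdb. The helper name sense_lba is made up for
illustration.

```shell
# Hypothetical helper: join space-separated hex bytes ("15 e0 b1 04")
# into a single hex number and print it in decimal.
sense_lba() {
    hex=$(printf '%s' "$1" | tr -d ' ')
    echo $((0x$hex))
}

sense_lba "15 e0 b1 04"   # prints 367046916
```

That matches "I/O error, dev sdb, sector 367046916" in the log, so the
drive and the block layer agree on where the bad sector is.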
If you want more proof, read up on the smartctl command from the
smartmontools package (the exact package name may differ slightly in
your package manager).
http://sourceforge.net/apps/trac/smartmontools/wiki/TocDoc
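A minimal sketch of the checks I have in mind, assuming smartmontools is
installed and that /dev/sdb is still the device name of the suspect disk
(check first; names can change between boots). The function name
smart_check is made up for illustration; the smartctl options are
standard ones.

```shell
# Hypothetical wrapper around a few standard smartctl calls.
smart_check() {
    dev="$1"
    # Overall health verdict as reported by the drive itself.
    sudo smartctl -H "$dev"
    # Vendor attribute table; look at Reallocated_Sector_Ct,
    # Current_Pending_Sector and Offline_Uncorrectable.
    sudo smartctl -A "$dev"
    # Start an extended (long) self-test; it runs on the drive in the
    # background while the system stays usable.
    sudo smartctl -t long "$dev"
}

# Later, once the self-test has had time to finish, read its log:
#   sudo smartctl -l selftest /dev/sdb
```

Run it as "smart_check /dev/sdb". Non-zero pending or uncorrectable
counts, or a self-test that stops with a read failure, would back up
what the kernel log is already saying.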