From: Andrew Thrift <andrew@networklabs.co.nz>
To: linux-raid@vger.kernel.org
Subject: Another corrupt RAID5
Date: Tue, 01 May 2012 18:34:10 +1200
Message-ID: <4F9F83E2.90407@networklabs.co.nz>
Hi,
I have had a working md RAID5 configuration for a number of years now.
Last year I rebuilt it into two RAID5 arrays used as PVs for LVM2, which
has been working great... until I upgraded from Ubuntu 11.10 to 12.04.
I just noticed Christoph's post; my symptoms are very similar, though
not identical. I will outline what happened below.
After the upgrade everything initially looked OK; however, I noticed that
when I tried to list directory contents it would show nothing, and the
logs would fill with I/O errors, e.g.:
Apr 30 22:54:41 blackbox kernel: [ 3648.798394] EXT4-fs error (device dm-0): __ext4_get_inode_loc:3657: inode #440926209: block 1763704864: comm smbd: unable to read itable block
Apr 30 22:54:41 blackbox kernel: [ 3648.799920] EXT4-fs (dm-0): previous I/O error to superblock detected
Apr 30 22:54:41 blackbox kernel: [ 3648.799935] EXT4-fs error (device dm-0): __ext4_get_inode_loc:3657: inode #440926209: block 1763704864: comm smbd: unable to read itable block
Apr 30 22:54:41 blackbox kernel: [ 3648.800026] EXT4-fs (dm-0): previous I/O error to superblock detected
I assumed the LSI2008 controller had perhaps not spun up the drives
properly, so I rebooted the machine. All appeared well after that, so I
left it running. However, overnight the logs filled with:
May 1 00:09:37 blackbox kernel: [ 3712.741980] sd 9:0:3:0: [sdf] Device not ready
May 1 00:09:37 blackbox kernel: [ 3712.741985] sd 9:0:3:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 1 00:09:37 blackbox kernel: [ 3712.741990] sd 9:0:3:0: [sdf] Sense Key : Not Ready [current]
May 1 00:09:37 blackbox kernel: [ 3712.741995] sd 9:0:3:0: [sdf] Add. Sense: Logical unit not ready, initializing command required
May 1 00:09:37 blackbox kernel: [ 3712.742000] sd 9:0:3:0: [sdf] CDB: Read(10): 28 00 a9 d5 56 47 00 00 08 00
May 1 00:09:37 blackbox kernel: [ 3712.742011] end_request: I/O error, dev sdf, sector 2849330759
May 1 00:09:37 blackbox kernel: [ 3712.742120] sd 9:0:4:0: [sdg] Device not ready
May 1 00:09:37 blackbox kernel: [ 3712.742122] sd 9:0:4:0: [sdg] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 1 00:09:37 blackbox kernel: [ 3712.742126] sd 9:0:4:0: [sdg] Sense Key : Not Ready [current]
May 1 00:09:37 blackbox kernel: [ 3712.742132] sd 9:0:4:0: [sdg] Add. Sense: Logical unit not ready, initializing command required
May 1 00:09:37 blackbox kernel: [ 3712.742136] sd 9:0:4:0: [sdg] CDB: Read(10): 28 00 a9 d5 56 47 00 00 08 00
May 1 00:09:37 blackbox kernel: [ 3712.742145] end_request: I/O error, dev sdg, sector 2849330759
May 1 00:09:37 blackbox kernel: [ 3712.742187] sd 9:0:5:0: [sdh] Device not ready
May 1 00:09:37 blackbox kernel: [ 3712.742189] sd 9:0:5:0: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 1 00:09:37 blackbox kernel: [ 3712.742192] sd 9:0:5:0: [sdh] Sense Key : Not Ready [current]
May 1 00:09:37 blackbox kernel: [ 3712.742196] sd 9:0:5:0: [sdh] Add. Sense: Logical unit not ready, initializing command required
May 1 00:09:37 blackbox kernel: [ 3712.742200] sd 9:0:5:0: [sdh] CDB: Read(10): 28 00 a9 d5 56 47 00 00 08 00
May 1 00:09:37 blackbox kernel: [ 3712.742208] end_request: I/O error, dev sdh, sector 2849330759
May 1 00:09:37 blackbox kernel: [ 3712.756852] md/raid:md0: Disk failure on sdh1, disabling device.
May 1 00:09:37 blackbox kernel: [ 3712.756854] md/raid:md0: Operation continuing on 3 devices.
May 1 00:09:37 blackbox kernel: [ 3712.756925] md/raid:md0: Disk failure on sdg1, disabling device.
May 1 00:09:37 blackbox kernel: [ 3712.756926] md/raid:md0: Operation continuing on 2 devices.
May 1 00:09:37 blackbox kernel: [ 3712.756985] md/raid:md0: Disk failure on sdf1, disabling device.
May 1 00:09:37 blackbox kernel: [ 3712.756986] md/raid:md0: Operation continuing on 1 devices.
May 1 00:09:37 blackbox kernel: [ 3712.757038] EXT4-fs error (device dm-0): ext4_read_inode_bitmap:161: comm nfsd: Cannot read inode bitmap - block_group = 32609, inode_bitmap = 1068498961
May 1 00:09:37 blackbox kernel: [ 3712.757083] EXT4-fs error (device dm-0) in ext4_new_inode:937: IO failure
May 1 00:09:37 blackbox kernel: [ 3712.863217] RAID conf printout:
May 1 00:09:37 blackbox kernel: [ 3712.863222] --- level:5 rd:4 wd:1
May 1 00:09:37 blackbox kernel: [ 3712.863225] disk 0, o:0, dev:sdf1
May 1 00:09:37 blackbox kernel: [ 3712.863227] disk 1, o:0, dev:sdg1
May 1 00:09:37 blackbox kernel: [ 3712.863229] disk 2, o:1, dev:sdi1
May 1 00:09:37 blackbox kernel: [ 3712.863231] disk 3, o:0, dev:sdh1
May 1 00:09:37 blackbox kernel: [ 3712.864483] RAID conf printout:
May 1 00:09:37 blackbox kernel: [ 3712.864487] --- level:5 rd:4 wd:1
May 1 00:09:37 blackbox kernel: [ 3712.864491] disk 0, o:0, dev:sdf1
May 1 00:09:37 blackbox kernel: [ 3712.864493] disk 1, o:0, dev:sdg1
May 1 00:09:37 blackbox kernel: [ 3712.864495] disk 2, o:1, dev:sdi1
May 1 00:09:37 blackbox kernel: [ 3712.864501] RAID conf printout:
May 1 00:09:37 blackbox kernel: [ 3712.864503] --- level:5 rd:4 wd:1
May 1 00:09:37 blackbox kernel: [ 3712.864505] disk 0, o:0, dev:sdf1
May 1 00:09:37 blackbox kernel: [ 3712.864507] disk 1, o:0, dev:sdg1
May 1 00:09:37 blackbox kernel: [ 3712.864508] disk 2, o:1, dev:sdi1
May 1 00:09:37 blackbox kernel: [ 3712.869463] RAID conf printout:
May 1 00:09:37 blackbox kernel: [ 3712.869467] --- level:5 rd:4 wd:1
May 1 00:09:37 blackbox kernel: [ 3712.869471] disk 0, o:0, dev:sdf1
May 1 00:09:37 blackbox kernel: [ 3712.869473] disk 2, o:1, dev:sdi1
May 1 00:09:37 blackbox kernel: [ 3712.869477] RAID conf printout:
May 1 00:09:37 blackbox kernel: [ 3712.869479] --- level:5 rd:4 wd:1
May 1 00:09:37 blackbox kernel: [ 3712.869481] disk 0, o:0, dev:sdf1
May 1 00:09:37 blackbox kernel: [ 3712.869483] disk 2, o:1, dev:sdi1
May 1 00:09:37 blackbox kernel: [ 3712.869554] RAID conf printout:
May 1 00:09:37 blackbox kernel: [ 3712.869559] --- level:5 rd:4 wd:1
May 1 00:09:37 blackbox kernel: [ 3712.869562] disk 2, o:1, dev:sdi1
May 1 00:09:37 blackbox kernel: [ 3712.869578] Buffer I/O error on device dm-0, logical block 0
May 1 00:09:37 blackbox kernel: [ 3712.869613] lost page write due to I/O error on dm-0
May 1 00:09:42 blackbox kernel: [ 3718.213744] Aborting journal on device dm-0-8.
May 1 00:09:42 blackbox kernel: [ 3718.213828] Buffer I/O error on device dm-0, logical block 976781312
May 1 00:09:42 blackbox kernel: [ 3718.213867] lost page write due to I/O error on dm-0
May 1 00:09:42 blackbox kernel: [ 3718.213876] JBD2: I/O error detected when updating journal superblock for dm-0-8.
May 1 00:09:43 blackbox mdadm[1876]: Fail event detected on md device /dev/md0, component device /dev/sdf1
May 1 00:09:49 blackbox mdadm[1876]: Fail event detected on md device /dev/md0, component device /dev/sdg1
May 1 00:09:54 blackbox mdadm[1876]: Fail event detected on md device /dev/md0, component device /dev/sdh1
May 1 05:55:38 blackbox kernel: [24453.921252] EXT4-fs (dm-0): previous I/O error to superblock detected
May 1 05:55:38 blackbox kernel: [24453.966924] Buffer I/O error on device dm-0, logical block 0
May 1 05:55:38 blackbox kernel: [24453.966960] lost page write due to I/O error on dm-0
May 1 05:55:38 blackbox kernel: [24453.966970] EXT4-fs error (device dm-0): ext4_journal_start_sb:327: Detected aborted journal
May 1 05:55:38 blackbox kernel: [24453.967025] EXT4-fs (dm-0): Remounting filesystem read-only
May 1 05:55:38 blackbox kernel: [24453.967057] EXT4-fs (dm-0): previous I/O error to superblock detected
May 1 05:55:38 blackbox kernel: [24453.967107] Buffer I/O error on device dm-0, logical block 0
May 1 05:55:38 blackbox kernel: [24453.967140] lost page write due to I/O error on dm-0
May 1 06:25:14 blackbox kernel: [26228.988963] Buffer I/O error on device dm-0, logical block 9250
May 1 06:25:14 blackbox kernel: [26228.989008] Buffer I/O error on device dm-0, logical block 9251
May 1 06:25:14 blackbox kernel: [26228.989044] Buffer I/O error on device dm-0, logical block 9252
May 1 06:25:14 blackbox kernel: [26228.989080] Buffer I/O error on device dm-0, logical block 9253
May 1 06:25:14 blackbox kernel: [26228.989116] Buffer I/O error on device dm-0, logical block 9254
May 1 06:25:14 blackbox kernel: [26228.989151] Buffer I/O error on device dm-0, logical block 9255
May 1 06:25:14 blackbox kernel: [26228.989186] Buffer I/O error on device dm-0, logical block 9256
May 1 06:25:14 blackbox kernel: [26228.989221] Buffer I/O error on device dm-0, logical block 9257
May 1 06:25:14 blackbox kernel: [26228.989256] Buffer I/O error on device dm-0, logical block 9258
May 1 06:25:14 blackbox kernel: [26228.989291] Buffer I/O error on device dm-0, logical block 9259
May 1 06:25:14 blackbox kernel: [26228.989345] EXT4-fs (dm-0): previous I/O error to superblock detected
May 1 06:25:14 blackbox kernel: [26229.070433] EXT4-fs error (device dm-0): ext4_readdir:173: inode #11: comm standard: path /media/store0/lost+found: directory contains a hole at offset 0
May 1 08:28:59 blackbox kernel: [33646.969601] journal commit I/O error
May 1 08:28:59 blackbox kernel: [33647.017036] Buffer I/O error on device dm-0, logical block 902299653
May 1 08:28:59 blackbox kernel: [33647.017107] lost page write due to I/O error on dm-0
May 1 08:28:59 blackbox kernel: [33647.017123] sd 9:0:2:0: [sde] Device not ready
May 1 08:28:59 blackbox kernel: [33647.017125] sd 9:0:2:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 1 08:28:59 blackbox kernel: [33647.017129] sd 9:0:2:0: [sde] Sense Key : Not Ready [current]
May 1 08:28:59 blackbox kernel: [33647.017136] sd 9:0:2:0: [sde] Add. Sense: Logical unit not ready, initializing command required
May 1 08:28:59 blackbox kernel: [33647.017141] sd 9:0:2:0: [sde] CDB: Write(10): 2a 00 74 70 59 3f 00 00 08 00
May 1 08:28:59 blackbox kernel: [33647.017153] end_request: I/O error, dev sde, sector 1953519935
May 1 08:28:59 blackbox kernel: [33647.017188] end_request: I/O error, dev sde, sector 1953519935
May 1 08:28:59 blackbox kernel: [33647.017221] md: super_written gets error=-5, uptodate=0
May 1 08:28:59 blackbox kernel: [33647.017225] md/raid:md1: Disk failure on sde1, disabling device.
May 1 08:28:59 blackbox kernel: [33647.017226] md/raid:md1: Operation continuing on 2 devices.
May 1 08:28:59 blackbox kernel: [33647.017298] sd 9:0:0:0: [sdc] Device not ready
May 1 08:28:59 blackbox kernel: [33647.017300] sd 9:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 1 08:28:59 blackbox kernel: [33647.017303] sd 9:0:0:0: [sdc] Sense Key : Not Ready [current]
May 1 08:28:59 blackbox kernel: [33647.017307] sd 9:0:0:0: [sdc] Add. Sense: Logical unit not ready, initializing command required
May 1 08:28:59 blackbox kernel: [33647.017312] sd 9:0:0:0: [sdc] CDB: Write(10): 2a 00 74 70 59 3f 00 00 08 00
May 1 08:28:59 blackbox kernel: [33647.017320] end_request: I/O error, dev sdc, sector 1953519935
May 1 08:28:59 blackbox kernel: [33647.017354] end_request: I/O error, dev sdc, sector 1953519935
May 1 08:28:59 blackbox kernel: [33647.017386] md: super_written gets error=-5, uptodate=0
May 1 08:28:59 blackbox kernel: [33647.017389] md/raid:md1: Disk failure on sdc1, disabling device.
May 1 08:28:59 blackbox kernel: [33647.017390] md/raid:md1: Operation continuing on 1 devices.
May 1 08:28:59 blackbox kernel: [33647.017455] sd 9:0:1:0: [sdd] Device not ready
May 1 08:28:59 blackbox kernel: [33647.017457] sd 9:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 1 08:28:59 blackbox kernel: [33647.017461] sd 9:0:1:0: [sdd] Sense Key : Not Ready [current]
May 1 08:28:59 blackbox kernel: [33647.017464] sd 9:0:1:0: [sdd] Add. Sense: Logical unit not ready, initializing command required
May 1 08:28:59 blackbox kernel: [33647.017468] sd 9:0:1:0: [sdd] CDB: Write(10): 2a 00 74 70 59 3f 00 00 08 00
May 1 08:28:59 blackbox kernel: [33647.017476] end_request: I/O error, dev sdd, sector 1953519935
May 1 08:28:59 blackbox kernel: [33647.017509] end_request: I/O error, dev sdd, sector 1953519935
May 1 08:28:59 blackbox kernel: [33647.018544] md: super_written gets error=-5, uptodate=0
May 1 08:28:59 blackbox kernel: [33647.018547] md/raid:md1: Disk failure on sdd1, disabling device.
May 1 08:28:59 blackbox kernel: [33647.018548] md/raid:md1: Operation continuing on 0 devices.
May 1 08:28:59 blackbox kernel: [33647.020709] RAID conf printout:
May 1 08:28:59 blackbox kernel: [33647.020714] --- level:5 rd:3 wd:0
May 1 08:28:59 blackbox kernel: [33647.020718] disk 0, o:0, dev:sdc1
May 1 08:28:59 blackbox kernel: [33647.020722] disk 1, o:0, dev:sde1
May 1 08:28:59 blackbox kernel: [33647.020726] disk 2, o:0, dev:sdd1
May 1 08:28:59 blackbox kernel: [33647.067507] RAID conf printout:
May 1 08:28:59 blackbox kernel: [33647.067512] --- level:5 rd:3 wd:0
May 1 08:28:59 blackbox kernel: [33647.067515] disk 0, o:0, dev:sdc1
May 1 08:28:59 blackbox kernel: [33647.067517] disk 2, o:0, dev:sdd1
May 1 08:28:59 blackbox kernel: [33647.067523] RAID conf printout:
May 1 08:28:59 blackbox kernel: [33647.067525] --- level:5 rd:3 wd:0
May 1 08:28:59 blackbox kernel: [33647.067527] disk 0, o:0, dev:sdc1
May 1 08:28:59 blackbox kernel: [33647.067529] disk 2, o:0, dev:sdd1
May 1 08:28:59 blackbox kernel: [33647.127449] RAID conf printout:
May 1 08:28:59 blackbox kernel: [33647.127453] --- level:5 rd:3 wd:0
May 1 08:28:59 blackbox kernel: [33647.127456] disk 2, o:0, dev:sdd1
May 1 08:28:59 blackbox kernel: [33647.127461] RAID conf printout:
May 1 08:28:59 blackbox kernel: [33647.127463] --- level:5 rd:3 wd:0
May 1 08:28:59 blackbox kernel: [33647.127465] disk 2, o:0, dev:sdd1
May 1 08:28:59 blackbox kernel: [33647.167454] RAID conf printout:
May 1 08:28:59 blackbox kernel: [33647.167459] --- level:5 rd:3 wd:0
May 1 08:28:59 blackbox kernel: [33647.167474] Buffer I/O error on device dm-0, logical block 1714946056
May 1 08:28:59 blackbox kernel: [33647.168557] lost page write due to I/O error on dm-0
May 1 08:28:59 blackbox kernel: [33647.168641] Buffer I/O error on device dm-0, logical block 1714946057
May 1 08:28:59 blackbox kernel: [33647.170230] lost page write due to I/O error on dm-0
May 1 08:28:59 blackbox kernel: [33647.170298] Buffer I/O error on device dm-0, logical block 1714946058
May 1 08:28:59 blackbox kernel: [33647.171896] lost page write due to I/O error on dm-0
May 1 08:28:59 blackbox kernel: [33647.171962] Buffer I/O error on device dm-0, logical block 1714946059
May 1 08:28:59 blackbox kernel: [33647.173396] lost page write due to I/O error on dm-0
May 1 08:28:59 blackbox kernel: [33647.173486] Buffer I/O error on device dm-0, logical block 1714946061
May 1 08:28:59 blackbox kernel: [33647.174512] lost page write due to I/O error on dm-0
May 1 08:28:59 blackbox kernel: [33647.174575] Buffer I/O error on device dm-0, logical block 1714946060
May 1 08:28:59 blackbox kernel: [33647.174605] Buffer I/O error on device dm-0, logical block 902467307
May 1 08:28:59 blackbox kernel: [33647.174608] lost page write due to I/O error on dm-0
May 1 08:28:59 blackbox kernel: [33647.176545] lost page write due to I/O error on dm-0
May 1 08:28:59 blackbox kernel: [33647.176646] Buffer I/O error on device dm-0, logical block 999292932
May 1 08:28:59 blackbox kernel: [33647.177560] lost page write due to I/O error on dm-0
May 1 08:28:59 blackbox kernel: [33647.177738] EXT4-fs (dm-0): previous I/O error to superblock detected
May 1 08:28:59 blackbox kernel: [33647.178680] EXT4-fs error (device dm-0): ext4_put_super:818: Couldn't clean up the journal
May 1 08:29:06 blackbox mdadm[1876]: Fail event detected on md device /dev/md1, component device /dev/sdc1
May 1 08:29:11 blackbox mdadm[1876]: Fail event detected on md device /dev/md1, component device /dev/sde1
May 1 08:29:17 blackbox mdadm[1876]: Fail event detected on md device /dev/md1, component device /dev/sdd1
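The CDB bytes in the "Device not ready" entries can be cross-checked against the sector reported on the following end_request line. A minimal bash sketch, assuming the standard SCSI READ(10)/WRITE(10) layout (byte 0 opcode, bytes 2-5 big-endian LBA, bytes 7-8 transfer length); the function name decode_cdb10 is hypothetical and not part of any existing tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a 10-byte SCSI CDB (READ(10) = 0x28,
# WRITE(10) = 0x2a) as printed after "CDB:" in the kernel log.
# Assumed layout: byte 0 = opcode, bytes 2..5 = big-endian LBA,
# bytes 7..8 = transfer length in logical blocks.
decode_cdb10() {
  local -a b=($1)                                  # split hex string into bytes
  local lba=$(( 0x${b[2]}${b[3]}${b[4]}${b[5]} ))  # 32-bit big-endian LBA
  local len=$(( 0x${b[7]}${b[8]} ))                # 16-bit transfer length
  echo "opcode=0x${b[0]} lba=${lba} blocks=${len}"
}

decode_cdb10 "28 00 a9 d5 56 47 00 00 08 00"   # Read(10) sent to sdf/sdg/sdh
decode_cdb10 "2a 00 74 70 59 3f 00 00 08 00"   # Write(10) sent to sde/sdc/sdd
```

Both decoded LBAs (2849330759 and 1953519935) match the sectors on the corresponding end_request lines, and each request covers 8 blocks: within each array, every failing member rejected the very same request with "Not Ready".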
The /dev/md0 array is now corrupt. The /dev/md1 array appears fine, but
since the LV spans both arrays, it is obviously not usable without
/dev/md0.
Each drive that was previously in /dev/md0 shows output similar to this:
mdadm --examine /dev/sdh1
/dev/sdh1:
Magic : a92b4efc
Version : 0.90.00
UUID : 00000000:00000000:00000000:00000000
Creation Time : Tue May 1 14:44:06 2012
Raid Level : -unknown-
Raid Devices : 0
Total Devices : 2
Preferred Minor : 0
Update Time : Tue May 1 16:24:56 2012
State : active
Active Devices : 0
Working Devices : 2
Failed Devices : 0
Spare Devices : 2
Checksum : bccafbfb - correct
Events : 1
      Number   Major   Minor   RaidDevice   State
this     0       8      113        0        spare   /dev/sdh1
   0     0       8      113        0        spare   /dev/sdh1
   1     1       8       81        1        spare   /dev/sdf1
In particular, the Raid Level is -unknown- and the UUID is
00000000:00000000:00000000:00000000.
This appears to be quite a major bug. Is it known, and is there any
way I can recover my data?
Regards,
Andrew
Thread overview: 2+ messages
2012-05-01 6:34 Andrew Thrift [this message]
2012-05-01 7:36 ` Another corrupt RAID5 NeilBrown