linux-raid.vger.kernel.org archive mirror
* Another corrupt RAID5
@ 2012-05-01  6:34 Andrew Thrift
From: Andrew Thrift @ 2012-05-01  6:34 UTC
  To: linux-raid

Hi,

I have had a working md RAID5 configuration for a number of years now.
Last year I rebuilt it into two RAID5 arrays used as PVs for LVM2, which
worked great until I upgraded from Ubuntu 11.10 to 12.04.

I just noticed Christoph's post. My symptoms are similar but not
identical, so I will outline what happened below.

After the upgrade everything initially looked OK. However, when I tried
to list directory contents nothing was shown, and the logs filled with
I/O errors, e.g.:

Apr 30 22:54:41 blackbox kernel: [ 3648.798394] EXT4-fs error (device 
dm-0): __ext4_get_inode_loc:3657: inode #440926209: block 1763704864: 
comm smbd: unable to read itable block
Apr 30 22:54:41 blackbox kernel: [ 3648.799920] EXT4-fs (dm-0): previous 
I/O error to superblock detected
Apr 30 22:54:41 blackbox kernel: [ 3648.799935] EXT4-fs error (device 
dm-0): __ext4_get_inode_loc:3657: inode #440926209: block 1763704864: 
comm smbd: unable to read itable block
Apr 30 22:54:41 blackbox kernel: [ 3648.800026] EXT4-fs (dm-0): previous 
I/O error to superblock detected

I assumed the LSI2008 controller had perhaps not spun up the drives
properly, and rebooted the machine.  All appeared well afterwards, so I
left the machine running.  However, overnight the logs filled with:

May  1 00:09:37 blackbox kernel: [ 3712.741980] sd 9:0:3:0: [sdf] Device 
not ready
May  1 00:09:37 blackbox kernel: [ 3712.741985] sd 9:0:3:0: [sdf]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May  1 00:09:37 blackbox kernel: [ 3712.741990] sd 9:0:3:0: [sdf]  Sense 
Key : Not Ready [current]
May  1 00:09:37 blackbox kernel: [ 3712.741995] sd 9:0:3:0: [sdf]  Add. 
Sense: Logical unit not ready, initializing command required
May  1 00:09:37 blackbox kernel: [ 3712.742000] sd 9:0:3:0: [sdf] CDB: 
Read(10): 28 00 a9 d5 56 47 00 00 08 00
May  1 00:09:37 blackbox kernel: [ 3712.742011] end_request: I/O error, 
dev sdf, sector 2849330759
May  1 00:09:37 blackbox kernel: [ 3712.742120] sd 9:0:4:0: [sdg] Device 
not ready
May  1 00:09:37 blackbox kernel: [ 3712.742122] sd 9:0:4:0: [sdg]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May  1 00:09:37 blackbox kernel: [ 3712.742126] sd 9:0:4:0: [sdg]  Sense 
Key : Not Ready [current]
May  1 00:09:37 blackbox kernel: [ 3712.742132] sd 9:0:4:0: [sdg]  Add. 
Sense: Logical unit not ready, initializing command required
May  1 00:09:37 blackbox kernel: [ 3712.742136] sd 9:0:4:0: [sdg] CDB: 
Read(10): 28 00 a9 d5 56 47 00 00 08 00
May  1 00:09:37 blackbox kernel: [ 3712.742145] end_request: I/O error, 
dev sdg, sector 2849330759
May  1 00:09:37 blackbox kernel: [ 3712.742187] sd 9:0:5:0: [sdh] Device 
not ready
May  1 00:09:37 blackbox kernel: [ 3712.742189] sd 9:0:5:0: [sdh]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May  1 00:09:37 blackbox kernel: [ 3712.742192] sd 9:0:5:0: [sdh]  Sense 
Key : Not Ready [current]
May  1 00:09:37 blackbox kernel: [ 3712.742196] sd 9:0:5:0: [sdh]  Add. 
Sense: Logical unit not ready, initializing command required
May  1 00:09:37 blackbox kernel: [ 3712.742200] sd 9:0:5:0: [sdh] CDB: 
Read(10): 28 00 a9 d5 56 47 00 00 08 00
May  1 00:09:37 blackbox kernel: [ 3712.742208] end_request: I/O error, 
dev sdh, sector 2849330759
May  1 00:09:37 blackbox kernel: [ 3712.756852] md/raid:md0: Disk 
failure on sdh1, disabling device.
May  1 00:09:37 blackbox kernel: [ 3712.756854] md/raid:md0: Operation 
continuing on 3 devices.
May  1 00:09:37 blackbox kernel: [ 3712.756925] md/raid:md0: Disk 
failure on sdg1, disabling device.
May  1 00:09:37 blackbox kernel: [ 3712.756926] md/raid:md0: Operation 
continuing on 2 devices.
May  1 00:09:37 blackbox kernel: [ 3712.756985] md/raid:md0: Disk 
failure on sdf1, disabling device.
May  1 00:09:37 blackbox kernel: [ 3712.756986] md/raid:md0: Operation 
continuing on 1 devices.
May  1 00:09:37 blackbox kernel: [ 3712.757038] EXT4-fs error (device 
dm-0): ext4_read_inode_bitmap:161: comm nfsd: Cannot read inode bitmap - 
block_group = 32609, inode_bitmap = 1068498961
May  1 00:09:37 blackbox kernel: [ 3712.757083] EXT4-fs error (device 
dm-0) in ext4_new_inode:937: IO failure
May  1 00:09:37 blackbox kernel: [ 3712.863217] RAID conf printout:
May  1 00:09:37 blackbox kernel: [ 3712.863222]  --- level:5 rd:4 wd:1
May  1 00:09:37 blackbox kernel: [ 3712.863225]  disk 0, o:0, dev:sdf1
May  1 00:09:37 blackbox kernel: [ 3712.863227]  disk 1, o:0, dev:sdg1
May  1 00:09:37 blackbox kernel: [ 3712.863229]  disk 2, o:1, dev:sdi1
May  1 00:09:37 blackbox kernel: [ 3712.863231]  disk 3, o:0, dev:sdh1
May  1 00:09:37 blackbox kernel: [ 3712.864483] RAID conf printout:
May  1 00:09:37 blackbox kernel: [ 3712.864487]  --- level:5 rd:4 wd:1
May  1 00:09:37 blackbox kernel: [ 3712.864491]  disk 0, o:0, dev:sdf1
May  1 00:09:37 blackbox kernel: [ 3712.864493]  disk 1, o:0, dev:sdg1
May  1 00:09:37 blackbox kernel: [ 3712.864495]  disk 2, o:1, dev:sdi1
May  1 00:09:37 blackbox kernel: [ 3712.864501] RAID conf printout:
May  1 00:09:37 blackbox kernel: [ 3712.864503]  --- level:5 rd:4 wd:1
May  1 00:09:37 blackbox kernel: [ 3712.864505]  disk 0, o:0, dev:sdf1
May  1 00:09:37 blackbox kernel: [ 3712.864507]  disk 1, o:0, dev:sdg1
May  1 00:09:37 blackbox kernel: [ 3712.864508]  disk 2, o:1, dev:sdi1
May  1 00:09:37 blackbox kernel: [ 3712.869463] RAID conf printout:
May  1 00:09:37 blackbox kernel: [ 3712.869467]  --- level:5 rd:4 wd:1
May  1 00:09:37 blackbox kernel: [ 3712.869471]  disk 0, o:0, dev:sdf1
May  1 00:09:37 blackbox kernel: [ 3712.869473]  disk 2, o:1, dev:sdi1
May  1 00:09:37 blackbox kernel: [ 3712.869477] RAID conf printout:
May  1 00:09:37 blackbox kernel: [ 3712.869479]  --- level:5 rd:4 wd:1
May  1 00:09:37 blackbox kernel: [ 3712.869481]  disk 0, o:0, dev:sdf1
May  1 00:09:37 blackbox kernel: [ 3712.869483]  disk 2, o:1, dev:sdi1
May  1 00:09:37 blackbox kernel: [ 3712.869554] RAID conf printout:
May  1 00:09:37 blackbox kernel: [ 3712.869559]  --- level:5 rd:4 wd:1
May  1 00:09:37 blackbox kernel: [ 3712.869562]  disk 2, o:1, dev:sdi1
May  1 00:09:37 blackbox kernel: [ 3712.869578] Buffer I/O error on 
device dm-0, logical block 0
May  1 00:09:37 blackbox kernel: [ 3712.869613] lost page write due to 
I/O error on dm-0
May  1 00:09:42 blackbox kernel: [ 3718.213744] Aborting journal on 
device dm-0-8.
May  1 00:09:42 blackbox kernel: [ 3718.213828] Buffer I/O error on 
device dm-0, logical block 976781312
May  1 00:09:42 blackbox kernel: [ 3718.213867] lost page write due to 
I/O error on dm-0
May  1 00:09:42 blackbox kernel: [ 3718.213876] JBD2: I/O error detected 
when updating journal superblock for dm-0-8.
May  1 00:09:43 blackbox mdadm[1876]: Fail event detected on md device 
/dev/md0, component device /dev/sdf1
May  1 00:09:49 blackbox mdadm[1876]: Fail event detected on md device 
/dev/md0, component device /dev/sdg1
May  1 00:09:54 blackbox mdadm[1876]: Fail event detected on md device 
/dev/md0, component device /dev/sdh1
May  1 05:55:38 blackbox kernel: [24453.921252] EXT4-fs (dm-0): previous 
I/O error to superblock detected
May  1 05:55:38 blackbox kernel: [24453.966924] Buffer I/O error on 
device dm-0, logical block 0
May  1 05:55:38 blackbox kernel: [24453.966960] lost page write due to 
I/O error on dm-0
May  1 05:55:38 blackbox kernel: [24453.966970] EXT4-fs error (device 
dm-0): ext4_journal_start_sb:327: Detected aborted journal
May  1 05:55:38 blackbox kernel: [24453.967025] EXT4-fs (dm-0): 
Remounting filesystem read-only
May  1 05:55:38 blackbox kernel: [24453.967057] EXT4-fs (dm-0): previous 
I/O error to superblock detected
May  1 05:55:38 blackbox kernel: [24453.967107] Buffer I/O error on 
device dm-0, logical block 0
May  1 05:55:38 blackbox kernel: [24453.967140] lost page write due to 
I/O error on dm-0
May  1 06:25:14 blackbox kernel: [26228.988963] Buffer I/O error on 
device dm-0, logical block 9250
May  1 06:25:14 blackbox kernel: [26228.989008] Buffer I/O error on 
device dm-0, logical block 9251
May  1 06:25:14 blackbox kernel: [26228.989044] Buffer I/O error on 
device dm-0, logical block 9252
May  1 06:25:14 blackbox kernel: [26228.989080] Buffer I/O error on 
device dm-0, logical block 9253
May  1 06:25:14 blackbox kernel: [26228.989116] Buffer I/O error on 
device dm-0, logical block 9254
May  1 06:25:14 blackbox kernel: [26228.989151] Buffer I/O error on 
device dm-0, logical block 9255
May  1 06:25:14 blackbox kernel: [26228.989186] Buffer I/O error on 
device dm-0, logical block 9256
May  1 06:25:14 blackbox kernel: [26228.989221] Buffer I/O error on 
device dm-0, logical block 9257
May  1 06:25:14 blackbox kernel: [26228.989256] Buffer I/O error on 
device dm-0, logical block 9258
May  1 06:25:14 blackbox kernel: [26228.989291] Buffer I/O error on 
device dm-0, logical block 9259
May  1 06:25:14 blackbox kernel: [26228.989345] EXT4-fs (dm-0): previous 
I/O error to superblock detected
May  1 06:25:14 blackbox kernel: [26229.070433] EXT4-fs error (device 
dm-0): ext4_readdir:173: inode #11: comm standard: path 
/media/store0/lost+found: directory contains a hole at offset 0
May  1 08:28:59 blackbox kernel: [33646.969601] journal commit I/O error
May  1 08:28:59 blackbox kernel: [33647.017036] Buffer I/O error on 
device dm-0, logical block 902299653
May  1 08:28:59 blackbox kernel: [33647.017107] lost page write due to 
I/O error on dm-0
May  1 08:28:59 blackbox kernel: [33647.017123] sd 9:0:2:0: [sde] Device 
not ready
May  1 08:28:59 blackbox kernel: [33647.017125] sd 9:0:2:0: [sde]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May  1 08:28:59 blackbox kernel: [33647.017129] sd 9:0:2:0: [sde]  Sense 
Key : Not Ready [current]
May  1 08:28:59 blackbox kernel: [33647.017136] sd 9:0:2:0: [sde]  Add. 
Sense: Logical unit not ready, initializing command required
May  1 08:28:59 blackbox kernel: [33647.017141] sd 9:0:2:0: [sde] CDB: 
Write(10): 2a 00 74 70 59 3f 00 00 08 00
May  1 08:28:59 blackbox kernel: [33647.017153] end_request: I/O error, 
dev sde, sector 1953519935
May  1 08:28:59 blackbox kernel: [33647.017188] end_request: I/O error, 
dev sde, sector 1953519935
May  1 08:28:59 blackbox kernel: [33647.017221] md: super_written gets 
error=-5, uptodate=0
May  1 08:28:59 blackbox kernel: [33647.017225] md/raid:md1: Disk 
failure on sde1, disabling device.
May  1 08:28:59 blackbox kernel: [33647.017226] md/raid:md1: Operation 
continuing on 2 devices.
May  1 08:28:59 blackbox kernel: [33647.017298] sd 9:0:0:0: [sdc] Device 
not ready
May  1 08:28:59 blackbox kernel: [33647.017300] sd 9:0:0:0: [sdc]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May  1 08:28:59 blackbox kernel: [33647.017303] sd 9:0:0:0: [sdc]  Sense 
Key : Not Ready [current]
May  1 08:28:59 blackbox kernel: [33647.017307] sd 9:0:0:0: [sdc]  Add. 
Sense: Logical unit not ready, initializing command required
May  1 08:28:59 blackbox kernel: [33647.017312] sd 9:0:0:0: [sdc] CDB: 
Write(10): 2a 00 74 70 59 3f 00 00 08 00
May  1 08:28:59 blackbox kernel: [33647.017320] end_request: I/O error, 
dev sdc, sector 1953519935
May  1 08:28:59 blackbox kernel: [33647.017354] end_request: I/O error, 
dev sdc, sector 1953519935
May  1 08:28:59 blackbox kernel: [33647.017386] md: super_written gets 
error=-5, uptodate=0
May  1 08:28:59 blackbox kernel: [33647.017389] md/raid:md1: Disk 
failure on sdc1, disabling device.
May  1 08:28:59 blackbox kernel: [33647.017390] md/raid:md1: Operation 
continuing on 1 devices.
May  1 08:28:59 blackbox kernel: [33647.017455] sd 9:0:1:0: [sdd] Device 
not ready
May  1 08:28:59 blackbox kernel: [33647.017457] sd 9:0:1:0: [sdd]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May  1 08:28:59 blackbox kernel: [33647.017461] sd 9:0:1:0: [sdd]  Sense 
Key : Not Ready [current]
May  1 08:28:59 blackbox kernel: [33647.017464] sd 9:0:1:0: [sdd]  Add. 
Sense: Logical unit not ready, initializing command required
May  1 08:28:59 blackbox kernel: [33647.017468] sd 9:0:1:0: [sdd] CDB: 
Write(10): 2a 00 74 70 59 3f 00 00 08 00
May  1 08:28:59 blackbox kernel: [33647.017476] end_request: I/O error, 
dev sdd, sector 1953519935
May  1 08:28:59 blackbox kernel: [33647.017509] end_request: I/O error, 
dev sdd, sector 1953519935
May  1 08:28:59 blackbox kernel: [33647.018544] md: super_written gets 
error=-5, uptodate=0
May  1 08:28:59 blackbox kernel: [33647.018547] md/raid:md1: Disk 
failure on sdd1, disabling device.
May  1 08:28:59 blackbox kernel: [33647.018548] md/raid:md1: Operation 
continuing on 0 devices.
May  1 08:28:59 blackbox kernel: [33647.020709] RAID conf printout:
May  1 08:28:59 blackbox kernel: [33647.020714]  --- level:5 rd:3 wd:0
May  1 08:28:59 blackbox kernel: [33647.020718]  disk 0, o:0, dev:sdc1
May  1 08:28:59 blackbox kernel: [33647.020722]  disk 1, o:0, dev:sde1
May  1 08:28:59 blackbox kernel: [33647.020726]  disk 2, o:0, dev:sdd1
May  1 08:28:59 blackbox kernel: [33647.067507] RAID conf printout:
May  1 08:28:59 blackbox kernel: [33647.067512]  --- level:5 rd:3 wd:0
May  1 08:28:59 blackbox kernel: [33647.067515]  disk 0, o:0, dev:sdc1
May  1 08:28:59 blackbox kernel: [33647.067517]  disk 2, o:0, dev:sdd1
May  1 08:28:59 blackbox kernel: [33647.067523] RAID conf printout:
May  1 08:28:59 blackbox kernel: [33647.067525]  --- level:5 rd:3 wd:0
May  1 08:28:59 blackbox kernel: [33647.067527]  disk 0, o:0, dev:sdc1
May  1 08:28:59 blackbox kernel: [33647.067529]  disk 2, o:0, dev:sdd1
May  1 08:28:59 blackbox kernel: [33647.127449] RAID conf printout:
May  1 08:28:59 blackbox kernel: [33647.127453]  --- level:5 rd:3 wd:0
May  1 08:28:59 blackbox kernel: [33647.127456]  disk 2, o:0, dev:sdd1
May  1 08:28:59 blackbox kernel: [33647.127461] RAID conf printout:
May  1 08:28:59 blackbox kernel: [33647.127463]  --- level:5 rd:3 wd:0
May  1 08:28:59 blackbox kernel: [33647.127465]  disk 2, o:0, dev:sdd1
May  1 08:28:59 blackbox kernel: [33647.167454] RAID conf printout:
May  1 08:28:59 blackbox kernel: [33647.167459]  --- level:5 rd:3 wd:0
May  1 08:28:59 blackbox kernel: [33647.167474] Buffer I/O error on 
device dm-0, logical block 1714946056
May  1 08:28:59 blackbox kernel: [33647.168557] lost page write due to 
I/O error on dm-0
May  1 08:28:59 blackbox kernel: [33647.168641] Buffer I/O error on 
device dm-0, logical block 1714946057
May  1 08:28:59 blackbox kernel: [33647.170230] lost page write due to 
I/O error on dm-0
May  1 08:28:59 blackbox kernel: [33647.170298] Buffer I/O error on 
device dm-0, logical block 1714946058
May  1 08:28:59 blackbox kernel: [33647.171896] lost page write due to 
I/O error on dm-0
May  1 08:28:59 blackbox kernel: [33647.171962] Buffer I/O error on 
device dm-0, logical block 1714946059
May  1 08:28:59 blackbox kernel: [33647.173396] lost page write due to 
I/O error on dm-0
May  1 08:28:59 blackbox kernel: [33647.173486] Buffer I/O error on 
device dm-0, logical block 1714946061
May  1 08:28:59 blackbox kernel: [33647.174512] lost page write due to 
I/O error on dm-0
May  1 08:28:59 blackbox kernel: [33647.174575] Buffer I/O error on 
device dm-0, logical block 1714946060
May  1 08:28:59 blackbox kernel: [33647.174605] Buffer I/O error on 
device dm-0, logical block 902467307
May  1 08:28:59 blackbox kernel: [33647.174608] lost page write due to 
I/O error on dm-0
May  1 08:28:59 blackbox kernel: [33647.176545] lost page write due to 
I/O error on dm-0
May  1 08:28:59 blackbox kernel: [33647.176646] Buffer I/O error on 
device dm-0, logical block 999292932
May  1 08:28:59 blackbox kernel: [33647.177560] lost page write due to 
I/O error on dm-0
May  1 08:28:59 blackbox kernel: [33647.177738] EXT4-fs (dm-0): previous 
I/O error to superblock detected
May  1 08:28:59 blackbox kernel: [33647.178680] EXT4-fs error (device 
dm-0): ext4_put_super:818: Couldn't clean up the journal
May  1 08:29:06 blackbox mdadm[1876]: Fail event detected on md device 
/dev/md1, component device /dev/sdc1
May  1 08:29:11 blackbox mdadm[1876]: Fail event detected on md device 
/dev/md1, component device /dev/sde1
May  1 08:29:17 blackbox mdadm[1876]: Fail event detected on md device 
/dev/md1, component device /dev/sdd1

The /dev/md0 array is now corrupt.  The /dev/md1 array appears fine,
but because the LV spans both arrays it is not usable without /dev/md0.
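
For reference, the CDB lines in the log above can be decoded by hand to check them against the reported sectors. The following is a minimal Python sketch, assuming the standard 10-byte READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2 to 5, transfer length in bytes 7 to 8). The function name `decode_cdb10` is made up for illustration and is not part of any tool.

```python
def decode_cdb10(cdb_hex):
    """Decode a 10-byte SCSI READ(10)/WRITE(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    # Only the two opcodes that appear in the log are named here.
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    return {
        "op": opcodes.get(cdb[0], "unknown"),
        "lba": int.from_bytes(cdb[2:6], "big"),     # logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


# CDB bytes copied from the log: the read on sdf/sdg/sdh and the
# write on sde/sdc/sdd.
read_cmd = decode_cdb10("28 00 a9 d5 56 47 00 00 08 00")
write_cmd = decode_cdb10("2a 00 74 70 59 3f 00 00 08 00")
print(read_cmd)
print(write_cmd)
```

The decoded LBAs (2849330759 and 1953519935) agree with the sector numbers in the corresponding end_request lines, and each command covers 8 blocks.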

Each drive that was previously in /dev/md0 has the following output:

mdadm --examine /dev/sdh1
/dev/sdh1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 00000000:00000000:00000000:00000000
   Creation Time : Tue May  1 14:44:06 2012
      Raid Level : -unknown-
    Raid Devices : 0
   Total Devices : 2
Preferred Minor : 0

     Update Time : Tue May  1 16:24:56 2012
           State : active
  Active Devices : 0
Working Devices : 2
  Failed Devices : 0
   Spare Devices : 2
        Checksum : bccafbfb - correct
          Events : 1


       Number   Major   Minor   RaidDevice State
this     0       8      113        0      spare   /dev/sdh1

    0     0       8      113        0      spare   /dev/sdh1
    1     1       8       81        1      spare   /dev/sdf1


i.e. the Raid Level is -unknown- and the UUID is
00000000:00000000:00000000:00000000
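
To check several member devices for the same symptom, the saved text of `mdadm --examine` can be scanned for these two fields. The following is a rough Python sketch that only parses output in the format shown above. The function name `superblock_looks_blank` is hypothetical, and the sample is an excerpt of the output above rather than a new run.

```python
def superblock_looks_blank(examine_text):
    """Return True if the output shows an all-zero UUID or an unknown level."""
    fields = {}
    for line in examine_text.splitlines():
        # Lines of interest have the form "<name> : <value>".
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    uuid = fields.get("UUID", "")
    zero_uuid = uuid != "" and set(uuid.replace(":", "")) == {"0"}
    unknown_level = fields.get("Raid Level") == "-unknown-"
    return zero_uuid or unknown_level


# Excerpt of the --examine output for /dev/sdh1 quoted above.
sample = """\
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 00000000:00000000:00000000:00000000
  Creation Time : Tue May  1 14:44:06 2012
     Raid Level : -unknown-
   Raid Devices : 0
"""
print(superblock_looks_blank(sample))
```

This only flags the symptom described here. It does not read the devices itself and says nothing about whether the data is recoverable.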

This appears to be quite a major bug.  Is it known, and is there any
way I can recover my data?



Regards,
Andrew


