* Assembly failure
@ 2012-07-10 16:33 Brian Candler
  2012-07-10 16:48 ` Sebastian Riemer
  2012-07-10 17:05 ` pants
  0 siblings, 2 replies; 16+ messages in thread
From: Brian Candler @ 2012-07-10 16:33 UTC (permalink / raw)
To: linux-raid

An odd one here.

Ubuntu 12.04 system, updated to 3.4.0 kernel from the mainline-ppa. Machine
has a boot disk plus 12 other disks in a RAID10 far2 array.

System was working fine, but after most recent reboot mdraid failed to
assemble.

root@dev-storage1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdk[9](S) sdl[10](S) sdj[8](S) sdg[5](S) sdc[1](S) sdm[11](S) sdf[4](S) sde[3](S) sdi[7](S) sdd[2](S) sdb[0](S) sdh[6](S)
      35163186720 blocks super 1.2

unused devices: <none>

dmesg shows periodic "export_rdev" messages:

root@dev-storage1:~# dmesg | grep md:
[  953.986401] md: export_rdev(sdo)
[  953.988515] md: export_rdev(sdo)
[  960.237392] md: export_rdev(sdp)
[  960.241928] md: export_rdev(sdp)
[  960.965132] md: export_rdev(sdr)
[  960.967265] md: export_rdev(sdr)
[ 1012.573415] md: export_rdev(sdo)
[ 1012.575650] md: export_rdev(sdo)
[ 1012.829690] md: export_rdev(sdp)
[ 1012.831493] md: export_rdev(sdp)
...
[19378.332473] md: export_rdev(sds)
[19378.333764] md: export_rdev(sds)
[19417.220171] md: export_rdev(sdr)
[19417.221748] md: export_rdev(sdr)
[23739.824227] md: export_rdev(sdr)
[23739.825554] md: export_rdev(sdr)
[23740.568940] md: export_rdev(sds)
[23740.570079] md: export_rdev(sds)

metadata (see below) suggests that some drives think members 1/3/4 are
missing, but those drives think the array is fine. The "Events" counts are
different on some members though.

What's the best thing to do here - attempt to force assembly? Any ideas how
it got into this state?

The machine was rebooted a couple of times but in what should have been a
clean way, i.e. sudo reboot or sudo halt -p.

Many thanks,

Brian.


root@dev-storage1:~# for i in /dev/sd{j..u}; do echo "=== $i ==="; mdadm --examine $i; done
=== /dev/sdj ===
/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e96965d5:bf8986b7:fa83b813:e27aa17f

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 5673ac95 - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 8
    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdk ===
/dev/sdk:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f15bb47f:85ca59b1:cad42dec:f8b1b63c

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 9020674a - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 9
    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdl ===
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e2d82a4c:8409d883:cf2d9b7c:83829aad

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 30664ccb - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 10
    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdm ===
/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e3603570:8e767487:63f3131b:afe358ea

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 33446897 - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 11
    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdn ===
/dev/sdn:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 549a5230:005cedf8:b37a0d7e:36648ff0

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : ce0a8f46 - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 0
    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdo ===
/dev/sdo:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b09f3869:09ce8a89:31ed7097:d3621064

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Jul  7 14:02:07 2012
       Checksum : 246eb119 - correct
         Events : 29355

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 2
    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdp ===
/dev/sdp:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 71494ee2:1504b35a:00a1d927:543db7c6

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Jul  7 14:00:55 2012
       Checksum : 61c11eeb - correct
         Events : 29352

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 3
    Array State : A.AA.AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdq ===
/dev/sdq:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 17fcd5b8:97fc715f:0877d022:3770d08b

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : fd1699fc - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 7
    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdr ===
/dev/sdr:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e0565412:a68cf236:9da9a141:e89f935a

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul  3 04:21:50 2012
       Checksum : 1ba72ec1 - correct
         Events : 20228

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 1
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
=== /dev/sds ===
/dev/sds:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : c0f3bbab:70f3e69e:1314cdca:072d5ad9

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul  3 06:37:33 2012
       Checksum : 24f36d4c - correct
         Events : 29312

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 5
    Array State : A.AA.AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdt ===
/dev/sdt:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6b4f9595:4de46aa8:fa695fe8:59b797e7

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 10 10:46:19 2012
       Checksum : 44250f1b - correct
         Events : 29374

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 6
    Array State : A.A..AAAAAAA ('A' == active, '.' == missing)
=== /dev/sdu ===
/dev/sdu:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 16b260fd:e49bd157:da886cd0:5394e194
           Name : storage1:storage1
  Creation Time : Thu Jun  7 13:51:21 2012
     Raid Level : raid10
   Raid Devices : 12

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 35163168768 (16767.11 GiB 18003.54 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 46f03854:de80bec8:b44b062c:dd265ba3

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul  3 06:36:18 2012
       Checksum : 22812848 - correct
         Events : 29272

         Layout : far=2
     Chunk Size : 1024K

    Device Role : Active device 4
    Array State : A.AAAAAAAAAA ('A' == active, '.' == missing)

^ permalink raw reply	[flat|nested] 16+ messages in thread
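A compact way to pull out just the per-member fields that matter for this
diagnosis - a sketch, assuming the same /dev/sd{j..u} device names as above:

  for i in /dev/sd{j..u}; do
    printf '%s: ' "$i"
    # one line per member: last superblock update, event count, role, state
    mdadm --examine "$i" | egrep 'Update Time|Events|Device Role|Array State' | tr -s ' \n' ' '
    echo
  done

Members whose "Events" count lags behind the highest value are the ones md
will refuse to trust at assembly time.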
* Re: Assembly failure
  2012-07-10 16:33 Assembly failure Brian Candler
@ 2012-07-10 16:48 ` Sebastian Riemer
  2012-07-10 17:06   ` Brian Candler
  2012-07-10 17:05 ` pants
  1 sibling, 1 reply; 16+ messages in thread
From: Sebastian Riemer @ 2012-07-10 16:48 UTC (permalink / raw)
To: Brian Candler; +Cc: linux-raid

1. Are you crazy to do that? Kernel 3.2 is the stable kernel for Ubuntu 12.04.

2. Please provide the complete Ubuntu version number of your kernel so
that we can look for the commits in the Ubuntu Git. It should be
git://kernel.ubuntu.com/ppisati/ubuntu-quantal.git.
There were some nasty bugs in 3.4.0 mainline - I don't know whether the
fixes for them are in your kernel.

Cheers,
Sebastian

On 10.07.2012 18:33, Brian Candler wrote:
> An odd one here.
>
> Ubuntu 12.04 system, updated to 3.4.0 kernel from the mainline-ppa. Machine
> has a boot disk plus 12 other disks in a RAID10 far2 array.
>
> System was working fine, but after most recent reboot mdraid failed to
> assemble.
>
> [... /proc/mdstat, dmesg and full mdadm --examine output snipped ...]
^ permalink raw reply	[flat|nested] 16+ messages in thread
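Sebastian's point about post-3.4.0 fixes can be checked directly against the
stable tree; a sketch, assuming v3.4.4 was the newest 3.4.y tag at the time:

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  cd linux-stable
  # md fixes that went into the 3.4 stable series after the 3.4.0 release
  git log --oneline v3.4..v3.4.4 -- drivers/md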
* Re: Assembly failure
  2012-07-10 16:48 ` Sebastian Riemer
@ 2012-07-10 17:06   ` Brian Candler
  2012-07-10 17:38     ` Sebastian Riemer
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Candler @ 2012-07-10 17:06 UTC (permalink / raw)
To: Sebastian Riemer; +Cc: linux-raid

On Tue, Jul 10, 2012 at 06:48:00PM +0200, Sebastian Riemer wrote:
> 1. Are you crazy to do that? Kernel 3.2 is the stable kernel for Ubuntu 12.04.

Possibly crazy :-) Specifically, I had been testing whether the
direct-io-enable option for glusterfs would be helpful (it wasn't) - this
required FUSE support for O_DIRECT, which is not in 3.2.0.

> 2. Please provide the complete Ubuntu version number of your kernel so
> that we can look for the commits in the Ubuntu Git.

brian@dev-storage1:~$ uname -a
Linux dev-storage1.example.com 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Packages:

ii  linux-headers-3.4.0-030400          3.4.0-030400.201205210521  Header files related to Linux kernel version 3.4.0
ii  linux-headers-3.4.0-030400-generic  3.4.0-030400.201205210521  Linux kernel headers for version 3.4.0 on 32 bit x86 SMP
ii  linux-image-3.4.0-030400-generic    3.4.0-030400.201205210521  Linux kernel image for version 3.4.0 on 32 bit x86 SMP

It was downloaded from here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-precise/

There doesn't seem to be any newer 3.4.x for precise.

> It should be git://kernel.ubuntu.com/ppisati/ubuntu-quantal.git.
> There were some nasty bugs in 3.4.0 mainline - I don't know whether the
> fixes for them are in your kernel.

I have no problem rolling back to 3.2.0, but I'm also very happy to do any
diagnostics which may be helpful before I do so.

Regards,

Brian.

> Cheers,
> Sebastian
>
> On 10.07.2012 18:33, Brian Candler wrote:
> > [... original problem report and --examine output snipped ...]
^ permalink raw reply	[flat|nested] 16+ messages in thread
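Rolling back from a mainline-PPA kernel on Ubuntu is just a package removal;
a sketch, using the package names quoted above:

  sudo apt-get remove linux-image-3.4.0-030400-generic \
      linux-headers-3.4.0-030400-generic linux-headers-3.4.0-030400
  sudo update-grub   # refresh the boot menu so 3.2.0 is the default again
  sudo reboot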
* Re: Assembly failure
  2012-07-10 17:06 ` Brian Candler
@ 2012-07-10 17:38   ` Sebastian Riemer
  2012-07-10 18:59     ` Brian Candler
  0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Riemer @ 2012-07-10 17:38 UTC (permalink / raw)
To: Brian Candler; +Cc: linux-raid

Your kernel is similar to v3.4 mainline - it was compiled one day after
Linus tagged v3.4, and that kernel has major issues. Please reboot into
the old 3.2 kernel.

Your kernel has no tag in the Ubuntu Git repos!

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=tags
http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-quantal.git;a=tags

Your kernel is absolutely unstable. Who built this kernel? It can't be
an official release!

Cheers,
Sebastian

On 10.07.2012 19:06, Brian Candler wrote:
> On Tue, Jul 10, 2012 at 06:48:00PM +0200, Sebastian Riemer wrote:
>> 1. Are you crazy to do that? Kernel 3.2 is the stable kernel for Ubuntu 12.04.
> Possibly crazy :-) Specifically, I had been testing whether the
> direct-io-enable option for glusterfs would be helpful (it wasn't) - this
> required FUSE support for O_DIRECT, which is not in 3.2.0.
>
> brian@dev-storage1:~$ uname -a
> Linux dev-storage1.example.com 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> It was downloaded from here:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-precise/
>
> I have no problem rolling back to 3.2.0, but I'm also very happy to do any
> diagnostics which may be helpful before I do so.
>
> [... remainder of quote snipped ...]
--
Sebastian Riemer
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
10405 Berlin, Germany

Tel.:  +49 - 30 - 60 98 56 991 - 303
Fax:   +49 - 30 - 51 64 09 22
Email: sebastian.riemer@profitbricks.com
Web:   http://www.profitbricks.com/

Registered office: Berlin
Registration court: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Andreas Gauger, Achim Weiss

^ permalink raw reply	[flat|nested] 16+ messages in thread
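The "no tag" claim can be verified without cloning anything; a sketch,
assuming the gitweb repos above are also exported over the git protocol:

  git ls-remote --tags git://kernel.ubuntu.com/ubuntu/ubuntu-precise.git | grep -- '-3\.4'
  git ls-remote --tags git://kernel.ubuntu.com/ubuntu/ubuntu-quantal.git | grep -- '-3\.4'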
* Re: Assembly failure
  2012-07-10 17:38 ` Sebastian Riemer
@ 2012-07-10 18:59   ` Brian Candler
  2012-07-11  2:43     ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Candler @ 2012-07-10 18:59 UTC (permalink / raw)
To: Sebastian Riemer; +Cc: linux-raid

On Tue, Jul 10, 2012 at 07:38:51PM +0200, Sebastian Riemer wrote:
> Your kernel is similar to v3.4 mainline - it was compiled one day after
> Linus tagged v3.4, and that kernel has major issues. Please reboot into
> the old 3.2 kernel.
>
> Your kernel has no tag in the Ubuntu Git repos!
>
> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=tags
> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-quantal.git;a=tags
>
> Your kernel is absolutely unstable. Who built this kernel? It can't be
> an official release!

I don't know who builds the ~kernel-ppa packages.

Anyway, the box is now on linux-image-3.2.0-24-generic. Same problem:

brian@dev-storage1:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdm[1](S) sdg[5](S) sdh[4](S) sdd[3](S) sdj[9](S) sdl[11](S) sdi[8](S) sdk[10](S) sdb[0](S) sde[7](S) sdf[6](S) sdc[2](S)
      35163186720 blocks super 1.2

unused devices: <none>

What's my best next step? There's nothing critical on here, but I would
like to use this as practice at recovering a broken md raid volume.

Regards,

Brian.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Assembly failure
  2012-07-10 18:59 ` Brian Candler
@ 2012-07-11  2:43   ` NeilBrown
  2012-07-11  7:58     ` Brian Candler
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2012-07-11  2:43 UTC (permalink / raw)
To: Brian Candler; +Cc: Sebastian Riemer, linux-raid

On Tue, 10 Jul 2012 19:59:27 +0100 Brian Candler <B.Candler@pobox.com> wrote:

> [... kernel version discussion snipped ...]
>
> Anyway, the box is now on linux-image-3.2.0-24-generic. Same problem:
>
> brian@dev-storage1:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md127 : inactive sdm[1](S) sdg[5](S) sdh[4](S) sdd[3](S) sdj[9](S) sdl[11](S) sdi[8](S) sdk[10](S) sdb[0](S) sde[7](S) sdf[6](S) sdc[2](S)
>       35163186720 blocks super 1.2
>
> unused devices: <none>
>
> What's my best next step? There's nothing critical on here, but I would
> like to use this as practice at recovering a broken md raid volume.

mdadm -S /dev/md127

Then assemble again with "--force" as you expected.
Don't try to --create --assume-clean, it isn't needed.

And don't worry too much about the kernel - though keep away from any Ubuntu
3.2 kernel before the one you have: there is a nasty bug (unrelated to your
current experience) that you don't want to go near.

When you re-assemble it won't include all the devices in the array - just
enough to make the array functional. You would then need to add the others
back in if you trust them.

As others have suggested, there is probably some hardware problem somewhere.
It looks like sdr failed first, around "Jul  3 04:21:50 2012".
The array continued working until about 06:36, when sdu and then sds failed.
Since then it doesn't look like much, if anything, has been written to the
array - but I cannot be completely certain.

Do you have kernel logs from the morning of 3rd July?

NeilBrown

^ permalink raw reply	[flat|nested] 16+ messages in thread
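Spelled out, the recovery sequence Neil describes looks roughly like this -
a sketch only, with <members> standing for the real component devices and
/dev/sdX for whichever drives mdadm leaves out of the assembly:

  # stop the inactive, half-assembled array
  mdadm -S /dev/md127

  # reassemble, letting mdadm bump the stale event counts on outdated members
  mdadm --assemble --force /dev/md/storage1 <members>

  # once the array runs degraded, re-add each left-out drive you still trust;
  # it will be resynced from the surviving copies
  mdadm /dev/md/storage1 --add /dev/sdX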
* Re: Assembly failure
From: Brian Candler @ 2012-07-11 7:58 UTC
To: NeilBrown; +Cc: Sebastian Riemer, linux-raid

On Wed, Jul 11, 2012 at 12:43:16PM +1000, NeilBrown wrote:
> As others have suggested, there is probably some hardware problem somewhere.
> It looks like sdr failed first, around "Jul 3 04:21:50 2012". The array
> continued working until about 06:36, when sdu and then sds failed. Since
> then it doesn't look like much, if anything, has been written to the
> array - but I cannot be completely certain.

Good spotting! How did you work that out from the `mdadm --examine` output?
I see eight drives in state "clean" and four in state "active". Three have
update times on Jul 3, two on Jul 7, and the rest on Jul 10, but I couldn't
see anything which obviously jumps out as "FAULTY". You're right that the
system hasn't been doing any writes recently.

> Do you have kernel logs from the morning of 3rd July?

Logs at end of mail. It looks like sde failed, then a few seconds later
reattached itself as sdn - and a couple of hours later more things started
to fail (it looks like sdh failed and reattached as sdo, and a minute later
sdf failed and reattached as sdp). Very odd that these devices should fail
so close together; maybe a power glitch? After that, even more drives
apparently started to fail.

SMART shows 7 drives have reported at least one uncorrectable error:

root@dev-storage1:~# for i in /dev/sd?; do smartctl -A $i | grep -i correct; done
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 100 100 000 Old_age Always  - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 092 092 000 Old_age Always  - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 098 098 000 Old_age Always  - 2
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 099 099 000 Old_age Always  - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 100 100 000 Old_age Always  - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 100 100 000 Old_age Always  - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 100 100 000 Old_age Always  - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 100 100 000 Old_age Always  - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 095 095 000 Old_age Always  - 5
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 092 092 000 Old_age Always  - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 098 098 000 Old_age Always  - 2
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect    0x0032 093 093 000 Old_age Always  - 7
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0

And three have very high seek error rates:

root@dev-storage1:~# for i in /dev/sd?; do smartctl -A $i | grep -i seek_error; done
7 Seek_Error_Rate 0x002e 200 200 000 Old_age  Always - 0
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2233869
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2231308
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2199098
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2250443
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2220215
7 Seek_Error_Rate 0x000f 060 059 030 Pre-fail Always - 8592064174
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2238212
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 4297171738
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2170378
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2253934
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2192100
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 4297224551

BTW these are all Seagate ST3000DM001. Yes, I know :-(

So this doesn't look good. Having a go at reassembly anyway:

root@dev-storage1:~# mdadm -S /dev/md127
mdadm: stopped /dev/md127
root@dev-storage1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
root@dev-storage1:~# ls /dev/sd*
/dev/sda   /dev/sda5  /dev/sde  /dev/sdj  /dev/sdn  /dev/sdq
/dev/sda1  /dev/sdb   /dev/sdf  /dev/sdk  /dev/sdo
/dev/sda2  /dev/sdc   /dev/sdi  /dev/sdl  /dev/sdp
root@dev-storage1:~# mdadm --assemble --force /dev/md/storage1 /dev/sd{b,c,e,f,i,j,k,l,n,o,p,q}
mdadm: forcing event count in /dev/sdc(2) from 29355 upto 29374
mdadm: forcing event count in /dev/sdn(3) from 29352 upto 29374
mdadm: forcing event count in /dev/sdq(5) from 29312 upto 29374
mdadm: forcing event count in /dev/sdo(4) from 29272 upto 29374
mdadm: forcing event count in /dev/sdp(1) from 20228 upto 29374
mdadm: clearing FAULTY flag for device 10 in /dev/md/storage1 for /dev/sdp
mdadm: clearing FAULTY flag for device 8 in /dev/md/storage1 for /dev/sdn
mdadm: clearing FAULTY flag for device 9 in /dev/md/storage1 for /dev/sdo
mdadm: Marking array /dev/md/storage1 as 'clean'
mdadm: /dev/md/storage1 assembled from 12 drives - not enough to start the array.
root@dev-storage1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdb[0](S) sdl[11](S) sdk[10](S) sdj[9](S) sdi[8](S) sde[7](S) sdf[6](S) sdq[5](S) sdo[4](S) sdn[3](S) sdc[2](S) sdp[1](S)
      35163186720 blocks super 1.2

unused devices: <none>

> Don't try to --create --assume-clean, it isn't needed.
...
> When you re-assemble it won't include all the devices in the array - just
> enough to make the array functional. You would then need to add the others
> back in if you trust them.

I am guessing that it's not starting the array because devices 8 and 9,
which are two halves of the same mirror pair, were both marked as failed.

This is a test system, but I will exchange at least the three drives with
the high seek error rates.

One final point: I would like to be able to monitor for suspect or failed
drives. Is my best bet to look at /proc/mdstat output and identify drives
which have been kicked out of the array? Or to monitor SMART variables (in
which case I need to decide which ones are the most important to monitor,
and what thresholds to set)?

It would be really useful if the kernel itself kept some per-drive counters
for I/O failures, but if it does, I can't find them:
http://www.kernel.org/doc/Documentation/block/stat.txt

Regards,

Brian.
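On the per-drive I/O error counter question above: Documentation/block/stat.txt
only covers per-device request accounting, but the SCSI layer does keep rough
per-device counters in sysfs which can be polled. A sketch (attribute names are
from the SCSI sysfs interface; availability can vary by kernel version):

# Print the SCSI-layer error counter for each disk (values are in hex)
for d in /sys/block/sd*; do
    printf '%s ioerr_cnt=%s\n' "${d##*/}" "$(cat $d/device/ioerr_cnt 2>/dev/null)"
done

The kernel logs from the morning of 3rd July follow.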
Jul 3 04:22:33 dev-storage1 kernel: [50147.362942] sd 4:0:3:0: [sde] Synchronizing SCSI cache
Jul 3 04:22:33 dev-storage1 kernel: [50147.364646] end_request: I/O error, dev sde, sector 8
Jul 3 04:22:33 dev-storage1 kernel: [50147.364656] md: super_written gets error=-5, uptodate=0
Jul 3 04:22:33 dev-storage1 kernel: [50147.364663] md/raid10:md127: Disk failure on sde, disabling device.
Jul 3 04:22:33 dev-storage1 kernel: [50147.364665] md/raid10:md127: Operation continuing on 11 devices.
Jul 3 04:22:33 dev-storage1 kernel: [50147.364710] sd 4:0:3:0: [sde] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 04:22:33 dev-storage1 kernel: [50147.364881] mpt2sas0: removing handle(0x000c), sas_addr(0x4433221100000000)
Jul 3 04:22:34 dev-storage1 kernel: [50147.437339] RAID10 conf printout:
Jul 3 04:22:34 dev-storage1 kernel: [50147.437344] --- wd:11 rd:12
Jul 3 04:22:34 dev-storage1 kernel: [50147.437347] disk 0, wo:0, o:1, dev:sdb
Jul 3 04:22:34 dev-storage1 kernel: [50147.437350] disk 1, wo:1, o:0, dev:sde
Jul 3 04:22:34 dev-storage1 kernel: [50147.437352] disk 2, wo:0, o:1, dev:sdc
Jul 3 04:22:34 dev-storage1 kernel: [50147.437354] disk 3, wo:0, o:1, dev:sdd
Jul 3 04:22:34 dev-storage1 kernel: [50147.437357] disk 4, wo:0, o:1, dev:sdh
Jul 3 04:22:34 dev-storage1 kernel: [50147.437359] disk 5, wo:0, o:1, dev:sdf
Jul 3 04:22:34 dev-storage1 kernel: [50147.437361] disk 6, wo:0, o:1, dev:sdi
Jul 3 04:22:34 dev-storage1 kernel: [50147.437363] disk 7, wo:0, o:1, dev:sdg
Jul 3 04:22:34 dev-storage1 kernel: [50147.437366] disk 8, wo:0, o:1, dev:sdj
Jul 3 04:22:34 dev-storage1 kernel: [50147.437368] disk 9, wo:0, o:1, dev:sdk
Jul 3 04:22:34 dev-storage1 kernel: [50147.437370] disk 10, wo:0, o:1, dev:sdl
Jul 3 04:22:34 dev-storage1 kernel: [50147.437372] disk 11, wo:0, o:1, dev:sdm
Jul 3 04:22:34 dev-storage1 kernel: [50147.437429] RAID10 conf printout:
Jul 3 04:22:34 dev-storage1 kernel: [50147.437434] --- wd:11 rd:12
Jul 3 04:22:34 dev-storage1 kernel: [50147.437437] disk 0, wo:0, o:1, dev:sdb
Jul 3 04:22:34 dev-storage1 kernel: [50147.437439] disk 2, wo:0, o:1, dev:sdc
Jul 3 04:22:34 dev-storage1 kernel: [50147.437441] disk 3, wo:0, o:1, dev:sdd
Jul 3 04:22:34 dev-storage1 kernel: [50147.437444] disk 4, wo:0, o:1, dev:sdh
Jul 3 04:22:34 dev-storage1 kernel: [50147.437446] disk 5, wo:0, o:1, dev:sdf
Jul 3 04:22:34 dev-storage1 kernel: [50147.437448] disk 6, wo:0, o:1, dev:sdi
Jul 3 04:22:34 dev-storage1 kernel: [50147.437450] disk 7, wo:0, o:1, dev:sdg
Jul 3 04:22:34 dev-storage1 kernel: [50147.437452] disk 8, wo:0, o:1, dev:sdj
Jul 3 04:22:34 dev-storage1 kernel: [50147.437454] disk 9, wo:0, o:1, dev:sdk
Jul 3 04:22:34 dev-storage1 kernel: [50147.437457] disk 10, wo:0, o:1, dev:sdl
Jul 3 04:22:34 dev-storage1 kernel: [50147.437459] disk 11, wo:0, o:1, dev:sdm
Jul 3 04:22:47 dev-storage1 kernel: [50161.168292] scsi 4:0:8:0: Direct-Access ATA ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul 3 04:22:47 dev-storage1 kernel: [50161.168302] scsi 4:0:8:0: SATA: handle(0x000c), sas_addr(0x4433221100000000), phy(0), device_name(0x5000c5004a37a3ae)
Jul 3 04:22:47 dev-storage1 kernel: [50161.168307] scsi 4:0:8:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(3)
Jul 3 04:22:47 dev-storage1 kernel: [50161.168457] scsi 4:0:8:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul 3 04:22:47 dev-storage1 kernel: [50161.168465] scsi 4:0:8:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul 3 04:22:47 dev-storage1 kernel: [50161.168686] sd 4:0:8:0: Attached scsi generic sg4 type 0
Jul 3 04:22:47 dev-storage1 kernel: [50161.170776] sd 4:0:8:0: [sdn] physical block alignment offset: 4096
Jul 3 04:22:47 dev-storage1 kernel: [50161.170783] sd 4:0:8:0: [sdn] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul 3 04:22:47 dev-storage1 kernel: [50161.170786] sd 4:0:8:0: [sdn] 4096-byte physical blocks
Jul 3 04:22:47 dev-storage1 kernel: [50161.233278] sd 4:0:8:0: [sdn] Write Protect is off
Jul 3 04:22:47 dev-storage1 kernel: [50161.233283] sd 4:0:8:0: [sdn] Mode Sense: 7f 00 00 08
Jul 3 04:22:47 dev-storage1 kernel: [50161.233981] sd 4:0:8:0: [sdn] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 3 04:22:47 dev-storage1 kernel: [50161.325272] sdn: unknown partition table
Jul 3 04:22:47 dev-storage1 kernel: [50161.392086] sd 4:0:8:0: [sdn] Attached SCSI disk
Jul 3 06:37:00 dev-storage1 kernel: [58191.291518] sd 4:0:6:0: [sdh] Synchronizing SCSI cache
Jul 3 06:37:00 dev-storage1 kernel: [58191.291586] sd 4:0:6:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 06:37:00 dev-storage1 kernel: [58191.291975] md: super_written gets error=-19, uptodate=0
Jul 3 06:37:00 dev-storage1 kernel: [58191.291982] md/raid10:md127: Disk failure on sdh, disabling device.
Jul 3 06:37:00 dev-storage1 kernel: [58191.291985] md/raid10:md127: Operation continuing on 10 devices.
Jul 3 06:37:00 dev-storage1 kernel: [58191.292675] mpt2sas0: removing handle(0x000f), sas_addr(0x4433221105000000)
Jul 3 06:37:00 dev-storage1 kernel: [58191.363950] RAID10 conf printout:
Jul 3 06:37:00 dev-storage1 kernel: [58191.363955] --- wd:10 rd:12
Jul 3 06:37:00 dev-storage1 kernel: [58191.363958] disk 0, wo:0, o:1, dev:sdb
Jul 3 06:37:00 dev-storage1 kernel: [58191.363961] disk 2, wo:0, o:1, dev:sdc
Jul 3 06:37:00 dev-storage1 kernel: [58191.363963] disk 3, wo:0, o:1, dev:sdd
Jul 3 06:37:00 dev-storage1 kernel: [58191.363965] disk 4, wo:1, o:0, dev:sdh
Jul 3 06:37:00 dev-storage1 kernel: [58191.363967] disk 5, wo:0, o:1, dev:sdf
Jul 3 06:37:00 dev-storage1 kernel: [58191.363970] disk 6, wo:0, o:1, dev:sdi
Jul 3 06:37:00 dev-storage1 kernel: [58191.363972] disk 7, wo:0, o:1, dev:sdg
Jul 3 06:37:00 dev-storage1 kernel: [58191.363974] disk 8, wo:0, o:1, dev:sdj
Jul 3 06:37:00 dev-storage1 kernel: [58191.363976] disk 9, wo:0, o:1, dev:sdk
Jul 3 06:37:00 dev-storage1 kernel: [58191.363979] disk 10, wo:0, o:1, dev:sdl
Jul 3 06:37:00 dev-storage1 kernel: [58191.363981] disk 11, wo:0, o:1, dev:sdm
Jul 3 06:37:00 dev-storage1 kernel: [58191.364014] RAID10 conf printout:
Jul 3 06:37:00 dev-storage1 kernel: [58191.364018] --- wd:10 rd:12
Jul 3 06:37:00 dev-storage1 kernel: [58191.364021] disk 0, wo:0, o:1, dev:sdb
Jul 3 06:37:00 dev-storage1 kernel: [58191.364024] disk 2, wo:0, o:1, dev:sdc
Jul 3 06:37:00 dev-storage1 kernel: [58191.364026] disk 3, wo:0, o:1, dev:sdd
Jul 3 06:37:00 dev-storage1 kernel: [58191.364028] disk 5, wo:0, o:1, dev:sdf
Jul 3 06:37:00 dev-storage1 kernel: [58191.364030] disk 6, wo:0, o:1, dev:sdi
Jul 3 06:37:00 dev-storage1 kernel: [58191.364033] disk 7, wo:0, o:1, dev:sdg
Jul 3 06:37:00 dev-storage1 kernel: [58191.364035] disk 8, wo:0, o:1, dev:sdj
Jul 3 06:37:00 dev-storage1 kernel: [58191.364037] disk 9, wo:0, o:1, dev:sdk
Jul 3 06:37:00 dev-storage1 kernel: [58191.364039] disk 10, wo:0, o:1, dev:sdl
Jul 3 06:37:00 dev-storage1 kernel: [58191.364041] disk 11, wo:0, o:1, dev:sdm
Jul 3 06:37:14 dev-storage1 kernel: [58204.853102] scsi 4:0:9:0: Direct-Access ATA ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul 3 06:37:14 dev-storage1 kernel: [58204.853112] scsi 4:0:9:0: SATA: handle(0x000f), sas_addr(0x4433221105000000), phy(5), device_name(0x5000c5004a44edbe)
Jul 3 06:37:14 dev-storage1 kernel: [58204.853116] scsi 4:0:9:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(6)
Jul 3 06:37:14 dev-storage1 kernel: [58204.853292] scsi 4:0:9:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul 3 06:37:14 dev-storage1 kernel: [58204.853299] scsi 4:0:9:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul 3 06:37:14 dev-storage1 kernel: [58204.853491] sd 4:0:9:0: Attached scsi generic sg7 type 0
Jul 3 06:37:14 dev-storage1 kernel: [58204.853882] sd 4:0:9:0: [sdo] physical block alignment offset: 4096
Jul 3 06:37:14 dev-storage1 kernel: [58204.853892] sd 4:0:9:0: [sdo] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul 3 06:37:14 dev-storage1 kernel: [58204.853897] sd 4:0:9:0: [sdo] 4096-byte physical blocks
Jul 3 06:37:14 dev-storage1 kernel: [58204.920533] sd 4:0:9:0: [sdo] Write Protect is off
Jul 3 06:37:14 dev-storage1 kernel: [58204.920539] sd 4:0:9:0: [sdo] Mode Sense: 7f 00 00 08
Jul 3 06:37:14 dev-storage1 kernel: [58204.949593] sd 4:0:9:0: [sdo] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 3 06:37:14 dev-storage1 kernel: [58205.077376] sdo: unknown partition table
Jul 3 06:37:14 dev-storage1 kernel: [58205.145744] sd 4:0:9:0: [sdo] Attached SCSI disk
Jul 3 06:38:15 dev-storage1 kernel: [58266.076919] sd 4:0:4:0: [sdf] Synchronizing SCSI cache
Jul 3 06:38:15 dev-storage1 kernel: [58266.077767] sd 4:0:4:0: [sdf] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 3 06:38:15 dev-storage1 kernel: [58266.078218] mpt2sas0: removing handle(0x000d), sas_addr(0x4433221104000000)
Jul 3 06:38:15 dev-storage1 kernel: [58266.078681] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:16 dev-storage1 kernel: [58266.356847] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:16 dev-storage1 kernel: [58266.381809] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:18 dev-storage1 kernel: [58268.351226] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:18 dev-storage1 kernel: [58268.381174] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:18 dev-storage1 kernel: [58268.606432] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:18 dev-storage1 kernel: [58268.643731] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:20 dev-storage1 kernel: [58270.676566] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:20 dev-storage1 kernel: [58270.727079] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:20 dev-storage1 kernel: [58270.951730] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:20 dev-storage1 kernel: [58270.986091] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:22 dev-storage1 kernel: [58273.021826] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:22 dev-storage1 kernel: [58273.100664] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:22 dev-storage1 kernel: [58273.122092] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:23 dev-storage1 kernel: [58273.324949] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:23 dev-storage1 kernel: [58273.351418] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:24 dev-storage1 kernel: [58274.318169] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:24 dev-storage1 kernel: [58274.351982] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:24 dev-storage1 kernel: [58274.609269] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:24 dev-storage1 kernel: [58274.636725] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:29 dev-storage1 kernel: [58279.573103] scsi 4:0:10:0: Direct-Access ATA ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul 3 06:38:29 dev-storage1 kernel: [58279.573113] scsi 4:0:10:0: SATA: handle(0x000d), sas_addr(0x4433221104000000), phy(4), device_name(0x5000c5004a123b93)
Jul 3 06:38:29 dev-storage1 kernel: [58279.573117] scsi 4:0:10:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(7)
Jul 3 06:38:29 dev-storage1 kernel: [58279.573252] scsi 4:0:10:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul 3 06:38:29 dev-storage1 kernel: [58279.573257] scsi 4:0:10:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul 3 06:38:29 dev-storage1 kernel: [58279.573450] sd 4:0:10:0: Attached scsi generic sg5 type 0
Jul 3 06:38:29 dev-storage1 kernel: [58279.573754] sd 4:0:10:0: [sdp] physical block alignment offset: 4096
Jul 3 06:38:29 dev-storage1 kernel: [58279.573764] sd 4:0:10:0: [sdp] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul 3 06:38:29 dev-storage1 kernel: [58279.573768] sd 4:0:10:0: [sdp] 4096-byte physical blocks
Jul 3 06:38:29 dev-storage1 kernel: [58279.626371] sd 4:0:10:0: [sdp] Write Protect is off
Jul 3 06:38:29 dev-storage1 kernel: [58279.626376] sd 4:0:10:0: [sdp] Mode Sense: 7f 00 00 08
Jul 3 06:38:29 dev-storage1 kernel: [58279.627052] sd 4:0:10:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 3 06:38:29 dev-storage1 kernel: [58279.722780] sdp: unknown partition table
Jul 3 06:38:29 dev-storage1 kernel: [58279.785171] sd 4:0:10:0: [sdp] Attached SCSI disk
Jul 3 06:38:45 dev-storage1 kernel: [58296.071974] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:45 dev-storage1 kernel: [58296.104706] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:45 dev-storage1 kernel: [58296.125410] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:46 dev-storage1 kernel: [58296.347134] md: super_written gets error=-19, uptodate=0
Jul 3 06:38:46 dev-storage1 kernel: [58296.373160] md: super_written gets error=-19, uptodate=0
Jul 3 14:00:27 dev-storage1 kernel: [84721.683301] md: super_written gets error=-19, uptodate=0
Jul 3 14:00:27 dev-storage1 kernel: [84721.717956] md: super_written gets error=-19, uptodate=0
Jul 3 14:00:27 dev-storage1 kernel: [84721.974395] md: super_written gets error=-19, uptodate=0
Jul 3 14:00:27 dev-storage1 kernel: [84721.997145] md: super_written gets error=-19, uptodate=0
Jul 3 14:01:01 dev-storage1 kernel: [84755.610249] md: super_written gets error=-19, uptodate=0
Jul 3 14:01:01 dev-storage1 kernel: [84755.652918] md: super_written gets error=-19, uptodate=0
Jul 3 14:01:01 dev-storage1 kernel: [84755.673944] md: super_written gets error=-19, uptodate=0
Jul 3 14:01:01 dev-storage1 kernel: [84755.694312] quiet_error: 24 callbacks suppressed
Jul 3 14:01:01 dev-storage1 kernel: [84755.694318] Buffer I/O error on device md127, logical block 2067791905
Jul 3 14:01:01 dev-storage1 kernel: [84755.694326] lost page write due to I/O error on md127
Jul 3 14:01:01 dev-storage1 kernel: [84755.897413] md: super_written gets error=-19, uptodate=0
Jul 3 14:01:01 dev-storage1 kernel: [84755.918495] md: super_written gets error=-19, uptodate=0
Jul 4 14:00:34 dev-storage1 kernel: [170882.380831] md: super_written gets error=-19, uptodate=0
Jul 4 14:00:34 dev-storage1 kernel: [170882.415605] md: super_written gets error=-19, uptodate=0
Jul 4 14:00:34 dev-storage1 kernel: [170882.671923] md: super_written gets error=-19, uptodate=0
Jul 4 14:00:34 dev-storage1 kernel: [170882.694429] md: super_written gets error=-19, uptodate=0
Jul 4 14:01:08 dev-storage1 kernel: [170916.387597] md: super_written gets error=-19, uptodate=0
Jul 4 14:01:08 dev-storage1 kernel: [170916.423033] md: super_written gets error=-19, uptodate=0
Jul 4 14:01:08 dev-storage1 kernel: [170916.444517] Buffer I/O error on device md127, logical block 2067791905
Jul 4 14:01:08 dev-storage1 kernel: [170916.444524] lost page write due to I/O error on md127
Jul 4 14:01:08 dev-storage1 kernel: [170916.646795] md: super_written gets error=-19, uptodate=0
Jul 4 14:01:08 dev-storage1 kernel: [170916.667910] md: super_written gets error=-19, uptodate=0
Jul 5 06:25:07 dev-storage1 kernel: [229786.969375] md: super_written gets error=-19, uptodate=0
Jul 5 06:25:07 dev-storage1 kernel: [229787.001133] md: super_written gets error=-19, uptodate=0
Jul 5 06:25:07 dev-storage1 kernel: [229787.264457] md: super_written gets error=-19, uptodate=0
Jul 5 06:25:07 dev-storage1 kernel: [229787.284359] md: super_written gets error=-19, uptodate=0
Jul 5 06:25:41 dev-storage1 kernel: [229821.534543] md: super_written gets error=-19, uptodate=0
Jul 5 06:25:41 dev-storage1 kernel: [229821.568091] md: super_written gets error=-19, uptodate=0
Jul 5 06:25:42 dev-storage1 kernel: [229821.793719] md: super_written gets error=-19, uptodate=0
Jul 5 06:25:42 dev-storage1 kernel: [229821.821779] md: super_written gets error=-19, uptodate=0
Jul 5 14:00:41 dev-storage1 kernel: [257043.062362] md: super_written gets error=-19, uptodate=0
Jul 5 14:00:41 dev-storage1 kernel: [257043.096332] md: super_written gets error=-19, uptodate=0
Jul 5 14:00:41 dev-storage1 kernel: [257043.353502] md: super_written gets error=-19, uptodate=0
Jul 5 14:00:41 dev-storage1 kernel: [257043.372982] md: super_written gets error=-19, uptodate=0
Jul 5 14:01:15 dev-storage1 kernel: [257077.148922] md: super_written gets error=-19, uptodate=0
Jul 5 14:01:15 dev-storage1 kernel: [257077.183874] md: super_written gets error=-19, uptodate=0
Jul 5 14:01:15 dev-storage1 kernel: [257077.205482] Buffer I/O error on device md127, logical block 2067791905
Jul 5 14:01:15 dev-storage1 kernel: [257077.205491] lost page write due to I/O error on md127
Jul 5 14:01:15 dev-storage1 kernel: [257077.408139] md: super_written gets error=-19, uptodate=0
Jul 5 14:01:15 dev-storage1 kernel: [257077.430487] md: super_written gets error=-19, uptodate=0
Jul 6 06:25:08 dev-storage1 kernel: [315941.667994] md: super_written gets error=-19, uptodate=0
Jul 6 06:25:08 dev-storage1 kernel: [315941.701458] md: super_written gets error=-19, uptodate=0
Jul 6 06:25:08 dev-storage1 kernel: [315941.959147] md: super_written gets error=-19, uptodate=0
Jul 6 06:25:08 dev-storage1 kernel: [315941.980677] md: super_written gets error=-19, uptodate=0
Jul 6 06:25:42 dev-storage1 kernel: [315975.882227] md: super_written gets error=-19, uptodate=0
Jul 6 06:25:42 dev-storage1 kernel: [315975.911833] md: super_written gets error=-19, uptodate=0
Jul 6 06:25:42 dev-storage1 kernel: [315976.141403] md: super_written gets error=-19, uptodate=0
Jul 6 06:25:42 dev-storage1 kernel: [315976.169041] md: super_written gets error=-19, uptodate=0
Jul 6 14:00:48 dev-storage1 kernel: [343203.728029] md: super_written gets error=-19, uptodate=0
Jul 6 14:00:48 dev-storage1 kernel: [343203.761823] md: super_written gets error=-19, uptodate=0
Jul 6 14:00:48 dev-storage1 kernel: [343204.019107] md: super_written gets error=-19, uptodate=0
Jul 6 14:00:48 dev-storage1 kernel: [343204.041347] md: super_written gets error=-19, uptodate=0
Jul 6 14:01:22 dev-storage1 kernel: [343238.117700] md: super_written gets error=-19, uptodate=0
Jul 6 14:01:22 dev-storage1 kernel: [343238.153053] md: super_written gets error=-19, uptodate=0
Jul 6 14:01:22 dev-storage1 kernel: [343238.173113] Buffer I/O error on device md127, logical block 2067791905
Jul 6 14:01:22 dev-storage1 kernel: [343238.173123] lost page write due to I/O error on md127
Jul 6 14:01:22 dev-storage1 kernel: [343238.372889] md: super_written gets error=-19, uptodate=0
Jul 6 14:01:22 dev-storage1 kernel: [343238.393954] md: super_written gets error=-19, uptodate=0
Jul 7 06:25:07 dev-storage1 kernel: [402094.372432] md: super_written gets error=-19, uptodate=0
Jul 7 06:25:07 dev-storage1 kernel: [402094.405765] md: super_written gets error=-19, uptodate=0
Jul 7 06:25:07 dev-storage1 kernel: [402094.655548] md: super_written gets error=-19, uptodate=0
Jul 7 06:25:07 dev-storage1 kernel: [402094.676942] md: super_written gets error=-19, uptodate=0
Jul 7 06:25:42 dev-storage1 kernel: [402129.208784] md: super_written gets error=-19, uptodate=0
Jul 7 06:25:42 dev-storage1 kernel: [402129.244327] md: super_written gets error=-19, uptodate=0
Jul 7 06:25:42 dev-storage1 kernel: [402129.467991] md: super_written gets error=-19, uptodate=0
Jul 7 06:25:42 dev-storage1 kernel: [402129.491687] md: super_written gets error=-19, uptodate=0
Jul 7 14:00:55 dev-storage1 kernel: [429364.393602] md: super_written gets error=-19, uptodate=0
Jul 7 14:00:55 dev-storage1 kernel: [429364.443065] md: super_written gets error=-19, uptodate=0
Jul 7 14:01:37 dev-storage1 kernel: [429406.619062] sd 4:0:2:0: [sdd] Synchronizing SCSI cache
Jul 7 14:01:37 dev-storage1 kernel: [429406.619127] sd 4:0:2:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 7 14:01:37 dev-storage1 kernel: [429406.619415] mpt2sas0: removing handle(0x000a), sas_addr(0x4433221102000000)
Jul 7 14:01:37 dev-storage1 kernel: [429406.860149] md: super_written gets error=-19, uptodate=0
Jul 7 14:01:37 dev-storage1 kernel: [429406.860155] md/raid10:md127: Disk failure on sdd, disabling device.
Jul 7 14:01:37 dev-storage1 kernel: [429406.860157] md/raid10:md127: Operation continuing on 9 devices.
Jul 7 14:01:37 dev-storage1 kernel: [429406.860177] md: super_written gets error=-19, uptodate=0
Jul 7 14:01:37 dev-storage1 kernel: [429406.888238] md: super_written gets error=-19, uptodate=0
Jul 7 14:01:37 dev-storage1 kernel: [429406.909143] md: super_written gets error=-19, uptodate=0
Jul 7 14:01:37 dev-storage1 kernel: [429406.930558] md: super_written gets error=-19, uptodate=0
Jul 7 14:01:37 dev-storage1 kernel: [429406.951439] RAID10 conf printout:
Jul 7 14:01:37 dev-storage1 kernel: [429406.951443] --- wd:9 rd:12
Jul 7 14:01:37 dev-storage1 kernel: [429406.951446] disk 0, wo:0, o:1, dev:sdb
Jul 7 14:01:37 dev-storage1 kernel: [429406.951449] disk 2, wo:0, o:1, dev:sdc
Jul 7 14:01:37 dev-storage1 kernel: [429406.951451] disk 3, wo:1, o:0, dev:sdd
Jul 7 14:01:37 dev-storage1 kernel: [429406.951453] disk 5, wo:0, o:1, dev:sdf
Jul 7 14:01:37 dev-storage1 kernel: [429406.951456] disk 6, wo:0, o:1, dev:sdi
Jul 7 14:01:37 dev-storage1 kernel: [429406.951458] disk 7, wo:0, o:1, dev:sdg
Jul 7 14:01:37 dev-storage1 kernel: [429406.951460] disk 8, wo:0, o:1, dev:sdj
Jul 7 14:01:37 dev-storage1 kernel: [429406.951462] disk 9, wo:0, o:1, dev:sdk
Jul 7 14:01:37 dev-storage1 kernel: [429406.951465] disk 10, wo:0, o:1, dev:sdl
Jul 7 14:01:37 dev-storage1 kernel: [429406.951467] disk 11, wo:0, o:1, dev:sdm
Jul 7 14:01:37 dev-storage1 kernel: [429406.951527] RAID10 conf printout:
Jul 7 14:01:37 dev-storage1 kernel: [429406.951532] --- wd:9 rd:12
Jul 7 14:01:37 dev-storage1 kernel: [429406.951535] disk 0, wo:0, o:1, dev:sdb
Jul 7 14:01:37 dev-storage1 kernel: [429406.951537] disk 2, wo:0, o:1, dev:sdc
Jul 7 14:01:37 dev-storage1 kernel: [429406.951540] disk 5, wo:0, o:1, dev:sdf
Jul 7 14:01:37 dev-storage1 kernel: [429406.951542] disk 6, wo:0, o:1, dev:sdi
Jul 7 14:01:37 dev-storage1 kernel: [429406.951544] disk 7, wo:0, o:1, dev:sdg
Jul 7 14:01:37 dev-storage1 kernel: [429406.951546] disk 8, wo:0, o:1, dev:sdj
Jul 7 14:01:37 dev-storage1 kernel: [429406.951549] disk 9, wo:0, o:1, dev:sdk
Jul 7 14:01:37 dev-storage1 kernel: [429406.951551] disk 10, wo:0, o:1, dev:sdl
Jul 7 14:01:37 dev-storage1 kernel: [429406.951553] disk 11, wo:0, o:1, dev:sdm
Jul 7 14:01:51 dev-storage1 kernel: [429420.898133] scsi 4:0:11:0: Direct-Access ATA ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul 7 14:01:51 dev-storage1 kernel: [429420.898143] scsi 4:0:11:0: SATA: handle(0x000a), sas_addr(0x4433221102000000), phy(2), device_name(0x5000c5004a44f42a)
Jul 7 14:01:51 dev-storage1 kernel: [429420.898147] scsi 4:0:11:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(1)
Jul 7 14:01:51 dev-storage1 kernel: [429420.898290] scsi 4:0:11:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul 7 14:01:51 dev-storage1 kernel: [429420.898297] scsi 4:0:11:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul 7 14:01:51 dev-storage1 kernel: [429420.898514] sd 4:0:11:0: Attached scsi generic sg3 type 0
Jul 7 14:01:51 dev-storage1 kernel: [429420.898858] sd 4:0:11:0: [sdq] physical block alignment offset: 4096
Jul 7 14:01:51 dev-storage1 kernel: [429420.898867] sd 4:0:11:0: [sdq] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul 7 14:01:51 dev-storage1 kernel: [429420.898872] sd 4:0:11:0: [sdq] 4096-byte physical blocks
Jul 7 14:01:51 dev-storage1 kernel: [429420.957177] sd 4:0:11:0: [sdq] Write Protect is off
Jul 7 14:01:51 dev-storage1 kernel: [429420.957183] sd 4:0:11:0: [sdq] Mode Sense: 7f 00 00 08
Jul 7 14:01:51 dev-storage1 kernel: [429420.957882] sd 4:0:11:0: [sdq] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 7 14:01:51 dev-storage1 kernel: [429421.030777] sdq: unknown partition table
Jul 7 14:01:51 dev-storage1 kernel: [429421.099310] sd 4:0:11:0: [sdq] Attached SCSI disk
Jul 7 14:02:07 dev-storage1 kernel: [429436.579265] md: super_written gets error=-19, uptodate=0
Jul 7 14:02:07 dev-storage1 kernel: [429436.612518] md: super_written gets error=-19, uptodate=0
Jul 7 14:02:50 dev-storage1 kernel: [429479.409214] sd 4:0:1:0: [sdc] Synchronizing SCSI cache
Jul 7 14:02:50 dev-storage1 kernel: [429479.409290] sd 4:0:1:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 7 14:02:50 dev-storage1 kernel: [429479.409297] Buffer I/O error on device md127, logical block 2067791905
Jul 7 14:02:50 dev-storage1 kernel: [429479.409300] lost page write due to I/O error on md127
Jul 7 14:02:50 dev-storage1 kernel: [429479.409502] mpt2sas0: removing handle(0x0009), sas_addr(0x4433221101000000)
Jul 7 14:02:50 dev-storage1 kernel: [429479.612173] md: super_written gets error=-19, uptodate=0
Jul 7 14:02:50 dev-storage1 kernel: [429479.612192] md: super_written gets error=-19, uptodate=0
Jul 7 14:02:50 dev-storage1 kernel: [429479.639212] md: super_written gets error=-19, uptodate=0
Jul 7 14:02:50 dev-storage1 kernel: [429479.639222] md: super_written gets error=-19, uptodate=0
Jul 7 14:03:04 dev-storage1 kernel: [429493.162396] scsi 4:0:12:0: Direct-Access ATA ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul 7 14:03:04 dev-storage1 kernel: [429493.162407] scsi 4:0:12:0: SATA: handle(0x0009), sas_addr(0x4433221101000000), phy(1), device_name(0x5000c5004a46ceca)
Jul 7 14:03:04 dev-storage1 kernel: [429493.162411] scsi 4:0:12:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(2)
Jul 7 14:03:04 dev-storage1 kernel: [429493.162518] scsi 4:0:12:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul 7 14:03:04 dev-storage1 kernel: [429493.162526] scsi 4:0:12:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul 7 14:03:04 dev-storage1 kernel: [429493.162782] sd 4:0:12:0: Attached scsi generic sg2 type 0
Jul 7 14:03:04 dev-storage1 kernel: [429493.163136] sd 4:0:12:0: [sdr] physical block alignment offset: 4096
Jul 7 14:03:04 dev-storage1 kernel: [429493.163143] sd 4:0:12:0: [sdr] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul 7 14:03:04 dev-storage1 kernel: [429493.163146] sd 4:0:12:0: [sdr] 4096-byte physical blocks
Jul 7 14:03:04 dev-storage1 kernel: [429493.217763] sd 4:0:12:0: [sdr] Write Protect is off
Jul 7 14:03:04 dev-storage1 kernel: [429493.217768] sd 4:0:12:0: [sdr] Mode Sense: 7f 00 00 08
Jul 7 14:03:04 dev-storage1 kernel: [429493.218501] sd 4:0:12:0: [sdr] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 7 14:03:04 dev-storage1 kernel: [429493.289491] sdr: unknown partition table
Jul 7 14:03:04 dev-storage1 kernel: [429493.351632] sd 4:0:12:0: [sdr] Attached SCSI disk
Jul 7 15:38:20 dev-storage1 kernel: [435193.023742] sd 4:0:10:0: [sdp] Synchronizing SCSI cache
Jul 7 15:38:20 dev-storage1 kernel: [435193.023794] sd 4:0:10:0: [sdp] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 7 15:38:20 dev-storage1 kernel: [435193.024209] mpt2sas0: removing handle(0x000d), sas_addr(0x4433221104000000)
Jul 7 15:38:29 dev-storage1 kernel: [435202.541557] scsi 4:0:13:0: Direct-Access ATA ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul 7 15:38:29 dev-storage1 kernel: [435202.541567] scsi 4:0:13:0: SATA: handle(0x000d), sas_addr(0x4433221104000000), phy(4), device_name(0x5000c5004a123b93)
Jul 7 15:38:29 dev-storage1 kernel: [435202.541571] scsi 4:0:13:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(7)
Jul 7 15:38:29 dev-storage1 kernel: [435202.541693] scsi 4:0:13:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul 7 15:38:29 dev-storage1 kernel: [435202.541699] scsi 4:0:13:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul 7 15:38:29 dev-storage1 kernel: [435202.541978] sd 4:0:13:0: Attached scsi generic sg5 type 0
Jul 7 15:38:29 dev-storage1 kernel: [435202.542230] sd 4:0:13:0: [sdp] physical block alignment offset: 4096
Jul 7 15:38:29 dev-storage1 kernel: [435202.542240] sd 4:0:13:0: [sdp] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul 7 15:38:29 dev-storage1 kernel: [435202.542245] sd 4:0:13:0: [sdp] 4096-byte physical blocks
Jul 7 15:38:30 dev-storage1 kernel: [435202.594827] sd 4:0:13:0: [sdp] Write Protect is off
Jul 7 15:38:30 dev-storage1 kernel: [435202.594832] sd 4:0:13:0: [sdp] Mode Sense: 7f 00 00 08
Jul 7 15:38:30 dev-storage1 kernel: [435202.595551] sd 4:0:13:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 7 15:38:30 dev-storage1 kernel: [435202.666733] sdp: unknown partition table
Jul 7 15:38:30 dev-storage1 kernel: [435202.728676] sd 4:0:13:0: [sdp] Attached SCSI disk
Jul 7 15:48:21 dev-storage1 kernel: [435792.305634] sd 4:0:13:0: [sdp] Synchronizing SCSI cache
Jul 7 15:48:21 dev-storage1 kernel: [435792.305685] sd 4:0:13:0: [sdp] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 7 15:48:21 dev-storage1 kernel: [435792.305961] mpt2sas0: removing handle(0x000d), sas_addr(0x4433221104000000)
Jul 7 15:48:21 dev-storage1 kernel: [435792.307158] sd 4:0:9:0: [sdo] Synchronizing SCSI cache
Jul 7 15:48:21 dev-storage1 kernel: [435792.307201] sd 4:0:9:0: [sdo] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 7 15:48:21 dev-storage1 kernel: [435792.307477] mpt2sas0: removing handle(0x000f), sas_addr(0x4433221105000000)
Jul 7 15:48:30 dev-storage1 kernel: [435801.077761] scsi 4:0:14:0: Direct-Access ATA ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul 7 15:48:30 dev-storage1 kernel: [435801.077773] scsi 4:0:14:0: SATA: handle(0x000f), sas_addr(0x4433221105000000), phy(5), device_name(0x5000c5004a44edbe)
Jul 7 15:48:30 dev-storage1 kernel: [435801.077780] scsi 4:0:14:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(6)
Jul 7 15:48:30 dev-storage1 kernel: [435801.077876] scsi 4:0:14:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul 7 15:48:30 dev-storage1 kernel: [435801.077884] scsi 4:0:14:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul 7 15:48:30 dev-storage1 kernel: [435801.078164] sd 4:0:14:0: Attached scsi generic sg5 type 0
Jul 7 15:48:30 dev-storage1 kernel: [435801.078467] sd 4:0:14:0: [sdo] physical block alignment offset: 4096
Jul 7 15:48:30 dev-storage1 kernel: [435801.078476] sd 4:0:14:0: [sdo] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul 7 15:48:30 dev-storage1 kernel: [435801.078481] sd 4:0:14:0: [sdo] 4096-byte physical blocks
Jul 7 15:48:30 dev-storage1 kernel: [435801.134074] sd 4:0:14:0: [sdo] Write Protect is off
Jul 7 15:48:30 dev-storage1 kernel: [435801.134079] sd 4:0:14:0: [sdo] Mode Sense: 7f 00 00 08
Jul 7 15:48:30 dev-storage1 kernel: [435801.134786] sd 4:0:14:0: [sdo] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 7 15:48:30 dev-storage1 kernel: [435801.208179] sdo: unknown partition table
Jul 7 15:48:30 dev-storage1 kernel: [435801.276258] sd 4:0:14:0: [sdo] Attached SCSI disk
Jul 7 15:48:30 dev-storage1 kernel: [435801.822171] scsi 4:0:15:0: Direct-Access ATA ST3000DM001-9YN1 CC4C PQ: 0 ANSI: 6
Jul 7 15:48:30 dev-storage1 kernel: [435801.822180] scsi 4:0:15:0: SATA: handle(0x000d), sas_addr(0x4433221104000000), phy(4), device_name(0x5000c5004a123b93)
Jul 7 15:48:30 dev-storage1 kernel: [435801.822185] scsi 4:0:15:0: SATA: enclosure_logical_id(0x500605b00448c4f0), slot(7)
Jul 7 15:48:30 dev-storage1 kernel: [435801.822272] scsi 4:0:15:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Jul 7 15:48:30 dev-storage1 kernel: [435801.822279] scsi 4:0:15:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jul 7 15:48:30 dev-storage1 kernel: [435801.822589] sd 4:0:15:0: Attached scsi generic sg7 type 0
Jul 7 15:48:30 dev-storage1 kernel: [435801.822886] sd 4:0:15:0: [sdp] physical block alignment offset: 4096
Jul 7 15:48:30 dev-storage1 kernel: [435801.822896] sd 4:0:15:0: [sdp] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Jul 7 15:48:30 dev-storage1 kernel: [435801.822900] sd 4:0:15:0: [sdp] 4096-byte physical blocks
Jul 7 15:48:30 dev-storage1 kernel: [435801.875888] sd 4:0:15:0: [sdp] Write Protect is off
Jul 7 15:48:30 dev-storage1 kernel: [435801.875893] sd 4:0:15:0: [sdp] Mode Sense: 7f 00 00 08
Jul 7 15:48:30 dev-storage1 kernel: [435801.876484] sd 4:0:15:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 7 15:48:31 dev-storage1 kernel: [435801.947660] sdp: unknown partition table
Jul 7 15:48:31 dev-storage1 kernel: [435802.009745] sd 4:0:15:0: [sdp] Attached SCSI disk
Jul 7 15:49:14 dev-storage1 kernel: [435845.153305] sd 4:0:14:0: [sdo] Synchronizing SCSI cache
Jul 7 15:49:14 dev-storage1 kernel: [435845.153356] sd 4:0:14:0: [sdo] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
... snip rest ...
* Re: Assembly failure
From: Christian Balzer @ 2012-07-11 8:27 UTC
To: linux-raid; +Cc: Brian Candler

On Wed, 11 Jul 2012 08:58:18 +0100 Brian Candler wrote:

[snip]
> BTW these are all Seagate ST3000DM001. Yes, I know :-(

Indeed, there is your problem. And on an LSI controller (which one?) to
boot. ^o^ Though the latter part should be fine with a kernel as new as
yours.

The new STxxxxM drives from Seagate are <expletive deleted>. They're
wonderfully fast, but you absolutely can NOT use them in any HW RAID
until they get a non-braindead firmware that won't park (look at the
Load_Cycle_Count in SMART) the heads every 30 seconds, come rain or shine.

Not only will this wear out the drives in any remotely busy scenario, but
it will also cause them to be considered off-line by the SATA controller
in the right (wrong) circumstances, leading to exactly what you're seeing
here.

I experienced the same thing and have switched to Hitachi drives for the
foreseeable future, which, from one year of experience, seem to be of far
higher quality/reliability anyway. These Seagates are also suffering from
quality control issues and large DOA and early death rates.

With direct attached drives that you can issue hdparm commands to, you can
"fix" this deadly behavior by issuing an "apm = 255" command to them (in
hdparm.conf; needs to be done on each boot...).

[snip]

Regards,

Christian

-- 
Christian Balzer        Network/Systems Engineer
chibi@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
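For reference, the hdparm.conf mechanism Christian mentions looks roughly
like this on Debian/Ubuntu systems (a sketch; the device names are
illustrative and each drive needs its own stanza):

# /etc/hdparm.conf -- read by the hdparm init script at boot
# APM level 255 disables the drive's power management entirely,
# so the idle timer never parks (unloads) the heads.
/dev/sdb {
        apm = 255
}
/dev/sdc {
        apm = 255
}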
* Re: Assembly failure
From: Brian Candler @ 2012-07-11 9:09 UTC
To: Christian Balzer; +Cc: linux-raid

On Wed, Jul 11, 2012 at 05:27:42PM +0900, Christian Balzer wrote:
> > BTW these are all Seagate ST3000DM001. Yes, I know :-(
> >
> Indeed, there is your problem. And on an LSI controller (which one?) to
> boot. ^o^
> Though the latter part should be fine with a kernel as new as yours.

The drives were only bought because the supplier was out of Hitachis, and
we didn't realise the Seagates don't have ERC. This is why the Hitachis
have moved to production and I'm stuck with the Seagates on the dev
systems :-(

> (look at the Load_Cycle_Count in SMART)

193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 490
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 549
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 516
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 505
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 76
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 77
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 502
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 495
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 550
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 562
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 532
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 556

Ugh. (Two have a lower count, but maybe those are bad...)

> With direct attached drives that you can issue hdparm commands to, you can
> "fix" this deadly behavior by issuing an "apm = 255" command to them (in
> hdparm.conf; needs to be done on each boot...).

Thanks, rc.local now has:

# Set Error Recovery Control if drive supports it
for i in /dev/sd*; do /usr/sbin/smartctl -l scterc,70,70 $i >/dev/null; done

# Stop drives from spinning down
for i in /dev/sd*; do hdparm -q -B255 $i; done

Cheers,

Brian.
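One caveat with an rc.local loop: it only runs at boot, and as the logs
above show, these drives drop off the bus and reattach under new device
names. A udev rule is one way to also catch re-attached disks; this is a
sketch under the assumption that the same hdparm/smartctl settings are
wanted, and the file name is hypothetical - test on a scratch machine first:

# /etc/udev/rules.d/60-disk-power.rules (hypothetical file name)
# Re-apply ERC and APM settings whenever a whole-disk device appears.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", \
  RUN+="/sbin/hdparm -q -B 255 /dev/%k", \
  RUN+="/usr/sbin/smartctl -q errorsonly -l scterc,70,70 /dev/%k"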
* Re: Assembly failure
From: Mikael Abrahamsson @ 2012-07-11 10:32 UTC
To: Brian Candler; +Cc: Christian Balzer, linux-raid

On Wed, 11 Jul 2012, Brian Candler wrote:

> > (look at the Load_Cycle_Count in SMART)
>
> 193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 490

That is really low:

$ sudo smartctl -a /dev/sdd | grep -i load
193 Load_Cycle_Count 0x0032 052 052 000 Old_age Always - 444425

Most drives are rated for a load cycle count of 200-600k. All of mine with
a high load cycle count are WD20EARS; the WD20EADS doesn't do this.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: Assembly failure
From: Brian Candler @ 2012-07-11 10:47 UTC
To: Mikael Abrahamsson; +Cc: Christian Balzer, linux-raid

On Wed, Jul 11, 2012 at 12:32:28PM +0200, Mikael Abrahamsson wrote:
> That is really low:
>
> $ sudo smartctl -a /dev/sdd | grep -i load
> 193 Load_Cycle_Count 0x0032 052 052 000 Old_age Always - 444425

The drives had -B 254 before I changed them to -B 255. According to
hdparm(8), that should be the least aggressive power management.
* Re: Assembly failure
From: Roman Mamedov @ 2012-07-11 10:44 UTC
To: Christian Balzer; +Cc: linux-raid, Brian Candler

On Wed, 11 Jul 2012 17:27:42 +0900 Christian Balzer <chibi@gol.com> wrote:

> RAID until they get a non-braindead firmware that won't park (look at the
> Load_Cycle_Count in SMART) the heads every 30 seconds, come rain or shine.

So I seem to have such firmware; what do I win?

Device Model: ST1000DM003-9YN162

  9 Power_On_Hours   0x0032 097 097 000 Old_age Always - 2796
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 326

or maybe....

# hdparm -B /dev/sda

/dev/sda:
 APM_level = 128

See "man hdparm" about the "-B" switch and check what the value is on your
drives. Mine was at 128 by default, by the way; I did not have to change
it manually.

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
* Re: Assembly failure
From: Christian Balzer @ 2012-07-11 17:21 UTC
To: Roman Mamedov; +Cc: linux-raid, Brian Candler

On Wed, 11 Jul 2012 16:44:41 +0600 Roman Mamedov wrote:

> On Wed, 11 Jul 2012 17:27:42 +0900
> Christian Balzer <chibi@gol.com> wrote:
>
> > RAID until they get a non-braindead firmware that won't park (look at
> > the Load_Cycle_Count in SMART) the heads every 30 seconds, come rain
> > or shine.
>
> So I seem to have such firmware; what do I win?

An inflatable washing machine...

> Device Model: ST1000DM003-9YN162
>
>   9 Power_On_Hours   0x0032 097 097 000 Old_age Always - 2796
> 193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 326

Read the respective threads on the Seagate forums. If your drive is not
busy most of the time, then once the heads get parked (unloaded) they
will stay that way until the drive is accessed again. However, if the
drive is busy every second or so, the aforementioned statement holds.

Device Model:     ST2000DM001-9YN164
Firmware Version: CC4C

193 Load_Cycle_Count 0x0032 099 099 000 Old_age Always - 2419

Those were accumulated in the first two days or so of that drive's life,
before I set the APM level to 255.

-- 
Christian Balzer        Network/Systems Engineer
chibi@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
* Re: Assembly failure
From: Brian Candler @ 2012-07-13 18:52 UTC
To: Christian Balzer; +Cc: linux-raid

OK, after reseating drives and removing the three definitely bad ones, I
think the hardware is stable again now. So now I have a problem with the
five-drive array I had set up in the meantime.

All five drives are there, but one is a bit behind the others in its event
count and last update time. Here's the mdadm --examine output:

/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5
 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 56e9ce91:c5df8850:2105c86d:c9c710a1
Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:19:31 2012
       Checksum : 80c0762 - correct
         Events : 276
         Layout : left-symmetric
     Chunk Size : 1024K
    Device Role : Active device 0
    Array State : AAAAA ('A' == active, '.' == missing)

/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5
 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : db72c8d7:672760b4:572dc944:fc7c151b
Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:29:52 2012
       Checksum : 11ec5fef - correct
         Events : 357
         Layout : left-symmetric
     Chunk Size : 1024K
    Device Role : Active device 1
    Array State : .AAAA ('A' == active, '.' == missing)

/dev/sdk:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5
 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b12fefdd:74914e6e:9f3ca2bd:8b433e34
Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:29:52 2012
       Checksum : 64035caa - correct
         Events : 357
         Layout : left-symmetric
     Chunk Size : 1024K
    Device Role : Active device 2
    Array State : .AAAA ('A' == active, '.' == missing)

/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5
 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : db387f8a:383c26f4:4012a3ec:12c7679e
Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:29:52 2012
       Checksum : 2f9569c2 - correct
         Events : 357
         Layout : left-symmetric
     Chunk Size : 1024K
    Device Role : Active device 3
    Array State : .AAAA ('A' == active, '.' == missing)

/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 149c0025:e7c5da3a:62b7a318:4ca57af7
           Name : storage1.2
  Creation Time : Wed Jul 11 14:50:06 2012
     Raid Level : raid6
   Raid Devices : 5
 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 17581590528 (8383.56 GiB 9001.77 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : ac50fe77:91ce387a:e819a38d:4d56a734
Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jul 11 15:29:52 2012
       Checksum : da66aace - correct
         Events : 357
         Layout : left-symmetric
     Chunk Size : 1024K
    Device Role : Active device 4
    Array State : .AAAA ('A' == active, '.' == missing)

Now, a simple assemble fails:

root@dev-storage1:~# mdadm --assemble /dev/md/storage1.2 /dev/sd{b,j,k,l,m}
mdadm: /dev/md/storage1.2 assembled from 4 drives - not enough to start the array while not clean - consider --force.
root@dev-storage1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdj[1](S) sdm[4](S) sdl[3](S) sdk[2](S) sdb[0](S)
      14651327800 blocks super 1.2

unused devices: <none>

(Well, md127 exists, but I don't know how to "start" it.)

So let's try using --force as it suggests:

root@dev-storage1:~# mdadm -S /dev/md127
mdadm: stopped /dev/md127
root@dev-storage1:~# mdadm --assemble --force /dev/md/storage1.2 /dev/sd{b,j,k,l,m}
mdadm: /dev/md/storage1.2 has been started with 4 drives (out of 5).
root@dev-storage1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid6 sdj[1] sdm[4] sdl[3] sdk[2]
      8790795264 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [5/4] [_UUUU]
      bitmap: 22/22 pages [88KB], 65536KB chunk

unused devices: <none>
root@dev-storage1:~#

Now I have a 4-drive degraded RAID6; /dev/sdb isn't even listed (even
though I gave it on the command line). Is this correct? Is the next thing
to do to add the 5th drive back in manually?

root@dev-storage1:~# mdadm --manage --re-add /dev/md127 /dev/sdb
mdadm: re-added /dev/sdb
root@dev-storage1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid6 sdb[0] sdj[1] sdm[4] sdl[3] sdk[2]
      8790795264 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [5/4] [_UUUU]
      [>....................]  recovery =  1.1% (32854540/2930265088) finish=952.5min speed=50692K/sec
      bitmap: 22/22 pages [88KB], 65536KB chunk

unused devices: <none>

That seems to have worked, but can someone just confirm that this is the
right sequence of things to do? This is a test system; next time I do this
it might be for real :-)

Cheers,

Brian.
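For anyone repeating this sequence, two checks bracket it usefully; a
sketch using the device names above:

# Before forcing: confirm which member is behind, and by how much
mdadm --examine /dev/sd{b,j,k,l,m} | egrep 'Events|Update Time|Device Role'

# After the re-add: confirm the array state and watch the recovery
mdadm --detail /dev/md127
watch -n 60 cat /proc/mdstat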
* Re: Assembly failure
From: pants @ 2012-07-10 17:05 UTC
To: linux-raid

On Tue, Jul 10, 2012 at 05:33:45PM +0100, Brian Candler wrote:
> metadata (see below) suggests that some drives think members 1/3/4 are
> missing, but those drives think the array is fine. The "Events" counts are
> different on some members though.

I have had this problem before; in fact, it is the usual behavior when a
drive begins to fail. If the three drives in question fail to assemble, it
is usually because they aren't readable/writable by your system, and
therefore can't have their metadata updated to reflect the degraded state
of the array.

I would check the SMART status of the drives and look in your logs for any
ATA errors, but my suspicion is that, at assembly time, none of those
drives was talking to your system.

If you feel that the drives are fine and that this was some random fluke,
you can simply add the drives back to the array (you may have to wipe their
metadata blocks) while using --assume-clean to ensure that the data on the
newly added drives is kept.

Good luck!

pants.
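A sketch of the checks suggested here, using standard smartctl options
(the device glob is illustrative):

# Overall SMART health plus each drive's own logged ATA errors
for d in /dev/sd[b-m]; do
    echo "=== $d ==="
    smartctl -H $d
    smartctl -l error $d
done

# Link drops and I/O errors the kernel has seen since boot
dmesg | egrep -i 'I/O error|DID_NO_CONNECT|ata[0-9]'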
* Re: Assembly failure
From: Richard Scobie @ 2012-07-13 20:34 UTC
To: Linux RAID Mailing List

Brian Candler wrote:

> One final point: I would like to be able to monitor for suspect or failed
> drives. Is my best bet to look at /proc/mdstat output and identify drives
> which have been kicked out of the array? Or to monitor SMART variables (in
> which case I need to decide which ones are the most important to monitor,
> and what thresholds to set)?

For years I have used smartd without issues, and it will log and email
anomalies as they occur.

It is also advisable to regularly "scrub" all md devices, to flush out
faulty sectors:

echo check > /sys/block/mdX/md/sync_action

See Documentation/md.txt for details.

Regards,

Richard
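Put together, a minimal version of this monitoring might look like the
following (a sketch assuming smartmontools and a cron-style scheduler; the
schedule and array name are illustrative, and Debian-derived mdadm packages
typically ship a monthly checkarray cron job already - see
/etc/cron.d/mdadm):

# /etc/smartd.conf: monitor all drives, mail root on failures and new errors
DEVICESCAN -a -m root

# /etc/cron.d/md-scrub (hypothetical): scrub md127 early on the 1st of each month
0 3 1 * * root echo check > /sys/block/md127/md/sync_action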
end of thread, other threads: [~2012-07-13 20:34 UTC | newest]

Thread overview: 16+ messages (links below jump to the message on this page):

2012-07-10 16:33 Assembly failure, Brian Candler
2012-07-10 16:48 ` Sebastian Riemer
2012-07-10 17:06   ` Brian Candler
2012-07-10 17:38     ` Sebastian Riemer
2012-07-10 18:59       ` Brian Candler
2012-07-11  2:43         ` NeilBrown
2012-07-11  7:58           ` Brian Candler
2012-07-11  8:27             ` Christian Balzer
2012-07-11  9:09               ` Brian Candler
2012-07-11 10:32                 ` Mikael Abrahamsson
2012-07-11 10:47                   ` Brian Candler
2012-07-11 10:44               ` Roman Mamedov
2012-07-11 17:21                 ` Christian Balzer
2012-07-13 18:52               ` Brian Candler
2012-07-10 17:05 ` pants

-- strict thread matches above, loose matches on Subject: below --

2012-07-13 20:34 Richard Scobie