* RAID1 member mysteriously failing on 3.8+
@ 2013-04-15 8:56 Roman Mamedov
From: Roman Mamedov @ 2013-04-15 8:56 UTC
To: linux-raid
Hello,
Continuing the dangerous and exciting journey of upgrading my system from
the 3.7.10 kernel to 3.8.7, I have run into the following problem.
In a RAID1 array made of an SSD and an HDD, with the HDD marked as
write-mostly, md at some point just randomly decides that a device has failed
(at 133s), despite there being NO dmesg messages that would confirm that
anything at all happened to the device.
Then I notice this, remove (413s) and re-add (418s) the device. It starts
rebuilding, but after just 10 seconds it "fails" again (428s). This repeats
every time: I can't re-add the device and have it rebuild successfully.
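The recovery parameters md prints at 418s put that "10 seconds" in perspective. The short Python calculation below is an illustration added here, not part of the original report; it uses only numbers copied from the log: recovery started at 418.704862s, the second failure came at 428.464670s, and recovery was capped at 200000 KB/sec over a total of 58579264k. Assuming, generously, that recovery ran at the cap the whole time, it gives an upper bound on how far the rebuild could have got.

```python
# All values are copied from the md log lines at 418.70s and 428.46s.
RECOVERY_START_S = 418.704862  # "md: recovery of RAID array md3"
FAILURE_S = 428.464670         # second "Disk failure on sdg1"
MAX_SPEED_KB_S = 200000        # "(but not more than 200000 KB/sec)"
TOTAL_KB = 58579264            # "over a total of 58579264k"

# Time between the start of recovery and the second failure.
elapsed = FAILURE_S - RECOVERY_START_S

# Shortest possible full rebuild if md always ran at its configured maximum.
min_full_rebuild_s = TOTAL_KB / MAX_SPEED_KB_S

# Upper bound on progress: assumes recovery ran at the cap for the whole interval.
max_copied_kb = elapsed * MAX_SPEED_KB_S
max_fraction = max_copied_kb / TOTAL_KB

print(f"elapsed before failure: {elapsed:.2f} s")
print(f"fastest possible full rebuild: {min_full_rebuild_s:.0f} s")
print(f"at most {max_fraction:.1%} of the array could have been rebuilt")
```

This prints 9.76 s, 293 s and 3.3% respectively: the member is failed again very early in a rebuild that would need close to five minutes even in the best case.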
If a device had truly failed in some way, I would expect errors in dmesg from,
e.g., the ATA layer. But there are none. What further leads me to suspect that
this is some sort of bug is the fact that this same array works perfectly on
the older (3.7.10) kernel.
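To make the "no low-level errors" check repeatable, here is a minimal Python sketch (added for illustration, not something from the original report) that scans a dmesg excerpt, finds each md/raid1 "Disk failure on <device>" message, and lists any ATA or block-layer error lines logged within a window before it. The error patterns (`ataN.NN: exception`, `failed command`, `error`, `SError`, `I/O error`) and the 60-second default window are assumptions chosen for this sketch; they are not an exhaustive catalogue of libata messages.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# "[   133.191756] message" -> (133.191756, "message")
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# md/raid1 failure message, in the form shown in the log below.
FAILURE_RE = re.compile(r"md/raid1:(\S+): Disk failure on (\S+), disabling device\.")

# ASSUMPTION: heuristic patterns for low-level disk trouble (libata exceptions,
# failed commands, SError reports, block-layer I/O errors). Not exhaustive.
LOW_LEVEL_ERROR_RE = re.compile(
    r"\bata\d+(\.\d+)?: (exception|failed command|error|SError)|I/O error"
)


@dataclass
class FailureReport:
    time: float
    array: str
    member: str
    preceding_errors: List[Tuple[float, str]]


def parse(dmesg_text: str) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, message) for every line that carries a dmesg timestamp."""
    for raw in dmesg_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m:
            yield float(m.group(1)), m.group(2)


def check_failures(dmesg_text: str, window: float = 60.0) -> List[FailureReport]:
    """For each md failure, collect low-level error lines from the preceding `window` seconds."""
    events = list(parse(dmesg_text))
    reports = []
    for t, msg in events:
        m = FAILURE_RE.search(msg)
        if not m:
            continue
        errors = [(et, emsg) for et, emsg in events
                  if t - window <= et <= t and LOW_LEVEL_ERROR_RE.search(emsg)]
        reports.append(FailureReport(t, m.group(1), m.group(2), errors))
    return reports


# A few lines copied from the log below.
SAMPLE = """\
[   41.369424] ata11: EH complete
[  133.191756] md/raid1:md3: Disk failure on sdg1, disabling device.
[  428.464670] md/raid1:md3: Disk failure on sdg1, disabling device.
"""

if __name__ == "__main__":
    for r in check_failures(SAMPLE):
        status = "no" if not r.preceding_errors else str(len(r.preceding_errors))
        print(f"{r.time:.3f}s {r.array}/{r.member}: {status} low-level errors before it")
```

On these sample lines both failures report no preceding low-level errors, which is the observation made above. Feeding the full `dmesg` output instead of the sample works the same way.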
...
[ 22.984532] r8169 0000:04:00.0 eth0: link up
[ 22.984541] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 22.984584] IPv6: ADDRCONF(NETDEV_CHANGE): eth0.2: link becomes ready
[ 22.996464] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 22.996712] NFSD: starting 90-second grace period (net ffffffff81cb36c0)
[ 41.315150] ata1.00: configured for UDMA/133
[ 41.315164] ata1: EH complete
[ 41.329004] ata2.00: configured for UDMA/133
[ 41.329019] ata2: EH complete
[ 41.330625] ata3.00: configured for UDMA/133
[ 41.330640] ata3: EH complete
[ 41.333116] ata4.00: configured for UDMA/133
[ 41.333130] ata4: EH complete
[ 41.335766] ata5.00: configured for UDMA/133
[ 41.335781] ata5: EH complete
[ 41.356118] ata7.00: configured for UDMA/133
[ 41.356133] ata7: EH complete
[ 41.362298] ata12.00: configured for UDMA/133
[ 41.362313] ata12: EH complete
[ 41.362483] ata13.00: configured for UDMA/133
[ 41.362491] ata13: EH complete
[ 41.369409] ata11.00: configured for UDMA/133
[ 41.369424] ata11: EH complete
[ 133.191756] md/raid1:md3: Disk failure on sdg1, disabling device.
[ 133.191756] md/raid1:md3: Operation continuing on 1 devices.
[ 133.194892] RAID1 conf printout:
[ 133.194901] --- wd:1 rd:2
[ 133.194906] disk 0, wo:0, o:1, dev:sdf1
[ 133.194911] disk 1, wo:1, o:0, dev:sdg1
[ 133.198199] RAID1 conf printout:
[ 133.198213] --- wd:1 rd:2
[ 133.198219] disk 0, wo:0, o:1, dev:sdf1
[ 413.692816] md: unbind<sdg1>
[ 413.692863] md: export_rdev(sdg1)
[ 413.718257] device label home devid 1 transid 568912 /dev/md3
[ 418.696848] md: bind<sdg1>
[ 418.699066] RAID1 conf printout:
[ 418.699074] --- wd:1 rd:2
[ 418.699080] disk 0, wo:0, o:1, dev:sdf1
[ 418.699085] disk 1, wo:1, o:1, dev:sdg1
[ 418.704862] md: recovery of RAID array md3
[ 418.704873] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 418.704879] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 418.704888] md: using 128k window, over a total of 58579264k.
[ 418.763888] device label home devid 1 transid 568912 /dev/md3
[ 428.464670] md/raid1:md3: Disk failure on sdg1, disabling device.
[ 428.464670] md/raid1:md3: Operation continuing on 1 devices.
[ 428.979635] md: md3: recovery done.
[ 428.984824] RAID1 conf printout:
[ 428.984836] --- wd:1 rd:2
[ 428.984843] disk 0, wo:0, o:1, dev:sdf1
[ 428.984848] disk 1, wo:1, o:0, dev:sdg1
[ 428.987765] RAID1 conf printout:
[ 428.987771] --- wd:1 rd:2
[ 428.987777] disk 0, wo:0, o:1, dev:sdf1
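The "RAID1 conf printout" blocks above are easier to compare once parsed. The sketch below assumes the field meanings from print_conf() in drivers/md/raid1.c of this era: `wd` is the number of working disks (raid_disks minus degraded), `rd` is raid_disks, and per member `wo:1` means the device is not (yet) in sync while `o:1` means it is not marked Faulty. If that reading is right, `wo` is not the write-mostly flag. Treat these meanings as an assumption; the parser itself only relies on the textual format visible in the log, and the names `Member`, `ConfSnapshot` and `parse_conf_printouts` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
HEADER_RE = re.compile(r"RAID1 conf printout:")
SUMMARY_RE = re.compile(r"--- wd:(\d+) rd:(\d+)")
DISK_RE = re.compile(r"disk (\d+), wo:(\d), o:(\d), dev:(\S+)")


@dataclass
class Member:
    slot: int
    not_in_sync: bool  # "wo:1" (ASSUMED to mean the In_sync flag is clear)
    operational: bool  # "o:1"  (ASSUMED to mean the Faulty flag is clear)
    dev: str


@dataclass
class ConfSnapshot:
    time: float
    working: int = 0     # "wd"
    raid_disks: int = 0  # "rd"
    members: List[Member] = field(default_factory=list)


def parse_conf_printouts(dmesg_text: str) -> List[ConfSnapshot]:
    """Group each 'RAID1 conf printout:' header with the summary/disk lines after it."""
    snapshots: List[ConfSnapshot] = []
    current: Optional[ConfSnapshot] = None
    for line in dmesg_text.splitlines():
        ts = TS_RE.match(line.strip())
        if not ts:
            continue
        if HEADER_RE.search(line):
            current = ConfSnapshot(time=float(ts.group(1)))
            snapshots.append(current)
            continue
        summary = SUMMARY_RE.search(line)
        disk = DISK_RE.search(line)
        if current is not None and summary:
            current.working, current.raid_disks = int(summary.group(1)), int(summary.group(2))
        elif current is not None and disk:
            current.members.append(Member(int(disk.group(1)), disk.group(2) == "1",
                                          disk.group(3) == "1", disk.group(4)))
        else:
            current = None  # any other message ends the current printout


    return snapshots


# The two printouts logged at 428.98s.
SAMPLE = """\
[  428.984824] RAID1 conf printout:
[  428.984836] --- wd:1 rd:2
[  428.984843] disk 0, wo:0, o:1, dev:sdf1
[  428.984848] disk 1, wo:1, o:0, dev:sdg1
[  428.987765] RAID1 conf printout:
[  428.987771] --- wd:1 rd:2
[  428.987777] disk 0, wo:0, o:1, dev:sdf1
"""

if __name__ == "__main__":
    for snap in parse_conf_printouts(SAMPLE):
        faulty = [m.dev for m in snap.members if not m.operational]
        print(f"{snap.time:.6f}s working {snap.working}/{snap.raid_disks}, faulty: {faulty}")
```

For this sample the first snapshot shows sdg1 as faulty and not in sync, and the second shows it already gone from slot 1, matching the pattern seen at 133s.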
--
With respect,
Roman