Linux RAID subsystem development
From: Roman Mamedov <rm@romanrm.ru>
To: linux-raid@vger.kernel.org
Subject: RAID1 member mysteriously failing on 3.8+
Date: Mon, 15 Apr 2013 14:56:52 +0600
Message-ID: <20130415145652.302d87f1@natsu>

Hello,

Continuing the dangerous and exciting journey of upgrading my system
from a 3.7.10 kernel to 3.8.7, I have run into the following problem.

In a RAID1 array of an SSD and an HDD marked as write-mostly, md at some
point randomly decides that a device has failed (at 133s), despite there being
NO dmesg messages indicating that anything at all happened to the device.

When I notice this, I remove (413s) and re-add (418s) the device. It starts
rebuilding, but after just 10 seconds it "fails" again (428s)! This happens
every time: I cannot re-add the device and have it rebuild successfully.

If a device had truly failed in some way, I would expect error messages in
dmesg, e.g. from the ATA layer. But there are none. What further leads me to
suspect some sort of bug is that the same array works perfectly on the older
(3.7.10) kernel.

...
[   22.984532] r8169 0000:04:00.0 eth0: link up
[   22.984541] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   22.984584] IPv6: ADDRCONF(NETDEV_CHANGE): eth0.2: link becomes ready
[   22.996464] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[   22.996712] NFSD: starting 90-second grace period (net ffffffff81cb36c0)
[   41.315150] ata1.00: configured for UDMA/133
[   41.315164] ata1: EH complete
[   41.329004] ata2.00: configured for UDMA/133
[   41.329019] ata2: EH complete
[   41.330625] ata3.00: configured for UDMA/133
[   41.330640] ata3: EH complete
[   41.333116] ata4.00: configured for UDMA/133
[   41.333130] ata4: EH complete
[   41.335766] ata5.00: configured for UDMA/133
[   41.335781] ata5: EH complete
[   41.356118] ata7.00: configured for UDMA/133
[   41.356133] ata7: EH complete
[   41.362298] ata12.00: configured for UDMA/133
[   41.362313] ata12: EH complete
[   41.362483] ata13.00: configured for UDMA/133
[   41.362491] ata13: EH complete
[   41.369409] ata11.00: configured for UDMA/133
[   41.369424] ata11: EH complete
[  133.191756] md/raid1:md3: Disk failure on sdg1, disabling device.
[  133.191756] md/raid1:md3: Operation continuing on 1 devices.
[  133.194892] RAID1 conf printout:
[  133.194901]  --- wd:1 rd:2
[  133.194906]  disk 0, wo:0, o:1, dev:sdf1
[  133.194911]  disk 1, wo:1, o:0, dev:sdg1
[  133.198199] RAID1 conf printout:
[  133.198213]  --- wd:1 rd:2
[  133.198219]  disk 0, wo:0, o:1, dev:sdf1
[  413.692816] md: unbind<sdg1>
[  413.692863] md: export_rdev(sdg1)
[  413.718257] device label home devid 1 transid 568912 /dev/md3
[  418.696848] md: bind<sdg1>
[  418.699066] RAID1 conf printout:
[  418.699074]  --- wd:1 rd:2
[  418.699080]  disk 0, wo:0, o:1, dev:sdf1
[  418.699085]  disk 1, wo:1, o:1, dev:sdg1
[  418.704862] md: recovery of RAID array md3
[  418.704873] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  418.704879] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  418.704888] md: using 128k window, over a total of 58579264k.
[  418.763888] device label home devid 1 transid 568912 /dev/md3
[  428.464670] md/raid1:md3: Disk failure on sdg1, disabling device.
[  428.464670] md/raid1:md3: Operation continuing on 1 devices.
[  428.979635] md: md3: recovery done.
[  428.984824] RAID1 conf printout:
[  428.984836]  --- wd:1 rd:2
[  428.984843]  disk 0, wo:0, o:1, dev:sdf1
[  428.984848]  disk 1, wo:1, o:0, dev:sdg1
[  428.987765] RAID1 conf printout:
[  428.987771]  --- wd:1 rd:2
[  428.987777]  disk 0, wo:0, o:1, dev:sdf1


-- 
With respect,
Roman