From: MRK <mrk@shiftmail.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: mdadm: failed devices become spares!
Date: Wed, 19 May 2010 00:25:16 +0200
Message-ID: <4BF313CC.9030401@shiftmail.org>
In-Reply-To: <20100518120637.24d875c9@notabene.brown>
On 05/18/2010 04:06 AM, Neil Brown wrote:
> However if --monitor gets to check the array between the above two events, it
> will first see that the working drive is now faulty, so it reports a failure,
> and then see that the faulty device isn't faulty any more and in fact isn't
> even there. The "isn't even there" bit doesn't register and it treats it as
> 'SpareActive'.
>
> I should fix that.
>
However, in one case the two events are not detected in the same polling round:
Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, component device /dev/sdf1
Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md device /dev/md2, component device /dev/sdf1
One minute passes between the two entries. I suppose that's the polling
interval of the mdadm monitor daemon (its --delay, 60 seconds by default).
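This first case fits the race Neil describes: one poll sees the device go faulty, and the next poll, a minute later, sees it gone. The toy Python model below is only meant to illustrate that reasoning. It is NOT mdadm's Monitor code: the snapshot format, the function names and the "Removed" event are hypothetical. It diffs two consecutive polls, and the naive version reports SpareActive for a device that is simply no longer there.

```python
from typing import Dict, List

# Hypothetical snapshot of one poll: device name -> "active", "spare" or "faulty".
# A device missing from the dict is no longer part of the array.
Snapshot = Dict[str, str]


def naive_events(prev: Snapshot, curr: Snapshot) -> List[str]:
    """Diff two polls the way the thread describes the misbehaviour."""
    events = []
    for dev, before in prev.items():
        after = curr.get(dev)  # None if the device has disappeared
        if before != "faulty" and after == "faulty":
            events.append(f"Fail {dev}")
        elif before == "faulty" and after != "faulty":
            # Flaw: "not faulty any more" is taken to mean "back in service",
            # even when the device is not there at all (after is None).
            events.append(f"SpareActive {dev}")
    return events


def fixed_events(prev: Snapshot, curr: Snapshot) -> List[str]:
    """Same diff, but check for disappearance before anything else."""
    events = []
    for dev, before in prev.items():
        after = curr.get(dev)
        if after is None:
            events.append(f"Removed {dev}")  # hypothetical event name
        elif before != "faulty" and after == "faulty":
            events.append(f"Fail {dev}")
        elif before == "spare" and after == "active":
            events.append(f"SpareActive {dev}")
    return events


# Three polls a minute apart: healthy, then faulty, then removed.
polls = [{"sdf1": "active"}, {"sdf1": "faulty"}, {}]
for prev, curr in zip(polls, polls[1:]):
    print(naive_events(prev, curr), fixed_events(prev, curr))
# ['Fail sdf1'] ['Fail sdf1']
# ['SpareActive sdf1'] ['Removed sdf1']
```

The point of the toy is only the ordering of the checks: "is the device still present?" has to be asked before "is it still faulty?".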
In the other case, all the entries have the same timestamp:
Apr 13 08:00:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device /dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos last message repeated 7 times
[...many times that messages..]
Plus, in this second case SpareActive fires many times within that same
second. (Pierre, you cut the log short: are all of those "many times that
messages" at exactly the same time, or do they span a few seconds?)
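To answer that from the full syslog rather than the truncated excerpt, something like the following Python sketch could count SpareActive events per second. It assumes the classic syslog line format seen in the excerpts above and the syslogd "last message repeated N times" convention; the function name and regexes are my own, not part of mdadm.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as
# "Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device ..."
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ mdadm\[\d+\]: (?P<event>\w+) event detected"
)
# syslogd collapses duplicates into "last message repeated N times".
REPEAT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ last message repeated (?P<n>\d+) times"
)


def events_per_second(lines: Iterable[str], event: str = "SpareActive") -> Counter:
    """Count occurrences of one mdadm event per syslog timestamp."""
    counts: Counter = Counter()
    last_was_event = False  # was the previous logged message the wanted event?
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            last_was_event = m.group("event") == event
            if last_was_event:
                counts[m.group("ts")] += 1
            continue
        r = REPEAT_RE.match(line)
        if r and last_was_event:
            # Collapsed repeats carry no timestamps of their own; credit them
            # to the time the summary line was written.
            counts[r.group("ts")] += int(r.group("n"))
        else:
            last_was_event = False
    return counts


sample = [
    "Apr 13 08:00:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, component device /dev/sdd1",
    "Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device /dev/md2, component device /dev/sdd1",
    "Apr 13 08:00:02 phobos last message repeated 7 times",
]
print(events_per_second(sample))  # Counter({'Apr 13 08:00:02': 8})
```

Note the caveat in the comment: because syslogd collapses the duplicates, the "repeated" line only tells us the repeats happened between the original message and the summary line, so the raw log (or a logger that doesn't collapse duplicates) is needed to be sure.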
It looks to me like some kind of USB failure, where the USB connection or
USB bridge momentarily fails, then is immediately re-detected and
re-added to the system. But since there are no USB entries in dmesg,
that would also point to a problem in the USB driver. Could the problem
also involve some unwise udev rules in Debian, somehow causing the drive
to be automatically re-added to the RAID?
Pierre:
- can you post your mdadm.conf?
- USB is not good for RAID, imho. I have often seen USB/SATA bridges
disconnect the drive under heavy I/O and then reconnect it a few seconds
later. In any case, re-adding it to the RAID shouldn't have happened.
Also, in my case there were "usb" entries in dmesg.
Thread overview: 12+ messages
2010-05-16 15:40 mdadm: failed devices become spares! Pierre Vignéras
2010-05-16 19:56 ` Leslie Rhorer
2010-05-17 18:10 ` Pierre Vignéras
2010-05-17 21:09 ` Tim Small
2010-05-18 1:30 ` Neil Brown
2010-05-18 2:06 ` Neil Brown
2010-05-18 22:25 ` MRK [this message]
2010-05-19 19:56 ` Simon Matthews
2010-05-21 21:00 ` Pierre Vignéras
2010-05-21 21:27 ` mdadm: failed devices become spares! -> Solved ! Pierre Vignéras
2010-05-18 23:07 ` mdadm: failed devices become spares! Pierre Vignéras
2010-05-19 1:45 ` Neil Brown