From: MRK <mrk@shiftmail.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: mdadm: failed devices become spares!
Date: Wed, 19 May 2010 00:25:16 +0200
Message-ID: <4BF313CC.9030401@shiftmail.org>
In-Reply-To: <20100518120637.24d875c9@notabene.brown>

On 05/18/2010 04:06 AM, Neil Brown wrote:
> However if --monitor gets to check the array between the above to events, it
> will first see that the working drive is now faulty, so it reports a failure,
> and then see that the faulty device isn't faulty any more and in fact isn't
> even there.  The "isn't event there" bit doesn't register and it treats it as
> 'SpareActive'.
>
> I should fix that.
>    

However, in one case the two events were not detected in the same round:

Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, component device /dev/sdf1
Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md device /dev/md2, component device /dev/sdf1


One minute passes between the two entries. I suppose that is the mdadm 
daemon's polling interval.
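
To check that gap mechanically rather than by eye, here is a minimal Python sketch that parses the two entries above and reports the seconds between a Fail and the following SpareActive on the same component. The regular expression, the helper names (`parse`, `fail_to_spare_gaps`) and the year passed to the parser are my own assumptions for illustration; syslog timestamps carry no year, and none of this is mdadm code.

```python
import re
from datetime import datetime

# The two entries quoted above, with the mail client's line wrapping undone.
LOG = """\
Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, component device /dev/sdf1
Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md device /dev/md2, component device /dev/sdf1
"""

# Hypothetical pattern for the "<Event> event detected" lines shown in this thread.
ENTRY = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ mdadm\[\d+\]: "
    r"(?P<event>\w+) event detected on md device (?P<md>\S+), "
    r"component device (?P<dev>\S+)$"
)


def parse(log, year=2010):
    """Yield (timestamp, event, component) for each matching log entry."""
    for line in log.splitlines():
        m = ENTRY.match(line)
        if m:
            # The year is supplied by the caller because syslog omits it.
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            yield ts, m["event"], m["dev"]


def fail_to_spare_gaps(log):
    """Seconds between each Fail and the next SpareActive on the same component."""
    last_fail = {}
    gaps = []
    for ts, event, dev in parse(log):
        if event == "Fail":
            last_fail[dev] = ts
        elif event == "SpareActive" and dev in last_fail:
            gaps.append((dev, (ts - last_fail.pop(dev)).total_seconds()))
    return gaps


print(fail_to_spare_gaps(LOG))  # [('/dev/sdf1', 60.0)]
```

A gap of exactly 60 seconds is at least consistent with one polling interval separating the two detections.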

In the other case all the entries have the same timestamp:

Apr 13 08:00:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device /dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos last message repeated 7 times
[...many times that messages..]


...plus, in this second case SpareActive fires many times 
within that same second. (Pierre, you cut it short: are the "many 
times that messages" all at the exact same time, or do they span a few 
seconds?)
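
One way to answer that question from the full log would be to count events per timestamp. The Python sketch below does this for the excerpt quoted above only; the elided part of the log is not reconstructed. The patterns and the function name `events_per_second` are assumptions of mine, and the syslog "last message repeated N times" line is credited to the event logged just before it. Note that such a summary line carries the time it was written, which is not necessarily the time of each repeat.

```python
import re
from collections import Counter

# Only the excerpt quoted above, with line wrapping undone.
LOG = """\
Apr 13 08:00:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device /dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos last message repeated 7 times
"""

# Hypothetical patterns for the two kinds of lines seen in this thread.
EVENT = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ mdadm\[\d+\]: (\w+) event detected"
)
REPEAT = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ last message repeated (\d+) times$"
)


def events_per_second(log):
    """Count events per (timestamp, event), expanding syslog repeat summaries."""
    counts = Counter()
    last_event = None
    for line in log.splitlines():
        m = EVENT.match(line)
        if m:
            last_event = m.group(2)
            counts[(m.group(1), last_event)] += 1
            continue
        m = REPEAT.match(line)
        if m and last_event is not None:
            # Credit the repeats to the previous event, at the summary's timestamp.
            counts[(m.group(1), last_event)] += int(m.group(2))
    return counts


for (stamp, event), n in sorted(events_per_second(LOG).items()):
    print(stamp, event, n)
```

If the full log shows more than one distinct timestamp for SpareActive, the repeats span several seconds; if not, they were all logged within the same second.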

It looks to me like some kind of USB failure, where the USB connection or 
USB bridge momentarily fails and is then immediately re-detected and 
re-added to the system. But since there are no USB entries in dmesg, 
that would also point to a problem in the USB driver. Could the problem 
also involve some unwise udev triggers in Debian, maybe somehow 
causing the drive to be automatically re-added to the RAID?

Pierre:
- can you post your mdadm.conf?
- USB is not good for RAID imho. Many times in my life I have seen problems 
with USB/SATA bridges where the drive would get disconnected under high I/O 
activity and then reconnected after a few seconds. Anyway, re-adding it 
to the RAID shouldn't have happened. Also, in my case there were "usb" 
entries in dmesg.
