linux-ide.vger.kernel.org archive mirror
From: Tejun Heo <tj@kernel.org>
To: Tim Nufire <linux_ide_tim@ibink.com>
Cc: linux-ide@vger.kernel.org
Subject: Re: Frozen drives when using SiI3726
Date: Mon, 29 Dec 2008 18:13:15 +0900
Message-ID: <495894AB.8010404@kernel.org>
In-Reply-To: <20E60A0E-48E9-4942-851D-E20C012F5B89@ibink.com>

Hi,

Tim Nufire wrote:
> Hello,
> 
> I'm building a server using 9 SiI3726 based port multiplier backplanes
> connected to cards using SiI3132 (PCI Express) and SiI3124 (PCI). The
> drives are configured into 5 RAID6 groups of 9 drives each such that
> each array has 1 drive from each backplane. During the initial RAID
> synchronization one of the backplanes failed and restarted (see dmesg
> output below). While this did not disrupt the RAID groups this time, the
> reset took about 25 seconds and could easily have caused one or more
> drives to fail.
> 
> Is there anything I can do to prevent failures like this?

The reset was triggered by a timeout, which probably took around 30
secs or more, so the array probably experienced a disruption of about
a minute.  The failure latency is a bit unfortunate at the
moment.  :-(

Also, the timeout is one of the most generic failure modes there is.
It can be triggered by virtually anything including transmission
failure, power quality issues, drive problems and whatnot.  So, it's
impossible to tell what went wrong from the provided information.  It
could be a one-time fluke - e.g. bad sectors which developed during
storage and shipping, with the RAID sync making the firmware spend a
long time deciding what to do about them - or something more
systematic - e.g. a slightly bad connection on the backplane side or
sub-par power which chokes slightly when all drives are pulling juice
out of it.

Unfortunately, the only way to debug this is to keep an eye on
whether such failures repeat and, if so, when and where - whether they
always happen on the same chassis, slot or drive (checked by
exchanging drives), etc...
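
As a rough sketch of that kind of bookkeeping (not part of libata or
any existing tool), a small Python script could tally error-handling
messages per ATA port/link from saved kernel logs.  The function name
tally_events, the keyword list and the sample log lines are all
assumptions made up for illustration; adjust them to what your kernel
actually prints.

```python
import re
from collections import Counter

# Matches libata-style prefixes such as "ata5: ..." or "ata5.03: ..."
# (port number, optionally followed by a link number behind a port
# multiplier), and captures the rest of the message.
ATA_LINE = re.compile(r"\b(ata\d+(?:\.\d+)?): (.*)")

# Substrings treated as signs of an error-handling event.  This list is
# an assumption, not an exhaustive set of libata messages.
KEYWORDS = ("frozen", "timeout", "hard resetting link")


def tally_events(lines):
    """Count suspicious log lines per port/link name (e.g. "ata5.03")."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if match is None:
            continue  # not an ataN message
        name, message = match.groups()
        if any(keyword in message for keyword in KEYWORDS):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the dmesg in this thread.
    sample = [
        "[  100.0] ata5.03: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
        "[  100.1] ata5: hard resetting link",
        "[  900.0] ata5.03: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
        "[  950.0] ata7.01: qc timeout (cmd 0xec)",
        "[  951.0] sd 4:0:0:0: [sdc] Attached SCSI disk",
    ]
    for name, count in tally_events(sample).most_common():
        print(f"{name}: {count}")
```

Feeding it saved dmesg output over a few weeks, and noting which drive
sat in which slot at the time, should show whether the events follow a
chassis, a slot or a particular drive.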

Please let us know when you find out more.

Happy new year.

-- 
tejun

Thread overview: 4+ messages
2008-12-23 10:02 Frozen drives when using SiI3726 Tim Nufire
2008-12-29  9:13 ` Tejun Heo [this message]
2008-12-29 19:20   ` Grant Grundler
2009-01-02  3:28     ` Tejun Heo
