From: "lrhorer@satx.rr.com" <lrhorer@satx.rr.com>
To: linux-raid@vger.kernel.org
Subject: RE: Spurious HD convictions
Date: Sat, 12 Dec 2009 21:44:53 -0600
Message-ID: <200912122144.53709.lrhorer@satx.rr.com>

Hmm.  I don't see how it could be either the PS or the PMs, since the drives 
were moved to a new enclosure when the problem started happening, yet the 
problem persists.  The new chassis has all new PMs and of course a new PS, 
and the problem is happening across multiple PMs.  In addition, if NCQ is the 
problem, why has it just started happening?  This system has been up and 
running for the better part of a year.  Regardless, I have disabled NCQ by 
executing `echo 1 > /sys/block/sd[a-g]/device/queue_depth`, and I am 
attempting a repair action again.  We'll see how it goes.
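
One caveat on the command above: in bash, a redirection whose target glob
matches more than one file is typically rejected as an "ambiguous redirect",
so the one-line form may not reach every drive. A minimal sketch of a
per-drive loop follows. The function name set_queue_depth is hypothetical,
and the sketch assumes the array members appear as sda through sdg and that
the shell is running as root.

```shell
# Write a queue depth to each sysfs file given as an argument.
# A depth of 1 effectively disables NCQ for that drive.
# set_queue_depth is a hypothetical helper name, not an existing tool.
set_queue_depth() {
    depth=$1
    shift
    for f in "$@"; do
        if [ -w "$f" ]; then
            # One redirection per file, so no glob appears in the target.
            echo "$depth" > "$f"
        else
            echo "skipping $f (missing or not writable)" >&2
        fi
    done
}

# Example invocation (as root; assumes drives sda..sdg):
#   set_queue_depth 1 /sys/block/sd[a-g]/device/queue_depth
```

Reading each queue_depth file back afterwards (for example with cat) is a
simple way to confirm the setting took effect on every drive.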

> Hi Leslie,
> 
> According to some of the links here:
> http://www.google.com/search?hl=en&q=failed+to+read+SCR+1+(Emask%3D0x40)
> 
> It seems to be either the Power Supply Unit (PSU) or the Port Multiplier
> (PM).
> 
> A quick workaround seems to be disabling NCQ on all affected devices.
> 
> On Sun, Dec 13, 2009 at 5:02 AM, lrhorer@satx.rr.com
> <lrhorer@satx.rr.com> wrote:
> >
> > What's happening here?  Suddenly, my backup server is suffering
> > apparently spurious hard drive convictions.  The server is running RAID5
> > on 7 disks under md.  It has been running well for months, but suddenly
> > it has started kicking drives from the array when under moderately heavy
> > read or write loads.  The thing is, it isn't convicting any particular
> > drive repeatedly, and the drives are not showing any errors under SMART.
> > This is a PM system, and I have tried changing the drive adapters,
> > changing the PMs, changing cables, moving the drives around, and moving
> > them out of the CPU enclosure to a new external chassis.  The convictions
> > are not occurring on any one channel, over any one particular PM, or over
> > any particular cable.  Since this started happening, I have been unable
> > to get all the way through a resync before the array dumps at least one
> > of the drives.  Here is a sample from the kernel log during one of the
> > convictions:
