public inbox for linux-scsi@vger.kernel.org
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: "Raoul Bhatia [IPAX]" <r.bhatia@ipax.at>
Cc: linux-scsi@vger.kernel.org
Subject: Re: aic94xx driver woes continued
Date: Thu, 20 Mar 2008 14:57:07 -0500
Message-ID: <1206043027.3038.48.camel@localhost.localdomain>
In-Reply-To: <47E2B7EF.1050203@ipax.at>

On Thu, 2008-03-20 at 20:15 +0100, Raoul Bhatia [IPAX] wrote:
> James Bottomley wrote:
> > This is all normal.  Seagate drives are known for throwing protocol
> > errors under stress at certain revs of firmware.  That's what
> > REQ_TASK_ABORT, reason=0x6 is.
> > 
> > Your logs indicate that the recovery occurred correctly (as in all tasks
> > were eventually retried), so it doesn't show an actual problem.
> 
> ok, i already filed a trouble ticket at seagate - lets see if they
> provide a firmware update for the disks. afaik mine is "firmware 0002"
> 
> >> sometimes even a disk is kicked out of the raid configuration.
> > 
> > This would be abnormal.  If you have a log of this, could you post it?  I
> > assume it was because of I/O errors?
> 
> i attached a bigger syslog file (.gz format).

OK, this looks more definitive, thanks!

What appears to be happening is that you get a run of protocol errors,
not necessarily all on the same command.  On every one of them (by
current design of the aic94xx driver) we halt the aic94xx, abort all
the outstanding commands and resubmit them.  Because the disk is being
hammered, there are rather a lot of outstanding commands, so all it
takes is five protocol errors in a few seconds for one unlucky command
to get aborted five times (not necessarily through any fault of its
own) and run out of retries.  It is then returned to the upper layers
with DID_ABORT and treated as an I/O error.
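
The retry exhaustion described above can be illustrated with a toy
model.  This is only a sketch under stated assumptions, not the driver's
actual logic: the function name, the in-order completion model and the
rule that a command fails once its abort count reaches max_retries are
all hypothetical simplifications.

```python
from collections import deque


def count_failed_commands(queue_depth, protocol_errors,
                          completions_between_errors, max_retries=5):
    """Toy model: every protocol error aborts ALL outstanding commands.

    Assumptions (not taken from the driver source): the disk completes
    commands oldest-first, a finished or failed command is replaced by a
    fresh one, and a command fails once it has been aborted
    max_retries times.
    """
    # One abort counter per outstanding command, oldest at the left.
    outstanding = deque([0] * queue_depth)
    failed = 0
    for _ in range(protocol_errors):
        # A protocol error charges one abort to every outstanding command.
        survivors = deque()
        for aborts in outstanding:
            aborts += 1
            if aborts >= max_retries:
                failed += 1          # out of retries: surfaces as an I/O error
            else:
                survivors.append(aborts)
        # Failed commands are replaced by fresh ones at the back of the queue.
        while len(survivors) < queue_depth:
            survivors.append(0)
        outstanding = survivors
        # Before the next error the disk finishes a few of the oldest commands.
        for _ in range(min(completions_between_errors, queue_depth)):
            outstanding.popleft()
            outstanding.append(0)
    return failed
```

In this model, count_failed_commands(32, 5, 4) reports 16 failed
commands, while count_failed_commands(4, 5, 4) reports none, because
every command completes before it can collect a fifth abort.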

A workaround might be to lower the queue depth to, say, 4 or 8 and to
raise the retries (the latter can only be done by altering the
SD_MAX_RETRIES parameter in include/scsi/sd.h and recompiling).
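
Lowering the queue depth does not need a recompile; for SCSI disks it is
normally exposed as the queue_depth attribute under
/sys/block/<dev>/device/.  A minimal sketch, assuming that attribute is
writable for this driver and that the script runs as root; the helper
name and the device name "sdb" are hypothetical.

```python
from pathlib import Path


def set_queue_depth(device, depth, sysfs_root="/sys"):
    """Write a new queue depth for a SCSI disk and return the old value.

    device is a block device name such as "sdb" (hypothetical here).
    sysfs_root is a parameter only so the helper can be exercised
    against a scratch directory instead of the live /sys tree.
    """
    attr = Path(sysfs_root) / "block" / device / "device" / "queue_depth"
    old_depth = int(attr.read_text().strip())
    attr.write_text(f"{depth}\n")
    return old_depth


if __name__ == "__main__":
    # Needs root; the setting does not persist across reboots.
    previous = set_queue_depth("sdb", 4)
    print(f"queue depth lowered from {previous} to 4")
```

The retry count has no such runtime knob in this description, so it is
not covered by the sketch.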

Longer term, I think REQ_TASK_ABORT needs to be handled better on the
fly.  What we should do is abort only the task we've been asked to abort
and return it to the upper layer for a retry without invoking the error
handler ... I can look into this, but it will take a while.
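
The difference between the two approaches can be sketched with a toy
comparison.  This is not driver code: the function names, the tag-to-
retry-count dictionary and the limit of five retries are hypothetical,
and command completions are ignored to keep the example short.

```python
def abort_all(outstanding, faulted_tag):
    """Current behaviour as described: every outstanding command is
    aborted, so every one of them is charged a retry."""
    return {tag: retries + 1 for tag, retries in outstanding.items()}


def abort_one(outstanding, faulted_tag):
    """Proposed behaviour: only the task named in the abort request is
    charged a retry; the others are left alone."""
    return {tag: retries + (1 if tag == faulted_tag else 0)
            for tag, retries in outstanding.items()}


def exhausted_after(handler, queue_depth, faulted_tags, max_retries=5):
    """Return the tags that have used up their retries after the given
    sequence of protocol errors (one faulted tag per error)."""
    outstanding = {tag: 0 for tag in range(queue_depth)}
    for tag in faulted_tags:
        outstanding = handler(outstanding, tag)
    return sorted(tag for tag, retries in outstanding.items()
                  if retries >= max_retries)
```

With eight outstanding commands and five protocol errors on five
different tags, exhausted_after(abort_all, 8, [0, 1, 2, 3, 4]) returns
all eight tags, whereas exhausted_after(abort_one, 8, [0, 1, 2, 3, 4])
returns an empty list.  Under abort_one a command only runs out of
retries if it faults five times itself.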

James



