public inbox for linux-scsi@vger.kernel.org
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Keith Hopkins <vger@hopnet.net>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>,
	Jan Sembera <jsembera@suse.cz>,
	linux-scsi@vger.kernel.org
Subject: Re: aic94xx: failing on high load (another data point)
Date: Fri, 15 Feb 2008 09:28:43 -0600
Message-ID: <1203089323.3058.20.camel@localhost.localdomain>
In-Reply-To: <47B4682C.4020505@hopnet.net>

On Fri, 2008-02-15 at 00:11 +0800, Keith Hopkins wrote:
> On 01/31/2008 03:29 AM, Darrick J. Wong wrote:
> > On Wed, Jan 30, 2008 at 06:59:34PM +0800, Keith Hopkins wrote:
> >> V28.  My controller functions well with a single drive (low-medium load).  Unfortunately, all attempts to get the mirrors in sync fail and usually hang the whole box.
> > 
> > Adaptec posted a V30 sequencer on their website; does that fix the
> > problems?
> > 
> > http://www.adaptec.com/en-US/speed/scsi/linux/aic94xx-seq-30-1_tar_gz.htm
> > 
> 
> I lost connectivity to the drive again, and had to reboot to recover
> the drive, so it seemed a good time to try out the V30 firmware.
> Unfortunately, it didn't work any better.  Details are in the
> attachment.

Well, I can offer some hope.  The errors you report:

> aic94xx: escb_tasklet_complete: REQ_TASK_ABORT, reason=0x6
> aic94xx: escb_tasklet_complete: Can't find task (tc=6) to abort!

are requests by the sequencer to abort a task because of a protocol
error.  IBM did some extensive testing with Seagate drives and found
that the protocol errors were genuine and the result of drive firmware
problems.  IBM released a version of the Seagate firmware (BA17) to
correct these.  Unfortunately, your drive identifies its firmware as
S513, which is likely OEM firmware from another vendor ... however, that
vendor may have an update which corrects the problem.

Of course, the other issue is this:

> aic94xx: escb_tasklet_complete: Can't find task (tc=6) to abort!

This is a bug in the driver.  It's not finding the task in the
outstanding list.  The problem seems to be that it's taking the task
from the escb, whose task pointer is, by definition, always NULL.  It
should be taking the task from the ascb it finds by looping over the
pending queue.
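
To illustrate the difference, here is a minimal user-space sketch of the
two lookup patterns.  It is not driver code: struct task, struct scb and
both find_task_* helpers are hypothetical stand-ins invented for this
example, and a plain array stands in for the pending queue.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real aic94xx structures. */
struct task { int id; };

struct scb {
	int tc_index;       /* transaction context index */
	struct task *task;  /* NULL for an empty (event) SCB */
};

/* Buggy pattern: reads the task from the event SCB on every iteration,
 * so the lookup can never succeed while that pointer is NULL. */
static struct task *find_task_buggy(const struct scb *escb,
				    const struct scb *pending, size_t n,
				    int tc_abort)
{
	for (size_t i = 0; i < n; i++) {
		struct task *task = escb->task;

		if (task && pending[i].tc_index == tc_abort)
			return task;
	}
	return NULL;
}

/* Fixed pattern: match on the index first, then read the task from the
 * pending entry that matched. */
static struct task *find_task_fixed(const struct scb *pending, size_t n,
				    int tc_abort)
{
	for (size_t i = 0; i < n; i++) {
		if (pending[i].tc_index != tc_abort)
			continue;
		return pending[i].task;  /* may be NULL for a non-task SCB */
	}
	return NULL;
}
```

With an event SCB whose task pointer is NULL, find_task_buggy() returns
NULL whatever the pending array holds, while find_task_fixed() returns
the task of the entry whose tc_index matches.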

If you're willing, could you try this patch which may correct the
problem?  It's sort of like falling off a cliff: if you never go near
the edge (i.e. you upgrade the drive fw) you never fall off;
alternatively, it would be nice if you could help me put up guard rails
just in case.

Thanks,

James

---
diff --git a/drivers/scsi/aic94xx/aic94xx_scb.c b/drivers/scsi/aic94xx/aic94xx_scb.c
index 0febad4..ab35050 100644
--- a/drivers/scsi/aic94xx/aic94xx_scb.c
+++ b/drivers/scsi/aic94xx/aic94xx_scb.c
@@ -458,13 +458,19 @@ static void escb_tasklet_complete(struct asd_ascb *ascb,
 		tc_abort = le16_to_cpu(tc_abort);
 
 		list_for_each_entry_safe(a, b, &asd_ha->seq.pend_q, list) {
-			struct sas_task *task = ascb->uldd_task;
+			struct sas_task *task = a->uldd_task;
+
+			if (a->tc_index != tc_abort)
+				continue;
 
-			if (task && a->tc_index == tc_abort) {
+			if (task) {
 				failed_dev = task->dev;
 				sas_task_abort(task);
-				break;
+			} else {
+				ASD_DPRINTK("R_T_A for non TASK scb 0x%x\n",
+					    a->scb->header.opcode);
 			}
+			break;
 		}
 
 		if (!failed_dev) {
@@ -478,7 +484,7 @@ static void escb_tasklet_complete(struct asd_ascb *ascb,
 		 * that the EH will wake up and do something.
 		 */
 		list_for_each_entry_safe(a, b, &asd_ha->seq.pend_q, list) {
-			struct sas_task *task = ascb->uldd_task;
+			struct sas_task *task = a->uldd_task;
 
 			if (task &&
 			    task->dev == failed_dev &&



