public inbox for linux-scsi@vger.kernel.org
From: Muli Ben-Yehuda <muli@il.ibm.com>
To: Andy Warner <andyw@pobox.com>
Cc: linux-scsi <linux-scsi@vger.kernel.org>,
	James Bottomley <James.Bottomley@SteelEye.com>
Subject: Re: aic94xx IO errors with "escb_tasklet_complete: phy0: REQ_TASK_ABORT"
Date: Wed, 4 Oct 2006 22:56:21 +0200
Message-ID: <20061004205621.GF5091@rhun.haifa.ibm.com>
In-Reply-To: <20061004132929.A29689@florence.linkmargin.com>

On Wed, Oct 04, 2006 at 01:29:29PM -0500, Andy Warner wrote:
> Muli Ben-Yehuda wrote:
> > [resending as it probably hit the 100K limit the first time]
> > 
> > I'm seeing these aic94xx IO errors on an IBM x366, usually after I
> > copy ~20GB but occasionally as soon as heavy IO starts. Happens with
> > and without Calgary enabled (iommu=off). I'm seeing this on two
> > different disks which badblocks claims are ok. The machine usually
> > stays up and keeps chugging along after this happens.
> 
> Since you're working in this area,

Not really... I just need aic94xx working reliably so that when it
breaks, I can be reasonably certain it's because I broke Calgary :-)

> the processing for REQ_TASK_ABORT, REQ_DEVICE_RESET,
> SIGNAL_NCQ_ERROR and CLEAR_NCQ_ERROR needs fixing as all 4 events
> collapse to REQ_TASK_ABORT, because sb_opcode is masked with
> ~DL_PHY_MASK before the switch() in escb_tasklet_complete(). In
> unpatched code, check the phy number reported in the REQ_TASK_ABORT
> message:
> 
>   0 => REQ_TASK_ABORT
>   1 => REQ_DEVICE_RESET
>   2 => SIGNAL_NCQ_ERROR
>   3 => CLEAR_NCQ_ERROR
> 
> So you are seeing legitimate REQ_TASK_ABORT values, but need to look
> at the remaining data to see what the chip is trying to tell you.
> For REQ_TASK_ABORT, status_block[1..2] is the transaction context,
> and status_block[3] is the reason (TC_NO_ERROR etc from
> aic94xx_sas.h)
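
To make the collapse described above concrete, here is a minimal sketch
of the arithmetic. The numeric values of DL_PHY_MASK and the four
opcodes are assumptions chosen to match the phy-number mapping quoted
above; check aic94xx_sas.h for the real definitions. The helper names
(masked_opcode, reported_phy, recover_opcode) are hypothetical and do
not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED values, consistent with the 0..3 mapping quoted above;
 * verify against aic94xx_sas.h before relying on them. */
#define DL_PHY_MASK       0x07
#define REQ_TASK_ABORT    0xF0
#define REQ_DEVICE_RESET  0xF1
#define SIGNAL_NCQ_ERROR  0xF2
#define CLEAR_NCQ_ERROR   0xF3

/* Hypothetical helper: the value the switch() sees once the phy bits
 * have been masked off. All four request opcodes become the same value. */
static uint8_t masked_opcode(uint8_t sb_opcode)
{
	return sb_opcode & (uint8_t)~DL_PHY_MASK;
}

/* Hypothetical helper: the low bits that end up printed as "phyN". */
static uint8_t reported_phy(uint8_t sb_opcode)
{
	return sb_opcode & DL_PHY_MASK;
}

/* Hypothetical helper: rebuild the original opcode from the masked
 * opcode and the "phy" number shown in the log message. */
static uint8_t recover_opcode(uint8_t masked, uint8_t phy)
{
	assert((masked & DL_PHY_MASK) == 0);
	assert(phy <= DL_PHY_MASK);
	return masked | phy;
}
```

Under these assumptions, a log line reporting phy1 with REQ_TASK_ABORT
would really be a REQ_DEVICE_RESET, since
recover_opcode(REQ_TASK_ABORT, 1) gives back REQ_DEVICE_RESET.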

Using your patch (thanks!), I get:

aic94xx: escb_tasklet_complete: REQ_TASK_ABORT, reason 0x05

which corresponds to TI_BREAK.
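
For reference, a rough sketch of how the REQ_TASK_ABORT payload
described above could be pulled apart: status_block[1..2] as the
transaction context and status_block[3] as the reason. The struct and
function names are hypothetical, the little-endian byte order of the
transaction context is an assumption, and only the two reason codes
mentioned in this thread are listed (the value of TC_NO_ERROR is also
assumed); the full list lives in aic94xx_sas.h.

```c
#include <stdint.h>

#define TC_NO_ERROR 0x00 /* ASSUMED value; check aic94xx_sas.h */
#define TI_BREAK    0x05 /* reason 0x05, as reported in this message */

/* Hypothetical container for the decoded fields. */
struct task_abort_info {
	uint16_t tc;     /* transaction context, status_block[1..2] */
	uint8_t  reason; /* reason code, status_block[3] */
};

/* Hypothetical decoder for a REQ_TASK_ABORT status block.
 * ASSUMPTION: the transaction context is stored little-endian. */
static struct task_abort_info decode_task_abort(const uint8_t *status_block)
{
	struct task_abort_info info;

	info.tc = (uint16_t)(status_block[1] | (status_block[2] << 8));
	info.reason = status_block[3];
	return info;
}

/* Map the reason codes named in this thread to strings. */
static const char *reason_name(uint8_t reason)
{
	switch (reason) {
	case TC_NO_ERROR:
		return "TC_NO_ERROR";
	case TI_BREAK:
		return "TI_BREAK";
	default:
		return "unknown (see aic94xx_sas.h)";
	}
}
```

With a status block whose fourth byte is 0x05, reason_name() returns
"TI_BREAK", matching the log line above.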

Is an old firmware version ("Razor_10a1") expected to work with the
aic94xx in mainline? Alternatively, is the new firmware version
("V17/10c6") expected to work with older aic94xx versions? If either
is true, I can try that tomorrow to see whether the firmware version
makes a difference with the bad aic94xx.

Cheers,
Muli
