From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Keith Hopkins <vger@hopnet.net>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>,
Jan Sembera <jsembera@suse.cz>,
linux-scsi@vger.kernel.org
Subject: Re: aic94xx: failing on high load (another data point)
Date: Tue, 19 Feb 2008 10:22:20 -0600
Message-ID: <1203438140.3103.24.camel@localhost.localdomain>
In-Reply-To: <47B9958A.8080104@hopnet.net>
On Mon, 2008-02-18 at 22:26 +0800, Keith Hopkins wrote:
> Well, that made life interesting....
> but didn't seem to fix anything.
>
> The behavior is about the same as before, but with more verbose
> errors. I failed one member of the raid and had it rebuild as a
> test...which hangs for a while and the drive falls off-line.
>
> Please grab the dmesg output in all its gory glory from here:
> http://wiki.hopnet.net/dokuwiki/lib/exe/fetch.php?media=myit:sas:dmesg-20080218-wpatch-fail.txt.gz
I had a look through this. Amazingly, in spite of the message spew, up
to here:
> sas: Enter sas_scsi_recover_host
> sas: trying to find task 0xffff81033c3d3d80
> sas: sas_scsi_find_task: aborting task 0xffff81033c3d3d80
> aic94xx: tmf timed out
> aic94xx: tmf came back
Everything is going normally (the REQ_TASK_ABORT requests are properly
aborted and retried). At this point (around L3449 in the trace) the
aborts start failing.
Unfortunately, there's a bug in the driver's TMF timeout handling: it
leaves the sequencer entry pending but frees the ascb. If the sequencer
ever picks that entry up, it gets very confused, as it does a little
further down in the trace:
> aic94xx: BUG:sequencer:dl:no ascb?!
> aic94xx: BUG:sequencer:dl:no ascb?!
That's where the sequencer adds to the done list an ascb that we've
already freed. From this point on confusion reigns, and the error
handler eventually offlines the device.
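To make the sequence above concrete, here is a simplified, hypothetical C model of it. struct ascb, tag_table and the functions below are illustrative names, not the real aic94xx data structures or API. The model assumes the driver finds an ascb by the tag the sequencer reports; it clears the lookup entry before freeing so the failure shows up as a clean "no ascb" rather than the real use-after-free. The "deferred" timeout path is one possible mitigation, assumed here: keep the ascb allocated until the sequencer hands it back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified model; not the real aic94xx code. */
#define MAX_TAGS 8

struct ascb {
	int tag;
	bool timed_out;
};

/* Driver-side map from sequencer tag to ascb; NULL means "no ascb". */
static struct ascb *tag_table[MAX_TAGS];

/* Queue a TMF: the sequencer now holds a pending entry for @tag. */
static struct ascb *post_tmf(int tag)
{
	assert(tag >= 0 && tag < MAX_TAGS);
	struct ascb *a = calloc(1, sizeof(*a));
	if (!a)
		return NULL;
	a->tag = tag;
	tag_table[tag] = a;
	return a;
}

/* Buggy timeout path: frees the ascb even though the sequencer
 * entry for its tag is still pending. */
static void tmf_timeout_buggy(struct ascb *a)
{
	tag_table[a->tag] = NULL;
	free(a);
}

/* Assumed mitigation: mark the ascb as timed out and leave it
 * allocated until the sequencer reports it on the done list. */
static void tmf_timeout_deferred(struct ascb *a)
{
	a->timed_out = true;
}

/* Done-list processing when the sequencer finally reports @tag.
 * Returns 0 if an ascb was found and released, -1 otherwise. */
static int done_list_entry(int tag)
{
	assert(tag >= 0 && tag < MAX_TAGS);
	struct ascb *a = tag_table[tag];
	if (!a) {
		printf("BUG:sequencer:dl:no ascb?!\n");
		return -1;
	}
	tag_table[tag] = NULL;
	if (a->timed_out)
		printf("late completion of timed-out TMF, tag %d\n", tag);
	free(a);
	return 0;
}
```

Calling post_tmf(3), then tmf_timeout_buggy(), then done_list_entry(3) reproduces the "no ascb" message. Using tmf_timeout_deferred() instead lets the late completion find its ascb, which is then freed exactly once.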
I'll see if I can come up with patches to fix this ... or at least
mitigate the problems it causes.
James