From: Keith Hopkins <vger@hopnet.net>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>,
	Jan Sembera <jsembera@suse.cz>,
	linux-scsi@vger.kernel.org
Subject: Re: aic94xx: failing on high load (another data point)
Date: Mon, 18 Feb 2008 22:26:18 +0800
Message-ID: <47B9958A.8080104@hopnet.net>
In-Reply-To: <1203089323.3058.20.camel@localhost.localdomain>

On 02/15/2008 11:28 PM, James Bottomley wrote:
> On Fri, 2008-02-15 at 00:11 +0800, Keith Hopkins wrote:
>> On 01/31/2008 03:29 AM, Darrick J. Wong wrote:
>>> On Wed, Jan 30, 2008 at 06:59:34PM +0800, Keith Hopkins wrote:
>>>> V28.  My controller functions well with a single drive (low-medium load).  Unfortunately, all attempts to get the mirrors in sync fail and usually hang the whole box.
>>> Adaptec posted a V30 sequencer on their website; does that fix the
>>> problems?
>>>
>>> http://www.adaptec.com/en-US/speed/scsi/linux/aic94xx-seq-30-1_tar_gz.htm
>>>
>> I lost connectivity to the drive again, and had to reboot to recover
>> the drive, so it seemed a good time to try out the V30 firmware.
>> Unfortunately, it didn't work any better.  Details are in the
>> attachment.
> 
> Well, I can offer some hope.  The errors you report:
> 
>> aic94xx: escb_tasklet_complete: REQ_TASK_ABORT, reason=0x6
>> aic94xx: escb_tasklet_complete: Can't find task (tc=6) to abort!
> 
> Are requests by the sequencer to abort a task because of a protocol
> error.  IBM did some extensive testing with seagate drives and found
> that the protocol errors were genuine and the result of drive firmware
> problems.  IBM released a version of seagate firmware (BA17) to correct
> these.  Unfortunately, your drive identifies its firmware as S513 which
> is likely OEM firmware from another vendor ... however, that vendor may
> have an update which corrects the problem.
> 
> Of course, the other issue is this:
> 
>> aic94xx: escb_tasklet_complete: Can't find task (tc=6) to abort!
> 
> This is a bug in the driver.  It's not finding the task in the
> outstanding list.  The problem seems to be that it's taking the task
> from the escb which, by definition, is always NULL.  It should be taking
> the task from the ascb it finds by looping over the pending queue.
> 
> If you're willing, could you try this patch which may correct the
> problem?  It's sort of like falling off a cliff: if you never go near
> the edge (i.e. you upgrade the drive fw) you never fall off;
> alternatively, it would be nice if you could help me put up guard rails
> just in case.
> 

Well, that made life interesting....
  but didn't seem to fix anything.

The behavior is about the same as before, but with more verbose errors.  As a test, I failed one member of the RAID and had it rebuild; the rebuild hangs for a while and then the drive falls off-line.
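
For anyone following along, here is how I understand the lookup the patch is meant to do, as a minimal standalone C sketch.  The struct and function names (pending_ascb, find_pending_task) are hypothetical stand-ins, not the real aic94xx or libsas definitions; the sketch only illustrates taking the task from the ascb found by walking the pending queue for a matching tc, instead of from the escb.

```c
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real aic94xx/libsas definitions. */
struct sas_task {
    int id;
};

struct pending_ascb {
    int tc_index;               /* task context index reported by the sequencer */
    struct sas_task *task;      /* task owned by this ascb */
    struct pending_ascb *next;  /* next entry on the pending list */
};

/*
 * Walk the pending list and return the task of the ascb whose
 * tc_index matches tc, or NULL if no pending ascb matches.
 */
static struct sas_task *find_pending_task(struct pending_ascb *head, int tc)
{
    for (struct pending_ascb *a = head; a != NULL; a = a->next) {
        if (a->tc_index == tc)
            return a->task;
    }
    return NULL;
}
```

A NULL return here corresponds to the "Can't find task (tc=6) to abort!" case: no entry on the pending list carries that tc.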

Please grab the dmesg output in all its gory glory from here: http://wiki.hopnet.net/dokuwiki/lib/exe/fetch.php?media=myit:sas:dmesg-20080218-wpatch-fail.txt.gz

The drive is a Dell OEM drive, but it's not in a Dell system.  There is at least one firmware upgrade (S527) for it, but the Dell loader refuses to load it because the drive isn't in a Dell system.
Does anyone know a generic way to load new firmware onto a SAS drive?

--Keith
