public inbox for linux-scsi@vger.kernel.org
* [PATCH 00/12] Roll-up of sas_ata patches
@ 2007-01-30  9:15 Darrick J. Wong
  2007-02-03 22:32 ` James Bottomley
  0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2007-01-30  9:15 UTC (permalink / raw)
  To: Darrick J. Wong, linux-scsi; +Cc: Alexis Bruemmer

Hi all,

This is a roll-up of all of my ATA-related uncommitted patches against
libsas and aic94xx to date.  Per James Bottomley's request, I'm pushing
these patches out for further review in aic94xx-sas.  The big changes in
this patch set are a lot of bug and locking fixes, the conversion of the
EH routines to interact with the SAS EH strategy routines, and of course
the separation of the SATL code into a separate module.

These patches should apply in number order cleanly against 2.6.20-rc6 +
scsi_misc + scsi-rc-fixes + aic94xx-sas.  They've been fairly well tested
on a bunch of SATA disks in an x206m, though the ATAPI support is not so
well tested.  However, I have run these patches under other loads for a while.
Hopefully these patches are ready for more widespread testing in
scsi-misc, and thank you for any comments or feedback that you provide.

(Apologies for any stgit mail misconfiguration on my part.)

--D


* Re: [PATCH 00/12] Roll-up of sas_ata patches
  2007-01-30  9:15 [PATCH 00/12] Roll-up of sas_ata patches Darrick J. Wong
@ 2007-02-03 22:32 ` James Bottomley
  2007-02-04  9:21   ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: James Bottomley @ 2007-02-03 22:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-scsi, Alexis Bruemmer

On Tue, 2007-01-30 at 04:15 -0500, Darrick J. Wong wrote:
> This is a roll-up of all of my ATA-related uncommitted patches against
> libsas and aic94xx to date.  Per James Bottomley's request, I'm pushing
> these patches out for further review in aic94xx-sas.  The big changes in
> this patch set are a lot of bug and locking fixes, the conversion of the
> EH routines to interact with the SAS EH strategy routines, and of course
> the separation of the SATL code into a separate module.
> 
> These patches should apply in number order cleanly against 2.6.20-rc6 +
> scsi_misc + scsi-rc-fixes + aic94xx-sas.  They've been fairly well tested
> on a bunch of SATA disks in an x206m, though the ATAPI support is not so
> well tested.  However, I have run these patches under other loads for a while.
> Hopefully these patches are ready for more widespread testing in
> scsi-misc, and thank you for any comments or feedback that you provide.
> 
> (Apologies for any stgit mail misconfiguration on my part.)

There's a problem somewhere with your error handler changes (which I
picked up thanks to the problems with the V28 firmware).  What I see
without your changes is that for a directly attached SATA device, when
the firmware begins its death spiral, the commands all return and
eventually send I/O errors to the filesystem.  With your patch series
applied, it just loops forever giving messages like:

Feb  3 12:07:06 localhost kernel: aic94xx: escb_tasklet_complete: phy5: LINK_RESET_ERROR
Feb  3 12:07:06 localhost kernel: aic94xx: phy5: Receive FIS timeout
Feb  3 12:07:06 localhost kernel: aic94xx: phy5: retries:0 performing link reset seq
Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
Feb  3 12:07:06 localhost kernel: aic94xx: control_phy_tasklet_complete: phy5, lrate:0x8, proto:0xe
Feb  3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
Feb  3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
Feb  3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
....

James




* Re: [PATCH 00/12] Roll-up of sas_ata patches
  2007-02-03 22:32 ` James Bottomley
@ 2007-02-04  9:21   ` Darrick J. Wong
  2007-02-04 15:11     ` James Bottomley
  0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2007-02-04  9:21 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Alexis Bruemmer

James Bottomley wrote:

> There's a problem somewhere with your error handler changes (which I
> picked up thanks to the problems with the V28 firmware).  What I see
> without your changes is that for a directly attached SATA device, when
> the firmware begins its death spiral, the commands all return and
> eventually send I/O errors to the filesystem.  With your patch series
> applied, it just loops forever giving messages like:
> 
> Feb  3 12:07:06 localhost kernel: aic94xx: escb_tasklet_complete: phy5: LINK_RESET_ERROR
> Feb  3 12:07:06 localhost kernel: aic94xx: phy5: Receive FIS timeout
> Feb  3 12:07:06 localhost kernel: aic94xx: phy5: retries:0 performing link reset seq
> Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> Feb  3 12:07:06 localhost kernel: aic94xx: control_phy_tasklet_complete: phy5, lrate:0x8, proto:0xe
> Feb  3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> Feb  3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> Feb  3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host

Interesting, since the opposite happens with SAS disks. :)

The infinite loop is usually what happens if a scsi_cmnd gets pulled off
the eh queue without being scsi_eh_finish_cmd()'d.  Can you send me the
whole dmesg?  It's possible that we're trying to abort a command, which
of course fails for a SATA disk, so we try bigger and bigger hammers...
and the big hammers don't call scsi_eh_finish_cmd().

Did these SATA link reset errors only start showing up after the v28
firmware patch, or has this always happened?  I've noticed lately that I
get link reset errors if I run a short exercise on an ext3 filesystem on
a SATA disk, yet dd exercise runs just fine.  But I had also thought
that it was just my flaky hardware. :)

--D


* Re: [PATCH 00/12] Roll-up of sas_ata patches
  2007-02-04  9:21   ` Darrick J. Wong
@ 2007-02-04 15:11     ` James Bottomley
  0 siblings, 0 replies; 4+ messages in thread
From: James Bottomley @ 2007-02-04 15:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-scsi, Alexis Bruemmer

On Sun, 2007-02-04 at 01:21 -0800, Darrick J. Wong wrote:
> James Bottomley wrote:
> 
> > There's a problem somewhere with your error handler changes (which I
> > picked up thanks to the problems with the V28 firmware).  What I see
> > without your changes is that for a directly attached SATA device, when
> > the firmware begins its death spiral, the commands all return and
> > eventually send I/O errors to the filesystem.  With your patch series
> > applied, it just loops forever giving messages like:
> > 
> > Feb  3 12:07:06 localhost kernel: aic94xx: escb_tasklet_complete: phy5: LINK_RESET_ERROR
> > Feb  3 12:07:06 localhost kernel: aic94xx: phy5: Receive FIS timeout
> > Feb  3 12:07:06 localhost kernel: aic94xx: phy5: retries:0 performing link reset seq
> > Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> > Feb  3 12:07:06 localhost kernel: aic94xx: control_phy_tasklet_complete: phy5, lrate:0x8, proto:0xe
> > Feb  3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> > Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> > Feb  3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> > Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> > Feb  3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> > Feb  3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> 
> Interesting, since the opposite happens with SAS disks. :)

Well, the initial error is a firmware induced drive error of some type.

> The infinite loop is usually what happens if a scsi_cmnd gets pulled off
> the eh queue without being scsi_eh_finish_cmd()'d.  Can you send me the
> whole dmesg?  It's possible that we're trying to abort a command, which
> of course fails for a SATA disk, so we try bigger and bigger hammers...
> and the big hammers don't call scsi_eh_finish_cmd().

I've put the full log from detection of the aic94xx to forced power off
(all 512k of it) at

http://www2.kernel.org:/pub/linux/kernel/people/jejb/klog.aic94xx.failure.txt

(give it a while for the kernel.org mirrors to propagate)

> Did these SATA link reset errors only start showing up after the v28
> firmware patch, or has this always happened?  I've noticed lately that I
> get link reset errors if I run a short exercise on an ext3 filesystem on
> a SATA disk, yet dd exercise runs just fine.  But I had also thought
> that it was just my flaky hardware. :)

Er ... no idea ... The problem only shows up with V28 firmware, so I've
never seen a SATA disc fail with the V17 firmware.

