* [PATCH 00/12] Roll-up of sas_ata patches
From: Darrick J. Wong @ 2007-01-30 9:15 UTC
To: Darrick J. Wong, linux-scsi; +Cc: Alexis Bruemmer
Hi all,
This is a roll-up of all of my ATA-related uncommitted patches against
libsas and aic94xx to date. Per James Bottomley's request, I'm pushing
these patches out for further review in aic94xx-sas. The big changes in
this patch set are a lot of bug and locking fixes, the conversion of the
EH routines to interact with the SAS EH strategy routines, and of course
the separation of the SATL code into a separate module.
These patches should apply in numerical order cleanly against 2.6.20-rc6 +
scsi_misc + scsi-rc-fixes + aic94xx-sas. They've been fairly well tested
on a bunch of SATA disks in an x206m, though the ATAPI support is not so
well tested. However, I have run these patches in other loads for a while.
Hopefully these patches are ready for more widespread testing in
scsi-misc, and thank you for any comments or feedback that you provide.
(Apologies for any stgit mail misconfiguration on my part.)
--D
* Re: [PATCH 00/12] Roll-up of sas_ata patches
From: James Bottomley @ 2007-02-03 22:32 UTC
To: Darrick J. Wong; +Cc: linux-scsi, Alexis Bruemmer
On Tue, 2007-01-30 at 04:15 -0500, Darrick J. Wong wrote:
> This is a roll-up of all of my ATA-related uncommitted patches against
> libsas and aic94xx to date. Per James Bottomley's request, I'm pushing
> these patches out for further review in aic94xx-sas. The big changes in
> this patch set are a lot of bug and locking fixes, the conversion of the
> EH routines to interact with the SAS EH strategy routines, and of course
> the separation of the SATL code into a separate module.
>
> These patches should apply in numerical order cleanly against 2.6.20-rc6 +
> scsi_misc + scsi-rc-fixes + aic94xx-sas. They've been fairly well tested
> on a bunch of SATA disks in an x206m, though the ATAPI support is not so
> well tested. However, I have run these patches in other loads for a while.
> Hopefully these patches are ready for more widespread testing in
> scsi-misc, and thank you for any comments or feedback that you provide.
>
> (Apologies for any stgit mail misconfiguration on my part.)
There's a problem somewhere with your error handler changes (which I
picked up thanks to the problems with the V28 firmware). What I see
without your changes is that for a directly attached SATA device, when
the firmware begins its death spiral, the commands all return and
eventually send I/O errors to the filesystem. With your patch series
applied, it just loops forever giving messages like:
Feb 3 12:07:06 localhost kernel: aic94xx: escb_tasklet_complete: phy5: LINK_RESET_ERROR
Feb 3 12:07:06 localhost kernel: aic94xx: phy5: Receive FIS timeout
Feb 3 12:07:06 localhost kernel: aic94xx: phy5: retries:0 performing link reset seq
Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
Feb 3 12:07:06 localhost kernel: aic94xx: control_phy_tasklet_complete: phy5, lrate:0x8, proto:0xe
Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
....
James
* Re: [PATCH 00/12] Roll-up of sas_ata patches
From: Darrick J. Wong @ 2007-02-04 9:21 UTC
To: James Bottomley; +Cc: linux-scsi, Alexis Bruemmer
James Bottomley wrote:
> There's a problem somewhere with your error handler changes (which I
> picked up thanks to the problems with the V28 firmware). What I see
> without your changes is that for a directly attached SATA device, when
> the firmware begins its death spiral, the commands all return and
> eventually send I/O errors to the filesystem. With your patch series
> applied, it just loops forever giving messages like:
>
> Feb 3 12:07:06 localhost kernel: aic94xx: escb_tasklet_complete: phy5: LINK_RESET_ERROR
> Feb 3 12:07:06 localhost kernel: aic94xx: phy5: Receive FIS timeout
> Feb 3 12:07:06 localhost kernel: aic94xx: phy5: retries:0 performing link reset seq
> Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: aic94xx: control_phy_tasklet_complete: phy5, lrate:0x8, proto:0xe
> Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
Interesting, since the opposite happens with SAS disks. :)
The infinite loop is usually what happens if a scsi_cmnd gets pulled off
the eh queue without being scsi_eh_finish_cmd()'d. Can you send me the
whole dmesg? It's possible that we're trying to abort a command, which
of course fails for a SATA disk, so we try bigger and bigger hammers....
and the big hammers don't call scsi_eh_finish_cmd().
Did these SATA link reset errors only start showing up after the v28
firmware patch, or has this always happened? I've noticed lately that I
get link reset errors if I run a short exercise on an ext3 filesystem on
a SATA disk, yet a dd exercise runs just fine. But I had also thought
that it was just my flaky hardware. :)
--D
* Re: [PATCH 00/12] Roll-up of sas_ata patches
From: James Bottomley @ 2007-02-04 15:11 UTC
To: Darrick J. Wong; +Cc: linux-scsi, Alexis Bruemmer
On Sun, 2007-02-04 at 01:21 -0800, Darrick J. Wong wrote:
> James Bottomley wrote:
>
> > There's a problem somewhere with your error handler changes (which I
> > picked up thanks to the problems with the V28 firmware). What I see
> > without your changes is that for a directly attached SATA device, when
> > the firmware begins its death spiral, the commands all return and
> > eventually send I/O errors to the filesystem. With your patch series
> > applied, it just loops forever giving messages like:
> >
> > Feb 3 12:07:06 localhost kernel: aic94xx: escb_tasklet_complete: phy5: LINK_RESET_ERROR
> > Feb 3 12:07:06 localhost kernel: aic94xx: phy5: Receive FIS timeout
> > Feb 3 12:07:06 localhost kernel: aic94xx: phy5: retries:0 performing link reset seq
> > Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: aic94xx: control_phy_tasklet_complete: phy5, lrate:0x8, proto:0xe
> > Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
>
> Interesting, since the opposite happens with SAS disks. :)
Well, the initial error is a firmware-induced drive error of some type.
> The infinite loop is usually what happens if a scsi_cmnd gets pulled off
> the eh queue without being scsi_eh_finish_cmd()'d. Can you send me the
> whole dmesg? It's possible that we're trying to abort a command, which
> of course fails for a SATA disk, so we try bigger and bigger hammers....
> and the big hammers don't call scsi_eh_finish_cmd().
I've put the full log from detection of the aic94xx to forced power off
(all 512k of it) at
http://www2.kernel.org:/pub/linux/kernel/people/jejb/klog.aic94xx.failure.txt
(give it a while for the kernel.org mirrors to propagate)
> Did these SATA link reset errors only start showing up after the v28
> firmware patch, or has this always happened? I've noticed lately that I
> get link reset errors if I run a short exercise on an ext3 filesystem on
> a SATA disk, yet a dd exercise runs just fine. But I had also thought
> that it was just my flaky hardware. :)
Er ... no idea ... The problem only shows up with V28 firmware, so I've
never seen a SATA disc fail with the V17 firmware.