* Need help with libata error handling in libsas
From: James Bottomley @ 2008-02-24 21:02 UTC
To: linux-ide, linux-scsi
I keep hearing that we need to convert libsas to use libata's new error
handling. Unfortunately, I have very little conception of what that
means. Right at the moment, libsas doesn't use any error handling
functions of libata at all.
I've looked through the libata-eh functions, and I find them frankly
incomprehensible.
Firstly, let me say what SAS error handling actually does:
To begin with, we may (or may not) get early warning of problems, so we
have callbacks that allow drivers to trigger the error handler early
(there's a particular event from the aic94xx sequencer which says "I've
detected a screw up on this task, begin error handling now"). This
seems to correspond to ata_qc_schedule_eh().
Then we quiesce the host (standard eh practice, so libata does this too
because SCSI forces it). Then we go through the remaining tasks. The
first thing we try is to abort a task. This is basically asking the HBA
to give me back my task, and is applicable to both ATA and SAS tasks.
Abort serves a dual purpose; if the task is pending or completed, it can
just be flushed from the HBA issue queue. If the task is actually
active on the end device, then we can send a SCSI TMF after it. For
ATA, we can't do this, but the docs recommend sending a register H2D FIS
with a soft reset after a non-NCQ task or a CHECK POWER MODE FIS after
an NCQ command. I just don't see anywhere in libata where this is done.
After this, libsas uses a query function (which has no ATA parallel) to
find what the target is doing with the task.
Finally we come to the escalating reset sequence (LUN (hard), phy,
pathway) which seems to mirror what libata-eh would do (well, barring
the pathway reset, since ATA has no concept of that).
All of this leads me to conclude, that all libsas needs is to plumb in
the ATA equivalent of abort, junk the task query for libata devices and
simply proceed, as if the task is held at the target, along the
escalating reset path.
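
The sequence described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: every name here (eh_step, eh_next_step, eh_plan and so on) is hypothetical and none of it is libsas or libata code. It shows the one difference proposed for ATA devices: the task query is skipped and recovery proceeds straight from abort to the escalating resets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from libsas or libata. */
enum eh_step {
    EH_ABORT,          /* ask the HBA to hand the task back */
    EH_QUERY,          /* SAS only: ask the target what it is doing with the task */
    EH_LUN_RESET,      /* first rung of the escalating reset sequence */
    EH_PHY_RESET,
    EH_PATHWAY_RESET,
    EH_DONE            /* nothing left to try */
};

enum eh_proto { EH_PROTO_SAS, EH_PROTO_ATA };

/* Return the step that follows `cur` for a task of protocol `p`. */
enum eh_step eh_next_step(enum eh_proto p, enum eh_step cur)
{
    switch (cur) {
    case EH_ABORT:
        /* ATA has no task query: assume the task is held at the
         * target and move directly to the reset sequence. */
        return p == EH_PROTO_SAS ? EH_QUERY : EH_LUN_RESET;
    case EH_QUERY:
        return EH_LUN_RESET;
    case EH_LUN_RESET:
        return EH_PHY_RESET;
    case EH_PHY_RESET:
        return EH_PATHWAY_RESET;
    default:
        return EH_DONE;
    }
}

/* Write the ordered list of steps for protocol `p` into `out`
 * (at most `cap` entries) and return how many were written. */
size_t eh_plan(enum eh_proto p, enum eh_step *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    for (enum eh_step s = EH_ABORT; s != EH_DONE && n < cap;
         s = eh_next_step(p, s))
        out[n++] = s;
    return n;
}
```

In this model a SAS task walks five steps (abort, query, LUN reset, phy reset, pathway reset) while an ATA task walks four, which is the "fairly small alteration" the proposal relies on.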
We might be able to weld the error handlers of libsas and libata
together (i.e. use the libata one for everything up to pathway reset and
then move to the libsas one for pathway reset on), but the problem is I
just don't see any way of doing this. Plus abort and TMF query skip are
fairly small alterations to the current libsas eh, so it's not clear
there's any value to actually welding in the libsas eh, even if I could
get it to provide the information I need.
So, my conclusion is tending towards simply adding an ATA component to
libsas and keeping all eh libsas local. In that case, is there anything
I need to do to convince libata that I don't care whether it uses old or
new error handling?
James
* Re: Need help with libata error handling in libsas
From: Brian King @ 2008-02-25 16:34 UTC
To: James Bottomley; +Cc: linux-ide, linux-scsi
James Bottomley wrote:
> I keep hearing that we need to convert libsas to use libata's new error
> handling. Unfortunately, I have very little conception of what that
> means. Right at the moment, libsas doesn't use any error handling
> functions of libata at all.
>
> I've looked through the libata-eh functions, and I find them frankly
> incomprehensible.
>
> Firstly, let me say what SAS error handling actually does:
Let me chime in with what ipr error handling does/can do. The ipr firmware
provides two basic SATA error handling methods with some modifiers to each.
Cancel All - This cancels all outstanding commands to the device. When issued
to an ATA device, this gets escalated by the firmware to an SRST. When issued
to an ATAPI device, an ATA NOOP is issued.
Reset Device - This command has modifiers to indicate either a soft reset
or a hard reset.
Currently, the only SATA devices that ipr officially attaches are ATAPI
DVD devices. In our testing we've come to the conclusion that trying to
use anything but a hard reset for ERP is generally more trouble than it
is worth.
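
To make the mapping described above concrete, here is a rough sketch in plain C. All identifiers are hypothetical and are not taken from the ipr driver or its firmware interface; the table only restates the behaviour listed above, and the policy function encodes the testing conclusion that a hard reset is the practical choice for ERP.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the ipr driver's API. */
enum demo_dev_type { DEMO_DEV_ATA, DEMO_DEV_ATAPI };

enum demo_request {
    DEMO_REQ_CANCEL_ALL,   /* cancel all outstanding commands */
    DEMO_REQ_RESET_SOFT,   /* Reset Device, soft modifier */
    DEMO_REQ_RESET_HARD    /* Reset Device, hard modifier */
};

enum demo_action {
    DEMO_ACT_SRST,
    DEMO_ACT_NOOP,
    DEMO_ACT_SOFT_RESET,
    DEMO_ACT_HARD_RESET
};

/* What the firmware ends up doing for a given request and device type. */
enum demo_action demo_firmware_action(enum demo_request req,
                                      enum demo_dev_type type)
{
    switch (req) {
    case DEMO_REQ_CANCEL_ALL:
        /* ATA: escalated to SRST. ATAPI: an ATA NOOP is issued. */
        return type == DEMO_DEV_ATA ? DEMO_ACT_SRST : DEMO_ACT_NOOP;
    case DEMO_REQ_RESET_SOFT:
        return DEMO_ACT_SOFT_RESET;
    case DEMO_REQ_RESET_HARD:
        return DEMO_ACT_HARD_RESET;
    }
    assert(0 && "unknown request");
    return DEMO_ACT_HARD_RESET;
}

/* Recovery policy: always ask for a hard reset. */
enum demo_request demo_pick_erp_request(enum demo_dev_type type)
{
    (void)type; /* the policy is the same for every device type */
    return DEMO_REQ_RESET_HARD;
}
```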
> All of this leads me to conclude, that all libsas needs is to plumb in
> the ATA equivalent of abort, junk the task query for libata devices and
> simply proceed, as if the task is held at the target, along the
> escalating reset path.
The new libata-eh is used for more than just EH. It is used for device
probing, device revalidation, and power management. It is also woken for
all command failures and is where the request sense for ATAPI devices is
issued. Device revalidation following reset is also critical for ATA and
ATAPI devices. One example of this is some SATA/PATA converter chips
lose their DMA xfer settings following a reset and default to PIO mode
only. Any DMA transfer that is attempted simply hangs.
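
The converter-chip case can be pictured with a toy model. This is a sketch with made-up names (demo_dev, demo_reset, demo_revalidate), not libata code, and it assumes the only state lost across a reset is the transfer mode. It shows why a reset without a following revalidation leaves the device unsafe for DMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not the libata device structure. */
enum demo_xfer_mode { DEMO_XFER_PIO, DEMO_XFER_DMA };

struct demo_dev {
    enum demo_xfer_mode wanted;  /* mode the host configured before the reset */
    enum demo_xfer_mode active;  /* mode the device is really in right now */
};

/* A reset on a bridge that forgets its DMA settings: back to PIO. */
void demo_reset(struct demo_dev *dev)
{
    assert(dev != NULL);
    dev->active = DEMO_XFER_PIO;
}

/* Revalidation: reprogram the transfer mode if the reset dropped it.
 * Returns true when something had to be fixed. */
bool demo_revalidate(struct demo_dev *dev)
{
    assert(dev != NULL);
    if (dev->active == dev->wanted)
        return false;
    dev->active = dev->wanted;
    return true;
}

/* A DMA command is only safe while the device is in DMA mode. */
bool demo_dma_is_safe(const struct demo_dev *dev)
{
    assert(dev != NULL);
    return dev->active == DEMO_XFER_DMA;
}
```

In the model, demo_dma_is_safe() is false between demo_reset() and demo_revalidate(), which corresponds to the window in which a DMA transfer would hang.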
The other issue is PMP support. The more that gets pushed into libsas,
the more libsas needs to know about things such as PMP.
-Brian
--
Brian King
Linux on Power Virtualization
IBM Linux Technology Center
* Re: Need help with libata error handling in libsas
From: James Bottomley @ 2008-02-25 16:57 UTC
To: Brian King; +Cc: linux-ide, linux-scsi
On Mon, 2008-02-25 at 10:34 -0600, Brian King wrote:
> The new libata-eh is used for more than just EH. It is used for device
> probing, device revalidation, and power management. It is also woken for
> all command failures and is where the request sense for ATAPI devices is
> issued. Device revalidation following reset is also critical for ATA and
> ATAPI devices. One example of this is some SATA/PATA converter chips
> lose their DMA xfer settings following a reset and default to PIO mode
> only. Any DMA transfer that is attempted simply hangs.
OK ... I'm grepping around in the source trying to figure out all of
this. Is it documented anywhere? That would really help me out at the
moment.
James
* Re: Need help with libata error handling in libsas
From: Jeff Garzik @ 2008-02-25 17:45 UTC
To: James Bottomley; +Cc: Brian King, linux-ide, linux-scsi
James Bottomley wrote:
> On Mon, 2008-02-25 at 10:34 -0600, Brian King wrote:
>> The new libata-eh is used for more than just EH. It is used for device
>> probing, device revalidation, and power management. It is also woken for
>> all command failures and is where the request sense for ATAPI devices is
>> issued. Device revalidation following reset is also critical for ATA and
>> ATAPI devices. One example of this is some SATA/PATA converter chips
>> lose their DMA xfer settings following a reset and default to PIO mode
>> only. Any DMA transfer that is attempted simply hangs.
Strongly seconded. Doing your own ATA EH would be foolish, as that
would imply duplicating all that carefully-time-tested logic handling
devices which follow the ATA specs... about 98% of the time :)
Just the set-transfer-mode logic took years to get right for the
majority of ATA devices.
> OK ... I'm grepping around in the source trying to figure out all of
> this. Is it documented anywhere? That would really help me out at the
> moment.
Unfortunately, not really. The simplistic version is... freeze, set
some flags, call a function to schedule EH as needed -- most notably
when your HBA signals an ATA device error or some other error in the ATA
domain.
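
As a rough picture of that "freeze, set some flags, schedule EH" flow, here is a self-contained user-space model. It is a sketch only: the demo_* types and functions are hypothetical stand-ins and deliberately not the real libata structures or calls, so the exact flag names and the point at which a real driver freezes the port are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the libata definitions. */
enum { DEMO_ERR_DEV = 1, DEMO_ERR_HOST_BUS = 2 };            /* what went wrong */
enum { DEMO_EH_ACT_HARDRESET = 1, DEMO_EH_ACT_REVALIDATE = 2 }; /* what EH should do */

struct demo_port {
    bool frozen;            /* no commands issued or completed while set */
    bool eh_scheduled;      /* the EH thread has been asked to run */
    unsigned int err_mask;  /* accumulated error flags */
    unsigned int action;    /* accumulated requested EH actions */
};

/* Interrupt-side path: the HBA signalled an error in the ATA domain. */
void demo_report_error(struct demo_port *ap, unsigned int err,
                       unsigned int action)
{
    assert(ap != NULL);
    ap->frozen = true;        /* 1. freeze the port */
    ap->err_mask |= err;      /* 2. set some flags describing the problem */
    ap->action |= action;
    ap->eh_scheduled = true;  /* 3. schedule EH */
}

/* EH-thread side: consume the recorded actions, then thaw the port.
 * Returns the action mask that was handled (0 if EH was not scheduled). */
unsigned int demo_run_eh(struct demo_port *ap)
{
    unsigned int handled;

    assert(ap != NULL);
    if (!ap->eh_scheduled)
        return 0;
    handled = ap->action;
    ap->err_mask = 0;
    ap->action = 0;
    ap->eh_scheduled = false;
    ap->frozen = false;
    return handled;
}
```

The split matters in the model: the reporting side only records state and returns, while all recovery work happens later in demo_run_eh().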
Regardless of all this... libsas IMO will cause some libata-EH growing
pains. libsas needs libata-EH for probing, revalidation,
initialization, etc. But libsas probably does NOT need libata-EH for
certain duties like SATA PHY diagnosis and link handling.
libsas needs libata-EH. Unfortunately for libsas, libata-EH was written
from the "libata controls the world" point of view, and probably needs
some modifications to play well in the new SATA/SAS shared worldview.
Brian's recommendation is quite sane... your ->error_handler() probably
just needs hard reset (aka COMRESET) capability.
Jeff