public inbox for linux-scsi@vger.kernel.org
* (unknown)
@ 2003-06-03 23:51 Justin T. Gibbs
  2003-06-03 23:58 ` Marc-Christian Petersen
  2003-06-04  1:34 ` Aic7x_x_x 6.2.36 && Aic79xx 1.3.10 Updates Justin T. Gibbs
  0 siblings, 2 replies; 19+ messages in thread
From: Justin T. Gibbs @ 2003-06-03 23:51 UTC
  To: linux-scsi, linux-kernel; +Cc: Linus Torvalds, Alan Cox, Marcelo Tosatti

Folks,

I've just uploaded version 1.3.10 of the aic79xx driver and version 
6.2.36 of the aic7xxx driver.  Both are available for 2.4.X and
2.5.X kernels in either bk send format or as a tarball from here:
 
http://people.FreeBSD.org/~gibbs/linux/SRC/

The change sets relative to the 2.5.X tree are:

ChangeSet@1.1275, 2003-06-03 17:35:01-06:00, gibbs@overdrive.btc.adaptec.com
  Update Aic79xx Readme

ChangeSet@1.1274, 2003-06-03 17:22:05-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Bump version number to 6.2.36
   o Document recent aic7xxx driver releases

ChangeSet@1.1273, 2003-06-03 17:20:14-06:00, gibbs@overdrive.btc.adaptec.com
  Aic79xx Driver Update
   o Bump driver version to 1.3.10
   o Document recent releases in driver readme.

ChangeSet@1.1272, 2003-05-31 21:12:09-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx and Aic79xx Driver Update
   o Work around negotiation firmware bug in the Quantum Atlas 10K
   o Clear stale PCI errors in our register mapping test to avoid
     false positives from rogue accesses to our registers that occur
     prior to our driver attach.

ChangeSet@1.1271, 2003-05-31 18:34:01-06:00, gibbs@overdrive.btc.adaptec.com
  Aic79xx Driver Update
   o Implement suspend and resume

ChangeSet@1.1270, 2003-05-31 18:32:36-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Fix some suspend and resume bugs

ChangeSet@1.1269, 2003-05-31 18:27:09-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Correct the type of the DV settings array.

ChangeSet@1.1268, 2003-05-31 18:25:28-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx and Aic79xx driver Update
   o Remove unnecessary and incorrect use of ~0 as a mask.

ChangeSet@1.1267, 2003-05-30 13:50:00-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx and Aic79xx Driver Update
   o Adapt to 2.5.X SCSI proc interface change while maintaining
     compatibility with earlier kernels.

ChangeSet@1.1266, 2003-05-30 11:01:02-06:00, gibbs@overdrive.btc.adaptec.com
  Merge http://linux.bkbits.net/linux-2.5
  into overdrive.btc.adaptec.com:/usr/home/gibbs/bk/linux-2.5

ChangeSet@1.1215.4.6, 2003-05-30 10:50:17-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Bring in aic7xxx_reg_print.c update that was missed the
     last time the firmware was regenerated.  The old file worked
     fine, so this is mostly a cosmetic change.

ChangeSet@1.1215.4.5, 2003-05-30 10:48:31-06:00, gibbs@overdrive.btc.adaptec.com
  Aic79xx Driver Update
   o Correct non-zero lun output on post Rev A4 hardware
     in packetized mode.

ChangeSet@1.1215.4.4, 2003-05-30 10:46:03-06:00, gibbs@overdrive.btc.adaptec.com
  Aic79xx Driver Update
   o Return to using 16-byte alignment for the SCB_TAG field in our SCB.
     The hardware seems to corrupt SCBs on some PCI platforms with the
     tag field in its old location.

ChangeSet@1.1215.4.3, 2003-05-30 10:43:20-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Adopt 2.5.X EISA framework for probing aic7770 controllers

ChangeSet@1.1215.4.2, 2003-05-30 10:31:04-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Correct card identification string for the 2920C




* Re:
  2003-06-03 23:51 (unknown) Justin T. Gibbs
@ 2003-06-03 23:58 ` Marc-Christian Petersen
  2003-06-04  1:34 ` Aic7x_x_x 6.2.36 && Aic79xx 1.3.10 Updates Justin T. Gibbs
  1 sibling, 0 replies; 19+ messages in thread
From: Marc-Christian Petersen @ 2003-06-03 23:58 UTC
  To: Justin T. Gibbs, linux-scsi, linux-kernel
  Cc: Linus Torvalds, Alan Cox, Marcelo Tosatti

On Wednesday 04 June 2003 01:51, Justin T. Gibbs wrote:

Hi Justin,

> I've just uploaded version 1.3.10 of the aic79xx driver and version
> 6.2.36 of the aic7xxx driver.  Both are available for 2.4.X and
> 2.5.X kernels in either bk send format or as a tarball from here:
many thanks! I'll update them for my tree (as always with your updates 
:-)

ciao, Marc



* Aic7x_x_x 6.2.36 && Aic79xx 1.3.10 Updates
  2003-06-03 23:51 (unknown) Justin T. Gibbs
  2003-06-03 23:58 ` Marc-Christian Petersen
@ 2003-06-04  1:34 ` Justin T. Gibbs
  2003-12-24 16:58   ` Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates Justin T. Gibbs
  1 sibling, 1 reply; 19+ messages in thread
From: Justin T. Gibbs @ 2003-06-04  1:34 UTC
  To: linux-scsi, linux-kernel; +Cc: Linus Torvalds, Alan Cox, Marcelo Tosatti

[Resent with a subject this time.  Hit send too soon...]

Folks,

I've just uploaded version 1.3.10 of the aic79xx driver and version 
6.2.36 of the aic7xxx driver.  Both are available for 2.4.X and
2.5.X kernels in either bk send format or as a tarball from here:
 
http://people.FreeBSD.org/~gibbs/linux/SRC/

The change sets relative to the 2.5.X tree are:

ChangeSet@1.1275, 2003-06-03 17:35:01-06:00, gibbs@overdrive.btc.adaptec.com
  Update Aic79xx Readme

ChangeSet@1.1274, 2003-06-03 17:22:05-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Bump version number to 6.2.36
   o Document recent aic7xxx driver releases

ChangeSet@1.1273, 2003-06-03 17:20:14-06:00, gibbs@overdrive.btc.adaptec.com
  Aic79xx Driver Update
   o Bump driver version to 1.3.10
   o Document recent releases in driver readme.

ChangeSet@1.1272, 2003-05-31 21:12:09-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx and Aic79xx Driver Update
   o Work around negotiation firmware bug in the Quantum Atlas 10K
   o Clear stale PCI errors in our register mapping test to avoid
     false positives from rogue accesses to our registers that occur
     prior to our driver attach.

ChangeSet@1.1271, 2003-05-31 18:34:01-06:00, gibbs@overdrive.btc.adaptec.com
  Aic79xx Driver Update
   o Implement suspend and resume

ChangeSet@1.1270, 2003-05-31 18:32:36-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Fix some suspend and resume bugs

ChangeSet@1.1269, 2003-05-31 18:27:09-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Correct the type of the DV settings array.

ChangeSet@1.1268, 2003-05-31 18:25:28-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx and Aic79xx driver Update
   o Remove unnecessary and incorrect use of ~0 as a mask.

ChangeSet@1.1267, 2003-05-30 13:50:00-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx and Aic79xx Driver Update
   o Adapt to 2.5.X SCSI proc interface change while maintaining
     compatibility with earlier kernels.

ChangeSet@1.1266, 2003-05-30 11:01:02-06:00, gibbs@overdrive.btc.adaptec.com
  Merge http://linux.bkbits.net/linux-2.5
  into overdrive.btc.adaptec.com:/usr/home/gibbs/bk/linux-2.5

ChangeSet@1.1215.4.6, 2003-05-30 10:50:17-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Bring in aic7xxx_reg_print.c update that was missed the
     last time the firmware was regenerated.  The old file worked
     fine, so this is mostly a cosmetic change.

ChangeSet@1.1215.4.5, 2003-05-30 10:48:31-06:00, gibbs@overdrive.btc.adaptec.com
  Aic79xx Driver Update
   o Correct non-zero lun output on post Rev A4 hardware
     in packetized mode.

ChangeSet@1.1215.4.4, 2003-05-30 10:46:03-06:00, gibbs@overdrive.btc.adaptec.com
  Aic79xx Driver Update
   o Return to using 16-byte alignment for the SCB_TAG field in our SCB.
     The hardware seems to corrupt SCBs on some PCI platforms with the
     tag field in its old location.

ChangeSet@1.1215.4.3, 2003-05-30 10:43:20-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Adopt 2.5.X EISA framework for probing aic7770 controllers

ChangeSet@1.1215.4.2, 2003-05-30 10:31:04-06:00, gibbs@overdrive.btc.adaptec.com
  Aic7xxx Driver Update
   o Correct card identification string for the 2920C



* Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-06-04  1:34 ` Aic7x_x_x 6.2.36 && Aic79xx 1.3.10 Updates Justin T. Gibbs
@ 2003-12-24 16:58   ` Justin T. Gibbs
  2003-12-24 17:50     ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Justin T. Gibbs @ 2003-12-24 16:58 UTC
  To: linux-scsi; +Cc: Linus Torvalds, Alan Cox, Marcelo Tosatti

Folks,

I've just uploaded version 2.0.5 of the aic79xx driver and version 
6.3.4 of the aic7xxx driver.  Both drivers are available for 2.4.X and
2.6.X kernels in either bk send format or as a tarball from here:
 
http://people.FreeBSD.org/~gibbs/linux/SRC/

Updated RPMs and DUDs for many distributions can also be found here:

http://people.FreeBSD.org/~gibbs/linux/RPM/aic79xx/
http://people.FreeBSD.org/~gibbs/linux/RPM/aic7xxx/
http://people.FreeBSD.org/~gibbs/linux/DUD/aic79xx/
http://people.FreeBSD.org/~gibbs/linux/DUD/aic7xxx/

The full revision history of these drivers can be found here:

http://people.freebsd.org/~gibbs/linux/CHANGELOG

Changeset descriptions for all changesets not already in the 2.4
and 2.5 trees can be found here:

http://people.freebsd.org/~gibbs/linux/SRC/aic79xx-linux-2.4-20031222.changes 

http://people.freebsd.org/~gibbs/linux/SRC/aic79xx-linux-2.5-20031222.changes 

If you have any questions or problems with the new release of these
drivers, please let me know.

Thanks,
Justin



* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-24 16:58   ` Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates Justin T. Gibbs
@ 2003-12-24 17:50     ` James Bottomley
       [not found]       ` <2148850000.1072292121@aslan.scsiguy.com>
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2003-12-24 17:50 UTC
  To: Justin T. Gibbs
  Cc: SCSI Mailing List, Linus Torvalds, Alan Cox, Marcelo Tosatti,
	Andrew Morton

On Wed, 2003-12-24 at 10:58, Justin T. Gibbs wrote:
> I've just uploaded version 2.0.5 of the aic79xx driver and version 
> 6.3.4 of the aic7xxx driver.  Both drivers are available for 2.4.X and
> 2.6.X kernels in either bk send format or as a tarball from here:

I thought when we adopted the latest aic driver for the 2.5 kernel, we
agreed to two things:

1. You'd send small regular updates as patches to linux-scsi
2. You wouldn't try to dump megabytes of patches into the stable kernel.

The diffstats on this patch are:

40 files changed, 6933 insertions(+), 5325 deletions(-)

And it's 640k in size.  You're also trying to rev the aic79xx driver
from 1.3.9 to 2.0.5; that's a bit of a major revision for a bugfix
release.

I'm really not happy dropping this into 2.6.1; what breadth of testing
outside of Adaptec has it had?

James




* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
       [not found]       ` <2148850000.1072292121@aslan.scsiguy.com>
@ 2003-12-24 19:05         ` James Bottomley
  2003-12-25  4:31           ` Justin T. Gibbs
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2003-12-24 19:05 UTC
  To: Justin T. Gibbs
  Cc: SCSI Mailing List, Linus Torvalds, Alan Cox, Marcelo Tosatti,
	Andrew Morton

On Wed, 2003-12-24 at 12:55, Justin T. Gibbs wrote:
> The last 10% is a change to having the driver completely do its own
> error recovery.  This change originated in late July and has received
> extensive testing since then.  This is the reason that a major driver
> version number bump was required for both drivers.  It is just not
> possible to get sane error recovery behavior if the mid-layer ever
> sees a timeout, so this really is a *bug fix*.

Elaborate on this more please...the error handling has been
substantially revised between 2.4 and 2.6 with a view to making it more
robust.  I don't recall seeing any bug reports from adaptec on the
issue, but if there's a mid-layer problem, I'm sure we can fix it.

James




* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-24 19:05         ` James Bottomley
@ 2003-12-25  4:31           ` Justin T. Gibbs
  2003-12-26 18:36             ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Justin T. Gibbs @ 2003-12-25  4:31 UTC
  To: James Bottomley
  Cc: SCSI Mailing List, Linus Torvalds, Alan Cox, Marcelo Tosatti,
	Andrew Morton

> On Wed, 2003-12-24 at 12:55, Justin T. Gibbs wrote:
>> The last 10% is a change to having the driver completely do its own
>> error recovery.  This change originated in late July and has received
>> extensive testing since then.  This is the reason that a major driver
>> version number bump was required for both drivers.  It is just not
>> possible to get sane error recovery behavior if the mid-layer ever
>> sees a timeout, so this really is a *bug fix*.
> 
> Elaborate on this more please...the error handling has been
> substantially revised between 2.4 and 2.6 with a view to making it more
> robust.  I don't recall seeing any bug reports from adaptec on the
> issue, but if there's a mid-layer problem, I'm sure we can fix it.

Other than some "refactoring" of code, the 2.4 and 2.6 SCSI layer
error recovery model and behavior are largely unchanged.  In fact,
the behavior is almost identical to the new-eh 2.2 SCSI layer.  I
listed most of my complaints about the error recovery model back
in late 2000 and early 2001, so I was under the impression that my
comments in this area were widely known.  I will list them here again
briefly.  If you want to go into more details about my concerns, I'd
be happy to do so after the first of the year - I hope to be spending
very little time in front of a computer until then.

The crux of the problem is that *watchdog error recovery* is happening
at entirely the wrong level in Linux.  [I emphasize *watchdog* since
real-time applications must have the ability to shoot down arbitrary
commands that take too long.  The current driver hooks being used by
the mid-layer error recovery work sufficiently for this purpose.]
Certainly, having common error recovery code provides all of the benefits
of having centralized code, but code operating at the mid-layer cannot
know in sufficient detail what is actually going on with the storage
subsystem to make intelligent decisions.  To illustrate my point, let's
review the current error recovery strategy:

  1) When a command times out, the mid-layer increments the host_failed count.
     We also stop the queuing of new commands to the host by setting
     the "in recovery" host flag.
  
  2) Once all commands have either timed-out or completed
     (host_failed == host_busy), the recovery thread is woken up
     to recover any failed commands.
  
  3) We loop through all failed commands and:
  
  	a) Issue an abort request to the HBA.
  	b) If the abort is successful, use that same
  	   command structure to issue a TUR.
  
  4) If any abort request fails, we loop through each device on the
     host that has failed commands and issue a BDR.
  
  5) If any BDR requests fail, we perform a bus reset.
  
  Also keep in mind that any timed-out command that completes via
  scsi_done() is ignored.
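
The escalation order above can be sketched in C.  This is only an
illustrative model, not the mid-layer's code: every identifier
(eh_cmd, eh_should_wake, eh_escalate) is hypothetical, each failed
command is assumed to sit on its own device, and the TUR that follows
a successful abort is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the escalation ladder described above.
 * None of these names come from the Linux SCSI mid-layer. */
enum eh_action { EH_DONE, EH_BDR, EH_BUS_RESET };

struct eh_cmd {
    bool abort_ok;  /* would the HBA's abort handler succeed? */
    bool bdr_ok;    /* would a BDR of this command's device succeed? */
};

/* The recovery thread only runs once every outstanding command has
 * either completed or timed out (host_failed == host_busy).  The
 * "host_failed > 0" guard is an assumption added for the model. */
static bool eh_should_wake(int host_failed, int host_busy)
{
    return host_failed > 0 && host_failed == host_busy;
}

/* Walk the failed commands and report how far recovery escalates. */
static enum eh_action eh_escalate(const struct eh_cmd *failed, size_t n)
{
    bool any_abort_failed = false;
    bool any_bdr_failed = false;
    size_t i;

    assert(failed != NULL || n == 0);

    /* Abort phase: request an abort for every failed command. */
    for (i = 0; i < n; i++)
        if (!failed[i].abort_ok)
            any_abort_failed = true;
    if (!any_abort_failed)
        return EH_DONE;

    /* BDR phase: reset every device that has failed commands. */
    for (i = 0; i < n; i++)
        if (!failed[i].bdr_ok)
            any_bdr_failed = true;
    if (!any_bdr_failed)
        return EH_BDR;

    /* Last resort: reset the whole bus. */
    return EH_BUS_RESET;
}
```

In this model a single failed abort is enough to push every device
with a failed command through a BDR, which is the lack of selectivity
the rest of this message complains about.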
  
Some of the problems with this strategy are:

1) During recovery, access to perfectly viable devices is cut off.

2) The mid-layer doesn't know which of the timed-out commands is the root
   cause of the failure.  It assumes, since it doesn't have access to
   better information, that all commands that have timed-out are equally
   dead.

3) If the mid-layer happens to abort a command that *is* the root cause
   of the failure, the completions of all the "released" commands are
   ignored.  This causes the mid-layer to request aborts for commands
   that are not outstanding and then replay these commands that have
   already completed successfully.  The replay may have unintended
   side-effects - replay order is not maintained and no thought is given
   to non-DASD devices where replay is destructive.  The replay may
   also occur on a device that never really failed, but was held off
   due to an error on another device.

4) The TUR that occurs after each abort causes the recovery process to
   take an inordinate amount of time.  Consider that the mid-layer can't
   pick the most likely command to abort and that with lots of commands
   outstanding, chances are that at least half of the commands will have
   to be aborted before the *right one* is aborted.
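
To put rough numbers on the last point: if exactly one of n timed-out
commands is the culprit and the mid-layer is equally likely to pick
any of them next, the expected number of aborts before reaching it is
(n + 1) / 2.  The sketch below turns that into a time estimate.  The
uniform-choice model and the per-abort and per-TUR costs are
assumptions for illustration, not measurements.

```c
#include <assert.h>

/* Twice the expected number of aborts needed to reach the single
 * culprit among n equally likely candidates, i.e. 2 * (n + 1) / 2.
 * Returned doubled so the arithmetic stays in integers. */
static unsigned expected_aborts_x2(unsigned n)
{
    assert(n > 0);
    return n + 1;
}

/* Expected time, in milliseconds, until the culprit has been aborted,
 * given hypothetical costs for one abort and for the TUR after it. */
static unsigned expected_recovery_ms(unsigned n, unsigned abort_ms,
                                     unsigned tur_ms)
{
    return expected_aborts_x2(n) * (abort_ms + tur_ms) / 2;
}
```

With 31 commands outstanding and a hypothetical 1 second per
abort-plus-TUR pair, expected_recovery_ms(31, 100, 900) gives 16000,
i.e. 16 seconds before the root cause is even reached.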

In general, the HBA driver has sufficient information to greatly limit
the scope of its recovery efforts.  It can also do this with the least
amount of impact to perfectly operational devices.  For example, when
a command times out, the HBA can determine things like:

 o Has this command actually been issued to a device?
 o Is some other command currently *hogging* the wire/bus?
 o Is this command currently active on the wire/bus?

etc.  This allows the HBA driver to quickly decide if there is
sufficient information to perform a targeted recovery (command stuck on
the bus is the problem) and if not, immediately elevate recovery to
harsher measures.  In the aic7xxx and aic79xx drivers, recovery is
completed within a few milliseconds of a timeout and, at worst, in 5
seconds.  With the current mid-layer strategy and 10s of commands
outstanding, recovery typically takes minutes.  In the case of 2.4,
you're lucky if recovery *ever* completes. 8-)
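
The three questions above amount to a small decision function.  A
minimal sketch follows, assuming the HBA driver can answer each
question from its own state.  The names and the exact mapping from
answers to actions are hypothetical and are not taken from the
aic7xxx or aic79xx sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of HBA-level timeout triage. */
enum hba_recovery {
    HBA_TARGET_THIS,   /* this command is the one stuck on the bus */
    HBA_TARGET_OTHER,  /* some other command is hogging the bus */
    HBA_ESCALATE       /* not enough information: use harsher measures */
};

/* Answers to the three questions, as seen by the HBA driver. */
struct hba_cmd_state {
    bool issued_to_device;
    bool active_on_bus;
    bool other_cmd_hogging_bus;
};

static enum hba_recovery hba_triage(const struct hba_cmd_state *s)
{
    assert(s != NULL);
    /* A command that was never issued cannot be active on the bus. */
    assert(s->issued_to_device || !s->active_on_bus);

    if (s->active_on_bus)
        return HBA_TARGET_THIS;
    if (s->other_cmd_hogging_bus)
        return HBA_TARGET_OTHER;
    return HBA_ESCALATE;
}
```

The decision needs no I/O and no waiting on other commands, which is
why a targeted recovery can start immediately after the timeout fires.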

In general, I prefer the CAM model.  Briefly, this means, let the
HBA drivers do what they can do best, provide as much information to
the peripheral drivers so they can do their job correctly, and provide
a "mid-layer" to simply route commands between the two.  This avoids
having a mid-layer that second guesses, often incorrectly, both ends
of the system.

--
Justin



* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-25  4:31           ` Justin T. Gibbs
@ 2003-12-26 18:36             ` James Bottomley
  2003-12-27  0:13               ` Justin T. Gibbs
  2003-12-27 15:17               ` Alan Cox
  0 siblings, 2 replies; 19+ messages in thread
From: James Bottomley @ 2003-12-26 18:36 UTC
  To: Justin T. Gibbs
  Cc: SCSI Mailing List, Linus Torvalds, Alan Cox, Marcelo Tosatti,
	Andrew Morton

On Wed, 2003-12-24 at 22:31, Justin T. Gibbs wrote:
> The crux of the problem is that *watchdog error recovery* is happening
> at entirely the wrong level in Linux. 

So this is actually an architectural complaint, not a bug in the SCSI
mid-layer as previously stated.

[...]
> Some of the problems with this strategy are:
> 
> 1) During recovery, access to perfectly viable devices is cut off.
> 
> 2) The mid-layer doesn't know which of the timed-out commands is the root
>    cause of the failure.  It assumes, since it doesn't have access to
>    better information, that all commands that have timed-out are equally
>    dead.
> 
> 3) If the mid-layer happens to abort a command that *is* the root cause
>    of the failure, the completions of all the "released" commands are
>    ignored.  This causes the mid-layer to request aborts for commands
>    that are not outstanding and then replay these commands that have
>    already completed successfully.  The replay may have unintended
>    side-effects - replay order is not maintained and no thought is given
>    to non-DASD devices where replay is destructive.  The replay may
>    also occur on a device that never really failed, but was held off
>    due to an error on another device.
> 
> 4) The TUR that occurs after each abort causes the recovery process to
>    take an inordinate amount of time.  Consider that the mid-layer can't
>    pick the most likely command to abort and that with lots of commands
>    outstanding, chances are that at least half of the commands will have
>    to be aborted before the *right one* is aborted.

But your complaint is only that recovery takes longer than you think you
can do in the driver.

If error recovery were critical path in SCSI performance, this might be
a consideration, but it isn't...error recovery should be the exception,
not the rule.

[...]
> In general, I prefer the CAM model.  Briefly, this means, let the
> HBA drivers do what they can do best, provide as much information to
> the peripheral drivers so they can do their job correctly, and provide
> a "mid-layer" to simply route commands between the two.  This avoids
> having a mid-layer that second guesses, often incorrectly, both ends
> of the system.

The CAM (Common Access Method) was last updated in 1995 and is extremely
SCSI-2 (and hence parallel SCSI) specific.  The successive t10
committees charged with rewriting it have never successfully produced a
draft standard that has been published on the t10 site.

The Linux SCSI subsystem follows the SAM (SCSI Architecture Model) which
was published as the backbone to SCSI-3 (SAM-3 was last updated in
November 2003).  I find its command/transport separation extremely
appealing.  It has helped us to add new transports like Fibre and even
SATA to the mix with relative ease.  This lack of command/transport
separation is, in my view, the biggest hole in CAM, and the reason why
we'll be continuing with SAM for Linux SCSI.

I cannot deny that the current error handler, trying to be all things to
all devices/transports, is out of kilter with this vision...it should,
at the very least have transport and device components...However, in
2.6, it does at least work.

On the Futures roadmap for the block layer in 2.7 is stackable error
recovery (you can already see the beginnings of this in the fastfail
processing) which will form the basis of async I/O, multi-path and
software RAID.

From a technical perspective, the way you try to thwart mid-layer error
recovery: intercept all the SCSI timers and substitute your own, is
extremely ugly (and leads to quite a bit of code duplication) but it's
surely going to cause a conflict with the evolving stackable error
handling.

If you want to help us with the transport and device separations of the
error handler, you're more than welcome, but trying to pull all error
handling into your driver isn't useful because it adds layering
violations, promotes compatibility problems and cannot be used by any
other driver.

James




* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-26 18:36             ` James Bottomley
@ 2003-12-27  0:13               ` Justin T. Gibbs
  2003-12-27  3:20                 ` James Bottomley
  2003-12-27 15:17               ` Alan Cox
  1 sibling, 1 reply; 19+ messages in thread
From: Justin T. Gibbs @ 2003-12-27  0:13 UTC
  To: James Bottomley
  Cc: SCSI Mailing List, Linus Torvalds, Alan Cox, Marcelo Tosatti,
	Andrew Morton

>> The crux of the problem is that *watchdog error recovery* is happening
>> at entirely the wrong level in Linux. 
> 
> So this is actually an architectural complaint, not a bug in the SCSI
> mid-layer as previously stated.

No.  There are several bugs in the mid-layer.  Replaying commands that
have successfully completed *is a bug*.  Stopping I/O to all devices
during recovery *is a bug*.  The current recovery handler is also fairly
stupid in how it does things.  I would also call this a bug.  In at
least 2.4 (haven't tested 2.6 yet), error recovery will loop forever.
This is also a bug.  The fact that peripheral drivers are not in control
of the replay of commands that fail is also a bug. etc. etc.

> But your complaint is only that recovery takes longer than you think you
> can do in the driver.
> 
> If error recovery were critical path in SCSI performance, this might be
> a consideration, but it isn't...error recovery should be the exception,
> not the rule.

Recovery is critical.  Why have failover controllers if it takes several
minutes for that failover to succeed?  The whole point of these controllers
is to allow a critical service to continue to operate almost uninterrupted.

> [...]
>> In general, I prefer the CAM model.  Briefly, this means, let the
>> HBA drivers do what they can do best, provide as much information to
>> the peripheral drivers so they can do their job correctly, and provide
>> a "mid-layer" to simply route commands between the two.  This avoids
>> having a mid-layer that second guesses, often incorrectly, both ends
>> of the system.
> 
> The CAM (Common Access Method) was last updated in 1995 and is extremely
> SCSI-2 (and hence parallel SCSI) specific.

You are missing the point of CAM (either version 2 or 3).  CAM was
designed to be transport agnostic.  It is not a replacement for SAM,
or any other high or low level protocol.  It is simply a routing engine
for CAM Control Blocks (CCB), and a set of rules for how those CCBs
are routed between a peripheral driver and the low level "SIM" driver.
The exact details listed in the spec for the different CCB types are mostly
irrelevant since a single CAM subsystem may provide support for past and
future transport types that the spec couldn't envision.  In CAM, this
type of extension means defining a few CCB types for the new transport
and reusing the same routing engine unaltered.
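
The routing idea can be illustrated with a toy dispatcher.  This is a
sketch under stated assumptions, not the CAM specification's or
FreeBSD's actual structures: toy_ccb, toy_register_sim, toy_route and
the CCB type names are all made up for the example.  The point is that
the router looks only at the destination, so adding a CCB type for a
new transport needs no change to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CCB types; the last one stands for a transport that
 * was added after the router was written. */
enum toy_ccb_type {
    TOY_CCB_SCSI_IO,
    TOY_CCB_RESET_BUS,
    TOY_CCB_NEW_TRANSPORT_IO
};

#define TOY_STATUS_PENDING 0
#define TOY_STATUS_OK      1

struct toy_ccb {
    enum toy_ccb_type type;
    unsigned path_id;  /* which SIM the CCB is addressed to */
    int status;        /* filled in by the SIM */
};

typedef void (*toy_sim_action)(struct toy_ccb *ccb);

#define TOY_MAX_SIMS 4
static toy_sim_action toy_sims[TOY_MAX_SIMS];

/* Attach a SIM to a path; fails if the slot is invalid or taken. */
static int toy_register_sim(unsigned path_id, toy_sim_action action)
{
    if (path_id >= TOY_MAX_SIMS || toy_sims[path_id] != NULL)
        return -1;
    toy_sims[path_id] = action;
    return 0;
}

/* The router never inspects ccb->type; it only delivers the CCB. */
static int toy_route(struct toy_ccb *ccb)
{
    assert(ccb != NULL);
    if (ccb->path_id >= TOY_MAX_SIMS || toy_sims[ccb->path_id] == NULL)
        return -1;
    toy_sims[ccb->path_id](ccb);
    return 0;
}

/* Example SIM that simply completes whatever it is handed. */
static void toy_example_sim(struct toy_ccb *ccb)
{
    ccb->status = TOY_STATUS_OK;
}
```

A peripheral driver would fill in a toy_ccb, call toy_route(), and
read back the status; only the SIM and the peripheral driver need to
agree on what a given CCB type means.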

> The successive t10
> committees charged with rewriting it have never successfully produced a
> draft standard that has been published on the t10 site.

This is because no-one wanted to rewrite their SCSI layers to be in
complete compliance with the letter and verse of CAM (i.e. the actual
CCB structure definitions listed in the CAM spec).  I was at the last
meeting of the CAM subcommittee so I know why it was disbanded.

> The Linux SCSI subsystem follows the SAM (SCSI Architecture Model) which
> was published as the backbone to SCSI-3 (SAM-3 was last updated in
> November 2003).  I find its command/transport separation extremely
> appealing.  It has helped us to add new transports like Fibre and even
> SATA to the mix with relative ease.  This lack of command/transport
> separation is, in my view, the biggest hole in CAM, and the reason why
> we'll be continuing with SAM for Linux SCSI.

I'm fully aware of SAM and the whole T10 family of specs.  To believe that
CAM is transport specific is again to completely miss why it was written.
SAM is not even close to a replacement for CAM.

> I cannot deny that the current error handler, trying to be all things to
> all devices/transports, is out of kilter with this vision...it should,
> at the very least have transport and device components...However, in
> 2.6, it does at least work.

If the SCSI layer didn't have these critical flaws:

 o scsi_done() doesn't always complete commands.
 o the HBA is not informed of timeouts until the HBA is idle
   and all other commands have timed out.
 o the HBA cannot discern between requests related to watchdog
   recovery and "normal" task management.

I would have just rewritten the error handler.  Unfortunately,
you can't avoid the above three issues without canceling and
restoring timeouts.

> On the Futures roadmap for the block layer in 2.7 is stackable error
> recovery (you can already see the beginnings of this in the fastfail
> processing) which will form the basis of async I/O, multi-path and
> software RAID.

In general, the peripheral driver should get the first crack at any
status returned by an HBA driver.  Until that occurs, the Linux SCSI
layer is critically flawed.  If you are interested, you can look at
how the FreeBSD SCSI layer deals with these issues.  The peripheral
driver "filters" all errors and defaults to using a common, generic
error handler for errors that do not need special handling.  Right
now, the Linux mid-layer hides information and performs actions that
are not necessarily what the peripheral driver wants.  Other than
perhaps statistics gathering, and other actions that are not visible
to the end device, this should not be the case.
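
The "filter first, fall back to a generic handler" arrangement
described above might look like the following.  This is a sketch with
hypothetical names and a deliberately trivial policy; it is not
FreeBSD's code.  The SCSI status values (CHECK CONDITION 0x02, BUSY
0x08) are the standard ones.

```c
#include <assert.h>
#include <stddef.h>

#define SCSI_STATUS_CHECK_CONDITION 0x02
#define SCSI_STATUS_BUSY            0x08

enum err_action { ERR_RETRY, ERR_FAIL, ERR_USE_DEFAULT };

struct io_status {
    int scsi_status;
};

/* A peripheral driver's filter: returns ERR_USE_DEFAULT for anything
 * it has no special opinion about. */
typedef enum err_action (*err_filter)(const struct io_status *st);

/* Generic handler (hypothetical policy): retry on BUSY and CHECK
 * CONDITION, fail everything else. */
static enum err_action generic_error_handler(const struct io_status *st)
{
    switch (st->scsi_status) {
    case SCSI_STATUS_BUSY:
    case SCSI_STATUS_CHECK_CONDITION:
        return ERR_RETRY;
    default:
        return ERR_FAIL;
    }
}

/* Filter for a sequential-access device where replaying a command may
 * be destructive: never retry after CHECK CONDITION. */
static enum err_action tape_error_filter(const struct io_status *st)
{
    if (st->scsi_status == SCSI_STATUS_CHECK_CONDITION)
        return ERR_FAIL;
    return ERR_USE_DEFAULT;
}

/* The peripheral driver's filter gets the first look at every error;
 * the generic handler runs only when the filter declines. */
static enum err_action handle_error(err_filter filter,
                                    const struct io_status *st)
{
    enum err_action action;

    assert(st != NULL);
    action = filter ? filter(st) : ERR_USE_DEFAULT;
    if (action == ERR_USE_DEFAULT)
        action = generic_error_handler(st);
    return action;
}
```

With this shape, the same CHECK CONDITION is retried for a driver with
no filter but failed for the tape-style driver, without the layer in
between having to know the difference.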

> From a technical perspective, the way you try to thwart mid-layer error
> recovery: intercept all the SCSI timers and substitute your own, is
> extremely ugly (and leads to quite a bit of code duplication) but it's
> surely going to cause a conflict with the evolving stackable error
> handling.

I doubt that.  The higher levels will never see a timeout.  For other
transport errors that are visible in terms of returned SCSI status,
the abort, BDR, and reset entry points are fully functional.  The
current strategy also guarantees that I can quickly and easily correct
any bug reports that are sent to me regarding how my drivers perform
recovery.  This was exactly why I took the steps I did.

> If you want to help us with the transport and device separations of the
> error handler, you're more than welcome, but trying to pull all error
> handling into your driver isn't useful because it adds layering
> violations, promotes compatibility problems and cannot be used by any
> other driver.

I'm not pulling all error recovery into my driver.  I'm pulling transport
specific *watchdog recovery* into my driver.  It is the HBA's job to ensure
that it can access the devices attached to it.  The peripheral driver's
job is to inform the HBA of a drop-dead time for a command.  Recovering
from a timeout is actually very straightforward if you have the information
you need to do it correctly.  This can only be done at the HBA where HBA
specific state can be referenced to pick the correct type of action.  This,
to my mind, is just a minor extension to the transport validation and
recovery that HBAs also must do in order to be robust.  Protocol and device
specific recovery (related to SCSI status, residuals, etc.) will continue to
be performed by the peripheral driver (or as in the case of Linux, by
the mid-layer).  I have no interest in *doing it all*, only what I have
to for the drivers I maintain to be robust.

--
Justin



* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-27  0:13               ` Justin T. Gibbs
@ 2003-12-27  3:20                 ` James Bottomley
  2003-12-27  4:26                   ` Justin T. Gibbs
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2003-12-27  3:20 UTC
  To: Justin T. Gibbs
  Cc: SCSI Mailing List, Linus Torvalds, Alan Cox, Marcelo Tosatti,
	Andrew Morton

On Fri, 2003-12-26 at 18:13, Justin T. Gibbs wrote:
> Recovery is critical.  Why have failover controllers if it takes several
> minutes for that failover to succeed?  The whole point of these controllers
> is to allow a critical service to continue to operate almost uninterrupted.

Recovery for failover controllers will be done at a higher level using
the fastfail mechanism.

> > The successive t10
> > committees charged with rewriting it have never successfully produced a
> > draft standard that has been published on the t10 site.
> 
> This is because no-one wanted to rewrite their SCSI layers to be in
> complete compliance with the letter and verse of CAM (i.e. the actual
> CCB structure definitions listed in the CAM spec).  I was at the last
> meeting of the CAM subcommittee so I know why it was disbanded.

OK, we have differing views about CAM.  However, regardless of why it
happened, CAM is dead and the committee disbanded.  SCSI development
will go on without reference to CAM.

> In general, the peripheral driver should get the first crack at any
> status returned by an HBA driver.  Until that occurs, the Linux SCSI
> layer is critically flawed.  If you are interested, you can look at
> how the FreeBSD SCSI layer deals with these issues.  The peripheral
> driver "filters" all errors and defaults to using a common, generic
> error handler for errors that do not need special handling.  Right
> now, the Linux mid-layer hides information and performs actions that
> are not necessarily what the peripheral driver wants.  Other than
> perhaps statistics gathering, and other actions that are not visible
> to the end device, this should not be the case.

That's not true.  For a fatal transport error in a multi-path device all
you'll do is delay the inevitable switchover.  That's why trying to
second guess error recovery like this is counter productive.

> I'm not pulling all error recovery into my driver.  I'm pulling transport
> specific *watchdog recovery* into my driver.  It is the HBA's job to ensure
> that it can access the devices attached to it.  The peripheral driver's
> job is to inform the HBA of a drop-dead time for a command.  Recovering
> from a timeout is actually very straightforward if you have the information
> you need to do it correctly.  This can only be done at the HBA where HBA
> specific state can be referenced to pick the correct type of action.  This,
> to my mind, is just a minor extension to the transport validation and
> recovery that HBAs also must do in order to be robust.  Protocol and device
> specific recovery (related to SCSI status, residuals, etc.) will continue to
> be performed by the peripheral driver (or as in the case of Linux, by
> the mid-layer).  I have no interest in *doing it all*, only what I have
> to for the drivers I maintain to be robust.

By stopping the timers and redirecting to an internal thread in your
driver, you are subverting all of the error recovery for your driver.

I appreciate it's a hard thing to be dependent on code outside your
control.  However, duplicating functionality solely to bring code under
your control is not the correct approach.  The open source philosophy is
to encourage people to get involved in areas of code outside their
direct responsibility when this happens and, inevitably, to try to reach
an amicable compromise about fixing it.  This is one of the reasons why
good open source developers tend to have a strong record of
contributions outside their perceived fields of expertise.

I'm prepared to allow driver writers a considerable amount of slack in
terms of deviation from the coding standards, useless and obfuscating
compatibility layers and #ifdef'd code that can never be compiled in
2.6; however, this attempt to hijack the basic SCSI APIs within the
adaptec driver is unacceptable.  Please take it out and resubmit the
patch without it.

Thanks,

James




* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-27  3:20                 ` James Bottomley
@ 2003-12-27  4:26                   ` Justin T. Gibbs
  2003-12-27  6:08                     ` Jeff Garzik
                                       ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Justin T. Gibbs @ 2003-12-27  4:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: SCSI Mailing List, Linus Torvalds, Alan Cox, Marcelo Tosatti,
	Andrew Morton

> On Fri, 2003-12-26 at 18:13, Justin T. Gibbs wrote:
>> Recovery is critical.  Why have failover controllers if it takes several
>> minutes for that failover to succeed?  The whole point of these controllers
>> is to allow a critical service to continue to operate almost uninterrupted.
> 
> Recovery for failover controllers will be done at a higher level using
> the fastfail mechanism.

You're telling me that fail-over for a multi-controller external RAID box
connected to a single SCSI controller will occur at a higher level?  The
fail-over has already occurred by the time the HBA sees the timeout.  This
means that completion of recovery is the only impediment to completing the
fail-over.

As for higher level fail-over, provide the correct transaction error
codes, and high level fail-over is possible regardless of the transport
specific recovery that occurs in my or other drivers.  In the FreeBSD
system, I can indicate that a command timed-out and was successfully
aborted by the driver.  This is all that is required for multi-path
fail-over to occur - the errors are *still* visible at this level.  It's
not as though my driver *retries* commands that timeout.  It only
aborts them and returns the appropriate status.

> OK, we have differing views about CAM.  However, regardless of why it
> happened, CAM is dead and the committee disbanded.  SCSI development
> will go on without reference to CAM.

Perhaps your development will.  Having a "Common Access Method" for
I/O is still a worthy goal in my mind regardless of whether T10 ratifies
some standard on the subject.  Continuing to make the "SCSI layer" SCSI
centric (or any particular protocol - "centric") limits the ability of
the sub-system to gracefully handle emerging protocols.

>> In general, the peripheral driver should get the first crack at any
>> status returned by an HBA driver.  Until that occurs, the Linux SCSI
>> layer is critically flawed.  If you are interested, you can look at
>> how the FreeBSD SCSI layer deals with these issues.  The peripheral
>> driver "filters" all errors and defaults to using a common, generic
>> error handler for errors that do not need special handling.  Right
>> now, the Linux mid-layer hides information and performs actions that
>> are not necessarily what the peripheral driver wants.  Other than
>> perhaps statistics gathering, and other actions that are not visible
>> to the end device, this should not be the case.
> 
> That's not true.  For a fatal transport error in a multi-path device all
> you'll do is delay the inevitable switchover.  That's why trying to
> second guess error recovery like this is counter productive.

On the contrary.  The fail-over agent gets notification of the error
faster with my driver's recovery than with the current mid-layer strategy
since "transport recovery" and the resulting failed command status is
transmitted with at most a 5 second delay beyond the watchdog timeout value.
The current mid-layer mechanism often takes orders of magnitude longer to
recover and doesn't indicate to anyone that a failure has occurred until
recovery is complete.

> By stopping the timers and redirecting to an internal thread in your
> driver, you are subverting all of the error recovery for your driver.

No.  Only watchdog recovery.  The timers are only used for watchdog
recovery.  Proper status, even for commands aborted due to timeouts,
is still given, so status based recovery is still effective.

> I appreciate it's a hard thing to be dependent on code outside your
> control.  However, duplicating functionality solely to bring code under
> your control is not the correct approach.  The open source philosophy is
> to encourage people to get involved in areas of code outside their
> direct responsibility when this happens and, inevitably, to try to reach
> an amicable compromise about fixing it.  This is one of the reasons why
> good open source developers tend to have a strong record of
> contributions outside their perceived fields of expertise.

I've already written one OpenSource SCSI layer, I think I've contributed
more than enough in that particular area.  Back in late 2000 and early 2001,
I voiced my opinions, based on that experience, on how Linux should improve
its SCSI subsystem.  After 3 years of waiting for improvement in the
error recovery semantics of Linux, I had to do something to satisfy customer
complaints on error recovery.  I still don't see things improving in
this area of Linux and I'm not about to *break* the driver until there is
a viable alternative.

> I'm prepared to allow driver writers a considerable amount of slack in
> terms of deviation from the coding standards, useless and obfuscating
> compatibility layers and #ifdef'd code that can never be compiled in
> 2.6; however, this attempt to hijack the basic SCSI APIs within the
> adaptec driver is unacceptable.  Please take it out and resubmit the
> patch without it.

I'm sorry you feel that way.  I suppose I will just have to continue
to point distributors and users of this driver to my own patch sets since
that seems to be the only viable alternative you've given me.

--
Justin



* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-27  4:26                   ` Justin T. Gibbs
@ 2003-12-27  6:08                     ` Jeff Garzik
  2003-12-27 15:11                     ` Alan Cox
  2003-12-27 15:52                     ` James Bottomley
  2 siblings, 0 replies; 19+ messages in thread
From: Jeff Garzik @ 2003-12-27  6:08 UTC (permalink / raw)
  To: Justin T. Gibbs
  Cc: James Bottomley, SCSI Mailing List, Linus Torvalds, Alan Cox,
	Marcelo Tosatti, Andrew Morton

On Fri, Dec 26, 2003 at 09:26:10PM -0700, Justin T. Gibbs wrote:
> > On Fri, 2003-12-26 at 18:13, Justin T. Gibbs wrote:
> > I'm prepared to allow driver writers a considerable amount of slack in
> > terms of deviation from the coding standards, useless and obfuscating
> > compatibility layers and #ifdef'd code that can never be compiled in
> > 2.6; however, this attempt to hijack the basic SCSI APIs within the
> > adaptec driver is unacceptable.  Please take it out and resubmit the
> > patch without it.
> 
> I'm sorry you feel that way.  I suppose I will just have to continue
> to point distributors and users of this driver to my own patch sets since
> that seems to be the only viable alternative you've given me.

Red Hat has a very strong "do it in upstream first" policy these days.

There are reasons for this, and this situation is _precisely_ why.

	Jeff





* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-27  4:26                   ` Justin T. Gibbs
  2003-12-27  6:08                     ` Jeff Garzik
@ 2003-12-27 15:11                     ` Alan Cox
  2003-12-27 15:47                       ` Justin T. Gibbs
  2003-12-27 15:52                     ` James Bottomley
  2 siblings, 1 reply; 19+ messages in thread
From: Alan Cox @ 2003-12-27 15:11 UTC (permalink / raw)
  To: Justin T. Gibbs
  Cc: James Bottomley, SCSI Mailing List, Linus Torvalds,
	Marcelo Tosatti, Andrew Morton

On Sat, 2003-12-27 at 04:26, Justin T. Gibbs wrote:
> I'm sorry you feel that way.  I suppose I will just have to continue
> to point distributors and users of this driver to my own patch sets since
> that seems to be the only viable alternative you've given me.

Is that an official Adaptec statement, or should I ask Adaptec if they
can assist in resolving this matter ?

Alan



* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-26 18:36             ` James Bottomley
  2003-12-27  0:13               ` Justin T. Gibbs
@ 2003-12-27 15:17               ` Alan Cox
  2003-12-27 15:54                 ` James Bottomley
  2003-12-27 16:02                 ` Justin T. Gibbs
  1 sibling, 2 replies; 19+ messages in thread
From: Alan Cox @ 2003-12-27 15:17 UTC (permalink / raw)
  To: James Bottomley
  Cc: Justin T. Gibbs, SCSI Mailing List, Linus Torvalds,
	Marcelo Tosatti, Andrew Morton

On Fri, 2003-12-26 at 18:36, James Bottomley wrote:
> From a technical perspective, the way you try to thwart mid-layer error
> recovery: intercept all the SCSI timers and substitute your own, is
> extremely ugly (and leads to quite a bit of code duplication) but it's
> surely going to cause a conflict with the evolving stackable error
> handling.

Or install your own EH handler. You don't have to use the kernel eh at
all, you can just opt out of it and for some cards like smart raid
devices it's precisely the right thing to do. I can believe the argument
that for fast failover you want to do EH differently, and nothing in the
architecture seems to make that hard providing a driver is willing to
opt cleanly out of the scsi_eh handler.

Similarly it would be nice to be able to revector the actual scsi eh
timeouts via an optional handler. Something of the form


int myscsi_timeout_handler(Scsi_Cmd *cmd, int reason, void *.. whatever)
{
	...

	if(reason == SCSI_TIMEOUT && host->bus_state == FIBRE_DOWN && 
		priv->retry_count < 5)
		return SCSI_RETRY(60);
	return SCSI_DEFAULT;
}
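
As a rough, self-contained illustration of the revectoring idea above (not
kernel code): the sketch below models a mid-layer that consults an optional
per-driver timeout hook before falling back to its default error handling.
All names here (fake_host, fake_cmd, timeout_verdict, midlayer_timed_out)
are hypothetical stand-ins, and having the hook bump the retry counter is an
assumption that the fragment above leaves open.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel definitions. */
enum bus_state { BUS_UP, FIBRE_DOWN };
enum timeout_reason { SCSI_TIMEOUT, SCSI_OTHER_EVENT };

/* What the hook tells the mid-layer to do with a timed-out command. */
struct timeout_verdict {
	int retry;       /* 1 = re-arm the timer, 0 = use default handling */
	int retry_secs;  /* new timeout in seconds when retry is set */
};

struct fake_host { enum bus_state bus_state; };
struct fake_cmd  { struct fake_host *host; int retry_count; };

typedef struct timeout_verdict (*timeout_hook_t)(struct fake_cmd *cmd,
						 enum timeout_reason reason);

/* Driver-side hook: wait out a downed fibre link, at most 5 times. */
static struct timeout_verdict my_timeout_hook(struct fake_cmd *cmd,
					      enum timeout_reason reason)
{
	struct timeout_verdict v = { 0, 0 };

	if (reason == SCSI_TIMEOUT &&
	    cmd->host->bus_state == FIBRE_DOWN &&
	    cmd->retry_count < 5) {
		cmd->retry_count++;   /* assumption: the hook does the counting */
		v.retry = 1;
		v.retry_secs = 60;
	}
	return v;
}

/*
 * Mid-layer side: returns the number of seconds to re-arm the timer for,
 * or 0 when the default error handling should run instead.
 */
static int midlayer_timed_out(struct fake_cmd *cmd, timeout_hook_t hook)
{
	if (hook != NULL) {
		struct timeout_verdict v = hook(cmd, SCSI_TIMEOUT);
		if (v.retry)
			return v.retry_secs;
	}
	return 0;
}
```

A host that registers no hook (NULL here) always gets the default path, so
under this model the hook stays strictly optional.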




* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-27 15:11                     ` Alan Cox
@ 2003-12-27 15:47                       ` Justin T. Gibbs
  0 siblings, 0 replies; 19+ messages in thread
From: Justin T. Gibbs @ 2003-12-27 15:47 UTC (permalink / raw)
  To: Alan Cox
  Cc: James Bottomley, SCSI Mailing List, Linus Torvalds,
	Marcelo Tosatti, Andrew Morton

> On Sat, 2003-12-27 at 04:26, Justin T. Gibbs wrote:
>> I'm sorry you feel that way.  I suppose I will just have to continue
>> to point distributors and users of this driver to my own patch sets since
>> that seems to be the only viable alternative you've given me.
>
> Is that an official Adaptec statement, or should I ask Adaptec if they
> can assist in resolving this matter ?
 
That is my statement.  You can escalate this matter to whatever forum
you feel is most appropriate.
 
That said, Adaptec's typical stance on these matters is that if a change
required to make these drivers work correctly (i.e. something required by
a major customer) is for some reason not embedded, then providing separate
DUD/RPM/SRC distributions to our customers is sufficient.  This change is
no different.  The current
SCSI layer offers no other alternative to having fully functional watchdog
recovery.  This is not a new revelation and my drivers are not the first
to take *extreme measures* to try and avoid the brokenness.  The changes
were not made in an attempt to be inflammatory, they were made to fix
real field issues that could be resolved within the driver in no other way.
 
James seems to imply that it is my responsibility as a "good open source
developer" to "contribute" the fixes to the SCSI layer to correct these
issues.  I would think that the SCSI layer maintainer would not only
understand these issues, but take up the charge of discussing and correcting
them.  Since we don't agree, on a fundamental level, on what the problem is
much less how to fix it, I think that having me go off and come up with fixes
would be counter productive to all involved.
 
If there is sincere interest in correcting error recovery in the Linux
SCSI layer, I'm still more than happy to give input and perhaps even
provide code.  But to say that I cannot embed a driver that has to skirt
mid-layer brokenness in order to be fully functional - regardless of how
it does that - until that brokenness can be corrected, is a disservice to
all users of this hardware.
 
--
Justin



* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-27  4:26                   ` Justin T. Gibbs
  2003-12-27  6:08                     ` Jeff Garzik
  2003-12-27 15:11                     ` Alan Cox
@ 2003-12-27 15:52                     ` James Bottomley
  2 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2003-12-27 15:52 UTC (permalink / raw)
  To: Justin T. Gibbs
  Cc: SCSI Mailing List, Linus Torvalds, Alan Cox, Marcelo Tosatti,
	Andrew Morton

On Fri, 2003-12-26 at 22:26, Justin T. Gibbs wrote:
> You're telling me that fail-over for a multi-controller external RAID box
> connected to a single SCSI controller will occur at a higher level?  The
> fail-over has already occurred by the time the HBA sees the timeout.  This
> means that completion of recovery is the only impediment to completing the
> fail-over.

The usual architecture of a multiple controller RAID box is no
SPOF...Therefore *two* or more SCSI cards...local in-card attempts at
recovery only delay eventual failover.

> I've already written one OpenSource SCSI layer, I think I've contributed
> more than enough in that particular area.  Back in late 2000 and early 2001,
> I voiced my opinions, based on that experience, on how Linux should improve
> its SCSI subsystem.  After 3 years of waiting for improvement in the
> error recovery semantics of Linux, I had to do something to satisfy customer
> complaints on error recovery.  I still don't see things improving in
> this area of Linux and I'm not about to *break* the driver until there is
> a viable alternative.

This is open source...areas which cause problems for many people get
fixed (OK, often many times).  Areas that only annoy one person don't
get fixed just by expressing that annoyance.

> I'm sorry you feel that way.  I suppose I will just have to continue
> to point distributors and users of this driver to my own patch sets since
> that seems to be the only viable alternative you've given me.

OK, I'll be sorry to see it happen, but if Adaptec formally wishes to
relinquish maintainership of the aic7xxx/aic79xx drivers and develop
their own fork of the kernel, that is, of course, their right under the
GPL.  If this is what you want to do, could you send the list a note to
that effect so that I can begin looking for a new maintainer now.

James




* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-27 15:17               ` Alan Cox
@ 2003-12-27 15:54                 ` James Bottomley
  2003-12-27 23:55                   ` Alan Cox
  2003-12-27 16:02                 ` Justin T. Gibbs
  1 sibling, 1 reply; 19+ messages in thread
From: James Bottomley @ 2003-12-27 15:54 UTC (permalink / raw)
  To: Alan Cox
  Cc: Justin T. Gibbs, SCSI Mailing List, Linus Torvalds,
	Marcelo Tosatti, Andrew Morton

On Sat, 2003-12-27 at 09:17, Alan Cox wrote:
> Or install your own EH handler. You don't have to use the kernel eh at
> all, you can just opt out of it and for some cards like smart raid
> devices it's precisely the right thing to do. I can believe the argument
> that for fast failover you want to do EH differently, and nothing in the
> architecture seems to make that hard providing a driver is willing to
> opt cleanly out of the scsi_eh handler.

Well, that's what the eh_strategy_handler hook is for.

It certainly makes sense to do this for pseudo-scsi devices, like RAID
cards.  But, for standard transports, like parallel SCSI and Fibre, I'd
much rather see a standardised error handling library as part of the
stack, with the mid-layer providing standard transport recovery
functions on a per-transport basis, and the Upper Layer Drivers
providing device specific handling.
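
Purely as an illustration of that layering, and not as actual mid-layer
code, the dispatch could be modelled like this: a host that fills in its own
strategy hook opts out entirely, and everyone else gets a shared
per-transport recovery routine.  The struct below only borrows the
eh_strategy_handler name; sketch_host, the transport table and the recovery
functions are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical model types; not the kernel's Scsi_Host or host template. */
enum transport { TRANSPORT_SPI, TRANSPORT_FC, TRANSPORT_COUNT };
enum eh_path { EH_DRIVER_STRATEGY, EH_TRANSPORT_LIBRARY };

struct sketch_host {
	enum transport transport;
	/* NULL means "use the shared per-transport recovery" */
	int (*eh_strategy_handler)(struct sketch_host *host);
	int library_recoveries;   /* times the shared path ran for this host */
};

/* Shared recovery routines, one per transport (bodies are placeholders). */
static int spi_recover(struct sketch_host *host)
{
	host->library_recoveries++;
	return 0;
}

static int fc_recover(struct sketch_host *host)
{
	host->library_recoveries++;
	return 0;
}

static int (*const transport_recover[TRANSPORT_COUNT])(struct sketch_host *) = {
	spi_recover,  /* TRANSPORT_SPI */
	fc_recover,   /* TRANSPORT_FC  */
};

/* Example opt-out, e.g. a RAID card that handles its own recovery. */
static int raid_card_strategy(struct sketch_host *host)
{
	(void)host;   /* card firmware would do the real work */
	return 0;
}

/* Pick the recovery path for a host and run it; report which one ran. */
static enum eh_path run_error_handling(struct sketch_host *host)
{
	if (host->eh_strategy_handler != NULL) {
		host->eh_strategy_handler(host);
		return EH_DRIVER_STRATEGY;
	}
	transport_recover[host->transport](host);
	return EH_TRANSPORT_LIBRARY;
}
```

Device specific handling in the Upper Layer Drivers is left out of this
model; it only shows the opt-out versus shared-library split.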

> Similarly it would be nice to be able to revector the actual scsi eh
> timeouts via an optional handler. Something of the form

That's a good idea...do you want to send in the patches?

James




* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-27 15:17               ` Alan Cox
  2003-12-27 15:54                 ` James Bottomley
@ 2003-12-27 16:02                 ` Justin T. Gibbs
  1 sibling, 0 replies; 19+ messages in thread
From: Justin T. Gibbs @ 2003-12-27 16:02 UTC (permalink / raw)
  To: Alan Cox, James Bottomley
  Cc: SCSI Mailing List, Linus Torvalds, Marcelo Tosatti, Andrew Morton

> Or install your own EH handler.

This was something I considered, but is sadly not enough.  You need
at least two other changes to make it work:

1) scsi_done needs to complete commands regardless of the timer
   state.  The implied race condition commented about in scsi_done()
   doesn't exist.  The HBA driver knows whether a command is in flight
   or not.  Attempting to abort a command that is not active can just
   return immediately with whatever status is agreed upon for the API.
   Without this change, a driver must reschedule bogus timers to get
   a completion to take effect.

2) The drivers must be notified of timeouts immediately.  Your idea
   of re-vectoring the timeout handler would do this just fine.
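
A minimal single-threaded model of the two changes above, for illustration
only: it is not the real scsi_done() and it ignores locking and interrupt
context.  The names model_cmd, model_done, model_abort and model_timeout are
hypothetical.

```c
#include <stdbool.h>

enum { RESULT_OK = 0, RESULT_TIMED_OUT = 1 };
enum abort_status { ABORT_NOT_ACTIVE, ABORT_DONE };

struct model_cmd {
	bool in_flight;    /* tracked by the HBA driver */
	bool timer_fired;  /* set once the watchdog has expired */
	bool completed;
	int  result;
};

/* Change 1: completion takes effect whatever the timer state is. */
static void model_done(struct model_cmd *cmd, int result)
{
	/* deliberately no check of cmd->timer_fired here */
	cmd->in_flight = false;
	cmd->completed = true;
	cmd->result = result;
}

/* Aborting a command that is not active just returns immediately. */
static enum abort_status model_abort(struct model_cmd *cmd)
{
	if (!cmd->in_flight)
		return ABORT_NOT_ACTIVE;
	/* abort on the HBA, then report the timeout status; no retry */
	model_done(cmd, RESULT_TIMED_OUT);
	return ABORT_DONE;
}

/* Change 2: the driver hears about the timeout as soon as it fires. */
static enum abort_status model_timeout(struct model_cmd *cmd)
{
	cmd->timer_fired = true;
	return model_abort(cmd);
}
```

In this model a command that completes after its timer has fired is still
completed normally, and a later abort attempt on it reports
ABORT_NOT_ACTIVE.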

--
Justin



* Re: Aic7x_x_x 6.3.4 && Aic79xx 2.0.5 Updates
  2003-12-27 15:54                 ` James Bottomley
@ 2003-12-27 23:55                   ` Alan Cox
  0 siblings, 0 replies; 19+ messages in thread
From: Alan Cox @ 2003-12-27 23:55 UTC (permalink / raw)
  To: James Bottomley
  Cc: Justin T. Gibbs, SCSI Mailing List, Linus Torvalds,
	Marcelo Tosatti, Andrew Morton

On Sat, 2003-12-27 at 15:54, James Bottomley wrote:
> > Similarly it would be nice to be able to revector the actual scsi eh
> > timeouts via an optional handler. Something of the form
> 
> That's a good idea...do you want to send in the patches?

My calendar is full until about next October 8(

