From: "Justin T. Gibbs"
Subject: Re: Aic7xxx 6.3.4 && Aic79xx 2.0.5 Updates
Date: Fri, 26 Dec 2003 17:13:44 -0700
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <2832150000.1072484024@aslan.scsiguy.com>
References: <1051920000.1054684267@aslan.btc.adaptec.com>
 <3637050000.1054690456@aslan.scsiguy.com>
 <2113050000.1072285128@aslan.scsiguy.com>
 <1072288242.1906.35.camel@mulgrave>
 <2148850000.1072292121@aslan.scsiguy.com>
 <1072292714.2415.39.camel@mulgrave>
 <2304040000.1072326693@aslan.scsiguy.com>
 <1072463795.1873.127.camel@mulgrave>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Received: from mail.scsiguy.com ([63.229.232.106]:6418 "EHLO aslan.scsiguy.com")
 by vger.kernel.org with ESMTP id S264268AbTL0AOF (ORCPT );
 Fri, 26 Dec 2003 19:14:05 -0500
In-Reply-To: <1072463795.1873.127.camel@mulgrave>
Content-Disposition: inline
List-Id: linux-scsi@vger.kernel.org
To: James Bottomley
Cc: SCSI Mailing List, Linus Torvalds, Alan Cox, Marcelo Tosatti, Andrew Morton

>> The crux of the problem is that *watchdog error recovery* is happening
>> at entirely the wrong level in Linux.
>
> So this is actually an architectural complaint, not a bug in the SCSI
> mid-layer as previously stated.

No. There are several bugs in the mid-layer. Replaying commands that have successfully completed *is a bug*. Stopping I/O to all devices during recovery *is a bug*. The current recovery handler is also fairly stupid in how it does things. I would also call this a bug. In at least 2.4 (haven't tested 2.6 yet), error recovery will loop forever. This is also a bug. The fact that peripheral drivers are not in control of the replay of commands that fail is also a bug. etc. etc.

> But your complaint is only that recovery takes longer than you think you
> can do in the driver.
>
> If error recovery were critical path in SCSI performance, this might be
> a consideration, but it isn't...error recovery should be the exception,
> not the rule.

Recovery is critical. Why have failover controllers if it takes several minutes for that failover to succeed? The whole point of these controllers is to allow a critical service to continue to operate almost uninterrupted.

> [...]
>> In general, I prefer the CAM model. Briefly, this means, let the
>> HBA drivers do what they can do best, provide as much information to
>> the peripheral drivers so they can do their job correctly, and provide
>> a "mid-layer" to simply route commands between the two. This avoids
>> having a mid-layer that second guesses, often incorrectly, both ends
>> of the system.
>
> The CAM (Common Access Model) was last updated in 1995 and is extremely
> SCSI-2 (and hence parallel SCSI) specific.

You are missing the point of CAM (either version 2 or 3). CAM was designed to be transport-agnostic. It is not a replacement for SAM, or for any other high- or low-level protocol. It is simply a routing engine for CAM Control Blocks (CCBs), plus a set of rules for how those CCBs are routed between a peripheral driver and the low-level "SIM" driver. The exact details listed in the spec for the different CCB types are mostly irrelevant, since a single CAM subsystem may provide support for past and future transport types that the spec couldn't envision. In CAM, this type of extension means defining a few CCB types for the new transport and reusing the same routing engine unaltered.

> The successive t10
> committees charged with rewriting it have never successfully produced a
> draft standard that has been published on the t10 site.

This is because no-one wanted to rewrite their SCSI layers to comply with CAM chapter and verse (i.e. with the actual CCB structure definitions listed in the CAM spec). I was at the last meeting of the CAM subcommittee, so I know why it was disbanded.
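
The "routing engine" idea above can be illustrated with a small C sketch. It is only an illustration under stated assumptions: every name in it (struct ccb_header, route_ccb, register_sim, the FUNC_* codes, spi_sim_action, fc_sim_action) is hypothetical and loosely modelled on CAM terminology; it is not the CCB layout from the CAM spec or the FreeBSD CAM API. What it tries to show is that the router looks only at which SIM a CCB is addressed to, so a transport-specific request type is handled entirely by the SIM that understands it, with no change to the router.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CCB function codes. A new transport adds codes here;
 * route_ccb() below does not change. */
enum ccb_func {
    FUNC_SCSI_IO = 1,      /* execute a SCSI command */
    FUNC_ABORT,            /* abort a previously queued CCB */
    FUNC_FC_PORT_LOGIN     /* hypothetical Fibre Channel-only request */
};

enum ccb_status { CCB_REQ_INPROG, CCB_REQ_CMP, CCB_REQ_INVALID };

/* Common header carried by every CCB in this sketch. */
struct ccb_header {
    enum ccb_func   func_code;
    int             path_id;   /* selects the SIM (HBA bus) */
    enum ccb_status status;
};

/* Each SIM (low-level HBA driver) exposes a single action entry point. */
typedef void (*sim_action_fn)(struct ccb_header *ccb);

#define MAX_PATHS 4
static sim_action_fn sim_table[MAX_PATHS];

/* Attach a SIM to a path; returns -1 if the slot is invalid or taken. */
static int register_sim(int path_id, sim_action_fn action)
{
    if (path_id < 0 || path_id >= MAX_PATHS || sim_table[path_id] != NULL)
        return -1;
    sim_table[path_id] = action;
    return 0;
}

/* The routing engine: it inspects only path_id, never the function code
 * or payload, so new transports need no changes here. */
static void route_ccb(struct ccb_header *ccb)
{
    assert(ccb != NULL);
    if (ccb->path_id < 0 || ccb->path_id >= MAX_PATHS ||
        sim_table[ccb->path_id] == NULL) {
        ccb->status = CCB_REQ_INVALID;
        return;
    }
    sim_table[ccb->path_id](ccb);
}

/* Parallel SCSI SIM: rejects codes it does not understand. */
static void spi_sim_action(struct ccb_header *ccb)
{
    switch (ccb->func_code) {
    case FUNC_SCSI_IO:
    case FUNC_ABORT:
        ccb->status = CCB_REQ_CMP;   /* pretend the request completed */
        break;
    default:
        ccb->status = CCB_REQ_INVALID;
        break;
    }
}

/* Fibre Channel SIM: also handles the transport-specific code. */
static void fc_sim_action(struct ccb_header *ccb)
{
    switch (ccb->func_code) {
    case FUNC_SCSI_IO:
    case FUNC_ABORT:
    case FUNC_FC_PORT_LOGIN:
        ccb->status = CCB_REQ_CMP;
        break;
    default:
        ccb->status = CCB_REQ_INVALID;
        break;
    }
}
```

In use, each SIM is registered once per path and every request goes through route_ccb(). A FUNC_FC_PORT_LOGIN CCB sent to the parallel SCSI path comes back CCB_REQ_INVALID, while the same CCB sent to the Fibre Channel path completes. In both cases the router itself knows nothing about either transport.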

> The linux SCSI subsystem follows the SAM (Scsi Architecture Model) which
> was published as the backbone to SCSI-3 (SAM-3 was last updated in
> November 2003). I find its command/transport separation extremely
> appealing. It has helped us to add new transports like Fibre and even
> SATA to the mix with relative ease. This lack of command/transport
> separation is, in my view, the biggest hole in CAM, and the reason why
> we'll be continuing with SAM for Linux SCSI.

I'm fully aware of SAM and the whole T10 family of specs. To believe that CAM is transport-specific is, again, to completely miss why it was written. SAM is not even close to a replacement for CAM.

> I cannot deny that the current error handler, trying to be all things to
> all devices/transports, is out of kilter with this vision...it should,
> at the very least have transport and device components...However, in
> 2.6, it does at least work.

If the SCSI layer didn't have the following critical flaws:

 o scsi_done() doesn't always complete commands.
 o the HBA is not informed of timeouts until the HBA is idle and all
   other commands have timed out.
 o the HBA cannot discern between requests related to watchdog recovery
   and "normal" task management.

I would have just rewritten the error handler. Unfortunately, you can't avoid the above three issues without canceling and restoring timeouts.

> On the Futures roadmap for the block layer in 2.7 is stackable error
> recovery (you can already see the beginnings of this in the fastfail
> processing) which will form the basis of async I/O, multi-path and
> software RAID.

In general, the peripheral driver should get the first crack at any status returned by an HBA driver. Until that occurs, the Linux SCSI layer is critically flawed. If you are interested, you can look at how the FreeBSD SCSI layer deals with these issues: the peripheral driver "filters" all errors and defaults to using a common, generic error handler for errors that do not need special handling.
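
The "filter first, then fall back to a generic handler" arrangement described above can be sketched in C as follows. This is a minimal sketch, not FreeBSD code: periph_error, generic_error, tape_filter, struct io_result and the single-retry BUSY policy are all hypothetical. Only the SCSI constants are standard values (status GOOD 0x00, CHECK CONDITION 0x02, BUSY 0x08; sense key BLANK CHECK 0x8). The example filter is a tape driver treating BLANK CHECK as end of recorded data rather than as a failure, a decision only the peripheral driver is in a position to make.

```c
#include <assert.h>
#include <stddef.h>

/* Standard SCSI status byte values used in this sketch. */
enum scsi_status {
    SCSI_GOOD            = 0x00,
    SCSI_CHECK_CONDITION = 0x02,
    SCSI_BUSY            = 0x08
};

#define SSD_KEY_BLANK_CHECK 0x08   /* sense key: blank check */

enum err_action { ACTION_DONE, ACTION_RETRY, ACTION_FAIL };

/* Hypothetical completion record handed back by the HBA driver. */
struct io_result {
    enum scsi_status status;
    unsigned char    sense_key;     /* meaningful only for CHECK CONDITION */
    int              retries_left;
};

/* Common handler shared by every peripheral driver (hypothetical policy). */
static enum err_action generic_error(struct io_result *r)
{
    switch (r->status) {
    case SCSI_GOOD:
        return ACTION_DONE;
    case SCSI_BUSY:
        if (r->retries_left > 0) {
            r->retries_left--;
            return ACTION_RETRY;
        }
        return ACTION_FAIL;
    default:
        return ACTION_FAIL;
    }
}

/* A filter returns nonzero if it fully handled the completion. */
typedef int (*periph_filter_fn)(struct io_result *r, enum err_action *action);

/* Hypothetical tape driver filter: BLANK CHECK means "end of recorded
 * data" to a tape driver, not a failure. */
static int tape_filter(struct io_result *r, enum err_action *action)
{
    if (r->status == SCSI_CHECK_CONDITION &&
        r->sense_key == SSD_KEY_BLANK_CHECK) {
        *action = ACTION_DONE;
        return 1;
    }
    return 0;
}

/* The peripheral driver gets the first crack at every completion;
 * anything it does not claim goes to the generic handler. */
static enum err_action periph_error(periph_filter_fn filter,
                                    struct io_result *r)
{
    enum err_action action;

    assert(r != NULL);
    if (filter != NULL && filter(r, &action))
        return action;
    return generic_error(r);
}
```

With tape_filter installed, a BLANK CHECK completes normally; with no filter, the generic handler fails the same command. BUSY is not claimed by the tape filter, so it falls through to the shared retry logic.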

Right now, the Linux mid-layer hides information and performs actions that are not necessarily what the peripheral driver wants. Other than perhaps statistics gathering and other actions that are not visible to the end device, this should not be the case.

> From a technical perspective, the way you try to thwart mid-layer error
> recovery: intercept all the SCSI timers and substitute your own, is
> extremely ugly (and leads to quite a bit of code duplication) but it's
> surely going to cause a conflict with the evolving stackable error
> handling.

I doubt that. The higher levels will never see a timeout. For other transport errors that are visible in terms of returned SCSI status, the abort, BDR, and reset entry points are fully functional. The current strategy also guarantees that I can quickly and easily correct any bug reports that are sent to me regarding how my drivers perform recovery. This was exactly why I took the steps I did.

> If you want to help us with the transport and device separations of the
> error handler, you're more than welcome, but trying to pull all error
> handling into your driver isn't useful because it adds layering
> violations, promotes compatibility problems and cannot be used by any
> other driver.

I'm not pulling all error recovery into my driver. I'm pulling transport-specific *watchdog recovery* into my driver. It is the HBA's job to ensure that it can access the devices attached to it. The peripheral driver's job is to inform the HBA of a drop-dead time for a command. Recovering from a timeout is actually very straightforward if you have the information you need to do it correctly. This can only be done at the HBA, where HBA-specific state can be referenced to pick the correct type of action. This, to my mind, is just a minor extension to the transport validation and recovery that HBAs must also do in order to be robust. Protocol- and device-specific recovery (related to SCSI status, residuals, etc.) will continue to be performed by the peripheral driver (or, as in the case of Linux, by the mid-layer).

I have no interest in *doing it all*, only in what I have to do for the drivers I maintain to be robust.

-- 
Justin