From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Bottomley
Subject: RE: suspending I/Os to a device
Date: 23 Apr 2004 19:28:30 -0400
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <1082762917.1615.17.camel@mulgrave>
References: <8D43EFD7CCBDB24980134BE078C227E704E37A82@xcm.emulex.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path:
Received: from stat1.steeleye.com ([65.114.3.130]:30920 "EHLO
	hancock.sc.steeleye.com") by vger.kernel.org with ESMTP
	id S261704AbUDWX3d (ORCPT ); Fri, 23 Apr 2004 19:29:33 -0400
In-Reply-To: <8D43EFD7CCBDB24980134BE078C227E704E37A82@xcm.emulex.com>
List-Id: linux-scsi@vger.kernel.org
To: "Infante, Jon"
Cc: Linux SCSI Reflector

On Fri, 2004-04-23 at 18:12, Infante, Jon wrote:
> Here is the scenario I am trying to handle within our Fibre Channel driver.
> In Fibre Channel, it is possible for the Target, or HBA, to be disconnected
> from one port on a switch and plugged into another port on the same (or even
> a different) switch. This transition can happen with live traffic and the
> scsi subsystem / driver is supposed to recover gracefully, without failing
> all I/Os back up to applications that are running. In order to accomplish
> this, there is a configurable timer, that tells the driver how long to wait
> (at the Fibre Channel level), when a device disappears, to give it a chance
> to come back. Some OEMs / environments set this to a couple seconds, some to
> a couple of minutes.

Actually, there is no interface because no-one has had any prior
requirements for this. This is effectively an FC-specific thing, and I
believe you know when it happens because you get a LIP event telling you
the target has gone, so it sounds like the ideal thing to handle in the
FC transport class (since it would apply to all FC HBAs).

The target reconnect timeout would then be a transport class property
and we'd start failing I/Os when it expired.
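
As a rough illustration of that timeout behaviour, here is a minimal
userspace model. It is only a sketch: every name in it (fc_target,
tgt_gone, tgt_tick and so on) is hypothetical, nothing here is existing
kernel or transport class API, and the choices that a zero timeout fails
I/O immediately and that an expired target stays failed are assumptions
of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target states for this sketch; not kernel definitions. */
enum tgt_state {
	TGT_RUNNING,	/* target present, I/O flows normally */
	TGT_HELD,	/* target gone, waiting for it to reconnect */
	TGT_OFFLINE	/* reconnect timeout expired, I/O is failed */
};

/* What to do with a newly submitted command. */
enum io_action { IO_DISPATCH, IO_HOLD, IO_FAIL };

struct fc_target {
	enum tgt_state state;
	unsigned int reconnect_timeout;	/* seconds; 0 = fail at once (assumed) */
	unsigned long gone_at;		/* time the target disappeared */
};

void tgt_init(struct fc_target *t, unsigned int timeout)
{
	assert(t != NULL);
	t->state = TGT_RUNNING;
	t->reconnect_timeout = timeout;
	t->gone_at = 0;
}

/* The target disappeared at time 'now' (e.g. reported after a LIP). */
void tgt_gone(struct fc_target *t, unsigned long now)
{
	assert(t != NULL);
	t->gone_at = now;
	t->state = t->reconnect_timeout ? TGT_HELD : TGT_OFFLINE;
}

/* The target came back; only a held target resumes in this sketch. */
void tgt_back(struct fc_target *t)
{
	assert(t != NULL);
	if (t->state == TGT_HELD)
		t->state = TGT_RUNNING;
}

/* Periodic check: stop holding once the timeout has elapsed. */
void tgt_tick(struct fc_target *t, unsigned long now)
{
	assert(t != NULL);
	if (t->state == TGT_HELD &&
	    now - t->gone_at >= t->reconnect_timeout)
		t->state = TGT_OFFLINE;
}

/* Decide the fate of a command submitted in the current state. */
enum io_action tgt_queue(const struct fc_target *t)
{
	assert(t != NULL);
	switch (t->state) {
	case TGT_RUNNING:
		return IO_DISPATCH;
	case TGT_HELD:
		return IO_HOLD;
	default:
		return IO_FAIL;
	}
}
```

With a timeout of 30, a target that goes away at t=100 has its commands
held until it returns or until a tick at t=130 or later, after which
they are failed. With a timeout of 0 they are failed as soon as the
target goes away.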

I would guess there needs to be an additional state in the device state
model like SUSPEND (QUIESCE doesn't quite fit because it allows the
pending queue to drain and the error handler to operate), but we can
accommodate that. By default, I would think the reconnect timeout should
be zero, because most FC operates in a multi-path environment and you
want path failure notifications as fast as possible; users requiring
this feature would need to set the timeout themselves.

James