From: Christof Schmitt
Subject: Re: fc_remote_port_delete and returning SCSI commands from LLD
Date: Fri, 23 Oct 2009 09:13:24 +0200
Message-ID: <20091023071324.GA5930@schmichrtp.de.ibm.com>
References: <20091020144027.GA17717@schmichrtp.de.ibm.com> <4ADF4EC3.6010506@cs.wisc.edu>
In-Reply-To: <4ADF4EC3.6010506@cs.wisc.edu>
List-Id: linux-scsi@vger.kernel.org
To: Mike Christie
Cc: linux-scsi@vger.kernel.org

On Wed, Oct 21, 2009 at 01:11:15PM -0500, Mike Christie wrote:
> Christof Schmitt wrote:
>> If the remote_port status is not BLOCKED, this will trigger the SCSI
>> midlayer error handling which cannot do much during the interruption
>> to the hardware and will mark the SCSI devices 'offline'. In order to
>> prevent this, the rule would be: First call fc_remote_port_delete to
>> set the remote port (or in the case of an HBA interruption all remote
>> ports) to BLOCKED, and only after this step call scsi_done to pass the
>> SCSI commands back to the upper layers.
>>
>
> One other note when doing this.
>
> For problems where you are deleting the rport, it is best to use
> something like DID_TRANSPORT_DISRUPTED to fail the cmd if you are
> failing it right away.

Would "something like DID_TRANSPORT_DISRUPTED" be any error code that
goes through "maybe_retry" in scsi_decide_disposition? I guess moving
to DID_TRANSPORT_DISRUPTED is nice for consistency, but DID_ERROR
triggers the same code paths as far as I can see.

> If drivers block the rport, then fail commands
> immediately with DID_TRANSPORT_DISRUPTED, then they will not actually be
> failed to the block/mpath layer until the fast io fail timeout has
> fired. This will prevent very short problems from firing the multipath
> path offlining code.

Just to get the complete picture: Blocking the rport and then
returning DID_TRANSPORT_DISRUPTED will retry the command to the LLD,
which then first calls fc_remote_port_chkready.
fc_remote_port_chkready will then keep the command between LLD and
SCSI midlayer until the rport state changes or the fast_fail fires.

Is this the complete picture or did I miss something?

> If your driver deletes the rport and does not fail the cmd immediately
> so it can recover within the command or some other reason like the fw
> just works that way, then when the fast io fail timer fires and the
> terminate_rport_io callback is run you could actually use any error code
> since at this time when an IO is sent to the queuecommand the driver will
> call fc_remote_port_chkready and IO will be failed immediately with
> DID_TRANSPORT_FAILFAST.

And the rport state is still BLOCKED, so at this point commands failed
in the upper layers with blk_abort_request will not end up in the SCSI
error recovery which cannot do much...

Thanks for the help, I am starting to get the complete picture...

Christof
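
P.S. To check my understanding of the sequence discussed above, here is a
small user-space model in C. It is only a sketch under assumptions: every
name with a model_/MODEL_ prefix is made up for illustration and is not
kernel API, and the logic covers only what this thread describes (the rport
is blocked first, queued commands are held while it is blocked, and they
fail with a failfast result once the fast io fail timer has fired). It does
not reproduce the real fc_remote_port_chkready implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; illustrative only, not the kernel's definitions. */
enum model_port_state { MODEL_PORT_ONLINE, MODEL_PORT_BLOCKED };

enum model_result {
	MODEL_OK,                 /* command may be passed on to the hardware */
	MODEL_RETRY,              /* command is held and retried while blocked */
	MODEL_TRANSPORT_FAILFAST, /* command is failed to the block/mpath layer */
};

struct model_rport {
	enum model_port_state state;
	bool fast_fail_timed_out;
};

/* Models the effect of fc_remote_port_delete as used in this thread:
 * the rport goes to BLOCKED before any command is handed back. */
static void model_rport_delete(struct model_rport *rport)
{
	rport->state = MODEL_PORT_BLOCKED;
	rport->fast_fail_timed_out = false;
}

/* Models the fast io fail timer firing while the rport is still blocked. */
static void model_fast_io_fail_fired(struct model_rport *rport)
{
	if (rport->state == MODEL_PORT_BLOCKED)
		rport->fast_fail_timed_out = true;
}

/* Models the readiness check a driver does at the top of queuecommand. */
static enum model_result model_chkready(const struct model_rport *rport)
{
	if (rport->state == MODEL_PORT_ONLINE)
		return MODEL_OK;
	return rport->fast_fail_timed_out ? MODEL_TRANSPORT_FAILFAST
					  : MODEL_RETRY;
}
```

In this model a command that is retried after model_rport_delete stays in
the MODEL_RETRY state until model_fast_io_fail_fired runs, which mirrors why
short interruptions should not reach the multipath path offlining code.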