From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hannes Reinecke Subject: Re: [PATCH 6/7] scsi: Use Scsi_Host as argument for eh_host_reset_handler Date: Fri, 27 Jun 2014 19:52:15 +0200 Message-ID: <53ADAF4F.3060900@suse.de> References: <1403850425-89297-1-git-send-email-hare@suse.de> <1403850425-89297-7-git-send-email-hare@suse.de> <53AD4BDB.2040707@linux.vnet.ibm.com> <53AD8290.7020809@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Received: from cantor2.suse.de ([195.135.220.15]:34999 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751150AbaF0RwS (ORCPT ); Fri, 27 Jun 2014 13:52:18 -0400 In-Reply-To: <53AD8290.7020809@linux.vnet.ibm.com> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: Steffen Maier Cc: James Bottomley , Christoph Hellwig , linux-scsi@vger.kernel.org, James Smart On 06/27/2014 04:41 PM, Steffen Maier wrote: > What base does this patch set apply to? > I failed with scsi:misc / scsi:fixes / vanilla. > > On 06/27/2014 12:47 PM, Steffen Maier wrote: >> On 06/27/2014 08:27 AM, Hannes Reinecke wrote: >>> Issuing a host reset should not rely on any commands. >>> So use Scsi_Host as argument for eh_host_reset_handler. >>> >>> Signed-off-by: Hannes Reinecke >>> --- >> >>> drivers/s390/scsi/zfcp_scsi.c | 3 +- >> >>> 69 files changed, 289 insertions(+), 379 deletions(-) >> >>> diff --git a/drivers/s390/scsi/zfcp_scsi.c >>> b/drivers/s390/scsi/zfcp_scsi.c >>> index dc42c93..fe50f69 100644 >>> --- a/drivers/s390/scsi/zfcp_scsi.c >>> +++ b/drivers/s390/scsi/zfcp_scsi.c >>> @@ -281,13 +281,14 @@ static int >>> zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt) >>> return zfcp_task_mgmt_function(scpnt, FCP_TMF_TGT_RESET); >>> } >>> >>> -static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt= ) >>> +static int zfcp_scsi_eh_host_reset_handler(struct Scsi_Host *host) >>> { >>> struct zfcp_scsi_dev *zfcp_sdev =3D sdev_to_zfcp(scpnt->devic= e); >> >> Scpnt argument no longer exists, so this line must likely go away to >> compile. >> >>> struct zfcp_adapter *adapter =3D zfcp_sdev->port->adapter; >> >> This needs the assignment from the added line below since zfcp_sdev = no >> longer exists. >> Alternatively, drop the assignment here, only have the auto variable >> definition, and do the assignment below with the added line. >> >>> struct fc_rport *rport =3D zfcp_sdev->port->rport; > > Oh crap, this also no longer works now. And as a consequence we canno= t > get a reasonable rport any more now to call fc_block_scsi_eh(). > > I think it's us as LLDD calling fc_remote_port_add() anyway as part o= f > adapter/port recovery, so we trigger the change to an unblocked rport= =2E > So, now we "only" need to make sure to block long enough for all > fc_remote_port_add's to have happened? Either that, or have some hook in queuecommand which will return any=20 command with BUSY until the HBA reset is finished. > zfcp_erp_wait() might not be sufficient which is probably why Christo= f > introduced fc_block_scsi_eh() in that commit Martin referred to. > > Where's a design document which explains how to correctly use > scsi_transport_fc? > Typically the LLDD sets all ports to 'BLOCKED' if it's still doing=20 discovery (ie during LIP for FC-AL) and then updates the port states accordingly once discovery is finished via=20 fc_remote_port_add()/fc_remote_port_delete(). >>> int ret; > > If adapter recovery fails, e.g. because there is no local light on th= e > fibre, I would suppose that our recovery ended with a blocked Scsi_Ho= st > (at the latest after zfcp_erp_wait()). No. It's perfectly okay for host_reset to end up with no local light on= =20 the fibre; that is not a failure. (Of course, the driver would have to call fc_remote_port_delete() here=20 first). Adapter recovery should only be considered 'failed' if for some reason=20 the HBA itself doesn't react to the host reset procedure. > Is that sufficient or do we need to actually return the success of th= e > host reset from this handler function? Not necessarily. If the communication with the HBA is always assumed to= =20 be working (and, judging from DASD details, this is a safe assumption) host reset cannot really fail for zfcp. > If not, we drop the "ret" variable as well. > If yes, how is host reset success defined? > Is it success, e.g. if adapter recovery succeeded but there is no lig= ht > on the fibre? > As mentioned above: Yes, that's perfectly okay. It still means that you can communicate with the HBA, otherwise you wouldn't even know the current light status :-) >>> >>> + adapter =3D (struct zfcp_adapter *)host->hostdata[0]; >>> zfcp_erp_adapter_reopen(adapter, 0, "schrh_1"); >>> zfcp_erp_wait(adapter); >>> ret =3D fc_block_scsi_eh(rport); >> >> A question just for my understanding: Calling fc_block_scsi_eh at th= e >> end *after* our adapter recovery is OK to wait for rports to become >> unblocked (or fast_io_failed)? >> I would guess so. > > Maybe we could replace the call to fc_block_scsi_eh() with something > like the following which would wait until zfcp called > fc_remote_port_add() [synchronously via work item in > zfcp_scsi_schedule_rport_register() during port recovery or if link t= est > with ADISC succeeded] for all ports of the adapter as part of the > adapter recovery: > > flush_workqueue(adapter->work_queue); > > That's as coarse granular as the zfcp_erp_wait() just waiting for the > queue to run empty, no matter which items were queued. > Strictly speaking we would only wait for all rport work items of this > adapter, but that's more complicated to code. It would be safer to > actually return in time, and not prolong arbitrarily if new work item= s > were queued meanwhile and the work queue does not become empty, but t= hen > again flush_workqueue might already be safe compared to drain_workque= ue. > Actually, it would be safe to call 'fc_remote_port_delete()' for all=20 remote ports as the first part of the host_reset() function. (Always assuming you're not doing that already; I haven't checked here)= =2E The you could call zfcp_erp_adapter_reopen() and zfcp_erp_wait() as you= =20 do nowadays. But now it wouldn't matter if there are any work items queued or not; the ports will be reset if and when the workqueue items are called. And the SCSI midlayer wouldn't be sending any commands as the ports are= =20 still in BLOCKED during that time. Of course, it might be that the ports does in fact vanish during host=20 reset (dev_loss_tmo might trigger before host reset finished), but the=20 SCSI layer is well equipped to handle that. > Can I assume, that fc_remote_port_add() is synchronous and does the > rport state change to unblocked before returning? Yes. > If so, I would think this would semantically replace our previous cal= l > to fc_block_scsi_eh(). > No. You cannot assume that the rport is still present after host_reset. So I would really not try to call anything rport related in host_reset = =2E.. > Does any of this make sense? > Sorta. Hope the same holds for my answers :-) Cheers, Hannes --=20 Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=FCrnberg GF: J. Hawn, J. Guild, F. Imend=F6rffer, HRB 16746 (AG N=FCrnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" i= n the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html