All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Steffen Maier <maier@linux.vnet.ibm.com>
Cc: James Bottomley <jbottomley@parallels.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-scsi@vger.kernel.org, James Smart <james.smart@emulex.com>
Subject: Re: [PATCH 6/7] scsi: Use Scsi_Host as argument for eh_host_reset_handler
Date: Fri, 27 Jun 2014 19:52:15 +0200	[thread overview]
Message-ID: <53ADAF4F.3060900@suse.de> (raw)
In-Reply-To: <53AD8290.7020809@linux.vnet.ibm.com>

On 06/27/2014 04:41 PM, Steffen Maier wrote:
> What base does this patch set apply to?
> I failed with scsi:misc / scsi:fixes / vanilla.
>
> On 06/27/2014 12:47 PM, Steffen Maier wrote:
>> On 06/27/2014 08:27 AM, Hannes Reinecke wrote:
>>> Issuing a host reset should not rely on any commands.
>>> So use Scsi_Host as argument for eh_host_reset_handler.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>
>>>   drivers/s390/scsi/zfcp_scsi.c             |  3 +-
>>
>>>   69 files changed, 289 insertions(+), 379 deletions(-)
>>
>>> diff --git a/drivers/s390/scsi/zfcp_scsi.c
>>> b/drivers/s390/scsi/zfcp_scsi.c
>>> index dc42c93..fe50f69 100644
>>> --- a/drivers/s390/scsi/zfcp_scsi.c
>>> +++ b/drivers/s390/scsi/zfcp_scsi.c
>>> @@ -281,13 +281,14 @@ static int
>>> zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt)
>>>       return zfcp_task_mgmt_function(scpnt, FCP_TMF_TGT_RESET);
>>>   }
>>>
>>> -static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>>> +static int zfcp_scsi_eh_host_reset_handler(struct Scsi_Host *host)
>>>   {
>>>       struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
>>
>> Scpnt argument no longer exists, so this line must likely go away to
>> compile.
>>
>>>       struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
>>
>> This needs the assignment from the added line below since zfcp_sdev no
>> longer exists.
>> Alternatively, drop the assignment here, only have the auto variable
>> definition, and do the assignment below with the added line.
>>
>>>       struct fc_rport *rport = zfcp_sdev->port->rport;
>
> Oh crap, this also no longer works now. And as a consequence we cannot
> get a reasonable rport any more now to call fc_block_scsi_eh().
>
> I think it's us as LLDD calling fc_remote_port_add() anyway as part of
> adapter/port recovery, so we trigger the change to an unblocked rport.
> So, now we "only" need to make sure to block long enough for all
> fc_remote_port_add's to have happened?
Either that, or have some hook in queuecommand which will return any 
command with BUSY until the HBA reset is finished.

> zfcp_erp_wait() might not be sufficient which is probably why Christof
> introduced fc_block_scsi_eh() in that commit Martin referred to.
>
> Where's a design document which explains how to correctly use
> scsi_transport_fc?
>
Typically the LLDD sets all ports to 'BLOCKED' if it's still doing 
discovery (ie during LIP for FC-AL) and then updates the port states
accordingly once discovery is finished via 
fc_remote_port_add()/fc_remote_port_delete().

>>>       int ret;
>
> If adapter recovery fails, e.g. because there is no local light on the
> fibre, I would suppose that our recovery ended with a blocked Scsi_Host
> (at the latest after zfcp_erp_wait()).
No. It's perfectly okay for host_reset to end up with no local light on 
the fibre; that is not a failure.
(Of course, the driver would have to call fc_remote_port_delete() here 
first).
Adapter recovery should only be considered 'failed' if for some reason 
the HBA itself doesn't react to the host reset procedure.

> Is that sufficient or do we need to actually return the success of the
> host reset from this handler function?

Not necessarily. If the communication with the HBA is always assumed to 
be working (and, judging from DASD details, this is a safe assumption)
host reset cannot really fail for zfcp.

> If not, we drop the "ret" variable as well.
> If yes, how is host reset success defined?
> Is it success, e.g. if adapter recovery succeeded but there is no light
> on the fibre?
>
As mentioned above: Yes, that's perfectly okay.
It still means that you can communicate with the HBA, otherwise you
wouldn't even know the current light status :-)

>>>
>>> +    adapter = (struct zfcp_adapter *)host->hostdata[0];
>>>       zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
>>>       zfcp_erp_wait(adapter);
>>>       ret = fc_block_scsi_eh(rport);
>>
>> A question just for my understanding: Calling fc_block_scsi_eh at the
>> end *after* our adapter recovery is OK to wait for rports to become
>> unblocked (or fast_io_failed)?
>> I would guess so.
>
> Maybe we could replace the call to fc_block_scsi_eh() with something
> like the following which would wait until zfcp called
> fc_remote_port_add() [synchronously via work item in
> zfcp_scsi_schedule_rport_register() during port recovery or if link test
> with ADISC succeeded] for all ports of the adapter as part of the
> adapter recovery:
>
> flush_workqueue(adapter->work_queue);
>
> That's as coarse granular as the zfcp_erp_wait() just waiting for the
> queue to run empty, no matter which items were queued.
> Strictly speaking we would only wait for all rport work items of this
> adapter, but that's more complicated to code. It would be safer to
> actually return in time, and not prolong arbitrarily if new work items
> were queued meanwhile and the work queue does not become empty, but then
> again flush_workqueue might already be safe compared to drain_workqueue.
>
Actually, it would be safe to call 'fc_remote_port_delete()' for all 
remote ports as the first part of the host_reset() function.
(Always assuming you're not doing that already; I haven't checked here).
The you could call zfcp_erp_adapter_reopen() and zfcp_erp_wait() as you 
do nowadays.
But now it wouldn't matter if there are any work items queued or not;
the ports will be reset if and when the workqueue items are called.
And the SCSI midlayer wouldn't be sending any commands as the ports are 
still in BLOCKED during that time.

Of course, it might be that the ports does in fact vanish during host 
reset (dev_loss_tmo might trigger before host reset finished), but the 
SCSI layer is well equipped to handle that.

> Can I assume, that fc_remote_port_add() is synchronous and does the
> rport state change to unblocked before returning?
Yes.

> If so, I would think this would semantically replace our previous call
> to fc_block_scsi_eh().
>
No. You cannot assume that the rport is still present after host_reset.
So I would really not try to call anything rport related in host_reset ...

> Does any of this make sense?
>
Sorta. Hope the same holds for my answers :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2014-06-27 17:52 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-27  6:26 [PATCH 0/7] Use 'Scsi_Host' as argument for host reset Hannes Reinecke
2014-06-27  6:26 ` [PATCH 1/7] scsi: fix comment in scsi_device_set_state() Hannes Reinecke
2014-06-27  6:27 ` [PATCH 2/7] mptfc: Do not call fc_block_scsi_eh() on host reset Hannes Reinecke
2014-06-27  6:27 ` [PATCH 3/7] ibmvfc: " Hannes Reinecke
2014-06-27  6:27 ` [PATCH 4/7] libfc: " Hannes Reinecke
2014-06-27  6:27 ` [PATCH 5/7] scsi_transport_fc: Use fc_rport as argument for fc_block_scsi_eh Hannes Reinecke
2014-06-27 12:46   ` Steffen Maier
2014-06-27  6:27 ` [PATCH 6/7] scsi: Use Scsi_Host as argument for eh_host_reset_handler Hannes Reinecke
2014-06-27 10:47   ` Steffen Maier
2014-06-27 11:04     ` Hannes Reinecke
2014-06-27 11:52       ` Martin Peschke
2014-06-27 12:00         ` Hannes Reinecke
2014-06-27 14:41     ` Steffen Maier
2014-06-27 17:52       ` Hannes Reinecke [this message]
2014-06-27  6:27 ` [PATCH 7/7] scsi_error: do not use command list for host reset Hannes Reinecke
2014-06-27  7:06   ` Bart Van Assche
2014-06-27  7:58   ` Christoph Hellwig
2014-06-27 10:42     ` Steffen Maier
2014-06-27  7:59 ` [PATCH 0/7] Use 'Scsi_Host' as argument " Christoph Hellwig
2014-06-27  8:14   ` Hannes Reinecke
2014-06-27  8:15     ` Christoph Hellwig
2014-09-07 16:21 ` Christoph Hellwig
2014-09-08  6:49   ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53ADAF4F.3060900@suse.de \
    --to=hare@suse.de \
    --cc=hch@infradead.org \
    --cc=james.smart@emulex.com \
    --cc=jbottomley@parallels.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=maier@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.