From: Steffen Maier <maier@linux.ibm.com>
To: Hannes Reinecke <hare@suse.de>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Benjamin Block <bblock@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>,
James Bottomley <james.bottomley@hansenpartnership.com>,
linux-scsi@vger.kernel.org
Subject: Re: [PATCH 03/24] zfcp: open-code fc_block_scsi_eh() for host reset
Date: Thu, 12 May 2022 08:12:44 +0200 [thread overview]
Message-ID: <2069a564-6b22-dc38-d8fc-6e611d7f2db0@linux.ibm.com> (raw)
In-Reply-To: <6257e8e3-fc82-3a57-93aa-c18ff6a23bab@linux.ibm.com>
Hannes, I'm working on a solution, just need more time.
On 5/4/22 13:40, Steffen Maier wrote:
> On 5/4/22 12:46, Steffen Maier wrote:
>> On 5/3/22 22:06, Hannes Reinecke wrote:
>>> When issuing a host reset we should be waiting for all
>>> ports to become unblocked; just waiting for one might
>>> be resulting in host reset to return too early.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>> Cc: Steffen Maier <maier@linux.ibm.com>
>>> Cc: Benjamin Block <bblock@linux.ibm.com>
>>> ---
>>> drivers/s390/scsi/zfcp_scsi.c | 21 +++++++++++++++------
>>> 1 file changed, 15 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
>>> index 526ac240d9fe..2e615e1dcde6 100644
>>> --- a/drivers/s390/scsi/zfcp_scsi.c
>>> +++ b/drivers/s390/scsi/zfcp_scsi.c
>>> @@ -373,9 +373,11 @@ static int zfcp_scsi_eh_target_reset_handler(struct
>>> scsi_cmnd *scpnt)
>>> static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>>> {
>>> - struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
>>> - struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
>>> - int ret = SUCCESS, fc_ret;
>>> + struct Scsi_Host *host = scpnt->device->host;
>>> + struct zfcp_adapter *adapter = (struct zfcp_adapter *)host->hostdata[0];
>>> + int ret = SUCCESS;
>>> + unsigned long flags;
>>> + struct zfcp_port *port;
>>> if (!(adapter->connection_features & FSF_FEATURE_NPIV_MODE)) {
>>> zfcp_erp_port_forced_reopen_all(adapter, 0, "schrh_p");
>>> @@ -383,9 +385,16 @@ static int zfcp_scsi_eh_host_reset_handler(struct
>>> scsi_cmnd *scpnt)
>>> }
>>> zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
>>> zfcp_erp_wait(adapter);
>>
>> Now internal zfcp recovery for the adapter completed, but there are async
>> pending follow-up things, such as rport(s) unblock, see below.
>>
>>> - fc_ret = fc_block_scsi_eh(scpnt);
>>> - if (fc_ret)
>>> - ret = fc_ret;
>>> +
>>> + spin_lock_irqsave(&adapter->port_list_lock, flags);
>>
>> zfcp_adapter.port_list_lock is of type rwlock_t:
>>
>> + read_lock_irqsave(&adapter->port_list_lock, flags);
>>
>>> + list_for_each_entry(port, &adapter->port_list, list) {
>>> + struct fc_rport *rport = port->rport;
>>
>> While an rport is blocked, port->rport == NULL [zfcp_scsi_rport_block() and
>> zfcp_scsi_rport_register()], so this could be a NULL pointer deref.
>>
>> Typically all rports would still be blocked after above adapter recovery,
>> until they become unblocked (via zfcp's async rport_work)
>> [zfcp_erp_try_rport_unblock() near the end of lun and port recovery that
>> happen as follow-up parts of adapter recovery and before zfcp_erp_wait()
>> returns; it eventually calls zfcp_scsi_schedule_rport_register() queueing
>> work item port->rport_work on which we could do flush_work() [one prior art
>> in zfcp_init_device_configure()]].
>>
>> Am starting to wonder if we would really need to sync with the async rports
>> unblocks (for this adapter) after zfcp_erp_wait() above and before any deref
>> port->rport. Either within this loop or with a separate loop before this one.
>> ( Another option would be to somehow iterate rports differently via common
>> code lists so we directly get rport references. )
>>
>>> +
>>> + ret = fc_block_rport(rport);
>
> I think I just noticed that that callee includes an msleep but we hold a read
> lock on port_list_lock. That does not seem right [scheduling while atomic?!].
> Maybe we would need an open coded version after all to be able to drop both
> locks in correct order (and re-acquire) in the open-coded copy of
> fc_block_rport()?
>
>>
>> Lock order looks good, we hold port_list_lock here and fc_block_rport() takes
>> Scsi_Host host_lock, so it matches prior art:
>> static void zfcp_erp_action_dismiss_adapter(struct zfcp_adapter *adapter)
>> read_lock(&adapter->port_list_lock);
>> zfcp_erp_action_dismiss_port(port);
>> spin_lock(port->adapter->scsi_host->host_lock);
>>
>>> + if (ret)
>>> + break;
>>
>> So once we find the first rport with
>> (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
>> we would break the loop and return from zfcp_scsi_eh_host_reset_handler()
>> with FAST_IO_FAIL. Is that correct? What if there are still other blocked
>> rports if we were to continue with the loop? Or am I missing something
>> regarding the goal of "waiting for all rports to become unblocked"?
>>
>> Once we completed the loop, the question arises which consolidated return
>> code would be the correct one, if we got different ret for different rports
>> in the loop. Does my previous brain dump make sense?:
>> I was pondering in my own patch version what to return of just a subset
>> of ports ran into fast_io_fail. And for now I thought just fast_io_fail
>> would be good to let I/O bubble up for path failover, even if this would
>> include of rport which meanwhile unblocked properly and would not need
>> bubbling up pending requests because they could service them with a
>> simple retry.
>>
>>> + }
>>> + spin_unlock_irqrestore(&adapter->port_list_lock, flags);
>>
>> + read_unlock_irqrestore(&adapter->port_list_lock, flags);
>>
>>> zfcp_dbf_scsi_eh("schrh_r", adapter, ~0, ret);
>>> return ret;
>>
>>
>
>
--
Mit freundlichen Gruessen / Kind regards
Steffen Maier
Linux on IBM Z and LinuxONE
https://www.ibm.com/privacy/us/en/
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschaeftsfuehrung: David Faller
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
next prev parent reply other threads:[~2022-05-12 6:13 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-03 20:06 [PATCHv2 00/24] scsi: EH rework prep patches, part 1 Hannes Reinecke
2022-05-03 20:06 ` [PATCH 01/24] csiostor: use fc_block_rport() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 02/24] fc_fcp: " Hannes Reinecke
2022-05-03 20:06 ` [PATCH 03/24] zfcp: open-code fc_block_scsi_eh() for host reset Hannes Reinecke
2022-05-04 10:46 ` Steffen Maier
2022-05-04 11:40 ` Steffen Maier
2022-05-12 6:12 ` Steffen Maier [this message]
2022-05-04 11:01 ` Benjamin Block
2022-05-04 11:09 ` Benjamin Block
2022-05-04 11:33 ` Benjamin Block
2022-05-03 20:06 ` [PATCH 04/24] mptfc: simplify mpt_fc_block_error_handler() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 05/24] mptfusion: correct definitions for mptscsih_dev_reset() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 06/24] mptfc: open-code mptfc_block_error_handler() for bus reset Hannes Reinecke
2022-05-03 20:06 ` [PATCH 07/24] qedf: use fc rport as argument for qedf_initiate_tmf() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 08/24] bnx2fc: Do not rely on a scsi command for lun or target reset Hannes Reinecke
2022-05-03 20:06 ` [PATCH 09/24] ibmvfc: open-code reset loop for " Hannes Reinecke
2022-05-03 20:06 ` [PATCH 10/24] ibmvfc: use fc_block_rport() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 11/24] fnic: use dedicated device reset command Hannes Reinecke
2022-05-03 20:06 ` [PATCH 12/24] fnic: use fc_block_rport() correctly Hannes Reinecke
2022-05-03 20:06 ` [PATCH 13/24] aic7xxx: make BUILD_SCSIID() a function Hannes Reinecke
2022-05-03 20:06 ` [PATCH 14/24] aic79xx: " Hannes Reinecke
2022-05-03 20:06 ` [PATCH 15/24] aic7xxx: do not reference scsi command when resetting device Hannes Reinecke
2022-05-04 1:10 ` Finn Thain
2022-05-03 20:06 ` [PATCH 16/24] aic79xx: " Hannes Reinecke
2022-05-03 20:06 ` [PATCH 17/24] snic: reserve tag for TMF Hannes Reinecke
2022-05-03 20:06 ` [PATCH 18/24] snic: use dedicated device reset command Hannes Reinecke
2022-05-03 20:06 ` [PATCH 19/24] snic: Use scsi_host_busy_iter() to traverse commands Hannes Reinecke
2022-05-03 20:07 ` [PATCH 20/24] sym53c8xx_2: split off bus reset from host reset Hannes Reinecke
2022-05-03 20:07 ` [PATCH 21/24] ips: Do not try to abort command " Hannes Reinecke
2022-05-03 20:07 ` [PATCH 22/24] qla1280: separate out host reset function from qla1280_error_action() Hannes Reinecke
2022-05-03 20:07 ` [PATCH 23/24] megaraid: pass in NULL scb for host reset Hannes Reinecke
2022-05-03 20:07 ` [PATCH 24/24] mpi3mr: split off bus_reset function from host_reset Hannes Reinecke
-- strict thread matches above, loose matches on Subject: below --
2022-05-02 21:37 [PATCH 00/24] scsi: EH rework prep patches, part 1 Hannes Reinecke
2022-05-02 21:37 ` [PATCH 03/24] zfcp: open-code fc_block_scsi_eh() for host reset Hannes Reinecke
2022-05-03 14:06 ` Christoph Hellwig
2022-05-03 17:21 ` Steffen Maier
2022-05-03 18:42 ` Hannes Reinecke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2069a564-6b22-dc38-d8fc-6e611d7f2db0@linux.ibm.com \
--to=maier@linux.ibm.com \
--cc=bblock@linux.ibm.com \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=james.bottomley@hansenpartnership.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox