Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

From: Hannes Reinecke <hare@suse.de>
To: James.Smart@emulex.com
Cc: Jeremy Linton <jlinton@tributary.com>,
	Mike Christie <michaelc@cs.wisc.edu>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Andrew Vasquez <andrew.vasquez@qlogic.com>,
	Chad Dupuis <chad.dupuis@qlogic.com>,
	Robert Elliot <elliot@hp.com>
Subject: Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
Date: Tue, 12 Mar 2013 16:59:28 +0100	[thread overview]
Message-ID: <513F50E0.6060702@suse.de> (raw)
In-Reply-To: <b5c314a7-83ce-469e-8a59-7e35bd92b2b6@CMEXHTCAS2.ad.emulex.com>

On 03/11/2013 07:04 PM, James Smart wrote:
>
> On 3/11/2013 1:05 PM, Hannes Reinecke wrote:
>> On 03/07/2013 09:35 PM, Jeremy Linton wrote:
>>> On 3/7/2013 2:20 PM, Mike Christie wrote:
>>>> On 03/07/2013 02:13 PM, Jeremy Linton wrote:
>>>>>     For lpfc, you never get to the code. Or rather when I was
>>>>> testing it, I
>>>>> couldn't find any way to propagate an error beyond the initial
>>>>> lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().
>>>>>
>>>>>     That call pretty much always returns success indpependent
>>>>> of the remote
>>>>> device because the firmware acks the context clear aborts,
>>>>> resulting in the
>>>>> outstanding iocb count being zero (independent of both the mid
>>>>> layer status
>>>>> and the actual device state).
>>>>>
>>>>
>>>> Your lpfc patch fixes that right?
>>>
>>>     Yes. It allows the device reset to fail if the device doesn't
>>> respond to the
>>> task mgmt request, or rejects it, etc.
>>>
>>>     It doesn't unjam the commands that get aborted by the
>>> flush_io_context() call.
>>> Those have to depend on their timeouts. That is another patch...
>>>
>>>
>>
>> It's actually worse than that.
>> lpfc_terminate_rport_io() calls lpfc_sli_abort_iocb(), which has
>> this:
>>
>>
>>          if (lpfc_is_link_up(phba))
>>             abtsiocb->iocb.ulpCommand = CMD_ABORT_XRI_CN;
>>         else
>>             abtsiocb->iocb.ulpCommand = CMD_CLOSE_XRI_CN;
>>
>>         /* Setup callback routine and issue the command. */
>>         abtsiocb->iocb_cmpl = lpfc_sli_abort_fcp_cmpl;
>>         ret_val = lpfc_sli_issue_iocb(phba, pring->ringno,
>>                           abtsiocb, 0);
>>         if (ret_val == IOCB_ERROR) {
>>             lpfc_sli_release_iocbq(phba, abtsiocb);
>>             errcnt++;
>>             continue;
>>         }
>>
>>
>> Ie we're calling into firmware and waiting for an async event
>> telling us that the command has been aborted (ideally).
>> What I would like is some kind of synchronous call here, which would
>> guarantee us that we won't run into use-after-free issues.
>>
>> Also 'lpfc_is_link_up' is clearly deficient here as the link
>> itself most likely is up, it's the I_T Nexus which is not.
>>
>> James, is it safe to use 'CMD_CLOSE_XRI_CN' even when the link is up?
>
> No, it's not safe.  The ABORT, which sends an ABTS, is mandated so
> that the other end and ourselves maintain proper (unique) exchange
> id state.   CLOSE sends no link traffic - but can only be used if
> the login is broken (e.g. there's a different mechanism that
> communicated termination of exchange states).   I don't believe I
> can trust the logic in the OS about frames laying in wait in the
> fabric (maybe sent earlier, delayed at a switch, delivered after os
> thinks nexus is gone), so driver needs to terminate them properly.
>
True. Just as I thought.

>>
>> Which makes me wonder, how _exactly_ is I_T nexus reset supposed
>> to work? After all, we're trying to tell the target port that we
>> cannot talk to it anymore, right?
>> Which has some hurdles, conceptually ...
>> So from my POV I_T nexus reset can only be implemented on the
>> _initiator_ side, disregarding any target implementation.
>> (which would be pointless anyway).
>>
>> Hmm. Probably have to ask T10 for clarification. Robert, any
>> insights?
>
>
> The I_T nexus reset should be a FC transport implicit logout call to
> the LLDD.  E.g. this becomes a transport-specific action on what it
> means to break the I_T nexus, which for FC, is to terminate the
> login.   This logout call allows the driver to do all the implicit
> work to kill exchange contexts and allows it to adjust the state of
> the target in it's FC discovery engine.  Question is - should the
> driver re-login ? Typically, this would be driven by a RSCN, which
> I'm guessing for this scenario, would not be occurring. If you knew
> it would, you could let the driver respond to the RSCN and re-login
> later.   If there's no RSCN, then I would assume we put a heartbeat
> into the transport to retry login (to a WWPN/WWNN basis - remembered
> from the I_T nexus reset) with the LLDD - a new interface as well -
> call it "establish I_T_nexus".
>
Hmm. As I feared, my solution was a bit optimistic.
But good idea, using a 'logout' to trigger I_T nexus removal.
I wonder if we shouldn't attempt to logout for the fast_io_fail 
case, too?
And for the timer, yeah, I guess we need something like this.

> In lpfc's case - the Logout would allow the driver to take the
> CLOSE_XRI case, giving you the speed/asynchronicity you desire.
> Reuse of scsi job structures still can't occur until the driver
> returns then via the completion routines (as DMA related to them
> must be cancelled within the card by the ABORT/CLOSE commands - even
> if we know there shouldn't be something to DMA).
>
The problem here is that the _eh calls are _synchronous_ in nature.
Not that it works perfectly nowadays (cf the discussion about TMF 
results) but that's at least the theory.

Anyway, thanks for you insights.
It has been _very_ helpful.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2013-03-12 15:59 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-11  8:23 [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset Hannes Reinecke
2012-12-11 12:46 ` Martin Peschke
2012-12-11 14:06   ` Hannes Reinecke
2013-03-07 19:19 ` Mike Christie
2013-03-07 20:13   ` Jeremy Linton
2013-03-07 20:20     ` Mike Christie
2013-03-07 20:24       ` Mike Christie
2013-03-07 20:35       ` Jeremy Linton
2013-03-11 17:05         ` Hannes Reinecke
2013-03-11 18:04           ` James Smart
2013-03-11 18:32             ` Vijay Mohan Guvva
2013-03-12 15:59             ` Hannes Reinecke [this message]
2013-03-07 21:44     ` Douglas Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=513F50E0.6060702@suse.de \
    --to=hare@suse.de \
    --cc=James.Smart@emulex.com \
    --cc=andrew.vasquez@qlogic.com \
    --cc=chad.dupuis@qlogic.com \
    --cc=elliot@hp.com \
    --cc=jlinton@tributary.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michaelc@cs.wisc.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox