Re: fc_remote_port_delete and returning SCSI commands from LLD

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Christof Schmitt <christof.schmitt@de.ibm.com>
To: James Smart <James.Smart@Emulex.Com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: fc_remote_port_delete and returning SCSI commands from LLD
Date: Fri, 23 Oct 2009 09:47:47 +0200	[thread overview]
Message-ID: <20091023074746.GB5930@schmichrtp.de.ibm.com> (raw)
In-Reply-To: <4ADF35BF.8050100@emulex.com>

On Wed, Oct 21, 2009 at 12:24:31PM -0400, James Smart wrote:
> Christof Schmitt wrote:
>> I am looking again at how and when a FC LLD should call
>> fc_remote_port_delete. Some help would be welcome to cover all
>> requirements and to plug the holes...
>
> It's pretty simple, as far as the FC transport is concerned.
> Call fc_remote_port_add() once connectivity is established.
> Call fc_remote_port_delete() once connectivity is lost.
> It is expected that there is a clear add->delete->add->delete->... sequence.
>
> Timing is considered "immediate", but there's always a window of delay.
>
> In general, the Transport ignores what happens to outstanding i/o, 
> letting the LLDD do something based on its policy, or let natural i/o 
> timers or the fast fail timer to fire.  The transport, at the delete 
> call, will start the fast fail timer if it is set.

Understood.

>> One scenario i am looking at: The connection to the HBA has been
>> temporarily lost and the LLD has to return all pending I/O requests to
>> the upper layers, so they can be retried later. Now with the SCSI
>> device being part of a multipath device, the first failed I/O request
>> triggers path failover:
>
> You are now asking a different question - how to make the upper layers play
> nice with the different responses from queuecommand, the LLDD's 
> interaction with the transport and midlayer, etc.
>
>>
>> multipath_end_io
>> do_end_io
>> fail_path
>> queue_work(kmultipathd, &pgpath->deactivate_path);
>>
>> which then marks the following returned requests as timed out:
>>
>> deactivate_path
>> blk_abort_queue
>> blk_abort_request
>> blk_rq_timed_out
>> scsi_times_out
>> fc_timed_out
>>
>> If the remote_port status is not BLOCKED, this will trigger the SCSI
>> midlayer error handling which cannot do much during the interruption
>> to the hardware and will mark the SCSI devices 'offline'.
>
> Well - this isn't absolute, but is pretty much true. We expect, when  
> connectivity is lost, for the block state to be temporarily entered. The  
> blocked state holds off further i/o and the eh handler as well, to 
> postpone the normal i/o failure cases which do lead to offline conditions 
> in most scenarios.
>
> But - this process is a coordinated effort between the driver and the 
> upper layers, and where the driver doesn't get helped by the transport 
> (the blocked state) it had better mimic the return codes at the different 
> points, and perhaps more, so that bad things don't happen.

"mimic the return codes" refers to fc_remote_port_chkready? Like
returning DID_IMM_RETRY when the rport is going to be BLOCKED, but
fc_remote_port_delete did not run yet?

>> In order to
>> prevent this, the rule would be: First call fc_remote_port_delete to
>> set the remote port (or in the case of an HBA interruption all remote
>> ports) to BLOCKED, and only after this step call scsi_done to pass the
>> SCSI commands back to the upper layers.
>
> True, although as mentioned, i/o termination is considered independent 
> from the rport/transport. But, you're best off if the target is blocked 
> due to the rport delete as we've prepped the upper layers to behave best 
> with this behavior.
>
> There will always be a few i/o's that sneak in or complete (timeout ?) in 
> between when the LLDD detects connectivity loss and when the  
> fc_remote_port_delete has been called. It's up to the LLDD to handle this 
> window.
>
> Completions, including i/o timeouts, are typically not a big deal and 
> should just return via scsi_done as they normally would. The caveat is 
> when those i/o's are from the eh thread.  Granted - if you are actively 
> aborting/failing i/o at the connectivity loss, and doing so before the 
> block is in place, you're causing more headaches for yourself in getting 
> the upper layers to play right with the LLDD - with the recommendation 
> being "don't do that".
>
> New i/o needs to be caught in queuecommand with the LLDD emulating the  
> transport status that would normally get returned.  E.g. the call to  
> fc_remote_port_chkready() won't catch it as the fc_remote_port_delete() 
> call hasn't completed yet - so the LLDD needs a 2nd check against it's 
> own structures, and if it detects the state, it should fail the i/o with 
> the same codes that chkready would. In reality, if you wanted to accept 
> the command, but never issue it and just leave it outstanding - waiting 
> for i/o timeout, or fast fail i/o timout, or devloss_tmo, I guess you 
> could.
>
>
>> This means, if the HBA problem is detected in interrupt context,
>> fc_remote_port_delete has to be called before calling scsi_done.
>
> Well - execution context is somewhat unrelated, as it depends on how the 
> LLDD is implemented, and what else its doing when connectivity is lost.
>
>>
>> But the description for fc_remote_port_delete states:
>>
>>  * Called from normal process context only - cannot be called from
>>  * interrupt.
>>  *
>>  * Notes:
>>  *	This routine assumes no locks are held on entry.
>>  */
>>
>> Looking at the functions called from fc_remote_port_delete, i don't
>> see a problem in calling fc_remote_port_delete from interrupt context
>> or with locks held. Does this mean the description should be fixed or
>> am i missing something?
>
> That's probably true. underlying routines have changed a bit over time 
> and it may be better now. I'd still hesitate with 
> fc_tgt_it_nexus_destroy() (although it shouldn't be applicable to you), 
> and scsi_target_block().  Creating additional lock hierachies between 
> LLDD locks and the locks in these paths (which the LLDD rarely sees/knows 
> about) isn't good.   Thus, we've mostly pushed LLDDs to use a pristine 
> context when calling the transport (such as a workq context) so that we 
> can disassociate low-level LLDD design from midlayer design.
>
>
>> fc_remote_port_add on the other hand can wait during flushes and has
>> to be called from process context. To summarize:
>> - A LLD has to call fc_remote_port_delete before returning SCSI
>>   commands from a failed port or failed HBA.
>
> not true, but best behavior.
>
>> - fc_remote_port_delete can be called from interrupt context before
>>   calling scsi_done if necessary
>
> part a (called from interrupt context) - do so at your own risk.  These 
> other paths can change at any time and its not fair for those developers 
> to know your driver dependencies.
>
> part b (before calling scsi_done) - recommended approach.
>
>> - fc_remote_port_add has to be called from process context
>
> True.
>
>> - The LLD has to serialize the fc_remote_port_add and
>>   fc_remote_port_delete calls to guarantee the add->delete->...
>>   sequence.
>
> True.

And, at least in zfcp, the notification from the hardware about I/O
completion to the call to scsi_done runs in softirq context. Calling
fc_remote_port_delete from this context is no good thing as you
mentioned and i don't see a good way to synchronize the
fc_remote_port_add/delete calls when going this way.

So far i see two possible solutions:

1) When the error is detected in softirq context, do not call
   scsi_done. Defer this call to the error handling thread/workqueue
   that will first call fc_remote_port_delete and then return all
   affected SCSI commands.

2) Have an LLD internal flag indicating "transitioning to rport
   blocked state", check for this in queuecommand and return
   DID_IMM_RETRY as fc_remote_port_chkready does. As soon as
   fc_remote_port_delete has been called, fc_remote_port_chkready will
   do the right thing.

It looks to me that 2) might be a short-term solution while 1) looks
like a proper way of handling interruptions on the host level in the
long term.

Anyway, thanks for the input.

I am tempted to summarize this for scsi_fc_transport.txt to have the
important requirements in one place. But this depends on the available
time, so no promises.

Christof

next prev parent reply	other threads:[~2009-10-23  7:47 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-20 14:40 fc_remote_port_delete and returning SCSI commands from LLD Christof Schmitt
2009-10-21 15:24 ` Christof Schmitt
2009-10-21 16:33   ` James Smart
2009-10-23  7:58     ` Christof Schmitt
2009-10-23 14:50       ` James Smart
2009-10-27 16:59         ` Christof Schmitt
2009-10-27 19:44           ` James Smart
2009-10-21 16:24 ` James Smart
2009-10-23  7:47   ` Christof Schmitt [this message]
2009-10-23 14:47     ` James Smart
2009-10-27 21:57       ` Mike Christie
2009-10-21 18:11 ` Mike Christie
2009-10-23  7:13   ` Christof Schmitt
2009-10-27 21:53     ` Mike Christie
2009-10-28 14:27       ` Christof Schmitt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091023074746.GB5930@schmichrtp.de.ibm.com \
    --to=christof.schmitt@de.ibm.com \
    --cc=James.Smart@Emulex.Com \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.