Re: Debugging scsi abort handling ?

From: Hannes Reinecke <hare@suse.de>
To: Finn Thain <fthain@telegraphics.com.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Hans de Goede <hdegoede@redhat.com>,
	Bart Van Assche <bvanassche@acm.org>,
	SCSI development list <linux-scsi@vger.kernel.org>,
	Robert Elliot <elliot@hp.com>
Subject: Re: Debugging scsi abort handling ?
Date: Fri, 29 Aug 2014 12:30:26 +0200
Message-ID: <54005642.8050805@suse.de>
In-Reply-To: <alpine.LNX.2.00.1408291924010.25858@nippy.intranet>

On 08/29/2014 12:14 PM, Finn Thain wrote:
>
> On Fri, 29 Aug 2014, Hannes Reinecke wrote:
>
>> On 08/29/2014 06:39 AM, Finn Thain wrote:
>>>
>>> On Thu, 28 Aug 2014, Hannes Reinecke wrote:
>>>
>>>> What might happen, though, is that the command is already dead and
>>>> gone by the time you're calling ->scsi_done() (if you call it after
>>>> eh_abort). So there might not _be_ a command upon which you can call
>>>> ->scsi_done() to start with.
>>>>
>>>> Hence any LLDD needs to clear up any internal references after a call
>>>> to eh_XXX to ensure it doesn't call ->scsi_done() on an invalid
>>>> command.
>>>>
>>>> So even if the LLDD returns 'FAILED' upon a call to eh_XXX it
>>>> _still_ needs to clear up the internal reference.
>>>
>>> This is a question that has been bothering me too. If the host's
>>> eh_abort_cmd() method returns FAILED, it seems the mid-layer is liable
>>> to re-issue the same command to the LLD (?)
>>>
>> No.
>> FAILED for any eh_abort_cmd() means that the TMF hasn't been sent.
>
> Makes sense, though it appears to contradict this advice about returning
> SUCCESS in some situations:
> http://marc.info/?l=linux-scsi&m=140923498632496&w=2
>
Well, if the LLDD detects an invalid command (i.e. if it cannot find
any internal command matching the midlayer command) that's an
automatic success, obviously.

So we should rephrase things to:

- The eh_XXX callback shall return 'SUCCESS' if the respective
   TMF (or equivalent) could be initiated or if the matching command
   reference has already been completed by the LLDD. Otherwise
   the eh_XXX callback shall return 'FAILED'.
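
As a rough illustration of the rephrased rule, here is a userspace mock
(not kernel code). All names in it (mock_host, mock_cmd, mock_eh_abort,
EH_SUCCESS, EH_FAILED, tmf_path_ok) are hypothetical stand-ins, and the
"TMF could be initiated" condition is reduced to a single flag. It also
drops the internal reference irrespective of the return code, as
proposed earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the midlayer's SUCCESS / FAILED codes. */
enum eh_result { EH_SUCCESS, EH_FAILED };

#define MAX_TAGS 4

struct mock_cmd { int id; };

struct mock_host {
    struct mock_cmd *inflight[MAX_TAGS]; /* LLDD-internal command references */
    bool tmf_path_ok;                    /* assumption: can a TMF be initiated? */
};

/* Return the slot holding cmd, or -1 if the LLDD has no matching command. */
static int find_slot(const struct mock_host *h, const struct mock_cmd *cmd)
{
    for (int i = 0; i < MAX_TAGS; i++)
        if (h->inflight[i] == cmd)
            return i;
    return -1;
}

enum eh_result mock_eh_abort(struct mock_host *h, struct mock_cmd *cmd)
{
    assert(h != NULL && cmd != NULL);

    int slot = find_slot(h, cmd);
    if (slot < 0)
        return EH_SUCCESS;    /* no matching command: already completed */

    h->inflight[slot] = NULL; /* drop the reference whatever we return */

    /* SUCCESS only if the TMF (or equivalent) could be initiated. */
    return h->tmf_path_ok ? EH_SUCCESS : EH_FAILED;
}
```

With this shape, a second abort for the same command finds no internal
reference and therefore reports success.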

>> The command will only ever be re-issued once EH completes.
>
> ...
>
>>
>> But indeed, 'FAILED' is not very meaningful here, leaving the midlayer
>> with no information about what happened to the command.
>>
>> Personally I would like to enforce this meaning on the eh_XXX callbacks:
>> - upon each eh_XXX callback the LLDD clears any internal references
>>    to the command / command scope (i.e. eh_abort_cmd clears the
>>    references to the command, eh_lun_reset clears all internal
>>    references to commands to this ITL nexus etc.)
>>    This happens irrespective of the return code.
>> - The eh_XXX callback shall return 'FAILED' if the respective
>>    TMF (or equivalent) could not be initiated.
>> - The eh_XXX callback shall return 'SUCCESS' if the respective
>>    TMF (or equivalent) could be initiated.
>> - After each eh_XXX callback control for this command / command
>>    scope is transferred back to the midlayer; the LLDD shall not
>>    assume the associated command structures to remain valid after
>>    that point.
>
> Perhaps that last constraint should be relaxed to "After the final EH
> callback (whether implemented or unimplemented by the host), command /
> command scope is transferred back to the midlayer..."
>
No, that's wrong.

By the time any eh_XXX callbacks are triggered control _is_ already
back at the midlayer. I.e. the command timeout has triggered and the
block layer has already set the REQ_ATOM_COMPLETE flag, short-circuiting
any attempts to call ->scsi_done().
So with the callbacks the midlayer actually informs the LLDD about a
certain fact; there is nothing the LLDD can do to change ownership
at that point.

(Correction: During the call of any eh_XXX callbacks control _is_ 
back at the LLDD, otherwise the callbacks would be pointless. It's
just that the LLDD shouldn't assume the command is valid _after_
any of the eh_XXX callbacks has terminated.)
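
The short-circuit described above can be modelled in userspace with an
atomic test-and-set. This is only a sketch: mock_request,
mock_mark_complete, mock_timeout and mock_scsi_done are hypothetical
names, and the real block layer code differs in detail. The point is
that whoever sets the completion flag first owns the request, so a
late completion from the LLDD is silently dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_request {
    atomic_flag completed; /* plays the role of the completion flag */
    int done_calls;        /* completions that were actually processed */
};

/* Returns true only for the first caller, who then owns completion. */
static bool mock_mark_complete(struct mock_request *rq)
{
    assert(rq != NULL);
    return !atomic_flag_test_and_set(&rq->completed);
}

/* The timeout path claims the request before error handling runs. */
void mock_timeout(struct mock_request *rq)
{
    (void)mock_mark_complete(rq);
}

/* A completion from the LLDD is ignored if the flag is already set. */
void mock_scsi_done(struct mock_request *rq)
{
    if (mock_mark_complete(rq))
        rq->done_calls++;
}
```

Calling mock_timeout() first and mock_scsi_done() afterwards leaves
done_calls at zero, which is the situation the eh_XXX callbacks run in.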

> A more severe TMF is probably mandatory (e.g. bus reset) but if the driver
> author later added a milder one (e.g. bus device reset), your rule would
> mean that the existing handler would then operate under new constraints,
> which might cause surprises.
>
Well, _if_ we were to adopt this rule we would obviously have to audit
existing LLDDs to check whether the rule is followed, and tweak them
if not.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
