From: Hannes Reinecke <hare@suse.de>
To: Finn Thain <fthain@telegraphics.com.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Hans de Goede <hdegoede@redhat.com>,
Bart Van Assche <bvanassche@acm.org>,
SCSI development list <linux-scsi@vger.kernel.org>,
Robert Elliot <elliot@hp.com>
Subject: Re: Debugging scsi abort handling ?
Date: Fri, 29 Aug 2014 12:30:26 +0200
Message-ID: <54005642.8050805@suse.de>
In-Reply-To: <alpine.LNX.2.00.1408291924010.25858@nippy.intranet>

On 08/29/2014 12:14 PM, Finn Thain wrote:
>
> On Fri, 29 Aug 2014, Hannes Reinecke wrote:
>
>> On 08/29/2014 06:39 AM, Finn Thain wrote:
>>>
>>> On Thu, 28 Aug 2014, Hannes Reinecke wrote:
>>>
>>>> What might happen, though, is that the command is already dead and
>>>> gone by the time you're calling ->scsi_done() (if you call it after
>>>> eh_abort). So there might not _be_ a command upon which you can call
>>>> ->scsi_done() to start with.
>>>>
>>>> Hence any LLDD needs to clear up any internal references after a call
>>>> to eh_XXX to ensure it doesn't call ->scsi_done() on an invalid
>>>> command.
>>>>
>>>> So even if the LLDD returns 'FAILED' upon a call to eh_XXX it
>>>> _still_ needs to clear up the internal reference.
>>>
>>> This is a question that has been bothering me too. If the host's
>>> eh_abort_cmd() method returns FAILED, it seems the mid-layer is liable
>>> to re-issue the same command to the LLD (?)
>>>
>> No.
>> FAILED for any eh_abort_cmd() means that the TMF hasn't been sent.
>
> Makes sense, though it appears to contradict this advice about returning
> SUCCESS in some situations:
> http://marc.info/?l=linux-scsi&m=140923498632496&w=2
>
Well, if the LLDD detects an invalid command (i.e. if it cannot find
any internal command matching the midlayer command), that's an
automatic success, obviously.

So we should rephrase things to:
- The eh_XXX callback shall return 'SUCCESS' if the respective
  TMF (or equivalent) could be initiated or if the matching command
  reference has already been completed by the LLDD. Otherwise
  the eh_XXX callback shall return 'FAILED'.

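To make the rephrased rule concrete, here is a minimal userspace sketch in C. It is not kernel code and not taken from any existing LLDD: the slot table, lldd_track(), send_abort_tmf() and the EH_SUCCESS / EH_FAILED values are all hypothetical stand-ins. It only shows the three outcomes (command unknown to the LLDD, TMF initiated, TMF not initiated) together with the earlier proposal that the internal reference is dropped irrespective of the return code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the midlayer's SUCCESS / FAILED return codes. */
enum eh_result { EH_SUCCESS, EH_FAILED };

#define NSLOTS 4

/* Hypothetical per-command bookkeeping inside an LLDD. */
struct lldd_slot {
    const void *midlayer_cmd;  /* NULL once the LLDD no longer owns a command */
    bool tmf_can_be_sent;      /* stand-in for "the transport accepts a TMF" */
};

static struct lldd_slot slots[NSLOTS];

/* Record that the LLDD now holds a reference to cmd in slot idx. */
static void lldd_track(size_t idx, const void *cmd, bool tmf_can_be_sent)
{
    assert(idx < NSLOTS);
    slots[idx].midlayer_cmd = cmd;
    slots[idx].tmf_can_be_sent = tmf_can_be_sent;
}

/* Find the internal slot matching the midlayer command, if any. */
static struct lldd_slot *find_slot(const void *cmd)
{
    if (cmd == NULL)
        return NULL;
    for (size_t i = 0; i < NSLOTS; i++)
        if (slots[i].midlayer_cmd == cmd)
            return &slots[i];
    return NULL;
}

/* Hypothetical helper: pretend to initiate an abort TMF. */
static bool send_abort_tmf(const struct lldd_slot *slot)
{
    return slot->tmf_can_be_sent;
}

/* Sketch of an eh_abort-style callback following the proposed rule. */
static enum eh_result lldd_eh_abort(const void *cmd)
{
    struct lldd_slot *slot = find_slot(cmd);
    bool initiated;

    /* No matching internal command: already completed, automatic success. */
    if (slot == NULL)
        return EH_SUCCESS;

    initiated = send_abort_tmf(slot);

    /* Drop the internal reference irrespective of the return code. */
    slot->midlayer_cmd = NULL;

    return initiated ? EH_SUCCESS : EH_FAILED;
}
```

Note that in this model a second abort for the same command returns the "automatic success", because the reference was already dropped by the first call even when that call returned the failure code.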
>> The command will only ever be re-issued once EH completes.
>
> ...
>
>>
>> But indeed, 'FAILED' is not very meaningful here, leaving the midlayer
>> with no information about what happened to the command.
>>
>> Personally I would like to enforce this meaning on the eh_XXX callbacks:
>> - upon each eh_XXX callback the LLDD clears any internal references
>> to the command / command scope (ie eh_abort_cmd clears the
>> references to the command, eh_lun_reset clears all internal
>> references to commands to this ITL nexus etc.)
>> This happens irrespective of the return code.
>> - The eh_XXX callback shall return 'FAILED' if the respective
>> TMF (or equivalent) could not be initiated.
>> - The eh_XXX callback shall return 'SUCCESS' if the respective
>> TMF (or equivalent) could be initiated.
>> - After each eh_XXX callback control for this command / command
>> scope is transferred back to the midlayer; the LLDD shall not
>> assume the associated command structures to remain valid after
>> that point.
>
> Perhaps that last constraint should be relaxed to "After the final EH
> callback (whether implemented or unimplemented by the host), command /
> command scope is transferred back to the midlayer..."
>
No, that's wrong.
By the time any eh_XXX callbacks are triggered, control _is_ already
back at the midlayer. I.e. the command timeout has triggered and the
block layer has already set the REQ_ATOM_COMPLETED flag, short-circuiting
any attempts to call ->scsi_done().

So with the callbacks the midlayer actually informs the LLDD about a
certain fact; there is nothing the LLDD can do to change ownership
at that point.

(Correction: During the call of any eh_XXX callbacks control _is_
back at the LLDD, otherwise the callbacks would be pointless. It's
just that the LLDD shouldn't assume the command is valid _after_
any of the eh_XXX callbacks has returned.)

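The short-circuit can be modelled roughly like this. This is a userspace sketch with C11 atomics, not the block layer's actual implementation: struct mock_request, mock_timeout() and mock_scsi_done() are made-up names, and the single atomic flag merely plays the role of REQ_ATOM_COMPLETED. Whichever path sets the flag first owns the command; the other path finds the flag already set and backs off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request owned by the midlayer. */
struct mock_request {
    atomic_bool completed;  /* plays the role of REQ_ATOM_COMPLETED */
    int done_calls;         /* number of completions that actually ran */
};

static void mock_request_init(struct mock_request *rq)
{
    assert(rq != NULL);
    atomic_init(&rq->completed, false);
    rq->done_calls = 0;
}

/* Atomically claim the request; true only for the first caller. */
static bool mock_mark_complete(struct mock_request *rq)
{
    return !atomic_exchange(&rq->completed, true);
}

/* Timeout path: if it wins, error handling owns the command from here on. */
static bool mock_timeout(struct mock_request *rq)
{
    return mock_mark_complete(rq);
}

/* Completion path, i.e. what a call to ->scsi_done() would end up in. */
static bool mock_scsi_done(struct mock_request *rq)
{
    if (!mock_mark_complete(rq))
        return false;  /* timeout got there first: short-circuited */
    rq->done_calls++;
    return true;
}
```

In this model a late mock_scsi_done() after mock_timeout() is a harmless no-op only because the request object is still around. The point made above is that after an eh_XXX callback has returned the LLDD cannot rely even on that, which is why it has to drop its reference.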
> A more severe TMF is probably mandatory (e.g. bus reset) but if the driver
> author later added a milder one (e.g. bus device reset), your rule would
> mean that the existing handler would then operate under new constraints,
> which might cause surprises.
>
Well, _if_ we were to adopt this rule we would obviously have to audit
the existing LLDDs to check whether the rule is followed, and tweak
them if not.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)