All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Finn Thain <fthain@telegraphics.com.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Hans de Goede <hdegoede@redhat.com>,
	Bart Van Assche <bvanassche@acm.org>,
	SCSI development list <linux-scsi@vger.kernel.org>,
	Robert Elliot <elliot@hp.com>
Subject: Re: Debugging scsi abort handling ?
Date: Fri, 29 Aug 2014 08:08:32 +0200	[thread overview]
Message-ID: <540018E0.9050907@suse.de> (raw)
In-Reply-To: <alpine.LNX.2.00.1408291419070.29948@nippy.intranet>

On 08/29/2014 06:39 AM, Finn Thain wrote:
>
> On Thu, 28 Aug 2014, Hannes Reinecke wrote:
>
>> What might happen, though, that the command is already dead and gone by
>> the time you're calling ->scsi_done() (if you call it after eh_abort).
>> So there might not _be_ a command upon which you can call ->scsi_done()
>> to start with.
>>
>> Hence any LLDD need to clear up any internal references after a call to
>> eh_XXX to ensure it doesn't call ->scsi_done() an in invalid command.
>>
>> So even if the LLDD returns 'FAILED' upon a call to eh_XXX it _still_
>> needs to clear up the internal reference.
>
> This is a question that has been bothering me too. If the host's
> eh_abort_cmd() method returns FAILED, it seems the mid-layer is liable to
> re-issue the same command to the LLD (?)
>
No.
FAILED for any eh_abort_cmd() means that the TMF hasn't been sent.
So the midlayer escalates to the next EH step.
The command will only ever be re-issued once EH completes.

>> Either that or return 'FAILED' for any later eh_XXX function until the
>> internal references can be cleared up.
>
> So if a command may or may not "exist" after eh_abort_handler() returns
> control to the mid-layer (regardless of SUCCESS or FAILURE), then the LLD
> has to be careful about keeping track of which commands were aborted, if
> those commands are still in the process of cleanup when eh_abort_handler()
> returns.
>
Yes.

> It's hard to see how that can work when command pointers are only unique
> while a command "exists".
>
Which is why we have the EH callbacks, to give the LLDD a chance to 
clean up internal references.

> In effect, this would mean that EH functions cannot return at all, until
> the relevant command(s) are completely forgotten by the LLD; and that
> means the LLD itself may have to escalate abort -> device reset -> bus
> reset -> etc instead of simply returning FAILED.
>
More often than not the LLDD has its own internal command structure, 
which reference the midlayer SCSI command structure via a pointer.
Just clearing that pointer will do the trick.

Take eg. lpfc:
It'll construct its internal command here:

	lpfc_cmd = lpfc_get_scsi_buf(phba, ndlp);
	if (lpfc_cmd == NULL) {
		lpfc_rampdown_queue_depth(phba);

		lpfc_printf_vlog(vport, KERN_INFO, LOG_FCP,
				 "0707 driver's buffer pool is empty, "
				 "IO busied\n");
		goto out_host_busy;
	}

	/*
	 * Store the midlayer's command structure for the
	 * completion phase
	 * and complete the command initialization.
	 */
	lpfc_cmd->pCmd  = cmnd;
	lpfc_cmd->rdata = rdata;
	lpfc_cmd->timeout = 0;
	lpfc_cmd->start_time = jiffies;
	cmnd->host_scribble = (unsigned char *)lpfc_cmd;

and then checks for the pointer upon command completion:

static void
lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq 
*pIocbIn,
			struct lpfc_iocbq *pIocbOut)
{
	struct lpfc_scsi_buf *lpfc_cmd =
		(struct lpfc_scsi_buf *) pIocbIn->context1;

[ .. ]
	/* Sanity check on return of outstanding command */
	if (!(lpfc_cmd->pCmd))
		return;

But indeed, 'FAILED' is not very meaningful here, leaving the 
midlayer with no information about what happened to the command.

Personally I would like to enforce this meaning on the eh_XXX callbacks:
- upon each eh_XXX callback the LLDD clears any internal references
   to the command / command scope (ie eh_abort_cmd clears the
   references to the command, eh_lun_reset clears all internal
   references to commands to this ITL nexus etc.)
   This happens irrespective of the return code.
- The eh_XXX callback shall return 'FAILED' if the respective
   TMF (or equivalent) could not be initiated.
- The eh_XXX callback shall return 'SUCCESS' if the respective
   TMF (or equvalent) could be initiated.
- After each eh_XXX callback control for this command / command
   scope is transferred back to the midlayer; the LLDD shall not
   assume the associated command structures to remain valid after
   that point.

I'm tempted to enshrine this in the documentation;
that surely will help me during the EH cleanup.
And Hans will have some guidelines on how to design uas EH :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2014-08-29  6:08 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-23 14:52 Debugging scsi abort handling ? Hans de Goede
2014-08-23 15:42 ` Douglas Gilbert
2014-08-24  8:39   ` Hans de Goede
2014-08-23 21:05 ` James Bottomley
2014-08-24  8:46   ` Hans de Goede
2014-08-24 21:12     ` Christoph Hellwig
2014-08-25  7:20 ` Paolo Bonzini
2014-08-25  8:47   ` Hans de Goede
2014-08-25 10:28     ` Bart Van Assche
2014-08-25 11:15       ` Paolo Bonzini
2014-08-25 11:26         ` Hans de Goede
2014-08-25 11:39           ` Paolo Bonzini
2014-08-25 15:41             ` James Bottomley
2014-08-26  8:13               ` Hans de Goede
2014-08-26 18:34                 ` James Bottomley
2014-08-26 19:19                   ` Hans de Goede
2014-08-28 12:10                     ` Hannes Reinecke
2014-08-28 12:24                       ` Hans de Goede
2014-08-28 12:04         ` Hannes Reinecke
2014-08-28 12:17           ` Paolo Bonzini
2014-08-28 12:26             ` Hans de Goede
2014-08-28 12:33               ` Paolo Bonzini
2014-08-28 12:37                 ` Hans de Goede
2014-08-28 14:08                   ` James Bottomley
2014-08-28 14:17                   ` Hannes Reinecke
2014-08-28 14:56                     ` Paolo Bonzini
2014-08-28 15:13                       ` Hannes Reinecke
2014-08-28 15:50                         ` Elliott, Robert (Server Storage)
2014-08-28 15:54                           ` Paolo Bonzini
2014-08-28 15:56                             ` Christoph Hellwig
2014-08-29  4:39                         ` Finn Thain
2014-08-29  6:08                           ` Hannes Reinecke [this message]
2014-08-29  7:48                             ` Paolo Bonzini
2014-08-29 10:14                             ` Finn Thain
2014-08-29 10:30                               ` Hannes Reinecke
2014-08-29 10:39                                 ` Hans de Goede
2014-08-29 10:49                                   ` Hannes Reinecke
2014-08-28 12:21           ` Hans de Goede
2014-08-28 14:09             ` James Bottomley
2014-08-29  4:37               ` Finn Thain
2014-08-29  4:52                 ` Elliott, Robert (Server Storage)
2014-08-28 12:31           ` Martin Peschke
2014-08-28 14:22             ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=540018E0.9050907@suse.de \
    --to=hare@suse.de \
    --cc=bvanassche@acm.org \
    --cc=elliot@hp.com \
    --cc=fthain@telegraphics.com.au \
    --cc=hdegoede@redhat.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.