All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@suse.de>, linux-scsi@vger.kernel.org
Subject: Re: [PATCH] Check return value of fc_block_scsi_eh()
Date: Fri, 15 Oct 2010 10:39:21 +0200	[thread overview]
Message-ID: <4CB81339.8010105@suse.de> (raw)
In-Reply-To: <4CB753D7.6080608@cs.wisc.edu>

Mike Christie wrote:
> On 10/14/2010 01:45 AM, Hannes Reinecke wrote:
>> Mike Christie wrote:
>>> On 10/13/2010 06:42 AM, Hannes Reinecke wrote:
>>>>
>>>> fc_block_scsi_eh() might return with status FAST_IO_FAIL
>>>> indicating I/O has been terminated due to fast_io_fail timeout.
>>>> In this case the rport is still blocked, so any error recovery
>>>> will be failing on this port.
>>>> Hence each driver needs to check if the return value from
>>>> fc_block_scsi_eh() is something other than SUCCESS, in which
>>>> case it should just return with that status.
>>>> And we need to update the error handler to deal with a
>>>> status of FAST_IO_FAIL, too.
>>>> And fc_block_scsi_eh() should return SUCCESS on success,
>>>> not 0. Otherwise the calling routine will become confused
>>>> when reusing that value.
>>>>
>>>
>>> Is this patch supposed to fix the problem I described or is there more
>>> patches to follow or do you think I am being too paranoid? It seems the
>>> patch alone is going to make the problem worse in the drivers you are
>>> speeding up failure in. In the drivers now checking fc_block_scsi_eh and
>>> returning when fast io fail is returned then the scsi eh
>>> scsi_eh_flush_done_q's scsi_finish_command/scsi_queue_insert processing
>>> is going to get done faster and more likely to conflict with the
>>> termiante_rport_io callback processing, right?
>>
>> No, this patch does _not_ fix the race you've described, it just
>> fixes the problem of fc_block_scsi_eh() returning FAST_IO_FAIL,
>> which screws the error handler in drivers not checking the return value.
>> So it's partially covered by your patchset.
>>
>> However, in your patchset you're missing two issues:
>> - The current implementation of fc_block_scsi_eh() return '0' on
>>    success. This screws over drivers which re-uses this value;
>>    check eg. lpfc_abort_handler(). So we should be returning
>>    'SUCCESS' here.
> 
> Yeah, my patchset is actually backwards. I converted the other drivers
> to check for 0 (forgot to fix lpfc though). I think your patch patch
> where we return SUCCESS is better than what I was doing.
> 
> 
>> - scsi_error needs to be fixed up to handle FAST_IO_FAIL.
>>    Any of the functions called from scsi_abort_eh_cmnd() can return
>>    FAST_IO_FAIL, in which case escalating to the next function
>>    becomes pointless.
> 
> There is a problem with fnic (it is also a bug in my patches). If fnic's
> terminate_rport_io fails to abort the cmds, it is relying on the scsi eh
> callouts to clean things up. So if fc_scsi_block_eh returns non-success
> then we cannot return immediately from the callout. The drivers scsi eh
> callout may still need to clean up the command internally.
> 
Maybe it's better to change fnic to modify the FAST_IO_FAIL return
value from fc_block_scsi_eh() and return FAILED here.

> iSCSI/qla4xxx also needs to be changed to return SUCCESS instead of
> zero. Luckily there is only one driver using it so far.
> 
Yes, that would be preferable.

>>
>> I guess it would make sense to merge these two patchsets, either
>> having two patchsets (one for the FAST_IO_FAIL changes and one for
>> the rport race) or indeed merge them both into one.
>> I'm okay with either of it.
>>
> 
> My patches for the race do not work though. When you replied to my
> thread I thought you mean you were going to fix that too :)
> 
> I have been thinking of how to fix the race. I will make my patches over
> yours since your return SUCCESS fix is better.
Ok, I'll be sending an updated patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

      reply	other threads:[~2010-10-15  8:39 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-13 11:42 [PATCH] Check return value of fc_block_scsi_eh() Hannes Reinecke
2010-10-13 13:30 ` Christof Schmitt
2010-10-13 20:37 ` Mike Christie
2010-10-14  6:45   ` Hannes Reinecke
2010-10-14 19:02     ` Mike Christie
2010-10-15  8:39       ` Hannes Reinecke [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CB81339.8010105@suse.de \
    --to=hare@suse.de \
    --cc=James.Bottomley@suse.de \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michaelc@cs.wisc.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.