Re: [Open-FCoE] [v2 PATCH 4/5] bnx2fc: Broadcom FCoE Offload driver submission - part 2


From: Mike Christie <michaelc@cs.wisc.edu>
To: Bhanu Gollapudi <bprakash@broadcom.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	andrew.vasquez@qlogic.com,
	"devel@open-fcoe.org" <devel@open-fcoe.org>
Subject: Re: [Open-FCoE] [v2 PATCH 4/5] bnx2fc: Broadcom FCoE Offload driver submission - part 2
Date: Thu, 03 Feb 2011 15:02:27 -0600
Message-ID: <4D4B17E3.8070400@cs.wisc.edu>
In-Reply-To: <4D4B165E.2010805@cs.wisc.edu>

On 02/03/2011 02:55 PM, Mike Christie wrote:
> On 02/03/2011 01:04 AM, Bhanu Gollapudi wrote:
>> On Wed, 2011-02-02 at 20:47 -0800, Mike Christie wrote:
>>> On 02/02/2011 10:05 PM, Mike Christie wrote:
>>>> On 02/02/2011 09:42 PM, Bhanu Gollapudi wrote:
>>>>>>
>>>>>> Actually you do not have to wait for the scsi eh to run, right? It looks
>>>>>> like bnx2fc would log out the port, which ends up calling
>>>>>> fc_remote_port_delete, and that would cause the fc_timed_out function to
>>>>>> return BLK_EH_RESET_TIMER to prevent the scsi eh from running. Is that
>>>>>> right? That type of eh strategy behavior seems like something you want
>>>>>> to sync up with libfc or the fc class so all drivers do something
>>>>>> similar.
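A minimal user-space model of the deferral described in the quoted paragraph, assuming only what it states: once the port is logged out and fc_remote_port_delete has run, the FC class's timeout function returns BLK_EH_RESET_TIMER so the scsi eh does not run. The sketch calls the deleted port "blocked" (an assumption about the state name), and every enum, struct and function name below is a hypothetical stand-in, not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rport states and the block layer's
 * timeout return codes; these are NOT the kernel definitions. */
enum model_port_state { MODEL_PORT_ONLINE, MODEL_PORT_BLOCKED };
enum model_eh_return { MODEL_EH_NOT_HANDLED, MODEL_EH_RESET_TIMER };

struct model_rport {
	enum model_port_state state;
};

/* Models the driver logging out the port: the FC class blocks the rport. */
static void model_remote_port_delete(struct model_rport *rport)
{
	rport->state = MODEL_PORT_BLOCKED;
}

/* Models the timeout hook: while the rport is blocked, re-arm the
 * command timer so the scsi eh does not run; otherwise hand the
 * timed-out command to the scsi eh. */
static enum model_eh_return model_timed_out(const struct model_rport *rport)
{
	assert(rport != NULL);
	if (rport->state == MODEL_PORT_BLOCKED)
		return MODEL_EH_RESET_TIMER;
	return MODEL_EH_NOT_HANDLED;
}
```

The point of the model is that the deferral depends only on port state, so any driver that logs the port out gets the same behavior, which is why it belongs in libfc or the fc class rather than in one driver.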
>>>>>
>>>>> As per FCP-4, if the ABTS times out, we will have to explicitly LOGO the
>>>>
>>>> What section is that in?
>>>>
>>>
>>> Ok, I read it (12.5.1, right?).
>>>
>>>>> target and log back in. If we rely on the 60 sec eh_abort_handler and
>>>>> the ABTS times out, SCSI error handling will go down the LUN RESET, TGT
>>>>> reset path, which is generic error handling rather than
>>>>> transport-specific error handling.
>>>>
>>>> If that is right, then it seems the other FC drivers are doing it wrong,
>>>> and you would hit that problem if someone sets the scsi cmd timer lower
>>>> than BNX2FC_IO_TIMEOUT. Even if that is right, it does not seem right
>>>> to hack around the issue in the driver.
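The timer interaction in the paragraph above can be sketched as a comparison of two timeouts armed at the same moment, assuming (hypothetically) that whichever expires first owns the recovery. The function and enum names are illustrative only, and no real value of BNX2FC_IO_TIMEOUT is assumed.

```c
#include <assert.h>

/* Which recovery path sees a stuck command first, when a per-command
 * SCSI timer and a driver-private I/O timer are armed at submission.
 * Hypothetical names; timeouts are in seconds. */
enum first_recovery {
	RECOVERY_DRIVER_TIMER,	/* driver's own timer fires first (ABTS, then LOGO) */
	RECOVERY_SCSI_EH,	/* SCSI cmd timer fires first: generic scsi eh path */
};

static enum first_recovery first_to_fire(unsigned int scsi_cmd_timeout,
					 unsigned int driver_io_timeout)
{
	/* On a tie this sketch assumes the SCSI timer wins; real ordering
	 * would depend on timer scheduling. */
	return driver_io_timeout < scsi_cmd_timeout ? RECOVERY_DRIVER_TIMER
						    : RECOVERY_SCSI_EH;
}
```

Whenever the SCSI timer is the shorter one, the driver-private LOGO recovery never gets a chance to run, which is the argument for putting the policy in the common timeout path instead.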
>>>
>>> So if your reading of 12.5.1 is right then libfc is wrong and it seems
>>> other drivers (if they are not doing some magic in firmware) are
>>> wrong too.
>>>
>>> My confidence in my FCP skills is very shaken right now :) I am not
>>> sure what I was thinking when I read it and reviewed libfc. I think
>>> you need to discuss this with the fcoe list people and James Smart and
>>> Andrew Vasquez.
>>>
>>> I think some of them disagree about the other issue, aborting commands
>>> (or maybe they just disagree about some of the details), so that should
>>> be discussed too.
>>>
>>> But if you are right then you cannot work around this in a driver
>>> specific way. You need to change libfc and the fc class in a way that
>>> the error strategy is correct. For example from fc_timed_out you could
>>> kick off the abort. I was slightly off in my other comment about libfc
>>> not doing an abort from their internal timeout handler. They do an abort
>>> still, but if that fails they let the scsi eh run eventually. I thought
>>> they were going to clean that up too when they removed their internal
>>> timer value in the "libfc: use rport timeout values for fcp recovery"
>>> patch.
>>
>> James, Robert, Andrew,
>>
>> Can you please shed some light on this?
>>
>
> I got a response from James S offlist, and I think Bhanu is right. I am
> not sure if we have to change the driver before it is merged. That is up
> to JamesB. However, I would like to fix this in a common way (maybe an ok
> LSF topic or something).
>
> To fix this I think we have to do:
>
> 1. For the issue of sending aborts after resets, it seems we need to do
> this. libfc needs this fixed. Maybe qlogic does too (if the firmware
> does not do this then the driver needs code added). I think bfa does too
> if it does not do it in firmware (did not see any code for it in driver).
>
> 2. For the issue of the ABTS timing out, do a logout. Maybe this is the
> time to finally have the transport classes help out more, because it does
> not make sense for drivers to sidestep the eh code.
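Item 1 is terse. One plausible reading, stated here as an assumption, is that after a LUN reset completes the initiator must still abort each exchange that was outstanding on that LUN instead of silently dropping it. A sketch of that reading, with a hypothetical exchange table and a flag standing in for actually sending an ABTS:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an outstanding FCP exchange. */
struct model_exchange {
	unsigned int lun;
	bool outstanding;
	bool abts_sent;
};

/* After a LUN reset succeeds, issue an abort for every exchange still
 * outstanding on that LUN so it is cleaned up explicitly.
 * Returns the number of aborts issued by this call. */
static size_t abort_after_lun_reset(struct model_exchange *xchg, size_t n,
				    unsigned int lun)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (xchg[i].outstanding && xchg[i].lun == lun &&
		    !xchg[i].abts_sent) {
			xchg[i].abts_sent = true;	/* stand-in for sending ABTS */
			sent++;
		}
	}
	return sent;
}
```

Whether this lives in firmware or in the driver, as the item notes, the per-exchange walk is the same; only who performs it differs.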
>
> My idea and some questions.
>
> I was thinking that this could be kicked off from fc_timed_out instead
> of eh_strategy_handler. This would allow us to do recovery without
> having to stop the entire host. I think this will be ok, because FC
> drivers seem to support the ability to send aborts and logout of ports
> without having to stop the entire host.
>
> 1. So fc_timed_out would have the driver kick off an abort, if the port
> state is online.
> 2.
> - If the abort times out, the fc class will have the driver do a logout
> of the port.
> - If the abort completes but indicates failure, do we still want to
> do a lun reset? If we do a lun reset and that fails, then instead of a
> target reset do the logout of the port.
>
> 4. If logout of the port fails for #2, then let the scsi eh have it so it
> can reset the host and possibly offline devices.
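The proposed escalation can be sketched as a small state machine. This is a hypothetical model, not libfc or fc class code: the step names are invented, the LUN reset after a failed abort is included even though the proposal leaves it as an open question, and the behavior when the port is not online (re-arming the timer, as today) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of a recovery step, as reported by a (hypothetical) driver. */
enum step_result { STEP_OK, STEP_FAILED, STEP_TIMED_OUT };

enum recovery_action {
	ACT_RESET_TIMER,	/* port not online: assumed to keep today's behavior */
	ACT_ABORT,
	ACT_LUN_RESET,
	ACT_PORT_LOGOUT,
	ACT_SCSI_EH,		/* hand off to scsi eh: host reset, offlining */
	ACT_DONE,
};

/* Entry point modeled on the proposal: only start an abort if the port is online. */
static enum recovery_action first_action(bool port_online)
{
	return port_online ? ACT_ABORT : ACT_RESET_TIMER;
}

/* Given the step just attempted and its result, pick the next step. */
static enum recovery_action next_action(enum recovery_action prev,
					enum step_result res)
{
	switch (prev) {
	case ACT_ABORT:
		if (res == STEP_OK)
			return ACT_DONE;
		/* Abort timed out: log out the port. Abort failed: LUN reset. */
		return res == STEP_TIMED_OUT ? ACT_PORT_LOGOUT : ACT_LUN_RESET;
	case ACT_LUN_RESET:
		/* A failed LUN reset goes to port logout, not a target reset. */
		return res == STEP_OK ? ACT_DONE : ACT_PORT_LOGOUT;
	case ACT_PORT_LOGOUT:
		return res == STEP_OK ? ACT_DONE : ACT_SCSI_EH;
	default:
		return ACT_DONE;
	}
}
```

Every step before ACT_SCSI_EH acts on a single port, which is what lets recovery proceed without stopping the entire host.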
>

One clarification to #4. If fast io fail is not set, then I guess we wait
for dev_loss to time out, and if it does, we remove the devices (devices
never go offline in that case). If fast io fail is set, then I guess we do
what we do today: we fast fail from the scsi eh, and the devices get
offlined and then removed later if dev_loss fires.
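The clarification reduces to a two-input decision. This is a hypothetical sketch: fast_io_fail_set stands for the fast io fail timeout being configured, dev_loss_fired stands for the dev_loss timer having expired, and both the names and the simplified three-state outcome are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum dev_fate {
	DEV_WAITING,	/* still waiting for the port to come back */
	DEV_OFFLINED,	/* fast failed from the scsi eh and offlined */
	DEV_REMOVED,	/* dev_loss fired: device removed */
};

/* Fate of a device behind a port whose logout failed (#4). */
static enum dev_fate device_fate(bool fast_io_fail_set, bool dev_loss_fired)
{
	/* In both configurations, dev_loss firing ends in removal. */
	if (dev_loss_fired)
		return DEV_REMOVED;
	/* Without fast io fail the device is never offlined; it just waits. */
	return fast_io_fail_set ? DEV_OFFLINED : DEV_WAITING;
}
```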


Thread overview: 15+ messages
2010-12-24  6:02 [v2 PATCH 4/5] bnx2fc: Broadcom FCoE Offload driver submission - part 2 Bhanu Gollapudi
2011-01-15  9:17 ` Mike Christie
2011-01-18  0:37   ` Bhanu Gollapudi
2011-02-02  9:24     ` Mike Christie
2011-02-02  9:57       ` [Open-FCoE] " Mike Christie
2011-02-03  3:42       ` Bhanu Gollapudi
2011-02-03  4:05         ` Mike Christie
2011-02-03  4:16           ` [Open-FCoE] " Mike Christie
2011-02-03  4:47           ` Mike Christie
2011-02-03  7:04             ` Bhanu Gollapudi
2011-02-03 20:55               ` Mike Christie
2011-02-03 21:02                 ` Mike Christie [this message]
2011-02-03 21:26                 ` Mike Christie
2011-01-18  2:44 ` Mike Christie
2011-01-18  3:29   ` Bhanu Gollapudi
