netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: Yuval Mintz <Yuval.Mintz@qlogic.com>,
	netdev <netdev@vger.kernel.org>,
	Ariel Elior <Ariel.Elior@qlogic.com>
Subject: Re: [PATCH net] bnx2x: don't reset chip on cleanup if PCI function is offline
Date: Tue, 16 Aug 2016 14:51:01 -0300	[thread overview]
Message-ID: <57B35285.2090400@linux.vnet.ibm.com> (raw)
In-Reply-To: <57AB2C11.1070706@linux.vnet.ibm.com>

On 08/10/2016 10:28 AM, Guilherme G. Piccoli wrote:
> On 08/10/2016 04:59 AM, Yuval Mintz wrote:
>>>> Why would the published resume()  from pci_error_handlers be called
>>>> in this
>>> scenario?
>>>
>>> It isn't. That's why I specifically commented on commit message:
>>> "There are two
>>> cases though that another path is taken on the code".
>>>
>>> The code path reach bnx2x_chip_cleanup() on device removal from the
>>> system,
>>> as seen in the below call trace:
>>>
>>> bnx2x_chip_cleanup+0x3c0/0x910 [bnx2x]
>>> bnx2x_nic_unload+0x268/0xaf0 [bnx2x]
>>> bnx2x_close+0x34/0x50 [bnx2x]
>>> __dev_close_many+0xd4/0x150
>>> dev_close_many+0xa8/0x160
>>> rollback_registered_many+0x174/0x3f0
>>> rollback_registered+0x40/0x70
>>> unregister_netdevice_queue+0x98/0x110
>>> unregister_netdev+0x34/0x50
>>> __bnx2x_remove+0xa8/0x3a0 [bnx2x]
>>> pci_device_remove+0x70/0x110
>>
>> Makes sense.
>>
>>>>> Also, we avoid the MCP information dump in case of non-recoverable
>>>>> PCI error (when adapter is about to be removed), since it will
>>>>> certainly fail.
>>>>
>>>> We should probably avoid several things here; Why specifically only
>>>> this?
>>>
>>> For example, we shouldn't execute bnx2x_timer() in this scenario. But
>>> I thought
>>> it'd be too much to check every call of a timer function against PCI
>>> channel state
>>> just to avoid it's execution on this scenario, so I just let it
>>> execute, since it seems
>>> harmless.
>>>
>>>>> +    /* Reset the chip, unless PCI function is offline. If we reach
>>>>> this
>>>>> +     * point following a PCI error handling, it means device is
>>>>> really
>>>>> +     * in a bad state and we're about to remove it, so reset the chip
>>>>> +     * is not a good idea.
>>>>> +     */
>>>>> +    if (!pci_channel_offline(bp->pdev)) {
>>>>> +        rc = bnx2x_reset_hw(bp, reset_code);
>>>>> +        if (rc)
>>>>> +            BNX2X_ERR("HW_RESET failed\n");
>>>>> +    }
>>>>
>>>> Why not simply check this at the beginning of the function?
>>>
>>> Because I wasn't sure if I could drop the entire execution of
>>> chip_cleanup(). I
>>> tried to keep the most of this function aiming to shutdown the module
>>> in a
>>> gentle way, like cleaning MAC, stopping queues...but again, I'm open to
>>> suggestions and gladly will change this in v2 if you think it's for
>>> the best.
>>
>> Problem is I won't be able to have a more thorough review of this in
>> the next
>> couple of days - and other than code-review I won't have a reasonable way
>> of testing this [I can use aer_inject, but I don't have your magical EEH
>> error injections, and I'm not at all certain it would suffice for a
>> good testing ].
>>
>> I agree that even as-is, what you're suggesting is an improvement to the
>> existing flow - so it's basically up to dave, i.e., whether to take a
>> half fix
>> or wait for a more thorough one.
>>
>
> Thanks for your consideration. The point is: the important part of this
> patch is avoiding the reset_hw() path, since it will clearly fail and
> generate soft lockups. This is the fix per se, the other part (regarding
> the MCP dump) is just an improvement; surely we have more potential
> improvements to explore, but they wouldn't be fixes, only code
> improvements.
>
> So, I wouldn't call this a half fix, but yet, a completely fix with a
> small improvement as a bonus :)
>
> Cheers,
>
>
> Guilherme
>

David, sorry to bother you - maybe you didn't notice this.
Any comments?

Thanks in advance,


Guilherme

      reply	other threads:[~2016-08-16 17:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-08 20:13 [PATCH net] bnx2x: don't reset chip on cleanup if PCI function is offline Guilherme G. Piccoli
2016-08-09 12:14 ` Yuval Mintz
2016-08-09 13:27   ` Guilherme G. Piccoli
2016-08-10  7:59     ` Yuval Mintz
2016-08-10 13:28       ` Guilherme G. Piccoli
2016-08-16 17:51         ` Guilherme G. Piccoli [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57B35285.2090400@linux.vnet.ibm.com \
    --to=gpiccoli@linux.vnet.ibm.com \
    --cc=Ariel.Elior@qlogic.com \
    --cc=Yuval.Mintz@qlogic.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).