From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: Yuval Mintz <Yuval.Mintz@qlogic.com>,
netdev <netdev@vger.kernel.org>,
Ariel Elior <Ariel.Elior@qlogic.com>
Subject: Re: [PATCH net] bnx2x: don't reset chip on cleanup if PCI function is offline
Date: Tue, 16 Aug 2016 14:51:01 -0300 [thread overview]
Message-ID: <57B35285.2090400@linux.vnet.ibm.com> (raw)
In-Reply-To: <57AB2C11.1070706@linux.vnet.ibm.com>
On 08/10/2016 10:28 AM, Guilherme G. Piccoli wrote:
> On 08/10/2016 04:59 AM, Yuval Mintz wrote:
>>>> Why would the published resume() from pci_error_handlers be called
>>>> in this
>>> scenario?
>>>
>>> It isn't. That's why I specifically commented on commit message:
>>> "There are two
>>> cases though that another path is taken on the code".
>>>
>>> The code path reach bnx2x_chip_cleanup() on device removal from the
>>> system,
>>> as seen in the below call trace:
>>>
>>> bnx2x_chip_cleanup+0x3c0/0x910 [bnx2x]
>>> bnx2x_nic_unload+0x268/0xaf0 [bnx2x]
>>> bnx2x_close+0x34/0x50 [bnx2x]
>>> __dev_close_many+0xd4/0x150
>>> dev_close_many+0xa8/0x160
>>> rollback_registered_many+0x174/0x3f0
>>> rollback_registered+0x40/0x70
>>> unregister_netdevice_queue+0x98/0x110
>>> unregister_netdev+0x34/0x50
>>> __bnx2x_remove+0xa8/0x3a0 [bnx2x]
>>> pci_device_remove+0x70/0x110
>>
>> Makes sense.
>>
>>>>> Also, we avoid the MCP information dump in case of non-recoverable
>>>>> PCI error (when adapter is about to be removed), since it will
>>>>> certainly fail.
>>>>
>>>> We should probably avoid several things here; Why specifically only
>>>> this?
>>>
>>> For example, we shouldn't execute bnx2x_timer() in this scenario. But
>>> I thought
>>> it'd be too much to check every call of a timer function against PCI
>>> channel state
>>> just to avoid it's execution on this scenario, so I just let it
>>> execute, since it seems
>>> harmless.
>>>
>>>>> + /* Reset the chip, unless PCI function is offline. If we reach
>>>>> this
>>>>> + * point following a PCI error handling, it means device is
>>>>> really
>>>>> + * in a bad state and we're about to remove it, so reset the chip
>>>>> + * is not a good idea.
>>>>> + */
>>>>> + if (!pci_channel_offline(bp->pdev)) {
>>>>> + rc = bnx2x_reset_hw(bp, reset_code);
>>>>> + if (rc)
>>>>> + BNX2X_ERR("HW_RESET failed\n");
>>>>> + }
>>>>
>>>> Why not simply check this at the beginning of the function?
>>>
>>> Because I wasn't sure if I could drop the entire execution of
>>> chip_cleanup(). I
>>> tried to keep the most of this function aiming to shutdown the module
>>> in a
>>> gentle way, like cleaning MAC, stopping queues...but again, I'm open to
>>> suggestions and gladly will change this in v2 if you think it's for
>>> the best.
>>
>> Problem is I won't be able to have a more thorough review of this in
>> the next
>> couple of days - and other than code-review I won't have a reasonable way
>> of testing this [I can use aer_inject, but I don't have your magical EEH
>> error injections, and I'm not at all certain it would suffice for a
>> good testing ].
>>
>> I agree that even as-is, what you're suggesting is an improvement to the
>> existing flow - so it's basically up to dave, i.e., whether to take a
>> half fix
>> or wait for a more thorough one.
>>
>
> Thanks for your consideration. The point is: the important part of this
> patch is avoiding the reset_hw() path, since it will clearly fail and
> generate soft lockups. This is the fix per se, the other part (regarding
> the MCP dump) is just an improvement; surely we have more potential
> improvements to explore, but they wouldn't be fixes, only code
> improvements.
>
> So, I wouldn't call this a half fix, but yet, a completely fix with a
> small improvement as a bonus :)
>
> Cheers,
>
>
> Guilherme
>
David, sorry to bother you - maybe you didn't notice this.
Any comments?
Thanks in advance,
Guilherme
prev parent reply other threads:[~2016-08-16 17:51 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-08 20:13 [PATCH net] bnx2x: don't reset chip on cleanup if PCI function is offline Guilherme G. Piccoli
2016-08-09 12:14 ` Yuval Mintz
2016-08-09 13:27 ` Guilherme G. Piccoli
2016-08-10 7:59 ` Yuval Mintz
2016-08-10 13:28 ` Guilherme G. Piccoli
2016-08-16 17:51 ` Guilherme G. Piccoli [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57B35285.2090400@linux.vnet.ibm.com \
--to=gpiccoli@linux.vnet.ibm.com \
--cc=Ariel.Elior@qlogic.com \
--cc=Yuval.Mintz@qlogic.com \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.