From: Jarod Wilson <jarod@redhat.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter
Date: Tue, 06 Oct 2015 17:50:29 -0400 [thread overview]
Message-ID: <56144225.2010308@redhat.com> (raw)
In-Reply-To: <5600E55B.1070406@gmail.com>
Alexander Duyck wrote:
> On 09/21/2015 09:14 PM, Jarod Wilson wrote:
>> Alexander Duyck wrote:
>>> On 09/21/2015 10:11 AM, Jarod Wilson wrote:
>>>> Some pci changes upcoming in 4.3 seem to cause additional disconnects,
>>>> which can happen at unfortuitous times for igb, leading to issues
>>>> such as
>>>> this, where the disconnect happened just before
>>>> igb_configure_tx_ring():
>>>>
...
>>>> Note: this is a follow-up patch in addition to the previously submitted
>>>> "igb: don't unmap NULL hw_addr"
>>>>
>>>> drivers/net/ethernet/intel/igb/igb_main.c | 15 +++++++++++++--
>>>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
>>>> b/drivers/net/ethernet/intel/igb/igb_main.c
>>>> index 6369f9e..7060edf 100644
>>>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>>>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>>>> @@ -952,6 +952,11 @@ static int igb_request_msix(struct igb_adapter
>>>> *adapter)
>>>> if (err)
>>>> goto err_out;
>>>>
>>>> + if (E1000_REMOVED(hw->hw_addr)) {
>>>> + err = -EIO;
>>>> + goto err_free;
>>>> + }
>>>> +
>>>> for (i = 0; i < adapter->num_q_vectors; i++) {
>>>> struct igb_q_vector *q_vector = adapter->q_vector[i];
>>>>
>>>
>>>
>>> Instead of using E1000_REMOVED we should just replace the
>>> adapter->hw.hw_addr in the setup of the itr_register with
>>> adapter->io_addr like you did for Rx/Tx below.
>>
>> I just tried that, and it reliably blows up horrifically, wedging the
>> machine to the point where all I could get was a screen shot with my
>> phone thus far, when we jump from igb_request_msix() to
>> igb_configure_msix() to igb_assign_vector() and finally to
>> igb_write_ivar(), at least as best as I can tell from what I was able to
>> see in the trace remnants still on-screen.
>
> Take a look at array_rd32, that is buggy and doesn't match up with the
> rd32 implementation. If you fix that then the blow-up should go away.
>
> You shouldn't need to worry about itr_register since it will only get
> triggered if the interrupt can be enabled and that shouldn't be able to
> happen if the device is not present.
...
>> Just switching to adapter->io_addr everywhere seems to not work as noted
>> above. :\ Note that I'm also chasing this from the other end with the
>> author of the pci patches that seem to have triggered this, so the real
>> bug might be over in pci-land, but hardening against explosions in igb
>> still seems like a worthwhile effort here.
>
> I am pretty sure array_rd32 is the problem. If you fix that then I
> suspect you should quit seeing any further issues.
Okay, I've reworked array_rd32 a bit to call igb_rd32 as well, and with
that and some other minor tweaks, nothing blowing up. The hardware still
doesn't work when hotplugged, due to disappearing off the pci bus
temporarily, but I think that's a pci hotplug issue, not igb's fault.
Things seem to behave just fine if the device is connected at boot.
Updated patch forthcoming for dissection...
--
Jarod Wilson
jarod at redhat.com
WARNING: multiple messages have this Message-ID (diff)
From: Jarod Wilson <jarod@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
intel-wired-lan@lists.osuosl.org
Subject: Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter
Date: Tue, 06 Oct 2015 17:50:29 -0400 [thread overview]
Message-ID: <56144225.2010308@redhat.com> (raw)
In-Reply-To: <5600E55B.1070406@gmail.com>
Alexander Duyck wrote:
> On 09/21/2015 09:14 PM, Jarod Wilson wrote:
>> Alexander Duyck wrote:
>>> On 09/21/2015 10:11 AM, Jarod Wilson wrote:
>>>> Some pci changes upcoming in 4.3 seem to cause additional disconnects,
>>>> which can happen at unfortuitous times for igb, leading to issues
>>>> such as
>>>> this, where the disconnect happened just before
>>>> igb_configure_tx_ring():
>>>>
...
>>>> Note: this is a follow-up patch in addition to the previously submitted
>>>> "igb: don't unmap NULL hw_addr"
>>>>
>>>> drivers/net/ethernet/intel/igb/igb_main.c | 15 +++++++++++++--
>>>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
>>>> b/drivers/net/ethernet/intel/igb/igb_main.c
>>>> index 6369f9e..7060edf 100644
>>>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>>>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>>>> @@ -952,6 +952,11 @@ static int igb_request_msix(struct igb_adapter
>>>> *adapter)
>>>> if (err)
>>>> goto err_out;
>>>>
>>>> + if (E1000_REMOVED(hw->hw_addr)) {
>>>> + err = -EIO;
>>>> + goto err_free;
>>>> + }
>>>> +
>>>> for (i = 0; i < adapter->num_q_vectors; i++) {
>>>> struct igb_q_vector *q_vector = adapter->q_vector[i];
>>>>
>>>
>>>
>>> Instead of using E1000_REMOVED we should just replace the
>>> adapter->hw.hw_addr in the setup of the itr_register with
>>> adapter->io_addr like you did for Rx/Tx below.
>>
>> I just tried that, and it reliably blows up horrifically, wedging the
>> machine to the point where all I could get was a screen shot with my
>> phone thus far, when we jump from igb_request_msix() to
>> igb_configure_msix() to igb_assign_vector() and finally to
>> igb_write_ivar(), at least as best as I can tell from what I was able to
>> see in the trace remnants still on-screen.
>
> Take a look at array_rd32, that is buggy and doesn't match up with the
> rd32 implementation. If you fix that then the blow-up should go away.
>
> You shouldn't need to worry about itr_register since it will only get
> triggered if the interrupt can be enabled and that shouldn't be able to
> happen if the device is not present.
...
>> Just switching to adapter->io_addr everywhere seems to not work as noted
>> above. :\ Note that I'm also chasing this from the other end with the
>> author of the pci patches that seem to have triggered this, so the real
>> bug might be over in pci-land, but hardening against explosions in igb
>> still seems like a worthwhile effort here.
>
> I am pretty sure array_rd32 is the problem. If you fix that then I
> suspect you should quit seeing any further issues.
Okay, I've reworked array_rd32 a bit to call igb_rd32 as well, and with
that and some other minor tweaks, nothing blowing up. The hardware still
doesn't work when hotplugged, due to disappearing off the pci bus
temporarily, but I think that's a pci hotplug issue, not igb's fault.
Things seem to behave just fine if the device is connected at boot.
Updated patch forthcoming for dissection...
--
Jarod Wilson
jarod@redhat.com
next prev parent reply other threads:[~2015-10-06 21:50 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-21 17:11 [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter Jarod Wilson
2015-09-21 17:11 ` Jarod Wilson
2015-09-21 17:11 ` Jarod Wilson
2015-09-21 21:57 ` [Intel-wired-lan] " Alexander Duyck
2015-09-21 21:57 ` Alexander Duyck
2015-09-21 21:57 ` Alexander Duyck
2015-09-22 4:14 ` Jarod Wilson
2015-09-22 4:14 ` Jarod Wilson
2015-09-22 5:03 ` Mark Rustad
2015-09-22 5:03 ` Mark Rustad
2015-09-22 5:21 ` Alexander Duyck
2015-09-22 5:21 ` Alexander Duyck
2015-10-06 21:50 ` Jarod Wilson [this message]
2015-10-06 21:50 ` Jarod Wilson
2015-10-06 21:52 ` [Intel-wired-lan] [PATCH] igb: improve handling of disconnected adapters Jarod Wilson
2015-10-06 21:52 ` Jarod Wilson
2015-10-06 21:52 ` Jarod Wilson
2015-10-06 22:09 ` [Intel-wired-lan] " kbuild test robot
2015-10-06 22:09 ` kbuild test robot
2015-10-06 22:59 ` [Intel-wired-lan] " Alexander Duyck
2015-10-06 22:59 ` Alexander Duyck
2015-10-06 22:59 ` Alexander Duyck
2015-10-07 13:21 ` [Intel-wired-lan] [PATCH v3] " Jarod Wilson
2015-10-07 13:21 ` Jarod Wilson
2015-10-07 13:21 ` Jarod Wilson
2015-10-07 15:03 ` [Intel-wired-lan] " Alexander Duyck
2015-10-07 15:03 ` Alexander Duyck
2015-10-07 15:03 ` Alexander Duyck
2015-10-13 6:10 ` [Intel-wired-lan] " Jeff Kirsher
2015-10-13 6:10 ` Jeff Kirsher
2015-10-19 15:52 ` [Intel-wired-lan] [PATCH v2] " Jarod Wilson
2015-10-19 15:52 ` Jarod Wilson
2015-11-20 4:40 ` [Intel-wired-lan] " Brown, Aaron F
2015-11-20 4:40 ` Brown, Aaron F
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56144225.2010308@redhat.com \
--to=jarod@redhat.com \
--cc=intel-wired-lan@osuosl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.