All of lore.kernel.org
 help / color / mirror / Atom feed
From: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [PATCH net] igb: re-assign hw address pointer on reset after PCI error
Date: Thu, 10 Nov 2016 14:48:30 -0200	[thread overview]
Message-ID: <5824A4DE.9020508@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAKgT0UdYrO14ytSgPfa6KgC0OPaydqtd2-j2+AQFgJMgz69d+w@mail.gmail.com>

On 11/09/2016 03:12 PM, Alexander Duyck wrote:
> On Mon, Oct 31, 2016 at 1:12 PM, Guilherme G. Piccoli
> <gpiccoli@linux.vnet.ibm.com> wrote:
>> Whenever the igb driver detects the result of a read operation returns
>> a value composed only by F's (like 0xFFFFFFFF), it will detach the
>> net_device, clear the hw_addr pointer and warn to the user that adapter's
>> link is lost - those steps happen on igb_rd32().
>>
>> In case a PCI error happens on Power architecture, there's a recovery
>> mechanism called EEH, that will reset the PCI slot and call driver's
>> handlers to reset the adapter and network functionality as well.
>>
>> We observed that once hw_addr is NULL after the error is detected on
>> igb_rd32(), it's never assigned back, so in the process of resetting
>> the network functionality we got a NULL pointer dereference in both
>> igb_configure_tx_ring() and igb_configure_rx_ring(). In order to avoid
>> such bug, we re-assign the hw_addr value in the beginning of the
>> function igb_reset(), in case the hw_addr is NULL when we reach that
>> path.
>>
>> Reported-by: Anthony H. Thai <ahthai@us.ibm.com>
>> Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
>> ---
>>  drivers/net/ethernet/intel/igb/igb_main.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>> index edc9a6a..c19119c 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -1873,6 +1873,13 @@ void igb_reset(struct igb_adapter *adapter)
>>         struct e1000_fc_info *fc = &hw->fc;
>>         u32 pba, hwm;
>>
>> +       /* In case of PCI error, adapter might have lost its HW
>> +        * address; if we reached this point after an error scenario,
>> +        * we should re-assign the hw_addr based on the saved io_addr.
>> +        */
>> +       if (!hw->hw_addr)
>> +               hw->hw_addr = adapter->io_addr;
>> +
>>         /* Repartition Pba for greater than 9k mtu
>>          * To take effect CTRL.RST is required.
>>          */
> 
> It seems like this would have the potential to get noisy pretty
> quickly since every reset would retrigger this.
> 
> It might make more sense to move this line into igb_io_slot_reset and
> igb_resume where the device would have gone through a reset of some
> sort and then had the state of the device restored and the device
> memory access was re-enabled via pci_enable_device_mem.  Also there is
> no point in really doing the "if" since you should always be okay to
> overwrite hw->hw_addr with adapter->io_addr.

Thanks for the good suggestion Alexander, will send a V2.
Cheers,

Guilherme


> 
> Thanks.
> 
> - Alex
> 


WARNING: multiple messages have this Message-ID (diff)
From: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: intel-wired-lan <intel-wired-lan@lists.osuosl.org>,
	Netdev <netdev@vger.kernel.org>,
	"Fujinaka, Todd" <todd.fujinaka@intel.com>
Subject: Re: [Intel-wired-lan] [PATCH net] igb: re-assign hw address pointer on reset after PCI error
Date: Thu, 10 Nov 2016 14:48:30 -0200	[thread overview]
Message-ID: <5824A4DE.9020508@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAKgT0UdYrO14ytSgPfa6KgC0OPaydqtd2-j2+AQFgJMgz69d+w@mail.gmail.com>

On 11/09/2016 03:12 PM, Alexander Duyck wrote:
> On Mon, Oct 31, 2016 at 1:12 PM, Guilherme G. Piccoli
> <gpiccoli@linux.vnet.ibm.com> wrote:
>> Whenever the igb driver detects the result of a read operation returns
>> a value composed only by F's (like 0xFFFFFFFF), it will detach the
>> net_device, clear the hw_addr pointer and warn to the user that adapter's
>> link is lost - those steps happen on igb_rd32().
>>
>> In case a PCI error happens on Power architecture, there's a recovery
>> mechanism called EEH, that will reset the PCI slot and call driver's
>> handlers to reset the adapter and network functionality as well.
>>
>> We observed that once hw_addr is NULL after the error is detected on
>> igb_rd32(), it's never assigned back, so in the process of resetting
>> the network functionality we got a NULL pointer dereference in both
>> igb_configure_tx_ring() and igb_configure_rx_ring(). In order to avoid
>> such bug, we re-assign the hw_addr value in the beginning of the
>> function igb_reset(), in case the hw_addr is NULL when we reach that
>> path.
>>
>> Reported-by: Anthony H. Thai <ahthai@us.ibm.com>
>> Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
>> ---
>>  drivers/net/ethernet/intel/igb/igb_main.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>> index edc9a6a..c19119c 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -1873,6 +1873,13 @@ void igb_reset(struct igb_adapter *adapter)
>>         struct e1000_fc_info *fc = &hw->fc;
>>         u32 pba, hwm;
>>
>> +       /* In case of PCI error, adapter might have lost its HW
>> +        * address; if we reached this point after an error scenario,
>> +        * we should re-assign the hw_addr based on the saved io_addr.
>> +        */
>> +       if (!hw->hw_addr)
>> +               hw->hw_addr = adapter->io_addr;
>> +
>>         /* Repartition Pba for greater than 9k mtu
>>          * To take effect CTRL.RST is required.
>>          */
> 
> It seems like this would have the potential to get noisy pretty
> quickly since every reset would retrigger this.
> 
> It might make more sense to move this line into igb_io_slot_reset and
> igb_resume where the device would have gone through a reset of some
> sort and then had the state of the device restored and the device
> memory access was re-enabled via pci_enable_device_mem.  Also there is
> no point in really doing the "if" since you should always be okay to
> overwrite hw->hw_addr with adapter->io_addr.

Thanks for the good suggestion Alexander, will send a V2.
Cheers,

Guilherme


> 
> Thanks.
> 
> - Alex
> 

  reply	other threads:[~2016-11-10 16:48 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-31 20:12 [Intel-wired-lan] [PATCH net] igb: re-assign hw address pointer on reset after PCI error Guilherme G. Piccoli
2016-10-31 20:12 ` Guilherme G. Piccoli
2016-11-09 17:12 ` [Intel-wired-lan] " Alexander Duyck
2016-11-09 17:12   ` Alexander Duyck
2016-11-10 16:48   ` Guilherme G. Piccoli [this message]
2016-11-10 16:48     ` Guilherme G. Piccoli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5824A4DE.9020508@linux.vnet.ibm.com \
    --to=gpiccoli@linux.vnet.ibm.com \
    --cc=intel-wired-lan@osuosl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.