From: Alexander Duyck <alexander.h.duyck@redhat.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [PATCH] e1000e i219 fix unit hang on reset and runtime D3
Date: Fri, 10 Apr 2015 09:47:35 -0700
Message-ID: <5527FEA7.1060200@redhat.com>
In-Reply-To: <978966243-11032-1-git-send-email-yanirx.lubetkin@intel.com>
On 01/08/2001 07:04 AM, Yanir Lubetkin wrote:
Please set your system clock before sending a patch. This went straight
to my archive since my mail client thought the message was over 14 years old.
> unit hang may occur if multiple descriptors are available in the rings during
> reset or runtime suspend. This state can be detected by testing the PCI config
> space register FEXTNVM7 bit 8 (0x100 mask). if this bit is on, and there are
> pending descriptors in one of the rings, we must flush them prior to reset.
> same goes for entering runtime suspend.
>
> Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
> ---
> drivers/net/ethernet/intel/e1000e/ich8lan.h | 2 +
> drivers/net/ethernet/intel/e1000e/netdev.c | 94 +++++++++++++++++++++++++++++
> drivers/net/ethernet/intel/e1000e/regs.h | 1 +
> 3 files changed, 97 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.h b/drivers/net/ethernet/intel/e1000e/ich8lan.h
> index 770a573..703f808 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.h
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.h
> @@ -101,6 +101,8 @@
>
> #define E1000_FEXTNVM7_DISABLE_SMB_PERST 0x00000020
>
> +#define E1000_FEXTNVM11_DISABLE_MULR_FIX 0x00002000
> +
> #define K1_ENTRY_LATENCY 0
> #define K1_MIN_TIME 1
> #define NVM_SIZE_MULTIPLIER 4096 /*multiplier for NVMS field */
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 4e56c31..46b5702 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -3788,6 +3788,98 @@ static void e1000_power_down_phy(struct e1000_adapter *adapter)
> }
>
> /**
> + * e1000_flush_tx_ring - remove all descriptors from the tx_ring
> + *
> + * force the hardware to read all the descriptors and discard them
> + * we put a descriptor with the ring itself as its data. reading
> + * the descripto contents is performing a read on all the ring entries and
> + * causes a ring flush
> + */
This description is bogus. Based on the code below all you are doing is
loading one descriptor with the descriptor ring as data.
> +static void e1000_flush_tx_ring(struct e1000_adapter *adapter)
> +{
> + struct e1000_hw *hw = &adapter->hw;
> + struct e1000_ring *tx_ring = adapter->tx_ring;
> + struct e1000_tx_desc *tx_desc = NULL;
> + u32 txd_lower = E1000_TXD_CMD_IFCS;
> + u32 tctl, tdbal, tdbah;
> + int i;
> + u16 size = 512;
> +
> + tctl = er32(TCTL);
> + ew32(TCTL, tctl | E1000_TCTL_EN);
Should there be any sort of sleep after enabling the queue?
> + tdbal = er32(TDBAL(0));
> + tdbah = er32(TDBAH(0));
This seems problematic at best. How can you guarantee that TDBAL and
TDBAH are even valid? Maybe you should use the value actually stored in
tx_ring->dma and not trust register reads, or at least make sure the
values match.
What guarantees that tx_ring->next_to_use is actually in sync with TDH
and TDT?
> + i = tx_ring->next_to_use;
> + tx_desc = E1000_TX_DESC(*tx_ring, i);
> + tx_desc->buffer_addr = cpu_to_le64(((u64)tdbah << 32) | tdbal);
> + tx_desc->lower.data = cpu_to_le32(txd_lower | size);
> + tx_desc->upper.data = 0;
> + /* in case other processors access the descriptor ring */
> + wmb();
The description here is invalid. The wmb() should be here to push the
descriptors to memory before you notify the device via an MMIO write.
> + i++;
> + if (i == tx_ring->count)
> + i = 0;
> + ew32(TDT(0), i);
> + mmiowb();
> + usleep_range(200);
> +}
> +
Does the value need to be stored anywhere after you have updated TDT?
For example what happens if you call flush twice?
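To make the concern concrete, here is a minimal user-space model, not
driver code. The names fake_tx_desc, fake_tx_ring and queue_flush_desc
are hypothetical, and endianness conversion is left out. It only shows
two things: the buffer address comes from the software copy of the ring
base rather than from a register read, and the new tail is saved back so
a second flush does not reuse the same slot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one legacy Tx descriptor. */
struct fake_tx_desc {
	uint64_t buffer_addr;
	uint32_t lower;
	uint32_t upper;
};

/* Hypothetical stand-in for the driver's Tx ring bookkeeping. */
struct fake_tx_ring {
	uint64_t dma;              /* ring base address as software knows it */
	uint16_t count;            /* number of descriptors in the ring */
	uint16_t next_to_use;      /* software copy of the tail index */
	struct fake_tx_desc *desc; /* descriptor memory */
};

/*
 * Fill one descriptor that points at the ring itself and return the
 * tail value that would be written to TDT. The tail is also stored in
 * next_to_use so a later call starts from the following slot.
 */
static uint16_t queue_flush_desc(struct fake_tx_ring *ring, uint32_t lower)
{
	uint16_t i = ring->next_to_use;

	assert(ring->count > 0 && i < ring->count);

	/* Base address taken from software state, not from TDBAL/TDBAH. */
	ring->desc[i].buffer_addr = ring->dma;
	ring->desc[i].lower = lower;
	ring->desc[i].upper = 0;

	/* In the driver, wmb() would go here, before the TDT write. */

	i++;
	if (i == ring->count)
		i = 0;
	ring->next_to_use = i;
	return i;
}
```

With a 4-entry ring and next_to_use at 3, the first call returns 0 and
the second returns 1, so back-to-back flushes land in different slots.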
> +/**
> + * e1000_flush_rx_ring - remove all descriptors from the tx_ring
> + *
> + * Mark all descriptors in the RX ring as consumed and disable the rx ring
> + */
> +static void e1000_flush_rx_ring(struct e1000_adapter *adapter)
> +{
> + u32 rctl, rxdctl;
> + struct e1000_hw *hw = &adapter->hw;
> +
> + rctl = er32(RCTL);
> + ew32(RCTL, rctl & ~E1000_RCTL_EN);
You might want to do a write flush here to guarantee the write is
completed before you start the sleep.
> + usleep_range(100);
> + rxdctl = er32(RXDCTL(0));
Or you could swap the position of these two lines and get the same effect.
> + rxdctl &= 0xffffc000;
> + rxdctl |= (0x1F | (1 << 8) | (1 << 24));
First, please don't use magic numbers, or if you must, please explain
them as is done for E1000_RXDCTL_DMA_BURST_ENABLE. BTW the comments on
E1000_RXDCTL_DMA_BURST_ENABLE appear to be wrong as well, though the
values appear to be correct. The error is that hthresh and pthresh are
swapped in the comments.
Second, you should probably get rid of the &= and just assign the value
directly. Your mask and the values you are setting overlap, so I suspect
one or the other is wrong.
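Something along these lines is what I have in mind. This is a
user-space sketch, not a proposed patch. The macro and function names
are hypothetical, and the field positions (pthresh in bits 5:0, hthresh
in bits 13:8, granularity in bit 24) are my assumption based on the
values in E1000_RXDCTL_DMA_BURST_ENABLE, so please check them against
the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions are assumptions, see above. */
#define RXDCTL_PTHRESH_MASK  0x0000003Fu /* bits 5:0 */
#define RXDCTL_HTHRESH_SHIFT 8           /* bits 13:8 */
#define RXDCTL_HTHRESH_MASK  0x00003F00u
#define RXDCTL_GRAN_DESC     0x01000000u /* bit 24 */

/* Build the whole register value from named fields. */
static uint32_t rxdctl_flush_settings(uint32_t pthresh, uint32_t hthresh)
{
	assert(pthresh <= RXDCTL_PTHRESH_MASK);
	assert(hthresh <= (RXDCTL_HTHRESH_MASK >> RXDCTL_HTHRESH_SHIFT));

	return RXDCTL_GRAN_DESC | (hthresh << RXDCTL_HTHRESH_SHIFT) | pthresh;
}

/* What the patch computes, for comparison. */
static uint32_t rxdctl_as_in_patch(uint32_t rxdctl)
{
	rxdctl &= 0xffffc000;                  /* keeps bits 31:14 */
	rxdctl |= (0x1Fu | (1u << 8) | (1u << 24));
	return rxdctl;
}
```

The two only agree when bits 31:14 of the old register value are clear.
Bit 24 sits inside the range the mask preserves, and anything else left
in the upper bits is carried over as well.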
> + ew32(RXDCTL(0), rxdctl);
> + ew32(RCTL, rctl | E1000_RCTL_EN);
> + usleep_range(100);
> + ew32(RCTL, rctl & ~E1000_RCTL_EN);
> +}
> +
You need some write flushes before the sleeps. Otherwise this can toggle
so fast that there is no guarantee things will occur in the correct
order, as the first write could be stalled if there are other MMIO
requests on the interface from other processors.
Also I suspect this has the potential to trigger memory corruption.
Have you tried testing this fix with an IOMMU (VT-d) enabled on the
platform? I would expect that enabling the Rx ring with uninitialized
or stale values would open you up to the risk of writing garbage.
> +/**
> + * e1000_flush_desc_rings - remove all descriptors from the descriptor rings
> + *
> + * In i219, the descriptor rings must be emptied before resetting the HW
> + * or before changing the device state to D3 during runtime (runtime PM).
> + *
> + * Failure to do this will cause the HW to enter a unit hang state which can
> + * only be released by PCI reset on the device
> + *
> + */
> +
> +static void e1000_flush_desc_rings(struct e1000_adapter *adapter)
> +{
> + u16 hang_state;
> + u32 fext_nvm11, tdlen;
> + struct e1000_hw *hw = &adapter->hw;
> +
> + /* First, disable MULR fix in FEXTNVM11 */
Any explanation on what MULR means? By any chance is this related to
the Multiple Requests (MULR) workaround that was needed for TSO on 82583v?
> + fext_nvm11 = er32(FEXTNVM11);
> + fext_nvm11 |= E1000_FEXTNVM11_DISABLE_MULR_FIX;
> + ew32(FEXTNVM11, fext_nvm11);
So the other FEXT registers appear to only be reset at power-on. I'm
assuming it is the same for this register as well. Does this mean the
fix that is disabled here is permanent? If so is there any explanation
anywhere on what the side effects will be?
> + /* do nothing if we're not in faulty state, or if the queue is empty */
> + tdlen = er32(TDLEN(0));
So I am assuming this is the "queue is empty" portion of the tests. Is
this what you are using to try and determine if the descriptor rings are
still in sync or not?
> + pci_read_config_word(adapter->pdev, E1000_FEXTNVM7, &hang_state);
> + if ((hang_state & E1000_FEXTNVM7_MULR_NEED_DESCRING_FLUSH) || tdlen)
> + return;
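The test also appears inverted relative to the comment above it: as
written, this returns when the flag is set or when TDLEN is non-zero.
If the comment states the intent, I would expect the condition to look
more like the sketch below. This is a user-space model only. The macro
name is hypothetical and its value is taken from the "bit 8 (0x100
mask)" wording in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; value from the commit message (bit 8 of FEXTNVM7). */
#define FEXTNVM7_NEED_DESCRING_FLUSH 0x0100u

/* Flush only when the hardware reports the fault and the ring has a length. */
static bool tx_flush_needed(uint16_t hang_state, uint32_t tdlen)
{
	if (!(hang_state & FEXTNVM7_NEED_DESCRING_FLUSH))
		return false; /* not in the faulty state */
	if (!tdlen)
		return false; /* queue is empty */
	return true;
}

/* Usage example: the cases named in the patch comment. */
static void tx_flush_needed_selfcheck(void)
{
	assert(!tx_flush_needed(0x0000, 4096)); /* flag clear */
	assert(!tx_flush_needed(0x0100, 0));    /* TDLEN is zero */
	assert(tx_flush_needed(0x0100, 4096));  /* faulty and non-empty */
}
```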
> + e1000_flush_tx_ring(adapter);
> + /* recheck, maybe the fault is caused by the rx ring */
> + pci_read_config_word(adapter->pdev, E1000_FEXTNVM7, &hang_state);
> + if (hang_state & E1000_FEXTNVM7_MULR_NEED_DESCRING_FLUSH)
> + e1000_flush_rx_ring(adapter);
> +}
> +
> +/**
> * e1000e_reset - bring the hardware into a known good state
> *
> * This function boots the hardware and enables some settings that
> @@ -3943,6 +4035,8 @@ void e1000e_reset(struct e1000_adapter *adapter)
> }
> }
>
> + if (hw->mac.type == e1000_pch_spt)
> + e1000_flush_desc_rings(adapter);
> /* Allow time for pending master requests to run */
> mac->ops.reset_hw(hw);
>
> diff --git a/drivers/net/ethernet/intel/e1000e/regs.h b/drivers/net/ethernet/intel/e1000e/regs.h
> index 85eefc4..4648754 100644
> --- a/drivers/net/ethernet/intel/e1000e/regs.h
> +++ b/drivers/net/ethernet/intel/e1000e/regs.h
> @@ -38,6 +38,7 @@
> #define E1000_FEXTNVM4 0x00024 /* Future Extended NVM 4 - RW */
> #define E1000_FEXTNVM6 0x00010 /* Future Extended NVM 6 - RW */
> #define E1000_FEXTNVM7 0x000E4 /* Future Extended NVM 7 - RW */
> +#define E1000_FEXTNVM11 0x5BBC /* Future Extended NVM 11 - RW */
> #define E1000_PCIEANACFG 0x00F18 /* PCIE Analog Config */
> #define E1000_FCT 0x00030 /* Flow Control Type - RW */
> #define E1000_VET 0x00038 /* VLAN Ether Type - RW */
>
Should probably pad the value out with an extra 0 at the start in order
to get it to align with the other values in the list.