netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net PATCH 0/2] Fix descriptor counting and avoid Tx hangs on e1000 w/ TSO
@ 2016-03-02 21:15 Alexander Duyck
  2016-03-02 21:16 ` [net PATCH 1/2] e1000: Do not overestimate descriptor counts in Tx pre-check Alexander Duyck
  2016-03-02 21:16 ` [net PATCH 2/2] e1000: Double Tx descriptors needed check for 82544 Alexander Duyck
  0 siblings, 2 replies; 5+ messages in thread
From: Alexander Duyck @ 2016-03-02 21:15 UTC (permalink / raw)
  To: netdev, jogreene, intel-wired-lan, jeffrey.t.kirsher, sassmann

This patch series addresses a Tx hang reported in our test lab with
RHEL/CentOS 7.2 running in a VM with an emulated e1000 driver.  We were
able to determine that the issue appears to have been introduced with the
changes that introduced xmit_more.

What we have found is that the pre-check for the number of descriptors
was using a value much larger than the value used for the next transmit at
the end of the xmit path.  As a result we were often not writing the tail,
and then setting then stopping xmit with the next packet and returning
TX_BUSY from the driver.

This patch series addresses the two main issues found.  First it prevents
us from reporting the need for 2 descriptors for every 4K page when we only
needed one.  This wasn't so much an issue when 32K pages are used for a
TSO, but if 4K pages are used then this effectively doubles the size of the
data descriptor count so instead of indicating 1 (head) + 17 (frags) we
were indicating 1 (head) + 32 (frags) because each full 4K frag was
requesting 2 descriptors instead of 1.

The fix for the 82544 is speculative as I don't actually have the hardware
to test with but I suspect it will have a similar issue.  As such I have
build tested it and verified it didn't break existing hardware to increase
the post-xmit test by a couple descriptors, but I have not tested the code
path with an 82544 so I don't know if there are any issues with us
increasing the value by MAX_SKB_FRAGS + 1.

Testing Hints:
The reproduction case for this is pretty simple.  You basically just need
the adapter installed in a multi-CPU system and to perform TSO from a few
threads so that you can hit the point of tx_restart_queue incrementing.
After that the Tx hangs should start being reported since the adapter will
be stopped but the tail never gets updated.  It should be easiest to
reproduce this issue on an 82544 since it will push the upper limit
theoretically as high as trying to request 52 descriptors for a single
frame while the post check is only looking for something like 20.

---

Alexander Duyck (2):
      e1000: Do not overestimate descriptor counts in Tx pre-check
      e1000: Double Tx descriptors needed check for 82544


 drivers/net/ethernet/intel/e1000/e1000_main.c |   21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [net PATCH 1/2] e1000: Do not overestimate descriptor counts in Tx pre-check
  2016-03-02 21:15 [net PATCH 0/2] Fix descriptor counting and avoid Tx hangs on e1000 w/ TSO Alexander Duyck
@ 2016-03-02 21:16 ` Alexander Duyck
  2016-03-09 23:50   ` Brown, Aaron F
  2016-03-02 21:16 ` [net PATCH 2/2] e1000: Double Tx descriptors needed check for 82544 Alexander Duyck
  1 sibling, 1 reply; 5+ messages in thread
From: Alexander Duyck @ 2016-03-02 21:16 UTC (permalink / raw)
  To: netdev, jogreene, intel-wired-lan, jeffrey.t.kirsher, sassmann

The current code path is capable of grossly overestimating the number of
descriptors needed to transmit a new frame.  This specifically occurs if
the skb contains a number of 4K pages.  The issue is that the logic for
determining the descriptors needed is ((S) >> (X)) + 1.  When X is 12 it
means that we were indicating that we required 2 descriptors for each 4K
page when we only needed one.

This change corrects this by instead adding (1 << (X)) - 1 to the S value
instead of adding 1 after the fact.  This way we get an accurate descriptor
needed count as we are essentially doing a DIV_ROUNDUP().

Reported-by: Ivan Suzdal <isuzdal@mirantis.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 3fc7bde..d213fb4 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3106,7 +3106,7 @@ static int e1000_maybe_stop_tx(struct net_device *netdev,
 	return __e1000_maybe_stop_tx(netdev, size);
 }
 
-#define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1)
+#define TXD_USE_COUNT(S, X) (((S) + ((1 << (X)) - 1)) >> (X))
 static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 				    struct net_device *netdev)
 {

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [net PATCH 2/2] e1000: Double Tx descriptors needed check for 82544
  2016-03-02 21:15 [net PATCH 0/2] Fix descriptor counting and avoid Tx hangs on e1000 w/ TSO Alexander Duyck
  2016-03-02 21:16 ` [net PATCH 1/2] e1000: Do not overestimate descriptor counts in Tx pre-check Alexander Duyck
@ 2016-03-02 21:16 ` Alexander Duyck
  2016-03-09 23:51   ` [Intel-wired-lan] " Brown, Aaron F
  1 sibling, 1 reply; 5+ messages in thread
From: Alexander Duyck @ 2016-03-02 21:16 UTC (permalink / raw)
  To: netdev, jogreene, intel-wired-lan, jeffrey.t.kirsher, sassmann

The 82544 has code that adds one additional descriptor per data buffer.
However we weren't taking that into acount when determining the descriptors
needed for the next transmit at the end of the xmit_frame path.

This change takes that into account by doubling the number of descriptors
needed for the 82544 so that we can avoid a potential issue where we could
hang the Tx ring by loading frames with xmit_more enabled and then stopping
the ring without writing the tail.

In addition it adds a few more descriptors to account for some additional
workarounds that have been added over time.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index d213fb4..ae90d4f 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3256,12 +3256,29 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 			     nr_frags, mss);
 
 	if (count) {
+		/* The descriptors needed is higher than other Intel drivers
+		 * due to a number of workarounds.  The breakdown is below:
+		 * Data descriptors: MAX_SKB_FRAGS + 1
+		 * Context Descriptor: 1
+		 * Keep head from touching tail: 2
+		 * Workarounds: 3
+		 */
+		int desc_needed = MAX_SKB_FRAGS + 7;
+
 		netdev_sent_queue(netdev, skb->len);
 		skb_tx_timestamp(skb);
 
 		e1000_tx_queue(adapter, tx_ring, tx_flags, count);
+
+		/* 82544 potentially requires twice as many data descriptors
+		 * in order to guarantee buffers don't end on evenly-aligned
+		 * dwords
+		 */
+		if (adapter->pcix_82544)
+			desc_needed += MAX_SKB_FRAGS + 1;
+
 		/* Make sure there is space in the ring for the next send. */
-		e1000_maybe_stop_tx(netdev, tx_ring, MAX_SKB_FRAGS + 2);
+		e1000_maybe_stop_tx(netdev, tx_ring, desc_needed);
 
 		if (!skb->xmit_more ||
 		    netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [net PATCH 1/2] e1000: Do not overestimate descriptor counts in Tx pre-check
  2016-03-02 21:16 ` [net PATCH 1/2] e1000: Do not overestimate descriptor counts in Tx pre-check Alexander Duyck
@ 2016-03-09 23:50   ` Brown, Aaron F
  0 siblings, 0 replies; 5+ messages in thread
From: Brown, Aaron F @ 2016-03-09 23:50 UTC (permalink / raw)
  To: Alexander Duyck, netdev@vger.kernel.org, jogreene@redhat.com,
	intel-wired-lan@lists.osuosl.org, Kirsher, Jeffrey T,
	sassmann@redhat.com

> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Alexander Duyck
> Sent: Wednesday, March 2, 2016 1:16 PM
> To: netdev@vger.kernel.org; jogreene@redhat.com; intel-wired-
> lan@lists.osuosl.org; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> sassmann@redhat.com
> Subject: [net PATCH 1/2] e1000: Do not overestimate descriptor counts in Tx
> pre-check
> 
> The current code path is capable of grossly overestimating the number of
> descriptors needed to transmit a new frame.  This specifically occurs if
> the skb contains a number of 4K pages.  The issue is that the logic for
> determining the descriptors needed is ((S) >> (X)) + 1.  When X is 12 it
> means that we were indicating that we required 2 descriptors for each 4K
> page when we only needed one.
> 
> This change corrects this by instead adding (1 << (X)) - 1 to the S value
> instead of adding 1 after the fact.  This way we get an accurate descriptor
> needed count as we are essentially doing a DIV_ROUNDUP().
> 
> Reported-by: Ivan Suzdal <isuzdal@mirantis.com>
> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [Intel-wired-lan] [net PATCH 2/2] e1000: Double Tx descriptors needed check for 82544
  2016-03-02 21:16 ` [net PATCH 2/2] e1000: Double Tx descriptors needed check for 82544 Alexander Duyck
@ 2016-03-09 23:51   ` Brown, Aaron F
  0 siblings, 0 replies; 5+ messages in thread
From: Brown, Aaron F @ 2016-03-09 23:51 UTC (permalink / raw)
  To: Alexander Duyck, netdev@vger.kernel.org, jogreene@redhat.com,
	intel-wired-lan@lists.osuosl.org, Kirsher, Jeffrey T,
	sassmann@redhat.com

> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Wednesday, March 2, 2016 1:16 PM
> To: netdev@vger.kernel.org; jogreene@redhat.com; intel-wired-
> lan@lists.osuosl.org; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> sassmann@redhat.com
> Subject: [Intel-wired-lan] [net PATCH 2/2] e1000: Double Tx descriptors
> needed check for 82544
> 
> The 82544 has code that adds one additional descriptor per data buffer.
> However we weren't taking that into acount when determining the
> descriptors
> needed for the next transmit at the end of the xmit_frame path.
> 
> This change takes that into account by doubling the number of descriptors
> needed for the 82544 so that we can avoid a potential issue where we could
> hang the Tx ring by loading frames with xmit_more enabled and then
> stopping
> the ring without writing the tail.
> 
> In addition it adds a few more descriptors to account for some additional
> workarounds that have been added over time.
> 
> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c |   19
> ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-03-09 23:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-02 21:15 [net PATCH 0/2] Fix descriptor counting and avoid Tx hangs on e1000 w/ TSO Alexander Duyck
2016-03-02 21:16 ` [net PATCH 1/2] e1000: Do not overestimate descriptor counts in Tx pre-check Alexander Duyck
2016-03-09 23:50   ` Brown, Aaron F
2016-03-02 21:16 ` [net PATCH 2/2] e1000: Double Tx descriptors needed check for 82544 Alexander Duyck
2016-03-09 23:51   ` [Intel-wired-lan] " Brown, Aaron F

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).