netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] abysmal e1000 performance (DITR)
@ 2004-06-01 23:05 Thayne Harbaugh
  2004-06-04 14:20 ` Thayne Harbaugh
  0 siblings, 1 reply; 12+ messages in thread
From: Thayne Harbaugh @ 2004-06-01 23:05 UTC (permalink / raw)
  To: netdev; +Cc: ganesh.venkatesan

The introduction of Dynamic Interrupt Throttling Rate (DITR) in the 4.x
-> 5.x e1000 driver change can cause serious performance problems.  The
ITR can be reduced when the system is *not* under load which results in
gratuitous latencies for network traffic.  In other words - why is the
interrupt load reduced when the system isn't under load?

This is a patch to include system load when calculating the DITR.  It
only allows the diff/goc ratio to factor into the DITR when the 1 minute
load average is above .50.  It appears to work quite well and prevents
throttling when the load is below .50 and allows throttling when the
load is in excess of .50.

There are a few concerns.  The patch requires kernel/timer.c to
EXPORT_SYMBOL(avenrun) - otherwise the symbol isn't available to the
driver when built as a module.  This symbol "clutter" may be undesirable
to some.  Another concern is that NAPI accomplishes a similar interrupt
load reduction and is likely a better solution than ITR.


diff -ur linux-2.4.21-99/drivers/net/e1000/e1000_main.c linux-2.4.21-99-e1000/drivers/net/e1000/e1000_main.c
--- linux-2.4.21-99/drivers/net/e1000/e1000_main.c	2003-09-24 05:47:33.000000000 -0700
+++ linux-2.4.21-99-e1000/drivers/net/e1000/e1000_main.c	2004-05-27 23:38:02.000000000 -0700
@@ -27,6 +27,8 @@
 *******************************************************************************/
 
 #include "e1000.h"
+/* for avenrun[] */
+#include <linux/sched.h>
 
 /* Change Log
  *
@@ -1429,14 +1431,34 @@
 
 	/* Dynamic mode for Interrupt Throttle Rate (ITR) */
 	if(adapter->hw.mac_type >= e1000_82540 && adapter->itr == 1) {
+		/* fixed fractional part of .50 */
+#define FIXED_F50 (FIXED_1 >> 1)
+		/* This maps the range of .50-.99 -> 0-100 */
+#define FIXED_1_MAPPED (FIXED_1 - FIXED_F50)
+		unsigned long laf;	/* load average fractional part */
+		uint32_t goc;		/* good octet count */
+		uint32_t dif;		/* dif between tx and rx goc */
+		uint32_t itr;		/* inturrept throttle rate */
+		/* laf range is mapped:
+		 *     .00-.50 -> .00
+		 *     .50-.99 -> .00-.99
+		 *    1.00+    -> 1.00 */
+		if (avenrun[0] + FIXED_1/200 > FIXED_1)
+			laf = FIXED_1_MAPPED;
+		else if (avenrun[0] + FIXED_1/200 < FIXED_F50)
+			laf = 0;
+		else
+			laf = ((avenrun[0] + FIXED_1/2000) & (FIXED_1 - 1)) - FIXED_1_MAPPED;
 		/* Symmetric Tx/Rx gets a reduced ITR=2000; Total
 		 * asymmetrical Tx or Rx gets ITR=8000; everyone
 		 * else is between 2000-8000. */
-		uint32_t goc = (adapter->gotcl + adapter->gorcl) / 10000;
-		uint32_t dif = (adapter->gotcl > adapter->gorcl ? 
-			adapter->gotcl - adapter->gorcl :
-			adapter->gorcl - adapter->gotcl) / 10000;
-		uint32_t itr = goc > 0 ? (dif * 6000 / goc + 2000) : 8000;
+		goc = (adapter->gotcl + adapter->gorcl) / 10000;
+		dif = (adapter->gotcl > adapter->gorcl ? 
+		       adapter->gotcl - adapter->gorcl :
+		       adapter->gorcl - adapter->gotcl) / 10000;
+		itr = goc > 0
+			? 8000 + 6000*laf/FIXED_1_MAPPED*dif/goc - 6000*laf/FIXED_1_MAPPED
+			: 8000;
 		E1000_WRITE_REG(&adapter->hw, ITR, 1000000000 / (itr * 256));
 	}
 
diff -ur linux-2.4.21-99/kernel/Makefile linux-2.4.21-99-e1000/kernel/Makefile
--- linux-2.4.21-99/kernel/Makefile	2003-09-24 05:47:27.000000000 -0700
+++ linux-2.4.21-99-e1000/kernel/Makefile	2004-05-26 23:45:47.000000000 -0700
@@ -11,7 +11,7 @@
 
 export-objs = signal.o sys.o kmod.o context.o ksyms.o pm.o exec_domain.o \
 	      printk.o cpufreq.o rcupdate.o syscall_ksyms.o fork.o hook.o \
-	      rashooks.o module.o
+	      rashooks.o module.o timer.o
 
 obj-y     = sched.o dma.o fork.o exec_domain.o panic.o printk.o \
 	    module.o exit.o itimer.o info.o time.o softirq.o resource.o \
diff -ur linux-2.4.21-99/kernel/timer.c linux-2.4.21-99-e1000/kernel/timer.c
--- linux-2.4.21-99/kernel/timer.c	2003-09-24 05:47:27.000000000 -0700
+++ linux-2.4.21-99-e1000/kernel/timer.c	2004-05-27 00:54:29.000000000 -0700
@@ -686,6 +686,7 @@
  * all seem to differ on different machines.
  */
 unsigned long avenrun[3];
+EXPORT_SYMBOL(avenrun);
 
 static inline void calc_load(unsigned long ticks)
 {

-- 
Thayne Harbaugh
Linux Networx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] abysmal e1000 performance (DITR)
  2004-06-01 23:05 Thayne Harbaugh
@ 2004-06-04 14:20 ` Thayne Harbaugh
  0 siblings, 0 replies; 12+ messages in thread
From: Thayne Harbaugh @ 2004-06-04 14:20 UTC (permalink / raw)
  To: netdev; +Cc: ganesh.venkatesan, Scott Feldman, Jesse Brandeburg

I have documented a serious performance problem with DITR in the e1000
driver.  I have sent a patch to the netdev list as well as CC'ed various
people at Intel.  There has been very little response.  I am wondering
who has reviewed the DITR performance problem and what the plans are for
fixing it.

-- 
Thayne Harbaugh
Linux Networx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] abysmal e1000 performance (DITR)
@ 2004-06-04 15:44 Venkatesan, Ganesh
  2004-08-26 17:55 ` jamal
  0 siblings, 1 reply; 12+ messages in thread
From: Venkatesan, Ganesh @ 2004-06-04 15:44 UTC (permalink / raw)
  To: tharbaugh, netdev; +Cc: Feldman, Scott, Brandeburg, Jesse

Thayne:

We are studying the patch and will get back to you once we have a plan
to integrate it into the driver.

Thanks,
ganesh 
 
-------------------------------------------------
Ganesh Venkatesan
Network/Storage Division, Hillsboro, OR

-----Original Message-----
From: Thayne Harbaugh [mailto:tharbaugh@lnxi.com] 
Sent: Friday, June 04, 2004 7:20 AM
To: netdev@oss.sgi.com
Cc: Venkatesan, Ganesh; Feldman, Scott; Brandeburg, Jesse
Subject: Re: [PATCH] abysmal e1000 performance (DITR)

I have documented a serious performance problem with DITR in the e1000
driver.  I have sent a patch to the netdev list as well as CC'ed various
people at Intel.  There has been very little response.  I am wondering
who has reviewed the DITR performance problem and what the plans are for
fixing it.

-- 
Thayne Harbaugh
Linux Networx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] abysmal e1000 performance (DITR)
  2004-06-04 15:44 [PATCH] abysmal e1000 performance (DITR) Venkatesan, Ganesh
@ 2004-08-26 17:55 ` jamal
  2004-08-26 20:04   ` Thayne Harbaugh
  0 siblings, 1 reply; 12+ messages in thread
From: jamal @ 2004-08-26 17:55 UTC (permalink / raw)
  To: Venkatesan, Ganesh; +Cc: tharbaugh, netdev, Feldman, Scott, Brandeburg, Jesse

Ganesh,

Can you please make this feature off by default and perhaps
accesible via ethtool for peopel who want to turn it on.
I just wasted a few hours and was bitten by this performance-wise.
Please consider disabling it.

cheers,
jamal

On Fri, 2004-06-04 at 11:44, Venkatesan, Ganesh wrote:
> Thayne:
> 
> We are studying the patch and will get back to you once we have a plan
> to integrate it into the driver.
> 
> Thanks,
> ganesh 
>  
> -------------------------------------------------
> Ganesh Venkatesan
> Network/Storage Division, Hillsboro, OR
> 
> -----Original Message-----
> From: Thayne Harbaugh [mailto:tharbaugh@lnxi.com] 
> Sent: Friday, June 04, 2004 7:20 AM
> To: netdev@oss.sgi.com
> Cc: Venkatesan, Ganesh; Feldman, Scott; Brandeburg, Jesse
> Subject: Re: [PATCH] abysmal e1000 performance (DITR)
> 
> I have documented a serious performance problem with DITR in the e1000
> driver.  I have sent a patch to the netdev list as well as CC'ed various
> people at Intel.  There has been very little response.  I am wondering
> who has reviewed the DITR performance problem and what the plans are for
> fixing it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] abysmal e1000 performance (DITR)
  2004-08-26 17:55 ` jamal
@ 2004-08-26 20:04   ` Thayne Harbaugh
  2004-08-26 20:26     ` Jeff Garzik
  0 siblings, 1 reply; 12+ messages in thread
From: Thayne Harbaugh @ 2004-08-26 20:04 UTC (permalink / raw)
  To: hadi; +Cc: Venkatesan, Ganesh, netdev, Feldman, Scott, Brandeburg, Jesse

On Thu, 2004-08-26 at 13:55 -0400, jamal wrote:
> Ganesh,
> 
> Can you please make this feature off by default and perhaps
> accesible via ethtool for peopel who want to turn it on.
> I just wasted a few hours and was bitten by this performance-wise.
> Please consider disabling it.

This is a *horrible* problem.  Even though it's fixable by passing a
module parameter, the default bites those that *know* about it.  We have
had customers bitten by this and customers that have insisted in
swapping all the NICs in a cluster to Broadcom TG3 NICs.

It's a black eye for Intel and a loss of business - that's the opinion
of our customers.

> cheers,
> jamal
> 
> On Fri, 2004-06-04 at 11:44, Venkatesan, Ganesh wrote:
> > Thayne:
> > 
> > We are studying the patch and will get back to you once we have a plan
> > to integrate it into the driver.
> > 
> > Thanks,
> > ganesh 
> >  
> > -------------------------------------------------
> > Ganesh Venkatesan
> > Network/Storage Division, Hillsboro, OR
> > 
> > -----Original Message-----
> > From: Thayne Harbaugh [mailto:tharbaugh@lnxi.com] 
> > Sent: Friday, June 04, 2004 7:20 AM
> > To: netdev@oss.sgi.com
> > Cc: Venkatesan, Ganesh; Feldman, Scott; Brandeburg, Jesse
> > Subject: Re: [PATCH] abysmal e1000 performance (DITR)
> > 
> > I have documented a serious performance problem with DITR in the e1000
> > driver.  I have sent a patch to the netdev list as well as CC'ed various
> > people at Intel.  There has been very little response.  I am wondering
> > who has reviewed the DITR performance problem and what the plans are for
> > fixing it.
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] abysmal e1000 performance (DITR)
  2004-08-26 20:04   ` Thayne Harbaugh
@ 2004-08-26 20:26     ` Jeff Garzik
  2004-08-26 21:28       ` Thayne Harbaugh
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Garzik @ 2004-08-26 20:26 UTC (permalink / raw)
  To: tharbaugh
  Cc: hadi, Venkatesan, Ganesh, netdev, Feldman, Scott,
	Brandeburg, Jesse

Thayne Harbaugh wrote:
> On Thu, 2004-08-26 at 13:55 -0400, jamal wrote:
> 
>>Ganesh,
>>
>>Can you please make this feature off by default and perhaps
>>accesible via ethtool for peopel who want to turn it on.
>>I just wasted a few hours and was bitten by this performance-wise.
>>Please consider disabling it.
> 
> 
> This is a *horrible* problem.  Even though it's fixable by passing a
> module parameter, the default bites those that *know* about it.  We have
> had customers bitten by this and customers that have insisted in
> swapping all the NICs in a cluster to Broadcom TG3 NICs.
> 
> It's a black eye for Intel and a loss of business - that's the opinion
> of our customers.


If it's so bad we should disable it by default, either via the module 
parameter or via a kernel CONFIG_xxx option.

	Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] abysmal e1000 performance (DITR)
  2004-08-26 20:26     ` Jeff Garzik
@ 2004-08-26 21:28       ` Thayne Harbaugh
  2004-08-28 22:58         ` Jeff Garzik
  0 siblings, 1 reply; 12+ messages in thread
From: Thayne Harbaugh @ 2004-08-26 21:28 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: hadi, Venkatesan, Ganesh, netdev, Feldman, Scott,
	Brandeburg, Jesse

On Thu, 2004-08-26 at 16:26 -0400, Jeff Garzik wrote:
> Thayne Harbaugh wrote:
> > On Thu, 2004-08-26 at 13:55 -0400, jamal wrote:
> > 
> >>Ganesh,
> >>
> >>Can you please make this feature off by default and perhaps
> >>accesible via ethtool for peopel who want to turn it on.
> >>I just wasted a few hours and was bitten by this performance-wise.
> >>Please consider disabling it.
> > 
> > 
> > This is a *horrible* problem.  Even though it's fixable by passing a
> > module parameter, the default bites those that *know* about it.  We have
> > had customers bitten by this and customers that have insisted in
> > swapping all the NICs in a cluster to Broadcom TG3 NICs.
> > 
> > It's a black eye for Intel and a loss of business - that's the opinion
> > of our customers.
> 
> 
> If it's so bad we should disable it by default, either via the module 
> parameter or via a kernel CONFIG_xxx option.

Yes, it is so bad.  The dynamic interrupt setting should be deprecated
by the use of NAPI.

This is a simple way to disable it, yet still keep the code so that
someone can enable it if they really wanted it.  I, however, would just
as soon see all of the DITR code ripped out.

There are other ways that might be better for dealing with it, yet still
keeping the DITR code viable.

--- drivers/net/e1000/e1000_param.c.broken_ditr 2004-08-26 15:40:34.436456736 -0600
+++ drivers/net/e1000/e1000_param.c     2004-08-26 15:49:07.186506880 -0600
@@ -212,7 +212,7 @@
 #define MAX_TXABSDELAY            0xFFFF
 #define MIN_TXABSDELAY                 0
  
-#define DEFAULT_ITR                    1
+#define DEFAULT_ITR                 8000
 #define MAX_ITR                   100000
 #define MIN_ITR                      100
  

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] abysmal e1000 performance (DITR)
@ 2004-08-27 21:49 Ronciak, John
  2004-08-27 22:05 ` Thayne Harbaugh
  0 siblings, 1 reply; 12+ messages in thread
From: Ronciak, John @ 2004-08-27 21:49 UTC (permalink / raw)
  To: tharbaugh, Jeff Garzik
  Cc: hadi, Venkatesan, Ganesh, netdev, Feldman, Scott,
	Brandeburg, Jesse

Jamal, Thayne,

I've asked Jeff to go ahead and apply this patch as a way around this
for now.  We would liketo see the DITR stay but now have this
performacne problem so we don't want to rip it out.  We do however need
a test case to replicate this as we have not been seeing it in our
testing.  Please get us those case that break things.  We'll have a
better solution longer term based on the test cases (as well as the ones
we normally use of course).

Thanks.

Cheers,
John


> -----Original Message-----
> From: netdev-bounce@oss.sgi.com 
> [mailto:netdev-bounce@oss.sgi.com] On Behalf Of Thayne Harbaugh
> Sent: Thursday, August 26, 2004 2:29 PM
> To: Jeff Garzik
> Cc: hadi@cyberus.ca; Venkatesan, Ganesh; netdev@oss.sgi.com; 
> Feldman, Scott; Brandeburg, Jesse
> Subject: Re: [PATCH] abysmal e1000 performance (DITR)
> 
> 
> On Thu, 2004-08-26 at 16:26 -0400, Jeff Garzik wrote:
> > Thayne Harbaugh wrote:
> > > On Thu, 2004-08-26 at 13:55 -0400, jamal wrote:
> > > 
> > >>Ganesh,
> > >>
> > >>Can you please make this feature off by default and perhaps
> > >>accesible via ethtool for peopel who want to turn it on.
> > >>I just wasted a few hours and was bitten by this performance-wise.
> > >>Please consider disabling it.
> > > 
> > > 
> > > This is a *horrible* problem.  Even though it's fixable 
> by passing a
> > > module parameter, the default bites those that *know* 
> about it.  We have
> > > had customers bitten by this and customers that have insisted in
> > > swapping all the NICs in a cluster to Broadcom TG3 NICs.
> > > 
> > > It's a black eye for Intel and a loss of business - 
> that's the opinion
> > > of our customers.
> > 
> > 
> > If it's so bad we should disable it by default, either via 
> the module 
> > parameter or via a kernel CONFIG_xxx option.
> 
> Yes, it is so bad.  The dynamic interrupt setting should be deprecated
> by the use of NAPI.
> 
> This is a simple way to disable it, yet still keep the code so that
> someone can enable it if they really wanted it.  I, however, 
> would just
> as soon see all of the DITR code ripped out.
> 
> There are other ways that might be better for dealing with 
> it, yet still
> keeping the DITR code viable.
> 
> --- drivers/net/e1000/e1000_param.c.broken_ditr 2004-08-26 
> 15:40:34.436456736 -0600
> +++ drivers/net/e1000/e1000_param.c     2004-08-26 
> 15:49:07.186506880 -0600
> @@ -212,7 +212,7 @@
>  #define MAX_TXABSDELAY            0xFFFF
>  #define MIN_TXABSDELAY                 0
>   
> -#define DEFAULT_ITR                    1
> +#define DEFAULT_ITR                 8000
>  #define MAX_ITR                   100000
>  #define MIN_ITR                      100
>   
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] abysmal e1000 performance (DITR)
  2004-08-27 21:49 Ronciak, John
@ 2004-08-27 22:05 ` Thayne Harbaugh
  0 siblings, 0 replies; 12+ messages in thread
From: Thayne Harbaugh @ 2004-08-27 22:05 UTC (permalink / raw)
  To: Ronciak, John
  Cc: Jeff Garzik, hadi, Venkatesan, Ganesh, netdev, Feldman, Scott,
	Brandeburg, Jesse

[-- Attachment #1: Type: text/plain, Size: 3727 bytes --]

On Fri, 2004-08-27 at 14:49 -0700, Ronciak, John wrote:
> Jamal, Thayne,
> 
> I've asked Jeff to go ahead and apply this patch as a way around this
> for now.  We would liketo see the DITR stay but now have this
> performacne problem so we don't want to rip it out.  We do however need
> a test case to replicate this as we have not been seeing it in our
> testing.  Please get us those case that break things.  We'll have a
> better solution longer term based on the test cases (as well as the ones
> we normally use of course).

Attached is an email that I sent out 2004 May 25.  The email is very
detailed about what the problem is, how to test it, and an initial
patch.  The email was sent to several addresses at Intel.  I later sent
it to lkml and netdev.

There was almost no response at the time.  The best reply that I
received from Intel was that the DITR was put in and calibrated
according to a marketing benchmark program and that it wouldn't be
changed even though tests (and customers) showed that the performance
was *abysmal*.  It's also a big question what role DITR plays when it
seems deprecated by NAPI.

The response was disappointing.  I'm now wondering why this has now
become interesting and the thread has resumed.  What's changed?  How did
this catch someone's attention?  What can I do next time (hopefully that
won't happen) so that people *do* take interest.

> 
> > -----Original Message-----
> > From: netdev-bounce@oss.sgi.com 
> > [mailto:netdev-bounce@oss.sgi.com] On Behalf Of Thayne Harbaugh
> > Sent: Thursday, August 26, 2004 2:29 PM
> > To: Jeff Garzik
> > Cc: hadi@cyberus.ca; Venkatesan, Ganesh; netdev@oss.sgi.com; 
> > Feldman, Scott; Brandeburg, Jesse
> > Subject: Re: [PATCH] abysmal e1000 performance (DITR)
> > 
> > 
> > On Thu, 2004-08-26 at 16:26 -0400, Jeff Garzik wrote:
> > > Thayne Harbaugh wrote:
> > > > On Thu, 2004-08-26 at 13:55 -0400, jamal wrote:
> > > > 
> > > >>Ganesh,
> > > >>
> > > >>Can you please make this feature off by default and perhaps
> > > >>accesible via ethtool for peopel who want to turn it on.
> > > >>I just wasted a few hours and was bitten by this performance-wise.
> > > >>Please consider disabling it.
> > > > 
> > > > 
> > > > This is a *horrible* problem.  Even though it's fixable 
> > by passing a
> > > > module parameter, the default bites those that *know* 
> > about it.  We have
> > > > had customers bitten by this and customers that have insisted in
> > > > swapping all the NICs in a cluster to Broadcom TG3 NICs.
> > > > 
> > > > It's a black eye for Intel and a loss of business - 
> > that's the opinion
> > > > of our customers.
> > > 
> > > 
> > > If it's so bad we should disable it by default, either via 
> > the module 
> > > parameter or via a kernel CONFIG_xxx option.
> > 
> > Yes, it is so bad.  The dynamic interrupt setting should be deprecated
> > by the use of NAPI.
> > 
> > This is a simple way to disable it, yet still keep the code so that
> > someone can enable it if they really wanted it.  I, however, 
> > would just
> > as soon see all of the DITR code ripped out.
> > 
> > There are other ways that might be better for dealing with 
> > it, yet still
> > keeping the DITR code viable.
> > 
> > --- drivers/net/e1000/e1000_param.c.broken_ditr 2004-08-26 
> > 15:40:34.436456736 -0600
> > +++ drivers/net/e1000/e1000_param.c     2004-08-26 
> > 15:49:07.186506880 -0600
> > @@ -212,7 +212,7 @@
> >  #define MAX_TXABSDELAY            0xFFFF
> >  #define MIN_TXABSDELAY                 0
> >   
> > -#define DEFAULT_ITR                    1
> > +#define DEFAULT_ITR                 8000
> >  #define MAX_ITR                   100000
> >  #define MIN_ITR                      100
> >   
> > 
> > 
> > 
> > 

[-- Attachment #2: Attached message - abysmal e1000 performance (DITR) --]
[-- Type: message/rfc822, Size: 5433 bytes --]

From: Thayne Harbaugh <tharbaugh@lnxi.com>
To: linux-kernel@vger.kernel.org
Subject: abysmal e1000 performance (DITR)
Date: Tue, 25 May 2004 11:02:37 -0600
Message-ID: <1085504557.30156.59.camel@tubarao>

The DITR (Dynamic Interrupt Throttle Rate) introduced in the 5.x version
of the e1000 driver can limit performance to less than 50% of expected.

I have two machines with secondary e1000 NICs directly connected (no
switch).  I run a test using Netpipe
(http://www.scl.ameslab.gov/netpipe/):

flu2:~ # /tmp/NPtcp -h 10.0.0.1
Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:       1 bytes   4999 times -->      0.03 Mbps in     250.03 usec
  1:       2 bytes    399 times -->      0.06 Mbps in     250.02 usec
  2:       3 bytes    399 times -->      0.09 Mbps in     250.02 usec

(mostly uninteresting lines)

 70:   24573 bytes    121 times -->    380.39 Mbps in     492.85 usec
 71:   24576 bytes    135 times -->    380.43 Mbps in     492.86 usec
 72:   24579 bytes    135 times -->    380.49 Mbps in     492.84 usec
 73:   32765 bytes     67 times -->    341.32 Mbps in     732.39 usec
 74:   32768 bytes     68 times -->    341.37 Mbps in     732.35 usec
 75:   32771 bytes     68 times -->    341.41 Mbps in     732.33 usec
 76:   49149 bytes     68 times -->    437.02 Mbps in     858.04 usec
 77:   49152 bytes     77 times -->    451.39 Mbps in     830.77 usec
 78:   49155 bytes     80 times -->    499.57 Mbps in     750.69 usec

That's the best performance, but it drops back down.

 79:   65533 bytes     44 times -->    409.48 Mbps in    1221.00 usec
 80:   65536 bytes     40 times -->    409.42 Mbps in    1221.24 usec
 81:   65539 bytes     40 times -->    409.43 Mbps in    1221.28 usec

Not much different.

121: 8388605 bytes      3 times -->    379.88 Mbps in  168474.49 usec
122: 8388608 bytes      3 times -->    411.24 Mbps in  155625.68 usec
123: 8388611 bytes      3 times -->    395.81 Mbps in  161693.50 usec

And there's the end.

I would expect to see ~900 Mbps performance (in fact, a Broadcom tg3 NIC
in the same machine gives the expected ~900 Mbps performance).  The
older, 4.x e1000 series of drivers gives the ~900 Mbps performance as
expected.  I have traced the abysmal performance to the DITR code.  I
have added some output to the e1000_main.c:e1000_watchdog() section
where the Dynamic interrupt is calculated and set.  It's interesting to
note how the goc (good octet count) and the itr oscillate during the
netpipe run (ritr is the real ITR setting that is written to the e1000
ITR register):

goc(18=9+9) dif(0) ritr(1953) DITR = 2000
goc(0=0+0) dif(0) ritr(488) DITR = 8000
goc(44=22+22) dif(0) ritr(1953) DITR = 2000
goc(0=0+0) dif(0) ritr(488) DITR = 8000
goc(54=27+27) dif(0) ritr(1953) DITR = 2000

(many lines of oscillation and increased activity)

goc(0=0+0) dif(0) ritr(488) DITR = 8000
goc(10558=5299+5258) dif(41) ritr(1930) DITR = 2023
goc(0=0+0) dif(0) ritr(488) DITR = 8000
goc(9996=5228+4768) dif(459) ritr(1717) DITR = 2275
goc(0=0+0) dif(0) ritr(488) DITR = 8000
goc(11180=5378+5801) dif(422) ritr(1754) DITR = 2226
goc(0=0+0) dif(0) ritr(488) DITR = 8000
goc(10817=5304+5512) dif(208) ritr(1846) DITR = 2115


It is very interesting to note that if the 5.x driver is loaded with
InterruptThrottleRate=8000 (the default setting of the 4.x e1000 drivers
- which also disables dynamic adjustment of the ITR) then performance is
~900 Mbps:

119: 6291456 bytes      3 times -->    891.33 Mbps in   53851.83 usec
120: 6291459 bytes      3 times -->    891.34 Mbps in   53851.35 usec
121: 8388605 bytes      3 times -->    895.99 Mbps in   71429.49 usec
122: 8388608 bytes      3 times -->    881.12 Mbps in   72634.50 usec
123: 8388611 bytes      3 times -->    885.65 Mbps in   72263.48 usec

My assessment for the poor performance using is DITR is that this
reduces load on the box by limiting interrupts while increasing latency
to service the packets.  The problem is that this is done irrespective
of the actual load on the system and thus results in gratuitous latency
being added.  In other words: why limit the interrupts and reduce the
load on the system when the system isn't loaded and has nothing better
to do?  This kills performance on systems that have plenty of horsepower
to handle their load as well as service interrupts.

I recommend that the DITR formula should use the system load to scale
the 6000/2000 split, and/or that the 8000 ITR setting be the default and
the dynamic setting of ITR=1 be the option.

--- linux/drivers/net/e1000/e1000_param.c.orig	2004-05-25
18:05:10.000000000 -0700
+++ linux/drivers/net/e1000/e1000_param.c	2004-05-25 18:05:26.000000000
-0700
@@ -224,7 +224,7 @@
 #define MAX_TXABSDELAY            0xFFFF
 #define MIN_TXABSDELAY                 0
 
-#define DEFAULT_ITR                    1
+#define DEFAULT_ITR                 8000
 #define MAX_ITR                   100000
 #define MIN_ITR                      100
 


-- 
Thayne Harbaugh
Linux Networx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] abysmal e1000 performance (DITR)
@ 2004-08-27 22:41 Ronciak, John
  2004-08-30 17:06 ` Thayne Harbaugh
  0 siblings, 1 reply; 12+ messages in thread
From: Ronciak, John @ 2004-08-27 22:41 UTC (permalink / raw)
  To: tharbaugh
  Cc: Jeff Garzik, hadi, Venkatesan, Ganesh, netdev, Feldman, Scott,
	Brandeburg, Jesse

Thayne,

I can't speak to what happened previously, only what's going on now.
You need to be careful about basing any tuning on any one single test.
For each one of these test you find that have performance problems I can
show you where it works just fine and most cases works better.  So let
be careful about making judgements like this.

We are going to look at this and make some changes based on what we
find.  I already said that we probably won't be ripping out the DITR as
some people really make a lot of use out of it.  It may be off by
default or have some default setting which doesn't hurt test cases.
Since Netpipe is a relatively unused test for performance (not really
it's purpose, it was developed to find "holes" in internet packets
lengths), it's not part of our normal testing.

Like I said, we are looking at it and will come up with a more robust
solution.

Cheers,
John


> -----Original Message-----
> From: Thayne Harbaugh [mailto:tharbaugh@lnxi.com] 
> Sent: Friday, August 27, 2004 3:05 PM
> To: Ronciak, John
> Cc: Jeff Garzik; hadi@cyberus.ca; Venkatesan, Ganesh; 
> netdev@oss.sgi.com; Feldman, Scott; Brandeburg, Jesse
> Subject: RE: [PATCH] abysmal e1000 performance (DITR)
> 
> 
> On Fri, 2004-08-27 at 14:49 -0700, Ronciak, John wrote:
> > Jamal, Thayne,
> > 
> > I've asked Jeff to go ahead and apply this patch as a way 
> around this
> > for now.  We would liketo see the DITR stay but now have this
> > performacne problem so we don't want to rip it out.  We do 
> however need
> > a test case to replicate this as we have not been seeing it in our
> > testing.  Please get us those case that break things.  We'll have a
> > better solution longer term based on the test cases (as 
> well as the ones
> > we normally use of course).
> 
> Attached is an email that I sent out 2004 May 25.  The email is very
> detailed about what the problem is, how to test it, and an initial
> patch.  The email was sent to several addresses at Intel.  I 
> later sent
> it to lkml and netdev.
> 
> There was almost no response at the time.  The best reply that I
> received from Intel was that the DITR was put in and calibrated
> according to a marketing benchmark program and that it wouldn't be
> changed even though tests (and customers) showed that the performance
> was *abysmal*.  It's also a big question what role DITR plays when it
> seems deprecated by NAPI.
> 
> The response was disappointing.  I'm now wondering why this has now
> become interesting and the thread has resumed.  What's 
> changed?  How did
> this catch someone's attention?  What can I do next time 
> (hopefully that
> won't happen) so that people *do* take interest.
> 
> > 
> > > -----Original Message-----
> > > From: netdev-bounce@oss.sgi.com 
> > > [mailto:netdev-bounce@oss.sgi.com] On Behalf Of Thayne Harbaugh
> > > Sent: Thursday, August 26, 2004 2:29 PM
> > > To: Jeff Garzik
> > > Cc: hadi@cyberus.ca; Venkatesan, Ganesh; netdev@oss.sgi.com; 
> > > Feldman, Scott; Brandeburg, Jesse
> > > Subject: Re: [PATCH] abysmal e1000 performance (DITR)
> > > 
> > > 
> > > On Thu, 2004-08-26 at 16:26 -0400, Jeff Garzik wrote:
> > > > Thayne Harbaugh wrote:
> > > > > On Thu, 2004-08-26 at 13:55 -0400, jamal wrote:
> > > > > 
> > > > >>Ganesh,
> > > > >>
> > > > >>Can you please make this feature off by default and perhaps
> > > > >>accesible via ethtool for peopel who want to turn it on.
> > > > >>I just wasted a few hours and was bitten by this 
> performance-wise.
> > > > >>Please consider disabling it.
> > > > > 
> > > > > 
> > > > > This is a *horrible* problem.  Even though it's fixable 
> > > by passing a
> > > > > module parameter, the default bites those that *know* 
> > > about it.  We have
> > > > > had customers bitten by this and customers that have 
> insisted in
> > > > > swapping all the NICs in a cluster to Broadcom TG3 NICs.
> > > > > 
> > > > > It's a black eye for Intel and a loss of business - 
> > > that's the opinion
> > > > > of our customers.
> > > > 
> > > > 
> > > > If it's so bad we should disable it by default, either via 
> > > the module 
> > > > parameter or via a kernel CONFIG_xxx option.
> > > 
> > > Yes, it is so bad.  The dynamic interrupt setting should 
> be deprecated
> > > by the use of NAPI.
> > > 
> > > This is a simple way to disable it, yet still keep the 
> code so that
> > > someone can enable it if they really wanted it.  I, however, 
> > > would just
> > > as soon see all of the DITR code ripped out.
> > > 
> > > There are other ways that might be better for dealing with 
> > > it, yet still
> > > keeping the DITR code viable.
> > > 
> > > --- drivers/net/e1000/e1000_param.c.broken_ditr 2004-08-26 
> > > 15:40:34.436456736 -0600
> > > +++ drivers/net/e1000/e1000_param.c     2004-08-26 
> > > 15:49:07.186506880 -0600
> > > @@ -212,7 +212,7 @@
> > >  #define MAX_TXABSDELAY            0xFFFF
> > >  #define MIN_TXABSDELAY                 0
> > >   
> > > -#define DEFAULT_ITR                    1
> > > +#define DEFAULT_ITR                 8000
> > >  #define MAX_ITR                   100000
> > >  #define MIN_ITR                      100
> > >   
> > > 
> > > 
> > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] abysmal e1000 performance (DITR)
  2004-08-26 21:28       ` Thayne Harbaugh
@ 2004-08-28 22:58         ` Jeff Garzik
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Garzik @ 2004-08-28 22:58 UTC (permalink / raw)
  To: tharbaugh
  Cc: hadi, Venkatesan, Ganesh, netdev, Feldman, Scott,
	Brandeburg, Jesse

Thayne Harbaugh wrote:
> --- drivers/net/e1000/e1000_param.c.broken_ditr 2004-08-26 15:40:34.436456736 -0600
> +++ drivers/net/e1000/e1000_param.c     2004-08-26 15:49:07.186506880 -0600
> @@ -212,7 +212,7 @@
>  #define MAX_TXABSDELAY            0xFFFF
>  #define MIN_TXABSDELAY                 0
>   
> -#define DEFAULT_ITR                    1
> +#define DEFAULT_ITR                 8000
>  #define MAX_ITR                   100000
>  #define MIN_ITR                      100

applied

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] abysmal e1000 performance (DITR)
  2004-08-27 22:41 Ronciak, John
@ 2004-08-30 17:06 ` Thayne Harbaugh
  0 siblings, 0 replies; 12+ messages in thread
From: Thayne Harbaugh @ 2004-08-30 17:06 UTC (permalink / raw)
  To: Ronciak, John
  Cc: Jeff Garzik, hadi, Venkatesan, Ganesh, netdev, Feldman, Scott,
	Brandeburg, Jesse

On Fri, 2004-08-27 at 15:41 -0700, Ronciak, John wrote:
> Thayne,
> 
> I can't speak to what happened previously, only what's going on now.
> You need to be careful about basing any tuning on any one single test.
> For each one of these test you find that have performance problems I can
> show you where it works just fine and most cases works better.  So let
> be careful about making judgements like this.

I would agree with you if the problem was only apparent with a
benchmark.  The incident that started the investigation was due to real-
world workloads on a cluster - a newer cluster that should have been
significantly faster, but was many times slower than an older cluster.
That is why customers can't use the e1000 with the stock 5.x e1000
driver (or without changing the DITR value as a module option).  When
the default setup is anywhere from 10 to 30 percent of the expected
performance then it becomes very difficult to justify shipping a NIC
that will likely be incorrectly configured when the driver or kernel is
changed - too many support problems.

I see that as a wise judgment based on the real usage patterns of
customers - not a knee-jerk reaction based on a simplistic benchmark.
At the time that I investigated this problem I looked at other
benchmarks that *didn't* exhibit this problem.  The question was why our
customers had this problem but it didn't show up when we tested the
cluster.

It's a perfect example of the classic problem: benchmarks aren't always
representative of real-world loads.  This is a case where a real-world
load exposed a deficiency of the driver - a deficiency that wasn't
exposed by benchmarks but becomes quite obvious when the driver is
scrutinized and understood.

> We are going to look at this and make some changes based on what we
> find.

Wonderful.

>   I already said that we probably won't be ripping out the DITR as
> some people really make a lot of use out of it.

That's what I'm hoping you can help me with: why is DITR so useful?  I
just can't see why:

NAPI does the same thing and can work with *any* card (provided the
driver is written correctly).  I'd much rather deal with something
universal than have to deal with something unique to a card.

The DITR algorithm doesn't consider packet backlog nor system load.  Why
reduce the interrupt rate when the load is low?  When the interrupt rate
is reduced when the load is low then unnecessary latency is added.  The
DITR may work fine with an old system with an e1000, but a dual Opteron
with an e1000 has plenty of horsepower to perform calculations *and*
push packets.  For DITR to correctly work, it needs to consider much
more than the number of packets received and transmitted - the current
algorithm is overly simplistic (it also doesn't correctly consider
symmetric vs asymmetric receive/transmit loads).

Please provide details so that I can become one of those that can ". . .
make a lot of use out of [DITR]."

> It may be off by
> default or have some default setting which doesn't hurt test cases.

What about settings that don't hurt real-world fluid dynamics
calculations?  I couldn't care less about benchmarks that aren't
representative of the real-world load.  The Netpipe program was used
because it reasonably resembles the traffic flow of real-world cases
that didn't work with DITR.  Netpipe is much easier to provide and
configure than a fluid dynamics cluster program that would necessitate
licensing and other infrastructure for use as a testing tool.

> Since Netpipe is a relatively unused test for performance (not really
> it's purpose, it was developed to find "holes" in internet packets
> lengths), it's not part of our normal testing.

Well, maybe I can interest you in buying a small cluster with some fluid
dynamics software that has packet flow similar to what Netpipe produces?

> Like I said, we are looking at it and will come up with a more robust
> solution.

Wonderful.  I hope you can find a good solution.  I do prefer being able
to use all those e1000 NICs rather than replace them.  Yes, really.  I
significantly prefer using good products that perform as expected than
spending my time whining, complaining and carrying on.  Yes, really, I
prefer to have happy customers, happy vendors and be happy myself.  I
only bring this up because I care and I hope someone else cares, and I
want everyone to get back to being happy.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2004-08-30 17:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-04 15:44 [PATCH] abysmal e1000 performance (DITR) Venkatesan, Ganesh
2004-08-26 17:55 ` jamal
2004-08-26 20:04   ` Thayne Harbaugh
2004-08-26 20:26     ` Jeff Garzik
2004-08-26 21:28       ` Thayne Harbaugh
2004-08-28 22:58         ` Jeff Garzik
  -- strict thread matches above, loose matches on Subject: below --
2004-08-27 22:41 Ronciak, John
2004-08-30 17:06 ` Thayne Harbaugh
2004-08-27 21:49 Ronciak, John
2004-08-27 22:05 ` Thayne Harbaugh
2004-06-01 23:05 Thayne Harbaugh
2004-06-04 14:20 ` Thayne Harbaugh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).