public inbox for linux-kernel@vger.kernel.org
* NAPI and tg3
@ 2003-01-04 15:23 Steffen Persvold
  2003-01-06 15:00 ` Steffen Persvold
  0 siblings, 1 reply; 12+ messages in thread
From: Steffen Persvold @ 2003-01-04 15:23 UTC (permalink / raw)
  To: David S. Miller, Jeff Garzik, linux-kernel

Hi guys,

I have access to 8 Dell 2650s with onboard Broadcom BCM5701 chips. They 
are equipped with dual 2.4 GHz Xeon processors and 1GB of RAM. I'm running
RedHat 7.3, but with a stock 2.4.20 kernel.

As I understand it, the tg3 driver uses NAPI (dev->poll) on the 2.4.20
kernel. I've been experiencing bad performance (low bandwidth) with
cluster applications, for example ones running with LAM. The problem
also manifests itself if you run two bandwidth-hungry applications in
parallel on two machines (i.e. two processes on each machine, one per
processor) using GbE.

I disabled NAPI mode and went back to the old interrupt-driven method,
and this works much better (i.e. the bandwidth is now evenly distributed
between the two applications).

What could be the cause of this problem? Is it NAPI itself (doing RX
under scheduler control) or is it something else (for example lock
contention)?

Any ideas ?

Thanks,
-- 
  Steffen Persvold   |       Scali AS      
 mailto:sp@scali.com |  http://www.scali.com
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY




* Re: NAPI and tg3
  2003-01-04 15:23 Steffen Persvold
@ 2003-01-06 15:00 ` Steffen Persvold
  2003-01-06 16:36   ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Steffen Persvold @ 2003-01-06 15:00 UTC (permalink / raw)
  To: David S. Miller, Jeff Garzik, linux-kernel

On Sat, 4 Jan 2003, Steffen Persvold wrote:

> Hi guys,
> 
> I have access to 8 Dell 2650s with onboard Broadcom BCM5701 chips. They 
> are equipped with dual 2.4 GHz Xeon processors and 1GB of RAM. I'm running
> RedHat 7.3, but with a stock 2.4.20 kernel.
> 
> As I understand it, the tg3 driver uses NAPI (dev->poll) on the 2.4.20
> kernel. I've been experiencing bad performance (low bandwidth) with
> cluster applications, for example ones running with LAM. The problem
> also manifests itself if you run two bandwidth-hungry applications in
> parallel on two machines (i.e. two processes on each machine, one per
> processor) using GbE.
> 
> I disabled NAPI mode and went back to the old interrupt-driven method,
> and this works much better (i.e. the bandwidth is now evenly distributed
> between the two applications).
> 
> What could be the cause of this problem? Is it NAPI itself (doing RX
> under scheduler control) or is it something else (for example lock
> contention)?
> 
> Any ideas ?

Hi again,

I discovered that if I renice the ksoftirqd processes to nice level 0,
the performance was actually better with the NAPI-enabled driver than
with the one without NAPI (as was intended by NAPI, IIRC). With the
default nice level (19) on the ksoftirqd processes, the performance of
multithreaded programs was pretty lousy with the NAPI-enabled driver.

Any reason why ksoftirqd shouldn't be at nice level 0 by default? Is
this already fixed in the 2.4.21-pre series?
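Renicing can be done by hand with renice(8) as root. For scripting the same change, a minimal C sketch follows. It assumes the thread names start with "ksoftirqd" (ksoftirqd_CPU0 on 2.4, as in the ps output later in this thread) and that the command name appears in parentheses in /proc/<pid>/stat. The helper names are hypothetical, and setpriority(2) needs root to lower a nice value.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Extract the command name from a /proc/<pid>/stat line such as
 * "3 (ksoftirqd_CPU0) S ...": it sits between the first '(' and the
 * last ')'. Returns false if the line does not have that shape. */
bool parse_stat_comm(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');
    if (!open || !close || close <= open || outlen == 0)
        return false;
    size_t len = (size_t)(close - open - 1);
    if (len >= outlen)
        len = outlen - 1;              /* truncate rather than overflow */
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return true;
}

/* Assumed naming: "ksoftirqd_CPUn" on 2.4; a prefix match also
 * accepts other "ksoftirqd..." spellings. */
bool is_ksoftirqd_comm(const char *comm)
{
    return strncmp(comm, "ksoftirqd", sizeof("ksoftirqd") - 1) == 0;
}

/* Set every ksoftirqd thread to nice_value (root required).
 * Returns how many threads were changed, or -1 if /proc is unreadable. */
int renice_ksoftirqd(int nice_value)
{
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;
    int changed = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                  /* not a pid directory */
        char path[288], line[512], comm[64];
        snprintf(path, sizeof path, "/proc/%s/stat", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                  /* process exited meanwhile */
        bool ok = fgets(line, sizeof line, f) != NULL;
        fclose(f);
        if (!ok || !parse_stat_comm(line, comm, sizeof comm) ||
            !is_ksoftirqd_comm(comm))
            continue;
        pid_t pid = (pid_t)atoi(de->d_name);
        if (setpriority(PRIO_PROCESS, (id_t)pid, nice_value) == 0)
            changed++;
    }
    closedir(proc);
    return changed;
}
```

Calling renice_ksoftirqd(0) as root corresponds to the nice-0 experiment described above; whether 0 or a negative value suits a given box depends on the workload.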

Regards,
 -- 
  Steffen Persvold   |       Scali AS      
 mailto:sp@scali.com |  http://www.scali.com
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY



* Re: NAPI and tg3
  2003-01-06 16:36   ` Alan Cox
@ 2003-01-06 16:12     ` Steffen Persvold
  2003-01-06 17:58       ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Steffen Persvold @ 2003-01-06 16:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: David S. Miller, Jeff Garzik, Linux Kernel Mailing List

On 6 Jan 2003, Alan Cox wrote:

> On Mon, 2003-01-06 at 15:00, Steffen Persvold wrote:
> > I discovered that if I renice the ksoftirqd processes to nice level 0,
> > the performance was actually better with the NAPI-enabled driver than
> > with the one without NAPI (as was intended by NAPI, IIRC). With the
> > default nice level (19) on the ksoftirqd processes, the performance of
> > multithreaded programs was pretty lousy with the NAPI-enabled driver.
> > 
> > Any reason why ksoftirqd shouldn't be at nice level 0 by default? Is
> > this already fixed in the 2.4.21-pre series?
> 
> Hack the code to only fall back to ksoftirqd when there are, say, 10 rather
> than 1 pending events, and it should perform even better but still handle
> overload properly.
> 

Ok, I can try that, but what about the nice level of ksoftirqd? Any
specific reason for it being 19 (lowest priority) and not 0 (equal to
most other processes in the system)?

Regards,
 -- 
  Steffen Persvold   |       Scali AS      
 mailto:sp@scali.com |  http://www.scali.com
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY



* Re: NAPI and tg3
  2003-01-06 15:00 ` Steffen Persvold
@ 2003-01-06 16:36   ` Alan Cox
  2003-01-06 16:12     ` Steffen Persvold
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2003-01-06 16:36 UTC (permalink / raw)
  To: Steffen Persvold; +Cc: David S. Miller, Jeff Garzik, Linux Kernel Mailing List

On Mon, 2003-01-06 at 15:00, Steffen Persvold wrote:
> I discovered that if I renice the ksoftirqd processes to nice level 0,
> the performance was actually better with the NAPI-enabled driver than
> with the one without NAPI (as was intended by NAPI, IIRC). With the
> default nice level (19) on the ksoftirqd processes, the performance of
> multithreaded programs was pretty lousy with the NAPI-enabled driver.
> 
> Any reason why ksoftirqd shouldn't be at nice level 0 by default? Is
> this already fixed in the 2.4.21-pre series?

Hack the code to only fall back to ksoftirqd when there are, say, 10 rather
than 1 pending events, and it should perform even better but still handle
overload properly.



* Re: NAPI and tg3
  2003-01-06 16:12     ` Steffen Persvold
@ 2003-01-06 17:58       ` Alan Cox
  2003-01-07 15:24         ` Steffen Persvold
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2003-01-06 17:58 UTC (permalink / raw)
  To: Steffen Persvold; +Cc: David S. Miller, Jeff Garzik, Linux Kernel Mailing List

> Ok, I can try that, but what about the nice level of ksoftirqd? Any
> specific reason for it being 19 (lowest priority) and not 0 (equal to
> most other processes in the system)?

It's triggered (in theory, but not in practice) only when we are overloaded,
in which case we want to do other *useful* work first rather than using all
the CPU to process requests we can't fulfill.



* Re: NAPI and tg3
  2003-01-06 17:58       ` Alan Cox
@ 2003-01-07 15:24         ` Steffen Persvold
  2003-01-07 18:51           ` Robert Olsson
  0 siblings, 1 reply; 12+ messages in thread
From: Steffen Persvold @ 2003-01-07 15:24 UTC (permalink / raw)
  To: Alan Cox; +Cc: David S. Miller, Jeff Garzik, Linux Kernel Mailing List

On 6 Jan 2003, Alan Cox wrote:

> > Ok, I can try that, but what about the nice level of ksoftirqd? Any
> > specific reason for it being 19 (lowest priority) and not 0 (equal to
> > most other processes in the system)?
> 
> It's triggered (in theory, but not in practice) only when we are overloaded,
> in which case we want to do other *useful* work first rather than using all
> the CPU to process requests we can't fulfill.
> 

I've also tried the NAPI patch for e1000, and it experiences the same
performance problem with multithreaded apps. The "NAPI-HOWTO" doesn't
mention that this could be an issue at all. Do any of the NAPI authors
(Jeff?) have any comments?

Regards,
-- 
  Steffen Persvold   |       Scali AS      
 mailto:sp@scali.com |  http://www.scali.com
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY




* Re: NAPI and tg3
  2003-01-07 15:24         ` Steffen Persvold
@ 2003-01-07 18:51           ` Robert Olsson
  2003-01-07 20:54             ` Steffen Persvold
  0 siblings, 1 reply; 12+ messages in thread
From: Robert Olsson @ 2003-01-07 18:51 UTC (permalink / raw)
  To: Steffen Persvold
  Cc: Alan Cox, David S. Miller, Jeff Garzik, Linux Kernel Mailing List


Steffen Persvold writes:
 > 
 > I've also tried the NAPI patch for e1000, and it experiences the same
 > performance problem with multithreaded apps. The "NAPI-HOWTO" doesn't
 > mention that this could be an issue at all. Do any of the NAPI authors
 > (Jeff?) have any comments?

 Well, wasn't ksoftirqd the general solution for scheduling softirqs to run
 before the next interrupt? By putting them under scheduler control,
 consecutive softirqs are prevented from monopolizing the CPU.

 Well, you're right, the doc could mention this...

 Cheers.
						--ro


* Re: NAPI and tg3
  2003-01-07 18:51           ` Robert Olsson
@ 2003-01-07 20:54             ` Steffen Persvold
  0 siblings, 0 replies; 12+ messages in thread
From: Steffen Persvold @ 2003-01-07 20:54 UTC (permalink / raw)
  To: Robert Olsson
  Cc: Alan Cox, David S. Miller, Jeff Garzik, Linux Kernel Mailing List

On Tue, 7 Jan 2003, Robert Olsson wrote:

> 
> Steffen Persvold writes:
>  > 
>  > I've also tried the NAPI patch for e1000, and it experiences the same
>  > performance problem with multithreaded apps. The "NAPI-HOWTO" doesn't
>  > mention that this could be an issue at all. Do any of the NAPI authors
>  > (Jeff?) have any comments?
> 
>  Well, wasn't ksoftirqd the general solution for scheduling softirqs to run
>  before the next interrupt? By putting them under scheduler control,
>  consecutive softirqs are prevented from monopolizing the CPU.
>
>  Well, you're right, the doc could mention this...

True, but it doesn't say that if you have two applications loaded on
an SMP box, one which is, for example, constantly receiving and sending
data from/to the network and doing computations on the data (100 % CPU),
while another app is only doing computations (also 100 % CPU), then
ksoftirqd, which should receive packets and refill the TX and RX rings,
will be put last in the queue because of its low priority (nice 19). Thus
the network-dependent application gets much lower performance than
what could be achieved with a nice level of 0, or even with the
interrupt-based mechanism. A nice level of 0 on ksoftirqd is still a
heck of a lot better than interrupt context, isn't it?

One simple example would be to run a network throughput application
such as NetPIPE and simultaneously start something like the McCalpin
STREAM test. You would notice that with a nice level of 19 (on ksoftirqd)
the NetPIPE application gets very low throughput, while the STREAM
application is as fast as it can get. With a nice level of 0, the
NetPIPE application gets decent throughput, and the STREAM application
still produces the same result.
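For the competing load, the core of McCalpin's STREAM "triad" kernel is a single vector loop. The sketch below is only a stand-in for that loop (the real benchmark has four kernels, Copy, Scale, Add and Triad, and times them), and triad_load is a hypothetical driver for keeping one CPU busy while NetPIPE runs on the other.

```c
#include <assert.h>
#include <stddef.h>

/* STREAM-style "triad" kernel: a[i] = b[i] + q * c[i].
 * Over arrays much larger than the caches, it keeps one CPU and the
 * memory bus busy. Only the inner loop; no timing or MB/s reporting. */
void triad(double *a, const double *b, const double *c, double q, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Hypothetical driver: repeat the kernel to make a sustained load. */
void triad_load(double *a, const double *b, const double *c, double q,
                size_t n, long reps)
{
    for (long r = 0; r < reps; r++)
        triad(a, b, c, q, n);
}
```

Array sizes should be well beyond the cache size so the loop stays memory-bound; while it runs, compare NetPIPE throughput with ksoftirqd at nice 19 and at nice 0.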

Regards,
 -- 
  Steffen Persvold   |       Scali AS      
 mailto:sp@scali.com |  http://www.scali.com
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY



* Re: NAPI and tg3
@ 2003-01-07 22:21 Robert Olsson
  2003-01-08  0:07 ` Steffen Persvold
  0 siblings, 1 reply; 12+ messages in thread
From: Robert Olsson @ 2003-01-07 22:21 UTC (permalink / raw)
  To: Steffen Persvold
  Cc: Robert Olsson, Alan Cox, David S. Miller, Jeff Garzik,
	Linux Kernel Mailing List


Steffen Persvold writes:

 > True, but it doesn't say that if you have two applications loaded on
 > an SMP box, one which is, for example, constantly receiving and sending
 > data from/to the network and doing computations on the data (100 % CPU),
 > while another app is only doing computations (also 100 % CPU), then
 > ksoftirqd, which should receive packets and refill the TX and RX rings,
 > will be put last in the queue because of its low priority (nice 19). Thus
 > the network-dependent application gets much lower performance than
 > what could be achieved with a nice level of 0, or even with the
 > interrupt-based mechanism. A nice level of 0 on ksoftirqd is still a
 > heck of a lot better than interrupt context, isn't it?


 Yes, my test/production scripts have even been setting ksoftirqd to nice
 -19 for just that reason, so I had almost forgotten this issue; I'm happy
 you brought it up. dev->poll is not the only user of ksoftirqd, but for
 heavy networking it gets pretty dominant. So we add something to
 NAPI_HOWTO and pass the question about ksoftirqd's default priority on to others.

From a GIGE router in production.

USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
root         3  0.2  0.0     0     0  ?  RWN Aug 15 602:00 (ksoftirqd_CPU0)
root       232  0.0  7.9 41400 40884  ?  S   Aug 15  74:12 gated 
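To put the two ps lines in numbers, here is a small hypothetical C helper. It assumes the TIME column is cumulative CPU time in minutes:seconds, which is how BSD-style ps output usually shows it.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a ps TIME field assumed to be "minutes:seconds" (e.g. "602:00")
 * into seconds. Returns -1 for anything that does not match that shape. */
long cputime_seconds(const char *field)
{
    long min, sec;
    char extra;

    assert(field != NULL);
    if (sscanf(field, "%ld:%ld%c", &min, &sec, &extra) != 2)
        return -1;  /* missing colon, non-numeric, or trailing characters */
    if (min < 0 || sec < 0 || sec > 59)
        return -1;
    return min * 60 + sec;
}
```

If the format assumption holds, ksoftirqd_CPU0 has used 36120 s of CPU against 4452 s for gated, roughly eight times as much.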

Cheers.
						--ro


* Re: NAPI and tg3
  2003-01-07 22:21 NAPI and tg3 Robert Olsson
@ 2003-01-08  0:07 ` Steffen Persvold
  2003-01-09 17:21   ` Robert Olsson
  0 siblings, 1 reply; 12+ messages in thread
From: Steffen Persvold @ 2003-01-08  0:07 UTC (permalink / raw)
  To: Robert Olsson
  Cc: Alan Cox, David S. Miller, Jeff Garzik, Linux Kernel Mailing List

On Tue, 7 Jan 2003, Robert Olsson wrote:

> 
> Steffen Persvold writes:
> 
>  > True, but it doesn't say that if you have two applications loaded on
>  > an SMP box, one which is, for example, constantly receiving and sending
>  > data from/to the network and doing computations on the data (100 % CPU),
>  > while another app is only doing computations (also 100 % CPU), then
>  > ksoftirqd, which should receive packets and refill the TX and RX rings,
>  > will be put last in the queue because of its low priority (nice 19). Thus
>  > the network-dependent application gets much lower performance than
>  > what could be achieved with a nice level of 0, or even with the
>  > interrupt-based mechanism. A nice level of 0 on ksoftirqd is still a
>  > heck of a lot better than interrupt context, isn't it?
> 
> 
>  Yes, my test/production scripts have even been setting ksoftirqd to nice
>  -19 for just that reason, so I had almost forgotten this issue; I'm happy
>  you brought it up. dev->poll is not the only user of ksoftirqd, but for
>  heavy networking it gets pretty dominant. So we add something to
>  NAPI_HOWTO and pass the question about ksoftirqd's default priority on to others.
> 
> From a GIGE router in production.
> 
> USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
> root         3  0.2  0.0     0     0  ?  RWN Aug 15 602:00 (ksoftirqd_CPU0)
> root       232  0.0  7.9 41400 40884  ?  S   Aug 15  74:12 gated 
> 

I'm happy that at least someone can agree on something these days. Looking
at the latest discussions regarding binary-only drivers and the GPL could
make one think that all kernel developers do is argue about who is
right (all right, most of the quarrelsome people aren't really kernel
developers) ;) So, who makes the decision regarding ksoftirqd, and
when?


Best regards,
 -- 
  Steffen Persvold   |       Scali AS      
 mailto:sp@scali.com |  http://www.scali.com
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY



* Re: NAPI and tg3
  2003-01-08  0:07 ` Steffen Persvold
@ 2003-01-09 17:21   ` Robert Olsson
  2003-01-10  9:00     ` David S. Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Robert Olsson @ 2003-01-09 17:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Steffen Persvold, Alan Cox, Robert Olsson, Jeff Garzik,
	Linux Kernel Mailing List


Before it gets forgotten...

Cheers.
						--ro


--- NAPI_HOWTO.txt.orig	2002-12-24 06:20:31.000000000 +0100
+++ NAPI_HOWTO.txt	2003-01-09 13:25:30.000000000 +0100
@@ -721,6 +721,23 @@
 
 
 
+
+APPENDIX 3: Scheduling issues.
+==============================
+As seen, NAPI moves processing to softirq level. Linux uses ksoftirqd as the
+general solution to schedule softirqs to run before the next interrupt, by putting
+them under scheduler control. This also prevents consecutive softirqs from
+monopolizing the CPU. It also has the effect that the priority of ksoftirqd needs
+to be considered when running very CPU-intensive applications and networking, to
+get the proper softirq/user balance. Increasing ksoftirqd priority to 0
+(or possibly higher) is reported to cure problems with low network performance at
+high CPU load.
+
+Most used processes in a GIGE router:
+USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
+root         3  0.2  0.0     0     0  ?  RWN Aug 15 602:00 (ksoftirqd_CPU0)
+root       232  0.0  7.9 41400 40884  ?  S   Aug 15  74:12 gated 
+
 --------------------------------------------------------------------
 
 relevant sites:


* Re: NAPI and tg3
  2003-01-09 17:21   ` Robert Olsson
@ 2003-01-10  9:00     ` David S. Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David S. Miller @ 2003-01-10  9:00 UTC (permalink / raw)
  To: Robert.Olsson; +Cc: sp, alan, jgarzik, linux-kernel

   From: Robert Olsson <Robert.Olsson@data.slu.se>
   Date: Thu, 9 Jan 2003 18:21:00 +0100

   Before it gets forgotten...

Applied, thanks.

