public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: Gigabit/SMP performance problem
@ 2003-01-06 20:29 Avery Fay
  2003-01-06 21:23 ` Martin J. Bligh
  0 siblings, 1 reply; 25+ messages in thread
From: Avery Fay @ 2003-01-06 20:29 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

Well, judging by the fact that a UP kernel can route more traffic (and 
consequently handle more interrupts per second) than an SMP kernel, I think 
that one cpu can probably handle all of the interrupts. Really, the issue 
I'm trying to solve is not routing performance per se, but the fact that 
SMP routing performance is worse while using twice the cpu time (2 cpus at 
around 95% vs. 1 at around 95%).

Avery Fay





"Martin J. Bligh" <mbligh@aracnet.com>
01/03/2003 04:36 PM

 
        To:     Avery Fay <avery_fay@symantec.com>
        cc:     linux-kernel@vger.kernel.org
        Subject:        Re: Gigabit/SMP performance problem


P3's distributed interrupts round-robin amongst cpus. P4's send 
everything to CPU 0. If you put irq_balance on, it'll spread
them around, but any given interrupt is still only handled by
one CPU (as far as I understand the code). If you hammer one
adaptor, does that generate more interrupts than 1 cpu can handle?
(turn irq balance off by sticking a return at the top of balance_irq,
and hammer one link, see how much CPU power that burns).
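As a concrete illustration of the check Martin suggests, the per-CPU interrupt load for one adapter can be estimated by diffing two samples of /proc/interrupts taken a second apart. A rough sketch; the eth0 counts below are made-up sample values, and on a live box you would grep the real file twice:

```shell
# Two hypothetical samples of the eth0 line from /proc/interrupts,
# taken one second apart (columns 2 and 3 are the CPU0 and CPU1 counts).
sample1="48:    2753990    2753821   IO-APIC-level  eth0"
sample2="48:    2788990    2788821   IO-APIC-level  eth0"

# Interrupts/s seen by each CPU in that one-second window.
cpu0_rate=$(( $(echo "$sample2" | awk '{print $2}') - $(echo "$sample1" | awk '{print $2}') ))
cpu1_rate=$(( $(echo "$sample2" | awk '{print $3}') - $(echo "$sample1" | awk '{print $3}') ))
echo "eth0: cpu0=${cpu0_rate}/s cpu1=${cpu1_rate}/s"   # eth0: cpu0=35000/s cpu1=35000/s
```

If one NIC alone drives a rate close to what a single CPU can service, that supports the "more interrupts than 1 cpu can handle" theory.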

M.




^ permalink raw reply	[flat|nested] 25+ messages in thread
* RE: Gigabit/SMP performance problem
@ 2003-01-08 21:44 Ronciak, John
  2003-01-09 12:49 ` Robert Olsson
  0 siblings, 1 reply; 25+ messages in thread
From: Ronciak, John @ 2003-01-08 21:44 UTC (permalink / raw)
  To: 'Robert Olsson', Avery Fay; +Cc: Anton Blanchard, linux-kernel

All,

We (Intel - LAN Access Division, e1000 driver) are taking a look at what is
going on here.  We don't have any data yet but we'll keep you posted on what
we find.

Thanks for your patience.

Cheers,
John



> -----Original Message-----
> From: Robert Olsson [mailto:Robert.Olsson@data.slu.se]
> Sent: Tuesday, January 07, 2003 10:16 AM
> To: Avery Fay
> Cc: Anton Blanchard; linux-kernel@vger.kernel.org
> Subject: Re: Gigabit/SMP performance problem
> 
> 
> 
> Avery Fay writes:
>  > Hmm. That paper is actually very interesting. I'm thinking maybe with
>  > the P4 I'm better off with only 1 cpu. WRT hyperthreading, I actually
>  > disabled it because it made performance worse (wasn't clear in the
>  > original email).
> 
> 
>  With 1 CPU, SMP+HT, I'm at UP-level performance, forwarding two single
>  flows evenly distributed between the CPUs. So HT paid the SMP cost, so
>  to speak.
>  
>  Also, I tested the MB bandwidth with the new threaded version of
>  pktgen: just TX'ing packets on 6 GigE ports, I'm seeing almost
>  6 Gbit/s TX'ed with 1500-byte packets.
> 
>  I have problems populating all slots with GigE NICs. WoL (Wake on LAN)
>  is a real pain... It seems my adapters each need a standby current of
>  0.8 A, and most power supplies give 2.0 A for this. (Numbers come from
>  SuperMicro.) So booting fails randomly. You have 8 NICs -- didn't you
>  have problems?
> 
>  Anyway, I'd guess profiling is needed?
> 
>  Cheers.
> 						--ro

^ permalink raw reply	[flat|nested] 25+ messages in thread
* RE: Gigabit/SMP performance problem
@ 2003-01-08 21:12 Feldman, Scott
  0 siblings, 0 replies; 25+ messages in thread
From: Feldman, Scott @ 2003-01-08 21:12 UTC (permalink / raw)
  To: Jon Fraser, linux-kernel, Avery Fay

> If you happen to turn on vlans, I'd be curious about your 
> results.  Our chipsets produced cisco ISL frames instead of 
> 802.1q frames.  Intel admitted the chipset would do it, but 
> 'shouldn't be doing that...'

This problem has been fixed in top-of-tree 2.4 and 2.5.  The VLANs were
not being restored after ifup.

-scott

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Gigabit/SMP performance problem
@ 2003-01-08 12:17 Jon Burgess
  0 siblings, 0 replies; 25+ messages in thread
From: Jon Burgess @ 2003-01-08 12:17 UTC (permalink / raw)
  To: avery_fay; +Cc: linux-kernel



Avery Fay wrote:
> can probably handle all of the interrupts. Really the issue I'm
> trying to solve is not routing performance, but rather the fact
> that SMP routing performance is worse while using twice
> the cpu time (2 cpu's at around 95% vs. 1 at around 95%).

Please forgive me if this is a silly suggestion, but are you sure this is a real
95% utilisation in the 2-CPU case? I think some versions of top show 0..200% for
the 2-CPU case, and therefore 95% utilisation represents a real CPU
utilisation of 47.5%.
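The conversion is trivial but worth pinning down; a sketch, assuming top reports a combined 0..(100*ncpus)% scale:

```shell
# Convert a figure from a top that scales 0..200% on 2 CPUs
# into true machine-wide utilization.
ncpus=2
reported=95                     # the value top displays on the 0..200% scale
real=$(awk -v r="$reported" -v n="$ncpus" 'BEGIN { printf "%.1f", r / n }')
echo "true utilization: ${real}%"   # true utilization: 47.5%
```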

     Jon



^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Gigabit/SMP performance problem
@ 2003-01-06 20:38 Avery Fay
  2003-01-07 18:15 ` Robert Olsson
  0 siblings, 1 reply; 25+ messages in thread
From: Avery Fay @ 2003-01-06 20:38 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linux-kernel

Hmm. That paper is actually very interesting. I'm thinking maybe with the 
P4 I'm better off with only 1 cpu. WRT hyperthreading, I actually disabled 
it because it made performance worse (wasn't clear in the original email).

Avery Fay





Anton Blanchard <anton@samba.org>
01/03/2003 10:33 PM

 
        To:     Avery Fay <avery_fay@symantec.com>
        cc:     linux-kernel@vger.kernel.org
        Subject:        Re: Gigabit/SMP performance problem


 
> I'm working with a dual xeon platform with 4 dual e1000 cards on
> different pci-x buses. I'm having trouble getting better performance
> with the second cpu enabled (ht disabled). With a UP kernel (redhat's
> 2.4.18), I can route about 2.9 gigabits/s at around 90% cpu
> utilization. With a SMP kernel (redhat's 2.4.18), I can route about
> 2.8 gigabits/s with both cpus at around 90% utilization. This suggests
> to me that the network code is serialized. I would expect one of two
> things from my understanding of the 2.4.x networking improvements
> (softirqs allowing execution on more than one cpu):

The Fujitsu guys have a nice summary of this:

http://www.labs.fujitsu.com/en/techinfo/linux/lse-0211/index.html

Skip forward to page 8.

Don't blame the networking code just yet :) Notice how much worse SMP
performance is relative to UP on the P4 compared to the P3?

This brings up another point: is a single CPU with hyperthreading worth
it? As Rusty will tell you, you need to compare it with a UP kernel,
since the UP kernel avoids all the locking overhead. I suspect for a lot
of cases HT will be a loss (imagine your case, comparing UP and
one-CPU HT).

Anton




^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Gigabit/SMP performance problem
@ 2003-01-06 20:33 Avery Fay
  0 siblings, 0 replies; 25+ messages in thread
From: Avery Fay @ 2003-01-06 20:33 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: linux-kernel, Martin J. Bligh

The numbers I got are taking into account packet drops. I think that the 
point where performance starts to go down is when an interface is dropping 
more than a couple hundred packets per second (at least in my testing). In 
my testing scenario, traffic is perfectly distributed across interfaces 
and I have bound the irqs using smp_affinity. Unfortunately, the 
performance gain is small if any.

Avery Fay





Andrew Theurer <habanero@us.ibm.com>
01/03/2003 05:31 PM

 
        To:     "Martin J. Bligh" <mbligh@aracnet.com>, Avery Fay <avery_fay@symantec.com>
        cc:     linux-kernel@vger.kernel.org
        Subject:        Re: Gigabit/SMP performance problem


On Friday 03 January 2003 15:36, Martin J. Bligh wrote:

...

Monitor for dropped packets when increasing int delay.  At least on the
older e1000 adapters, you would get dropped packets, etc., making the
problem worse in other areas.
>
> Makes sense, increasing the delays should reduce the interrupt load.
>
> > I'm using 3 Intel PRO/1000 MT Dual Port Server adapters as well as 2
> > onboard Intel PRO/1000 ports. The adapters use the 82546EB chips. I
> > believe that the onboard ports use the same although I'm not sure.
> >
> > Should I get rid of IRQ load balancing? And what do you mean
> > "Intel broke the P4's interrupt routing"?
>
> P3's distributed interrupts round-robin amongst cpus. P4's send
> everything to CPU 0. If you put irq_balance on, it'll spread
> them around, but any given interrupt is still only handled by
> one CPU (as far as I understand the code). If you hammer one
> adaptor, does that generate more interrupts than 1 cpu can handle?
> (turn irq balance off by sticking a return at the top of balance_irq,
> and hammer one link, see how much CPU power that burns).

Another problem you may have is that irq_balance is random, and sometimes
more than one interrupt is serviced by the same cpu at the same time.
Actually, let me clarify.  In your case, if your network load was "even"
across the adapters, ideally you would want cpu0 handling the first 4
adapters and cpu1 handling the last 4 adapters.  With irq_balance, this is
usually not the case.  There will be times where one cpu is doing more
work than the other, possibly becoming a bottleneck.

Now, there was some code in SuSE's kernel (SuSE 8.0, 2.4.18) which did a
round robin static assignment of interrupt to cpu.  In your case, all even
interrupt numbers would go to cpu0 and all odd interrupt numbers would go
to cpu1.  Since you have exactly 4 adapters on even interrupts and 4 on
odd interrupts, that would work perfectly.  Now, that doesn't mean there
isn't some other problem, like PCI bandwidth, but it's a start.  Also, you
might be able to emulate this with irq affinity
(/proc/irq/<num>/smp_affinity), but last time I tried it on a P4, it
didn't work at all -- no interrupts!
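For reference, the binding described above is done by writing a hex CPU bitmask to the smp_affinity file; a sketch (the IRQ number is taken from the /proc/interrupts listing elsewhere in the thread, and the actual write needs root plus a kernel/APIC combination that honours the mask):

```shell
# Build the affinity mask for a given CPU: bit N set => CPU N may take the IRQ.
cpu=1
mask=$(printf '%x' $(( 1 << cpu )))
echo "mask for cpu${cpu}: 0x${mask}"   # mask for cpu1: 0x2

# On a live system (root required), e.g. to pin IRQ 25 (eth3) to CPU1:
#   echo "$mask" > /proc/irq/25/smp_affinity
```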

-Andrew




^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Gigabit/SMP performance problem
@ 2003-01-06 20:25 Avery Fay
  0 siblings, 0 replies; 25+ messages in thread
From: Avery Fay @ 2003-01-06 20:25 UTC (permalink / raw)
  To: Robert Olsson; +Cc: linux-kernel

Right now, I have 4 interfaces in and 4 interfaces out (ideal routing 
setup). I'm using just shy of 1500 byte udp packets for testing.

I tried binding the irqs for each pair of interfaces to a cpu... so, for 
example, if eth0 is sending to eth2, they would be bound to the same cpu. 
This seemed to improve performance a little, but I didn't get definite 
numbers and it certainly wasn't much.

I'm currently playing around with UP kernels, but when I go back I'll 
check out softnet_stat.

Avery Fay





Robert Olsson <Robert.Olsson@data.slu.se>
01/03/2003 04:20 PM

 
        To:     "Avery Fay" <avery_fay@symantec.com>
        cc:     linux-kernel@vger.kernel.org
        Subject:        Gigabit/SMP performance problem



Avery Fay writes:
 > 
 > I'm working with a dual xeon platform with 4 dual e1000 cards on
 > different pci-x buses. I'm having trouble getting better performance
 > with the second cpu enabled (ht disabled). With a UP kernel (redhat's
 > 2.4.18), I can route about 2.9 gigabits/s at around 90% cpu
 > utilization. With a SMP kernel (redhat's 2.4.18), I can route about
 > 2.8 gigabits/s with both cpus at around 90% utilization. This suggests
 > to me that the network code is serialized. I would expect one of two
 > things from my understanding of the 2.4.x networking improvements
 > (softirqs allowing execution on more than one cpu):

 Well you have a gigabit router :-)

 How is your routing setup? Packet size?

 Also, you'll never get increased performance of a single flow with SMP.
 Aggregated performance is possible at best. I've been fighting with this
 for some time too.

 You have some important data in /proc/net/softnet_stat, which is per-cpu:
 packets received and "cpu collisions" should interest you.

 As far as I understand, there is no serialization in the forwarding path
 except where it has to be -- when we add softirqs from different cpus
 into a single device. This is seen in "cpu collisions".

 Also, here we get into the inherent SMP cache bouncing problem with TX
 interrupts, when TX has skb's which are processed/created on different
 CPUs. Which CPU is going to take the interrupt? No matter how we run
 kfree, we are going to see a lot of cache bouncing. For systems that
 have the same in/out interface, smp_affinity can be used. In practice
 this is impossible for forwarding.

 And this bouncing hurts especially for small packets...

 A little TX test illustrates. Sender on cpu0.

 UP                      186 kpps
 SMP Aff to cpu0         160 kpps
 SMP Aff to cpu0, cpu1   124 kpps
 SMP Aff to cpu1         106 kpps

 We are playing with some code that might decrease this problem.
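The softnet counters mentioned above can be pulled out with a short script. A sketch; the line below is a made-up sample, all fields are hex, and the assumption that the last field is the "cpu collisions" counter matches 2.4-era kernels -- check your kernel's net/core/dev.c to be sure:

```shell
# Hypothetical sample line from /proc/net/softnet_stat (one line per CPU).
# Field 1 is packets processed; we assume the last field is cpu_collision.
line="0025e3a2 00000000 00000012 00000000 00000000 00000000 00000000 00000000 0000004f"
packets=$(( 0x$(echo "$line" | awk '{print $1}') ))
collisions=$(( 0x$(echo "$line" | awk '{print $NF}') ))
echo "packets=${packets} cpu_collisions=${collisions}"
```

On a live box you would loop over the lines of /proc/net/softnet_stat, one per CPU, and watch whether the collision counter climbs under load.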


 Cheers.
                 --ro




^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <b8ce5e32.0301040439.7bdaa903@posting.google.com>]
* Re: Gigabit/SMP performance problem
@ 2003-01-03 20:25 Avery Fay
  2003-01-03 21:19 ` Arjan van de Ven
  2003-01-03 21:36 ` Martin J. Bligh
  0 siblings, 2 replies; 25+ messages in thread
From: Avery Fay @ 2003-01-03 20:25 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

Dual Pentium 4 Xeon at 2.4 Ghz. I believe I am using irq load balancing as 
shown below (seems to be applied to Red Hat's kernel). Here's 
/proc/interrupts:

           CPU0       CPU1 
  0:     179670     182501    IO-APIC-edge  timer
  1:        386        388    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          1          0    IO-APIC-edge  rtc
 12:          9          9    IO-APIC-edge  PS/2 Mouse
 14:       1698       1511    IO-APIC-edge  ide0
 24:    1300174    1298071   IO-APIC-level  eth2
 25:    1935085    1935625   IO-APIC-level  eth3
 28:    1162013    1162734   IO-APIC-level  eth4
 29:    1971246    1967758   IO-APIC-level  eth5
 48:    2753990    2753821   IO-APIC-level  eth0
 49:    2047386    2043894   IO-APIC-level  eth1
 72:     838987     841143   IO-APIC-level  eth6
 73:    2767885    2768307   IO-APIC-level  eth7
NMI:          0          0 
LOC:     362009     362008 
ERR:          0
MIS:          0

I started traffic at different times on the various interfaces so the 
number of interrupts per interface aren't uniform.

I modified RxIntDelay, TxIntDelay, RxAbsIntDelay, TxAbsIntDelay, 
FlowControl, RxDescriptors, TxDescriptors. Increasing the various 
IntDelays seemed to improve performance slightly.

I'm using 3 Intel PRO/1000 MT Dual Port Server adapters as well as 2 
onboard Intel PRO/1000 ports. The adapters use the 82546EB chips. I 
believe that the onboard ports use the same although I'm not sure.

Should I get rid of IRQ load balancing? And what do you mean "Intel broke the P4's interrupt routing"?

Thanks,
Avery Fay





"Martin J. Bligh" <mbligh@aracnet.com>
01/03/2003 01:05 PM

 
        To:     Avery Fay <avery_fay@symantec.com>, linux-kernel@vger.kernel.org
        cc: 
        Subject:        Re: Gigabit/SMP performance problem


> I'm working with a dual xeon platform with 4 dual e1000 cards on
> different pci-x buses. I'm having trouble getting better performance
> with the second cpu enabled (ht disabled). With a UP kernel (redhat's
> 2.4.18), I can route about 2.9 gigabits/s at around 90% cpu
> utilization. With a SMP kernel (redhat's 2.4.18), I can route about
> 2.8 gigabits/s with both cpus at around 90% utilization. This suggests
> to me that the network code is serialized. I would expect one of two
> things from my understanding of the 2.4.x networking improvements
> (softirqs allowing execution on more than one cpu):
> 
> 1.) with smp I would get ~2.9 gb/s but the combined cpu utilization
> would be that of one cpu at 90%.
> 2.) or with smp I would get more than ~2.9 gb/s.
> 
> Has anyone been able to utilize more than one cpu with pure forwarding?
> 
> Note: I realize that I am not using a stock kernel. I was in the past,
> but I ran into the same problem (smp not improving performance), just
> at lower speeds (redhat's kernel was faster). Therefore, this problem
> is neither introduced nor solved by redhat's kernel. If anyone has
> suggestions for improvements, I can move back to a stock kernel.
> 
> Note #2: I've tried tweaking a lot of different things including
> binding irq's to specific cpus, playing around with e1000 modules
> settings, etc.
> 
> Thanks in advance and please CC me with any suggestions as I'm not
> subscribed to the list.

Dual what Xeon? I presume a P4 thing. Can you cat /proc/interrupts? 
Are you using the irq_balance code? If so, I think you'll only use 
1 cpu to process all the interrupts from each gigabit card. Not that 
you have much choice, since Intel broke the P4's interrupt routing.

Which of the e1000 modules settings did you play with? tx_delay
and rx_delay? What rev of the e1000 chipset?

M.





^ permalink raw reply	[flat|nested] 25+ messages in thread
* Gigabit/SMP performance problem
@ 2003-01-03 16:12 Avery Fay
  2003-01-03 18:05 ` Martin J. Bligh
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Avery Fay @ 2003-01-03 16:12 UTC (permalink / raw)
  To: linux-kernel

Hello,

I'm working with a dual xeon platform with 4 dual e1000 cards on different 
pci-x buses. I'm having trouble getting better performance with the second 
cpu enabled (ht disabled). With a UP kernel (redhat's 2.4.18), I can route 
about 2.9 gigabits/s at around 90% cpu utilization. With a SMP kernel 
(redhat's 2.4.18), I can route about 2.8 gigabits/s with both cpus at 
around 90% utilization. This suggests to me that the network code is 
serialized. I would expect one of two things from my understanding of the 
2.4.x networking improvements (softirqs allowing execution on more than 
one cpu):

1.) with smp I would get ~2.9 gb/s but the combined cpu utilization would 
be that of one cpu at 90%.
2.) or with smp I would get more than ~2.9 gb/s.

Has anyone been able to utilize more than one cpu with pure forwarding?

Note: I realize that I am not using a stock kernel. I was in the past, but 
I ran into the same problem (smp not improving performance), just at lower 
speeds (redhat's kernel was faster). Therefore, this problem is neither 
introduced nor solved by redhat's kernel. If anyone has suggestions for 
improvements, I can move back to a stock kernel.

Note #2: I've tried tweaking a lot of different things including binding 
irq's to specific cpus, playing around with e1000 modules settings, etc.

Thanks in advance and please CC me with any suggestions as I'm not 
subscribed to the list.

Avery Fay

P.S. Only got one response on the linux-net list so I'm posting here. One 
thing I did learn from that response is that redhat's kernel is faster 
because they use a napi version of the e1000 driver.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2003-01-09 12:32 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-06 20:29 Gigabit/SMP performance problem Avery Fay
2003-01-06 21:23 ` Martin J. Bligh
2003-01-07 17:19   ` Mike Black
  -- strict thread matches above, loose matches on Subject: below --
2003-01-08 21:44 Ronciak, John
2003-01-09 12:49 ` Robert Olsson
2003-01-08 21:12 Feldman, Scott
2003-01-08 12:17 Jon Burgess
2003-01-06 20:38 Avery Fay
2003-01-07 18:15 ` Robert Olsson
2003-01-06 20:33 Avery Fay
2003-01-06 20:25 Avery Fay
     [not found] <b8ce5e32.0301040439.7bdaa903@posting.google.com>
2003-01-06 18:27 ` Bill Davidsen
2003-01-06 19:09   ` Daniel Blueman
2003-01-06 19:26     ` Brian Tinsley
2003-01-03 20:25 Avery Fay
2003-01-03 21:19 ` Arjan van de Ven
2003-01-03 21:36 ` Martin J. Bligh
2003-01-03 22:31   ` Andrew Theurer
2003-01-03 16:12 Avery Fay
2003-01-03 18:05 ` Martin J. Bligh
2003-01-03 21:49   ` Ron cooper
2003-01-03 21:47     ` Martin J. Bligh
2003-01-03 21:20 ` Robert Olsson
2003-01-04  3:33 ` Anton Blanchard
2003-01-06 19:43 ` Jon Fraser

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox