public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: Gigabit/SMP performance problem
@ 2003-01-06 20:29 Avery Fay
  2003-01-06 21:23 ` Martin J. Bligh
  0 siblings, 1 reply; 25+ messages in thread
From: Avery Fay @ 2003-01-06 20:29 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

Well, judging by the fact that a UP kernel can route more traffic (and 
consequently handle more interrupts per second) than an SMP kernel, I think 
that one cpu can probably handle all of the interrupts. Really, the issue 
I'm trying to solve is not routing performance per se, but the fact that 
SMP routing performance is worse while using twice the cpu time (2 cpus at 
around 95% vs. 1 at around 95%).

Avery Fay





"Martin J. Bligh" <mbligh@aracnet.com>
01/03/2003 04:36 PM

 
        To:     Avery Fay <avery_fay@symantec.com>
        cc:     linux-kernel@vger.kernel.org
        Subject:        Re: Gigabit/SMP performance problem


P3's distributed interrupts round-robin amongst cpus. P4's send 
everything to CPU 0. If you put irq_balance on, it'll spread
them around, but any given interrupt is still only handled by
one CPU (as far as I understand the code). If you hammer one
adaptor, does that generate more interrupts than 1 cpu can handle?
(turn irq balance off by sticking a return at the top of balance_irq,
and hammer one link, see how much CPU power that burns).
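As a concrete illustration of the check Martin suggests, the per-CPU interrupt load for one adapter can be estimated by diffing two samples of /proc/interrupts taken a second apart. A rough sketch; the eth0 counts below are made-up sample values, and on a live box you would grep the real file twice:

```shell
# Two hypothetical samples of the eth0 line from /proc/interrupts,
# taken one second apart (columns 2 and 3 are the CPU0 and CPU1 counts).
sample1="48:    2753990    2753821   IO-APIC-level  eth0"
sample2="48:    2788990    2788821   IO-APIC-level  eth0"

# Interrupts/s seen by each CPU in that one-second window.
cpu0_rate=$(( $(echo "$sample2" | awk '{print $2}') - $(echo "$sample1" | awk '{print $2}') ))
cpu1_rate=$(( $(echo "$sample2" | awk '{print $3}') - $(echo "$sample1" | awk '{print $3}') ))
echo "eth0: cpu0=${cpu0_rate}/s cpu1=${cpu1_rate}/s"   # eth0: cpu0=35000/s cpu1=35000/s
```

If one NIC alone drives a rate close to what a single CPU can service, that supports the "more interrupts than 1 cpu can handle" theory.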

M.




^ permalink raw reply	[flat|nested] 25+ messages in thread
* RE: Gigabit/SMP performance problem
@ 2003-01-08 21:44 Ronciak, John
  2003-01-09 12:49 ` Robert Olsson
  0 siblings, 1 reply; 25+ messages in thread
From: Ronciak, John @ 2003-01-08 21:44 UTC (permalink / raw)
  To: 'Robert Olsson', Avery Fay; +Cc: Anton Blanchard, linux-kernel

All,

We (Intel - LAN Access Division, e1000 driver) are taking a look at what is
going on here.  We don't have any data yet but we'll keep you posted on what
we find.

Thanks for your patience.

Cheers,
John



> -----Original Message-----
> From: Robert Olsson [mailto:Robert.Olsson@data.slu.se]
> Sent: Tuesday, January 07, 2003 10:16 AM
> To: Avery Fay
> Cc: Anton Blanchard; linux-kernel@vger.kernel.org
> Subject: Re: Gigabit/SMP performance problem
> 
> 
> 
> Avery Fay writes:
>  > Hmm. That paper is actually very interesting. I'm thinking maybe with
>  > the P4 I'm better off with only 1 cpu. WRT hyperthreading, I actually
>  > disabled it because it made performance worse (wasn't clear in the
>  > original email).
> 
> 
>  With 1 CPU, SMP+HT, I'm at UP-level performance, forwarding two single
>  flows evenly distributed between the CPUs. So HT paid the SMP cost, so
>  to speak.
>  
>  Also, I tested the MB bandwidth with the new threaded version of
>  pktgen: just TX'ing packets on 6 GigE ports, I'm seeing almost
>  6 Gbit/s TX'ed with 1500-byte packets.
> 
>  I have problems populating all slots with GigE NICs. WoL (Wake on LAN)
>  is a real pain... It seems my adapters each need a standby current of
>  0.8 A, and most power supplies give 2.0 A for this. (Numbers come from
>  SuperMicro.) So booting fails randomly. You have 8 NICs -- didn't you
>  have problems?
> 
>  Anyway, I'd guess profiling is needed?
> 
>  Cheers.
> 						--ro

^ permalink raw reply	[flat|nested] 25+ messages in thread
* RE: Gigabit/SMP performance problem
@ 2003-01-08 21:12 Feldman, Scott
  0 siblings, 0 replies; 25+ messages in thread
From: Feldman, Scott @ 2003-01-08 21:12 UTC (permalink / raw)
  To: Jon Fraser, linux-kernel, Avery Fay

> If you happen to turn on vlans, I'd be curious about your 
> results.  Our chipsets produced cisco ISL frames instead of 
> 802.1q frames.  Intel admitted the chipset would do it, but 
> 'shouldn't be doing that...'

This problem has been fixed in top-of-tree 2.4 and 2.5.  The VLANs were
not being restored after ifup.

-scott

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Gigabit/SMP performance problem
@ 2003-01-08 12:17 Jon Burgess
  0 siblings, 0 replies; 25+ messages in thread
From: Jon Burgess @ 2003-01-08 12:17 UTC (permalink / raw)
  To: avery_fay; +Cc: linux-kernel



Avery Fay wrote:
> can probably handle all of the interrupts. Really the issue I'm
> trying to solve is not routing performance, but rather the fact
> that SMP routing performance is worse while using twice
> the cpu time (2 cpu's at around 95% vs. 1 at around 95%).

Please forgive me if this is a silly suggestion, but are you sure this is a real
95% utilisation in the 2-CPU case? I think some versions of top show 0..200% for
the 2-CPU case, and therefore 95% utilisation represents a real CPU
utilisation of 47.5%.
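The conversion is trivial but worth pinning down; a sketch, assuming top reports a combined 0..(100*ncpus)% scale:

```shell
# Convert a figure from a top that scales 0..200% on 2 CPUs
# into true machine-wide utilization.
ncpus=2
reported=95                     # the value top displays on the 0..200% scale
real=$(awk -v r="$reported" -v n="$ncpus" 'BEGIN { printf "%.1f", r / n }')
echo "true utilization: ${real}%"   # true utilization: 47.5%
```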

     Jon



^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Gigabit/SMP performance problem
@ 2003-01-06 20:38 Avery Fay
  2003-01-07 18:15 ` Robert Olsson
  0 siblings, 1 reply; 25+ messages in thread
From: Avery Fay @ 2003-01-06 20:38 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linux-kernel

Hmm. That paper is actually very interesting. I'm thinking maybe with the 
P4 I'm better off with only 1 cpu. WRT hyperthreading, I actually disabled 
it because it made performance worse (wasn't clear in the original email).

Avery Fay





Anton Blanchard <anton@samba.org>
01/03/2003 10:33 PM

 
        To:     Avery Fay <avery_fay@symantec.com>
        cc:     linux-kernel@vger.kernel.org
        Subject:        Re: Gigabit/SMP performance problem


 
> I'm working with a dual xeon platform with 4 dual e1000 cards on
> different pci-x buses. I'm having trouble getting better performance
> with the second cpu enabled (ht disabled). With a UP kernel (redhat's
> 2.4.18), I can route about 2.9 gigabits/s at around 90% cpu
> utilization. With a SMP kernel (redhat's 2.4.18), I can route about
> 2.8 gigabits/s with both cpus at around 90% utilization. This suggests
> to me that the network code is serialized. I would expect one of two
> things from my understanding of the 2.4.x networking improvements
> (softirqs allowing execution on more than one cpu):

The Fujitsu guys have a nice summary of this:

http://www.labs.fujitsu.com/en/techinfo/linux/lse-0211/index.html

Skip forward to page 8.

Don't blame the networking code just yet :) Notice how much worse SMP
performance is relative to UP on the P4 compared to the P3?

This brings up another point: is a single CPU with hyperthreading worth
it? As Rusty will tell you, you need to compare it with a UP kernel,
since the UP kernel avoids all the locking overhead. I suspect for a lot
of cases HT will be a loss (imagine your case, comparing UP and
one-CPU HT).

Anton




^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Gigabit/SMP performance problem
@ 2003-01-06 20:33 Avery Fay
  0 siblings, 0 replies; 25+ messages in thread
From: Avery Fay @ 2003-01-06 20:33 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: linux-kernel, Martin J. Bligh

The numbers I got are taking into account packet drops. I think that the 
point where performance starts to go down is when an interface is dropping 
more than a couple hundred packets per second (at least in my testing). In 
my testing scenario, traffic is perfectly distributed across interfaces 
and I have bound the irqs using smp_affinity. Unfortunately, the 
performance gain is small if any.

Avery Fay





Andrew Theurer <habanero@us.ibm.com>
01/03/2003 05:31 PM

 
        To:     "Martin J. Bligh" <mbligh@aracnet.com>, Avery Fay <avery_fay@symantec.com>
        cc:     linux-kernel@vger.kernel.org
        Subject:        Re: Gigabit/SMP performance problem


On Friday 03 January 2003 15:36, Martin J. Bligh wrote:

...

Monitor for dropped packets when increasing int delay.  At least on the
older e1000 adapters, you would get dropped packets, etc., making the
problem worse in other areas.
>
> Makes sense, increasing the delays should reduce the interrupt load.
>
> > I'm using 3 Intel PRO/1000 MT Dual Port Server adapters as well as 2
> > onboard Intel PRO/1000 ports. The adapters use the 82546EB chips. I
> > believe that the onboard ports use the same although I'm not sure.
> >
> > Should I get rid of IRQ load balancing? And what do you mean
> > "Intel broke the P4's interrupt routing"?
>
> P3's distributed interrupts round-robin amongst cpus. P4's send
> everything to CPU 0. If you put irq_balance on, it'll spread
> them around, but any given interrupt is still only handled by
> one CPU (as far as I understand the code). If you hammer one
> adaptor, does that generate more interrupts than 1 cpu can handle?
> (turn irq balance off by sticking a return at the top of balance_irq,
> and hammer one link, see how much CPU power that burns).

Another problem you may have is that irq_balance is random, and sometimes
more than one interrupt is serviced by the same cpu at the same time.
Actually, let me clarify.  In your case, if your network load was "even"
across the adapters, ideally you would want cpu0 handling the first 4
adapters and cpu1 handling the last 4 adapters.  With irq_balance, this is
usually not the case.  There will be times where one cpu is doing more
work than the other, possibly becoming a bottleneck.

Now, there was some code in SuSE's kernel (SuSE 8.0, 2.4.18) which did a
round robin static assignment of interrupt to cpu.  In your case, all even
interrupt numbers would go to cpu0 and all odd interrupt numbers would go
to cpu1.  Since you have exactly 4 adapters on even interrupts and 4 on
odd interrupts, that would work perfectly.  Now, that doesn't mean there
isn't some other problem, like PCI bandwidth, but it's a start.  Also, you
might be able to emulate this with irq affinity
(/proc/irq/<num>/smp_affinity), but last time I tried it on a P4, it
didn't work at all -- no interrupts!
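For reference, the binding described above is done by writing a hex CPU bitmask to the smp_affinity file; a sketch (the IRQ number is taken from the /proc/interrupts listing elsewhere in the thread, and the actual write needs root plus a kernel/APIC combination that honours the mask):

```shell
# Build the affinity mask for a given CPU: bit N set => CPU N may take the IRQ.
cpu=1
mask=$(printf '%x' $(( 1 << cpu )))
echo "mask for cpu${cpu}: 0x${mask}"   # mask for cpu1: 0x2

# On a live system (root required), e.g. to pin IRQ 25 (eth3) to CPU1:
#   echo "$mask" > /proc/irq/25/smp_affinity
```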

-Andrew




^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Gigabit/SMP performance problem
@ 2003-01-06 20:25 Avery Fay
  0 siblings, 0 replies; 25+ messages in thread
From: Avery Fay @ 2003-01-06 20:25 UTC (permalink / raw)
  To: Robert Olsson; +Cc: linux-kernel

Right now, I have 4 interfaces in and 4 interfaces out (ideal routing 
setup). I'm using just shy of 1500 byte udp packets for testing.

I tried binding the irqs for each pair of interfaces to a cpu... so, for 
example, if eth0 is sending to eth2, they would be bound to the same cpu. 
This seemed to improve performance a little, but I didn't get definite 
numbers and it certainly wasn't much.

I'm currently playing around with UP kernels, but when I go back I'll 
check out softnet_stat.

Avery Fay





Robert Olsson <Robert.Olsson@data.slu.se>
01/03/2003 04:20 PM

 
        To:     "Avery Fay" <avery_fay@symantec.com>
        cc:     linux-kernel@vger.kernel.org
        Subject:        Gigabit/SMP performance problem



Avery Fay writes:
 > 
 > I'm working with a dual xeon platform with 4 dual e1000 cards on
 > different pci-x buses. I'm having trouble getting better performance
 > with the second cpu enabled (ht disabled). With a UP kernel (redhat's
 > 2.4.18), I can route about 2.9 gigabits/s at around 90% cpu
 > utilization. With a SMP kernel (redhat's 2.4.18), I can route about
 > 2.8 gigabits/s with both cpus at around 90% utilization. This suggests
 > to me that the network code is serialized. I would expect one of two
 > things from my understanding of the 2.4.x networking improvements
 > (softirqs allowing execution on more than one cpu):

 Well you have a gigabit router :-)

 How is your routing setup? Packet size?

 Also, you'll never get increased performance of a single flow with SMP.
 Aggregated performance is possible at best. I've been fighting with this
 for some time too.

 You have some important data in /proc/net/softnet_stat, which is per-cpu:
 packets received and "cpu collisions" should interest you.

 As far as I understand, there is no serialization in the forwarding path
 except where it has to be -- when we add softirqs from different cpus
 into a single device. This is seen in "cpu collisions".

 Also, here we get into the inherent SMP cache bouncing problem with TX
 interrupts, when TX has skb's which are processed/created on different
 CPUs. Which CPU is going to take the interrupt? No matter how we run
 kfree, we are going to see a lot of cache bouncing. For systems that
 have the same in/out interface, smp_affinity can be used. In practice
 this is impossible for forwarding.

 And this bouncing hurts especially for small packets...

 A little TX test illustrates. Sender on cpu0.

 UP                      186 kpps
 SMP Aff to cpu0         160 kpps
 SMP Aff to cpu0, cpu1   124 kpps
 SMP Aff to cpu1         106 kpps

 We are playing with some code that might decrease this problem.
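The softnet counters mentioned above can be pulled out with a short script. A sketch; the line below is a made-up sample, all fields are hex, and the assumption that the last field is the "cpu collisions" counter matches 2.4-era kernels -- check your kernel's net/core/dev.c to be sure:

```shell
# Hypothetical sample line from /proc/net/softnet_stat (one line per CPU).
# Field 1 is packets processed; we assume the last field is cpu_collision.
line="0025e3a2 00000000 00000012 00000000 00000000 00000000 00000000 00000000 0000004f"
packets=$(( 0x$(echo "$line" | awk '{print $1}') ))
collisions=$(( 0x$(echo "$line" | awk '{print $NF}') ))
echo "packets=${packets} cpu_collisions=${collisions}"
```

On a live box you would loop over the lines of /proc/net/softnet_stat, one per CPU, and watch whether the collision counter climbs under load.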


 Cheers.
                 --ro




^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <b8ce5e32.0301040439.7bdaa903@posting.google.com>]
* Re: Gigabit/SMP performance problem
@ 2003-01-03 20:25 Avery Fay
  2003-01-03 21:19 ` Arjan van de Ven
  2003-01-03 21:36 ` Martin J. Bligh
  0 siblings, 2 replies; 25+ messages in thread
From: Avery Fay @ 2003-01-03 20:25 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

Dual Pentium 4 Xeon at 2.4 Ghz. I believe I am using irq load balancing as 
shown below (seems to be applied to Red Hat's kernel). Here's 
/proc/interrupts:

           CPU0       CPU1 
  0:     179670     182501    IO-APIC-edge  timer
  1:        386        388    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          1          0    IO-APIC-edge  rtc
 12:          9          9    IO-APIC-edge  PS/2 Mouse
 14:       1698       1511    IO-APIC-edge  ide0
 24:    1300174    1298071   IO-APIC-level  eth2
 25:    1935085    1935625   IO-APIC-level  eth3
 28:    1162013    1162734   IO-APIC-level  eth4
 29:    1971246    1967758   IO-APIC-level  eth5
 48:    2753990    2753821   IO-APIC-level  eth0
 49:    2047386    2043894   IO-APIC-level  eth1
 72:     838987     841143   IO-APIC-level  eth6
 73:    2767885    2768307   IO-APIC-level  eth7
NMI:          0          0 
LOC:     362009     362008 
ERR:          0
MIS:          0

I started traffic at different times on the various interfaces so the 
number of interrupts per interface aren't uniform.

I modified RxIntDelay, TxIntDelay, RxAbsIntDelay, TxAbsIntDelay, 
FlowControl, RxDescriptors, TxDescriptors. Increasing the various 
IntDelays seemed to improve performance slightly.

I'm using 3 Intel PRO/1000 MT Dual Port Server adapters as well as 2 
onboard Intel PRO/1000 ports. The adapters use the 82546EB chips. I 
believe that the onboard ports use the same although I'm not sure.

Should I get rid of IRQ load balancing? And what do you mean "Intel broke the P4's interrupt routing"?

Thanks,
Avery Fay





"Martin J. Bligh" <mbligh@aracnet.com>
01/03/2003 01:05 PM

 
        To:     Avery Fay <avery_fay@symantec.com>, linux-kernel@vger.kernel.org
        cc: 
        Subject:        Re: Gigabit/SMP performance problem


> I'm working with a dual xeon platform with 4 dual e1000 cards on
> different pci-x buses. I'm having trouble getting better performance
> with the second cpu enabled (ht disabled). With a UP kernel (redhat's
> 2.4.18), I can route about 2.9 gigabits/s at around 90% cpu
> utilization. With a SMP kernel (redhat's 2.4.18), I can route about
> 2.8 gigabits/s with both cpus at around 90% utilization. This suggests
> to me that the network code is serialized. I would expect one of two
> things from my understanding of the 2.4.x networking improvements
> (softirqs allowing execution on more than one cpu):
> 
> 1.) with smp I would get ~2.9 gb/s but the combined cpu utilization
> would be that of one cpu at 90%.
> 2.) or with smp I would get more than ~2.9 gb/s.
> 
> Has anyone been able to utilize more than one cpu with pure forwarding?
> 
> Note: I realize that I am not using a stock kernel. I was in the past,
> but I ran into the same problem (smp not improving performance), just
> at lower speeds (redhat's kernel was faster). Therefore, this problem
> is neither introduced nor solved by redhat's kernel. If anyone has
> suggestions for improvements, I can move back to a stock kernel.
> 
> Note #2: I've tried tweaking a lot of different things including
> binding irq's to specific cpus, playing around with e1000 modules
> settings, etc.
> 
> Thanks in advance and please CC me with any suggestions as I'm not
> subscribed to the list.

Dual what Xeon? I presume a P4 thing. Can you cat /proc/interrupts? 
Are you using the irq_balance code? If so, I think you'll only use 
1 cpu to process all the interrupts from each gigabit card. Not that 
you have much choice, since Intel broke the P4's interrupt routing.

Which of the e1000 modules settings did you play with? tx_delay
and rx_delay? What rev of the e1000 chipset?

M.





^ permalink raw reply	[flat|nested] 25+ messages in thread
* Gigabit/SMP performance problem
@ 2003-01-03 16:12 Avery Fay
  2003-01-03 18:05 ` Martin J. Bligh
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Avery Fay @ 2003-01-03 16:12 UTC (permalink / raw)
  To: linux-kernel

Hello,

I'm working with a dual xeon platform with 4 dual e1000 cards on different 
pci-x buses. I'm having trouble getting better performance with the second 
cpu enabled (ht disabled). With a UP kernel (redhat's 2.4.18), I can route 
about 2.9 gigabits/s at around 90% cpu utilization. With a SMP kernel 
(redhat's 2.4.18), I can route about 2.8 gigabits/s with both cpus at 
around 90% utilization. This suggests to me that the network code is 
serialized. I would expect one of two things from my understanding of the 
2.4.x networking improvements (softirqs allowing execution on more than 
one cpu):

1.) with smp I would get ~2.9 gb/s but the combined cpu utilization would 
be that of one cpu at 90%.
2.) or with smp I would get more than ~2.9 gb/s.

Has anyone been able to utilize more than one cpu with pure forwarding?

Note: I realize that I am not using a stock kernel. I was in the past, but 
I ran into the same problem (smp not improving performance), just at lower 
speeds (redhat's kernel was faster). Therefore, this problem is neither 
introduced nor solved by redhat's kernel. If anyone has suggestions for 
improvements, I can move back to a stock kernel.

Note #2: I've tried tweaking a lot of different things including binding 
irq's to specific cpus, playing around with e1000 modules settings, etc.

Thanks in advance and please CC me with any suggestions as I'm not 
subscribed to the list.

Avery Fay

P.S. Only got one response on the linux-net list so I'm posting here. One 
thing I did learn from that response is that redhat's kernel is faster 
because they use a napi version of the e1000 driver.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2003-01-09 12:32 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-06 20:29 Gigabit/SMP performance problem Avery Fay
2003-01-06 21:23 ` Martin J. Bligh
2003-01-07 17:19   ` Mike Black
  -- strict thread matches above, loose matches on Subject: below --
2003-01-08 21:44 Ronciak, John
2003-01-09 12:49 ` Robert Olsson
2003-01-08 21:12 Feldman, Scott
2003-01-08 12:17 Jon Burgess
2003-01-06 20:38 Avery Fay
2003-01-07 18:15 ` Robert Olsson
2003-01-06 20:33 Avery Fay
2003-01-06 20:25 Avery Fay
     [not found] <b8ce5e32.0301040439.7bdaa903@posting.google.com>
2003-01-06 18:27 ` Bill Davidsen
2003-01-06 19:09   ` Daniel Blueman
2003-01-06 19:26     ` Brian Tinsley
2003-01-03 20:25 Avery Fay
2003-01-03 21:19 ` Arjan van de Ven
2003-01-03 21:36 ` Martin J. Bligh
2003-01-03 22:31   ` Andrew Theurer
2003-01-03 16:12 Avery Fay
2003-01-03 18:05 ` Martin J. Bligh
2003-01-03 21:49   ` Ron cooper
2003-01-03 21:47     ` Martin J. Bligh
2003-01-03 21:20 ` Robert Olsson
2003-01-04  3:33 ` Anton Blanchard
2003-01-06 19:43 ` Jon Fraser

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox