* Route cache performance tests
@ 2003-06-10 7:57 Simon Kirby
2003-06-10 11:23 ` Jamal Hadi
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Simon Kirby @ 2003-06-10 7:57 UTC (permalink / raw)
To: ralph+d, Jamal Hadi, CIT/Paul, 'David S. Miller',
fw@deneb.enyo.de
Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org
Okay, I got a chance to run some first tests and have found some simple
results that might be worth a read. The test setup is as follows
(I'll probably be using this setup for a number of other tests):
[ My work desktop, other test boxes on network ]
| | | | |
[ 100 Mbit Switch ]
|
| (100 Mbit)
|
[ Dual tg3 dual 1.4 GHz Opteron box, 1 GB RAM ]
|
| (1000 MBit)
|
[ Single e1000 single 2.4 GHz Xeon box ]
I have a route added on the test boxes to stuff traffic destined for the
Xeon box through the Opteron box. Forwarding is enabled on the Opteron
box, and it has a route for the Xeon box.
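For anyone wanting to reproduce the setup, the routing amounts to something
like the following sketch (the addresses and interface names here are made
up; substitute your own):

```shell
# On the Opteron box: enable forwarding and route the Xeon box's
# (hypothetical) address out the gigabit interface.
echo 1 > /proc/sys/net/ipv4/ip_forward
ip route add 10.0.1.2/32 dev eth1

# On each test box: push traffic for the Xeon box via the Opteron box.
ip route add 10.0.1.2/32 via 10.0.0.1
```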
I am testing with Juno right now because it generates the (pseudo-)random
IP traffic, which is where the problem is right now. We already know
Linux can do hundreds of thousands of pps of ip<->ip traffic, so we can
test that later.
Juno seems to be able to send about 150,000 pps from my Celery desktop.
Running with vanilla 2.4.21-rc7 (for now), the kernel manages to forward
an amazing 39,000 packets per second. Woohoo! NAPI definitely kicks in
and seems to work even on SMP (blink?). The output of "rtstat -i 1" is
somewhat interesting. The "GC: tot" field seems to almost exactly match
the forwarded packet count, which is handy:
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
8 4 4 0 0 0 0 0 0 0 0 0 0 0 0
8 3 3 0 0 0 0 0 0 0 0 0 0 0 0
8 5 6 0 0 0 0 0 0 0 0 0 0 0 0
8 4 4 0 0 0 0 0 0 0 0 0 0 0 0
8 5 5 0 0 0 0 0 0 0 0 0 0 0 0
9 3 5 0 0 1 0 0 0 0 0 0 0 0 0
33549 11 65533 0 0 0 0 0 0 0 0 57347 57345 1 0
53499 13 65200 0 0 1 0 0 0 0 0 65196 65194 1 0
65536 19 65540 0 0 1 0 0 0 0 0 65538 64879 0 0
65536 11 33980 0 0 0 0 0 0 0 0 33978 6123 0 0
65536 9 37491 0 0 1 0 0 0 0 0 37489 930 0 0
65536 13 40487 0 0 0 0 0 0 0 0 40484 991 0 0
65536 13 39287 0 0 1 0 0 0 0 0 39284 933 0 0
65536 10 40790 0 0 1 0 0 0 0 0 40789 1006 0 0
65536 17 37783 0 0 0 0 0 0 0 0 37781 866 0 0
65536 8 38092 0 0 0 0 0 0 0 0 38090 880 0 0
65536 14 38086 0 0 1 0 0 0 0 0 38085 877 0 0
65536 13 39587 0 0 0 0 0 0 0 0 39586 922 0 0
65536 18 39882 0 0 1 0 0 0 0 0 39880 908 0 0
65536 8 39292 0 0 0 0 0 0 0 0 39290 894 0 0
65536 10 38390 0 0 4 0 0 0 0 0 38389 879 0 0
65536 13 38087 0 0 0 0 0 0 0 0 38086 830 0 0
65536 10 38692 0 0 0 0 0 0 0 0 38690 845 0 0
65536 16 38982 0 0 1 0 0 0 0 0 38981 899 0 0
The above is with stock settings. Note how the table completely fills up,
causing the forwarding rate to suffer.
In an attempt to improve performance, I tried "echo 0 > gc_min_interval":
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
65536 15 39585 0 0 0 0 0 0 0 0 39585 909 0 0
65535 13 39587 0 0 1 0 0 0 0 0 39587 877 0 0
32027 10 70044 0 0 0 0 0 0 0 0 70043 0 6 0
32013 8 71092 0 0 0 0 0 0 0 0 71091 0 0 0
31995 10 72290 0 0 1 0 0 0 0 0 72290 0 0 0
31969 13 71087 0 0 2 0 0 0 0 0 71083 0 0 0
31950 5 71695 0 0 0 0 0 0 0 0 71693 0 0 0
31937 10 71690 0 0 2 0 0 0 0 0 71690 0 0 0
31927 10 71390 0 0 0 0 0 0 0 0 71389 0 0 0
31915 18 71382 0 0 0 0 0 0 0 0 71381 0 0 0
31897 5 71395 0 0 0 0 0 0 0 0 71394 0 0 0
31881 7 70793 0 0 0 0 0 0 0 0 70793 0 0 0
31869 5 71095 0 0 0 0 0 0 0 0 71094 0 0 0
31863 16 71084 0 0 0 0 0 0 0 0 71082 0 0 0
31846 22 70778 0 0 0 0 0 0 0 0 70776 0 0 0
31825 5 70795 0 0 1 0 0 0 0 0 70795 0 0 0
31816 10 70490 0 0 0 0 0 0 0 0 70488 0 0 0
And then decided to try "ip route flush cache":
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
31768 8 70192 0 0 0 0 0 0 0 0 70190 0 0 0
31757 15 70185 0 0 1 0 0 0 0 0 70184 0 0 0
31743 5 70495 0 0 1 0 0 0 0 0 70491 0 0 0
8204 2 83314 0 0 0 0 0 1 2 0 75524 0 89 0
8204 2 88859 0 0 0 0 0 1 0 0 88449 0 84 0
8203 3 85797 0 0 1 0 0 0 0 0 85795 0 0 0
8203 0 86100 0 0 0 0 0 0 0 0 86098 0 0 0
...And then I tried reducing gc_thresh:
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
8200 7 85793 0 0 1 0 0 0 0 0 85790 0 0 0
8200 4 85796 0 0 1 0 0 0 0 0 85792 0 0 0
8200 13 86087 0 0 0 0 0 0 0 0 86086 0 0 0
8200 3 86097 0 0 0 0 0 0 0 0 86096 0 0 0
1530 4 87896 0 0 0 0 0 0 0 0 87277 0 562 0
1370 0 135832 0 0 0 0 0 0 0 0 135829 0 617 0
1348 0 135952 0 0 2 0 0 0 0 0 135952 0 543 0
1341 0 135740 0 0 0 0 0 0 0 0 135739 0 529 0
1348 1 135817 0 0 1 0 0 0 0 0 135817 0 567 0
I tried fiddling with more settings, even setting gc_thresh to 1, but I
wasn't able to get the route cache much smaller than that or to get it to
forward any more packets per second.
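For reference, the knobs being tweaked above all live under
/proc/sys/net/ipv4/route/; a sketch of the commands (the gc_thresh value
here is just illustrative):

```shell
cd /proc/sys/net/ipv4/route
echo 0 > gc_min_interval   # allow GC to run as often as needed
echo 8192 > gc_thresh      # start GC'ing once the cache passes this size
ip route flush cache       # empty the cache and start fresh
```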
In any case, setting gc_min_interval to 0 definitely helped, but I
suspect Dave's patches will make a bigger difference. Next up is
2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
Simon-
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Route cache performance tests
2003-06-10 7:57 Route cache performance tests Simon Kirby
@ 2003-06-10 11:23 ` Jamal Hadi
2003-06-10 20:36 ` CIT/Paul
2003-06-10 13:34 ` Ralph Doncaster
2003-06-13 6:20 ` David S. Miller
2 siblings, 1 reply; 31+ messages in thread
From: Jamal Hadi @ 2003-06-10 11:23 UTC (permalink / raw)
To: Simon Kirby
Cc: ralph+d, CIT/Paul, 'David S. Miller', fw@deneb.enyo.de,
netdev@oss.sgi.com, linux-net@vger.kernel.org
On Tue, 10 Jun 2003, Simon Kirby wrote:
[some good stuff deleted]
Simon,
I haven't looked at your data in detail; I will. Someone like Robert
would be able to go through it much faster than I can. I just want to say
thanks for the effort; I will spend time catching up with you folks.
It is clear that our next hurdle is GC.
Do you have profiles for your data? Profiles would be nice to collect
as well.
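For anyone else collecting profiles: the oprofile workflow of that era was
roughly the below (treat the exact flags as from memory; check opcontrol(1)
for your version):

```shell
opcontrol --vmlinux=/boot/vmlinux   # point oprofile at the kernel image
opcontrol --start                   # start the daemon and begin sampling
# ... run the forwarding test ...
opcontrol --stop
opreport -l                         # per-symbol breakdown, as in the listings later in this thread
```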
> In any case, setting gc_min_interval to 0 definitely helped, but I
> suspect Dave's patches will make a bigger difference. Next up is
> 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
>
Also, since you are doing all that work, post the kernels somewhere so
people like foo can grab them and test as well.
cheers,
jamal
* Re: Route cache performance tests
2003-06-10 7:57 Route cache performance tests Simon Kirby
2003-06-10 11:23 ` Jamal Hadi
@ 2003-06-10 13:34 ` Ralph Doncaster
2003-06-10 13:39 ` Jamal Hadi
2003-06-13 6:20 ` David S. Miller
2 siblings, 1 reply; 31+ messages in thread
From: Ralph Doncaster @ 2003-06-10 13:34 UTC (permalink / raw)
To: Simon Kirby
Cc: Jamal Hadi, CIT/Paul, 'David S. Miller', fw@deneb.enyo.de,
netdev@oss.sgi.com, linux-net@vger.kernel.org
On Tue, 10 Jun 2003, Simon Kirby wrote:
> Running with vanilla 2.4.21-rc7 (for now), the kernel manages to forward
> an amazing 39,000 packets per second. Woohoo!
I hope that's sarcasm. I know if you posted to NANOG saying it took a
dual 1.4 GHz Opteron to route 39 kpps under Linux, you'd be laughed off the
list. Maybe I should be bragging about my 3-minute lap times on the
Shannonville track in my M5!
-Ralph
* Re: Route cache performance tests
2003-06-10 13:34 ` Ralph Doncaster
@ 2003-06-10 13:39 ` Jamal Hadi
0 siblings, 0 replies; 31+ messages in thread
From: Jamal Hadi @ 2003-06-10 13:39 UTC (permalink / raw)
To: ralph+d
Cc: Simon Kirby, CIT/Paul, 'David S. Miller',
fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org
On Tue, 10 Jun 2003, Ralph Doncaster wrote:
> I hope that's sarcasm. I know if you posted to NANOG saying it took a
> dual 1.4Ghz Opteron to route 39kpps under linux you'd be laughed off the
> list. Maybe I should be bragging about my 3-minute lap times on the
> Shannonville track in my M5!
Ralph,
Take a look at the Sprint core routers I posted earlier. See how much
data they actually route ;->
These are damn expensive routers, btw, with OC48 interfaces.
On your other comment about being able to get rid of the route cache when
you don't need it, I am actually indifferent. You may have a point.
cheers,
jamal
* RE: Route cache performance tests
2003-06-10 11:23 ` Jamal Hadi
@ 2003-06-10 20:36 ` CIT/Paul
0 siblings, 0 replies; 31+ messages in thread
From: CIT/Paul @ 2003-06-10 20:36 UTC (permalink / raw)
To: 'Jamal Hadi', 'Simon Kirby'
Cc: ralph+d, 'David S. Miller', fw, netdev, linux-net
I'd be happy to set up a repository FTP site or maybe even some CVS
servers so all of us can test all these things and share data. We are an
ISP, so it wouldn't be too hard to just pop up another server to store all
this :> Let me know.
Paul xerox@foonet.net http://www.httpd.net
-----Original Message-----
From: Jamal Hadi [mailto:hadi@shell.cyberus.ca]
Sent: Tuesday, June 10, 2003 7:23 AM
To: Simon Kirby
Cc: ralph+d@istop.com; CIT/Paul; 'David S. Miller'; fw@deneb.enyo.de;
netdev@oss.sgi.com; linux-net@vger.kernel.org
Subject: Re: Route cache performance tests
On Tue, 10 Jun 2003, Simon Kirby wrote:
[some good stuff deleted]
Simon,
I haven't looked at your data in detail; I will. Someone like Robert
would be able to go through it much faster than I can. I just want to say
thanks for the effort; I will spend time catching up with you folks. It is
clear that our next hurdle is GC. Do you have profiles for your data?
Profiles would be nice to collect as well.
> In any case, setting gc_min_interval to 0 definitely helped, but I
> suspect Dave's patches will make a bigger difference. Next up is
> 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
>
Also since you are doing all that work post the kernels somewhere so
people like foo can grab them and test as well.
cheers,
jamal
* Re: Route cache performance tests
2003-06-10 7:57 Route cache performance tests Simon Kirby
2003-06-10 11:23 ` Jamal Hadi
2003-06-10 13:34 ` Ralph Doncaster
@ 2003-06-13 6:20 ` David S. Miller
2003-06-16 22:37 ` Simon Kirby
2 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-13 6:20 UTC (permalink / raw)
To: sim; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net
From: Simon Kirby <sim@netnation.com>
Date: Tue, 10 Jun 2003 00:57:32 -0700
In any case, setting gc_min_interval to 0 definitely helped, but I
suspect Dave's patches will make a bigger difference. Next up is
2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
Did you get stuck in some mud? :-) It's been two days.
I even posted new patches for you to test, get on it :)))
* Re: Route cache performance tests
2003-06-13 6:20 ` David S. Miller
@ 2003-06-16 22:37 ` Simon Kirby
2003-06-16 22:44 ` David S. Miller
0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-16 22:37 UTC (permalink / raw)
To: David S. Miller; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net
On Thu, Jun 12, 2003 at 11:20:02PM -0700, David S. Miller wrote:
> In any case, setting gc_min_interval to 0 definitely helped, but I
> suspect Dave's patches will make a bigger difference. Next up is
> 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
>
> Did you get stuck in some mud? :-) It's been two days.
>
> I even posted new patches for you to test, get on it :)))
Ok, I dug myself out. :)
I have oprofile working, and I wrote a simple application to measure
received pps on the receiving box with gettimeofday() accuracy. To
reduce noise I am profiling for one minute periods. The sender is
capable of sending about 315,000 pps via an e1000.
So, which kernels shall I try? When I set the thing up I was using
2.5.70-bk14, but I am compiling 2.5.71, and I will try with your patch
above and with Alexey's.
Stock 2.4.21-rc7 (CONFIG_IP_MULTIPLE_TABLES=y):
60.0047 seconds passed, avg forwarding rate: 122169.980 pps
60.0052 seconds passed, avg forwarding rate: 123650.166 pps
60.0045 seconds passed, avg forwarding rate: 122352.499 pps
60.0059 seconds passed, avg forwarding rate: 121830.346 pps
60.0046 seconds passed, avg forwarding rate: 121714.614 pps
60.0057 seconds passed, avg forwarding rate: 121927.324 pps
60.0061 seconds passed, avg forwarding rate: 121995.740 pps
60.0064 seconds passed, avg forwarding rate: 122168.417 pps
60.0030 seconds passed, avg forwarding rate: 123245.149 pps
60.0062 seconds passed, avg forwarding rate: 122613.361 pps
(CPU type is still Opteron, oprofile is wrong. :))
Cpu type: Hammer
Cpu speed was (MHz estimation) : 1393.98
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c0278b00 4537 12.6063 fn_hash_lookup
c027a1d0 3293 9.14976 fib_lookup
c024aa70 2989 8.30508 rt_intern_hash
c024c7d0 2195 6.09892 ip_route_input
c024c000 2020 5.61267 ip_route_input_slow
c024ed20 1244 3.45652 ip_rcv
c0247650 1234 3.42873 eth_header
c0276490 1226 3.4065 fib_validate_source
c024a710 1173 3.25924 rt_garbage_collect
c0132ee0 930 2.58405 kmalloc
c0242f60 924 2.56738 neigh_lookup
c02502a0 915 2.54237 ip_forward
c0132da0 875 2.43123 kmem_cache_alloc
c02444e0 735 2.04223 neigh_resolve_output
c0249f70 734 2.03946 rt_hash_code
c0133070 717 1.99222 kmem_cache_free
c024bcd0 666 1.85051 rt_set_nexthop
c023bfe0 632 1.75604 __kfree_skb
c02778e0 620 1.7227 fib_semantic_match
c023bce0 611 1.69769 alloc_skb
c0247f30 529 1.46985 pfifo_fast_dequeue
c0247830 524 1.45596 eth_type_trans
c02404b0 521 1.44762 netif_receive_skb
c0132cb0 520 1.44485 free_block
c0247a80 487 1.35315 qdisc_restart
c01330f0 466 1.2948 kfree
c02426e0 456 1.26702 dst_alloc
c023fd60 454 1.26146 dev_queue_xmit
c0132bd0 401 1.1142 kmem_cache_alloc_batch
c0272740 370 1.02806 inet_select_addr
c0252e80 353 0.980828 ip_finish_output
c0244350 325 0.903029 neigh_hh_init
c0242840 321 0.891914 dst_destroy
c010d5c0 315 0.875243 do_gettimeofday
c0247ec0 276 0.76688 pfifo_fast_enqueue
c024baa0 245 0.680745 ipv4_dst_destroy
c023bf70 216 0.600167 kfree_skbmem
c0120160 210 0.583495 cpu_raise_softirq
c026f1e0 156 0.433454 arp_bind_neighbour
c0277990 136 0.377883 __fib_res_prefsrc
c027a020 126 0.350097 fib_rules_policy
c0279d60 96 0.266741 fib_rule_put
c0240370 68 0.188941 net_tx_action
c026ec10 37 0.102806 arp_hash
c023bf00 34 0.0944707 skb_release_data
c0240770 14 0.0388997 net_rx_action
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
118130 56 123028 0 0 0 0 0 0 0 0 123028 123024 0 0
109456 58 122426 0 0 0 0 0 0 0 0 122426 122422 0 0
113318 52 124832 0 0 0 0 0 0 0 0 124832 124828 0 0
104356 53 122131 0 0 0 0 0 0 0 0 122131 122127 0 0
98333 56 125064 0 0 0 0 0 0 0 0 125064 125060 0 0
125925 57 125899 0 0 0 0 0 0 0 0 125899 125896 0 0
117516 44 122676 0 0 0 0 0 0 0 0 122676 122672 0 0
121088 48 124472 0 0 0 0 0 0 0 0 124472 124468 0 0
113049 43 123041 0 0 0 0 0 0 0 0 123041 123037 0 0
104339 43 122377 0 0 0 0 0 0 0 0 122377 122373 0 0
98324 46 125074 0 0 0 0 0 0 0 0 125074 125070 0 0
126818 40 126816 0 0 0 0 0 0 0 0 126816 126813 0 0
131018 54 124958 0 0 0 0 0 0 0 0 124958 124954 0 0
122303 51 122369 0 0 0 0 0 0 0 0 122369 122365 0 0
113661 46 122438 0 0 0 0 0 0 0 0 122438 122434 0 0
104350 49 121771 0 0 0 0 0 0 0 0 121771 121767 0 0
98330 58 125062 0 0 0 0 0 0 0 0 125062 125058 0 0
102841 36 124675 0 0 0 0 0 0 0 0 124675 124671 0 0
131031 49 126507 0 0 0 0 0 0 0 0 126507 126504 0 0
Stock 2.5.70-bk14 (CONFIG_IP_MULTIPLE_TABLES=y):
(There is some noise in the forwarding pps rate because the route cache
keeps ballooning and collapsing all over the place.)
60.0042 seconds passed, avg forwarding rate: 102595.362 pps
60.0039 seconds passed, avg forwarding rate: 102690.418 pps
60.0043 seconds passed, avg forwarding rate: 102254.257 pps
60.0036 seconds passed, avg forwarding rate: 102708.344 pps
60.0052 seconds passed, avg forwarding rate: 102647.544 pps
60.0036 seconds passed, avg forwarding rate: 102697.595 pps
60.0042 seconds passed, avg forwarding rate: 102326.652 pps
60.0043 seconds passed, avg forwarding rate: 102615.183 pps
60.0081 seconds passed, avg forwarding rate: 101990.399 pps
60.0036 seconds passed, avg forwarding rate: 102220.386 pps
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c0290480 8213 15.6283 rt_garbage_collect
c011ee80 4979 9.47443 local_bh_enable
c02bf140 3616 6.8808 fn_hash_lookup
c02906d0 1977 3.76199 rt_intern_hash
c0291b10 1949 3.70871 ip_route_input_slow
c028d1d0 1647 3.13404 nf_iterate
c028d4c0 1575 2.99703 nf_hook_slow
c0220c10 1513 2.87905 tg3_start_xmit
c02c0a00 1470 2.79723 fib_lookup
c02923e0 1169 2.22446 ip_route_input
c0220010 895 1.70308 tg3_rx
c0134e20 877 1.66882 kmem_cache_free
c0134d60 863 1.64218 kmem_cache_alloc
c028e270 855 1.62696 pfifo_fast_dequeue
c0134bf0 855 1.62696 free_block
c02bcf30 837 1.59271 fib_validate_source
c0294450 833 1.5851 ip_rcv_finish
c0286b20 782 1.48805 netif_receive_skb
c028db70 765 1.4557 eth_header
c0293ee0 750 1.42716 ip_rcv
c0290190 727 1.38339 rt_may_expire
c028fda0 689 1.31108 rt_hash_code
c0295210 679 1.29205 ip_forward
c0134a30 648 1.23306 cache_alloc_refill
c0134da0 636 1.21023 kmalloc
c0298870 591 1.1246 ip_finish_output2
c0134e60 520 0.989496 kfree
c01ad3d0 504 0.95905 memcpy
c028a860 498 0.947633 neigh_resolve_output
c0282c90 477 0.907672 alloc_skb
c02c08e0 474 0.901964 fib_rules_policy
c02be300 474 0.901964 fib_semantic_match
c0289870 471 0.896255 neigh_lookup
c028dce0 466 0.886741 eth_type_trans
c0289160 445 0.84678 dst_alloc
c0295450 442 0.841072 ip_forward_finish
c02865e0 407 0.774471 dev_queue_xmit
c02b7950 383 0.728802 inet_select_addr
c0289290 344 0.65459 dst_destroy
c0296790 325 0.618435 ip_finish_output
c02b4140 320 0.608921 arp_hash
c021fc50 317 0.603212 tg3_tx
c02d0090 313 0.595601 ipv4_sabotage_out
c028e1f0 306 0.58228 pfifo_fast_enqueue
c021fec0 296 0.563252 tg3_recycle_rx
c028a6e0 294 0.559446 neigh_hh_init
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
5402 29 82379 0 0 0 0 0 0 0 0 76979 76401 575 575
131075 29 125867 0 0 0 0 0 0 0 0 123076 122879 194 194
117660 26 118554 0 0 0 0 0 0 0 0 110363 109467 896 896
86134 22 100138 0 0 0 0 0 0 0 0 91947 91353 591 591
48682 28 94224 0 0 0 0 0 0 0 0 86033 85427 603 603
3419 28 86216 0 0 0 0 0 0 0 0 82916 82390 523 523
131075 25 127871 0 0 0 0 0 0 0 0 122980 122879 98 98
116937 33 117383 0 0 0 0 0 0 0 0 109192 108744 448 448
83306 17 98079 0 0 0 0 0 0 0 0 89888 89248 637 637
43585 17 91951 0 0 0 0 0 0 0 0 83760 83158 599 599
715 23 88681 0 0 0 0 0 0 0 0 88081 87487 591 591
104407 37 134703 0 0 0 0 0 0 0 0 127112 127111 0 0
67650 14 94902 0 0 0 0 0 0 0 0 86711 86122 587 587
23861 25 87875 0 0 0 0 0 0 0 0 79684 79090 591 591
670 26 108386 0 0 0 0 0 0 0 0 107786 107211 572 572
131075 38 130550 0 0 0 0 0 0 0 0 122959 122879 77 77
99588 14 108546 0 0 0 0 0 0 0 0 100355 99586 768 768
61804 27 93912 0 0 0 0 0 0 0 0 85721 85095 624 624
14826 27 84721 0 0 0 0 0 0 0 0 76530 75901 626 626
2.5.71 (CONFIG_IP_MULTIPLE_TABLES=y) w/correction to make
flow_cache_init compile w/CONFIG_SMP=n (register_cpu_notifier):
60.0060 seconds passed, avg forwarding rate: 103857.780 pps
60.0036 seconds passed, avg forwarding rate: 104893.408 pps
60.0061 seconds passed, avg forwarding rate: 104623.946 pps
60.0040 seconds passed, avg forwarding rate: 104457.440 pps
60.0057 seconds passed, avg forwarding rate: 104505.375 pps
60.0042 seconds passed, avg forwarding rate: 103663.532 pps
60.0043 seconds passed, avg forwarding rate: 104240.425 pps
60.0042 seconds passed, avg forwarding rate: 104422.699 pps
60.0034 seconds passed, avg forwarding rate: 104252.729 pps
60.0058 seconds passed, avg forwarding rate: 104138.597 pps
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c0292270 8941 16.2664 rt_garbage_collect
c011f080 5295 9.63323 local_bh_enable
c02c0fb0 3943 7.17353 fn_hash_lookup
c02924c0 2209 4.01885 rt_intern_hash
c0293900 2115 3.84783 ip_route_input_slow
c028ef90 1725 3.1383 nf_iterate
c028f280 1673 3.0437 nf_hook_slow
c02c2880 1536 2.79445 fib_lookup
c02941d0 1381 2.51246 ip_route_input
c0222330 1307 2.37783 tg3_start_xmit
c0296240 1000 1.81931 ip_rcv_finish
c0134ff0 961 1.74835 free_block
c0221710 918 1.67012 tg3_rx
c0290030 909 1.65375 pfifo_fast_dequeue
c0135230 861 1.56642 kmem_cache_free
c0295cd0 844 1.53549 ip_rcv
c02bed90 835 1.51912 fib_validate_source
c0135170 822 1.49547 kmem_cache_alloc
c028f930 818 1.48819 eth_header
c0291f80 741 1.34811 rt_may_expire
c02886a0 726 1.32082 netif_receive_skb
c0291b80 708 1.28807 rt_hash_code
c0134e20 684 1.24441 cache_alloc_refill
c01351b0 644 1.17163 __kmalloc
c028b610 595 1.08249 neigh_lookup
c028c600 542 0.986064 neigh_resolve_output
c0284620 538 0.978787 alloc_skb
c029a680 534 0.97151 ip_finish_output2
c0135270 510 0.927846 kfree
c01adc80 484 0.880544 memcpy
c028af00 468 0.851435 dst_alloc
c02c0160 438 0.796856 fib_semantic_match
c028faa0 434 0.789579 eth_type_trans
c0297020 419 0.762289 ip_forward
c0288160 418 0.76047 dev_queue_xmit
c02c2760 398 0.724084 fib_rules_policy
c0297260 394 0.716807 ip_forward_finish
c02b9790 391 0.711349 inet_select_addr
c0221e90 374 0.680421 tg3_set_txd
c028ffb0 348 0.633119 pfifo_fast_enqueue
c028fcc0 339 0.616745 qdisc_restart
c028b030 326 0.593094 dst_destroy
c02d1fb0 301 0.547611 ipv4_sabotage_out
c02985a0 299 0.543973 ip_finish_output
c0128a00 293 0.533057 call_rcu
c0221350 284 0.516683 tg3_tx
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
26127 24 92112 0 0 0 0 0 0 0 0 83920 83318 599 599
722 32 106131 0 0 0 0 0 0 0 0 105531 104945 583 583
131075 23 130793 0 0 0 0 0 0 0 0 123201 122879 319 319
131074 27 131449 0 0 0 0 0 0 0 0 123257 122878 376 376
120198 23 120965 0 0 0 0 0 0 0 0 112773 112005 768 768
93359 26 104746 0 0 0 0 0 0 0 0 96554 96040 511 511
59620 17 97943 0 0 0 0 0 0 0 0 89751 89140 608 608
18593 20 90640 0 0 0 0 0 0 0 0 82448 81851 593 593
70 23 113117 0 0 0 0 0 0 0 0 113117 112479 636 636
131075 22 131306 0 0 0 0 0 0 0 0 123114 122879 232 232
110767 18 111534 0 0 0 0 0 0 0 0 103342 102574 768 768
79859 20 100776 0 0 0 0 0 0 0 0 92584 91971 610 610
43481 24 95328 0 0 0 0 0 0 0 0 87136 86500 632 632
704 34 88794 0 0 0 0 0 0 0 0 88194 87591 601 601
131075 36 130692 0 0 0 0 0 0 0 0 123100 122879 218 218
110844 25 111611 0 0 0 0 0 0 0 0 103419 102651 768 768
80008 18 100862 0 0 0 0 0 0 0 0 92670 92043 624 624
43390 24 95064 0 0 0 0 0 0 0 0 86872 86260 608 608
720 31 88869 0 0 0 0 0 0 0 0 88269 87682 585 585
2.5.71 (CONFIG_IP_MULTIPLE_TABLES=n):
60.0039 seconds passed, avg forwarding rate: 108482.881 pps
60.0036 seconds passed, avg forwarding rate: 107850.012 pps
60.0043 seconds passed, avg forwarding rate: 108330.941 pps
60.0063 seconds passed, avg forwarding rate: 108424.657 pps
60.0071 seconds passed, avg forwarding rate: 108575.916 pps
60.0040 seconds passed, avg forwarding rate: 107774.861 pps
60.0053 seconds passed, avg forwarding rate: 107765.720 pps
60.0065 seconds passed, avg forwarding rate: 108021.888 pps
60.0039 seconds passed, avg forwarding rate: 107364.055 pps
60.0061 seconds passed, avg forwarding rate: 107593.173 pps
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c0292260 7149 13.2856 rt_garbage_collect
c011f080 6875 12.7764 local_bh_enable
c02c0df0 3440 6.39286 fn_hash_lookup
c02924b0 2180 4.05129 rt_intern_hash
c02938c0 2158 4.01041 ip_route_input_slow
c028ef90 1769 3.28749 nf_iterate
c0222330 1644 3.05519 tg3_start_xmit
c028f280 1601 2.97528 nf_hook_slow
c02940f0 1209 2.24679 ip_route_input
c02bec10 1187 2.20591 fib_validate_source
c0135230 987 1.83423 kmem_cache_free
c0134ff0 950 1.76547 free_block
c0135170 924 1.71715 kmem_cache_alloc
c0295c00 918 1.706 ip_rcv
c0221710 908 1.68742 tg3_rx
c0290020 907 1.68556 pfifo_fast_dequeue
c0296170 897 1.66698 ip_rcv_finish
c028f920 808 1.50158 eth_header
c0291b70 804 1.49415 rt_hash_code
c02886a0 740 1.37521 netif_receive_skb
c0134e20 739 1.37335 cache_alloc_refill
c0291f70 736 1.36778 rt_may_expire
c0296f50 703 1.30645 ip_forward
c028b610 676 1.25627 neigh_lookup
c01351b0 669 1.24326 __kmalloc
c02bffa0 637 1.18379 fib_semantic_match
c029a5b0 635 1.18008 ip_finish_output2
c0135270 614 1.14105 kfree
c0284620 544 1.01096 alloc_skb
c028c600 515 0.957071 neigh_resolve_output
c01adc80 498 0.925479 memcpy
c028fa90 464 0.862293 eth_type_trans
c0288160 457 0.849285 dev_queue_xmit
c0297190 425 0.789816 ip_forward_finish
c028af00 420 0.780524 dst_alloc
c02b9680 411 0.763799 inet_select_addr
c028ffa0 350 0.650437 pfifo_fast_enqueue
c028b030 350 0.650437 dst_destroy
c02984d0 330 0.613269 ip_finish_output
c0221350 326 0.605835 tg3_tx
c0128a00 324 0.602119 call_rcu
c028fcb0 323 0.60026 qdisc_restart
c028c480 323 0.60026 neigh_hh_init
c02215c0 291 0.540792 tg3_recycle_rx
c02d0e80 258 0.479465 ipv4_sabotage_in
c02d0ec0 257 0.477606 ipv4_sabotage_out
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
103545 28 107316 0 0 0 0 0 0 0 0 99124 98495 626 626
76151 24 104300 0 0 0 0 0 0 0 0 96108 95485 620 620
10200 26 83194 0 0 0 0 0 0 0 0 75002 74396 603 603
702 38 122078 0 0 0 0 0 0 0 0 121478 120872 603 603
131075 39 130952 0 0 0 0 0 0 0 0 123360 122879 478 478
126869 28 126996 0 0 0 0 0 0 0 0 118804 118676 128 128
85330 22 107046 0 0 0 0 0 0 0 0 98854 98259 591 591
50739 22 97102 0 0 0 0 0 0 0 0 88910 88288 620 620
10501 17 91459 0 0 0 0 0 0 0 0 83267 82641 623 623
689 34 121790 0 0 0 0 0 0 0 0 121190 120571 616 616
131075 33 130963 0 0 0 0 0 0 0 0 123371 122879 489 489
110335 44 126916 0 0 0 0 0 0 0 0 118724 118659 64 64
81862 16 103228 0 0 0 0 0 0 0 0 95036 94406 628 628
50717 24 100536 0 0 0 0 0 0 0 0 92344 91734 607 607
12301 33 93251 0 0 0 0 0 0 0 0 85059 84463 593 593
684 33 119995 0 0 0 0 0 0 0 0 119395 118771 621 621
131074 26 146318 0 0 0 0 0 0 0 0 138726 138610 113 113
114248 25 115015 0 0 0 0 0 0 0 0 106823 106055 768 768
88917 18 106318 0 0 0 0 0 0 0 0 98126 97548 575 575
57970 24 100752 0 0 0 0 0 0 0 0 92560 91932 625 625
2.5.71-davem-rtcache-jun9:
60.006 seconds passed, avg forwarding rate: 160182.941 pps
60.0077 seconds passed, avg forwarding rate: 159805.476 pps
60.007 seconds passed, avg forwarding rate: 160274.907 pps
60.0084 seconds passed, avg forwarding rate: 160212.101 pps
60.0045 seconds passed, avg forwarding rate: 159345.161 pps
60.0076 seconds passed, avg forwarding rate: 159552.768 pps
60.0046 seconds passed, avg forwarding rate: 159416.702 pps
60.0035 seconds passed, avg forwarding rate: 160435.829 pps
60.0043 seconds passed, avg forwarding rate: 160015.150 pps
60.0072 seconds passed, avg forwarding rate: 159309.661 pps
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c02c0fa0 4875 8.75586 fn_hash_lookup
c02939f0 3839 6.89513 ip_route_input_slow
c028f040 2507 4.50276 nf_iterate
c028f330 2450 4.40038 nf_hook_slow
c0222330 2377 4.26927 tg3_start_xmit
c02bedc0 1722 3.09284 fib_validate_source
c0134ff0 1478 2.6546 free_block
c0292560 1444 2.59353 rt_intern_hash
c0221710 1409 2.53067 tg3_rx
c0135230 1406 2.52528 kmem_cache_free
c0296320 1399 2.51271 ip_rcv_finish
c0294260 1281 2.30077 ip_route_input
c02886a0 1235 2.21815 netif_receive_skb
c028f9d0 1194 2.14451 eth_header
c028af00 1153 2.07087 dst_alloc
c0291c20 1120 2.0116 rt_hash_code
c0295db0 1111 1.99544 ip_rcv
c0135170 1109 1.99185 kmem_cache_alloc
c0134e20 1109 1.99185 cache_alloc_refill
c02900d0 1016 1.82481 pfifo_fast_dequeue
c028b6c0 1014 1.82122 neigh_lookup
c01351b0 1003 1.80146 __kmalloc
c02c0150 894 1.60569 fib_semantic_match
c01adc80 838 1.50511 memcpy
c029a760 837 1.50331 ip_finish_output2
c028c6b0 801 1.43866 neigh_resolve_output
c0288160 792 1.42249 dev_queue_xmit
c0135270 789 1.4171 kfree
c0284620 740 1.32909 alloc_skb
c011f080 661 1.1872 local_bh_enable
c028fb40 656 1.17822 eth_type_trans
c02b9830 648 1.16386 inet_select_addr
c0297340 601 1.07944 ip_forward_finish
c02929d0 591 1.06148 __rt_hash_shrink
c0290050 540 0.96988 pfifo_fast_enqueue
c028b0e0 532 0.955511 dst_destroy
c0298680 511 0.917794 ip_finish_output
c0128a00 501 0.899833 call_rcu
c0297100 479 0.860319 ip_forward
c02d1070 478 0.858523 ipv4_sabotage_out
c0221350 451 0.810029 tg3_tx
c028c530 431 0.774108 neigh_hh_init
c02215c0 426 0.765127 tg3_recycle_rx
c028fd60 394 0.707653 qdisc_restart
c02d1030 375 0.673528 ipv4_sabotage_in
c0292310 359 0.64479 rt_garbage_collect
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
21502 9 160507 0 0 0 0 0 0 0 0 160507 160507 0 0
23701 15 160485 0 0 0 0 0 0 0 0 160485 160484 1 0
21866 6 160498 0 0 0 0 0 0 0 0 160498 160498 0 0
23551 12 160464 0 0 0 0 0 0 0 0 160464 160464 0 0
23266 13 160203 0 0 0 0 0 0 0 0 160203 160203 0 0
22095 9 160591 0 0 0 0 0 0 0 0 160591 160591 0 0
23962 15 160461 0 0 0 0 0 0 0 0 160461 160460 0 0
22066 13 158691 0 0 0 0 0 0 0 0 158691 158691 0 0
22951 10 160166 0 0 0 0 0 0 0 0 160166 160166 0 0
21861 18 159134 0 0 0 0 0 0 0 0 159134 159134 0 0
21097 6 159126 0 0 0 0 0 0 0 0 159126 159125 1 0
22943 6 161350 0 0 0 0 0 0 0 0 161350 161350 0 0
21692 4 160124 0 0 0 0 0 0 0 0 160124 160124 0 0
23524 16 161184 0 0 0 0 0 0 0 0 161184 161184 0 0
20471 15 160833 0 0 0 0 0 0 0 0 160833 160833 0 0
23160 5 161643 0 0 0 0 0 0 0 0 161643 161642 0 0
21981 10 160518 0 0 0 0 0 0 0 0 160518 160518 0 0
20640 11 160145 0 0 0 0 0 0 0 0 160145 160145 0 0
21536 14 160194 0 0 0 0 0 0 0 0 160194 160194 0 0
21110 10 161550 0 0 0 0 0 0 0 0 161550 161550 0 0
...What next? :)
Simon-
* Re: Route cache performance tests
2003-06-16 22:37 ` Simon Kirby
@ 2003-06-16 22:44 ` David S. Miller
2003-06-16 23:09 ` Simon Kirby
0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-16 22:44 UTC (permalink / raw)
To: sim; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net
From: Simon Kirby <sim@netnation.com>
Date: Mon, 16 Jun 2003 15:37:14 -0700
So, which kernels shall I try? When I set the thing up I was using
2.5.70-bk14, but I am compiling 2.5.71, and I will try with your patch
above and with Alexey's.
Thanks for your profiles.
I pushed all of our current work to Linus's tree.
But for your convenience here are the routing diffs
against plain 2.5.71
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1318.1.15 -> 1.1318.1.16
# net/ipv4/route.c 1.63 -> 1.64
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/16 kuznet@ms2.inr.ac.ru 1.1318.1.16
# [IPV4]: More sane rtcache behavior.
# 1) More reasonable ip_rt_gc_min_interval default
# 2) Trim less valuable entries in hash chain during
# rt_intern_hash when such chains grow too long.
# --------------------------------------------
#
diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c Mon Jun 16 15:45:20 2003
+++ b/net/ipv4/route.c Mon Jun 16 15:45:20 2003
@@ -111,7 +111,7 @@
int ip_rt_max_size;
int ip_rt_gc_timeout = RT_GC_TIMEOUT;
int ip_rt_gc_interval = 60 * HZ;
-int ip_rt_gc_min_interval = 5 * HZ;
+int ip_rt_gc_min_interval = HZ / 2;
int ip_rt_redirect_number = 9;
int ip_rt_redirect_load = HZ / 50;
int ip_rt_redirect_silence = ((HZ / 50) << (9 + 1));
@@ -456,6 +456,25 @@
out: return ret;
}
+/* Bits of score are:
+ * 31: very valuable
+ * 30: not quite useless
+ * 29..0: usage counter
+ */
+static inline u32 rt_score(struct rtable *rt)
+{
+ u32 score = rt->u.dst.__use;
+
+ if (rt_valuable(rt))
+ score |= (1<<31);
+
+ if (!rt->fl.iif ||
+ !(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL)))
+ score |= (1<<30);
+
+ return score;
+}
+
/* This runs via a timer and thus is always in BH context. */
static void rt_check_expire(unsigned long dummy)
{
@@ -721,6 +740,9 @@
{
struct rtable *rth, **rthp;
unsigned long now = jiffies;
+ struct rtable *cand = NULL, **candp = NULL;
+ u32 min_score = ~(u32)0;
+ int chain_length = 0;
int attempts = !in_softirq();
restart:
@@ -755,7 +777,33 @@
return 0;
}
+ if (!atomic_read(&rth->u.dst.__refcnt)) {
+ u32 score = rt_score(rth);
+
+ if (score <= min_score) {
+ cand = rth;
+ candp = rthp;
+ min_score = score;
+ }
+ }
+
+ chain_length++;
+
rthp = &rth->u.rt_next;
+ }
+
+ if (cand) {
+ /* ip_rt_gc_elasticity used to be average length of chain
+ * length, when exceeded gc becomes really aggressive.
+ *
+ * The second limit is less certain. At the moment it allows
+ * only 2 entries per bucket. We will see.
+ */
+ if (chain_length > ip_rt_gc_elasticity ||
+ (chain_length > 1 && !(min_score & (1<<31)))) {
+ *candp = cand->u.rt_next;
+ rt_free(cand);
+ }
}
/* Try to bind route to arp only if it is output
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1320.1.1 -> 1.1320.1.2
# net/ipv4/route.c 1.64 -> 1.65
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/16 robert.olsson@data.slu.se 1.1320.1.2
# [IPV4]: In rt_intern_hash, reinit all state vars on branch to "restart".
# --------------------------------------------
#
diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c Mon Jun 16 15:46:05 2003
+++ b/net/ipv4/route.c Mon Jun 16 15:46:05 2003
@@ -739,13 +739,19 @@
static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
{
struct rtable *rth, **rthp;
- unsigned long now = jiffies;
- struct rtable *cand = NULL, **candp = NULL;
- u32 min_score = ~(u32)0;
- int chain_length = 0;
+ unsigned long now;
+ struct rtable *cand, **candp;
+ u32 min_score;
+ int chain_length;
int attempts = !in_softirq();
restart:
+ chain_length = 0;
+ min_score = ~(u32)0;
+ cand = NULL;
+ candp = NULL;
+ now = jiffies;
+
rthp = &rt_hash_table[hash].chain;
spin_lock_bh(&rt_hash_table[hash].lock);
* Re: Route cache performance tests
2003-06-16 23:09 ` Simon Kirby
@ 2003-06-16 23:08 ` David S. Miller
2003-06-16 23:27 ` Simon Kirby
0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-16 23:08 UTC (permalink / raw)
To: sim; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net
From: Simon Kirby <sim@netnation.com>
Date: Mon, 16 Jun 2003 16:09:22 -0700
On Mon, Jun 16, 2003 at 03:44:01PM -0700, David S. Miller wrote:
> I pushed all of our current work to Linus's tree.
> But for your convenience here are the routing diffs
> against plain 2.5.71
Trying to apply against 2.5.71:
patching file net/ipv4/route.c
Hunk #2 succeeded at 454 (offset -2 lines).
Hunk #3 succeeded at 738 (offset -2 lines).
Hunk #4 succeeded at 775 (offset -2 lines).
patching file net/ipv4/route.c
Hunk #1 FAILED at 739.
1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej
Trying to apply against 2.5.71-bk2:
patching file net/ipv4/route.c
patching file net/ipv4/route.c
Hunk #1 FAILED at 739.
1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej
Am I missing something in between?
Code from bk2:
static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
{
struct rtable *rth, **rthp;
unsigned long now = jiffies;
int attempts = !in_softirq();
Patch:
It depends upon the first patch that I enclosed.
What I gave you was a 2-part patch, the first one
did:
@@ -721,6 +740,9 @@
{
struct rtable *rth, **rthp;
unsigned long now = jiffies;
+ struct rtable *cand = NULL, **candp = NULL;
+ u32 min_score = ~(u32)0;
+ int chain_length = 0;
int attempts = !in_softirq();
restart:
The second one did:
@@ -739,13 +739,19 @@
static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
{
struct rtable *rth, **rthp;
- unsigned long now = jiffies;
- struct rtable *cand = NULL, **candp = NULL;
- u32 min_score = ~(u32)0;
- int chain_length = 0;
+ unsigned long now;
+ struct rtable *cand, **candp;
+ u32 min_score;
+ int chain_length;
int attempts = !in_softirq();
...
I have no idea why it doesn't apply.
Nothing else has happened in these bits of code for a while.
* Re: Route cache performance tests
2003-06-16 22:44 ` David S. Miller
@ 2003-06-16 23:09 ` Simon Kirby
2003-06-16 23:08 ` David S. Miller
0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-16 23:09 UTC (permalink / raw)
To: David S. Miller; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net
On Mon, Jun 16, 2003 at 03:44:01PM -0700, David S. Miller wrote:
> I pushed all of our current work to Linus's tree.
> But for your convenience here are the routing diffs
> against plain 2.5.71
Trying to apply against 2.5.71:
patching file net/ipv4/route.c
Hunk #2 succeeded at 454 (offset -2 lines).
Hunk #3 succeeded at 738 (offset -2 lines).
Hunk #4 succeeded at 775 (offset -2 lines).
patching file net/ipv4/route.c
Hunk #1 FAILED at 739.
1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej
Trying to apply against 2.5.71-bk2:
patching file net/ipv4/route.c
patching file net/ipv4/route.c
Hunk #1 FAILED at 739.
1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej
Am I missing something in between?
Code from bk2:
static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
{
struct rtable *rth, **rthp;
unsigned long now = jiffies;
int attempts = !in_softirq();
Patch:
static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
{
struct rtable *rth, **rthp;
- unsigned long now = jiffies;
- struct rtable *cand = NULL, **candp = NULL;
...
Simon-
* Re: Route cache performance tests
2003-06-16 23:08 ` David S. Miller
@ 2003-06-16 23:27 ` Simon Kirby
2003-06-16 23:49 ` Simon Kirby
0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-16 23:27 UTC (permalink / raw)
To: David S. Miller; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net
On Mon, Jun 16, 2003 at 04:08:56PM -0700, David S. Miller wrote:
> It depends upon the first patch that I enclosed.
Never mind. :) Such patches don't work very well with patch --dry.
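The failure mode here can be reproduced in miniature (hypothetical file names): the second part of a stacked series fails a dry run against the pristine tree because its context assumes the first part is already applied, even though applying both parts in order succeeds.

```shell
# Build a toy two-part patch series.
printf 'one\ntwo\nthree\n' > file.orig
cp file.orig file
sed 's/two/TWO/' file.orig  > file.step1
sed 's/TWO/2/'  file.step1 > file.step2
diff -u file.orig  file.step1 > part1.diff || true   # diff exits 1 on difference
diff -u file.step1 file.step2 > part2.diff || true

# part2 alone fails its dry run: the "TWO" line it expects isn't there yet.
if ! patch --dry-run file < part2.diff >/dev/null; then
    echo "part2 alone: hunk FAILED as expected"
fi

# Applying both parts in order works.
patch file < part1.diff >/dev/null
patch file < part2.diff >/dev/null
grep -q '^2$' file && echo "sequence applied cleanly"
```

So `--dry-run` only tells you whether a patch applies to the tree as it stands now, not whether a whole series applies in sequence.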
Simon-
[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer. ]
* Re: Route cache performance tests
2003-06-16 23:27 ` Simon Kirby
@ 2003-06-16 23:49 ` Simon Kirby
2003-06-17 15:59 ` David S. Miller
0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-16 23:49 UTC (permalink / raw)
To: David S. Miller; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net
On Mon, Jun 16, 2003 at 04:27:50PM -0700, Simon Kirby wrote:
> On Mon, Jun 16, 2003 at 04:08:56PM -0700, David S. Miller wrote:
>
> > It depends upon the first patch that I enclosed.
>
> Never mind. :) Such patches don't work very well with patch --dry.
Okay, here goes 2.5.71 + this patch:
60.0049 seconds passed, avg forwarding rate: 160190.859 pps
60.0085 seconds passed, avg forwarding rate: 157118.708 pps
60.0046 seconds passed, avg forwarding rate: 157211.097 pps
60.0073 seconds passed, avg forwarding rate: 157557.710 pps
...Looks like a tad worse than with your patch, but not by much.
Forwarding rate is still pretty crappy for an Opteron. Will fiddle
a bit more tonight to see what I can do.
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c02c0ea0 5113 9.07075 fn_hash_lookup
c0293970 3264 5.79052 ip_route_input_slow
c028ef90 2734 4.85027 nf_iterate
c028f280 2525 4.47949 nf_hook_slow
c02924b0 2127 3.77342 rt_intern_hash
c0222330 2125 3.76987 tg3_start_xmit
c02becc0 1755 3.11347 fib_validate_source
c0290020 1684 2.98751 pfifo_fast_dequeue
c0296220 1531 2.71608 ip_rcv_finish
c0135230 1449 2.57061 kmem_cache_free
c0134ff0 1431 2.53867 free_block
c0221710 1369 2.42868 tg3_rx
c0295cb0 1350 2.39498 ip_rcv
c0135170 1304 2.31337 kmem_cache_alloc
c02941a0 1258 2.23176 ip_route_input
c028f920 1255 2.22644 eth_header
c0134e20 1148 2.03662 cache_alloc_refill
c0291b70 1104 1.95856 rt_hash_code
c02886a0 1082 1.91953 netif_receive_skb
c01351b0 983 1.7439 __kmalloc
c028b610 923 1.63745 neigh_lookup
c02c0050 914 1.62149 fib_semantic_match
c029a660 857 1.52037 ip_finish_output2
c028c600 829 1.47069 neigh_resolve_output
c01adc80 766 1.35893 memcpy
c0135270 743 1.31812 kfree
c0297000 741 1.31458 ip_forward
c0284620 686 1.217 alloc_skb
c02b9730 666 1.18152 inet_select_addr
c028fa90 663 1.1762 eth_type_trans
c0128a00 649 1.15136 call_rcu
c0297240 623 1.10524 ip_forward_finish
c028af00 620 1.09991 dst_alloc
c0288160 597 1.05911 dev_queue_xmit
c028ffa0 570 1.01121 pfifo_fast_enqueue
c028b030 486 0.862191 dst_destroy
c0292260 485 0.860417 rt_garbage_collect
c028fcb0 472 0.837355 qdisc_restart
c0221350 467 0.828484 tg3_tx
c028c480 463 0.821388 neigh_hh_init
c02215c0 455 0.807196 tg3_recycle_rx
c02d0f70 447 0.793003 ipv4_sabotage_out
c0298580 443 0.785907 ip_finish_output
c011f080 430 0.762844 local_bh_enable
c010fc40 358 0.635112 do_gettimeofday
c0284860 345 0.612049 __kfree_skb
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
22910 10 158190 0 0 0 0 0 0 0 0 158190 158188 1 0
20590 10 158330 0 0 0 0 0 0 0 0 158330 158328 1 0
20515 14 158306 0 0 0 0 0 0 0 0 158306 158304 1 0
21000 4 158964 0 0 0 0 0 0 0 0 158964 158962 1 0
21631 8 159300 0 0 0 0 0 0 0 0 159300 159298 0 0
20329 13 160059 0 0 0 0 0 0 0 0 160059 160057 1 0
22995 7 157441 0 0 0 0 0 0 0 0 157441 157439 1 0
22418 9 156831 0 0 0 0 0 0 0 0 156831 156829 1 0
22417 11 157321 0 0 0 0 0 0 0 0 157321 157319 1 0
21339 6 157898 0 0 0 0 0 0 0 0 157898 157896 0 0
22562 10 157734 0 0 0 0 0 0 0 0 157734 157732 1 0
20488 12 159496 0 0 0 0 0 0 0 0 159496 159493 1 0
22527 10 157674 0 0 0 0 0 0 0 0 157674 157672 1 0
21992 7 156729 0 0 0 0 0 0 0 0 156729 156727 0 0
21372 10 157106 0 0 0 0 0 0 0 0 157106 157104 1 0
22950 10 156402 0 0 0 0 0 0 0 0 156402 156400 2 0
20471 11 157057 0 0 0 0 0 0 0 0 157057 157055 1 0
20864 13 159082 0 0 0 0 0 0 0 0 159082 159080 0 0
22416 10 157658 0 0 0 0 0 0 0 0 157658 157656 1 0
22659 8 157348 0 0 0 0 0 0 0 0 157348 157346 1 0
Simon-
[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer. ]
* Re: Route cache performance tests
2003-06-16 23:49 ` Simon Kirby
@ 2003-06-17 15:59 ` David S. Miller
2003-06-17 16:50 ` Robert Olsson
0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-17 15:59 UTC (permalink / raw)
To: sim; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net
From: Simon Kirby <sim@netnation.com>
Date: Mon, 16 Jun 2003 16:49:37 -0700
60.0073 seconds passed, avg forwarding rate: 157557.710 pps
...Looks like a tad worse than with your patch, but not by much.
Forwarding rate is still pretty crappy for an Opteron. Will fiddle
a bit more tonight to see what I can do.
To be honest, this isn't half-bad for pure DoS load.
This reminds me, maybe a good test would be PPS for "well behaved
flows" in the presence of DoS load. You'd probably need 4 systems
to carry out such a test accurately.
Because, really, who cares how fast we can forward the DoS traffic
as long as legitimate users still see good metrics.
* Re: Route cache performance tests
2003-06-17 15:59 ` David S. Miller
@ 2003-06-17 16:50 ` Robert Olsson
2003-06-17 16:50 ` David S. Miller
2003-06-17 20:07 ` Simon Kirby
0 siblings, 2 replies; 31+ messages in thread
From: Robert Olsson @ 2003-06-17 16:50 UTC (permalink / raw)
To: David S. Miller; +Cc: sim, ralph+d, hadi, xerox, fw, netdev, linux-net
David S. Miller writes:
> 60.0073 seconds passed, avg forwarding rate: 157557.710 pps
> To be honest, this isn't half-bad for pure DoS load.
No, that's pretty good, and the profiles look as expected. It would be interesting
to get the single-flow performance as a comparison.
Also, I think Simon used only /32 routes... I took "real" Internet routing
and made a script so it can be used for experiments. I can make it available.
Cheers.
--ro
* Re: Route cache performance tests
2003-06-17 16:50 ` Robert Olsson
@ 2003-06-17 16:50 ` David S. Miller
2003-06-17 17:29 ` Robert Olsson
2003-06-17 20:07 ` Simon Kirby
1 sibling, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-17 16:50 UTC (permalink / raw)
To: Robert.Olsson; +Cc: sim, ralph+d, hadi, xerox, fw, netdev, linux-net
From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Tue, 17 Jun 2003 18:50:03 +0200
Also think Simon used only /32 routes... I took "real"
Internet-routing and made a script so it can be used for
experiments. I can make it available.
Please do, I'd like to play with such a list locally.
* Re: Route cache performance tests
2003-06-17 16:50 ` David S. Miller
@ 2003-06-17 17:29 ` Robert Olsson
2003-06-17 19:06 ` Mr. James W. Laferriere
0 siblings, 1 reply; 31+ messages in thread
From: Robert Olsson @ 2003-06-17 17:29 UTC (permalink / raw)
To: David S. Miller
Cc: Robert.Olsson, sim, ralph+d, hadi, xerox, fw, netdev, linux-net
David S. Miller writes:
> Internet-routing and made a script so it can be used for
> experiments. I can make it available.
>
> Please do, I'd like to play with such a list locally.
ftp://robur.slu.se/pub/Linux/net-development/inet_routes/
Just configure the script and run...
And Simon, can you do a run with this routing table too? Even the fibstat
output could be interesting to compare.
Cheers.
--ro
* Re: Route cache performance tests
2003-06-17 17:29 ` Robert Olsson
@ 2003-06-17 19:06 ` Mr. James W. Laferriere
2003-06-17 20:12 ` Robert Olsson
0 siblings, 1 reply; 31+ messages in thread
From: Mr. James W. Laferriere @ 2003-06-17 19:06 UTC (permalink / raw)
To: Robert Olsson; +Cc: netdev, Linux networking maillist
Hello Robert, first, thank you for these tools. Now for the
questions.
Is 'fibstat' only for 2.5.x kernels?
The reason for asking is that there isn't a /proc/net/fib_stat
under 2.4.21, which has no mention of fib_stat anywhere in the
sources.
The next question: packet-generator.c appears to require kernel
include files. Is there an updated set of user-level net/ includes?
Tia, JimL
On Tue, 17 Jun 2003, Robert Olsson wrote:
> David S. Miller writes:
> > Internet-routing and made a script so it can be used for
> > experiments. I can make it available.
> > Please do, I'd like to play with such a list locally.
> ftp://robur.slu.se/pub/Linux/net-development/inet_routes/
> Just configure the script and run...
> And Simon can you do a run with this routing table too? And even fibstat
> output could be interesting to compare.
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+
* Re: Route cache performance tests
2003-06-17 16:50 ` Robert Olsson
2003-06-17 16:50 ` David S. Miller
@ 2003-06-17 20:07 ` Simon Kirby
2003-06-17 20:17 ` Martin Josefsson
` (3 more replies)
1 sibling, 4 replies; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 20:07 UTC (permalink / raw)
To: Robert Olsson
Cc: David S. Miller, ralph+d, hadi, xerox, fw, netdev, linux-net
On Tue, Jun 17, 2003 at 06:50:03PM +0200, Robert Olsson wrote:
> David S. Miller writes:
>
> > 60.0073 seconds passed, avg forwarding rate: 157557.710 pps
>
> > To be honest, this isn't half-bad for pure DoS load.
>
> No thats pretty good and profiles looks as expected. It would interesting
> to get the singeflow performance as a comparison.
I changed Juno to send from a single IP, but it only spat out about
330000 pps, which the dual Tigon3 Opteron box forwarded completely.
In order to do a single flow forwarding test, I need to be able to create
more input traffic somehow. Seeing as you wrote pktgen.c, maybe you
could help in this department. :)
> Also think Simon used only /32 routes... I took "real" Internet-routing
> and made a script so it can be used for experiments. I can make it available.
Yes, I found that area less interesting since Dave M. fixed the hash
buckets. But yes, the prefix scanning will slow it down some.
Whoa. Uhm. A lot. I should compare with 2.4 again to see what's going
on here.
60.0042 seconds passed, avg forwarding rate: 50759.683 pps
60.0039 seconds passed, avg forwarding rate: 50311.258 pps
60.0046 seconds passed, avg forwarding rate: 50420.562 pps
60.0036 seconds passed, avg forwarding rate: 50399.389 pps
60.0038 seconds passed, avg forwarding rate: 50431.732 pps
60.0041 seconds passed, avg forwarding rate: 50403.777 pps
60.0036 seconds passed, avg forwarding rate: 50210.604 pps
60.0033 seconds passed, avg forwarding rate: 50279.220 pps
60.0036 seconds passed, avg forwarding rate: 50549.291 pps
60.0046 seconds passed, avg forwarding rate: 50437.615 pps
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.26
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c02bf730 16019 33.2014 fn_hash_lookup
c0292b70 3882 8.04593 ip_route_input_slow
c0221710 2335 4.83958 tg3_rx
c02bd550 2004 4.15354 fib_validate_source
c0290d70 1955 4.05198 rt_hash_code
c0294e50 1670 3.46128 ip_rcv
c02933a0 1404 2.90997 ip_route_input
c01351b0 1349 2.79597 __kmalloc
c02885c0 1314 2.72343 netif_receive_skb
c02b8040 1168 2.42083 inet_select_addr
c0135270 1123 2.32756 kfree
c0284620 987 2.04568 alloc_skb
c028ec90 900 1.86536 eth_type_trans
c0135170 860 1.78246 kmem_cache_alloc
c02be8e0 844 1.7493 fib_semantic_match
c0135230 812 1.68297 kmem_cache_free
c0222330 652 1.35135 tg3_start_xmit
c02916b0 648 1.34306 rt_intern_hash
c02215c0 542 1.12336 tg3_recycle_rx
c010fc40 459 0.951335 do_gettimeofday
c028f220 422 0.874648 pfifo_fast_dequeue
c02be9b0 419 0.86843 __fib_res_prefsrc
c028eb20 417 0.864285 eth_header
c0295df0 386 0.800033 ip_forward
c028c520 363 0.752363 neigh_resolve_output
c0284860 345 0.715056 __kfree_skb
c0284840 311 0.644586 kfree_skbmem
c02847a0 295 0.611424 skb_release_data
c028b530 285 0.590698 neigh_lookup
c01adc80 276 0.572044 memcpy
c0134ff0 269 0.557536 free_block
c0291460 240 0.49743 rt_garbage_collect
c0134e20 236 0.489139 cache_alloc_refill
c02972d0 216 0.447687 ip_finish_output
c0114030 215 0.445614 get_offset_tsc
c0128a00 197 0.408307 call_rcu
c028ae20 193 0.400017 dst_alloc
c0288080 187 0.387581 dev_queue_xmit
c028af50 184 0.381363 dst_destroy
c028f1a0 175 0.362709 pfifo_fast_enqueue
c011f080 170 0.352346 local_bh_enable
c0221350 168 0.348201 tg3_tx
c0221e90 160 0.33162 tg3_set_txd
c0297570 152 0.315039 ip_output
c028c3a0 149 0.308821 neigh_hh_init
c028eeb0 141 0.29224 qdisc_restart
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
17929 13 214343 0 0 0 0 163822 0 0 0 50521 50519 0 0
18296 18 213694 0 0 0 0 163018 0 0 0 50676 50674 1 0
17616 11 214529 0 0 0 0 163993 0 0 0 50536 50534 0 0
17841 12 213816 0 0 0 0 163157 0 0 0 50659 50657 1 0
18272 7 214093 0 0 0 0 163583 0 0 0 50510 50508 1 0
18216 9 214843 0 0 0 0 164214 0 0 0 50629 50627 0 0
18318 16 214976 0 0 0 0 164299 0 0 0 50677 50675 0 0
18099 9 213447 0 0 0 0 162995 0 0 0 50452 50450 1 0
17610 14 216438 0 0 0 0 165408 0 0 0 51030 51028 1 0
17643 14 214638 0 0 0 0 163987 0 0 0 50651 50649 0 0
17516 7 213185 0 0 0 0 163016 0 0 0 50169 50167 1 0
18355 10 213894 0 0 0 0 163564 0 0 0 50330 50328 1 0
17723 11 214477 0 0 0 0 163705 0 0 0 50772 50770 0 0
17915 6 214342 0 0 0 0 163625 0 0 0 50717 50715 0 0
18166 19 213965 0 0 0 0 163521 0 0 0 50444 50442 0 0
17943 19 213417 0 0 0 0 162955 0 0 0 50462 50460 2 0
17515 5 214423 0 0 0 0 163718 0 0 0 50705 50703 0 0
18231 10 213434 0 0 0 0 162919 0 0 0 50515 50513 1 0
17523 8 213856 0 0 0 0 163385 0 0 0 50471 50469 0 0
18217 16 214940 0 0 0 0 164165 0 0 0 50775 50773 0 0
...recompiling with fibstats...
Erm. I can't get fib_stats2.pat to apply against 2.5.71, 2.5.71+davem's
join-two-diffs patch, 2.4.21-rc7, or 2.5.71+davem's rtcache changes.
What's it supposed to be against?
[sroot@debinst:/d/linux-2.5]# patch -p0 --dry < ../fib_stats2.pat
patching file include/net/ip_fib.h
Hunk #1 succeeded at 139 (offset 4 lines).
patching file net/ipv4/fib_hash.c
Hunk #3 succeeded at 305 (offset -11 lines).
Hunk #4 succeeded at 1110 with fuzz 1 (offset -14 lines).
Hunk #5 succeeded at 1166 (offset -14 lines).
patching file net/ipv4/route.c
Hunk #1 FAILED at 2754.
Hunk #2 succeeded at 2760 (offset -6 lines).
Hunk #3 succeeded at 2783 (offset -6 lines).
Hunk #4 FAILED at 2793.
2 out of 4 hunks FAILED -- saving rejects to file net/ipv4/route.c.rej
In any event, here is the profile of the single-flow case with the full
routing table (probably identical to the empty routing table case). The
sender isn't pushing enough for NAPI to fully kick in, so there is a lot
of tg3 interrupt overhead that would diminish with more traffic:
60.0041 seconds passed, avg forwarding rate: 329808.310 pps
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.26
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes
exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask)
count 697000
vma samples % symbol name
c0222330 4470 8.51445 tg3_start_xmit
c0221710 3760 7.16204 tg3_rx
c0294e50 3142 5.98488 ip_rcv
c02885c0 2428 4.62485 netif_receive_skb
c0295df0 2065 3.93341 ip_forward
c028f220 2058 3.92007 pfifo_fast_dequeue
c02933a0 2033 3.87245 ip_route_input
c01351b0 1987 3.78483 __kmalloc
c0290d70 1904 3.62674 rt_hash_code
c02972d0 1752 3.33721 ip_finish_output
c01adc80 1649 3.14101 memcpy
c0134ff0 1626 3.0972 free_block
c0284620 1511 2.87815 alloc_skb
c0135270 1489 2.83624 kfree
c0288080 1461 2.78291 dev_queue_xmit
c028ec90 1351 2.57338 eth_type_trans
c0135170 1319 2.51243 kmem_cache_alloc
c028f1a0 1243 2.36766 pfifo_fast_enqueue
c0134e20 1172 2.23242 cache_alloc_refill
c0297570 1145 2.18099 ip_output
c0135230 1133 2.15814 kmem_cache_free
c0221350 1085 2.06671 tg3_tx
c0221a50 991 1.88766 tg3_poll
c02215c0 893 1.70098 tg3_recycle_rx
c0221e90 832 1.58479 tg3_set_txd
c0221b60 812 1.5467 tg3_interrupt
c028eeb0 755 1.43812 qdisc_restart
c010fc40 672 1.28002 do_gettimeofday
c011f080 578 1.10097 local_bh_enable
c0284840 492 0.937161 kfree_skbmem
c010a8b2 426 0.811444 restore_all
c02847a0 375 0.714299 skb_release_data
c010c6a0 327 0.622869 handle_IRQ_event
c010c910 288 0.548582 do_IRQ
c0284860 284 0.540963 __kfree_skb
c0114030 283 0.539058 get_offset_tsc
c021f4b0 270 0.514296 tg3_enable_ints
c0115e10 234 0.445723 end_level_ioapic_irq
c011f4a0 220 0.419056 cpu_raise_softirq
c028eb20 199 0.379055 eth_header
c0288500 188 0.358102 net_tx_action
c028c520 187 0.356197 neigh_resolve_output
c0134b40 179 0.340959 cache_init_objs
c0288900 171 0.32572 net_rx_action
c0132290 165 0.314292 buffered_rmqueue
c01321b0 122 0.232385 free_hot_cold_page
If I start two threads on the sender (Xeon w/HT), I'm able to push 420000
pps, which only partially starts to use NAPI on the Opteron box. Going
to try 2.4 again for a comparison (note: 2.5 seems to have an opposite
PCI scan order from 2.4 for the dual Tigon3s).
Simon-
[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer. ]
* Re: Route cache performance tests
2003-06-17 19:06 ` Mr. James W. Laferriere
@ 2003-06-17 20:12 ` Robert Olsson
0 siblings, 0 replies; 31+ messages in thread
From: Robert Olsson @ 2003-06-17 20:12 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: Robert Olsson, netdev, Linux networking maillist
Mr. James W. Laferriere writes:
>
> Is 'fibstat' only for 2.5.x kernels ?
Yes.
> The reason for that ? is there isn't a /proc/net/fib_stat .
> Under 2.4.21 , which has no mention of fib_stat anywhere in the
> sources .
The kernel part creates /proc/net/fib_stat. It should be pretty straightforward
for 2.4.X too. If people find it useful it can be backported. Try it; look
at route.c and rt_cache_stat if you run into problems.
> The next ? is packet-generator.c (appears) to require kernel
> include files . Is there an updated user level net/ includes ?
It's a kernel module and should be compiled with the kernel.
Cheers.
--ro
* Re: Route cache performance tests
2003-06-17 20:07 ` Simon Kirby
@ 2003-06-17 20:17 ` Martin Josefsson
2003-06-17 20:37 ` Simon Kirby
2003-06-17 20:49 ` Robert Olsson
` (2 subsequent siblings)
3 siblings, 1 reply; 31+ messages in thread
From: Martin Josefsson @ 2003-06-17 20:17 UTC (permalink / raw)
To: Simon Kirby
Cc: Robert Olsson, David S. Miller, ralph+d, hadi, xerox, fw, netdev,
linux-net
On Tue, 2003-06-17 at 22:07, Simon Kirby wrote:
> > Also think Simon used only /32 routes... I took "real" Internet-routing
> > and made a script so it can be used for experiments. I can make it available.
>
> Yes, I found that area less interesting since Dave M. fixed the hash
> buckets. But yes, the prefix scanning will slow it down some.
>
> Whoa. Uhm. A lot. I should compare with 2.4 again to see what's going
> on here.
> size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
> 17929 13 214343 0 0 0 0 163822 0 0 0 50521 50519 0 0
Did you have rp_filter enabled? Looks like it.
--
/Martin
* Re: Route cache performance tests
2003-06-17 20:37 ` Simon Kirby
@ 2003-06-17 20:36 ` David S. Miller
2003-06-17 20:51 ` Simon Kirby
0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-17 20:36 UTC (permalink / raw)
To: sim; +Cc: gandalf, Robert.Olsson, ralph+d, hadi, xerox, fw, netdev,
linux-net
From: Simon Kirby <sim@netnation.com>
Date: Tue, 17 Jun 2003 13:37:03 -0700
Forwarding rate more than doubles when I turn
rp_filter off (Debian turns it on by default).
I have no idea why they do this, it's the stupidest thing
you can possibly do by default.
If we thought it was a good idea to turn this on by default
we would have done so in the kernel.
Does anyone have some cycles to spare to try and urge whoever is
responsible for this in Debian to leave the kernel's default setting
alone?
Thanks.
* Re: Route cache performance tests
2003-06-17 20:17 ` Martin Josefsson
@ 2003-06-17 20:37 ` Simon Kirby
2003-06-17 20:36 ` David S. Miller
0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 20:37 UTC (permalink / raw)
To: Martin Josefsson
Cc: Robert Olsson, David S. Miller, ralph+d, hadi, xerox, fw, netdev,
linux-net
On Tue, Jun 17, 2003 at 10:17:14PM +0200, Martin Josefsson wrote:
>
> > size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
> > 17929 13 214343 0 0 0 0 163822 0 0 0 50521 50519 0 0
>
> Did you have rp_filter enabled? Looks like it.
Yes, good spotting. Forwarding rate more than doubles when I turn
rp_filter off (Debian turns it on by default).
60.0049 seconds passed, avg forwarding rate: 108222.462 pps
60.0041 seconds passed, avg forwarding rate: 108868.822 pps
60.0042 seconds passed, avg forwarding rate: 108767.194 pps
60.0040 seconds passed, avg forwarding rate: 108872.188 pps
60.0045 seconds passed, avg forwarding rate: 108856.575 pps
60.0041 seconds passed, avg forwarding rate: 108743.443 pps
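For reference, turning rp_filter off amounts to clearing the sysctl. It exists both as a global "all" switch and per interface, and how the two combine has varied across kernel versions, so both are usually set (eth0 below is an example interface name):

```shell
# Disable reverse-path filtering globally and per interface.
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter
# Equivalent via sysctl(8):
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.eth0.rp_filter=0
```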
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.26
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma samples % symbol name
c02bf730 15382 33.0213 fn_hash_lookup
c0292b70 2127 4.56614 ip_route_input_slow
c02916b0 1380 2.96252 rt_intern_hash
c0222330 1340 2.87665 tg3_start_xmit
c02bd550 1336 2.86806 fib_validate_source
c0221710 1219 2.61689 tg3_rx
c02be8e0 1154 2.47735 fib_semantic_match
c0294e50 1068 2.29273 ip_rcv
c028f220 983 2.11026 pfifo_fast_dequeue
c0135230 981 2.10596 kmem_cache_free
c02b8040 906 1.94496 inet_select_addr
c0290d70 901 1.93422 rt_hash_code
c0134ff0 877 1.8827 free_block
c028eb20 873 1.87411 eth_header
c0135170 805 1.72814 kmem_cache_alloc
c02885c0 798 1.71311 netif_receive_skb
c02933a0 788 1.69164 ip_route_input
c028c520 778 1.67017 neigh_resolve_output
c0295df0 744 1.59718 ip_forward
c0134e20 734 1.57572 cache_alloc_refill
c01351b0 727 1.56069 __kmalloc
c028b530 686 1.47267 neigh_lookup
c01adc80 535 1.14851 memcpy
c0135270 510 1.09484 kfree
c0284620 506 1.08626 alloc_skb
c0291460 498 1.06908 rt_garbage_collect
c028ec90 480 1.03044 eth_type_trans
c028ae20 439 0.942424 dst_alloc
c0128a00 437 0.938131 call_rcu
c02972d0 433 0.929544 ip_finish_output
c0297570 407 0.873728 ip_output
c011f080 380 0.815766 local_bh_enable
c0288080 376 0.807179 dev_queue_xmit
c028eeb0 369 0.792151 qdisc_restart
c0221e90 361 0.774977 tg3_set_txd
c028af50 360 0.772831 dst_destroy
c028f1a0 352 0.755657 pfifo_fast_enqueue
c028c3a0 317 0.68052 neigh_hh_init
c02215c0 315 0.676227 tg3_recycle_rx
c0221350 308 0.6612 tg3_tx
c02be9b0 294 0.631145 __fib_res_prefsrc
c010fc40 200 0.42935 do_gettimeofday
c02b48c0 195 0.418617 arp_hash
c0292860 195 0.418617 rt_set_nexthop
c02b4df0 178 0.382122 arp_bind_neighbour
c0294500 159 0.341334 dst_free
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
20262 5 109659 0 0 0 0 0 0 0 0 109659 109657 1 0
19229 7 109493 0 0 0 0 0 0 0 0 109493 109491 0 0
20320 4 109576 0 0 0 0 0 0 0 0 109576 109574 1 0
19280 9 109439 0 0 0 0 0 0 0 0 109439 109437 1 0
20325 10 109314 0 0 0 0 0 0 0 0 109314 109312 1 0
18983 6 109530 0 0 0 0 0 0 0 0 109530 109528 1 0
20313 5 109867 0 0 0 0 0 0 0 0 109867 109865 0 0
19127 4 109256 0 0 0 0 0 0 0 0 109256 109254 1 0
18897 4 109508 0 0 0 0 0 0 0 0 109508 109506 1 0
20338 11 109717 0 0 0 0 0 0 0 0 109717 109715 0 0
19054 7 109209 0 0 0 0 0 0 0 0 109209 109207 1 0
20397 11 109273 0 0 0 0 0 0 0 0 109273 109271 1 0
Simon-
[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer. ]
* Re: Route cache performance tests
2003-06-17 20:51 ` Simon Kirby
@ 2003-06-17 20:49 ` David S. Miller
2003-06-18 5:50 ` Pekka Savola
1 sibling, 0 replies; 31+ messages in thread
From: David S. Miller @ 2003-06-17 20:49 UTC (permalink / raw)
To: sim; +Cc: gandalf, Robert.Olsson, ralph+d, hadi, xerox, fw, netdev,
linux-net
From: Simon Kirby <sim@netnation.com>
Date: Tue, 17 Jun 2003 13:51:01 -0700
Specific firewall rules would have to be created otherwise. And
the overhead only really shows when the routing table is large,
right?
rp filter breaks things... just like firewalls break things...
so just like a user enables firewall rules by himself, he may
enable rp filter by himself...
* Re: Route cache performance tests
2003-06-17 20:07 ` Simon Kirby
2003-06-17 20:17 ` Martin Josefsson
@ 2003-06-17 20:49 ` Robert Olsson
2003-06-17 21:07 ` Simon Kirby
2003-06-17 22:11 ` Ralph Doncaster
3 siblings, 0 replies; 31+ messages in thread
From: Robert Olsson @ 2003-06-17 20:49 UTC (permalink / raw)
To: Simon Kirby
Cc: Robert Olsson, David S. Miller, ralph+d, hadi, xerox, fw, netdev,
linux-net
Simon Kirby writes:
> I changed Juno to send from a single IP, but it only spat out about
> 330000 pps, which the dual Tigon3 Opteron box forwarded completely.
> In order to do a single flow forwarding test, I need to be able to create
> more input traffic somehow. Seeing as you wrote pktgen.c, maybe you
> could help in this department. :)
OK. See below.
> > Also think Simon used only /32 routes... I took "real" Internet-routing
> > and made a script so it can be used for experiments. I can make it available.
>
> Yes, I found that area less interesting since Dave M. fixed the hash
> buckets. But yes, the prefix scanning will slow it down some.
Well, I don't think it's that easy, as there are 33 zones with prefixes. If
you have all the routes in one zone I'm not sure what happens; that's why
I suggested the comparison.
> Erm. I can't get fib_stats2.pat to apply against 2.5.71, 2.5.71+davem's
> join-two-diffs patch, 2.4.21-rc7, or 2.5.71+davem's rtcache changes.
> What's it supposed to be against?
Sorry. Our production system and lab use a heavily patched 2.5.66.
I'll make a patch for 2.5.71....
> If I start two threads on the sender (Xeon w/HT), I'm able to push 420000
> pps, which only partially starts to use NAPI on the Opteron box. Going
> to try 2.4 again for a comparison (note: 2.5 seems to have an opposite
> PCI scan order from 2.4 for the dual Tigon3s).
Not bad. Replace net/core/pktgen.c in 2.5.X with the version from
ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/
and edit pktgen.sh to suit your needs.
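For readers without that pktgen.sh at hand, a driver script of roughly this shape is what is meant. This is only a sketch: the /proc paths and option names below follow the much later mainline pktgen interface and are assumptions here, not the 2.5-era version linked above; the shipped pktgen.sh is the real reference.

```shell
#!/bin/sh
# Sketch of a pktgen driver script.  NOTE: the /proc layout and the
# option names are assumptions (later mainline pktgen interface); the
# pktgen-testing version may differ, so adapt from its own pktgen.sh.

pgset() {            # write one command to a pktgen /proc file
    echo "$2" > "$1" || echo "command failed: $2" >&2
}

DEV=eth0
pgset /proc/net/pktgen/kpktgend_0 "rem_device_all"
pgset /proc/net/pktgen/kpktgend_0 "add_device $DEV"

pgset /proc/net/pktgen/$DEV "count 10000000"   # packets to send
pgset /proc/net/pktgen/$DEV "pkt_size 60"      # min-size frames, max pps
pgset /proc/net/pktgen/$DEV "dst 10.0.0.2"     # box behind the router
pgset /proc/net/pktgen/$DEV "flag IPSRC_RND"   # random sources, juno-style

pgset /proc/net/pktgen/pgctrl "start"          # blocks until done
cat /proc/net/pktgen/$DEV                      # per-device pps result
```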
And see what you get. I'm interested since you are using both different
processors and NICs. Packet generation itself is also interesting, as
it tests the driver/HW xmit path.
Cheers.
--ro
* Re: Route cache performance tests
2003-06-17 20:36 ` David S. Miller
@ 2003-06-17 20:51 ` Simon Kirby
2003-06-17 20:49 ` David S. Miller
2003-06-18 5:50 ` Pekka Savola
0 siblings, 2 replies; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 20:51 UTC (permalink / raw)
To: David S. Miller
Cc: gandalf, Robert.Olsson, ralph+d, hadi, xerox, fw, netdev,
linux-net
On Tue, Jun 17, 2003 at 01:36:35PM -0700, David S. Miller wrote:
> I have no idea why they do this, it's the stupidest thing
> you can possibly do by default.
>
> If we thought it was a good idea to turn this on by default
> we would have done so in the kernel.
>
> Does anyone have some cycles to spare to try and urge whoever is
> responsible for this in Debian to leave the kernel's default setting
> alone?
Sure, I can do this. But why is this stupid? It uses more CPU, but
stops IP spoofing by default. Specific firewall rules would have to be
created otherwise. And the overhead only really shows when the routing
table is large, right?
Simon-
* Re: Route cache performance tests
2003-06-17 20:07 ` Simon Kirby
2003-06-17 20:17 ` Martin Josefsson
2003-06-17 20:49 ` Robert Olsson
@ 2003-06-17 21:07 ` Simon Kirby
2003-06-17 22:50 ` Simon Kirby
2003-06-17 22:11 ` Ralph Doncaster
3 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 21:07 UTC (permalink / raw)
To: Robert Olsson
Cc: David S. Miller, ralph+d, hadi, xerox, fw, netdev, linux-net
On Tue, Jun 17, 2003 at 01:07:21PM -0700, Simon Kirby wrote:
> Whoa. Uhm. A lot. I should compare with 2.4 again to see what's going
> on here.
>
> 60.0042 seconds passed, avg forwarding rate: 50759.683 pps
Ummmm, yeah, 2.5.71 is quite a bit slower than 2.4.21. I applied
Alexey's 2.5.71 rtcache fixes to 2.4.21 (changing "fl" to "key" in
the scoring function), and now I see:
60.0065 seconds passed, avg forwarding rate: 135379.152 pps
If I reboot and don't fill the routing table:
60.0104 seconds passed, avg forwarding rate: 259027.200 pps
This is with standard juno (pseudo-random sources).
This is with CONFIG_IP_MULTIPLE_TABLES still on, too. I'll turn that off
and do some profiles. The only weird thing I'm seeing while doing this
is that the route cache table continues to grow slowly, and the pps
slowly falls off over a few minutes. "ip route flush cache" restores
performance again. I'll verify this is not happening in 2.5.
Simon-
[ Simon Kirby ][ Network Operations ]
[ sim@netnation.com ][ NetNation Communications Inc. ]
[ Opinions expressed are not necessarily those of my employer. ]
* Re: Route cache performance tests
2003-06-17 22:11 ` Ralph Doncaster
@ 2003-06-17 22:08 ` David S. Miller
0 siblings, 0 replies; 31+ messages in thread
From: David S. Miller @ 2003-06-17 22:08 UTC (permalink / raw)
To: ralph+d, ralph; +Cc: sim, netdev, linux-net
From: Ralph Doncaster <ralph@istop.com>
Date: Tue, 17 Jun 2003 18:11:00 -0400 (EDT)
My (obviously incorrect) assumption would
be that fib_validate_source is responsible for rp_filter, and turning it
off would lead to only a 5% performance increase.
fib_validate_source() with rp_filter enabled causes an extra
fib_lookup() to occur for each packet.
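That extra per-packet lookup is the whole cost: a reply-path check on the source address. A toy model of the idea (my own simplified sketch, not the kernel's fib_validate_source(); the table, addresses, and interface numbers are made up for illustration):

```c
#include <stddef.h>

/* Toy model of rp_filter: a destination-keyed route lookup, and a
 * reverse-path check that re-runs that lookup on the packet's *source*
 * address and compares the resulting interface with the one the packet
 * arrived on.  In the kernel this second lookup is the fib_lookup()
 * inside fib_validate_source(). */

struct route { unsigned prefix, mask; int ifindex; };

static const struct route table[] = {
    { 0x0a000000, 0xff000000, 1 },   /* 10.0.0.0/8     -> if 1 */
    { 0xc0a80000, 0xffff0000, 2 },   /* 192.168.0.0/16 -> if 2 */
    { 0x00000000, 0x00000000, 3 },   /* default        -> if 3 */
};

static int fib_lookup_toy(unsigned addr)
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if ((addr & table[i].mask) == table[i].prefix)
            return table[i].ifindex;
    return -1;
}

/* rp_filter: accept only if a reply to the source would leave via the
 * interface the packet came in on.  Note the extra lookup per packet,
 * which is why the overhead scales with routing-table size. */
static int rp_filter_ok(unsigned saddr, int in_if)
{
    return fib_lookup_toy(saddr) == in_if;
}
```

With a spoofed source that doesn't route back out the arrival interface, `rp_filter_ok()` returns 0 and the packet would be dropped.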
* Re: Route cache performance tests
2003-06-17 20:07 ` Simon Kirby
` (2 preceding siblings ...)
2003-06-17 21:07 ` Simon Kirby
@ 2003-06-17 22:11 ` Ralph Doncaster
2003-06-17 22:08 ` David S. Miller
3 siblings, 1 reply; 31+ messages in thread
From: Ralph Doncaster @ 2003-06-17 22:11 UTC (permalink / raw)
To: Simon Kirby; +Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org
On Tue, 17 Jun 2003, Simon Kirby wrote:
> vma samples % symbol name
> c02bf730 16019 33.2014 fn_hash_lookup
> c0292b70 3882 8.04593 ip_route_input_slow
> c0221710 2335 4.83958 tg3_rx
> c02bd550 2004 4.15354 fib_validate_source
> c0290d70 1955 4.05198 rt_hash_code
> c0294e50 1670 3.46128 ip_rcv
> c02933a0 1404 2.90997 ip_route_input
If turning off rp_filter doubles your performance, then the profile
numbers above are misleading. My (obviously incorrect) assumption would
be that fib_validate_source is responsible for rp_filter, and turning it
off would lead to only a 5% performance increase.
Considering that, what kind of performance difference would removing the
route hashing make (i.e., going with r-trees or something like that)? In
most of the profiles, fn_hash_lookup has been at the top of the list.
-Ralph
* Re: Route cache performance tests
2003-06-17 21:07 ` Simon Kirby
@ 2003-06-17 22:50 ` Simon Kirby
2003-06-17 23:07 ` David S. Miller
0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 22:50 UTC (permalink / raw)
To: David S. Miller
Cc: Robert Olsson, ralph+d, hadi, xerox, fw, netdev, linux-net
On Tue, Jun 17, 2003 at 02:07:01PM -0700, Simon Kirby wrote:
> 60.0104 seconds passed, avg forwarding rate: 259027.200 pps
>
> This is with standard juno (pseudo-random sources).
>
> This is with CONFIG_IP_MULTIPLE_TABLES still on, too.
Here is with CONFIG_IP_MULTIPLE_TABLES=n and CONFIG_NETFILTER=n
(rp_filter off and CONFIG_SMP=n in all tests):
60.0050 seconds passed, avg forwarding rate: 276893.102 pps
60.0046 seconds passed, avg forwarding rate: 257257.533 pps
60.0101 seconds passed, avg forwarding rate: 251852.843 pps
60.0106 seconds passed, avg forwarding rate: 248110.756 pps
60.0045 seconds passed, avg forwarding rate: 246280.066 pps
"rtstat -i 1" shows the pps rate decreasing because of the growing
rtcache:
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
16688 18 294882 0 0 0 0 0 0 0 0 294882 294880 2 0
16834 24 302376 0 0 0 0 0 0 0 0 302376 302374 2 0
16970 12 294288 0 0 0 0 0 0 0 0 294288 294286 2 0
17037 21 294278 0 0 0 0 0 0 0 0 294278 294276 1 0
17133 20 293080 0 0 0 0 0 0 0 0 293080 293078 1 0
17195 22 293978 0 0 0 0 0 0 0 0 293978 293976 1 0
17293 16 292184 0 0 0 0 0 0 0 0 292184 292182 1 0
17370 19 293681 0 0 0 0 0 0 0 0 293681 293679 1 0
17450 21 293079 0 0 0 0 0 0 0 0 293079 293077 1 0
17542 12 293388 0 0 0 0 0 0 0 0 293388 293386 1 0
17604 16 293684 0 0 0 0 0 0 0 0 293684 293682 1 0
17676 27 294573 0 0 0 0 0 0 0 0 294573 294571 0 0
17762 18 291582 0 0 0 0 0 0 0 0 291582 291580 1 0
...
21615 17 257683 0 0 0 0 0 0 0 0 257683 257681 1 0
21641 23 257077 0 0 0 0 0 0 0 0 257077 257075 0 0
21672 23 257077 0 0 0 0 0 0 0 0 257077 257075 1 0
Profile:
vma samples % symbol name
c025ee10 9228 15.1465 fn_hash_lookup
c02379a0 4025 6.60648 ip_route_input_slow
c0236650 3321 5.45096 rt_intern_hash
c0235d60 2775 4.55478 rt_hash_code
c012d750 2622 4.30365 kmem_cache_alloc
c012d950 2338 3.83751 kmem_cache_free
c0239f50 2323 3.81288 ip_rcv
c0238060 2277 3.73738 ip_route_input
c0233ac0 2190 3.59458 eth_header
c0231b50 2119 3.47805 neigh_resolve_output
c025ca50 2074 3.40419 fib_validate_source
c023aed0 1987 3.26139 ip_forward
c0230a70 1926 3.16126 neigh_lookup
c012d830 1845 3.02831 kmalloc
c0234270 1523 2.49979 pfifo_fast_dequeue
c025dde0 1407 2.3094 fib_semantic_match
c02376c0 1354 2.2224 rt_set_nexthop
c022a730 1322 2.16988 __kfree_skb
c022a480 1248 2.04842 alloc_skb
c012da00 1229 2.01723 kfree
c0259830 1171 1.92204 inet_select_addr
c0233ca0 1169 1.91875 eth_type_trans
c02304a0 994 1.63151 dst_destroy
c0230380 976 1.60197 dst_alloc
c010c910 806 1.32294 do_gettimeofday
c02319d0 793 1.3016 neigh_hh_init
c0236380 739 1.21297 rt_garbage_collect
c0233ef0 732 1.20148 qdisc_restart
c0234200 731 1.19984 pfifo_fast_enqueue
c022e590 695 1.14075 netif_receive_skb
c022dff0 623 1.02257 dev_queue_xmit
c023d6e0 575 0.943783 ip_finish_output
c02562a0 293 0.480919 arp_hash
Full route table:
60.0054 seconds passed, avg forwarding rate: 141888.209 pps
vma samples % symbol name
c025ee10 21133 42.5588 fn_hash_lookup
c02379a0 2219 4.46874 ip_route_input_slow
c0236650 1600 3.22217 rt_intern_hash
c0235d60 1471 2.96238 rt_hash_code
c012d750 1282 2.58176 kmem_cache_alloc
c012d950 1279 2.57572 kmem_cache_free
c0259830 1253 2.52336 inet_select_addr
c0239f50 1231 2.47906 ip_rcv
c0233ac0 1214 2.44482 eth_header
c025dde0 1165 2.34614 fib_semantic_match
c0231b50 1133 2.2817 neigh_resolve_output
c0238060 1126 2.2676 ip_route_input
c025ca50 1120 2.25552 fib_validate_source
c0230a70 1041 2.09642 neigh_lookup
c023aed0 1032 2.0783 ip_forward
c012d830 1002 2.01788 kmalloc
c02376c0 762 1.53456 rt_set_nexthop
c0234270 733 1.47616 pfifo_fast_dequeue
c022a730 709 1.42782 __kfree_skb
c022a480 633 1.27477 alloc_skb
c012da00 629 1.26671 kfree
c0233ca0 623 1.25463 eth_type_trans
c02304a0 549 1.10561 dst_destroy
c0230380 519 1.04519 dst_alloc
c02319d0 447 0.900193 neigh_hh_init
c0236380 426 0.857902 rt_garbage_collect
c010c910 425 0.855889 do_gettimeofday
c022e590 395 0.795473 netif_receive_skb
c0234200 392 0.789431 pfifo_fast_enqueue
c0233ef0 381 0.767279 qdisc_restart
c022dff0 345 0.69478 dev_queue_xmit
c025de90 298 0.600129 __fib_res_prefsrc
c023d6e0 294 0.592073 ip_finish_output
c02567d0 145 0.292009 arp_bind_neighbour
Here's 2.5.72 (which seems to have all of the patches already in), empty
routing table:
60.0085 seconds passed, avg forwarding rate: 166543.268 pps
60.0080 seconds passed, avg forwarding rate: 167055.912 pps
60.0051 seconds passed, avg forwarding rate: 166843.560 pps
vma samples % symbol name
c02c0020 5193 10.2685 fn_hash_lookup
c02930d0 3475 6.87139 ip_route_input_slow
c02222c0 2349 4.64486 tg3_start_xmit
c0291c10 2217 4.38385 rt_intern_hash
c02bde40 1910 3.77679 fib_validate_source
c0293900 1864 3.68583 ip_route_input
c02953d0 1646 3.25477 ip_rcv
c0288b40 1609 3.1816 netif_receive_skb
c0135210 1462 2.89093 kmem_cache_free
c0134fd0 1457 2.88104 free_block
c02216a0 1390 2.74856 tg3_rx
c0135150 1295 2.56071 kmem_cache_alloc
c02912d0 1275 2.52116 rt_hash_code
c028f040 1250 2.47172 eth_header
c0134e00 1250 2.47172 cache_alloc_refill
c028ca40 1223 2.41833 neigh_resolve_output
c028f740 1072 2.11975 pfifo_fast_dequeue
c0135190 1019 2.01495 __kmalloc
c028ba50 991 1.95958 neigh_lookup
c02bf1d0 843 1.66693 fib_semantic_match
c01adc10 816 1.61354 memcpy
c0284ba0 768 1.51863 alloc_skb
c0135250 724 1.43162 kfree
c028f1b0 702 1.38812 eth_type_trans
c02b8930 674 1.33275 inet_select_addr
c0288600 668 1.32089 dev_queue_xmit
c0297850 664 1.31298 ip_finish_output
c0221e20 654 1.29321 tg3_set_txd
c028b340 635 1.25564 dst_alloc
c01289e0 632 1.2497 call_rcu
c0296370 606 1.19829 ip_forward
c0297af0 567 1.12117 ip_output
c028c8c0 547 1.08163 neigh_hh_init
c028b470 547 1.08163 dst_destroy
c011f080 489 0.966938 local_bh_enable
c0221550 477 0.94321 tg3_recycle_rx
Erp, needed a new rtstat. And a wider console, apparently:
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
22233 18 329638 0 0 0 0 0 0 0 0 329638 329634 2 0 665742 0
20523 22 329074 0 0 0 0 0 0 0 0 329074 329070 2 0 665184 0
23510 26 331502 0 0 0 0 0 0 0 0 331502 331498 2 0 671610 0
22552 24 330464 0 0 0 0 0 0 0 0 330464 330460 4 0 669214 0
20359 8 329512 0 0 0 0 0 0 0 0 329512 329508 2 0 664428 0
19965 22 330090 0 0 0 0 0 0 0 0 330090 330086 2 0 663296 0
20081 20 332660 0 0 0 0 0 0 0 0 332660 332656 2 0 671912 0
21113 22 330458 0 0 0 0 0 0 0 0 330458 330454 2 0 666340 0
19864 14 329778 0 0 0 0 0 0 0 0 329778 329774 2 0 667324 0
20195 18 329702 0 0 0 0 0 0 0 0 329702 329698 2 0 670646 0
Route cache size does not increase on 2.5, so the problems in 2.4 are
probably the result of me hacking in the 2.5 patch.
2.5.72, full routing table:
60.0057 seconds passed, avg forwarding rate: 101800.795 pps
60.0045 seconds passed, avg forwarding rate: 101612.797 pps
60.0046 seconds passed, avg forwarding rate: 102004.873 pps
60.0044 seconds passed, avg forwarding rate: 102042.629 pps
60.0055 seconds passed, avg forwarding rate: 102135.224 pps
60.0057 seconds passed, avg forwarding rate: 102158.546 pps
60.0044 seconds passed, avg forwarding rate: 102200.430 pps
vma samples % symbol name
c02c0020 14206 33.0911 fn_hash_lookup
c02930d0 2103 4.89867 ip_route_input_slow
c02222c0 1436 3.34498 tg3_start_xmit
c0291c10 1328 3.09341 rt_intern_hash
c02bde40 1315 3.06313 fib_validate_source
c0293900 1122 2.61356 ip_route_input
c02953d0 1028 2.3946 ip_rcv
c02bf1d0 1013 2.35966 fib_semantic_match
c0288b40 957 2.22921 netif_receive_skb
c0134fd0 840 1.95667 free_block
c0135210 823 1.91707 kmem_cache_free
c02b8930 811 1.88912 inet_select_addr
c02216a0 804 1.87282 tg3_rx
c028ca40 801 1.86583 neigh_resolve_output
c02912d0 786 1.83089 rt_hash_code
c028f040 709 1.65153 eth_header
c0135190 700 1.63056 __kmalloc
c0134e00 692 1.61193 cache_alloc_refill
c028f740 644 1.50012 pfifo_fast_dequeue
c028ba50 597 1.39064 neigh_lookup
c0135150 591 1.37666 kmem_cache_alloc
c0284ba0 539 1.25553 alloc_skb
c01adc10 491 1.14372 memcpy
c0135250 449 1.04589 kfree
c028b340 437 1.01794 dst_alloc
c028f1b0 433 1.00862 eth_type_trans
c0297af0 422 0.982996 ip_output
c01289e0 402 0.936408 call_rcu
c0297850 400 0.931749 ip_finish_output
c0296370 386 0.899138 ip_forward
c0221e20 384 0.894479 tg3_set_txd
c0288600 365 0.850221 dev_queue_xmit
c011f080 364 0.847892 local_bh_enable
c02919c0 356 0.829257 rt_garbage_collect
c028c8c0 317 0.738411 neigh_hh_init
c028b470 313 0.729094 dst_destroy
c02212e0 299 0.696483 tg3_tx
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
18755 12 202740 0 0 0 0 0 0 0 0 202740 202736 2 0 405030 0
19945 20 203884 0 0 0 0 0 0 0 0 203884 203880 2 0 406726 0
18449 8 204152 0 0 0 0 0 0 0 0 204152 204148 0 0 409590 0
19637 10 205302 0 0 0 0 0 0 0 0 205302 205298 2 0 413004 0
19213 10 204022 0 0 0 0 0 0 0 0 204022 204018 2 0 411092 0
20182 8 204280 0 0 0 0 0 0 0 0 204280 204276 2 0 412044 0
19311 14 203378 0 0 0 0 0 0 0 0 203378 203374 2 0 411052 0
18790 16 202480 0 0 0 0 0 0 0 0 202480 202476 2 0 409440 0
18835 24 204776 0 0 0 0 0 0 0 0 204776 204772 0 0 414416 0
19830 8 204792 0 0 0 0 0 0 0 0 204792 204788 2 0 415514 0
Simon-
* Re: Route cache performance tests
2003-06-17 22:50 ` Simon Kirby
@ 2003-06-17 23:07 ` David S. Miller
0 siblings, 0 replies; 31+ messages in thread
From: David S. Miller @ 2003-06-17 23:07 UTC (permalink / raw)
To: sim; +Cc: Robert.Olsson, ralph+d, hadi, xerox, fw, netdev, linux-net
From: Simon Kirby <sim@netnation.com>
Date: Tue, 17 Jun 2003 15:50:36 -0700
so the problems in 2.4 are probably the result of me hacking in the
2.5 patch.
I have them in my pending 2.4.x tree, try this:
diff -Nru a/include/net/route.h b/include/net/route.h
--- a/include/net/route.h Tue Jun 17 16:08:06 2003
+++ b/include/net/route.h Tue Jun 17 16:08:06 2003
@@ -114,6 +114,8 @@
unsigned int gc_ignored;
unsigned int gc_goal_miss;
unsigned int gc_dst_overflow;
+ unsigned int in_hlist_search;
+ unsigned int out_hlist_search;
} ____cacheline_aligned_in_smp;
extern struct ip_rt_acct *ip_rt_acct;
diff -Nru a/net/ipv4/Config.in b/net/ipv4/Config.in
--- a/net/ipv4/Config.in Tue Jun 17 16:08:06 2003
+++ b/net/ipv4/Config.in Tue Jun 17 16:08:06 2003
@@ -14,7 +14,6 @@
bool ' IP: equal cost multipath' CONFIG_IP_ROUTE_MULTIPATH
bool ' IP: use TOS value as routing key' CONFIG_IP_ROUTE_TOS
bool ' IP: verbose route monitoring' CONFIG_IP_ROUTE_VERBOSE
- bool ' IP: large routing tables' CONFIG_IP_ROUTE_LARGE_TABLES
fi
bool ' IP: kernel level autoconfiguration' CONFIG_IP_PNP
if [ "$CONFIG_IP_PNP" = "y" ]; then
diff -Nru a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
--- a/net/ipv4/fib_hash.c Tue Jun 17 16:08:07 2003
+++ b/net/ipv4/fib_hash.c Tue Jun 17 16:08:07 2003
@@ -89,7 +89,7 @@
int fz_nent; /* Number of entries */
int fz_divisor; /* Hash divisor */
- u32 fz_hashmask; /* (1<<fz_divisor) - 1 */
+ u32 fz_hashmask; /* (fz_divisor - 1) */
#define FZ_HASHMASK(fz) ((fz)->fz_hashmask)
int fz_order; /* Zone order */
@@ -149,9 +149,19 @@
static rwlock_t fib_hash_lock = RW_LOCK_UNLOCKED;
-#define FZ_MAX_DIVISOR 1024
+#define FZ_MAX_DIVISOR ((PAGE_SIZE<<MAX_ORDER) / sizeof(struct fib_node *))
-#ifdef CONFIG_IP_ROUTE_LARGE_TABLES
+static struct fib_node **fz_hash_alloc(int divisor)
+{
+ unsigned long size = divisor * sizeof(struct fib_node *);
+
+ if (divisor <= 1024) {
+ return kmalloc(size, GFP_KERNEL);
+ } else {
+ return (struct fib_node **)
+ __get_free_pages(GFP_KERNEL, get_order(size));
+ }
+}
/* The fib hash lock must be held when this is called. */
static __inline__ void fn_rebuild_zone(struct fn_zone *fz,
@@ -174,6 +184,15 @@
}
}
+static void fz_hash_free(struct fib_node **hash, int divisor)
+{
+ if (divisor <= 1024)
+ kfree(hash);
+ else
+ free_pages((unsigned long) hash,
+ get_order(divisor * sizeof(struct fib_node *)));
+}
+
static void fn_rehash_zone(struct fn_zone *fz)
{
struct fib_node **ht, **old_ht;
@@ -185,24 +204,30 @@
switch (old_divisor) {
case 16:
new_divisor = 256;
- new_hashmask = 0xFF;
break;
case 256:
new_divisor = 1024;
- new_hashmask = 0x3FF;
break;
default:
- printk(KERN_CRIT "route.c: bad divisor %d!\n", old_divisor);
- return;
+ if ((old_divisor << 1) > FZ_MAX_DIVISOR) {
+ printk(KERN_CRIT "route.c: bad divisor %d!\n", old_divisor);
+ return;
+ }
+ new_divisor = (old_divisor << 1);
+ break;
}
+
+ new_hashmask = (new_divisor - 1);
+
#if RT_CACHE_DEBUG >= 2
printk("fn_rehash_zone: hash for zone %d grows from %d\n", fz->fz_order, old_divisor);
#endif
- ht = kmalloc(new_divisor*sizeof(struct fib_node*), GFP_KERNEL);
+ ht = fz_hash_alloc(new_divisor);
if (ht) {
memset(ht, 0, new_divisor*sizeof(struct fib_node*));
+
write_lock_bh(&fib_hash_lock);
old_ht = fz->fz_hash;
fz->fz_hash = ht;
@@ -210,10 +235,10 @@
fz->fz_divisor = new_divisor;
fn_rebuild_zone(fz, old_ht, old_divisor);
write_unlock_bh(&fib_hash_lock);
- kfree(old_ht);
+
+ fz_hash_free(old_ht, old_divisor);
}
}
-#endif /* CONFIG_IP_ROUTE_LARGE_TABLES */
static void fn_free_node(struct fib_node * f)
{
@@ -233,12 +258,11 @@
memset(fz, 0, sizeof(struct fn_zone));
if (z) {
fz->fz_divisor = 16;
- fz->fz_hashmask = 0xF;
} else {
fz->fz_divisor = 1;
- fz->fz_hashmask = 0;
}
- fz->fz_hash = kmalloc(fz->fz_divisor*sizeof(struct fib_node*), GFP_KERNEL);
+ fz->fz_hashmask = (fz->fz_divisor - 1);
+ fz->fz_hash = fz_hash_alloc(fz->fz_divisor);
if (!fz->fz_hash) {
kfree(fz);
return NULL;
@@ -467,12 +491,10 @@
if ((fi = fib_create_info(r, rta, n, &err)) == NULL)
return err;
-#ifdef CONFIG_IP_ROUTE_LARGE_TABLES
- if (fz->fz_nent > (fz->fz_divisor<<2) &&
+ if (fz->fz_nent > (fz->fz_divisor<<1) &&
fz->fz_divisor < FZ_MAX_DIVISOR &&
(z==32 || (1<<z) > fz->fz_divisor))
fn_rehash_zone(fz);
-#endif
fp = fz_chain_p(key, fz);
diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c Tue Jun 17 16:08:07 2003
+++ b/net/ipv4/route.c Tue Jun 17 16:08:07 2003
@@ -108,7 +108,7 @@
int ip_rt_max_size;
int ip_rt_gc_timeout = RT_GC_TIMEOUT;
int ip_rt_gc_interval = 60 * HZ;
-int ip_rt_gc_min_interval = 5 * HZ;
+int ip_rt_gc_min_interval = HZ / 2;
int ip_rt_redirect_number = 9;
int ip_rt_redirect_load = HZ / 50;
int ip_rt_redirect_silence = ((HZ / 50) << (9 + 1));
@@ -287,7 +287,7 @@
for (lcpu = 0; lcpu < smp_num_cpus; lcpu++) {
i = cpu_logical_map(lcpu);
- len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x \n",
+ len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x \n",
dst_entries,
rt_cache_stat[i].in_hit,
rt_cache_stat[i].in_slow_tot,
@@ -304,7 +304,9 @@
rt_cache_stat[i].gc_total,
rt_cache_stat[i].gc_ignored,
rt_cache_stat[i].gc_goal_miss,
- rt_cache_stat[i].gc_dst_overflow
+ rt_cache_stat[i].gc_dst_overflow,
+ rt_cache_stat[i].in_hlist_search,
+ rt_cache_stat[i].out_hlist_search
);
}
@@ -344,16 +346,17 @@
rth->u.dst.expires;
}
-static __inline__ int rt_may_expire(struct rtable *rth, int tmo1, int tmo2)
+static __inline__ int rt_may_expire(struct rtable *rth, unsigned long tmo1, unsigned long tmo2)
{
- int age;
+ unsigned long age;
int ret = 0;
if (atomic_read(&rth->u.dst.__refcnt))
goto out;
ret = 1;
- if (rth->u.dst.expires && (long)(rth->u.dst.expires - jiffies) <= 0)
+ if (rth->u.dst.expires &&
+ time_after_eq(jiffies, rth->u.dst.expires))
goto out;
age = jiffies - rth->u.dst.lastuse;
@@ -365,6 +368,25 @@
out: return ret;
}
+/* Bits of score are:
+ * 31: very valuable
+ * 30: not quite useless
+ * 29..0: usage counter
+ */
+static inline u32 rt_score(struct rtable *rt)
+{
+ u32 score = rt->u.dst.__use;
+
+ if (rt_valuable(rt))
+ score |= (1<<31);
+
+ if (!rt->key.iif ||
+ !(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL)))
+ score |= (1<<30);
+
+ return score;
+}
+
/* This runs via a timer and thus is always in BH context. */
static void SMP_TIMER_NAME(rt_check_expire)(unsigned long dummy)
{
@@ -375,7 +397,7 @@
for (t = ip_rt_gc_interval << rt_hash_log; t >= 0;
t -= ip_rt_gc_timeout) {
- unsigned tmo = ip_rt_gc_timeout;
+ unsigned long tmo = ip_rt_gc_timeout;
i = (i + 1) & rt_hash_mask;
rthp = &rt_hash_table[i].chain;
@@ -384,7 +406,7 @@
while ((rth = *rthp) != NULL) {
if (rth->u.dst.expires) {
/* Entry is expired even if it is in use */
- if ((long)(now - rth->u.dst.expires) <= 0) {
+ if (time_before_eq(now, rth->u.dst.expires)) {
tmo >>= 1;
rthp = &rth->u.rt_next;
continue;
@@ -402,7 +424,7 @@
write_unlock(&rt_hash_table[i].lock);
/* Fallback loop breaker. */
- if ((jiffies - now) > 0)
+ if (time_after(jiffies, now))
break;
}
rover = i;
@@ -504,7 +526,7 @@
static int rt_garbage_collect(void)
{
- static unsigned expire = RT_GC_TIMEOUT;
+ static unsigned long expire = RT_GC_TIMEOUT;
static unsigned long last_gc;
static int rover;
static int equilibrium;
@@ -556,7 +578,7 @@
int i, k;
for (i = rt_hash_mask, k = rover; i >= 0; i--) {
- unsigned tmo = expire;
+ unsigned long tmo = expire;
k = (k + 1) & rt_hash_mask;
rthp = &rt_hash_table[k].chain;
@@ -602,7 +624,7 @@
if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
goto out;
- } while (!in_softirq() && jiffies - now < 1);
+ } while (!in_softirq() && time_before_eq(jiffies, now));
if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
goto out;
@@ -626,10 +648,19 @@
static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
{
struct rtable *rth, **rthp;
- unsigned long now = jiffies;
+ unsigned long now;
+ struct rtable *cand, **candp;
+ u32 min_score;
+ int chain_length;
int attempts = !in_softirq();
restart:
+ chain_length = 0;
+ min_score = ~(u32)0;
+ cand = NULL;
+ candp = NULL;
+ now = jiffies;
+
rthp = &rt_hash_table[hash].chain;
write_lock_bh(&rt_hash_table[hash].lock);
@@ -650,9 +681,35 @@
return 0;
}
+ if (!atomic_read(&rth->u.dst.__refcnt)) {
+ u32 score = rt_score(rth);
+
+ if (score <= min_score) {
+ cand = rth;
+ candp = rthp;
+ min_score = score;
+ }
+ }
+
+ chain_length++;
+
rthp = &rth->u.rt_next;
}
+ if (cand) {
+ /* ip_rt_gc_elasticity used to be average length of chain
+ * length, when exceeded gc becomes really aggressive.
+ *
+ * The second limit is less certain. At the moment it allows
+ * only 2 entries per bucket. We will see.
+ */
+ if (chain_length > ip_rt_gc_elasticity ||
+ (chain_length > 1 && !(min_score & (1<<31)))) {
+ *candp = cand->u.rt_next;
+ rt_free(cand);
+ }
+ }
+
/* Try to bind route to arp only if it is output
route or unicast forwarding path.
*/
@@ -960,7 +1017,7 @@
/* No redirected packets during ip_rt_redirect_silence;
* reset the algorithm.
*/
- if (jiffies - rt->u.dst.rate_last > ip_rt_redirect_silence)
+ if (time_after(jiffies, rt->u.dst.rate_last + ip_rt_redirect_silence))
rt->u.dst.rate_tokens = 0;
/* Too many ignored redirects; do not send anything
@@ -974,8 +1031,9 @@
/* Check for load limit; set rate_last to the latest sent
* redirect.
*/
- if (jiffies - rt->u.dst.rate_last >
- (ip_rt_redirect_load << rt->u.dst.rate_tokens)) {
+ if (time_after(jiffies,
+ (rt->u.dst.rate_last +
+ (ip_rt_redirect_load << rt->u.dst.rate_tokens)))) {
icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, rt->rt_gateway);
rt->u.dst.rate_last = jiffies;
++rt->u.dst.rate_tokens;
@@ -1672,6 +1730,7 @@
skb->dst = (struct dst_entry*)rth;
return 0;
}
+ rt_cache_stat[smp_processor_id()].in_hlist_search++;
}
read_unlock(&rt_hash_table[hash].lock);
@@ -2032,6 +2091,7 @@
*rp = rth;
return 0;
}
+ rt_cache_stat[smp_processor_id()].out_hlist_search++;
}
read_unlock_bh(&rt_hash_table[hash].lock);
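The rehash change in this patch depends on every zone divisor being a power of two, so that `fz_hashmask = (fz_divisor - 1)` turns bucket selection into a mask and doubling the divisor just adds one mask bit. A standalone illustration of that invariant (my own toy code, not part of the patch):

```c
/* With a power-of-two table size, (hash & (divisor - 1)) selects a
 * bucket without a division, and doubling the divisor preserves the
 * power-of-two property -- which is what the patched fn_rehash_zone()
 * exploits when it grows a zone 16 -> 256 -> 1024 -> 2*old and
 * recomputes new_hashmask = new_divisor - 1. */
static unsigned bucket(unsigned hash, unsigned divisor)
{
    return hash & (divisor - 1);    /* valid only when divisor is 2^n */
}

static unsigned grow(unsigned divisor)
{
    return divisor << 1;            /* doubling keeps divisor == 2^n */
}
```

For a power-of-two divisor, `bucket()` agrees with the modulo it replaces, at a fraction of the cost on these hot paths.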
* Re: Route cache performance tests
2003-06-17 20:51 ` Simon Kirby
2003-06-17 20:49 ` David S. Miller
@ 2003-06-18 5:50 ` Pekka Savola
1 sibling, 0 replies; 31+ messages in thread
From: Pekka Savola @ 2003-06-18 5:50 UTC (permalink / raw)
To: Simon Kirby
Cc: David S. Miller, gandalf, Robert.Olsson, ralph+d, hadi, xerox, fw,
netdev, linux-net
On Tue, 17 Jun 2003, Simon Kirby wrote:
> On Tue, Jun 17, 2003 at 01:36:35PM -0700, David S. Miller wrote:
>
> > I have no idea why they do this, it's the stupidest thing
> > you can possibly do by default.
> >
> > If we thought it was a good idea to turn this on by default
> > we would have done so in the kernel.
> >
> > Does anyone have some cycles to spare to try and urge whoever is
> > responsible for this in Debian to leave the kernel's default setting
> > alone?
>
> Sure, I can do this. But why is this stupid? It uses more CPU, but
> stops IP spoofing by default. Specific firewall rules would have to be
> created otherwise. And the overhead only really shows when the routing
> table is large, right?
Personally, I think rp_filter on by default is the only good choice
(security/operational-wise). It's typically not useful when you have a
lot of routes, though; but as 99.9% of users _don't_, it still seems
like a good default value.
--
Pekka Savola "You each name yourselves king, yet the
Netcore Oy kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings
end of thread, other threads:[~2003-06-18 5:50 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
2003-06-10 7:57 Route cache performance tests Simon Kirby
2003-06-10 11:23 ` Jamal Hadi
2003-06-10 20:36 ` CIT/Paul
2003-06-10 13:34 ` Ralph Doncaster
2003-06-10 13:39 ` Jamal Hadi
2003-06-13 6:20 ` David S. Miller
2003-06-16 22:37 ` Simon Kirby
2003-06-16 22:44 ` David S. Miller
2003-06-16 23:09 ` Simon Kirby
2003-06-16 23:08 ` David S. Miller
2003-06-16 23:27 ` Simon Kirby
2003-06-16 23:49 ` Simon Kirby
2003-06-17 15:59 ` David S. Miller
2003-06-17 16:50 ` Robert Olsson
2003-06-17 16:50 ` David S. Miller
2003-06-17 17:29 ` Robert Olsson
2003-06-17 19:06 ` Mr. James W. Laferriere
2003-06-17 20:12 ` Robert Olsson
2003-06-17 20:07 ` Simon Kirby
2003-06-17 20:17 ` Martin Josefsson
2003-06-17 20:37 ` Simon Kirby
2003-06-17 20:36 ` David S. Miller
2003-06-17 20:51 ` Simon Kirby
2003-06-17 20:49 ` David S. Miller
2003-06-18 5:50 ` Pekka Savola
2003-06-17 20:49 ` Robert Olsson
2003-06-17 21:07 ` Simon Kirby
2003-06-17 22:50 ` Simon Kirby
2003-06-17 23:07 ` David S. Miller
2003-06-17 22:11 ` Ralph Doncaster
2003-06-17 22:08 ` David S. Miller