netdev.vger.kernel.org archive mirror
* Route cache performance tests
@ 2003-06-10  7:57 Simon Kirby
  2003-06-10 11:23 ` Jamal Hadi
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Simon Kirby @ 2003-06-10  7:57 UTC (permalink / raw)
  To: ralph+d, Jamal Hadi, CIT/Paul, 'David S. Miller',
	fw@deneb.enyo.de
  Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org

Okay, I got a chance to run some first tests and have found some simple
results that might be worth a read.  The test setup is as follows
(I'll probably be using this setup for a number of other tests):

[ My work desktop, other test boxes on network ]
 |   |   |   |   |
[ 100 Mbit Switch ]
         |
         | (100 Mbit)
         |
[ Dual tg3 dual 1.4 GHz Opteron box, 1 GB RAM ]
         |
         | (1000 Mbit)
         |
[ Single e1000 single 2.4 GHz Xeon box ]

I have a route added on the test boxes to stuff traffic destined for the
Xeon box through the Opteron box.  Forwarding is enabled on the Opteron
box, and it has a route for the Xeon box.
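In concrete terms, the setup looks something like this (all addresses and
interface names here are hypothetical, just to illustrate the shape):

```shell
# On each test box: force Xeon-bound traffic through the Opteron.
# 192.168.1.10 = Xeon, 192.168.0.1 = Opteron (hypothetical addresses).
ip route add 192.168.1.10/32 via 192.168.0.1

# On the Opteron box: enable forwarding, and give it a route to the Xeon.
echo 1 > /proc/sys/net/ipv4/ip_forward
ip route add 192.168.1.10/32 dev eth1
```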

I am testing with Juno right now because it generates the (pseudo-)random
IP traffic, which is where the problem is right now.  We already know
Linux can do hundreds of thousands of pps of ip<->ip traffic, so we can
test that later.

Juno seems to be able to send about 150,000 pps from my Celery desktop. 

Running with vanilla 2.4.21-rc7 (for now), the kernel manages to forward
an amazing 39,000 packets per second.  Woohoo!  NAPI definitely kicks in
and seems to work even on SMP (blink?).  The output of "rtstat -i 1" is
somewhat interesting.  The "GC: tot" field seems to almost exactly match
the forwarded packet count, which is handy:

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
    8         4       4     0     0     0     0     0         0       0      0       0       0         0    0
    8         3       3     0     0     0     0     0         0       0      0       0       0         0    0
    8         5       6     0     0     0     0     0         0       0      0       0       0         0    0
    8         4       4     0     0     0     0     0         0       0      0       0       0         0    0
    8         5       5     0     0     0     0     0         0       0      0       0       0         0    0
    9         3       5     0     0     1     0     0         0       0      0       0       0         0    0
33549        11   65533     0     0     0     0     0         0       0      0   57347   57345         1    0
53499        13   65200     0     0     1     0     0         0       0      0   65196   65194         1    0
65536        19   65540     0     0     1     0     0         0       0      0   65538   64879         0    0
65536        11   33980     0     0     0     0     0         0       0      0   33978    6123         0    0
65536         9   37491     0     0     1     0     0         0       0      0   37489     930         0    0
65536        13   40487     0     0     0     0     0         0       0      0   40484     991         0    0
65536        13   39287     0     0     1     0     0         0       0      0   39284     933         0    0
65536        10   40790     0     0     1     0     0         0       0      0   40789    1006         0    0
65536        17   37783     0     0     0     0     0         0       0      0   37781     866         0    0
65536         8   38092     0     0     0     0     0         0       0      0   38090     880         0    0
65536        14   38086     0     0     1     0     0         0       0      0   38085     877         0    0
65536        13   39587     0     0     0     0     0         0       0      0   39586     922         0    0
65536        18   39882     0     0     1     0     0         0       0      0   39880     908         0    0
65536         8   39292     0     0     0     0     0         0       0      0   39290     894         0    0
65536        10   38390     0     0     4     0     0         0       0      0   38389     879         0    0
65536        13   38087     0     0     0     0     0         0       0      0   38086     830         0    0
65536        10   38692     0     0     0     0     0         0       0      0   38690     845         0    0
65536        16   38982     0     0     1     0     0         0       0      0   38981     899         0    0

The above is with stock settings.  Note how the table completely fills
up, causing the forward rate to suffer.
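(rtstat itself is just formatting per-CPU counters the kernel exports; if
you want the raw numbers, they come from a hex dump in proc.  A quick
peek, hedged on the path -- 2.4 exposed it as /proc/net/rt_cache_stat,
later kernels moved it to /proc/net/stat/rt_cache:)

```shell
# Raw per-CPU route cache counters that rtstat formats; the path moved
# between kernel versions, so try both.
cat /proc/net/rt_cache_stat 2>/dev/null || cat /proc/net/stat/rt_cache
```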

In an attempt to improve performance, I tried "echo 0 > gc_min_interval":

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
65536        15   39585     0     0     0     0     0         0       0      0   39585     909         0    0
65535        13   39587     0     0     1     0     0         0       0      0   39587     877         0    0
32027        10   70044     0     0     0     0     0         0       0      0   70043       0         6    0
32013         8   71092     0     0     0     0     0         0       0      0   71091       0         0    0
31995        10   72290     0     0     1     0     0         0       0      0   72290       0         0    0
31969        13   71087     0     0     2     0     0         0       0      0   71083       0         0    0
31950         5   71695     0     0     0     0     0         0       0      0   71693       0         0    0
31937        10   71690     0     0     2     0     0         0       0      0   71690       0         0    0
31927        10   71390     0     0     0     0     0         0       0      0   71389       0         0    0
31915        18   71382     0     0     0     0     0         0       0      0   71381       0         0    0
31897         5   71395     0     0     0     0     0         0       0      0   71394       0         0    0
31881         7   70793     0     0     0     0     0         0       0      0   70793       0         0    0
31869         5   71095     0     0     0     0     0         0       0      0   71094       0         0    0
31863        16   71084     0     0     0     0     0         0       0      0   71082       0         0    0
31846        22   70778     0     0     0     0     0         0       0      0   70776       0         0    0
31825         5   70795     0     0     1     0     0         0       0      0   70795       0         0    0
31816        10   70490     0     0     0     0     0         0       0      0   70488       0         0    0

And then decided to try "ip route flush cache":

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
31768         8   70192     0     0     0     0     0         0       0      0   70190       0         0    0
31757        15   70185     0     0     1     0     0         0       0      0   70184       0         0    0
31743         5   70495     0     0     1     0     0         0       0      0   70491       0         0    0
 8204         2   83314     0     0     0     0     0         1       2      0   75524       0        89    0
 8204         2   88859     0     0     0     0     0         1       0      0   88449       0        84    0
 8203         3   85797     0     0     1     0     0         0       0      0   85795       0         0    0
 8203         0   86100     0     0     0     0     0         0       0      0   86098       0         0    0

...And then I tried reducing gc_thresh:

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
 8200         7   85793     0     0     1     0     0         0       0      0   85790       0         0    0
 8200         4   85796     0     0     1     0     0         0       0      0   85792       0         0    0
 8200        13   86087     0     0     0     0     0         0       0      0   86086       0         0    0
 8200         3   86097     0     0     0     0     0         0       0      0   86096       0         0    0
 1530         4   87896     0     0     0     0     0         0       0      0   87277       0       562    0
 1370         0  135832     0     0     0     0     0         0       0      0  135829       0       617    0
 1348         0  135952     0     0     2     0     0         0       0      0  135952       0       543    0
 1341         0  135740     0     0     0     0     0         0       0      0  135739       0       529    0
 1348         1  135817     0     0     1     0     0         0       0      0  135817       0       567    0

I tried fiddling with more settings, even setting gc_thresh to 1, but I
wasn't able to get the route cache much smaller than that or get it to
forward any more packets per second.
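For anyone replaying this, the knobs fiddled with above all live in one
place.  A sketch (the gc_thresh value is only an example, roughly matching
the ~8200-entry run above; writing these needs root):

```shell
# Route cache GC sysctls, all under /proc/sys/net/ipv4/route/ (2.4-era paths).
cd /proc/sys/net/ipv4/route
cat max_size                 # hard cap on cached entries (the 65536 plateau)
echo 0    > gc_min_interval  # minimum seconds between GC runs; 0 = always
echo 8192 > gc_thresh        # table size at which GC starts working in earnest
ip route flush cache         # empty the cache by hand
```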

In any case, setting gc_min_interval to 0 definitely helped, but I
suspect Dave's patches will make a bigger difference.  Next up is
2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.

Simon-


* Re: Route cache performance tests
  2003-06-10  7:57 Route cache performance tests Simon Kirby
@ 2003-06-10 11:23 ` Jamal Hadi
  2003-06-10 20:36   ` CIT/Paul
  2003-06-10 13:34 ` Ralph Doncaster
  2003-06-13  6:20 ` David S. Miller
  2 siblings, 1 reply; 31+ messages in thread
From: Jamal Hadi @ 2003-06-10 11:23 UTC (permalink / raw)
  To: Simon Kirby
  Cc: ralph+d, CIT/Paul, 'David S. Miller', fw@deneb.enyo.de,
	netdev@oss.sgi.com, linux-net@vger.kernel.org



On Tue, 10 Jun 2003, Simon Kirby wrote:

[some good stuff deleted]

Simon,
I haven't looked at your data in detail; I will. Someone like Robert
would be able to snuff it much faster than I do. I just wanna say thanks
for the effort; I will spend time catching up with you folks.
It is clear that our next hurdle is gc.
Do you have profiles for your data? Profiles would be nice to collect
as well.

> In any case, setting gc_min_interval to 0 definitely helped, but I
> suspect Dave's patches will make a bigger difference.  Next up is
> 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
>

Also, since you are doing all that work, post the kernels somewhere so
people like foo can grab them and test as well.

cheers,
jamal


* Re: Route cache performance tests
  2003-06-10  7:57 Route cache performance tests Simon Kirby
  2003-06-10 11:23 ` Jamal Hadi
@ 2003-06-10 13:34 ` Ralph Doncaster
  2003-06-10 13:39   ` Jamal Hadi
  2003-06-13  6:20 ` David S. Miller
  2 siblings, 1 reply; 31+ messages in thread
From: Ralph Doncaster @ 2003-06-10 13:34 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Jamal Hadi, CIT/Paul, 'David S. Miller', fw@deneb.enyo.de,
	netdev@oss.sgi.com, linux-net@vger.kernel.org

On Tue, 10 Jun 2003, Simon Kirby wrote:

> Running with vanilla 2.4.21-rc7 (for now), the kernel manages to forward
> an amazing 39,000 packets per second.  Woohoo!

I hope that's sarcasm.  I know if you posted to NANOG saying it took a
dual 1.4 GHz Opteron to route 39 kpps under Linux, you'd be laughed off
the list.  Maybe I should be bragging about my 3-minute lap times on the
Shannonville track in my M5!

-Ralph


* Re: Route cache performance tests
  2003-06-10 13:34 ` Ralph Doncaster
@ 2003-06-10 13:39   ` Jamal Hadi
  0 siblings, 0 replies; 31+ messages in thread
From: Jamal Hadi @ 2003-06-10 13:39 UTC (permalink / raw)
  To: ralph+d
  Cc: Simon Kirby, CIT/Paul, 'David S. Miller',
	fw@deneb.enyo.de, netdev@oss.sgi.com, linux-net@vger.kernel.org



On Tue, 10 Jun 2003, Ralph Doncaster wrote:

> I hope that's sarcasm.  I know if you posted to NANOG saying it took a
> dual 1.4 GHz Opteron to route 39 kpps under Linux, you'd be laughed off
> the list.  Maybe I should be bragging about my 3-minute lap times on the
> Shannonville track in my M5!

Ralph,
Take a look at the Sprint core routers I posted earlier. See how much
data they actually route ;->
These are damn expensive routers, btw, with OC48 interfaces.

On your other comment, about being able to get rid of the route cache
when you don't need it, I am actually indifferent. You may have a point.

cheers,
jamal


* RE: Route cache performance tests
  2003-06-10 11:23 ` Jamal Hadi
@ 2003-06-10 20:36   ` CIT/Paul
  0 siblings, 0 replies; 31+ messages in thread
From: CIT/Paul @ 2003-06-10 20:36 UTC (permalink / raw)
  To: 'Jamal Hadi', 'Simon Kirby'
  Cc: ralph+d, 'David S. Miller', fw, netdev, linux-net

I'd be happy to set up a repository FTP site or maybe even some CVS
servers so all of us can test all these things and share data.  We are
an ISP, so it wouldn't be too hard to just pop up another server to
store all this :>  Let me know.

Paul xerox@foonet.net http://www.httpd.net


-----Original Message-----
From: Jamal Hadi [mailto:hadi@shell.cyberus.ca] 
Sent: Tuesday, June 10, 2003 7:23 AM
To: Simon Kirby
Cc: ralph+d@istop.com; CIT/Paul; 'David S. Miller'; fw@deneb.enyo.de;
netdev@oss.sgi.com; linux-net@vger.kernel.org
Subject: Re: Route cache performance tests




On Tue, 10 Jun 2003, Simon Kirby wrote:

[some good stuff deleted]

Simon,
I haven't looked at your data in detail; I will. Someone like Robert
would be able to snuff it much faster than I do. I just wanna say thanks
for the effort; I will spend time catching up with you folks. It is
clear that our next hurdle is gc. Do you have profiles for your data?
Profiles would be nice to collect as well.

> In any case, setting gc_min_interval to 0 definitely helped, but I 
> suspect Dave's patches will make a bigger difference.  Next up is 
> 2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
>

Also, since you are doing all that work, post the kernels somewhere so
people like foo can grab them and test as well.

cheers,
jamal


* Re: Route cache performance tests
  2003-06-10  7:57 Route cache performance tests Simon Kirby
  2003-06-10 11:23 ` Jamal Hadi
  2003-06-10 13:34 ` Ralph Doncaster
@ 2003-06-13  6:20 ` David S. Miller
  2003-06-16 22:37   ` Simon Kirby
  2 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-13  6:20 UTC (permalink / raw)
  To: sim; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net

   From: Simon Kirby <sim@netnation.com>
   Date: Tue, 10 Jun 2003 00:57:32 -0700
   
   In any case, setting gc_min_interval to 0 definitely helped, but I
   suspect Dave's patches will make a bigger difference.  Next up is
   2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.

Did you get stuck in some mud? :-)  It's been two days.

I even posted new patches for you to test, get on it :)))


* Re: Route cache performance tests
  2003-06-13  6:20 ` David S. Miller
@ 2003-06-16 22:37   ` Simon Kirby
  2003-06-16 22:44     ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-16 22:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net

On Thu, Jun 12, 2003 at 11:20:02PM -0700, David S. Miller wrote:

>    In any case, setting gc_min_interval to 0 definitely helped, but I
>    suspect Dave's patches will make a bigger difference.  Next up is
>    2.5.70-bk14 and 2.5.70-bk14+davem's stuff from yesterday.
> 
> Did you get stuck in some mud? :-)  It's been two days.
> 
> I even posted new patches for you to test, get on it :)))

Ok, I dug myself out. :)

I have oprofile working, and I wrote a simple application to measure
received pps on the receiving box with gettimeofday() accuracy.  To
reduce noise I am profiling for one minute periods.  The sender is
capable of sending about 315,000 pps via an e1000.
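The measurement tool itself wasn't posted; a rough stand-in that does the
same receiver-side accounting from interface counters (assumptions: the
/sys statistics path from later kernels, a 1-second window instead of the
60-second ones used below, and the interface name):

```shell
# Sample the RX packet counter twice, $interval seconds apart, and print
# the average receive rate.  (The original tool timed with gettimeofday();
# this leans on sleep's granularity instead.)
iface=lo       # interface to watch (assumption; use the receiving NIC)
interval=1
rx1=$(cat /sys/class/net/$iface/statistics/rx_packets)
sleep "$interval"
rx2=$(cat /sys/class/net/$iface/statistics/rx_packets)
echo "$interval seconds passed, avg rate: $(( (rx2 - rx1) / interval )) pps"
```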

So, which kernels shall I try?  When I set the thing up I was using
2.5.70-bk14, but I am compiling 2.5.71, and I will try with your patch
above and with Alexey's.

Stock 2.4.21-rc7 (CONFIG_IP_MULTIPLE_TABLES=y):

60.0047 seconds passed, avg forwarding rate: 122169.980 pps
60.0052 seconds passed, avg forwarding rate: 123650.166 pps
60.0045 seconds passed, avg forwarding rate: 122352.499 pps
60.0059 seconds passed, avg forwarding rate: 121830.346 pps
60.0046 seconds passed, avg forwarding rate: 121714.614 pps
60.0057 seconds passed, avg forwarding rate: 121927.324 pps
60.0061 seconds passed, avg forwarding rate: 121995.740 pps
60.0064 seconds passed, avg forwarding rate: 122168.417 pps
60.0030 seconds passed, avg forwarding rate: 123245.149 pps
60.0062 seconds passed, avg forwarding rate: 122613.361 pps

(CPU type is still Opteron, oprofile is wrong. :))
Cpu type: Hammer
Cpu speed was (MHz estimation) : 1393.98
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma      samples  %           symbol name
c0278b00 4537     12.6063     fn_hash_lookup
c027a1d0 3293     9.14976     fib_lookup
c024aa70 2989     8.30508     rt_intern_hash
c024c7d0 2195     6.09892     ip_route_input
c024c000 2020     5.61267     ip_route_input_slow
c024ed20 1244     3.45652     ip_rcv
c0247650 1234     3.42873     eth_header
c0276490 1226     3.4065      fib_validate_source
c024a710 1173     3.25924     rt_garbage_collect
c0132ee0 930      2.58405     kmalloc
c0242f60 924      2.56738     neigh_lookup
c02502a0 915      2.54237     ip_forward
c0132da0 875      2.43123     kmem_cache_alloc
c02444e0 735      2.04223     neigh_resolve_output
c0249f70 734      2.03946     rt_hash_code
c0133070 717      1.99222     kmem_cache_free
c024bcd0 666      1.85051     rt_set_nexthop
c023bfe0 632      1.75604     __kfree_skb
c02778e0 620      1.7227      fib_semantic_match
c023bce0 611      1.69769     alloc_skb
c0247f30 529      1.46985     pfifo_fast_dequeue
c0247830 524      1.45596     eth_type_trans
c02404b0 521      1.44762     netif_receive_skb
c0132cb0 520      1.44485     free_block
c0247a80 487      1.35315     qdisc_restart
c01330f0 466      1.2948      kfree
c02426e0 456      1.26702     dst_alloc
c023fd60 454      1.26146     dev_queue_xmit
c0132bd0 401      1.1142      kmem_cache_alloc_batch
c0272740 370      1.02806     inet_select_addr
c0252e80 353      0.980828    ip_finish_output
c0244350 325      0.903029    neigh_hh_init
c0242840 321      0.891914    dst_destroy
c010d5c0 315      0.875243    do_gettimeofday
c0247ec0 276      0.76688     pfifo_fast_enqueue
c024baa0 245      0.680745    ipv4_dst_destroy
c023bf70 216      0.600167    kfree_skbmem
c0120160 210      0.583495    cpu_raise_softirq
c026f1e0 156      0.433454    arp_bind_neighbour
c0277990 136      0.377883    __fib_res_prefsrc
c027a020 126      0.350097    fib_rules_policy
c0279d60 96       0.266741    fib_rule_put
c0240370 68       0.188941    net_tx_action
c026ec10 37       0.102806    arp_hash
c023bf00 34       0.0944707   skb_release_data
c0240770 14       0.0388997   net_rx_action

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
118130        56  123028     0     0     0     0     0         0       0      0  123028  123024         0    0
109456        58  122426     0     0     0     0     0         0       0      0  122426  122422         0    0
113318        52  124832     0     0     0     0     0         0       0      0  124832  124828         0    0
104356        53  122131     0     0     0     0     0         0       0      0  122131  122127         0    0
98333        56  125064     0     0     0     0     0         0       0      0  125064  125060         0    0
125925        57  125899     0     0     0     0     0         0       0      0  125899  125896         0    0
117516        44  122676     0     0     0     0     0         0       0      0  122676  122672         0    0
121088        48  124472     0     0     0     0     0         0       0      0  124472  124468         0    0
113049        43  123041     0     0     0     0     0         0       0      0  123041  123037         0    0
104339        43  122377     0     0     0     0     0         0       0      0  122377  122373         0    0
98324        46  125074     0     0     0     0     0         0       0      0  125074  125070         0    0
126818        40  126816     0     0     0     0     0         0       0      0  126816  126813         0    0
131018        54  124958     0     0     0     0     0         0       0      0  124958  124954         0    0
122303        51  122369     0     0     0     0     0         0       0      0  122369  122365         0    0
113661        46  122438     0     0     0     0     0         0       0      0  122438  122434         0    0
104350        49  121771     0     0     0     0     0         0       0      0  121771  121767         0    0
98330        58  125062     0     0     0     0     0         0       0      0  125062  125058         0    0
102841        36  124675     0     0     0     0     0         0       0      0  124675  124671         0    0
131031        49  126507     0     0     0     0     0         0       0      0  126507  126504         0    0


Stock 2.5.70-bk14 (CONFIG_IP_MULTIPLE_TABLES=y):

(There is some noise in forward pps rate because the route cache keeps
ballooning and collapsing all over the place.)

60.0042 seconds passed, avg forwarding rate: 102595.362 pps
60.0039 seconds passed, avg forwarding rate: 102690.418 pps
60.0043 seconds passed, avg forwarding rate: 102254.257 pps
60.0036 seconds passed, avg forwarding rate: 102708.344 pps
60.0052 seconds passed, avg forwarding rate: 102647.544 pps
60.0036 seconds passed, avg forwarding rate: 102697.595 pps
60.0042 seconds passed, avg forwarding rate: 102326.652 pps
60.0043 seconds passed, avg forwarding rate: 102615.183 pps
60.0081 seconds passed, avg forwarding rate: 101990.399 pps
60.0036 seconds passed, avg forwarding rate: 102220.386 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma      samples  %           symbol name
c0290480 8213     15.6283     rt_garbage_collect
c011ee80 4979     9.47443     local_bh_enable
c02bf140 3616     6.8808      fn_hash_lookup
c02906d0 1977     3.76199     rt_intern_hash
c0291b10 1949     3.70871     ip_route_input_slow
c028d1d0 1647     3.13404     nf_iterate
c028d4c0 1575     2.99703     nf_hook_slow
c0220c10 1513     2.87905     tg3_start_xmit
c02c0a00 1470     2.79723     fib_lookup
c02923e0 1169     2.22446     ip_route_input
c0220010 895      1.70308     tg3_rx
c0134e20 877      1.66882     kmem_cache_free
c0134d60 863      1.64218     kmem_cache_alloc
c028e270 855      1.62696     pfifo_fast_dequeue
c0134bf0 855      1.62696     free_block
c02bcf30 837      1.59271     fib_validate_source
c0294450 833      1.5851      ip_rcv_finish
c0286b20 782      1.48805     netif_receive_skb
c028db70 765      1.4557      eth_header
c0293ee0 750      1.42716     ip_rcv
c0290190 727      1.38339     rt_may_expire
c028fda0 689      1.31108     rt_hash_code
c0295210 679      1.29205     ip_forward
c0134a30 648      1.23306     cache_alloc_refill
c0134da0 636      1.21023     kmalloc
c0298870 591      1.1246      ip_finish_output2
c0134e60 520      0.989496    kfree
c01ad3d0 504      0.95905     memcpy
c028a860 498      0.947633    neigh_resolve_output
c0282c90 477      0.907672    alloc_skb
c02c08e0 474      0.901964    fib_rules_policy
c02be300 474      0.901964    fib_semantic_match
c0289870 471      0.896255    neigh_lookup
c028dce0 466      0.886741    eth_type_trans
c0289160 445      0.84678     dst_alloc
c0295450 442      0.841072    ip_forward_finish
c02865e0 407      0.774471    dev_queue_xmit
c02b7950 383      0.728802    inet_select_addr
c0289290 344      0.65459     dst_destroy
c0296790 325      0.618435    ip_finish_output
c02b4140 320      0.608921    arp_hash
c021fc50 317      0.603212    tg3_tx
c02d0090 313      0.595601    ipv4_sabotage_out
c028e1f0 306      0.58228     pfifo_fast_enqueue
c021fec0 296      0.563252    tg3_recycle_rx
c028a6e0 294      0.559446    neigh_hh_init

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
 5402        29   82379     0     0     0     0     0         0       0      0   76979   76401       575  575
131075        29  125867     0     0     0     0     0         0       0      0  123076  122879       194  194
117660        26  118554     0     0     0     0     0         0       0      0  110363  109467       896  896
86134        22  100138     0     0     0     0     0         0       0      0   91947   91353       591  591
48682        28   94224     0     0     0     0     0         0       0      0   86033   85427       603  603
 3419        28   86216     0     0     0     0     0         0       0      0   82916   82390       523  523
131075        25  127871     0     0     0     0     0         0       0      0  122980  122879        98   98
116937        33  117383     0     0     0     0     0         0       0      0  109192  108744       448  448
83306        17   98079     0     0     0     0     0         0       0      0   89888   89248       637  637
43585        17   91951     0     0     0     0     0         0       0      0   83760   83158       599  599
  715        23   88681     0     0     0     0     0         0       0      0   88081   87487       591  591
104407        37  134703     0     0     0     0     0         0       0      0  127112  127111         0    0
67650        14   94902     0     0     0     0     0         0       0      0   86711   86122       587  587
23861        25   87875     0     0     0     0     0         0       0      0   79684   79090       591  591
  670        26  108386     0     0     0     0     0         0       0      0  107786  107211       572  572
131075        38  130550     0     0     0     0     0         0       0      0  122959  122879        77   77
99588        14  108546     0     0     0     0     0         0       0      0  100355   99586       768  768
61804        27   93912     0     0     0     0     0         0       0      0   85721   85095       624  624
14826        27   84721     0     0     0     0     0         0       0      0   76530   75901       626  626


2.5.71 (CONFIG_IP_MULTIPLE_TABLES=y) w/correction to make
flow_cache_init compile w/CONFIG_SMP=n (register_cpu_notifier):

60.0060 seconds passed, avg forwarding rate: 103857.780 pps
60.0036 seconds passed, avg forwarding rate: 104893.408 pps
60.0061 seconds passed, avg forwarding rate: 104623.946 pps
60.0040 seconds passed, avg forwarding rate: 104457.440 pps
60.0057 seconds passed, avg forwarding rate: 104505.375 pps
60.0042 seconds passed, avg forwarding rate: 103663.532 pps
60.0043 seconds passed, avg forwarding rate: 104240.425 pps
60.0042 seconds passed, avg forwarding rate: 104422.699 pps
60.0034 seconds passed, avg forwarding rate: 104252.729 pps
60.0058 seconds passed, avg forwarding rate: 104138.597 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma      samples  %           symbol name
c0292270 8941     16.2664     rt_garbage_collect
c011f080 5295     9.63323     local_bh_enable
c02c0fb0 3943     7.17353     fn_hash_lookup
c02924c0 2209     4.01885     rt_intern_hash
c0293900 2115     3.84783     ip_route_input_slow
c028ef90 1725     3.1383      nf_iterate
c028f280 1673     3.0437      nf_hook_slow
c02c2880 1536     2.79445     fib_lookup
c02941d0 1381     2.51246     ip_route_input
c0222330 1307     2.37783     tg3_start_xmit
c0296240 1000     1.81931     ip_rcv_finish
c0134ff0 961      1.74835     free_block
c0221710 918      1.67012     tg3_rx
c0290030 909      1.65375     pfifo_fast_dequeue
c0135230 861      1.56642     kmem_cache_free
c0295cd0 844      1.53549     ip_rcv
c02bed90 835      1.51912     fib_validate_source
c0135170 822      1.49547     kmem_cache_alloc
c028f930 818      1.48819     eth_header
c0291f80 741      1.34811     rt_may_expire
c02886a0 726      1.32082     netif_receive_skb
c0291b80 708      1.28807     rt_hash_code
c0134e20 684      1.24441     cache_alloc_refill
c01351b0 644      1.17163     __kmalloc
c028b610 595      1.08249     neigh_lookup
c028c600 542      0.986064    neigh_resolve_output
c0284620 538      0.978787    alloc_skb
c029a680 534      0.97151     ip_finish_output2
c0135270 510      0.927846    kfree
c01adc80 484      0.880544    memcpy
c028af00 468      0.851435    dst_alloc
c02c0160 438      0.796856    fib_semantic_match
c028faa0 434      0.789579    eth_type_trans
c0297020 419      0.762289    ip_forward
c0288160 418      0.76047     dev_queue_xmit
c02c2760 398      0.724084    fib_rules_policy
c0297260 394      0.716807    ip_forward_finish
c02b9790 391      0.711349    inet_select_addr
c0221e90 374      0.680421    tg3_set_txd
c028ffb0 348      0.633119    pfifo_fast_enqueue
c028fcc0 339      0.616745    qdisc_restart
c028b030 326      0.593094    dst_destroy
c02d1fb0 301      0.547611    ipv4_sabotage_out
c02985a0 299      0.543973    ip_finish_output
c0128a00 293      0.533057    call_rcu
c0221350 284      0.516683    tg3_tx

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
26127        24   92112     0     0     0     0     0         0       0      0   83920   83318       599  599
  722        32  106131     0     0     0     0     0         0       0      0  105531  104945       583  583
131075        23  130793     0     0     0     0     0         0       0      0  123201  122879       319  319
131074        27  131449     0     0     0     0     0         0       0      0  123257  122878       376  376
120198        23  120965     0     0     0     0     0         0       0      0  112773  112005       768  768
93359        26  104746     0     0     0     0     0         0       0      0   96554   96040       511  511
59620        17   97943     0     0     0     0     0         0       0      0   89751   89140       608  608
18593        20   90640     0     0     0     0     0         0       0      0   82448   81851       593  593
   70        23  113117     0     0     0     0     0         0       0      0  113117  112479       636  636
131075        22  131306     0     0     0     0     0         0       0      0  123114  122879       232  232
110767        18  111534     0     0     0     0     0         0       0      0  103342  102574       768  768
79859        20  100776     0     0     0     0     0         0       0      0   92584   91971       610  610
43481        24   95328     0     0     0     0     0         0       0      0   87136   86500       632  632
  704        34   88794     0     0     0     0     0         0       0      0   88194   87591       601  601
131075        36  130692     0     0     0     0     0         0       0      0  123100  122879       218  218
110844        25  111611     0     0     0     0     0         0       0      0  103419  102651       768  768
80008        18  100862     0     0     0     0     0         0       0      0   92670   92043       624  624
43390        24   95064     0     0     0     0     0         0       0      0   86872   86260       608  608
  720        31   88869     0     0     0     0     0         0       0      0   88269   87682       585  585


2.5.71 (CONFIG_IP_MULTIPLE_TABLES=n):

60.0039 seconds passed, avg forwarding rate: 108482.881 pps
60.0036 seconds passed, avg forwarding rate: 107850.012 pps
60.0043 seconds passed, avg forwarding rate: 108330.941 pps
60.0063 seconds passed, avg forwarding rate: 108424.657 pps
60.0071 seconds passed, avg forwarding rate: 108575.916 pps
60.0040 seconds passed, avg forwarding rate: 107774.861 pps
60.0053 seconds passed, avg forwarding rate: 107765.720 pps
60.0065 seconds passed, avg forwarding rate: 108021.888 pps
60.0039 seconds passed, avg forwarding rate: 107364.055 pps
60.0061 seconds passed, avg forwarding rate: 107593.173 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma      samples  %           symbol name
c0292260 7149     13.2856     rt_garbage_collect
c011f080 6875     12.7764     local_bh_enable
c02c0df0 3440     6.39286     fn_hash_lookup
c02924b0 2180     4.05129     rt_intern_hash
c02938c0 2158     4.01041     ip_route_input_slow
c028ef90 1769     3.28749     nf_iterate
c0222330 1644     3.05519     tg3_start_xmit
c028f280 1601     2.97528     nf_hook_slow
c02940f0 1209     2.24679     ip_route_input
c02bec10 1187     2.20591     fib_validate_source
c0135230 987      1.83423     kmem_cache_free
c0134ff0 950      1.76547     free_block
c0135170 924      1.71715     kmem_cache_alloc
c0295c00 918      1.706       ip_rcv
c0221710 908      1.68742     tg3_rx
c0290020 907      1.68556     pfifo_fast_dequeue
c0296170 897      1.66698     ip_rcv_finish
c028f920 808      1.50158     eth_header
c0291b70 804      1.49415     rt_hash_code
c02886a0 740      1.37521     netif_receive_skb
c0134e20 739      1.37335     cache_alloc_refill
c0291f70 736      1.36778     rt_may_expire
c0296f50 703      1.30645     ip_forward
c028b610 676      1.25627     neigh_lookup
c01351b0 669      1.24326     __kmalloc
c02bffa0 637      1.18379     fib_semantic_match
c029a5b0 635      1.18008     ip_finish_output2
c0135270 614      1.14105     kfree
c0284620 544      1.01096     alloc_skb
c028c600 515      0.957071    neigh_resolve_output
c01adc80 498      0.925479    memcpy
c028fa90 464      0.862293    eth_type_trans
c0288160 457      0.849285    dev_queue_xmit
c0297190 425      0.789816    ip_forward_finish
c028af00 420      0.780524    dst_alloc
c02b9680 411      0.763799    inet_select_addr
c028ffa0 350      0.650437    pfifo_fast_enqueue
c028b030 350      0.650437    dst_destroy
c02984d0 330      0.613269    ip_finish_output
c0221350 326      0.605835    tg3_tx
c0128a00 324      0.602119    call_rcu
c028fcb0 323      0.60026     qdisc_restart
c028c480 323      0.60026     neigh_hh_init
c02215c0 291      0.540792    tg3_recycle_rx
c02d0e80 258      0.479465    ipv4_sabotage_in
c02d0ec0 257      0.477606    ipv4_sabotage_out

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
103545        28  107316     0     0     0     0     0         0       0      0   99124   98495       626  626
76151        24  104300     0     0     0     0     0         0       0      0   96108   95485       620  620
10200        26   83194     0     0     0     0     0         0       0      0   75002   74396       603  603
  702        38  122078     0     0     0     0     0         0       0      0  121478  120872       603  603
131075        39  130952     0     0     0     0     0         0       0      0  123360  122879       478  478
126869        28  126996     0     0     0     0     0         0       0      0  118804  118676       128  128
85330        22  107046     0     0     0     0     0         0       0      0   98854   98259       591  591
50739        22   97102     0     0     0     0     0         0       0      0   88910   88288       620  620
10501        17   91459     0     0     0     0     0         0       0      0   83267   82641       623  623
  689        34  121790     0     0     0     0     0         0       0      0  121190  120571       616  616
131075        33  130963     0     0     0     0     0         0       0      0  123371  122879       489  489
110335        44  126916     0     0     0     0     0         0       0      0  118724  118659        64   64
81862        16  103228     0     0     0     0     0         0       0      0   95036   94406       628  628
50717        24  100536     0     0     0     0     0         0       0      0   92344   91734       607  607
12301        33   93251     0     0     0     0     0         0       0      0   85059   84463       593  593
  684        33  119995     0     0     0     0     0         0       0      0  119395  118771       621  621
131074        26  146318     0     0     0     0     0         0       0      0  138726  138610       113  113
114248        25  115015     0     0     0     0     0         0       0      0  106823  106055       768  768
88917        18  106318     0     0     0     0     0         0       0      0   98126   97548       575  575
57970        24  100752     0     0     0     0     0         0       0      0   92560   91932       625  625


2.5.71-davem-rtcache-jun9:

60.006 seconds passed, avg forwarding rate: 160182.941 pps
60.0077 seconds passed, avg forwarding rate: 159805.476 pps
60.007 seconds passed, avg forwarding rate: 160274.907 pps
60.0084 seconds passed, avg forwarding rate: 160212.101 pps
60.0045 seconds passed, avg forwarding rate: 159345.161 pps
60.0076 seconds passed, avg forwarding rate: 159552.768 pps
60.0046 seconds passed, avg forwarding rate: 159416.702 pps
60.0035 seconds passed, avg forwarding rate: 160435.829 pps
60.0043 seconds passed, avg forwarding rate: 160015.150 pps
60.0072 seconds passed, avg forwarding rate: 159309.661 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma      samples  %           symbol name
c02c0fa0 4875     8.75586     fn_hash_lookup
c02939f0 3839     6.89513     ip_route_input_slow
c028f040 2507     4.50276     nf_iterate
c028f330 2450     4.40038     nf_hook_slow
c0222330 2377     4.26927     tg3_start_xmit
c02bedc0 1722     3.09284     fib_validate_source
c0134ff0 1478     2.6546      free_block
c0292560 1444     2.59353     rt_intern_hash
c0221710 1409     2.53067     tg3_rx
c0135230 1406     2.52528     kmem_cache_free
c0296320 1399     2.51271     ip_rcv_finish
c0294260 1281     2.30077     ip_route_input
c02886a0 1235     2.21815     netif_receive_skb
c028f9d0 1194     2.14451     eth_header
c028af00 1153     2.07087     dst_alloc
c0291c20 1120     2.0116      rt_hash_code
c0295db0 1111     1.99544     ip_rcv
c0135170 1109     1.99185     kmem_cache_alloc
c0134e20 1109     1.99185     cache_alloc_refill
c02900d0 1016     1.82481     pfifo_fast_dequeue
c028b6c0 1014     1.82122     neigh_lookup
c01351b0 1003     1.80146     __kmalloc
c02c0150 894      1.60569     fib_semantic_match
c01adc80 838      1.50511     memcpy
c029a760 837      1.50331     ip_finish_output2
c028c6b0 801      1.43866     neigh_resolve_output
c0288160 792      1.42249     dev_queue_xmit
c0135270 789      1.4171      kfree
c0284620 740      1.32909     alloc_skb
c011f080 661      1.1872      local_bh_enable
c028fb40 656      1.17822     eth_type_trans
c02b9830 648      1.16386     inet_select_addr
c0297340 601      1.07944     ip_forward_finish
c02929d0 591      1.06148     __rt_hash_shrink
c0290050 540      0.96988     pfifo_fast_enqueue
c028b0e0 532      0.955511    dst_destroy
c0298680 511      0.917794    ip_finish_output
c0128a00 501      0.899833    call_rcu
c0297100 479      0.860319    ip_forward
c02d1070 478      0.858523    ipv4_sabotage_out
c0221350 451      0.810029    tg3_tx
c028c530 431      0.774108    neigh_hh_init
c02215c0 426      0.765127    tg3_recycle_rx
c028fd60 394      0.707653    qdisc_restart
c02d1030 375      0.673528    ipv4_sabotage_in
c0292310 359      0.64479     rt_garbage_collect

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
21502         9  160507     0     0     0     0     0         0       0      0  160507  160507         0    0
23701        15  160485     0     0     0     0     0         0       0      0  160485  160484         1    0
21866         6  160498     0     0     0     0     0         0       0      0  160498  160498         0    0
23551        12  160464     0     0     0     0     0         0       0      0  160464  160464         0    0
23266        13  160203     0     0     0     0     0         0       0      0  160203  160203         0    0
22095         9  160591     0     0     0     0     0         0       0      0  160591  160591         0    0
23962        15  160461     0     0     0     0     0         0       0      0  160461  160460         0    0
22066        13  158691     0     0     0     0     0         0       0      0  158691  158691         0    0
22951        10  160166     0     0     0     0     0         0       0      0  160166  160166         0    0
21861        18  159134     0     0     0     0     0         0       0      0  159134  159134         0    0
21097         6  159126     0     0     0     0     0         0       0      0  159126  159125         1    0
22943         6  161350     0     0     0     0     0         0       0      0  161350  161350         0    0
21692         4  160124     0     0     0     0     0         0       0      0  160124  160124         0    0
23524        16  161184     0     0     0     0     0         0       0      0  161184  161184         0    0
20471        15  160833     0     0     0     0     0         0       0      0  160833  160833         0    0
23160         5  161643     0     0     0     0     0         0       0      0  161643  161642         0    0
21981        10  160518     0     0     0     0     0         0       0      0  160518  160518         0    0
20640        11  160145     0     0     0     0     0         0       0      0  160145  160145         0    0
21536        14  160194     0     0     0     0     0         0       0      0  160194  160194         0    0
21110        10  161550     0     0     0     0     0         0       0      0  161550  161550         0    0

...What next? :)

Simon-

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Route cache performance tests
  2003-06-16 22:37   ` Simon Kirby
@ 2003-06-16 22:44     ` David S. Miller
  2003-06-16 23:09       ` Simon Kirby
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-16 22:44 UTC (permalink / raw)
  To: sim; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net

   From: Simon Kirby <sim@netnation.com>
   Date: Mon, 16 Jun 2003 15:37:14 -0700

   So, which kernels shall I try?  When I set the thing up I was using
   2.5.70-bk14, but I am compiling 2.5.71, and I will try with your patch
   above and with Alexey's.

Thanks for your profiles.

I pushed all of our current work to Linus's tree.
But for your convenience, here are the routing diffs
against plain 2.5.71.

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1318.1.15 -> 1.1318.1.16
#	    net/ipv4/route.c	1.63    -> 1.64   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/16	kuznet@ms2.inr.ac.ru	1.1318.1.16
# [IPV4]: More sane rtcache behavior.
# 1) More reasonable ip_rt_gc_min_interval default
# 2) Trim less valuable entries in hash chain during
#    rt_intern_hash when such chains grow too long.
# --------------------------------------------
#
diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c	Mon Jun 16 15:45:20 2003
+++ b/net/ipv4/route.c	Mon Jun 16 15:45:20 2003
@@ -111,7 +111,7 @@
 int ip_rt_max_size;
 int ip_rt_gc_timeout		= RT_GC_TIMEOUT;
 int ip_rt_gc_interval		= 60 * HZ;
-int ip_rt_gc_min_interval	= 5 * HZ;
+int ip_rt_gc_min_interval	= HZ / 2;
 int ip_rt_redirect_number	= 9;
 int ip_rt_redirect_load		= HZ / 50;
 int ip_rt_redirect_silence	= ((HZ / 50) << (9 + 1));
@@ -456,6 +456,25 @@
 out:	return ret;
 }
 
+/* Bits of score are:
+ * 31: very valuable
+ * 30: not quite useless
+ * 29..0: usage counter
+ */
+static inline u32 rt_score(struct rtable *rt)
+{
+	u32 score = rt->u.dst.__use;
+
+	if (rt_valuable(rt))
+		score |= (1<<31);
+
+	if (!rt->fl.iif ||
+	    !(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL)))
+		score |= (1<<30);
+
+	return score;
+}
+
 /* This runs via a timer and thus is always in BH context. */
 static void rt_check_expire(unsigned long dummy)
 {
@@ -721,6 +740,9 @@
 {
 	struct rtable	*rth, **rthp;
 	unsigned long	now = jiffies;
+	struct rtable *cand = NULL, **candp = NULL;
+	u32 		min_score = ~(u32)0;
+	int		chain_length = 0;
 	int attempts = !in_softirq();
 
 restart:
@@ -755,7 +777,33 @@
 			return 0;
 		}
 
+		if (!atomic_read(&rth->u.dst.__refcnt)) {
+			u32 score = rt_score(rth);
+
+			if (score <= min_score) {
+				cand = rth;
+				candp = rthp;
+				min_score = score;
+			}
+		}
+
+		chain_length++;
+
 		rthp = &rth->u.rt_next;
+	}
+
+	if (cand) {
+		/* ip_rt_gc_elasticity used to be average length of chain
+		 * length, when exceeded gc becomes really aggressive.
+		 *
+		 * The second limit is less certain. At the moment it allows
+		 * only 2 entries per bucket. We will see.
+		 */
+		if (chain_length > ip_rt_gc_elasticity ||
+		    (chain_length > 1 && !(min_score & (1<<31)))) {
+			*candp = cand->u.rt_next;
+			rt_free(cand);
+		}
 	}
 
 	/* Try to bind route to arp only if it is output
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1320.1.1 -> 1.1320.1.2
#	    net/ipv4/route.c	1.64    -> 1.65   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/06/16	robert.olsson@data.slu.se	1.1320.1.2
# [IPV4]: In rt_intern_hash, reinit all state vars on branch to "restart".
# --------------------------------------------
#
diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c	Mon Jun 16 15:46:05 2003
+++ b/net/ipv4/route.c	Mon Jun 16 15:46:05 2003
@@ -739,13 +739,19 @@
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
 	struct rtable	*rth, **rthp;
-	unsigned long	now = jiffies;
-	struct rtable *cand = NULL, **candp = NULL;
-	u32 		min_score = ~(u32)0;
-	int		chain_length = 0;
+	unsigned long	now;
+	struct rtable *cand, **candp;
+	u32 		min_score;
+	int		chain_length;
 	int attempts = !in_softirq();
 
 restart:
+	chain_length = 0;
+	min_score = ~(u32)0;
+	cand = NULL;
+	candp = NULL;
+	now = jiffies;
+
 	rthp = &rt_hash_table[hash].chain;
 
 	spin_lock_bh(&rt_hash_table[hash].lock);


* Re: Route cache performance tests
  2003-06-16 23:09       ` Simon Kirby
@ 2003-06-16 23:08         ` David S. Miller
  2003-06-16 23:27           ` Simon Kirby
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-16 23:08 UTC (permalink / raw)
  To: sim; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net

   From: Simon Kirby <sim@netnation.com>
   Date: Mon, 16 Jun 2003 16:09:22 -0700

   On Mon, Jun 16, 2003 at 03:44:01PM -0700, David S. Miller wrote:
   
   > I pushed all of our current work to Linus's tree.
   > But for your convenience here are the routing diffs
   > against plain 2.5.71
   
   Trying to apply against 2.5.71:
   
   patching file net/ipv4/route.c
   Hunk #2 succeeded at 454 (offset -2 lines).
   Hunk #3 succeeded at 738 (offset -2 lines).
   Hunk #4 succeeded at 775 (offset -2 lines).
   patching file net/ipv4/route.c
   Hunk #1 FAILED at 739.
   1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej
   
   Trying to apply against 2.5.71-bk2:
   
   patching file net/ipv4/route.c
   patching file net/ipv4/route.c
   Hunk #1 FAILED at 739.
   1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej
   
   Missing something between?
   
   Code from bk2:
   
   static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
   {
           struct rtable   *rth, **rthp;
           unsigned long   now = jiffies;
           int attempts = !in_softirq();
   
   Patch:
   
It depends upon the first patch that I enclosed.
What I gave you was a 2-part patch, the first one
did:

@@ -721,6 +740,9 @@
 {
 	struct rtable	*rth, **rthp;
 	unsigned long	now = jiffies;
+	struct rtable *cand = NULL, **candp = NULL;
+	u32 		min_score = ~(u32)0;
+	int		chain_length = 0;
 	int attempts = !in_softirq();
 
 restart:

The second one did:

@@ -739,13 +739,19 @@
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
 	struct rtable	*rth, **rthp;
-	unsigned long	now = jiffies;
-	struct rtable *cand = NULL, **candp = NULL;
-	u32 		min_score = ~(u32)0;
-	int		chain_length = 0;
+	unsigned long	now;
+	struct rtable *cand, **candp;
+	u32 		min_score;
+	int		chain_length;
 	int attempts = !in_softirq();
 
...

I have no idea why it doesn't apply.
Nothing else has happened in these bits of code for a while.


* Re: Route cache performance tests
  2003-06-16 22:44     ` David S. Miller
@ 2003-06-16 23:09       ` Simon Kirby
  2003-06-16 23:08         ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-16 23:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net

On Mon, Jun 16, 2003 at 03:44:01PM -0700, David S. Miller wrote:

> I pushed all of our current work to Linus's tree.
> But for your convenience here are the routing diffs
> against plain 2.5.71

Trying to apply against 2.5.71:

patching file net/ipv4/route.c
Hunk #2 succeeded at 454 (offset -2 lines).
Hunk #3 succeeded at 738 (offset -2 lines).
Hunk #4 succeeded at 775 (offset -2 lines).
patching file net/ipv4/route.c
Hunk #1 FAILED at 739.
1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej

Trying to apply against 2.5.71-bk2:

patching file net/ipv4/route.c
patching file net/ipv4/route.c
Hunk #1 FAILED at 739.
1 out of 1 hunk FAILED -- saving rejects to file net/ipv4/route.c.rej

Missing something in between?

Code from bk2:

static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
{
        struct rtable   *rth, **rthp;
        unsigned long   now = jiffies;
        int attempts = !in_softirq();

Patch:

 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
        struct rtable   *rth, **rthp;
-       unsigned long   now = jiffies;
-       struct rtable *cand = NULL, **candp = NULL;
...

Simon-


* Re: Route cache performance tests
  2003-06-16 23:08         ` David S. Miller
@ 2003-06-16 23:27           ` Simon Kirby
  2003-06-16 23:49             ` Simon Kirby
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-16 23:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net

On Mon, Jun 16, 2003 at 04:08:56PM -0700, David S. Miller wrote:

> It depends upon the first patch that I enclosed.

Never mind. :)  Such patches don't work very well with patch --dry.
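
In case it saves someone else the same confusion: with a concatenated
multi-part patch, `patch --dry-run` checks every part against the
unmodified tree, so a later part that depends on an earlier one fails even
though a real run, which applies the parts in sequence, succeeds.  A
minimal demonstration (all file names and contents here are invented):

```shell
#!/bin/sh
# Two stacked patches: part2 only applies on top of part1.
set -e
dir=$(mktemp -d); cd "$dir"
printf 'alpha\nbeta\n' > file.txt

# part 1: alpha -> gamma
cat > part1.diff <<'EOF'
--- a/file.txt
+++ b/file.txt
@@ -1,2 +1,2 @@
-alpha
+gamma
 beta
EOF

# part 2: gamma -> delta (only valid after part 1 is applied)
cat > part2.diff <<'EOF'
--- a/file.txt
+++ b/file.txt
@@ -1,2 +1,2 @@
-gamma
+delta
 beta
EOF

# One concatenated patch, as mailed: part 1 followed by part 2.
cat part1.diff part2.diff > both.diff

# Dry run: part 1 is only *checked*, never written, so part 2's
# context ("gamma") is missing and the dry run reports a failure.
if patch -p1 --dry-run < both.diff >/dev/null 2>&1; then
    echo "dry-run: ok (unexpected)"
else
    echo "dry-run: failed, as expected"
fi

# Real run: part 1 lands first, so part 2 then applies cleanly.
patch -p1 < both.diff >/dev/null
grep -q '^delta$' file.txt && echo "real run: applied cleanly"
```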

Simon-

[        Simon Kirby        ][        Network Operations        ]
[     sim@netnation.com     ][   NetNation Communications Inc.  ]
[  Opinions expressed are not necessarily those of my employer. ]


* Re: Route cache performance tests
  2003-06-16 23:27           ` Simon Kirby
@ 2003-06-16 23:49             ` Simon Kirby
  2003-06-17 15:59               ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-16 23:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net

On Mon, Jun 16, 2003 at 04:27:50PM -0700, Simon Kirby wrote:

> On Mon, Jun 16, 2003 at 04:08:56PM -0700, David S. Miller wrote:
> 
> > It depends upon the first patch that I enclosed.
> 
> Never mind. :)  Such patches don't work very well with patch --dry.

Okay, here goes 2.5.71 + this patch:

60.0049 seconds passed, avg forwarding rate: 160190.859 pps
60.0085 seconds passed, avg forwarding rate: 157118.708 pps
60.0046 seconds passed, avg forwarding rate: 157211.097 pps
60.0073 seconds passed, avg forwarding rate: 157557.710 pps

...Looks a tad worse than with your patch, but not by much.
The forwarding rate is still pretty crappy for an Opteron.  I will fiddle
a bit more tonight to see what I can do.

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.27
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma      samples  %           symbol name
c02c0ea0 5113     9.07075     fn_hash_lookup
c0293970 3264     5.79052     ip_route_input_slow
c028ef90 2734     4.85027     nf_iterate
c028f280 2525     4.47949     nf_hook_slow
c02924b0 2127     3.77342     rt_intern_hash
c0222330 2125     3.76987     tg3_start_xmit
c02becc0 1755     3.11347     fib_validate_source
c0290020 1684     2.98751     pfifo_fast_dequeue
c0296220 1531     2.71608     ip_rcv_finish
c0135230 1449     2.57061     kmem_cache_free
c0134ff0 1431     2.53867     free_block
c0221710 1369     2.42868     tg3_rx
c0295cb0 1350     2.39498     ip_rcv
c0135170 1304     2.31337     kmem_cache_alloc
c02941a0 1258     2.23176     ip_route_input
c028f920 1255     2.22644     eth_header
c0134e20 1148     2.03662     cache_alloc_refill
c0291b70 1104     1.95856     rt_hash_code
c02886a0 1082     1.91953     netif_receive_skb
c01351b0 983      1.7439      __kmalloc
c028b610 923      1.63745     neigh_lookup
c02c0050 914      1.62149     fib_semantic_match
c029a660 857      1.52037     ip_finish_output2
c028c600 829      1.47069     neigh_resolve_output
c01adc80 766      1.35893     memcpy
c0135270 743      1.31812     kfree
c0297000 741      1.31458     ip_forward
c0284620 686      1.217       alloc_skb
c02b9730 666      1.18152     inet_select_addr
c028fa90 663      1.1762      eth_type_trans
c0128a00 649      1.15136     call_rcu
c0297240 623      1.10524     ip_forward_finish
c028af00 620      1.09991     dst_alloc
c0288160 597      1.05911     dev_queue_xmit
c028ffa0 570      1.01121     pfifo_fast_enqueue
c028b030 486      0.862191    dst_destroy
c0292260 485      0.860417    rt_garbage_collect
c028fcb0 472      0.837355    qdisc_restart
c0221350 467      0.828484    tg3_tx
c028c480 463      0.821388    neigh_hh_init
c02215c0 455      0.807196    tg3_recycle_rx
c02d0f70 447      0.793003    ipv4_sabotage_out
c0298580 443      0.785907    ip_finish_output
c011f080 430      0.762844    local_bh_enable
c010fc40 358      0.635112    do_gettimeofday
c0284860 345      0.612049    __kfree_skb

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
22910        10  158190     0     0     0     0     0         0       0      0  158190  158188         1    0
20590        10  158330     0     0     0     0     0         0       0      0  158330  158328         1    0
20515        14  158306     0     0     0     0     0         0       0      0  158306  158304         1    0
21000         4  158964     0     0     0     0     0         0       0      0  158964  158962         1    0
21631         8  159300     0     0     0     0     0         0       0      0  159300  159298         0    0
20329        13  160059     0     0     0     0     0         0       0      0  160059  160057         1    0
22995         7  157441     0     0     0     0     0         0       0      0  157441  157439         1    0
22418         9  156831     0     0     0     0     0         0       0      0  156831  156829         1    0
22417        11  157321     0     0     0     0     0         0       0      0  157321  157319         1    0
21339         6  157898     0     0     0     0     0         0       0      0  157898  157896         0    0
22562        10  157734     0     0     0     0     0         0       0      0  157734  157732         1    0
20488        12  159496     0     0     0     0     0         0       0      0  159496  159493         1    0
22527        10  157674     0     0     0     0     0         0       0      0  157674  157672         1    0
21992         7  156729     0     0     0     0     0         0       0      0  156729  156727         0    0
21372        10  157106     0     0     0     0     0         0       0      0  157106  157104         1    0
22950        10  156402     0     0     0     0     0         0       0      0  156402  156400         2    0
20471        11  157057     0     0     0     0     0         0       0      0  157057  157055         1    0
20864        13  159082     0     0     0     0     0         0       0      0  159082  159080         0    0
22416        10  157658     0     0     0     0     0         0       0      0  157658  157656         1    0
22659         8  157348     0     0     0     0     0         0       0      0  157348  157346         1    0

Simon-

[        Simon Kirby        ][        Network Operations        ]
[     sim@netnation.com     ][   NetNation Communications Inc.  ]
[  Opinions expressed are not necessarily those of my employer. ]


* Re: Route cache performance tests
  2003-06-16 23:49             ` Simon Kirby
@ 2003-06-17 15:59               ` David S. Miller
  2003-06-17 16:50                 ` Robert Olsson
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-17 15:59 UTC (permalink / raw)
  To: sim; +Cc: ralph+d, hadi, xerox, fw, netdev, linux-net

   From: Simon Kirby <sim@netnation.com>
   Date: Mon, 16 Jun 2003 16:49:37 -0700

   60.0073 seconds passed, avg forwarding rate: 157557.710 pps
   
   ...Looks like a tad worse than with your patch, but not by much. 
   Forwarding rate is still pretty crappy for an Opteron.  Will fiddle
   a bit more tonight to see what I can do.

To be honest, this isn't half-bad for pure DoS load.

This reminds me, maybe a good test would be PPS for "well behaved
flows" in the presence of DoS load.  You'd probably need 4 systems
to carry out such a test accurately.

Because, really, who cares how fast we can forward the DoS traffic
as long as legitimate users still see good metrics.


* Re: Route cache performance tests
  2003-06-17 15:59               ` David S. Miller
@ 2003-06-17 16:50                 ` Robert Olsson
  2003-06-17 16:50                   ` David S. Miller
  2003-06-17 20:07                   ` Simon Kirby
  0 siblings, 2 replies; 31+ messages in thread
From: Robert Olsson @ 2003-06-17 16:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: sim, ralph+d, hadi, xerox, fw, netdev, linux-net


David S. Miller writes:

 >    60.0073 seconds passed, avg forwarding rate: 157557.710 pps

 > To be honest, this isn't half-bad for pure DoS load.

 No, that's pretty good, and the profiles look as expected.  It would be
 interesting to get the single-flow performance as a comparison.

 Also, I think Simon used only /32 routes...  I took a "real" Internet
 routing table and made a script so it can be used for experiments.  I can
 make it available.

 Cheers.

						--ro


* Re: Route cache performance tests
  2003-06-17 16:50                 ` Robert Olsson
@ 2003-06-17 16:50                   ` David S. Miller
  2003-06-17 17:29                     ` Robert Olsson
  2003-06-17 20:07                   ` Simon Kirby
  1 sibling, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-17 16:50 UTC (permalink / raw)
  To: Robert.Olsson; +Cc: sim, ralph+d, hadi, xerox, fw, netdev, linux-net

   From: Robert Olsson <Robert.Olsson@data.slu.se>
   Date: Tue, 17 Jun 2003 18:50:03 +0200
   
    Also think Simon used only /32 routes... I took "real"
    Internet-routing and made a script so it can be used for
    experiments. I can make it available.

Please do, I'd like to play with such a list locally.



* Re: Route cache performance tests
  2003-06-17 16:50                   ` David S. Miller
@ 2003-06-17 17:29                     ` Robert Olsson
  2003-06-17 19:06                       ` Mr. James W. Laferriere
  0 siblings, 1 reply; 31+ messages in thread
From: Robert Olsson @ 2003-06-17 17:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Robert.Olsson, sim, ralph+d, hadi, xerox, fw, netdev, linux-net


David S. Miller writes:

 >     Internet-routing and made a script so it can be used for
 >     experiments. I can make it available.
 > 
 > Please do, I'd like to play with such a list locally.


ftp://robur.slu.se/pub/Linux/net-development/inet_routes/
Just configure the script and run...

And Simon, can you do a run with this routing table too?  Even the fibstat
output could be interesting to compare.


Cheers.

						--ro




* Re: Route cache performance tests
  2003-06-17 17:29                     ` Robert Olsson
@ 2003-06-17 19:06                       ` Mr. James W. Laferriere
  2003-06-17 20:12                         ` Robert Olsson
  0 siblings, 1 reply; 31+ messages in thread
From: Mr. James W. Laferriere @ 2003-06-17 19:06 UTC (permalink / raw)
  To: Robert Olsson; +Cc: netdev, Linux networking maillist

	Hello Robert,  First, thank you for these tools.  Now for the
	questions.

	Is 'fibstat' only for 2.5.x kernels?  I ask because there is no
	/proc/net/fib_stat under 2.4.21, which has no mention of fib_stat
	anywhere in the sources.

	The next question is that packet-generator.c appears to require
	kernel include files.  Is there an updated set of user-level net/
	includes?

		Tia,  JimL
On Tue, 17 Jun 2003, Robert Olsson wrote:
> David S. Miller writes:
>  >     Internet-routing and made a script so it can be used for
>  >     experiments. I can make it available.
>  > Please do, I'd like to play with such a list locally.

> ftp://robur.slu.se/pub/Linux/net-development/inet_routes/
> Just configure the script and run...

> And Simon can you do a run with this routing table too? And even fibstat
> output could be interesting to compare.
-- 
       +------------------------------------------------------------------+
       | James   W.   Laferriere | System    Techniques | Give me VMS     |
       | Network        Engineer |     P.O. Box 854     |  Give me Linux  |
       | babydr@baby-dragons.com | Coudersport PA 16915 |   only  on  AXP |
       +------------------------------------------------------------------+


* Re: Route cache performance tests
  2003-06-17 16:50                 ` Robert Olsson
  2003-06-17 16:50                   ` David S. Miller
@ 2003-06-17 20:07                   ` Simon Kirby
  2003-06-17 20:17                     ` Martin Josefsson
                                       ` (3 more replies)
  1 sibling, 4 replies; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 20:07 UTC (permalink / raw)
  To: Robert Olsson
  Cc: David S. Miller, ralph+d, hadi, xerox, fw, netdev, linux-net

On Tue, Jun 17, 2003 at 06:50:03PM +0200, Robert Olsson wrote:

> David S. Miller writes:
> 
>  >    60.0073 seconds passed, avg forwarding rate: 157557.710 pps
> 
>  > To be honest, this isn't half-bad for pure DoS load.
> 
>  No, that's pretty good, and the profiles look as expected.  It would be
>  interesting to get the single-flow performance as a comparison.

I changed Juno to send from a single IP, but it only spat out about
330000 pps, which the dual Tigon3 Opteron box forwarded completely.
In order to do a single flow forwarding test, I need to be able to create
more input traffic somehow.  Seeing as you wrote pktgen.c, maybe you
could help in this department. :)

>  Also think Simon used only /32 routes... I took "real" Internet-routing
>  and made a script so it can be used for experiments. I can make it available.

Yes, I found that area less interesting since Dave M. fixed the hash
buckets.  But yes, the prefix scanning will slow it down some.

Whoa.  Uhm.  A lot.  I should compare with 2.4 again to see what's going
on here.

60.0042 seconds passed, avg forwarding rate: 50759.683 pps
60.0039 seconds passed, avg forwarding rate: 50311.258 pps
60.0046 seconds passed, avg forwarding rate: 50420.562 pps
60.0036 seconds passed, avg forwarding rate: 50399.389 pps
60.0038 seconds passed, avg forwarding rate: 50431.732 pps
60.0041 seconds passed, avg forwarding rate: 50403.777 pps
60.0036 seconds passed, avg forwarding rate: 50210.604 pps
60.0033 seconds passed, avg forwarding rate: 50279.220 pps
60.0036 seconds passed, avg forwarding rate: 50549.291 pps
60.0046 seconds passed, avg forwarding rate: 50437.615 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.26
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma      samples  %           symbol name
c02bf730 16019    33.2014     fn_hash_lookup
c0292b70 3882     8.04593     ip_route_input_slow
c0221710 2335     4.83958     tg3_rx
c02bd550 2004     4.15354     fib_validate_source
c0290d70 1955     4.05198     rt_hash_code
c0294e50 1670     3.46128     ip_rcv
c02933a0 1404     2.90997     ip_route_input
c01351b0 1349     2.79597     __kmalloc
c02885c0 1314     2.72343     netif_receive_skb
c02b8040 1168     2.42083     inet_select_addr
c0135270 1123     2.32756     kfree
c0284620 987      2.04568     alloc_skb
c028ec90 900      1.86536     eth_type_trans
c0135170 860      1.78246     kmem_cache_alloc
c02be8e0 844      1.7493      fib_semantic_match
c0135230 812      1.68297     kmem_cache_free
c0222330 652      1.35135     tg3_start_xmit
c02916b0 648      1.34306     rt_intern_hash
c02215c0 542      1.12336     tg3_recycle_rx
c010fc40 459      0.951335    do_gettimeofday
c028f220 422      0.874648    pfifo_fast_dequeue
c02be9b0 419      0.86843     __fib_res_prefsrc
c028eb20 417      0.864285    eth_header
c0295df0 386      0.800033    ip_forward
c028c520 363      0.752363    neigh_resolve_output
c0284860 345      0.715056    __kfree_skb
c0284840 311      0.644586    kfree_skbmem
c02847a0 295      0.611424    skb_release_data
c028b530 285      0.590698    neigh_lookup
c01adc80 276      0.572044    memcpy
c0134ff0 269      0.557536    free_block
c0291460 240      0.49743     rt_garbage_collect
c0134e20 236      0.489139    cache_alloc_refill
c02972d0 216      0.447687    ip_finish_output
c0114030 215      0.445614    get_offset_tsc
c0128a00 197      0.408307    call_rcu
c028ae20 193      0.400017    dst_alloc
c0288080 187      0.387581    dev_queue_xmit
c028af50 184      0.381363    dst_destroy
c028f1a0 175      0.362709    pfifo_fast_enqueue
c011f080 170      0.352346    local_bh_enable
c0221350 168      0.348201    tg3_tx
c0221e90 160      0.33162     tg3_set_txd
c0297570 152      0.315039    ip_output
c028c3a0 149      0.308821    neigh_hh_init
c028eeb0 141      0.29224     qdisc_restart

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
17929        13  214343     0     0     0     0 163822         0       0      0   50521   50519         0    0
18296        18  213694     0     0     0     0 163018         0       0      0   50676   50674         1    0
17616        11  214529     0     0     0     0 163993         0       0      0   50536   50534         0    0
17841        12  213816     0     0     0     0 163157         0       0      0   50659   50657         1    0
18272         7  214093     0     0     0     0 163583         0       0      0   50510   50508         1    0
18216         9  214843     0     0     0     0 164214         0       0      0   50629   50627         0    0
18318        16  214976     0     0     0     0 164299         0       0      0   50677   50675         0    0
18099         9  213447     0     0     0     0 162995         0       0      0   50452   50450         1    0
17610        14  216438     0     0     0     0 165408         0       0      0   51030   51028         1    0
17643        14  214638     0     0     0     0 163987         0       0      0   50651   50649         0    0
17516         7  213185     0     0     0     0 163016         0       0      0   50169   50167         1    0
18355        10  213894     0     0     0     0 163564         0       0      0   50330   50328         1    0
17723        11  214477     0     0     0     0 163705         0       0      0   50772   50770         0    0
17915         6  214342     0     0     0     0 163625         0       0      0   50717   50715         0    0
18166        19  213965     0     0     0     0 163521         0       0      0   50444   50442         0    0
17943        19  213417     0     0     0     0 162955         0       0      0   50462   50460         2    0
17515         5  214423     0     0     0     0 163718         0       0      0   50705   50703         0    0
18231        10  213434     0     0     0     0 162919         0       0      0   50515   50513         1    0
17523         8  213856     0     0     0     0 163385         0       0      0   50471   50469         0    0
18217        16  214940     0     0     0     0 164165         0       0      0   50775   50773         0    0

...recompiling with fibstats...

Erm.  I can't get fib_stats2.pat to apply against 2.5.71, 2.5.71+davem's
join-two-diffs patch, 2.4.21-rc7, or 2.5.71+davem's rtcache changes. 
What's it supposed to be against?

[sroot@debinst:/d/linux-2.5]# patch -p0 --dry < ../fib_stats2.pat
patching file include/net/ip_fib.h
Hunk #1 succeeded at 139 (offset 4 lines).
patching file net/ipv4/fib_hash.c
Hunk #3 succeeded at 305 (offset -11 lines).
Hunk #4 succeeded at 1110 with fuzz 1 (offset -14 lines).
Hunk #5 succeeded at 1166 (offset -14 lines).
patching file net/ipv4/route.c
Hunk #1 FAILED at 2754.
Hunk #2 succeeded at 2760 (offset -6 lines).
Hunk #3 succeeded at 2783 (offset -6 lines).
Hunk #4 FAILED at 2793.
2 out of 4 hunks FAILED -- saving rejects to file net/ipv4/route.c.rej

In any event, here is the profile of the single-flow case with the full
routing table (probably identical to the empty routing table case).  The
sender isn't pushing enough for NAPI to kick in fully, so there is a lot
of tg3 interrupt overhead that wouldn't be there with more traffic:

60.0041 seconds passed, avg forwarding rate: 329808.310 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.26
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes
exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask)
count 697000
vma      samples  %           symbol name
c0222330 4470     8.51445     tg3_start_xmit
c0221710 3760     7.16204     tg3_rx
c0294e50 3142     5.98488     ip_rcv
c02885c0 2428     4.62485     netif_receive_skb
c0295df0 2065     3.93341     ip_forward
c028f220 2058     3.92007     pfifo_fast_dequeue
c02933a0 2033     3.87245     ip_route_input
c01351b0 1987     3.78483     __kmalloc
c0290d70 1904     3.62674     rt_hash_code
c02972d0 1752     3.33721     ip_finish_output
c01adc80 1649     3.14101     memcpy
c0134ff0 1626     3.0972      free_block
c0284620 1511     2.87815     alloc_skb
c0135270 1489     2.83624     kfree
c0288080 1461     2.78291     dev_queue_xmit
c028ec90 1351     2.57338     eth_type_trans
c0135170 1319     2.51243     kmem_cache_alloc
c028f1a0 1243     2.36766     pfifo_fast_enqueue
c0134e20 1172     2.23242     cache_alloc_refill
c0297570 1145     2.18099     ip_output
c0135230 1133     2.15814     kmem_cache_free
c0221350 1085     2.06671     tg3_tx
c0221a50 991      1.88766     tg3_poll
c02215c0 893      1.70098     tg3_recycle_rx
c0221e90 832      1.58479     tg3_set_txd
c0221b60 812      1.5467      tg3_interrupt
c028eeb0 755      1.43812     qdisc_restart
c010fc40 672      1.28002     do_gettimeofday
c011f080 578      1.10097     local_bh_enable
c0284840 492      0.937161    kfree_skbmem
c010a8b2 426      0.811444    restore_all
c02847a0 375      0.714299    skb_release_data
c010c6a0 327      0.622869    handle_IRQ_event
c010c910 288      0.548582    do_IRQ
c0284860 284      0.540963    __kfree_skb
c0114030 283      0.539058    get_offset_tsc
c021f4b0 270      0.514296    tg3_enable_ints
c0115e10 234      0.445723    end_level_ioapic_irq
c011f4a0 220      0.419056    cpu_raise_softirq
c028eb20 199      0.379055    eth_header
c0288500 188      0.358102    net_tx_action
c028c520 187      0.356197    neigh_resolve_output
c0134b40 179      0.340959    cache_init_objs
c0288900 171      0.32572     net_rx_action
c0132290 165      0.314292    buffered_rmqueue
c01321b0 122      0.232385    free_hot_cold_page

If I start two threads on the sender (Xeon w/HT), I'm able to push 420000
pps, which only partially starts to use NAPI on the Opteron box.  Going
to try 2.4 again for a comparison (note: 2.5 seems to have an opposite
PCI scan order from 2.4 for the dual Tigon3s).

Simon-

[        Simon Kirby        ][        Network Operations        ]
[     sim@netnation.com     ][   NetNation Communications Inc.  ]
[  Opinions expressed are not necessarily those of my employer. ]


* Re: Route cache performance tests
  2003-06-17 19:06                       ` Mr. James W. Laferriere
@ 2003-06-17 20:12                         ` Robert Olsson
  0 siblings, 0 replies; 31+ messages in thread
From: Robert Olsson @ 2003-06-17 20:12 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: Robert Olsson, netdev, Linux networking maillist


Mr. James W. Laferriere writes:
 > 
 > 	Is 'fibstat' only for 2.5.x kernels ? 

 Yes.

 > 	The reason for that question is that there isn't a
 > 	/proc/net/fib_stat under 2.4.21, which has no mention of
 > 	fib_stat anywhere in the sources.

 The kernel part creates /proc/net/fib_stat.  It should be pretty
 straightforward for 2.4.x too.  If people find it useful it can be
 backported.  Try it, and look at route.c and rt_cache_stat if you run
 into problems.

 > 	The next question is that packet-generator.c appears to require
 > 	kernel include files.  Is there an updated set of user-level
 > 	net/ includes?

 It's a kernel module and should be compiled with the kernel.

 Cheers.
						--ro


* Re: Route cache performance tests
  2003-06-17 20:07                   ` Simon Kirby
@ 2003-06-17 20:17                     ` Martin Josefsson
  2003-06-17 20:37                       ` Simon Kirby
  2003-06-17 20:49                     ` Robert Olsson
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Martin Josefsson @ 2003-06-17 20:17 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Robert Olsson, David S. Miller, ralph+d, hadi, xerox, fw, netdev,
	linux-net

On Tue, 2003-06-17 at 22:07, Simon Kirby wrote:

> >  Also think Simon used only /32 routes... I took "real" Internet-routing
> >  and made a script so it can be used for experiments. I can make it available.
> 
> Yes, I found that area less interesting since Dave M. fixed the hash
> buckets.  But yes, the prefix scanning will slow it down some.
> 
> Whoa.  Uhm.  A lot.  I should compare with 2.4 again to see what's going
> on here.

>  size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
> 17929        13  214343     0     0     0     0 163822         0       0      0   50521   50519         0    0

Did you have rp_filter enabled?  The nonzero "masrc" (martian source)
counts make it look like it.

-- 
/Martin


* Re: Route cache performance tests
  2003-06-17 20:37                       ` Simon Kirby
@ 2003-06-17 20:36                         ` David S. Miller
  2003-06-17 20:51                           ` Simon Kirby
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2003-06-17 20:36 UTC (permalink / raw)
  To: sim; +Cc: gandalf, Robert.Olsson, ralph+d, hadi, xerox, fw, netdev,
	linux-net

   From: Simon Kirby <sim@netnation.com>
   Date: Tue, 17 Jun 2003 13:37:03 -0700
   
   Forwarding rate more than doubles when I turn
   rp_filter off (Debian turns it on by default).

I have no idea why they do this, it's the stupidest thing
you can possibly do by default.

If we thought it was a good idea to turn this on by default
we would have done so in the kernel.

Does anyone have some cycles to spare to try and urge whoever is
responsible for this in Debian to leave the kernel's default setting
alone?

Thanks.


* Re: Route cache performance tests
  2003-06-17 20:17                     ` Martin Josefsson
@ 2003-06-17 20:37                       ` Simon Kirby
  2003-06-17 20:36                         ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 20:37 UTC (permalink / raw)
  To: Martin Josefsson
  Cc: Robert Olsson, David S. Miller, ralph+d, hadi, xerox, fw, netdev,
	linux-net

On Tue, Jun 17, 2003 at 10:17:14PM +0200, Martin Josefsson wrote:
> 
> >  size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
> > 17929        13  214343     0     0     0     0 163822         0       0      0   50521   50519         0    0
> 
> Did you have rp_filter enabled? Looks like it.

Yes, good spotting.  Forwarding rate more than doubles when I turn
rp_filter off (Debian turns it on by default).

60.0049 seconds passed, avg forwarding rate: 108222.462 pps
60.0041 seconds passed, avg forwarding rate: 108868.822 pps
60.0042 seconds passed, avg forwarding rate: 108767.194 pps
60.0040 seconds passed, avg forwarding rate: 108872.188 pps
60.0045 seconds passed, avg forwarding rate: 108856.575 pps
60.0041 seconds passed, avg forwarding rate: 108743.443 pps

Cpu type: Athlon
Cpu speed was (MHz estimation) : 1394.26
Counter 0 counted RETIRED_INSNS events (Retired instructions (includes exceptions, interrupts, resyncs)) with a unit mask of 0x00 (No unit mask) count 697000
vma      samples  %           symbol name
c02bf730 15382    33.0213     fn_hash_lookup
c0292b70 2127     4.56614     ip_route_input_slow
c02916b0 1380     2.96252     rt_intern_hash
c0222330 1340     2.87665     tg3_start_xmit
c02bd550 1336     2.86806     fib_validate_source
c0221710 1219     2.61689     tg3_rx
c02be8e0 1154     2.47735     fib_semantic_match
c0294e50 1068     2.29273     ip_rcv
c028f220 983      2.11026     pfifo_fast_dequeue
c0135230 981      2.10596     kmem_cache_free
c02b8040 906      1.94496     inet_select_addr
c0290d70 901      1.93422     rt_hash_code
c0134ff0 877      1.8827      free_block
c028eb20 873      1.87411     eth_header
c0135170 805      1.72814     kmem_cache_alloc
c02885c0 798      1.71311     netif_receive_skb
c02933a0 788      1.69164     ip_route_input
c028c520 778      1.67017     neigh_resolve_output
c0295df0 744      1.59718     ip_forward
c0134e20 734      1.57572     cache_alloc_refill
c01351b0 727      1.56069     __kmalloc
c028b530 686      1.47267     neigh_lookup
c01adc80 535      1.14851     memcpy
c0135270 510      1.09484     kfree
c0284620 506      1.08626     alloc_skb
c0291460 498      1.06908     rt_garbage_collect
c028ec90 480      1.03044     eth_type_trans
c028ae20 439      0.942424    dst_alloc
c0128a00 437      0.938131    call_rcu
c02972d0 433      0.929544    ip_finish_output
c0297570 407      0.873728    ip_output
c011f080 380      0.815766    local_bh_enable
c0288080 376      0.807179    dev_queue_xmit
c028eeb0 369      0.792151    qdisc_restart
c0221e90 361      0.774977    tg3_set_txd
c028af50 360      0.772831    dst_destroy
c028f1a0 352      0.755657    pfifo_fast_enqueue
c028c3a0 317      0.68052     neigh_hh_init
c02215c0 315      0.676227    tg3_recycle_rx
c0221350 308      0.6612      tg3_tx
c02be9b0 294      0.631145    __fib_res_prefsrc
c010fc40 200      0.42935     do_gettimeofday
c02b48c0 195      0.418617    arp_hash
c0292860 195      0.418617    rt_set_nexthop
c02b4df0 178      0.382122    arp_bind_neighbour
c0294500 159      0.341334    dst_free

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
20262         5  109659     0     0     0     0     0         0       0      0  109659  109657         1    0
19229         7  109493     0     0     0     0     0         0       0      0  109493  109491         0    0
20320         4  109576     0     0     0     0     0         0       0      0  109576  109574         1    0
19280         9  109439     0     0     0     0     0         0       0      0  109439  109437         1    0
20325        10  109314     0     0     0     0     0         0       0      0  109314  109312         1    0
18983         6  109530     0     0     0     0     0         0       0      0  109530  109528         1    0
20313         5  109867     0     0     0     0     0         0       0      0  109867  109865         0    0
19127         4  109256     0     0     0     0     0         0       0      0  109256  109254         1    0
18897         4  109508     0     0     0     0     0         0       0      0  109508  109506         1    0
20338        11  109717     0     0     0     0     0         0       0      0  109717  109715         0    0
19054         7  109209     0     0     0     0     0         0       0      0  109209  109207         1    0
20397        11  109273     0     0     0     0     0         0       0      0  109273  109271         1    0
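For anyone reproducing this: rp_filter is a per-interface sysctl under
/proc/sys/net/ipv4/conf/.  The paths below are the standard ones, but
"eth0" is just an example interface name:

```shell
# Turn reverse-path filtering off globally and on one interface
# ("eth0" is an example; repeat for each interface as needed).
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter

# Equivalent via sysctl:
sysctl -w net.ipv4.conf.all.rp_filter=0
```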

Simon-

[        Simon Kirby        ][        Network Operations        ]
[     sim@netnation.com     ][   NetNation Communications Inc.  ]
[  Opinions expressed are not necessarily those of my employer. ]


* Re: Route cache performance tests
  2003-06-17 20:51                           ` Simon Kirby
@ 2003-06-17 20:49                             ` David S. Miller
  2003-06-18  5:50                             ` Pekka Savola
  1 sibling, 0 replies; 31+ messages in thread
From: David S. Miller @ 2003-06-17 20:49 UTC (permalink / raw)
  To: sim; +Cc: gandalf, Robert.Olsson, ralph+d, hadi, xerox, fw, netdev,
	linux-net

   From: Simon Kirby <sim@netnation.com>
   Date: Tue, 17 Jun 2003 13:51:01 -0700
   
   Specific firewall rules would have to be created otherwise.  And
   the overhead only really shows when the routing table is large,
   right?

rp filter breaks things... just like firewalls break things...

so just like a user enables firewall rules by himself, he may
enable rp filter by himself...


* Re: Route cache performance tests
  2003-06-17 20:07                   ` Simon Kirby
  2003-06-17 20:17                     ` Martin Josefsson
@ 2003-06-17 20:49                     ` Robert Olsson
  2003-06-17 21:07                     ` Simon Kirby
  2003-06-17 22:11                     ` Ralph Doncaster
  3 siblings, 0 replies; 31+ messages in thread
From: Robert Olsson @ 2003-06-17 20:49 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Robert Olsson, David S. Miller, ralph+d, hadi, xerox, fw, netdev,
	linux-net


Simon Kirby writes:

 > I changed Juno to send from a single IP, but it only spat out about
 > 330000 pps, which the dual Tigon3 Opteron box forwarded completely.
 > In order to do a single flow forwarding test, I need to be able to create
 > more input traffic somehow.  Seeing as you wrote pktgen.c, maybe you
 > could help in this department. :)

 OK. See below.

 > >  Also think Simon used only /32 routes... I took "real" Internet-routing
 > >  and made a script so it can be used for experiments. I can make it available.
 > 
 > Yes, I found that area less interesting since Dave M. fixed the hash
 > buckets.  But yes, the prefix scanning will slow it down some.

 Well, I don't think it's so easy, as there are 33 zones with prefixes.
 If you have all the routes in one zone I'm not sure what happens;
 that's why I suggested the comparison.

 > Erm.  I can't get fib_stats2.pat to apply against 2.5.71, 2.5.71+davem's
 > join-two-diffs patch, 2.4.21-rc7, or 2.5.71+davem's rtcache changes. 
 > What's it supposed to be against?
 
 Sorry.  Our production system and lab use a heavily patched 2.5.66.
 I'll make a patch for 2.5.71...

 > If I start two threads on the sender (Xeon w/HT), I'm able to push 420000
 > pps, which only partially starts to use NAPI on the Opteron box.  Going
 > to try 2.4 again for a comparison (note: 2.5 seems to have an opposite
 > PCI scan order from 2.4 for the dual Tigon3s).

 Not bad. Replace net/core/pktgen.c in 2.5.X with the version from 
 ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/
 and edit pktgen.sh to suit your needs.

 And see what you get.  I'm interested since you are using both
 different processors and NICs.  Packet generation itself is also
 interesting, as it tests the driver/HW xmit path.


 Cheers.
						--ro	


* Re: Route cache performance tests
  2003-06-17 20:36                         ` David S. Miller
@ 2003-06-17 20:51                           ` Simon Kirby
  2003-06-17 20:49                             ` David S. Miller
  2003-06-18  5:50                             ` Pekka Savola
  0 siblings, 2 replies; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 20:51 UTC (permalink / raw)
  To: David S. Miller
  Cc: gandalf, Robert.Olsson, ralph+d, hadi, xerox, fw, netdev,
	linux-net

On Tue, Jun 17, 2003 at 01:36:35PM -0700, David S. Miller wrote:

> I have no idea why they do this, it's the stupidest thing
> you can possibly do by default.
> 
> If we thought it was a good idea to turn this on by default
> we would have done so in the kernel.
> 
> Does anyone have some cycles to spare to try and urge whoever is
> responsible for this in Debian to leave the kernel's default setting
> alone?

Sure, I can do this.  But why is this stupid?  It uses more CPU, but
stops IP spoofing by default.  Specific firewall rules would have to be
created otherwise.  And the overhead only really shows when the routing
table is large, right?

Simon-


* Re: Route cache performance tests
  2003-06-17 20:07                   ` Simon Kirby
  2003-06-17 20:17                     ` Martin Josefsson
  2003-06-17 20:49                     ` Robert Olsson
@ 2003-06-17 21:07                     ` Simon Kirby
  2003-06-17 22:50                       ` Simon Kirby
  2003-06-17 22:11                     ` Ralph Doncaster
  3 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 21:07 UTC (permalink / raw)
  To: Robert Olsson
  Cc: David S. Miller, ralph+d, hadi, xerox, fw, netdev, linux-net

On Tue, Jun 17, 2003 at 01:07:21PM -0700, Simon Kirby wrote:

> Whoa.  Uhm.  A lot.  I should compare with 2.4 again to see what's going
> on here.
> 
> 60.0042 seconds passed, avg forwarding rate: 50759.683 pps

Ummmm, yeah, 2.5.71 is quite a bit slower than 2.4.21.  I applied
Alexey's 2.5.71 rtcache fixes to 2.4.21 (changing "fl" to "key" in
the scoring function), and now I see:

60.0065 seconds passed, avg forwarding rate: 135379.152 pps

If I reboot and don't fill the routing table:

60.0104 seconds passed, avg forwarding rate: 259027.200 pps

This is with standard juno (pseudo-random sources).

This is with CONFIG_IP_MULTIPLE_TABLES still on, too.  I'll turn that off
and do some profiles.  The only weird thing I'm seeing while doing this
is that the route cache table continues to grow slowly, and the pps
slowly falls off over a few minutes.  "ip route flush cache" restores
performance again.  I'll verify this is not happening in 2.5.

Simon-

[        Simon Kirby        ][        Network Operations        ]
[     sim@netnation.com     ][   NetNation Communications Inc.  ]
[  Opinions expressed are not necessarily those of my employer. ]


* Re: Route cache performance tests
  2003-06-17 22:11                     ` Ralph Doncaster
@ 2003-06-17 22:08                       ` David S. Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David S. Miller @ 2003-06-17 22:08 UTC (permalink / raw)
  To: ralph+d, ralph; +Cc: sim, netdev, linux-net

   From: Ralph Doncaster <ralph@istop.com>
   Date: Tue, 17 Jun 2003 18:11:00 -0400 (EDT)
   
   My (obviously incorrect) assumption would
   be that fib_validate_source is responsible for rp_filter, and turning it
   off would lead to only a 5% performance increase.

fib_validate_source() with rp_filter enabled causes an extra
fib_lookup() to occur for each packet.


* Re: Route cache performance tests
  2003-06-17 20:07                   ` Simon Kirby
                                       ` (2 preceding siblings ...)
  2003-06-17 21:07                     ` Simon Kirby
@ 2003-06-17 22:11                     ` Ralph Doncaster
  2003-06-17 22:08                       ` David S. Miller
  3 siblings, 1 reply; 31+ messages in thread
From: Ralph Doncaster @ 2003-06-17 22:11 UTC (permalink / raw)
  To: Simon Kirby; +Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org

On Tue, 17 Jun 2003, Simon Kirby wrote:

> vma      samples  %           symbol name
> c02bf730 16019    33.2014     fn_hash_lookup
> c0292b70 3882     8.04593     ip_route_input_slow
> c0221710 2335     4.83958     tg3_rx
> c02bd550 2004     4.15354     fib_validate_source
> c0290d70 1955     4.05198     rt_hash_code
> c0294e50 1670     3.46128     ip_rcv
> c02933a0 1404     2.90997     ip_route_input

If turning off rp_filter doubles your performance, then the profile
numbers above are misleading.  My (obviously incorrect) assumption would
be that fib_validate_source is responsible for rp_filter, and turning it
off would lead to only a 5% performance increase.

Considering that, what kind of performance difference should removing the
route hashing make (i.e. going with r-trees or something like that)?  In
most of the profiles fn_hash_lookup has been at the top of the list.

-Ralph



* Re: Route cache performance tests
  2003-06-17 21:07                     ` Simon Kirby
@ 2003-06-17 22:50                       ` Simon Kirby
  2003-06-17 23:07                         ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2003-06-17 22:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: Robert Olsson, ralph+d, hadi, xerox, fw, netdev, linux-net

On Tue, Jun 17, 2003 at 02:07:01PM -0700, Simon Kirby wrote:

> 60.0104 seconds passed, avg forwarding rate: 259027.200 pps
> 
> This is with standard juno (pseudo-random sources).
> 
> This is with CONFIG_IP_MULTIPLE_TABLES still on, too.

Here is with CONFIG_IP_MULTIPLE_TABLES=n and CONFIG_NETFILTER=n
(rp_filter off and CONFIG_SMP=n in all tests):

60.0050 seconds passed, avg forwarding rate: 276893.102 pps
60.0046 seconds passed, avg forwarding rate: 257257.533 pps
60.0101 seconds passed, avg forwarding rate: 251852.843 pps
60.0106 seconds passed, avg forwarding rate: 248110.756 pps
60.0045 seconds passed, avg forwarding rate: 246280.066 pps

"rtstat -i 1" shows the pps rate decreasing because of the growing
rtcache:

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf
16688        18  294882     0     0     0     0     0         0       0      0  294882  294880         2    0
16834        24  302376     0     0     0     0     0         0       0      0  302376  302374         2    0
16970        12  294288     0     0     0     0     0         0       0      0  294288  294286         2    0
17037        21  294278     0     0     0     0     0         0       0      0  294278  294276         1    0
17133        20  293080     0     0     0     0     0         0       0      0  293080  293078         1    0
17195        22  293978     0     0     0     0     0         0       0      0  293978  293976         1    0
17293        16  292184     0     0     0     0     0         0       0      0  292184  292182         1    0
17370        19  293681     0     0     0     0     0         0       0      0  293681  293679         1    0
17450        21  293079     0     0     0     0     0         0       0      0  293079  293077         1    0
17542        12  293388     0     0     0     0     0         0       0      0  293388  293386         1    0
17604        16  293684     0     0     0     0     0         0       0      0  293684  293682         1    0
17676        27  294573     0     0     0     0     0         0       0      0  294573  294571         0    0
17762        18  291582     0     0     0     0     0         0       0      0  291582  291580         1    0
...
21615        17  257683     0     0     0     0     0         0       0      0  257683  257681         1    0
21641        23  257077     0     0     0     0     0         0       0      0  257077  257075         0    0
21672        23  257077     0     0     0     0     0         0       0      0  257077  257075         1    0

Profile:

vma      samples  %           symbol name
c025ee10 9228     15.1465     fn_hash_lookup
c02379a0 4025     6.60648     ip_route_input_slow
c0236650 3321     5.45096     rt_intern_hash
c0235d60 2775     4.55478     rt_hash_code
c012d750 2622     4.30365     kmem_cache_alloc
c012d950 2338     3.83751     kmem_cache_free
c0239f50 2323     3.81288     ip_rcv
c0238060 2277     3.73738     ip_route_input
c0233ac0 2190     3.59458     eth_header
c0231b50 2119     3.47805     neigh_resolve_output
c025ca50 2074     3.40419     fib_validate_source
c023aed0 1987     3.26139     ip_forward
c0230a70 1926     3.16126     neigh_lookup
c012d830 1845     3.02831     kmalloc
c0234270 1523     2.49979     pfifo_fast_dequeue
c025dde0 1407     2.3094      fib_semantic_match
c02376c0 1354     2.2224      rt_set_nexthop
c022a730 1322     2.16988     __kfree_skb
c022a480 1248     2.04842     alloc_skb
c012da00 1229     2.01723     kfree
c0259830 1171     1.92204     inet_select_addr
c0233ca0 1169     1.91875     eth_type_trans
c02304a0 994      1.63151     dst_destroy
c0230380 976      1.60197     dst_alloc
c010c910 806      1.32294     do_gettimeofday
c02319d0 793      1.3016      neigh_hh_init
c0236380 739      1.21297     rt_garbage_collect
c0233ef0 732      1.20148     qdisc_restart
c0234200 731      1.19984     pfifo_fast_enqueue
c022e590 695      1.14075     netif_receive_skb
c022dff0 623      1.02257     dev_queue_xmit
c023d6e0 575      0.943783    ip_finish_output
c02562a0 293      0.480919    arp_hash

Full route table:

60.0054 seconds passed, avg forwarding rate: 141888.209 pps

vma      samples  %           symbol name
c025ee10 21133    42.5588     fn_hash_lookup
c02379a0 2219     4.46874     ip_route_input_slow
c0236650 1600     3.22217     rt_intern_hash
c0235d60 1471     2.96238     rt_hash_code
c012d750 1282     2.58176     kmem_cache_alloc
c012d950 1279     2.57572     kmem_cache_free
c0259830 1253     2.52336     inet_select_addr
c0239f50 1231     2.47906     ip_rcv
c0233ac0 1214     2.44482     eth_header
c025dde0 1165     2.34614     fib_semantic_match
c0231b50 1133     2.2817      neigh_resolve_output
c0238060 1126     2.2676      ip_route_input
c025ca50 1120     2.25552     fib_validate_source
c0230a70 1041     2.09642     neigh_lookup
c023aed0 1032     2.0783      ip_forward
c012d830 1002     2.01788     kmalloc
c02376c0 762      1.53456     rt_set_nexthop
c0234270 733      1.47616     pfifo_fast_dequeue
c022a730 709      1.42782     __kfree_skb
c022a480 633      1.27477     alloc_skb
c012da00 629      1.26671     kfree
c0233ca0 623      1.25463     eth_type_trans
c02304a0 549      1.10561     dst_destroy
c0230380 519      1.04519     dst_alloc
c02319d0 447      0.900193    neigh_hh_init
c0236380 426      0.857902    rt_garbage_collect
c010c910 425      0.855889    do_gettimeofday
c022e590 395      0.795473    netif_receive_skb
c0234200 392      0.789431    pfifo_fast_enqueue
c0233ef0 381      0.767279    qdisc_restart
c022dff0 345      0.69478     dev_queue_xmit
c025de90 298      0.600129    __fib_res_prefsrc
c023d6e0 294      0.592073    ip_finish_output
c02567d0 145      0.292009    arp_bind_neighbour

Here's 2.5.72 (which seems to have all of the patches already merged),
empty routing table:

60.0085 seconds passed, avg forwarding rate: 166543.268 pps
60.0080 seconds passed, avg forwarding rate: 167055.912 pps
60.0051 seconds passed, avg forwarding rate: 166843.560 pps

vma      samples  %           symbol name
c02c0020 5193     10.2685     fn_hash_lookup
c02930d0 3475     6.87139     ip_route_input_slow
c02222c0 2349     4.64486     tg3_start_xmit
c0291c10 2217     4.38385     rt_intern_hash
c02bde40 1910     3.77679     fib_validate_source
c0293900 1864     3.68583     ip_route_input
c02953d0 1646     3.25477     ip_rcv
c0288b40 1609     3.1816      netif_receive_skb
c0135210 1462     2.89093     kmem_cache_free
c0134fd0 1457     2.88104     free_block
c02216a0 1390     2.74856     tg3_rx
c0135150 1295     2.56071     kmem_cache_alloc
c02912d0 1275     2.52116     rt_hash_code
c028f040 1250     2.47172     eth_header
c0134e00 1250     2.47172     cache_alloc_refill
c028ca40 1223     2.41833     neigh_resolve_output
c028f740 1072     2.11975     pfifo_fast_dequeue
c0135190 1019     2.01495     __kmalloc
c028ba50 991      1.95958     neigh_lookup
c02bf1d0 843      1.66693     fib_semantic_match
c01adc10 816      1.61354     memcpy
c0284ba0 768      1.51863     alloc_skb
c0135250 724      1.43162     kfree
c028f1b0 702      1.38812     eth_type_trans
c02b8930 674      1.33275     inet_select_addr
c0288600 668      1.32089     dev_queue_xmit
c0297850 664      1.31298     ip_finish_output
c0221e20 654      1.29321     tg3_set_txd
c028b340 635      1.25564     dst_alloc
c01289e0 632      1.2497      call_rcu
c0296370 606      1.19829     ip_forward
c0297af0 567      1.12117     ip_output
c028c8c0 547      1.08163     neigh_hh_init
c028b470 547      1.08163     dst_destroy
c011f080 489      0.966938    local_bh_enable
c0221550 477      0.94321     tg3_recycle_rx

Erp, needed a new rtstat.  And a wider console, apparently:

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
22233        18  329638     0     0     0     0     0         0       0      0  329638  329634         2    0          665742          0
20523        22  329074     0     0     0     0     0         0       0      0  329074  329070         2    0          665184          0
23510        26  331502     0     0     0     0     0         0       0      0  331502  331498         2    0          671610          0
22552        24  330464     0     0     0     0     0         0       0      0  330464  330460         4    0          669214          0
20359         8  329512     0     0     0     0     0         0       0      0  329512  329508         2    0          664428          0
19965        22  330090     0     0     0     0     0         0       0      0  330090  330086         2    0          663296          0
20081        20  332660     0     0     0     0     0         0       0      0  332660  332656         2    0          671912          0
21113        22  330458     0     0     0     0     0         0       0      0  330458  330454         2    0          666340          0
19864        14  329778     0     0     0     0     0         0       0      0  329778  329774         2    0          667324          0
20195        18  329702     0     0     0     0     0         0       0      0  329702  329698         2    0          670646          0

Route cache size does not increase on 2.5, so the problems in 2.4 are
probably the result of me hacking in the 2.5 patch.

2.5.72, full routing table:

60.0057 seconds passed, avg forwarding rate: 101800.795 pps
60.0045 seconds passed, avg forwarding rate: 101612.797 pps
60.0046 seconds passed, avg forwarding rate: 102004.873 pps
60.0044 seconds passed, avg forwarding rate: 102042.629 pps
60.0055 seconds passed, avg forwarding rate: 102135.224 pps
60.0057 seconds passed, avg forwarding rate: 102158.546 pps
60.0044 seconds passed, avg forwarding rate: 102200.430 pps

vma      samples  %           symbol name
c02c0020 14206    33.0911     fn_hash_lookup
c02930d0 2103     4.89867     ip_route_input_slow
c02222c0 1436     3.34498     tg3_start_xmit
c0291c10 1328     3.09341     rt_intern_hash
c02bde40 1315     3.06313     fib_validate_source
c0293900 1122     2.61356     ip_route_input
c02953d0 1028     2.3946      ip_rcv
c02bf1d0 1013     2.35966     fib_semantic_match
c0288b40 957      2.22921     netif_receive_skb
c0134fd0 840      1.95667     free_block
c0135210 823      1.91707     kmem_cache_free
c02b8930 811      1.88912     inet_select_addr
c02216a0 804      1.87282     tg3_rx
c028ca40 801      1.86583     neigh_resolve_output
c02912d0 786      1.83089     rt_hash_code
c028f040 709      1.65153     eth_header
c0135190 700      1.63056     __kmalloc
c0134e00 692      1.61193     cache_alloc_refill
c028f740 644      1.50012     pfifo_fast_dequeue
c028ba50 597      1.39064     neigh_lookup
c0135150 591      1.37666     kmem_cache_alloc
c0284ba0 539      1.25553     alloc_skb
c01adc10 491      1.14372     memcpy
c0135250 449      1.04589     kfree
c028b340 437      1.01794     dst_alloc
c028f1b0 433      1.00862     eth_type_trans
c0297af0 422      0.982996    ip_output
c01289e0 402      0.936408    call_rcu
c0297850 400      0.931749    ip_finish_output
c0296370 386      0.899138    ip_forward
c0221e20 384      0.894479    tg3_set_txd
c0288600 365      0.850221    dev_queue_xmit
c011f080 364      0.847892    local_bh_enable
c02919c0 356      0.829257    rt_garbage_collect
c028c8c0 317      0.738411    neigh_hh_init
c028b470 313      0.729094    dst_destroy
c02212e0 299      0.696483    tg3_tx

 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
18755        12  202740     0     0     0     0     0         0       0      0  202740  202736         2    0          405030          0
19945        20  203884     0     0     0     0     0         0       0      0  203884  203880         2    0          406726          0
18449         8  204152     0     0     0     0     0         0       0      0  204152  204148         0    0          409590          0
19637        10  205302     0     0     0     0     0         0       0      0  205302  205298         2    0          413004          0
19213        10  204022     0     0     0     0     0         0       0      0  204022  204018         2    0          411092          0
20182         8  204280     0     0     0     0     0         0       0      0  204280  204276         2    0          412044          0
19311        14  203378     0     0     0     0     0         0       0      0  203378  203374         2    0          411052          0
18790        16  202480     0     0     0     0     0         0       0      0  202480  202476         2    0          409440          0
18835        24  204776     0     0     0     0     0         0       0      0  204776  204772         0    0          414416          0
19830         8  204792     0     0     0     0     0         0       0      0  204792  204788         2    0          415514          0

Simon-

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Route cache performance tests
  2003-06-17 22:50                       ` Simon Kirby
@ 2003-06-17 23:07                         ` David S. Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David S. Miller @ 2003-06-17 23:07 UTC (permalink / raw)
  To: sim; +Cc: Robert.Olsson, ralph+d, hadi, xerox, fw, netdev, linux-net

   From: Simon Kirby <sim@netnation.com>
   Date: Tue, 17 Jun 2003 15:50:36 -0700
   
   so the problems in 2.4 are probably the result of me hacking in the
   2.5 patch.

I have them in my pending 2.4.x tree, try this:

diff -Nru a/include/net/route.h b/include/net/route.h
--- a/include/net/route.h	Tue Jun 17 16:08:06 2003
+++ b/include/net/route.h	Tue Jun 17 16:08:06 2003
@@ -114,6 +114,8 @@
         unsigned int gc_ignored;
         unsigned int gc_goal_miss;
         unsigned int gc_dst_overflow;
+	unsigned int in_hlist_search;
+	unsigned int out_hlist_search;
 } ____cacheline_aligned_in_smp;
 
 extern struct ip_rt_acct *ip_rt_acct;
diff -Nru a/net/ipv4/Config.in b/net/ipv4/Config.in
--- a/net/ipv4/Config.in	Tue Jun 17 16:08:06 2003
+++ b/net/ipv4/Config.in	Tue Jun 17 16:08:06 2003
@@ -14,7 +14,6 @@
    bool '    IP: equal cost multipath' CONFIG_IP_ROUTE_MULTIPATH
    bool '    IP: use TOS value as routing key' CONFIG_IP_ROUTE_TOS
    bool '    IP: verbose route monitoring' CONFIG_IP_ROUTE_VERBOSE
-   bool '    IP: large routing tables' CONFIG_IP_ROUTE_LARGE_TABLES
 fi
 bool '  IP: kernel level autoconfiguration' CONFIG_IP_PNP
 if [ "$CONFIG_IP_PNP" = "y" ]; then
diff -Nru a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
--- a/net/ipv4/fib_hash.c	Tue Jun 17 16:08:07 2003
+++ b/net/ipv4/fib_hash.c	Tue Jun 17 16:08:07 2003
@@ -89,7 +89,7 @@
 	int		fz_nent;	/* Number of entries	*/
 
 	int		fz_divisor;	/* Hash divisor		*/
-	u32		fz_hashmask;	/* (1<<fz_divisor) - 1	*/
+	u32		fz_hashmask;	/* (fz_divisor - 1)	*/
 #define FZ_HASHMASK(fz)	((fz)->fz_hashmask)
 
 	int		fz_order;	/* Zone order		*/
@@ -149,9 +149,19 @@
 
 static rwlock_t fib_hash_lock = RW_LOCK_UNLOCKED;
 
-#define FZ_MAX_DIVISOR 1024
+#define FZ_MAX_DIVISOR ((PAGE_SIZE<<MAX_ORDER) / sizeof(struct fib_node *))
 
-#ifdef CONFIG_IP_ROUTE_LARGE_TABLES
+static struct fib_node **fz_hash_alloc(int divisor)
+{
+	unsigned long size = divisor * sizeof(struct fib_node *);
+
+	if (divisor <= 1024) {
+		return kmalloc(size, GFP_KERNEL);
+	} else {
+		return (struct fib_node **)
+			__get_free_pages(GFP_KERNEL, get_order(size));
+	}
+}
 
 /* The fib hash lock must be held when this is called. */
 static __inline__ void fn_rebuild_zone(struct fn_zone *fz,
@@ -174,6 +184,15 @@
 	}
 }
 
+static void fz_hash_free(struct fib_node **hash, int divisor)
+{
+	if (divisor <= 1024)
+		kfree(hash);
+	else
+		free_pages((unsigned long) hash,
+			   get_order(divisor * sizeof(struct fib_node *)));
+}
+
 static void fn_rehash_zone(struct fn_zone *fz)
 {
 	struct fib_node **ht, **old_ht;
@@ -185,24 +204,30 @@
 	switch (old_divisor) {
 	case 16:
 		new_divisor = 256;
-		new_hashmask = 0xFF;
 		break;
 	case 256:
 		new_divisor = 1024;
-		new_hashmask = 0x3FF;
 		break;
 	default:
-		printk(KERN_CRIT "route.c: bad divisor %d!\n", old_divisor);
-		return;
+		if ((old_divisor << 1) > FZ_MAX_DIVISOR) {
+			printk(KERN_CRIT "route.c: bad divisor %d!\n", old_divisor);
+			return;
+		}
+		new_divisor = (old_divisor << 1);
+		break;
 	}
+
+	new_hashmask = (new_divisor - 1);
+
 #if RT_CACHE_DEBUG >= 2
 	printk("fn_rehash_zone: hash for zone %d grows from %d\n", fz->fz_order, old_divisor);
 #endif
 
-	ht = kmalloc(new_divisor*sizeof(struct fib_node*), GFP_KERNEL);
+	ht = fz_hash_alloc(new_divisor);
 
 	if (ht)	{
 		memset(ht, 0, new_divisor*sizeof(struct fib_node*));
+
 		write_lock_bh(&fib_hash_lock);
 		old_ht = fz->fz_hash;
 		fz->fz_hash = ht;
@@ -210,10 +235,10 @@
 		fz->fz_divisor = new_divisor;
 		fn_rebuild_zone(fz, old_ht, old_divisor);
 		write_unlock_bh(&fib_hash_lock);
-		kfree(old_ht);
+
+		fz_hash_free(old_ht, old_divisor);
 	}
 }
-#endif /* CONFIG_IP_ROUTE_LARGE_TABLES */
 
 static void fn_free_node(struct fib_node * f)
 {
@@ -233,12 +258,11 @@
 	memset(fz, 0, sizeof(struct fn_zone));
 	if (z) {
 		fz->fz_divisor = 16;
-		fz->fz_hashmask = 0xF;
 	} else {
 		fz->fz_divisor = 1;
-		fz->fz_hashmask = 0;
 	}
-	fz->fz_hash = kmalloc(fz->fz_divisor*sizeof(struct fib_node*), GFP_KERNEL);
+	fz->fz_hashmask = (fz->fz_divisor - 1);
+	fz->fz_hash = fz_hash_alloc(fz->fz_divisor);
 	if (!fz->fz_hash) {
 		kfree(fz);
 		return NULL;
@@ -467,12 +491,10 @@
 	if  ((fi = fib_create_info(r, rta, n, &err)) == NULL)
 		return err;
 
-#ifdef CONFIG_IP_ROUTE_LARGE_TABLES
-	if (fz->fz_nent > (fz->fz_divisor<<2) &&
+	if (fz->fz_nent > (fz->fz_divisor<<1) &&
 	    fz->fz_divisor < FZ_MAX_DIVISOR &&
 	    (z==32 || (1<<z) > fz->fz_divisor))
 		fn_rehash_zone(fz);
-#endif
 
 	fp = fz_chain_p(key, fz);
 
diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c	Tue Jun 17 16:08:07 2003
+++ b/net/ipv4/route.c	Tue Jun 17 16:08:07 2003
@@ -108,7 +108,7 @@
 int ip_rt_max_size;
 int ip_rt_gc_timeout		= RT_GC_TIMEOUT;
 int ip_rt_gc_interval		= 60 * HZ;
-int ip_rt_gc_min_interval	= 5 * HZ;
+int ip_rt_gc_min_interval	= HZ / 2;
 int ip_rt_redirect_number	= 9;
 int ip_rt_redirect_load		= HZ / 50;
 int ip_rt_redirect_silence	= ((HZ / 50) << (9 + 1));
@@ -287,7 +287,7 @@
         for (lcpu = 0; lcpu < smp_num_cpus; lcpu++) {
                 i = cpu_logical_map(lcpu);
 
-		len += sprintf(buffer+len, "%08x  %08x %08x %08x %08x %08x %08x %08x  %08x %08x %08x %08x %08x %08x %08x \n",
+		len += sprintf(buffer+len, "%08x  %08x %08x %08x %08x %08x %08x %08x  %08x %08x %08x %08x %08x %08x %08x %08x %08x \n",
 			       dst_entries,		       
 			       rt_cache_stat[i].in_hit,
 			       rt_cache_stat[i].in_slow_tot,
@@ -304,7 +304,9 @@
 			       rt_cache_stat[i].gc_total,
 			       rt_cache_stat[i].gc_ignored,
 			       rt_cache_stat[i].gc_goal_miss,
-			       rt_cache_stat[i].gc_dst_overflow
+			       rt_cache_stat[i].gc_dst_overflow,
+			       rt_cache_stat[i].in_hlist_search,
+			       rt_cache_stat[i].out_hlist_search
 
 			);
 	}
@@ -344,16 +346,17 @@
 		rth->u.dst.expires;
 }
 
-static __inline__ int rt_may_expire(struct rtable *rth, int tmo1, int tmo2)
+static __inline__ int rt_may_expire(struct rtable *rth, unsigned long tmo1, unsigned long tmo2)
 {
-	int age;
+	unsigned long age;
 	int ret = 0;
 
 	if (atomic_read(&rth->u.dst.__refcnt))
 		goto out;
 
 	ret = 1;
-	if (rth->u.dst.expires && (long)(rth->u.dst.expires - jiffies) <= 0)
+	if (rth->u.dst.expires &&
+	    time_after_eq(jiffies, rth->u.dst.expires))
 		goto out;
 
 	age = jiffies - rth->u.dst.lastuse;
@@ -365,6 +368,25 @@
 out:	return ret;
 }
 
+/* Bits of score are:
+ * 31: very valuable
+ * 30: not quite useless
+ * 29..0: usage counter
+ */
+static inline u32 rt_score(struct rtable *rt)
+{
+	u32 score = rt->u.dst.__use;
+
+	if (rt_valuable(rt))
+		score |= (1<<31);
+
+	if (!rt->key.iif ||
+	    !(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL)))
+		score |= (1<<30);
+
+	return score;
+}
+
 /* This runs via a timer and thus is always in BH context. */
 static void SMP_TIMER_NAME(rt_check_expire)(unsigned long dummy)
 {
@@ -375,7 +397,7 @@
 
 	for (t = ip_rt_gc_interval << rt_hash_log; t >= 0;
 	     t -= ip_rt_gc_timeout) {
-		unsigned tmo = ip_rt_gc_timeout;
+		unsigned long tmo = ip_rt_gc_timeout;
 
 		i = (i + 1) & rt_hash_mask;
 		rthp = &rt_hash_table[i].chain;
@@ -384,7 +406,7 @@
 		while ((rth = *rthp) != NULL) {
 			if (rth->u.dst.expires) {
 				/* Entry is expired even if it is in use */
-				if ((long)(now - rth->u.dst.expires) <= 0) {
+				if (time_before_eq(now, rth->u.dst.expires)) {
 					tmo >>= 1;
 					rthp = &rth->u.rt_next;
 					continue;
@@ -402,7 +424,7 @@
 		write_unlock(&rt_hash_table[i].lock);
 
 		/* Fallback loop breaker. */
-		if ((jiffies - now) > 0)
+		if (time_after(jiffies, now))
 			break;
 	}
 	rover = i;
@@ -504,7 +526,7 @@
 
 static int rt_garbage_collect(void)
 {
-	static unsigned expire = RT_GC_TIMEOUT;
+	static unsigned long expire = RT_GC_TIMEOUT;
 	static unsigned long last_gc;
 	static int rover;
 	static int equilibrium;
@@ -556,7 +578,7 @@
 		int i, k;
 
 		for (i = rt_hash_mask, k = rover; i >= 0; i--) {
-			unsigned tmo = expire;
+			unsigned long tmo = expire;
 
 			k = (k + 1) & rt_hash_mask;
 			rthp = &rt_hash_table[k].chain;
@@ -602,7 +624,7 @@
 
 		if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
 			goto out;
-	} while (!in_softirq() && jiffies - now < 1);
+	} while (!in_softirq() && time_before_eq(jiffies, now));
 
 	if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
 		goto out;
@@ -626,10 +648,19 @@
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
 	struct rtable	*rth, **rthp;
-	unsigned long	now = jiffies;
+	unsigned long	now;
+	struct rtable *cand, **candp;
+	u32 		min_score;
+	int		chain_length;
 	int attempts = !in_softirq();
 
 restart:
+	chain_length = 0;
+	min_score = ~(u32)0;
+	cand = NULL;
+	candp = NULL;
+	now = jiffies;
+
 	rthp = &rt_hash_table[hash].chain;
 
 	write_lock_bh(&rt_hash_table[hash].lock);
@@ -650,9 +681,35 @@
 			return 0;
 		}
 
+		if (!atomic_read(&rth->u.dst.__refcnt)) {
+			u32 score = rt_score(rth);
+
+			if (score <= min_score) {
+				cand = rth;
+				candp = rthp;
+				min_score = score;
+			}
+		}
+
+		chain_length++;
+
 		rthp = &rth->u.rt_next;
 	}
 
+	if (cand) {
+		/* ip_rt_gc_elasticity used to be average length of chain
+		 * length, when exceeded gc becomes really aggressive.
+		 *
+		 * The second limit is less certain. At the moment it allows
+		 * only 2 entries per bucket. We will see.
+		 */
+		if (chain_length > ip_rt_gc_elasticity ||
+		    (chain_length > 1 && !(min_score & (1<<31)))) {
+			*candp = cand->u.rt_next;
+			rt_free(cand);
+		}
+	}
+
 	/* Try to bind route to arp only if it is output
 	   route or unicast forwarding path.
 	 */
@@ -960,7 +1017,7 @@
 	/* No redirected packets during ip_rt_redirect_silence;
 	 * reset the algorithm.
 	 */
-	if (jiffies - rt->u.dst.rate_last > ip_rt_redirect_silence)
+	if (time_after(jiffies, rt->u.dst.rate_last + ip_rt_redirect_silence))
 		rt->u.dst.rate_tokens = 0;
 
 	/* Too many ignored redirects; do not send anything
@@ -974,8 +1031,9 @@
 	/* Check for load limit; set rate_last to the latest sent
 	 * redirect.
 	 */
-	if (jiffies - rt->u.dst.rate_last >
-	    (ip_rt_redirect_load << rt->u.dst.rate_tokens)) {
+	if (time_after(jiffies,
+		       (rt->u.dst.rate_last +
+			(ip_rt_redirect_load << rt->u.dst.rate_tokens)))) {
 		icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, rt->rt_gateway);
 		rt->u.dst.rate_last = jiffies;
 		++rt->u.dst.rate_tokens;
@@ -1672,6 +1730,7 @@
 			skb->dst = (struct dst_entry*)rth;
 			return 0;
 		}
+		rt_cache_stat[smp_processor_id()].in_hlist_search++;
 	}
 	read_unlock(&rt_hash_table[hash].lock);
 
@@ -2032,6 +2091,7 @@
 			*rp = rth;
 			return 0;
 		}
+		rt_cache_stat[smp_processor_id()].out_hlist_search++;
 	}
 	read_unlock_bh(&rt_hash_table[hash].lock);
 


* Re: Route cache performance tests
  2003-06-17 20:51                           ` Simon Kirby
  2003-06-17 20:49                             ` David S. Miller
@ 2003-06-18  5:50                             ` Pekka Savola
  1 sibling, 0 replies; 31+ messages in thread
From: Pekka Savola @ 2003-06-18  5:50 UTC (permalink / raw)
  To: Simon Kirby
  Cc: David S. Miller, gandalf, Robert.Olsson, ralph+d, hadi, xerox, fw,
	netdev, linux-net

On Tue, 17 Jun 2003, Simon Kirby wrote:
> On Tue, Jun 17, 2003 at 01:36:35PM -0700, David S. Miller wrote:
> 
> > I have no idea why they do this, it's the stupidest thing
> > you can possibly do by default.
> > 
> > If we thought it was a good idea to turn this on by default
> > we would have done so in the kernel.
> > 
> > Does anyone have some cycles to spare to try and urge whoever is
> > responsible for this in Debian to leave the kernel's default setting
> > alone?
> 
> Sure, I can do this.  But why is this stupid?  It uses more CPU, but
> stops IP spoofing by default.  Specific firewall rules would have to be
> created otherwise.  And the overhead only really shows when the routing
> table is large, right?

Personally I think rp_filter on by default is the only good choice
(security/operational-wise).  It's typically not useful when you have a
lot of routes, though... but since 99.9% of users _don't_, it still
seems like a good default value.
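[Editor's note: for context, rp_filter is controlled through sysctl. A
hedged sketch of checking and changing it, using the standard key names
from the kernel's ip-sysctl documentation; how the "all" and per-interface
values combine has varied across kernel versions, so treat the exact
semantics as version-dependent.]

```shell
# Show the current reverse-path filter settings
# (0 = off, the kernel default; 1 = source validation, which costs an
# extra FIB lookup per packet via fib_validate_source()).
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.default.rp_filter

# Turn it off to skip that lookup on a router carrying a full table:
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.default.rp_filter=0
```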

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings



end of thread, other threads:[~2003-06-18  5:50 UTC | newest]

Thread overview: 31+ messages
2003-06-10  7:57 Route cache performance tests Simon Kirby
2003-06-10 11:23 ` Jamal Hadi
2003-06-10 20:36   ` CIT/Paul
2003-06-10 13:34 ` Ralph Doncaster
2003-06-10 13:39   ` Jamal Hadi
2003-06-13  6:20 ` David S. Miller
2003-06-16 22:37   ` Simon Kirby
2003-06-16 22:44     ` David S. Miller
2003-06-16 23:09       ` Simon Kirby
2003-06-16 23:08         ` David S. Miller
2003-06-16 23:27           ` Simon Kirby
2003-06-16 23:49             ` Simon Kirby
2003-06-17 15:59               ` David S. Miller
2003-06-17 16:50                 ` Robert Olsson
2003-06-17 16:50                   ` David S. Miller
2003-06-17 17:29                     ` Robert Olsson
2003-06-17 19:06                       ` Mr. James W. Laferriere
2003-06-17 20:12                         ` Robert Olsson
2003-06-17 20:07                   ` Simon Kirby
2003-06-17 20:17                     ` Martin Josefsson
2003-06-17 20:37                       ` Simon Kirby
2003-06-17 20:36                         ` David S. Miller
2003-06-17 20:51                           ` Simon Kirby
2003-06-17 20:49                             ` David S. Miller
2003-06-18  5:50                             ` Pekka Savola
2003-06-17 20:49                     ` Robert Olsson
2003-06-17 21:07                     ` Simon Kirby
2003-06-17 22:50                       ` Simon Kirby
2003-06-17 23:07                         ` David S. Miller
2003-06-17 22:11                     ` Ralph Doncaster
2003-06-17 22:08                       ` David S. Miller
