* Performance problem with route cache ?
@ 2003-11-20 10:55 dada1
2003-11-20 12:37 ` Robert Olsson
0 siblings, 1 reply; 4+ messages in thread
From: dada1 @ 2003-11-20 10:55 UTC (permalink / raw)
To: netdev
Hi all
I'm running oprofile on a dual Athlon with a linux-2.6.0-test8 kernel (plus a
copybreak patch for the Intel e1000 driver, because the machine receives a lot
of small (< 30 bytes) messages).
It seems the route cache may/should be tuned, but I don't know how.
The machine receives about 10000 packets/second, from a lot of different IP
addresses.
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1991.43
Counter 0 counted CPU_CLK_UNHALTED events (Cycles outside of halt state)
with a unit mask of 0x00 (No unit mask) count 50000
vma samples % symbol name
a0251fd0 6477022 15.6789 rt_garbage_collect
a028b6a0 2692339 6.51731 ipt_do_table
a0254560 1779167 4.30681 ip_route_input
a0251c40 1544801 3.73948 rt_may_expire
a0273370 1119668 2.71037 tcp_v4_rcv
a01f4350 1084883 2.62616 e1000_clean_rx_irq
a013dd70 1048705 2.53859 free_block
a013e120 844766 2.04492 kfree
a0269cc0 600011 1.45244 tcp_rcv_established
a0111c20 540780 1.30906 mark_offset_tsc
a0235d40 536856 1.29956 skb_release_data
a023a9a0 521576 1.26257 net_rx_action
a0259dc0 515115 1.24693 ip_queue_xmit
a023f990 498217 1.20603 dst_alloc
a0260190 487681 1.18052 tcp_sendmsg
a0231fe0 477588 1.15609 sockfd_lookup
a0118c30 459348 1.11194 schedule
a0261bd0 454353 1.09985 tcp_recvmsg
a01070c0 442280 1.07062 default_idle
# grep ip_dst_cache /proc/slabinfo
ip_dst_cache 253051 324840 320 12 1 : tunables 54 27 8 : slabdata 27070 27070 0
# cat /proc/net/sockstat
sockets: used 246459
TCP: inuse 245501 orphan 591 tw 5639 alloc 247035 mem 74061
UDP: inuse 6
RAW: inuse 0
FRAG: inuse 3 memory 1164
# grep "routing cache hash table " /var/log/syslog
kernel: IP: routing cache hash table of 32768 buckets, 256Kbytes
# cat /proc/meminfo
MemTotal: 3371408 kB
MemFree: 847716 kB
Buffers: 70352 kB
Cached: 91080 kB
SwapCached: 0 kB
Active: 1162512 kB
Inactive: 27360 kB
HighTotal: 1900544 kB
HighFree: 161920 kB
LowTotal: 1470864 kB
LowFree: 685796 kB
SwapTotal: 506008 kB
SwapFree: 506008 kB
Dirty: 2092 kB
Writeback: 0 kB
Mapped: 1038884 kB
Slab: 704636 kB
Committed_AS: 1549704 kB
PageTables: 1412 kB
VmallocTotal: 49144 kB
VmallocUsed: 2380 kB
VmallocChunk: 45896 kB
HugePages_Total: 150
HugePages_Free: 48
Hugepagesize: 4096 kB
Does anybody have a hint about what I should change to reduce the CPU time
spent in the kernel?
Thanks
Eric Dumazet
* Performance problem with route cache ?
2003-11-20 10:55 Performance problem with route cache ? dada1
@ 2003-11-20 12:37 ` Robert Olsson
2003-11-21 7:53 ` dada1
0 siblings, 1 reply; 4+ messages in thread
From: Robert Olsson @ 2003-11-20 12:37 UTC (permalink / raw)
To: dada1; +Cc: netdev
dada1 writes:
> Hi all
>
> I'm doing some oprofile on a bi athlon, linux-2.6.0-test8 kernel, (with a
> copybreak intel e1000 patch, because the machine receives a lot of small (<
> 30 bytes) messages)
Does the copybreak help? I tested copybreak a long time ago and saw no gain
from it. Do you have a pointer to the patch?
> It seems the route cache may/should be tuned, but I dont know how.
Tuning is not an easy task, and you will not necessarily gain anything.
It depends entirely on your input traffic, but your system seems to have a
lot of dst entries, which would cause a lot of linear searching. Packets per
flow and the number of parallel flows should be of interest to you. rtstat can
give you a better feeling for the incoming routing load.
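To put a rough number on the linear search cost, here is a minimal sketch. It assumes the dst entries are spread evenly over the hash buckets, which real traffic will not do exactly, so treat the result as a lower bound on the worst chains. The entry count and bucket count are the ones from the slabinfo and syslog output in the original report; the function name is only illustrative.

```python
def average_chain_length(entries: int, buckets: int) -> float:
    """Mean number of dst entries per hash bucket, assuming an even spread."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    return entries / buckets


# Figures from the original report: active ip_dst_cache objects (slabinfo)
# and the routing cache hash table size logged at boot (syslog).
ACTIVE_DST_ENTRIES = 253051
HASH_BUCKETS = 32768

chain = average_chain_length(ACTIVE_DST_ENTRIES, HASH_BUCKETS)
print(f"average chain length: {chain:.2f} entries per bucket")
```

With these inputs the average comes out between 7 and 8 entries per bucket, so a lookup that misses has to walk several entries before giving up.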
Is 10kpkts/s the max performance?
Cheers.
--ro
* Re: Performance problem with route cache ?
2003-11-20 12:37 ` Robert Olsson
@ 2003-11-21 7:53 ` dada1
2003-11-21 9:37 ` Robert Olsson
0 siblings, 1 reply; 4+ messages in thread
From: dada1 @ 2003-11-21 7:53 UTC (permalink / raw)
To: Robert Olsson; +Cc: netdev, scott.feldman
From: "Robert Olsson" <Robert.Olsson@data.slu.se>
> Does the copybreak help? I tested copybreak long time ago and saw no gain
> from it. Do you have pointer to the patch?
This is a patch kindly given to me by Scott Feldman; it just copies the skb
to a new, shorter one.
In my case it is indeed useful, because the server receives a lot of
small TCP messages from a lot of clients.
Instead of using 2K or 4K buffers to hold the data in the socket, we end up
using 64-byte buffers... quite a huge difference indeed.
During a network spike, the machine no longer consumes 200 MB of
lowmem, and it stays alive.
In a router context, copybreak should not be used, because the extra copy is
CPU intensive and the data should not live long in memory anyway.
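As a rough illustration of the memory argument, here is a small sketch. The backlog of 100,000 queued small messages is a made-up number chosen for the example, not a measurement; the buffer sizes are the ones mentioned above, and per-skb header overhead is ignored.

```python
def backlog_bytes(queued_messages: int, buffer_size: int) -> int:
    """Memory pinned by queued messages when each one occupies a full buffer."""
    return queued_messages * buffer_size


# Hypothetical backlog during a traffic spike (assumption, not measured).
QUEUED_MESSAGES = 100_000

for size in (4096, 2048, 64):
    megabytes = backlog_bytes(QUEUED_MESSAGES, size) / 1e6
    print(f"{size:5d}-byte buffers: {megabytes:7.1f} MB")
```

Under that assumption the 2K case lands in the 200 MB range, while the 64-byte case needs only a few MB for the same backlog.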
>
> > It seems the route cache may/should be tuned, but I dont know how.
>
> Tuning is not an easy task and not necessarily you will gain anything,
> It fully depends on your input traffic but it seems your system has a
> lot of dst entries. Which would cause a lot linear search. Packets per
> flow and no of parallel flows should be of interest for you. rtstat can
> give you a better feeling of incoming routing load.
>
> Is 10kpkts/s the max performance?
Well, the machine is not network bound... We don't want to increase the
traffic.
The machine just receives a lot of frames from a lot of different IP
addresses. That is not a DoS attack.
Some rtstat samples:
rtstat -i 1
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
279536 1490 11719 0 0 0 0 0 231 843 0 12562 12560 2 0 97100 6853
282928 1732 9705 0 0 0 0 0 275 800 0 10505 10503 2 0 80811 6774
278436 1211 10609 0 0 0 0 0 166 551 0 11160 11158 2 0 87831 4550
280942 1195 8996 0 0 0 0 0 299 1128 0 10124 10122 2 0 75815 9523
281774 1994 12201 0 0 0 0 0 295 1070 0 13271 13269 2 0 102299 8864
274220 1514 11692 0 0 0 0 0 252 929 0 12621 12619 2 0 96551 7743
274248 1271 10805 0 0 0 0 0 273 1094 0 11899 11897 2 0 88822 9013
278905 1157 9942 0 0 0 0 0 85 308 0 10250 10248 2 0 82651 2569
283423 1509 8471 0 0 0 0 0 176 631 0 9102 9100 2 0 71805 5286
288620 1530 7685 0 0 0 0 0 74 299 0 7984 7982 2 0 66499 2505
293264 1666 7432 0 0 0 0 0 174 552 0 7984 7982 2 0 65642 4785
296831 1840 7570 0 0 0 0 0 129 371 0 7941 7939 2 0 67687 3257
297902 2214 9863 0 0 0 0 0 346 1192 0 11055 11053 2 0 88987 10566
300225 2192 8136 0 0 0 0 0 308 877 0 9013 9011 2 0 74096 8029
size-32 931 1008 32 112 1 : tunables 120 60 8 : slabdata 9 9 0
size-64 18477 51566 64 59 1 : tunables 120 60 8 : slabdata 874 874 480
size-128 2013 2040 128 30 1 : tunables 120 60 8 : slabdata 68 68 0
size-256 90 90 256 15 1 : tunables 120 60 8 : slabdata 6 6 0
size-512 17569 37680 512 8 1 : tunables 54 27 8 : slabdata 4710 4710 27
size-1024 348 348 1024 4 1 : tunables 54 27 8 : slabdata 87 87 54
size-2048 13414 17306 2048 2 1 : tunables 24 12 8 : slabdata 8653 8653 96
size-4096 3970 3970 4096 1 1 : tunables 24 12 8 : slabdata 3970 3970 0
size-8192 139 139 8192 1 2 : tunables 8 4 0 : slabdata 139 139 0
ip_dst_cache 280165 386208 320 12 1 : tunables 54 27 8 : slabdata 32184 32184 0
skbuff_head_cache 35060 110200 192 20 1 : tunables 120 60 8 : slabdata 5510 5510 60
Thanks
Eric Dumazet
* Re: Performance problem with route cache ?
2003-11-21 7:53 ` dada1
@ 2003-11-21 9:37 ` Robert Olsson
0 siblings, 0 replies; 4+ messages in thread
From: Robert Olsson @ 2003-11-21 9:37 UTC (permalink / raw)
To: dada1; +Cc: Robert Olsson, netdev, scott.feldman
dada1 writes:
> Instead of using 2K or 4K buffers to store the data into socket, we end up
> using 64 bytes size buffers... quite a huge difference indeed.
> In case of a spike in network, the machine no longer consume 200 MB of
> lowmem and stay alive.
Well, I have no experience here... but I think e1000 was using 4k for 1500 MTU
packets, at least before.
> In a router context, copybreak should not be used, because the extra copy is
> CPU intensive, and the data should not live long enough on memory.
It is easy to try again. I'm playing with the Realtek r8169 driver and am now
filling the skbs one by one as they are passed up the stack, rather than doing
the RX-batch refill. I need to collect some data...
> The machine just receives a lot of frames from a lot of different IP. That
> is not a DOS attack.
> rtstat -i 1
> size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
> 279536 1490 11719 0 0 0 0 0 231 843 0 12562 12560 2 0
Well, not a DoS? What is this, then? You have almost 10 times more new entries
per second than lookups that hit your already huge dst cache (hit/total).
Compare with a production router for tens of thousands of users at 55 kpps:
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf HASH: in_search out_search
21797 54543 529 0 49 0 0 0 137 12 0 0 0 0 0 28139 59
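The comparison can be made explicit with a small sketch. It assumes, as in the reading above, that the IN "hit" column counts input lookups served from the cache and the IN "tot" column counts lookups that had to create a new entry, and that HASH "in_search" counts chain entries walked during input lookups. The helper names are illustrative; the numbers are the first sample from dada1's machine and the router sample above.

```python
def input_hit_ratio(hit: int, tot: int) -> float:
    """Fraction of input route lookups served from the cache."""
    lookups = hit + tot
    return hit / lookups if lookups else 0.0


def searches_per_lookup(in_search: int, hit: int, tot: int) -> float:
    """Average number of chain entries walked per input lookup."""
    lookups = hit + tot
    return in_search / lookups if lookups else 0.0


# (label, IN hit, IN tot, HASH in_search) taken from the rtstat samples.
samples = [
    ("dada1 server", 1490, 11719, 97100),
    ("production router", 54543, 529, 28139),
]

for label, hit, tot, in_search in samples:
    ratio = input_hit_ratio(hit, tot)
    walked = searches_per_lookup(in_search, hit, tot)
    print(f"{label}: hit ratio {ratio:.1%}, {walked:.2f} entries walked per lookup")
```

Under these assumptions the server serves roughly one lookup in nine from the cache, while the router serves about 99% of them from it and walks far fewer chain entries per packet.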
Cheers.
--ro