* Re: [PATCH] option for large routing hash
@ 2003-12-10 14:47 Robert Olsson
2003-12-10 23:05 ` David S. Miller
0 siblings, 1 reply; 7+ messages in thread
From: Robert Olsson @ 2003-12-10 14:47 UTC (permalink / raw)
To: David S. Miller; +Cc: Robert Olsson, kuznet, netdev
Hello!
More thoughts and observations before doing anything else; maybe we can wake
up Alexey...
> IP: routing cache hash table of 4096 buckets, 32Kbytes
...
> And tot+hit gives the pps throughput 152 kpps.
...
> IP: routing cache hash table of 32768 buckets, 256Kbytes
...
> We see the cache is now used, as hit > tot, and we get a performance jump
> from 152 to 265 kpps.
>
Some more experiment details. First, a full Internet routing table was used.
Processor: UP Xeon 2.6 GHz.
With the large route hash we see dst cache overflow. It seems somewhat
surprising at first sight, but as we increase to 265 kpps we get much
closer to max_size (265k entries). So if RCU has problems getting the
batch job (freeing the dst entries) done, we get dst cache overflow.
The RCU/softirq issue pops up again (now with a routing table loaded).
RCU will probably get on the agenda again since, from what I heard, the
netfilter folks have plans to use it.
It's also worth noticing that focusing just on getting rid of
"dst cache overflow" can cost half the performance, as seen here. :-)
Since max_size, gc_elasticity and gc_thresh are the same, I think it's the
goal of the GC that causes the difference, via rt_hash_mask: with a smaller
hash table the GC gets much more aggressive.
I think this is indicated in rtstat:
size IN: hit tot
35320 62700 88890
Versus:
212976 212665 52703
Many more entries once we increased the number of buckets. RCU can play some
tricks here as well.
Conclusions? IMO the dst cache, within its operating range with well-behaved
traffic, gives us very good performance and is something we don't have to pay
attention to. We see the cache is in effect simply when hit > tot.
But when hit <= tot we have at least two cases:
1) Traffic is not well behaved. DoS.
2) Traffic is well behaved but tuning is bad.
The actions for the two cases are totally different:
For 1: reduce the cache size to avoid searching.
For 2: increase the hash size so the cache becomes active.
So it's crucial to distinguish between the cases. Can it be done from incoming
traffic?
Cheers.
--ro
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] option for large routing hash
2003-12-10 14:47 [PATCH] option for large routing hash Robert Olsson
@ 2003-12-10 23:05 ` David S. Miller
2003-12-12 23:10 ` Robert Olsson
0 siblings, 1 reply; 7+ messages in thread
From: David S. Miller @ 2003-12-10 23:05 UTC (permalink / raw)
To: Robert Olsson; +Cc: Robert.Olsson, kuznet, netdev
On Wed, 10 Dec 2003 15:47:44 +0100
Robert Olsson <Robert.Olsson@data.slu.se> wrote:
> The RCU/softirq issue pops up again (now with a routing table loaded).
> RCU will probably get on the agenda again since, from what I heard, the
> netfilter folks have plans to use it.
I wish we had resolved that in the long thread we had about it
a few months ago. We should really reestablish the point we had
reached, and try to make some more forward progress.
This problem is not going to go away, as you say.
> But when hit <= tot we have at least two cases:
> 1) Traffic is not well behaved. DoS.
> 2) Traffic is well behaved but tuning is bad.
#2 should be prevented by the fact that GC prefers to toss out entries
in an LRU'ish manner.
I think I see what you're saying though, and in my opinion there
is way too much black magic in choosing gc_thresh etc. routing cache
parameters.
Other caches in the kernel grow to fill the needs of the system. The
routing cache grows only as large as it has been configured to be
allowed to grow, and afterwards it no longer responds to increasing
system needs.
The IPV4 routing front end has exactly this kind of logic. As you
load more and more routing table entries into the kernel it grows
the hashes in response.
> So it's crucial to distinguish between the cases. Can it be done from
> incoming traffic?
This is the real question, and the other is how to implement dynamic
hash table growth in the routing cache. The locking is particularly
problematic, but RCU may be able to help us with this.
Monitoring traffic to make this decision is hard because to determine
whether cache misses are due to not well formed traffic vs. too small
cache sizing we'd need the entry we garbage collected to determine the
latter. It's the chicken and egg problem.
* Re: [PATCH] option for large routing hash
2003-12-10 23:05 ` David S. Miller
@ 2003-12-12 23:10 ` Robert Olsson
0 siblings, 0 replies; 7+ messages in thread
From: Robert Olsson @ 2003-12-12 23:10 UTC (permalink / raw)
To: David S. Miller; +Cc: Robert Olsson, kuznet, netdev
David S. Miller writes:
> I wish we had resolved that in the long thread we had about it
> a few months ago. We should really reestablish the point we had
> reached, and try to make some more forward progress.
>
> This problem is not going to go away, as you say.
No, it will not. I revisited the patches and they seem to help some, but
there must be some other problem too. Seems we have to start over. Well,
let me add the patch with the debugging from Sarma to start with. At
least there is no problem triggering the overflows.
> I think I see what you're saying though, and in my opinion there
> is way too much black magic in choosing gc_thresh etc. routing cache
> parameters.
>
> Other caches in the kernel grow to fill the needs of the system. The
> routing cache grows only as large as it has been configured to be
> allowed to grow, and afterwards it no longer responds to increasing
> system needs.
>
> The IPV4 routing front end has exactly this kind of logic. As you
> load more and more routing table entries into the kernel it grows
> the hashes in response.
>
> > So it's crucial to distinguish between the cases. Can it be done from
> > incoming traffic?
>
> This is the real question, and the other is how to implement dynamic
> hash table growth in the routing cache. The locking is particularly
> problematic, but RCU may be able to help us with this.
>
> Monitoring traffic to make this decision is hard because to determine
> whether cache misses are due to not well formed traffic vs. too small
> cache sizing we'd need the entry we garbage collected to determine the
> latter. It's the chicken and egg problem.
Yes. Well, for desktop systems and systems without too many flows there is
no problem. For high-flow servers and routers we should expect administrators
to do some manual setup according to their environment. As we started this
thread: IMO increasing just the hash size is the most convenient approach in
this case, as most routing cache parameters are derived from it, so the black
magic is untouched but still doing its magic. We have used this for some
time on our routers. Of course things can be done better, but RCU is probably
the most annoying part for now.
Cheers.
--ro
* [PATCH] option for large routing hash
@ 2003-12-09 15:09 Robert Olsson
2003-12-09 20:20 ` David S. Miller
0 siblings, 1 reply; 7+ messages in thread
From: Robert Olsson @ 2003-12-09 15:09 UTC (permalink / raw)
To: davem; +Cc: kuznet, netdev, Robert.Olsson
Hello!
I think this patch should be useful, as it helps performance a lot under
high flow load. I have some numbers if you are interested.
Cheers.
--ro
--- net/ipv4/Kconfig.orig 2003-12-09 14:00:14.000000000 +0100
+++ net/ipv4/Kconfig 2003-12-09 14:51:27.000000000 +0100
@@ -75,6 +75,19 @@
If unsure, say N.
+config IP_ROUTE_LARGE_HASH
+	bool "IP: large routing hash"
+	---help---
+	  Say yes to increase the number of dst hash buckets. Most likely
+	  you are running a server or router with lots of flows. Saying
+	  yes typically increases the dst hash size 8 times, from 16
+	  Kbytes to 128 Kbytes on a 256 Mbyte system.
+
+	  The dst hash is otherwise tunable via /proc/sys/net/ipv4/route/;
+	  monitoring can be done with the rtstat utility that comes with iproute2.
+
+	  If unsure, say N.
+
config IP_ROUTE_FWMARK
bool "IP: use netfilter MARK value as routing key"
depends on IP_MULTIPLE_TABLES && NETFILTER
--- net/ipv4/route.c.orig 2003-12-09 13:59:57.000000000 +0100
+++ net/ipv4/route.c 2003-12-09 14:49:48.000000000 +0100
@@ -2747,6 +2747,9 @@
goal = num_physpages >> (26 - PAGE_SHIFT);
+#ifdef CONFIG_IP_ROUTE_LARGE_HASH
+ goal <<= 3;
+#endif
for (order = 0; (1UL << order) < goal; order++)
/* NOTHING */;
* Re: [PATCH] option for large routing hash
2003-12-09 15:09 Robert Olsson
@ 2003-12-09 20:20 ` David S. Miller
2003-12-09 22:28 ` Robert Olsson
0 siblings, 1 reply; 7+ messages in thread
From: David S. Miller @ 2003-12-09 20:20 UTC (permalink / raw)
To: Robert Olsson; +Cc: kuznet, netdev, Robert.Olsson
On Tue, 9 Dec 2003 16:09:07 +0100
Robert Olsson <Robert.Olsson@data.slu.se> wrote:
> I think this patch should be useful, as it helps performance a lot under
> high flow load. I have some numbers if you are interested.
I'm very hesitant about this, and it is not because I don't believe
that it brings better performance in your tests on your machines :)
Recently there was a thread on linux-kernel by the folks, such as Jes,
working on super-duper-huge NUMA boxes and how big the hash tables
get sized to on these machines.
In some of their configurations it was trying to allocate 1GB TCP
hash tables or something totally ridiculous like that.
The problem with all of our current algorithms for size selection is
that it considers only one parameter when there are actually two.
It considers currently only relative memory consumption. It needs
to also consider hard limits that exist for useful hash table sizes.
There is a point at which hash table size exceeds its usefulness in
that the gains you are getting from the O(1) lookup are offset by the
fact that the access to the hash table heads are constantly taking cpu
cache misses.
You've obtained good results in your tests with a _specific_ hash
table size for the routing cache, but the algorithm you are proposing
for the kernel computes things relative to the amount of memory in the
machine. It cannot be a function of only this parameter.
Do you see my point?
* Re: [PATCH] option for large routing hash
2003-12-09 20:20 ` David S. Miller
@ 2003-12-09 22:28 ` Robert Olsson
2003-12-10 8:15 ` David S. Miller
0 siblings, 1 reply; 7+ messages in thread
From: Robert Olsson @ 2003-12-09 22:28 UTC (permalink / raw)
To: David S. Miller; +Cc: Robert Olsson, kuznet, netdev
David S. Miller writes:
> There is a point at which hash table size exceeds its usefulness in
> that the gains you are getting from the O(1) lookup are offset by the
> fact that the access to the hash table heads are constantly taking cpu
> cache misses.
Yes.
> You've obtained good results in your tests with a _specific_ hash
> table size for the routing cache, but the algorithm you are proposing
> for the kernel computes things relative to the amount of memory in the
> machine. It cannot be a function of only this parameter.
>
> Do you see my point?
I do. Let's look at an experiment to start with, one we have also seen people
trying in real high-flow environments. pktgen sends 32 kflows with a flow
length of 10 packets (64 bytes) at 2 * 300 kpps into a router. The packet
streams are 2 * 1M packets. TX-OK gives the throughput.
The cache settings are max_size=262144 gc_thresh=32768 gc_elasticity=8 for
both setups.
IP: routing cache hash table of 4096 buckets, 32Kbytes
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0 1500 0 2671635 9583466 9583466 7328370 10 0 0 0 BRU
eth1 1500 0 12 0 0 0 2671640 0 0 0 BRU
eth2 1500 0 2623413 9556039 9556039 7376591 4 0 0 0 BRU
eth3 1500 0 1 0 0 0 2623412 0 0 0 BRU
rtstat sample (truncated)
size IN: hit tot mc no_rt bcast madst masrc
35320 62700 88890 0 0 0 0 0
Due to the cache size and GC this looks like a route DoS attack. We make very
little use of the cache, as tot (the long path) > hit (the cache path). Lots
of linear searching in the hash. And tot+hit gives the pps throughput,
152 kpps. It's easy to characterize this as a DoS attack, but the flow length
is 10 packets.
----------------------------------------------------------------------------
IP: routing cache hash table of 32768 buckets, 256Kbytes
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0 1500 0 4382945 9293599 9293599 5617062 13 0 0 0 BRU
eth1 1500 0 16 0 0 0 4381291 0 0 0 BRU
eth2 1500 0 4290290 9292399 9292399 5709713 3 0 0 0 BRU
eth3 1500 0 1 0 0 0 4288727 0 0 0 BRU
rtstat sample (truncated)
size IN: hit tot mc no_rt bcast madst masrc
212976 212665 52703 0 0 0 0 0
We see the cache is now used, as hit > tot, and we get a performance jump
from 152 to 265 kpps.
Just as you said, this was the experiment. I'll stop here for now.
Cheers.
--ro
* Re: [PATCH] option for large routing hash
2003-12-09 22:28 ` Robert Olsson
@ 2003-12-10 8:15 ` David S. Miller
0 siblings, 0 replies; 7+ messages in thread
From: David S. Miller @ 2003-12-10 8:15 UTC (permalink / raw)
To: Robert Olsson; +Cc: Robert.Olsson, kuznet, netdev
On Tue, 9 Dec 2003 23:28:31 +0100
Robert Olsson <Robert.Olsson@data.slu.se> wrote:
> IP: routing cache hash table of 4096 buckets, 32Kbytes
...
> And tot+hit gives the pps throughput 152 kpps.
...
> IP: routing cache hash table of 32768 buckets, 256Kbytes
...
> We see the cache is now used, as hit > tot, and we get a performance jump
> from 152 to 265 kpps.
>
> Just as you said this was the experiment. I'll stop here for now.
Thanks for the data.
I would eventually like an algorithm that uses a min/max range.
Perhaps something like:
const unsigned long rthash_min = PAGE_SIZE;
const unsigned long rthash_max = PAGE_ALIGN(512 * 1024 *
					    sizeof(struct rt_hash_bucket));

unsigned long rthash_choose_size(unsigned long num_physpages)
{
	unsigned long goal;

	goal = num_physpages >> (23 - PAGE_SHIFT);

	if (goal < rthash_min)
		goal = rthash_min;
	if (goal > rthash_max)
		goal = rthash_max;

	return goal;
}
It's a combination of your goal computation adjustment along with
sanity limits, that's all.