* route cache GC monitoring
From: Robert Olsson @ 2002-04-26 12:55 UTC (permalink / raw)
To: kuznet; +Cc: davem, hadi, netdev, jensl
Hello!
We would like to propose four new stats counters to monitor the garbage
collection process of the route cache. This should be useful for tuning and
debugging installations which have a large number of flows.
Tuning is another business, but we have played with max_size and gc_thresh;
there are more tuning knobs. The number of buckets in the hash table is
something to tune as well, but we have not experimented with that here.
Anyway, for most installations the default settings do a very good job, but
high numbers in the new GC counters should be taken as a warning.
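These knobs are runtime-tunable via /proc. For example, the settings used in
the last test run below correspond to (values are only an illustration):
  echo 100000 > /proc/sys/net/ipv4/route/max_size
  echo 50000  > /proc/sys/net/ipv4/route/gc_thresh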
Kernel patch.
[-- Attachment #2: GC.pat --]
--- linux/include/net/route.h.orig Mon Feb 25 20:38:13 2002
+++ linux/include/net/route.h Fri Apr 19 15:50:20 2002
@@ -110,6 +110,10 @@
unsigned int out_hit;
unsigned int out_slow_tot;
unsigned int out_slow_mc;
+ unsigned int gc_total;
+ unsigned int gc_ignored;
+ unsigned int gc_goal_miss;
+ unsigned int gc_dst_overflow;
} ____cacheline_aligned_in_smp;
extern struct ip_rt_acct *ip_rt_acct;
--- linux/net/ipv4/route.c.orig Thu Apr 18 07:43:44 2002
+++ linux/net/ipv4/route.c Fri Apr 26 11:27:16 2002
@@ -286,7 +286,7 @@
for (lcpu = 0; lcpu < smp_num_cpus; lcpu++) {
i = cpu_logical_map(lcpu);
- len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
+ len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x \n",
dst_entries,
rt_cache_stat[i].in_hit,
rt_cache_stat[i].in_slow_tot,
@@ -298,7 +298,13 @@
rt_cache_stat[i].out_hit,
rt_cache_stat[i].out_slow_tot,
- rt_cache_stat[i].out_slow_mc
+ rt_cache_stat[i].out_slow_mc,
+
+ rt_cache_stat[i].gc_total,
+ rt_cache_stat[i].gc_ignored,
+ rt_cache_stat[i].gc_goal_miss,
+ rt_cache_stat[i].gc_dst_overflow
+
);
}
len -= offset;
@@ -499,9 +505,14 @@
* Garbage collection is pretty expensive,
* do not make it too frequently.
*/
+
+ rt_cache_stat[smp_processor_id()].gc_total++;
+
if (now - last_gc < ip_rt_gc_min_interval &&
- atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
+ atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size) {
+ rt_cache_stat[smp_processor_id()].gc_ignored++;
goto out;
+ }
/* Calculate number of entries, which we want to expire now. */
goal = atomic_read(&ipv4_dst_ops.entries) -
@@ -567,6 +578,8 @@
We will not spin here for long time in any case.
*/
+ rt_cache_stat[smp_processor_id()].gc_goal_miss++;
+
if (expire == 0)
break;
@@ -584,6 +597,7 @@
goto out;
if (net_ratelimit())
printk(KERN_WARNING "dst cache overflow\n");
+ rt_cache_stat[smp_processor_id()].gc_dst_overflow++;
return 1;
work_done:
Use/Experiment.
[-- Attachment #4: usage --]
Julian Anastasov's test program, using 80000 different sources:
testlvs x.y.z:80 -udp -srcnum 80000 -packets 0
tot       == GC: total calls per second
ignored   == GC: calls ignored per second
goal_miss == GC: goal misses per second
ovrf      == GC: dst cache overflows per second
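The tables below were produced by sampling the cumulative per-CPU counters
once per second and printing the deltas. A minimal user-space sampler in
that spirit might look like this (a sketch only: it assumes the patched
15-field output above is readable, one line per CPU, from
/proc/net/rt_cache_stat, and it tracks just the four GC fields):

/*
 * gcmon.c -- print per-second deltas of the route cache GC counters.
 * Sketch only: assumes the patched 15-field "%08x" output is exposed
 * one line per CPU in /proc/net/rt_cache_stat.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    unsigned int v[15], cur[4], prev[4] = { 0, 0, 0, 0 };
    char line[256];
    int i, first = 1;
    FILE *f;

    for (;;) {
        cur[0] = cur[1] = cur[2] = cur[3] = 0;
        if ((f = fopen("/proc/net/rt_cache_stat", "r")) == NULL)
            return 1;
        /* Sum the four GC counters (fields 12..15) over all CPUs. */
        while (fgets(line, sizeof(line), f))
            if (sscanf(line,
                       "%x %x %x %x %x %x %x %x %x %x %x %x %x %x %x",
                       &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6],
                       &v[7], &v[8], &v[9], &v[10], &v[11], &v[12],
                       &v[13], &v[14]) == 15)
                for (i = 0; i < 4; i++)
                    cur[i] += v[11 + i];
        fclose(f);
        if (!first)
            printf("GC: tot %u ignored %u goal_miss %u ovrf %u\n",
                   cur[0] - prev[0], cur[1] - prev[1],
                   cur[2] - prev[2], cur[3] - prev[3]);
        for (i = 0; i < 4; i++)
            prev[i] = cur[i];
        first = 0;
        sleep(1);
    }
}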
Not loaded:
----------
max_size 65536
gc_thresh 4096
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
11 134 1159 4 0 15 0 0 146 854 0 0 0 0 0
11 0 1 0 0 0 0 0 0 0 0 0 0 0 0
11 0 1 0 0 0 0 0 0 0 0 0 0 0 0
11 0 3 0 0 0 0 0 0 0 0 0 0 0 0
Cache too small:
---------------
max_size 2000
gc_thresh 1000
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
2000 541 1326737 4 0 39 0 0 542 1321662 0 1286563 1009673 3946 3946
2000 0 876 0 0 0 0 0 0 808 0 1724 18 67 67
2000 0 877 0 0 0 0 0 0 810 0 1719 10 65 65
2000 0 876 0 0 0 0 0 0 809 0 1723 17 66 66
2000 0 876 0 0 0 0 0 0 808 0 1723 7 67 67
2000 0 875 0 0 0 0 0 0 807 0 1722 11 67 67
2000 0 875 0 0 0 0 0 0 806 0 1713 6 69 69
2000 0 875 0 0 0 0 0 0 807 0 1720 13 68 68
2000 0 875 0 0 0 0 0 0 810 0 1727 10 65 65
Cache still "too" small but no dst overflow:
--------------------------------------------
max_size 8000
gc_thresh 4000
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
3408 551 1374530 4 0 39 0 0 558 1366058 0 1377074 1010314 7313 7313
4743 0 673 0 0 0 0 0 0 671 0 743 742 0 0
5880 0 569 0 0 0 0 0 0 568 0 1137 1137 0 0
7610 0 861 0 0 0 0 0 0 861 0 1722 1722 0 0
7348 0 872 0 0 0 0 0 0 871 0 1742 1741 0 0
7563 0 872 0 0 0 0 0 0 870 0 1740 1738 0 0
7963 0 872 0 0 0 0 0 0 871 0 1742 1718 0 0
7976 0 873 0 0 0 0 0 0 871 0 1742 1704 0 0
7956 0 872 0 0 0 0 0 0 871 0 1742 1708 0 0
7970 0 876 0 0 0 0 0 0 875 0 1750 1712 0 0
More reasonable settings (oscillating around gc_thresh):
--------------------------------------------------------
max_size 94000
gc_thresh 47000
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
42490 559 1442677 6 0 46 0 0 574 1434138 0 1445795 1077777 7313 7313
43864 0 686 0 0 0 0 0 0 685 0 0 0 0 0
45224 0 676 0 0 0 0 0 0 675 0 0 0 0 0
46616 0 691 0 0 0 0 0 0 691 0 0 0 0 0
40518 0 684 0 0 0 0 0 0 684 0 660 659 0 0
41948 0 711 0 0 0 0 0 0 711 0 0 0 0 0
43698 3 871 0 0 0 0 0 7 871 0 0 0 0 0
45448 2 872 0 0 0 0 0 3 872 0 0 0 0 0
47198 0 871 0 0 0 0 0 0 871 0 197 197 0 0
40980 0 872 0 0 0 0 0 0 872 0 1207 1206 0 0
42534 0 774 0 0 0 0 0 0 773 0 0 0 0 0
44062 0 764 0 0 0 0 0 0 764 0 0 0 0 0
45260 0 596 0 0 0 0 0 0 595 0 0 0 0 0
46508 0 621 0 0 0 0 0 0 620 0 0 0 0 0
40081 0 561 0 0 0 0 0 0 560 0 227 226 0 0
41835 0 874 0 0 0 0 0 0 873 0 0 0 0 0
43581 0 870 0 0 0 0 0 0 869 0 0 0 0 0
45330 0 872 0 0 1 0 0 0 870 0 0 0 0 0
47082 0 872 0 0 0 0 0 0 872 0 81 81 0 0
40910 0 871 0 0 0 0 0 0 871 0 1031 1030 0 0
Optimum (?) setting for this load:
----------------------------------
max_size 100000
gc_thresh 50000
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc GC: tot ignored goal_miss ovrf
42271 589 1529963 6 0 52 0 0 610 1521341 0 1456394 1088362 7313 7313
44001 0 866 0 0 0 0 0 0 865 0 0 0 0 0
45743 0 872 0 0 0 0 0 0 871 0 0 0 0 0
47491 0 875 0 0 0 0 0 0 874 0 0 0 0 0
49241 0 875 0 0 0 0 0 0 875 0 0 0 0 0
42326 0 860 0 0 0 0 0 0 860 0 1 0 0 0
44075 0 876 0 0 1 0 0 0 874 0 0 0 0 0
45825 0 876 0 0 0 0 0 0 875 0 0 0 0 0
47574 0 876 1 0 0 0 0 0 874 0 0 0 0 0
49316 0 872 0 0 0 0 0 0 871 0 0 0 0 0
42373 0 876 0 0 0 0 0 0 875 0 8 7 0 0
44121 0 875 0 0 0 0 0 0 874 0 0 0 0 0
45873 0 875 0 0 0 0 0 0 875 0 0 0 0 0
47623 0 875 0 0 0 0 0 0 875 0 0 0 0 0
49371 0 875 0 0 0 0 0 0 874 0 0 0 0 0
[-- Attachment #5: message body text --]
[-- Type: text/plain, Size: 68 bytes --]
Cheers.
Robert Olsson, Jens Laas
* Re: route cache GC monitoring
From: kuznet @ 2002-04-26 15:25 UTC (permalink / raw)
To: Robert Olsson; +Cc: davem, hadi, netdev, jensl
Hello!
> We would like to propose four new stats counters to monitor the garbage
> collection process of the route cache. This should be useful for tuning and
> debugging installations which have a large number of flows.
OK.
> Tuning is another business, but we have played with max_size and gc_thresh;
> there are more tuning knobs.
There exists one more interesting parameter: gc_elasticity, which is
supposed to control the length of the hash buckets.
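For reference, gc_elasticity enters the goal computation at the top of
rt_garbage_collect(); roughly, from the 2.4-era net/ipv4/route.c:

  /* Calculate number of entries, which we want to expire now. */
  goal = atomic_read(&ipv4_dst_ops.entries) -
         (ip_rt_gc_elasticity << rt_hash_log);

Since rt_hash_log is log2 of the number of hash buckets, GC only gets a
positive goal once the cache holds more than about gc_elasticity entries
per bucket.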
Alexey
* dst_entry and friends
From: Gabriel Paues @ 2002-04-29 10:14 UTC (permalink / raw)
To: netdev
Hi!
I'm working on my Master's thesis, which aims at building a label-switched
environment. I know there are protocols for doing that, but I am supposed to
do it at the IPv6 level (by altering the IPv6 stack), thus making Linux route
on different criteria than the usual routing table.
This is what I want to do:
If the flow label in the IPv6 header is set to some value other than zero,
skip the whole routing behaviour, look the flow label up in an internal
table, get the out-device and the new flow label from that table, and send
the packet.
Otherwise: just do as usual.
My idea is to use netfilter: in the PRE_ROUTING hook, check the flow label
for values other than zero, and set the dst_entry in the skbuff to something
whose input function is ip6_output. I don't know if I should allocate this
fabricated dst_entry myself or if I should scan the list of already existing
ones. By setting the input function pointer to ip6_output I will skip the
whole routing behaviour in a snap. The same goes for the LOCAL_OUT hook,
where I want to skip the ip6_maybe_reroute function and go directly to
ip6_output. A rough sketch of the PRE_ROUTING part is below.
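Roughly like this (a sketch against the 2.4 netfilter API; the label table
and flowtable_lookup() are made up, just to show the shape):

/*
 * Sketch only: divert packets with a non-zero IPv6 flow label in
 * PRE_ROUTING. flowtable_lookup() is a hypothetical helper; the rest
 * is the stock 2.4 netfilter API.
 */
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv6.h>
#include <linux/ipv6.h>

static unsigned int flowlabel_hook(unsigned int hooknum,
				   struct sk_buff **pskb,
				   const struct net_device *in,
				   const struct net_device *out,
				   int (*okfn)(struct sk_buff *))
{
	struct ipv6hdr *hdr = (*pskb)->nh.ipv6h;
	/* flow label = low 4 bits of flow_lbl[0] plus flow_lbl[1..2] */
	__u32 label = ((hdr->flow_lbl[0] & 0x0f) << 16) |
		      (hdr->flow_lbl[1] << 8) | hdr->flow_lbl[2];

	if (label == 0)
		return NF_ACCEPT;	/* normal routing path */

	/*
	 * Here one would look the label up in the private table and
	 * attach a dst_entry whose input function is ip6_output, e.g.
	 *	(*pskb)->dst = flowtable_lookup(label);
	 * (hypothetical), then NF_ACCEPT: with skb->dst already set,
	 * the input path skips the route lookup.
	 */
	return NF_ACCEPT;
}

static struct nf_hook_ops flowlabel_ops = {
	{ NULL, NULL },		/* list */
	flowlabel_hook,		/* hook */
	PF_INET6,		/* pf */
	NF_IP6_PRE_ROUTING,	/* hooknum */
	NF_IP6_PRI_FIRST,	/* priority */
};

/* at module init: nf_register_hook(&flowlabel_ops); */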
This might sound like quite a destructive concept, but the thing is that my
teacher thought those "small hacks" could be done in no time, while I have
discovered that that's not the case.
Do you know of any easier way of doing this? And if not, how do I get hold
of a dst_entry when I know which netdevice should be the output? I have had
a look at the rt6_device_match function in route.c and think that it does
more or less what I want.
I would be grateful if some Linux IPv6 implementation guru would read this
and try to understand what I'm trying to do... :-)
Sincerely
Gabriel
* Re: dst_entry and friends
From: Andi Kleen @ 2002-04-29 14:55 UTC (permalink / raw)
To: Gabriel Paues; +Cc: netdev
On Mon, Apr 29, 2002 at 12:14:48PM +0200, Gabriel Paues wrote:
> My idea is to use netfilter: in the PRE_ROUTING hook, check the flow label
> for values other than zero, and set the dst_entry in the skbuff to something
> whose input function is ip6_output. I don't know if I should allocate this
> fabricated dst_entry myself or if I should scan the list of already existing
> ones. By setting the input function pointer to ip6_output I will skip the whole
That won't work for most networking protocols, which always route themselves.
All you can do is drop the route later in netfilter and reroute/resubmit.
Still, your concept will fail for path MTU discovery, for example, which
assumes that an incoming ICMP can be matched to a dst_entry in the routing
cache to communicate the new PMTU.
It may be easier to just change ip_route_output(). Routes are often cached,
however, so you would need to make sure that the caches are invalidated when
the flow label changes (e.g. by setting dst->obsolete).
Overall, the concept does not look very promising, however. I guess that to
make the labels unique you'll have to add srcip/dstip to your private tables
too, and that means the label lookup is exactly as expensive as a normal
routing cache lookup; even if you didn't need the src/dst it probably
wouldn't be much cheaper.
-Andi
* Re: route cache GC monitoring
From: David S. Miller @ 2002-05-09 9:35 UTC (permalink / raw)
To: Robert.Olsson; +Cc: kuznet, hadi, netdev, jensl
Sorry for taking so long. I've added your patch to both
2.4.x and 2.5.x, thanks.