* dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-14 17:51 UTC
To: netdev
First, I apologise for the cross-post. I sent this to the lkml,
and several people said I should post it here as well.....
I have been looking into a problem on a couple of Linux routers that
I have. All of them are used for routing between the private network
and the Internet. Some run Check Point VPN-1 v4.1 SP5, some do not.
The problem is that after about 22 hours, they all hit "dst cache
overflow", which is quite easily traced back to rt_garbage_collect()
in net/ipv4/route.c. rt_garbage_collect() appears to be called or
initiated from two places: 1) via the dst_ops structure, and 2) from
rt_intern_hash().
Based on my testing, here is what I have come up with: dst_ops.entries
is not being maintained correctly (or at least does not hold what I would
expect it to). I determined this by changing dst_ops->gc to point to a new
function (rt_display_tot) for debugging, and by having rt_intern_hash()
call this function instead of rt_garbage_collect().
The sole purpose of this function was to loop through all hash chains,
count up the entries that should be valid, do a printk("%d %d", count,
dst_ops.entries), then return rt_garbage_collect(). The result was that
when rt_garbage_collect() returned 1 (dst cache overflow), the number of
entries reported by dst_ops.entries was far different from the number
reported by my loop/counter.
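Roughly, the debug hook looked like this (a sketch reconstructed from the
description above; the actual rt_display_tot is not included in the patch
below):

/* Reconstructed sketch, not the original function. */
static int rt_display_tot(void)
{
	struct rtable *rth;
	int i, count = 0;

	start_bh_atomic();
	for (i = 0; i < RT_HASH_DIVISOR; i++)
		for (rth = rt_hash_table[i]; rth != NULL; rth = rth->u.rt_next)
			count++;	/* entries actually linked in the chains */
	end_bh_atomic();

	printk("%d %d\n", count, atomic_read(&ipv4_dst_ops.entries));
	return rt_garbage_collect();
}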
Upon further investigation into rt_free(), __dst_free(), and dst_destroy(),
I found that only dst_destroy() decrements dst_ops.entries.
Furthermore, when dst_ops->gc returns 1, dst_alloc() will not create an
entry (appropriately so), and the box is at a standstill.
So... For the interim, I have created a small patch that purges the dst
cache table and resets dst_ops.entries to 0 any time rt_garbage_collect()
returns 1.
The result... The box stays up, and hums along quite happily.
I would appreciate any comments regarding this matter. I have also
included a copy of the patch that I created to work around the issue.
Thanks,
Chad
diff -urNp linux-2.2.16/net/ipv4/route.c linux-2.2.16cwm/net/ipv4/route.c
--- linux-2.2.16/net/ipv4/route.c	Tue Jan 4 13:12:26 2000
+++ linux-2.2.16cwm/net/ipv4/route.c	Tue Apr 9 15:14:12 2002
@@ -96,7 +96,7 @@
 #define IP_MAX_MTU	0xFFF0
 
-#define RT_GC_TIMEOUT	(300*HZ)
+#define RT_GC_TIMEOUT	(120*HZ)
 
 int ip_rt_min_delay = 2*HZ;
 int ip_rt_max_delay = 10*HZ;
@@ -134,7 +134,8 @@ static struct dst_entry * ipv4_dst_rerou
 static struct dst_entry * ipv4_negative_advice(struct dst_entry *);
 static void ipv4_link_failure(struct sk_buff *skb);
 static int rt_garbage_collect(void);
-
+static int rt_delete_now(void);
+static int rt_garbage_ctl(void);
 
 struct dst_ops ipv4_dst_ops =
 {
@@ -142,7 +143,7 @@ struct dst_ops ipv4_dst_ops =
 	__constant_htons(ETH_P_IP),
 	RT_HASH_DIVISOR,
-	rt_garbage_collect,
+	rt_garbage_ctl,
 	ipv4_dst_check,
 	ipv4_dst_reroute,
 	NULL,
@@ -508,8 +509,7 @@ static int rt_garbage_collect(void)
 	if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
 		return 0;
-	if (net_ratelimit())
-		printk("dst cache overflow\n");
+
 	return 1;
 work_done:
@@ -570,7 +570,7 @@ restart:
 		int saved_int = ip_rt_gc_min_interval;
 		ip_rt_gc_elasticity = 1;
 		ip_rt_gc_min_interval = 0;
-		rt_garbage_collect();
+		rt_garbage_ctl();
 		ip_rt_gc_min_interval = saved_int;
 		ip_rt_gc_elasticity = saved_elasticity;
 		goto restart;
@@ -2045,4 +2045,44 @@ __initfunc(void ip_rt_init(void))
 	ent->read_proc = ip_rt_acct_read;
 #endif
 #endif
 }
+
+static int rt_delete_now(void){
+	struct rtable *rth, **rthp;
+	int i,ent1,ent2,c;
+
+	i=0;
+	ent1=0;
+	ent2=0;
+	c=0;
+
+	ent1=atomic_read(&ipv4_dst_ops.entries);
+	start_bh_atomic();
+	while(i<RT_HASH_DIVISOR){
+		rthp=&rt_hash_table[i];
+		while((rth=*rthp)!=NULL){
+			*rthp=rth->u.rt_next;
+			rth->u.rt_next=NULL;
+			c+=1;
+			rt_free(rth);
+		}
+		i++;
+	}
+
+	atomic_set(&ipv4_dst_ops.entries,0);
+	end_bh_atomic();
+	ent2=atomic_read(&ipv4_dst_ops.entries);
+
+	if(net_ratelimit()){
+		printk("dst cache overflow\n");
+		printk("rt_delete_now(); s:%d e:%d t:%d\n",ent1,ent2,c);
+	}
+
+	return 0;
+}
+
+static int rt_garbage_ctl(void){
+	if(rt_garbage_collect())
+		rt_delete_now();
+	return 0;
+}
* Re: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 19:39 UTC
To: Milam, Chad; +Cc: netdev
Hi,
Why couldn't you just modify the parameters in /proc?
-Increase the overflow threshold (you actually hardcode this)
/proc/sys/net/ipv4/route/gc_thresh
-decrease the gc timer
/proc/sys/net/ipv4/route/gc_timeout
cheers,
jamal
* Re: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 19:43 UTC
To: Milam, Chad; +Cc: netdev
On Sun, 14 Apr 2002, jamal wrote:
>
> Hi,
>
> Why couldn't you just modify the parameters in /proc?
>
> -Increase the overflow threshold (you actually hardcode this)
> /proc/sys/net/ipv4/route/gc_thresh
> -decrease the gc timer
> /proc/sys/net/ipv4/route/gc_timeout
Sorry, you hardcoded /proc/sys/net/ipv4/route/gc_timeout
to 120 secs.
cheers,
jamal
* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-14 19:54 UTC
To: netdev
I lowered the timeout to make gc more aggressive. Though, it can still
be adjusted via a /proc entry. Default was 300. Increasing the other
parameters that you specified (which I have done) only delays the
inevitable "dst cache overflow". The problem is that gc (or rather,
rt_free) is not decrementing .entries, so it _thinks_ the table
has overflowed.
chad
> -----Original Message-----
> From: jamal <hadi@cyberus.ca>
> Sent: Sunday, April 14, 2002 3:44 PM
> To: Milam, Chad
> Cc: netdev@oss.sgi.com
> Subject: Re: dst cache overflow 2.2.x; x>=16
>
>
>
> On Sun, 14 Apr 2002, jamal wrote:
>
> >
> > Hi,
> >
> > Why couldn't you just modify the parameters in /proc?
> >
> > -Increase the overflow threshold (you actually hardcode this)
> > /proc/sys/net/ipv4/route/gc_thresh
> > -decrease the gc timer
> > /proc/sys/net/ipv4/route/gc_timeout
>
> Sorry, you hardcoded /proc/sys/net/ipv4/route/gc_timeout
> to 120 secs.
>
> cheers,
> jamal
>
* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 20:04 UTC
To: Milam, Chad; +Cc: netdev
On Sun, 14 Apr 2002, Milam, Chad wrote:
> I lowered the timeout to make gc more aggressive. Though, it can still
> be adjusted via a /proc entry. Default was 300. Increasing the other
> parameters that you specified (which I have done) only delays the
> inevitable "dst cache overflow". The problem is that gc (or rather,
> rt_free) is not decrementing .entries, so it _thinks_ the table
> has overflowed.
>
Overflow will only happen if /proc/sys/net/ipv4/route/gc_thresh
is exceeded. A default of 512 aint that big. What is the average number
of entries you are seeing?
What kind of data do you get from running rtstat?
Increase /proc/sys/net/ipv4/route/gc_thresh to a higher
number matching your avg entries.
Garbage collection aint that cheap, so it is safer to just make the size
larger instead of invoking it more frequently -- RAM is cheap. Note also
that garbage collection will run every
/proc/sys/net/ipv4/route/gc_min_interval expiry regardless of how
big your max threshold is.
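For reference, the rate-limit guard at the top of 2.2's rt_garbage_collect()
looks roughly like this (paraphrased, not verbatim):

	/* Paraphrased from 2.2 net/ipv4/route.c: GC normally bails out if
	 * it ran within the last ip_rt_gc_min_interval jiffies, *unless*
	 * the cache has already grown past ip_rt_max_size. */
	if (now - last_gc < ip_rt_gc_min_interval &&
	    atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
		return 0;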
cheers,
jamal
* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-14 20:25 UTC
To: netdev
> On Sun, 14 Apr 2002, Milam, Chad wrote:
>
> > I lowered the timeout to make gc more aggressive. Though, it can still
> > be adjusted via a /proc entry. Default was 300. Increasing the other
> > parameters that you specified (which I have done) only delays the
> > inevitable "dst cache overflow". The problem is that gc (or rather,
> > rt_free) is not decrementing .entries, so it _thinks_ the table
> > has overflowed.
> >
>
> Overflow will only happen if /proc/sys/net/ipv4/route/gc_thresh
> is exceeded. A default of 512 aint that big. What is the average number
> of entries you are seeing?
> What kind of data do you get from running rtstat?
> Increase /proc/sys/net/ipv4/route/gc_thresh to a higher
> number matching your avg entries.
>
> Garbage collection aint that cheap, so it is safer to just make the size
> larger instead of invoking it more frequently -- RAM is cheap. Note also
> that garbage collection will run every
> /proc/sys/net/ipv4/route/gc_min_interval expiry regardless of how
> big your max threshold is.
>
> cheers,
> jamal
Actually, changing the timer to 120*HZ was not supposed to end up in the
patch. I pulled it out of there, but managed to leave it in the diff for
the email.
/proc/sys/net/ipv4/route/gc_min_interval = 1
/proc/sys/net/ipv4/route/max_size=16384 (default 4096)
Also, based on the code from route.c:
--
	if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
		return 0;
	if (net_ratelimit())
		printk("dst cache overflow\n");
	return 1;
--
Looks to me like gc_thresh has nothing to do with it. Did I read that wrong?
chad
* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 20:32 UTC
To: Milam, Chad; +Cc: netdev
On Sun, 14 Apr 2002, Milam, Chad wrote:
> /proc/sys/net/ipv4/route/max_size=16384 (default 4096)
Ok, i meant:
/proc/sys/net/ipv4/route/gc_thresh
> Also, based on the code from route.c:
> --
> 	if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
> 		return 0;
> 	if (net_ratelimit())
> 		printk("dst cache overflow\n");
> 	return 1;
> --
>
0 typically means the gc was successful
> Looks to me like gc_thresh has nothing to do with it. Did I read that wrong?
>
I just remembered an old problem that used to cause this; do you
have lo configured?
Make sure it has IP address 127.0.0.1 and is up.
cheers,
jamal
* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 20:38 UTC
To: Milam, Chad; +Cc: netdev
On Sun, 14 Apr 2002, jamal wrote:
> > /proc/sys/net/ipv4/route/max_size=16384 (default 4096)
>
> Ok, i meant:
> /proc/sys/net/ipv4/route/gc_thresh
>
Sorry, you are right, the correct value is written/read from
/proc/sys/net/ipv4/route/max_size
cheers,
jamal
* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-14 20:44 UTC
To: netdev, hadi
> On Sun, 14 Apr 2002, Milam, Chad wrote:
>
> > /proc/sys/net/ipv4/route/max_size=16384 (default 4096)
>
> Ok, i meant:
> /proc/sys/net/ipv4/route/gc_thresh
It also looks like gc_thresh defaults to RT_HASH_DIVISOR (256).
However, I have....
/proc/sys/net/ipv4/route/gc_thresh=2048
> > Also, based on the code from route.c:
> > --
> > 	if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
> > 		return 0;
> > 	if (net_ratelimit())
> > 		printk("dst cache overflow\n");
> > 	return 1;
> > --
> >
>
> 0 typically means the gc was successful
True enough, but the way I read that is: if entries > max_size,
you are going to get "dst cache overflow", which returns 1. There
is no test here for entries > gc_thresh.
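For what it's worth, gc_thresh seems to matter one level up, in
dst_alloc() -- a paraphrased sketch of 2.2's net/core/dst.c, not verbatim:

/* Paraphrased sketch of 2.2 dst_alloc(): gc_thresh (the RT_HASH_DIVISOR
 * slot in ipv4_dst_ops) only decides when the gc callback is invoked;
 * ip_rt_max_size is what the callback compares against before declaring
 * overflow and refusing the allocation. */
void * dst_alloc(int size, struct dst_ops * ops)
{
	struct dst_entry * dst;

	if (ops->gc && atomic_read(&ops->entries) > ops->gc_thresh) {
		if (ops->gc())			/* returned 1: overflow */
			return NULL;		/* allocation refused */
	}
	dst = kmalloc(size, GFP_ATOMIC);
	if (!dst)
		return NULL;
	memset(dst, 0, size);
	dst->ops = ops;
	dst->lastuse = jiffies;
	atomic_inc(&ops->entries);		/* the matching increment */
	return dst;
}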
> I just remembered an old problem that used to cause this; do you
> have lo configured?
> Make sure it has IP address 127.0.0.1 and is up.
Sure do. I read about that issue quite some time ago.
thanks again,
chad
* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 20:53 UTC
To: Milam, Chad; +Cc: netdev
If I summarize, your problem is that you are building up
dst cache entries faster than they can be garbage collected.
Solution:
1. Make the max size large enough to catch up with the rate.
2. Make sure that every time you go into garbage collection you are
successful.
- reducing the min interval to 1 might be a little aggressive,
but you can tune this later
- You wanna make sure you get a large positive "goal" every time;
play with ip_rt_gc_elasticity (/proc/sys/net/ipv4/route/gc_elasticity)
and also rt_hash_log
All the above are configurable via /proc.
have to run
cheers,
jamal
* RE: dst cache overflow 2.2.x; x>=16
From: Robert Olsson @ 2002-04-14 21:38 UTC
To: Milam, Chad; +Cc: jamal, netdev
jamal writes:
>
>
>
> If I summarize, your problem is that you are building up
> dst cache entries faster than they can be garbage collected.
>
> Solution:
> 1. Make the max size large enough to catch up with the rate.
> 2. Make sure that every time you go into garbage collection you are
> successful.
> - reducing the min interval to 1 might be a little aggressive,
> but you can tune this later
> - You wanna make sure you get a large positive "goal" every time;
> play with ip_rt_gc_elasticity (/proc/sys/net/ipv4/route/gc_elasticity)
> and also rt_hash_log
>
> All the above are configurable via /proc.
>
> have to run
And in 2.4.x the GC is done more dynamically, around an "equilibrium point".
Alexey warned about the 2.2 code...
Snapshot from a Linux router running 2.4.10:
cat /proc/sys/net/ipv4/route/max_size
65536
rtstat
size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc
9861 24721 131 0 1 0 0 0 2 1 0
10119 25044 128 0 0 0 0 0 2 0 0
2514 24125 1293 2 0 0 0 0 1 2 0
3654 24315 591 2 1 1 0 0 0 2 1
4441 25170 387 0 2 0 0 0 1 3 0
5060 25000 304 2 1 0 0 0 0 2 0
5532 25627 230 2 0 0 0 0 0 2 0
5947 25754 242 2 0 0 0 0 1 3 0
6379 25602 211 0 1 0 0 0 2 3 0
6371 25523 235 0 0 0 0 0 1 1 0
6752 24251 187 1 0 0 0 0 0 1 0
7077 25310 160 0 0 0 0 0 1 1 0
6851 24608 222 2 1 0 0 0 1 3 0
7256 25313 199 1 0 0 0 0 1 2 0
7086 24656 174 0 0 0 0 0 0 1 0
7459 24070 180 3 1 0 0 0 1 2 0
2434 23844 1340 7 1 0 0 0 1 3 0
1st column: ipv4_dst_ops.entries (you can see GC happen).
2nd column: warm cache hits -> approx. aggregate packets/sec.
3rd column: cache misses -> approx. connections/sec.
Cheers.
--ro
* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-15 15:21 UTC
To: netdev
Robert Olsson writes:
> jamal writes:
> >
> > If I summarize, your problem is that you are building up
> > dst cache entries faster than they can be garbage collected.
> >
> > Solution:
> > 1. Make the max size large enough to catch up with the rate.
> > 2. Make sure that every time you go into garbage collection you are
> > successful.
> > - reducing the min interval to 1 might be a little aggressive,
> > but you can tune this later
> > - You wanna make sure you get a large positive "goal" every time;
> > play with ip_rt_gc_elasticity (/proc/sys/net/ipv4/route/gc_elasticity)
> > and also rt_hash_log
> >
> > All the above are configurable via /proc.
> >
> > have to run
>
> And in 2.4.x the GC is done more dynamically, around an "equilibrium point".
> Alexey warned about the 2.2 code...
>
> Snapshot from a Linux router running 2.4.10:
>
> cat /proc/sys/net/ipv4/route/max_size
> 65536
>
> rtstat
> size IN: hit tot mc no_rt bcast madst masrc OUT: hit tot mc
>
> 1st column: ipv4_dst_ops.entries (you can see GC happen).
> 2nd column: warm cache hits -> approx. aggregate packets/sec.
> 3rd column: cache misses -> approx. connections/sec.
Unfortunately I cannot move my Check Point boxes to 2.4.x yet. Maybe this will come
in the future.
At the end of the day, my patch was not trying to avoid GC, or eliminate it. It was just
there to keep the box from going completely dead... and it accomplishes exactly that. I
have run it now for about a week, and the box has not had to be restarted, whereas
without it, I would have restarted once per day. Even with route/max_size set to 65536,
I had to restart about every two weeks.
I am also not suggesting that GC does not work; it does (for the most part). What I am
trying to say is that there is a condition (still working on that bit) that keeps the
.entries counter from decreasing to what it should be. Something, some process, is
leaking routes (at least into the counter). This is where my problem is. And the patch
works around that by setting the cache to zero and starting over. Which, again, is better
than restarting.
Thanks again,
chad
* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-15 17:53 UTC
To: Milam, Chad; +Cc: netdev
On Mon, 15 Apr 2002, Milam, Chad wrote:
> At the end of the day, my patch was not trying to avoid GC, or eliminate
> it. It was just
> there to keep the box from going completely dead...
I don't think your patch guarantees this (you are nuking routes that
may still be actively used) -- I think you may have been lucky so far.
Regardless, this seems to be an interesting case of fixing what appears
to be an application bug with a kernel patch. It's amazing what you can do
when you have source.
cheers,
jamal
PS:- the fact that you are running 2.2 is useful information that
you left out.
* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-15 18:10 UTC
To: netdev
jamal writes:
> On Mon, 15 Apr 2002, Milam, Chad wrote:
>
> > At the end of the day, my patch was not trying to avoid GC, or eliminate
> > it. It was just
> > there to keep the box from going completely dead...
>
> I don't think your patch guarantees this (you are nuking routes that
> may still be actively used) -- I think you may have been lucky so far.
> Regardless, this seems to be an interesting case of fixing what appears
> to be an application bug with a kernel patch. It's amazing what you can do
> when you have source.
>
> cheers,
> jamal
>
> PS:- the fact that you are running 2.2 is useful information that
> you left out.
>
The fact that I am using 2.2 is stated in the subject line. I did neglect to
put it explicitly in the message (sorry). It is, however, also in the
diff/patch file.
I also do not think that nuking valid routes in the cache will produce any
major issues, other than slowing things down for a few seconds. the cache
is just the cache, not the real route table. and yes, it pretty much
guarantees the route cache will be purged, therefore avoiding a reboot and
avoiding a quickly repeated overflow...
and yes, having the source makes things much easier. :)
chad
* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-15 18:27 UTC
To: Milam, Chad; +Cc: netdev
On Mon, 15 Apr 2002, Milam, Chad wrote:
> The fact that I am using 2.2 is stated in the subject line. I did neglect to
> put it explicitly in the message (sorry). It is, however, also in the
> diff/patch file.
I apologize. I spent all my time rambling to you based on 2.4 code ;-<
>
> I also do not think that nuking valid routes in the cache will produce any
> major issues, other than slowing things down for a few seconds. the cache
> is just the cache, not the real route table. and yes, it pretty much
> guarantees the route cache will be purged, therefore avoiding a reboot and
> avoiding a quickly repeated overflow...
>
Typically most of the code will check for the dst cache, or do some
dereferencing within it, before using it. I am not sure we can
swear by this ;-> I suppose we will find out when you get an oops ;->
Maybe you should just purge the routes marked as expired.
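Something along these lines, perhaps -- an untested sketch against the 2.2
structures used in the patch earlier in the thread, unlinking only entries
that are unreferenced and have gone unused for RT_GC_TIMEOUT:

/* Untested sketch: a gentler rt_delete_now() that leaves in-use and
 * recently used routes alone instead of nuking the whole table. */
static int rt_delete_idle(void)
{
	struct rtable *rth, **rthp;
	unsigned long now = jiffies;
	int i, freed = 0;

	start_bh_atomic();
	for (i = 0; i < RT_HASH_DIVISOR; i++) {
		rthp = &rt_hash_table[i];
		while ((rth = *rthp) != NULL) {
			if (atomic_read(&rth->u.dst.use) == 0 &&
			    (long)(now - rth->u.dst.lastuse) > RT_GC_TIMEOUT) {
				*rthp = rth->u.rt_next;
				rth->u.rt_next = NULL;
				rt_free(rth);
				freed++;
			} else
				rthp = &rth->u.rt_next;
		}
	}
	end_bh_atomic();
	return freed;
}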
cheers,
jamal
* RE: dst cache overflow 2.2.x; x>=16
From: Julian Anastasov @ 2002-04-15 18:31 UTC
To: Milam, Chad; +Cc: netdev
Hello,
On Mon, 15 Apr 2002, Milam, Chad wrote:
> I also do not think that nuking valid routes in the cache will produce any
> major issues, other than slowing things down for a few seconds. the cache
> is just the cache, not the real route table. and yes, it pretty much
Of course. You can play only with max_size to achieve the same
result. max_size should be appropriate to the rate at which new hosts
appear in the cache. I'm wondering whether your patched kernel does not
have some bug, for example, unfreed skbs or struct rtable entries. Make
sure that the unpatched kernels have the same bug. If it appears after 22
hours (I assume the system load for all these 22 hours is the same)
then this is a bug. Playing with the hash size is a final step, but it
can only gain you some CPU cycles. Touching max_size should be
enough.
> guarantees the route cache will be purged, therefore avoiding a reboot and
> avoiding a quickly repeated overflow...
Are you sure you have stale entries? What does /proc/slabinfo show
after 22 hours (skbuff_head_cache, etc.)?
One hint: can this command solve the problem (to flush the
cache entries)?:
for i in down up ; do ip link set ethXXX $i ; done
> chad
Regards
--
Julian Anastasov <ja@ssi.bg>
* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-15 19:07 UTC
To: netdev, Julian Anastasov <ja@ssi.bg>
Julian writes:
> Hello,
>
> On Mon, 15 Apr 2002, Milam, Chad wrote:
>
> > I also do not think that nuking valid routes in the cache will produce any
> > major issues, other than slowing things down for a few seconds. the cache
> > is just the cache, not the real route table. and yes, it pretty much
>
> Of course. You can play only with max_size to achieve the same
> result. max_size should be appropriate to the rate new hosts appear
> in the cache. I'm wondering whether your patched kernel does not have
> some bug, for example, unfreed skbs or struct rtable. Make sure that
> the unpatched kernels have the same bug. If it appears after 22
> hours (I assume the system load for all these 22 hours is same)
> then this is a bug. Playing with the hash size is final step but it
> can only give you some CPU cycles. Touching max_size should be
> enough.
No. Increasing max_size only delays its death. That is my point. The problem
existed on out-of-the-box RH7, RH6.2, and RH6.1 installs. The whole point of the
patch was to fix a problem that existed _prior_ to my patching it.
> > guarantees the route cache will be purged, therefore avoiding a reboot and
> > avoiding a quickly repeated overflow...
>
> Are you sure you have stale entries? What does /proc/slabinfo show
> after 22 hours (skbuff_head_cache, etc.)?
Well, what I can tell you is this: if I run a loop like the following, counter
will only show, say, 50 routes in the cache.
------
start = atomic_read(&ipv4_dst_ops.entries);
counter = 0;
/* read-only walk: count what is actually linked in the hash chains */
for (i = 0; i < RT_HASH_DIVISOR; i++)
	for (rth = rt_hash_table[i]; rth != NULL; rth = rth->u.rt_next)
		counter++;
printk("entries: %d, counted: %d\n", start, counter);
------
> One hint: can this command solve the problem (to flush the
> cache entries)?:
>
> for i in down up ; do ip link set ethXXX $i ; done
Downing all interfaces and re-upping them does not seem to solve the problem either :S
thanks,
chad
* Re: dst cache overflow 2.2.x; x>=16
From: Andi Kleen @ 2002-04-15 19:53 UTC
To: Julian Anastasov <ja@ssi.bg>; +Cc: Milam, Chad, netdev
> I mean, when you see the dst cache overflow message, can the
> command help? But ... maybe you are running a kernel patched with your
> changes. I'm asking this because I know of cases where wrong changes
> can cause problems with the dst cache. But the plain kernel should be
> fine. One more question: can you say whether this box is used only as
> a router, or what kind of TCP or UDP connections you have (to/from the
> box)? There can be some corner cases in the dst cache usage from
> connected sockets.
I would suspect CheckPoint (I think it has kernel modules, hasn't it?).
We had a similar report of such a thing a few months ago, and they were
using CheckPoint too.
-Andi
* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-15 19:58 UTC
To: netdev, Julian Anastasov <ja@ssi.bg>
Julian wrote:
> On Mon, 15 Apr 2002, Milam, Chad wrote:
>
> > > can only give you some CPU cycles. Touching max_size should be
> > > enough.
> >
> > no. Increasing max_size only delays its death. That is my point. The problem
> > existed on an out of the box RH7, RH6.2, RH6.1 install. The whole point of the
> > patch was to fix a problem that existed _prior_ to me patching it.
>
> I assume you mean a plain kernel (without Check Point?).
Indeed... out of the box, nothing additional installed, IP params tuned.
>
> > > Are you sure you have stalled entries? What shows /proc/slabinfo
> > > after 22 hours (skbuff_head_cache, etc)?
> >
> > Well, what I can tell you is this: if I run a loop like the following, counter
> > will only show, say, 50 routes in the cache.
>
> It means there are 50 linked cache entries but the
> ipv4_dst_ops.entries reaches the limit, very strange.
This is what I keep trying to say :)
> > > for i in down up ; do ip link set ethXXX $i ; done
> >
> > Downing all interfaces and re-upping them does not seem to solve the problem either :S
>
> I mean, when you see the dst cache overflow message, can the
> command help? But ... maybe you are running a kernel patched with your
> changes. I'm asking this because I know of cases where wrong changes
> can cause problems with the dst cache. But the plain kernel should be
> fine. One more question: can you say whether this box is used only as
> a router, or what kind of TCP or UDP connections you have (to/from the
> box)? There can be some corner cases in the dst cache usage from
> connected sockets.
I originally set up a script to do basically this... grep "dst cache"
/var/log/messages; if I get a hit, down and up all interfaces. That didn't
work, so then I changed it to reboot the box :(. This was done on a box
with _no_ funny stuff... again, bog standard.
The box is a router, no ip masq, no ip chains, no ip fw, just a router.
Thanks,
chad
* RE: dst cache overflow 2.2.x; x>=16
From: Robert Olsson @ 2002-04-15 21:47 UTC
To: Milam, Chad; +Cc: netdev, Julian Anastasov <ja@ssi.bg>
Milam, Chad writes:
>
> The box is a router, no ip masq, no ip chains, no ip fw, just a router.
Weird. Julian has a useful program, testlvs, for testing the route
cache. I just tested this with linux-2.2.17 and many sources (srcnum
80000) but cannot force a cache overflow. OK, it was not for hours.
And 2.2.x has recent cache code; I was wrong here. I have used it
for routers doing pretty demanding jobs.
I would look for growth in /proc/slabinfo too...
[-- Attachment #2: rt_cache_stat-2.2.17.pat --]
--- linux/include/net/route.h.orig Sun Nov 5 22:18:35 2000
+++ linux/include/net/route.h Mon Apr 15 22:19:18 2002
@@ -14,6 +14,7 @@
* Alan Cox : Support for TCP parameters.
* Alexey Kuznetsov: Major changes for new routing code.
* Mike McLagan : Routing by source
+ * Robert Olsson : Added rt_cache statistics
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
@@ -102,6 +103,20 @@
__u32 o_packets;
__u32 i_bytes;
__u32 i_packets;
+};
+
+struct rt_cache_stat
+{
+ unsigned in_hit;
+ unsigned in_slow_tot;
+ unsigned in_slow_mc;
+ unsigned in_no_route;
+ unsigned in_brd;
+ unsigned in_martian_dst;
+ unsigned in_martian_src;
+ unsigned out_hit;
+ unsigned out_slow_tot;
+ unsigned out_slow_mc;
};
extern struct ip_rt_acct ip_rt_acct[256];
--- linux/net/ipv4/route.c.orig Tue Jan 4 19:12:26 2000
+++ linux/net/ipv4/route.c Mon Apr 15 23:55:37 2002
@@ -52,6 +52,7 @@
* Tobias Ringstrom : Uninitialized res.type in ip_route_output_slow.
* Vladimir V. Ivanov : IP rule info (flowid) is really useful.
* Marc Boucher : routing by fwmark
+ * Robert Olsson : Added rt_cache statistics
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
@@ -176,6 +177,8 @@
struct rtable *rt_hash_table[RT_HASH_DIVISOR];
+struct rt_cache_stat rt_cache_stat[NR_CPUS];
+
static int rt_intern_hash(unsigned hash, struct rtable * rth, struct rtable ** res);
static __inline__ unsigned rt_hash_code(u32 daddr, u32 saddr, u8 tos)
@@ -357,6 +360,44 @@
}
end_bh_atomic();
}
+
+
+#ifdef CONFIG_PROC_FS
+static int rt_cache_stat_get_info(char *buffer, char **start, off_t offset, int length)
+{
+ int i, lcpu;
+ int len=0;
+ unsigned int dst_entries = atomic_read(&ipv4_dst_ops.entries);
+
+ for (lcpu=0; lcpu<smp_num_cpus; lcpu++) {
+ i = cpu_logical_map(lcpu);
+
+ len += sprintf(buffer+len, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
+ dst_entries,
+ rt_cache_stat[i].in_hit,
+ rt_cache_stat[i].in_slow_tot,
+ rt_cache_stat[i].in_slow_mc,
+ rt_cache_stat[i].in_no_route,
+ rt_cache_stat[i].in_brd,
+ rt_cache_stat[i].in_martian_dst,
+ rt_cache_stat[i].in_martian_src,
+
+ rt_cache_stat[i].out_hit,
+ rt_cache_stat[i].out_slow_tot,
+ rt_cache_stat[i].out_slow_mc
+ );
+ }
+ len -= offset;
+
+ if (len > length)
+ len = length;
+ if (len < 0)
+ len = 0;
+
+ *start = buffer + offset;
+ return len;
+}
+#endif
void rt_cache_flush(int delay)
{
@@ -1027,6 +1068,7 @@
struct in_device *in_dev = dev->ip_ptr;
u32 itag = 0;
+
/* Primary sanity checks. */
if (MULTICAST(saddr) || BADCLASS(saddr) || LOOPBACK(saddr) ||
@@ -1078,6 +1120,7 @@
#ifdef CONFIG_IP_MROUTE
if (!LOCAL_MCAST(daddr) && IN_DEV_MFORWARD(in_dev))
rth->u.dst.input = ip_mr_input;
+ rt_cache_stat[smp_processor_id()].in_slow_mc++;
#endif
hash = rt_hash_code(daddr, saddr^(dev->ifindex<<5), tos);
@@ -1155,6 +1198,8 @@
goto no_route;
}
+ rt_cache_stat[smp_processor_id()].in_slow_tot++;
+
#ifdef CONFIG_IP_ROUTE_NAT
/* Policy is applied before mapping destination,
but rerouting after map should be made with old source.
@@ -1287,6 +1332,7 @@
}
flags |= RTCF_BROADCAST;
res.type = RTN_BROADCAST;
+ rt_cache_stat[smp_processor_id()].in_brd++;
local_input:
rth = dst_alloc(sizeof(struct rtable), &ipv4_dst_ops);
@@ -1328,6 +1374,7 @@
return rt_intern_hash(hash, rth, (struct rtable**)&skb->dst);
no_route:
+ rt_cache_stat[smp_processor_id()].in_no_route++;
spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE);
res.type = RTN_UNREACHABLE;
goto local_input;
@@ -1336,6 +1383,7 @@
* Do not cache martian addresses: they should be logged (RFC1812)
*/
martian_destination:
+ rt_cache_stat[smp_processor_id()].in_martian_dst++;
#ifdef CONFIG_IP_ROUTE_VERBOSE
if (IN_DEV_LOG_MARTIANS(in_dev) && net_ratelimit())
printk(KERN_WARNING "martian destination %08x from %08x, dev %s\n", daddr, saddr, dev->name);
@@ -1343,6 +1391,8 @@
return -EINVAL;
martian_source:
+
+ rt_cache_stat[smp_processor_id()].in_martian_src++;
#ifdef CONFIG_IP_ROUTE_VERBOSE
if (IN_DEV_LOG_MARTIANS(in_dev) && net_ratelimit()) {
/*
@@ -1384,6 +1434,7 @@
rth->key.tos == tos) {
rth->u.dst.lastuse = jiffies;
atomic_inc(&rth->u.dst.use);
+ rt_cache_stat[smp_processor_id()].in_hit++;
atomic_inc(&rth->u.dst.refcnt);
skb->dst = (struct dst_entry*)rth;
return 0;
@@ -1634,14 +1685,18 @@
rth->u.dst.output=ip_output;
+ rt_cache_stat[smp_processor_id()].out_slow_tot++;
+
if (flags&RTCF_LOCAL) {
rth->u.dst.input = ip_local_deliver;
rth->rt_spec_dst = key.dst;
}
if (flags&(RTCF_BROADCAST|RTCF_MULTICAST)) {
rth->rt_spec_dst = key.src;
- if (flags&RTCF_LOCAL && !(dev_out->flags&IFF_LOOPBACK))
+ if (flags&RTCF_LOCAL && !(dev_out->flags&IFF_LOOPBACK)) {
rth->u.dst.output = ip_mc_output;
+ rt_cache_stat[smp_processor_id()].out_slow_mc++;
+ }
#ifdef CONFIG_IP_MROUTE
if (res.type == RTN_MULTICAST && dev_out->ip_ptr) {
struct in_device *in_dev = dev_out->ip_ptr;
@@ -1683,6 +1738,7 @@
) {
rth->u.dst.lastuse = jiffies;
atomic_inc(&rth->u.dst.use);
+ rt_cache_stat[smp_processor_id()].out_hit++;
atomic_inc(&rth->u.dst.refcnt);
end_bh_atomic();
*rp = rth;
@@ -2041,6 +2097,8 @@
rt_cache_get_info
});
#ifdef CONFIG_NET_CLS_ROUTE
+ ent = create_proc_entry ("net/rt_cache_stat", 0, 0);
+ ent->read_proc = rt_cache_stat_get_info;
ent = create_proc_entry("net/rt_acct", 0, 0);
ent->read_proc = ip_rt_acct_read;
#endif
With the patch you can monitor ipv4_dst_ops.entries with rtstat, and
testlvs is a good exerciser.
Cheers.
--ro
BTW, I think rtstat could hold some stats about the GC process too.
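For instance, per-CPU GC counters along these lines could sit next to
struct rt_cache_stat (a sketch only; the field names are invented for
illustration):

/* Sketch: possible GC counters, bumped from rt_garbage_collect(). */
struct rt_gc_stat
{
	unsigned gc_total;		/* rt_garbage_collect() invocations  */
	unsigned gc_ignored;		/* bailed out on gc_min_interval     */
	unsigned gc_goal_miss;		/* freed fewer entries than the goal */
	unsigned gc_dst_overflow;	/* returned 1: "dst cache overflow"  */
};

struct rt_gc_stat rt_gc_stat[NR_CPUS];
/* e.g. rt_gc_stat[smp_processor_id()].gc_total++; at the top of the GC */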
* RE: dst cache overflow 2.2.x; x>=16
From: Julian Anastasov @ 2002-04-15 22:53 UTC
To: Milam, Chad; +Cc: netdev, Julian Anastasov <ja@ssi.bg>
Hello,
On Mon, 15 Apr 2002, Milam, Chad wrote:
> > can only give you some CPU cycles. Touching max_size should be
> > enough.
>
> No. Increasing max_size only delays its death. That is my point. The problem
> existed on out-of-the-box RH7, RH6.2, and RH6.1 installs. The whole point of the
> patch was to fix a problem that existed _prior_ to my patching it.
I assume you mean a plain kernel (without Check Point?).
> > Are you sure you have stale entries? What does /proc/slabinfo show
> > after 22 hours (skbuff_head_cache, etc.)?
>
> Well, what I can tell you is this: if I run a loop like the following, counter
> will only show, say, 50 routes in the cache.
It means there are 50 linked cache entries but the
ipv4_dst_ops.entries reaches the limit, very strange.
> > for i in down up ; do ip link set ethXXX $i ; done
>
> Downing all interfaces and re-upping them does not seem to solve the problem either :S
I mean, when you see the dst cache overflow message, can the
command help? But ... maybe you are running a kernel patched with your
changes. I'm asking this because I know of cases where wrong changes
can cause problems with the dst cache. But the plain kernel should be
fine. One more question: can you say whether this box is used only as
a router, or what kind of TCP or UDP connections you have (to/from the
box)? There can be some corner cases in the dst cache usage from
connected sockets.
> thanks,
> chad
Regards
--
Julian Anastasov <ja@ssi.bg>