netdev.vger.kernel.org archive mirror
* dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-14 17:51 UTC
  To: netdev

First, I apologise for the cross-post.  I sent this to the lkml,
and several people said I should post it here as well.....

I have been looking into a problem on a couple of linux routers that 
I have.  All of them are used for routing between the private network 
and the Internet. Some run Check Point VPN-1 v4.1 SP5, some do not.  
The problem is that after about 22 hours, they all hit "dst cache 
overflow", which is quite easily traced back to net/ipv4/route.c's 
rt_garbage_collect().  rt_garbage_collect() appears to be called or 
initiated from two places: 1) the dst_ops structure, 2) rt_intern_hash().

Based on my testing, here is what I have come up with... dst_ops.entries 
is not being maintained appropriately (or at least it is not what I would 
expect it to be).  I determined this by changing dst_ops->gc to point to a 
new function (rt_display_tot) for debugging, and by having rt_intern_hash() 
call this function instead of rt_garbage_collect().

The sole purpose of this function was to loop through all hash chains and 
count up the entries that should be valid, do a printk("%d %d", count, 
dst_ops.entries), then return(rt_garbage_collect()).  The result was that 
when rt_garbage_collect() returned 1 (dst cache overflow), the number of 
entries reported by dst_ops.entries was far different from the number 
reported by my loop counter.
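
A sketch of what such a debug hook looks like, reconstructed from the 
description above (illustrative; not the exact code that was run):

------
/* Count what is actually linked into the hash chains, compare it with
 * the dst_ops.entries bookkeeping, then fall through to the normal
 * garbage collector. */
static int rt_display_tot(void)
{
	struct rtable *rth;
	int i, count = 0;

	start_bh_atomic();
	for (i = 0; i < RT_HASH_DIVISOR; i++)
		for (rth = rt_hash_table[i]; rth; rth = rth->u.rt_next)
			count++;
	end_bh_atomic();

	printk("%d %d\n", count, atomic_read(&ipv4_dst_ops.entries));
	return rt_garbage_collect();
}
------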

Upon further investigation into rt_free(), __dst_free(), and dst_destroy(), 
I found that only dst_destroy() decrements dst_ops.entries.
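
For context, the free path defers that decrement.  rt_free() goes through 
dst_free(), which only calls dst_destroy() immediately when the entry is no 
longer referenced; otherwise the entry is parked and stays counted.  Roughly 
(a from-memory sketch of the 2.2 net/core code, not verbatim source):

------
static __inline__ void dst_free(struct dst_entry * dst)
{
	if (dst->obsolete > 1)
		return;			/* already queued for deferred free */
	if (!atomic_read(&dst->use)) {
		dst_destroy(dst);	/* the only place entries drops */
		return;
	}
	__dst_free(dst);		/* still referenced: parked on a
					   garbage list, still counted in
					   entries until a timer destroys it */
}
------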

Furthermore, when dst_ops->gc returns 1, dst_alloc() will not create an 
entry (appropriately so), and the box is at a standstill.
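
The allocation side, again as a from-memory sketch of 2.2 net/core/dst.c 
(trimmed, not verbatim), shows why an inflated counter wedges the box:

------
void * dst_alloc(int size, struct dst_ops * ops)
{
	struct dst_entry * dst;

	if (ops->gc && atomic_read(&ops->entries) > ops->gc_thresh) {
		if (ops->gc())		/* gc returning 1 == "overflow" */
			return NULL;	/* no new routes: the standstill */
	}
	dst = kmalloc(size, GFP_ATOMIC);
	if (!dst)
		return NULL;
	memset(dst, 0, size);
	dst->ops = ops;
	atomic_inc(&ops->entries);	/* the matching increment */
	return dst;
}
------

Once the counter is stuck above max_size, ops->gc() fails on every call, so 
every allocation past gc_thresh returns NULL.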

So... For the interim, I have created a small patch that purges the dst cache 
table and resets dst_ops.entries to 0 any time rt_garbage_collect() returns 1.
The result... The box stays up, and hums along quite happily.

I would appreciate any comments with regards to this matter.  I have also 
included a copy of the patch that I created to work around the issue.

Thanks,
Chad

diff -urNp linux-2.2.16/net/ipv4/route.c linux-2.2.16cwm/net/ipv4/route.c
--- linux-2.2.16/net/ipv4/route.c	Tue Jan  4 13:12:26 2000
+++ linux-2.2.16cwm/net/ipv4/route.c	Tue Apr  9 15:14:12 2002
@@ -96,7 +96,7 @@
 
 #define IP_MAX_MTU	0xFFF0
 
-#define RT_GC_TIMEOUT (300*HZ)
+#define RT_GC_TIMEOUT (120*HZ)
 
 int ip_rt_min_delay = 2*HZ;
 int ip_rt_max_delay = 10*HZ;
@@ -134,7 +134,8 @@ static struct dst_entry * ipv4_dst_rerou
 static struct dst_entry * ipv4_negative_advice(struct dst_entry *);
 static void		  ipv4_link_failure(struct sk_buff *skb);
 static int rt_garbage_collect(void);
-
+static int rt_delete_now(void);
+static int rt_garbage_ctl(void);
 
 struct dst_ops ipv4_dst_ops =
 {
@@ -142,7 +143,7 @@ struct dst_ops ipv4_dst_ops =
 	__constant_htons(ETH_P_IP),
 	RT_HASH_DIVISOR,
 
-	rt_garbage_collect,
+	rt_garbage_ctl,
 	ipv4_dst_check,
 	ipv4_dst_reroute,
 	NULL,
@@ -508,8 +509,7 @@ static int rt_garbage_collect(void)
 
 	if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
 		return 0;
-	if (net_ratelimit())
-		printk("dst cache overflow\n");
+
 	return 1;
 
 work_done:
@@ -570,7 +570,7 @@ restart:
 				int saved_int = ip_rt_gc_min_interval;
 				ip_rt_gc_elasticity = 1;
 				ip_rt_gc_min_interval = 0;
-				rt_garbage_collect();
+				rt_garbage_ctl();
 				ip_rt_gc_min_interval = saved_int;
 				ip_rt_gc_elasticity = saved_elasticity;
 				goto restart;
@@ -2045,4 +2045,44 @@ __initfunc(void ip_rt_init(void))
 	ent->read_proc = ip_rt_acct_read;
 #endif
 #endif
+}
+
+static int rt_delete_now(void){
+	struct rtable *rth, **rthp;
+	int i,ent1,ent2,c;
+
+	i=0;
+	ent1=0;
+	ent2=0;
+	c=0;
+
+	ent1=atomic_read(&ipv4_dst_ops.entries);
+	start_bh_atomic();
+	while(i<RT_HASH_DIVISOR){
+		rthp=&rt_hash_table[i];
+		while((rth=*rthp)!=NULL){
+			*rthp=rth->u.rt_next;
+			rth->u.rt_next=NULL;
+			c+=1;
+			rt_free(rth);
+		}
+		i++;
+	}
+
+	atomic_set(&ipv4_dst_ops.entries,0);
+	end_bh_atomic();
+	ent2=atomic_read(&ipv4_dst_ops.entries);
+
+	if(net_ratelimit()){
+		printk("dst cache overflow\n");
+		printk("rt_delete_now(); s:%d e:%d t:%d\n",ent1,ent2,c);
+	}
+
+	return 0;
+}
+
+static int rt_garbage_ctl(void){
+	if(rt_garbage_collect())
+		rt_delete_now();
+	return 0;
 }


* Re: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 19:39 UTC
  To: Milam, Chad; +Cc: netdev


Hi,

Why couldn't you just modify the parameters in /proc?

-Increase the overflow threshold (you actually hardcode this)
/proc/sys/net/ipv4/route/gc_thresh
-decrease the gc timer
/proc/sys/net/ipv4/route/gc_timeout

cheers,
jamal


* Re: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 19:43 UTC
  To: Milam, Chad; +Cc: netdev



On Sun, 14 Apr 2002, jamal wrote:

>
> Hi,
>
> Why couldn't you just modify the parameters in /proc?
>
> -Increase the overflow threshold (you actually hardcode this)
> /proc/sys/net/ipv4/route/gc_thresh
> -decrease the gc timer
> /proc/sys/net/ipv4/route/gc_timeout

Sorry, you hardcoded /proc/sys/net/ipv4/route/gc_timeout
to 120 secs.

cheers,
jamal


* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-14 19:54 UTC
  To: netdev

I lowered the timeout to make gc more aggressive.  Though, it can still
be adjusted via a /proc entry.  Default was 300.  Increasing the other
parameters that you specified (which I have done) only delays the 
inevitable "dst cache overflow".  The problem is that gc (rather,
rt_free) is not decrementing .entries.  So it _thinks_ the table
has overflowed.

chad


>  -----Original Message-----
> From: 	owner-netdev@oss.sgi.com@YRINC   On Behalf Of jamal <hadi@cyberus.ca>
> Sent:	Sunday, April 14, 2002 3:44 PM
> To:	Milam, Chad
> Cc:	netdev@oss.sgi.com
> Subject:	Re: dst cache overflow 2.2.x; x>=16
> 
> 
> 
> On Sun, 14 Apr 2002, jamal wrote:
> 
> >
> > Hi,
> >
> > Why couldn't you just modify the parameters in /proc?
> >
> > -Increase the overflow threshold (you actually hardcode this)
> > /proc/sys/net/ipv4/route/gc_thresh
> > -decrease the gc timer
> > /proc/sys/net/ipv4/route/gc_timeout
> 
> Sorry, you hardcoded /proc/sys/net/ipv4/route/gc_timeout
> to 120 secs.
> 
> cheers,
> jamal
> 


* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 20:04 UTC
  To: Milam, Chad; +Cc: netdev



On Sun, 14 Apr 2002, Milam, Chad wrote:

> I lowered the timeout to make gc more aggressive.  Though, it can still
> be adjusted via a /proc entry.  Default was 300.  Increasing the other
> parameters that you specified (which I have done) only delays the
> inevitable "dst cache overflow".  The problem is that gc (rather,
> rt_free) is not decrementing .entries.  So it _thinks_ the table
> has overflowed.
>

Overflow will only happen if /proc/sys/net/ipv4/route/gc_thresh
is exceeded. A default of 512 ain't that big. What is the average number
of entries you are seeing?
What kind of data do you get from running rtstat?
Increase /proc/sys/net/ipv4/route/gc_thresh to a higher
number matching your avg entries.

Garbage collection ain't that cheap: so it is safer to just make the size
larger instead of invoking it more frequently -- RAM is cheap. Note also
that garbage collection will run every
/proc/sys/net/ipv4/route/gc_min_interval time expiry regardless of how
big your max threshold is.

cheers,
jamal


* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-14 20:25 UTC
  To: netdev

> On Sun, 14 Apr 2002, Milam, Chad wrote:
> 
> > I lowered the timeout to make gc more agressive.  Though, it can still
> > be adjusted via a /proc entry.  Default was 300. Increasing the other
> > parameters that you specified (which I have done) only delays the
> > inevitable "dst cache overflow".  The problem is that gc (rather
> > rt_free) is not decrementing .entries.  So it _thinks_ the table
> > has overflown.
> >
> 
> Overflow will only happen if /proc/sys/net/ipv4/route/gc_thresh
> is exceeded. A default of 512 ain't that big. What is the average number
> of entries you are seeing?
> What kind of data do you get from running rtstat?
> Increase /proc/sys/net/ipv4/route/gc_thresh to a higher
> number matching your avg entries.
> 
> Garbage collection ain't that cheap: so it is safer to just make the size
> larger instead of invoking it more frequently -- RAM is cheap. Note also
> that garbage collection will run every
> /proc/sys/net/ipv4/route/gc_min_interval time expiry regardless of how
> big your max threshold is.
> 
> cheers,
> jamal

Actually, changing the timer to 120*HZ was not supposed to end up in the patch; 
I pulled it out of there, but managed to leave it in the diff for the email.

/proc/sys/net/ipv4/route/gc_min_interval = 1
/proc/sys/net/ipv4/route/max_size=16384 (default 4096)

Also, based on the code from route.c:
--
if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
        return 0;
if (net_ratelimit())
	printk("dst cache overflow\n");
return 1;
--

Looks to me like gc_thresh has nothing to do with it. Did I read that wrong?

chad

 


* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 20:32 UTC
  To: Milam, Chad; +Cc: netdev



On Sun, 14 Apr 2002, Milam, Chad wrote:

> /proc/sys/net/ipv4/route/max_size=16384 (default 4096)

OK, I meant:
/proc/sys/net/ipv4/route/gc_thresh

> Also, based on the code from route.c:
> --
> if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
>         return 0;
> if (net_ratelimit())
> 	printk("dst cache overflow\n");
> return 1;
> --
>

0 typically means the gc was successful.

> Looks to me like gc_thresh has nothing to do with it. Did I read that wrong?
>

I just remembered an old problem that used to cause this; do you
have lo configured?
Make sure it has IP address 127.0.0.1 and is up.

cheers,
jamal


* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 20:38 UTC
  To: Milam, Chad; +Cc: netdev



On Sun, 14 Apr 2002, jamal wrote:

> > /proc/sys/net/ipv4/route/max_size=16384 (default 4096)
>
> OK, I meant:
> /proc/sys/net/ipv4/route/gc_thresh
>

Sorry, you are right, the correct value is written/read from
/proc/sys/net/ipv4/route/max_size

cheers,
jamal


* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-14 20:44 UTC
  To: netdev, hadi

> On Sun, 14 Apr 2002, Milam, Chad wrote:
> 
> > /proc/sys/net/ipv4/route/max_size=16384 (default 4096)
> 
> OK, I meant:
> /proc/sys/net/ipv4/route/gc_thresh

It also looks like gc_thresh defaults to RT_HASH_DIVISOR (256).

However, I have...
/proc/sys/net/ipv4/route/gc_thresh=2048

> > Also, based on the code from route.c:
> > --
> > if (atomic_read(&ipv4_dst_ops.entries) < ip_rt_max_size)
> >         return 0;
> > if (net_ratelimit())
> > 	printk("dst cache overflow\n");
> > return 1;
> > --
> >
> 
> 0 typically means the gc was successful.

True enough.  But the way I read that is that if entries > max_size,
you are going to get "dst cache overflow", which returns 1.  There
is no test here for entries > gc_thresh.

> I just remembered an old problem that used to cause this; do you
> have lo configured?
> Make sure it has IP address 127.0.0.1 and is up.

Sure do.  I read about that issue quite some time ago.

thanks again,
chad


* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-14 20:53 UTC
  To: Milam, Chad; +Cc: netdev




If I summarize: your problem is that you are building up
dst caches faster than they can be garbage collected.

Solution:
1. Make the max size large enough to catch up with the rate.
2. Make sure that every time you go into garbage collection you are
successful.
- Reducing the min interval to 1 might be a little aggressive,
but you can tune this later.
- You wanna make sure you get a large positive "goal" every time;
play with ip_rt_gc_elasticity (/proc/sys/net/ipv4/route/gc_elasticity)
and also rt_hash_log (see the sketch below).

All the above are configurable via /proc.
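
(For reference, the "goal" mentioned above is derived roughly like this; 
a from-memory sketch, 2.4-flavoured, not verbatim source:)

------
goal = atomic_read(&ipv4_dst_ops.entries) -
       (ip_rt_gc_elasticity << rt_hash_log);
/* A bigger gc_elasticity (or a bigger hash, via rt_hash_log) shrinks
 * the goal, so each GC pass frees fewer entries; smaller values make
 * each pass more aggressive. */
------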

have to run

cheers,
jamal


* RE: dst cache overflow 2.2.x; x>=16
From: Robert Olsson @ 2002-04-14 21:38 UTC
  To: Milam, Chad; +Cc: jamal, netdev


jamal writes:
 > 
 > 
 > 
 > If I summarize: your problem is that you are building up
 > dst caches faster than they can be garbage collected.
 > 
 > Solution:
 > 1. Make the max size large enough to catch up with the rate.
 > 2. Make sure that every time you go into garbage collection you are
 > successful.
 > - Reducing the min interval to 1 might be a little aggressive,
 > but you can tune this later.
 > - You wanna make sure you get a large positive "goal" every time;
 > play with ip_rt_gc_elasticity (/proc/sys/net/ipv4/route/gc_elasticity)
 > and also rt_hash_log (see the sketch below).
 > 
 > All the above are configurable via /proc.
 > 
 > have to run

 And in 2.4.x the GC is done more dynamically, around an "equilibrium point".
 Alexey warned about the 2.2 code...


 Snapshot from a Linux router running 2.4.10:
 
 cat /proc/sys/net/ipv4/route/max_size 
 65536


 rtstat
 size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc
 9861     24721     131     0     1     0     0     0         2       1      0
10119     25044     128     0     0     0     0     0         2       0      0
 2514     24125    1293     2     0     0     0     0         1       2      0
 3654     24315     591     2     1     1     0     0         0       2      1
 4441     25170     387     0     2     0     0     0         1       3      0
 5060     25000     304     2     1     0     0     0         0       2      0
 5532     25627     230     2     0     0     0     0         0       2      0
 5947     25754     242     2     0     0     0     0         1       3      0
 6379     25602     211     0     1     0     0     0         2       3      0
 6371     25523     235     0     0     0     0     0         1       1      0
 6752     24251     187     1     0     0     0     0         0       1      0
 7077     25310     160     0     0     0     0     0         1       1      0
 6851     24608     222     2     1     0     0     0         1       3      0
 7256     25313     199     1     0     0     0     0         1       2      0
 7086     24656     174     0     0     0     0     0         0       1      0
 7459     24070     180     3     1     0     0     0         1       2      0
 2434     23844    1340     7     1     0     0     0         1       3      0


 1st: ipv4_dst_ops.entries (you can see GC happen).
 2nd: warm cache hits -> approx aggregated packets/sec.
 3rd: cache misses    -> approx connections/sec.


 Cheers.

						--ro


* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-15 15:21 UTC
  To: netdev

Robert Olsson writes:
> jamal writes:
>  >
>  > If I summarize: your problem is that you are building up
>  > dst caches faster than they can be garbage collected.
>  >
>  > Solution:
>  > 1. Make the max size large enough to catch up with the rate.
>  > 2. Make sure that every time you go into garbage collection you are
>  > successful.
>  > - Reducing the min interval to 1 might be a little aggressive,
>  > but you can tune this later.
>  > - You wanna make sure you get a large positive "goal" every time;
>  > play with ip_rt_gc_elasticity (/proc/sys/net/ipv4/route/gc_elasticity)
>  > and also rt_hash_log (see the sketch below).
>  >
>  > All the above are configurable via /proc.
>  >
>  > have to run
> 
>  And in 2.4.x the GC is done more dynamically, around an "equilibrium point".
>  Alexey warned about the 2.2 code...
> 
>  Snapshot from a Linux router running 2.4.10:
> 
>  cat /proc/sys/net/ipv4/route/max_size
>  65536
> 
>  rtstat
>  size   IN: hit     tot    mc no_rt bcast madst masrc  OUT: hit     tot     mc
> 
>  1st: ipv4_dst_ops.entries (you can see GC happen).
>  2nd: warm cache hits -> approx aggregated packets/sec.
>  3rd: cache misses    -> approx connections/sec.

Unfortunately I cannot move my Check Point boxes to 2.4.x yet.  Maybe this will come 
in the future.

At the end of the day, my patch was not trying to avoid GC, or eliminate it.  It was just 
there to keep the box from going completely dead... and it accomplishes exactly that.  I 
have run it now for about a week, and the box has not had to be restarted, whereas without
it, I would have restarted once per day.  Even with route/max_size set to 65536,
I had to restart about every two weeks.

I am also not suggesting that GC does not work; it does (for the most part).  What I am 
trying to say is that there is a condition (still working on that bit) that keeps the 
.entries counter from decreasing to what it should be.  Something, some process, is 
leaking routes (at least into the counter).  This is where my problem is.  And the patch
works around that by setting the cache to zero and starting over, which, again, is better
than restarting.

Thanks again,
chad


* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-15 17:53 UTC
  To: Milam, Chad; +Cc: netdev



On Mon, 15 Apr 2002, Milam, Chad wrote:

> At the end of the day, my patch was not trying to avoid GC, or eliminate
> it.  It was just
> there to keep the box from going completely dead...

I don't think your patch guarantees this (you are nuking routes that
may still be actively used) -- I think you may have been lucky so far.
Regardless, this seems to be an interesting case of fixing what appears
to be an application bug with a kernel patch. It's amazing what you can do
when you have the source.

cheers,
jamal

PS:- the fact that you are running 2.2 is useful information that
you left out.


* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-15 18:10 UTC
  To: netdev

jamal writes: 
> On Mon, 15 Apr 2002, Milam, Chad wrote:
> 
> > At the end of the day, my patch was not trying to avoid GC, or eliminate
> > it.  It was just
> > there to keep the box from going completely dead...
> 
> I don't think your patch guarantees this (you are nuking routes that
> may still be actively used) -- I think you may have been lucky so far.
> Regardless, this seems to be an interesting case of fixing what appears
> to be an application bug with a kernel patch. It's amazing what you can do
> when you have the source.
> 
> cheers,
> jamal
> 
> PS:- the fact that you are running 2.2 is useful information that
> you left out.
> 

The fact that I am using 2.2 is stated in the subject line.  I did neglect to 
state it explicitly in the message (sorry).  It is, however, also in the 
diff/patch file.

I also do not think that nuking valid routes in the cache will produce any
major issues, other than slowing things down for a few seconds.  The cache
is just the cache, not the real route table.  And yes, it pretty much 
guarantees the route cache will be purged, therefore avoiding a reboot and
avoiding a quickly repeated overflow...

And yes, having the source makes things much easier. :)

chad


* RE: dst cache overflow 2.2.x; x>=16
From: jamal @ 2002-04-15 18:27 UTC
  To: Milam, Chad; +Cc: netdev



On Mon, 15 Apr 2002, Milam, Chad wrote:

> The fact that I am using 2.2 is stated in the subject line.  I did neglect to
> state it explicitly in the message (sorry).  It is, however, also in the
> diff/patch file.

I apologize. I spent all my time rambling to you based on 2.4 code ;-<

>
> I also do not think that nuking valid routes in the cache will produce any
> major issues, other than slowing things down for a few seconds.  the cache
> is just the cache, not the real route table. and yes, it pretty much
> guarantees the route cache will be purged, therefore avoiding a reboot and
> avoiding a quickly repeated overflow...
>

Typically most of the code will check for the dst cache or do some
dereferencing within it before using it. I am not sure we can
swear by this ;-> I suppose we will find out when you get an oops ;->
Maybe you should just purge the routes marked as expired.
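
A sketch of what that gentler variant could look like, using the same
locking as the purge in the original patch (hypothetical and untested):

------
/* Hypothetical, untested variant of rt_delete_now() that only drops
 * unreferenced entries that have outlived the GC timeout, instead of
 * nuking the whole cache. */
static void rt_delete_expired(void)
{
	struct rtable *rth, **rthp;
	int i;

	start_bh_atomic();
	for (i = 0; i < RT_HASH_DIVISOR; i++) {
		rthp = &rt_hash_table[i];
		while ((rth = *rthp) != NULL) {
			if (!atomic_read(&rth->u.dst.use) &&
			    (long)(jiffies - rth->u.dst.lastuse) > ip_rt_gc_timeout) {
				*rthp = rth->u.rt_next;
				rth->u.rt_next = NULL;
				rt_free(rth);
				continue;
			}
			rthp = &rth->u.rt_next;
		}
	}
	end_bh_atomic();
}
------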

cheers,
jamal


* RE: dst cache overflow 2.2.x; x>=16
From: Julian Anastasov @ 2002-04-15 18:31 UTC
  To: Milam, Chad; +Cc: netdev


	Hello,

On Mon, 15 Apr 2002, Milam, Chad wrote:

> I also do not think that nuking valid routes in the cache will produce any
> major issues, other than slowing things down for a few seconds.  the cache
> is just the cache, not the real route table. and yes, it pretty much

	Of course. You can play only with max_size to achieve the same
result. max_size should be appropriate to the rate at which new hosts appear
in the cache. I'm wondering whether your patched kernel does not have
some bug, for example, unfreed skbs or struct rtables. Make sure that
the unpatched kernels have the same bug. If it appears after 22
hours (I assume the system load for all these 22 hours is the same)
then this is a bug. Playing with the hash size is a final step, but it
can only give you some CPU cycles. Touching max_size should be
enough.

> guarantees the route cache will be purged, therefore avoiding a reboot and
> avoiding a quickly repeated overflow...

	Are you sure you have stale entries? What does /proc/slabinfo
show after 22 hours (skbuff_head_cache, etc.)?

	One hint: can this command solve the problem (to flush the
cache entries)?:

for i in down up ; do ip link set ethXXX $i ; done

> chad

Regards

--
Julian Anastasov <ja@ssi.bg>


* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-15 19:07 UTC
  To: netdev, Julian Anastasov <ja@ssi.bg>

Julian writes: 
> 	Hello,
> 
> On Mon, 15 Apr 2002, Milam, Chad wrote:
> 
> > I also do not think that nuking valid routes in the cache will produce any
> > major issues, other than slowing things down for a few seconds.  the cache
> > is just the cache, not the real route table. and yes, it pretty much
> 
> 	Of course. You can play only with max_size to achieve the same
> result. max_size should be appropriate to the rate at which new hosts appear
> in the cache. I'm wondering whether your patched kernel does not have
> some bug, for example, unfreed skbs or struct rtables. Make sure that
> the unpatched kernels have the same bug. If it appears after 22
> hours (I assume the system load for all these 22 hours is the same)
> then this is a bug. Playing with the hash size is a final step, but it
> can only give you some CPU cycles. Touching max_size should be
> enough.

No.  Increasing max_size only delays its death; that is my point.  The problem
existed on out-of-the-box RH7, RH6.2, and RH6.1 installs.  The whole point of the
patch was to fix a problem that existed _prior_ to me patching it.


> > guarantees the route cache will be purged, therefore avoiding a reboot and
> > avoiding a quickly repeated overflow...
> 
> 	Are you sure you have stalled entries? What shows /proc/slabinfo
> after 22 hours (skbuff_head_cache, etc)?

Well, what I can tell you is this: if I run a loop like the following, counter 
will only show, say, 50 routes in the cache.
------
struct rtable *rth, **rthp;
int i, start, counter;

start=atomic_read(&ipv4_dst_ops.entries);
i=0;
counter=0;
while(i<RT_HASH_DIVISOR){
	rthp=&rt_hash_table[i];
	while((rth=*rthp)!=NULL){
		rthp=&rth->u.rt_next;	/* walk the chain; do not unlink */
		counter+=1;
	}
	i++;
}
printk("before: %d, after: %d\n", start, counter);
------ 
> 	One hint: can this command solve the problem (to flush the
> cache entries)?:
> 
> for i in down up ; do ip link set ethXXX $i ; done

downing all interfaces and reupping them does not seem to solve the problem either :S

thanks,
chad


* Re: dst cache overflow 2.2.x; x>=16
From: Andi Kleen @ 2002-04-15 19:53 UTC
  To: Julian Anastasov; +Cc: Milam, Chad, netdev, Julian Anastasov <ja@ssi.bg>

> 	I mean, when you see the dst cache overflow message, can the
> command help? But ... maybe you are running a kernel patched with your
> changes. I'm asking this because I know cases where wrong changes
> can cause problems with the dst cache. But the plain kernel should be
> fine. One more question: can you say whether this box is used only as
> a router, or what kind of TCP or UDP connections you have (to/from the
> box)? There can be some corner cases in the dst cache usage from
> connected sockets.

I would suspect CheckPoint (I think it has kernel modules, hasn't it?).
We had a similar report of such a thing a few months ago, and they were
using CheckPoint too.
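
The accounting would fit that theory: any code that takes a dst reference
and never drops it keeps dst_destroy() from ever running on that entry, so
ipv4_dst_ops.entries never comes back down.  A hypothetical leak pattern,
purely illustrative:

------
/* Hypothetical third-party-module leak -- illustrative only.  The
 * reference taken here pins the route: dst_free() will park it on the
 * garbage list instead of destroying it, and ipv4_dst_ops.entries is
 * never decremented. */
struct dst_entry *saved = dst_clone(skb->dst);	/* takes a reference */
/* ... some error path returns without the matching dst_release(saved) ... */
------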

-Andi


* RE: dst cache overflow 2.2.x; x>=16
From: Milam, Chad @ 2002-04-15 19:58 UTC
  To: netdev, Julian Anastasov <ja@ssi.bg>


Julian wrote:
> On Mon, 15 Apr 2002, Milam, Chad wrote:
> 
> > > can only give you some CPU cycles. Touching max_size should be
> > > enough.
> >
> > No.  Increasing max_size only delays its death; that is my point.  The problem
> > existed on out-of-the-box RH7, RH6.2, and RH6.1 installs.  The whole point of the
> > patch was to fix a problem that existed _prior_ to me patching it.
> 
> 	I assume you mean a plain kernel (without Check Point?).

Indeed... out of the box, nothing additional installed, IP params tuned.

> 
> > > 	Are you sure you have stale entries? What does /proc/slabinfo
> > > show after 22 hours (skbuff_head_cache, etc.)?
> >
> > Well, what I can tell you is this: if I run a loop like the following, counter
> > will only show, say, 50 routes in the cache.
> 
> 	It means there are 50 linked cache entries but the
> ipv4_dst_ops.entries reaches the limit, very strange.

This is what I keep trying to say :)

 
> > > for i in down up ; do ip link set ethXXX $i ; done
> >
> > downing all interfaces and reupping them does not seem to solve the problem either :S
> 
> 	I mean, when you see the dst cache overflow message, can the
> command help? But ... maybe you are running a kernel patched with your
> changes. I'm asking this because I know cases where wrong changes
> can cause problems with the dst cache. But the plain kernel should be
> fine. One more question: can you say whether this box is used only as
> a router, or what kind of TCP or UDP connections you have (to/from the
> box)? There can be some corner cases in the dst cache usage from
> connected sockets.

I originally set up a script to do basically this... grep "dst cache" 
/var/log/messages; if I get a hit, down and up all interfaces.  That didn't
work, so then I changed it to reboot the box :(.  This was done on a box
with _no_ funny stuff... again, bog standard.

The box is a router, no ip masq, no ip chains, no ip fw, just a router.

Thanks,
chad


* RE: dst cache overflow 2.2.x; x>=16
From: Robert Olsson @ 2002-04-15 21:47 UTC
  To: Milam, Chad; +Cc: netdev, Julian Anastasov <ja@ssi.bg>

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 455 bytes --]


Milam, Chad writes:
 > 
 > The box is a router, no ip masq, no ip chains, no ip fw, just a router.

 Weird. Julian has a useful program, testlvs, for testing the route
 cache. I just tested this with linux-2.2.17 and many srcnum (80000) but 
 cannot force a cache overflow. OK, it was not for hours. And 2.2.x 
 has recent cache code; I was wrong here. I have used it for 
 routers for pretty demanding jobs.

 I would look for growth in /proc/slabinfo too...

 

[-- Attachment #2: rt_cache_stat-2.2.17.pat --]
[-- Type: application/octet-stream, Size: 5850 bytes --]

--- linux/include/net/route.h.orig	Sun Nov  5 22:18:35 2000
+++ linux/include/net/route.h	Mon Apr 15 22:19:18 2002
@@ -14,6 +14,7 @@
  *		Alan Cox	:	Support for TCP parameters.
  *		Alexey Kuznetsov:	Major changes for new routing code.
  *		Mike McLagan    :	Routing by source
+ *		Robert Olsson   :	Added rt_cache statistics
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -102,6 +103,20 @@
 	__u32 	o_packets;
 	__u32 	i_bytes;
 	__u32 	i_packets;
+};
+
+struct rt_cache_stat 
+{
+        unsigned in_hit;
+        unsigned in_slow_tot;
+        unsigned in_slow_mc;
+        unsigned in_no_route;
+        unsigned in_brd;
+        unsigned in_martian_dst;
+        unsigned in_martian_src;
+        unsigned out_hit;
+        unsigned out_slow_tot;
+        unsigned out_slow_mc;
 };
 
 extern struct ip_rt_acct ip_rt_acct[256];
--- linux/net/ipv4/route.c.orig	Tue Jan  4 19:12:26 2000
+++ linux/net/ipv4/route.c	Mon Apr 15 23:55:37 2002
@@ -52,6 +52,7 @@
  *	Tobias Ringstrom	:	Uninitialized res.type in ip_route_output_slow.
  *	Vladimir V. Ivanov	:	IP rule info (flowid) is really useful.
  *		Marc Boucher	:	routing by fwmark
+ *	Robert Olsson		:	Added rt_cache statistics
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -176,6 +177,8 @@
 
 struct rtable 	*rt_hash_table[RT_HASH_DIVISOR];
 
+struct rt_cache_stat rt_cache_stat[NR_CPUS];
+
 static int rt_intern_hash(unsigned hash, struct rtable * rth, struct rtable ** res);
 
 static __inline__ unsigned rt_hash_code(u32 daddr, u32 saddr, u8 tos)
@@ -357,6 +360,44 @@
 	}
 	end_bh_atomic();
 }
+
+
+#ifdef CONFIG_PROC_FS
+static int rt_cache_stat_get_info(char *buffer, char **start, off_t offset, int length)
+{
+	int i, lcpu;
+        int len=0;
+	unsigned int dst_entries = atomic_read(&ipv4_dst_ops.entries);
+
+        for (lcpu=0; lcpu<smp_num_cpus; lcpu++) {
+                i = cpu_logical_map(lcpu);
+
+		len += sprintf(buffer+len, "%08x  %08x %08x %08x %08x %08x %08x %08x  %08x %08x %08x\n",
+			       dst_entries,		       
+			       rt_cache_stat[i].in_hit,
+			       rt_cache_stat[i].in_slow_tot,
+			       rt_cache_stat[i].in_slow_mc,
+			       rt_cache_stat[i].in_no_route,
+			       rt_cache_stat[i].in_brd,
+			       rt_cache_stat[i].in_martian_dst,
+			       rt_cache_stat[i].in_martian_src,
+
+			       rt_cache_stat[i].out_hit,
+			       rt_cache_stat[i].out_slow_tot,
+			       rt_cache_stat[i].out_slow_mc
+			);
+	}
+	len -= offset;
+
+	if (len > length)
+		len = length;
+	if (len < 0)
+		len = 0;
+
+	*start = buffer + offset;
+  	return len;
+}
+#endif
   
 void rt_cache_flush(int delay)
 {
@@ -1027,6 +1068,7 @@
 	struct in_device *in_dev = dev->ip_ptr;
 	u32 itag = 0;
 
+
 	/* Primary sanity checks. */
 
 	if (MULTICAST(saddr) || BADCLASS(saddr) || LOOPBACK(saddr) ||
@@ -1078,6 +1120,7 @@
 #ifdef CONFIG_IP_MROUTE
 	if (!LOCAL_MCAST(daddr) && IN_DEV_MFORWARD(in_dev))
 		rth->u.dst.input = ip_mr_input;
+ 	rt_cache_stat[smp_processor_id()].in_slow_mc++;
 #endif
 
 	hash = rt_hash_code(daddr, saddr^(dev->ifindex<<5), tos);
@@ -1155,6 +1198,8 @@
 		goto no_route;
 	}
 
+	rt_cache_stat[smp_processor_id()].in_slow_tot++;
+
 #ifdef CONFIG_IP_ROUTE_NAT
 	/* Policy is applied before mapping destination,
 	   but rerouting after map should be made with old source.
@@ -1287,6 +1332,7 @@
 	}
 	flags |= RTCF_BROADCAST;
 	res.type = RTN_BROADCAST;
+	rt_cache_stat[smp_processor_id()].in_brd++;
 
 local_input:
 	rth = dst_alloc(sizeof(struct rtable), &ipv4_dst_ops);
@@ -1328,6 +1374,7 @@
 	return rt_intern_hash(hash, rth, (struct rtable**)&skb->dst);
 
 no_route:
+	rt_cache_stat[smp_processor_id()].in_no_route++;
 	spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE);
 	res.type = RTN_UNREACHABLE;
 	goto local_input;
@@ -1336,6 +1383,7 @@
 	 *	Do not cache martian addresses: they should be logged (RFC1812)
 	 */
 martian_destination:
+	rt_cache_stat[smp_processor_id()].in_martian_dst++;
 #ifdef CONFIG_IP_ROUTE_VERBOSE
 	if (IN_DEV_LOG_MARTIANS(in_dev) && net_ratelimit())
 		printk(KERN_WARNING "martian destination %08x from %08x, dev %s\n", daddr, saddr, dev->name);
@@ -1343,6 +1391,8 @@
 	return -EINVAL;
 
 martian_source:
+
+	rt_cache_stat[smp_processor_id()].in_martian_src++;
 #ifdef CONFIG_IP_ROUTE_VERBOSE
 	if (IN_DEV_LOG_MARTIANS(in_dev) && net_ratelimit()) {
 		/*
@@ -1384,6 +1434,7 @@
 		    rth->key.tos == tos) {
 			rth->u.dst.lastuse = jiffies;
 			atomic_inc(&rth->u.dst.use);
+ 			rt_cache_stat[smp_processor_id()].in_hit++;
 			atomic_inc(&rth->u.dst.refcnt);
 			skb->dst = (struct dst_entry*)rth;
 			return 0;
@@ -1634,14 +1685,18 @@
 
 	rth->u.dst.output=ip_output;
 
+	rt_cache_stat[smp_processor_id()].out_slow_tot++;
+
 	if (flags&RTCF_LOCAL) {
 		rth->u.dst.input = ip_local_deliver;
 		rth->rt_spec_dst = key.dst;
 	}
 	if (flags&(RTCF_BROADCAST|RTCF_MULTICAST)) {
 		rth->rt_spec_dst = key.src;
-		if (flags&RTCF_LOCAL && !(dev_out->flags&IFF_LOOPBACK))
+		if (flags&RTCF_LOCAL && !(dev_out->flags&IFF_LOOPBACK)) {
 			rth->u.dst.output = ip_mc_output;
+			rt_cache_stat[smp_processor_id()].out_slow_mc++;
+		}
 #ifdef CONFIG_IP_MROUTE
 		if (res.type == RTN_MULTICAST && dev_out->ip_ptr) {
 			struct in_device *in_dev = dev_out->ip_ptr;
@@ -1683,6 +1738,7 @@
 		) {
 			rth->u.dst.lastuse = jiffies;
 			atomic_inc(&rth->u.dst.use);
+			rt_cache_stat[smp_processor_id()].out_hit++;
 			atomic_inc(&rth->u.dst.refcnt);
 			end_bh_atomic();
 			*rp = rth;
@@ -2041,6 +2097,8 @@
 		rt_cache_get_info
 	});
 #ifdef CONFIG_NET_CLS_ROUTE
+ 	ent = create_proc_entry ("net/rt_cache_stat", 0, 0);
+	ent->read_proc = rt_cache_stat_get_info;
 	ent = create_proc_entry("net/rt_acct", 0, 0);
 	ent->read_proc = ip_rt_acct_read;
 #endif

[-- Attachment #3: message body text --]
[-- Type: text/plain, Size: 193 bytes --]


 With the patch you can monitor ipv4_dst_ops.entries with rtstat, and 
 testlvs is a good exerciser.

 Cheers.

					--ro

 BTW, I think rtstat could also hold some stats about the GC process.


* RE: dst cache overflow 2.2.x; x>=16
From: Julian Anastasov @ 2002-04-15 22:53 UTC
  To: Milam, Chad; +Cc: netdev, Julian Anastasov <ja@ssi.bg>


	Hello,

On Mon, 15 Apr 2002, Milam, Chad wrote:

> > can only give you some CPU cycles. Touching max_size should be
> > enough.
>
> No.  Increasing max_size only delays its death; that is my point.  The problem
> existed on out-of-the-box RH7, RH6.2, and RH6.1 installs.  The whole point of the
> patch was to fix a problem that existed _prior_ to me patching it.

	I assume you mean a plain kernel (without Check Point?).

> > 	Are you sure you have stale entries? What does /proc/slabinfo
> > show after 22 hours (skbuff_head_cache, etc.)?
>
> > Well, what I can tell you is this: if I run a loop like the following, counter
> > will only show, say, 50 routes in the cache.

	It means there are 50 linked cache entries but
ipv4_dst_ops.entries reaches the limit; very strange.
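
	One place such entries can hide is the deferred-free list in
net/core/dst.c: entries unlinked from the hash but still referenced sit
there, invisible to a hash-chain walk, yet still counted in ops->entries
until the GC timer finally destroys them.  Roughly (a from-memory sketch,
not verbatim source):

------
static struct dst_entry * dst_garbage_list;

static void dst_run_gc(unsigned long dummy)
{
	struct dst_entry *dst, **dstp;

	dstp = &dst_garbage_list;
	while ((dst = *dstp) != NULL) {
		if (atomic_read(&dst->use)) {
			dstp = &dst->next;	/* still referenced: wait */
			continue;
		}
		*dstp = dst->next;
		dst_destroy(dst);		/* only here does entries drop */
	}
	/* (the real code re-arms the timer while the list is non-empty) */
}
------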

> > for i in down up ; do ip link set ethXXX $i ; done
>
> downing all interfaces and reupping them does not seem to solve the problem either :S

	I mean, when you see the dst cache overflow message, can the
command help? But ... maybe you are running a kernel patched with your
changes. I'm asking this because I know cases where wrong changes
can cause problems with the dst cache. But the plain kernel should be
fine. One more question: can you say whether this box is used only as
a router, or what kind of TCP or UDP connections you have (to/from the
box)? There can be some corner cases in the dst cache usage from
connected sockets.

> thanks,
> chad

Regards

--
Julian Anastasov <ja@ssi.bg>

