* [PATCH] repairing rtcache killer
From: kuznet @ 2003-08-05 13:40 UTC
To: davem, Robert.Olsson, netdev
Hello!
Alexey
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1613 -> 1.1614
# net/ipv4/route.c 1.66 -> 1.67
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/08/05 kuznet@mops.inr.ac.ru 1.1614
# route.c:
# [IPV4] Repair calculation of rtcache entries score
#
# Two serious and interesting mistakes were made in the patch of 2003-06-16.
# 1. Variance of hash chain lengths turned out to be unexpectedly high, so
#    truncating chains at length <= ip_rt_gc_elasticity results in strong
#    growth of cache misses. Set the threshold to 2*ip_rt_gc_elasticity.
#    And continue to think about how to switch to a mode, for the case when
#    lots of cache entries are used only once or twice, where truncation
#    should be done at 1.
# 2. The rt_score() selection function, based on the use count, resulted in
#    killing fresh new entries. Actually, it is clear once minimal brain
#    effort is applied. :-) So, switch to scoring by last-used time, which
#    should give real LRU behaviour.
# --------------------------------------------
#
diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c Tue Aug 5 17:37:41 2003
+++ b/net/ipv4/route.c Tue Aug 5 17:37:41 2003
@@ -463,7 +463,9 @@
  */
 static inline u32 rt_score(struct rtable *rt)
 {
-	u32 score = rt->u.dst.__use;
+	u32 score = jiffies - rt->u.dst.lastuse;
+
+	score = ~score & ~(3<<30);
 	if (rt_valuable(rt))
 		score |= (1<<31);
@@ -807,8 +809,7 @@
 	 * The second limit is less certain. At the moment it allows
 	 * only 2 entries per bucket. We will see.
 	 */
-	if (chain_length > ip_rt_gc_elasticity ||
-	    (chain_length > 1 && !(min_score & (1<<31)))) {
+	if (chain_length > 2*ip_rt_gc_elasticity) {
 		*candp = cand->u.rt_next;
 		rt_free(cand);
 	}
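To see what the new scoring does outside the kernel, here is a minimal
standalone sketch (simplified: age stands in for jiffies - lastuse, and
only the bit-31 "valuable" flag is modelled, not the kernel code itself):

	#include <stdio.h>
	#include <stdint.h>

	/* Standalone model of the new rt_score(): larger age gives a
	 * smaller score, so the minimum-score entry in a chain is the
	 * LRU victim. The top two bits are masked out of the age so
	 * that flag bits always outrank it; here only bit 31
	 * ("valuable") is modelled. */
	static uint32_t score(uint32_t age, int valuable)
	{
		uint32_t s = ~age & ~(3u << 30);

		if (valuable)
			s |= 1u << 31;
		return s;
	}

	int main(void)
	{
		/* An old plain entry scores below a fresh one, so it is
		 * evicted first; a valuable entry outranks both,
		 * regardless of age. */
		printf("old:      %u\n", score(1000, 0));
		printf("fresh:    %u\n", score(10, 0));
		printf("valuable: %u\n", score(1000, 1));
		return 0;
	}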
* [PATCH] repairing rtcache killer
From: Robert Olsson @ 2003-08-05 17:08 UTC
To: kuznet; +Cc: davem, Robert.Olsson, netdev
kuznet@ms2.inr.ac.ru writes:
> # Two serious and interesting mistakes were made in the patch of 2003-06-16.
> # 1. Variance of hash chain lengths turned out to be unexpectedly high, so
> #    truncating chains at length <= ip_rt_gc_elasticity results in strong
> #    growth of cache misses. Set the threshold to 2*ip_rt_gc_elasticity.
> #    And continue to think about how to switch to a mode, for the case when
> #    lots of cache entries are used only once or twice, where truncation
> #    should be done at 1.
Hello!
I'd guess the setting was very much affected by the DoS attack discussion,
which indicated very different flow lengths compared to our actual
measurement for Uppsala University, which showed 65 packets per new DST
entry. Probably due to the "new" applications and lots of students.

For autotuning I think we can get help from the ratio of warm cache hits
(in_hit) to misses (in_slow_tot) to set the threshold for trimming hash
chain lengths.
Cheers.
--ro
* Re: [PATCH] repairing rtcache killer
From: David S. Miller @ 2003-08-06 7:52 UTC
To: Robert Olsson; +Cc: kuznet, Robert.Olsson, netdev
On Tue, 5 Aug 2003 19:08:23 +0200
Robert Olsson <Robert.Olsson@data.slu.se> wrote:
>
> kuznet@ms2.inr.ac.ru writes:
> > # Two serious and interesting mistakes were made in the patch of 2003-06-16.
Mamma mia! This patch exists in 2.4.22-preX too, so a full fix
becomes more urgent.

> For autotuning I think we can get help from the ratio of warm cache
> hits (in_hit) to misses (in_slow_tot) to set the threshold for
> trimming hash chain lengths.

Yes, I agree, and the algorithm doesn't even need to be very smart;
something like the following.

Before the scan loop, we compute:
	in_hit = in_slow_tot = 0;
	for (i = 0; i < NR_CPUS; i++) {
		if (!cpu_possible(i))
			continue;
		in_hit += per_cpu_ptr(rt_cache_stat, i)->in_hit;
		in_slow_tot += per_cpu_ptr(rt_cache_stat, i)->in_slow_tot;
	}

	aggressive = 0;
	if (in_hit < (in_slow_tot >> 2))
		aggressive = 1;

	thresh = ip_rt_gc_elasticity;
	if (!aggressive)
		thresh <<= 1;
Then the purging test becomes:
	if (chain_length > thresh ||
	    (aggressive &&
	     chain_length > 1 &&
	     !(min_score & (1<<31)))) {
		*candp = cand->u.rt_next;
		rt_free(cand);
	}
To make the algorithm cheaper, we could even use only the current
CPU's rt_cache_stat when deciding whether to enter aggressive mode
or not.
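A rough sketch of that cheaper variant (a hypothetical helper, not a
tested patch, assuming the same per-cpu statistics as the loop above):

	/* Hypothetical sketch: decide aggressiveness from the stats of
	 * the CPU doing the insert, instead of summing over all
	 * possible CPUs as in the full loop above. */
	static int rt_gc_aggressive(void)
	{
		struct rt_cache_stat *st =
			per_cpu_ptr(rt_cache_stat, smp_processor_id());

		/* Aggressive when warm hits fall below a quarter of
		 * the misses, as in the all-CPUs version. */
		return st->in_hit < (st->in_slow_tot >> 2);
	}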
Alexey, given all this what would you like to do? Should I push
your patch urgently into 2.4.x or spend some more time trying to
solve this issue?
* [PATCH] repairing rtcache killer
From: Robert Olsson @ 2003-08-06 17:01 UTC
To: kuznet; +Cc: davem, Robert.Olsson, netdev
kuznet@ms2.inr.ac.ru writes:
> # This is a BitKeeper generated patch for the following project:
> # Project Name: Linux kernel tree
> # This patch format is intended for GNU patch command version 2.5 or higher.
> # This patch includes the following deltas:
> # ChangeSet 1.1613 -> 1.1614
> # net/ipv4/route.c 1.66 -> 1.67
Hello!
Crap. We are back to the dst cache overflow again, even with the routing
table loaded. The test is now on SMP and 2.6.0-test1. Undo the min_score
test and give it a retry? Or is there some new RCU issue to discover?
That is the current lab setup.
Cheers.
--ro
* Re: [PATCH] repairing rtcache killer
From: kuznet @ 2003-08-06 17:14 UTC
To: Robert Olsson; +Cc: davem, Robert.Olsson, netdev
Hello!
> For autotuning I think we can get help from the ratio of warm cache
> hits (in_hit) to misses (in_slow_tot) to set the threshold for
> trimming hash chain lengths.
Have you already forgotten the whole day we lost staring at some
impossible numbers for the miss count? :-)

The only conclusion you can draw from the hits/misses ratio is to
_increase_ the size of the cache when the number of misses grows. That is
not only useless, it is exactly the opposite of the behaviour which you
want to see.

We want to shrink it under DoS, remember? I still do not know a good
criterion; apparently it should be based not on the hits/misses ratio but
on absolute rates or something like that. average/max = 1/2 is always
acceptable: perfect at normal flow rates and not disastrous even at
1 packet/flow.
Alexey
* Re: [PATCH] repairing rtcache killer
From: Robert Olsson @ 2003-08-06 17:23 UTC
To: David S. Miller; +Cc: Robert Olsson, kuznet, netdev
David S. Miller writes:
> To make the algorithm cheaper, we could even use only the current
> CPU's rt_cache_stat when deciding whether to enter aggressive mode
> or not.

We get into complex scenarios... we risk that not all CPUs see the
"attacks", yet they still contribute to the hash chains and to the
spinning. As an alternative, we could run the chain length calculation
at intervals.

One starts dreaming of a route hash per CPU to avoid cache bouncing in
the hash lookup. :-) Well, not all cache bouncing will disappear
(read: DoS), but nowadays there is "more" affinity on input, due to
NAPI, the scheduler, etc.
Somewhat reworked code:
--- include/net/route.h.orig	2003-07-14 05:28:54.000000000 +0200
+++ include/net/route.h	2003-08-06 18:27:52.000000000 +0200
@@ -89,7 +89,9 @@
 struct rt_cache_stat
 {
 	unsigned int in_hit;
+	unsigned int in_hit_last;
 	unsigned int in_slow_tot;
+	unsigned int in_slow_tot_last;
 	unsigned int in_slow_mc;
 	unsigned int in_no_route;
 	unsigned int in_brd;
--- net/ipv4/route.c.ank-repair-hash	2003-08-06 16:27:19.000000000 +0200
+++ net/ipv4/route.c	2003-08-06 17:53:43.000000000 +0200
@@ -118,6 +118,8 @@
 int ip_rt_error_cost = HZ;
 int ip_rt_error_burst = 5 * HZ;
 int ip_rt_gc_elasticity = 8;
+int ip_rt_gc_elasticity2 = 2 * 8;
+int ip_rt_gc_elasticity2_recalc = 0;
 int ip_rt_mtu_expires = 10 * 60 * HZ;
 int ip_rt_min_pmtu = 512 + 20 + 20;
 int ip_rt_min_advmss = 256;
@@ -747,6 +749,40 @@
 	int chain_length;
 	int attempts = !in_softirq();
+
+	if (!(ip_rt_gc_elasticity2_recalc++ % 200)) {
+		unsigned in_hit = 0, in_slow_tot = 0;
+		int i;
+
+		for (i = 0; i < NR_CPUS; i++) {
+			if (!cpu_possible(i))
+				continue;
+
+			in_hit += per_cpu_ptr(rt_cache_stat, i)->in_hit -
+				per_cpu_ptr(rt_cache_stat, i)->in_hit_last;
+
+			per_cpu_ptr(rt_cache_stat, i)->in_hit_last =
+				per_cpu_ptr(rt_cache_stat, i)->in_hit;
+
+			in_slow_tot += per_cpu_ptr(rt_cache_stat, i)->in_slow_tot -
+				per_cpu_ptr(rt_cache_stat, i)->in_slow_tot_last;
+
+			per_cpu_ptr(rt_cache_stat, i)->in_slow_tot_last =
+				per_cpu_ptr(rt_cache_stat, i)->in_slow_tot;
+		}
+
+		if (in_hit < in_slow_tot) {
+			/* Aggressive */
+			if (ip_rt_gc_elasticity2 > 1)
+				ip_rt_gc_elasticity2 >>= 1;
+		} else if (ip_rt_gc_elasticity2 < 2*ip_rt_gc_elasticity) {
+			ip_rt_gc_elasticity2 <<= 1;
+		}
+	}
+
 restart:
 	chain_length = 0;
 	min_score = ~(u32)0;
@@ -801,13 +837,10 @@
 	}
 
 	if (cand) {
-		/* ip_rt_gc_elasticity used to be average length of chain
-		 * length, when exceeded gc becomes really aggressive.
-		 *
-		 * The second limit is less certain. At the moment it allows
-		 * only 2 entries per bucket. We will see.
+		/* ip_rt_gc_elasticity2 used to limit length of chain
+		 * when exceeded gc becomes really aggressive.
 		 */
-		if (chain_length > 2*ip_rt_gc_elasticity) {
+		if (chain_length > ip_rt_gc_elasticity2) {
 			*candp = cand->u.rt_next;
 			rt_free(cand);
 		}
Cheers.
--ro
* Re: [PATCH] repairing rtcache killer
From: kuznet @ 2003-08-06 17:58 UTC
To: Robert Olsson; +Cc: davem, Robert.Olsson, netdev
Hello!
> Crap. We are back to the dst cache overflow again, even with the routing
> table loaded. The test is now on SMP and 2.6.0-test1. Undo the min_score
> test and give it a retry?
No. This patch is not related to the RCU problem at all. It just
sanitizes the craziness of the balancing introduced by chain truncation;
that's why the subject is different. :-)

I think you did not apply the patch which is responsible for repairing
the RCU troubles.
Alexey
* Re: [PATCH] repairing rtcache killer
From: kuznet @ 2003-08-06 18:06 UTC
To: Robert Olsson; +Cc: davem, Robert.Olsson, netdev
Hello!
> +		if (in_hit < in_slow_tot) {
> +			/* Aggressive */
> +			if (ip_rt_gc_elasticity2 > 1)
> +				ip_rt_gc_elasticity2 >>= 1;
> +		} else if (ip_rt_gc_elasticity2 < 2*ip_rt_gc_elasticity) {
> +			ip_rt_gc_elasticity2 <<= 1;
> +		}
It is a system with positive feedback. Reducing the chain length
increases the number of misses, and so on. Under normal load it has
only one stable state, zero chain length, and it will never leave it.

The hits/misses ratio is the wrong feedback, unless you use it to
increase the chain length. :-)
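To illustrate, a toy userspace model of that collapse; the hit ratio
here is a crude stand-in, assumed proportional to the allowed chain
length, nothing kernel-specific:

	#include <stdio.h>

	int main(void)
	{
		int len = 16;	/* allowed chain length, e.g. 2*elasticity */
		int i;

		for (i = 0; i < 6; i++) {
			/* Assume hits scale with capacity: 40% at len=16,
			 * i.e. the cache starts out slightly "too small". */
			double hits = 100.0 * len / 40;
			double misses = 100.0 - hits;

			printf("len=%2d hits=%4.1f%% misses=%4.1f%%\n",
			       len, hits, misses);
			/* The halving branch: fewer hits -> shorter chains
			 * -> even fewer hits. len collapses to 1 and the
			 * doubling branch can never fire again. */
			if (hits < misses && len > 1)
				len >>= 1;
		}
		return 0;
	}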
Alexey
* Re: [PATCH] repairing rtcache killer
From: Robert Olsson @ 2003-08-06 18:20 UTC
To: kuznet; +Cc: Robert Olsson, davem, netdev
kuznet@ms2.inr.ac.ru writes:
> I think you did not apply the patch which is responsible for repairing
> the RCU troubles.
>
> Alexey
No, correct, the RCU patches are not applied. But even before the RCU
patches we didn't see the dst cache overflow during DoS, as long as the
routing table was fully loaded and we used the hash chain limit patch.
Cheers.
--ro
* Re: [PATCH] repairing rtcache killer
From: kuznet @ 2003-08-06 18:34 UTC
To: Robert Olsson; +Cc: Robert.Olsson, davem, netdev
Hello!
> No, correct, the RCU patches are not applied. But even before the RCU
> patches we didn't see the dst cache overflow during DoS, as long as the
> routing table was fully loaded and we used the hash chain limit patch.
Sure. :-) Robert, didn't we discover a week ago that the reason for the
RCU stalls is rt_run_flush(), which runs only when routes change? :-)

By the way, to refresh your memory: months ago there was another reason
for overflows. It was fixed by setting a sane value for
ip_rt_gc_min_interval. RCU only surfaced after that.
Alexey
* Re: [PATCH] repairing rtcache killer
From: Robert Olsson @ 2003-08-06 18:50 UTC
To: kuznet; +Cc: Robert Olsson, davem, netdev
kuznet@ms2.inr.ac.ru writes:
> It is a system with positive feedback. Reducing the chain length
> increases the number of misses, and so on. Under normal load it has
> only one stable state, zero chain length, and it will never leave it.
>
> The hits/misses ratio is the wrong feedback, unless you use it to
> increase the chain length. :-)
>
Well, it was not the intention to find any optimum or equilibrium point;
the idea was just to get a different and more aggressive setting for
pure DoS attacks, to start with.
Something like:
	if (in_hit < in_slow_tot)
		ip_rt_gc_elasticity2 = 1;
	else
		ip_rt_gc_elasticity2 = 2*ip_rt_gc_elasticity;
would have been cleaner but maybe it's not worth it.
Cheers.
--ro
* Re: [PATCH] repairing rtcache killer
From: kuznet @ 2003-08-06 19:01 UTC
To: Robert Olsson; +Cc: Robert.Olsson, davem, netdev
Hello!
>	if (in_hit < in_slow_tot)
>		ip_rt_gc_elasticity2 = 1;
>	else
>		ip_rt_gc_elasticity2 = 2*ip_rt_gc_elasticity;
>
> would have been cleaner
It is _much_ cleaner.
> but maybe it's not worth it.
Well, it is wrong above all. With the default settings for the number of
flows in pktgen, you will always get 1, regardless of flow length.
No hits, simply because the cache is too small. See? That is the problem.
Alexey
* Re: [PATCH] repairing rtcache killer
From: kuznet @ 2003-08-06 21:23 UTC
To: David S. Miller; +Cc: Robert.Olsson, kuznet, netdev
Hello!
> > > # Two serious and interesting mistakes were made in the patch of 2003-06-16.
>
> Mamma mia! This patch exists in 2.4.22-preX too, so a full fix
> becomes more urgent.
I exaggerated by saying "serious". The emphasis is rather on
"interesting". :-) The mistakes were not evident; Robert and I spent a
day and a half figuring out what the hell was going on. :-) It shows up
only at high flow rates, and it is just a suboptimal thing, not a
disaster.
> Alexey, given all this what would you like to do? Should I push
> your patch urgently into 2.4.x or spend some more time trying to
> solve this issue?
An hour ago I would have said "yes". But this bus trip happened
to be productive. :-) It seems I know how to do the right thing.
I will code it now, and if Robert is happy...

Robert, look, the idea is:

1. Periodically we reset elasticity2 to 2*elasticity, e.g. from the
   periodic gc timer.

2. We measure hits and misses with higher frequency, e.g. from
   forced gc. The measurements are suppressed for some time
   after each flush, while the cache collects fresh new entries.

E.g.:

	if (misses > rt_hash_mask+1 && hits < misses)
		elasticity2 = 0;
	else
		elasticity2 = 2*elasticity;

misses > rt_hash_mask+1 guarantees that the cache is populated and probed
enough. rt_hash_mask+1 is not a random number; it corresponds to the
maximal size with elasticity2 = 0.

Seems it should work. And it is simple enough.
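A compilable userspace sketch of the scheme, just to make the moving
parts concrete; the suppression window and all the numbers are
placeholders, not the eventual patch:

	#include <stdio.h>

	static unsigned elasticity = 8;
	static unsigned elasticity2 = 16;
	static unsigned rt_hash_mask = 1023;	/* placeholder size - 1 */
	static unsigned long last_flush;	/* time of last cache flush */

	/* 1. From the periodic gc timer: fall back to the relaxed
	 * threshold. */
	static void periodic_reset(void)
	{
		elasticity2 = 2 * elasticity;
	}

	/* 2. From forced gc: re-evaluate, unless a flush happened
	 * recently and the cache is still collecting fresh entries. */
	static void measure(unsigned hits, unsigned misses,
			    unsigned long now)
	{
		if (now - last_flush < 10)	/* window, arbitrary */
			return;
		/* misses > rt_hash_mask+1: the cache has been populated
		 * and probed at least once at its elasticity2 == 0
		 * maximal size. */
		if (misses > rt_hash_mask + 1 && hits < misses)
			elasticity2 = 0;	/* looks like DoS: truncate */
		else
			elasticity2 = 2 * elasticity;
	}

	int main(void)
	{
		periodic_reset();
		measure(100, 5000, 100);	/* mostly misses: aggressive */
		printf("elasticity2 = %u\n", elasticity2);
		measure(5000, 100, 200);	/* mostly hits: relaxed */
		printf("elasticity2 = %u\n", elasticity2);
		return 0;
	}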
Alexey
* Re: [PATCH] repairing rtcache killer
From: Robert Olsson @ 2003-08-06 22:57 UTC
To: kuznet; +Cc: David S. Miller, Robert.Olsson, netdev
kuznet@ms2.inr.ac.ru writes:
> Robert, look, the idea is:
>
> 1. Periodically we reset elasticity2 to 2*elasticity, e.g. from the
>    periodic gc timer.

This solves the "positive" feedback. Actually the code I tested moved
from elasticity2=1 to elasticity*2 as well, but this seems more reliable.

> 2. We measure hits and misses with higher frequency, e.g. from
>    forced gc. The measurements are suppressed for some time
>    after each flush, while the cache collects fresh new entries.

And the measurement (forced gc) should not be inhibited by any
elasticity2 setting.

>	if (misses > rt_hash_mask+1 && hits < misses)
>		elasticity2 = 0;
>	else
>		elasticity2 = 2*elasticity;

(hits < misses)

A delicate balancing point, but actually it didn't look too bad in the
lab setup.

> misses > rt_hash_mask+1 guarantees that the cache is populated and probed
> enough. rt_hash_mask+1 is not a random number; it corresponds to the
> maximal size with elasticity2 = 0.

Yes, better.

> Seems it should work. And it is simple enough.
Let's try... ;-)
Cheers.
--ro