* Re: Kernel-DOS error in arp mechanism – no delete off incomplete arp adresses
[not found] <4EEC5286.3070408@gmx.de>
@ 2011-12-17 13:26 ` richard -rw- weinberger
2011-12-21 7:44 ` Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: richard -rw- weinberger @ 2011-12-17 13:26 UTC (permalink / raw)
To: Robert Gladewitz; +Cc: linux-kernel, netdev
On Sat, Dec 17, 2011 at 9:27 AM, Robert Gladewitz <gladewitz@gmx.de> wrote:
> Hello,
>
> First, I apologize for my poor English. I will do my best to describe the
> error.
>
> I use Linux routers as internal and external firewall components. For these
> I build my own kernel configurations and enable only the drivers and modules
> I need. All other features and modules are disabled in my kernels.
>
> Since kernel version 2.6.36 there has been a bug in the IPv4 ARP
> implementation. When the system tries to reach an unknown host, it sends a
> “who-has” request and marks the IP address as “incomplete” (German:
> unvollständig). Normally Linux deletes incomplete and complete entries
> after some time, but kernel versions since 2.6.36 do not delete any of
> these addresses.
>
> In my case, I scan my network segments for new devices (Kaspersky, Landesk),
> and during this process the router learns a lot of incomplete addresses. I
> have some class B networks (for historical reasons), which means the router
> learns more than 2^16 addresses.
>
> The kernel learns only up to a maximum number of addresses. I know this is
> defined by gc_thresh1, gc_thresh2 and gc_thresh3 in the proc filesystem under
> /proc/sys/net/ipv4/neigh/default. Once the table holds the maximum number of
> addresses (default=1024), no new host can send traffic through this router.
> This means we have a classic DoS risk. In my case the risk is only
> internal, but some providers may also face an external risk.
>
> I hope my description helps you find this error. I am also sending my kernel
> config; maybe there is some relation to minimal kernel configurations.
>
> Best regards,
>
> Robert Gladewitz
>
CC'ing netdev.
--
Thanks,
//richard
* Re: Kernel-DOS error in arp mechanism – no delete off incomplete arp adresses
2011-12-17 13:26 ` Kernel-DOS error in arp mechanism – no delete off incomplete arp adresses richard -rw- weinberger
@ 2011-12-21 7:44 ` Eric Dumazet
2011-12-21 8:07 ` Kernel-DOS error in arp mechanism – " David Miller
0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2011-12-21 7:44 UTC (permalink / raw)
To: richard -rw- weinberger, David Miller
Cc: Robert Gladewitz, linux-kernel, netdev
On Saturday, 17 December 2011 at 14:26 +0100, richard -rw- weinberger
wrote:
> On Sat, Dec 17, 2011 at 9:27 AM, Robert Gladewitz <gladewitz@gmx.de> wrote:
> > Hello,
> >
> > First, I apologize for my poor English. I will do my best to describe the
> > error.
> >
> > I use Linux routers as internal and external firewall components. For these
> > I build my own kernel configurations and enable only the drivers and modules
> > I need. All other features and modules are disabled in my kernels.
> >
> > Since kernel version 2.6.36 there has been a bug in the IPv4 ARP
> > implementation. When the system tries to reach an unknown host, it sends a
> > “who-has” request and marks the IP address as “incomplete” (German:
> > unvollständig). Normally Linux deletes incomplete and complete entries
> > after some time, but kernel versions since 2.6.36 do not delete any of
> > these addresses.
> >
> > In my case, I scan my network segments for new devices (Kaspersky, Landesk),
> > and during this process the router learns a lot of incomplete addresses. I
> > have some class B networks (for historical reasons), which means the router
> > learns more than 2^16 addresses.
> >
> > The kernel learns only up to a maximum number of addresses. I know this is
> > defined by gc_thresh1, gc_thresh2 and gc_thresh3 in the proc filesystem under
> > /proc/sys/net/ipv4/neigh/default. Once the table holds the maximum number of
> > addresses (default=1024), no new host can send traffic through this router.
> > This means we have a classic DoS risk. In my case the risk is only
> > internal, but some providers may also face an external risk.
> >
> > I hope my description helps you find this error. I am also sending my kernel
> > config; maybe there is some relation to minimal kernel configurations.
> >
> > Best regards,
> >
> > Robert Gladewitz
> >
>
> CC'ing netdev.
>
Hmm, it seems we removed the IP route garbage collection too soon.
In the old days we had a mechanism (rt_check_expire()) to remove old
route cache entries, but not anymore, so some slots in the IP route
cache can keep referencing neighbour entries indefinitely:
8.0.0.72 dev eth3 ref 2 used 1038/1098/1035 probes 6 FAILED
8.0.0.205 dev eth3 ref 2 used 639/699/636 probes 6 FAILED
8.0.0.82 dev eth3 ref 2 used 1008/1068/1005 probes 6 FAILED
8.0.0.215 dev eth3 ref 2 used 609/669/606 probes 6 FAILED
In 2.6.39, commit 2c8cec5c10bc (ipv4: Cache learned PMTU information in
inetpeer.) removed rt_check_expire() and introduced this problem.
David, I suggest we add the garbage collector back for current kernels
and remove it once the route cache really disappears?
I'll send a patch today.
* Re: Kernel-DOS error in arp mechanism – no delete off incomplete arp adresses
2011-12-21 7:44 ` Eric Dumazet
@ 2011-12-21 8:07 ` David Miller
2011-12-21 9:51 ` Kernel-DOS error in arp mechanism – " Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2011-12-21 8:07 UTC (permalink / raw)
To: eric.dumazet; +Cc: richard.weinberger, gladewitz, linux-kernel, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Dec 2011 08:44:27 +0100
> David, I suggest we add the garbage collector back for current kernels
> and remove it once the route cache really disappears?
>
> I'll send a patch today.
Yes, it's the best idea.
We can actually remove it again as soon as route neighs
are ref-less.
* Re: Kernel-DOS error in arp mechanism – no delete off incomplete arp adresses
2011-12-21 8:07 ` Kernel-DOS error in arp mechanism – " David Miller
@ 2011-12-21 9:51 ` Eric Dumazet
2011-12-21 20:47 ` Kernel-DOS error in arp mechanism – " David Miller
0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2011-12-21 9:51 UTC (permalink / raw)
To: David Miller; +Cc: richard.weinberger, gladewitz, linux-kernel, netdev
On Wednesday, 21 December 2011 at 03:07 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 21 Dec 2011 08:44:27 +0100
>
> > David, I suggest we add the garbage collector back for current kernels
> > and remove it once the route cache really disappears?
> >
> > I'll send a patch today.
>
> Yes, it's the best idea.
>
> We can actually remove it again as soon as route neighs
> are ref-less.
Here is the patch I successfully tested in the neighbour stress
situation. This is a stable candidate (2.6.39+)
Thanks !
[PATCH] ipv4: reintroduce route cache garbage collector
Commit 2c8cec5c10b (ipv4: Cache learned PMTU information in inetpeer)
removed IP route cache garbage collector a bit too soon, as this gc was
responsible for expired routes cleanup, releasing their neighbour
reference.
As pointed out by Robert Gladewitz, recent kernels can fill and exhaust
their neighbour cache.
Reintroduce the garbage collection, since we'll have to wait until our
neighbour lookups become refcount-less to stop depending on this stuff.
Reported-by: Robert Gladewitz <gladewitz@gmx.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv4/route.c | 107 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 107 insertions(+)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 46af623..252c512 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -120,6 +120,7 @@
 static int ip_rt_max_size;
 static int ip_rt_gc_timeout __read_mostly = RT_GC_TIMEOUT;
+static int ip_rt_gc_interval __read_mostly = 60 * HZ;
 static int ip_rt_gc_min_interval __read_mostly = HZ / 2;
 static int ip_rt_redirect_number __read_mostly = 9;
 static int ip_rt_redirect_load __read_mostly = HZ / 50;
@@ -133,6 +134,9 @@ static int ip_rt_min_advmss __read_mostly = 256;
 static int rt_chain_length_max __read_mostly = 20;
 static int redirect_genid;

+static struct delayed_work expires_work;
+static unsigned long expires_ljiffies;
+
 /*
  * Interface to generic destination cache.
  */
@@ -830,6 +834,97 @@ static int has_noalias(const struct rtable *head, const struct rtable *rth)
 	return ONE;
 }

+static void rt_check_expire(void)
+{
+	static unsigned int rover;
+	unsigned int i = rover, goal;
+	struct rtable *rth;
+	struct rtable __rcu **rthp;
+	unsigned long samples = 0;
+	unsigned long sum = 0, sum2 = 0;
+	unsigned long delta;
+	u64 mult;
+
+	delta = jiffies - expires_ljiffies;
+	expires_ljiffies = jiffies;
+	mult = ((u64)delta) << rt_hash_log;
+	if (ip_rt_gc_timeout > 1)
+		do_div(mult, ip_rt_gc_timeout);
+	goal = (unsigned int)mult;
+	if (goal > rt_hash_mask)
+		goal = rt_hash_mask + 1;
+	for (; goal > 0; goal--) {
+		unsigned long tmo = ip_rt_gc_timeout;
+		unsigned long length;
+
+		i = (i + 1) & rt_hash_mask;
+		rthp = &rt_hash_table[i].chain;
+
+		if (need_resched())
+			cond_resched();
+
+		samples++;
+
+		if (rcu_dereference_raw(*rthp) == NULL)
+			continue;
+		length = 0;
+		spin_lock_bh(rt_hash_lock_addr(i));
+		while ((rth = rcu_dereference_protected(*rthp,
+					lockdep_is_held(rt_hash_lock_addr(i)))) != NULL) {
+			prefetch(rth->dst.rt_next);
+			if (rt_is_expired(rth)) {
+				*rthp = rth->dst.rt_next;
+				rt_free(rth);
+				continue;
+			}
+			if (rth->dst.expires) {
+				/* Entry is expired even if it is in use */
+				if (time_before_eq(jiffies, rth->dst.expires)) {
+nofree:
+					tmo >>= 1;
+					rthp = &rth->dst.rt_next;
+					/*
+					 * We only count entries on
+					 * a chain with equal hash inputs once
+					 * so that entries for different QOS
+					 * levels, and other non-hash input
+					 * attributes don't unfairly skew
+					 * the length computation
+					 */
+					length += has_noalias(rt_hash_table[i].chain, rth);
+					continue;
+				}
+			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout))
+				goto nofree;
+
+			/* Cleanup aged off entries. */
+			*rthp = rth->dst.rt_next;
+			rt_free(rth);
+		}
+		spin_unlock_bh(rt_hash_lock_addr(i));
+		sum += length;
+		sum2 += length*length;
+	}
+	if (samples) {
+		unsigned long avg = sum / samples;
+		unsigned long sd = int_sqrt(sum2 / samples - avg*avg);
+		rt_chain_length_max = max_t(unsigned long,
+					ip_rt_gc_elasticity,
+					(avg + 4*sd) >> FRACT_BITS);
+	}
+	rover = i;
+}
+
+/*
+ * rt_worker_func() is run in process context.
+ * we call rt_check_expire() to scan part of the hash table
+ */
+static void rt_worker_func(struct work_struct *work)
+{
+	rt_check_expire();
+	schedule_delayed_work(&expires_work, ip_rt_gc_interval);
+}
+
 /*
  * Perturbation of rt_genid by a small quantity [1..256]
  * Using 8 bits of shuffling ensure we can call rt_cache_invalidate()
@@ -3179,6 +3274,13 @@ static ctl_table ipv4_route_table[] = {
 		.proc_handler	= proc_dointvec_jiffies,
 	},
 	{
+		.procname	= "gc_interval",
+		.data		= &ip_rt_gc_interval,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_jiffies,
+	},
+	{
 		.procname	= "redirect_load",
 		.data		= &ip_rt_redirect_load,
 		.maxlen		= sizeof(int),
@@ -3388,6 +3490,11 @@ int __init ip_rt_init(void)
 	devinet_init();
 	ip_fib_init();

+	INIT_DELAYED_WORK_DEFERRABLE(&expires_work, rt_worker_func);
+	expires_ljiffies = jiffies;
+	schedule_delayed_work(&expires_work,
+		net_random() % ip_rt_gc_interval + ip_rt_gc_interval);
+
 	if (ip_rt_proc_init())
 		printk(KERN_ERR "Unable to create route proc files\n");
 #ifdef CONFIG_XFRM
* Re: Kernel-DOS error in arp mechanism – no delete off incomplete arp adresses
2011-12-21 9:51 ` Kernel-DOS error in arp mechanism – " Eric Dumazet
@ 2011-12-21 20:47 ` David Miller
0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2011-12-21 20:47 UTC (permalink / raw)
To: eric.dumazet; +Cc: richard.weinberger, gladewitz, linux-kernel, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Dec 2011 10:51:12 +0100
> [PATCH] ipv4: reintroduce route cache garbage collector
>
> Commit 2c8cec5c10b (ipv4: Cache learned PMTU information in inetpeer)
> removed IP route cache garbage collector a bit too soon, as this gc was
> responsible for expired routes cleanup, releasing their neighbour
> reference.
>
> As pointed out by Robert Gladewitz, recent kernels can fill and exhaust
> their neighbour cache.
>
> Reintroduce the garbage collection, since we'll have to wait until our
> neighbour lookups become refcount-less to stop depending on this stuff.
>
> Reported-by: Robert Gladewitz <gladewitz@gmx.de>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks Eric.