Fw: [Bug 13339] New: rtable leak in ipv4/route.c

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Fw: [Bug 13339] New: rtable leak in ipv4/route.c
@ 2009-05-19  2:35 Stephen Hemminger
  2009-05-19 12:34 ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: Stephen Hemminger @ 2009-05-19  2:35 UTC (permalink / raw)
  To: netdev

Begin forwarded message:

Date: Mon, 18 May 2009 14:10:20 GMT
From: bugzilla-daemon@bugzilla.kernel.org
To: shemminger@linux-foundation.org
Subject: [Bug 13339] New: rtable leak in ipv4/route.c

http://bugzilla.kernel.org/show_bug.cgi?id=13339

           Summary: rtable leak in ipv4/route.c
           Product: Networking
           Version: 2.5
    Kernel Version: 2.6.29
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: IPV4
        AssignedTo: shemminger@linux-foundation.org
        ReportedBy: lav@yar.ru
        Regression: Yes

2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
patch has at least one critical flaw, and another problem.

rt_intern_hash calculates rthi pointer, which is later used for new entry
insertion. The same loop calculates cand pointer which is used to clean the
list. If the pointers are the same, rtable leak occurs, as first the cand is
removed then the new entry is appended to it.

This leak leads to unregister_netdevice problem (usage count > 0).

Another problem of the patch is that it tries to insert the entries in certain
order, to facilitate counting of entries distinct by all but QoS parameters.
Unfortunately, referencing an existing rtable entry moves it to list beginning,
to speed up further lookups, so the carefully built order is destroyed.

For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
it will also destroy the ordering.

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

-- 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19  2:35 Fw: [Bug 13339] New: rtable leak in ipv4/route.c Stephen Hemminger
@ 2009-05-19 12:34 ` Jarek Poplawski
  2009-05-19 15:12   ` Neil Horman
  2009-05-19 15:32   ` Eric Dumazet
  0 siblings, 2 replies; 28+ messages in thread
From: Jarek Poplawski @ 2009-05-19 12:34 UTC (permalink / raw)
  To: lav; +Cc: Stephen Hemminger, netdev, Neil Horman

On 19-05-2009 04:35, Stephen Hemminger wrote:
> 
> Begin forwarded message:
> 
> Date: Mon, 18 May 2009 14:10:20 GMT
> From: bugzilla-daemon@bugzilla.kernel.org
> To: shemminger@linux-foundation.org
> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> 
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=13339
...
> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> patch has at least one critical flaw, and another problem.
> 
> rt_intern_hash calculates rthi pointer, which is later used for new entry
> insertion. The same loop calculates cand pointer which is used to clean the
> list. If the pointers are the same, rtable leak occurs, as first the cand is
> removed then the new entry is appended to it.
> 
> This leak leads to unregister_netdevice problem (usage count > 0).
> 
> Another problem of the patch is that it tries to insert the entries in certain
> order, to facilitate counting of entries distinct by all but QoS parameters.
> Unfortunately, referencing an existing rtable entry moves it to list beginning,
> to speed up further lookups, so the carefully built order is destroyed.
> 
> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> it will also destroy the ordering.

I think fixing this bug fast is more important than this
ordering or counting. Could you send your patch proposal?

Thanks,
Jarek P.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 12:34 ` Jarek Poplawski
@ 2009-05-19 15:12   ` Neil Horman
  2009-05-19 15:32   ` Eric Dumazet
  1 sibling, 0 replies; 28+ messages in thread
From: Neil Horman @ 2009-05-19 15:12 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 12:34:17PM +0000, Jarek Poplawski wrote:
> On 19-05-2009 04:35, Stephen Hemminger wrote:
> > 
> > Begin forwarded message:
> > 
> > Date: Mon, 18 May 2009 14:10:20 GMT
> > From: bugzilla-daemon@bugzilla.kernel.org
> > To: shemminger@linux-foundation.org
> > Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> > 
> > 
> > http://bugzilla.kernel.org/show_bug.cgi?id=13339
> ...
> > 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> > patch has at least one critical flaw, and another problem.
> > 
> > rt_intern_hash calculates rthi pointer, which is later used for new entry
> > insertion. The same loop calculates cand pointer which is used to clean the
> > list. If the pointers are the same, rtable leak occurs, as first the cand is
> > removed then the new entry is appended to it.
> > 
> > This leak leads to unregister_netdevice problem (usage count > 0).
> > 
> > Another problem of the patch is that it tries to insert the entries in certain
> > order, to facilitate counting of entries distinct by all but QoS parameters.
> > Unfortunately, referencing an existing rtable entry moves it to list beginning,
> > to speed up further lookups, so the carefully built order is destroyed.
> > 
> > For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> > it will also destroy the ordering.
> 
> I think fixing this bug fast is more important than this
> ordering or counting. Could you send your patch proposal?


I agree that the corner case in which cand and rthi is a problem (although as
you point out easily fixable), and with an alteration to your proposal it can be
done without destroying the order.

As for the reordering by reference however, I don't see any move to front
heuristic being implemented anywhere in the routing path.  Where do you see this
happening?

Neil


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 12:34 ` Jarek Poplawski
  2009-05-19 15:12   ` Neil Horman
@ 2009-05-19 15:32   ` Eric Dumazet
  2009-05-19 16:20     ` Neil Horman
                       ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Eric Dumazet @ 2009-05-19 15:32 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: lav, Stephen Hemminger, netdev, Neil Horman

Jarek Poplawski a écrit :
> On 19-05-2009 04:35, Stephen Hemminger wrote:
>> Begin forwarded message:
>>
>> Date: Mon, 18 May 2009 14:10:20 GMT
>> From: bugzilla-daemon@bugzilla.kernel.org
>> To: shemminger@linux-foundation.org
>> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
>>
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=13339
> ...
>> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
>> patch has at least one critical flaw, and another problem.
>>
>> rt_intern_hash calculates rthi pointer, which is later used for new entry
>> insertion. The same loop calculates cand pointer which is used to clean the
>> list. If the pointers are the same, rtable leak occurs, as first the cand is
>> removed then the new entry is appended to it.
>>
>> This leak leads to unregister_netdevice problem (usage count > 0).
>>
>> Another problem of the patch is that it tries to insert the entries in certain
>> order, to facilitate counting of entries distinct by all but QoS parameters.
>> Unfortunately, referencing an existing rtable entry moves it to list beginning,
>> to speed up further lookups, so the carefully built order is destroyed.

We could change rt_check_expire() to be smarter and handle any order in chains.

This would let rt_intern_hash() be simpler.

As its a more performance critical path, all would be good :)

>>
>> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
>> it will also destroy the ordering.
> 
> I think fixing this bug fast is more important than this
> ordering or counting. Could you send your patch proposal?
> 

Here is mine, only compiled, not tested yet.

All credits for Stephen for doing the analysis of course :)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c4c60e9..fbe77ad 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -780,12 +780,37 @@ static void rt_do_flush(int process_context)
 #define FRACT_BITS 3
 #define ONE (1UL << FRACT_BITS)
 
+static unsigned long compute_length(struct rtable *head)
+{
+	struct rtable *rth, *aux;
+	unsigned long length = 0;
+
+	for (rth = head; rth != NULL; rth = rth->u.dst.rt_next) {
+		/*
+		 * We ignore items having same hash inputs
+		 * so that entries for different QOS
+		 * levels, and other non-hash input
+		 * attributes don't unfairly skew
+		 * the length computation
+		 */
+		for (aux = head; ;aux = aux->u.dst.rt_next) {
+			if (aux == rth) {
+				length += ONE;
+				break;
+			}
+			if (compare_hash_inputs(&aux->fl, &rth->fl))
+				break;
+		}
+	}
+	return length;
+}
+
 static void rt_check_expire(void)
 {
 	static unsigned int rover;
 	unsigned int i = rover, goal;
 	struct rtable *rth, **rthp;
-	unsigned long length = 0, samples = 0;
+	unsigned long length, samples = 0;
 	unsigned long sum = 0, sum2 = 0;
 	u64 mult;
 
@@ -795,7 +820,6 @@ static void rt_check_expire(void)
 	goal = (unsigned int)mult;
 	if (goal > rt_hash_mask)
 		goal = rt_hash_mask + 1;
-	length = 0;
 	for (; goal > 0; goal--) {
 		unsigned long tmo = ip_rt_gc_timeout;
 
@@ -821,29 +845,11 @@ static void rt_check_expire(void)
 				if (time_before_eq(jiffies, rth->u.dst.expires)) {
 					tmo >>= 1;
 					rthp = &rth->u.dst.rt_next;
-					/*
-					 * Only bump our length if the hash
-					 * inputs on entries n and n+1 are not
-					 * the same, we only count entries on
-					 * a chain with equal hash inputs once
-					 * so that entries for different QOS
-					 * levels, and other non-hash input
-					 * attributes don't unfairly skew
-					 * the length computation
-					 */
-					if ((*rthp == NULL) ||
-					    !compare_hash_inputs(&(*rthp)->fl,
-								 &rth->fl))
-						length += ONE;
 					continue;
 				}
 			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout)) {
 				tmo >>= 1;
 				rthp = &rth->u.dst.rt_next;
-				if ((*rthp == NULL) ||
-				    !compare_hash_inputs(&(*rthp)->fl,
-							 &rth->fl))
-					length += ONE;
 				continue;
 			}
 
@@ -851,6 +857,7 @@ static void rt_check_expire(void)
 			*rthp = rth->u.dst.rt_next;
 			rt_free(rth);
 		}
+		length = compute_length(rt_hash_table[i].chain);
 		spin_unlock_bh(rt_hash_lock_addr(i));
 		sum += length;
 		sum2 += length*length;
@@ -1068,7 +1075,6 @@ out:	return 0;
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
 	struct rtable	*rth, **rthp;
-	struct rtable	*rthi;
 	unsigned long	now;
 	struct rtable *cand, **candp;
 	u32 		min_score;
@@ -1088,7 +1094,6 @@ restart:
 	}
 
 	rthp = &rt_hash_table[hash].chain;
-	rthi = NULL;
 
 	spin_lock_bh(rt_hash_lock_addr(hash));
 	while ((rth = *rthp) != NULL) {
@@ -1134,17 +1139,6 @@ restart:
 		chain_length++;
 
 		rthp = &rth->u.dst.rt_next;
-
-		/*
-		 * check to see if the next entry in the chain
-		 * contains the same hash input values as rt.  If it does
-		 * This is where we will insert into the list, instead of
-		 * at the head.  This groups entries that differ by aspects not
-		 * relvant to the hash function together, which we use to adjust
-		 * our chain length
-		 */
-		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
-			rthi = rth;
 	}
 
 	if (cand) {
@@ -1205,10 +1199,7 @@ restart:
 		}
 	}
 
-	if (rthi)
-		rt->u.dst.rt_next = rthi->u.dst.rt_next;
-	else
-		rt->u.dst.rt_next = rt_hash_table[hash].chain;
+	rt->u.dst.rt_next = rt_hash_table[hash].chain;
 
 #if RT_CACHE_DEBUG >= 2
 	if (rt->u.dst.rt_next) {
@@ -1224,10 +1215,7 @@ restart:
 	 * previous writes to rt are comitted to memory
 	 * before making rt visible to other CPUS.
 	 */
-	if (rthi)
-		rcu_assign_pointer(rthi->u.dst.rt_next, rt);
-	else
-		rcu_assign_pointer(rt_hash_table[hash].chain, rt);
+	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
 
 	spin_unlock_bh(rt_hash_lock_addr(hash));
 	*rp = rt;


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 15:32   ` Eric Dumazet
@ 2009-05-19 16:20     ` Neil Horman
  2009-05-19 18:47       ` Eric Dumazet
  2009-05-19 16:23     ` Fw: [Bug 13339] New: rtable leak in ipv4/route.c Neil Horman
  2009-05-19 17:22     ` Jarek Poplawski
  2 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-05-19 16:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jarek Poplawski, lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On 19-05-2009 04:35, Stephen Hemminger wrote:
> >> Begin forwarded message:
> >>
> >> Date: Mon, 18 May 2009 14:10:20 GMT
> >> From: bugzilla-daemon@bugzilla.kernel.org
> >> To: shemminger@linux-foundation.org
> >> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> >>
> >>
> >> http://bugzilla.kernel.org/show_bug.cgi?id=13339
> > ...
> >> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> >> patch has at least one critical flaw, and another problem.
> >>
> >> rt_intern_hash calculates rthi pointer, which is later used for new entry
> >> insertion. The same loop calculates cand pointer which is used to clean the
> >> list. If the pointers are the same, rtable leak occurs, as first the cand is
> >> removed then the new entry is appended to it.
> >>
> >> This leak leads to unregister_netdevice problem (usage count > 0).
> >>
> >> Another problem of the patch is that it tries to insert the entries in certain
> >> order, to facilitate counting of entries distinct by all but QoS parameters.
> >> Unfortunately, referencing an existing rtable entry moves it to list beginning,
> >> to speed up further lookups, so the carefully built order is destroyed.
> 
> We could change rt_check_expire() to be smarter and handle any order in chains.
> 
> This would let rt_intern_hash() be simpler.
> 
> As its a more performance critical path, all would be good :)
> 
> >>
> >> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> >> it will also destroy the ordering.
> > 
> > I think fixing this bug fast is more important than this
> > ordering or counting. Could you send your patch proposal?
> > 
> 

I was thinking something along these lines (also completely untested).  The
extra check in rt_intern hash prevents us from identifying a 'group leader' that
is about to be deleted, and sets rthi to point at the group leader rather than
the last entry in the group as it previously did, which we then mark with the
new flag in the dst_entry (updating it on rt_free of course).  This lets us
identify the boundaries of a group, so when we do a move to front heruistic, we
can move the entire group rather than just the single entry.    I prefer this to
Erics approach since, even though rt_check_expire isn't in a performance
critical path, the addition of the extra list search when the list is unordered
makes rt_check_expire run in O(n^2) rather than O(n), which I think will have a
significant impact on high traffic systems.  Since the ordered list creation was
done at almost zero performance cost in rt_intern_hash, I'd like to keep it if
at all possible.

There is of course a shortcomming in my patch in that it doesn't actually do
anything currently with DST_GRPLDR other than set/clear it appropriately, I've
yet to identify where the route cache move to front heruistic is implemented.
If someone could show me where that is, I'd be happy to finish this patch.

Neil

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 15:32   ` Eric Dumazet
  2009-05-19 16:20     ` Neil Horman
@ 2009-05-19 16:23     ` Neil Horman
  2009-05-19 17:17       ` Jarek Poplawski
  2009-05-19 17:22     ` Jarek Poplawski
  2 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-05-19 16:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jarek Poplawski, lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On 19-05-2009 04:35, Stephen Hemminger wrote:
> >> Begin forwarded message:
> >>
> >> Date: Mon, 18 May 2009 14:10:20 GMT
> >> From: bugzilla-daemon@bugzilla.kernel.org
> >> To: shemminger@linux-foundation.org
> >> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> >>
> >>
> >> http://bugzilla.kernel.org/show_bug.cgi?id=13339
> > ...
> >> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> >> patch has at least one critical flaw, and another problem.
> >>
> >> rt_intern_hash calculates rthi pointer, which is later used for new entry
> >> insertion. The same loop calculates cand pointer which is used to clean the
> >> list. If the pointers are the same, rtable leak occurs, as first the cand is
> >> removed then the new entry is appended to it.
> >>
> >> This leak leads to unregister_netdevice problem (usage count > 0).
> >>
> >> Another problem of the patch is that it tries to insert the entries in certain
> >> order, to facilitate counting of entries distinct by all but QoS parameters.
> >> Unfortunately, referencing an existing rtable entry moves it to list beginning,
> >> to speed up further lookups, so the carefully built order is destroyed.
> 
> We could change rt_check_expire() to be smarter and handle any order in chains.
> 
> This would let rt_intern_hash() be simpler.
> 
> As its a more performance critical path, all would be good :)
> 
> >>
> >> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> >> it will also destroy the ordering.
> > 
> > I think fixing this bug fast is more important than this
> > ordering or counting. Could you send your patch proposal?
> > 
> 

Of course, it helps if I attach the patch :)


diff --git a/include/net/dst.h b/include/net/dst.h
index 6be3b08..a39db6d 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -47,6 +47,7 @@ struct dst_entry
 #define DST_NOXFRM		2
 #define DST_NOPOLICY		4
 #define DST_NOHASH		8
+#define DST_GRPLDR		16
 	unsigned long		expires;
 
 	unsigned short		header_len;	/* more space at head required */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c4c60e9..0120f0e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -610,6 +610,8 @@ static inline int ip_rt_proc_init(void)
 
 static inline void rt_free(struct rtable *rt)
 {
+	if (rt->u.dst.flags & DST_GRPLDR)
+		rt->u.dst.rt_next->u.dst.flag |= DST_GRPLDR;
 	call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free);
 }
 
@@ -1143,8 +1145,11 @@ restart:
 		 * relvant to the hash function together, which we use to adjust
 		 * our chain length
 		 */
-		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
+		if (!*rthi && *rthp && 
+		    compare_hash_inputs(&(*rthp)->fl, &rt->fl) &&
+		    (cand != rth))
 			rthi = rth;
+		}
 	}
 
 	if (cand) {
@@ -1205,11 +1210,14 @@ restart:
 		}
 	}
 
-	if (rthi)
+	if (rthi) {
 		rt->u.dst.rt_next = rthi->u.dst.rt_next;
-	else
+		rthi->u.dst.flags &= ~(DST_GRPLDR);
+	} else
 		rt->u.dst.rt_next = rt_hash_table[hash].chain;
 
+	rt->u.dst.flags |= DST_GRPLDR;
+
 #if RT_CACHE_DEBUG >= 2
 	if (rt->u.dst.rt_next) {
 		struct rtable *trt;

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 16:23     ` Fw: [Bug 13339] New: rtable leak in ipv4/route.c Neil Horman
@ 2009-05-19 17:17       ` Jarek Poplawski
  2009-05-19 17:45         ` Neil Horman
  2009-05-19 17:47         ` Jarek Poplawski
  0 siblings, 2 replies; 28+ messages in thread
From: Jarek Poplawski @ 2009-05-19 17:17 UTC (permalink / raw)
  To: Neil Horman; +Cc: Eric Dumazet, lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 12:23:30PM -0400, Neil Horman wrote:
> On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote:
> > Jarek Poplawski a écrit :
> > > On 19-05-2009 04:35, Stephen Hemminger wrote:
> > >> Begin forwarded message:
> > >>
> > >> Date: Mon, 18 May 2009 14:10:20 GMT
> > >> From: bugzilla-daemon@bugzilla.kernel.org
> > >> To: shemminger@linux-foundation.org
> > >> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> > >>
> > >>
> > >> http://bugzilla.kernel.org/show_bug.cgi?id=13339
> > > ...
> > >> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> > >> patch has at least one critical flaw, and another problem.
> > >>
> > >> rt_intern_hash calculates rthi pointer, which is later used for new entry
> > >> insertion. The same loop calculates cand pointer which is used to clean the
> > >> list. If the pointers are the same, rtable leak occurs, as first the cand is
> > >> removed then the new entry is appended to it.
> > >>
> > >> This leak leads to unregister_netdevice problem (usage count > 0).
> > >>
> > >> Another problem of the patch is that it tries to insert the entries in certain
> > >> order, to facilitate counting of entries distinct by all but QoS parameters.
> > >> Unfortunately, referencing an existing rtable entry moves it to list beginning,
> > >> to speed up further lookups, so the carefully built order is destroyed.
> > 
> > We could change rt_check_expire() to be smarter and handle any order in chains.
> > 
> > This would let rt_intern_hash() be simpler.
> > 
> > As its a more performance critical path, all would be good :)
> > 
> > >>
> > >> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> > >> it will also destroy the ordering.
> > > 
> > > I think fixing this bug fast is more important than this
> > > ordering or counting. Could you send your patch proposal?
> > > 
> > 
> 
> Of course, it helps if I attach the patch :)
> 
> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 6be3b08..a39db6d 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -47,6 +47,7 @@ struct dst_entry
>  #define DST_NOXFRM		2
>  #define DST_NOPOLICY		4
>  #define DST_NOHASH		8
> +#define DST_GRPLDR		16
>  	unsigned long		expires;
>  
>  	unsigned short		header_len;	/* more space at head required */
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index c4c60e9..0120f0e 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -610,6 +610,8 @@ static inline int ip_rt_proc_init(void)
>  
>  static inline void rt_free(struct rtable *rt)
>  {
> +	if (rt->u.dst.flags & DST_GRPLDR)
> +		rt->u.dst.rt_next->u.dst.flag |= DST_GRPLDR;
>  	call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free);
>  }
>  
> @@ -1143,8 +1145,11 @@ restart:
>  		 * relvant to the hash function together, which we use to adjust
>  		 * our chain length
>  		 */
> -		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
> +		if (!*rthi && *rthp && 
> +		    compare_hash_inputs(&(*rthp)->fl, &rt->fl) &&
> +		    (cand != rth))
>  			rthi = rth;

Does it really prevent cand == rthi in the next loop?

I didn't check Eric's patch yet, but I really don't know what's wrong
with something as simple as below for -stable, until "proper" fix is
analyzed and tested.

Jarek P.
---

 net/ipv4/route.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c4c60e9..f4e6c7a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1157,6 +1157,8 @@ restart:
 		if (chain_length > ip_rt_gc_elasticity) {
 			*candp = cand->u.dst.rt_next;
 			rt_free(cand);
+			if (rthi == cand)
+				rthi = NULL;
 		}
 	} else {
 		if (chain_length > rt_chain_length_max) {

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 15:32   ` Eric Dumazet
  2009-05-19 16:20     ` Neil Horman
  2009-05-19 16:23     ` Fw: [Bug 13339] New: rtable leak in ipv4/route.c Neil Horman
@ 2009-05-19 17:22     ` Jarek Poplawski
  2 siblings, 0 replies; 28+ messages in thread
From: Jarek Poplawski @ 2009-05-19 17:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: lav, Stephen Hemminger, netdev, Neil Horman

On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On 19-05-2009 04:35, Stephen Hemminger wrote:
> >> Begin forwarded message:
> >>
> >> Date: Mon, 18 May 2009 14:10:20 GMT
> >> From: bugzilla-daemon@bugzilla.kernel.org
> >> To: shemminger@linux-foundation.org
> >> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> >>
> >>
> >> http://bugzilla.kernel.org/show_bug.cgi?id=13339
> > ...
> >> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> >> patch has at least one critical flaw, and another problem.
> >>
> >> rt_intern_hash calculates rthi pointer, which is later used for new entry
> >> insertion. The same loop calculates cand pointer which is used to clean the
> >> list. If the pointers are the same, rtable leak occurs, as first the cand is
> >> removed then the new entry is appended to it.
> >>
> >> This leak leads to unregister_netdevice problem (usage count > 0).
> >>
> >> Another problem of the patch is that it tries to insert the entries in certain
> >> order, to facilitate counting of entries distinct by all but QoS parameters.
> >> Unfortunately, referencing an existing rtable entry moves it to list beginning,
> >> to speed up further lookups, so the carefully built order is destroyed.
> 
> We could change rt_check_expire() to be smarter and handle any order in chains.
> 
> This would let rt_intern_hash() be simpler.
> 
> As its a more performance critical path, all would be good :)
> 
> >>
> >> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> >> it will also destroy the ordering.
> > 
> > I think fixing this bug fast is more important than this
> > ordering or counting. Could you send your patch proposal?
> > 
> 
> Here is mine, only compiled, not tested yet.
> 
> All credits for Stephen for doing the analysis of course :)

- All credits for Stephen for doing the analysis of course :)
+ All credits for Alexander V. Lukyanov for doing the analysis of course :)

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 17:17       ` Jarek Poplawski
@ 2009-05-19 17:45         ` Neil Horman
  2009-05-19 17:53           ` Jarek Poplawski
  2009-05-19 18:05           ` Jarek Poplawski
  2009-05-19 17:47         ` Jarek Poplawski
  1 sibling, 2 replies; 28+ messages in thread
From: Neil Horman @ 2009-05-19 17:45 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 07:17:03PM +0200, Jarek Poplawski wrote:
> On Tue, May 19, 2009 at 12:23:30PM -0400, Neil Horman wrote:
> > On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote:
> > > Jarek Poplawski a écrit :
> > > > On 19-05-2009 04:35, Stephen Hemminger wrote:
> > > >> Begin forwarded message:
> > > >>
> > > >> Date: Mon, 18 May 2009 14:10:20 GMT
> > > >> From: bugzilla-daemon@bugzilla.kernel.org
> > > >> To: shemminger@linux-foundation.org
> > > >> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> > > >>
> > > >>
> > > >> http://bugzilla.kernel.org/show_bug.cgi?id=13339
> > > > ...
> > > >> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> > > >> patch has at least one critical flaw, and another problem.
> > > >>
> > > >> rt_intern_hash calculates rthi pointer, which is later used for new entry
> > > >> insertion. The same loop calculates cand pointer which is used to clean the
> > > >> list. If the pointers are the same, rtable leak occurs, as first the cand is
> > > >> removed then the new entry is appended to it.
> > > >>
> > > >> This leak leads to unregister_netdevice problem (usage count > 0).
> > > >>
> > > >> Another problem of the patch is that it tries to insert the entries in certain
> > > >> order, to facilitate counting of entries distinct by all but QoS parameters.
> > > >> Unfortunately, referencing an existing rtable entry moves it to list beginning,
> > > >> to speed up further lookups, so the carefully built order is destroyed.
> > > 
> > > We could change rt_check_expire() to be smarter and handle any order in chains.
> > > 
> > > This would let rt_intern_hash() be simpler.
> > > 
> > > As its a more performance critical path, all would be good :)
> > > 
> > > >>
> > > >> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> > > >> it will also destroy the ordering.
> > > > 
> > > > I think fixing this bug fast is more important than this
> > > > ordering or counting. Could you send your patch proposal?
> > > > 
> > > 
> > 
> > Of course, it helps if I attach the patch :)
> > 
> > 
> > diff --git a/include/net/dst.h b/include/net/dst.h
> > index 6be3b08..a39db6d 100644
> > --- a/include/net/dst.h
> > +++ b/include/net/dst.h
> > @@ -47,6 +47,7 @@ struct dst_entry
> >  #define DST_NOXFRM		2
> >  #define DST_NOPOLICY		4
> >  #define DST_NOHASH		8
> > +#define DST_GRPLDR		16
> >  	unsigned long		expires;
> >  
> >  	unsigned short		header_len;	/* more space at head required */
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index c4c60e9..0120f0e 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -610,6 +610,8 @@ static inline int ip_rt_proc_init(void)
> >  
> >  static inline void rt_free(struct rtable *rt)
> >  {
> > +	if (rt->u.dst.flags & DST_GRPLDR)
> > +		rt->u.dst.rt_next->u.dst.flag |= DST_GRPLDR;
> >  	call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free);
> >  }
> >  
> > @@ -1143,8 +1145,11 @@ restart:
> >  		 * relvant to the hash function together, which we use to adjust
> >  		 * our chain length
> >  		 */
> > -		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
> > +		if (!*rthi && *rthp && 
> > +		    compare_hash_inputs(&(*rthp)->fl, &rt->fl) &&
> > +		    (cand != rth))
> >  			rthi = rth;
> 
> Does it really prevent cand == rthi in the next loop?
> 
Yes, because cand and rthi are inspected during the same loop iteration, and
both assigned from rth.  since I added a check which requires !rthi (which is
actually a bug above that I need to fix), once rthi is set, it won't be moved,
and on the next iteration, if cand is assigned, it is assigned to rth, which
(being the next iteration), is a different rt cache entry


> I didn't check Eric's patch yet, but I really don't know what's wrong
> with something as simple as below for -stable, until "proper" fix is
> analyzed and tested.
> 
Because the above fixes it without continuing to break the ordering.  You're
change below prevents the leak, but still allows for disordered lists to form,
which IMHO doesn't really make it a candidate for -stable.  I'd much rather fix
both the leak and the ordering before pushing anything out

speaking of which, I'm going to ask again, I've been looking all morning, and
I'm unable to find the move to front heuristic that you mentioned creates furthe
list disordering.  If you can point it out to me, I can complete my patch and
present something for you to test more throughly.

Neil

> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 17:17       ` Jarek Poplawski
  2009-05-19 17:45         ` Neil Horman
@ 2009-05-19 17:47         ` Jarek Poplawski
  1 sibling, 0 replies; 28+ messages in thread
From: Jarek Poplawski @ 2009-05-19 17:47 UTC (permalink / raw)
  To: Neil Horman; +Cc: Eric Dumazet, lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 07:17:03PM +0200, Jarek Poplawski wrote:
> On Tue, May 19, 2009 at 12:23:30PM -0400, Neil Horman wrote:
...
> > Of course, it helps if I attach the patch :)
> > 
> > 
> > diff --git a/include/net/dst.h b/include/net/dst.h
> > index 6be3b08..a39db6d 100644
> > --- a/include/net/dst.h
> > +++ b/include/net/dst.h
> > @@ -47,6 +47,7 @@ struct dst_entry
> >  #define DST_NOXFRM		2
> >  #define DST_NOPOLICY		4
> >  #define DST_NOHASH		8
> > +#define DST_GRPLDR		16
> >  	unsigned long		expires;
> >  
> >  	unsigned short		header_len;	/* more space at head required */
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index c4c60e9..0120f0e 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -610,6 +610,8 @@ static inline int ip_rt_proc_init(void)
> >  
> >  static inline void rt_free(struct rtable *rt)
> >  {
> > +	if (rt->u.dst.flags & DST_GRPLDR)
> > +		rt->u.dst.rt_next->u.dst.flag |= DST_GRPLDR;
> >  	call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free);
> >  }
> >  
> > @@ -1143,8 +1145,11 @@ restart:
> >  		 * relvant to the hash function together, which we use to adjust
> >  		 * our chain length
> >  		 */
> > -		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
> > +		if (!*rthi && *rthp && 
> > +		    compare_hash_inputs(&(*rthp)->fl, &rt->fl) &&
> > +		    (cand != rth))
> >  			rthi = rth;
> 
> Does it really prevent cand == rthi in the next loop?

Hmm.. It looks OK yet!

Sorry,
Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 17:45         ` Neil Horman
@ 2009-05-19 17:53           ` Jarek Poplawski
  2009-05-19 18:05           ` Jarek Poplawski
  1 sibling, 0 replies; 28+ messages in thread
From: Jarek Poplawski @ 2009-05-19 17:53 UTC (permalink / raw)
  To: Neil Horman; +Cc: Eric Dumazet, lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 01:45:04PM -0400, Neil Horman wrote:
> On Tue, May 19, 2009 at 07:17:03PM +0200, Jarek Poplawski wrote:
...
> > > diff --git a/include/net/dst.h b/include/net/dst.h
> > > index 6be3b08..a39db6d 100644
> > > --- a/include/net/dst.h
> > > +++ b/include/net/dst.h
> > > @@ -47,6 +47,7 @@ struct dst_entry
> > >  #define DST_NOXFRM		2
> > >  #define DST_NOPOLICY		4
> > >  #define DST_NOHASH		8
> > > +#define DST_GRPLDR		16
> > >  	unsigned long		expires;
> > >  
> > >  	unsigned short		header_len;	/* more space at head required */
> > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > > index c4c60e9..0120f0e 100644
> > > --- a/net/ipv4/route.c
> > > +++ b/net/ipv4/route.c
> > > @@ -610,6 +610,8 @@ static inline int ip_rt_proc_init(void)
> > >  
> > >  static inline void rt_free(struct rtable *rt)
> > >  {
> > > +	if (rt->u.dst.flags & DST_GRPLDR)
> > > +		rt->u.dst.rt_next->u.dst.flag |= DST_GRPLDR;
> > >  	call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free);
> > >  }
> > >  
> > > @@ -1143,8 +1145,11 @@ restart:
> > >  		 * relvant to the hash function together, which we use to adjust
> > >  		 * our chain length
> > >  		 */
> > > -		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
> > > +		if (!*rthi && *rthp && 
> > > +		    compare_hash_inputs(&(*rthp)->fl, &rt->fl) &&
> > > +		    (cand != rth))
> > >  			rthi = rth;
> > 
> > Does it really prevent cand == rthi in the next loop?
> > 
> Yes, because cand and rthi are inspected during the same loop iteration, and
> both assigned from rth.  since I added a check which requires !rthi (which is
> actually a bug above that I need to fix), once rthi is set, it won't be moved,
> and on the next iteration, if cand is assigned, it is assigned to rth, which
> (being the next iteration), is a different rt cache entry
> 
> 
> > I didn't check Eric's patch yet, but I really don't know what's wrong
> > with something as simple as below for -stable, until "proper" fix is
> > analyzed and tested.
> > 
> Because the above fixes it without continuing to break the ordering.  You're
> change below prevents the leak, but still allows for disordered lists to form,
> which IMHO doesn't really make it a candidate for -stable.  I'd much rather fix
> both the leak and the ordering before pushing anything out
> 
> speaking of which, I'm going to ask again, I've been looking all morning, and
> I'm unable to find the move to front heuristic that you mentioned creates furthe
> list disordering.  If you can point it out to me, I can complete my patch and
> present something for you to test more throughly.

As I've written already, your patch looks OK to me.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 17:45         ` Neil Horman
  2009-05-19 17:53           ` Jarek Poplawski
@ 2009-05-19 18:05           ` Jarek Poplawski
  2009-05-19 18:16             ` Neil Horman
  1 sibling, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2009-05-19 18:05 UTC (permalink / raw)
  To: Neil Horman; +Cc: Eric Dumazet, lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 01:45:04PM -0400, Neil Horman wrote:
...
> I'm unable to find the move to front heuristic that you mentioned creates furthe
> list disordering. [...]

I'd only like to make it clear the analysis you refer to is by
Alexander V. Lukyanov <lav@yar.ru>

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 18:05           ` Jarek Poplawski
@ 2009-05-19 18:16             ` Neil Horman
  2009-05-20  6:36               ` Alexander V. Lukyanov
  0 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-05-19 18:16 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 08:05:05PM +0200, Jarek Poplawski wrote:
> On Tue, May 19, 2009 at 01:45:04PM -0400, Neil Horman wrote:
> ...
> > I'm unable to find the move to front heuristic that you mentioned creates furthe
> > list disordering. [...]
> 
> I'd only like to make it clear the analysis you refer to is by
> Alexander V. Lukyanov <lav@yar.ru>
> 
Ok, thank you for giving credit where its due :).  Alexander, can you point out
the location of the move-to-front heruistic that you mention in your analysis?

Neil

> Jarek P.
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 16:20     ` Neil Horman
@ 2009-05-19 18:47       ` Eric Dumazet
  2009-05-19 19:24         ` Neil Horman
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2009-05-19 18:47 UTC (permalink / raw)
  To: Neil Horman; +Cc: Jarek Poplawski, lav, Stephen Hemminger, netdev

Neil Horman a écrit :
> On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote:
>> Jarek Poplawski a écrit :
>>> On 19-05-2009 04:35, Stephen Hemminger wrote:
>>>> Begin forwarded message:
>>>>
>>>> Date: Mon, 18 May 2009 14:10:20 GMT
>>>> From: bugzilla-daemon@bugzilla.kernel.org
>>>> To: shemminger@linux-foundation.org
>>>> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
>>>>
>>>>
>>>> http://bugzilla.kernel.org/show_bug.cgi?id=13339
>>> ...
>>>> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
>>>> patch has at least one critical flaw, and another problem.
>>>>
>>>> rt_intern_hash calculates rthi pointer, which is later used for new entry
>>>> insertion. The same loop calculates cand pointer which is used to clean the
>>>> list. If the pointers are the same, rtable leak occurs, as first the cand is
>>>> removed then the new entry is appended to it.
>>>>
>>>> This leak leads to unregister_netdevice problem (usage count > 0).
>>>>
>>>> Another problem of the patch is that it tries to insert the entries in certain
>>>> order, to facilitate counting of entries distinct by all but QoS parameters.
>>>> Unfortunately, referencing an existing rtable entry moves it to list beginning,
>>>> to speed up further lookups, so the carefully built order is destroyed.
>> We could change rt_check_expire() to be smarter and handle any order in chains.
>>
>> This would let rt_intern_hash() be simpler.
>>
>> As its a more performance critical path, all would be good :)
>>
>>>> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
>>>> it will also destroy the ordering.
>>> I think fixing this bug fast is more important than this
>>> ordering or counting. Could you send your patch proposal?
>>>
> 
> I was thinking something along these lines (also completely untested).  The
> extra check in rt_intern hash prevents us from identifying a 'group leader' that
> is about to be deleted, and sets rthi to point at the group leader rather than
> the last entry in the group as it previously did, which we then mark with the
> new flag in the dst_entry (updating it on rt_free of course).  This lets us
> identify the boundaries of a group, so when we do a move to front heruistic, we
> can move the entire group rather than just the single entry.    I prefer this to
> Erics approach since, even though rt_check_expire isn't in a performance
> critical path, the addition of the extra list search when the list is unordered
> makes rt_check_expire run in O(n^2) rather than O(n), which I think will have a
> significant impact on high traffic systems.  Since the ordered list creation was
> done at almost zero performance cost in rt_intern_hash, I'd like to keep it if
> at all possible.
> 
> There is of course a shortcomming in my patch in that it doesn't actually do
> anything currently with DST_GRPLDR other than set/clear it appropriately, I've
> yet to identify where the route cache move to front heruistic is implemented.
> If someone could show me where that is, I'd be happy to finish this patch.
> 

O(n^2) ? Not exactly ...

N * (Y + x)  instead of N * (Y),  where x <= Y/100, and we can make x being 0
in fact with proper prefetching.

Real cost is bringing into CPU cache the information needed, by two
orders of magnitude. (one cache line miss : more than 200 cycles,
and we need at least two cache lines per rtable.)

Adding 'group' notion to dst entries is overengineering an already
too complex ipv4/route.c file, that is almost unreadable these days,
since even you can not spot the point where we move the entry at chain
front.

Hint : search for /* Put it first */ in rt_intern_hash()

Moving whole group in front would defeat the purpose of move, actually,
since rank in chain is used to decay the timeout in garbage collector.
(search for tmo >>= 1; )


Here is is a revised patch, doing the length computation inside
the loop, while cpu is prefeching next entry into its cache.

BTW, we actually have another bug, since length is set to 0 at the wrong place
(before the for (; goal > 0; goal--) loop)

We must of course reset length to 0 for each chain.

All credits for Alexander V. Lukyanov for doing the analysis of course :)


 net/ipv4/route.c |   57 ++++++++++++++-------------------------------
 1 files changed, 18 insertions(+), 39 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c4c60e9..769f72e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -784,7 +784,7 @@ static void rt_check_expire(void)
 {
 	static unsigned int rover;
 	unsigned int i = rover, goal;
-	struct rtable *rth, **rthp;
+	struct rtable *rth, *aux, **rthp;
 	unsigned long length = 0, samples = 0;
 	unsigned long sum = 0, sum2 = 0;
 	u64 mult;
@@ -795,7 +795,6 @@ static void rt_check_expire(void)
 	goal = (unsigned int)mult;
 	if (goal > rt_hash_mask)
 		goal = rt_hash_mask + 1;
-	length = 0;
 	for (; goal > 0; goal--) {
 		unsigned long tmo = ip_rt_gc_timeout;
 
@@ -809,8 +808,10 @@ static void rt_check_expire(void)
 
 		if (*rthp == NULL)
 			continue;
+		length = 0;
 		spin_lock_bh(rt_hash_lock_addr(i));
 		while ((rth = *rthp) != NULL) {
+			prefetch(rth->u.dst.rt_next);
 			if (rt_is_expired(rth)) {
 				*rthp = rth->u.dst.rt_next;
 				rt_free(rth);
@@ -819,33 +820,30 @@ static void rt_check_expire(void)
 			if (rth->u.dst.expires) {
 				/* Entry is expired even if it is in use */
 				if (time_before_eq(jiffies, rth->u.dst.expires)) {
+nofree:
 					tmo >>= 1;
 					rthp = &rth->u.dst.rt_next;
 					/*
-					 * Only bump our length if the hash
-					 * inputs on entries n and n+1 are not
-					 * the same, we only count entries on
+					 * We only count entries on
 					 * a chain with equal hash inputs once
 					 * so that entries for different QOS
 					 * levels, and other non-hash input
 					 * attributes don't unfairly skew
 					 * the length computation
 					 */
-					if ((*rthp == NULL) ||
-					    !compare_hash_inputs(&(*rthp)->fl,
-								 &rth->fl))
-						length += ONE;
+					for (aux = rt_hash_table[i].chain;;) {
+						if (aux == rth) {
+							length += ONE;
+							break;
+						}
+						if (compare_hash_inputs(&aux->fl, &rth->fl))
+							break;
+						aux = aux->u.dst.rt_next;
+					}
 					continue;
 				}
-			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout)) {
-				tmo >>= 1;
-				rthp = &rth->u.dst.rt_next;
-				if ((*rthp == NULL) ||
-				    !compare_hash_inputs(&(*rthp)->fl,
-							 &rth->fl))
-					length += ONE;
-				continue;
-			}
+			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout))
+				goto nofree;
 
 			/* Cleanup aged off entries. */
 			*rthp = rth->u.dst.rt_next;
@@ -1068,7 +1066,6 @@ out:	return 0;
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
 	struct rtable	*rth, **rthp;
-	struct rtable	*rthi;
 	unsigned long	now;
 	struct rtable *cand, **candp;
 	u32 		min_score;
@@ -1088,7 +1085,6 @@ restart:
 	}
 
 	rthp = &rt_hash_table[hash].chain;
-	rthi = NULL;
 
 	spin_lock_bh(rt_hash_lock_addr(hash));
 	while ((rth = *rthp) != NULL) {
@@ -1134,17 +1130,6 @@ restart:
 		chain_length++;
 
 		rthp = &rth->u.dst.rt_next;
-
-		/*
-		 * check to see if the next entry in the chain
-		 * contains the same hash input values as rt.  If it does
-		 * This is where we will insert into the list, instead of
-		 * at the head.  This groups entries that differ by aspects not
-		 * relvant to the hash function together, which we use to adjust
-		 * our chain length
-		 */
-		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
-			rthi = rth;
 	}
 
 	if (cand) {
@@ -1205,10 +1190,7 @@ restart:
 		}
 	}
 
-	if (rthi)
-		rt->u.dst.rt_next = rthi->u.dst.rt_next;
-	else
-		rt->u.dst.rt_next = rt_hash_table[hash].chain;
+	rt->u.dst.rt_next = rt_hash_table[hash].chain;
 
 #if RT_CACHE_DEBUG >= 2
 	if (rt->u.dst.rt_next) {
@@ -1224,10 +1206,7 @@ restart:
 	 * previous writes to rt are comitted to memory
 	 * before making rt visible to other CPUS.
 	 */
-	if (rthi)
-		rcu_assign_pointer(rthi->u.dst.rt_next, rt);
-	else
-		rcu_assign_pointer(rt_hash_table[hash].chain, rt);
+	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
 
 	spin_unlock_bh(rt_hash_lock_addr(hash));
 	*rp = rt;


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 18:47       ` Eric Dumazet
@ 2009-05-19 19:24         ` Neil Horman
  2009-05-19 22:05           ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-05-19 19:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jarek Poplawski, lav, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 08:47:54PM +0200, Eric Dumazet wrote:
> Neil Horman a écrit :
> > On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote:
> >> Jarek Poplawski a écrit :
> >>> On 19-05-2009 04:35, Stephen Hemminger wrote:
> >>>> Begin forwarded message:
> >>>>
> >>>> Date: Mon, 18 May 2009 14:10:20 GMT
> >>>> From: bugzilla-daemon@bugzilla.kernel.org
> >>>> To: shemminger@linux-foundation.org
> >>>> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> >>>>
> >>>>
> >>>> http://bugzilla.kernel.org/show_bug.cgi?id=13339
> >>> ...
> >>>> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> >>>> patch has at least one critical flaw, and another problem.
> >>>>
> >>>> rt_intern_hash calculates rthi pointer, which is later used for new entry
> >>>> insertion. The same loop calculates cand pointer which is used to clean the
> >>>> list. If the pointers are the same, rtable leak occurs, as first the cand is
> >>>> removed then the new entry is appended to it.
> >>>>
> >>>> This leak leads to unregister_netdevice problem (usage count > 0).
> >>>>
> >>>> Another problem of the patch is that it tries to insert the entries in certain
> >>>> order, to facilitate counting of entries distinct by all but QoS parameters.
> >>>> Unfortunately, referencing an existing rtable entry moves it to list beginning,
> >>>> to speed up further lookups, so the carefully built order is destroyed.
> >> We could change rt_check_expire() to be smarter and handle any order in chains.
> >>
> >> This would let rt_intern_hash() be simpler.
> >>
> >> As its a more performance critical path, all would be good :)
> >>
> >>>> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> >>>> it will also destroy the ordering.
> >>> I think fixing this bug fast is more important than this
> >>> ordering or counting. Could you send your patch proposal?
> >>>
> > 
> > I was thinking something along these lines (also completely untested).  The
> > extra check in rt_intern hash prevents us from identifying a 'group leader' that
> > is about to be deleted, and sets rthi to point at the group leader rather than
> > the last entry in the group as it previously did, which we then mark with the
> > new flag in the dst_entry (updating it on rt_free of course).  This lets us
> > identify the boundaries of a group, so when we do a move to front heruistic, we
> > can move the entire group rather than just the single entry.    I prefer this to
> > Erics approach since, even though rt_check_expire isn't in a performance
> > critical path, the addition of the extra list search when the list is unordered
> > makes rt_check_expire run in O(n^2) rather than O(n), which I think will have a
> > significant impact on high traffic systems.  Since the ordered list creation was
> > done at almost zero performance cost in rt_intern_hash, I'd like to keep it if
> > at all possible.
> > 
> > There is of course a shortcomming in my patch in that it doesn't actually do
> > anything currently with DST_GRPLDR other than set/clear it appropriately, I've
> > yet to identify where the route cache move to front heruistic is implemented.
> > If someone could show me where that is, I'd be happy to finish this patch.
> > 
> 
> O(n^2) ? Not exactly ...
> 
you're right, on closer examination, its not O(n^2), but its not what you have
below either I don't think
 
> N * (Y + x)  instead of N * (Y),  where x <= Y/100, and we can make x being 0
> in fact with proper prefetching.
> 
From the way I read you patch (which is actually pretty clever as I look at it),
you add an extra loop that searches from the start of the given chain to the
point at which the outer loop is set looking for a duplicate, so that each
rtable entry which varies only by non-hash relevant values is counted only once.
Thats good.  But that means for a chain of n nodes in which you are checking the
ith node for duplicates you need to visit at worst i other nodes, so to process
the entire chain we need to visit SUM(i=1..n)(i) == (n(n-1)/2) nodes rather than
the simply linear search requireing the inspection of only n nodes.  I see how
with prefetching the cost of the additional node visitation can be minimized,
but thats still a significantly larger set of nodes to check per expiration run.

> Real cost is bringing into CPU cache the information needed, by two
> orders of magnitude. (one cache line miss : more than 200 cycles,
> and we need at least two cache lines per rtable.)

> Adding 'group' notion to dst entries is overengineering an already
> too complex ipv4/route.c file, that is almost unreadable these days,
> since even you can not spot the point where we move the entry at chain
> front.
> 
> Hint : search for /* Put it first */ in rt_intern_hash()
> 
Ah! So its not actually on rtable access (i.e. route lookup) that we do a
move-to-front, its only when we're hashing in new entries.  That explains why I
didn't see it, I was thinking route lookups moved the entry to the front, not
inserts.

> Moving whole group in front would defeat the purpose of move, actually,
> since rank in chain is used to decay the timeout in garbage collector.
> (search for tmo >>= 1; )
> 
Argh, so the list is implicitly ordered by expiration time.  That really defeats
the entire purpose of doing grouping in the ilst at all.  If thats the case,
then I agree, its probably better to to take the additional visitation hit in in
check_expire above than to try and preserve ordering. 

Thanks
Neil


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 19:24         ` Neil Horman
@ 2009-05-19 22:05           ` David Miller
  2009-05-19 23:05             ` Neil Horman
  2009-05-20  4:54             ` [PATCH] net: fix length computation in rt_check_expire() Eric Dumazet
  0 siblings, 2 replies; 28+ messages in thread
From: David Miller @ 2009-05-19 22:05 UTC (permalink / raw)
  To: nhorman; +Cc: dada1, jarkao2, lav, shemminger, netdev

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 19 May 2009 15:24:50 -0400

>> Moving whole group in front would defeat the purpose of move, actually,
>> since rank in chain is used to decay the timeout in garbage collector.
>> (search for tmo >>= 1; )
>> 
> Argh, so the list is implicitly ordered by expiration time.  That
> really defeats the entire purpose of doing grouping in the ilst at
> all.  If thats the case, then I agree, its probably better to to
> take the additional visitation hit in in check_expire above than to
> try and preserve ordering.

Yes, this seems best.

I was worried that somehow the ordering also influences lookups,
because the TOS bits don't go into the hash so I worried that it would
be important that explicit TOS values appear before wildcard ones.
But it doesn't appear that this is an issue, we don't have wildcard
TOSs in the rtable entries, they are always explicit.

So I would like to see an explicit final patch from Eric so we can get
this fixed now.

Thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 22:05           ` David Miller
@ 2009-05-19 23:05             ` Neil Horman
  2009-05-20  4:54             ` [PATCH] net: fix length computation in rt_check_expire() Eric Dumazet
  1 sibling, 0 replies; 28+ messages in thread
From: Neil Horman @ 2009-05-19 23:05 UTC (permalink / raw)
  To: David Miller; +Cc: dada1, jarkao2, lav, shemminger, netdev

On Tue, May 19, 2009 at 03:05:17PM -0700, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Tue, 19 May 2009 15:24:50 -0400
> 
> >> Moving whole group in front would defeat the purpose of move, actually,
> >> since rank in chain is used to decay the timeout in garbage collector.
> >> (search for tmo >>= 1; )
> >> 
> > Argh, so the list is implicitly ordered by expiration time.  That
> > really defeats the entire purpose of doing grouping in the ilst at
> > all.  If thats the case, then I agree, its probably better to to
> > take the additional visitation hit in in check_expire above than to
> > try and preserve ordering.
> 
> Yes, this seems best.
> 
> I was worried that somehow the ordering also influences lookups,
> because the TOS bits don't go into the hash so I worried that it would
> be important that explicit TOS values appear before wildcard ones.
> But it doesn't appear that this is an issue, we don't have wildcard
> TOSs in the rtable entries, they are always explicit.
> 
> So I would like to see an explicit final patch from Eric so we can get
> this fixed now.
> 
> Thanks!
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Agreed, thanks once again for educating me Eric!
Neil


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] net: fix length computation in rt_check_expire()
  2009-05-19 22:05           ` David Miller
  2009-05-19 23:05             ` Neil Horman
@ 2009-05-20  4:54             ` Eric Dumazet
  2009-05-20  6:13               ` David Miller
                                 ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Eric Dumazet @ 2009-05-20  4:54 UTC (permalink / raw)
  To: David Miller; +Cc: nhorman, jarkao2, lav, shemminger, netdev

David Miller a écrit :
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Tue, 19 May 2009 15:24:50 -0400
> 
>>> Moving whole group in front would defeat the purpose of move, actually,
>>> since rank in chain is used to decay the timeout in garbage collector.
>>> (search for tmo >>= 1; )
>>>
>> Argh, so the list is implicitly ordered by expiration time.  That
>> really defeats the entire purpose of doing grouping in the ilst at
>> all.  If thats the case, then I agree, its probably better to to
>> take the additional visitation hit in in check_expire above than to
>> try and preserve ordering.
> 
> Yes, this seems best.
> 
> I was worried that somehow the ordering also influences lookups,
> because the TOS bits don't go into the hash so I worried that it would
> be important that explicit TOS values appear before wildcard ones.
> But it doesn't appear that this is an issue, we don't have wildcard
> TOSs in the rtable entries, they are always explicit.
> 
> So I would like to see an explicit final patch from Eric so we can get
> this fixed now.
> 

I would like to split patches because we have two bugs indeed, and
I prefer to get attention for both problems, I dont remember Neil acknowledged
the length computation problem.

First and small patch, candidate for net-2.6 and stable (for 2.6.29) :

Thank you

[PATCH] net: fix length computation in rt_check_expire()

rt_check_expire() computes average and standard deviation of chain lengths,
but not correclty reset length to 0 at beginning of each chain.
This probably gives overflows for sum2 (and sum) on loaded machines instead 
of meaningful results.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 net/ipv4/route.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c4c60e9..869cf1c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -785,7 +785,7 @@ static void rt_check_expire(void)
 	static unsigned int rover;
 	unsigned int i = rover, goal;
 	struct rtable *rth, **rthp;
-	unsigned long length = 0, samples = 0;
+	unsigned long samples = 0;
 	unsigned long sum = 0, sum2 = 0;
 	u64 mult;
 
@@ -795,9 +795,9 @@ static void rt_check_expire(void)
 	goal = (unsigned int)mult;
 	if (goal > rt_hash_mask)
 		goal = rt_hash_mask + 1;
-	length = 0;
 	for (; goal > 0; goal--) {
 		unsigned long tmo = ip_rt_gc_timeout;
+		unsigned long length;
 
 		i = (i + 1) & rt_hash_mask;
 		rthp = &rt_hash_table[i].chain;
@@ -809,6 +809,7 @@ static void rt_check_expire(void)
 
 		if (*rthp == NULL)
 			continue;
+		length = 0;
 		spin_lock_bh(rt_hash_lock_addr(i));
 		while ((rth = *rthp) != NULL) {
 			if (rt_is_expired(rth)) {


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] net: fix length computation in rt_check_expire()
  2009-05-20  4:54             ` [PATCH] net: fix length computation in rt_check_expire() Eric Dumazet
@ 2009-05-20  6:13               ` David Miller
  2009-05-20  6:14               ` [PATCH] net: fix rtable leak in net/ipv4/route.c Eric Dumazet
  2009-05-20 10:27               ` [PATCH] net: fix length computation in rt_check_expire() Neil Horman
  2 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2009-05-20  6:13 UTC (permalink / raw)
  To: dada1; +Cc: nhorman, jarkao2, lav, shemminger, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 20 May 2009 06:54:22 +0200

> I would like to split patches because we have two bugs indeed, and
> I prefer to get attention for both problems, I dont remember Neil acknowledged
> the length computation problem.
> 
> First and small patch, candidate for net-2.6 and stable (for 2.6.29) :
> 
> Thank you
> 
> [PATCH] net: fix length computation in rt_check_expire()
> 
> rt_check_expire() computes average and standard deviation of chain lengths,
> but not correclty reset length to 0 at beginning of each chain.
> This probably gives overflows for sum2 (and sum) on loaded machines instead 
> of meaningful results.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Yes I definitely agree this length initialization is a bug.

I'll integrate this fix soon, thanks Eric.  You can send me the
other part of the bug fix relative to this if you like.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] net: fix rtable leak in net/ipv4/route.c
  2009-05-20  4:54             ` [PATCH] net: fix length computation in rt_check_expire() Eric Dumazet
  2009-05-20  6:13               ` David Miller
@ 2009-05-20  6:14               ` Eric Dumazet
  2009-05-20 10:03                 ` Jarek Poplawski
  2009-05-20 10:48                 ` Neil Horman
  2009-05-20 10:27               ` [PATCH] net: fix length computation in rt_check_expire() Neil Horman
  2 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2009-05-20  6:14 UTC (permalink / raw)
  To: David Miller; +Cc: nhorman, jarkao2, lav, shemminger, netdev

Eric Dumazet a écrit :
> David Miller a écrit :
>> From: Neil Horman <nhorman@tuxdriver.com>
>> Date: Tue, 19 May 2009 15:24:50 -0400
>>
>>>> Moving whole group in front would defeat the purpose of move, actually,
>>>> since rank in chain is used to decay the timeout in garbage collector.
>>>> (search for tmo >>= 1; )
>>>>
>>> Argh, so the list is implicitly ordered by expiration time.  That
>>> really defeats the entire purpose of doing grouping in the ilst at
>>> all.  If thats the case, then I agree, its probably better to to
>>> take the additional visitation hit in in check_expire above than to
>>> try and preserve ordering.
>> Yes, this seems best.
>>
>> I was worried that somehow the ordering also influences lookups,
>> because the TOS bits don't go into the hash so I worried that it would
>> be important that explicit TOS values appear before wildcard ones.
>> But it doesn't appear that this is an issue, we don't have wildcard
>> TOSs in the rtable entries, they are always explicit.
>>
>> So I would like to see an explicit final patch from Eric so we can get
>> this fixed now.
>>
> 
> I would like to split patches because we have two bugs indeed, and
> I prefer to get attention for both problems, I dont remember Neil acknowledged
> the length computation problem.
> 
> First and small patch, candidate for net-2.6 and stable (for 2.6.29) :
> 

Then here is the patch on top on previous one.

Thanks to all

[PATCH] net: fix rtable leak in net/ipv4/route.c

Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because its a perfect one :

begin_of_quotation 
 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
 patch has at least one critical flaw, and another problem.

 rt_intern_hash calculates rthi pointer, which is later used for new entry
 insertion. The same loop calculates cand pointer which is used to clean the
 list. If the pointers are the same, rtable leak occurs, as first the cand is
 removed then the new entry is appended to it.

 This leak leads to unregister_netdevice problem (usage count > 0).

 Another problem of the patch is that it tries to insert the entries in certain
 order, to facilitate counting of entries distinct by all but QoS parameters.
 Unfortunately, referencing an existing rtable entry moves it to list beginning,
 to speed up further lookups, so the carefully built order is destroyed.

 For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
 it will also destroy the ordering.
end_of_quotation


Problematic commit is 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b
(net: implement emergency route cache rebulds when gc_elasticity is exceeded)

Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.

A possible fix is to make rt_intern_hash() simpler, and only makes
rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticied.

Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 net/ipv4/route.c |   55 +++++++++++++--------------------------------
 1 files changed, 17 insertions(+), 38 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 869cf1c..28205e5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -784,7 +784,7 @@ static void rt_check_expire(void)
 {
 	static unsigned int rover;
 	unsigned int i = rover, goal;
-	struct rtable *rth, **rthp;
+	struct rtable *rth, *aux, **rthp;
 	unsigned long samples = 0;
 	unsigned long sum = 0, sum2 = 0;
 	u64 mult;
@@ -812,6 +812,7 @@ static void rt_check_expire(void)
 		length = 0;
 		spin_lock_bh(rt_hash_lock_addr(i));
 		while ((rth = *rthp) != NULL) {
+			prefetch(rth->u.dst.rt_next);
 			if (rt_is_expired(rth)) {
 				*rthp = rth->u.dst.rt_next;
 				rt_free(rth);
@@ -820,33 +821,30 @@ static void rt_check_expire(void)
 			if (rth->u.dst.expires) {
 				/* Entry is expired even if it is in use */
 				if (time_before_eq(jiffies, rth->u.dst.expires)) {
+nofree:
 					tmo >>= 1;
 					rthp = &rth->u.dst.rt_next;
 					/*
-					 * Only bump our length if the hash
-					 * inputs on entries n and n+1 are not
-					 * the same, we only count entries on
+					 * We only count entries on
 					 * a chain with equal hash inputs once
 					 * so that entries for different QOS
 					 * levels, and other non-hash input
 					 * attributes don't unfairly skew
 					 * the length computation
 					 */
-					if ((*rthp == NULL) ||
-					    !compare_hash_inputs(&(*rthp)->fl,
-								 &rth->fl))
-						length += ONE;
+					for (aux = rt_hash_table[i].chain;;) {
+						if (aux == rth) {
+							length += ONE;
+							break;
+						}
+						if (compare_hash_inputs(&aux->fl, &rth->fl))
+							break;
+						aux = aux->u.dst.rt_next;
+					}
 					continue;
 				}
-			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout)) {
-				tmo >>= 1;
-				rthp = &rth->u.dst.rt_next;
-				if ((*rthp == NULL) ||
-				    !compare_hash_inputs(&(*rthp)->fl,
-							 &rth->fl))
-					length += ONE;
-				continue;
-			}
+			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout))
+				goto nofree;
 
 			/* Cleanup aged off entries. */
 			*rthp = rth->u.dst.rt_next;
@@ -1069,7 +1067,6 @@ out:	return 0;
 static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
 {
 	struct rtable	*rth, **rthp;
-	struct rtable	*rthi;
 	unsigned long	now;
 	struct rtable *cand, **candp;
 	u32 		min_score;
@@ -1089,7 +1086,6 @@ restart:
 	}
 
 	rthp = &rt_hash_table[hash].chain;
-	rthi = NULL;
 
 	spin_lock_bh(rt_hash_lock_addr(hash));
 	while ((rth = *rthp) != NULL) {
@@ -1135,17 +1131,6 @@ restart:
 		chain_length++;
 
 		rthp = &rth->u.dst.rt_next;
-
-		/*
-		 * check to see if the next entry in the chain
-		 * contains the same hash input values as rt.  If it does
-		 * This is where we will insert into the list, instead of
-		 * at the head.  This groups entries that differ by aspects not
-		 * relvant to the hash function together, which we use to adjust
-		 * our chain length
-		 */
-		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
-			rthi = rth;
 	}
 
 	if (cand) {
@@ -1206,10 +1191,7 @@ restart:
 		}
 	}
 
-	if (rthi)
-		rt->u.dst.rt_next = rthi->u.dst.rt_next;
-	else
-		rt->u.dst.rt_next = rt_hash_table[hash].chain;
+	rt->u.dst.rt_next = rt_hash_table[hash].chain;
 
 #if RT_CACHE_DEBUG >= 2
 	if (rt->u.dst.rt_next) {
@@ -1225,10 +1207,7 @@ restart:
 	 * previous writes to rt are comitted to memory
 	 * before making rt visible to other CPUS.
 	 */
-	if (rthi)
-		rcu_assign_pointer(rthi->u.dst.rt_next, rt);
-	else
-		rcu_assign_pointer(rt_hash_table[hash].chain, rt);
+	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
 
 	spin_unlock_bh(rt_hash_lock_addr(hash));
 	*rp = rt;


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
  2009-05-19 18:16             ` Neil Horman
@ 2009-05-20  6:36               ` Alexander V. Lukyanov
  0 siblings, 0 replies; 28+ messages in thread
From: Alexander V. Lukyanov @ 2009-05-20  6:36 UTC (permalink / raw)
  To: Neil Horman; +Cc: Jarek Poplawski, Eric Dumazet, Stephen Hemminger, netdev

On Tue, May 19, 2009 at 02:16:07PM -0400, Neil Horman wrote:
> Alexander, can you point out
> the location of the move-to-front heruistic that you mention in your analysis?

It's near /* Put it first */ comment (if I read the code correctly).

-- 
   Alexander..

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] net: fix rtable leak in net/ipv4/route.c
  2009-05-20  6:14               ` [PATCH] net: fix rtable leak in net/ipv4/route.c Eric Dumazet
@ 2009-05-20 10:03                 ` Jarek Poplawski
  2009-05-20 11:13                   ` Eric Dumazet
  2009-05-20 10:48                 ` Neil Horman
  1 sibling, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2009-05-20 10:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, nhorman, lav, shemminger, netdev

On Wed, May 20, 2009 at 08:14:28AM +0200, Eric Dumazet wrote:
> Eric Dumazet a écrit :
> > David Miller a écrit :
> >> From: Neil Horman <nhorman@tuxdriver.com>
> >> Date: Tue, 19 May 2009 15:24:50 -0400
> >>
> >>>> Moving whole group in front would defeat the purpose of move, actually,
> >>>> since rank in chain is used to decay the timeout in garbage collector.
> >>>> (search for tmo >>= 1; )
> >>>>
> >>> Argh, so the list is implicitly ordered by expiration time.  That
> >>> really defeats the entire purpose of doing grouping in the ilst at
> >>> all.  If thats the case, then I agree, its probably better to to
> >>> take the additional visitation hit in in check_expire above than to
> >>> try and preserve ordering.
> >> Yes, this seems best.
> >>
> >> I was worried that somehow the ordering also influences lookups,
> >> because the TOS bits don't go into the hash so I worried that it would
> >> be important that explicit TOS values appear before wildcard ones.
> >> But it doesn't appear that this is an issue, we don't have wildcard
> >> TOSs in the rtable entries, they are always explicit.
> >>
> >> So I would like to see an explicit final patch from Eric so we can get
> >> this fixed now.
> >>
> > 
> > I would like to split patches because we have two bugs indeed, and
> > I prefer to get attention for both problems, I dont remember Neil acknowledged
> > the length computation problem.
> > 
> > First and small patch, candidate for net-2.6 and stable (for 2.6.29) :
> > 
> 
> Then here is the patch on top on previous one.
> 
> Thanks to all
> 
> [PATCH] net: fix rtable leak in net/ipv4/route.c
> 
> Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
> analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
> Quoted here because its a perfect one :
> 
> begin_of_quotation 
>  2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
>  patch has at least one critical flaw, and another problem.
> 
>  rt_intern_hash calculates rthi pointer, which is later used for new entry
>  insertion. The same loop calculates cand pointer which is used to clean the
>  list. If the pointers are the same, rtable leak occurs, as first the cand is
>  removed then the new entry is appended to it.
> 
>  This leak leads to unregister_netdevice problem (usage count > 0).
> 
>  Another problem of the patch is that it tries to insert the entries in certain
>  order, to facilitate counting of entries distinct by all but QoS parameters.
>  Unfortunately, referencing an existing rtable entry moves it to list beginning,
>  to speed up further lookups, so the carefully built order is destroyed.
> 
>  For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
>  it will also destroy the ordering.
> end_of_quotation
> 
> 
> Problematic commit is 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b
> (net: implement emergency route cache rebulds when gc_elasticity is exceeded)
> 
> Trying to keep dst_entries ordered is too complex and breaks the fact that
> order should depend on the frequency of use for garbage collection.
> 
> A possible fix is to make rt_intern_hash() simpler, and only makes
> rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
> entries order. The added loop is running on cache hot data, while cpu
> is prefetching next object, so should be unnoticied.
> 
> Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
>  net/ipv4/route.c |   55 +++++++++++++--------------------------------
>  1 files changed, 17 insertions(+), 38 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 869cf1c..28205e5 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -784,7 +784,7 @@ static void rt_check_expire(void)
>  {
>  	static unsigned int rover;
>  	unsigned int i = rover, goal;
> -	struct rtable *rth, **rthp;
> +	struct rtable *rth, *aux, **rthp;
>  	unsigned long samples = 0;
>  	unsigned long sum = 0, sum2 = 0;
>  	u64 mult;
> @@ -812,6 +812,7 @@ static void rt_check_expire(void)
>  		length = 0;
>  		spin_lock_bh(rt_hash_lock_addr(i));
>  		while ((rth = *rthp) != NULL) {
> +			prefetch(rth->u.dst.rt_next);
>  			if (rt_is_expired(rth)) {
>  				*rthp = rth->u.dst.rt_next;
>  				rt_free(rth);
> @@ -820,33 +821,30 @@ static void rt_check_expire(void)
>  			if (rth->u.dst.expires) {
>  				/* Entry is expired even if it is in use */
>  				if (time_before_eq(jiffies, rth->u.dst.expires)) {
> +nofree:
>  					tmo >>= 1;
>  					rthp = &rth->u.dst.rt_next;
>  					/*
> -					 * Only bump our length if the hash
> -					 * inputs on entries n and n+1 are not
> -					 * the same, we only count entries on
> +					 * We only count entries on
>  					 * a chain with equal hash inputs once
>  					 * so that entries for different QOS
>  					 * levels, and other non-hash input
>  					 * attributes don't unfairly skew
>  					 * the length computation
>  					 */
> -					if ((*rthp == NULL) ||
> -					    !compare_hash_inputs(&(*rthp)->fl,
> -								 &rth->fl))
> -						length += ONE;
> +					for (aux = rt_hash_table[i].chain;;) {
> +						if (aux == rth) {
> +							length += ONE;
> +							break;
> +						}
> +						if (compare_hash_inputs(&aux->fl, &rth->fl))
> +							break;
> +						aux = aux->u.dst.rt_next;
> +					}

Very "interesting" for() usage, but isn't it more readable like this?:

					aux = rt_hash_table[i].chain;
					while (aux != rth) {
						if (compare_hash_inputs(&aux->fl, &rth->fl))
							break;
						aux = aux->u.dst.rt_next;
					}

					if (aux == rth)
						length += ONE;

Jarek P.

>  					continue;
>  				}
> -			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout)) {
> -				tmo >>= 1;
> -				rthp = &rth->u.dst.rt_next;
> -				if ((*rthp == NULL) ||
> -				    !compare_hash_inputs(&(*rthp)->fl,
> -							 &rth->fl))
> -					length += ONE;
> -				continue;
> -			}
> +			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout))
> +				goto nofree;
>  
>  			/* Cleanup aged off entries. */
>  			*rthp = rth->u.dst.rt_next;
> @@ -1069,7 +1067,6 @@ out:	return 0;
>  static int rt_intern_hash(unsigned hash, struct rtable *rt, struct rtable **rp)
>  {
>  	struct rtable	*rth, **rthp;
> -	struct rtable	*rthi;
>  	unsigned long	now;
>  	struct rtable *cand, **candp;
>  	u32 		min_score;
> @@ -1089,7 +1086,6 @@ restart:
>  	}
>  
>  	rthp = &rt_hash_table[hash].chain;
> -	rthi = NULL;
>  
>  	spin_lock_bh(rt_hash_lock_addr(hash));
>  	while ((rth = *rthp) != NULL) {
> @@ -1135,17 +1131,6 @@ restart:
>  		chain_length++;
>  
>  		rthp = &rth->u.dst.rt_next;
> -
> -		/*
> -		 * check to see if the next entry in the chain
> -		 * contains the same hash input values as rt.  If it does
> -		 * This is where we will insert into the list, instead of
> -		 * at the head.  This groups entries that differ by aspects not
> -		 * relvant to the hash function together, which we use to adjust
> -		 * our chain length
> -		 */
> -		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
> -			rthi = rth;
>  	}
>  
>  	if (cand) {
> @@ -1206,10 +1191,7 @@ restart:
>  		}
>  	}
>  
> -	if (rthi)
> -		rt->u.dst.rt_next = rthi->u.dst.rt_next;
> -	else
> -		rt->u.dst.rt_next = rt_hash_table[hash].chain;
> +	rt->u.dst.rt_next = rt_hash_table[hash].chain;
>  
>  #if RT_CACHE_DEBUG >= 2
>  	if (rt->u.dst.rt_next) {
> @@ -1225,10 +1207,7 @@ restart:
>  	 * previous writes to rt are comitted to memory
>  	 * before making rt visible to other CPUS.
>  	 */
> -	if (rthi)
> -		rcu_assign_pointer(rthi->u.dst.rt_next, rt);
> -	else
> -		rcu_assign_pointer(rt_hash_table[hash].chain, rt);
> +	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
>  
>  	spin_unlock_bh(rt_hash_lock_addr(hash));
>  	*rp = rt;
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] net: fix length computation in rt_check_expire()
  2009-05-20  4:54             ` [PATCH] net: fix length computation in rt_check_expire() Eric Dumazet
  2009-05-20  6:13               ` David Miller
  2009-05-20  6:14               ` [PATCH] net: fix rtable leak in net/ipv4/route.c Eric Dumazet
@ 2009-05-20 10:27               ` Neil Horman
  2009-05-21  0:19                 ` David Miller
  2 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-05-20 10:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, jarkao2, lav, shemminger, netdev

On Wed, May 20, 2009 at 06:54:22AM +0200, Eric Dumazet wrote:
> David Miller a écrit :
> > From: Neil Horman <nhorman@tuxdriver.com>
> > Date: Tue, 19 May 2009 15:24:50 -0400
> > 
> >>> Moving whole group in front would defeat the purpose of move, actually,
> >>> since rank in chain is used to decay the timeout in garbage collector.
> >>> (search for tmo >>= 1; )
> >>>
> >> Argh, so the list is implicitly ordered by expiration time.  That
> >> really defeats the entire purpose of doing grouping in the ilst at
> >> all.  If thats the case, then I agree, its probably better to to
> >> take the additional visitation hit in in check_expire above than to
> >> try and preserve ordering.
> > 
> > Yes, this seems best.
> > 
> > I was worried that somehow the ordering also influences lookups,
> > because the TOS bits don't go into the hash so I worried that it would
> > be important that explicit TOS values appear before wildcard ones.
> > But it doesn't appear that this is an issue, we don't have wildcard
> > TOSs in the rtable entries, they are always explicit.
> > 
> > So I would like to see an explicit final patch from Eric so we can get
> > this fixed now.
> > 
> 
> I would like to split patches because we have two bugs indeed, and
> I prefer to get attention for both problems, I dont remember Neil acknowledged
> the length computation problem.
> 
> First and small patch, candidate for net-2.6 and stable (for 2.6.29) :
> 
> Thank you
> 
> [PATCH] net: fix length computation in rt_check_expire()
> 
> rt_check_expire() computes average and standard deviation of chain lengths,
> but not correclty reset length to 0 at beginning of each chain.
> This probably gives overflows for sum2 (and sum) on loaded machines instead 
> of meaningful results.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>

> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] net: fix rtable leak in net/ipv4/route.c
  2009-05-20  6:14               ` [PATCH] net: fix rtable leak in net/ipv4/route.c Eric Dumazet
  2009-05-20 10:03                 ` Jarek Poplawski
@ 2009-05-20 10:48                 ` Neil Horman
  2009-05-21  0:19                   ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-05-20 10:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, jarkao2, lav, shemminger, netdev

On Wed, May 20, 2009 at 08:14:28AM +0200, Eric Dumazet wrote:
> Eric Dumazet a écrit :
> > David Miller a écrit :
> >> From: Neil Horman <nhorman@tuxdriver.com>
> >> Date: Tue, 19 May 2009 15:24:50 -0400
> >>
> >>>> Moving whole group in front would defeat the purpose of move, actually,
> >>>> since rank in chain is used to decay the timeout in garbage collector.
> >>>> (search for tmo >>= 1; )
> >>>>
> >>> Argh, so the list is implicitly ordered by expiration time.  That
> >>> really defeats the entire purpose of doing grouping in the ilst at
> >>> all.  If thats the case, then I agree, its probably better to to
> >>> take the additional visitation hit in in check_expire above than to
> >>> try and preserve ordering.
> >> Yes, this seems best.
> >>
> >> I was worried that somehow the ordering also influences lookups,
> >> because the TOS bits don't go into the hash so I worried that it would
> >> be important that explicit TOS values appear before wildcard ones.
> >> But it doesn't appear that this is an issue, we don't have wildcard
> >> TOSs in the rtable entries, they are always explicit.
> >>
> >> So I would like to see an explicit final patch from Eric so we can get
> >> this fixed now.
> >>
> > 
> > I would like to split patches because we have two bugs indeed, and
> > I prefer to get attention for both problems, I dont remember Neil acknowledged
> > the length computation problem.
> > 
> > First and small patch, candidate for net-2.6 and stable (for 2.6.29) :
> > 
> 
> Then here is the patch on top on previous one.
> 
> Thanks to all
> 
> [PATCH] net: fix rtable leak in net/ipv4/route.c
> 
> Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
> analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
> Quoted here because its a perfect one :
> 
> begin_of_quotation 
>  2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
>  patch has at least one critical flaw, and another problem.
> 
>  rt_intern_hash calculates rthi pointer, which is later used for new entry
>  insertion. The same loop calculates cand pointer which is used to clean the
>  list. If the pointers are the same, rtable leak occurs, as first the cand is
>  removed then the new entry is appended to it.
> 
>  This leak leads to unregister_netdevice problem (usage count > 0).
> 
>  Another problem of the patch is that it tries to insert the entries in certain
>  order, to facilitate counting of entries distinct by all but QoS parameters.
>  Unfortunately, referencing an existing rtable entry moves it to list beginning,
>  to speed up further lookups, so the carefully built order is destroyed.
> 
>  For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
>  it will also destroy the ordering.
> end_of_quotation
> 
> 
> Problematic commit is 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b
> (net: implement emergency route cache rebulds when gc_elasticity is exceeded)
> 
> Trying to keep dst_entries ordered is too complex and breaks the fact that
> order should depend on the frequency of use for garbage collection.
> 
> A possible fix is to make rt_intern_hash() simpler, and only makes
> rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
> entries order. The added loop is running on cache hot data, while cpu
> is prefetching next object, so should be unnoticied.
> 
> Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] net: fix rtable leak in net/ipv4/route.c
  2009-05-20 10:03                 ` Jarek Poplawski
@ 2009-05-20 11:13                   ` Eric Dumazet
  2009-05-20 11:37                     ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2009-05-20 11:13 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, nhorman, lav, shemminger, netdev

Jarek Poplawski a écrit :
> On Wed, May 20, 2009 at 08:14:28AM +0200, Eric Dumazet wrote:
>> +					for (aux = rt_hash_table[i].chain;;) {
>> +						if (aux == rth) {
>> +							length += ONE;
>> +							break;
>> +						}
>> +						if (compare_hash_inputs(&aux->fl, &rth->fl))
>> +							break;
>> +						aux = aux->u.dst.rt_next;
>> +					}
> 
> Very "interesting" for() usage, but isn't it more readable like this?:
> 
> 					aux = rt_hash_table[i].chain;
> 					while (aux != rth) {
> 						if (compare_hash_inputs(&aux->fl, &rth->fl))
> 							break;
> 						aux = aux->u.dst.rt_next;
> 					}

well, this test is done two times, this is the difference...

> 
> 					if (aux == rth)
> 						length += ONE;
> 
> Jarek P.

I first wrote :

					for (aux = rt_hash_table[i].chain ; ; aux = aux->u.dst.rt_next) {
						if (aux == rth) {
							length += ONE;
							break;
						}
						if (compare_hash_inputs(&aux->fl, &rth->fl))
							break;
					}

but had to split the too long line, so ended in the form in the patch :)


Thank you



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] net: fix rtable leak in net/ipv4/route.c
  2009-05-20 11:13                   ` Eric Dumazet
@ 2009-05-20 11:37                     ` Jarek Poplawski
  0 siblings, 0 replies; 28+ messages in thread
From: Jarek Poplawski @ 2009-05-20 11:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, nhorman, lav, shemminger, netdev

On Wed, May 20, 2009 at 01:13:41PM +0200, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On Wed, May 20, 2009 at 08:14:28AM +0200, Eric Dumazet wrote:
> >> +					for (aux = rt_hash_table[i].chain;;) {
> >> +						if (aux == rth) {
> >> +							length += ONE;
> >> +							break;
> >> +						}
> >> +						if (compare_hash_inputs(&aux->fl, &rth->fl))
> >> +							break;
> >> +						aux = aux->u.dst.rt_next;
> >> +					}
> > 
> > Very "interesting" for() usage, but isn't it more readable like this?:
> > 
> > 					aux = rt_hash_table[i].chain;
> > 					while (aux != rth) {
> > 						if (compare_hash_inputs(&aux->fl, &rth->fl))
> > 							break;
> > 						aux = aux->u.dst.rt_next;
> > 					}
> 
> well, this test is done two times, this is the difference...

I know, but I guess this is used quite often. And probably it's not
very hard optimization for a compiler (while - else). As a matter of
fact even this would confuse me less here:
					aux = rt_hash_table[i].chain;
					for (;;) {

But of course, it's a matter of taste, so no big deal.

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] net: fix length computation in rt_check_expire()
  2009-05-20 10:27               ` [PATCH] net: fix length computation in rt_check_expire() Neil Horman
@ 2009-05-21  0:19                 ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2009-05-21  0:19 UTC (permalink / raw)
  To: nhorman; +Cc: dada1, jarkao2, lav, shemminger, netdev

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 20 May 2009 06:27:54 -0400

> On Wed, May 20, 2009 at 06:54:22AM +0200, Eric Dumazet wrote:
>> [PATCH] net: fix length computation in rt_check_expire()
>> 
>> rt_check_expire() computes average and standard deviation of chain lengths,
>> but not correclty reset length to 0 at beginning of each chain.
>> This probably gives overflows for sum2 (and sum) on loaded machines instead 
>> of meaningful results.
>> 
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> 

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] net: fix rtable leak in net/ipv4/route.c
  2009-05-20 10:48                 ` Neil Horman
@ 2009-05-21  0:19                   ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2009-05-21  0:19 UTC (permalink / raw)
  To: nhorman; +Cc: dada1, jarkao2, lav, shemminger, netdev

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 20 May 2009 06:48:38 -0400

> On Wed, May 20, 2009 at 08:14:28AM +0200, Eric Dumazet wrote:
>> [PATCH] net: fix rtable leak in net/ipv4/route.c
 ...
>> Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> 

Also applied and queued up for -stable, thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2009-05-21  0:19 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-19  2:35 Fw: [Bug 13339] New: rtable leak in ipv4/route.c Stephen Hemminger
2009-05-19 12:34 ` Jarek Poplawski
2009-05-19 15:12   ` Neil Horman
2009-05-19 15:32   ` Eric Dumazet
2009-05-19 16:20     ` Neil Horman
2009-05-19 18:47       ` Eric Dumazet
2009-05-19 19:24         ` Neil Horman
2009-05-19 22:05           ` David Miller
2009-05-19 23:05             ` Neil Horman
2009-05-20  4:54             ` [PATCH] net: fix length computation in rt_check_expire() Eric Dumazet
2009-05-20  6:13               ` David Miller
2009-05-20  6:14               ` [PATCH] net: fix rtable leak in net/ipv4/route.c Eric Dumazet
2009-05-20 10:03                 ` Jarek Poplawski
2009-05-20 11:13                   ` Eric Dumazet
2009-05-20 11:37                     ` Jarek Poplawski
2009-05-20 10:48                 ` Neil Horman
2009-05-21  0:19                   ` David Miller
2009-05-20 10:27               ` [PATCH] net: fix length computation in rt_check_expire() Neil Horman
2009-05-21  0:19                 ` David Miller
2009-05-19 16:23     ` Fw: [Bug 13339] New: rtable leak in ipv4/route.c Neil Horman
2009-05-19 17:17       ` Jarek Poplawski
2009-05-19 17:45         ` Neil Horman
2009-05-19 17:53           ` Jarek Poplawski
2009-05-19 18:05           ` Jarek Poplawski
2009-05-19 18:16             ` Neil Horman
2009-05-20  6:36               ` Alexander V. Lukyanov
2009-05-19 17:47         ` Jarek Poplawski
2009-05-19 17:22     ` Jarek Poplawski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).