From mboxrd@z Thu Jan 1 00:00:00 1970 From: Neil Horman Subject: Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c Date: Tue, 19 May 2009 12:23:30 -0400 Message-ID: <20090519162330.GC28034@hmsreliant.think-freely.org> References: <20090519123417.GA7376@ff.dom.local> <4A12D10D.3000504@cosmosbay.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Jarek Poplawski , lav@yar.ru, Stephen Hemminger , netdev@vger.kernel.org To: Eric Dumazet Return-path: Received: from charlotte.tuxdriver.com ([70.61.120.58]:60057 "EHLO smtp.tuxdriver.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753544AbZESQXm (ORCPT ); Tue, 19 May 2009 12:23:42 -0400 Content-Disposition: inline In-Reply-To: <4A12D10D.3000504@cosmosbay.com> Sender: netdev-owner@vger.kernel.org List-ID: On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote: > Jarek Poplawski a =E9crit : > > On 19-05-2009 04:35, Stephen Hemminger wrote: > >> Begin forwarded message: > >> > >> Date: Mon, 18 May 2009 14:10:20 GMT > >> From: bugzilla-daemon@bugzilla.kernel.org > >> To: shemminger@linux-foundation.org > >> Subject: [Bug 13339] New: rtable leak in ipv4/route.c > >> > >> > >> http://bugzilla.kernel.org/show_bug.cgi?id=3D13339 > > ... > >> 2.6.29 patch has introduced flexible route cache rebuilding. Unfor= tunately the > >> patch has at least one critical flaw, and another problem. > >> > >> rt_intern_hash calculates rthi pointer, which is later used for ne= w entry > >> insertion. The same loop calculates cand pointer which is used to = clean the > >> list. If the pointers are the same, rtable leak occurs, as first t= he cand is > >> removed then the new entry is appended to it. > >> > >> This leak leads to unregister_netdevice problem (usage count > 0). > >> > >> Another problem of the patch is that it tries to insert the entrie= s in certain > >> order, to facilitate counting of entries distinct by all but QoS p= arameters. > >> Unfortunately, referencing an existing rtable entry moves it to li= st beginning, > >> to speed up further lookups, so the carefully built order is destr= oyed. >=20 > We could change rt_check_expire() to be smarter and handle any order = in chains. >=20 > This would let rt_intern_hash() be simpler. >=20 > As its a more performance critical path, all would be good :) >=20 > >> > >> For the first problem the simplest patch it to set rthi=3D0 when r= thi=3D=3Dcand, but > >> it will also destroy the ordering. > >=20 > > I think fixing this bug fast is more important than this > > ordering or counting. Could you send your patch proposal? > >=20 >=20 Of course, it helps if I attach the patch :) diff --git a/include/net/dst.h b/include/net/dst.h index 6be3b08..a39db6d 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -47,6 +47,7 @@ struct dst_entry #define DST_NOXFRM 2 #define DST_NOPOLICY 4 #define DST_NOHASH 8 +#define DST_GRPLDR 16 unsigned long expires; =20 unsigned short header_len; /* more space at head required */ diff --git a/net/ipv4/route.c b/net/ipv4/route.c index c4c60e9..0120f0e 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -610,6 +610,8 @@ static inline int ip_rt_proc_init(void) =20 static inline void rt_free(struct rtable *rt) { + if (rt->u.dst.flags & DST_GRPLDR) + rt->u.dst.rt_next->u.dst.flag |=3D DST_GRPLDR; call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free); } =20 @@ -1143,8 +1145,11 @@ restart: * relvant to the hash function together, which we use to adjust * our chain length */ - if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl)) + if (!*rthi && *rthp &&=20 + compare_hash_inputs(&(*rthp)->fl, &rt->fl) && + (cand !=3D rth)) rthi =3D rth; + } } =20 if (cand) { @@ -1205,11 +1210,14 @@ restart: } } =20 - if (rthi) + if (rthi) { rt->u.dst.rt_next =3D rthi->u.dst.rt_next; - else + rthi->u.dst.flags &=3D ~(DST_GRPLDR); + } else rt->u.dst.rt_next =3D rt_hash_table[hash].chain; =20 + rt->u.dst.flags |=3D DST_GRPLDR; + #if RT_CACHE_DEBUG >=3D 2 if (rt->u.dst.rt_next) { struct rtable *trt;