From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nicolas Dichtel Subject: Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event Date: Tue, 28 Sep 2010 18:45:58 +0200 Message-ID: <4CA21BC6.5070300@6wind.com> References: <4CA208C9.1020800@6wind.com> <1285691629.3154.80.camel@edumazet-laptop> Reply-To: nicolas.dichtel@6wind.com Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: netdev , Octavian Purdila To: Eric Dumazet Return-path: Received: from mail-wy0-f174.google.com ([74.125.82.174]:39123 "EHLO mail-wy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754534Ab0I1QqE (ORCPT ); Tue, 28 Sep 2010 12:46:04 -0400 Received: by wyb28 with SMTP id 28so5156617wyb.19 for ; Tue, 28 Sep 2010 09:46:02 -0700 (PDT) In-Reply-To: <1285691629.3154.80.camel@edumazet-laptop> Sender: netdev-owner@vger.kernel.org List-ID: Eric Dumazet wrote: > Le mardi 28 septembre 2010 =C3=A0 17:24 +0200, Nicolas Dichtel a =C3=A9= crit : >> Hi, >> >> I face a problem when I try to remove an interface,=20 >> netdev_wait_allrefs() complains about refcount. >> >> Here is a trivial scenario to reproduce the problem: >> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev et= h0 >> # ./a.out tunl1 >> # ip tunnel del tunl1 >> >> Note: a.out binary create an IPv4 raw socket, attach it to tunl1=20 >> (SO_BINDTODEVICE), set it as multicast (IP_MULTICAST_LOOP), set the=20 >> multicast interface to tunl1 (IP_MULTICAST_IF), build the IP header=20 >> (IP_HDRINCL) and then send a single packet (192.168.6.1 -> 224.0.0.1= 8). >> >> Note2: when a.out is executed, tunl1 has no ip address and is down. >> >=20 > CC Octavian Purdila, the patch author. >=20 > I am just wondering why this route is created in the first place. At first, I asked myself the same question, but it seems that this is=20 allowed to send a packet through this kind of socket, even if interface= =20 is down. Packet will be destroyed by the noop qdisk. But I agree that it is strange to perform route lookup and everything t= o=20 destroy the packet at the end ... Maybe raw_sendmsg() can delete it directly ;-) ... or maybe=20 ip_route_output_flow(). Any suggestions welcome. Regards, Nicolas >=20 > Maybe a fix would be to forbid this ? >=20 > Some machines have a giant route cache, so its very important to avoi= d > expensive scans. >=20 >> Then, I got a serie of "kernel:[1206699.728010] unregister_netdevice= :=20 >> waiting for tunl1 to become free. Usage count =3D 3" and after some = time,=20 >> interface is removed. >> >> The problem is that route cache entries are only invalidate on=20 >> UNREGISTER event, and not removed (introduced by commit=20 >> e2ce146848c81af2f6d42e67990191c284bf0c33). We must wait that=20 >> rt_check_expire() remove the remaining route cache entries. >> >> To fix the problem, I propose to remove a part of the previous commi= t. >> >> Regards, >> Nicolas >> pi=C3=A8ce jointe diff=C3=A9rences entre fichiers >> (0001-ipv4-remove-all-rt-cache-entries-on-UNREGISTER-even.patch) >> From 3344e2e0431fe803c4dac8757a8746908357d780 Mon Sep 17 00:00:00 20= 01 >> From: Nicolas Dichtel >> Date: Tue, 28 Sep 2010 16:38:19 +0200 >> Subject: [PATCH] ipv4: remove all rt cache entries on UNREGISTER eve= nt >> >> Commit e2ce146848c81af2f6d42e67990191c284bf0c33 (ipv4: factorize cac= he clearing >> for batched unregister operations) add a new parameter to fib_disabl= e_ip() to >> only invalidate route cache entries on unregister event. >> This is wrong, we should ensure that all cache entries are removed o= n >> unregister event, else netdev_wait_allrefs() may complain. A cache e= ntry >> can be created between event DOWN and UNREGISTER. >> >> So, I revert a part of the patch. >> >> Signed-off-by: Nicolas Dichtel >> --- >> net/ipv4/fib_frontend.c | 10 +++++----- >> 1 files changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c >> index 7d02a9f..377e815 100644 >> --- a/net/ipv4/fib_frontend.c >> +++ b/net/ipv4/fib_frontend.c >> @@ -917,11 +917,11 @@ static void nl_fib_lookup_exit(struct net *net= ) >> net->ipv4.fibnl =3D NULL; >> } >> =20 >> -static void fib_disable_ip(struct net_device *dev, int force, int d= elay) >> +static void fib_disable_ip(struct net_device *dev, int force) >> { >> if (fib_sync_down_dev(dev, force)) >> fib_flush(dev_net(dev)); >> - rt_cache_flush(dev_net(dev), delay); >> + rt_cache_flush(dev_net(dev), 0); >> arp_ifdown(dev); >> } >> =20 >> @@ -944,7 +944,7 @@ static int fib_inetaddr_event(struct notifier_bl= ock *this, unsigned long event, >> /* Last address was deleted from this interface. >> Disable IP. >> */ >> - fib_disable_ip(dev, 1, 0); >> + fib_disable_ip(dev, 1); >> } else { >> rt_cache_flush(dev_net(dev), -1); >> } >> @@ -959,7 +959,7 @@ static int fib_netdev_event(struct notifier_bloc= k *this, unsigned long event, vo >> struct in_device *in_dev =3D __in_dev_get_rtnl(dev); >> =20 >> if (event =3D=3D NETDEV_UNREGISTER) { >> - fib_disable_ip(dev, 2, -1); >> + fib_disable_ip(dev, 2); >> return NOTIFY_DONE; >> } >> =20 >> @@ -977,7 +977,7 @@ static int fib_netdev_event(struct notifier_bloc= k *this, unsigned long event, vo >> rt_cache_flush(dev_net(dev), -1); >> break; >> case NETDEV_DOWN: >> - fib_disable_ip(dev, 0, 0); >> + fib_disable_ip(dev, 0); >> break; >> case NETDEV_CHANGEMTU: >> case NETDEV_CHANGE: >=20 >=20