* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: David Miller @ 2016-03-09 20:47 UTC

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Wed, 9 Mar 2016 23:41:58 +0300

> On Wed, Mar 09, 2016 at 03:27:30PM -0500, David Miller wrote:
>> >
>> > Yes. I can drop it off for a while and run tests without it,
>> > then turn it back and try again. Would you like to see such
>> > numbers?
>>
>> That would be very helpful, yes.
>
> Just sent out. Take a look please. Indeed it sits inside get_next_corpse
> a lot. And now I think I've to figure out where we can optimize it.
> Continue tomorrow.

The problem is that the masquerading code flushes the entire conntrack
table once for _every_ address removed.

The code path is:

    masq_device_event()
        if (event == NETDEV_DOWN) {
            /* Device was downed. Search entire table for
             * conntracks which were associated with that device,
             * and forget them.
             */
            NF_CT_ASSERT(dev->ifindex != 0);

            nf_ct_iterate_cleanup(net, device_cmp,
                                  (void *)(long)dev->ifindex, 0, 0);

So if you have a million IP addresses, this flush happens a million times
on inetdev destroy.

Part of the problem is that we emit NETDEV_DOWN inetdev notifiers per
address removed, instead of once per inetdev destroy.

Maybe if we put some boolean state into the inetdev, we could make sure
we did this flush only once time while inetdev->dead = 1.

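Note: the teardown path being discussed looks roughly like this, paraphrased
from net/ipv4/devinet.c of that era (a condensed sketch, not a verbatim copy).
Every address on the dying in_device is removed one by one, and each removal
ends in a NETDEV_DOWN inetaddr notification, which is what reaches the
masquerading code above:

    /* paraphrased sketch of net/ipv4/devinet.c:inetdev_destroy() */
    static void inetdev_destroy(struct in_device *in_dev)
    {
        struct in_ifaddr *ifa;

        ASSERT_RTNL();

        in_dev->dead = 1;

        ip_mc_destroy_dev(in_dev);

        while ((ifa = in_dev->ifa_list) != NULL) {
            /* each pass ends up firing a NETDEV_DOWN inetaddr notifier */
            inet_del_ifa(in_dev, &in_dev->ifa_list, 0);
            inet_free_ifa(ifa);
        }
        /* ... device reference dropped, in_dev freed via RCU ... */
    }

With N addresses and a conntrack table of M entries this makes the destroy
path roughly O(N * M) work, which is why teardown time explodes as the
address count grows.
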
* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-09 20:57 UTC

On Wed, Mar 09, 2016 at 03:47:25PM -0500, David Miller wrote:
> From: Cyrill Gorcunov <gorcunov@gmail.com>
> Date: Wed, 9 Mar 2016 23:41:58 +0300
>
> > On Wed, Mar 09, 2016 at 03:27:30PM -0500, David Miller wrote:
> >> >
> >> > Yes. I can drop it off for a while and run tests without it,
> >> > then turn it back and try again. Would you like to see such
> >> > numbers?
> >>
> >> That would be very helpful, yes.
> >
> > Just sent out. Take a look please. Indeed it sits inside get_next_corpse
> > a lot. And now I think I've to figure out where we can optimize it.
> > Continue tomorrow.
>
> The problem is that the masquerading code flushes the entire conntrack
> table once for _every_ address removed.
>
> The code path is:
>
>     masq_device_event()
>         if (event == NETDEV_DOWN) {
>             /* Device was downed. Search entire table for
>              * conntracks which were associated with that device,
>              * and forget them.
>              */
>             NF_CT_ASSERT(dev->ifindex != 0);
>
>             nf_ct_iterate_cleanup(net, device_cmp,
>                                   (void *)(long)dev->ifindex, 0, 0);
>
> So if you have a million IP addresses, this flush happens a million times
> on inetdev destroy.
>
> Part of the problem is that we emit NETDEV_DOWN inetdev notifiers per
> address removed, instead of once per inetdev destroy.
>
> Maybe if we put some boolean state into the inetdev, we could make sure
> we did this flush only once time while inetdev->dead = 1.

Aha! So in your patch __inet_del_ifa bypass first blocking_notifier_call_chain

    __inet_del_ifa
        ...
        if (in_dev->dead)
            goto no_promotions;

        // First call to NETDEV_DOWN
        ...
    no_promotions:
        rtmsg_ifa(RTM_DELADDR, ifa1, nlh, portid);
        blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1);

and here we call for NETDEV_DOWN, which then hits masq_device_event
and go further to conntrack code.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: David Miller @ 2016-03-09 21:10 UTC

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Wed, 9 Mar 2016 23:57:47 +0300

> Aha! So in your patch __inet_del_ifa bypass first blocking_notifier_call_chain
>
>     __inet_del_ifa
>         ...
>         if (in_dev->dead)
>             goto no_promotions;
>
>         // First call to NETDEV_DOWN
>         ...
>     no_promotions:
>         rtmsg_ifa(RTM_DELADDR, ifa1, nlh, portid);
>         blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1);
>
> and here we call for NETDEV_DOWN, which then hits masq_device_event
> and go further to conntrack code.

Yes that's where the notifier comes from, which happens with or without
my patch.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-09 21:16 UTC

On Wed, Mar 09, 2016 at 04:10:38PM -0500, David Miller wrote:
> >
> > and here we call for NETDEV_DOWN, which then hits masq_device_event
> > and go further to conntrack code.
>
> Yes that's where the notifier comes from, which happens with or without
> my patch.

Thanks for explanation, Dave! I'll continue on this task tomorrow
tryin to implement optimization you proposed.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 10:20 UTC

On Thu, Mar 10, 2016 at 12:16:29AM +0300, Cyrill Gorcunov wrote:
>
> Thanks for explanation, Dave! I'll continue on this task tomorrow
> tryin to implement optimization you proposed.

OK, here are the results for the preliminary patch with conntrack running
---
[root@s125 ~]# ./exploit.sh
START 4 addresses     STOP 1457604516 1457604518
START 2704 addresses  STOP 1457604520 1457604521
START 10404 addresses STOP 1457604533 1457604534
START 23104 addresses STOP 1457604566 1457604567
START 40804 addresses STOP 1457604642 1457604643
START 63504 addresses STOP 1457604809 1457604810

It takes ~1 second for each case, which is great.
---
 net/ipv4/devinet.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Index: linux-ml.git/net/ipv4/devinet.c
===================================================================
--- linux-ml.git.orig/net/ipv4/devinet.c
+++ linux-ml.git/net/ipv4/devinet.c
@@ -403,7 +403,18 @@ no_promotions:
 	   So that, this order is correct.
 	 */
 	rtmsg_ifa(RTM_DELADDR, ifa1, nlh, portid);
-	blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1);
+
+	if (!in_dev->dead || (in_dev->dead && !ifa1->ifa_next)) {
+		/*
+		 * We might be destroying a device with millions
+		 * of addresses assigned, so we need to call
+		 * the notifier on the last pass only, otherwise
+		 * the conntrack code (if the kernel has it) will scan
+		 * the whole table on each iteration, wasting a huge
+		 * amount of time.
+		 */
+		blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1);
+	}
 
 	if (promote) {
 		struct in_ifaddr *next_sec = promote->ifa_next;

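Note: the test above is essentially "assign a large number of addresses to a
device, then delete the device, and time the teardown". A rough, hypothetical
reproduction of that shape (this is not the original exploit.sh, just an
illustration; iproute2 and the dummy driver are assumed):

    #!/bin/sh
    N=${1:-100}                       # creates N*N addresses
    ip link add dummy0 type dummy
    ip link set dummy0 up
    for i in $(seq 1 "$N"); do
        for j in $(seq 1 "$N"); do
            ip addr add "10.$i.$j.1/32" dev dummy0
        done
    done
    echo "START $(ip -4 addr show dev dummy0 | grep -c inet) addresses"
    date +%s
    ip link del dummy0                # triggers inetdev_destroy()
    date +%s
    echo STOP
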
* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 11:03 UTC

On Thu, Mar 10, 2016 at 01:20:18PM +0300, Cyrill Gorcunov wrote:
> On Thu, Mar 10, 2016 at 12:16:29AM +0300, Cyrill Gorcunov wrote:
> >
> > Thanks for explanation, Dave! I'll continue on this task tomorrow
> > tryin to implement optimization you proposed.
>
> OK, here are the results for the preliminary patch with conntrack running
...
> net/ipv4/devinet.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> Index: linux-ml.git/net/ipv4/devinet.c
> ===================================================================
> --- linux-ml.git.orig/net/ipv4/devinet.c
> +++ linux-ml.git/net/ipv4/devinet.c
> @@ -403,7 +403,18 @@ no_promotions:
> 	   So that, this order is correct.
> 	 */

This patch is wrong, so drop it please. I'll do another.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 15:09 UTC

On Thu, Mar 10, 2016 at 02:03:24PM +0300, Cyrill Gorcunov wrote:
> On Thu, Mar 10, 2016 at 01:20:18PM +0300, Cyrill Gorcunov wrote:
> > On Thu, Mar 10, 2016 at 12:16:29AM +0300, Cyrill Gorcunov wrote:
> > >
> > > Thanks for explanation, Dave! I'll continue on this task tomorrow
> > > tryin to implement optimization you proposed.
> >
> > OK, here are the results for the preliminary patch with conntrack running
> ...
> > net/ipv4/devinet.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > Index: linux-ml.git/net/ipv4/devinet.c
> > ===================================================================
> > --- linux-ml.git.orig/net/ipv4/devinet.c
> > +++ linux-ml.git/net/ipv4/devinet.c
> > @@ -403,7 +403,18 @@ no_promotions:
> > 	   So that, this order is correct.
> > 	 */
>
> This patch is wrong, so drop it please. I'll do another.

Here I think is a better variant. The resulst are good
enough -- 1 sec for cleanup. Does the patch look sane?
---
 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Index: linux-ml.git/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
===================================================================
--- linux-ml.git.orig/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ linux-ml.git/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -108,9 +108,22 @@ static int masq_inet_event(struct notifi
 			   unsigned long event,
 			   void *ptr)
 {
-	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
+	struct in_ifaddr *ifa = ptr;
+	struct net_device *dev = ifa->ifa_dev->dev;
 	struct netdev_notifier_info info;
 
+	if (event == NETDEV_DOWN) {
+		/*
+		 * When we meet a dead device which is
+		 * being released with dozens of addresses
+		 * assigned -- we can optimize calls
+		 * to conntrack cleanups and do it only
+		 * once.
+		 */
+		if (ifa->ifa_dev->dead && ifa->ifa_next)
+			return NOTIFY_DONE;
+	}
+
 	netdev_notifier_info_init(&info, dev);
 	return masq_device_event(this, event, &info);
 }

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: David Miller @ 2016-03-10 18:01 UTC

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Thu, 10 Mar 2016 18:09:20 +0300

> On Thu, Mar 10, 2016 at 02:03:24PM +0300, Cyrill Gorcunov wrote:
>> On Thu, Mar 10, 2016 at 01:20:18PM +0300, Cyrill Gorcunov wrote:
>> > On Thu, Mar 10, 2016 at 12:16:29AM +0300, Cyrill Gorcunov wrote:
>> > >
>> > > Thanks for explanation, Dave! I'll continue on this task tomorrow
>> > > tryin to implement optimization you proposed.
>> >
>> > OK, here are the results for the preliminary patch with conntrack running
>> ...
>> > net/ipv4/devinet.c | 13 ++++++++++++-
>> > 1 file changed, 12 insertions(+), 1 deletion(-)
>> >
>> > Index: linux-ml.git/net/ipv4/devinet.c
>> > ===================================================================
>> > --- linux-ml.git.orig/net/ipv4/devinet.c
>> > +++ linux-ml.git/net/ipv4/devinet.c
>> > @@ -403,7 +403,18 @@ no_promotions:
>> > 	   So that, this order is correct.
>> > 	 */
>>
>> This patch is wrong, so drop it please. I'll do another.
>
> Here I think is a better variant. The resulst are good
> enough -- 1 sec for cleanup. Does the patch look sane?

I'm tempted to say that we should provide these notifier handlers with
the information they need, explicitly, to handle this case.

Most intdev notifiers actually want to know the individual addresses
that get removed, one by one. That's handled by the existing
NETDEV_DOWN event and the ifa we pass to that.

But some, like this netfilter masq case, would be satisfied with a
single event that tells them the whole inetdev instance is being torn
down. Which is the case we care about here.

We currently don't use NETDEV_UNREGISTER for inetdev notifiers, so
maybe we could use that.

And that is consistent with the core netdev notifier that triggers
this call chain in the first place.

Roughly, something like this:

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 8c3df2c..6eee5cb 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -292,6 +292,11 @@ static void inetdev_destroy(struct in_device *in_dev)
 
 	in_dev->dead = 1;
 
+	if (in_dev->ifa_list)
+		blocking_notifier_call_chain(&inetaddr_chain,
+					     NETDEV_UNREGISTER,
+					     in_dev->ifa_list);
+
 	ip_mc_destroy_dev(in_dev);
 
 	while ((ifa = in_dev->ifa_list) != NULL) {
diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index c6eb421..1bb8026 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -111,6 +111,10 @@ static int masq_inet_event(struct notifier_block *this,
 	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
 	struct netdev_notifier_info info;
 
+	if (event != NETDEV_UNREGISTER)
+		return NOTIFY_DONE;
+	event = NETDEV_DOWN;
+
 	netdev_notifier_info_init(&info, dev);
 	return masq_device_event(this, event, &info);
 }

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 18:48 UTC

On Thu, Mar 10, 2016 at 01:01:38PM -0500, David Miller wrote:
> From: Cyrill Gorcunov <gorcunov@gmail.com>
> Date: Thu, 10 Mar 2016 18:09:20 +0300
>
> > On Thu, Mar 10, 2016 at 02:03:24PM +0300, Cyrill Gorcunov wrote:
> >> On Thu, Mar 10, 2016 at 01:20:18PM +0300, Cyrill Gorcunov wrote:
> >> > On Thu, Mar 10, 2016 at 12:16:29AM +0300, Cyrill Gorcunov wrote:
> >> > >
> >> > > Thanks for explanation, Dave! I'll continue on this task tomorrow
> >> > > tryin to implement optimization you proposed.
> >> >
> >> > OK, here are the results for the preliminary patch with conntrack running
> >> ...
> >> > net/ipv4/devinet.c | 13 ++++++++++++-
> >> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >> >
> >> > Index: linux-ml.git/net/ipv4/devinet.c
> >> > ===================================================================
> >> > --- linux-ml.git.orig/net/ipv4/devinet.c
> >> > +++ linux-ml.git/net/ipv4/devinet.c
> >> > @@ -403,7 +403,18 @@ no_promotions:
> >> > 	   So that, this order is correct.
> >> > 	 */
> >>
> >> This patch is wrong, so drop it please. I'll do another.
> >
> > Here I think is a better variant. The resulst are good
> > enough -- 1 sec for cleanup. Does the patch look sane?
>
> I'm tempted to say that we should provide these notifier handlers with
> the information they need, explicitly, to handle this case.
>
> Most intdev notifiers actually want to know the individual addresses
> that get removed, one by one. That's handled by the existing
> NETDEV_DOWN event and the ifa we pass to that.
>
> But some, like this netfilter masq case, would be satisfied with a
> single event that tells them the whole inetdev instance is being torn
> down. Which is the case we care about here.
>
> We currently don't use NETDEV_UNREGISTER for inetdev notifiers, so
> maybe we could use that.
>
> And that is consistent with the core netdev notifier that triggers
> this call chain in the first place.
>
> Roughly, something like this:

I see. Dave, gimme some time to test but I'm sure it'll work.
I don't have some strong opinion here, so your patch looks
pretty fine to me. But maybe people from netdev camp have
some other ideas.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cong Wang @ 2016-03-10 19:02 UTC

On Thu, Mar 10, 2016 at 10:01 AM, David Miller <davem@davemloft.net> wrote:
> I'm tempted to say that we should provide these notifier handlers with
> the information they need, explicitly, to handle this case.
>
> Most intdev notifiers actually want to know the individual addresses
> that get removed, one by one. That's handled by the existing
> NETDEV_DOWN event and the ifa we pass to that.
>
> But some, like this netfilter masq case, would be satisfied with a
> single event that tells them the whole inetdev instance is being torn
> down. Which is the case we care about here.
>
> We currently don't use NETDEV_UNREGISTER for inetdev notifiers, so
> maybe we could use that.
>
> And that is consistent with the core netdev notifier that triggers
> this call chain in the first place.
>
> Roughly, something like this:
>
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index 8c3df2c..6eee5cb 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -292,6 +292,11 @@ static void inetdev_destroy(struct in_device *in_dev)
>
>         in_dev->dead = 1;
>
> +       if (in_dev->ifa_list)
> +               blocking_notifier_call_chain(&inetaddr_chain,
> +                                            NETDEV_UNREGISTER,
> +                                            in_dev->ifa_list);
> +
>         ip_mc_destroy_dev(in_dev);

Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER
is happening and masq already registers a netdev notifier...

>
>         while ((ifa = in_dev->ifa_list) != NULL) {
> diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
> index c6eb421..1bb8026 100644
> --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
> +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
> @@ -111,6 +111,10 @@ static int masq_inet_event(struct notifier_block *this,
>         struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
>         struct netdev_notifier_info info;
>
> +       if (event != NETDEV_UNREGISTER)
> +               return NOTIFY_DONE;
> +       event = NETDEV_DOWN;
> +
>         netdev_notifier_info_init(&info, dev);
>         return masq_device_event(this, event, &info);
> }

If masq really doesn't care about inetdev destroy or inetaddr removal,
we should just remove its inetaddr notifier.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: David Miller @ 2016-03-10 19:55 UTC

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Thu, 10 Mar 2016 11:02:28 -0800

> On Thu, Mar 10, 2016 at 10:01 AM, David Miller <davem@davemloft.net> wrote:
>> I'm tempted to say that we should provide these notifier handlers with
>> the information they need, explicitly, to handle this case.
>>
>> Most intdev notifiers actually want to know the individual addresses
>> that get removed, one by one. That's handled by the existing
>> NETDEV_DOWN event and the ifa we pass to that.
>>
>> But some, like this netfilter masq case, would be satisfied with a
>> single event that tells them the whole inetdev instance is being torn
>> down. Which is the case we care about here.
>>
>> We currently don't use NETDEV_UNREGISTER for inetdev notifiers, so
>> maybe we could use that.
>>
>> And that is consistent with the core netdev notifier that triggers
>> this call chain in the first place.
>>
>> Roughly, something like this:
>>
>> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
>> index 8c3df2c..6eee5cb 100644
>> --- a/net/ipv4/devinet.c
>> +++ b/net/ipv4/devinet.c
>> @@ -292,6 +292,11 @@ static void inetdev_destroy(struct in_device *in_dev)
>>
>>         in_dev->dead = 1;
>>
>> +       if (in_dev->ifa_list)
>> +               blocking_notifier_call_chain(&inetaddr_chain,
>> +                                            NETDEV_UNREGISTER,
>> +                                            in_dev->ifa_list);
>> +
>>         ip_mc_destroy_dev(in_dev);
>
> Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER
> is happening and masq already registers a netdev notifier...

Indeed, good catch. Therefore:

1) Keep the masq netdev notifier. That will flush the conntrack table
for the inetdev_destroy event.

2) Make the inetdev notifier only do something if inetdev->dead is
false. (ie. we are flushing an individual address)

And then we don't need the NETDEV_UNREGISTER thing at all:

diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index c6eb421..f71841a 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block *this,
 			   unsigned long event,
 			   void *ptr)
 {
-	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
 	struct netdev_notifier_info info;
+	struct in_ifaddr *ifa = ptr;
+	struct in_device *idev;
 
-	netdev_notifier_info_init(&info, dev);
+	/* The masq_dev_notifier will catch the case of the device going
+	 * down. So if the inetdev is dead and being destroyed we have
+	 * no work to do. Otherwise this is an individual address removal
+	 * and we have to perform the flush.
+	 */
+	idev = ifa->ifa_dev;
+	if (idev->dead)
+		return NOTIFY_DONE;
+
+	netdev_notifier_info_init(&info, idev->dev);
 	return masq_device_event(this, event, &info);
 }

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 20:01 UTC

On Thu, Mar 10, 2016 at 02:55:43PM -0500, David Miller wrote:
> >
> > Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER
> > is happening and masq already registers a netdev notifier...
>
> Indeed, good catch. Therefore:
>
> 1) Keep the masq netdev notifier. That will flush the conntrack table
> for the inetdev_destroy event.
>
> 2) Make the inetdev notifier only do something if inetdev->dead is
> false. (ie. we are flushing an individual address)
>
> And then we don't need the NETDEV_UNREGISTER thing at all:
>
> diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
> index c6eb421..f71841a 100644
> --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
> +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
> @@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block *this,
>  			   unsigned long event,
>  			   void *ptr)
>  {
> -	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
>  	struct netdev_notifier_info info;
> +	struct in_ifaddr *ifa = ptr;
> +	struct in_device *idev;
>
> -	netdev_notifier_info_init(&info, dev);
> +	/* The masq_dev_notifier will catch the case of the device going
> +	 * down. So if the inetdev is dead and being destroyed we have
> +	 * no work to do. Otherwise this is an individual address removal
> +	 * and we have to perform the flush.
> +	 */
> +	idev = ifa->ifa_dev;
> +	if (idev->dead)
> +		return NOTIFY_DONE;
> +
> +	netdev_notifier_info_init(&info, idev->dev);
>  	return masq_device_event(this, event, &info);
>  }

Guys, I'm lost. Currently masq_device_event calls for conntrack
cleanup with device index, so that once device is going down, the
appropriate conntracks gonna be dropped off. Now if device is dead
nobody will cleanup the conntracks?

	Cyrill

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: David Miller @ 2016-03-10 20:03 UTC

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Thu, 10 Mar 2016 23:01:34 +0300

> On Thu, Mar 10, 2016 at 02:55:43PM -0500, David Miller wrote:
>> >
>> > Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER
>> > is happening and masq already registers a netdev notifier...
>>
>> Indeed, good catch. Therefore:
>>
>> 1) Keep the masq netdev notifier. That will flush the conntrack table
>> for the inetdev_destroy event.
>>
>> 2) Make the inetdev notifier only do something if inetdev->dead is
>> false. (ie. we are flushing an individual address)
>>
>> And then we don't need the NETDEV_UNREGISTER thing at all:
>>
>> diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
>> index c6eb421..f71841a 100644
>> --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
>> +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
>> @@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block *this,
>>  			   unsigned long event,
>>  			   void *ptr)
>>  {
>> -	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
>>  	struct netdev_notifier_info info;
>> +	struct in_ifaddr *ifa = ptr;
>> +	struct in_device *idev;
>>
>> -	netdev_notifier_info_init(&info, dev);
>> +	/* The masq_dev_notifier will catch the case of the device going
>> +	 * down. So if the inetdev is dead and being destroyed we have
>> +	 * no work to do. Otherwise this is an individual address removal
>> +	 * and we have to perform the flush.
>> +	 */
>> +	idev = ifa->ifa_dev;
>> +	if (idev->dead)
>> +		return NOTIFY_DONE;
>> +
>> +	netdev_notifier_info_init(&info, idev->dev);
>>  	return masq_device_event(this, event, &info);
>>  }
>
> Guys, I'm lost. Currently masq_device_event calls for conntrack
> cleanup with device index, so that once device is going down, the
> appropriate conntracks gonna be dropped off. Now if device is dead
> nobody will cleanup the conntracks?

Both notifiers are run in the inetdev_destroy() case.

Maybe that's what you are missing.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 20:13 UTC

On Thu, Mar 10, 2016 at 03:03:11PM -0500, David Miller wrote:
> From: Cyrill Gorcunov <gorcunov@gmail.com>
> Date: Thu, 10 Mar 2016 23:01:34 +0300
>
> > On Thu, Mar 10, 2016 at 02:55:43PM -0500, David Miller wrote:
> >> >
> >> > Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER
> >> > is happening and masq already registers a netdev notifier...
> >>
> >> Indeed, good catch. Therefore:
> >>
> >> 1) Keep the masq netdev notifier. That will flush the conntrack table
> >> for the inetdev_destroy event.
> >>
> >> 2) Make the inetdev notifier only do something if inetdev->dead is
> >> false. (ie. we are flushing an individual address)
> >>
> >> And then we don't need the NETDEV_UNREGISTER thing at all:
> >>
> >> diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
> >> index c6eb421..f71841a 100644
> >> --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
> >> +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
> >> @@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block *this,
> >>  			   unsigned long event,
> >>  			   void *ptr)
> >>  {
> >> -	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
> >>  	struct netdev_notifier_info info;
> >> +	struct in_ifaddr *ifa = ptr;
> >> +	struct in_device *idev;
> >>
> >> -	netdev_notifier_info_init(&info, dev);
> >> +	/* The masq_dev_notifier will catch the case of the device going
> >> +	 * down. So if the inetdev is dead and being destroyed we have
> >> +	 * no work to do. Otherwise this is an individual address removal
> >> +	 * and we have to perform the flush.
> >> +	 */
> >> +	idev = ifa->ifa_dev;
> >> +	if (idev->dead)
> >> +		return NOTIFY_DONE;
> >> +
> >> +	netdev_notifier_info_init(&info, idev->dev);
> >>  	return masq_device_event(this, event, &info);
> >>  }
> >
> > Guys, I'm lost. Currently masq_device_event calls for conntrack
> > cleanup with device index, so that once device is going down, the
> > appropriate conntracks gonna be dropped off. Now if device is dead
> > nobody will cleanup the conntracks?
>
> Both notifiers are run in the inetdev_destroy() case.
>
> Maybe that's what you are missing.

No :) Look, here is what I mean. Previously with your two patches
we've been calling nf-cleanup for every address, so we had to make
code call for cleanup for one time only. Now with the patch above
the code flow is the following

  inetdev_destroy
    in_dev->dead = 1;
    ...
    inet_del_ifa
      ...
      blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1);
        ...
        masq_inet_event
          ...
          masq_device_event
            if (idev->dead)
              return NOTIFY_DONE;

and nobody calls for nf_ct_iterate_cleanup, no?

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 20:19 UTC

On Thu, Mar 10, 2016 at 11:13:51PM +0300, Cyrill Gorcunov wrote:
> >
> > Both notifiers are run in the inetdev_destroy() case.
> >
> > Maybe that's what you are missing.
>
> No :) Look, here is what I mean. Previously with your two patches
> we've been calling nf-cleanup for every address, so we had to make
> code call for cleanup for one time only. Now with the patch above
> the code flow is the following

Ah, I'm an idiot, drop the question.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: David Miller @ 2016-03-10 21:05 UTC

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Thu, 10 Mar 2016 23:13:51 +0300

> On Thu, Mar 10, 2016 at 03:03:11PM -0500, David Miller wrote:
>> From: Cyrill Gorcunov <gorcunov@gmail.com>
>> Date: Thu, 10 Mar 2016 23:01:34 +0300
>>
>> > On Thu, Mar 10, 2016 at 02:55:43PM -0500, David Miller wrote:
>> >> >
>> >> > Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER
>> >> > is happening and masq already registers a netdev notifier...
>> >>
>> >> Indeed, good catch. Therefore:
>> >>
>> >> 1) Keep the masq netdev notifier. That will flush the conntrack table
>> >> for the inetdev_destroy event.
>> >>
>> >> 2) Make the inetdev notifier only do something if inetdev->dead is
>> >> false. (ie. we are flushing an individual address)
>> >>
>> >> And then we don't need the NETDEV_UNREGISTER thing at all:
>> >>
>> >> diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
>> >> index c6eb421..f71841a 100644
>> >> --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
>> >> +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
>> >> @@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block *this,
>> >>  			   unsigned long event,
>> >>  			   void *ptr)
>> >>  {
>> >> -	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
>> >>  	struct netdev_notifier_info info;
>> >> +	struct in_ifaddr *ifa = ptr;
>> >> +	struct in_device *idev;
>> >>
>> >> -	netdev_notifier_info_init(&info, dev);
>> >> +	/* The masq_dev_notifier will catch the case of the device going
>> >> +	 * down. So if the inetdev is dead and being destroyed we have
>> >> +	 * no work to do. Otherwise this is an individual address removal
>> >> +	 * and we have to perform the flush.
>> >> +	 */
>> >> +	idev = ifa->ifa_dev;
>> >> +	if (idev->dead)
>> >> +		return NOTIFY_DONE;
>> >> +
>> >> +	netdev_notifier_info_init(&info, idev->dev);
>> >>  	return masq_device_event(this, event, &info);
>> >>  }
>> >
>> > Guys, I'm lost. Currently masq_device_event calls for conntrack
>> > cleanup with device index, so that once device is going down, the
>> > appropriate conntracks gonna be dropped off. Now if device is dead
>> > nobody will cleanup the conntracks?
>>
>> Both notifiers are run in the inetdev_destroy() case.
>>
>> Maybe that's what you are missing.
>
> No :) Look, here is what I mean. Previously with your two patches
> we've been calling nf-cleanup for every address, so we had to make
> code call for cleanup for one time only. Now with the patch above
> the code flow is the following
>
>   inetdev_destroy
>     in_dev->dead = 1;
>     ...
>     inet_del_ifa
>       ...
>       blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1);
>         ...
>         masq_inet_event
>           ...
>           masq_device_event
>             if (idev->dead)
>               return NOTIFY_DONE;
>
> and nobody calls for nf_ct_iterate_cleanup, no?

Oh yes they do, from masq's non-inet notifier. masq registers two
notifiers, one for generic netdev and one for inetdev.

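Note: the two registrations David refers to look roughly like this in
net/ipv4/netfilter/nf_nat_masquerade_ipv4.c (an abridged sketch; the
refcounting around the register calls is omitted):

    static struct notifier_block masq_dev_notifier = {
        .notifier_call = masq_device_event,  /* netdev chain: device going down */
    };

    static struct notifier_block masq_inet_notifier = {
        .notifier_call = masq_inet_event,    /* inetaddr chain: one event per address */
    };

    void nf_nat_masquerade_ipv4_register_notifier(void)
    {
        /* ... refcount check omitted ... */
        register_netdevice_notifier(&masq_dev_notifier);
        register_inetaddr_notifier(&masq_inet_notifier);
    }

So when the whole device goes down, masq_device_event() on the netdev chain
still performs the single conntrack flush; it is only the per-address
masq_inet_event() path that gets short-circuited while idev->dead is set.
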
* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 21:19 UTC

On Thu, Mar 10, 2016 at 04:05:21PM -0500, David Miller wrote:
> >
> > and nobody calls for nf_ct_iterate_cleanup, no?
>
> Oh yes they do, from masq's non-inet notifier. masq registers two
> notifiers, one for generic netdev and one for inetdev.

Thanks a huge David! I'll test it just to be sure.

	Cyrill

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 21:59 UTC

On Fri, Mar 11, 2016 at 12:19:45AM +0300, Cyrill Gorcunov wrote:
> >
> > Oh yes they do, from masq's non-inet notifier. masq registers two
> > notifiers, one for generic netdev and one for inetdev.
>
> Thanks a huge David! I'll test it just to be sure.

Works like a charm! So David, what are the next steps then?
Mind to gather all your patches into one (maybe)?

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: David Miller @ 2016-03-10 22:36 UTC

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Fri, 11 Mar 2016 00:59:59 +0300

> On Fri, Mar 11, 2016 at 12:19:45AM +0300, Cyrill Gorcunov wrote:
>> >
>> > Oh yes they do, from masq's non-inet notifier. masq registers two
>> > notifiers, one for generic netdev and one for inetdev.
>>
>> Thanks a huge David! I'll test it just to be sure.
>
> Works like a charm! So David, what are the next steps then?
> Mind to gather all your patches into one (maybe)?

I'll re-review all of the changes tomorrow and also look into ipv6
masq, to see if it needs the same treatment, as well.

Thanks for all of your help and testing so far.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-10 22:40 UTC

On Thu, Mar 10, 2016 at 05:36:30PM -0500, David Miller wrote:
> >
> > Works like a charm! So David, what are the next steps then?
> > Mind to gather all your patches into one (maybe)?
>
> I'll re-review all of the changes tomorrow and also look into ipv6
> masq, to see if it needs the same treatment, as well.
>
> Thanks for all of your help and testing so far.

Thanks a lot, David!

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: David Miller @ 2016-03-11 20:40 UTC

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Fri, 11 Mar 2016 01:40:56 +0300

> On Thu, Mar 10, 2016 at 05:36:30PM -0500, David Miller wrote:
>> >
>> > Works like a charm! So David, what are the next steps then?
>> > Mind to gather all your patches into one (maybe)?
>>
>> I'll re-review all of the changes tomorrow and also look into ipv6
>> masq, to see if it needs the same treatment, as well.
>>
>> Thanks for all of your help and testing so far.
>
> Thanks a lot, David!

Cyrill please retest this final patch and let me know if it still works
properly.

I looked at ipv6, and it's more complicated. The problem is that ipv6
doesn't mark the inet6dev object as dead in the NETDEV_DOWN case, in
fact it keeps the object around. It only releases it and marks it
dead in the NETDEV_UNREGISTER case.

We pay a very large price for having allowed the behavior of ipv6 and
ipv4 to diverge so greatly in these areas :-(

Nevertheless we should try to fix it somehow, maybe we can detect the
situation in another way for the ipv6 side.

====================
ipv4: Don't do expensive useless work during inetdev destroy.

When an inetdev is destroyed, every address assigned to the interface
is removed. And in this scenario we do two pointless things which can
be very expensive if the number of assigned interfaces is large:

1) Address promotion. We are deleting all addresses, so there is no
point in doing this.

2) A full nf conntrack table purge for every address. We only need to
do this once, as is already caught by the existing masq_dev_notifier
so masq_inet_event() can skip this.

Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index f6303b1..0212591 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -334,6 +334,9 @@ static void __inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
 
 	ASSERT_RTNL();
 
+	if (in_dev->dead)
+		goto no_promotions;
+
 	/* 1. Deleting primary ifaddr forces deletion all secondaries
 	 * unless alias promotion is set **/
 
@@ -380,6 +383,7 @@ static void __inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
 			fib_del_ifaddr(ifa, ifa1);
 	}
 
+no_promotions:
 	/* 2. Unlink it */
 
 	*ifap = ifa1->ifa_next;
diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index c6eb421..ea91058 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -108,10 +108,18 @@ static int masq_inet_event(struct notifier_block *this,
 			   unsigned long event,
 			   void *ptr)
 {
-	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
+	struct in_device *idev = ((struct in_ifaddr *)ptr)->ifa_dev;
 	struct netdev_notifier_info info;
 
-	netdev_notifier_info_init(&info, dev);
+	/* The masq_dev_notifier will catch the case of the device going
+	 * down. So if the inetdev is dead and being destroyed we have
+	 * no work to do. Otherwise this is an individual address removal
+	 * and we have to perform the flush.
+	 */
+	if (idev->dead)
+		return NOTIFY_DONE;
+
+	netdev_notifier_info_init(&info, idev->dev);
 	return masq_device_event(this, event, &info);
 }

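Note: a rough way to sanity-check that the device-down flush still happens
with this patch, on a machine that is masquerading out of some device $DEV
(assumes conntrack-tools is installed; the counts are only illustrative):

    conntrack -L 2>/dev/null | wc -l    # masqueraded entries present
    ip link set "$DEV" down             # netdev NETDEV_DOWN -> masq_device_event()
    conntrack -L 2>/dev/null | wc -l    # entries tied to $DEV's ifindex should be gone
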
* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Florian Westphal @ 2016-03-11 20:58 UTC

David Miller <davem@davemloft.net> wrote:
> From: Cyrill Gorcunov <gorcunov@gmail.com>
> Date: Fri, 11 Mar 2016 01:40:56 +0300
>
> > On Thu, Mar 10, 2016 at 05:36:30PM -0500, David Miller wrote:
> >> >
> >> > Works like a charm! So David, what are the next steps then?
> >> > Mind to gather all your patches into one (maybe)?
> >>
> >> I'll re-review all of the changes tomorrow and also look into ipv6
> >> masq, to see if it needs the same treatment, as well.
> >>
> >> Thanks for all of your help and testing so far.
> >
> > Thanks a lot, David!
>
> Cyrill please retest this final patch and let me know if it still works
> properly.
>
> I looked at ipv6, and it's more complicated. The problem is that ipv6
> doesn't mark the inet6dev object as dead in the NETDEV_DOWN case, in
> fact it keeps the object around. It only releases it and marks it
> dead in the NETDEV_UNREGISTER case.
>
> We pay a very large price for having allowed the behavior of ipv6 and
> ipv4 to diverge so greatly in these areas :-(
>
> Nevertheless we should try to fix it somehow, maybe we can detect the
> situation in another way for the ipv6 side.

Note that the ipv6 inet notifier is atomic; since nf_ct_iterate_cleanup
can schedule, the ipv6 masq version defers the cleanup to a work queue,
with a backlog cap of 16.

So in case of a gazillion events most will be ignored and teardown
should not be delayed (at least not even close to what ipv4 masq did).

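Note: the deferral Florian describes looks roughly like this, paraphrased from
the ipv6 masquerade code of that time (names and details are approximate, not
the exact upstream symbols):

    #define MAX_MASQ_WORK 16            /* the "backlog cap of 16" above */

    static atomic_t v6_worker_count;

    struct masq_dev_work {
        struct work_struct work;
        struct net *net;
        int ifindex;
    };

    static void masq_cleanup_work(struct work_struct *work)
    {
        struct masq_dev_work *w =
            container_of(work, struct masq_dev_work, work);

        /* process context: the full conntrack walk may sleep here */
        nf_ct_iterate_cleanup(w->net, device_cmp,
                              (void *)(long)w->ifindex, 0, 0);
        put_net(w->net);
        kfree(w);
        atomic_dec(&v6_worker_count);
    }

    /* called from the atomic inet6addr notifier */
    static void masq_schedule_cleanup(struct net *net, int ifindex)
    {
        struct masq_dev_work *w;

        if (atomic_read(&v6_worker_count) >= MAX_MASQ_WORK)
            return;                     /* cap the backlog, drop the event */

        w = kzalloc(sizeof(*w), GFP_ATOMIC);
        if (!w)
            return;

        atomic_inc(&v6_worker_count);
        w->net = get_net(net);
        w->ifindex = ifindex;
        INIT_WORK(&w->work, masq_cleanup_work);
        schedule_work(&w->work);
    }
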
* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-11 21:00 UTC

On Fri, Mar 11, 2016 at 03:40:46PM -0500, David Miller wrote:
> >
> > Thanks a lot, David!
>
> Cyrill please retest this final patch and let me know if it still works
> properly.
...

Thanks David! Give me some time, gonna build and run test.

	Cyrill

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-11 21:22 UTC

On Fri, Mar 11, 2016 at 03:40:46PM -0500, David Miller wrote:
> > Thanks a lot, David!
>
> Cyrill please retest this final patch and let me know if it still works
> properly.
>
> I looked at ipv6, and it's more complicated. The problem is that ipv6
> doesn't mark the inet6dev object as dead in the NETDEV_DOWN case, in
> fact it keeps the object around. It only releases it and marks it
> dead in the NETDEV_UNREGISTER case.
>
> We pay a very large price for having allowed the behavior of ipv6 and
> ipv4 to diverge so greatly in these areas :-(
>
> Nevertheless we should try to fix it somehow, maybe we can detect the
> situation in another way for the ipv6 side.

David, thanks a huge! But you forgot to merge your patch #2
(once I add it manually on top, it works quite well :)
---
 net/ipv4/fib_frontend.c | 4 ++++
 1 file changed, 4 insertions(+)

Index: linux-ml.git/net/ipv4/fib_frontend.c
===================================================================
--- linux-ml.git.orig/net/ipv4/fib_frontend.c
+++ linux-ml.git/net/ipv4/fib_frontend.c
@@ -922,6 +922,9 @@ void fib_del_ifaddr(struct in_ifaddr *if
 		subnet = 1;
 	}
 
+	if (in_dev->dead)
+		goto no_promotions;
+
 	/* Deletion is more complicated than add.
 	 * We should take care of not to delete too much :-)
 	 *
@@ -997,6 +1000,7 @@ void fib_del_ifaddr(struct in_ifaddr *if
 		}
 	}
 
+no_promotions:
 	if (!(ok & BRD_OK))
 		fib_magic(RTM_DELROUTE, RTN_BROADCAST, ifa->ifa_broadcast, 32, prim);
 	if (subnet && ifa->ifa_prefixlen < 31) {

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cyrill Gorcunov @ 2016-03-11 21:59 UTC

On Sat, Mar 12, 2016 at 12:22:47AM +0300, Cyrill Gorcunov wrote:
> On Fri, Mar 11, 2016 at 03:40:46PM -0500, David Miller wrote:
> > > Thanks a lot, David!
> >
> > Cyrill please retest this final patch and let me know if it still works
> > properly.
> >
> > I looked at ipv6, and it's more complicated. The problem is that ipv6
> > doesn't mark the inet6dev object as dead in the NETDEV_DOWN case, in
> > fact it keeps the object around. It only releases it and marks it
> > dead in the NETDEV_UNREGISTER case.
> >
> > We pay a very large price for having allowed the behavior of ipv6 and
> > ipv4 to diverge so greatly in these areas :-(
> >
> > Nevertheless we should try to fix it somehow, maybe we can detect the
> > situation in another way for the ipv6 side.
>
> David, thanks a huge! But you forgot to merge your patch #2
> (once I add it manually on top, it works quite well :)

Here is a cumulative one, which works just brilliant! Thanks a lot, David!
(I changed the Reported-by tag, since it's Solar Designer who told us
about the issue; I forgot to mention it in the first report, sorry.)
---
From: David Miller <davem@davemloft.net>
Subject: [PATCH] ipv4: Don't do expensive useless work during inetdev destroy.

When an inetdev is destroyed, every address assigned to the interface
is removed. And in this scenario we do two pointless things which can
be very expensive if the number of assigned interfaces is large:

1) Address promotion. We are deleting all addresses, so there is no
point in doing this.

2) A full nf conntrack table purge for every address. We only need to
do this once, as is already caught by the existing masq_dev_notifier
so masq_inet_event() can skip this.

Reported-by: Solar Designer <solar@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 net/ipv4/devinet.c                          |  4 ++++
 net/ipv4/fib_frontend.c                     |  4 ++++
 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c | 12 ++++++++++--
 3 files changed, 18 insertions(+), 2 deletions(-)

Index: linux-ml.git/net/ipv4/devinet.c
===================================================================
--- linux-ml.git.orig/net/ipv4/devinet.c
+++ linux-ml.git/net/ipv4/devinet.c
@@ -334,6 +334,9 @@ static void __inet_del_ifa(struct in_dev
 
 	ASSERT_RTNL();
 
+	if (in_dev->dead)
+		goto no_promotions;
+
 	/* 1. Deleting primary ifaddr forces deletion all secondaries
 	 * unless alias promotion is set **/
 
@@ -380,6 +383,7 @@ static void __inet_del_ifa(struct in_dev
 			fib_del_ifaddr(ifa, ifa1);
 	}
 
+no_promotions:
 	/* 2. Unlink it */
 
 	*ifap = ifa1->ifa_next;
Index: linux-ml.git/net/ipv4/fib_frontend.c
===================================================================
--- linux-ml.git.orig/net/ipv4/fib_frontend.c
+++ linux-ml.git/net/ipv4/fib_frontend.c
@@ -922,6 +922,9 @@ void fib_del_ifaddr(struct in_ifaddr *if
 		subnet = 1;
 	}
 
+	if (in_dev->dead)
+		goto no_promotions;
+
 	/* Deletion is more complicated than add.
 	 * We should take care of not to delete too much :-)
 	 *
@@ -997,6 +1000,7 @@ void fib_del_ifaddr(struct in_ifaddr *if
 		}
 	}
 
+no_promotions:
 	if (!(ok & BRD_OK))
 		fib_magic(RTM_DELROUTE, RTN_BROADCAST, ifa->ifa_broadcast, 32, prim);
 	if (subnet && ifa->ifa_prefixlen < 31) {
Index: linux-ml.git/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
===================================================================
--- linux-ml.git.orig/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ linux-ml.git/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -108,10 +108,18 @@ static int masq_inet_event(struct notifi
 			   unsigned long event,
 			   void *ptr)
 {
-	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
+	struct in_device *idev = ((struct in_ifaddr *)ptr)->ifa_dev;
 	struct netdev_notifier_info info;
 
-	netdev_notifier_info_init(&info, dev);
+	/* The masq_dev_notifier will catch the case of the device going
+	 * down. So if the inetdev is dead and being destroyed we have
+	 * no work to do. Otherwise this is an individual address removal
+	 * and we have to perform the flush.
+	 */
+	if (idev->dead)
+		return NOTIFY_DONE;
+
+	netdev_notifier_info_init(&info, idev->dev);
 	return masq_device_event(this, event, &info);
 }

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: David Miller @ 2016-03-14  3:29 UTC

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Sat, 12 Mar 2016 00:59:12 +0300

> Here is a cumulative one, which works just brilliant! Thanks a lot, David!
> (I changed the Reported-by tag, since it's Solar Designer who told us
> about the issue; I forgot to mention it in the first report, sorry.)

Thanks for fixing this up and retesting, applied and queued up for -stable.

* Re: [RFC] net: ipv4 -- Introduce ifa limit per net
  From: Cong Wang @ 2016-03-10 21:09 UTC

On Thu, Mar 10, 2016 at 11:55 AM, David Miller <davem@davemloft.net> wrote:
> Indeed, good catch. Therefore:
>
> 1) Keep the masq netdev notifier. That will flush the conntrack table
> for the inetdev_destroy event.
>
> 2) Make the inetdev notifier only do something if inetdev->dead is
> false. (ie. we are flushing an individual address)
>
> And then we don't need the NETDEV_UNREGISTER thing at all:

This makes sense to me. I guess something similar needs to be done for
IPv6 masq too.

Thanks.
