public inbox for netdev@vger.kernel.org
From: Eric Dumazet <eric.dumazet@gmail.com>
To: nicolas.dichtel@6wind.com, David Miller <davem@davemloft.net>
Cc: netdev <netdev@vger.kernel.org>,
	Octavian Purdila <opurdila@ixiacom.com>,
	Benjamin LaHaise <bcrl@kvack.org>
Subject: [PATCH] net: use rcu_barrier() in rollback_registered_many
Date: Tue, 14 Sep 2010 00:24:54 +0200
Message-ID: <1284416694.2627.89.camel@edumazet-laptop>
In-Reply-To: <1284128679.24675.38.camel@edumazet-laptop>

On Friday, 10 September 2010 at 16:24 +0200, Eric Dumazet wrote:
> On Friday, 10 September 2010 at 15:35 +0200, Nicolas Dichtel wrote:
> > Hi all,
> > 
> > We have a scalability problem when we try to remove a lot of virtual interfaces.
> > After analysis, we found that a refcnt on a device was released too late.
> > Here is a proposed patch. Unless we are missing something, the refcnt can be
> > released before call_rcu(). In IPv6, this is already the case.
> > 
> > Comments are welcome.
> > 
> > 
> > Regards,
> > Nicolas
> > attachment (file differences)
> > (0001-ipv4-release-dev-refcnt-early-when-destroying-inetd.patch)
> > From 6fe291ff56b1f94599dfaa57dfb0ed4c168b603f Mon Sep 17 00:00:00 2001
> > From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> > Date: Fri, 10 Sep 2010 14:52:15 +0200
> > Subject: [PATCH] ipv4: release dev refcnt early when destroying inetdev
> > 
> > When a virtual device is removed, the refcnt on dev is released
> > after the RCU barrier, hence we always fall into the msleep(250)
> > of netdev_wait_allrefs(). This causes a long delay when
> > a lot of interfaces are removed.
> > The refcnt can be released before this RCU barrier, which
> > speeds up the removal of virtual interfaces.
> > 
> > Test of removing 50 ipip tunnel interfaces:
> >  Before the patch:
> >   real    0m12.804s
> >   user    0m0.020s
> >   sys     0m0.000s
> > 
> >  After the patch:
> >   real    0m0.988s
> >   user    0m0.004s
> >   sys     0m0.016s
> > 
> > Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com>
> > Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> > ---
> 
> This is a well-known problem (many patches were sent some months ago),
> but your patch is not the right solution.
> 
> As long as the idev is not yet freed, it can still be used, and we need to
> access idev->dev.
> 
> 

I believe I have understood one problem.

In rollback_registered_many(), we call inetdev_event() (and thus
inetdev_destroy()) at line 4844:

call_netdevice_notifiers(NETDEV_UNREGISTER, dev);

Then we call synchronize_net() at line 4870.

So by the time netdev_wait_allrefs() is called, we expect to have called
in_dev_finish_destroy().

But using synchronize_net() is a bit wrong here:

	"It waits until all pre-existing rcu readers have completed."

We have no guarantee that all the call_rcu() callbacks we posted to
dismantle the device have completed:

- If the number of online CPUs is 1, synchronize_net() is a no-op.
- If our thread migrates to another CPU, synchronize_net() can return
  while old callbacks are not yet processed.

We should probably use rcu_barrier() instead, to wait for all
outstanding RCU callbacks to complete.
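To make the difference concrete, here is a minimal userspace sketch, not kernel code. struct toy_dev, toy_call_rcu(), toy_synchronize(), toy_barrier() and refcnt_after_wait() are hypothetical stand-ins. A queued callback models the in_dev_finish_destroy() step that drops the device reference; toy_synchronize() takes the degenerate single-CPU case described above and returns without running anything, while toy_barrier() returns only once every queued callback has run.

```c
#include <assert.h>

/* Hypothetical userspace model, NOT kernel code. */

#define TOY_MAX_CBS 16

struct toy_dev {
    int refcnt;                     /* models the net_device refcount */
};

typedef void (*toy_cb_fn)(struct toy_dev *);

static toy_cb_fn pending_fn[TOY_MAX_CBS];
static struct toy_dev *pending_arg[TOY_MAX_CBS];
static int npending;

/* Models call_rcu(): defer fn(arg) until callbacks are processed. */
static void toy_call_rcu(toy_cb_fn fn, struct toy_dev *arg)
{
    assert(npending < TOY_MAX_CBS);
    pending_fn[npending] = fn;
    pending_arg[npending] = arg;
    npending++;
}

/* Models the deferred in_dev_finish_destroy() -> dev_put() step. */
static void toy_finish_destroy(struct toy_dev *dev)
{
    dev->refcnt--;
}

/* Models synchronize_rcu(): it waits for a grace period but does not
 * promise that already-queued callbacks have run. This toy takes the
 * degenerate single-CPU case and returns at once, running nothing. */
static void toy_synchronize(void)
{
}

/* Models rcu_barrier(): returns only after every queued callback ran. */
static void toy_barrier(void)
{
    for (int i = 0; i < npending; i++)
        pending_fn[i](pending_arg[i]);
    npending = 0;
}

/* Posts the deferred dev_put(), waits with the given primitive, and
 * returns the refcount the caller would see before moving on to
 * netdev_wait_allrefs(). */
static int refcnt_after_wait(void (*wait)(void))
{
    struct toy_dev dev = { .refcnt = 1 };   /* reference held by the inetdev */
    int seen;

    toy_call_rcu(toy_finish_destroy, &dev);
    wait();
    seen = dev.refcnt;
    toy_barrier();          /* drain leftovers before dev goes out of scope */
    return seen;
}
```

In this model, refcnt_after_wait(toy_synchronize) returns 1 (the reference is still held when the caller moves on), while refcnt_after_wait(toy_barrier) returns 0.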

I also believe the order of netdevice notifiers is wrong (we don't set
a priority), and that we should call fib_netdev_event() _before_
dst_dev_event(). This needs another patch.
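For reference, a notifier chain calls its blocks in descending priority order, and blocks with equal priority run in the order they were registered. The sketch below is a hypothetical userspace model of that insertion rule (struct toy_notifier and toy_register() are illustrative names, loosely patterned on struct notifier_block, not the kernel API). It shows that with default zero priorities the call order is simply the registration order, and that giving a block a higher priority moves it ahead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a priority-ordered notifier chain,
 * loosely patterned on struct notifier_block (NOT the kernel API). */
struct toy_notifier {
    const char *name;            /* e.g. "fib_netdev_event" */
    int priority;                /* higher runs earlier; 0 when unset */
    struct toy_notifier *next;
};

/* Link n in before the first entry with a strictly lower priority, so
 * blocks with equal priority keep their registration order. */
static void toy_register(struct toy_notifier **head, struct toy_notifier *n)
{
    struct toy_notifier **nl = head;

    assert(n != NULL);
    while (*nl != NULL && n->priority <= (*nl)->priority)
        nl = &(*nl)->next;
    n->next = *nl;
    *nl = n;
}
```

Registering a block named dst_dev_event at priority 0 and then one named fib_netdev_event at priority 0 leaves dst first; if the fib block is given a hypothetical priority of 1 instead, it is linked in ahead of dst.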

Thanks

[PATCH] net: use rcu_barrier() in rollback_registered_many

netdev_wait_allrefs() waits until all references to a device vanish.

It currently uses a _very_ pessimistic 250 ms delay between probes.
Some users reported that no more than 4 devices can be dismantled per
second; this is a pretty serious problem for some setups.
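The 4-devices-per-second ceiling follows from the 250 ms probe interval: if the last reference is still held at the first probe, each device costs at least one full sleep, and, assuming devices are waited for one at a time, the sleeps add up. The sketch below is a simplified, hypothetical model in simulated milliseconds (toy_wait_allrefs_ms(), toy_total_ms() and drop_ms are illustrative names; the real loop's NETDEV_UNREGISTER rebroadcasts and warnings are omitted).

```c
#include <assert.h>

#define TOY_POLL_MS 250   /* the msleep(250) interval in netdev_wait_allrefs() */

/* Hypothetical model of the polling loop: check refcnt; if nonzero,
 * sleep TOY_POLL_MS and check again. drop_ms is when the last reference
 * goes away, measured from the first check (<= 0 means already gone).
 * Returns the simulated time spent waiting. */
static long toy_wait_allrefs_ms(long drop_ms)
{
    long now = 0;

    while (now < drop_ms)       /* refcnt still nonzero at time "now" */
        now += TOY_POLL_MS;     /* models msleep(TOY_POLL_MS) */
    return now;
}

/* Assumes devices are waited for one after another, so delays add up. */
static long toy_total_ms(int ndev, long drop_ms)
{
    long total = 0;

    assert(ndev >= 0);
    for (int i = 0; i < ndev; i++)
        total += toy_wait_allrefs_ms(drop_ms);
    return total;
}
```

toy_total_ms(50, 1) returns 12500, about 12.5 s for 50 devices whose last reference drops just after the first probe, which is in line with the roughly 12 s timings reported in this thread; toy_total_ms(4, 1) returns 1000, the 4-per-second ceiling; toy_total_ms(50, 0) returns 0.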

Most of the time, a refcount is about to be released by an RCU callback
that is still in flight, because rollback_registered_many() uses a
synchronize_rcu() call (via synchronize_net()) instead of rcu_barrier().
The problem is visible when the number of online CPUs is one, because
synchronize_rcu() is then a no-op.

Time to remove 50 ipip tunnels on a UP machine:

before patch: real 11.910s
after patch:  real 1.250s

Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Octavian Purdila <opurdila@ixiacom.com>
Reported-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fc2dc93..6de5a82 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4867,7 +4867,7 @@ static void rollback_registered_many(struct list_head *head)
 	dev = list_first_entry(head, struct net_device, unreg_list);
 	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 
-	synchronize_net();
+	rcu_barrier();
 
 	list_for_each_entry(dev, head, unreg_list)
 		dev_put(dev);


Thread overview: 8+ messages
2010-09-10 13:35 [RFC PATCH] ipv4: release dev refcnt early when destroying inetdev Nicolas Dichtel
2010-09-10 14:24 ` Eric Dumazet
2010-09-10 14:57   ` Nicolas Dichtel
2010-09-10 15:16     ` Eric Dumazet
2010-09-14 20:45       ` David Miller
2010-09-15  6:01         ` Eric Dumazet
2010-09-13 22:24   ` Eric Dumazet [this message]
2010-09-14 21:27     ` [PATCH] net: use rcu_barrier() in rollback_registered_many David Miller
