From: Eric Dumazet <eric.dumazet@gmail.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: netdev@vger.kernel.org
Subject: Re: Scalability of interface creation and deletion
Date: Sat, 07 May 2011 14:22:06 +0200
Message-ID: <1304770926.2821.1157.camel@edumazet-laptop>
In-Reply-To: <891B02256A0667292521A4BF@Ximines.local>
On Saturday, 07 May 2011 at 12:08 +0100, Alex Bligh wrote:
> I am trying to track down why interface creation slows down badly with
> large numbers of interfaces (~1,000 interfaces) and why deletion is so
> slow. Use case: restarting routers needs to be fast; some failover methods
> require interface up/down; some routers need lots of interfaces.
>
> I have written a small shell script to create and delete a number of
> interfaces supplied on the command line (script appended below). It
> is important to run this with udev, udev-bridge etc. disabled. In
> my environment (Ubuntu 2.6.32-28-generic, Lucid), I did this by:
> * service upstart-udev-bridge stop
> * service udev stop
> * unshare -n bash
> If you don't do this, you are simply timing your distro's interface
> scripts.
>
> Note the "-n" parameter creates the supplied number of veth pair
> interfaces. As these are pairs, there are twice as many interfaces actually
> created.
>
> So, the results which are pretty repeatable are as follows:
>
>                      100 pairs   500 pairs
> Interface creation   14ms        110ms
> Interface deletion   160ms       148ms
>
> Now I don't think interface deletion has in fact got faster: simply
> the overhead of loading the script is spread over more processes.
> But there are two obvious conclusions:
>
> 1. Interface creation slows down hugely with more interfaces
sysfs is the problem, a very well-known one (sysfs_refresh_inode()).
Try:
$ time ls /sys/class/net >/dev/null
real 0m0.002s
user 0m0.000s
sys 0m0.001s
$ modprobe dummy numdummies=1000
$ time ls /sys/class/net >/dev/null
real 0m0.041s
user 0m0.003s
sys 0m0.002s
> 2. Interface deletion is normally much slower than interface creation
>
> strace -T -ttt on the "ip" command used to do this does not show the delay
> where I thought it would be - cataloguing the existing interfaces. Instead,
> it's the final send() to the netlink socket which does the relevant action
> which appears to be slow, for both addition and deletion. Adding the last
> interface takes 200ms in that syscall, the first is quick (symptomatic of a
> slowdown); for deletion the last send syscall is quick.
>
> Poking about in net/core/dev.c, I see that interface names are hashed using
> a hash with a maximum of 256 entries. However, these seem to be hash
> buckets supporting multiple entries so I can't imagine a chain of 4 entries
> is problematic.
It's not.
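
The "chain of 4" figure is just the number of names divided by the number of buckets. A rough sketch of that arithmetic, assuming names spread evenly across the 256 buckets mentioned above (the function name avg_chain_len is hypothetical):

```shell
# avg_chain_len N_NAMES [N_BUCKETS]: average entries per hash bucket,
# rounded up. Assumes a uniform spread of names over the buckets;
# N_BUCKETS defaults to the 256 quoted above.
# (Hypothetical helper, for illustration only.)
avg_chain_len() {
    names=$1
    buckets=${2:-256}
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (names + buckets - 1) / buckets ))
}
```

For example, `avg_chain_len 1000` prints 4, and even `avg_chain_len 2000` only prints 8, so walking the chain stays cheap at these sizes.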
>
> I am having difficulty seeing what might be the issue in interface
> creation. Any ideas?
>
Actually, a lot. Just run
git log net/core/dev.c
and you'll see many commits that make this faster.
> In interface deletion, my attention is drawn to netdev_wait_allrefs,
> which does this:
> refcnt = netdev_refcnt_read(dev);
>
Here refcnt is 0, or there is a bug somewhere.
(It happens, we fix bugs once in a while)
> while (refcnt != 0) {
> ...
> msleep(250);
>
> refcnt = netdev_refcnt_read(dev);
> ....
> }
>
> I am guessing that this is going to do the msleep 50% of the time,
> explaining 125ms of the observed time. How would people react to
> exponential backoff instead (untested):
>
> int backoff = 10;
> refcnt = netdev_refcnt_read(dev);
>
> while (refcnt != 0) {
> ...
> msleep(backoff);
> if ((backoff *= 2) > 250)
> backoff = 250;
>
> refcnt = netdev_refcnt_read(dev);
> ....
> }
>
>
Welcome to the club. This has been discussed on netdev for many
years, and a lot of work has been done to make it better.
Interface deletion needs several RCU synchronization calls, and they are
very expensive. This is the price to pay for a lockless network stack in
the fast paths.
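
For comparison, the two polling loops quoted above can be modelled with a small script. This is only a sketch under stated assumptions: it models the sleep loop alone, with the refcount dropping to zero release_ms milliseconds after the loop starts, and it ignores the RCU synchronization cost entirely. The name total_wait and its arguments are hypothetical.

```shell
# total_wait RELEASE_MS MODE: total milliseconds spent sleeping in the
# polling loop before it sees refcnt == 0.
#   RELEASE_MS  time at which the last reference is dropped (assumption)
#   MODE        "fixed"   -> msleep(250) on every iteration
#               "backoff" -> start at 10 ms, double each time, cap at 250 ms
# (Hypothetical helper, for illustration only.)
total_wait() {
    release_ms=$1
    mode=$2
    elapsed=0
    backoff=10
    # The loop only sleeps while the refcount is still non-zero.
    while [ "$elapsed" -lt "$release_ms" ]; do
        if [ "$mode" = fixed ]; then
            step=250
        else
            step=$backoff
            backoff=$((backoff * 2))
            if [ "$backoff" -gt 250 ]; then
                backoff=250
            fi
        fi
        elapsed=$((elapsed + step))
    done
    echo "$elapsed"
}
```

With a reference released after 30 ms, `total_wait 30 fixed` prints 250 while `total_wait 30 backoff` prints 30. With a refcount that is already zero, `total_wait 0 fixed` prints 0: the loop never sleeps, so in that case the choice of backoff makes no difference.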