From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
Benoit Lourdelet <blourdel@juniper.net>,
Serge Hallyn <serge.hallyn@ubuntu.com>,
"netdev\@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [RFC][PATCH] iproute: Faster ip link add, set and delete
Date: Thu, 28 Mar 2013 18:06:32 -0700
Message-ID: <87ip4b5813.fsf@xmission.com>
In-Reply-To: <1364517837.15753.61.camel@edumazet-glaptop> (Eric Dumazet's message of "Thu, 28 Mar 2013 17:43:57 -0700")
Eric Dumazet <eric.dumazet@gmail.com> writes:
> On Thu, 2013-03-28 at 17:25 -0700, Eric W. Biederman wrote:
>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>
>> > On Thu, 2013-03-28 at 16:52 -0700, Eric W. Biederman wrote:
>> >
>> >> On my microbenchmark of just creating 5000 veth pairs this takes
>> >> 16s instead of the 13s of my earlier hacks, but that is well down in
>> >> the usable range.
>> >
>> > I guess most of the time is taken by sysctl_check_table()
>>
>> All of the significant sysctl slowdowns were fixed in 3.4. If you see
>> something of sysctl show up in a trace I would be happy to talk about
>> it. On the kernel side, creating N network devices seems to take
>> N log N time now. Both sysfs and sysctl store directories as
>> rbtrees, removing their previous bottlenecks.
>>
>> The loop I timed at 16s was just:
>>
>> time for i in $(seq 1 5000) ; do ip link add a$i type veth peer name b$i; done
>>
>> There is plenty of room for inefficiencies in 10000 network devices and
>> 5000 forks+execs.
>
> Ah right, the sysctl part is fixed ;)
>
> In batch mode, I can create these veth pairs in 4 seconds
>
> for i in $(seq 1 5000) ; do echo link add a$i type veth peer name b$i;
> done | ip -batch -

Yes. The interesting story here is that the bottleneck before these
patches was the ll_init_map function of iproute2, which resulted in
more than an order of magnitude slowdown when starting iproute on a
system with lots of network devices.
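
To make the cost concrete, here is a rough sketch of the arithmetic. It assumes that every ip invocation dumps all existing links at startup (which is what ll_init_map did) and it ignores loopback and any other pre-existing devices. The function name links_dumped_total is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: total number of link records dumped when
# "ip link add ... type veth" is run once per pair, ASSUMING each
# invocation first dumps every existing link.
# Invocation k sees the 2*(k-1) devices left by earlier invocations,
# so the sum has the closed form pairs*(pairs-1): quadratic in pairs.
links_dumped_total() {
    pairs=$1
    total=0
    k=1
    while [ "$k" -le "$pairs" ]; do
        total=$(( total + 2 * (k - 1) ))
        k=$(( k + 1 ))
    done
    echo "$total"
}
```

Under those assumptions, links_dumped_total 5000 prints 24995000, that is, roughly 25 million link records parsed in userspace just to create 10000 devices.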

It is still unclear where iproute comes into the picture in the original
problem scenario of creating 2000 containers each with 2 veth pairs.
But apparently it did.

As the fundamental use case here is taking 2000 separate, independent
actions, it turns out to be important for things not to slow down
unreasonably outside of batch mode. So I was explicitly testing the
non-batch mode performance.
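
For anyone who wants to repeat the comparison, a small sketch follows. The helper name gen_veth_batch is hypothetical; the device names a$i and b$i are the ones used in the loops quoted above. The function only prints batch input, so it is safe to run unprivileged; actually creating the devices requires root.

```shell
#!/bin/sh
# Hypothetical helper: print one "ip -batch" input line per veth pair.
gen_veth_batch() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        echo "link add a$i type veth peer name b$i"
        i=$(( i + 1 ))
    done
}

# Batch mode, one ip process for all pairs (needs root):
#   gen_veth_batch 5000 | ip -batch -
#
# Non-batch mode, one fork+exec of ip per pair (needs root):
#   for i in $(seq 1 5000); do
#       ip link add a$i type veth peer name b$i
#   done
```

Timing both variants on the same machine separates the per-invocation startup cost of ip from the kernel-side cost of creating the devices.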

On the flip side it might be interesting to see if we can get batch mode
deletes to batch in the kernel, so we don't have to wait through
synchronize_rcu_expedited for each of them. Although for the container
case I can just drop the last reference to the network namespace and all
of the network device removals will batch.
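
The two teardown styles can be sketched from userspace as follows. The helper name gen_veth_del_batch and the namespace name c1 are made up for illustration. The sketch only shows how the deletes are issued; whether the kernel batches them is exactly the open question above.

```shell
#!/bin/sh
# Hypothetical helper: print one "ip -batch" delete line per veth pair.
# Deleting one end of a veth pair also removes its peer, so only the
# a$i side is listed.
gen_veth_del_batch() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        echo "link del a$i"
        i=$(( i + 1 ))
    done
}

# Individual deletes fed through a single ip process (needs root):
#   gen_veth_del_batch 5000 | ip -batch -
#
# Container-style teardown (needs root), ASSUMING nothing else holds
# the namespace open: removing the namespace drops the last reference
# and the devices inside it go away together.
#   ip netns add c1
#   ip netns exec c1 ip link add a1 type veth peer name b1
#   ip netns del c1
```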

Ultimately, shrug. Apart from the previous O(N^2) userspace behavior,
there don't seem to be any practical performance problems with this many
network devices. What is interesting is that this many network devices
is becoming relevant on inexpensive COTS servers, for cases that are
not purely network focused.

Eric
Thread overview: 32+ messages
2013-03-22 22:23 [RFC][PATCH] iproute: Faster ip link add, set and delete Eric W. Biederman
2013-03-22 22:27 ` Stephen Hemminger
2013-03-26 11:51 ` Benoit Lourdelet
2013-03-26 12:40 ` Eric W. Biederman
2013-03-26 14:17 ` Serge Hallyn
2013-03-26 14:33 ` Serge Hallyn
2013-03-27 13:37 ` Benoit Lourdelet
2013-03-27 15:11 ` Eric W. Biederman
2013-03-27 17:47 ` Stephen Hemminger
2013-03-28 0:46 ` Eric W. Biederman
2013-03-28 3:20 ` Serge Hallyn
2013-03-28 3:44 ` Eric W. Biederman
2013-03-28 4:28 ` Serge Hallyn
2013-03-28 5:00 ` Eric W. Biederman
2013-03-28 13:36 ` Serge Hallyn
2013-03-28 13:42 ` Benoit Lourdelet
2013-03-28 15:04 ` Serge Hallyn
2013-03-28 15:21 ` Benoit Lourdelet
2013-03-28 22:20 ` Stephen Hemminger
2013-03-28 23:52 ` Eric W. Biederman
2013-03-29 0:13 ` Eric Dumazet
2013-03-29 0:25 ` Eric W. Biederman
2013-03-29 0:43 ` Eric Dumazet
2013-03-29 1:06 ` Eric W. Biederman [this message]
2013-03-29 1:10 ` Eric Dumazet
2013-03-29 1:29 ` Eric W. Biederman
2013-03-29 1:38 ` Eric Dumazet
2013-03-30 10:09 ` Benoit Lourdelet
2013-03-30 14:44 ` Eric Dumazet
2013-03-30 16:07 ` Benoit Lourdelet
2013-03-28 20:27 ` Benoit Lourdelet
2013-03-26 15:31 ` Eric Dumazet