From: Eric Dumazet <eric.dumazet@gmail.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: netdev@vger.kernel.org
Subject: Re: Scalability of interface creation and deletion
Date: Sat, 07 May 2011 20:32:54 +0200
Message-ID: <1304793174.3207.22.camel@edumazet-laptop>
In-Reply-To: <178E8895FB84C07251538EF7@Ximines.local>
On Saturday 07 May 2011 at 19:24 +0100, Alex Bligh wrote:
> Eric,
>
> --On 7 May 2011 18:26:29 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > Here, on a 2.6.38 kernel (provided by Ubuntu 11.04, on my 2-core laptop):
> > # time rmmod dummy
> > real 0m0.111s
> ...
> > On another machine with a very recent kernel:
> > $ modprobe dummy numdummies=1
> > $ ifconfig dummy0 192.168.46.46 up
> > $ time rmmod dummy
> >
> > real 0m0.032s
>
> I know they're different machines, but that's a pretty significant
> difference. So I compiled from 2.6.39-rc6 head (i.e. a kernel
> less than an hour old), with only your suggested change in,
> so that (a) I could eliminate old kernels, and (b) I could
> instrument it.
>
> > synchronize_rcu() calls are not consuming cpu, they just _wait_
> > rcu grace period.
> >
> > I suggest you read Documentation/RCU files if you really want to :)
>
> I understand the basic point: it needs to wait for all readers
> to drop their references. It's sort of hard to understand why
> on a machine with an idle network there would be reader(s) holding
> references for 250ms. And indeed the analysis below shows that
> isn't the case (it's more like 44 ms).
>
> > If you want to check how expensive it is, it's quite easy:
> > add a trace in synchronize_net()
>
> At least for veth devices, I see the same on 2.6.39-rc6 - if anything
> it's worse:
>
> # ./ifseq -n 100
> Sat May 7 17:50:53 UTC 2011 creating 100 interfaces
> Sat May 7 17:50:54 UTC 2011 done
>
> real 0m1.549s
> user 0m0.060s
> sys 0m0.990s
> Sat May 7 17:50:54 UTC 2011 deleting 100 interfaces
> Sat May 7 17:51:22 UTC 2011 done
>
> real 0m27.917s
> user 0m0.420s
> sys 0m0.060s
>
> Performing that operation produced exactly 200 calls to synchronize_net().
> The timestamps indicate that's 2 per veth pair deletion, and zero
> per veth pair creation.
>
> Analysing the resultant logs shows only 31% of the problem is
> time spent within synchronize_net() (perl script below).
>
> $ ./analyse.pl < syncnet | tail -2
> Total 18.98515 Usage 199 Average 0.09540 elsewhere
> Total 8.77581 Usage 200 Average 0.04388 synchronizing
>
> So *something* is spending more than twice as much time as
> synchronize_net().
>
> I've attached the log below as well.
>
> --
> Alex Bligh
>
>
> $ cat analyse.pl
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $lastuptime;
> my $uptime;
> my $diff;
> my $area;
> my %time;
> my %usage;
>
> while (<>)
> {
>     chomp;
>     if (m/\[\s*([0-9.]+)\].*synchronize_net/)
>     {
>         $uptime = $1;
>         if (defined($lastuptime))
>         {
>             $area = (m/end/)?"synchronizing":"elsewhere";
>             $diff = $uptime - $lastuptime;
>             printf "%5.5f $area\n", $diff;
>             $time{$area}+=$diff;
>             $usage{$area}++;
>         }
>         $lastuptime = $uptime;
>     }
> }
>
> print "\n";
>
> my $k;
> foreach $k (sort keys %time)
> {
>     printf "Total %5.5f Usage %d Average %5.5f %s\n", $time{$k},
>         $usage{$k}, $time{$k}/$usage{$k}, $k;
> }
>
>
>
> May 7 17:50:55 nattytest kernel: [ 127.490142] begin synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.560084] end synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.610350] begin synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.610932] end synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.740078] begin synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.820071] end synchronize_net()
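The two totals in the analysis add up to about 27.76 s (18.98515 + 8.77581). That is close to the 27.9 s wall-clock time, and 8.77581 / 27.76 is about 31.6%, which is where the 31% figure comes from.

For readers who do not use Perl, the accounting in analyse.pl can be sketched in Python as shown below. It applies the same regular expression and the same rule: an interval that ends on an "end" line counts as "synchronizing". Here it is run over the six log lines quoted above. This is an illustrative port, not Alex's script, and the names `account` and `LOG` are hypothetical.

```python
import re
from collections import defaultdict

# Same pattern as analyse.pl: a "[ uptime]" stamp followed later by synchronize_net.
STAMP = re.compile(r"\[\s*([0-9.]+)\].*synchronize_net")


def account(lines):
    """Split the time between consecutive synchronize_net() trace lines.

    An interval ending on an "end" line was spent inside synchronize_net()
    ("synchronizing"); any other interval counts as "elsewhere".
    Returns {area: (total_seconds, interval_count)}.
    """
    total = defaultdict(float)
    usage = defaultdict(int)
    last = None
    for line in lines:
        m = STAMP.search(line)
        if not m:
            continue  # not a synchronize_net() trace line
        uptime = float(m.group(1))
        if last is not None:
            area = "synchronizing" if "end" in line else "elsewhere"
            total[area] += uptime - last
            usage[area] += 1
        last = uptime
    return {area: (total[area], usage[area]) for area in total}


# The six log lines quoted in the message.
LOG = [
    "May 7 17:50:55 nattytest kernel: [ 127.490142] begin synchronize_net()",
    "May 7 17:50:55 nattytest kernel: [ 127.560084] end synchronize_net()",
    "May 7 17:50:55 nattytest kernel: [ 127.610350] begin synchronize_net()",
    "May 7 17:50:55 nattytest kernel: [ 127.610932] end synchronize_net()",
    "May 7 17:50:55 nattytest kernel: [ 127.740078] begin synchronize_net()",
    "May 7 17:50:55 nattytest kernel: [ 127.820071] end synchronize_net()",
]

if __name__ == "__main__":
    for area, (secs, n) in sorted(account(LOG).items()):
        print(f"Total {secs:.5f} Usage {n} Average {secs / n:.5f} {area}")
```

As in the Perl version, the first trace line only sets the starting timestamp. That is why the full run reports 199 "elsewhere" intervals against 200 "synchronizing" ones.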
Well, there is also one rcu_barrier() call that is expensive.
(It was changed from a synchronize_rcu() to an rcu_barrier() fairly recently,
in commit ef885afb, in the 2.6.36 kernel.)
net/core/dev.c line 5167
http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ef885afbf8a37689afc1d9d545e2f3e7a8276c17
netdev_wait_allrefs() waits for all references to a device to vanish.
It currently uses a _very_ pessimistic 250 ms delay between probes.
Some users reported that no more than 4 devices can be dismantled per
second; this is a pretty serious problem for some setups.
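The "4 devices per second" figure follows from the probe interval. If a reference is still held at the first probe, the wait costs at least one full 250 ms sleep, and 1000 ms / 250 ms = 4.

Below is a toy Python model of such a poll-and-sleep loop. It assumes the reference is dropped at a fixed time `release_ms` after the wait starts and ignores every other cost. It is not the kernel's netdev_wait_allrefs() code, and `wait_allrefs_ms` and `max_dismantles_per_sec` are hypothetical names.

```python
import math

PROBE_INTERVAL_MS = 250  # delay between refcount probes, per the commit message


def wait_allrefs_ms(release_ms, interval_ms=PROBE_INTERVAL_MS):
    """Toy poll-and-sleep wait for a refcount to reach zero.

    The refcount is assumed to drop at time release_ms. The loop probes at
    t = 0, interval, 2*interval, ... and returns the time of the first probe
    that sees zero.
    """
    t = 0
    while t < release_ms:  # reference still held at this probe
        t += interval_ms   # sleep until the next probe
    return t


def max_dismantles_per_sec(release_ms, interval_ms=PROBE_INTERVAL_MS):
    """Upper bound on devices dismantled per second if each one pays this wait."""
    wait = wait_allrefs_ms(release_ms, interval_ms)
    return math.inf if wait == 0 else 1000 / wait


if __name__ == "__main__":
    # A reference released just 1 ms late still costs a whole 250 ms sleep.
    print(wait_allrefs_ms(1), max_dismantles_per_sec(1))
```

In this model, a release that lands even slightly after the first probe costs a whole interval. That is why a pessimistic interval caps throughput regardless of how quickly the reference actually goes away.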
Most of the time, a refcount is about to be released by an RCU callback
that is still in flight because rollback_registered_many() uses a
synchronize_rcu() call instead of rcu_barrier(). The problem is visible
when the number of online CPUs is one, because synchronize_rcu() is then a no-op.
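The key difference is this: rcu_barrier() returns only after every callback queued so far has run, while synchronize_rcu() waits for a grace period but does not promise that. The single-threaded Python model below illustrates it. It is not kernel code: `ToyRcu`, `ToyDevice` and `refcnt_at_first_probe` are hypothetical names, and the methods merely borrow the kernel's function names. Following the commit message, it assumes that synchronize_rcu() returns immediately on one CPU. In a real kernel the queued callback would still run shortly afterwards, just not necessarily before the first probe.

```python
from collections import deque


class ToyRcu:
    """Single-CPU toy model of deferred RCU callbacks."""

    def __init__(self):
        self.pending = deque()  # callbacks queued by call_rcu(), oldest first

    def call_rcu(self, callback):
        self.pending.append(callback)

    def synchronize_rcu(self):
        # Modeled as a no-op on one CPU: the grace period is trivially over,
        # and nothing here runs the callbacks that are already queued.
        pass

    def rcu_barrier(self):
        # Return only after every previously queued callback has been invoked.
        while self.pending:
            self.pending.popleft()()


class ToyDevice:
    def __init__(self):
        self.refcnt = 1  # one reference, to be dropped by an RCU callback

    def put_ref(self):
        self.refcnt -= 1


def refcnt_at_first_probe(use_barrier):
    """Refcount the wait loop would see after rollback's RCU wait."""
    rcu, dev = ToyRcu(), ToyDevice()
    rcu.call_rcu(dev.put_ref)  # reference release still "in flight"
    if use_barrier:
        rcu.rcu_barrier()
    else:
        rcu.synchronize_rcu()
    return dev.refcnt


if __name__ == "__main__":
    print("after synchronize_rcu:", refcnt_at_first_probe(False))
    print("after rcu_barrier:    ", refcnt_at_first_probe(True))
```

A nonzero refcount at the first probe is what sends the wait loop into its 250 ms sleep. With the barrier, the model's first probe already sees zero.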
Time to remove 50 ipip tunnels on a UP machine:
before patch: real 11.910s
after patch:  real 1.250s
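Per tunnel, the figures work out as follows. The "before" cost of roughly 238 ms per tunnel is close to one 250 ms probe interval. That is consistent with, though not proof of, each removal paying about one probe delay.

```latex
\frac{11.910\ \text{s}}{50} \approx 238\ \text{ms per tunnel}, \qquad
\frac{1.250\ \text{s}}{50} = 25\ \text{ms per tunnel}, \qquad
\frac{11.910}{1.250} \approx 9.5
```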
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Octavian Purdila <opurdila@ixiacom.com>
Reported-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>