From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Simon Kirby <sim@hostway.ca>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: net_ns cleanup / RCU overhead
Date: Fri, 29 Aug 2014 19:52:25 -0700
Message-ID: <20140830025225.GC5001@linux.vnet.ibm.com>
In-Reply-To: <20140829004029.GA18300@hostway.ca>

On Thu, Aug 28, 2014 at 05:40:29PM -0700, Simon Kirby wrote:
> On Thu, Aug 28, 2014 at 01:46:58PM -0700, Paul E. McKenney wrote:
>
> > On Thu, Aug 28, 2014 at 03:33:42PM -0500, Eric W. Biederman wrote:
> >
> > > I just want to add a little bit more analysis to this.
> > >
> > > What we desire to be fast is the copy_net_ns, cleanup_net is batched and
> > > asynchronous which nothing really cares how long it takes except that
> > > cleanup_net holds the net_mutex and thus blocks copy_net_ns.
> > >
> > > The puzzle is why and which rcu delays Simon is seeing in the network
> > > namespace cleanup path, as it seems like the synchronize_rcu is not
> > > the only one, and in the case of vsftp with trivial network namespaces
> > > where nothing has been done we should not need to delay.
> >
> > Indeed, given the version and .config, I can't see why any individual
> > RCU grace-period operation would be particularly slow.
> >
> > I suggest using ftrace on synchronize_rcu() and friends.
>
> I made a parallel net namespace create/destroy benchmark that prints the
> progress and time to create and cleanup 32 unshare()d child processes:
>
> http://0x.ca/sim/ref/tools/netnsbench.c
>
> I noticed that if I haven't run it for a while, the first batch often is
> fast, followed by slowness from then on:
>
> ++++++++++++++++++++++++++++++++-------------------------------- 0.039478s
> ++++++++++++++++++++-----+----------------+++++++++---------++-- 4.463837s
> +++++++++++++++++++++++++------+--------------------++++++------ 3.011882s
> +++++++++++++++---+-------------++++++++++++++++---------------- 2.283993s
>
> Fiddling around on a stock kernel, "echo 1 > /sys/kernel/rcu_expedited"
> makes behaviour change as it did with my patch:
>
> ++-++-+++-+-----+-+-++-+-++--++-+--+-+-++--++-+-+-+-++-+--++---- 0.801406s
> +-+-+-++-+-+-+-+-++--+-+-++-+--++-+-+-+-+-+-+-+-+-+-+-+--++-+--- 0.872011s
> ++--+-++--+-++--+-++--+-+-+-+-++-+--++--+-++-+-+-+-+--++-+-+-+-- 0.946745s
>
> How would I use ftrace on synchronize_rcu() here?

http://lwn.net/Articles/370423/ is your friend here.  If your kernel
is built with the needed configuration, you give the command

	echo synchronize_rcu > set_ftrace_filter

http://lwn.net/Articles/365835/ and http://lwn.net/Articles/366796/
have background info.
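
For what it is worth, the whole sequence might look roughly like the
sketch below.  It assumes tracefs is reachable at something like
/sys/kernel/debug/tracing (check your system) and that the kernel was
built with CONFIG_FUNCTION_TRACER and CONFIG_DYNAMIC_FTRACE.  The
helper names trace_function() and read_trace() are made up for
illustration and are not part of any existing tool.

```python
import os


def trace_function(tracing_dir, func="synchronize_rcu", tracer="function"):
    """Restrict ftrace to one function and start tracing.

    tracing_dir is the ftrace control directory (an assumption: commonly
    /sys/kernel/debug/tracing).  Writing there requires root.
    """
    def write(name, value):
        with open(os.path.join(tracing_dir, name), "w") as f:
            f.write(value + "\n")

    write("tracing_on", "0")          # pause tracing while reconfiguring
    write("set_ftrace_filter", func)  # echo synchronize_rcu > set_ftrace_filter
    write("current_tracer", tracer)   # "function" records each call
    write("tracing_on", "1")          # start collecting events


def read_trace(tracing_dir):
    """Return the current contents of the trace buffer as text."""
    with open(os.path.join(tracing_dir, "trace")) as f:
        return f.read()
```

Calling trace_function() with the tracing directory, running the
benchmark, and then printing read_trace() should show which tasks
invoke synchronize_rcu() and when.  Passing a different func (for
example synchronize_sched, which appears in the stacks quoted in this
message) would trace that function instead.
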
> As Eric said, cleanup_net() is batched, but while it is cleaning up,
> net_mutex is held. Isn't the issue just that net_mutex is held while
> some other things are going on that are meant to be lazy / batched?
>
> What is net_mutex protecting in cleanup_net()?
>
> I noticed that [kworker/u16:0]'s stack is often:
>
> [<ffffffff810942a6>] wait_rcu_gp+0x46/0x50
> [<ffffffff8109607e>] synchronize_sched+0x2e/0x50
> [<ffffffffa00385ac>] nf_nat_net_exit+0x2c/0x50 [nf_nat]
> [<ffffffff81720339>] ops_exit_list.isra.4+0x39/0x60
> [<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
> [<ffffffff81062997>] process_one_work+0x157/0x440
> [<ffffffff81063303>] worker_thread+0x63/0x520
> [<ffffffff81068b96>] kthread+0xd6/0xf0
> [<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> and
>
> [<ffffffff81095364>] _rcu_barrier+0x154/0x1f0
> [<ffffffff81095450>] rcu_barrier+0x10/0x20
> [<ffffffff81102c2c>] kmem_cache_destroy+0x6c/0xb0
> [<ffffffffa0089e97>] nf_conntrack_cleanup_net_list+0x167/0x1c0 [nf_conntrack]
> [<ffffffffa008aab5>] nf_conntrack_pernet_exit+0x65/0x70 [nf_conntrack]
> [<ffffffff81720353>] ops_exit_list.isra.4+0x53/0x60
> [<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
> [<ffffffff81062997>] process_one_work+0x157/0x440
> [<ffffffff81063303>] worker_thread+0x63/0x520
> [<ffffffff81068b96>] kthread+0xd6/0xf0
> [<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> So I tried flushing iptables rules and rmmoding netfilter bits:
>
> ++++++++++++++++++++-+--------------------+++++++++++----------- 0.179940s
> ++++++++++++++--+-------------+++++++++++++++++----------------- 0.151988s
> ++++++++++++++++++++++++++++---+--------------------------+++--- 0.159967s
> ++++++++++++++++++++++----------------------++++++++++---------- 0.175964s
>
> Expedited:
>
> ++-+--++-+-+-+-+-+-+--++-+-+-++-+-+-+--++-+-+-+-+-+-+-+-+-+-+--- 0.079988s
> ++-+-+-+-+-+-+-+-+-+-+-+--++-+--++-+--+-++-+-+--++-+-+-+-+-+-+-- 0.089347s
> ++++--+++--++--+-+++++++-+++++--------------++-+-+--++-+-+--++-- 0.081566s
> +++++-+++-------++-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+--- 0.089026s
>
> So, much faster. It seems that just loading nf_conntrack_ipv4 (like by
> running iptables -t nat -nvL) is enough to slow it way down. But it is
> still capable of being fast, as above.

My first guess is that this code sequence is calling synchronize_rcu()
quite often.  Would it be possible to consolidate these?
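
To put rough numbers on why consolidation could matter, here is a
back-of-the-envelope model, not a measurement.  It assumes each exit
handler blocks for one full grace period and that grace periods do not
overlap.  The function names and the 0.02-second grace-period figure
are hypothetical, and the model makes no claim about which handlers in
the real cleanup path are per-namespace and which are already batched.

```python
def grace_periods(namespaces, waits_per_exit, batched):
    """Count the grace-period waits needed to tear down the namespaces.

    waits_per_exit: number of exit handlers that each wait for a grace
    period (hypothetical parameter).
    """
    if batched:
        # One wait per handler covers every namespace in the batch.
        return waits_per_exit
    # Otherwise every namespace pays for every handler's wait.
    return namespaces * waits_per_exit


def cleanup_time(namespaces, waits_per_exit, gp_seconds, batched):
    """Estimated time spent waiting, assuming waits are serialized."""
    return grace_periods(namespaces, waits_per_exit, batched) * gp_seconds


# 32 namespaces, 2 waiting handlers, 0.02 s per grace period (assumed).
unbatched = cleanup_time(32, 2, 0.02, batched=False)
batched = cleanup_time(32, 2, 0.02, batched=True)
print("unbatched: %.2fs  batched: %.2fs" % (unbatched, batched))
```

Under these assumptions the unbatched case waits for 64 grace periods
and the batched case for 2, which is the sort of gap that would matter
while net_mutex is held.
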
Thanx, Paul