Re: net_ns cleanup / RCU overhead

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Simon Kirby <sim@hostway.ca>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: net_ns cleanup / RCU overhead
Date: Thu, 28 Aug 2014 17:40:29 -0700	[thread overview]
Message-ID: <20140829004029.GA18300@hostway.ca> (raw)
In-Reply-To: <20140828204658.GL5001@linux.vnet.ibm.com>

On Thu, Aug 28, 2014 at 01:46:58PM -0700, Paul E. McKenney wrote:

> On Thu, Aug 28, 2014 at 03:33:42PM -0500, Eric W. Biederman wrote:
> 
> > I just want to add a little bit more analysis to this.
> > 
> > What we desire to be fast is the copy_net_ns, cleanup_net is batched and
> > asynchronous which nothing really cares how long it takes except that
> > cleanup_net holds the net_mutex and thus blocks copy_net_ns.
> > 
> > The puzzle is why and which rcu delays Simon is seeing in the network
> > namespace cleanup path, as it seems like the synchronize_rcu is not
> > the only one, and in the case of vsftp with trivail network namespaces
> > where nothing has been done we should not need to delay.
> 
> Indeed, given the version and .config, I can't see why any individual
> RCU grace-period operation would be particularly slow.
> 
> I suggest using ftrace on synchronize_rcu() and friends.

I made a parallel net namespace create/destroy benchmark that prints the
progress and time to create and cleanup 32 unshare()d child processes:

http://0x.ca/sim/ref/tools/netnsbench.c

I noticed that if I haven't run it for a while, the first batch often is
fast, followed by slowness from then on:

++++++++++++++++++++++++++++++++-------------------------------- 0.039478s
++++++++++++++++++++-----+----------------+++++++++---------++-- 4.463837s
+++++++++++++++++++++++++------+--------------------++++++------ 3.011882s
+++++++++++++++---+-------------++++++++++++++++---------------- 2.283993s

Fiddling around on a stock kernel, "echo 1 > /sys/kernel/rcu_expedited"
makes behaviour change as it did with my patch:

++-++-+++-+-----+-+-++-+-++--++-+--+-+-++--++-+-+-+-++-+--++---- 0.801406s
+-+-+-++-+-+-+-+-++--+-+-++-+--++-+-+-+-+-+-+-+-+-+-+-+--++-+--- 0.872011s
++--+-++--+-++--+-++--+-+-+-+-++-+--++--+-++-+-+-+-+--++-+-+-+-- 0.946745s

How would I use ftrace on synchronize_rcu() here?

As Eric said, cleanup_net() is batched, but while it is cleaning up,
net_mutex is held. Isn't the issue just that net_mutex is held while
some other things are going on that are meant to be lazy / batched?

What is net_mutex protecting in cleanup_net()?

I noticed that [kworker/u16:0]'s stack is often:

[<ffffffff810942a6>] wait_rcu_gp+0x46/0x50
[<ffffffff8109607e>] synchronize_sched+0x2e/0x50
[<ffffffffa00385ac>] nf_nat_net_exit+0x2c/0x50 [nf_nat]
[<ffffffff81720339>] ops_exit_list.isra.4+0x39/0x60
[<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
[<ffffffff81062997>] process_one_work+0x157/0x440
[<ffffffff81063303>] worker_thread+0x63/0x520
[<ffffffff81068b96>] kthread+0xd6/0xf0
[<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

and

[<ffffffff81095364>] _rcu_barrier+0x154/0x1f0
[<ffffffff81095450>] rcu_barrier+0x10/0x20
[<ffffffff81102c2c>] kmem_cache_destroy+0x6c/0xb0
[<ffffffffa0089e97>] nf_conntrack_cleanup_net_list+0x167/0x1c0 [nf_conntrack]
[<ffffffffa008aab5>] nf_conntrack_pernet_exit+0x65/0x70 [nf_conntrack]
[<ffffffff81720353>] ops_exit_list.isra.4+0x53/0x60
[<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
[<ffffffff81062997>] process_one_work+0x157/0x440
[<ffffffff81063303>] worker_thread+0x63/0x520
[<ffffffff81068b96>] kthread+0xd6/0xf0
[<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

So I tried flushing iptables rules and rmmoding netfilter bits:

++++++++++++++++++++-+--------------------+++++++++++----------- 0.179940s
++++++++++++++--+-------------+++++++++++++++++----------------- 0.151988s
++++++++++++++++++++++++++++---+--------------------------+++--- 0.159967s
++++++++++++++++++++++----------------------++++++++++---------- 0.175964s

Expedited:

++-+--++-+-+-+-+-+-+--++-+-+-++-+-+-+--++-+-+-+-+-+-+-+-+-+-+--- 0.079988s
++-+-+-+-+-+-+-+-+-+-+-+--++-+--++-+--+-++-+-+--++-+-+-+-+-+-+-- 0.089347s
++++--+++--++--+-+++++++-+++++--------------++-+-+--++-+-+--++-- 0.081566s
+++++-+++-------++-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+--- 0.089026s

So, much faster. It seems that just loading nf_conntrack_ipv4 (like by
running iptables -t nat -nvL) is enough to slow it way down. But it is
still capable of being fast, as above.

Simon-

next prev parent reply	other threads:[~2014-08-29  0:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-20  5:58 net_ns cleanup / RCU overhead Simon Kirby
2014-08-28 19:24 ` Paul E. McKenney
2014-08-28 19:44   ` Simon Kirby
2014-08-28 20:33     ` Eric W. Biederman
2014-08-28 20:46       ` Paul E. McKenney
2014-08-29  0:40         ` Simon Kirby [this message]
2014-08-29  3:57           ` Julian Anastasov
2014-08-29 21:57             ` Eric W. Biederman
2014-08-29 23:52               ` Florian Westphal
2014-08-30  2:56                 ` Paul E. McKenney
2014-08-30  8:20               ` Julian Anastasov
2014-08-30  2:52           ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140829004029.GA18300@hostway.ca \
    --to=sim@hostway.ca \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.