All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Justin Cormack <justin.cormack@docker.com>,
	Ian Campbell <ian.campbell@docker.com>
Subject: Re: Long delays creating a netns after deleting one (possibly RCU related)
Date: Thu, 10 Nov 2016 13:24:04 -0800	[thread overview]
Message-ID: <20161110212404.GB4127@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAM_iQpUNRyv3wHi3cJpcH6avo+D7Gxwqp+iybUxYc7PCo3WUPw@mail.gmail.com>

On Thu, Nov 10, 2016 at 09:37:47AM -0800, Cong Wang wrote:
> (Cc'ing Paul)
> 
> On Wed, Nov 9, 2016 at 7:42 AM, Rolf Neugebauer
> <rolf.neugebauer@docker.com> wrote:
> > Hi
> >
> > We noticed some long delays starting docker containers on some newer
> > kernels (starting with 4.5.x and still present in 4.9-rc4, 4.4.x is
> > fine). We narrowed this down to the creation of a network namespace
> > being delayed directly after removing another one (details and
> > reproduction below). We have seen delays of up to 60s on some systems.
> >
> > - The delay is proportional to the number of CPUs (online or offline).
> > We first discovered it with a Hyper-V Linux VM. Hyper-V advertises up
> > to 240 offline vCPUs even if one configures the VM with only, say 2
> > vCPUs. We see linear increase in delay when we change NR_CPUS in the
> > kernel config.
> >
> > - The delay is also dependent on some tunnel network interfaces being
> > present (which we had compiled in in one of our kernel configs).
> >
> > - We can reproduce this issue with stock kernels from
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/running in Hyper-V VMs
> > as well as other hypervisors like qemu and hyperkit where we have good
> > control over the number of CPUs.
> >
> > A simple test is:
> > modprobe ipip
> > moprobe  ip_gre
> > modprobe ip_vti
> > echo -n "add netns foo ===> "; /usr/bin/time -f "%E" ip netns add foo
> > echo -n "del netns foo ===> "; /usr/bin/time -f "%E" ip netns delete foo
> > echo -n "add netns bar ===> "; /usr/bin/time -f "%E" ip netns add bar
> > echo -n "del netns bar ===> "; /usr/bin/time -f "%E" ip netns delete bar
> >
> > with an output like:
> > add netns foo ===> 0:00.00
> > del netns foo ===> 0:00.01
> > add netns bar ===> 0:08.53
> > del netns bar ===> 0:00.01
> >
> > This is on a 4.9-rc4 kernel from the above URL configured with
> > NR_CPUS=256 running in a Hyper-V VM (kernel config attached).
> >
> > Below is a dump of the work queues while the second 'ip add netns' is
> > hanging. The state of the work queues does not seem to change while
> > the command is delayed and the pattern shown is consistent across
> > different kernel versions.
> >
> > Is this a known issue and/or is someone working on a fix?
> 
> Not to me.
> 
> 
> >
> > [  610.356272] sysrq: SysRq : Show Blocked State
> > [  610.356742]   task                        PC stack   pid father
> > [  610.357252] kworker/u480:1  D    0  1994      2 0x00000000
> > [  610.357752] Workqueue: netns cleanup_net
> > [  610.358239]  ffff9892f1065800 0000000000000000 ffff9892ee1e1e00
> > ffff9892f8e59340
> > [  610.358705]  ffff9892f4526900 ffffbf0104b5ba88 ffffffffbe486df3
> > ffffbf0104b5ba60
> > [  610.359168]  00ffffffbdcbe663 ffff9892f8e59340 0000000100012e70
> > ffff9892ee1e1e00
> > [  610.359677] Call Trace:
> > [  610.360169]  [<ffffffffbe486df3>] ? __schedule+0x233/0x6e0
> > [  610.360723]  [<ffffffffbe4872d6>] schedule+0x36/0x80
> > [  610.361194]  [<ffffffffbe48a9ca>] schedule_timeout+0x22a/0x3f0
> > [  610.361789]  [<ffffffffbe486dfb>] ? __schedule+0x23b/0x6e0
> > [  610.362260]  [<ffffffffbe487d24>] wait_for_completion+0xb4/0x140
> > [  610.362736]  [<ffffffffbdcb05a0>] ? wake_up_q+0x80/0x80
> > [  610.363306]  [<ffffffffbdceb528>] __wait_rcu_gp+0xc8/0xf0
> > [  610.363782]  [<ffffffffbdceea5c>] synchronize_sched+0x5c/0x80
> > [  610.364137]  [<ffffffffbdcf0010>] ? call_rcu_bh+0x20/0x20
> > [  610.364742]  [<ffffffffbdceb440>] ?
> > trace_raw_output_rcu_utilization+0x60/0x60
> > [  610.365337]  [<ffffffffbe3696bc>] synchronize_net+0x1c/0x30
> 
> This is a worker which holds the net_mutex and is waiting for
> a RCU grace period to elapse.
> 
> 
> > [  610.365846]  [<ffffffffbe369803>] netif_napi_del+0x23/0x80
> > [  610.367494]  [<ffffffffc057f6f8>] ip_tunnel_dev_free+0x68/0xf0 [ip_tunnel]
> > [  610.368007]  [<ffffffffbe372c10>] netdev_run_todo+0x230/0x330
> > [  610.368454]  [<ffffffffbe37eb4e>] rtnl_unlock+0xe/0x10
> > [  610.369001]  [<ffffffffc057f4df>] ip_tunnel_delete_net+0xdf/0x120 [ip_tunnel]
> > [  610.369500]  [<ffffffffc058b92c>] ipip_exit_net+0x2c/0x30 [ipip]
> > [  610.369997]  [<ffffffffbe362688>] ops_exit_list.isra.4+0x38/0x60
> > [  610.370636]  [<ffffffffbe363674>] cleanup_net+0x1c4/0x2b0
> > [  610.371130]  [<ffffffffbdc9e4ac>] process_one_work+0x1fc/0x4b0
> > [  610.371812]  [<ffffffffbdc9e7ab>] worker_thread+0x4b/0x500
> > [  610.373074]  [<ffffffffbdc9e760>] ? process_one_work+0x4b0/0x4b0
> > [  610.373622]  [<ffffffffbdc9e760>] ? process_one_work+0x4b0/0x4b0
> > [  610.374100]  [<ffffffffbdca4b09>] kthread+0xd9/0xf0
> > [  610.374574]  [<ffffffffbdca4a30>] ? kthread_park+0x60/0x60
> > [  610.375198]  [<ffffffffbe48c2b5>] ret_from_fork+0x25/0x30
> > [  610.375678] ip              D    0  2149   2148 0x00000000
> > [  610.376185]  ffff9892f0a99000 0000000000000000 ffff9892f0a66900
> > [  610.376185]  ffff9892f8e59340
> > [  610.376185]  ffff9892f4526900 ffffbf0101173db8 ffffffffbe486df3
> > [  610.376753]  00000005fecffd76
> > [  610.376762]  00ff9892f11d9820 ffff9892f8e59340 ffff989200000000
> > ffff9892f0a66900
> > [  610.377274] Call Trace:
> > [  610.377789]  [<ffffffffbe486df3>] ? __schedule+0x233/0x6e0
> > [  610.378306]  [<ffffffffbe4872d6>] schedule+0x36/0x80
> > [  610.378992]  [<ffffffffbe48756e>] schedule_preempt_disabled+0xe/0x10
> > [  610.379514]  [<ffffffffbe489199>] __mutex_lock_slowpath+0xb9/0x130
> > [  610.380031]  [<ffffffffbde0fce2>] ? __kmalloc+0x162/0x1e0
> > [  610.380556]  [<ffffffffbe48922f>] mutex_lock+0x1f/0x30
> > [  610.381135]  [<ffffffffbe3637ff>] copy_net_ns+0x9f/0x170
> > [  610.381647]  [<ffffffffbdca5e6b>] create_new_namespaces+0x11b/0x200
> > [  610.382249]  [<ffffffffbdca60fa>] unshare_nsproxy_namespaces+0x5a/0xb0
> > [  610.382818]  [<ffffffffbdc82dcd>] SyS_unshare+0x1cd/0x360
> > [  610.383319]  [<ffffffffbe48c03b>] entry_SYSCALL_64_fastpath+0x1e/0xad
> 
> This process is apparently waiting for the net_mutex held by the previous one.
> 
> Either RCU implementation is broken or something else is missing.
> Do you have more stack traces of related processes? For example,
> rcu_tasks_kthread. And if anything you can help to narrow down the problem,
> it would be great.

Did you set the rcu_normal boot parameter?  Doing so would have this effect.

(It is intended for real-time users who don't like expedited grace periods.)

							Thanx, Paul

  reply	other threads:[~2016-11-10 21:24 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-09 15:42 Long delays creating a netns after deleting one (possibly RCU related) Rolf Neugebauer
2016-11-10 17:37 ` Cong Wang
2016-11-10 21:24   ` Paul E. McKenney [this message]
2016-11-11 13:11     ` Rolf Neugebauer
2016-11-12  0:23       ` Paul E. McKenney
2016-11-12  0:55         ` Cong Wang
2016-11-14  6:47           ` Cong Wang
2016-11-14 16:24             ` Paul E. McKenney
2016-11-14 17:44               ` Cong Wang
2016-11-14 18:14                 ` Paul E. McKenney
2016-11-14 22:12                   ` Eric W. Biederman
2016-11-14 22:46                     ` Eric Dumazet
2016-11-14 23:09                       ` Eric Dumazet
2016-11-15  0:28                         ` [PATCH net] gro_cells: mark napi struct as not busy poll candidates Eric Dumazet
2016-11-15  5:21                           ` Cong Wang
2016-11-15 17:08                           ` Rolf Neugebauer
2016-11-16  3:29                           ` David Miller
2016-11-18  0:31                         ` Long delays creating a netns after deleting one (possibly RCU related) Jarno Rajahalme
2016-11-19  0:38                         ` Jarno Rajahalme
2016-11-19  0:41                           ` Eric Dumazet
2016-11-14 17:29           ` Hannes Frederic Sowa
2016-11-11 14:00   ` Rolf Neugebauer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161110212404.GB4127@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=ian.campbell@docker.com \
    --cc=justin.cormack@docker.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rolf.neugebauer@docker.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.