From: Li Zefan <lizefan@huawei.com>
To: Tejun Heo <tj@kernel.org>
Cc: paul@paulmenage.org, glommer@parallels.com,
containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
peterz@infradead.org, mhocko@suse.cz, bsingharora@gmail.com,
hannes@cmpxchg.org, kamezawa.hiroyu@jp.fujitsu.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET cgroup/for-3.8] cpuset: decouple cpuset locking from cgroup core
Date: Wed, 26 Dec 2012 18:51:02 +0800
Message-ID: <50DAD696.8050400@huawei.com>
In-Reply-To: <1354138460-19286-1-git-send-email-tj@kernel.org>
On 2012/11/29 5:34, Tejun Heo wrote:
> Hello, guys.
>
> Depending on cgroup core locking - cgroup_mutex - is messy and makes
> cgroup prone to locking dependency problems. The current code already
> has a lock dependency loop - memcg nests get_online_cpus() inside
> cgroup_mutex, while cpuset does it the other way around.
>
> Regardless of the locking details, whatever is protecting cgroup has
> inherently to be something outer to most other locking constructs.
> cgroup calls into a lot of major subsystems which in turn have to
> perform subsystem-specific locking. Trying to nest cgroup
> synchronization inside other locks isn't something which can work
> well.
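
(As an aside for anyone skimming the thread: the inversion described above
is the classic AB-BA pattern. Below is a tiny userspace analogy, not kernel
code - two pthread mutexes stand in for cgroup_mutex and the cpu hotplug
lock, and all names are made up purely for illustration.)

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t cgroup_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* memcg-style ordering: cgroup_mutex outside, hotplug lock inside */
static void *memcg_style(void *arg)
{
	pthread_mutex_lock(&cgroup_mutex);
	usleep(1000);			/* widen the race window */
	pthread_mutex_lock(&hotplug_lock);
	pthread_mutex_unlock(&hotplug_lock);
	pthread_mutex_unlock(&cgroup_mutex);
	return NULL;
}

/* cpuset-style ordering: hotplug lock outside, cgroup_mutex inside */
static void *cpuset_style(void *arg)
{
	pthread_mutex_lock(&hotplug_lock);
	usleep(1000);
	pthread_mutex_lock(&cgroup_mutex);
	pthread_mutex_unlock(&cgroup_mutex);
	pthread_mutex_unlock(&hotplug_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, memcg_style, NULL);
	pthread_create(&b, NULL, cpuset_style, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("got lucky, no deadlock this run\n");
	return 0;
}

Whichever thread wins the race blocks the other forever, which is exactly
the kind of dependency lockdep complains about.
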
>
> cgroup now has enough API to allow subsystems to implement their own
> locking and cgroup_mutex is scheduled to be made private to cgroup
> core. This patchset makes cpuset implement its own locking instead of
> relying on cgroup_mutex.
>
> cpuset is rather nasty in this respect. Some of it seems to have come
> from the implementation history - cgroup core grew out of cpuset - but
> a big part stems from cpuset's need to migrate tasks to an ancestor
> cgroup when a hotunplug event makes a cpuset empty (w/o any cpu or
> memory).
>
> This patchset decouples cpuset locking from cgroup_mutex. After the
> patchset, cpuset uses cpuset-specific cpuset_mutex instead of
> cgroup_mutex. This also removes the lockdep warning triggered during
> cpu offlining (see 0009).
>
> Note that this leaves memcg as the only external user of cgroup_mutex.
> Michal, Kame, can you guys please convert memcg to use its own locking
> too?
>
> This patchset contains the following thirteen patches.
>
> 0001-cpuset-remove-unused-cpuset_unlock.patch
> 0002-cpuset-remove-fast-exit-path-from-remove_tasks_in_em.patch
> 0003-cpuset-introduce-css_on-offline.patch
> 0004-cpuset-introduce-CS_ONLINE.patch
> 0005-cpuset-introduce-cpuset_for_each_child.patch
> 0006-cpuset-cleanup-cpuset-_can-_attach.patch
> 0007-cpuset-drop-async_rebuild_sched_domains.patch
> 0008-cpuset-reorganize-CPU-memory-hotplug-handling.patch
> 0009-cpuset-don-t-nest-cgroup_mutex-inside-get_online_cpu.patch
> 0010-cpuset-make-CPU-memory-hotplug-propagation-asynchron.patch
> 0011-cpuset-pin-down-cpus-and-mems-while-a-task-is-being-.patch
> 0012-cpuset-schedule-hotplug-propagation-from-cpuset_atta.patch
> 0013-cpuset-replace-cgroup_mutex-locking-with-cpuset-inte.patch
>
> 0001-0006 are prep patches.
>
> 0007-0009 make cpuset nest get_online_cpus() inside cgroup_mutex, not
> the other way around.
>
> 0010-0012 plug holes which would be exposed by switching to
> cpuset-specific locking.
>
> 0013 replaces cgroup_mutex with cpuset_mutex.
>
> This patchset is on top of cgroup/for-3.8 (fddfb02ad0) and also
> available in the following git branch.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cpuset-locking
>
> diffstat follows.
>
> kernel/cpuset.c | 750 +++++++++++++++++++++++++++++++-------------------------
> 1 file changed, 423 insertions(+), 327 deletions(-)
>
I reverted 38d7bee9d24adf4c95676a3dc902827c72930ebb ("cpuset: use N_MEMORY instead N_HIGH_MEMORY")
and applied this patchset against 3.8-rc1.

I created a cpuset with cpuset.cpus=1, forked a few cpu-hog tasks and moved them
into this cpuset, and the final operation was offlining cpu1. The machine got stuck.
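
Roughly, the sequence was as below. This is a reconstruction for
illustration rather than my exact script: the /cpuset mount point and the
"hog" cpuset name are assumptions, and error handling is omitted.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static void write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0) {
		perror(path);
		exit(1);
	}
	close(fd);
}

int main(void)
{
	char buf[32];
	int i;

	/* create a child cpuset bound to cpu1 only */
	mkdir("/cpuset/hog", 0755);
	write_str("/cpuset/hog/cpuset.cpus", "1");
	write_str("/cpuset/hog/cpuset.mems", "0");

	/* fork a few cpu hogs and move them into the cpuset */
	for (i = 0; i < 4; i++) {
		pid_t pid = fork();

		if (pid == 0)
			for (;;)
				;	/* burn cpu forever */
		snprintf(buf, sizeof(buf), "%d", pid);
		write_str("/cpuset/hog/tasks", buf);
	}

	/* finally, offline cpu1 - this is where things got stuck */
	write_str("/sys/devices/system/cpu/cpu1/online", "0");
	return 0;
}

With the patchset applied, that last write never returned.
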
The only processes in D state were kworker threads:
# cat /proc/6/stack
[<ffffffff81062be1>] wait_rcu_gp+0x51/0x60
[<ffffffff810d18f6>] synchronize_sched+0x36/0x50
[<ffffffff810b1b84>] cgroup_attach_task+0x144/0x1a0
[<ffffffff810b737d>] cpuset_do_move_task+0x2d/0x40
[<ffffffff810b3887>] cgroup_scan_tasks+0x1a7/0x270
[<ffffffff810b9080>] cpuset_propagate_hotplug_workfn+0x130/0x360
[<ffffffff8105d9d3>] process_one_work+0x1c3/0x3c0
[<ffffffff81060e3a>] worker_thread+0x13a/0x400
[<ffffffff8106613e>] kthread+0xce/0xe0
[<ffffffff8144166c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/80/stack
[<ffffffff81060015>] flush_workqueue+0x185/0x460
[<ffffffff810b8b90>] cpuset_hotplug_workfn+0x2f0/0x5b0
[<ffffffff8105d9d3>] process_one_work+0x1c3/0x3c0
[<ffffffff81060e3a>] worker_thread+0x13a/0x400
[<ffffffff8106613e>] kthread+0xce/0xe0
[<ffffffff8144166c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
After a while, dmesg showed:
[ 222.290677] smpboot: CPU 1 is now offline
[ 222.292405] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 229.383324] smpboot: CPU 1 is now offline
[ 229.385415] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 231.715738] smpboot: CPU 1 is now offline
[ 231.717657] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 287.773881] smpboot: CPU 1 is now offline
[ 287.789983] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 343.248988] INFO: rcu_sched self-detected stall on CPU { 4} (t=5250 jiffies g=1650 c=1649 q=2683)
[ 343.248998] Pid: 7861, comm: test3.sh Not tainted 3.8.0-rc1-0.7-default+ #6
[ 343.249000] Call Trace:
[ 343.249002] <IRQ> [<ffffffff810d11b9>] rcu_check_callbacks+0x249/0x770
[ 343.249018] [<ffffffff8109c150>] ? tick_nohz_handler+0xc0/0xc0
[ 343.249021] [<ffffffff8109c150>] ? tick_nohz_handler+0xc0/0xc0
[ 343.249028] [<ffffffff810521f6>] update_process_times+0x46/0x90
[ 343.249031] [<ffffffff8109bf9f>] tick_sched_handle+0x3f/0x50
[ 343.249034] [<ffffffff8109c1a4>] tick_sched_timer+0x54/0x90
[ 343.249037] [<ffffffff8106a99f>] __run_hrtimer+0xcf/0x1d0
[ 343.249040] [<ffffffff8106ace7>] hrtimer_interrupt+0xe7/0x220
[ 343.249048] [<ffffffff81443279>] smp_apic_timer_interrupt+0x69/0x99
[ 343.249051] [<ffffffff81442332>] apic_timer_interrupt+0x72/0x80
[ 343.249053] <EOI> [<ffffffff81439320>] ? retint_restore_args+0x13/0x13
[ 343.249062] [<ffffffff8106fa60>] ? task_nice+0x20/0x20
[ 343.249066] [<ffffffff814224aa>] ? _cpu_down+0x19a/0x2e0
[ 343.249069] [<ffffffff8142262e>] cpu_down+0x3e/0x60
[ 343.249072] [<ffffffff81426635>] store_online+0x75/0xe0
[ 343.249076] [<ffffffff812fc450>] dev_attr_store+0x20/0x30
[ 343.249082] [<ffffffff811d6b07>] sysfs_write_file+0xc7/0x140
[ 343.249087] [<ffffffff811671bb>] vfs_write+0xcb/0x130
[ 343.249090] [<ffffffff81167a31>] sys_write+0x61/0xa0
[ 343.249093] [<ffffffff81441719>] system_call_fastpath+0x16/0x1b
[ 406.164733] INFO: rcu_sched self-detected stall on CPU { 4} (t=21003 jiffies g=1650 c=1649 q=9248)
[ 406.164741] Pid: 7861, comm: test3.sh Not tainted 3.8.0-rc1-0.7-default+ #6
[ 406.164743] Call Trace:
[ 406.164744] <IRQ> [<ffffffff810d11b9>] rcu_check_callbacks+0x249/0x770
[ 406.164753] [<ffffffff8109c150>] ? tick_nohz_handler+0xc0/0xc0
[ 406.164756] [<ffffffff8109c150>] ? tick_nohz_handler+0xc0/0xc0
[ 406.164760] [<ffffffff810521f6>] update_process_times+0x46/0x90
[ 406.164763] [<ffffffff8109bf9f>] tick_sched_handle+0x3f/0x50
[ 406.164766] [<ffffffff8109c1a4>] tick_sched_timer+0x54/0x90
[ 406.164769] [<ffffffff8106a99f>] __run_hrtimer+0xcf/0x1d0
[ 406.164771] [<ffffffff8106ace7>] hrtimer_interrupt+0xe7/0x220
[ 406.164777] [<ffffffff81443279>] smp_apic_timer_interrupt+0x69/0x99
[ 406.164780] [<ffffffff81442332>] apic_timer_interrupt+0x72/0x80
[ 406.164781] <EOI> [<ffffffff814224aa>] ? _cpu_down+0x19a/0x2e0
[ 406.164787] [<ffffffff814224aa>] ? _cpu_down+0x19a/0x2e0
[ 406.164790] [<ffffffff8142262e>] cpu_down+0x3e/0x60
[ 406.164792] [<ffffffff81426635>] store_online+0x75/0xe0
[ 406.164795] [<ffffffff812fc450>] dev_attr_store+0x20/0x30
[ 406.164799] [<ffffffff811d6b07>] sysfs_write_file+0xc7/0x140
[ 406.164802] [<ffffffff811671bb>] vfs_write+0xcb/0x130
[ 406.164805] [<ffffffff81167a31>] sys_write+0x61/0xa0
[ 406.164808] [<ffffffff81441719>] system_call_fastpath+0x16/0x1b
I did the same thing without this patchset, and everything was fine.