From: Michal Hocko <mhocko@suse.cz>
To: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>,
paul@paulmenage.org, glommer@parallels.com,
containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
peterz@infradead.org, bsingharora@gmail.com, hannes@cmpxchg.org,
kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET cgroup/for-3.8] cpuset: decouple cpuset locking from cgroup core
Date: Wed, 2 Jan 2013 09:53:55 +0100
Message-ID: <20130102085355.GA22160@dhcp22.suse.cz>
In-Reply-To: <50DAD696.8050400@huawei.com>
Hi Li,
On Wed 26-12-12 18:51:02, Li Zefan wrote:
> I reverted 38d7bee9d24adf4c95676a3dc902827c72930ebb ("cpuset: use N_MEMORY instead N_HIGH_MEMORY")
> and applied this patchset against 3.8-rc1.
I didn't find any patch in this email.
Anyway, I am wondering how the above patch could cause the hang you
describe below. The patch just renames N_HIGH_MEMORY to N_MEMORY, which
should map to the very same constant, so there should be no functional
change AFAIU.
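
For illustration, here is a minimal sketch of how two node-state names can
end up as the very same constant. The enum is a simplified model patterned
after enum node_states in include/linux/nodemask.h; it is written from
memory rather than copied from the 3.8-rc1 tree, so the exact layout and the
CONFIG_HIGHMEM / CONFIG_MOVABLE_NODE guards are assumptions. The helper
n_memory_is_alias() is hypothetical and exists only for this example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>

/*
 * Simplified model of the node-state enum (assumption, not kernel source).
 * Neither CONFIG_HIGHMEM nor CONFIG_MOVABLE_NODE is defined in this sketch,
 * so both #else branches are taken.
 */
enum node_states {
	N_POSSIBLE,		/* node could become online at some point */
	N_ONLINE,		/* node is online */
	N_NORMAL_MEMORY,	/* node has regular memory */
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,		/* node has regular or high memory */
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
#ifdef CONFIG_MOVABLE_NODE
	N_MEMORY,		/* node has any kind of memory */
#else
	N_MEMORY = N_HIGH_MEMORY,
#endif
	N_CPU,			/* node has one or more cpus */
	NR_NODE_STATES
};

/* Compile-time check: in this configuration the rename changes nothing. */
static_assert(N_MEMORY == N_HIGH_MEMORY,
	      "N_MEMORY is expected to alias N_HIGH_MEMORY in this sketch");

/* Hypothetical helper: reports whether the two names share one value. */
static bool n_memory_is_alias(void)
{
	return N_MEMORY == N_HIGH_MEMORY;
}
```

In this model the alias holds only while CONFIG_MOVABLE_NODE is left
undefined; with it defined, N_MEMORY would get its own value, so checking
the .config of the test kernel may be worthwhile.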
> I created a cpuset which has cpuset.cpus=1, forked a few cpu-hog tasks
> and moved them to this cpuset, and the final operation was offlining cpu1.
> It got stuck.
>
> The only processes in D state are kworker threads:
>
> # cat /proc/6/stack
>
> [<ffffffff81062be1>] wait_rcu_gp+0x51/0x60
> [<ffffffff810d18f6>] synchronize_sched+0x36/0x50
> [<ffffffff810b1b84>] cgroup_attach_task+0x144/0x1a0
> [<ffffffff810b737d>] cpuset_do_move_task+0x2d/0x40
> [<ffffffff810b3887>] cgroup_scan_tasks+0x1a7/0x270
> [<ffffffff810b9080>] cpuset_propagate_hotplug_workfn+0x130/0x360
> [<ffffffff8105d9d3>] process_one_work+0x1c3/0x3c0
> [<ffffffff81060e3a>] worker_thread+0x13a/0x400
> [<ffffffff8106613e>] kthread+0xce/0xe0
> [<ffffffff8144166c>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> # cat /proc/80/stack
>
> [<ffffffff81060015>] flush_workqueue+0x185/0x460
> [<ffffffff810b8b90>] cpuset_hotplug_workfn+0x2f0/0x5b0
> [<ffffffff8105d9d3>] process_one_work+0x1c3/0x3c0
> [<ffffffff81060e3a>] worker_thread+0x13a/0x400
> [<ffffffff8106613e>] kthread+0xce/0xe0
> [<ffffffff8144166c>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
>
>
> After a while, dmesg:
>
> [ 222.290677] smpboot: CPU 1 is now offline
> [ 222.292405] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 229.383324] smpboot: CPU 1 is now offline
> [ 229.385415] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 231.715738] smpboot: CPU 1 is now offline
> [ 231.717657] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 287.773881] smpboot: CPU 1 is now offline
> [ 287.789983] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 343.248988] INFO: rcu_sched self-detected stall on CPU { 4} (t=5250 jiffies g=1650 c=1649 q=2683)
> [ 343.248998] Pid: 7861, comm: test3.sh Not tainted 3.8.0-rc1-0.7-default+ #6
> [ 343.249000] Call Trace:
> [ 343.249002] <IRQ> [<ffffffff810d11b9>] rcu_check_callbacks+0x249/0x770
> [ 343.249018] [<ffffffff8109c150>] ? tick_nohz_handler+0xc0/0xc0
> [ 343.249021] [<ffffffff8109c150>] ? tick_nohz_handler+0xc0/0xc0
> [ 343.249028] [<ffffffff810521f6>] update_process_times+0x46/0x90
> [ 343.249031] [<ffffffff8109bf9f>] tick_sched_handle+0x3f/0x50
> [ 343.249034] [<ffffffff8109c1a4>] tick_sched_timer+0x54/0x90
> [ 343.249037] [<ffffffff8106a99f>] __run_hrtimer+0xcf/0x1d0
> [ 343.249040] [<ffffffff8106ace7>] hrtimer_interrupt+0xe7/0x220
> [ 343.249048] [<ffffffff81443279>] smp_apic_timer_interrupt+0x69/0x99
> [ 343.249051] [<ffffffff81442332>] apic_timer_interrupt+0x72/0x80
> [ 343.249053] <EOI> [<ffffffff81439320>] ? retint_restore_args+0x13/0x13
> [ 343.249062] [<ffffffff8106fa60>] ? task_nice+0x20/0x20
> [ 343.249066] [<ffffffff814224aa>] ? _cpu_down+0x19a/0x2e0
> [ 343.249069] [<ffffffff8142262e>] cpu_down+0x3e/0x60
> [ 343.249072] [<ffffffff81426635>] store_online+0x75/0xe0
> [ 343.249076] [<ffffffff812fc450>] dev_attr_store+0x20/0x30
> [ 343.249082] [<ffffffff811d6b07>] sysfs_write_file+0xc7/0x140
> [ 343.249087] [<ffffffff811671bb>] vfs_write+0xcb/0x130
> [ 343.249090] [<ffffffff81167a31>] sys_write+0x61/0xa0
> [ 343.249093] [<ffffffff81441719>] system_call_fastpath+0x16/0x1b
> [ 406.164733] INFO: rcu_sched self-detected stall on CPU { 4} (t=21003 jiffies g=1650 c=1649 q=9248)
> [ 406.164741] Pid: 7861, comm: test3.sh Not tainted 3.8.0-rc1-0.7-default+ #6
> [ 406.164743] Call Trace:
> [ 406.164744] <IRQ> [<ffffffff810d11b9>] rcu_check_callbacks+0x249/0x770
> [ 406.164753] [<ffffffff8109c150>] ? tick_nohz_handler+0xc0/0xc0
> [ 406.164756] [<ffffffff8109c150>] ? tick_nohz_handler+0xc0/0xc0
> [ 406.164760] [<ffffffff810521f6>] update_process_times+0x46/0x90
> [ 406.164763] [<ffffffff8109bf9f>] tick_sched_handle+0x3f/0x50
> [ 406.164766] [<ffffffff8109c1a4>] tick_sched_timer+0x54/0x90
> [ 406.164769] [<ffffffff8106a99f>] __run_hrtimer+0xcf/0x1d0
> [ 406.164771] [<ffffffff8106ace7>] hrtimer_interrupt+0xe7/0x220
> [ 406.164777] [<ffffffff81443279>] smp_apic_timer_interrupt+0x69/0x99
> [ 406.164780] [<ffffffff81442332>] apic_timer_interrupt+0x72/0x80
> [ 406.164781] <EOI> [<ffffffff814224aa>] ? _cpu_down+0x19a/0x2e0
> [ 406.164787] [<ffffffff814224aa>] ? _cpu_down+0x19a/0x2e0
> [ 406.164790] [<ffffffff8142262e>] cpu_down+0x3e/0x60
> [ 406.164792] [<ffffffff81426635>] store_online+0x75/0xe0
> [ 406.164795] [<ffffffff812fc450>] dev_attr_store+0x20/0x30
> [ 406.164799] [<ffffffff811d6b07>] sysfs_write_file+0xc7/0x140
> [ 406.164802] [<ffffffff811671bb>] vfs_write+0xcb/0x130
> [ 406.164805] [<ffffffff81167a31>] sys_write+0x61/0xa0
> [ 406.164808] [<ffffffff81441719>] system_call_fastpath+0x16/0x1b
>
> I did the same thing without this patchset, and everything's fine.
>
--
Michal Hocko
SUSE Labs