From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: [PATCH 12/13] cpuset: schedule hotplug propagation from cpuset_attach() if the cpuset is empty Date: Thu, 3 Jan 2013 13:36:06 -0800 Message-ID: <1357248967-24959-13-git-send-email-tj@kernel.org> References: <1357248967-24959-1-git-send-email-tj@kernel.org> Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:sender:from:to:cc:subject:date:message-id:x-mailer :in-reply-to:references; bh=LVSIV19Jwr8YnICVEJxzrrhRv9/UT1sBEBrnt2Qogw0=; b=mUP1jKpsvlqTYNclHLm5uvfl/WxcU16+NjdknsYuRZWsbjJ3aD+xm1ylQkxH/ZCvOg weld/mf0gZXOS5PZHiqRYnCtUfBOe1LmC/3S4zf68hNFRjpXpg6ajMLqsD91d2N4R9gi VZ1ptGR9jlJ1qi+EslTSCe735S7MM//LUM2NSFsR3KgJoXgVX/HY78RnLNpD6a1hmewB ZIAqbUbrYS2hiu31jc3mg27Uio5/TBMHBpmy/hvAUTIKyvaxh27J5BRxTXx+FOlsJnJW J4P+D/cAcQ1RVw8vogicQwZ+6B1GL1R+mBSVzHJOdnw672YRolUU5u2FFzrHEj2tXzQB hT6Q== In-Reply-To: <1357248967-24959-1-git-send-email-tj@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: lizefan@huawei.com, paul@paulmenage.org, glommer@parallels.com Cc: containers@lists.linux-foundation.org, cgroups@vger.kernel.org, peterz@infradead.org, mhocko@suse.cz, bsingharora@gmail.com, hannes@cmpxchg.org, kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Tejun Heo cpuset is scheduled to be decoupled from cgroup_lock which will make hotplug handling race with task migration. cpus or mems will be allowed to go offline between ->can_attach() and ->attach(). If hotplug takes down all cpus or mems of a cpuset while attach is in progress, ->attach() may end up putting tasks into an empty cpuset. This patchset makes ->attach() schedule hotplug propagation if the cpuset is empty after attaching is complete. This will move the tasks to the nearest ancestor which can execute and the end result would be as if hotplug handling happened after the tasks finished attaching. cpuset_write_resmask() now also flushes cpuset_propagate_hotplug_wq to wait for propagations scheduled directly by cpuset_attach(). This currently doesn't make any functional difference as everything is protected by cgroup_mutex but enables decoupling the locking. Signed-off-by: Tejun Heo --- kernel/cpuset.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/cpuset.c b/kernel/cpuset.c index 27e8614..94dd6b5 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -266,6 +266,7 @@ static struct workqueue_struct *cpuset_propagate_hotplug_wq; static void cpuset_hotplug_workfn(struct work_struct *work); static void cpuset_propagate_hotplug_workfn(struct work_struct *work); +static void schedule_cpuset_propagate_hotplug(struct cpuset *cs); static DECLARE_WORK(cpuset_hotplug_work, cpuset_hotplug_workfn); @@ -1464,6 +1465,14 @@ static void cpuset_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) } cs->attach_in_progress--; + + /* + * We may have raced with CPU/memory hotunplug. Trigger hotplug + * propagation if @cs doesn't have any CPU or memory. It will move + * the newly added tasks to the nearest parent which can execute. + */ + if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)) + schedule_cpuset_propagate_hotplug(cs); } /* The various types of files and directories in a cpuset file system */ @@ -1569,8 +1578,13 @@ static int cpuset_write_resmask(struct cgroup *cgrp, struct cftype *cft, * resources, wait for the previously scheduled operations before * proceeding, so that we don't end up keep removing tasks added * after execution capability is restored. + * + * Flushing cpuset_hotplug_work is enough to synchronize against + * hotplug hanlding; however, cpuset_attach() may schedule + * propagation work directly. Flush the workqueue too. */ flush_work(&cpuset_hotplug_work); + flush_workqueue(cpuset_propagate_hotplug_wq); if (!cgroup_lock_live_group(cgrp)) return -ENODEV; -- 1.8.0.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754397Ab3ACVhO (ORCPT ); Thu, 3 Jan 2013 16:37:14 -0500 Received: from mail-pa0-f53.google.com ([209.85.220.53]:62359 "EHLO mail-pa0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754303Ab3ACVgi (ORCPT ); Thu, 3 Jan 2013 16:36:38 -0500 From: Tejun Heo To: lizefan@huawei.com, paul@paulmenage.org, glommer@parallels.com Cc: containers@lists.linux-foundation.org, cgroups@vger.kernel.org, peterz@infradead.org, mhocko@suse.cz, bsingharora@gmail.com, hannes@cmpxchg.org, kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Tejun Heo Subject: [PATCH 12/13] cpuset: schedule hotplug propagation from cpuset_attach() if the cpuset is empty Date: Thu, 3 Jan 2013 13:36:06 -0800 Message-Id: <1357248967-24959-13-git-send-email-tj@kernel.org> X-Mailer: git-send-email 1.8.0.2 In-Reply-To: <1357248967-24959-1-git-send-email-tj@kernel.org> References: <1357248967-24959-1-git-send-email-tj@kernel.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org cpuset is scheduled to be decoupled from cgroup_lock which will make hotplug handling race with task migration. cpus or mems will be allowed to go offline between ->can_attach() and ->attach(). If hotplug takes down all cpus or mems of a cpuset while attach is in progress, ->attach() may end up putting tasks into an empty cpuset. This patchset makes ->attach() schedule hotplug propagation if the cpuset is empty after attaching is complete. This will move the tasks to the nearest ancestor which can execute and the end result would be as if hotplug handling happened after the tasks finished attaching. cpuset_write_resmask() now also flushes cpuset_propagate_hotplug_wq to wait for propagations scheduled directly by cpuset_attach(). This currently doesn't make any functional difference as everything is protected by cgroup_mutex but enables decoupling the locking. Signed-off-by: Tejun Heo --- kernel/cpuset.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/cpuset.c b/kernel/cpuset.c index 27e8614..94dd6b5 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -266,6 +266,7 @@ static struct workqueue_struct *cpuset_propagate_hotplug_wq; static void cpuset_hotplug_workfn(struct work_struct *work); static void cpuset_propagate_hotplug_workfn(struct work_struct *work); +static void schedule_cpuset_propagate_hotplug(struct cpuset *cs); static DECLARE_WORK(cpuset_hotplug_work, cpuset_hotplug_workfn); @@ -1464,6 +1465,14 @@ static void cpuset_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) } cs->attach_in_progress--; + + /* + * We may have raced with CPU/memory hotunplug. Trigger hotplug + * propagation if @cs doesn't have any CPU or memory. It will move + * the newly added tasks to the nearest parent which can execute. + */ + if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)) + schedule_cpuset_propagate_hotplug(cs); } /* The various types of files and directories in a cpuset file system */ @@ -1569,8 +1578,13 @@ static int cpuset_write_resmask(struct cgroup *cgrp, struct cftype *cft, * resources, wait for the previously scheduled operations before * proceeding, so that we don't end up keep removing tasks added * after execution capability is restored. + * + * Flushing cpuset_hotplug_work is enough to synchronize against + * hotplug hanlding; however, cpuset_attach() may schedule + * propagation work directly. Flush the workqueue too. */ flush_work(&cpuset_hotplug_work); + flush_workqueue(cpuset_propagate_hotplug_wq); if (!cgroup_lock_live_group(cgrp)) return -ENODEV; -- 1.8.0.2