All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Tejun Heo <tj@kernel.org>,
	Cai Xinchen <caixinchen1@huawei.com>,
	Imran Khan <imran.f.khan@oracle.com>,
	Xuewen Yan <xuewen.yan@unisoc.com>
Subject: [PATCH 4.19 38/39] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
Date: Wed, 15 Mar 2023 13:12:52 +0100	[thread overview]
Message-ID: <20230315115722.652817372@linuxfoundation.org> (raw)
In-Reply-To: <20230315115721.234756306@linuxfoundation.org>

From: Tejun Heo <tj@kernel.org>

commit 4f7e7236435ca0abe005c674ebd6892c6e83aeb3 upstream.

Bringing up a CPU may involve creating and destroying tasks which requires
read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
cpus_read_lock(). However, cpuset's ->attach(), which may be called with
thredagroup_rwsem write-locked, also wants to disable CPU hotplug and
acquires cpus_read_lock(), leading to a deadlock.

Fix it by guaranteeing that ->attach() is always called with CPU hotplug
disabled and removing cpus_read_lock() call from cpuset_attach().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-and-tested-by: Imran Khan <imran.f.khan@oracle.com>
Reported-and-tested-by: Xuewen Yan <xuewen.yan@unisoc.com>
Fixes: 05c7b7a92cc8 ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/cgroup/cgroup.c |   49 ++++++++++++++++++++++++++++++++++++++++++++-----
 kernel/cgroup/cpuset.c |    7 +------
 2 files changed, 45 insertions(+), 11 deletions(-)

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2210,6 +2210,45 @@ int task_cgroup_path(struct task_struct
 EXPORT_SYMBOL_GPL(task_cgroup_path);
 
 /**
+ * cgroup_attach_lock - Lock for ->attach()
+ * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
+ *
+ * cgroup migration sometimes needs to stabilize threadgroups against forks and
+ * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach()
+ * implementations (e.g. cpuset), also need to disable CPU hotplug.
+ * Unfortunately, letting ->attach() operations acquire cpus_read_lock() can
+ * lead to deadlocks.
+ *
+ * Bringing up a CPU may involve creating and destroying tasks which requires
+ * read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
+ * cpus_read_lock(). If we call an ->attach() which acquires the cpus lock while
+ * write-locking threadgroup_rwsem, the locking order is reversed and we end up
+ * waiting for an on-going CPU hotplug operation which in turn is waiting for
+ * the threadgroup_rwsem to be released to create new tasks. For more details:
+ *
+ *   http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu
+ *
+ * Resolve the situation by always acquiring cpus_read_lock() before optionally
+ * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that
+ * CPU hotplug is disabled on entry.
+ */
+static void cgroup_attach_lock(void)
+{
+	get_online_cpus();
+	percpu_down_write(&cgroup_threadgroup_rwsem);
+}
+
+/**
+ * cgroup_attach_unlock - Undo cgroup_attach_lock()
+ * @lock_threadgroup: whether to up_write cgroup_threadgroup_rwsem
+ */
+static void cgroup_attach_unlock(void)
+{
+	percpu_up_write(&cgroup_threadgroup_rwsem);
+	put_online_cpus();
+}
+
+/**
  * cgroup_migrate_add_task - add a migration target task to a migration context
  * @task: target task
  * @mgctx: target migration context
@@ -2694,7 +2733,7 @@ struct task_struct *cgroup_procs_write_s
 	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
 		return ERR_PTR(-EINVAL);
 
-	percpu_down_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_lock();
 
 	rcu_read_lock();
 	if (pid) {
@@ -2725,7 +2764,7 @@ struct task_struct *cgroup_procs_write_s
 	goto out_unlock_rcu;
 
 out_unlock_threadgroup:
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock();
 out_unlock_rcu:
 	rcu_read_unlock();
 	return tsk;
@@ -2740,7 +2779,7 @@ void cgroup_procs_write_finish(struct ta
 	/* release reference from cgroup_procs_write_start() */
 	put_task_struct(task);
 
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock();
 	for_each_subsys(ss, ssid)
 		if (ss->post_attach)
 			ss->post_attach();
@@ -2799,7 +2838,7 @@ static int cgroup_update_dfl_csses(struc
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	percpu_down_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_lock();
 
 	/* look up all csses currently attached to @cgrp's subtree */
 	spin_lock_irq(&css_set_lock);
@@ -2830,7 +2869,7 @@ static int cgroup_update_dfl_csses(struc
 	ret = cgroup_migrate_execute(&mgctx);
 out_finish:
 	cgroup_migrate_finish(&mgctx);
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock();
 	return ret;
 }
 
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1528,13 +1528,9 @@ static void cpuset_attach(struct cgroup_
 	cgroup_taskset_first(tset, &css);
 	cs = css_cs(css);
 
+	lockdep_assert_cpus_held();     /* see cgroup_attach_lock() */
 	mutex_lock(&cpuset_mutex);
 
-	/*
-	 * It should hold cpus lock because a cpu offline event can
-	 * cause set_cpus_allowed_ptr() failed.
-	 */
-	get_online_cpus();
 	/* prepare for attach */
 	if (cs == &top_cpuset)
 		cpumask_copy(cpus_attach, cpu_possible_mask);
@@ -1553,7 +1549,6 @@ static void cpuset_attach(struct cgroup_
 		cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
 		cpuset_update_task_spread_flag(cs, task);
 	}
-       put_online_cpus();
 
 	/*
 	 * Change mm for all threadgroup leaders. This is expensive and may



  parent reply	other threads:[~2023-03-15 12:16 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-15 12:12 [PATCH 4.19 00/39] 4.19.278-rc1 review Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 01/39] fs: prevent out-of-bounds array speculation when closing a file descriptor Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 02/39] x86/CPU/AMD: Disable XSAVES on AMD family 0x17 Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 03/39] ext4: fix RENAME_WHITEOUT handling for inline directories Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 04/39] ext4: fix another off-by-one fsmap error on 1k block filesystems Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 05/39] ext4: move where set the MAY_INLINE_DATA flag is set Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 06/39] ext4: fix WARNING in ext4_update_inline_data Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 07/39] ext4: zero i_disksize when initializing the bootloader inode Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 08/39] nfc: change order inside nfc_se_io error path Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 09/39] udf: Explain handling of load_nls() failure Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 10/39] udf: reduce leakage of blocks related to named streams Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 11/39] udf: Remove pointless union in udf_inode_info Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 12/39] udf: Preserve link count of system files Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 13/39] udf: Detect system inodes linked into directory hierarchy Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 14/39] ARM: dts: exynos: Fix language typo and indentation Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 15/39] ARM: dts: exynos: Override thermal by label in Exynos4210 Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 16/39] ARM: dts: exynos: correct TMU phandle " Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 17/39] ARM: dts: exynos: Add all CPUs in cooling maps Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 18/39] ARM: dts: exynos: Move pmu and timer nodes out of soc Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 19/39] ARM: dts: exynos: Override thermal by label in Exynos5250 Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 20/39] ARM: dts: exynos: correct TMU phandle " Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 21/39] kbuild: fix false-positive need-builtin calculation Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 22/39] kbuild: generate modules.order only in directories visited by obj-y/m Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 23/39] ARM: dts: exynos: Add GPU thermal zone cooling maps for Odroid XU3/XU4/HC1 Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 24/39] ARM: dts: exynos: correct TMU phandle in Odroid HC1 Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 25/39] ARM: dts: exynos: correct TMU phandle in Odroid XU3 family Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 26/39] scsi: core: Remove the /proc/scsi/${proc_name} directory earlier Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 27/39] Revert "spi: mt7621: Fix an error message in mt7621_spi_probe()" Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 28/39] clk: qcom: mmcc-apq8084: remove spdm clocks Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 29/39] MIPS: Fix a compilation issue Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 30/39] alpha: fix R_ALPHA_LITERAL reloc for large modules Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 31/39] macintosh: windfarm: Use unsigned type for 1-bit bitfields Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 32/39] PCI: Add SolidRun vendor ID Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 33/39] PCI: Avoid FLR for SolidRun SNET DPU rev 1 Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 34/39] media: ov5640: Fix analogue gain control Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 35/39] tipc: improve function tipc_wait_for_cond() Greg Kroah-Hartman
2023-03-15 12:12 ` [Intel-gfx] [PATCH 4.19 36/39] drm/i915: Dont use BAR mappings for ring buffers with LLC Greg Kroah-Hartman
2023-03-15 12:12   ` Greg Kroah-Hartman
2023-03-15 12:12 ` [PATCH 4.19 37/39] cgroup/cpuset: Change cpuset_rwsem and hotplug lock order Greg Kroah-Hartman
2023-03-15 12:12 ` Greg Kroah-Hartman [this message]
2023-03-15 12:12 ` [PATCH 4.19 39/39] cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all() Greg Kroah-Hartman
2023-03-15 14:12 ` [PATCH 4.19 00/39] 4.19.278-rc1 review Chris Paterson
2023-03-15 14:24   ` Chris Paterson
2023-03-16  7:48   ` Greg Kroah-Hartman
2023-03-15 14:32 ` Guenter Roeck
2023-03-15 15:44 ` Daniel Díaz
2023-03-15 15:59   ` Guenter Roeck
2023-03-15 16:35     ` Greg Kroah-Hartman
2023-03-15 16:28   ` Daniel Díaz
2023-03-16  7:48     ` Greg Kroah-Hartman
2023-03-16  7:47   ` Greg Kroah-Hartman
2023-03-16  0:04 ` Shuah Khan
2023-03-16  8:51 ` Missing patches in 4.19? was " Pavel Machek
2023-03-16  9:34   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230315115722.652817372@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=caixinchen1@huawei.com \
    --cc=imran.f.khan@oracle.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=xuewen.yan@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.