public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Tejun Heo <tj@kernel.org>,
	Cai Xinchen <caixinchen1@huawei.com>,
	Imran Khan <imran.f.khan@oracle.com>,
	Xuewen Yan <xuewen.yan@unisoc.com>
Subject: [PATCH 4.19 83/84] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
Date: Mon,  3 Apr 2023 16:09:24 +0200	[thread overview]
Message-ID: <20230403140356.258072954@linuxfoundation.org> (raw)
In-Reply-To: <20230403140353.406927418@linuxfoundation.org>

From: Tejun Heo <tj@kernel.org>

commit 4f7e7236435ca0abe005c674ebd6892c6e83aeb3 upstream.

Add #include <linux/cpu.h> to avoid compile error on some architectures.

commit 9a3284fad42f6 ("cgroup: Optimize single thread migration") and
commit 671c11f0619e5 ("cgroup: Elide write-locking threadgroup_rwsem
when updating csses on an empty subtree") are not backport. So ignore the
input parameter of cgroup_attach_lock/cgroup_attach_unlock.

original commit message:

Bringing up a CPU may involve creating and destroying tasks which requires
read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
cpus_read_lock(). However, cpuset's ->attach(), which may be called with
thredagroup_rwsem write-locked, also wants to disable CPU hotplug and
acquires cpus_read_lock(), leading to a deadlock.

Fix it by guaranteeing that ->attach() is always called with CPU hotplug
disabled and removing cpus_read_lock() call from cpuset_attach().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-and-tested-by: Imran Khan <imran.f.khan@oracle.com>
Reported-and-tested-by: Xuewen Yan <xuewen.yan@unisoc.com>
Fixes: 05c7b7a92cc8 ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
v2:
 - Add #include <linux/cpu.h> in kernel/cgroup/cgroup.c to avoid compile
   error on some architectures
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/cgroup/cgroup.c |   50 ++++++++++++++++++++++++++++++++++++++++++++-----
 kernel/cgroup/cpuset.c |    7 ------
 2 files changed, 46 insertions(+), 11 deletions(-)

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -55,6 +55,7 @@
 #include <linux/nsproxy.h>
 #include <linux/file.h>
 #include <linux/sched/cputime.h>
+#include <linux/cpu.h>
 #include <net/sock.h>
 
 #define CREATE_TRACE_POINTS
@@ -2210,6 +2211,45 @@ int task_cgroup_path(struct task_struct
 EXPORT_SYMBOL_GPL(task_cgroup_path);
 
 /**
+ * cgroup_attach_lock - Lock for ->attach()
+ * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
+ *
+ * cgroup migration sometimes needs to stabilize threadgroups against forks and
+ * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach()
+ * implementations (e.g. cpuset), also need to disable CPU hotplug.
+ * Unfortunately, letting ->attach() operations acquire cpus_read_lock() can
+ * lead to deadlocks.
+ *
+ * Bringing up a CPU may involve creating and destroying tasks which requires
+ * read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
+ * cpus_read_lock(). If we call an ->attach() which acquires the cpus lock while
+ * write-locking threadgroup_rwsem, the locking order is reversed and we end up
+ * waiting for an on-going CPU hotplug operation which in turn is waiting for
+ * the threadgroup_rwsem to be released to create new tasks. For more details:
+ *
+ *   http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu
+ *
+ * Resolve the situation by always acquiring cpus_read_lock() before optionally
+ * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that
+ * CPU hotplug is disabled on entry.
+ */
+static void cgroup_attach_lock(void)
+{
+	get_online_cpus();
+	percpu_down_write(&cgroup_threadgroup_rwsem);
+}
+
+/**
+ * cgroup_attach_unlock - Undo cgroup_attach_lock()
+ * @lock_threadgroup: whether to up_write cgroup_threadgroup_rwsem
+ */
+static void cgroup_attach_unlock(void)
+{
+	percpu_up_write(&cgroup_threadgroup_rwsem);
+	put_online_cpus();
+}
+
+/**
  * cgroup_migrate_add_task - add a migration target task to a migration context
  * @task: target task
  * @mgctx: target migration context
@@ -2694,7 +2734,7 @@ struct task_struct *cgroup_procs_write_s
 	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
 		return ERR_PTR(-EINVAL);
 
-	percpu_down_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_lock();
 
 	rcu_read_lock();
 	if (pid) {
@@ -2725,7 +2765,7 @@ struct task_struct *cgroup_procs_write_s
 	goto out_unlock_rcu;
 
 out_unlock_threadgroup:
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock();
 out_unlock_rcu:
 	rcu_read_unlock();
 	return tsk;
@@ -2740,7 +2780,7 @@ void cgroup_procs_write_finish(struct ta
 	/* release reference from cgroup_procs_write_start() */
 	put_task_struct(task);
 
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock();
 	for_each_subsys(ss, ssid)
 		if (ss->post_attach)
 			ss->post_attach();
@@ -2799,7 +2839,7 @@ static int cgroup_update_dfl_csses(struc
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	percpu_down_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_lock();
 
 	/* look up all csses currently attached to @cgrp's subtree */
 	spin_lock_irq(&css_set_lock);
@@ -2830,7 +2870,7 @@ static int cgroup_update_dfl_csses(struc
 	ret = cgroup_migrate_execute(&mgctx);
 out_finish:
 	cgroup_migrate_finish(&mgctx);
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock();
 	return ret;
 }
 
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1528,11 +1528,7 @@ static void cpuset_attach(struct cgroup_
 	cgroup_taskset_first(tset, &css);
 	cs = css_cs(css);
 
-	/*
-	 * It should hold cpus lock because a cpu offline event can
-	 * cause set_cpus_allowed_ptr() failed.
-	 */
-	get_online_cpus();
+	lockdep_assert_cpus_held();     /* see cgroup_attach_lock() */
 	mutex_lock(&cpuset_mutex);
 
 	/* prepare for attach */
@@ -1588,7 +1584,6 @@ static void cpuset_attach(struct cgroup_
 		wake_up(&cpuset_attach_wq);
 
 	mutex_unlock(&cpuset_mutex);
-	put_online_cpus();
 }
 
 /* The various types of files and directories in a cpuset file system */



  parent reply	other threads:[~2023-04-03 14:18 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-03 14:08 [PATCH 4.19 00/84] 4.19.280-rc1 review Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 01/84] power: supply: da9150: Fix use after free bug in da9150_charger_remove due to race condition Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 02/84] i40evf: Change a VF mac without reloading the VF driver Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 03/84] intel-ethernet: rename i40evf to iavf Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 04/84] iavf: diet and reformat Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 05/84] iavf: fix inverted Rx hash condition leading to disabled hash Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 06/84] intel/igbvf: free irq on the error path in igbvf_request_msix() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 07/84] igbvf: Regard vf reset nack as success Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 08/84] i2c: imx-lpi2c: check only for enabled interrupt flags Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 09/84] scsi: scsi_dh_alua: Fix memleak for qdata in alua_activate() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 10/84] net: usb: smsc95xx: Limit packet length to skb->len Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 11/84] qed/qed_sriov: guard against NULL derefs from qed_iov_get_vf_info Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 12/84] xirc2ps_cs: Fix use after free bug in xirc2ps_detach Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 13/84] net: qcom/emac: Fix use after free bug in emac_remove due to race condition Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 14/84] net/ps3_gelic_net: Fix RX sk_buff length Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 15/84] net/ps3_gelic_net: Use dma_mapping_error Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 16/84] bpf: Adjust insufficient default bpf_jit_limit Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 17/84] net/mlx5: Read the TC mapping of all priorities on ETS query Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 18/84] atm: idt77252: fix kmemleak when rmmod idt77252 Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 19/84] erspan: do not use skb_mac_header() in ndo_start_xmit() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 20/84] net/sonic: use dma_mapping_error() for error check Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 21/84] hvc/xen: prevent concurrent accesses to the shared ring Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 22/84] net: mdio: thunder: Add missing fwnode_handle_put() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 23/84] Bluetooth: btqcomsmd: Fix command timeout after setting BD address Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 24/84] Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 25/84] hwmon (it87): Fix voltage scaling for chips with 10.9mV ADCs Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 26/84] uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS583Gen 2 Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 27/84] thunderbolt: Use const qualifier for `ring_interrupt_index` Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 28/84] riscv: Bump COMMAND_LINE_SIZE value to 1024 Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 29/84] ca8210: fix mac_len negative array access Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 30/84] m68k: Only force 030 bus error if PC not in exception table Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 31/84] scsi: target: iscsi: Fix an error message in iscsi_check_key() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 32/84] scsi: ufs: core: Add soft dependency on governor_simpleondemand Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 33/84] net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990 Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 34/84] net: usb: qmi_wwan: add Telit 0x1080 composition Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 35/84] sh: sanitize the flags on sigreturn Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 36/84] cifs: empty interface list when server doesnt support query interfaces Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 37/84] scsi: core: Add BLIST_SKIP_VPD_PAGES for SKhynix H28U74301AMR Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 38/84] usb: gadget: u_audio: dont let userspace block driver unbind Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 39/84] igb: revert rtnl_lock() that causes deadlock Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 40/84] dm thin: fix deadlock when swapping to thin device Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 41/84] usb: chipdea: core: fix return -EINVAL if request role is the same with current role Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 42/84] usb: chipidea: core: fix possible concurrent when switch role Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 43/84] nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 44/84] i2c: xgene-slimpro: Fix out-of-bounds bug in xgene_slimpro_i2c_xfer() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 45/84] dm stats: check for and propagate alloc_percpu failure Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 46/84] dm crypt: add cond_resched() to dmcrypt_write() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 47/84] sched/fair: sanitize vruntime of entity being placed Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 48/84] sched/fair: Sanitize vruntime of entity being migrated Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 49/84] tun: avoid double free in tun_free_netdev Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 50/84] ocfs2: fix data corruption after failed write Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 51/84] bus: imx-weim: fix branch condition evaluates to a garbage value Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 52/84] md: avoid signed overflow in slot_store() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 53/84] ALSA: asihpi: check pao in control_message() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 54/84] ALSA: hda/ca0132: fixup buffer overrun at tuning_ctl_set() Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 55/84] fbdev: tgafb: Fix potential divide by zero Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 56/84] sched_getaffinity: dont assume cpumask_size() is fully initialized Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 57/84] fbdev: nvidia: Fix potential divide by zero Greg Kroah-Hartman
2023-04-03 14:08 ` [PATCH 4.19 58/84] fbdev: intelfb: " Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 59/84] fbdev: lxfb: " Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 60/84] fbdev: au1200fb: " Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 61/84] ca8210: Fix unsigned mac_len comparison with zero in ca8210_skb_tx() Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 62/84] scsi: megaraid_sas: Fix crash after a double completion Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 63/84] can: bcm: bcm_tx_setup(): fix KMSAN uninit-value in vfs_write Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 64/84] i40e: fix registers dump after run ethtool adapter self test Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 65/84] net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 66/84] net: mvneta: make tx buffer array agnostic Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 67/84] Input: alps - fix compatibility with -funsigned-char Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 68/84] Input: focaltech - use explicitly signed char type Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 69/84] cifs: prevent infinite recursion in CIFSGetDFSRefer() Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 70/84] cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 71/84] xen/netback: dont do grant copy across page boundary Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 72/84] pinctrl: at91-pio4: fix domain name assignment Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 73/84] ALSA: hda/conexant: Partial revert of a quirk for Lenovo Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 74/84] ALSA: usb-audio: Fix regression on detection of Roland VS-100 Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 75/84] drm/etnaviv: fix reference leak when mmaping imported buffer Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 76/84] s390/uaccess: add missing earlyclobber annotations to __clear_user() Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 77/84] usb: host: ohci-pxa27x: Fix and & vs | typo Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 78/84] ext4: fix kernel BUG in ext4_write_inline_data_end() Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 79/84] firmware: arm_scmi: Fix device node validation for mailbox transport Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 80/84] gfs2: Always check inode size of inline inodes Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 81/84] net: sched: cbq: dont intepret cls results when asked to drop Greg Kroah-Hartman
2023-04-03 14:09 ` [PATCH 4.19 82/84] cgroup/cpuset: Change cpuset_rwsem and hotplug lock order Greg Kroah-Hartman
2023-04-03 14:09 ` Greg Kroah-Hartman [this message]
2023-04-03 14:09 ` [PATCH 4.19 84/84] cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all() Greg Kroah-Hartman
2023-04-03 23:06 ` [PATCH 4.19 00/84] 4.19.280-rc1 review Shuah Khan
2023-04-04  5:19 ` Naresh Kamboju
2023-04-04 10:49 ` Chris Paterson
2023-04-04 11:46 ` Pavel Machek
2023-04-04 18:30   ` Greg Kroah-Hartman
2023-04-04 21:24 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230403140356.258072954@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=caixinchen1@huawei.com \
    --cc=imran.f.khan@oracle.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=xuewen.yan@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox