From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Gao Yingjie <gaoyingjie@uniontech.com>,
Chen Ridong <chenridong@huawei.com>, Teju Heo <tj@kernel.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 05/61] cgroup: split cgroup_destroy_wq into 3 workqueues
Date: Mon, 22 Sep 2025 21:28:58 +0200 [thread overview]
Message-ID: <20250922192403.694104160@linuxfoundation.org> (raw)
In-Reply-To: <20250922192403.524848428@linuxfoundation.org>
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Ridong <chenridong@huawei.com>
[ Upstream commit 79f919a89c9d06816dbdbbd168fa41d27411a7f9 ]
A hung task can occur during [1] LTP cgroup testing when repeatedly
mounting/unmounting perf_event and net_prio controllers with
systemd.unified_cgroup_hierarchy=1. The hang manifests in
cgroup_lock_and_drain_offline() during root destruction.
Related case:
cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event
cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio
Call Trace:
cgroup_lock_and_drain_offline+0x14c/0x1e8
cgroup_destroy_root+0x3c/0x2c0
css_free_rwork_fn+0x248/0x338
process_one_work+0x16c/0x3b8
worker_thread+0x22c/0x3b0
kthread+0xec/0x100
ret_from_fork+0x10/0x20
Root Cause:
CPU0 CPU1
mount perf_event umount net_prio
cgroup1_get_tree cgroup_kill_sb
rebind_subsystems // root destruction enqueues
// cgroup_destroy_wq
// kill all perf_event css
// one perf_event css A is dying
// css A offline enqueues cgroup_destroy_wq
// root destruction will be executed first
css_free_rwork_fn
cgroup_destroy_root
cgroup_lock_and_drain_offline
// some perf descendants are dying
// cgroup_destroy_wq max_active = 1
// waiting for css A to die
Problem scenario:
1. CPU0 mounts perf_event (rebind_subsystems)
2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
3. A dying perf_event CSS gets queued for offline after root destruction
4. Root destruction waits for offline completion, but offline work is
blocked behind root destruction in cgroup_destroy_wq (max_active=1)
Solution:
Split cgroup_destroy_wq into three dedicated workqueues:
cgroup_offline_wq – Handles CSS offline operations
cgroup_release_wq – Manages resource release
cgroup_free_wq – Performs final memory deallocation
This separation eliminates blocking in the CSS free path while waiting for
offline operations to complete.
[1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers
Fixes: 334c3679ec4b ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Reported-by: Gao Yingjie <gaoyingjie@uniontech.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Suggested-by: Teju Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/cgroup/cgroup.c | 43 +++++++++++++++++++++++++++++++++++-------
1 file changed, 36 insertions(+), 7 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 7997c8021b62f..9742574ec62fd 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -123,8 +123,31 @@ DEFINE_PERCPU_RWSEM(cgroup_threadgroup_rwsem);
* of concurrent destructions. Use a separate workqueue so that cgroup
* destruction work items don't end up filling up max_active of system_wq
* which may lead to deadlock.
+ *
+ * A cgroup destruction should enqueue work sequentially to:
+ * cgroup_offline_wq: use for css offline work
+ * cgroup_release_wq: use for css release work
+ * cgroup_free_wq: use for free work
+ *
+ * Rationale for using separate workqueues:
+ * The cgroup root free work may depend on completion of other css offline
+ * operations. If all tasks were enqueued to a single workqueue, this could
+ * create a deadlock scenario where:
+ * - Free work waits for other css offline work to complete.
+ * - But other css offline work is queued after free work in the same queue.
+ *
+ * Example deadlock scenario with single workqueue (cgroup_destroy_wq):
+ * 1. umount net_prio
+ * 2. net_prio root destruction enqueues work to cgroup_destroy_wq (CPUx)
+ * 3. perf_event CSS A offline enqueues work to same cgroup_destroy_wq (CPUx)
+ * 4. net_prio cgroup_destroy_root->cgroup_lock_and_drain_offline.
+ * 5. net_prio root destruction blocks waiting for perf_event CSS A offline,
+ * which can never complete as it's behind in the same queue and
+ * workqueue's max_active is 1.
*/
-static struct workqueue_struct *cgroup_destroy_wq;
+static struct workqueue_struct *cgroup_offline_wq;
+static struct workqueue_struct *cgroup_release_wq;
+static struct workqueue_struct *cgroup_free_wq;
/* generate an array of cgroup subsystem pointers */
#define SUBSYS(_x) [_x ## _cgrp_id] = &_x ## _cgrp_subsys,
@@ -5444,7 +5467,7 @@ static void css_release_work_fn(struct work_struct *work)
cgroup_unlock();
INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
- queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
+ queue_rcu_work(cgroup_free_wq, &css->destroy_rwork);
}
static void css_release(struct percpu_ref *ref)
@@ -5453,7 +5476,7 @@ static void css_release(struct percpu_ref *ref)
container_of(ref, struct cgroup_subsys_state, refcnt);
INIT_WORK(&css->destroy_work, css_release_work_fn);
- queue_work(cgroup_destroy_wq, &css->destroy_work);
+ queue_work(cgroup_release_wq, &css->destroy_work);
}
static void init_and_link_css(struct cgroup_subsys_state *css,
@@ -5575,7 +5598,7 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
err_free_css:
list_del_rcu(&css->rstat_css_node);
INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
- queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
+ queue_rcu_work(cgroup_free_wq, &css->destroy_rwork);
return ERR_PTR(err);
}
@@ -5811,7 +5834,7 @@ static void css_killed_ref_fn(struct percpu_ref *ref)
if (atomic_dec_and_test(&css->online_cnt)) {
INIT_WORK(&css->destroy_work, css_killed_work_fn);
- queue_work(cgroup_destroy_wq, &css->destroy_work);
+ queue_work(cgroup_offline_wq, &css->destroy_work);
}
}
@@ -6183,8 +6206,14 @@ static int __init cgroup_wq_init(void)
* We would prefer to do this in cgroup_init() above, but that
* is called before init_workqueues(): so leave this until after.
*/
- cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
- BUG_ON(!cgroup_destroy_wq);
+ cgroup_offline_wq = alloc_workqueue("cgroup_offline", 0, 1);
+ BUG_ON(!cgroup_offline_wq);
+
+ cgroup_release_wq = alloc_workqueue("cgroup_release", 0, 1);
+ BUG_ON(!cgroup_release_wq);
+
+ cgroup_free_wq = alloc_workqueue("cgroup_free", 0, 1);
+ BUG_ON(!cgroup_free_wq);
return 0;
}
core_initcall(cgroup_wq_init);
--
2.51.0
next prev parent reply other threads:[~2025-09-22 19:31 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-22 19:28 [PATCH 6.1 00/61] 6.1.154-rc1 review Greg Kroah-Hartman
2025-09-22 19:28 ` [PATCH 6.1 01/61] ALSA: firewire-motu: drop EPOLLOUT from poll return values as write is not supported Greg Kroah-Hartman
2025-09-22 19:28 ` [PATCH 6.1 02/61] wifi: mac80211: increase scan_ies_len for S1G Greg Kroah-Hartman
2025-09-22 19:28 ` [PATCH 6.1 03/61] wifi: mac80211: fix incorrect type for ret Greg Kroah-Hartman
2025-09-22 19:28 ` [PATCH 6.1 04/61] pcmcia: omap_cf: Mark driver struct with __refdata to prevent section mismatch Greg Kroah-Hartman
2025-09-22 19:28 ` Greg Kroah-Hartman [this message]
2025-09-22 19:28 ` [PATCH 6.1 06/61] btrfs: fix invalid extref key setup when replaying dentry Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 07/61] um: virtio_uml: Fix use-after-free after put_device in probe Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 08/61] dpaa2-switch: fix buffer pool seeding for control traffic Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 09/61] qed: Dont collect too many protection override GRC elements Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 10/61] mptcp: set remote_deny_join_id0 on SYN recv Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 11/61] net: natsemi: fix `rx_dropped` double accounting on `netif_rx()` failure Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 12/61] i40e: remove redundant memory barrier when cleaning Tx descs Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 13/61] bonding: dont set oif to bond dev when getting NS target destination Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 14/61] tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect() Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 15/61] tls: make sure to abort the stream if headers are bogus Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 16/61] Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set" Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 17/61] net: liquidio: fix overflow in octeon_init_instr_queue() Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 18/61] cnic: Fix use-after-free bugs in cnic_delete_task Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 19/61] octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp() Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 20/61] ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 21/61] ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 22/61] nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/* Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 23/61] power: supply: bq27xxx: fix error return in case of no bq27000 hdq battery Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 24/61] power: supply: bq27xxx: restrict no-battery detection to bq27000 Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 25/61] LoongArch: Align ACPI structures if ARCH_STRICT_ALIGN enabled Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 26/61] LoongArch: Check the return value when creating kobj Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 27/61] iommu/vt-d: Fix __domain_mapping()s usage of switch_to_super_page() Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 28/61] btrfs: tree-checker: fix the incorrect inode ref size check Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 29/61] ASoC: qcom: audioreach: Fix lpaif_type configuration for the I2S interface Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 30/61] ASoC: qcom: q6apm-lpass-dais: Fix missing set_fmt DAI op for I2S Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 31/61] mmc: mvsdio: Fix dma_unmap_sg() nents value Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 32/61] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 33/61] rds: ib: Increment i_fastreg_wrs before bailing out Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 34/61] selftests: mptcp: avoid spurious errors on TCP disconnect Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 35/61] ALSA: hda/realtek: Fix mute led for HP Laptop 15-dw4xx Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 36/61] io_uring: backport io_should_terminate_tw() Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 37/61] io_uring: include dying ring in task_work "should cancel" state Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 38/61] ASoC: wm8940: Correct typo in control name Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 39/61] ASoC: wm8974: Correct PLL rate rounding Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 40/61] ASoC: SOF: Intel: hda-stream: Fix incorrect variable used in error message Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 41/61] drm: bridge: anx7625: Fix NULL pointer dereference with early IRQ Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 42/61] drm: bridge: cdns-mhdp8546: Fix missing mutex unlock on error path Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 43/61] crypto: af_alg: Indent the loop in af_alg_sendmsg() Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 44/61] crypto: af_alg - Set merge to zero early in af_alg_sendmsg Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 45/61] smb: client: fix smbdirect_recv_io leak in smbd_negotiate() error path Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 46/61] mptcp: pm: nl: announce deny-join-id0 flag Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 47/61] selftests: mptcp: userspace pm: validate " Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 48/61] phy: broadcom: ns-usb3: fix Wvoid-pointer-to-enum-cast warning Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 49/61] phy: Use device_get_match_data() Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 50/61] phy: ti: omap-usb2: fix device leak at unbind Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 51/61] xhci: dbc: decouple endpoint allocation from initialization Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 52/61] xhci: dbc: Fix full DbC transfer ring after several reconnects Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 53/61] mptcp: propagate shutdown to subflows when possible Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 54/61] selftests: mptcp: connect: catch IO errors on listen side Greg Kroah-Hartman
2025-09-30 15:30 ` Kenta Akagi
2025-10-01 7:56 ` Matthieu Baerts
2025-10-01 15:24 ` Kenta Akagi
2025-10-01 16:43 ` Kenta Akagi
2025-10-01 17:09 ` Matthieu Baerts
2025-10-02 16:06 ` Kenta Akagi
2025-09-22 19:29 ` [PATCH 6.1 55/61] net: rfkill: gpio: add DT support Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 56/61] net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 57/61] ASoC: qcom: q6apm-lpass-dai: close graphs before opening a new one Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 58/61] ASoC: q6apm-lpass-dai: close graph on prepare errors Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 59/61] ASoC: qcom: q6apm-lpass-dais: Fix NULL pointer dereference if source graph failed Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 60/61] crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES Greg Kroah-Hartman
2025-09-22 19:29 ` [PATCH 6.1 61/61] crypto: af_alg - Disallow concurrent writes in af_alg_sendmsg Greg Kroah-Hartman
2025-09-22 22:43 ` [PATCH 6.1 00/61] 6.1.154-rc1 review Florian Fainelli
2025-09-23 7:27 ` Brett A C Sheffield
2025-09-23 10:02 ` [PATCH 6.1 00/61] " Peter Schneider
2025-09-23 10:30 ` Naresh Kamboju
2025-09-23 13:06 ` Jon Hunter
2025-09-23 13:12 ` Mark Brown
2025-09-23 15:16 ` Ron Economos
2025-09-23 20:36 ` Miguel Ojeda
2025-09-24 0:33 ` Shuah Khan
2025-09-24 6:56 ` Hardik Garg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250922192403.694104160@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=chenridong@huawei.com \
--cc=gaoyingjie@uniontech.com \
--cc=patches@lists.linux.dev \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox