* [PATCH 0/3] Defer flushing of the cpuset_migrate_mm_wq to task_work
From: Chuyi Zhou @ 2025-09-04 7:45 UTC
To: tj, mkoutny, hannes, longman; +Cc: linux-kernel, Chuyi Zhou
Currently, cpuset_attach() has to wait synchronously for flush_workqueue()
to complete. How long flushing cpuset_migrate_mm_wq takes depends on how
much mm migration cpusets have initiated at that time. Modifying the
cpuset.mems of a cgroup that occupies a large amount of memory can trigger
extensive mm migration, causing cpuset_attach() to block on
flush_workqueue() for an extended period.

    cgroup attach operation        |    someone changes cpuset.mems
                                   |
    -------------------------------+-------------------------------
    __cgroup_procs_write()              cpuset_write_resmask()
      cgroup_kn_lock_live()
      cpuset_attach()                     cpuset_migrate_mm()
      cpuset_post_attach()
        flush_workqueue(cpuset_migrate_mm_wq);

This is dangerous because cpuset_attach() runs inside the cgroup_mutex
critical section, so a long flush may end up blocking all cgroup-related
operations in the system. We encountered this issue in a production
environment, and it can be easily reproduced locally using the script
below.
[Thu Sep 4 14:51:39 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Sep 4 14:51:39 2025] task:tee state:D stack:0 pid:13330 tgid:13330 ppid:13321 task_flags:0x400100 flags:0x00004000
[Thu Sep 4 14:51:39 2025] Call Trace:
[Thu Sep 4 14:51:39 2025] <TASK>
[Thu Sep 4 14:51:39 2025] __schedule+0xcc1/0x1c60
[Thu Sep 4 14:51:39 2025] ? find_held_lock+0x2d/0xa0
[Thu Sep 4 14:51:39 2025] schedule+0x3e/0xe0
[Thu Sep 4 14:51:39 2025] schedule_preempt_disabled+0x15/0x30
[Thu Sep 4 14:51:39 2025] __mutex_lock+0x928/0x1230
[Thu Sep 4 14:51:39 2025] ? cgroup_kn_lock_live+0x4a/0x240
[Thu Sep 4 14:51:39 2025] ? cgroup_kn_lock_live+0x4a/0x240
[Thu Sep 4 14:51:39 2025] cgroup_kn_lock_live+0x4a/0x240
[Thu Sep 4 14:51:39 2025] __cgroup_procs_write+0x38/0x210
[Thu Sep 4 14:51:39 2025] cgroup_procs_write+0x17/0x30
[Thu Sep 4 14:51:39 2025] cgroup_file_write+0xa5/0x260
[Thu Sep 4 14:51:39 2025] kernfs_fop_write_iter+0x13d/0x1e0
[Thu Sep 4 14:51:39 2025] vfs_write+0x310/0x530
[Thu Sep 4 14:51:39 2025] ksys_write+0x6e/0xf0
[Thu Sep 4 14:51:39 2025] do_syscall_64+0x77/0x390
[Thu Sep 4 14:51:39 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
This patchset defers the flush_workqueue() call until the task returns to
userspace, using task_work as originally proposed by Tejun [1], so that the
flush happens after cgroup_mutex is dropped. That way the operation stays
synchronous for the writer without stalling anyone else.
[1]: https://lore.kernel.org/cgroups/ZgMFPMjZRZCsq9Q-@slm.duckdns.org/T/#m117f606fa24f66f0823a60f211b36f24bd9e1883
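
To make the ordering concrete, here is a minimal userspace sketch of the idea, not kernel code. Every name in it (model_task, model_defer, model_return_to_user and so on) is hypothetical and only mimics the shape of cgroup_mutex, task_work and the flush; it records whether the flush ran while the lock was still held.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Userspace model only: every name below is hypothetical and merely mimics
 * the shape of the kernel mechanism (cgroup_mutex, task_work, flush).
 */
struct model_task;

struct deferred_work {
	void (*fn)(struct model_task *task);
	struct deferred_work *next;
};

struct model_task {
	bool lock_held;                 /* stands in for holding cgroup_mutex */
	struct deferred_work *pending;  /* stands in for the task_work list */
	int flushes_under_lock;         /* flushes that ran with the lock held */
	int flushes_after_unlock;       /* flushes that ran after it was dropped */
};

/* Stands in for flush_workqueue(): records whether the lock was held. */
static void model_flush(struct model_task *task)
{
	if (task->lock_held)
		task->flushes_under_lock++;
	else
		task->flushes_after_unlock++;
}

/* Queue work to run later; on allocation failure the flush is skipped. */
static void model_defer(struct model_task *task,
			void (*fn)(struct model_task *task))
{
	struct deferred_work *work = malloc(sizeof(*work));

	if (!work)
		return;
	work->fn = fn;
	work->next = task->pending;
	task->pending = work;
}

/* Runs on the way back to "userspace", after the lock has been dropped. */
static void model_return_to_user(struct model_task *task)
{
	while (task->pending) {
		struct deferred_work *work = task->pending;

		task->pending = work->next;
		work->fn(task);
		free(work);
	}
}

/* One modelled write to cgroup.procs, with or without deferring the flush. */
static void model_procs_write(struct model_task *task, bool defer)
{
	task->lock_held = true;
	if (defer)
		model_defer(task, model_flush);
	else
		model_flush(task);
	task->lock_held = false;
	model_return_to_user(task);
}
```

Calling model_procs_write() with defer set to false counts the flush under the lock; with defer set to true the flush is counted only after the lock is dropped, yet still before the modelled write returns to its caller.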
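
#!/bin/bash
# Reproducer. Assumes cgroup v2 mounted at /sys/fs/cgroup with the cpuset
# controller enabled, NUMA nodes 0 and 1 present, and the stress tool installed.

sudo mkdir -p /sys/fs/cgroup/test
sudo mkdir -p /sys/fs/cgroup/test1
sudo mkdir -p /sys/fs/cgroup/test2

echo 0 > /sys/fs/cgroup/test1/cpuset.mems
echo 1 > /sys/fs/cgroup/test2/cpuset.mems

# Ten background subshells keep moving themselves between test1 and test2.
for i in {1..10}; do
    (
        pid=$BASHPID

        while true; do
            echo "Add $pid to test1"
            echo "$pid" | sudo tee /sys/fs/cgroup/test1/cgroup.procs >/dev/null

            sleep 5

            echo "Add $pid to test2"
            echo "$pid" | sudo tee /sys/fs/cgroup/test2/cgroup.procs >/dev/null
        done
    ) &
done

# Fill the "test" cgroup with memory on node 0, then switch cpuset.mems to
# node 1 to trigger a large mm migration.
echo 0 > /sys/fs/cgroup/test/cpuset.mems
echo $$ > /sys/fs/cgroup/test/cgroup.procs
stress --vm 100 --vm-bytes 2048M --vm-keep &
sleep 30
echo "begin change cpuset.mems"
echo 1 > /sys/fs/cgroup/test/cpuset.mems
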
Chuyi Zhou (3):
  cpuset: Don't always flush cpuset_migrate_mm_wq in
    cpuset_write_resmask
  cpuset: Defer flushing of the cpuset_migrate_mm_wq to task_work
  cgroup: Remove unused cgroup_subsys::post_attach

 include/linux/cgroup-defs.h |  1 -
 kernel/cgroup/cgroup.c      |  4 ----
 kernel/cgroup/cpuset.c      | 30 +++++++++++++++++++++++++-----
 3 files changed, 25 insertions(+), 10 deletions(-)

--
2.20.1

* [PATCH 1/3] cpuset: Don't always flush cpuset_migrate_mm_wq in cpuset_write_resmask
From: Chuyi Zhou @ 2025-09-04 7:45 UTC
To: tj, mkoutny, hannes, longman; +Cc: linux-kernel, Chuyi Zhou

It is unnecessary to always wait for the flush operation of
cpuset_migrate_mm_wq to complete in cpuset_write_resmask, as modifying
cpuset.cpus or cpuset.exclusive does not trigger mm migrations. The
flush_workqueue can be executed only when cpuset.mems is modified.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/cgroup/cpuset.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 27adb04df675d..3d8492581c8c4 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3256,7 +3256,8 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
-	flush_workqueue(cpuset_migrate_mm_wq);
+	if (of_cft(of)->private == FILE_MEMLIST)
+		flush_workqueue(cpuset_migrate_mm_wq);
 	return retval ?: nbytes;
 }
 
-- 
2.20.1

* Re: [PATCH 1/3] cpuset: Don't always flush cpuset_migrate_mm_wq in cpuset_write_resmask
From: Michal Koutný @ 2025-09-04 14:30 UTC
To: Chuyi Zhou; +Cc: tj, hannes, longman, linux-kernel

On Thu, Sep 04, 2025 at 03:45:03PM +0800, Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> It is unnecessary to always wait for the flush operation of
> cpuset_migrate_mm_wq to complete in cpuset_write_resmask, as modifying
> cpuset.cpus or cpuset.exclusive does not trigger mm migrations. The
> flush_workqueue can be executed only when cpuset.mems is modified.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  kernel/cgroup/cpuset.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reasonable and AFAICT correct optimization.

Reviewed-by: Michal Koutný <mkoutny@suse.com>

* Re: [PATCH 1/3] cpuset: Don't always flush cpuset_migrate_mm_wq in cpuset_write_resmask
From: Waiman Long @ 2025-09-04 15:12 UTC
To: Chuyi Zhou, tj, mkoutny, hannes; +Cc: linux-kernel

On 9/4/25 3:45 AM, Chuyi Zhou wrote:
> It is unnecessary to always wait for the flush operation of
> cpuset_migrate_mm_wq to complete in cpuset_write_resmask, as modifying
> cpuset.cpus or cpuset.exclusive does not trigger mm migrations. The
> flush_workqueue can be executed only when cpuset.mems is modified.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  kernel/cgroup/cpuset.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 27adb04df675d..3d8492581c8c4 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -3256,7 +3256,8 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
>  out_unlock:
>  	mutex_unlock(&cpuset_mutex);
>  	cpus_read_unlock();
> -	flush_workqueue(cpuset_migrate_mm_wq);
> +	if (of_cft(of)->private == FILE_MEMLIST)
> +		flush_workqueue(cpuset_migrate_mm_wq);
>  	return retval ?: nbytes;
>  }
>

LGTM

Reviewed-by: Waiman Long <longman@redhat.com>

* Re: [PATCH 1/3] cpuset: Don't always flush cpuset_migrate_mm_wq in cpuset_write_resmask
From: Tejun Heo @ 2025-09-04 17:15 UTC
To: Chuyi Zhou; +Cc: mkoutny, hannes, longman, linux-kernel

On Thu, Sep 04, 2025 at 03:45:03PM +0800, Chuyi Zhou wrote:
> It is unnecessary to always wait for the flush operation of
> cpuset_migrate_mm_wq to complete in cpuset_write_resmask, as modifying
> cpuset.cpus or cpuset.exclusive does not trigger mm migrations. The
> flush_workqueue can be executed only when cpuset.mems is modified.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>

Applied cgroup/for-6.18.

Thanks.

-- 
tejun

* [PATCH 2/3] cpuset: Defer flushing of the cpuset_migrate_mm_wq to task_work
From: Chuyi Zhou @ 2025-09-04 7:45 UTC
To: tj, mkoutny, hannes, longman; +Cc: linux-kernel, Chuyi Zhou

Currently, cpuset_attach() has to wait synchronously for flush_workqueue()
to complete. How long flushing cpuset_migrate_mm_wq takes depends on how
much mm migration cpusets have initiated at that time. Modifying the
cpuset.mems of a cgroup that occupies a large amount of memory can trigger
extensive mm migration, causing cpuset_attach() to block on
flush_workqueue() for an extended period.

This is dangerous because cpuset_attach() runs inside the cgroup_mutex
critical section, so a long flush may end up blocking all cgroup-related
operations in the system.

This patch defers the flush_workqueue() call until the task returns to
userspace, using task_work as originally proposed by Tejun [1], so that the
flush happens after cgroup_mutex is dropped. That way the operation stays
synchronous for the writer without stalling anyone else.

[1]: https://lore.kernel.org/cgroups/ZgMFPMjZRZCsq9Q-@slm.duckdns.org/T/#m117f606fa24f66f0823a60f211b36f24bd9e1883

Originally-by: tejun heo <tj@kernel.org>
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/cgroup/cpuset.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 3d8492581c8c4..ceb467079e41f 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -40,6 +40,7 @@
 #include <linux/sched/isolation.h>
 #include <linux/wait.h>
 #include <linux/workqueue.h>
+#include <linux/task_work.h>
 
 DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key);
 DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
@@ -2582,9 +2583,24 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
 	}
 }
 
-static void cpuset_post_attach(void)
+static void flush_migrate_mm_task_workfn(struct callback_head *head)
 {
 	flush_workqueue(cpuset_migrate_mm_wq);
+	kfree(head);
+}
+
+static void schedule_flush_migrate_mm(void)
+{
+	struct callback_head *flush_cb;
+
+	flush_cb = kzalloc(sizeof(struct callback_head), GFP_KERNEL);
+	if (!flush_cb)
+		return;
+
+	init_task_work(flush_cb, flush_migrate_mm_task_workfn);
+
+	if (task_work_add(current, flush_cb, TWA_RESUME))
+		kfree(flush_cb);
 }
 
 /*
@@ -3141,6 +3157,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	struct cpuset *cs;
 	struct cpuset *oldcs = cpuset_attach_old_cs;
 	bool cpus_updated, mems_updated;
+	bool queue_task_work = false;
 
 	cgroup_taskset_first(tset, &css);
 	cs = css_cs(css);
@@ -3191,15 +3208,18 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 			 * @old_mems_allowed is the right nodesets that we
 			 * migrate mm from.
 			 */
-			if (is_memory_migrate(cs))
+			if (is_memory_migrate(cs)) {
 				cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
 						  &cpuset_attach_nodemask_to);
-			else
+				queue_task_work = true;
+			} else
 				mmput(mm);
 		}
 	}
 
 out:
+	if (queue_task_work)
+		schedule_flush_migrate_mm();
 	cs->old_mems_allowed = cpuset_attach_nodemask_to;
 
 	if (cs->nr_migrate_dl_tasks) {
@@ -3257,7 +3277,7 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
 	if (of_cft(of)->private == FILE_MEMLIST)
-		flush_workqueue(cpuset_migrate_mm_wq);
+		schedule_flush_migrate_mm();
 	return retval ?: nbytes;
 }
 
@@ -3725,7 +3745,6 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
 	.can_attach	= cpuset_can_attach,
 	.cancel_attach	= cpuset_cancel_attach,
 	.attach		= cpuset_attach,
-	.post_attach	= cpuset_post_attach,
 	.bind		= cpuset_bind,
 	.can_fork	= cpuset_can_fork,
 	.cancel_fork	= cpuset_cancel_fork,
-- 
2.20.1

* Re: [PATCH 2/3] cpuset: Defer flushing of the cpuset_migrate_mm_wq to task_work
From: Waiman Long @ 2025-09-04 15:14 UTC
To: Chuyi Zhou, tj, mkoutny, hannes; +Cc: linux-kernel

On 9/4/25 3:45 AM, Chuyi Zhou wrote:
> Now in cpuset_attach(), we need to synchronously wait for
> flush_workqueue to complete. The execution time of flushing
> cpuset_migrate_mm_wq depends on the amount of mm migration initiated by
> cpusets at that time. When the cpuset.mems of a cgroup occupying a large
> amount of memory is modified, it may trigger extensive mm migration,
> causing cpuset_attach() to block on flush_workqueue for an extended period.
> This could be dangerous because cpuset_attach() is within the critical
> section of cgroup_mutex, which may ultimately cause all cgroup-related
> operations in the system to be blocked.
>
> This patch attempts to defer the flush_workqueue() operation until
> returning to userspace using the task_work which is originally proposed by
> tejun[1], so that flush happens after cgroup_mutex is dropped. That way we
> maintain the operation synchronicity while avoiding bothering anyone else.
>
> [1]: https://lore.kernel.org/cgroups/ZgMFPMjZRZCsq9Q-@slm.duckdns.org/T/#m117f606fa24f66f0823a60f211b36f24bd9e1883
>
> Originally-by: tejun heo <tj@kernel.org>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  kernel/cgroup/cpuset.c | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 3d8492581c8c4..ceb467079e41f 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -40,6 +40,7 @@
>  #include <linux/sched/isolation.h>
>  #include <linux/wait.h>
>  #include <linux/workqueue.h>
> +#include <linux/task_work.h>
>
>  DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key);
>  DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
> @@ -2582,9 +2583,24 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
>  	}
>  }
>
> -static void cpuset_post_attach(void)
> +static void flush_migrate_mm_task_workfn(struct callback_head *head)
>  {
>  	flush_workqueue(cpuset_migrate_mm_wq);
> +	kfree(head);
> +}
> +
> +static void schedule_flush_migrate_mm(void)
> +{
> +	struct callback_head *flush_cb;
> +
> +	flush_cb = kzalloc(sizeof(struct callback_head), GFP_KERNEL);
> +	if (!flush_cb)
> +		return;
> +
> +	init_task_work(flush_cb, flush_migrate_mm_task_workfn);
> +
> +	if (task_work_add(current, flush_cb, TWA_RESUME))
> +		kfree(flush_cb);
>  }
>
>  /*
> @@ -3141,6 +3157,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
>  	struct cpuset *cs;
>  	struct cpuset *oldcs = cpuset_attach_old_cs;
>  	bool cpus_updated, mems_updated;
> +	bool queue_task_work = false;
>
>  	cgroup_taskset_first(tset, &css);
>  	cs = css_cs(css);
> @@ -3191,15 +3208,18 @@ static void cpuset_attach(struct cgroup_taskset *tset)
>  			 * @old_mems_allowed is the right nodesets that we
>  			 * migrate mm from.
>  			 */
> -			if (is_memory_migrate(cs))
> +			if (is_memory_migrate(cs)) {
>  				cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
>  						  &cpuset_attach_nodemask_to);
> -			else
> +				queue_task_work = true;
> +			} else
>  				mmput(mm);
>  		}
>  	}
>
>  out:
> +	if (queue_task_work)
> +		schedule_flush_migrate_mm();
>  	cs->old_mems_allowed = cpuset_attach_nodemask_to;
>
>  	if (cs->nr_migrate_dl_tasks) {
> @@ -3257,7 +3277,7 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
>  	mutex_unlock(&cpuset_mutex);
>  	cpus_read_unlock();
>  	if (of_cft(of)->private == FILE_MEMLIST)
> -		flush_workqueue(cpuset_migrate_mm_wq);
> +		schedule_flush_migrate_mm();
>  	return retval ?: nbytes;
>  }
>
> @@ -3725,7 +3745,6 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
>  	.can_attach	= cpuset_can_attach,
>  	.cancel_attach	= cpuset_cancel_attach,
>  	.attach		= cpuset_attach,
> -	.post_attach	= cpuset_post_attach,
>  	.bind		= cpuset_bind,
>  	.can_fork	= cpuset_can_fork,
>  	.cancel_fork	= cpuset_cancel_fork,

Reviewed-by: Waiman Long <longman@redhat.com>

* [PATCH 3/3] cgroup: Remove unused cgroup_subsys::post_attach
From: Chuyi Zhou @ 2025-09-04 7:45 UTC
To: tj, mkoutny, hannes, longman; +Cc: linux-kernel, Chuyi Zhou

The cgroup_subsys::post_attach callback was introduced in commit
5cf1cacb49ae ("cgroup, cpuset: replace cpuset_post_attach_flush() with
cgroup_subsys->post_attach callback"), and only cpuset used it, to wait for
the mm migration to complete at the end of __cgroup_procs_write(). Since
the previous patch defers the flush operation until returning to userspace,
no one uses this callback now. Remove it from cgroup_subsys.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 include/linux/cgroup-defs.h | 1 -
 kernel/cgroup/cgroup.c      | 4 ----
 2 files changed, 5 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 6b93a64115fe9..432abdfdb2593 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -746,7 +746,6 @@ struct cgroup_subsys {
 	int (*can_attach)(struct cgroup_taskset *tset);
 	void (*cancel_attach)(struct cgroup_taskset *tset);
 	void (*attach)(struct cgroup_taskset *tset);
-	void (*post_attach)(void);
 	int (*can_fork)(struct task_struct *task,
 			struct css_set *cset);
 	void (*cancel_fork)(struct task_struct *task, struct css_set *cset);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 312c6a8b55bb7..75819bb2f1148 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3033,10 +3033,6 @@ void cgroup_procs_write_finish(struct task_struct *task, bool threadgroup_locked
 	put_task_struct(task);
 
 	cgroup_attach_unlock(threadgroup_locked);
-
-	for_each_subsys(ss, ssid)
-		if (ss->post_attach)
-			ss->post_attach();
 }
 
 static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
-- 
2.20.1

* Re: [PATCH 3/3] cgroup: Remove unused cgroup_subsys::post_attach
From: Waiman Long @ 2025-09-04 15:17 UTC
To: Chuyi Zhou, tj, mkoutny, hannes; +Cc: linux-kernel

On 9/4/25 3:45 AM, Chuyi Zhou wrote:
> cgroup_subsys::post_attach callback was introduced in commit 5cf1cacb49ae
> ("cgroup, cpuset: replace cpuset_post_attach_flush() with
> cgroup_subsys->post_attach callback") and only cpuset would use this
> callback to wait for the mm migration to complete at the end of
> __cgroup_procs_write(). Since the previous patch defer the flush operation
> until returning to userspace, no one use this callback now. Remove this
> callback from cgroup_subsys.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  include/linux/cgroup-defs.h | 1 -
>  kernel/cgroup/cgroup.c      | 4 ----
>  2 files changed, 5 deletions(-)
>
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 6b93a64115fe9..432abdfdb2593 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -746,7 +746,6 @@ struct cgroup_subsys {
>  	int (*can_attach)(struct cgroup_taskset *tset);
>  	void (*cancel_attach)(struct cgroup_taskset *tset);
>  	void (*attach)(struct cgroup_taskset *tset);
> -	void (*post_attach)(void);
>  	int (*can_fork)(struct task_struct *task,
>  			struct css_set *cset);
>  	void (*cancel_fork)(struct task_struct *task, struct css_set *cset);
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 312c6a8b55bb7..75819bb2f1148 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3033,10 +3033,6 @@ void cgroup_procs_write_finish(struct task_struct *task, bool threadgroup_locked
>  	put_task_struct(task);
>
>  	cgroup_attach_unlock(threadgroup_locked);
> -
> -	for_each_subsys(ss, ssid)
> -		if (ss->post_attach)
> -			ss->post_attach();
>  }
>
>  static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)

Note that we may have to add it back in the future if a new use case
comes up.

Acked-by: Waiman Long <longman@redhat.com>

* Re: [PATCH 0/3] Defer flushing of the cpuset_migrate_mm_wq to task_work
From: Tejun Heo @ 2025-09-04 17:27 UTC
To: Chuyi Zhou; +Cc: mkoutny, hannes, longman, linux-kernel

On Thu, Sep 04, 2025 at 03:45:02PM +0800, Chuyi Zhou wrote:
> Now in cpuset_attach(), we need to synchronously wait for
> flush_workqueue to complete. The execution time of flushing
> cpuset_migrate_mm_wq depends on the amount of mm migration initiated by
> cpusets at that time. When the cpuset.mems of a cgroup occupying a large
> amount of memory is modified, it may trigger extensive mm migration,
> causing cpuset_attach() to block on flush_workqueue for an extended period.

Applied 1-3 to cgroup/for-6.18. There were a couple conflicts that I
resolved. It'd be great if you can take a look and make sure everything is
okay.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-6.18

Thanks.

-- 
tejun

* Re: [PATCH 0/3] Defer flushing of the cpuset_migrate_mm_wq to task_work
From: Chuyi Zhou @ 2025-09-05 2:15 UTC
To: Tejun Heo; +Cc: mkoutny, hannes, longman, linux-kernel

On 2025/9/5 01:27, Tejun Heo wrote:
> On Thu, Sep 04, 2025 at 03:45:02PM +0800, Chuyi Zhou wrote:
>> Now in cpuset_attach(), we need to synchronously wait for
>> flush_workqueue to complete. The execution time of flushing
>> cpuset_migrate_mm_wq depends on the amount of mm migration initiated by
>> cpusets at that time. When the cpuset.mems of a cgroup occupying a large
>> amount of memory is modified, it may trigger extensive mm migration,
>> causing cpuset_attach() to block on flush_workqueue for an extended period.
>
> Applied 1-3 to cgroup/for-6.18. There were a couple conflicts that I
> resolved. It'd be great if you can take a look and make sure everything is
> okay.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-6.18
>
> Thanks.
>

Sorry, I forgot to rebase onto the latest cgroup branch before sending the
patchset. I have checked the result and everything is okay.

Thanks.