* [PATCH v3 0/1] Fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
@ 2024-08-17 9:33 Chen Ridong
2024-08-17 9:33 ` [PATCH v3 1/1] cgroup: fix " Chen Ridong
From: Chen Ridong @ 2024-08-17 9:33 UTC (permalink / raw)
To: martin.lau, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x,
hannes, roman.gushchin
Cc: bpf, cgroups, linux-kernel
Changes from v1/v2:
- Reworked the commit message for clarity.
Link v1: https://lore.kernel.org/cgroups/20240607110313.2230669-1-chenridong@huawei.com/
Link v2: https://lore.kernel.org/cgroups/20240719025232.2143638-1-chenridong@huawei.com/
Chen Ridong (1):
cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
kernel/bpf/cgroup.c | 2 +-
kernel/cgroup/cgroup-internal.h | 1 +
kernel/cgroup/cgroup.c | 2 +-
3 files changed, 3 insertions(+), 2 deletions(-)
--
2.34.1
* [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
2024-08-17 9:33 [PATCH v3 0/1] Fix deadlock caused by cgroup_mutex and cpu_hotplug_lock Chen Ridong
@ 2024-08-17 9:33 ` Chen Ridong
2024-08-22 0:57 ` Chen Ridong
2024-09-09 14:19 ` Michal Koutný
From: Chen Ridong @ 2024-08-17 9:33 UTC (permalink / raw)
To: martin.lau, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x,
hannes, roman.gushchin
Cc: bpf, cgroups, linux-kernel
We found a hung_task problem as shown below:
INFO: task kworker/0:0:8 blocked for more than 327 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:0 state:D stack:13920 pid:8 ppid:2 flags:0x00004000
Workqueue: events cgroup_bpf_release
Call Trace:
<TASK>
__schedule+0x5a2/0x2050
? find_held_lock+0x33/0x100
? wq_worker_sleeping+0x9e/0xe0
schedule+0x9f/0x180
schedule_preempt_disabled+0x25/0x50
__mutex_lock+0x512/0x740
? cgroup_bpf_release+0x1e/0x4d0
? cgroup_bpf_release+0xcf/0x4d0
? process_scheduled_works+0x161/0x8a0
? cgroup_bpf_release+0x1e/0x4d0
? mutex_lock_nested+0x2b/0x40
? __pfx_delay_tsc+0x10/0x10
mutex_lock_nested+0x2b/0x40
cgroup_bpf_release+0xcf/0x4d0
? process_scheduled_works+0x161/0x8a0
? trace_event_raw_event_workqueue_execute_start+0x64/0xd0
? process_scheduled_works+0x161/0x8a0
process_scheduled_works+0x23a/0x8a0
worker_thread+0x231/0x5b0
? __pfx_worker_thread+0x10/0x10
kthread+0x14d/0x1c0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x59/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
This issue can be reproduced by the following pressure test:
1. A large number of cpuset cgroups are deleted.
2. CPUs are taken online and offline repeatedly.
3. watchdog_thresh is set repeatedly.
The scripts can be obtained at the LINK mentioned above the signature.
The root cause is that cgroup_mutex and cpu_hotplug_lock are acquired
in different tasks, which can lead to a deadlock through the following
steps:
1. A large number of cpusets are deleted asynchronously, which puts a
large number of cgroup_bpf_release works into system_wq. The max_active
of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are
cgroup_bpf_release works, and many cgroup_bpf_release works will be put
into the inactive queue. As illustrated in the diagram, there are 256 (in
the active queue) + n (in the inactive queue) works.
2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put an
smp_call_on_cpu work item into system_wq. However, step 1 has already
filled system_wq, so 'sscs.work' is put into the inactive queue.
'sscs.work' has to wait until the works that were put into the inactive
queue earlier have executed (n cgroup_bpf_release works), so it will be
blocked for a while.
3. CPU offline requires cpu_hotplug_lock.write, which is blocked by step 2.
4. Cpusets that were deleted at step 1 put cgroup_release works into
cgroup_destroy_wq. They are competing to get cgroup_mutex all the time.
When cgroup_mutex is acquired by the work at css_killed_work_fn, it will
call cpuset_css_offline, which needs to acquire cpu_hotplug_lock.read.
However, cpuset_css_offline will be blocked due to step 3.
5. At this moment, all 256 works in the active queue are cgroup_bpf_release
works attempting to acquire cgroup_mutex, so all of them are blocked.
Consequently, sscs.work cannot be executed. Ultimately, this situation
leaves four processes blocked, forming a deadlock.
system_wq(step1)        WatchDog(step2)         cpu offline(step3)      cgroup_destroy_wq(step4)
...
2000+ cgroups deleted async
256 actives + n inactives
                        __lockup_detector_reconfigure
                        P(cpu_hotplug_lock.read)
                        put sscs.work into system_wq
256 + n + 1(sscs.work)
sscs.work waits to be executed
                        waiting for sscs.work to finish
                                                percpu_down_write
                                                P(cpu_hotplug_lock.write)
                                                ...blocking...
                                                                        css_killed_work_fn
                                                                        P(cgroup_mutex)
                                                                        cpuset_css_offline
                                                                        P(cpu_hotplug_lock.read)
                                                                        ...blocking...
256 cgroup_bpf_release
mutex_lock(&cgroup_mutex);
..blocking...
To fix the problem, place cgroup_bpf_release works on cgroup_destroy_wq,
which breaks the loop. System wqs are for misc things which shouldn't
create a large number of concurrent work items. If something is going to
generate more than WQ_DFL_ACTIVE(256) concurrent work items, it should
use its own dedicated workqueue.
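For reference, this is roughly how cgroup already sets up its dedicated
destruction workqueue in cgroup_wq_init() (paraphrased from
kernel/cgroup/cgroup.c; the comments below are editorial):

	static int __init cgroup_wq_init(void)
	{
		/*
		 * The destruction path is largely serialized by
		 * cgroup_mutex, so max_active of 1 is enough; a dedicated
		 * workqueue also keeps destruction work items from
		 * saturating system_wq.
		 */
		cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
		BUG_ON(!cgroup_destroy_wq);
		return 0;
	}
	core_initcall(cgroup_wq_init);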
Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Link: https://lore.kernel.org/cgroups/e90c32d2-2a85-4f28-9154-09c7d320cb60@huawei.com/T/#t
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/bpf/cgroup.c | 2 +-
kernel/cgroup/cgroup-internal.h | 1 +
kernel/cgroup/cgroup.c | 2 +-
3 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 8ba73042a239..a611a1274788 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -334,7 +334,7 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref)
struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);
INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
- queue_work(system_wq, &cgrp->bpf.release_work);
+ queue_work(cgroup_destroy_wq, &cgrp->bpf.release_work);
}
/* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index c964dd7ff967..17ac19bc8106 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -13,6 +13,7 @@
extern spinlock_t trace_cgroup_path_lock;
extern char trace_cgroup_path[TRACE_CGROUP_PATH_LEN];
extern void __init enable_debug_cgroup(void);
+extern struct workqueue_struct *cgroup_destroy_wq;
/*
* cgroup_path() takes a spin lock. It is good practice not to take
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 75058fbf4450..77fa9ed69c86 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -124,7 +124,7 @@ DEFINE_PERCPU_RWSEM(cgroup_threadgroup_rwsem);
* destruction work items don't end up filling up max_active of system_wq
* which may lead to deadlock.
*/
-static struct workqueue_struct *cgroup_destroy_wq;
+struct workqueue_struct *cgroup_destroy_wq;
/* generate an array of cgroup subsystem pointers */
#define SUBSYS(_x) [_x ## _cgrp_id] = &_x ## _cgrp_subsys,
--
2.34.1
* Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
2024-08-17 9:33 ` [PATCH v3 1/1] cgroup: fix " Chen Ridong
@ 2024-08-22 0:57 ` Chen Ridong
2024-08-30 2:08 ` Chen Ridong
2024-09-09 14:19 ` Michal Koutný
From: Chen Ridong @ 2024-08-22 0:57 UTC (permalink / raw)
To: martin.lau, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x,
hannes, roman.gushchin, Michal Koutný
Cc: bpf, cgroups, linux-kernel
On 2024/8/17 17:33, Chen Ridong wrote:
> [snip: full patch quoted verbatim -- see the original message above]
Ping.
Hi TJ, Roman and Michal, I have updated the commit message; I think it
is much clearer now. Could you review it again?
Thanks,
Ridong
* Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
2024-08-22 0:57 ` Chen Ridong
@ 2024-08-30 2:08 ` Chen Ridong
From: Chen Ridong @ 2024-08-30 2:08 UTC (permalink / raw)
To: martin.lau, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x,
hannes, roman.gushchin, Michal Koutný
Cc: bpf, cgroups, linux-kernel
On 2024/8/22 8:57, Chen Ridong wrote:
> [snip: previous ping quoting the full patch verbatim -- see above]
Friendly ping.
* Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
2024-08-17 9:33 ` [PATCH v3 1/1] cgroup: fix " Chen Ridong
2024-08-22 0:57 ` Chen Ridong
@ 2024-09-09 14:19 ` Michal Koutný
2024-09-10 1:31 ` Chen Ridong
From: Michal Koutný @ 2024-09-09 14:19 UTC (permalink / raw)
To: Chen Ridong
Cc: martin.lau, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x,
hannes, roman.gushchin, bpf, cgroups, linux-kernel
On Sat, Aug 17, 2024 at 09:33:34AM GMT, Chen Ridong <chenridong@huawei.com> wrote:
> The root cause is that cgroup_mutex and cpu_hotplug_lock are acquired
> in different tasks, which can lead to a deadlock through the following
> steps:
> 1. A large number of cpusets are deleted asynchronously, which puts a
> large number of cgroup_bpf_release works into system_wq. The max_active
> of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are
> cgroup_bpf_release works, and many cgroup_bpf_release works will be put
> into the inactive queue. As illustrated in the diagram, there are 256 (in
> the active queue) + n (in the inactive queue) works.
> 2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put an
> smp_call_on_cpu work item into system_wq. However, step 1 has already
> filled system_wq, so 'sscs.work' is put into the inactive queue.
> 'sscs.work' has to wait until the works that were put into the inactive
> queue earlier have executed (n cgroup_bpf_release works), so it will be
> blocked for a while.
> 3. CPU offline requires cpu_hotplug_lock.write, which is blocked by step 2.
> 4. Cpusets that were deleted at step 1 put cgroup_release works into
> cgroup_destroy_wq. They are competing to get cgroup_mutex all the time.
> When cgroup_mutex is acquired by the work at css_killed_work_fn, it will
> call cpuset_css_offline, which needs to acquire cpu_hotplug_lock.read.
> However, cpuset_css_offline will be blocked due to step 3.
> 5. At this moment, all 256 works in the active queue are cgroup_bpf_release
> works attempting to acquire cgroup_mutex, so all of them are blocked.
> Consequently, sscs.work cannot be executed. Ultimately, this situation
> leaves four processes blocked, forming a deadlock.
>
> system_wq(step1)        WatchDog(step2)         cpu offline(step3)      cgroup_destroy_wq(step4)
> ...
> 2000+ cgroups deleted async
> 256 actives + n inactives
>                         __lockup_detector_reconfigure
>                         P(cpu_hotplug_lock.read)
>                         put sscs.work into system_wq
> 256 + n + 1(sscs.work)
> sscs.work waits to be executed
>                         waiting for sscs.work to finish
>                                                 percpu_down_write
>                                                 P(cpu_hotplug_lock.write)
>                                                 ...blocking...
>                                                                         css_killed_work_fn
>                                                                         P(cgroup_mutex)
>                                                                         cpuset_css_offline
>                                                                         P(cpu_hotplug_lock.read)
>                                                                         ...blocking...
> 256 cgroup_bpf_release
> mutex_lock(&cgroup_mutex);
> ..blocking...
Thanks, Ridong, for laying this out.
Let me try to extract the core of the deps above.
The correct lock ordering is: cgroup_mutex then cpu_hotplug_lock.
However, smp_call_on_cpu() called under cpus_read_lock may lead to
a deadlock (ABBA over those two locks).
This is OK:

thread T                                system_wq worker

                                        lock(cgroup_mutex)    (II)
                                        ...
                                        unlock(cgroup_mutex)
down(cpu_hotplug_lock.read)
smp_call_on_cpu
  queue_work_on(cpu, system_wq, scss)   (I)
                                        scss.func
wait_for_completion(scss)
up(cpu_hotplug_lock.read)
However, there is no ordering between (I) and (II), so they can also
happen in the opposite order:

thread T                                system_wq worker

down(cpu_hotplug_lock.read)
smp_call_on_cpu
  queue_work_on(cpu, system_wq, scss)   (I)
                                        lock(cgroup_mutex)    (II)
                                        ...
                                        unlock(cgroup_mutex)
                                        scss.func
wait_for_completion(scss)
up(cpu_hotplug_lock.read)
And here thread T + the system_wq worker effectively acquire
cpu_hotplug_lock and cgroup_mutex in the wrong order. (And since they're
two threads, it won't be caught by lockdep.)
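(For context, smp_call_on_cpu() looks roughly like the following,
paraphrased from kernel/smp.c as of this thread; the inline comments are
mine. The key points are that sscs.work is queued on system_wq and that
the caller, who holds cpu_hotplug_lock.read, blocks until it completes:

	int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
	{
		struct smp_call_on_cpu_struct sscs = {
			.done = COMPLETION_INITIALIZER_ONSTACK(sscs.done),
			.func = func,
			.data = par,
			.cpu  = phys ? cpu : -1,
		};

		INIT_WORK_ONSTACK(&sscs.work, smp_call_on_cpu_callback);

		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
			return -ENXIO;

		/* queued on system_wq: may land behind a saturated queue */
		queue_work_on(cpu, system_wq, &sscs.work);
		/* the cpus_read_lock holder blocks here until the work runs */
		wait_for_completion(&sscs.done);
		destroy_work_on_stack(&sscs.work);

		return sscs.ret;
	}
)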
By that reasoning, any holder of cgroup_mutex on system_wq makes the
system susceptible to a deadlock (in the presence of cpu_hotplug_lock
waiting writers + cpuset operations). And the two work items must meet
in the same worker's processing, hence the probability is low (zero?)
with fewer than WQ_DFL_ACTIVE items.
(And more generally, any lock that is ordered before cpu_hotplug_lock
should not be taken in system_wq work functions. Or at least such work
items should not saturate WQ_DFL_ACTIVE workers.)
Wrt other uses of cgroup_mutex, I only see
  bpf_map_free_in_work
    queue_work(system_unbound_wq)
      bpf_map_free_deferred
        ops->map_free == cgroup_storage_map_free
          cgroup_lock()
which is safe since it uses a different workqueue than system_wq.
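(For reference, that path looks roughly like this in kernel/bpf/syscall.c
-- note the deliberate choice of system_unbound_wq rather than system_wq;
the comment is paraphrased from the source:

	static void bpf_map_free_in_work(struct bpf_map *map)
	{
		INIT_WORK(&map->work, bpf_map_free_deferred);
		/* avoid spawning kworkers that might all contend on the
		 * same mutex, e.g. slab_mutex */
		queue_work(system_unbound_wq, &map->work);
	}
)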
> To fix the problem, place cgroup_bpf_release works on cgroup_destroy_wq,
> which breaks the loop.
Yes, it moves the problematic cgroup_mutex holder away from system_wq,
and cgroup_destroy_wq cannot cause similar problems because there are
no explicit waiters for particular work items or their flushing.
> System wqs are for misc things which shouldn't create a large number
> of concurrent work items. If something is going to generate more than
> WQ_DFL_ACTIVE(256) concurrent work items, it should use its own
> dedicated workqueue.
Actually, I'm not sure (because I lack workqueue knowledge) if producing
fewer than WQ_DFL_ACTIVE work items completely eliminates the chance of
two offending work items producing the wrong lock ordering.
> Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
I'm now indifferent whether this is needed (perhaps in the sense it is
the _latest_ of multiple changes that contributed to possibility of this
deadlock scenario).
> Link: https://lore.kernel.org/cgroups/e90c32d2-2a85-4f28-9154-09c7d320cb60@huawei.com/T/#t
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/bpf/cgroup.c | 2 +-
> kernel/cgroup/cgroup-internal.h | 1 +
> kernel/cgroup/cgroup.c | 2 +-
> 3 files changed, 3 insertions(+), 2 deletions(-)
I have convinced myself now that you can put
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Regards,
Michal
* Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
2024-09-09 14:19 ` Michal Koutný
@ 2024-09-10 1:31 ` Chen Ridong
2024-09-10 21:02 ` Roman Gushchin
From: Chen Ridong @ 2024-09-10 1:31 UTC (permalink / raw)
To: Michal Koutný, Chen Ridong
Cc: martin.lau, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x,
hannes, roman.gushchin, bpf, cgroups, linux-kernel
On 2024/9/9 22:19, Michal Koutný wrote:
> On Sat, Aug 17, 2024 at 09:33:34AM GMT, Chen Ridong <chenridong@huawei.com> wrote:
>> [snip: commit message and diagram quoted verbatim -- see the original patch above]
>
> Thanks, Ridong, for laying this out.
> Let me try to extract the core of the deps above.
>
> The correct lock ordering is: cgroup_mutex then cpu_hotplug_lock.
> However, smp_call_on_cpu() called under cpus_read_lock may lead to
> a deadlock (ABBA over those two locks).
>
That's right.
> This is OK:
>
> thread T                                system_wq worker
>
>                                         lock(cgroup_mutex)    (II)
>                                         ...
>                                         unlock(cgroup_mutex)
> down(cpu_hotplug_lock.read)
> smp_call_on_cpu
>   queue_work_on(cpu, system_wq, scss)   (I)
>                                         scss.func
> wait_for_completion(scss)
> up(cpu_hotplug_lock.read)
>
> However, there is no ordering between (I) and (II) so they can also happen
> in opposite
>
> thread T system_wq worker
>
> down(cpu_hotplug_lock.read)
> smp_call_on_cpu
> queue_work_on(cpu, system_wq, scss) (I)
> lock(cgroup_mutex) (II)
> ...
> unlock(cgroup_mutex)
> scss.func
> wait_for_completion(scss)
> up(cpu_hotplug_lock.read)
>
> And here thread T + the system_wq worker effectively acquire
> cpu_hotplug_lock and cgroup_mutex in the wrong order. (And since they're
> two threads, it won't be caught by lockdep.)
>
> By that reasoning, any holder of cgroup_mutex on system_wq makes the
> system susceptible to a deadlock (in the presence of cpu_hotplug_lock
> waiting writers + cpuset operations). And the two work items must meet
> in the same worker's processing, hence the probability is low (zero?)
> with fewer than WQ_DFL_ACTIVE items.
>
> (And more generally, any lock that is ordered before cpu_hotplug_lock
> should not be taken in system_wq work functions. Or at least such work
> items should not saturate WQ_DFL_ACTIVE workers.)
>
> Wrt other uses of cgroup_mutex, I only see
>   bpf_map_free_in_work
>     queue_work(system_unbound_wq)
>       bpf_map_free_deferred
>         ops->map_free == cgroup_storage_map_free
>           cgroup_lock()
> which is safe since it uses a different workqueue than system_wq.
>
>> To fix the problem, place cgroup_bpf_release works on cgroup_destroy_wq,
>> which breaks the loop.
>
> Yes, it moves the problematic cgroup_mutex holder away from system_wq,
> and cgroup_destroy_wq cannot cause similar problems because there are
> no explicit waiters for particular work items or their flushing.
>
>
>> System wqs are for misc things which shouldn't create a large number
>> of concurrent work items. If something is going to generate more than
>> WQ_DFL_ACTIVE(256) concurrent work items, it should use its own
>> dedicated workqueue.
>
> Actually, I'm not sure (because I lack workqueue knowledge) if producing
> fewer than WQ_DFL_ACTIVE work items completely eliminates the chance of
> two offending work items producing the wrong lock ordering.
>
If fewer than WQ_DFL_ACTIVE work items are produced, it won't lead to a
deadlock, because scss.func can be executed without having to wait for
the works that hold cgroup_mutex to complete. The probability is
therefore low, and this issue can only be reproduced under a pressure
test.
>
>> Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
>
> I'm now indifferent whether this is needed (perhaps in the sense it is
> the _latest_ of multiple changes that contributed to possibility of this
> deadlock scenario).
>
>
>> Link: https://lore.kernel.org/cgroups/e90c32d2-2a85-4f28-9154-09c7d320cb60@huawei.com/T/#t
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> kernel/bpf/cgroup.c | 2 +-
>> kernel/cgroup/cgroup-internal.h | 1 +
>> kernel/cgroup/cgroup.c | 2 +-
>> 3 files changed, 3 insertions(+), 2 deletions(-)
>
> I have convinved myself now that you can put
>
> Reviewed-by: Michal Koutný <mkoutny@suse.com>
>
> Regards,
> Michal
Thank you very much.
Best Regards,
Ridong
* Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
2024-09-10 1:31 ` Chen Ridong
@ 2024-09-10 21:02 ` Roman Gushchin
2024-09-10 21:17 ` Tejun Heo
From: Roman Gushchin @ 2024-09-10 21:02 UTC (permalink / raw)
To: Chen Ridong
Cc: Michal Koutný, Chen Ridong, martin.lau, ast, daniel, andrii,
eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, jolsa, tj, lizefan.x, hannes, bpf, cgroups, linux-kernel
On Tue, Sep 10, 2024 at 09:31:41AM +0800, Chen Ridong wrote:
>
>
> On 2024/9/9 22:19, Michal Koutný wrote:
> > [snip: commit message and lock-ordering diagrams quoted verbatim -- see earlier messages]
> > By that reasoning, any holder of cgroup_mutex on system_wq makes the
> > system susceptible to a deadlock (in the presence of cpu_hotplug_lock
> > waiting writers + cpuset operations). And the two work items must meet
> > in the same worker's processing, hence the probability is low (zero?)
> > with fewer than WQ_DFL_ACTIVE items.
Right, I'm on the same page. Should we then document somewhere that
the cgroup mutex can't be locked from a system wq context?
I think this will also make the Fixes tag more meaningful.
Thank you!
* Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
2024-09-10 21:02 ` Roman Gushchin
@ 2024-09-10 21:17 ` Tejun Heo
2024-09-12 1:33 ` Chen Ridong
From: Tejun Heo @ 2024-09-10 21:17 UTC (permalink / raw)
To: Roman Gushchin
Cc: Chen Ridong, Michal Koutný, Chen Ridong, martin.lau, ast,
daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, lizefan.x, hannes, bpf, cgroups,
linux-kernel
On Tue, Sep 10, 2024 at 09:02:59PM +0000, Roman Gushchin wrote:
...
> > > By that reasoning, any holder of cgroup_mutex on system_wq makes the
> > > system susceptible to a deadlock (in the presence of cpu_hotplug_lock
> > > waiting writers + cpuset operations). And the two work items must meet
> > > in the same worker's processing, hence the probability is low (zero?)
> > > with fewer than WQ_DFL_ACTIVE items.
>
> > Right, I'm on the same page. Should we then document somewhere that
> > the cgroup mutex can't be locked from a system wq context?
> >
> > I think this will also make the Fixes tag more meaningful.
I think that's completely fine. What's not fine is saturating system_wq.
Anything which creates a large number of concurrent work items should be
using its own workqueue. If anything, workqueue needs to add a warning for
saturation conditions that identifies the offenders.
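Such a warning could look something like the sketch below. This is
hypothetical code, not anything in the tree; it assumes access to
workqueue internals like pwq->nr_active and pwq->max_active, which may
differ by version:

	/* hypothetical: warn once when a pwq saturates, naming the offender */
	static void pwq_warn_if_saturated(struct pool_workqueue *pwq,
					  struct work_struct *work)
	{
		if (pwq->nr_active >= pwq->max_active)
			pr_warn_once("workqueue %s: max_active=%d saturated, last queued %ps\n",
				     pwq->wq->name, pwq->max_active, work->func);
	}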
Thanks.
--
tejun
* Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
2024-09-10 21:17 ` Tejun Heo
@ 2024-09-12 1:33 ` Chen Ridong
2024-09-12 5:59 ` Tejun Heo
From: Chen Ridong @ 2024-09-12 1:33 UTC (permalink / raw)
To: Tejun Heo, Roman Gushchin
Cc: Michal Koutný, Chen Ridong, martin.lau, ast, daniel, andrii,
eddyz87, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, jolsa, lizefan.x, hannes, bpf, cgroups, linux-kernel
On 2024/9/11 5:17, Tejun Heo wrote:
> On Tue, Sep 10, 2024 at 09:02:59PM +0000, Roman Gushchin wrote:
> ...
>>>> By that reasoning, any holder of cgroup_mutex on system_wq makes the
>>>> system susceptible to a deadlock (in the presence of cpu_hotplug_lock
>>>> waiting writers + cpuset operations). And the two work items must meet
>>>> in the same worker's processing, hence the probability is low (zero?)
>>>> with fewer than WQ_DFL_ACTIVE items.
>>
>> Right, I'm on the same page. Should we then document somewhere that
>> the cgroup mutex can't be locked from a system wq context?
>>
>> I think this will also make the Fixes tag more meaningful.
>
> I think that's completely fine. What's not fine is saturating system_wq.
> Anything which creates a large number of concurrent work items should be
> using its own workqueue. If anything, workqueue needs to add a warning for
> saturation conditions that identifies the offenders.
>
> Thanks.
>
I will add a patch to document that.
Should we also raise WQ_DFL_ACTIVE (currently 256)? Maybe 1024 is
acceptable?
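For reference, the relevant defaults currently live in
include/linux/workqueue.h; roughly (an excerpt, values as of this
thread, comments editorial):

	enum {
		WQ_MAX_ACTIVE	= 512,			/* hard cap per workqueue */
		WQ_DFL_ACTIVE	= WQ_MAX_ACTIVE / 2,	/* = 256 */
	};

so bumping the default to 1024 would also mean raising WQ_MAX_ACTIVE.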
Best regards,
Ridong
* Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
2024-09-12 1:33 ` Chen Ridong
@ 2024-09-12 5:59 ` Tejun Heo
From: Tejun Heo @ 2024-09-12 5:59 UTC (permalink / raw)
To: Chen Ridong
Cc: Roman Gushchin, Michal Koutný, Chen Ridong, martin.lau, ast,
daniel, andrii, eddyz87, song, yonghong.song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, lizefan.x, hannes, bpf, cgroups,
linux-kernel
On Thu, Sep 12, 2024 at 09:33:23AM +0800, Chen Ridong wrote:
> I will add a patch to document that.
> Should we also raise WQ_DFL_ACTIVE (currently 256)? Maybe 1024 is
> acceptable?
Yeah, let's bump it.
Thanks.
--
tejun