* [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Aaron Tomlin @ 2023-07-29 13:53 UTC
To: linux-kernel; +Cc: tj, jiangshanlai, peterz
The Linux kernel does not provide a way to differentiate between a
kworker and a rescue kworker for user-mode.
From user-mode, one can establish if a task is a kworker by testing for
PF_WQ_WORKER in a specified task's flags bit mask (or bitmap) via
/proc/[PID]/stat. Alternatively, one can examine /proc/[PID]/stack and
search for the function "rescuer_thread", but that file is only readable
by the root user.
It can be useful to identify a rescue kworker since their CPU affinity
cannot be modified and their initial CPU assignment can be safely ignored.
Furthermore, for a workqueue that was created with WQ_MEM_RECLAIM and
WQ_SYSFS, the cpumask file is not applicable to the rescue kworker.
By design a rescue kworker should run anywhere.
This patch series introduces PF_WQ_RESCUE_WORKER and ensures it is set and
cleared appropriately and simplifies current_is_workqueue_rescuer().
Aaron Tomlin (2):
  workqueue: Introduce PF_WQ_RESCUE_WORKER
  workqueue: Simplify current_is_workqueue_rescuer()

 include/linux/sched.h |  2 +-
 kernel/workqueue.c    | 25 +++++++++++++++----------
 2 files changed, 16 insertions(+), 11 deletions(-)
--
2.39.1
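
The user-mode check described in the cover letter (testing for PF_WQ_WORKER
in the flags field of /proc/[PID]/stat) could be sketched roughly as below.
This is only an illustration under stated assumptions: the helper names
stat_line_flags() and stat_line_is_kworker() are made up for this sketch, the
caller is assumed to have already read one stat line into a string, and the
PF_WQ_WORKER value is copied from include/linux/sched.h as it stood around
the time of this thread. PF_* values are not ABI and should be re-checked
against the running kernel's source.

```c
#include <stdio.h>
#include <string.h>

/*
 * Assumption: value of PF_WQ_WORKER in include/linux/sched.h around the
 * time of this thread. PF_* flags are not ABI and may change.
 */
#define PF_WQ_WORKER 0x00000020UL

/*
 * Hypothetical helper: extract the "flags" field (field 9) from one
 * /proc/[PID]/stat line. The comm field (field 2) is parenthesised and may
 * itself contain spaces or ')', so parsing resumes after the last ')'.
 * Returns 0 on success, -1 on a malformed line.
 */
static int stat_line_flags(const char *line, unsigned long *flags)
{
	const char *p = strrchr(line, ')');
	char state;

	if (!p)
		return -1;
	/* After comm: state ppid pgrp session tty_nr tpgid flags ... */
	if (sscanf(p + 1, " %c %*d %*d %*d %*d %*d %lu", &state, flags) != 2)
		return -1;
	return 0;
}

/* Returns 1 if the line describes a kworker, 0 if not, -1 on parse error. */
static int stat_line_is_kworker(const char *line)
{
	unsigned long flags;

	if (stat_line_flags(line, &flags) < 0)
		return -1;
	return (flags & PF_WQ_WORKER) ? 1 : 0;
}
```

A caller would typically read /proc/[PID]/stat with fgets() and pass the
line to stat_line_is_kworker(). Note that this check alone cannot tell a
rescuer from an ordinary kworker, which is the gap the series is about.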
* [RFC PATCH 1/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Aaron Tomlin @ 2023-07-29 13:53 UTC
To: linux-kernel; +Cc: tj, jiangshanlai, peterz

The Linux kernel does not provide a way to differentiate between a
kworker and a rescue kworker for user-mode.
From user-mode, one can establish if a task is a kworker by testing for
PF_WQ_WORKER in a specified task's flags bit mask (or bitmap) via
/proc/[PID]/stat. Indeed, one can examine /proc/[PID]/stack and search
for the function namely "rescuer_thread". This is only available to the
root user.

It can be useful to identify a rescue kworker since their CPU affinity
cannot be modified and their initial CPU assignment can be safely ignored.
Furthermore, a workqueue that was created with WQ_MEM_RECLAIM and
WQ_SYSFS the cpumask file is not applicable to the rescue kworker.
By design a rescue kworker should run anywhere.

This patch introduces PF_WQ_RESCUE_WORKER and ensures it is set and
cleared appropriately.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 include/linux/sched.h |  2 +-
 kernel/workqueue.c    | 19 ++++++++++++-------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 609bde814cb0..039fcf8d9ed6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1734,7 +1734,7 @@ extern struct pid *cad_pid;
 #define PF_USED_MATH		0x00002000	/* If unset the fpu must be initialized before use */
 #define PF_USER_WORKER		0x00004000	/* Kernel thread cloned from userspace thread */
 #define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
-#define PF__HOLE__00010000	0x00010000
+#define PF_WQ_RESCUE_WORKER	0x00010000	/* I am a rescue workqueue worker */
 #define PF_KSWAPD		0x00020000	/* I am kswapd */
 #define PF_MEMALLOC_NOFS	0x00040000	/* All allocation requests will inherit GFP_NOFS */
 #define PF_MEMALLOC_NOIO	0x00080000	/* All allocation requests will inherit GFP_NOIO */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 02a8f402eeb5..6d38d714b72b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2665,13 +2665,18 @@ static void process_scheduled_works(struct worker *worker)
 	}
 }
 
-static void set_pf_worker(bool val)
+static void set_pf_worker_and_rescuer(bool worker, bool rescue)
 {
 	mutex_lock(&wq_pool_attach_mutex);
-	if (val)
+	if (worker) {
 		current->flags |= PF_WQ_WORKER;
-	else
+		if (rescue)
+			current->flags |= PF_WQ_RESCUE_WORKER;
+	} else {
 		current->flags &= ~PF_WQ_WORKER;
+		if (rescue)
+			current->flags &= ~PF_WQ_RESCUE_WORKER;
+	}
 	mutex_unlock(&wq_pool_attach_mutex);
 }
 
@@ -2693,14 +2698,14 @@ static int worker_thread(void *__worker)
 	struct worker_pool *pool = worker->pool;
 
 	/* tell the scheduler that this is a workqueue worker */
-	set_pf_worker(true);
+	set_pf_worker_and_rescuer(true, false);
 woke_up:
 	raw_spin_lock_irq(&pool->lock);
 
 	/* am I supposed to die? */
 	if (unlikely(worker->flags & WORKER_DIE)) {
 		raw_spin_unlock_irq(&pool->lock);
-		set_pf_worker(false);
+		set_pf_worker_and_rescuer(false, false);
 
 		set_task_comm(worker->task, "kworker/dying");
 		ida_free(&pool->worker_ida, worker->id);
@@ -2804,7 +2809,7 @@ static int rescuer_thread(void *__rescuer)
 	 * Mark rescuer as worker too. As WORKER_PREP is never cleared, it
 	 * doesn't participate in concurrency management.
 	 */
-	set_pf_worker(true);
+	set_pf_worker_and_rescuer(true, true);
 repeat:
 	set_current_state(TASK_IDLE);
 
@@ -2903,7 +2908,7 @@ static int rescuer_thread(void *__rescuer)
 
 	if (should_stop) {
 		__set_current_state(TASK_RUNNING);
-		set_pf_worker(false);
+		set_pf_worker_and_rescuer(false, true);
 		return 0;
 	}
 
--
2.39.1
* Re: [RFC PATCH 1/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Peter Zijlstra @ 2023-07-29 16:07 UTC
To: Aaron Tomlin; +Cc: linux-kernel, tj, jiangshanlai

On Sat, Jul 29, 2023 at 02:53:33PM +0100, Aaron Tomlin wrote:
> The Linux kernel does not provide a way to differentiate between a
> kworker and a rescue kworker for user-mode.
> From user-mode, one can establish if a task is a kworker by testing for
> PF_WQ_WORKER in a specified task's flags bit mask (or bitmap) via
> /proc/[PID]/stat. Indeed, one can examine /proc/[PID]/stack and search
> for the function namely "rescuer_thread". This is only available to the
> root user.
>
> It can be useful to identify a rescue kworker since their CPU affinity
> cannot be modified and their initial CPU assignment can be safely ignored.
> Furthermore, a workqueue that was created with WQ_MEM_RECLAIM and
> WQ_SYSFS the cpumask file is not applicable to the rescue kworker.
> By design a rescue kworker should run anywhere.
>
> This patch introduces PF_WQ_RESCUE_WORKER and ensures it is set and
> cleared appropriately.

Is the implication that PF_flags are considered ABI? We've been changing
them quite a bit over the years.

Also, while we have a few spare bits atm, we used to be nearly out for a
while, and I just don't think this is sane usage of them. We don't use PF
flags just for userspace.
* Re: [RFC PATCH 1/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Aaron Tomlin @ 2023-08-01 10:04 UTC
To: peterz, tj; +Cc: atomlin, jiangshanlai, linux-kernel

> Is the implication that PF_flags are considered ABI? We've been changing
> them quite a bit over the years.

Hi Peter, Tejun,

I never assumed they were. In this context, one should always check the
Linux kernel source code first i.e. do not assume what is exported via
/proc/[PID]/stat will be stable/or consistent between releases.

> Also, while we have a few spare bits atm, we used to be nearly out for a
> while, and I just don't think this is sane usage of them. We don't use PF
> flags just for userspace.

Fair statement. Albeit, I suspect it would still be useful for user-mode to
easily differentiate between a kworker and a rescuer kworker.

According to create_worker(), we do make it clear the difference between a
CPU-specific and unbound kworker by way of the task's name. Looking at
init_rescuer() a rescuer kworker is simply given the name of its workqueue.
Would you consider modifying the rescuer's task's name so it is prefixed
with "kworker/r-%s" and then include the workqueue's name e.g.
"kworker/r-ext4-rsv-conver" acceptable?

Kind regards,
--
Aaron Tomlin
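
If rescuers were named as proposed in the message above ("kworker/r-"
followed by the workqueue's name), a user-mode tool could recognise them
with a plain prefix test. The sketch below assumes exactly that proposed
scheme, which this message only suggests and does not guarantee; the macro
RESCUER_COMM_PREFIX and the helper rescuer_wq_name() are hypothetical names
for illustration. How the task name is obtained, and any truncation applied
to it, is left to the caller.

```c
#include <string.h>

/*
 * Assumption: the prefix proposed in this thread. It is a naming
 * convention under discussion, not a stable interface.
 */
#define RESCUER_COMM_PREFIX "kworker/r-"

/*
 * Hypothetical helper: if comm looks like a rescuer name under the
 * proposed scheme, return a pointer to the workqueue-name part of the
 * string; otherwise return NULL.
 */
static const char *rescuer_wq_name(const char *comm)
{
	size_t n = strlen(RESCUER_COMM_PREFIX);

	if (strncmp(comm, RESCUER_COMM_PREFIX, n) != 0)
		return NULL;
	return comm + n;
}
```

With the names quoted in this thread, "kworker/r-ext4-rsv-conver" would
yield "ext4-rsv-conver", while "kworker/u16:9-kcryptd/253:0" would yield
NULL because it carries the unbound-worker prefix instead.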
* [RFC PATCH 2/2] workqueue: Simplify current_is_workqueue_rescuer()
From: Aaron Tomlin @ 2023-07-29 13:53 UTC
To: linux-kernel; +Cc: tj, jiangshanlai, peterz

No functional change.

This patch simplifies current_is_workqueue_rescuer() due to the addition
of PF_WQ_RESCUE_WORKER.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 kernel/workqueue.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6d38d714b72b..3b7a4d60cb6a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4890,9 +4890,9 @@ EXPORT_SYMBOL(current_work);
  */
 bool current_is_workqueue_rescuer(void)
 {
-	struct worker *worker = current_wq_worker();
-
-	return worker && worker->rescue_wq;
+	if (in_task() && (current->flags & PF_WQ_RESCUE_WORKER))
+		return true;
+	return false;
 }
 
 /**
--
2.39.1
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Tejun Heo @ 2023-07-31 23:35 UTC
To: Aaron Tomlin; +Cc: linux-kernel, jiangshanlai, peterz

Hello,

On Sat, Jul 29, 2023 at 02:53:32PM +0100, Aaron Tomlin wrote:
> It can be useful to identify a rescue kworker since their CPU affinity
> cannot be modified and their initial CPU assignment can be safely ignored.

You really shouldn't be setting affinities on kworkers manually. There's
no way of knowing which kworker is going to execute which workqueue.
Please use the attributes API and sysfs interface to modify per-workqueue
worker attributes. If that's not sufficient and you need finer grained
control, the right thing to do is using kthread_worker which gives you a
dedicated kthread that you can manipulate as appropriate.

Thanks.

--
tejun
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Aaron Tomlin @ 2023-08-01 10:53 UTC
To: tj; +Cc: peterz, atomlin, jiangshanlai, linux-kernel

> You really shouldn't be setting affinities on kworkers manually. There's
> no way of knowing which kworker is going to execute which workqueue.
> Please use the attributes API and sysfs interface to modify per-workqueue
> worker attributes. If that's not sufficient and you need finer grained
> control, the right thing to do is using kthread_worker which gives you a
> dedicated kthread that you can manipulate as appropriate.

Hi Tejun,

I completely agree. Each kworker has PF_NO_SETAFFINITY applied anyway.
If I understand correctly, only an unbound kworker can have their CPU
affinity modified via sysfs. The objective of this series was to easily
identify a rescuer kworker from user-mode.

Kind regards,
--
Aaron Tomlin
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Tejun Heo @ 2023-08-02 18:10 UTC
To: Aaron Tomlin; +Cc: peterz, jiangshanlai, linux-kernel

Hello,

On Tue, Aug 01, 2023 at 11:53:01AM +0100, Aaron Tomlin wrote:
> > You really shouldn't be setting affinities on kworkers manually. There's
> > no way of knowing which kworker is going to execute which workqueue.
> > Please use the attributes API and sysfs interface to modify per-workqueue
> > worker attributes. If that's not sufficient and you need finer grained
> > control, the right thing to do is using kthread_worker which gives you a
> > dedicated kthread that you can manipulate as appropriate.
>
> I completely agree. Each kworker has PF_NO_SETAFFINITY applied anyway.
> If I understand correctly, only an unbound kworker can have their CPU
> affinity modified via sysfs. The objective of this series was to easily
> identify a rescuer kworker from user-mode.

But why do you need to identify rescue workers? What are you trying to
achieve?

Thanks.

--
tejun
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Aaron Tomlin @ 2023-08-03 20:19 UTC
To: tj; +Cc: atomlin, jiangshanlai, linux-kernel, peterz

> But why do you need to identify rescue workers? What are you trying to
> achieve?

Hi Tejun,

I had a conversation with a colleague of mine. It can be useful to identify
and account for all kernel threads. From the perspective of user-mode, the
name given currently to the rescuer kworker is ambiguous. For instance,
"kworker/u16:9-kcryptd/253:0" is clearly identifiable as an unbound kworker
for the specified workqueue which can have their CPU affinity adjusted as
you mentioned before. I think if we followed the same naming convention
for a rescuer kworker then it would be more consistent. I'll send a patch
so it can be discussed further.

Kind regards,
--
Aaron Tomlin
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Tejun Heo @ 2023-08-03 20:34 UTC
To: Aaron Tomlin; +Cc: jiangshanlai, linux-kernel, peterz

On Thu, Aug 03, 2023 at 09:19:14PM +0100, Aaron Tomlin wrote:
> > But why do you need to identify rescue workers? What are you trying to
> > achieve?
>
> Hi Tejun,
>
> I had a conversation with a colleague of mine. It can be useful to identify
> and account for all kernel threads. From the perspective of user-mode, the
> name given currently to the rescuer kworker is ambiguous. For instance,
> "kworker/u16:9-kcryptd/253:0" is clearly identifiable as an unbound kworker
> for the specified workqueue which can have their CPU affinity adjusted as

Note that the name changes to the work item the worker is currently
executing. It won't stay that way. Workers are shared across the
workqueues, so I'm not sure "identify and account all kernel threads" is
working as well as you think it is.

> you mentioned before. I think if we followed the same naming convention
> for a rescuer kworker then it would be more consistent. I'll send a patch
> so it can be discussed further.

We can certainly rename them to indicate that they are rescuers - e.g.
maybe krescuer? But, at the moment, the proposed reason seems rather
dubious.

Thanks.

--
tejun
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Aaron Tomlin @ 2023-08-05 23:45 UTC
To: tj; +Cc: atomlin, jiangshanlai, linux-kernel, peterz

> Note that the name changes to the work item the worker is currently
> executing. It won't stay that way. Workers are shared across the
> workqueues, so I'm not sure "identify and account all kernel threads" is
> working as well as you think it is.

Hi Tejun,

Indeed. The point is that these kworker kthreads are easily identifiable.

> We can certainly rename them to indicate that they are rescuers - e.g.
> maybe krescuer? But, at the moment, the proposed reason seems rather
> dubious.

Personally, I would prefer "kworker/r-%s" and then include the specified
workqueue's name e.g. "kworker/r-ext4-rsv-conver". So the rescuer task's
name is more consistent with the current naming scheme.
I will send a follow up patch.

Kind regards,
--
Aaron Tomlin
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Juri Lelli @ 2023-12-11 14:51 UTC
To: Aaron Tomlin; +Cc: linux-kernel, tj, jiangshanlai, peterz

Hi,

Just stumbled upon this series while looking into rescuers myself. :)

On 29/07/23 14:53, Aaron Tomlin wrote:
> The Linux kernel does not provide a way to differentiate between a
> kworker and a rescue kworker for user-mode.
> From user-mode, one can establish if a task is a kworker by testing for
> PF_WQ_WORKER in a specified task's flags bit mask (or bitmap) via
> /proc/[PID]/stat. Indeed, one can examine /proc/[PID]/stack and search
> for the function namely "rescuer_thread". This is only available to the
> root user.
>
> It can be useful to identify a rescue kworker since their CPU affinity
> cannot be modified and their initial CPU assignment can be safely ignored.
> Furthermore, a workqueue that was created with WQ_MEM_RECLAIM and
> WQ_SYSFS the cpumask file is not applicable to the rescue kworker.
> By design a rescue kworker should run anywhere.

Guess this is a requirement because, if workqueue processing is stuck
for some reason, getting rescuers to run on the same set of cpus
workqueues have been restricted to already doesn't really have good
chances of making any progress?

Wonder if we still might need some sort of fail hard/warn mode in case
strict isolation is in place? Or maybe we have that already?

Thanks!
Juri
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Tejun Heo @ 2023-12-11 18:39 UTC
To: Juri Lelli; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

Hello,

On Mon, Dec 11, 2023 at 03:51:57PM +0100, Juri Lelli wrote:
> Guess this is a requirement because, if workqueue processing is stuck
> for some reason, getting rescuers to run on the same set of cpus
> workqueues have been restricted to already doesn't really have good
> chances of making any progress?

The only problem rescuers try to solve is deadlocks caused by lack of
memory, so on the cpu side, it just follows whatever worker pool it's trying
to help.

> Wonder if we still might need some sort of fail hard/warn mode in case
> strict isolation is in place? Or maybe we have that already?

For both percpu and unbound workqueues, the rescuers just follow whatever
pool it's trying to help at the moment, so it shouldn't cause any surprises
in terms of isolation. It just temporarily joins the already active but
stuck pool.

Thanks.

--
tejun
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Juri Lelli @ 2023-12-12 9:56 UTC
To: Tejun Heo; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

Hello,

Thanks for the quick reply!

On 11/12/23 08:39, Tejun Heo wrote:
> Hello,
>
> On Mon, Dec 11, 2023 at 03:51:57PM +0100, Juri Lelli wrote:
> > Guess this is a requirement because, if workqueue processing is stuck
> > for some reason, getting rescuers to run on the same set of cpus
> > workqueues have been restricted to already doesn't really have good
> > chances of making any progress?
>
> The only problem rescuers try to solve is deadlocks caused by lack of
> memory, so on the cpu side, it just follows whatever worker pool it's trying
> to help.
>
> > Wonder if we still might need some sort of fail hard/warn mode in case
> > strict isolation is in place? Or maybe we have that already?
>
> For both percpu and unbound workqueues, the rescuers just follow whatever
> pool it's trying to help at the moment, so it shouldn't cause any surprises
> in terms of isolation. It just temporarily joins the already active but
> stuck pool.

Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
how are we making sure that the wake up is always happening on
housekeeping CPUs (assuming unbound workqueues have been restricted to
those)?

AFAICS, we have

send_mayday ->
  wake_up_process(wq->rescuer->task)

which is not affined to the workqueue cpumask it's called to rescue, so
in theory can be woken up anywhere?

Thanks,
Juri
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Tejun Heo @ 2023-12-12 17:14 UTC
To: Juri Lelli; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

Hello, Juri.

On Tue, Dec 12, 2023 at 10:56:02AM +0100, Juri Lelli wrote:
> Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
> how are we making sure that the wake up is always happening on
> housekeeping CPUs (assuming unbound workqueues have been restricted to
> those)?
>
> AFAICS, we have
>
> send_mayday ->
>   wake_up_process(wq->rescuer->task)
>
> which is not affined to the workqueue cpumask it's called to rescue, so
> in theory can be woken up anywhere?

Ah, was only thinking about work item execution. Yeah, it's not following
the isolation rule there and we probably should affine it as we're waking it
up.

Thanks.

--
tejun
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Aaron Tomlin @ 2023-12-12 19:06 UTC
To: Tejun Heo; +Cc: Juri Lelli, linux-kernel, jiangshanlai, peterz

On Tue, Dec 12, 2023 at 07:14:48AM -1000, Tejun Heo wrote:
> Hello, Juri.
>
> On Tue, Dec 12, 2023 at 10:56:02AM +0100, Juri Lelli wrote:
> > Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
> > how are we making sure that the wake up is always happening on
> > housekeeping CPUs (assuming unbound workqueues have been restricted to
> > those)?
> >
> > AFAICS, we have
> >
> > send_mayday ->
> >   wake_up_process(wq->rescuer->task)
> >
> > which is not affined to the workqueue cpumask it's called to rescue, so
> > in theory can be woken up anywhere?
>
> Ah, was only thinking about work item execution. Yeah, it's not following
> the isolation rule there and we probably should affine it as we're waking it
> up.

Hi Tejun,

I am confused.
I thought by design we want a rescuer kthread to execute on any CPU, no?

Kind regards,
--
Aaron Tomlin
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Tejun Heo @ 2023-12-12 20:16 UTC
To: Aaron Tomlin; +Cc: Juri Lelli, linux-kernel, jiangshanlai, peterz

On Tue, Dec 12, 2023 at 07:06:48PM +0000, Aaron Tomlin wrote:
> I thought by design we want a rescuer kthread to execute on any CPU, no?

Well, it needs to be able to move around because it dynamically attaches to
the worker pool it's rescuing and needs to take on its cpumask, but it
doesn't have to be able to run on all cpus all the time.

Thanks.

--
tejun
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Juri Lelli @ 2023-12-13 8:59 UTC
To: Tejun Heo; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

On 12/12/23 07:14, Tejun Heo wrote:
> Hello, Juri.
>
> On Tue, Dec 12, 2023 at 10:56:02AM +0100, Juri Lelli wrote:
> > Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
> > how are we making sure that the wake up is always happening on
> > housekeeping CPUs (assuming unbound workqueues have been restricted to
> > those)?
> >
> > AFAICS, we have
> >
> > send_mayday ->
> >   wake_up_process(wq->rescuer->task)
> >
> > which is not affined to the workqueue cpumask it's called to rescue, so
> > in theory can be woken up anywhere?
>
> Ah, was only thinking about work item execution. Yeah, it's not following
> the isolation rule there and we probably should affine it as we're waking it
> up.

Something like the following then maybe?

---
 kernel/workqueue.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2989b57e154a7..ed73f7f80d57d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4405,6 +4405,12 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
 	link_pwq(ctx->dfl_pwq);
 	swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);
 
+	/* rescuer needs to respect wq cpumask changes */
+	if (ctx->wq->rescuer) {
+		kthread_bind_mask(ctx->wq->rescuer->task, ctx->attrs->cpumask);
+		wake_up_process(ctx->wq->rescuer->task);
+	}
+
 	mutex_unlock(&ctx->wq->mutex);
 }
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Tejun Heo @ 2023-12-13 15:35 UTC
To: Juri Lelli; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

Hello,

On Wed, Dec 13, 2023 at 09:59:42AM +0100, Juri Lelli wrote:
> Something like the following then maybe?
>
> ---
>  kernel/workqueue.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 2989b57e154a7..ed73f7f80d57d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4405,6 +4405,12 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
>  	link_pwq(ctx->dfl_pwq);
>  	swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);
>
> +	/* rescuer needs to respect wq cpumask changes */
> +	if (ctx->wq->rescuer) {
> +		kthread_bind_mask(ctx->wq->rescuer->task, ctx->attrs->cpumask);
> +		wake_up_process(ctx->wq->rescuer->task);
> +	}
> +
>  	mutex_unlock(&ctx->wq->mutex);
>  }

I'm not sure kthread_bind_mask() would be safe here. The rescuer might be
running a work item. wait_task_inactive() might fail and we don't want to
change cpumask while the rescuer is active anyway.

Maybe the easiest way to do this is making rescuer_thread() restore the wq's
cpumask right before going to sleep, and making apply_wqattrs_commit() just
wake up the rescuer.

Thanks.

--
tejun
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Juri Lelli @ 2023-12-13 18:32 UTC
To: Tejun Heo; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

On 13/12/23 05:35, Tejun Heo wrote:
> Hello,
>
> On Wed, Dec 13, 2023 at 09:59:42AM +0100, Juri Lelli wrote:
> > Something like the following then maybe?
> >
> > ---
> >  kernel/workqueue.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 2989b57e154a7..ed73f7f80d57d 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -4405,6 +4405,12 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
> >  	link_pwq(ctx->dfl_pwq);
> >  	swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);
> >
> > +	/* rescuer needs to respect wq cpumask changes */
> > +	if (ctx->wq->rescuer) {
> > +		kthread_bind_mask(ctx->wq->rescuer->task, ctx->attrs->cpumask);
> > +		wake_up_process(ctx->wq->rescuer->task);
> > +	}
> > +
> >  	mutex_unlock(&ctx->wq->mutex);
> >  }
>
> I'm not sure kthread_bind_mask() would be safe here. The rescuer might be
> running a work item. wait_task_inactive() might fail and we don't want to
> change cpumask while the rescuer is active anyway.
>
> Maybe the easiest way to do this is making rescuer_thread() restore the wq's
> cpumask right before going to sleep, and making apply_wqattrs_commit() just
> wake up the rescuer.

Hummm, don't think we can call that either while the rescuer is actually
running. Maybe we can simply s/kthread_bind_mask/set_cpus_allowed_ptr/
in the above?

Thanks,
Juri
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Tejun Heo @ 2023-12-13 18:38 UTC
To: Juri Lelli; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

On Wed, Dec 13, 2023 at 07:32:10PM +0100, Juri Lelli wrote:
> > Maybe the easiest way to do this is making rescuer_thread() restore the wq's
> > cpumask right before going to sleep, and making apply_wqattrs_commit() just
> > wake up the rescuer.
>
> Hummm, don't think we can call that either while the rescuer is actually
> running. Maybe we can simply s/kthread_bind_mask/set_cpus_allowed_ptr/
> in the above?

So, we have to use set_cpus_allowed_ptr() but we still don't want to change
the affinity of a rescuer which is already running a task for a pool.

Thanks.

--
tejun
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
From: Juri Lelli @ 2023-12-14 11:25 UTC
To: Tejun Heo; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

On 13/12/23 08:38, Tejun Heo wrote:
> On Wed, Dec 13, 2023 at 07:32:10PM +0100, Juri Lelli wrote:
> > > Maybe the easiest way to do this is making rescuer_thread() restore the wq's
> > > cpumask right before going to sleep, and making apply_wqattrs_commit() just
> > > wake up the rescuer.
> >
> > Hummm, don't think we can call that either while the rescuer is actually
> > running. Maybe we can simply s/kthread_bind_mask/set_cpus_allowed_ptr/
> > in the above?
>
> So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> the affinity of a rescuer which is already running a task for a pool.

But then, even today, a rescuer might keep handling work on a cpu
outside its wq cpumask if the associated wq cpumask change can proceed
w/o waiting for it to finish the iteration?

BTW, apologies for all the questions, but I'd like to make sure I can
get the implications hopefully right. :)

Thanks,
Juri
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
2023-12-14 11:25 ` Juri Lelli
@ 2023-12-14 19:47 ` Tejun Heo
2023-12-15 6:50 ` Juri Lelli
0 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2023-12-14 19:47 UTC (permalink / raw)
To: Juri Lelli; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

Hello,

On Thu, Dec 14, 2023 at 12:25:25PM +0100, Juri Lelli wrote:
> > So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> > the affinity of a rescuer which is already running a task for a pool.
>
> But then, even today, a rescuer might keep handling work on a cpu
> outside its wq cpumask if the associated wq cpumask change can proceed
> w/o waiting for it to finish the iteration?

Yeah, that can happen and pool cpumasks naturally being subsets of the wq's
cpumask that they're serving, your original approach likely isn't broken
either.

> BTW, apologies for all the questions, but I'd like to make sure I can
> get the implications hopefully right. :)

I obviously haven't thought through it very well, so thanks for the
questions. So, yeah, I think we actually need to set the rescuer's cpumask
when wq's cpumask changes and doing it where you were suggesting should
probably work.

Thanks.

--
tejun
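For readers who want to check from user space whether rescuer affinities follow a workqueue cpumask change, here is a minimal sketch. It is not something proposed in the thread. The helper names parse_cpu_list() and rescuer_affinities() are hypothetical, and it assumes that rescuer threads carry a "kworker/R-" comm prefix (as on the test system in this thread) and that /proc/<pid>/status exposes a Cpus_allowed_list field.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3,8' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(','):
        if not part:
            continue
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def rescuer_affinities(proc_root='/proc', prefix='kworker/R-'):
    """Return {(pid, comm): allowed CPU set} for tasks whose comm starts with prefix.

    Assumption: rescuers are identified purely by their comm prefix.
    """
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, 'comm')) as f:
                comm = f.read().strip()
            if not comm.startswith(prefix):
                continue
            with open(os.path.join(proc_root, entry, 'status')) as f:
                for line in f:
                    if line.startswith('Cpus_allowed_list:'):
                        cpus = parse_cpu_list(line.split(':', 1)[1])
                        result[(int(entry), comm)] = cpus
                        break
        except OSError:
            continue  # task exited while it was being read
    return result
```

Calling rescuer_affinities() before and after writing a new mask to /sys/devices/virtual/workqueue/cpumask and comparing the two results would show which rescuers, if any, picked up the change. The proc_root parameter exists only so the helper can be pointed at a fake directory tree for testing.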
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
2023-12-14 19:47 ` Tejun Heo
@ 2023-12-15 6:50 ` Juri Lelli
2023-12-19 8:55 ` Juri Lelli
0 siblings, 1 reply; 25+ messages in thread
From: Juri Lelli @ 2023-12-15 6:50 UTC (permalink / raw)
To: Tejun Heo; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

On 14/12/23 09:47, Tejun Heo wrote:
> Hello,
>
> On Thu, Dec 14, 2023 at 12:25:25PM +0100, Juri Lelli wrote:
> > > So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> > > the affinity of a rescuer which is already running a task for a pool.
> >
> > But then, even today, a rescuer might keep handling work on a cpu
> > outside its wq cpumask if the associated wq cpumask change can proceed
> > w/o waiting for it to finish the iteration?
>
> Yeah, that can happen and pool cpumasks naturally being subsets of the wq's
> cpumask that they're serving, your original approach likely isn't broken
> either.
>
> > BTW, apologies for all the questions, but I'd like to make sure I can
> > get the implications hopefully right. :)
>
> I obviously haven't thought through it very well, so thanks for the
> questions. So, yeah, I think we actually need to set the rescuer's cpumask
> when wq's cpumask changes and doing it where you were suggesting should
> probably work.

OK. Going to send a proper patch asap.

Thanks!
Juri
* Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER
2023-12-15 6:50 ` Juri Lelli
@ 2023-12-19 8:55 ` Juri Lelli
0 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2023-12-19 8:55 UTC (permalink / raw)
To: Tejun Heo; +Cc: Aaron Tomlin, linux-kernel, jiangshanlai, peterz

Hello again,

On 15/12/23 07:50, Juri Lelli wrote:
> On 14/12/23 09:47, Tejun Heo wrote:
> > Hello,
> >
> > On Thu, Dec 14, 2023 at 12:25:25PM +0100, Juri Lelli wrote:
> > > > So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> > > > the affinity of a rescuer which is already running a task for a pool.
> > >
> > > But then, even today, a rescuer might keep handling work on a cpu
> > > outside its wq cpumask if the associated wq cpumask change can proceed
> > > w/o waiting for it to finish the iteration?
> >
> > Yeah, that can happen and pool cpumasks naturally being subsets of the wq's
> > cpumask that they're serving, your original approach likely isn't broken
> > either.
> >
> > > BTW, apologies for all the questions, but I'd like to make sure I can
> > > get the implications hopefully right. :)
> >
> > I obviously haven't thought through it very well, so thanks for the
> > questions. So, yeah, I think we actually need to set the rescuer's cpumask
> > when wq's cpumask changes and doing it where you were suggesting should
> > probably work.
>
> OK. Going to send a proper patch asap.

I actually didn't do that yet as it turns out the proposed approach
doesn't cover !WQ_SYSFS unbounded wqs. Well, I thought those should be
covered as well, since we have (initiated by echo <mask> into
/sys/devices/virtual/workqueue/cpumask)

  workqueue_apply_unbound_cpumask -> apply_wqattrs_commit

but for some reason the mask change is not reflected into rescuers
affinity.
Trying to dig deeper I went ahead and extended the recent wq_dump.py
addition with the following

---
 ls/workqueue/wq_dump.py | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tools/workqueue/wq_dump.py b/tools/workqueue/wq_dump.py
index d0df5833f2c18..6da621989e210 100644
--- a/tools/workqueue/wq_dump.py
+++ b/tools/workqueue/wq_dump.py
@@ -175,3 +175,32 @@ for wq in list_for_each_entry('struct workqueue_struct', workqueues.address_of_(
     if wq.flags & WQ_UNBOUND:
         print(f' {wq.dfl_pwq.pool.id.value_():{max_pool_id_len}}', end='')
     print('')
+
+print('')
+print('Workqueue -> rescuer')
+print('=====================')
+print(f'wq_unbound_cpumask={cpumask_str(wq_unbound_cpumask)}')
+print('')
+print('[ workqueue \ type unbound_cpumask rescuer pid cpumask]')
+
+for wq in list_for_each_entry('struct workqueue_struct', workqueues.address_of_(), 'list'):
+    print(f'{wq.name.string_().decode()[-24:]:24}', end='')
+    if wq.flags & WQ_UNBOUND:
+        if wq.flags & WQ_ORDERED:
+            print(' ordered ', end='')
+        else:
+            print(' unbound', end='')
+            if wq.unbound_attrs.affn_strict:
+                print(',S ', end='')
+            else:
+                print(' ', end='')
+        print(f' {cpumask_str(wq.unbound_attrs.cpumask):24}', end='')
+    else:
+        print(' percpu ', end='')
+        print(' ', end='')
+
+    if wq.flags & WQ_MEM_RECLAIM:
+        print(f' {wq.rescuer.task.comm.string_().decode()[-24:]:24}', end='')
+        print(f' {wq.rescuer.task.pid.value_():5}', end='')
+        print(f' {cpumask_str(wq.rescuer.task.cpus_ptr)}', end='')
+    print('')
---

which shows the following situation after an

# echo 00,00000003 > /sys/devices/virtual/workqueue/cpumask

on the system I'm testing with:

...
Workqueue -> rescuer
=====================
wq_unbound_cpumask=00000003

[ workqueue            \ type     unbound_cpumask      rescuer          pid  cpumask]
events                   percpu
events_highpri           percpu
events_long              percpu
events_unbound           unbound  0xffffffff 000000ff
events_freezable         percpu
events_power_efficient   percpu
events_freezable_power_  percpu
rcu_gp                   percpu                        kworker/R-rcu_g    4  0xffffffff 000000ff
rcu_par_gp               percpu                        kworker/R-rcu_p    5  0xffffffff 000000ff
slub_flushwq             percpu                        kworker/R-slub_    6  0xffffffff 000000ff
netns                    ordered  0xffffffff 000000ff  kworker/R-netns    7  0xffffffff 000000ff
mm_percpu_wq             percpu                        kworker/R-mm_pe   13  0xffffffff 000000ff
cpuset_migrate_mm        ordered  0xffffffff 000000ff
inet_frag_wq             percpu                        kworker/R-inet_  300  0xffffffff 000000ff
pm                       percpu
cgroup_destroy           percpu
cgroup_pidlist_destroy   percpu
writeback                unbound  0xffffffff 000000ff  kworker/R-write  308  0xffffffff 000000ff
cgwb_release             percpu
cryptd                   percpu                        kworker/R-crypt  314  0xffffffff 000000ff
kintegrityd              percpu                        kworker/R-kinte  315  0xffffffff 000000ff
kblockd                  percpu                        kworker/R-kbloc  316  0xffffffff 000000ff
kacpid                   percpu
kacpi_notify             percpu
kacpi_hotplug            ordered  0xffffffff 000000ff
kec                      ordered  0xffffffff 000000ff
kec_query                percpu
tpm_dev_wq               percpu                        kworker/R-tpm_d  352  0xffffffff 000000ff
usb_hub_wq               percpu
md                       percpu                        kworker/R-md     353  0xffffffff 000000ff
md_misc                  percpu
md_bitmap                unbound  0xffffffff 000000ff  kworker/R-md_bi  354  0xffffffff 000000ff
edac-poller              ordered  0xffffffff 000000ff  kworker/R-edac-  355  0xffffffff 000000ff
...

I guess I expected wq_unbound_cpumask and unbound_cpumask for each unbound
wq to be kept in sync, so I'm evidently missing details. :)

Can you please help me here understanding what am I missing?

Thanks!
Juri
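The mismatch reported above can be made explicit by turning both mask notations into CPU sets. The following is only a sketch with hypothetical helper names (parse_dump_mask(), parse_sysfs_mask()). It assumes the dump prints space-separated 32-bit hex words with the lowest CPUs first, whereas the value written to sysfs uses the kernel's comma-separated form with the highest word first.

```python
def _bits(value, base):
    """Return the CPU numbers for the set bits of one 32-bit word."""
    return {base + bit for bit in range(32) if value >> bit & 1}


def parse_dump_mask(text):
    """Parse a mask as printed in the dump, e.g. '0xffffffff 000000ff'.

    Assumption: space-separated 32-bit hex words, lowest CPUs first.
    """
    cpus = set()
    for i, word in enumerate(text.split()):
        cpus |= _bits(int(word, 16), 32 * i)
    return cpus


def parse_sysfs_mask(text):
    """Parse a comma-separated mask as written to sysfs, e.g. '00,00000003'.

    Groups are 32-bit hex words, highest CPUs first.
    """
    cpus = set()
    for i, word in enumerate(reversed(text.strip().split(','))):
        cpus |= _bits(int(word, 16), 32 * i)
    return cpus


requested = parse_sysfs_mask('00,00000003')
rescuer = parse_dump_mask('0xffffffff 000000ff')

print(sorted(requested))      # [0, 1]
print(len(rescuer))           # 40
print(rescuer <= requested)   # False: the rescuer may still run outside the new mask
```

Under these assumptions every rescuer and every per-workqueue unbound_cpumask in the dump still covers 40 CPUs, while wq_unbound_cpumask covers only CPUs 0 and 1.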
end of thread, other threads:[~2023-12-19 8:55 UTC | newest]
Thread overview: 25+ messages
2023-07-29 13:53 [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER Aaron Tomlin
2023-07-29 13:53 ` [RFC PATCH 1/2] " Aaron Tomlin
2023-07-29 16:07 ` Peter Zijlstra
2023-08-01 10:04 ` Aaron Tomlin
2023-07-29 13:53 ` [RFC PATCH 2/2] workqueue: Simplify current_is_workqueue_rescuer() Aaron Tomlin
2023-07-31 23:35 ` [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER Tejun Heo
2023-08-01 10:53 ` Aaron Tomlin
2023-08-02 18:10 ` Tejun Heo
2023-08-03 20:19 ` Aaron Tomlin
2023-08-03 20:34 ` Tejun Heo
2023-08-05 23:45 ` Aaron Tomlin
2023-12-11 14:51 ` Juri Lelli
2023-12-11 18:39 ` Tejun Heo
2023-12-12 9:56 ` Juri Lelli
2023-12-12 17:14 ` Tejun Heo
2023-12-12 19:06 ` Aaron Tomlin
2023-12-12 20:16 ` Tejun Heo
2023-12-13 8:59 ` Juri Lelli
2023-12-13 15:35 ` Tejun Heo
2023-12-13 18:32 ` Juri Lelli
2023-12-13 18:38 ` Tejun Heo
2023-12-14 11:25 ` Juri Lelli
2023-12-14 19:47 ` Tejun Heo
2023-12-15 6:50 ` Juri Lelli
2023-12-19 8:55 ` Juri Lelli