* [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default @ 2015-07-16 19:16 Daniel Bristot de Oliveira 2015-07-16 19:24 ` Tejun Heo 0 siblings, 1 reply; 19+ messages in thread From: Daniel Bristot de Oliveira @ 2015-07-16 19:16 UTC (permalink / raw) To: Tejun Heo, LKML Cc: Lai Jiangshan, Frederic Weisbecker, Rik van Riel, Luis Claudio R. Goncalves By default, unbounded workqueues run on all CPUs, which includes isolated CPUs. This patch avoids unbounded workqueues running on isolated CPUs by default, keeping the current behavior when no CPUs were isolated. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> --- kernel/workqueue.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 4c4f061..14d17f4 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -5217,7 +5217,10 @@ static int __init init_workqueues(void) WARN_ON(__alignof__(struct pool_workqueue) < __alignof__(long long)); BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL)); - cpumask_copy(wq_unbound_cpumask, cpu_possible_mask); + + /* by default, run unbound wq on non-isolated CPUs */ + cpumask_andnot(wq_unbound_cpumask, cpu_possible_mask, + cpu_isolated_map); pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC); -- 2.1.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-16 19:16 [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default Daniel Bristot de Oliveira @ 2015-07-16 19:24 ` Tejun Heo 2015-07-17 4:26 ` Mike Galbraith 0 siblings, 1 reply; 19+ messages in thread From: Tejun Heo @ 2015-07-16 19:24 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: LKML, Lai Jiangshan, Frederic Weisbecker, Rik van Riel, Luis Claudio R. Goncalves Hello, On Thu, Jul 16, 2015 at 04:16:23PM -0300, Daniel Bristot de Oliveira wrote: > By default, unbounded workqueues run on all CPUs, which includes > isolated CPUs. This patch avoids unbounded workqueues running on > isolated CPUs by default, keeping the current behavior when no > CPUs were isolated. I don't have a strong opinion about this. Frederic, was there a reason we didn't do this when we added cpumasks for unbound wq's? Thanks. -- tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-16 19:24 ` Tejun Heo @ 2015-07-17 4:26 ` Mike Galbraith 2015-07-17 15:27 ` Tejun Heo 0 siblings, 1 reply; 19+ messages in thread From: Mike Galbraith @ 2015-07-17 4:26 UTC (permalink / raw) To: Tejun Heo Cc: Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Frederic Weisbecker, Rik van Riel, Luis Claudio R. Goncalves On Thu, 2015-07-16 at 15:24 -0400, Tejun Heo wrote: > Hello, > > On Thu, Jul 16, 2015 at 04:16:23PM -0300, Daniel Bristot de Oliveira wrote: > > By default, unbounded workqueues run on all CPUs, which includes > > isolated CPUs. This patch avoids unbounded workqueues running on > > isolated CPUs by default, keeping the current behavior when no > > CPUs were isolated. > > I don't have a strong opinion about this. Frederic, was there a > reason we didn't do this when we added cpumasks for unbound wq's? Hm, I thought the plan was that after the Lai's unbound series landed, his ordered wq patch would follow, but perhaps not. I'm referring to the somewhat aged patch below. (freshly wedged into master, and maybe not properly, but it should at least look familiar). From: Lai Jiangshan <laijs@cn.fujitsu.com> Date: Tue, 15 Apr 2014 17:52:19 +0800 Subject: [PATCH] workqueue: allow changing attributions of ordered workqueue Allow changing ordered workqueue's cpumask Allow changing ordered workqueue's nice value Allow registering ordered workqueue to SYSFS Still disallow changing ordered workqueue's max_active which breaks ordered guarantee Still disallow changing ordered workqueue's no_numa which breaks ordered guarantee Changing attributions will introduce new pwqs in a given workqueue, and old implement doesn't have any solution to handle multi-pwqs on ordered workqueues, so changing attributions for ordered workqueues is disallowed. After I investigated several solutions which try to handle multi-pwqs on ordered workqueues, I found the solution which reuse max_active are the simplest. In this solution, the new introduced pwq's max_active is kept as *ZERO* until it become the oldest pwq. Thus only one (the oldest) pwq is active, and ordered is guarantee. This solution forces ordered on higher level while the non-reentrancy is also forced. so we don't need to queue any work to its previous pool. And we shouldn't do it. we must disallow any work repeatedly requeues itself back-to-back which keeps the oldest pwq stay and make newest(if have) pwq starvation(non-active for ever). Build test only. This patch depends on previous patch: workqueue: add __WQ_FREEZING and remove POOL_FREEZING Frederic: You can pick this patch to your updated patchset before TJ apply it. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> --- kernel/workqueue.c | 65 ++++++++++++++++++++++++++++++++++------------------- 1 file changed, 42 insertions(+), 23 deletions(-) --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1331,8 +1331,14 @@ static void __queue_work(int cpu, struct * If @work was previously on a different pool, it might still be * running there, in which case the work needs to be queued on that * pool to guarantee non-reentrancy. + * + * pwqs are guaranteed active orderly for ordered workqueue, and + * it guarantees non-reentrancy for works. So any work doesn't need + * to be queued on previous pool. And the works shouldn't be queued + * on previous pool, since we need to guarantee the prevous pwq + * releasable to avoid work-stavation on the newest pool. */ - last_pool = get_work_pool(work); + last_pool = wq->flags & __WQ_ORDERED ? NULL : get_work_pool(work); if (last_pool && last_pool != pwq->pool) { struct worker *worker; @@ -3283,6 +3289,13 @@ static void rcu_free_pwq(struct rcu_head container_of(rcu, struct pool_workqueue, rcu)); } +static struct pool_workqueue *oldest_pwq(struct workqueue_struct *wq) +{ + return list_last_entry(&wq->pwqs, struct pool_workqueue, pwqs_node); +} + +static void pwq_adjust_max_active(struct pool_workqueue *pwq); + /* * Scheduled on system_wq by put_pwq() when an unbound pwq hits zero refcnt * and needs to be destroyed. @@ -3301,6 +3314,9 @@ static void pwq_unbound_release_workfn(s mutex_lock(&wq->mutex); list_del_rcu(&pwq->pwqs_node); is_last = list_empty(&wq->pwqs); + /* try to active the oldest pwq when needed */ + if (!is_last && (wq->flags & __WQ_ORDERED)) + pwq_adjust_max_active(oldest_pwq(wq)); mutex_unlock(&wq->mutex); mutex_lock(&wq_pool_mutex); @@ -3317,6 +3333,16 @@ static void pwq_unbound_release_workfn(s call_rcu_sched(&wq->rcu, rcu_free_wq); } +static bool pwq_active(struct pool_workqueue *pwq) +{ + /* Only the oldest pwq is active in the ordered wq */ + if (pwq->wq->flags & __WQ_ORDERED) + return pwq == oldest_pwq(pwq->wq); + + /* All pwqs in the non-ordered wq are active */ + return true; +} + /** * pwq_adjust_max_active - update a pwq's max_active to the current setting * @pwq: target pool_workqueue @@ -3344,7 +3370,7 @@ static void pwq_adjust_max_active(struct * this function is called at least once after @workqueue_freezing * is updated and visible. */ - if (!freezable || !workqueue_freezing) { + if ((!freezable || !workqueue_freezing) && pwq_active(pwq)) { pwq->max_active = wq->saved_max_active; while (!list_empty(&pwq->delayed_works) && @@ -3395,11 +3421,11 @@ static void link_pwq(struct pool_workque /* set the matching work_color */ pwq->work_color = wq->work_color; + /* link in @pwq on the head of &wq->pwqs */ + list_add_rcu(&pwq->pwqs_node, &wq->pwqs); + /* sync max_active to the current setting */ pwq_adjust_max_active(pwq); - - /* link in @pwq */ - list_add_rcu(&pwq->pwqs_node, &wq->pwqs); } /* obtain a pool matching @attr and create a pwq associating the pool and @wq */ @@ -3630,8 +3656,8 @@ static int apply_workqueue_attrs_locked( if (WARN_ON(!(wq->flags & WQ_UNBOUND))) return -EINVAL; - /* creating multiple pwqs breaks ordering guarantee */ - if (WARN_ON((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs))) + /* creating multiple per-node pwqs breaks ordering guarantee */ + if (WARN_ON((wq->flags & __WQ_ORDERED) && !attrs->no_numa)) return -EINVAL; ctx = apply_wqattrs_prepare(wq, attrs); @@ -3763,7 +3789,7 @@ static void wq_update_unbound_numa(struc static int alloc_and_link_pwqs(struct workqueue_struct *wq) { bool highpri = wq->flags & WQ_HIGHPRI; - int cpu, ret; + int cpu; if (!(wq->flags & WQ_UNBOUND)) { wq->cpu_pwqs = alloc_percpu(struct pool_workqueue); @@ -3784,12 +3810,7 @@ static int alloc_and_link_pwqs(struct wo } return 0; } else if (wq->flags & __WQ_ORDERED) { - ret = apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]); - /* there should only be single pwq for ordering guarantee */ - WARN(!ret && (wq->pwqs.next != &wq->dfl_pwq->pwqs_node || - wq->pwqs.prev != &wq->dfl_pwq->pwqs_node), - "ordering guarantee broken for workqueue %s\n", wq->name); - return ret; + return apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]); } else { return apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]); } @@ -5018,11 +5039,17 @@ static ssize_t wq_numa_store(struct devi const char *buf, size_t count) { struct workqueue_struct *wq = dev_to_wq(dev); - struct workqueue_attrs *attrs; + struct workqueue_attrs *attrs = NULL; int v, ret = -ENOMEM; apply_wqattrs_lock(); + /* Creating per-node pwqs breaks ordering guarantee. Keep no_numa = 1 */ + if (WARN_ON(wq->flags & __WQ_ORDERED)) { + ret = -EINVAL; + goto out_unlock; + } + attrs = wq_sysfs_prep_attrs(wq); if (!attrs) goto out_unlock; @@ -5125,14 +5152,6 @@ int workqueue_sysfs_register(struct work struct wq_device *wq_dev; int ret; - /* - * Adjusting max_active or creating new pwqs by applying - * attributes breaks ordering guarantee. Disallow exposing ordered - * workqueues. - */ - if (WARN_ON(wq->flags & __WQ_ORDERED)) - return -EINVAL; - wq->wq_dev = wq_dev = kzalloc(sizeof(*wq_dev), GFP_KERNEL); if (!wq_dev) return -ENOMEM; ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-17 4:26 ` Mike Galbraith @ 2015-07-17 15:27 ` Tejun Heo 2015-07-17 15:35 ` Frederic Weisbecker 2015-07-17 17:15 ` Mike Galbraith 0 siblings, 2 replies; 19+ messages in thread From: Tejun Heo @ 2015-07-17 15:27 UTC (permalink / raw) To: Mike Galbraith Cc: Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Frederic Weisbecker, Rik van Riel, Luis Claudio R. Goncalves Hello, Mike. On Fri, Jul 17, 2015 at 06:26:30AM +0200, Mike Galbraith wrote: > Hm, I thought the plan was that after the Lai's unbound series landed, > his ordered wq patch would follow, but perhaps not. Yes, that still is the plan but this is kinda unrelated to that change. This just initializes wq cpumask according to cpu isolation. I'm just curious whether there was any specific reason we didn't do this before (ISTR people discussing it back then too). > I'm referring to the somewhat aged patch below. (freshly wedged into > master, and maybe not properly, but it should at least look familiar). Yeah, I think I asked Lai to try a different approach where we regulate it from queueing path rather than playing with pwqs. I think that'd end up quite a bit simpler. Thanks. -- tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-17 15:27 ` Tejun Heo @ 2015-07-17 15:35 ` Frederic Weisbecker 2015-07-17 15:43 ` Tejun Heo 2015-07-17 17:15 ` Mike Galbraith 1 sibling, 1 reply; 19+ messages in thread From: Frederic Weisbecker @ 2015-07-17 15:35 UTC (permalink / raw) To: Tejun Heo Cc: Mike Galbraith, Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Rik van Riel, Luis Claudio R. Goncalves On Fri, Jul 17, 2015 at 11:27:20AM -0400, Tejun Heo wrote: > Hello, Mike. > > On Fri, Jul 17, 2015 at 06:26:30AM +0200, Mike Galbraith wrote: > > Hm, I thought the plan was that after the Lai's unbound series landed, > > his ordered wq patch would follow, but perhaps not. > > Yes, that still is the plan but this is kinda unrelated to that > change. This just initializes wq cpumask according to cpu isolation. > I'm just curious whether there was any specific reason we didn't do > this before (ISTR people discussing it back then too). Initializing wq unbound cpumask to housekeeping_mask is still the plan. I just remember we didn't do it in Lai's series because it was slightly unrelated. When a patchset is complicated, like Lai's, it's better to keep it focus to a single purpose. Anyway that patch is welcome. > > > I'm referring to the somewhat aged patch below. (freshly wedged into > > master, and maybe not properly, but it should at least look familiar). > > Yeah, I think I asked Lai to try a different approach where we > regulate it from queueing path rather than playing with pwqs. I think > that'd end up quite a bit simpler. Ordered workqueues aren't handled currently? I tried setting the unbound cpumask and it also applied to khelper which is a singlethread (and thus ordered) workqueue. > Thanks. > > -- > tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-17 15:35 ` Frederic Weisbecker @ 2015-07-17 15:43 ` Tejun Heo 0 siblings, 0 replies; 19+ messages in thread From: Tejun Heo @ 2015-07-17 15:43 UTC (permalink / raw) To: Frederic Weisbecker Cc: Mike Galbraith, Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Rik van Riel, Luis Claudio R. Goncalves Hello, On Fri, Jul 17, 2015 at 05:35:09PM +0200, Frederic Weisbecker wrote: > Initializing wq unbound cpumask to housekeeping_mask is still the > plan. I just remember we didn't do it in Lai's series because it > was slightly unrelated. When a patchset is complicated, like Lai's, > it's better to keep it focus to a single purpose. > > Anyway that patch is welcome. Ah, cool, can you ack it explicitly? > > Yeah, I think I asked Lai to try a different approach where we > > regulate it from queueing path rather than playing with pwqs. I think > > that'd end up quite a bit simpler. > > Ordered workqueues aren't handled currently? I tried setting the unbound > cpumask and it also applied to khelper which is a singlethread (and thus > ordered) workqueue. Hmmm... AFAICS, it shouldn't work. Thanks. -- tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-17 15:27 ` Tejun Heo 2015-07-17 15:35 ` Frederic Weisbecker @ 2015-07-17 17:15 ` Mike Galbraith 2015-07-18 13:36 ` Frederic Weisbecker 1 sibling, 1 reply; 19+ messages in thread From: Mike Galbraith @ 2015-07-17 17:15 UTC (permalink / raw) To: Tejun Heo Cc: Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Frederic Weisbecker, Rik van Riel, Luis Claudio R. Goncalves On Fri, 2015-07-17 at 11:27 -0400, Tejun Heo wrote: > I'm just curious whether there was any specific reason we didn't do > this before (ISTR people discussing it back then too). I'm dead set against all this auto-presume nonsense fwtw Allocating a pool of no_hz_full _capable_ CPUs should not entice the kernel to make any rash assumptions. Let users do the button poking, they know what they want, and when they want it. -Mike ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-17 17:15 ` Mike Galbraith @ 2015-07-18 13:36 ` Frederic Weisbecker 2015-07-18 15:48 ` Mike Galbraith 2015-07-19 8:02 ` Mike Galbraith 0 siblings, 2 replies; 19+ messages in thread From: Frederic Weisbecker @ 2015-07-18 13:36 UTC (permalink / raw) To: Mike Galbraith Cc: Tejun Heo, Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Rik van Riel, Luis Claudio R. Goncalves On Fri, Jul 17, 2015 at 07:15:48PM +0200, Mike Galbraith wrote: > On Fri, 2015-07-17 at 11:27 -0400, Tejun Heo wrote: > > > I'm just curious whether there was any specific reason we didn't do > > this before (ISTR people discussing it back then too). > > I'm dead set against all this auto-presume nonsense fwtw Allocating a > pool of no_hz_full _capable_ CPUs should not entice the kernel to make > any rash assumptions. Let users do the button poking, they know what > they want, and when they want it. We need to make a choice then. Either we do all the affinity tuning from userspace with a common tool, which is what I had wished before everybody asked for pre-settings. Or we do it in the kernel, now we should define some kind of CONFIG_ISOLATION to make that proper and rule the various kinds of isolation people are interested in. But we can't leave it half-way like it is currently with everything preset on top of nohz: rcu nocb mask, watchdog mask, cpu_isolation_map and exclude workqueue. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-18 13:36 ` Frederic Weisbecker @ 2015-07-18 15:48 ` Mike Galbraith 2015-07-19 8:02 ` Mike Galbraith 1 sibling, 0 replies; 19+ messages in thread From: Mike Galbraith @ 2015-07-18 15:48 UTC (permalink / raw) To: Frederic Weisbecker Cc: Tejun Heo, Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Rik van Riel, Luis Claudio R. Goncalves On Sat, 2015-07-18 at 15:36 +0200, Frederic Weisbecker wrote: > On Fri, Jul 17, 2015 at 07:15:48PM +0200, Mike Galbraith wrote: > > On Fri, 2015-07-17 at 11:27 -0400, Tejun Heo wrote: > > > > > I'm just curious whether there was any specific reason we didn't do > > > this before (ISTR people discussing it back then too). > > > > I'm dead set against all this auto-presume nonsense fwtw Allocating a > > pool of no_hz_full _capable_ CPUs should not entice the kernel to make > > any rash assumptions. Let users do the button poking, they know what > > they want, and when they want it. > > We need to make a choice then. Either we do all the affinity tuning from > userspace with a common tool, which is what I had wished before everybody > asked for pre-settings. Giving userspace what they need to do what they want seems right to me. > Or we do it in the kernel, now we should define some kind of CONFIG_ISOLATION > to make that proper and rule the various kinds of isolation people are > interested in. > > But we can't leave it half-way like it is currently with everything preset on > top of nohz: rcu nocb mask, watchdog mask, cpu_isolation_map and exclude workqueue. Yeah. Hell, maybe I'm wrong. Maybe people really want this rigidity and hand-holding by the kernel, but it just seems dainbramaged to me. ATM, you pay a high price (the overhead) for the capability, but until that auto-assume isolcpus landed, those CPUs weren't forever more specialists, they were CPUs with an extra (costly) capability, could be disconnected/reconnected to load balancing on the fly, and used however the user saw fit. I can imagine an auto-everything kernel having a bit of trouble with an SGI beast from hell. Too bad I don't have access to one, I'd try to boot a tune for maximum hand holding kernel. -Mike ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-18 13:36 ` Frederic Weisbecker 2015-07-18 15:48 ` Mike Galbraith @ 2015-07-19 8:02 ` Mike Galbraith 2015-07-19 12:19 ` Mike Galbraith 2015-07-21 8:55 ` Mike Galbraith 1 sibling, 2 replies; 19+ messages in thread From: Mike Galbraith @ 2015-07-19 8:02 UTC (permalink / raw) To: Frederic Weisbecker Cc: Tejun Heo, Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Rik van Riel, Luis Claudio R. Goncalves On Sat, 2015-07-18 at 15:36 +0200, Frederic Weisbecker wrote: > But we can't leave it half-way like it is currently with everything preset on > top of nohz: rcu nocb mask, watchdog mask, cpu_isolation_map and exclude workqueue. To automate or not aside... WRT wq_unbound_cpumask, it's very nice to have but anyone watching their box should notice generic allegedly unbound work landing on the bound system_wq, thus the quiet zone isn't protected from these work items. For example, my little perturbation measurement proggy emits a stat line periodically, which leads to tty_schedule_flip() -> schedule_work() thus it perturbs itself seemingly needlessly. Lord knows how many other ways there are to do the same. The hack below is not intended to be anything remotely resembling a proper answer to that problem, it's my box encouraging me to ask the question by surviving (modulo destroy, redirect there is bad idea). Why do we do nothing about these allegedly unbound work items? --- include/linux/sched.h | 2 ++ kernel/workqueue.c | 24 ++++++++++++++++++++++-- 2 files changed, 24 insertions(+), 2 deletions(-) --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1430,6 +1430,8 @@ struct task_struct { unsigned sched_contributes_to_load:1; unsigned sched_migrated:1; + unsigned work_redirect_disable:1; + #ifdef CONFIG_MEMCG_KMEM unsigned memcg_kmem_skip_account:1; #endif --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1294,6 +1294,21 @@ static bool is_chained_work(struct workq return worker && worker->current_pwq->wq == wq; } +static struct workqueue_struct * +redirect_generic_unbound_work(int cpu, struct workqueue_struct *wq) +{ + if (cpu != WORK_CPU_UNBOUND || wq != system_wq) + return wq; + if (current->work_redirect_disable) + return wq; + if (cpumask_test_cpu(raw_smp_processor_id(), wq_unbound_cpumask)) + return wq; + if (wq->flags & __WQ_DRAINING || system_unbound_wq->flags & __WQ_DRAINING) + return wq; + + return system_unbound_wq; +} + static void __queue_work(int cpu, struct workqueue_struct *wq, struct work_struct *work) { @@ -1317,6 +1332,7 @@ static void __queue_work(int cpu, struct if (unlikely(wq->flags & __WQ_DRAINING) && WARN_ON_ONCE(!is_chained_work(wq))) return; + wq = redirect_generic_unbound_work(req_cpu, wq); retry: if (req_cpu == WORK_CPU_UNBOUND) cpu = raw_smp_processor_id(); @@ -3926,6 +3942,8 @@ void destroy_workqueue(struct workqueue_ struct pool_workqueue *pwq; int node; + current->work_redirect_disable = 1; + /* drain it before proceeding with destruction */ drain_workqueue(wq); @@ -3937,7 +3955,7 @@ void destroy_workqueue(struct workqueue_ for (i = 0; i < WORK_NR_COLORS; i++) { if (WARN_ON(pwq->nr_in_flight[i])) { mutex_unlock(&wq->mutex); - return; + goto out; } } @@ -3945,7 +3963,7 @@ void destroy_workqueue(struct workqueue_ WARN_ON(pwq->nr_active) || WARN_ON(!list_empty(&pwq->delayed_works))) { mutex_unlock(&wq->mutex); - return; + goto out; } } mutex_unlock(&wq->mutex); @@ -3991,6 +4009,8 @@ void destroy_workqueue(struct workqueue_ wq->dfl_pwq = NULL; put_pwq_unlocked(pwq); } +out: + current->work_redirect_disable = 0; } EXPORT_SYMBOL_GPL(destroy_workqueue); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-19 8:02 ` Mike Galbraith @ 2015-07-19 12:19 ` Mike Galbraith 2015-07-21 8:55 ` Mike Galbraith 1 sibling, 0 replies; 19+ messages in thread From: Mike Galbraith @ 2015-07-19 12:19 UTC (permalink / raw) To: Frederic Weisbecker Cc: Tejun Heo, Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Rik van Riel, Luis Claudio R. Goncalves On Sun, 2015-07-19 at 10:02 +0200, Mike Galbraith wrote: > Why do we do nothing about these allegedly unbound work items? Hm, the answer to that question may be as simple as "Because we know gotchas exist, but not where they all the little bastards live" ;-) -Mike ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-19 8:02 ` Mike Galbraith 2015-07-19 12:19 ` Mike Galbraith @ 2015-07-21 8:55 ` Mike Galbraith 2015-07-22 5:24 ` [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs Mike Galbraith 1 sibling, 1 reply; 19+ messages in thread From: Mike Galbraith @ 2015-07-21 8:55 UTC (permalink / raw) To: Frederic Weisbecker Cc: Tejun Heo, Daniel Bristot de Oliveira, LKML, Lai Jiangshan, Rik van Riel, Luis Claudio R. Goncalves On Sun, 2015-07-19 at 10:02 +0200, Mike Galbraith wrote: > Why do we do nothing about these allegedly unbound work items? My box seems to think the answer is: no reason other than nobody having asked the source to please not do that. Guess I'll go ask a NUMA box. workqueue: RR schedule unbound work to CPUs in wq_unbound_cpumask WORK_CPU_UNBOUND work items queued to a bound workqueue always run locally. This is a good thing normally, as it keeps us from bouncing work all over the place like ping-pong balls in a nuclear fission demo. When the user has asked us to keep unbound work away from certain CPUs however, honor that request, select a CPU from wq_unbound_cpumask. Not-signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> --- kernel/workqueue.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -301,6 +301,9 @@ static bool workqueue_freezing; /* PL: static cpumask_var_t wq_unbound_cpumask; /* PL: low level cpumask for all unbound wqs */ +/* CPU where WORK_CPU_UNBOUND work was last round robin scheduled from this CPU */ +static DEFINE_PER_CPU(unsigned int, wq_unbound_rr_cpu_last); + /* the per-cpu worker pools */ static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS], cpu_worker_pools); @@ -1294,6 +1297,24 @@ static bool is_chained_work(struct workq return worker && worker->current_pwq->wq == wq; } +/* + * When queueing WORK_CPU_UNBOUND work to a !WQ_UNBOUND queue, round + * robin among wq_unbound_cpumask to avoid perturbing sensitive tasks. + */ +static unsigned int select_round_robin_cpu(unsigned int cpu) +{ + if (cpumask_test_cpu(cpu, wq_unbound_cpumask)) + return cpu; + if (cpumask_empty(wq_unbound_cpumask)) + return cpu; + cpu = __this_cpu_read(wq_unbound_rr_cpu_last); + cpu = cpumask_next_and(cpu, wq_unbound_cpumask, cpu_online_mask); + if (cpu >= nr_cpu_ids) + cpu = 0; + __this_cpu_write(wq_unbound_rr_cpu_last, cpu); + return cpu; +} + static void __queue_work(int cpu, struct workqueue_struct *wq, struct work_struct *work) { @@ -1322,9 +1343,11 @@ static void __queue_work(int cpu, struct cpu = raw_smp_processor_id(); /* pwq which will be used unless @work is executing elsewhere */ - if (!(wq->flags & WQ_UNBOUND)) + if (!(wq->flags & WQ_UNBOUND)) { + if (req_cpu == WORK_CPU_UNBOUND) + cpu = select_round_robin_cpu(cpu); pwq = per_cpu_ptr(wq->cpu_pwqs, cpu); - else + } else pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu)); /* ^ permalink raw reply [flat|nested] 19+ messages in thread
* [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs 2015-07-21 8:55 ` Mike Galbraith @ 2015-07-22 5:24 ` Mike Galbraith 2015-07-22 12:43 ` [PATCH] workqueue: avoiding unbounded wq on isolated CPUs by default Daniel Bristot de Oliveira 2015-07-22 14:11 ` [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs Tejun Heo 0 siblings, 2 replies; 19+ messages in thread From: Mike Galbraith @ 2015-07-22 5:24 UTC (permalink / raw) To: Tejun Heo, Lai Jiangshan Cc: Frederic Weisbecker, Daniel Bristot de Oliveira, LKML, Rik van Riel, Luis Claudio R. Goncalves On Tue, 2015-07-21 at 10:55 +0200, Mike Galbraith wrote: > On Sun, 2015-07-19 at 10:02 +0200, Mike Galbraith wrote: > > > Why do we do nothing about these allegedly unbound work items? > > My box seems to think the answer is: no reason other than nobody having > asked the source to please not do that. Guess I'll go ask a NUMA box. My [128] socket boxen show zero signs of caring, and it's dirt simple, so it's no longer an experiment. Fly or die little patchlet... WORK_CPU_UNBOUND work items queued to a bound workqueue always run locally. This is a good thing normally, but not when the user has asked us to keep unbound work away from certain CPUs. Round robin these to wq_unbound_cpumask CPUs instead, as perturbation avoidance trumps performance. Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> --- kernel/workqueue.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -301,6 +301,9 @@ static bool workqueue_freezing; /* PL: static cpumask_var_t wq_unbound_cpumask; /* PL: low level cpumask for all unbound wqs */ +/* CPU where WORK_CPU_UNBOUND work was last round robin scheduled from this CPU */ +static DEFINE_PER_CPU(unsigned int, wq_unbound_rr_cpu_last); + /* the per-cpu worker pools */ static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS], cpu_worker_pools); @@ -1294,6 +1297,24 @@ static bool is_chained_work(struct workq return worker && worker->current_pwq->wq == wq; } +/* + * When queueing WORK_CPU_UNBOUND work to a !WQ_UNBOUND queue, round + * robin among wq_unbound_cpumask to avoid perturbing sensitive tasks. + */ +static unsigned int select_round_robin_cpu(unsigned int cpu) +{ + if (cpumask_test_cpu(cpu, wq_unbound_cpumask)) + return cpu; + if (cpumask_empty(wq_unbound_cpumask)) + return cpu; + cpu = __this_cpu_read(wq_unbound_rr_cpu_last); + cpu = cpumask_next_and(cpu, wq_unbound_cpumask, cpu_online_mask); + if (cpu >= nr_cpu_ids) + cpu = 0; + __this_cpu_write(wq_unbound_rr_cpu_last, cpu); + return cpu; +} + static void __queue_work(int cpu, struct workqueue_struct *wq, struct work_struct *work) { @@ -1322,9 +1343,11 @@ static void __queue_work(int cpu, struct cpu = raw_smp_processor_id(); /* pwq which will be used unless @work is executing elsewhere */ - if (!(wq->flags & WQ_UNBOUND)) + if (!(wq->flags & WQ_UNBOUND)) { + if (req_cpu == WORK_CPU_UNBOUND) + cpu = select_round_robin_cpu(cpu); pwq = per_cpu_ptr(wq->cpu_pwqs, cpu); - else + } else pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu)); /* ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] workqueue: avoiding unbounded wq on isolated CPUs by default 2015-07-22 5:24 ` [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs Mike Galbraith @ 2015-07-22 12:43 ` Daniel Bristot de Oliveira 2015-07-22 14:11 ` [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs Tejun Heo 1 sibling, 0 replies; 19+ messages in thread From: Daniel Bristot de Oliveira @ 2015-07-22 12:43 UTC (permalink / raw) To: Tejun Heo, Lai Jiangshan Cc: Mike Galbraith, Frederic Weisbecker, LKML, Rik van Riel, Luis Claudio R. Goncalves By default, unbounded workqueues run on all CPUs, which includes isolated CPUs. This patch avoids unbounded workqueues running on isolated CPUs by default, keeping the current behavior when no CPUs were isolated. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> --- kernel/workqueue.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 4c4f061..14d17f4 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -5217,7 +5217,10 @@ static int __init init_workqueues(void) WARN_ON(__alignof__(struct pool_workqueue) < __alignof__(long long)); BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL)); - cpumask_copy(wq_unbound_cpumask, cpu_possible_mask); + + /* by default, run unbound wq on non-isolated CPUs */ + cpumask_andnot(wq_unbound_cpumask, cpu_possible_mask, + cpu_isolated_map); pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC); -- 2.1.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs 2015-07-22 5:24 ` [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs Mike Galbraith 2015-07-22 12:43 ` [PATCH] workqueue: avoiding unbounded wq on isolated CPUs by default Daniel Bristot de Oliveira @ 2015-07-22 14:11 ` Tejun Heo 2015-07-22 14:55 ` Mike Galbraith 1 sibling, 1 reply; 19+ messages in thread From: Tejun Heo @ 2015-07-22 14:11 UTC (permalink / raw) To: Mike Galbraith Cc: Lai Jiangshan, Frederic Weisbecker, Daniel Bristot de Oliveira, LKML, Rik van Riel, Luis Claudio R. Goncalves On Wed, Jul 22, 2015 at 07:24:46AM +0200, Mike Galbraith wrote: > WORK_CPU_UNBOUND work items queued to a bound workqueue always run > locally. This is a good thing normally, but not when the user has The constant name used there is a bit misleading but you can't put work items which are queued w/ queue_work() on foreign cpus by default. queue_work() has always guaranteed local execution. The problem is that workqueue can't currently tell whether a queue_work() user expects cpu locality for correctness or optimization. It'd be great if we introduce queue_work_on_local() or sth and replace correctness ones with it but that involves auditing each and every queue_work() usage. If anybody is up for the task, I'd be happy to help. Thanks. -- tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs 2015-07-22 14:11 ` [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs Tejun Heo @ 2015-07-22 14:55 ` Mike Galbraith 2015-07-22 15:19 ` Mike Galbraith 0 siblings, 1 reply; 19+ messages in thread From: Mike Galbraith @ 2015-07-22 14:55 UTC (permalink / raw) To: Tejun Heo Cc: Lai Jiangshan, Frederic Weisbecker, Daniel Bristot de Oliveira, LKML, Rik van Riel, Luis Claudio R. Goncalves On Wed, 2015-07-22 at 10:11 -0400, Tejun Heo wrote: > On Wed, Jul 22, 2015 at 07:24:46AM +0200, Mike Galbraith wrote: > > WORK_CPU_UNBOUND work items queued to a bound workqueue always run > > locally. This is a good thing normally, but not when the user has > > The constant name used there is a bit misleading but you can't put > work items which are queued w/ queue_work() on foreign cpus by > default. queue_work() has always guaranteed local execution. The > problem is that workqueue can't currently tell whether a queue_work() > user expects cpu locality for correctness or optimization. It'd be > great if we introduce queue_work_on_local() or sth and replace > correctness ones with it but that involves auditing each and every > queue_work() usage. If anybody is up for the task, I'd be happy to > help. Oh well. That puts a big dent in the utility of wq_unbound_cpumask, but too bad. If someone has a big enough HPC itch, they'll scratch it. -Mike ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs 2015-07-22 14:55 ` Mike Galbraith @ 2015-07-22 15:19 ` Mike Galbraith 2015-07-22 15:25 ` Tejun Heo 0 siblings, 1 reply; 19+ messages in thread From: Mike Galbraith @ 2015-07-22 15:19 UTC (permalink / raw) To: Tejun Heo Cc: Lai Jiangshan, Frederic Weisbecker, Daniel Bristot de Oliveira, LKML, Rik van Riel, Luis Claudio R. Goncalves On Wed, 2015-07-22 at 16:55 +0200, Mike Galbraith wrote: > On Wed, 2015-07-22 at 10:11 -0400, Tejun Heo wrote: > > On Wed, Jul 22, 2015 at 07:24:46AM +0200, Mike Galbraith wrote: > > > WORK_CPU_UNBOUND work items queued to a bound workqueue always run > > > locally. This is a good thing normally, but not when the user has > > > > The constant name used there is a bit misleading but you can't put > > work items which are queued w/ queue_work() on foreign cpus by > > default. queue_work() has always guaranteed local execution. The > > problem is that workqueue can't currently tell whether a queue_work() > > user expects cpu locality for correctness or optimization. It'd be > > great if we introduce queue_work_on_local() or sth and replace > > correctness ones with it but that involves auditing each and every > > queue_work() usage. If anybody is up for the task, I'd be happy to > > help. > > Oh well. That puts a big dent in the utility of wq_unbound_cpumask, but > too bad. If someone has a big enough HPC itch, they'll scratch it. Ew, looking at the numbers, they may prefer to either a) pretend to not notice, or b) scurry off to HPC'R'US store if a) won't fly ;-) -Mike ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs 2015-07-22 15:19 ` Mike Galbraith @ 2015-07-22 15:25 ` Tejun Heo 2015-07-24 3:38 ` Mike Galbraith 0 siblings, 1 reply; 19+ messages in thread From: Tejun Heo @ 2015-07-22 15:25 UTC (permalink / raw) To: Mike Galbraith Cc: Lai Jiangshan, Frederic Weisbecker, Daniel Bristot de Oliveira, LKML, Rik van Riel, Luis Claudio R. Goncalves Hello, On Wed, Jul 22, 2015 at 05:19:00PM +0200, Mike Galbraith wrote: > Ew, looking at the numbers, they may prefer to either a) pretend to not > notice, or b) scurry off to HPC'R'US store if a) won't fly ;-) Yeah, there are a lot of them. The sad part is that only very few of them would actually need local binding for correctness. :( -- tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs 2015-07-22 15:25 ` Tejun Heo @ 2015-07-24 3:38 ` Mike Galbraith 0 siblings, 0 replies; 19+ messages in thread From: Mike Galbraith @ 2015-07-24 3:38 UTC (permalink / raw) To: Tejun Heo Cc: Lai Jiangshan, Frederic Weisbecker, Daniel Bristot de Oliveira, LKML, Rik van Riel, Luis Claudio R. Goncalves On Wed, 2015-07-22 at 11:25 -0400, Tejun Heo wrote: > Hello, > > On Wed, Jul 22, 2015 at 05:19:00PM +0200, Mike Galbraith wrote: > > Ew, looking at the numbers, they may prefer to either a) pretend to not > > notice, or b) scurry off to HPC'R'US store if a) won't fly ;-) > > Yeah, there are a lot of them. The sad part is that only very few of > them would actually need local binding for correctness. :( I'm going to end this thread with an under my breath mutter. Before I do that, note that I went and beat up three different boxen over days before turning trivial little patchlet loose, ie I was quite paranoid despite the below, "No way Jose" was not entirely unexpected. Mutter: WORK_CPU_UNBOUND is about as far from a local execution guarantee as it gets, and not only is it prominently displayed for all to see... /** * queue_work - queue work on a workqueue * @wq: workqueue to use * @work: work to queue * * Returns %false if @work was already on a queue, %true otherwise. * * We queue the work to the CPU on which it was submitted, but if the CPU dies * it can be processed by another CPU. */ static inline bool queue_work(struct workqueue_struct *wq, struct work_struct *work) { return queue_work_on(WORK_CPU_UNBOUND, wq, work); } ...there's nothing ambiguous about that. While it states that we queue locally, it also clearly states that local execution is NOT guaranteed, making assumption thereof a BUG, which in turn makes it seem a bit odd to tie fixing up nohz_full a little to auditing and fixing every booboo ever made (unlikely to ever happen). It would appear that the only way anyone could ever have sanely assume local execution, WORK_CPU_UNBOUND aside, is to have called get_online_cpus() before queueing any work. Mutter expressed, it's not a big hairy deal, anyone using an SMP kernel with an expectation of zero perturbation is doomed to be disappointed anyway, patchlet just reduced noise a bit. -Mike (mutter mutter mutter;) ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2015-07-24 3:39 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-07-16 19:16 [RFC] workqueue: avoiding unbounded wq on isolated CPUs by default Daniel Bristot de Oliveira 2015-07-16 19:24 ` Tejun Heo 2015-07-17 4:26 ` Mike Galbraith 2015-07-17 15:27 ` Tejun Heo 2015-07-17 15:35 ` Frederic Weisbecker 2015-07-17 15:43 ` Tejun Heo 2015-07-17 17:15 ` Mike Galbraith 2015-07-18 13:36 ` Frederic Weisbecker 2015-07-18 15:48 ` Mike Galbraith 2015-07-19 8:02 ` Mike Galbraith 2015-07-19 12:19 ` Mike Galbraith 2015-07-21 8:55 ` Mike Galbraith 2015-07-22 5:24 ` [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs Mike Galbraith 2015-07-22 12:43 ` [PATCH] workqueue: avoiding unbounded wq on isolated CPUs by default Daniel Bristot de Oliveira 2015-07-22 14:11 ` [patch] workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs Tejun Heo 2015-07-22 14:55 ` Mike Galbraith 2015-07-22 15:19 ` Mike Galbraith 2015-07-22 15:25 ` Tejun Heo 2015-07-24 3:38 ` Mike Galbraith
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.