From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Christoph Lameter <cl@linux.com>,
Kevin Hilman <khilman@linaro.org>,
Mike Galbraith <bitbucket@online.de>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Tejun Heo <tj@kernel.org>, Viresh Kumar <viresh.kumar@linaro.org>
Subject: Re: [PATCH 4/4] workqueue: Include ordered workqueues in anon workqueue sysfs interface
Date: Mon, 31 Mar 2014 21:15:26 +0800 [thread overview]
Message-ID: <53396A6E.4030607@cn.fujitsu.com> (raw)
In-Reply-To: <533964AE.9070503@cn.fujitsu.com>
On 03/31/2014 08:50 PM, Lai Jiangshan wrote:
> On 03/28/2014 01:21 AM, Frederic Weisbecker wrote:
>> Changing the cpu affinity of workqueues through the sysfs interface
>> is done with apply_workqueue_attrs() by replacing the old pwqs
>> of a workqueue with new ones tied to worker pools that are affine to the
>> new cpumask.
>>
>> We can't do that with ordered workqueues however because the serialization
>> of their works is enforced by keeping a single worker running. Replacing
>> it with a new pool of single worker may open a small race window where
>> two workers could run concurrently during the pwq replacement. Eventually
>> this can break the ordering guarantee.
>>
>> So ordered workqueues get a special treatment here. Since they run a
>> single pwq with a single pool containing a single worker that all
>> ordered workqueue should share, we simply change this worker affinity
>> to the new cpumask.
>>
>> Also just in case this behaviour change a bit in the future and some
>> ordered workqueues have their private worker, lets iterate the affinity
>> change over all ordered workqueue pools. This way it works in both cases:
>> whether a single worker is shared among all ordered workqueue pools or
>> some of them run their private one.
>>
>> Suggested-by: Tejun Heo <tj@kernel.org>
>> Cc: Christoph Lameter <cl@linux.com>
>> Cc: Kevin Hilman <khilman@linaro.org>
>> Cc: Mike Galbraith <bitbucket@online.de>
>> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
>> ---
>> kernel/workqueue.c | 68 ++++++++++++++++++++++++++++++++++++++++++++----------
>> 1 file changed, 56 insertions(+), 12 deletions(-)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 2863c39..18807c7 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -3330,10 +3330,60 @@ static ssize_t wq_anon_cpumask_show(struct device *dev,
>> return written;
>> }
>>
>> -/* Must be called with wq_unbound_mutex held */
>> -static int wq_anon_cpumask_set(cpumask_var_t cpumask)
>> +static int wq_ordered_cpumask_set(struct workqueue_struct *wq, cpumask_var_t cpumask)
>> +{
>> + struct pool_workqueue *pwq;
>> + struct worker_pool *pool;
>> + struct worker *worker;
>> + int ret;
>> + int wi;
>> +
>> + mutex_lock(&wq_pool_mutex);
>> + pwq = list_first_entry(&wq->pwqs, typeof(*pwq), pwqs_node);
>> + pool = pwq->pool;
>> +
>> + mutex_lock(&pool->manager_mutex);
>> + for_each_pool_worker(worker, wi, pool) {
>> + /* CHECKME: Should we hold pool->lock here? */
>
> Don't need to hold pool->lock which is a spinlock.
> pool->manager_mutex is enough to apply cpumask setting
> (but we only do it in cpuhotplug callbacks now).
>
>> + ret = set_cpus_allowed_ptr(worker->task, cpumask);
>
> no.
>
> even the pwq(of ordered wq) shares the pool with other unbound pwq.
> we shouldn't change the worker's attr.
Sorry, I'm wrong.
Tejun had told there is only one default worker pool for ordered workqueues.
It is true. But this pool may share with other non-ordered workqueues which
maybe have WQ_SYSFS. we should split this pool into two pools.
one for ordered wqs, one for the rest.
>
> Although we can change the ordered wq to use dedicated pool.
> But if we do so, I would recommend to remove the ordered wq from workqueue.c totally
> and ask the callers use queue_kthread_work() instead.
>
> (dedicated or any)pool requires two workers at least, and requires much
> more other resource.
>
>> + if (ret)
>> + break;
>> + }
>> + if (!ret) {
>> + cpumask_copy(pool->attrs->cpumask, cpumask);
>
> pool->attr should be constant (except initialization/destruction).
>
>> + }
>> + mutex_unlock(&pool->manager_mutex);
>> +
>> + if (!ret) {
>> + mutex_lock(&wq->mutex);
>> + cpumask_copy(wq->unbound_attrs->cpumask, cpumask);
>> + mutex_unlock(&wq->mutex);
>> + }
>> +
>> + mutex_unlock(&wq_pool_mutex);
>> +
>> + return ret;
>> +}
>> +
>> +static int wq_anon_cpumask_set(struct workqueue_struct *wq, cpumask_var_t cpumask)
>> {
>> struct workqueue_attrs *attrs;
>> + int ret;
>> +
>> + attrs = wq_sysfs_prep_attrs(wq);
>> + if (!attrs)
>> + return -ENOMEM;
>> +
>> + cpumask_copy(attrs->cpumask, cpumask);
>> + ret = apply_workqueue_attrs(wq, attrs);
>> + free_workqueue_attrs(attrs);
>> +
>> + return ret;
>> +}
>> +
>> +/* Must be called with wq_unbound_mutex held */
>> +static int wq_anon_cpumask_set_all(cpumask_var_t cpumask)
>> +{
>> struct workqueue_struct *wq;
>> int ret;
>>
>> @@ -3343,15 +3393,9 @@ static int wq_anon_cpumask_set(cpumask_var_t cpumask)
>> continue;
>> /* Ordered workqueues need specific treatment */
>> if (wq->flags & __WQ_ORDERED)
>> - continue;
>> -
>> - attrs = wq_sysfs_prep_attrs(wq);
>> - if (!attrs)
>> - return -ENOMEM;
>> -
>> - cpumask_copy(attrs->cpumask, cpumask);
>> - ret = apply_workqueue_attrs(wq, attrs);
>> - free_workqueue_attrs(attrs);
>> + ret = wq_ordered_cpumask_set(wq, cpumask);
After the pool is split. wq_ordered_cpumask_set() should only be called once.
once is enough for all ordered wqs.
>> + else
>> + ret = wq_anon_cpumask_set(wq, cpumask);
>> if (ret)
>> break;
>> }
>> @@ -3376,7 +3420,7 @@ static ssize_t wq_anon_cpumask_store(struct device *dev,
>> get_online_cpus();
>> if (cpumask_intersects(cpumask, cpu_online_mask)) {
>> mutex_lock(&wq_unbound_mutex);
>> - ret = wq_anon_cpumask_set(cpumask);
>> + ret = wq_anon_cpumask_set_all(cpumask);
>> if (!ret)
>> cpumask_copy(&wq_anon_cpumask, cpumask);
>> mutex_unlock(&wq_unbound_mutex);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
next prev parent reply other threads:[~2014-03-31 13:12 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-27 17:20 [PATCH 0/4] workqueue: Control cpu affinity of all unbound workqueues v2 Frederic Weisbecker
2014-03-27 17:20 ` [PATCH 1/4] workqueue: Move workqueue bus attr to device attribute Frederic Weisbecker
2014-04-03 7:09 ` Viresh Kumar
2014-04-24 13:48 ` Frederic Weisbecker
2014-03-27 17:21 ` [PATCH 2/4] workqueues: Account unbound workqueue in a seperate list Frederic Weisbecker
2014-03-30 12:57 ` Tejun Heo
2014-04-03 14:48 ` Frederic Weisbecker
2014-04-03 15:01 ` Tejun Heo
2014-04-03 15:43 ` Frederic Weisbecker
2014-03-27 17:21 ` [PATCH 3/4] workqueue: Add anon workqueue sysfs hierarchy Frederic Weisbecker
2014-03-30 13:01 ` Tejun Heo
2014-04-03 14:42 ` Frederic Weisbecker
2014-04-03 14:58 ` Tejun Heo
2014-04-03 15:05 ` Frederic Weisbecker
2014-04-03 7:07 ` Viresh Kumar
2014-03-27 17:21 ` [PATCH 4/4] workqueue: Include ordered workqueues in anon workqueue sysfs interface Frederic Weisbecker
2014-03-31 12:50 ` Lai Jiangshan
2014-03-31 13:15 ` Lai Jiangshan [this message]
2014-04-03 15:59 ` Frederic Weisbecker
2014-04-15 9:58 ` [PATCH] workqueue: allow changing attributions of ordered workqueue Lai Jiangshan
2014-04-15 12:25 ` Frederic Weisbecker
2014-04-15 15:19 ` Lai Jiangshan
2014-04-23 0:04 ` Frederic Weisbecker
2014-04-23 2:16 ` Lai Jiangshan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53396A6E.4030607@cn.fujitsu.com \
--to=laijs@cn.fujitsu.com \
--cc=bitbucket@online.de \
--cc=cl@linux.com \
--cc=fweisbec@gmail.com \
--cc=khilman@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=tj@kernel.org \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox