From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752913AbbCaHoD (ORCPT <rfc822;w@1wt.eu>);
	Tue, 31 Mar 2015 03:44:03 -0400
Received: from cn.fujitsu.com ([59.151.112.132]:20588 "EHLO
	heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org
	with ESMTP id S1751566AbbCaHn6 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 31 Mar 2015 03:43:58 -0400
X-IronPort-AV: E=Sophos;i="5.04,848,1406563200"; 
   d="scan'208";a="89945073"
Message-ID: <551A50DC.6060904@cn.fujitsu.com>
Date: Tue, 31 Mar 2015 15:46:36 +0800
From: Lai Jiangshan <laijs@cn.fujitsu.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Tejun Heo <tj@kernel.org>
CC: <linux-kernel@vger.kernel.org>, Christoph Lameter <cl@linux.com>,
        Kevin Hilman <khilman@linaro.org>,
        Mike Galbraith <bitbucket@online.de>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        Viresh Kumar <viresh.kumar@linaro.org>,
        Frederic Weisbecker <fweisbec@gmail.com>
Subject: Re: [PATCH 4/4 V5] workqueue: Allow modifying low level unbound workqueue
 cpumask
References: <1426136412-7594-1-git-send-email-laijs@cn.fujitsu.com> <1426653617-3240-1-git-send-email-laijs@cn.fujitsu.com> <1426653617-3240-5-git-send-email-laijs@cn.fujitsu.com> <20150324173120.GI3880@htj.duckdns.org>
In-Reply-To: <20150324173120.GI3880@htj.duckdns.org>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.167.226.103]
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/25/2015 01:31 AM, Tejun Heo wrote:
> On Wed, Mar 18, 2015 at 12:40:17PM +0800, Lai Jiangshan wrote:
>> Ordered workqueues are not yet restricted by the low-level unbound workqueue
>> cpumask; this will be handled in the near future.
> 
> Ugh, right, ordered workqueues are tricky.  Maybe we should change how
> ordered workqueues are implemented.  Just gate work items at the
> workqueue layer instead of fiddling with max_active and the number of
> pwqs.
> 
>>  static struct wq_unbound_install_ctx *
>>  wq_unbound_install_ctx_prepare(struct workqueue_struct *wq,
>> -			       const struct workqueue_attrs *attrs)
>> +			       const struct workqueue_attrs *attrs,
>> +			       cpumask_var_t unbound_cpumask)
>>  {
> ...
>>  	/* make a copy of @attrs and sanitize it */
>>  	copy_workqueue_attrs(new_attrs, attrs);
>> -	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
>> +	copy_workqueue_attrs(pwq_attrs, attrs);
>> +	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);
>> +	cpumask_and(pwq_attrs->cpumask, pwq_attrs->cpumask, unbound_cpumask);
> 
> Hmmm... we weren't checking whether the intersection becomes null
> before.

Do you mean the following (unquoted) code, "cpumask_empty(pwq_attrs->cpumask)"?

It is explained in the changelog and the comments.
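As a minimal userspace sketch of the intended behavior (an illustration under stated assumptions, not the patch itself): the pool mask is the user's mask restricted to the global unbound cpumask, and when that intersection is empty it is assumed to fall back to the user's mask restricted to cpu_possible_mask. Masks are modeled as one bit per CPU in an unsigned long; cpumask_bits_t and dfl_pool_mask() are hypothetical names, not kernel APIs.

```c
#include <assert.h>

/* Hypothetical model: one bit per CPU. Not kernel code. */
typedef unsigned long cpumask_bits_t;

/*
 * Mask the default pwq's pool would use, assuming the patch falls back to
 * new_attrs when pwq_attrs ends up empty.
 */
static cpumask_bits_t dfl_pool_mask(cpumask_bits_t user_mask,
                                    cpumask_bits_t possible,
                                    cpumask_bits_t unbound)
{
    cpumask_bits_t new_attrs = user_mask & possible; /* saved as wq->unbound_attrs */
    cpumask_bits_t pwq_attrs = user_mask & unbound;  /* preferred mask for the pools */

    if (pwq_attrs == 0)        /* the cpumask_empty(pwq_attrs->cpumask) case */
        return new_attrs;
    return pwq_attrs;
}
```

For example, a user mask of 4-9 (0x3F0) with cpu_possible_mask 0-7 (0xFF) and wq_unbound_cpumask 0-3 (0x0F) yields 0xF0, i.e. CPUs 4-7.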

>  Why are we doing it now?  Note that this doesn't really make
> things water-tight as cpu on/offlining can still leave the mask w/o
> any online cpus.  Shouldn't we just let the scheduler handle it as
> before?

Do you mean "cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);"?

new_attrs will be copied to wq->unbound_attrs, so we want it to be sanitized.
The code before this patchset did the same thing.

It may also be used for the default pwq, where it can reduce pool creation:
	cpu_possible_mask = 0-7
	wq_unbound_cpumask = 0-3
	user1 tries to set wq1:	attrs->cpumask = 4-9
	user2 tries to set wq2:	attrs->cpumask = 4-11
Neither mask intersects wq_unbound_cpumask, so the default pwq of each falls
back to new_attrs, and wq1 and wq2 share the same pool (pool cpumask = 4-7).
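The arithmetic in this example can be checked with a small sketch (one bit per CPU; cpu_range() and sanitize() are hypothetical helpers, not kernel functions). It shows that both requests miss wq_unbound_cpumask entirely and that sanitizing against cpu_possible_mask maps both to CPUs 4-7. That is why one pool can serve both default pwqs, assuming the default pwq falls back to new_attrs when the intersection is empty.

```c
#include <assert.h>

/* Build a mask with CPUs lo..hi (inclusive) set; assumes hi fits in an unsigned long. */
static unsigned long cpu_range(unsigned int lo, unsigned int hi)
{
    unsigned long mask = 0;

    for (unsigned int cpu = lo; cpu <= hi; cpu++)
        mask |= 1UL << cpu;
    return mask;
}

/* Models cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask). */
static unsigned long sanitize(unsigned long attrs_mask, unsigned long possible)
{
    return attrs_mask & possible;
}
```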
	

> 
>> @@ -3712,6 +3726,9 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
>>  	 * wq's, the default pwq should be used.
>>  	 */
>>  	if (wq_calc_node_cpumask(wq->unbound_attrs, node, cpu_off, cpumask)) {
>> +		cpumask_and(cpumask, cpumask, wq_unbound_cpumask);
>> +		if (cpumask_empty(cpumask))
>> +			goto use_dfl_pwq;
> 
> So, this special handling is necessary only because we did special in
> the above for dfl_pwq.  Why do we need these?

wq->unbound_attrs holds the attrs set by the user; its cpumask is not
restricted by wq_unbound_cpumask, so we need these cpumask_and() calls.
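A minimal sketch of the per-node decision in wq_update_unbound_numa() after this hunk, modeled in userspace with one bit per CPU. node_pwq_mask() is a hypothetical name, and the first step is only a simplified stand-in for wq_calc_node_cpumask(), which additionally handles the CPU going down and the case where the node mask equals the wq mask.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true and stores the per-node pwq mask in *out, or returns false
 * when the default pwq should be used instead.
 */
static bool node_pwq_mask(unsigned long user_mask, /* wq->unbound_attrs->cpumask */
                          unsigned long node_cpus, /* CPUs of this NUMA node */
                          unsigned long unbound,   /* wq_unbound_cpumask */
                          unsigned long *out)
{
    /* Simplified stand-in for wq_calc_node_cpumask(). */
    unsigned long mask = user_mask & node_cpus;

    if (mask == 0)
        return false;

    /* The user's mask is not limited by the global mask, so apply it here. */
    mask &= unbound;
    if (mask == 0)          /* the new "goto use_dfl_pwq" case */
        return false;

    *out = mask;
    return true;
}
```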

Another question:
Why isn't wq->unbound_attrs' cpumask restricted by wq_unbound_cpumask?

Because I want wq->unbound_attrs to always match the user's last setting,
regardless of how many times wq_unbound_cpumask is changed.
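To illustrate why the user's setting is kept separate, here is a hypothetical model (struct wq_model and wq_model_apply_global() are illustrative names, not kernel code): the requested mask is stored untouched and only the effective mask is recomputed whenever the global mask changes. The fallback to the user's mask on an empty intersection is an assumption for the sketch.

```c
#include <assert.h>

struct wq_model {
    unsigned long user_mask;      /* like wq->unbound_attrs->cpumask, never narrowed */
    unsigned long effective_mask; /* what the pwqs end up using */
};

/* Recompute the effective mask from the untouched user mask. */
static void wq_model_apply_global(struct wq_model *wq, unsigned long global)
{
    unsigned long eff = wq->user_mask & global;

    /* Assumed fallback: use the user's mask when the intersection is empty. */
    wq->effective_mask = eff ? eff : wq->user_mask;
}
```

If the user mask were instead narrowed in place (user_mask &= global), shrinking the global mask to 0x30 and then widening it back to 0xFF would leave the wq stuck at 0x30 rather than returning to the user's 0xF0.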

> 
>> +static int unbounds_cpumask_apply(cpumask_var_t cpumask)
>> +{
> ..
>> +	list_for_each_entry_safe(ctx, n, &ctxs, list) {
>> +		if (ret >= 0)
> 
> Let's do !ret.
> 
>> +			wq_unbound_install_ctx_commit(ctx);
>> +		wq_unbound_install_ctx_free(ctx);
>> +	}
> ...
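A hedged sketch of the prepare-all / commit-all pattern that unbounds_cpumask_apply() follows, with the "!ret" check suggested above. Everything here is a hypothetical userspace model: struct install_ctx, apply_all() and the fail_at parameter (used to simulate an allocation failure) are illustrative, and the real code frees each ctx individually rather than one array.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct install_ctx {
    unsigned long *target;   /* where the new mask would be installed */
    unsigned long new_mask;  /* mask computed during prepare */
};

/* fail_at: index whose prepare fails (simulated -ENOMEM); pass n for none. */
static int apply_all(unsigned long *masks, size_t n, unsigned long global,
                     size_t fail_at)
{
    struct install_ctx *ctxs = calloc(n ? n : 1, sizeof(*ctxs));
    size_t prepared = 0;
    int ret = 0;

    if (!ctxs)
        return -ENOMEM;

    /* Phase 1: prepare every context without touching live state. */
    for (size_t i = 0; i < n; i++) {
        if (i == fail_at) {
            ret = -ENOMEM;
            break;
        }
        ctxs[i].target = &masks[i];
        ctxs[i].new_mask = masks[i] & global;
        prepared++;
    }

    /* Phase 2: commit only if every prepare succeeded. */
    for (size_t i = 0; i < prepared; i++) {
        if (!ret)
            *ctxs[i].target = ctxs[i].new_mask;
    }

    /* Release the contexts whether or not anything was committed. */
    free(ctxs);
    return ret;
}
```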
>> +/**
>> + *  workqueue_unbounds_cpumask_set - Set the low-level unbound cpumask
>> + *  @cpumask: the cpumask to set
>> + *
>> + *  The low-level workqueue cpumask is a global cpumask that limits
>> + *  the affinity of all unbound workqueues.  This function checks @cpumask,
>> + *  applies it to all unbound workqueues and updates all of their pwqs.
>> + *  If everything succeeds, it saves @cpumask to the global low-level
>> + *  unbound cpumask.
>> + *
>> + *  Return:	0	- Success
>> + *  		-EINVAL	- No online cpu in the @cpumask
>> + *  		-ENOMEM	- Failed to allocate memory for attrs or pwqs.
>> + */
>> +int workqueue_unbounds_cpumask_set(cpumask_var_t cpumask)
>> +{
>> +	int ret = -EINVAL;
>> +
>> +	get_online_cpus();
>> +	cpumask_and(cpumask, cpumask, cpu_possible_mask);
>> +	if (cpumask_intersects(cpumask, cpu_online_mask)) {
> 
> Does this make sense?  We can't prevent cpus going down right after
> the mask is set.  What's the point of preventing empty config if we
> can't prevent transitions into it and have to handle it anyway?

It is like set_cpus_allowed_ptr(): the cpumask must be valid when it is set,
even though it may later end up with no online CPUs.

This code originated from Frederic; maybe he has a stronger reason.
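A minimal sketch of the set-time validation being discussed, modeled with one bit per CPU (check_unbound_mask() is a hypothetical name). It only mirrors the two quoted lines: restrict the mask to possible CPUs, then reject it with -EINVAL if no online CPU remains. As with set_cpus_allowed_ptr(), this says nothing about CPUs going offline afterwards; the real function also holds get_online_cpus() around the check.

```c
#include <assert.h>
#include <errno.h>

static int check_unbound_mask(unsigned long *mask, unsigned long possible,
                              unsigned long online)
{
    *mask &= possible;          /* cpumask_and(cpumask, cpumask, cpu_possible_mask) */
    if (!(*mask & online))      /* !cpumask_intersects(cpumask, cpu_online_mask) */
        return -EINVAL;
    return 0;                   /* the caller would go on to apply the mask */
}
```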

> 
>> +static ssize_t unbounds_cpumask_store(struct device *dev,
>> +				      struct device_attribute *attr,
>> +				      const char *buf, size_t count)
> 
> Naming is too confusing.  Please pick a name which clearly
> distinguishes per-wq and global masking.

What about these names?
wq_unbound_cpumask ==> wq_unbound_global_cpumask
workqueue_unbounds_cpumask_set() ==> workqueue_set_unbound_global_cpumask(). (public API)
unbounds_cpumask_store() ==> wq_store_unbound_global_cpumask()   (static function for sysfs)

> 
> Thanks.
>