From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <linux-kernel@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
"Gu, Zheng" <guz.fnst@cn.fujitsu.com>,
tangchen <tangchen@cn.fujitsu.com>
Subject: Re: [PATCH 1/4] workqueue:Fix unbound workqueue's node affinity detection
Date: Tue, 16 Dec 2014 13:30:32 +0800
Message-ID: <548FC378.6060809@cn.fujitsu.com>
In-Reply-To: <548EC29A.5080008@jp.fujitsu.com>
On 12/15/2014 07:14 PM, Kamezawa Hiroyuki wrote:
> Unbound wq pool's node attribute is calculated at its allocation.
> But it's now calculated based on possible cpu<->node information
> which can be wrong after cpu hotplug/unplug.
>
> If wrong pool->node is set, following allocation error will happen.
> ==
> SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
> cache: kmalloc-192, object size: 192, buffer size: 192, default order:
> 1, min order: 0
> node 0: slabs: 6172, objs: 259224, free: 245741
> node 1: slabs: 3261, objs: 136962, free: 127656
> ==
>
> This patch fixes the node detection by making use of online cpu info.
> Unlike cpumask, the best node can be calculated by degree of overlap
> between attr->cpumask and numanode->online_cpumask.
> This change doesn't corrupt original purpose of the old calculation.
>
> Note: it's expected that this function is called as
>   pool_detect_best_node
>   get_unbound_pool
>   alloc_unbound_pwq
>   wq_update_unbound_numa
>     called at CPU_ONLINE/CPU_DOWN_PREPARE
> and the latest online cpu info can be applied to a new wq pool,
> which replaces old one.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> kernel/workqueue.c | 38 ++++++++++++++++++++++++++------------
> 1 file changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 09b685d..7809154 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3440,6 +3440,31 @@ static void put_unbound_pool(struct worker_pool *pool)
> }
>
> /**
> + * pool_detect_best_node - detect a node which contains specified cpumask.
> + * Should be called with wq_pool_mutex held.
> + * Returns a online node where the most of given cpus are tied to.
> + */
> +static int pool_detect_best_node(const struct cpumask *cpumask)
> +{
> +	int node, best, match, selected = NUMA_NO_NODE;
> +	static struct cpumask andmask; /* under wq_pool_mutex */
> +
> +	if (!wq_numa_enabled ||
> +	    cpumask_subset(cpu_online_mask, cpumask))
> +		goto out;
> +	best = 0;
> +	/* select a node which contains the most number of cpu */
> +	for_each_node_state(node, N_ONLINE) {
> +		cpumask_and(&andmask, cpumask, cpumask_of_node(node));
> +		match = cpumask_weight(&andmask);
> +		if (match > best)
> +			selected = best;
> +	}
> +out:
> +	return selected;
> +}
This mixes a fix with new development. Why not just keep the original
calculation?

If the mask covers multiple nodes, NUMA_NO_NODE is the best value for
pool->node once the pool has been created. Memory allocation in
manage_workers() will then select the best node based on the CPU the
worker is actually running on.
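
To make the two policies concrete, here is a minimal user-space sketch, not kernel code. It models a cpumask as a 32-bit word and assumes a hypothetical two-node topology (`node_cpus`, `NR_NODES`, and both function names are invented for illustration). `node_by_subset()` mirrors the original calculation: a pool gets a node only when its whole mask fits inside that node, otherwise NUMA_NO_NODE. `node_by_overlap()` follows the idea of the quoted loop. As quoted, that loop assigns `selected = best` and never updates `best`, so the sketch assumes the intent was to record the node and the new maximum. The patch's early exit for masks covering all online CPUs is left out.

```c
#include <stdint.h>

#define NUMA_NO_NODE (-1)
#define NR_NODES 2

/* Hypothetical topology: node 0 owns CPUs 0-3, node 1 owns CPUs 4-7. */
static const uint32_t node_cpus[NR_NODES] = { 0x0fu, 0xf0u };

/* Count the set bits in a mask (stand-in for cpumask_weight()). */
static int mask_weight(uint32_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Original rule: a node is chosen only if the whole mask lies inside it. */
static int node_by_subset(uint32_t mask)
{
	for (int node = 0; node < NR_NODES; node++)
		if ((mask & ~node_cpus[node]) == 0)
			return node;
	return NUMA_NO_NODE;
}

/* Overlap rule: pick the node sharing the most CPUs with the mask. */
static int node_by_overlap(uint32_t mask)
{
	int best = 0, selected = NUMA_NO_NODE;

	for (int node = 0; node < NR_NODES; node++) {
		int match = mask_weight(mask & node_cpus[node]);

		if (match > best) {
			best = match;     /* remember the new maximum */
			selected = node;  /* and the node that produced it */
		}
	}
	return selected;
}
```

For a mask that spans both nodes, such as CPUs 2-4 (`0x1c`), the subset rule yields NUMA_NO_NODE and leaves node choice to the allocator, while the overlap rule pins the pool to node 0.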
> +
> +/**
> * get_unbound_pool - get a worker_pool with the specified attributes
> * @attrs: the attributes of the worker_pool to get
> *
> @@ -3457,7 +3482,6 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  {
>  	u32 hash = wqattrs_hash(attrs);
>  	struct worker_pool *pool;
> -	int node;
>
>  	lockdep_assert_held(&wq_pool_mutex);
>
> @@ -3482,17 +3506,7 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  	 * 'struct workqueue_attrs' comments for detail.
>  	 */
>  	pool->attrs->no_numa = false;
> -
> -	/* if cpumask is contained inside a NUMA node, we belong to that node */
> -	if (wq_numa_enabled) {
> -		for_each_node(node) {
> -			if (cpumask_subset(pool->attrs->cpumask,
> -					   wq_numa_possible_cpumask[node])) {
> -				pool->node = node;
> -				break;
> -			}
> -		}
> -	}
> +	pool->node = pool_detect_best_node(pool->attrs->cpumask);
>
>  	if (worker_pool_assign_id(pool) < 0)
>  		goto fail;
Thread overview: 55+ messages
2014-12-12 10:19 [PATCH 0/5] workqueue: fix bug when numa mapping is changed Lai Jiangshan
2014-12-12 10:19 ` [PATCH 1/5] workqueue: fix memory leak in wq_numa_init() Lai Jiangshan
2014-12-12 17:12 ` Tejun Heo
2014-12-15 5:25 ` Lai Jiangshan
2014-12-12 10:19 ` [PATCH 2/5] workqueue: update wq_numa_possible_cpumask Lai Jiangshan
2014-12-12 17:18 ` Tejun Heo
2014-12-15 2:02 ` Lai Jiangshan
2014-12-25 20:16 ` Tejun Heo
2014-12-18 2:22 ` Lai Jiangshan
2014-12-12 10:19 ` [PATCH 3/5] workqueue: fixup existing pool->node Lai Jiangshan
2014-12-12 17:25 ` Tejun Heo
2014-12-15 1:23 ` Lai Jiangshan
2014-12-25 20:14 ` Tejun Heo
2015-01-13 7:08 ` Lai Jiangshan
2015-01-13 15:24 ` Tejun Heo
2014-12-12 10:19 ` [PATCH 4/5] workqueue: update NUMA affinity for the node lost CPU Lai Jiangshan
2014-12-12 17:27 ` Tejun Heo
2014-12-15 1:28 ` Lai Jiangshan
2014-12-25 20:17 ` Tejun Heo
2014-12-12 10:19 ` [PATCH 5/5] workqueue: retry on NUMA_NO_NODE when create_worker() fails Lai Jiangshan
2014-12-12 16:05 ` KOSAKI Motohiro
2014-12-12 17:29 ` KOSAKI Motohiro
2014-12-12 17:29 ` Tejun Heo
2014-12-12 17:13 ` [PATCH 0/5] workqueue: fix bug when numa mapping is changed Yasuaki Ishimatsu
2014-12-15 1:34 ` Lai Jiangshan
2014-12-18 1:50 ` Yasuaki Ishimatsu
2014-12-13 16:27 ` [PATCH 0/4] workqueue: fix bug when numa mapping is changed v2 Kamezawa Hiroyuki
2014-12-13 16:30 ` [PATCH 1/4] workqueue: add a hook for node hotplug Kamezawa Hiroyuki
2014-12-13 16:33 ` [PATCH 2/4] workqueue: add warning if pool->node is offline Kamezawa Hiroyuki
2014-12-13 16:35 ` [PATCH 3/4] workqueue: remove per-node unbound pool when node goes offline Kamezawa Hiroyuki
2014-12-15 2:06 ` Lai Jiangshan
2014-12-15 2:06 ` Kamezawa Hiroyuki
2014-12-13 16:38 ` [PATCH 4/4] workqueue: handle change in cpu-node relationship Kamezawa Hiroyuki
2014-12-15 2:12 ` Lai Jiangshan
2014-12-15 2:20 ` Kamezawa Hiroyuki
2014-12-15 2:48 ` Lai Jiangshan
2014-12-15 2:55 ` Kamezawa Hiroyuki
2014-12-15 3:30 ` Lai Jiangshan
2014-12-15 3:34 ` Lai Jiangshan
2014-12-15 4:04 ` Kamezawa Hiroyuki
2014-12-15 5:19 ` Lai Jiangshan
2014-12-15 5:33 ` Kamezawa Hiroyuki
2014-12-15 11:11 ` [PATCH 0/4] workqueue: fix memory allocation after numa mapping is changed v3 Kamezawa Hiroyuki
2014-12-15 11:14 ` [PATCH 1/4] workqueue:Fix unbound workqueue's node affinity detection Kamezawa Hiroyuki
2014-12-16 5:30 ` Lai Jiangshan [this message]
2014-12-16 7:32 ` Kamezawa Hiroyuki
2014-12-16 7:54 ` Lai Jiangshan
2014-12-15 11:16 ` [PATCH 2/4] workqueue: update per-cpu workqueue's node affinity at,online-offline Kamezawa Hiroyuki
2014-12-16 5:32 ` Lai Jiangshan
2014-12-16 7:25 ` Kamezawa Hiroyuki
2014-12-15 11:18 ` [PATCH 3/4] workqueue: Update workqueue's possible cpumask when a new node, coming up Kamezawa Hiroyuki
2014-12-16 7:49 ` Lai Jiangshan
2014-12-16 8:10 ` Kamezawa Hiroyuki
2014-12-16 8:18 ` Kamezawa Hiroyuki
2014-12-15 11:22 ` [PATCH 4/4] workqueue: Handle cpu-node affinity change at CPU_ONLINE Kamezawa Hiroyuki