From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: benh@kernel.crashing.org, Joonsoo Kim <iamjoonsoo.kim@lge.com>,
David Rientjes <rientjes@google.com>,
Wanpeng Li <liwanp@linux.vnet.ibm.com>,
Jiang Liu <jiang.liu@linux.intel.com>,
Tony Luck <tony.luck@intel.com>,
Fenghua Yu <fenghua.yu@intel.com>,
linux-ia64@vger.kernel.org, linux-mm@kvack.org,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
Tejun Heo <tj@kernel.org>
Subject: Re: [RFC 1/2] workqueue: use the nearest NUMA node, not the local one
Date: Fri, 18 Jul 2014 08:11:20 +0000
Message-ID: <53C8D6A8.3040400@cn.fujitsu.com>
In-Reply-To: <20140717230958.GB32660@linux.vnet.ibm.com>

Hi,

I'm curious about what happens on alloc_pages_node(memoryless_node).

If the memory is allocated from the most preferable node for @memoryless_node,
why do we need to bother using cpu_to_mem() at the call site?

If not, why does the memory allocation subsystem refuse to find a preferable
node for @memoryless_node in this case? Is that intentional, or are there
cases where it cannot find one?

Thanks,
Lai

Added CC to Tejun (workqueue maintainer).

On 07/18/2014 07:09 AM, Nishanth Aravamudan wrote:
> In the presence of memoryless nodes, the workqueue code incorrectly uses
> cpu_to_node() to determine what node to prefer memory allocations come
> from. cpu_to_mem() should be used instead, which will use the nearest
> NUMA node with memory.
>
> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 35974ac..0bba022 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3547,7 +3547,12 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  		for_each_node(node) {
>  			if (cpumask_subset(pool->attrs->cpumask,
>  					   wq_numa_possible_cpumask[node])) {
> -				pool->node = node;
> +				/*
> +				 * We could use local_memory_node(node) here,
> +				 * but it is expensive and the following caches
> +				 * the same value.
> +				 */
> +				pool->node = cpu_to_mem(cpumask_first(pool->attrs->cpumask));
>  				break;
>  			}
>  		}
> @@ -4921,7 +4926,7 @@ static int __init init_workqueues(void)
>  			pool->cpu = cpu;
>  			cpumask_copy(pool->attrs->cpumask, cpumask_of(cpu));
>  			pool->attrs->nice = std_nice[i++];
> -			pool->node = cpu_to_node(cpu);
> +			pool->node = cpu_to_mem(cpu);
>
>  			/* alloc pool ID */
>  			mutex_lock(&wq_pool_mutex);
>
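
For readers following the thread, the distinction the patch description draws between cpu_to_node() and cpu_to_mem() can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the model_* functions, the topology tables, and the distance values are all hypothetical, and the real kernel may cache or compute these values differently. It deliberately does not model what alloc_pages_node() does when handed a memoryless node, which is the open question above.

```c
#include <stddef.h>

#define MODEL_NR_NODES 3
#define MODEL_NR_CPUS  5

/* Hypothetical topology: node 1 has CPUs but no memory. */
static const int model_node_has_memory[MODEL_NR_NODES] = { 1, 0, 1 };

/* Hypothetical CPU placement: CPUs 2 and 3 sit on the memoryless node. */
static const int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, 1, 2 };

/* Hypothetical symmetric distance table (smaller means nearer). */
static const int model_distance[MODEL_NR_NODES][MODEL_NR_NODES] = {
	{ 10, 20, 40 },
	{ 20, 10, 30 },
	{ 40, 30, 10 },
};

/* Model of cpu_to_node(): the node the CPU physically belongs to. */
static int model_cpu_to_node(int cpu)
{
	return model_cpu_node[cpu];
}

/*
 * Model of local_memory_node(): the node itself if it has memory,
 * otherwise the nearest node that does. Returns -1 if no node has memory.
 */
static int model_local_memory_node(int node)
{
	int best = -1;
	int n;

	if (model_node_has_memory[node])
		return node;

	for (n = 0; n < MODEL_NR_NODES; n++) {
		if (!model_node_has_memory[n])
			continue;
		if (best < 0 || model_distance[node][n] < model_distance[node][best])
			best = n;
	}
	return best;
}

/* Model of cpu_to_mem(): the nearest node with memory for this CPU. */
static int model_cpu_to_mem(int cpu)
{
	return model_local_memory_node(model_cpu_to_node(cpu));
}
```

In this model, a CPU on node 1 reports node 1 from model_cpu_to_node() but node 0 from model_cpu_to_mem(), which is the substitution the patch makes for pool->node.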