Re: [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Tejun Heo <tj@kernel.org>
To: Tang Chen <tangchen@cn.fujitsu.com>
Cc: mingo@redhat.com, akpm@linux-foundation.org, rjw@rjwysocki.net,
	hpa@zytor.com, laijs@cn.fujitsu.com, yasu.isimatu@gmail.com,
	isimatu.yasuaki@jp.fujitsu.com, kamezawa.hiroyu@jp.fujitsu.com,
	izumi.taku@jp.fujitsu.com, gongzhaogang@inspur.com,
	qiaonuohan@cn.fujitsu.com, x86@kernel.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Gu Zheng <guz.fnst@cn.fujitsu.com>
Subject: Re: [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.
Date: Wed, 15 Jul 2015 17:48:02 -0400	[thread overview]
Message-ID: <20150715214802.GL15934@mtj.duckdns.org> (raw)
In-Reply-To: <1436261425-29881-2-git-send-email-tangchen@cn.fujitsu.com>

Hello,

On Tue, Jul 07, 2015 at 05:30:21PM +0800, Tang Chen wrote:
...
> Why doing this is to prevent memory allocation failure if the cpu is

"The reason for doing this ..."

> online but there is no memory on that node.
> 
> But since cpuid <-> nodeid mapping will fix after this patch-set, doing

"But since cpuid <-> nodeid mapping is planned to be made static, ..."

> so in initialization pharse makes no sense any more. The best near online
> node for each cpu should be cached somewhere.

I'm not really following.  Is this because the now offline node can
later come online and we'd have to break the constant mapping
invariant if we update the mapping later?  If so, it'd be nice to
spell that out.

>  void numa_set_node(int cpu, int node)
>  {
>  	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
> @@ -95,7 +121,11 @@ void numa_set_node(int cpu, int node)
>  		return;
>  	}
>  #endif
> +
> +	per_cpu(x86_cpu_to_near_online_node, cpu) =
> +			find_near_online_node(numa_cpu_node(cpu));
>  	per_cpu(x86_cpu_to_node_map, cpu) = node;
> +	cpumask_set_cpu(cpu, &node_to_cpuid_mask_map[numa_cpu_node(cpu)]);
>  
>  	set_cpu_numa_node(cpu, node);
>  }
> @@ -105,6 +135,13 @@ void numa_clear_node(int cpu)
>  	numa_set_node(cpu, NUMA_NO_NODE);
>  }
>  
> +int get_near_online_node(int node)
> +{
> +	return per_cpu(x86_cpu_to_near_online_node,
> +		       cpumask_first(&node_to_cpuid_mask_map[node]));
> +}
> +EXPORT_SYMBOL(get_near_online_node);

Umm... this function is sitting on a fairly hot path and scanning a
cpumask each time.  Why not just build a numa node -> numa node array?

> @@ -702,24 +739,6 @@ void __init x86_numa_init(void)
>  	numa_init(dummy_numa_init);
>  }
>  
> -static __init int find_near_online_node(int node)
> -{
> -	int n, val;
> -	int min_val = INT_MAX;
> -	int best_node = -1;
> -
> -	for_each_online_node(n) {
> -		val = node_distance(node, n);
> -
> -		if (val < min_val) {
> -			min_val = val;
> -			best_node = n;
> -		}
> -	}
> -
> -	return best_node;
> -}

It's usually better to not mix code movement with actual changes.

> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 6ba7cf2..4a18b21 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -307,13 +307,23 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  	if (nid < 0)
>  		nid = numa_node_id();
>  
> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
> +	if (!node_online(nid))
> +		nid = get_near_online_node(nid);
> +#endif

Can you please introduce a wrapper function to do the above so that we
don't open code ifdefs?

> +
>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>  }
>  
>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> +
> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
> +	if (!node_online(nid))
> +		nid = get_near_online_node(nid);
> +#endif
>  
>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>  }

Ditto.  Also, what's the synchronization rules for NUMA node
on/offlining.  If you end up updating the mapping later, how would
that be synchronized against the above usages?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2015-07-15 21:48 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-07  9:30 [PATCH 0/5] Make cpuid <-> nodeid mapping persistent Tang Chen
2015-07-07  9:30 ` [PATCH 1/5] x86, gfp: Cache best near node for memory allocation Tang Chen
2015-07-15 21:48   ` Tejun Heo [this message]
2015-08-04  3:36     ` Tang Chen
2015-08-04  8:05       ` Jiang Liu
2015-08-04  8:24         ` Tang Chen
2015-08-09  6:15         ` Tang Chen
2015-08-12  1:53           ` Jiang Liu
2015-08-04  8:26       ` gongzhaogang
2015-08-04  8:53         ` Tang Chen
2015-08-04  8:58         ` Tang Chen
2015-07-07  9:30 ` [PATCH 2/5] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time Tang Chen
2015-07-07  9:30 ` [PATCH 3/5] x86, acpi, cpu-hotplug: Introduce apicid_to_cpuid[] array to store persistent cpuid <-> apicid mapping Tang Chen
2015-07-07 11:14   ` Mika Penttilä
2015-07-15  3:33     ` Tang Chen
2015-07-15  5:35       ` Jiang Liu
2015-07-15  6:26         ` Tang Chen
2015-07-15 22:02   ` Tejun Heo
2015-07-07  9:30 ` [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Tang Chen
2015-07-15 22:06   ` Tejun Heo
2015-07-07  9:30 ` [PATCH 5/5] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting Tang Chen
2015-07-15 22:13 ` [PATCH 0/5] Make cpuid <-> nodeid mapping persistent Tejun Heo
2015-07-23  4:44   ` Tang Chen
2015-07-23 18:32     ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150715214802.GL15934@mtj.duckdns.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=gongzhaogang@inspur.com \
    --cc=guz.fnst@cn.fujitsu.com \
    --cc=hpa@zytor.com \
    --cc=isimatu.yasuaki@jp.fujitsu.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=qiaonuohan@cn.fujitsu.com \
    --cc=rjw@rjwysocki.net \
    --cc=tangchen@cn.fujitsu.com \
    --cc=x86@kernel.org \
    --cc=yasu.isimatu@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).