From: Jiang Liu <jiang.liu@linux.intel.com>
To: Tang Chen <tangchen@cn.fujitsu.com>, Tejun Heo <tj@kernel.org>
Cc: mingo@redhat.com, akpm@linux-foundation.org, rjw@rjwysocki.net,
	hpa@zytor.com, laijs@cn.fujitsu.com, yasu.isimatu@gmail.com,
	isimatu.yasuaki@jp.fujitsu.com, kamezawa.hiroyu@jp.fujitsu.com,
	izumi.taku@jp.fujitsu.com, gongzhaogang@inspur.com,
	qiaonuohan@cn.fujitsu.com, x86@kernel.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.
Date: Tue, 4 Aug 2015 16:05:47 +0800
Message-ID: <55C0725B.80201@linux.intel.com>
In-Reply-To: <55C03332.2030808@cn.fujitsu.com>

On 2015/8/4 11:36, Tang Chen wrote:
> Hi TJ,
> 
> Sorry for the late reply.
> 
> On 07/16/2015 05:48 AM, Tejun Heo wrote:
>> ......
>> so in initialization phase makes no sense any more. The best near online
>> node for each cpu should be cached somewhere.
>> I'm not really following.  Is this because the now offline node can
>> later come online and we'd have to break the constant mapping
>> invariant if we update the mapping later?  If so, it'd be nice to
>> spell that out.
> 
> Yes. Will document this in the next version.
> 
>>> ......
>>>   +int get_near_online_node(int node)
>>> +{
>>> +    return per_cpu(x86_cpu_to_near_online_node,
>>> +               cpumask_first(&node_to_cpuid_mask_map[node]));
>>> +}
>>> +EXPORT_SYMBOL(get_near_online_node);
>> Umm... this function is sitting on a fairly hot path and scanning a
>> cpumask each time.  Why not just build a numa node -> numa node array?
> 
> Indeed. Will avoid scanning the cpumask.
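
For illustration only, the "numa node -> numa node array" idea suggested
above could look roughly like the user-space sketch below. It is not the
patch's code: MAX_NUMNODES is shrunk to 8, and node_is_online[],
node_distance_tbl[], near_online_node[], rebuild_near_online_node(),
get_near_online_node_sketch() and demo_topology_init() are all hypothetical
names invented for this example. The point is only that the lookup on the
allocation path becomes a single array read, while the more expensive
nearest-node search moves to the rare rebuild step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_NUMNODES 8	/* assumption: tiny node count for the demo */

/* Hypothetical stand-ins for kernel state; not names from the patch. */
static bool node_is_online[MAX_NUMNODES];
static int node_distance_tbl[MAX_NUMNODES][MAX_NUMNODES];
static int near_online_node[MAX_NUMNODES];	/* node -> nearest online node */

/*
 * Slow path: recompute the whole table. In this model it would run at
 * init time and again whenever a node goes online or offline.
 */
static void rebuild_near_online_node(void)
{
	for (int n = 0; n < MAX_NUMNODES; n++) {
		int best = -1;

		for (int m = 0; m < MAX_NUMNODES; m++) {
			if (!node_is_online[m])
				continue;
			/* Strict '<' keeps the lowest-numbered node on a tie. */
			if (best < 0 ||
			    node_distance_tbl[n][m] < node_distance_tbl[n][best])
				best = m;
		}
		near_online_node[n] = best;	/* -1 if no node is online */
	}
}

/* Hot path: one bounds check and one array read, no cpumask scan. */
static int get_near_online_node_sketch(int node)
{
	assert(node >= 0 && node < MAX_NUMNODES);
	return near_online_node[node];
}

/* Demo topology (assumption): nodes in a line, only nodes 0 and 2 online. */
static void demo_topology_init(void)
{
	for (int n = 0; n < MAX_NUMNODES; n++) {
		node_is_online[n] = (n == 0 || n == 2);
		for (int m = 0; m < MAX_NUMNODES; m++)
			node_distance_tbl[n][m] = 10 + 10 * abs(n - m);
	}
}
```

Calling demo_topology_init() and then rebuild_near_online_node() fills the
table, after which get_near_online_node_sketch(3) resolves the offline
node 3 to the online node 2. The sketch deliberately leaves open the
synchronization question raised in this thread: a real implementation
would need to define how readers on the allocation path are protected
while the table is rebuilt during node online/offline.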
> 
>> ......
>>
>>>     static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>>                           unsigned int order)
>>>   {
>>> -    VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>>> +    VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>>> +
>>> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
>>> +    if (!node_online(nid))
>>> +        nid = get_near_online_node(nid);
>>> +#endif
>>>         return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>>   }
>> Ditto.  Also, what's the synchronization rules for NUMA node
>> on/offlining.  If you end up updating the mapping later, how would
>> that be synchronized against the above usages?
> 
> I think the near online node map should be updated when node online/offline
> happens. But about this, I think the current numa code has a little
> problem.
> 
> As you know, firmware info binds a set of CPUs and memory to a node. But
> at boot time, if the node has no memory (a memory-less node), it won't
> be online.
> But the CPUs on that node are available, and are bound to the near
> online node. (Here, I mean numa_set_node(cpu, node).)
> 
> Why does the kernel do this? I think it is to ensure that memory
> allocation succeeds when calling functions like alloc_pages_node() and
> alloc_pages_exact_node(). For these two functions to work, every CPU
> must be bound to a node that has memory.
> 
> That means, for a memory-less node at boot time, the CPUs on the node
> are online, but the node is not online.
> 
> That also means "the node is online" is equivalent to "the node has
> memory". Actually, a lot of code in the kernel relies on this rule.
> 
> 
> But,
> 1) cpu_up() will try to online a node, and it doesn't check whether
> the node has memory.
> 2) try_offline_node() offlines CPUs first, and then the memory.
> 
> This behavior looks a little weird, or let's say it is ambiguous. It
> seems that a NUMA node consists of CPUs and memory. So if the CPUs are
> online, the node should be online.
Hi Chen,
	I have posted a patch set to enable memoryless nodes on x86 and
will repost it for review. :) Hope it helps to solve this issue.
Thanks!
Gerry

> 
> And also,
> the main purpose of this patch-set is to make the cpuid <-> nodeid
> mapping persistent.
> After this patch-set, alloc_pages_node() and alloc_pages_exact_node()
> won't depend on the cpuid <-> nodeid mapping any more. So the node
> should be online if the CPUs on it are online. Otherwise, we cannot
> set up the interfaces of CPUs under /sys.
> 
> 
> Unfortunately, since I don't have a machine with a memory-less node, I
> cannot reproduce the problem right now.
> 
> How do you think the node online behavior should be changed?
> 
> Thanks.

