public inbox for linux-kernel@vger.kernel.org
From: Tang Chen <tangchen@cn.fujitsu.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Don Morris <don.morris@hp.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tony Luck <tony.luck@intel.com>, Thomas Renninger <trenn@suse.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Tim Gardner <tim.gardner@canonical.com>,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	mingo@redhat.com, x86@kernel.org, a.p.zijlstra@chello.nl,
	jarkko.sakkinen@intel.com
Subject: Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Date: Wed, 27 Feb 2013 12:32:37 +0800
Message-ID: <512D8C65.50505@cn.fujitsu.com>
In-Reply-To: <CAE9FiQUCLGta4bmpP7j_L29SQuob+B=fWx5J+XyMq17Dmz0SeQ@mail.gmail.com>

On 02/27/2013 10:24 AM, Yinghai Lu wrote:
>>> After looked at the code more, thought that theory that does not let
>>> kernel use ram
>>> on hotplug area is not right.
>>>
>>> after that commit, following range can not use movable ram:
>>> 1. real_mode code.... well..funny, legacy cpu0 [0,1M) could be
>>> hot-removed?
>>> 2. dma_contiguous ?
>>> 3. log buff ring.
>>> 4. initrd... why it will be freed after booting, so it could be on
>>> movable...
>>> 5. crashkernel for kdump...: : looks like we can not put kdump kernel
>>> above 4G anymore
>>> 6. initmem_init: it will allocate page table to setup kernel mapping
>>> for memory..., it should
>>> be with BRK and near end of max_pfn....
>>
>>
>> AFAIK, the Linux kernel currently cannot migrate memory used by the kernel
>> itself. So any memory used by the kernel should not be in the movable area.
>
> that depends.
>
> initrd will be freed later, so it should be put anywhere that is under
> max_pfn during boot.
>

OK, but initrd is not that big. Actually, before my code starts to work,
memblock has already reserved some memory, but that is not much either.
On the other hand, it is not easy to find out which memory should be kept
in the unmovable area and which should not.

>>
>>
>>>
>>> If node is hotplugable, the mem related stuff like page table and
>>> vmemmap could be
>>> on the that node without problem and should be on that node.
>>
>>
>> page tables and vmemmap are kernel memory. They should not be movable, I
>> think.
>
> why do you need to migrate pagetable and vmemmap for the memory range
> that will be
> offline ?

Hum, you are right. :)

True, we can store the page tables and vmemmap on the hot-pluggable node itself.
But, just as with the page_cgroup structs, that needs additional work to handle.

The existing code does not do any special handling for this. I think
we can improve it if needed. :)

>
>>
>>
>>>
>>> assume first cpu only have 1G ram, and other 31 socket will have bunch of
>>> ram
>>> and those cpu with ram could be hotadd and hotremoved.
>>> Now you want to put page table and vmemmap on first node.
>>> The system would not boot as not enough memory for cover whole system RAM.
>>
>>
>> Yes, you are right. And a more extreme situation has been talked about by
>> HPA.
>>
>> "If all the memory is hot-pluggable, then the kernel won't be able to boot."
>>
>> So, please refer to commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb:
>>          acpi, memory-hotplug: support getting hotplug info from SRAT
>>
>> I have excluded all the memory reserved by memblock, and any node that has
>> memory
>> reserved by memblock will be set to un-hot-pluggable, which means we will
>> have
>> enough memory (all the memory on the node) to boot the kernel. So I think
>> the problem
>> you are talking about has been solved.
>
> I don't think that you understand the problem.
>
> for the system that will put all pagetable and vmemmap on the 1G ram
> of first cpu.
> as all other ram are MOVABLE, so memblock_find_in_range will not use any local
> ram on those nodes.
>

Yes, I know that. :)

In this case, the kernel will not be able to use local RAM on those nodes,
which will cost some performance.

What I mean is that if the 1G of RAM is not enough for the kernel to boot,
the current code will set all the RAM on the same node as un-hot-pluggable.

If all the RAM on that node is still not enough for the kernel to boot,
that is a really extreme situation, IIUC.
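
To illustrate the rule described above (a node that holds any memblock-reserved
memory is treated as not hot-pluggable), here is a rough sketch in Python. It is
not the kernel code: the function names, the dict layout and the address ranges
are all made up for illustration.

```python
def overlaps(a, b):
    """True if half-open ranges a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def hotpluggable_nodes(node_ranges, reserved):
    """Return the ids of nodes that can stay hot-pluggable.

    node_ranges: dict mapping node id -> list of (start, end) ranges (hypothetical layout)
    reserved:    list of (start, end) ranges already reserved at boot
    """
    keep = set()
    for nid, ranges in node_ranges.items():
        # One reserved range on the node is enough to pin the whole node.
        touched = any(overlaps(r, res) for r in ranges for res in reserved)
        if not touched:
            keep.add(nid)
    return keep


# Hypothetical two-node machine: node 0 has 1G, node 1 has 2G.
nodes = {0: [(0, 1 << 30)], 1: [(1 << 30, 3 << 30)]}
boot_reserved = [(0x100000, 0x200000)]  # made-up reservation on node 0
remaining = hotpluggable_nodes(nodes, boot_reserved)
```

With these made-up ranges only node 1 stays hot-pluggable, so all of node 0 is
available to the kernel at boot.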

I think users can solve this problem in two ways:
1) add more ram to the node.
2) use movablemem_map=nn[KMG]@ss[KMG] to configure more ram as unmovable.
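
As a rough illustration of the nn[KMG]@ss[KMG] format in 2), the sketch below
turns such a string into a byte range. It assumes nn is the size and ss is the
start address, as with memmap=. It is only a simplified stand-in for the
kernel's own parsing (decimal numbers only), and the function name is
hypothetical.

```python
import re

# Size suffixes as powers of two; an empty suffix means plain bytes.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
_PATTERN = re.compile(r"^(\d+)([KMG]?)@(\d+)([KMG]?)$")


def parse_movablemem_map(arg):
    """Parse 'nn[KMG]@ss[KMG]' into (start, end) in bytes, end exclusive.

    Assumption: nn is the size and ss is the start address.
    """
    m = _PATTERN.match(arg.strip())
    if m is None:
        raise ValueError(f"expected nn[KMG]@ss[KMG], got {arg!r}")
    size = int(m.group(1)) * _UNITS[m.group(2)]
    start = int(m.group(3)) * _UNITS[m.group(4)]
    return start, start + size


# Example value only: 4G of movable memory starting at the 8G boundary.
start, end = parse_movablemem_map("4G@8G")
```

Under that assumption, memory outside the ranges given this way stays
unmovable, so a smaller movable range leaves more RAM usable by the kernel.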


Thanks. :)


