From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Don Morris <don.morris@hp.com>, "H. Peter Anvin" <hpa@zytor.com>,
Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Tony Luck <tony.luck@intel.com>, Thomas Renninger <trenn@suse.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Tim Gardner <tim.gardner@canonical.com>,
<linux-kernel@vger.kernel.org>, <tglx@linutronix.de>,
<mingo@redhat.com>, <x86@kernel.org>, <a.p.zijlstra@chello.nl>,
<jarkko.sakkinen@intel.com>, <tangchen@cn.fujitsu.com>
Subject: Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Date: Wed, 27 Feb 2013 09:52:18 +0900
Message-ID: <512D58C2.1090403@jp.fujitsu.com>
In-Reply-To: <CAE9FiQU-CV4mF8x=Bi_4pmpmeZW-tSoes_f-jOt7yYrbrppo+A@mail.gmail.com>
2013/02/27 7:44, Yinghai Lu wrote:
> On Tue, Feb 26, 2013 at 1:36 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Mon, Feb 25, 2013 at 2:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Mon, Feb 25, 2013 at 1:27 PM, Don Morris <don.morris@hp.com> wrote:
>>>> On 02/25/2013 10:32 AM, Tim Gardner wrote:
>>>>> On 02/25/2013 08:02 AM, Tim Gardner wrote:
>>>>>> Is this an expected warning ? I'll boot a vanilla kernel just to be sure.
>>>>>>
>>>>>> rebased against ab7826595e9ec51a51f622c5fc91e2f59440481a in Linus' repo:
>>>>>>
>>>>>
>>>>> Same with a vanilla kernel, so it doesn't appear that any Ubuntu cruft
>>>>> is having an impact:
>>>>
>>>> Reproduced on a HP z620 workstation (E5-2620 instead of E5-2680, but
>>>> still Sandy Bridge, though I don't think that matters).
>>>>
>>>> Bisection leads to:
>>>> # bad: [e8d1955258091e4c92d5a975ebd7fd8a98f5d30f] acpi, memory-hotplug:
>>>> parse SRAT before memblock is ready
>>>>
>>>> Nothing terribly obvious leaps out as to *why* that reshuffling messes
>>>> up the cpu<-->node bindings, but I wanted to put this out there while
>>>> I poke around further. [Note that the SRAT: PXM -> APIC -> Node print
>>>> outs during boot are the same either way -- if you look at the APIC
>>>> numbers of the processors (from /proc/cpuinfo), the processors should
>>>> be assigned to the correct node, but they aren't.] cc'ing Tang Chen
>>>> in case this is obvious to him or he's already fixed it somewhere not
>>>> on Linus's tree yet.
>>>>
>>>> Don Morris
>>>>
>>>>>
>>>>> [ 0.170435] ------------[ cut here ]------------
>>>>> [ 0.170450] WARNING: at arch/x86/kernel/smpboot.c:324
>>>>> topology_sane.isra.2+0x71/0x84()
>>>>> [ 0.170452] Hardware name: S2600CP
>>>>> [ 0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same
>>>>> node! [node: 1 != 0]. Ignoring dependency.
>>>>> [ 0.156000] smpboot: Booting Node 1, Processors #1
>>>>> [ 0.170455] Modules linked in:
>>>>> [ 0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1
>>>>> [ 0.170461] Call Trace:
>>>>> [ 0.170466] [<ffffffff810597bf>] warn_slowpath_common+0x7f/0xc0
>>>>> [ 0.170473] [<ffffffff810598b6>] warn_slowpath_fmt+0x46/0x50
>>>>> [ 0.170477] [<ffffffff816cc752>] topology_sane.isra.2+0x71/0x84
>>>>> [ 0.170482] [<ffffffff816cc9de>] set_cpu_sibling_map+0x23f/0x436
>>>>> [ 0.170487] [<ffffffff816ccd0c>] start_secondary+0x137/0x201
>>>>> [ 0.170502] ---[ end trace 09222f596307ca1d ]---
>>>
>>> that commit is totally broken, and it should be reverted.
>>>
>>> 1. numa_init is called several times, NOT just for srat. so those
>>> nodes_clear(numa_nodes_parsed)
>>> memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>> can not be just removed.
>>> please consider sequence is: numaq, srat, amd, dummy.
>>> You need to make fall back path working!
>>>
>>> 2. simply split acpi_numa_init to early_parse_srat.
>>> a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>> b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>>> set_apicid_to_node(i, NUMA_NO_NODE)
>>> still left in numa_init. So it will just clear result from early_parse_srat.
>>> it should be moved before that....
>>
>> c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
>> early before override from INITRD is settled.
>>
>>>
>>> 3. that patch TITLE is total misleading, there is NO x86 in the title,
>>> but it changes
>>> to x86 code.
>>>
>>> 4, it does not CC to TJ and other numa guys...
>
> After looked at the code more, thought that theory that does not let
> kernel use ram
> on hotplug area is not right.
>
> after that commit, following range can not use movable ram:
> 1. real_mode code.... well..funny, legacy cpu0 [0,1M) could be hot-removed?
> 2. dma_continguous ?
> 3. log buff ring.
> 4. initrd... why it will be freed after booting, so it could be on movable...
> 5. crashkernel for kdump...: : looks like we can not put kdump kernel
> above 4G anymore
> 6. initmem_init: it will allocate page table to setup kernel mapping
> for memory..., it should
> be with BRK and near end of max_pfn....
If you use "movablemem_map=srat", the memory listed above cannot be placed in
movable memory. But as I understand it, current Linux cannot migrate that
memory, so it should not be placed in movable memory in the first place.
>
> If node is hotplugable, the mem related stuff like page table and
> vmemmap could be
> on the that node without problem and should be on that node.
>
> assume first cpu only have 1G ram, and other 31 socket will have bunch of ram
> and those cpu with ram could be hotadd and hotremoved.
> Now you want to put page table and vmemmap on first node.
> The system would not boot as not enough memory for cover whole system RAM.
Even if we address the points you raise above, such a system cannot boot.
In this case, the user should:
o add RAM to the first CPU
o decrease the amount of hotpluggable RAM by:
- changing the hotpluggable information in SRAT
- using movablemem_map=nn[KMG]@ss[KMG]
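For illustration only, here is a minimal userspace sketch of how a value in the nn[KMG]@ss[KMG] form could be split into a size and a start address. It assumes nn is the size and ss the start address, as in the memmap= convention. The names parse_size(), parse_movablemem_map() and struct movable_range are hypothetical and are not taken from the patch; in the kernel, memparse() would do the suffix handling.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical range descriptor: start address and size, both in bytes. */
struct movable_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Userspace stand-in for the kernel's memparse(): parse a number with an
 * optional K/M/G suffix and report where parsing stopped via *end.
 */
static uint64_t parse_size(const char *s, const char **end)
{
	char *p;
	uint64_t v = strtoull(s, &p, 0);

	switch (*p) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		p++;	/* consume the suffix character */
		break;
	default:
		break;
	}
	*end = p;
	return v;
}

/*
 * Parse "nn[KMG]@ss[KMG]" (assumed: size@start).
 * Returns 0 on success, -1 if the string is malformed.
 */
static int parse_movablemem_map(const char *arg, struct movable_range *r)
{
	const char *p, *q;
	uint64_t size, start;

	size = parse_size(arg, &p);
	if (p == arg || *p != '@')
		return -1;	/* no size given, or '@' missing */

	q = p + 1;
	start = parse_size(q, &p);
	if (p == q || *p != '\0')
		return -1;	/* no start given, or trailing junk */

	r->start = start;
	r->size = size;
	return 0;
}
```

Under these assumptions, "8G@4G" would describe an 8 GiB range starting at the 4 GiB boundary.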
Thanks,
Yasuaki Ishimatsu
>
> e8d1955258091e4c92d5a975ebd7fd8a98f5d30f and related commits should be just
> reverted now.
>
> Thanks
>
> Yinghai