From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Don Morris <don.morris@hp.com>, "H. Peter Anvin" <hpa@zytor.com>,
Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Tony Luck <tony.luck@intel.com>, Thomas Renninger <trenn@suse.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Tim Gardner <tim.gardner@canonical.com>,
<linux-kernel@vger.kernel.org>, <tglx@linutronix.de>,
<mingo@redhat.com>, <x86@kernel.org>, <a.p.zijlstra@chello.nl>,
<jarkko.sakkinen@intel.com>, <tangchen@cn.fujitsu.com>
Subject: Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Date: Wed, 27 Feb 2013 09:52:18 +0900
Message-ID: <512D58C2.1090403@jp.fujitsu.com>
In-Reply-To: <CAE9FiQU-CV4mF8x=Bi_4pmpmeZW-tSoes_f-jOt7yYrbrppo+A@mail.gmail.com>
2013/02/27 7:44, Yinghai Lu wrote:
> On Tue, Feb 26, 2013 at 1:36 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Mon, Feb 25, 2013 at 2:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Mon, Feb 25, 2013 at 1:27 PM, Don Morris <don.morris@hp.com> wrote:
>>>> On 02/25/2013 10:32 AM, Tim Gardner wrote:
>>>>> On 02/25/2013 08:02 AM, Tim Gardner wrote:
>>>>>> Is this an expected warning ? I'll boot a vanilla kernel just to be sure.
>>>>>>
>>>>>> rebased against ab7826595e9ec51a51f622c5fc91e2f59440481a in Linus' repo:
>>>>>>
>>>>>
>>>>> Same with a vanilla kernel, so it doesn't appear that any Ubuntu cruft
>>>>> is having an impact:
>>>>
>>>> Reproduced on a HP z620 workstation (E5-2620 instead of E5-2680, but
>>>> still Sandy Bridge, though I don't think that matters).
>>>>
>>>> Bisection leads to:
>>>> # bad: [e8d1955258091e4c92d5a975ebd7fd8a98f5d30f] acpi, memory-hotplug:
>>>> parse SRAT before memblock is ready
>>>>
>>>> Nothing terribly obvious leaps out as to *why* that reshuffling messes
>>>> up the cpu<-->node bindings, but I wanted to put this out there while
>>>> I poke around further. [Note that the SRAT: PXM -> APIC -> Node print
>>>> outs during boot are the same either way -- if you look at the APIC
>>>> numbers of the processors (from /proc/cpuinfo), the processors should
>>>> be assigned to the correct node, but they aren't.] cc'ing Tang Chen
>>>> in case this is obvious to him or he's already fixed it somewhere not
>>>> on Linus's tree yet.
>>>>
>>>> Don Morris
>>>>
>>>>>
>>>>> [ 0.170435] ------------[ cut here ]------------
>>>>> [ 0.170450] WARNING: at arch/x86/kernel/smpboot.c:324
>>>>> topology_sane.isra.2+0x71/0x84()
>>>>> [ 0.170452] Hardware name: S2600CP
>>>>> [ 0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same
>>>>> node! [node: 1 != 0]. Ignoring dependency.
>>>>> [ 0.156000] smpboot: Booting Node 1, Processors #1
>>>>> [ 0.170455] Modules linked in:
>>>>> [ 0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1
>>>>> [ 0.170461] Call Trace:
>>>>> [ 0.170466] [<ffffffff810597bf>] warn_slowpath_common+0x7f/0xc0
>>>>> [ 0.170473] [<ffffffff810598b6>] warn_slowpath_fmt+0x46/0x50
>>>>> [ 0.170477] [<ffffffff816cc752>] topology_sane.isra.2+0x71/0x84
>>>>> [ 0.170482] [<ffffffff816cc9de>] set_cpu_sibling_map+0x23f/0x436
>>>>> [ 0.170487] [<ffffffff816ccd0c>] start_secondary+0x137/0x201
>>>>> [ 0.170502] ---[ end trace 09222f596307ca1d ]---
>>>
>>> that commit is totally broken, and it should be reverted.
>>>
>>> 1. numa_init is called several times, NOT just for srat. so those
>>> nodes_clear(numa_nodes_parsed)
>>> memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>> can not be just removed.
>>> please consider sequence is: numaq, srat, amd, dummy.
>>> You need to make fall back path working!
>>>
>>> 2. simply split acpi_numa_init to early_parse_srat.
>>> a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>> b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>>> set_apicid_to_node(i, NUMA_NO_NODE)
>>> still left in numa_init. So it will just clear result from early_parse_srat.
>>> it should be moved before that....
>>
>> c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
>> early before override from INITRD is settled.
>>
>>>
>>> 3. that patch TITLE is total misleading, there is NO x86 in the title,
>>> but it changes
>>> to x86 code.
>>>
>>> 4, it does not CC to TJ and other numa guys...
>
> After looked at the code more, thought that theory that does not let
> kernel use ram
> on hotplug area is not right.
>
> after that commit, following range can not use movable ram:
> 1. real_mode code.... well..funny, legacy cpu0 [0,1M) could be hot-removed?
> 2. dma_continguous ?
> 3. log buff ring.
> 4. initrd... why it will be freed after booting, so it could be on movable...
> 5. crashkernel for kdump...: : looks like we can not put kdump kernel
> above 4G anymore
> 6. initmem_init: it will allocate page table to setup kernel mapping
> for memory..., it should
> be with BRK and near end of max_pfn....
If you use "movablemem_map=srat", the allocations listed above cannot be
placed in movable memory. But in my understanding, current Linux cannot
migrate those allocations, so they should not be placed in movable memory
in the first place.
>
> If node is hotplugable, the mem related stuff like page table and
> vmemmap could be
> on the that node without problem and should be on that node.
>
> assume first cpu only have 1G ram, and other 31 socket will have bunch of ram
> and those cpu with ram could be hotadd and hotremoved.
> Now you want to put page table and vmemmap on first node.
> The system would not boot as not enough memory for cover whole system RAM.
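A rough sketch of the arithmetic behind the scenario quoted above. It
assumes 4 KiB base pages and a 64-byte struct page; both are typical for
x86_64 but are assumptions here, not numbers taken from this thread, and
the function names are hypothetical. Under those assumptions, vmemmap
alone costs 1/64 of the RAM it describes, so 1G on the first node can
describe at most 64G of RAM, before page tables and the kernel itself
are counted.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES   4096ULL /* assumed base page size */
#define STRUCT_PAGE_BYTES 64ULL   /* assumed sizeof(struct page) */

/* Bytes of vmemmap needed to describe ram_bytes of RAM. */
uint64_t vmemmap_bytes(uint64_t ram_bytes)
{
	return ram_bytes / PAGE_SIZE_BYTES * STRUCT_PAGE_BYTES;
}

/*
 * Largest amount of RAM whose vmemmap fits in node0_bytes,
 * ignoring every other user of memory on that node.
 */
uint64_t max_ram_covered(uint64_t node0_bytes)
{
	return node0_bytes / STRUCT_PAGE_BYTES * PAGE_SIZE_BYTES;
}
```
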
Even if we address the points you raised above, such a system cannot boot.
In this case, the user should:
o add RAM to the first CPU
o decrease the hotpluggable RAM by:
- changing the hotpluggable information in the SRAT
- using movablemem_map=nn[KMG]@ss[KMG]
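For illustration, a minimal userspace sketch of how an argument of the
form nn[KMG]@ss[KMG] can be split into two byte values. It assumes nn is
the size and ss is the start address, as with memmap=; this is not the
kernel's parser, the function names are hypothetical, and overflow from
the suffix shift is not checked.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "<number>[KMG]" and advance *s past it. */
int parse_size(const char **s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	/* Reject empty input, signs and leading whitespace. */
	if (!isdigit((unsigned char)**s))
		return -1;

	errno = 0;
	v = strtoull(*s, &end, 0);
	if (errno)
		return -1;

	/* Each suffix multiplies by a further 1024. */
	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}

	*out = v;
	*s = end;
	return 0;
}

/* Hypothetical helper: parse "nn[KMG]@ss[KMG]" into size and start. */
int parse_movablemem_map(const char *arg, uint64_t *size, uint64_t *start)
{
	if (parse_size(&arg, size) || *arg != '@')
		return -1;
	arg++;
	if (parse_size(&arg, start) || *arg != '\0')
		return -1;
	return 0;
}
```

With this sketch, "4G@8G" yields a size of 4 GiB starting at 8 GiB, while
"srat" is rejected and would have to be handled as a separate keyword.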
Thanks,
Yasuaki Ishimatsu
>
> e8d1955258091e4c92d5a975ebd7fd8a98f5d30f and related commits should be just
> reverted now.
>
> Thanks
>
> Yinghai