public inbox for linux-kernel@vger.kernel.org
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Don Morris <don.morris@hp.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tony Luck <tony.luck@intel.com>, Thomas Renninger <trenn@suse.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Tim Gardner <tim.gardner@canonical.com>,
	<linux-kernel@vger.kernel.org>, <tglx@linutronix.de>,
	<mingo@redhat.com>, <x86@kernel.org>, <a.p.zijlstra@chello.nl>,
	<jarkko.sakkinen@intel.com>, <tangchen@cn.fujitsu.com>
Subject: Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Date: Wed, 27 Feb 2013 09:52:18 +0900
Message-ID: <512D58C2.1090403@jp.fujitsu.com>
In-Reply-To: <CAE9FiQU-CV4mF8x=Bi_4pmpmeZW-tSoes_f-jOt7yYrbrppo+A@mail.gmail.com>

2013/02/27 7:44, Yinghai Lu wrote:
> On Tue, Feb 26, 2013 at 1:36 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Mon, Feb 25, 2013 at 2:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Mon, Feb 25, 2013 at 1:27 PM, Don Morris <don.morris@hp.com> wrote:
>>>> On 02/25/2013 10:32 AM, Tim Gardner wrote:
>>>>> On 02/25/2013 08:02 AM, Tim Gardner wrote:
>>>>>> Is this an expected warning ? I'll boot a vanilla kernel just to be sure.
>>>>>>
>>>>>> rebased against ab7826595e9ec51a51f622c5fc91e2f59440481a in Linus' repo:
>>>>>>
>>>>>
>>>>> Same with a vanilla kernel, so it doesn't appear that any Ubuntu cruft
>>>>> is having an impact:
>>>>
>>>> Reproduced on a HP z620 workstation (E5-2620 instead of E5-2680, but
>>>> still Sandy Bridge, though I don't think that matters).
>>>>
>>>> Bisection leads to:
>>>> # bad: [e8d1955258091e4c92d5a975ebd7fd8a98f5d30f] acpi, memory-hotplug:
>>>> parse SRAT before memblock is ready
>>>>
>>>> Nothing terribly obvious leaps out as to *why* that reshuffling messes
>>>> up the cpu<-->node bindings, but I wanted to put this out there while
>>>> I poke around further. [Note that the SRAT: PXM -> APIC -> Node print
>>>> outs during boot are the same either way -- if you look at the APIC
>>>> numbers of the processors (from /proc/cpuinfo), the processors should
>>>> be assigned to the correct node, but they aren't.] cc'ing Tang Chen
>>>> in case this is obvious to him or he's already fixed it somewhere not
>>>> on Linus's tree yet.
>>>>
>>>> Don Morris
>>>>
>>>>>
>>>>> [    0.170435] ------------[ cut here ]------------
>>>>> [    0.170450] WARNING: at arch/x86/kernel/smpboot.c:324
>>>>> topology_sane.isra.2+0x71/0x84()
>>>>> [    0.170452] Hardware name: S2600CP
>>>>> [    0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same
>>>>> node! [node: 1 != 0]. Ignoring dependency.
>>>>> [    0.156000] smpboot: Booting Node   1, Processors  #1
>>>>> [    0.170455] Modules linked in:
>>>>> [    0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1
>>>>> [    0.170461] Call Trace:
>>>>> [    0.170466]  [<ffffffff810597bf>] warn_slowpath_common+0x7f/0xc0
>>>>> [    0.170473]  [<ffffffff810598b6>] warn_slowpath_fmt+0x46/0x50
>>>>> [    0.170477]  [<ffffffff816cc752>] topology_sane.isra.2+0x71/0x84
>>>>> [    0.170482]  [<ffffffff816cc9de>] set_cpu_sibling_map+0x23f/0x436
>>>>> [    0.170487]  [<ffffffff816ccd0c>] start_secondary+0x137/0x201
>>>>> [    0.170502] ---[ end trace 09222f596307ca1d ]---
>>>
>>> that commit is totally broken, and it should be reverted.
>>>
>>> 1. numa_init is called several times, NOT just for srat. so those
>>>     nodes_clear(numa_nodes_parsed)
>>>     memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>> can not be just removed.
>>> please consider sequence is: numaq, srat, amd, dummy.
>>> You need to make fall back path working!
>>>
>>> 2. simply split acpi_numa_init to early_parse_srat.
>>> a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>> b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
>>>       set_apicid_to_node(i, NUMA_NO_NODE)
>>> still left in numa_init. So it will just clear result from early_parse_srat.
>>> it should be moved before that....
>>
>>     c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
>> early before override from INITRD is settled.
>>
>>>
>>> 3. that patch TITLE is total misleading, there is NO x86 in the title,
>>> but it changes
>>> to x86 code.
>>>
>>> 4, it does not CC to TJ and other numa guys...
>
> After looked at the code more, thought that theory that does not let
> kernel use ram
> on hotplug area is not right.
>

> after that commit, following range can not use movable ram:
> 1. real_mode code.... well..funny, legacy cpu0 [0,1M) could be hot-removed?
> 2. dma_continguous ?
> 3. log buff ring.
> 4. initrd... why it will be freed after booting, so it could be on movable...
> 5. crashkernel for kdump...: : looks like we can not put kdump kernel
> above 4G anymore
> 6. initmem_init: it will allocate page table to setup kernel mapping
> for memory..., it should
> be with BRK and near end of max_pfn....

If you use "movablemem_map=srat", the memory listed above cannot be placed
in movable memory. But as I understand it, current Linux cannot migrate that
memory, so it should not be placed in movable memory anyway.

>
> If node is hotplugable, the mem related stuff like page table and
> vmemmap could be
> on the that node without problem and should be on that node.
>

> assume first cpu only have 1G ram, and other 31 socket will have bunch of ram
> and those cpu with ram could be hotadd and hotremoved.
> Now you want to put page table and vmemmap on first node.
> The system would not boot as not enough memory for cover whole system RAM.

Even if we solve the issues you mention above, the system cannot boot in
this configuration. In this case, the user should:
   o add RAM to the first CPU
   o decrease the amount of hotpluggable RAM by:
     - changing the hotpluggable information in the SRAT
     - using movablemem_map=nn[KMG]@ss[KMG]
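
For illustration only, here is a minimal userspace sketch of how an
"nn[KMG]@ss[KMG]" argument could be split into a size and a start address.
It assumes nn is the size and ss is the start of the range, as with other
nn@ss boot options. The helper names parse_size() and parse_movablemem_map()
are hypothetical and this is not the kernel's actual parsing code, which
relies on its own memparse() helper.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a number with an optional K, M or G suffix.
 * On return, *end points just past the consumed text, or at s if no
 * digits were found.
 */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	if (*end == s)
		return 0;	/* no digits: leave *end at s so the caller can tell */

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Hypothetical helper: parse "nn[KMG]@ss[KMG]" into a size and a start
 * address. Returns 0 on success, -1 on malformed input.
 */
static int parse_movablemem_map(const char *arg, uint64_t *size,
				uint64_t *start)
{
	const char *q;
	char *p;

	*size = parse_size(arg, &p);
	if (p == arg || *p != '@')
		return -1;

	q = p + 1;
	*start = parse_size(q, &p);
	if (p == q || *p != '\0')
		return -1;

	return 0;
}
```

With this sketch, "4G@8G" would describe a 4 GiB range starting at the
8 GiB boundary, while a string without '@' is rejected.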

Thanks,
Yasuaki Ishimatsu

>
> e8d1955258091e4c92d5a975ebd7fd8a98f5d30f and related commits should be just
> reverted now.
>
> Thanks
>
> Yinghai
>




