From: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
To: <dave@sr71.net>, Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@kernel.org>, <hpa@linux.intel.com>,
	<brice.goglin@gmail.com>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH 0/6] fix topology for multi-NUMA-node CPUs
Date: Mon, 22 Sep 2014 10:54:24 -0500
Message-ID: <54204630.4050104@amd.com>
In-Reply-To: <CAOjmkp8EGO0jicmdO=p6ATHz-hUJmWb+xoBLjOdLBUwwGzyhhg@mail.gmail.com>

On 9/22/2014 9:33 AM, Aravind Gopalakrishnan wrote:
>
> This is a big fat RFC.  It takes quite a few liberties with the
> multi-core topology level that I'm not completely comfortable
> with.
>
> It has only been tested lightly.
>
> Full dmesg for a Cluster-on-Die system with this set applied,
> and sched_debug on the command-line is here:
>
> http://sr71.net/~dave/intel/full-dmesg-hswep-20140917.txt
>
> ---
>
> I'm getting the spew below when booting with Haswell (Xeon
> E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled
> in the BIOS.  It seems similar to the issue that some folks from
> AMD ran into on their systems and addressed in this commit:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=161270fc1f9ddfc17154e0d49291472a9cdef7db
>
> Both these Intel and AMD systems break an assumption which is
> being enforced by topology_sane(): a socket may not contain more
> than one NUMA node.
>
> AMD special-cased their system by looking for a cpuid flag. The
> Intel mode depends on BIOS options, and I do not know of a way
> in which it is enumerated other than the tables being parsed
> during the CPU bringup process.
>
> This set also fixes sysfs.  Without it, CPUs with the same
> 'physical_package_id' in /sys/devices/system/cpu/cpu*/topology/
> are not listed together in the same 'core_siblings_list', which
> violates a statement from
> Documentation/ABI/testing/sysfs-devices-system-cpu:
>
>         core_siblings: internal kernel map of cpu#'s hardware threads
>         within the same physical_package_id.
>
>         core_siblings_list: human-readable list of the logical CPU
>         numbers within the same physical_package_id as cpu#.
>
> This sysfs inconsistency confuses the hwloc tool, which then
> concludes that there are more sockets than are physically
> present.
>
> Before this set, there are two packages:
>
> # cd /sys/devices/system/cpu/
> # cat cpu*/topology/physical_package_id | sort | uniq -c
>      18 0
>      18 1
>
> But 4 _sets_ of core siblings:
>
> # cat cpu*/topology/core_siblings_list | sort | uniq -c
>       9 0-8
>       9 18-26
>       9 27-35
>       9 9-17
>
> After this set, there are only 2 sets of core siblings, which
> is what we expect for a 2-socket system.
>
> # cat cpu*/topology/physical_package_id | sort | uniq -c
>      18 0
>      18 1
> # cat cpu*/topology/core_siblings_list | sort | uniq -c
>      18 0-17
>      18 18-35
>
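
The sysfs rule quoted above (every CPU's core_siblings_list should cover
exactly the CPUs sharing its physical_package_id) can also be checked
mechanically instead of reading the `uniq -c` output by eye. What follows is
only a minimal sketch in Python, not part of the patch set. The function names
`parse_cpu_list` and `find_sibling_mismatches` and the `root` parameter are
illustrative assumptions; the only things taken from the text above are the
sysfs directory and the two attribute file names.

```python
import glob
import os
from collections import defaultdict


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-8,18' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A bare number like '5' has no '-', so 'last' is empty.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def find_sibling_mismatches(root="/sys/devices/system/cpu"):
    """Return (cpu, package_id, siblings) for each CPU whose
    core_siblings_list differs from the set of CPUs that report the
    same physical_package_id. An empty list means the two agree."""
    package_of = {}
    siblings_of = {}
    pattern = os.path.join(root, "cpu[0-9]*", "topology",
                           "physical_package_id")
    for pkg_path in glob.glob(pattern):
        topo_dir = os.path.dirname(pkg_path)
        # Directory names look like 'cpu12'; strip the 'cpu' prefix.
        cpu = int(os.path.basename(os.path.dirname(topo_dir))[3:])
        with open(pkg_path) as f:
            package_of[cpu] = f.read().strip()
        with open(os.path.join(topo_dir, "core_siblings_list")) as f:
            siblings_of[cpu] = parse_cpu_list(f.read())

    # Group CPU numbers by the package id they report.
    members = defaultdict(set)
    for cpu, pkg in package_of.items():
        members[pkg].add(cpu)

    return [
        (cpu, package_of[cpu], sorted(siblings_of[cpu]))
        for cpu in sorted(package_of)
        if siblings_of[cpu] != members[package_of[cpu]]
    ]
```

Calling `find_sibling_mismatches()` with no arguments reads the live sysfs
tree. With the "before" layout shown above, one would expect every CPU to be
reported, since each lists only 9 siblings while its package holds 18; with
the "after" layout the list should be empty. The `root` parameter exists only
so the check can be pointed at a copied or synthetic directory tree.
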
>
> Example spew:
> ...
>         NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>          #2  #3  #4  #5  #6  #7  #8
>         .... node  #1, CPUs:    #9
>         ------------[ cut here ]------------
>         WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90()
>         sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
>         Modules linked in:
>         CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty #631
>         Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014
>         0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48
>         ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009
>         ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98
>         Call Trace:
>         [<ffffffff8172e485>] dump_stack+0x45/0x56
>         [<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0
>         [<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50
>         [<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90
>         [<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0
>         [<ffffffff8107568d>] start_secondary+0x1ad/0x240
>         ---[ end trace 3fe5f587a9fcde61 ]---
>         #10 #11 #12 #13 #14 #15 #16 #17
>         .... node  #2, CPUs:   #18 #19 #20 #21 #22 #23 #24 #25 #26
>         .... node  #3, CPUs:   #27 #28 #29 #30 #31 #32 #33 #34 #35


Hi,
I compared the topology info from sysfs with and without the patch
series, and the two are identical.
So, the patches seem to work fine on an AMD MCM part.
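
For anyone wanting to repeat this kind of with/without comparison, here is a
rough sketch in Python of one way to do it. It is not the procedure actually
used for the test above, just an illustration. The helper names
`read_topology` and `diff_topology` and the `root` parameter are hypothetical;
the sketch assumes only that each CPU exposes a `topology` directory under
/sys/devices/system/cpu/, as in the quoted commands.

```python
import glob
import os


def read_topology(root="/sys/devices/system/cpu"):
    """Return {cpu_name: {attribute: value}} for every cpuN/topology
    directory found under root."""
    snapshot = {}
    for topo_dir in glob.glob(os.path.join(root, "cpu[0-9]*", "topology")):
        cpu = os.path.basename(os.path.dirname(topo_dir))
        attrs = {}
        for name in sorted(os.listdir(topo_dir)):
            path = os.path.join(topo_dir, name)
            if os.path.isfile(path):
                with open(path) as f:
                    attrs[name] = f.read().strip()
        snapshot[cpu] = attrs
    return snapshot


def diff_topology(before, after):
    """Return a sorted list of (cpu, attribute, old, new) tuples, one
    for every value that differs between the two snapshots. A missing
    CPU or attribute shows up with None on the side where it is absent."""
    changes = []
    for cpu in sorted(set(before) | set(after)):
        old_attrs = before.get(cpu, {})
        new_attrs = after.get(cpu, {})
        for attr in sorted(set(old_attrs) | set(new_attrs)):
            old, new = old_attrs.get(attr), new_attrs.get(attr)
            if old != new:
                changes.append((cpu, attr, old, new))
    return changes
```

Since the two snapshots come from separate boots, the dictionary returned by
`read_topology()` would need to be saved in between, for example with
`json.dump`, and loaded again before calling `diff_topology()`. An empty
result corresponds to the "identical" outcome reported here.
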

Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>

Thanks,
-Aravind.
