From: Yinghai Lu <yinghai@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@elte.hu>,
	tglx@linutronix.de, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH x86/mm UPDATED] x86-64, NUMA: Fix distance table handling
Date: Wed, 02 Mar 2011 13:36:22 -0800
Message-ID: <4D6EB856.1010004@kernel.org>
In-Reply-To: <4D6EB2C3.7040704@kernel.org>

On 03/02/2011 01:12 PM, Yinghai Lu wrote:
> On 03/02/2011 07:42 AM, Tejun Heo wrote:
>> Hey,
>>
>> On Wed, Mar 02, 2011 at 06:30:59AM -0800, David Rientjes wrote:
>>> Acked-by: David Rientjes <rientjes@google.com>
>>>
>>> There's also this in numa_emulation() that isn't a safe assumption:
>>>
>>>         /* make sure all emulated nodes are mapped to a physical node */
>>>         for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
>>>                 if (emu_nid_to_phys[i] == NUMA_NO_NODE)
>>>                         emu_nid_to_phys[i] = 0;
>>>
>>> Node id 0 is not always online depending on how you set up your SRAT.  I'm
>>> not sure why emu_nid_to_phys[] would ever map a fake node id that doesn't
>>> exist to a physical node id rather than NUMA_NO_NODE, so I think it can
>>> just be removed.  Otherwise, it should be mapped to a physical node id
>>> that is known to be online.
>>
>> Unless I screwed up, that behavior isn't new.  It was just put in a
>> different form.  Looking through the code... Okay, I think node 0
>> always exists.  SRAT PXM isn't used as node number directly.  It goes
>> through acpi_map_pxm_to_node() which allocates nids from 0 up.
>> amdtopology also guarantees the existence of node 0, so I think we're
>> safe, and that is probably the reason why we had the above
>> behavior in the first place.
>>
>> IIRC, there are other places which assume the existence of node 0.
>> Whether it's a good idea or not, I'm not sure, but requiring node 0 to
>> be always allocated doesn't sound too wrong to me.  Maybe we can add
>> BUG_ON() if node 0 is offline somewhere.
> 
> 
> When the first socket does not have memory, node 0 will not be online,
> and cpu_to_node() will map those cpus to a nearby node such as node 1 or node 7.
> 
> BTW: this configuration has been broken and fixed several times.

David,

It looks like NUMA emulation already does not support that configuration.

old code:
void __cpuinit numa_add_cpu(int cpu)
{
        unsigned long addr;
        u16 apicid;
        int physnid;
        int nid = NUMA_NO_NODE;

        apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
        if (apicid != BAD_APICID)
                nid = apicid_to_node[apicid];
        if (nid == NUMA_NO_NODE)
                nid = early_cpu_to_node(cpu);
        BUG_ON(nid == NUMA_NO_NODE || !node_online(nid));


current code:
void __cpuinit numa_add_cpu(int cpu)
{
        int physnid, nid;

        nid = numa_cpu_node(cpu);
        if (nid == NUMA_NO_NODE)
                nid = early_cpu_to_node(cpu);
        BUG_ON(nid == NUMA_NO_NODE || !node_online(nid));

        physnid = emu_nid_to_phys[nid];

        /*
         * Map the cpu to each emulated node that is allocated on the physical
         * node of the cpu's apic id.
         */
        for_each_online_node(nid)
                if (emu_nid_to_phys[nid] == physnid)
                        cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
}


Please note that numa_cpu_node(), like the old code, can return node 0 as the nid even when node 0 has no memory and is not online, so the BUG_ON() would trigger.

Maybe we can just change it to nid = cpu_to_node() to get a node id that is online.

Thanks

Yinghai
