All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yinghai Lu <yinghai@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@elte.hu>,
	tglx@linutronix.de, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH x86/mm UPDATED] x86-64, NUMA: Fix distance table handling
Date: Wed, 02 Mar 2011 13:36:22 -0800	[thread overview]
Message-ID: <4D6EB856.1010004@kernel.org> (raw)
In-Reply-To: <4D6EB2C3.7040704@kernel.org>

On 03/02/2011 01:12 PM, Yinghai Lu wrote:
> On 03/02/2011 07:42 AM, Tejun Heo wrote:
>> Hey,
>>
>> On Wed, Mar 02, 2011 at 06:30:59AM -0800, David Rientjes wrote:
>>> Acked-by: David Rientjes <rientjes@google.com>
>>>
>>> There's also this in numa_emulation() that isn't a safe assumption:
>>>
>>>         /* make sure all emulated nodes are mapped to a physical node */
>>>         for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
>>>                 if (emu_nid_to_phys[i] == NUMA_NO_NODE)
>>>                         emu_nid_to_phys[i] = 0;
>>>
>>> Node id 0 is not always online depending on how you setup your SRAT.  I'm 
>>> not sure why emu_nid_to_phys[] would ever map a fake node id that doesn't 
>>> exist to a physical node id rather than NUMA_NO_NODE, so I think it can 
>>> just be removed.  Otherwise, it should be mapped to a physical node id 
>>> that is known to be online.
>>
>> Unless I screwed up, that behavior isn't new.  It just put in a
>> different form.  Looking through the code... Okay, I think node 0
>> always exists.  SRAT PXM isn't used as node number directly.  It goes
>> through acpi_map_pxm_to_node() which allocates nids from 0 up.
>> amdtopology also guarantees the existence of node 0, so I think we're
>> in the safe and that probably is the reason why we had the above
>> behavior in the first place.
>>
>> IIRC, there are other places which assume the existence of node 0.
>> Whether it's a good idea or not, I'm not sure but requring node 0 to
>> be always allocated doesn't sound too wrong to me.  Maybe we can add
>> BUG_ON() if node 0 is offline somewhere.
> 
> 
> When first socket does not have memory, we will not node 0 online.
> and cpu_to_node() will have those cpus round to near node like node1 or node7.
> 
> BTW: this conf get broken several times, and get fixed several times.

david,

it looks like numa emu does not support that conf already.

old code:
void __cpuinit numa_add_cpu(int cpu)
{
        unsigned long addr;
        u16 apicid;
        int physnid;
        int nid = NUMA_NO_NODE;

        apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
        if (apicid != BAD_APICID)
                nid = apicid_to_node[apicid];
        if (nid == NUMA_NO_NODE)
                nid = early_cpu_to_node(cpu);
        BUG_ON(nid == NUMA_NO_NODE || !node_online(nid));


current code:
void __cpuinit numa_add_cpu(int cpu)
{
        int physnid, nid;

        nid = numa_cpu_node(cpu);
        if (nid == NUMA_NO_NODE)
                nid = early_cpu_to_node(cpu);
        BUG_ON(nid == NUMA_NO_NODE || !node_online(nid));

        physnid = emu_nid_to_phys[nid];

        /*
         * Map the cpu to each emulated node that is allocated on the physical
         * node of the cpu's apic id.
         */
        for_each_online_node(nid)
                if (emu_nid_to_phys[nid] == physnid)
                        cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
}


please note numa_cpu_node or old code will return nid that is node 0, and even node0 does not mem and not onlined.

maybe we can just change to nid = cpu_to_node() to get nodeid that is onlined.

Thanks

Yinghai

  reply	other threads:[~2011-03-02 21:37 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-24 14:51 [GIT PULL tip:x86/mm] Tejun Heo
2011-02-24 14:52 ` [GIT PULL tip:x86/mm] bootmem,x86: cleanup changes Tejun Heo
2011-02-24 19:08 ` [GIT PULL tip:x86/mm] Yinghai Lu
2011-02-24 19:23   ` Ingo Molnar
2011-02-24 19:28     ` Yinghai Lu
2011-02-24 19:32       ` Ingo Molnar
2011-02-24 19:46         ` Tejun Heo
2011-02-24 22:46           ` [patch] x86, mm: Fix size of numa_distance array David Rientjes
2011-02-24 23:30             ` Yinghai Lu
2011-02-24 23:31             ` David Rientjes
2011-02-25  9:05               ` Tejun Heo
2011-02-25  9:03             ` Tejun Heo
2011-02-25 10:58               ` Tejun Heo
2011-02-25 11:05                 ` Tejun Heo
2011-02-25  9:11             ` [PATCH x86-mm] x86-64, NUMA: " Tejun Heo
2011-03-01 17:18       ` [GIT PULL tip:x86/mm] David Rientjes
2011-03-01 18:25         ` Tejun Heo
2011-03-01 22:19         ` Yinghai Lu
2011-03-02  9:17           ` Tejun Heo
2011-03-02 10:04         ` [PATCH x86/mm] x86-64, NUMA: Fix distance table handling Tejun Heo
2011-03-02 10:07           ` Ingo Molnar
2011-03-02 10:15             ` Tejun Heo
2011-03-02 10:36               ` Ingo Molnar
2011-03-02 10:25           ` [PATCH x86/mm UPDATED] " Tejun Heo
2011-03-02 10:39             ` [PATCH x86/mm] x86-64, NUMA: Better explain numa_distance handling Tejun Heo
2011-03-02 10:42               ` [PATCH UPDATED " Tejun Heo
2011-03-02 14:31                 ` David Rientjes
2011-03-02 14:30             ` [PATCH x86/mm UPDATED] x86-64, NUMA: Fix distance table handling David Rientjes
2011-03-02 15:42               ` Tejun Heo
2011-03-02 21:12                 ` Yinghai Lu
2011-03-02 21:36                   ` Yinghai Lu [this message]
2011-03-03 20:07                     ` David Rientjes
2011-03-04 14:32                       ` Tejun Heo
2011-03-03 20:04                   ` David Rientjes
2011-03-03 20:00                 ` David Rientjes
2011-03-04 15:31               ` [PATCH x86/mm] x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation() handling Tejun Heo
2011-03-04 21:33                 ` David Rientjes
2011-03-05  7:50                   ` Tejun Heo
2011-03-05 15:50               ` [tip:x86/mm] x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation() tip-bot for Tejun Heo
2011-03-02 16:16             ` [PATCH x86/mm UPDATED] x86-64, NUMA: Fix distance table handling Yinghai Lu
2011-03-02 16:37               ` Tejun Heo
2011-03-02 16:46                 ` Yinghai Lu
2011-03-02 16:55                   ` Tejun Heo
2011-03-02 18:52                     ` Yinghai Lu
2011-03-02 19:02                       ` Tejun Heo
2011-03-02 19:06                         ` Yinghai Lu
2011-03-02 19:13                           ` Tejun Heo
2011-03-02 20:32                             ` Yinghai Lu
2011-03-02 20:57                               ` Tejun Heo
2011-03-02 21:14                                 ` Yinghai Lu
2011-03-03  6:17                                   ` Tejun Heo
2011-03-10 18:46                                     ` Yinghai Lu
2011-03-11  8:29                                       ` Tejun Heo
2011-03-11  8:33                                         ` Tejun Heo
2011-03-11 15:48                                           ` Yinghai Lu
2011-03-11 15:54                                             ` Tejun Heo
2011-03-11 18:02                                               ` Yinghai Lu
2011-03-11 18:19                                                 ` Tejun Heo
2011-03-11 18:25                                                   ` Yinghai Lu
2011-03-11 18:29                                                     ` Tejun Heo
2011-03-11 18:45                                                       ` Yinghai Lu
2011-03-11  9:31                                         ` [PATCH x86/mm] x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation Tejun Heo
2011-03-11 15:42                                           ` Yinghai Lu
2011-03-11 16:03                                             ` Tejun Heo
2011-03-11 19:05                                           ` Yinghai Lu
2011-03-02 10:43           ` [PATCH x86/mm] x86-64, NUMA: Fix distance table handling Ingo Molnar
2011-03-02 10:53             ` Tejun Heo
2011-03-02 10:59               ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D6EB856.1010004@kernel.org \
    --to=yinghai@kernel.org \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.