From: Dave Hansen <haveblue@us.ibm.com>
To: Christoph Lameter <clameter@engr.sgi.com>
Cc: Andrew Morton <akpm@osdl.org>, linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	shai@scalex86.org, steiner@sgi.com
Subject: Re: NUMA aware slab allocator V3
Date: Mon, 16 May 2005 10:22:15 -0700
Message-ID: <1116264135.1005.73.camel@localhost>
In-Reply-To: <Pine.LNX.4.62.0505160943140.1330@schroedinger.engr.sgi.com>

On Mon, 2005-05-16 at 09:47 -0700, Christoph Lameter wrote:
> On Mon, 16 May 2005, Dave Hansen wrote:
> > There are some broken assumptions in the kernel that
> > CONFIG_DISCONTIG==CONFIG_NUMA.  These usually manifest when code assumes
> > that one pg_data_t means one NUMA node.
> > 
> > However, NUMA node ids are actually distinct from "discontigmem nodes".
> > A "discontigmem node" is just one physically contiguous area of memory,
> > thus one pg_data_t.  Some (non-NUMA) Mac G5's have a gap in their
> > address space, so they get two discontigmem nodes.
> 
> I thought the discontiguous memory in one node was handled through zones?
> I.e. ZONE_HIGHMEM in i386?

You can only have one zone of each type under each pg_data_t.  For
instance, you can't properly represent (DMA, NORMAL, HIGHMEM, <GAP>,
HIGHMEM) in a single pg_data_t without wasting node_mem_map[] space.
The "proper" discontig way of representing that is like this:

        pg_data_t[0] (DMA, NORMAL, HIGHMEM)
        <GAP>
        pg_data_t[1] (---, ------, HIGHMEM)

Here pg_data_t[1] has empty DMA and NORMAL zones.  Also, remember that
both of these could theoretically be on the same NUMA node, though I
don't think we ever do that in practice.
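
To make the node_mem_map[] cost concrete, here is a minimal userspace
sketch. Everything in it is an assumption made for illustration: the
toy_* types and helpers are simplified stand-ins, not the kernel's real
pg_data_t or zone definitions, and the numa_node field is hypothetical.
It only models the rule that a pg_data_t has one slot per zone type and
that its map covers everything from its first to its last page frame.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel's zone types (illustration only). */
enum toy_zone_type { TOY_DMA, TOY_NORMAL, TOY_HIGHMEM, TOY_NR_ZONES };

struct toy_zone {
	unsigned long start_pfn;	/* first page frame in the zone */
	unsigned long spanned_pages;	/* 0 means the zone is empty */
};

/* One pg_data_t-like object: exactly one zone slot per zone type. */
struct toy_pgdat {
	struct toy_zone zones[TOY_NR_ZONES];
	int numa_node;			/* hypothetical: owning NUMA node */
};

/* Fill one zone slot with the page frame range [start_pfn, end_pfn). */
static void toy_set_zone(struct toy_pgdat *pgdat, enum toy_zone_type type,
			 unsigned long start_pfn, unsigned long end_pfn)
{
	assert(type < TOY_NR_ZONES && start_pfn <= end_pfn);
	pgdat->zones[type].start_pfn = start_pfn;
	pgdat->zones[type].spanned_pages = end_pfn - start_pfn;
}

/*
 * Entries a flat per-node map needs: it spans from the lowest to the
 * highest page frame of any non-empty zone, holes included.
 */
static unsigned long toy_mem_map_entries(const struct toy_pgdat *pgdat)
{
	unsigned long lo = 0, hi = 0;
	int seen = 0;
	int i;

	for (i = 0; i < TOY_NR_ZONES; i++) {
		const struct toy_zone *z = &pgdat->zones[i];
		unsigned long end = z->start_pfn + z->spanned_pages;

		if (!z->spanned_pages)
			continue;
		if (!seen || z->start_pfn < lo)
			lo = z->start_pfn;
		if (!seen || end > hi)
			hi = end;
		seen = 1;
	}
	return seen ? hi - lo : 0;
}
```

With made-up page frame ranges DMA [0,16), NORMAL [16,896), HIGHMEM
[896,1024), a gap [1024,2048) and more HIGHMEM [2048,3072), a single
toy_pgdat has to stretch its one HIGHMEM slot over [896,3072) and needs
3072 map entries. Two toy_pgdat objects, both with numa_node = 0, need
1024 each, so the 1024 entries covering the gap are never allocated.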

> > So, that #error is bogus.  It's perfectly valid to have multiple
> > discontigmem nodes, when the number of NUMA nodes is 1.  MAX_NUMNODES
> > refers to discontigmem nodes, not NUMA nodes.
> 
> Ok. We looked through the code and saw that the check may be removed 
> without causing problems. However, there is still a feeling of uneasiness 
> about this.

I don't blame you :)

> To what node does numa_node_id() refer?

That refers to the NUMA node in the sense you're thinking of: CPUs,
memory, and I/O that are close to each other.

> And it is legit to use 
> numa_node_id() to index cpu maps and stuff?

Yes, those are all NUMA nodes.

> How do the concepts of numa node id relate to discontig node ids?

I believe there are quite a few assumptions on some architectures that,
when NUMA is on, they are equivalent.  It appears to be pretty much
assumed everywhere that CONFIG_NUMA=y means one pg_data_t per NUMA node.

Remember, as you saw, you can't assume that MAX_NUMNODES=1 when NUMA=n
because of the DISCONTIG=y case.

So, in summary, if you want to do it right: use the
CONFIG_NEED_MULTIPLE_NODES that you see in -mm.  As plain DISCONTIG=y
gets replaced by sparsemem, any code using this is likely to stay
working.
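
As a rough sketch of what that looks like in code, the fragment below
sizes a per-node array from a "need multiple nodes" switch rather than
from the NUMA switch. All TOY_* names are hypothetical and chosen for
this example; the shift value of 2 is arbitrary, and the example config
(discontiguous memory on, NUMA off) is hard-coded so the fragment
compiles on its own. It is not the kernel's actual header logic.

```c
#include <assert.h>

/* Example configuration for this sketch: discontiguous memory, no NUMA. */
#define TOY_CONFIG_DISCONTIGMEM 1
/* #define TOY_CONFIG_NUMA 1 */

/* Either option alone means more than one pg_data_t may exist. */
#if defined(TOY_CONFIG_DISCONTIGMEM) || defined(TOY_CONFIG_NUMA)
#define TOY_NEED_MULTIPLE_NODES 1
#endif

/* Size per-node data from that switch, not from the NUMA switch. */
#ifdef TOY_NEED_MULTIPLE_NODES
#define TOY_NODES_SHIFT 2	/* arbitrary value for the example */
#else
#define TOY_NODES_SHIFT 0
#endif
#define TOY_MAX_NUMNODES (1 << TOY_NODES_SHIFT)

struct toy_node_stats {
	unsigned long allocs;
};

static struct toy_node_stats toy_stats[TOY_MAX_NUMNODES];

/* Record one allocation against node nid. */
static void toy_count_alloc(int nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NUMNODES);
	toy_stats[nid].allocs++;
}

/* Sum the counters over every possible node id. */
static unsigned long toy_total_allocs(void)
{
	unsigned long total = 0;
	int nid;

	for (nid = 0; nid < TOY_MAX_NUMNODES; nid++)
		total += toy_stats[nid].allocs;
	return total;
}
```

Had the array been guarded by the NUMA switch instead, it would have a
single slot in this configuration and toy_count_alloc(1) would index
past its end.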

-- Dave



