Re: [PATCH] mm: add node physical memory range to sysfs

From: Dave Hansen <dave@linux.vnet.ibm.com>
To: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm: add node physical memory range to sysfs
Date: Thu, 13 Dec 2012 16:18:47 -0800
Message-ID: <50CA7067.4080706@linux.vnet.ibm.com>
In-Reply-To: <1355440542.1823.21.camel@buesod1.americas.hpqcorp.net>

On 12/13/2012 03:15 PM, Davidlohr Bueso wrote:
> On Wed, 2012-12-12 at 20:49 -0800, Dave Hansen wrote:
>> How is that possible?  If NUMA nodes are defined by distances from CPUs
>> to memory, how could a DIMM have more than a single distance to any
>> given CPU?
> 
> Can't this occur when interleaving emulated nodes with physical ones?

I'm glad you mentioned numa=fake. Its interleaving node configuration
would also make the patch you've proposed completely useless.  Let's say
you've got a two-node system with 16GB of RAM:

|        0        |      1      |

If you then boot with numa=fake=1G, you'll get the nodes interleaved like this:

|0|1|0|1|0|1|0|1|0|1|0|1|0|1|0|1|

The information that is exported from the interface you're proposing
would be:

node0: start_pfn=0  and spanned_pages = 15G
node1: start_pfn=1G and spanned_pages = 15G

In that situation, there is no way to figure out which DIMM backs a
given node, since the node ranges overlap.
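The overlap can be checked with a little arithmetic. The following is a
minimal sketch, not kernel code: it assumes the simple round-robin layout
drawn above, works in bytes rather than pfns, and the names node_spans and
ranges_overlap are hypothetical helpers made up for illustration.

```python
GIB = 1 << 30


def node_spans(total_bytes, chunk_bytes, nr_nodes):
    """Return {node: (start, spanned)} for chunks handed out round-robin.

    Assumption: chunk i belongs to node i % nr_nodes, as in the diagram.
    """
    bounds = {}
    for i in range(total_bytes // chunk_bytes):
        node = i % nr_nodes
        start = i * chunk_bytes
        end = start + chunk_bytes
        lo, hi = bounds.get(node, (start, end))
        bounds[node] = (min(lo, start), max(hi, end))
    # The span runs from the first byte to the last, holes included.
    return {node: (lo, hi - lo) for node, (lo, hi) in bounds.items()}


def ranges_overlap(a, b):
    """True if two (start, length) ranges share at least one byte."""
    (a_start, a_len), (b_start, b_len) = a, b
    return a_start < b_start + b_len and b_start < a_start + a_len


spans = node_spans(16 * GIB, 1 * GIB, 2)
for node, (start, spanned) in sorted(spans.items()):
    print(f"node{node}: start={start // GIB}G spanned={spanned // GIB}G")
print("overlap:", ranges_overlap(spans[0], spans[1]))
```

Under these assumptions both nodes span 15G and the two ranges intersect, so
a start address plus a span cannot say which node owns a given address.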

>>>> How do you plan to use this in practice, btw?
>>>
>>> It started because I needed to recognize the address of a node to remove
>>> it from the e820 mappings and have the system "ignore" the node's
>>> memory.
>>
>> Actually, now that I think about it, can you check in the
>> /sys/devices/system/ directories for memory and nodes?  We have linkages
>> there for each memory section to every NUMA node, and you can also
>> derive the physical address from the phys_index in each section.  That
>> should allow you to work out physical addresses for a given node.
>> 
> I had looked at the memory-hotplug interface but found that this
> 'phys_index' doesn't include holes, while ->node_spanned_pages does.

I'm not sure what you mean.  Each memory section in sysfs accounts for
SECTION_SIZE bytes, and sections are 128MB by default on x86_64.
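As a rough illustration of deriving a physical address from phys_index, here
is a minimal sketch. It assumes phys_index is read as a hexadecimal string
and that each sysfs memory block covers exactly one 128MB section; on a real
system the block size should be taken from
/sys/devices/system/memory/block_size_bytes rather than hard-coded.
section_range is a hypothetical helper, not an existing API.

```python
SECTION_SIZE = 128 << 20  # 128MB, the x86_64 default mentioned above


def section_range(phys_index_hex, block_size=SECTION_SIZE):
    """Return the (first, last) physical address covered by a memory block.

    Assumption: phys_index is a hex string and counts blocks of block_size.
    """
    index = int(phys_index_hex, 16)
    start = index * block_size
    return start, start + block_size - 1


start, end = section_range("00000008")
print(f"{start:#x}-{end:#x}")
```

Repeating this for every memoryN entry linked under a node directory would
give the list of physical ranges that belong to that node.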

