public inbox for linux-kernel@vger.kernel.org
* [PATCH] vmallocinfo: Add NUMA informations
@ 2008-06-02  6:54 Eric Dumazet
  2008-06-02  7:09 ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2008-06-02  6:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Nick Piggin, Hugh Dickins, KOSAKI Motohiro,
	linux kernel

[-- Attachment #1: Type: text/plain, Size: 4047 bytes --]

Christoph recently added the /proc/vmallocinfo file to report information
about vmalloc allocations.

This patch adds NUMA-specific information: the number of pages allocated
on each memory node.

This should help check that vmalloc() respects NUMA policies.

Example output on a four-node machine (one CPU per node):

1) Network hash tables are spread evenly across the four nodes (OK).
   (The same holds for the inode and dentry hash tables.)
2) iptables tables (x_tables) are correctly allocated on each CPU's node (OK).
3) sys_swapon() allocates its memory from one node only.
4) Each loaded module uses memory on one node only.

Sysadmins could tune their setup to change points 3) and 4) if necessary.

grep "pages="  /proc/vmallocinfo
0xffffc20000000000-0xffffc20000201000 2101248 
alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 
N2=128 N3=128
0xffffc20000201000-0xffffc20000302000 1052672 
alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 
N3=64
0xffffc2000031a000-0xffffc2000031d000   12288 
alloc_large_system_hash+0x204/0x2c0 pages=2 vmalloc N1=1 N2=1
0xffffc2000031f000-0xffffc2000032b000   49152 
cramfs_uncompress_init+0x2e/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
0xffffc2000033e000-0xffffc20000341000   12288 sys_swapon+0x640/0xac0 
pages=2 vmalloc N0=2
0xffffc20000341000-0xffffc20000344000   12288 
xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N0=2
0xffffc20000344000-0xffffc20000347000   12288 
xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N1=2
0xffffc20000347000-0xffffc2000034a000   12288 
xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N2=2
0xffffc2000034a000-0xffffc2000034d000   12288 
xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N3=2
0xffffc20004381000-0xffffc20004402000  528384 
alloc_large_system_hash+0x204/0x2c0 pages=128 vmalloc N0=32 N1=32 N2=32 
N3=32
0xffffc20004402000-0xffffc20004803000 4198400 
alloc_large_system_hash+0x204/0x2c0 pages=1024 vmalloc vpages N0=256 
N1=256 N2=256 N3=256
0xffffc20004803000-0xffffc20004904000 1052672 
alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 
N3=64
0xffffc20004904000-0xffffc20004bec000 3047424 sys_swapon+0x640/0xac0 
pages=743 vmalloc vpages N0=743
0xffffffffa0000000-0xffffffffa000f000   61440 
sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14
0xffffffffa000f000-0xffffffffa0014000   20480 
sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4
0xffffffffa0014000-0xffffffffa0017000   12288 
sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2
0xffffffffa0017000-0xffffffffa0022000   45056 
sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10
0xffffffffa0022000-0xffffffffa0028000   24576 
sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5
0xffffffffa0028000-0xffffffffa0050000  163840 
sys_init_module+0xc27/0x1d00 pages=39 vmalloc N1=39
0xffffffffa0050000-0xffffffffa0052000    8192 
sys_init_module+0xc27/0x1d00 pages=1 vmalloc N1=1
0xffffffffa0052000-0xffffffffa0056000   16384 
sys_init_module+0xc27/0x1d00 pages=3 vmalloc N1=3
0xffffffffa0056000-0xffffffffa0081000  176128 
sys_init_module+0xc27/0x1d00 pages=42 vmalloc N3=42
0xffffffffa0081000-0xffffffffa00ae000  184320 
sys_init_module+0xc27/0x1d00 pages=44 vmalloc N3=44
0xffffffffa00ae000-0xffffffffa00b1000   12288 
sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
0xffffffffa00b1000-0xffffffffa00b9000   32768 
sys_init_module+0xc27/0x1d00 pages=7 vmalloc N0=7
0xffffffffa00b9000-0xffffffffa00c4000   45056 
sys_init_module+0xc27/0x1d00 pages=10 vmalloc N3=10
0xffffffffa00c6000-0xffffffffa00e0000  106496 
sys_init_module+0xc27/0x1d00 pages=25 vmalloc N2=25
0xffffffffa00e0000-0xffffffffa00f1000   69632 
sys_init_module+0xc27/0x1d00 pages=16 vmalloc N2=16
0xffffffffa00f1000-0xffffffffa00f4000   12288 
sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
0xffffffffa00f4000-0xffffffffa00f7000   12288 
sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
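
For anyone who wants to check such output from userspace, here is a minimal
sketch of a parser for one line of it. It is not part of the patch. The names
parse_vmallocinfo_line(), sum_nodes() and SKETCH_MAX_NODES are hypothetical and
exist only for this illustration; the only thing taken from the output above is
the "pages=<nr>" and " N<node>=<nr>" token format.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_NODES 64	/* hypothetical bound, for this sketch only */

/*
 * Collect the N<node>=<count> tokens of one /proc/vmallocinfo line into
 * per_node[]. Returns the value of the pages= field, or -1 if the line
 * has no such field.
 */
static long parse_vmallocinfo_line(const char *line,
				   unsigned int per_node[SKETCH_MAX_NODES])
{
	const char *p = line;
	long pages = -1;

	memset(per_node, 0, SKETCH_MAX_NODES * sizeof(per_node[0]));

	while (*p) {
		unsigned int node, count;
		long val;

		/* skip the separators in front of the next token */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		if (!*p)
			break;

		if (sscanf(p, "pages=%ld", &val) == 1)
			pages = val;
		else if (sscanf(p, "N%u=%u", &node, &count) == 2 &&
			 node < SKETCH_MAX_NODES)
			per_node[node] += count;

		/* advance to the end of the current token */
		while (*p && *p != ' ' && *p != '\t' && *p != '\n')
			p++;
	}
	return pages;
}

/* Sum of all per-node counters, to compare against the pages= field. */
static unsigned long sum_nodes(const unsigned int per_node[SKETCH_MAX_NODES])
{
	unsigned long sum = 0;
	unsigned int i;

	for (i = 0; i < SKETCH_MAX_NODES; i++)
		sum += per_node[i];
	return sum;
}
```

With the cramfs_uncompress_init line above, the per-node counters (3, 3, 2, 3)
add up to the pages=11 field, which is the consistency check one would expect
for every line that carries NUMA counters.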

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 mm/vmalloc.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+)



[-- Attachment #2: vmallocinfo_numa.patch --]
[-- Type: text/plain, Size: 943 bytes --]

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6e45b0f..d1e6594 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -931,6 +931,27 @@ static void s_stop(struct seq_file *m, void *p)
 	read_unlock(&vmlist_lock);
 }
 
+static void show_numa_infos(struct seq_file *m, struct vm_struct *v)
+{
+	if (NUMA_BUILD) {
+		unsigned int *counters, nr;
+
+		counters = kzalloc(MAX_NUMNODES * sizeof(unsigned int),
+				   GFP_KERNEL);
+		if (!counters)
+			return;
+
+		for (nr = 0; nr < v->nr_pages; nr++)
+			counters[page_to_nid(v->pages[nr])]++;
+
+		for (nr = 0; nr < MAX_NUMNODES; nr++)
+			if (counters[nr])
+				seq_printf(m, " N%u=%u", nr, counters[nr]);
+
+		kfree(counters);
+	}
+}
+
 static int s_show(struct seq_file *m, void *p)
 {
 	struct vm_struct *v = p;
@@ -967,6 +988,7 @@ static int s_show(struct seq_file *m, void *p)
 	if (v->flags & VM_VPAGES)
 		seq_printf(m, " vpages");
 
+	show_numa_infos(m, v);
 	seq_putc(m, '\n');
 	return 0;
 }


* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-02  6:54 [PATCH] vmallocinfo: Add NUMA informations Eric Dumazet
@ 2008-06-02  7:09 ` KOSAKI Motohiro
  2008-06-03  3:37   ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2008-06-02  7:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kosaki.motohiro, Andrew Morton, Christoph Lameter, Nick Piggin,
	Hugh Dickins, linux kernel

Hi

> Christoph recently added the /proc/vmallocinfo file to report information
> about vmalloc allocations.
>
> This patch adds NUMA-specific information: the number of pages allocated
> on each memory node.
>
> This should help check that vmalloc() respects NUMA policies.

good patch :)


> +static void show_numa_infos(struct seq_file *m, struct vm_struct *v)
> +{
> +	if (NUMA_BUILD) {
> +		unsigned int *counters, nr;
> +
> +		counters = kzalloc(MAX_NUMNODES * sizeof(unsigned int),
> +				   GFP_KERNEL);
> +		if (!counters)
> +			return;
> +
> +		for (nr = 0; nr < v->nr_pages; nr++)
> +			counters[page_to_nid(v->pages[nr])]++;
> +
> +		for (nr = 0; nr < MAX_NUMNODES; nr++)
> +			if (counters[nr])
> +				seq_printf(m, " N%u=%u", nr, counters[nr]);
> +

Would for_each_node_state(n, N_HIGH_MEMORY) be better?
MAX_NUMNODES often has a very large value.
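
To illustrate the point in userspace terms, here is a minimal sketch that walks
only the nodes present in a bitmask instead of every possible node ID. It is a
model of the idea, not kernel code: format_numa_counts(), SKETCH_MAX_NUMNODES
and the uint64_t mask are hypothetical stand-ins for for_each_node_state(),
MAX_NUMNODES and the kernel's node state mask.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_NUMNODES 64	/* stand-in for MAX_NUMNODES */

/*
 * Append " N<node>=<count>" to out for every node that is set in
 * node_mask and has a non-zero counter. counters[] must have
 * SKETCH_MAX_NUMNODES entries. Returns the number of loop iterations,
 * which stops growing after the highest node set in the mask.
 */
static unsigned int format_numa_counts(char *out, size_t len,
				       uint64_t node_mask,
				       const unsigned int *counters)
{
	unsigned int node, steps = 0;
	size_t used = 0;

	if (len)
		out[0] = '\0';

	for (node = 0; node_mask; node++, node_mask >>= 1) {
		int n;

		steps++;
		if (!(node_mask & 1) || !counters[node])
			continue;

		n = snprintf(out + used, len - used, " N%u=%u",
			     node, counters[node]);
		if (n < 0 || (size_t)n >= len - used) {
			/* out of room: drop the partial token and stop */
			if (len)
				out[used] = '\0';
			break;
		}
		used += (size_t)n;
	}
	return steps;
}
```

On a four-node mask the loop runs four times, however large
SKETCH_MAX_NUMNODES is, whereas a plain "nr < MAX_NUMNODES" loop always pays
for the full range.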






* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-02  7:09 ` KOSAKI Motohiro
@ 2008-06-03  3:37   ` Eric Dumazet
  2008-06-03  4:35     ` KOSAKI Motohiro
  2008-06-03 21:40     ` Andrew Morton
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2008-06-03  3:37 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Christoph Lameter, Nick Piggin, Hugh Dickins,
	linux kernel

[-- Attachment #1: Type: text/plain, Size: 5307 bytes --]

KOSAKI Motohiro wrote:
> Hi
>
>   
>> Christoph recently added the /proc/vmallocinfo file to report information
>> about vmalloc allocations.
>>
>> This patch adds NUMA-specific information: the number of pages allocated
>> on each memory node.
>>
>> This should help check that vmalloc() respects NUMA policies.
>>     
>
> good patch :)
>
>
>   
>> +static void show_numa_infos(struct seq_file *m, struct vm_struct *v)
>> +{
>> +	if (NUMA_BUILD) {
>> +		unsigned int *counters, nr;
>> +
>> +		counters = kzalloc(MAX_NUMNODES * sizeof(unsigned int),
>> +				   GFP_KERNEL);
>> +		if (!counters)
>> +			return;
>> +
>> +		for (nr = 0; nr < v->nr_pages; nr++)
>> +			counters[page_to_nid(v->pages[nr])]++;
>> +
>> +		for (nr = 0; nr < MAX_NUMNODES; nr++)
>> +			if (counters[nr])
>> +				seq_printf(m, " N%u=%u", nr, counters[nr]);
>> +
>>     
>
> Would for_each_node_state(n, N_HIGH_MEMORY) be better?
> MAX_NUMNODES often has a very large value.
>
>
>
>   
Yes, good idea, thank you.

I also used nr_node_ids instead of MAX_NUMNODES in this second version:

[PATCH] vmallocinfo: Add NUMA informations

Christoph recently added the /proc/vmallocinfo file to report information
about vmalloc allocations.

This patch adds NUMA-specific information: the number of pages allocated
on each memory node.

This should help check that vmalloc() respects NUMA policies.

Example output on a four-node machine (one CPU per node):

1) Network hash tables are spread evenly across the four nodes (OK).
  (The same holds for the inode and dentry hash tables.)
2) iptables tables (x_tables) are correctly allocated on each CPU's node
(OK).
3) sys_swapon() allocates its memory from one node only.
4) Each loaded module uses memory on one node only.

Sysadmins could tune their setup to change points 3) and 4) if necessary.

grep "pages="  /proc/vmallocinfo
0xffffc20000000000-0xffffc20000201000 2101248 
alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 
N2=128 N3=128
0xffffc20000201000-0xffffc20000302000 1052672 
alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 
N3=64
0xffffc2000031a000-0xffffc2000031d000   12288 
alloc_large_system_hash+0x204/0x2c0 pages=2 vmalloc N1=1 N2=1
0xffffc2000031f000-0xffffc2000032b000   49152 
cramfs_uncompress_init+0x2e/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
0xffffc2000033e000-0xffffc20000341000   12288 sys_swapon+0x640/0xac0 
pages=2 vmalloc N0=2
0xffffc20000341000-0xffffc20000344000   12288 
xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N0=2
0xffffc20000344000-0xffffc20000347000   12288 
xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N1=2
0xffffc20000347000-0xffffc2000034a000   12288 
xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N2=2
0xffffc2000034a000-0xffffc2000034d000   12288 
xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N3=2
0xffffc20004381000-0xffffc20004402000  528384 
alloc_large_system_hash+0x204/0x2c0 pages=128 vmalloc N0=32 N1=32 N2=32 
N3=32
0xffffc20004402000-0xffffc20004803000 4198400 
alloc_large_system_hash+0x204/0x2c0 pages=1024 vmalloc vpages N0=256 
N1=256 N2=256 N3=256
0xffffc20004803000-0xffffc20004904000 1052672 
alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 
N3=64
0xffffc20004904000-0xffffc20004bec000 3047424 sys_swapon+0x640/0xac0 
pages=743 vmalloc vpages N0=743
0xffffffffa0000000-0xffffffffa000f000   61440 
sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14
0xffffffffa000f000-0xffffffffa0014000   20480 
sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4
0xffffffffa0014000-0xffffffffa0017000   12288 
sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2
0xffffffffa0017000-0xffffffffa0022000   45056 
sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10
0xffffffffa0022000-0xffffffffa0028000   24576 
sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5
0xffffffffa0028000-0xffffffffa0050000  163840 
sys_init_module+0xc27/0x1d00 pages=39 vmalloc N1=39
0xffffffffa0050000-0xffffffffa0052000    8192 
sys_init_module+0xc27/0x1d00 pages=1 vmalloc N1=1
0xffffffffa0052000-0xffffffffa0056000   16384 
sys_init_module+0xc27/0x1d00 pages=3 vmalloc N1=3
0xffffffffa0056000-0xffffffffa0081000  176128 
sys_init_module+0xc27/0x1d00 pages=42 vmalloc N3=42
0xffffffffa0081000-0xffffffffa00ae000  184320 
sys_init_module+0xc27/0x1d00 pages=44 vmalloc N3=44
0xffffffffa00ae000-0xffffffffa00b1000   12288 
sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
0xffffffffa00b1000-0xffffffffa00b9000   32768 
sys_init_module+0xc27/0x1d00 pages=7 vmalloc N0=7
0xffffffffa00b9000-0xffffffffa00c4000   45056 
sys_init_module+0xc27/0x1d00 pages=10 vmalloc N3=10
0xffffffffa00c6000-0xffffffffa00e0000  106496 
sys_init_module+0xc27/0x1d00 pages=25 vmalloc N2=25
0xffffffffa00e0000-0xffffffffa00f1000   69632 
sys_init_module+0xc27/0x1d00 pages=16 vmalloc N2=16
0xffffffffa00f1000-0xffffffffa00f4000   12288 
sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
0xffffffffa00f4000-0xffffffffa00f7000   12288 
sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 mm/vmalloc.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+)



[-- Attachment #2: vmallocinfo_numa.patch --]
[-- Type: text/plain, Size: 943 bytes --]

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6e45b0f..d2bbd85 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -931,6 +931,27 @@ static void s_stop(struct seq_file *m, void *p)
 	read_unlock(&vmlist_lock);
 }
 
+static void show_numa_infos(struct seq_file *m, struct vm_struct *v)
+{
+	if (NUMA_BUILD) {
+		unsigned int *counters, nr;
+
+		counters = kzalloc(nr_node_ids * sizeof(unsigned int),
+				   GFP_KERNEL);
+		if (!counters)
+			return;
+
+		for (nr = 0; nr < v->nr_pages; nr++)
+			counters[page_to_nid(v->pages[nr])]++;
+
+		for_each_node_state(nr, N_HIGH_MEMORY)
+			if (counters[nr])
+				seq_printf(m, " N%u=%u", nr, counters[nr]);
+
+		kfree(counters);
+	}
+}
+
 static int s_show(struct seq_file *m, void *p)
 {
 	struct vm_struct *v = p;
@@ -967,6 +988,7 @@ static int s_show(struct seq_file *m, void *p)
 	if (v->flags & VM_VPAGES)
 		seq_printf(m, " vpages");
 
+	show_numa_infos(m, v);
 	seq_putc(m, '\n');
 	return 0;
 }


* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-03  3:37   ` Eric Dumazet
@ 2008-06-03  4:35     ` KOSAKI Motohiro
  2008-06-09 14:14       ` Christoph Lameter
  2008-06-03 21:40     ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2008-06-03  4:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kosaki.motohiro, Andrew Morton, Christoph Lameter, Nick Piggin,
	Hugh Dickins, linux kernel

> >> +		for (nr = 0; nr < MAX_NUMNODES; nr++)
> >> +			if (counters[nr])
> >> +				seq_printf(m, " N%u=%u", nr, counters[nr]);
> >
> > Would for_each_node_state(n, N_HIGH_MEMORY) be better?
> > MAX_NUMNODES often has a very large value.
> >   
> Yes, good idea, thank you.
> 
> I also used nr_node_ids instead of MAX_NUMNODES in this second version:

Thank you! Looks good to me,
and my testing found no bugs.

	Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>


Christoph, what do you think?





* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-03  3:37   ` Eric Dumazet
  2008-06-03  4:35     ` KOSAKI Motohiro
@ 2008-06-03 21:40     ` Andrew Morton
  2008-06-04 15:01       ` Eric Dumazet
  2008-06-09 14:16       ` Christoph Lameter
  1 sibling, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2008-06-03 21:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: kosaki.motohiro, clameter, nickpiggin, hugh, linux-kernel

On Tue, 03 Jun 2008 05:37:25 +0200
Eric Dumazet <dada1@cosmosbay.com> wrote:

> [PATCH] vmallocinfo: Add NUMA informations

Using multipart-mixed MIME makes it a bit hard to handle and reply to a
patch.

> Christoph recently added the /proc/vmallocinfo file to report information
> about vmalloc allocations.
>
> This patch adds NUMA-specific information: the number of pages allocated
> on each memory node.
>
> This should help check that vmalloc() respects NUMA policies.
>
> Example output on a four-node machine (one CPU per node):
>
> 1) Network hash tables are spread evenly across the four nodes (OK).
>   (The same holds for the inode and dentry hash tables.)
> 2) iptables tables (x_tables) are correctly allocated on each CPU's node
> (OK).
> 3) sys_swapon() allocates its memory from one node only.
> 4) Each loaded module uses memory on one node only.
>
> Sysadmins could tune their setup to change points 3) and 4) if necessary.
> 
> grep "pages="  /proc/vmallocinfo
> 0xffffc20000000000-0xffffc20000201000 2101248 
> alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 
> N2=128 N3=128
> 0xffffc20000201000-0xffffc20000302000 1052672 
> alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 
> N3=64

Yet it did nothing to prevent massive wordwrapping in the changelog :(

> 0xffffc20004904000-0xffffc20004bec000 3047424 sys_swapon+0x640/0xac0 
> pages=743 vmalloc vpages N0=743
> 0xffffffffa0000000-0xffffffffa000f000   61440 
> sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14
> 0xffffffffa000f000-0xffffffffa0014000   20480 
> sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4
> 0xffffffffa0014000-0xffffffffa0017000   12288 
> sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2
> 0xffffffffa0017000-0xffffffffa0022000   45056 
> sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10
> 0xffffffffa0022000-0xffffffffa0028000   24576 
> sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5

akpm:/usr/src/25> grep -ri vmallocinfo Documentation 
akpm:/usr/src/25> 

Sigh.

> 
> [vmallocinfo_numa.patch  text/plain (944B)]
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6e45b0f..d2bbd85 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -931,6 +931,27 @@ static void s_stop(struct seq_file *m, void *p)
>  	read_unlock(&vmlist_lock);
>  }
>  
> +static void show_numa_infos(struct seq_file *m, struct vm_struct *v)

"show_numa_info" would be more grammatical.

> +{
> +	if (NUMA_BUILD) {
> +		unsigned int *counters, nr;
> +
> +		counters = kzalloc(nr_node_ids * sizeof(unsigned int),

This is kcalloc().  If you like that sort of thing - I think kcalloc()
is pretty pointless personally.

> +				   GFP_KERNEL);

We're running under read_lock(&vmlist_lock) here, aren't we?  If so,
please tape Documentation/SubmitChecklist to the bathroom door.  If
not, what prevents *v from vanishing?

Do we actually need dynamic allocation here?  Isn't there a small,
constant, known-at-compile-time upper bound on the number of node IDs?


> +		if (!counters)
> +			return;

Will this just lock up until some memory comes free?

> +		for (nr = 0; nr < v->nr_pages; nr++)
> +			counters[page_to_nid(v->pages[nr])]++;
> +
> +		for_each_node_state(nr, N_HIGH_MEMORY)
> +			if (counters[nr])
> +				seq_printf(m, " N%u=%u", nr, counters[nr]);
> +
> +		kfree(counters);
> +	}
> +}
> +
>  static int s_show(struct seq_file *m, void *p)
>  {
>  	struct vm_struct *v = p;
> @@ -967,6 +988,7 @@ static int s_show(struct seq_file *m, void *p)
>  	if (v->flags & VM_VPAGES)
>  		seq_printf(m, " vpages");
>  
> +	show_numa_infos(m, v);
>  	seq_putc(m, '\n');
>  	return 0;
>  }
> 


* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-03 21:40     ` Andrew Morton
@ 2008-06-04 15:01       ` Eric Dumazet
  2008-06-04 15:33         ` Randy Dunlap
  2008-06-09 14:19         ` Christoph Lameter
  2008-06-09 14:16       ` Christoph Lameter
  1 sibling, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2008-06-04 15:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: kosaki.motohiro, clameter, nickpiggin, hugh, linux-kernel

Andrew Morton wrote:
> On Tue, 03 Jun 2008 05:37:25 +0200
> Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
>> [PATCH] vmallocinfo: Add NUMA informations
> 
> Using multipart-mixed MIME makes it a bit hard to handle and reply to a
> patch.

OK, I need to play around with my mailer, sorry.

>> +		counters = kzalloc(nr_node_ids * sizeof(unsigned int),
> 
> This is kcalloc().  If you like that sort of thing - I think kcalloc()
> is pretty pointless personally.

OK, next patch is using kmalloc() anyway :)

> 
>> +				   GFP_KERNEL);
> 
> We're running under read_lock(&vmlist_lock) here, aren't we?  If so,
> please tape Documentation/SubmitChecklist to the bathroom door.  If
> not, what prevents *v from vanishing?
> 
> Do we actually need dynamic allocation here?  Isn't there a small,
> constant, known-at-compile-time upper bound on the number of node IDs?
> 

MAX_NUMNODES can be pretty large, so unfortunately this array cannot live on the stack.
So I guess we have to allocate this block in vmalloc_open() and pass it in seq->private.
Adding #ifdef stuff to avoid dynamic allocation for small MAX_NUMNODES would be overkill.

> 
>> +		if (!counters)
>> +			return;
> 
> Will this just lock up until some memory comes free?

Yes, my bad.

Thanks very much Andrew for this review.

Here is an updated patch.

It now allocates the array in vmalloc_open().
If this allocation fails, we just proceed and don't provide NUMA information.

I also included the missing documentation for /proc/vmallocinfo.
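
The "allocate once at open time, reset before each record" approach described
above can be modelled in plain userspace C. This is only a sketch under stated
assumptions: struct sketch_seq, sketch_open(), sketch_count() and
sketch_release() are hypothetical names standing in for the seq_file private
pointer and the open/show/release callbacks, and malloc()/free() stand in for
kmalloc()/kfree().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the seq_file and its private pointer. */
struct sketch_seq {
	unsigned int *counters;		/* NULL: no NUMA info available */
	unsigned int nr_node_ids;
};

/* Open: try to allocate the scratch array once; failure is not fatal. */
static int sketch_open(struct sketch_seq *m, unsigned int nr_node_ids,
		       int numa_build)
{
	m->nr_node_ids = nr_node_ids;
	m->counters = NULL;
	if (numa_build)
		m->counters = malloc(nr_node_ids * sizeof(unsigned int));
	return 0;
}

/*
 * Show one area: reset the shared array, then count pages per node.
 * No allocation happens here, so nothing can sleep in this path.
 * Returns 1 if counters were filled in, 0 if NUMA info was skipped.
 */
static int sketch_count(struct sketch_seq *m, const unsigned int *page_nids,
			unsigned int nr_pages)
{
	unsigned int nr;

	if (!m->counters)
		return 0;

	memset(m->counters, 0, m->nr_node_ids * sizeof(unsigned int));
	for (nr = 0; nr < nr_pages; nr++)
		if (page_nids[nr] < m->nr_node_ids)	/* guard for the sketch */
			m->counters[page_nids[nr]]++;
	return 1;
}

/* Release: free the array allocated at open time. */
static void sketch_release(struct sketch_seq *m)
{
	free(m->counters);
	m->counters = NULL;
}
```

The design choice this models is that the only allocation happens in the open
path, where sleeping is allowed, while the per-record path under the lock only
clears and refills memory it already owns.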


[PATCH] vmallocinfo: Add NUMA informations

Christoph recently added the /proc/vmallocinfo file to report information about vmalloc allocations.

This patch adds NUMA-specific information: the number of pages allocated on each memory node.

This should help check that vmalloc() respects NUMA policies.

Example output on a four-node machine (one CPU per node):

1) Network hash tables are spread evenly across the four nodes (OK).
 (The same holds for the inode and dentry hash tables.)
2) iptables tables (x_tables) are correctly allocated on each CPU's node (OK).
3) sys_swapon() allocates its memory from one node only.
4) Each loaded module uses memory on one node only.

Sysadmins could tune their setup to change points 3) and 4) if necessary.

grep "pages="  /proc/vmallocinfo
0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
0xffffc2000031a000-0xffffc2000031d000   12288 alloc_large_system_hash+0x204/0x2c0 pages=2 vmalloc N1=1 N2=1
0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
0xffffc2000033e000-0xffffc20000341000   12288 sys_swapon+0x640/0xac0 pages=2 vmalloc N0=2
0xffffc20000341000-0xffffc20000344000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N0=2
0xffffc20000344000-0xffffc20000347000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N1=2
0xffffc20000347000-0xffffc2000034a000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N2=2
0xffffc2000034a000-0xffffc2000034d000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N3=2
0xffffc20004381000-0xffffc20004402000  528384 alloc_large_system_hash+0x204/0x2c0 pages=128 vmalloc N0=32 N1=32 N2=32 N3=32
0xffffc20004402000-0xffffc20004803000 4198400 alloc_large_system_hash+0x204/0x2c0 pages=1024 vmalloc vpages N0=256 N1=256 N2=256 N3=256
0xffffc20004803000-0xffffc20004904000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
0xffffc20004904000-0xffffc20004bec000 3047424 sys_swapon+0x640/0xac0 pages=743 vmalloc vpages N0=743
0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14
0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4
0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2
0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10
0xffffffffa0022000-0xffffffffa0028000   24576 sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5
0xffffffffa0028000-0xffffffffa0050000  163840 sys_init_module+0xc27/0x1d00 pages=39 vmalloc N1=39
0xffffffffa0050000-0xffffffffa0052000    8192 sys_init_module+0xc27/0x1d00 pages=1 vmalloc N1=1
0xffffffffa0052000-0xffffffffa0056000   16384 sys_init_module+0xc27/0x1d00 pages=3 vmalloc N1=3
0xffffffffa0056000-0xffffffffa0081000  176128 sys_init_module+0xc27/0x1d00 pages=42 vmalloc N3=42
0xffffffffa0081000-0xffffffffa00ae000  184320 sys_init_module+0xc27/0x1d00 pages=44 vmalloc N3=44
0xffffffffa00ae000-0xffffffffa00b1000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
0xffffffffa00b1000-0xffffffffa00b9000   32768 sys_init_module+0xc27/0x1d00 pages=7 vmalloc N0=7
0xffffffffa00b9000-0xffffffffa00c4000   45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N3=10
0xffffffffa00c6000-0xffffffffa00e0000  106496 sys_init_module+0xc27/0x1d00 pages=25 vmalloc N2=25
0xffffffffa00e0000-0xffffffffa00f1000   69632 sys_init_module+0xc27/0x1d00 pages=16 vmalloc N2=16
0xffffffffa00f1000-0xffffffffa00f4000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
0xffffffffa00f4000-0xffffffffa00f7000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 Documentation/filesystems/proc.txt |   44 +++++++++++++++++++++++++++
 fs/proc/proc_misc.c                |   15 +++++++--
 mm/vmalloc.c                       |   20 ++++++++++++
 3 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index dbc3c6a..b707d9a 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -296,6 +296,7 @@ Table 1-4: Kernel info in /proc
  uptime      System uptime                                     
  version     Kernel version                                    
  video	     bttv info of video resources			(2.4)
+ vmallocinfo Show vmalloced areas
 ..............................................................................
 
 You can,  for  example,  check  which interrupts are currently in use and what
@@ -550,6 +551,49 @@ VmallocTotal: total size of vmalloc memory area
  VmallocUsed: amount of vmalloc area which is used
 VmallocChunk: largest contigious block of vmalloc area which is free
 
+..............................................................................
+
+vmallocinfo:
+
+Provides information about vmalloced/vmaped areas. One line per area,
+containing the virtual address range of the area, size in bytes,
+caller information of the creator, and optional informations depending
+on the kind of area :
+
+ pages=nr    number of pages
+ phys=addr   if a physical address was specified
+ ioremap     I/O mapping (ioremap() and friends)
+ vmalloc     vmalloc() area
+ vmap        vmap()ed pages
+ user        VM_USERMAP area
+ vpages      buffer for pages pointers was vmalloced (huge area)
+ N<node>=nr  (Only on NUMA kernels)
+             Number of pages allocated on memory node <node>
+
+> cat /proc/vmallocinfo
+0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
+  /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
+0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
+  /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
+0xffffc20000302000-0xffffc20000304000    8192 acpi_tb_verify_table+0x21/0x4f...
+  phys=7fee8000 ioremap
+0xffffc20000304000-0xffffc20000307000   12288 acpi_tb_verify_table+0x21/0x4f...
+  phys=7fee7000 ioremap
+0xffffc2000031d000-0xffffc2000031f000    8192 init_vdso_vars+0x112/0x210
+0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e ...
+  /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
+0xffffc2000033a000-0xffffc2000033d000   12288 sys_swapon+0x640/0xac0      ...
+  pages=2 vmalloc N1=2
+0xffffc20000347000-0xffffc2000034c000   20480 xt_alloc_table_info+0xfe ...
+  /0x130 [x_tables] pages=4 vmalloc N0=4
+0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 ...
+   pages=14 vmalloc N2=14
+0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 ...
+   pages=4 vmalloc N1=4
+0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 ...
+   pages=2 vmalloc N1=2
+0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
+   pages=10 vmalloc N0=10
 
 1.3 IDE devices in /proc/ide
 ----------------------------
diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
index 32dc14c..cfdf9e3 100644
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -461,14 +461,25 @@ static const struct file_operations proc_slabstats_operations = {
 #ifdef CONFIG_MMU
 static int vmalloc_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &vmalloc_op);
+	unsigned int *ptr = NULL;
+	int ret;
+
+	if (NUMA_BUILD)
+		ptr = kmalloc(nr_node_ids * sizeof(unsigned int), GFP_KERNEL);
+	ret = seq_open(file, &vmalloc_op);
+	if (!ret) {
+		struct seq_file *m = file->private_data;
+		m->private = ptr;
+	} else
+		kfree(ptr);
+	return ret;
 }
 
 static const struct file_operations proc_vmalloc_operations = {
 	.open		= vmalloc_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release,
+	.release	= seq_release_private,
 };
 #endif
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6e45b0f..35f2938 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -931,6 +931,25 @@ static void s_stop(struct seq_file *m, void *p)
 	read_unlock(&vmlist_lock);
 }
 
+static void show_numa_info(struct seq_file *m, struct vm_struct *v)
+{
+	if (NUMA_BUILD) {
+		unsigned int nr, *counters = m->private;
+
+		if (!counters)
+			return;
+
+		memset(counters, 0, nr_node_ids * sizeof(unsigned int));
+
+		for (nr = 0; nr < v->nr_pages; nr++)
+			counters[page_to_nid(v->pages[nr])]++;
+
+		for_each_node_state(nr, N_HIGH_MEMORY)
+			if (counters[nr])
+				seq_printf(m, " N%u=%u", nr, counters[nr]);
+	}
+}
+
 static int s_show(struct seq_file *m, void *p)
 {
 	struct vm_struct *v = p;
@@ -967,6 +986,7 @@ static int s_show(struct seq_file *m, void *p)
 	if (v->flags & VM_VPAGES)
 		seq_printf(m, " vpages");
 
+	show_numa_info(m, v);
 	seq_putc(m, '\n');
 	return 0;
 }




* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-04 15:01       ` Eric Dumazet
@ 2008-06-04 15:33         ` Randy Dunlap
  2008-06-09 14:19         ` Christoph Lameter
  1 sibling, 0 replies; 12+ messages in thread
From: Randy Dunlap @ 2008-06-04 15:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, kosaki.motohiro, clameter, nickpiggin, hugh,
	linux-kernel

On Wed, 04 Jun 2008 17:01:04 +0200 Eric Dumazet wrote:

> Here is an updated patch.
> 
> It now allocates the array in vmalloc_open().
> If this allocation fails, we just proceed and don't provide NUMA information.
>
> I also included the missing documentation for /proc/vmallocinfo.
> 
> 
> [PATCH] vmallocinfo: Add NUMA informations
> 
> Christoph recently added the /proc/vmallocinfo file to report information about vmalloc allocations.
>
> This patch adds NUMA-specific information: the number of pages allocated on each memory node.
>
> This should help check that vmalloc() respects NUMA policies.
>
> Example output on a four-node machine (one CPU per node):
>
> 1) Network hash tables are spread evenly across the four nodes (OK).
>  (The same holds for the inode and dentry hash tables.)
> 2) iptables tables (x_tables) are correctly allocated on each CPU's node (OK).
> 3) sys_swapon() allocates its memory from one node only.
> 4) Each loaded module uses memory on one node only.
>
> Sysadmins could tune their setup to change points 3) and 4) if necessary.
> 
> grep "pages="  /proc/vmallocinfo
> [snip]

> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
>  Documentation/filesystems/proc.txt |   44 +++++++++++++++++++++++++++
>  fs/proc/proc_misc.c                |   15 +++++++--
>  mm/vmalloc.c                       |   20 ++++++++++++
>  3 files changed, 77 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index dbc3c6a..b707d9a 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -296,6 +296,7 @@ Table 1-4: Kernel info in /proc
>   uptime      System uptime                                     
>   version     Kernel version                                    
>   video	     bttv info of video resources			(2.4)
> + vmallocinfo Show vmalloced areas
>  ..............................................................................
>  
>  You can,  for  example,  check  which interrupts are currently in use and what
> @@ -550,6 +551,49 @@ VmallocTotal: total size of vmalloc memory area
>   VmallocUsed: amount of vmalloc area which is used
>  VmallocChunk: largest contigious block of vmalloc area which is free
>  
> +..............................................................................
> +
> +vmallocinfo:
> +
> +Provides information about vmalloced/vmaped areas. One line per area,
> +containing the virtual address range of the area, size in bytes,
> +caller information of the creator, and optional informations depending

s/informations/information/

> +on the kind of area :
> +
> + pages=nr    number of pages
> + phys=addr   if a physical address was specified
> + ioremap     I/O mapping (ioremap() and friends)
> + vmalloc     vmalloc() area
> + vmap        vmap()ed pages
> + user        VM_USERMAP area
> + vpages      buffer for pages pointers was vmalloced (huge area)
> + N<node>=nr  (Only on NUMA kernels)
> +             Number of pages allocated on memory node <node>


---
~Randy
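
For readers who want to consume the documented N<node>=nr fields, here is a minimal userspace sketch, assuming the line layout shown in the example output earlier in the thread (whitespace-separated fields, per-node counts written as N0=128 and so on). The helper name parse_node_pages and its parameters are hypothetical and are not part of the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): scan one line of
 * /proc/vmallocinfo, store the page count of each "N<node>=nr" field
 * in counts[node], and return the sum over all nodes.
 * Fields for nodes >= max_nodes are ignored.
 */
static unsigned long parse_node_pages(const char *line,
                                      unsigned long *counts,
                                      unsigned int max_nodes)
{
    const char *p = line;
    unsigned long total = 0;
    unsigned int i;

    for (i = 0; i < max_nodes; i++)
        counts[i] = 0;

    while (*p) {
        unsigned int node;
        unsigned long nr;

        /* Skip the whitespace in front of the next field. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (!*p)
            break;

        /* Only fields of the form N<node>=<nr> match this pattern. */
        if (sscanf(p, "N%u=%lu", &node, &nr) == 2 && node < max_nodes) {
            counts[node] += nr;
            total += nr;
        }

        /* Advance to the end of the current field. */
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
    }
    return total;
}
```

On a line that carries both pages= and N<node>= fields, the returned total can be compared against the pages= value as a sanity check.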


* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-03  4:35     ` KOSAKI Motohiro
@ 2008-06-09 14:14       ` Christoph Lameter
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Lameter @ 2008-06-09 14:14 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Eric Dumazet, Andrew Morton, Nick Piggin, Hugh Dickins,
	linux kernel

On Tue, 3 Jun 2008, KOSAKI Motohiro wrote:

> Cristoph, What do you think?

Great. I think I acked part of this before.



* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-03 21:40     ` Andrew Morton
  2008-06-04 15:01       ` Eric Dumazet
@ 2008-06-09 14:16       ` Christoph Lameter
  2008-06-09 21:05         ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Lameter @ 2008-06-09 14:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric Dumazet, kosaki.motohiro, nickpiggin, hugh, linux-kernel

On Tue, 3 Jun 2008, Andrew Morton wrote:

> > +	if (NUMA_BUILD) {
> > +		unsigned int *counters, nr;
> > +
> > +		counters = kzalloc(nr_node_ids * sizeof(unsigned int),
> 
> This is kcalloc().  If you like that sorts of thing - I think kcalloc()
> is pretty pointless personally.

Same here. I think it's generally ignored. I tried to remove it at some 
point in the past. If we want kcalloc then we also need kczalloc. It would 
be best to keep the interface simple.

> Do we actually need dynamic allocation here?  There's a small,
> constant, known-at-compile-time upper bound to the number of nodes IDs?

The number of node ids may reach 1024.
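
To put that figure in perspective, here is a small sketch of the arithmetic, assuming one unsigned int counter per node id as in the quoted hunk. The macro MAX_NODE_IDS_ASSUMED and the helper counters_bytes are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Assumed upper bound on node ids, taken from the figure quoted above. */
#define MAX_NODE_IDS_ASSUMED 1024

/* Bytes needed for an array of one unsigned int counter per node id. */
static size_t counters_bytes(size_t nr_ids)
{
    return nr_ids * sizeof(unsigned int);
}
```

With a 4-byte unsigned int, counters_bytes(MAX_NODE_IDS_ASSUMED) comes to 4096 bytes, which is likely too much for an on-stack array in kernel code and is one argument for keeping the allocation dynamic.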


* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-04 15:01       ` Eric Dumazet
  2008-06-04 15:33         ` Randy Dunlap
@ 2008-06-09 14:19         ` Christoph Lameter
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Lameter @ 2008-06-09 14:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, kosaki.motohiro, nickpiggin, hugh, linux-kernel

On Wed, 4 Jun 2008, Eric Dumazet wrote:

> -	return seq_open(file, &vmalloc_op);
> +	unsigned int *ptr = NULL;
> +	int ret;
> +
> +	if (NUMA_BUILD)
> +		ptr = kmalloc(nr_node_ids * sizeof(unsigned int), GFP_KERNEL);

Maybe a bit of overkill here. nr_node_ids == 1 in the !NUMA case, so it 
should not do much harm if you alloc the struct anyway.



* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-09 14:16       ` Christoph Lameter
@ 2008-06-09 21:05         ` Andrew Morton
  2008-06-09 21:12           ` Pekka Enberg
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2008-06-09 21:05 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: dada1, kosaki.motohiro, nickpiggin, hugh, linux-kernel

On Mon, 9 Jun 2008 07:16:48 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Tue, 3 Jun 2008, Andrew Morton wrote:
> 
> > > +	if (NUMA_BUILD) {
> > > +		unsigned int *counters, nr;
> > > +
> > > +		counters = kzalloc(nr_node_ids * sizeof(unsigned int),
> > 
> > This is kcalloc().  If you like that sorts of thing - I think kcalloc()
> > is pretty pointless personally.
> 
> Same here. I think it's generally ignored. I tried to remove it at some 
> point in the past. If we want kcalloc then we also need kczalloc.

kcalloc() zeroes the returned memory - it's like calloc().

> It would 
> be best to keep the interface simple.

yup.  Oh well, it's not a big deal.

Except the inlined

        if (n != 0 && size > ULONG_MAX / n)
                return NULL;

is a bit bloaty/inefficient.  I expect that it's often the case that
one of `n' and `size' is not a compile-time constant.

otoh, there's one good thing about kcalloc: it actually checks for
multiplicative overflows, whereas the open-coded version often forgets
to do that.
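
The overflow check discussed above can be sketched in userspace as follows. This is only an analogue under stated assumptions: it uses malloc() and SIZE_MAX in place of the kernel allocator and ULONG_MAX, and the name checked_zalloc_array is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of the check quoted above:
 * allocate a zeroed array of n elements of the given size, refusing
 * the request when n * size would overflow size_t.
 */
static void *checked_zalloc_array(size_t n, size_t size)
{
    void *p;

    /* Same shape as the quoted test: size > MAX / n means overflow. */
    if (n != 0 && size > SIZE_MAX / n)
        return NULL;

    p = malloc(n * size);
    if (p)
        memset(p, 0, n * size);
    return p;
}
```

An open-coded malloc(n * size) with n = SIZE_MAX / 2 + 1 and size = 2 would wrap around to a zero-byte request, whereas the helper above returns NULL for it.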



* Re: [PATCH] vmallocinfo: Add NUMA informations
  2008-06-09 21:05         ` Andrew Morton
@ 2008-06-09 21:12           ` Pekka Enberg
  0 siblings, 0 replies; 12+ messages in thread
From: Pekka Enberg @ 2008-06-09 21:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, dada1, kosaki.motohiro, nickpiggin, hugh,
	linux-kernel

On Tue, Jun 10, 2008 at 12:05 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> yup.  Oh well, it's not a big deal.
>
> Except the inlined
>
>        if (n != 0 && size > ULONG_MAX / n)
>                return NULL;
>
> is a bit bloaty/inefficient.  I expect that it's often the case that
> one of `n' and `size' is not a compile-time constant.

I think it was out-of-line at first but then somebody made it inline
as an optimization...

On Tue, Jun 10, 2008 at 12:05 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> otoh, there's one good thing about kcalloc: it actually checks for
> multiplicative overflows, whereas the open-coded version often forgets
> to do that.

Yes. That's why we did kcalloc() in the first place. kzalloc() (aka
akpmalloc) came in much later.


Thread overview: 12+ messages
2008-06-02  6:54 [PATCH] vmallocinfo: Add NUMA informations Eric Dumazet
2008-06-02  7:09 ` KOSAKI Motohiro
2008-06-03  3:37   ` Eric Dumazet
2008-06-03  4:35     ` KOSAKI Motohiro
2008-06-09 14:14       ` Christoph Lameter
2008-06-03 21:40     ` Andrew Morton
2008-06-04 15:01       ` Eric Dumazet
2008-06-04 15:33         ` Randy Dunlap
2008-06-09 14:19         ` Christoph Lameter
2008-06-09 14:16       ` Christoph Lameter
2008-06-09 21:05         ` Andrew Morton
2008-06-09 21:12           ` Pekka Enberg
