public inbox for linux-kernel@vger.kernel.org
* [PATCH] shows Active/Inactive on per-node meminfo
@ 2004-08-20 18:02 mita akinobu
  2004-08-20 18:48 ` Jesse Barnes
  0 siblings, 1 reply; 8+ messages in thread
From: mita akinobu @ 2004-08-20 18:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Matthew Dobson

Hello,

The patch below enables displaying the size of Active/Inactive pages in the
per-node meminfo (/sys/devices/system/node/node%d/meminfo), like /proc/meminfo.

With a small change to procps, "vmstat -a" can show these statistics for a
particular node.
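For reference, a small Python sketch (not part of the patch; values are hypothetical) of how the two new lines render, using the same printf-style format strings the kernel code uses:

```python
# Render the two new per-node meminfo lines the patch adds, using the same
# printf-style format as the kernel's sprintf() call.  The node id and kB
# values below are hypothetical.
def node_meminfo_lines(nid, active_kb, inactive_kb):
    return [
        "Node %d Active:       %8lu kB" % (nid, active_kb),
        "Node %d Inactive:     %8lu kB" % (nid, inactive_kb),
    ]

for line in node_meminfo_lines(0, 123456, 78901):
    print(line)
```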

Please apply.

Signed-off-by: Akinobu Mita <amgta@yacht.ocn.ne.jp>

 drivers/base/node.c    |   10 ++++++++++
 include/linux/mmzone.h |    2 ++
 mm/page_alloc.c        |   28 +++++++++++++++++++++++-----
 3 files changed, 35 insertions(+), 5 deletions(-)

diff -Nurp linux-2.6.8.1-mm3.orig/drivers/base/node.c linux-2.6.8.1-mm3/drivers/base/node.c
--- linux-2.6.8.1-mm3.orig/drivers/base/node.c	2004-08-21 00:15:43.000000000 +0900
+++ linux-2.6.8.1-mm3/drivers/base/node.c	2004-08-21 00:16:47.000000000 +0900
@@ -38,11 +38,19 @@ static ssize_t node_read_meminfo(struct 
 	int n;
 	int nid = dev->id;
 	struct sysinfo i;
+	unsigned long inactive;
+	unsigned long active;
+	unsigned long free;
+
 	si_meminfo_node(&i, nid);
+	__get_zone_counts(&active, &inactive, &free, NODE_DATA(nid));
+
 	n = sprintf(buf, "\n"
 		       "Node %d MemTotal:     %8lu kB\n"
 		       "Node %d MemFree:      %8lu kB\n"
 		       "Node %d MemUsed:      %8lu kB\n"
+		       "Node %d Active:       %8lu kB\n"
+		       "Node %d Inactive:     %8lu kB\n"
 		       "Node %d HighTotal:    %8lu kB\n"
 		       "Node %d HighFree:     %8lu kB\n"
 		       "Node %d LowTotal:     %8lu kB\n"
@@ -50,6 +58,8 @@ static ssize_t node_read_meminfo(struct 
 		       nid, K(i.totalram),
 		       nid, K(i.freeram),
 		       nid, K(i.totalram - i.freeram),
+		       nid, K(active),
+		       nid, K(inactive),
 		       nid, K(i.totalhigh),
 		       nid, K(i.freehigh),
 		       nid, K(i.totalram - i.totalhigh),
diff -Nurp linux-2.6.8.1-mm3.orig/include/linux/mmzone.h linux-2.6.8.1-mm3/include/linux/mmzone.h
--- linux-2.6.8.1-mm3.orig/include/linux/mmzone.h	2004-08-21 00:15:17.000000000 +0900
+++ linux-2.6.8.1-mm3/include/linux/mmzone.h	2004-08-21 00:16:46.000000000 +0900
@@ -272,6 +272,8 @@ typedef struct pglist_data {
 extern int numnodes;
 extern struct pglist_data *pgdat_list;
 
+void __get_zone_counts(unsigned long *active, unsigned long *inactive,
+			unsigned long *free, struct pglist_data *pgdat);
 void get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free);
 void build_all_zonelists(void);
diff -Nurp linux-2.6.8.1-mm3.orig/mm/page_alloc.c linux-2.6.8.1-mm3/mm/page_alloc.c
--- linux-2.6.8.1-mm3.orig/mm/page_alloc.c	2004-08-21 00:16:18.000000000 +0900
+++ linux-2.6.8.1-mm3/mm/page_alloc.c	2004-08-21 00:16:47.000000000 +0900
@@ -1072,18 +1072,36 @@ unsigned long __read_page_state(unsigned
 	return ret;
 }
 
+void __get_zone_counts(unsigned long *active, unsigned long *inactive,
+			unsigned long *free, struct pglist_data *pgdat)
+{
+	struct zone *zones = pgdat->node_zones;
+	int i;
+
+	*active = 0;
+	*inactive = 0;
+	*free = 0;
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		*active += zones[i].nr_active;
+		*inactive += zones[i].nr_inactive;
+		*free += zones[i].free_pages;
+	}
+}
+
 void get_zone_counts(unsigned long *active,
 		unsigned long *inactive, unsigned long *free)
 {
-	struct zone *zone;
+	struct pglist_data *pgdat;
 
 	*active = 0;
 	*inactive = 0;
 	*free = 0;
-	for_each_zone(zone) {
-		*active += zone->nr_active;
-		*inactive += zone->nr_inactive;
-		*free += zone->free_pages;
+	for_each_pgdat(pgdat) {
+		unsigned long l, m, n;
+		__get_zone_counts(&l, &m, &n, pgdat);
+		*active += l;
+		*inactive += m;
+		*free += n;
 	}
 }
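As a rough userspace model of the split this patch makes (Python, with hypothetical zone data): __get_zone_counts() sums the counters of one node's zones, and get_zone_counts() rolls those per-node totals up over every pgdat:

```python
# Userspace sketch of the patch's two-level aggregation (hypothetical data).
def get_zone_counts_node(zones):
    """Model of __get_zone_counts(): sum one node's zone counters."""
    active = sum(z["nr_active"] for z in zones)
    inactive = sum(z["nr_inactive"] for z in zones)
    free = sum(z["free_pages"] for z in zones)
    return active, inactive, free

def get_zone_counts(nodes):
    """Model of get_zone_counts(): sum per-node totals over all pgdats."""
    totals = [get_zone_counts_node(zones) for zones in nodes]
    return tuple(sum(t[i] for t in totals) for i in range(3))

# Two hypothetical nodes, with two zones and one zone respectively.
nodes = [
    [{"nr_active": 10, "nr_inactive": 5, "free_pages": 100},
     {"nr_active": 2, "nr_inactive": 1, "free_pages": 50}],
    [{"nr_active": 7, "nr_inactive": 3, "free_pages": 40}],
]
print(get_zone_counts(nodes))
```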
 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] shows Active/Inactive on per-node meminfo
  2004-08-20 18:02 [PATCH] shows Active/Inactive on per-node meminfo mita akinobu
@ 2004-08-20 18:48 ` Jesse Barnes
  2004-08-20 19:01   ` Jesse Barnes
  2004-08-21  6:29   ` mita akinobu
  0 siblings, 2 replies; 8+ messages in thread
From: Jesse Barnes @ 2004-08-20 18:48 UTC (permalink / raw)
  To: mita akinobu; +Cc: linux-kernel, Andrew Morton, Matthew Dobson

On Friday, August 20, 2004 2:02 pm, mita akinobu wrote:
> +	for (i = 0; i < MAX_NR_ZONES; i++) {
> +		*active += zones[i].nr_active;
> +		*inactive += zones[i].nr_inactive;
> +		*free += zones[i].free_pages;
> +	}
> +}
> +
> -		*free += zone->free_pages;
> +	for_each_pgdat(pgdat) {
> +		unsigned long l, m, n;
> +		__get_zone_counts(&l, &m, &n, pgdat);
> +		*active += l;
> +		*inactive += m;
> +		*free += n;
>  	}

Just FYI, loops like this are going to be very slow on a large machine.  
Iterating over every node in the system involves a TLB miss on every 
iteration along with an offnode reference and possibly cacheline demotion.

Jesse


* Re: [PATCH] shows Active/Inactive on per-node meminfo
  2004-08-20 18:48 ` Jesse Barnes
@ 2004-08-20 19:01   ` Jesse Barnes
  2004-08-20 22:25     ` Andrew Morton
  2004-08-21  6:29   ` mita akinobu
  1 sibling, 1 reply; 8+ messages in thread
From: Jesse Barnes @ 2004-08-20 19:01 UTC (permalink / raw)
  To: mita akinobu; +Cc: linux-kernel, Andrew Morton, Matthew Dobson

On Friday, August 20, 2004 2:48 pm, Jesse Barnes wrote:
> On Friday, August 20, 2004 2:02 pm, mita akinobu wrote:
> > +	for (i = 0; i < MAX_NR_ZONES; i++) {
> > +		*active += zones[i].nr_active;
> > +		*inactive += zones[i].nr_inactive;
> > +		*free += zones[i].free_pages;
> > +	}
> > +}
> > +
> > -		*free += zone->free_pages;
> > +	for_each_pgdat(pgdat) {
> > +		unsigned long l, m, n;
> > +		__get_zone_counts(&l, &m, &n, pgdat);
> > +		*active += l;
> > +		*inactive += m;
> > +		*free += n;
> >  	}
>
> Just FYI, loops like this are going to be very slow on a large machine.
> Iterating over every node in the system involves a TLB miss on every
> iteration along with an offnode reference and possibly cacheline demotion.

...but I see that you're just adding the info to the per-node meminfo files, 
so it should be ok as long as people access a node's meminfo file from a 
local cpu.  /proc/meminfo will still hurt a lot though.

I bring this up because I ran into it once.  I created a file 
called /proc/discontig which printed out detailed per-node memory stats, one 
node per line.  On a large system it would literally take several seconds to 
cat the file due to the overhead of looking at all the pages and zone 
structures.

Jesse



* Re: [PATCH] shows Active/Inactive on per-node meminfo
  2004-08-20 19:01   ` Jesse Barnes
@ 2004-08-20 22:25     ` Andrew Morton
  2004-08-20 23:08       ` Matthew Dobson
  2004-08-21 20:09       ` Jesse Barnes
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2004-08-20 22:25 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: amgta, linux-kernel, colpatch

Jesse Barnes <jbarnes@engr.sgi.com> wrote:
>
> On Friday, August 20, 2004 2:48 pm, Jesse Barnes wrote:
> > On Friday, August 20, 2004 2:02 pm, mita akinobu wrote:
> > > +	for (i = 0; i < MAX_NR_ZONES; i++) {
> > > +		*active += zones[i].nr_active;
> > > +		*inactive += zones[i].nr_inactive;
> > > +		*free += zones[i].free_pages;
> > > +	}
> > > +}
> > > +
> > > -		*free += zone->free_pages;
> > > +	for_each_pgdat(pgdat) {
> > > +		unsigned long l, m, n;
> > > +		__get_zone_counts(&l, &m, &n, pgdat);
> > > +		*active += l;
> > > +		*inactive += m;
> > > +		*free += n;
> > >  	}
> >
> > Just FYI, loops like this are going to be very slow on a large machine.
> > Iterating over every node in the system involves a TLB miss on every
> > iteration along with an offnode reference and possibly cacheline demotion.
> 
> ...but I see that you're just adding the info to the per-node meminfo files, 
> so it should be ok as long as people access a node's meminfo file from a 
> local cpu.  /proc/meminfo will still hurt a lot though.
> 
> I bring this up because I ran into it once.  I created a file 
> called /proc/discontig which printed out detailed per-node memory stats, one 
> node per line.  On a large system it would literally take several seconds to 
> cat the file due to the overhead of looking at all the pages and zone 
> structures.
> 

So was that an ack, a nack or a quack?

I'll queue the patch up so it doesn't get lost - could you please take a
closer look at the performance implications sometime and let me know?



* Re: [PATCH] shows Active/Inactive on per-node meminfo
  2004-08-20 22:25     ` Andrew Morton
@ 2004-08-20 23:08       ` Matthew Dobson
  2004-08-21 20:09       ` Jesse Barnes
  1 sibling, 0 replies; 8+ messages in thread
From: Matthew Dobson @ 2004-08-20 23:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jesse Barnes, amgta, LKML

On Fri, 2004-08-20 at 15:25, Andrew Morton wrote:
> Jesse Barnes <jbarnes@engr.sgi.com> wrote:
> >
> > On Friday, August 20, 2004 2:48 pm, Jesse Barnes wrote:
> > > On Friday, August 20, 2004 2:02 pm, mita akinobu wrote:
> > > > +	for (i = 0; i < MAX_NR_ZONES; i++) {
> > > > +		*active += zones[i].nr_active;
> > > > +		*inactive += zones[i].nr_inactive;
> > > > +		*free += zones[i].free_pages;
> > > > +	}
> > > > +}
> > > > +
> > > > -		*free += zone->free_pages;
> > > > +	for_each_pgdat(pgdat) {
> > > > +		unsigned long l, m, n;
> > > > +		__get_zone_counts(&l, &m, &n, pgdat);
> > > > +		*active += l;
> > > > +		*inactive += m;
> > > > +		*free += n;
> > > >  	}
> > >
> > > Just FYI, loops like this are going to be very slow on a large machine.
> > > Iterating over every node in the system involves a TLB miss on every
> > > iteration along with an offnode reference and possibly cacheline demotion.
> > 
> > ...but I see that you're just adding the info to the per-node meminfo files, 
> > so it should be ok as long as people access a node's meminfo file from a 
> > local cpu.  /proc/meminfo will still hurt a lot though.
> > 
> > I bring this up because I ran into it once.  I created a file 
> > called /proc/discontig which printed out detailed per-node memory stats, one 
> > node per line.  On a large system it would literally take several seconds to 
> > cat the file due to the overhead of looking at all the pages and zone 
> > structures.
> > 
> 
> So was that an ack, a nack or a quack?
> 
> I'll queue the patch up so it doesn't get lost - could you please take a
> closer look at the performance implications sometime, let me know?

It doesn't look like it will have any real impact.  The only function
(besides the code that reads /proc/meminfo and the
/sys/devices/system/node/node%d/meminfo files) affected by this is
get_zone_counts().  get_zone_counts() already loops over all zones on
all nodes in the system anyway, and the new code does exactly the same
thing.  for_each_zone() walks all the zones by looping through each pgdat
in the system, which is what the new code does, just more explicitly.  We
might do a little better by inlining __get_zone_counts()?

That said, I haven't run any benchmarks or anything... :)

-Matt



* Re: [PATCH] shows Active/Inactive on per-node meminfo
  2004-08-20 18:48 ` Jesse Barnes
  2004-08-20 19:01   ` Jesse Barnes
@ 2004-08-21  6:29   ` mita akinobu
  2004-08-21 20:11     ` Jesse Barnes
  1 sibling, 1 reply; 8+ messages in thread
From: mita akinobu @ 2004-08-21  6:29 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-kernel, Andrew Morton, Matthew Dobson

On Saturday 21 August 2004 03:48, Jesse Barnes wrote:
> Just FYI, loops like this are going to be very slow on a large machine.
> Iterating over every node in the system involves a TLB miss on every
> iteration along with an offnode reference and possibly cacheline demotion.

get_zone_counts() is used by max_sane_readahead(), and
max_sane_readahead() is often called from filemap_nopage().

If iterating over every node is going to be very slow, wouldn't the following
change be a small improvement on a large machine?


--- linux-2.6.8.1-mm3/mm/readahead.c.orig	2004-08-21 15:18:08.924273720 +0900
+++ linux-2.6.8.1-mm3/mm/readahead.c	2004-08-21 15:22:31.123413392 +0900
@@ -572,6 +572,6 @@ unsigned long max_sane_readahead(unsigne
 	unsigned long inactive;
 	unsigned long free;
 
-	get_zone_counts(&active, &inactive, &free);
+	__get_zone_counts(&active, &inactive, &free, NODE_DATA(numa_node_id()));
 	return min(nr, (inactive + free) / 2);
 }
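For context, the function being changed clamps a readahead request by local memory pressure.  A Python sketch of that computation (min(nr, (inactive + free) / 2)), with hypothetical page counts:

```python
def max_sane_readahead(nr, inactive, free):
    """Model of max_sane_readahead(): cap the requested readahead at half
    of the inactive-plus-free page count (all values in pages)."""
    return min(nr, (inactive + free) // 2)

# With 300 inactive and 100 free pages, a 512-page request is capped at 200,
# while a small 64-page request passes through unchanged.
print(max_sane_readahead(512, 300, 100))
print(max_sane_readahead(64, 300, 100))
```

The change above only swaps which (inactive, free) pair feeds this formula: the local node's counts instead of the system-wide totals.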





* Re: [PATCH] shows Active/Inactive on per-node meminfo
  2004-08-20 22:25     ` Andrew Morton
  2004-08-20 23:08       ` Matthew Dobson
@ 2004-08-21 20:09       ` Jesse Barnes
  1 sibling, 0 replies; 8+ messages in thread
From: Jesse Barnes @ 2004-08-21 20:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: amgta, linux-kernel, colpatch

On Friday, August 20, 2004 6:25 pm, Andrew Morton wrote:
> So was that an ack, a nack or a quack?

Ack.  It looks ok to me; it just reminded me that /proc/meminfo will hurt a
lot on a big machine.  But this patch only touches /sys/.../nodeX/meminfo, so
it's fine with me, and in fact quite useful.

Jesse


* Re: [PATCH] shows Active/Inactive on per-node meminfo
  2004-08-21  6:29   ` mita akinobu
@ 2004-08-21 20:11     ` Jesse Barnes
  0 siblings, 0 replies; 8+ messages in thread
From: Jesse Barnes @ 2004-08-21 20:11 UTC (permalink / raw)
  To: mita akinobu; +Cc: linux-kernel, Andrew Morton, Matthew Dobson

On Saturday, August 21, 2004 2:29 am, mita akinobu wrote:
> get_zone_counts() is used by max_sane_readahead(), and
> max_sane_readahead() is often called in filemap_nopage().
>
> If iterating over every node is going to be very slow, the following change
> would have a little bit of improvement on a large machine?
>
>
> --- linux-2.6.8.1-mm3/mm/readahead.c.orig	2004-08-21 15:18:08.924273720
> +0900 +++ linux-2.6.8.1-mm3/mm/readahead.c	2004-08-21 15:22:31.123413392
> +0900 @@ -572,6 +572,6 @@ unsigned long max_sane_readahead(unsigne
>  	unsigned long inactive;
>  	unsigned long free;
>
> -	get_zone_counts(&active, &inactive, &free);
> +	__get_zone_counts(&active, &inactive, &free, NODE_DATA(numa_node_id()));
>  	return min(nr, (inactive + free) / 2);
>  }

Good catch, this looks good.  for_each_pgdat just raises a red flag for me 
these days. :)

Thanks,
Jesse

