* [PATCH] shows Active/Inactive on per-node meminfo
@ 2004-08-20 18:02 mita akinobu
2004-08-20 18:48 ` Jesse Barnes
0 siblings, 1 reply; 8+ messages in thread
From: mita akinobu @ 2004-08-20 18:02 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Matthew Dobson
Hello,
The patch below enables displaying the size of Active/Inactive pages in the
per-node meminfo (/sys/devices/system/node/node%d/meminfo), like /proc/meminfo.
With a small change to procps, "vmstat -a" can show these statistics for a
particular node.
Please apply.
Signed-off-by: Akinobu Mita <amgta@yacht.ocn.ne.jp>
drivers/base/node.c | 10 ++++++++++
include/linux/mmzone.h | 2 ++
mm/page_alloc.c | 28 +++++++++++++++++++++++-----
3 files changed, 35 insertions(+), 5 deletions(-)
diff -Nurp linux-2.6.8.1-mm3.orig/drivers/base/node.c linux-2.6.8.1-mm3/drivers/base/node.c
--- linux-2.6.8.1-mm3.orig/drivers/base/node.c 2004-08-21 00:15:43.000000000 +0900
+++ linux-2.6.8.1-mm3/drivers/base/node.c 2004-08-21 00:16:47.000000000 +0900
@@ -38,11 +38,19 @@ static ssize_t node_read_meminfo(struct
int n;
int nid = dev->id;
struct sysinfo i;
+ unsigned long inactive;
+ unsigned long active;
+ unsigned long free;
+
si_meminfo_node(&i, nid);
+ __get_zone_counts(&active, &inactive, &free, NODE_DATA(nid));
+
n = sprintf(buf, "\n"
"Node %d MemTotal: %8lu kB\n"
"Node %d MemFree: %8lu kB\n"
"Node %d MemUsed: %8lu kB\n"
+ "Node %d Active: %8lu kB\n"
+ "Node %d Inactive: %8lu kB\n"
"Node %d HighTotal: %8lu kB\n"
"Node %d HighFree: %8lu kB\n"
"Node %d LowTotal: %8lu kB\n"
@@ -50,6 +58,8 @@ static ssize_t node_read_meminfo(struct
nid, K(i.totalram),
nid, K(i.freeram),
nid, K(i.totalram - i.freeram),
+ nid, K(active),
+ nid, K(inactive),
nid, K(i.totalhigh),
nid, K(i.freehigh),
nid, K(i.totalram - i.totalhigh),
diff -Nurp linux-2.6.8.1-mm3.orig/include/linux/mmzone.h linux-2.6.8.1-mm3/include/linux/mmzone.h
--- linux-2.6.8.1-mm3.orig/include/linux/mmzone.h 2004-08-21 00:15:17.000000000 +0900
+++ linux-2.6.8.1-mm3/include/linux/mmzone.h 2004-08-21 00:16:46.000000000 +0900
@@ -272,6 +272,8 @@ typedef struct pglist_data {
extern int numnodes;
extern struct pglist_data *pgdat_list;
+void __get_zone_counts(unsigned long *active, unsigned long *inactive,
+ unsigned long *free, struct pglist_data *pgdat);
void get_zone_counts(unsigned long *active, unsigned long *inactive,
unsigned long *free);
void build_all_zonelists(void);
diff -Nurp linux-2.6.8.1-mm3.orig/mm/page_alloc.c linux-2.6.8.1-mm3/mm/page_alloc.c
--- linux-2.6.8.1-mm3.orig/mm/page_alloc.c 2004-08-21 00:16:18.000000000 +0900
+++ linux-2.6.8.1-mm3/mm/page_alloc.c 2004-08-21 00:16:47.000000000 +0900
@@ -1072,18 +1072,36 @@ unsigned long __read_page_state(unsigned
return ret;
}
+void __get_zone_counts(unsigned long *active, unsigned long *inactive,
+ unsigned long *free, struct pglist_data *pgdat)
+{
+ struct zone *zones = pgdat->node_zones;
+ int i;
+
+ *active = 0;
+ *inactive = 0;
+ *free = 0;
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ *active += zones[i].nr_active;
+ *inactive += zones[i].nr_inactive;
+ *free += zones[i].free_pages;
+ }
+}
+
void get_zone_counts(unsigned long *active,
unsigned long *inactive, unsigned long *free)
{
- struct zone *zone;
+ struct pglist_data *pgdat;
*active = 0;
*inactive = 0;
*free = 0;
- for_each_zone(zone) {
- *active += zone->nr_active;
- *inactive += zone->nr_inactive;
- *free += zone->free_pages;
+ for_each_pgdat(pgdat) {
+ unsigned long l, m, n;
+ __get_zone_counts(&l, &m, &n, pgdat);
+ *active += l;
+ *inactive += m;
+ *free += n;
}
}
* Re: [PATCH] shows Active/Inactive on per-node meminfo
2004-08-20 18:02 [PATCH] shows Active/Inactive on per-node meminfo mita akinobu
@ 2004-08-20 18:48 ` Jesse Barnes
2004-08-20 19:01 ` Jesse Barnes
2004-08-21 6:29 ` mita akinobu
0 siblings, 2 replies; 8+ messages in thread
From: Jesse Barnes @ 2004-08-20 18:48 UTC (permalink / raw)
To: mita akinobu; +Cc: linux-kernel, Andrew Morton, Matthew Dobson
On Friday, August 20, 2004 2:02 pm, mita akinobu wrote:
> + for (i = 0; i < MAX_NR_ZONES; i++) {
> + *active += zones[i].nr_active;
> + *inactive += zones[i].nr_inactive;
> + *free += zones[i].free_pages;
> + }
> +}
> +
> - *free += zone->free_pages;
> + for_each_pgdat(pgdat) {
> + unsigned long l, m, n;
> + __get_zone_counts(&l, &m, &n, pgdat);
> + *active += l;
> + *inactive += m;
> + *free += n;
> }
Just FYI, loops like this are going to be very slow on a large machine.
Iterating over every node in the system involves a TLB miss on every
iteration along with an offnode reference and possibly cacheline demotion.
Jesse
* Re: [PATCH] shows Active/Inactive on per-node meminfo
2004-08-20 18:48 ` Jesse Barnes
@ 2004-08-20 19:01 ` Jesse Barnes
2004-08-20 22:25 ` Andrew Morton
2004-08-21 6:29 ` mita akinobu
1 sibling, 1 reply; 8+ messages in thread
From: Jesse Barnes @ 2004-08-20 19:01 UTC (permalink / raw)
To: mita akinobu; +Cc: linux-kernel, Andrew Morton, Matthew Dobson
On Friday, August 20, 2004 2:48 pm, Jesse Barnes wrote:
> On Friday, August 20, 2004 2:02 pm, mita akinobu wrote:
> > + for (i = 0; i < MAX_NR_ZONES; i++) {
> > + *active += zones[i].nr_active;
> > + *inactive += zones[i].nr_inactive;
> > + *free += zones[i].free_pages;
> > + }
> > +}
> > +
> > - *free += zone->free_pages;
> > + for_each_pgdat(pgdat) {
> > + unsigned long l, m, n;
> > + __get_zone_counts(&l, &m, &n, pgdat);
> > + *active += l;
> > + *inactive += m;
> > + *free += n;
> > }
>
> Just FYI, loops like this are going to be very slow on a large machine.
> Iterating over every node in the system involves a TLB miss on every
> iteration along with an offnode reference and possibly cacheline demotion.
...but I see that you're just adding the info to the per-node meminfo files,
so it should be ok as long as people access a node's meminfo file from a
local cpu. /proc/meminfo will still hurt a lot though.
I bring this up because I ran into it once. I created a file
called /proc/discontig which printed out detailed per-node memory stats, one
node per line. On a large system it would literally take several seconds to
cat the file due to the overhead of looking at all the pages and zone
structures.
Jesse
* Re: [PATCH] shows Active/Inactive on per-node meminfo
2004-08-20 19:01 ` Jesse Barnes
@ 2004-08-20 22:25 ` Andrew Morton
2004-08-20 23:08 ` Matthew Dobson
2004-08-21 20:09 ` Jesse Barnes
0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2004-08-20 22:25 UTC (permalink / raw)
To: Jesse Barnes; +Cc: amgta, linux-kernel, colpatch
Jesse Barnes <jbarnes@engr.sgi.com> wrote:
>
> On Friday, August 20, 2004 2:48 pm, Jesse Barnes wrote:
> > On Friday, August 20, 2004 2:02 pm, mita akinobu wrote:
> > > + for (i = 0; i < MAX_NR_ZONES; i++) {
> > > + *active += zones[i].nr_active;
> > > + *inactive += zones[i].nr_inactive;
> > > + *free += zones[i].free_pages;
> > > + }
> > > +}
> > > +
> > > - *free += zone->free_pages;
> > > + for_each_pgdat(pgdat) {
> > > + unsigned long l, m, n;
> > > + __get_zone_counts(&l, &m, &n, pgdat);
> > > + *active += l;
> > > + *inactive += m;
> > > + *free += n;
> > > }
> >
> > Just FYI, loops like this are going to be very slow on a large machine.
> > Iterating over every node in the system involves a TLB miss on every
> > iteration along with an offnode reference and possibly cacheline demotion.
>
> ...but I see that you're just adding the info to the per-node meminfo files,
> so it should be ok as long as people access a node's meminfo file from a
> local cpu. /proc/meminfo will still hurt a lot though.
>
> I bring this up because I ran into it once. I created a file
> called /proc/discontig which printed out detailed per-node memory stats, one
> node per line. On a large system it would literally take several seconds to
> cat the file due to the overhead of looking at all the pages and zone
> structures.
>
So was that an ack, a nack or a quack?
I'll queue the patch up so it doesn't get lost - could you please take a
closer look at the performance implications sometime and let me know?
* Re: [PATCH] shows Active/Inactive on per-node meminfo
2004-08-20 22:25 ` Andrew Morton
@ 2004-08-20 23:08 ` Matthew Dobson
2004-08-21 20:09 ` Jesse Barnes
1 sibling, 0 replies; 8+ messages in thread
From: Matthew Dobson @ 2004-08-20 23:08 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jesse Barnes, amgta, LKML
On Fri, 2004-08-20 at 15:25, Andrew Morton wrote:
> Jesse Barnes <jbarnes@engr.sgi.com> wrote:
> >
> > On Friday, August 20, 2004 2:48 pm, Jesse Barnes wrote:
> > > On Friday, August 20, 2004 2:02 pm, mita akinobu wrote:
> > > > + for (i = 0; i < MAX_NR_ZONES; i++) {
> > > > + *active += zones[i].nr_active;
> > > > + *inactive += zones[i].nr_inactive;
> > > > + *free += zones[i].free_pages;
> > > > + }
> > > > +}
> > > > +
> > > > - *free += zone->free_pages;
> > > > + for_each_pgdat(pgdat) {
> > > > + unsigned long l, m, n;
> > > > + __get_zone_counts(&l, &m, &n, pgdat);
> > > > + *active += l;
> > > > + *inactive += m;
> > > > + *free += n;
> > > > }
> > >
> > > Just FYI, loops like this are going to be very slow on a large machine.
> > > Iterating over every node in the system involves a TLB miss on every
> > > iteration along with an offnode reference and possibly cacheline demotion.
> >
> > ...but I see that you're just adding the info to the per-node meminfo files,
> > so it should be ok as long as people access a node's meminfo file from a
> > local cpu. /proc/meminfo will still hurt a lot though.
> >
> > I bring this up because I ran into it once. I created a file
> > called /proc/discontig which printed out detailed per-node memory stats, one
> > node per line. On a large system it would literally take several seconds to
> > cat the file due to the overhead of looking at all the pages and zone
> > structures.
> >
>
> So was that an ack, a nack or a quack?
>
> I'll queue the patch up so it doesn't get lost - could you please take a
> closer look at the performance implications sometime and let me know?
It doesn't look like it will have any real impact. The only function
affected by this (besides reads of /proc/meminfo and the
/sys/devices/system/node/node%d/meminfo files) is get_zone_counts(), and
get_zone_counts() already loops over all zones on all nodes in the system
anyway; the new code does exactly that. for_each_zone() walks all the
zones by looping through each pgdat in the system, which is what the new
code does, just more explicitly. We might do a little better by inlining
__get_zone_counts()?
That said, I haven't run any benchmarks or anything... :)
-Matt
* Re: [PATCH] shows Active/Inactive on per-node meminfo
2004-08-20 18:48 ` Jesse Barnes
2004-08-20 19:01 ` Jesse Barnes
@ 2004-08-21 6:29 ` mita akinobu
2004-08-21 20:11 ` Jesse Barnes
1 sibling, 1 reply; 8+ messages in thread
From: mita akinobu @ 2004-08-21 6:29 UTC (permalink / raw)
To: Jesse Barnes; +Cc: linux-kernel, Andrew Morton, Matthew Dobson
On Saturday 21 August 2004 03:48, Jesse Barnes wrote:
> Just FYI, loops like this are going to be very slow on a large machine.
> Iterating over every node in the system involves a TLB miss on every
> iteration along with an offnode reference and possibly cacheline demotion.
get_zone_counts() is used by max_sane_readahead(), and
max_sane_readahead() is often called in filemap_nopage().
If iterating over every node is going to be very slow, the following change
might yield a small improvement on a large machine?
--- linux-2.6.8.1-mm3/mm/readahead.c.orig 2004-08-21 15:18:08.924273720 +0900
+++ linux-2.6.8.1-mm3/mm/readahead.c 2004-08-21 15:22:31.123413392 +0900
@@ -572,6 +572,6 @@ unsigned long max_sane_readahead(unsigne
unsigned long inactive;
unsigned long free;
- get_zone_counts(&active, &inactive, &free);
+ __get_zone_counts(&active, &inactive, &free, NODE_DATA(numa_node_id()));
return min(nr, (inactive + free) / 2);
}
* Re: [PATCH] shows Active/Inactive on per-node meminfo
2004-08-20 22:25 ` Andrew Morton
2004-08-20 23:08 ` Matthew Dobson
@ 2004-08-21 20:09 ` Jesse Barnes
1 sibling, 0 replies; 8+ messages in thread
From: Jesse Barnes @ 2004-08-21 20:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: amgta, linux-kernel, colpatch
On Friday, August 20, 2004 6:25 pm, Andrew Morton wrote:
> So was that an ack, a nack or a quack?
Ack. It looks ok to me; it just reminded me that /proc/meminfo will hurt a
lot on a big machine. But this patch messes with /sys/.../nodeX/meminfo, so
it's fine with me, and is in fact quite useful.
Jesse
* Re: [PATCH] shows Active/Inactive on per-node meminfo
2004-08-21 6:29 ` mita akinobu
@ 2004-08-21 20:11 ` Jesse Barnes
0 siblings, 0 replies; 8+ messages in thread
From: Jesse Barnes @ 2004-08-21 20:11 UTC (permalink / raw)
To: mita akinobu; +Cc: linux-kernel, Andrew Morton, Matthew Dobson
On Saturday, August 21, 2004 2:29 am, mita akinobu wrote:
> get_zone_counts() is used by max_sane_readahead(), and
> max_sane_readahead() is often called in filemap_nopage().
>
> If iterating over every node is going to be very slow, the following change
> might yield a small improvement on a large machine?
>
> --- linux-2.6.8.1-mm3/mm/readahead.c.orig 2004-08-21 15:18:08.924273720 +0900
> +++ linux-2.6.8.1-mm3/mm/readahead.c 2004-08-21 15:22:31.123413392 +0900
> @@ -572,6 +572,6 @@ unsigned long max_sane_readahead(unsigne
> unsigned long inactive;
> unsigned long free;
>
> - get_zone_counts(&active, &inactive, &free);
> + __get_zone_counts(&active, &inactive, &free, NODE_DATA(numa_node_id()));
> return min(nr, (inactive + free) / 2);
> }
Good catch, this looks good. for_each_pgdat just raises a red flag for me
these days. :)
Thanks,
Jesse