All of lore.kernel.org
 help / color / mirror / Atom feed
From: mel@skynet.ie (Mel Gorman)
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-mm@kvack.org, Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	ak@suse.de, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	akpm@linux-foundation.org, pj@sgi.com
Subject: Re: NUMA policy issues with ZONE_MOVABLE
Date: Fri, 27 Jul 2007 18:46:22 +0100	[thread overview]
Message-ID: <20070727174622.GD646@skynet.ie> (raw)
In-Reply-To: <Pine.LNX.4.64.0707271026040.15990@schroedinger.engr.sgi.com>

On (27/07/07 10:35), Christoph Lameter didst pronounce:
> On Fri, 27 Jul 2007, Mel Gorman wrote:
> 
> > This was fairly straight-forward but I wouldn't call it a bug fix for 2.6.23
> > for the policys + ZONE_MOVABLE issue; I still prefer the last patch for
> > the fix.
> > 
> > This patch uses one zonelist per node and filters based on a gfp_mask where
> > necessary. It consumes less memory and reduces cache pressure at the cost
> > of CPU. It also adds a zone_id field to struct zone as zone_idx is used more
> > than it was previously.
> > 
> > Performance differences on kernbench for Total CPU time ranged from
> > -0.06% to +1.19%.
> 
> Performance is equal otherwise?
>  

Initial tests imply yes but I haven't done broader tests yet. It saves 64
bytes on the size of the node structure on a non-numa i386 machine so even
that might be noticable in some cases.

> > Obvious things that are outstanding;
> > 
> > o Compile-test parisc
> > o Split patch in two to keep the zone_idx changes separetly
> > o Verify zlccache is not broken
> > o Have a version of __alloc_pages take a nodemask and ditch
> >   bind_zonelist()
> 
> Yeah. I think the NUMA folks would love this but the rest of the 
> developers may object.
> 
> > I can work on bringing this up to scratch during the cycle.
> > 
> > Patch as follows. Comments?
> 
> Glad to see some movement in this area. 
> 
> > index bc68dd9..f2a597e 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -116,6 +116,13 @@ static inline enum zone_type gfp_zone(gfp_t flags)
> >  	return ZONE_NORMAL;
> >  }
> >  
> > +static inline int should_filter_zone(struct zone *zone, int highest_zoneidx)
> > +{
> > +	if (zone_idx(zone) > highest_zoneidx)
> > +		return 1;
> > +	return 0;
> > +}
> > +
> 
> I think this should_filter() creates more overhead than which it saves.

It's why part of the patch adds a zone_idx field to struct zone instead
of mucking around with pgdat->node_zones.

> In 
> particular true for configurations with a small number of zones like SMP 
> systems. For large NUMA systems the cache savings will likely may it 
> beneficial.
> 
> Simply filter all.
> 

What do you mean by simply filter all? The should_filter_zone() is
returning 1 if the zone should not be used for the current gfp_mask. It
would be easier to read (but slower) if it was expressed as

if (zone_idx(zone) > gfp_zone(gfp_mask))
	return 1;

so that zones unsuitable for gfp_mask are ignored.

> > @@ -258,7 +258,7 @@ static inline void mpol_fix_fork_child_flag(struct task_struct *p)
> >  static inline struct zonelist *huge_zonelist(struct vm_area_struct *vma,
> >  		unsigned long addr, gfp_t gfp_flags)
> >  {
> > -	return NODE_DATA(0)->node_zonelists + gfp_zone(gfp_flags);
> > +	return &NODE_DATA(0)->node_zonelist;
> >  }
> 
> These modifications look good in terrms of code size reduction.
> 

720 bytes less in the size of the text section for a standalone non-numa
machine.

> > @@ -438,7 +439,7 @@ extern struct page *mem_map;
> >  struct bootmem_data;
> >  typedef struct pglist_data {
> >  	struct zone node_zones[MAX_NR_ZONES];
> > -	struct zonelist node_zonelists[MAX_NR_ZONES];
> > +	struct zonelist node_zonelist;
> 
> Looks like a significant memory savings on 1024 node numa. zonelist has
> #define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
> zones.
> 

I'll gather figures.

> > @@ -185,11 +186,15 @@ static inline int constrained_alloc(struct zonelist *zonelist, gfp_t gfp_mask)
> >  		if (NODE_DATA(node)->node_present_pages)
> >  			node_set(node, nodes);
> >  
> > -	for (z = zonelist->zones; *z; z++)
> > +	for (z = zonelist->zones; *z; z++) {
> > +
> > +		if (should_filter_zone(*z, highest_zoneidx))
> > +			continue;
> 
> Huh? Why do you need it here? Note that this code is also going away with 
> the memoryless node patch. We can use the nodes with memory nodemask here.
> 

This function expects to walk a zonelist suitable for the gfp_mask. As
the zonelists it gets has potentially unsuitable zones in it, it must be
filtered as well so that it is functionally identical.

> > diff --git a/mm/slub.c b/mm/slub.c
> > index 9b2d617..a020a12 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1276,6 +1276,7 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
> >  	struct zonelist *zonelist;
> >  	struct zone **z;
> >  	struct page *page;
> > +	enum zone_type highest_zoneidx = gfp_zone(flags);
> >  
> >  	/*
> >  	 * The defrag ratio allows a configuration of the tradeoffs between
> > @@ -1298,11 +1299,13 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
> >  	if (!s->defrag_ratio || get_cycles() % 1024 > s->defrag_ratio)
> >  		return NULL;
> >  
> > -	zonelist = &NODE_DATA(slab_node(current->mempolicy))
> > -					->node_zonelists[gfp_zone(flags)];
> > +	zonelist = &NODE_DATA(slab_node(current->mempolicy))->node_zonelist;
> >  	for (z = zonelist->zones; *z; z++) {
> >  		struct kmem_cache_node *n;
> >  
> > +		if (should_filter_zone(*z, highest_zoneidx))
> > +			continue;
> > +
> >  		n = get_node(s, zone_to_nid(*z));
> >  
> >  		if (n && cpuset_zone_allowed_hardwall(*z, flags) &&
> 
> Isnt there some way to fold these traversals into a common page allocator 
> function?

Probably. When I looked first, each of the users were traversing the zonelist
slightly differently so it wasn't obvious how to have a single iterator but
it's a point for improvement.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-07-27 17:46 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-25  4:20 NUMA policy issues with ZONE_MOVABLE Christoph Lameter
2007-07-25  4:47 ` Nick Piggin
2007-07-25  5:05   ` Christoph Lameter
2007-07-25  5:24     ` Nick Piggin
2007-07-25  6:00       ` Christoph Lameter
2007-07-25  6:09         ` Nick Piggin
2007-07-25  9:32       ` Andi Kleen
2007-07-25  6:36 ` KAMEZAWA Hiroyuki
2007-07-25 11:16 ` Mel Gorman
2007-07-25 14:30   ` Lee Schermerhorn
2007-07-25 19:31   ` Christoph Lameter
2007-07-26  4:15     ` KAMEZAWA Hiroyuki
2007-07-26  4:53       ` Christoph Lameter
2007-07-26  7:41         ` KAMEZAWA Hiroyuki
2007-07-26 16:16       ` Mel Gorman
2007-07-26 18:03         ` Christoph Lameter
2007-07-26 18:26           ` Mel Gorman
2007-07-26 13:23     ` Mel Gorman
2007-07-26 18:07       ` Christoph Lameter
2007-07-26 22:59         ` Mel Gorman
2007-07-27  1:22           ` Christoph Lameter
2007-07-27  8:20             ` Mel Gorman
2007-07-27 15:45               ` Mel Gorman
2007-07-27 17:35                 ` Christoph Lameter
2007-07-27 17:46                   ` Mel Gorman [this message]
2007-07-27 18:38                     ` Christoph Lameter
2007-07-27 18:00                   ` [PATCH] Document Linux Memory Policy - V2 Lee Schermerhorn
2007-07-27 18:38                     ` Randy Dunlap
2007-07-27 19:01                       ` Lee Schermerhorn
2007-07-27 19:21                         ` Randy Dunlap
2007-07-27 18:55                     ` Christoph Lameter
2007-07-27 19:24                       ` Lee Schermerhorn
2007-07-31 15:14                     ` Mel Gorman
2007-07-31 16:34                       ` Lee Schermerhorn
2007-07-31 19:10                         ` Christoph Lameter
2007-07-31 19:46                           ` Lee Schermerhorn
2007-07-31 19:58                             ` Christoph Lameter
2007-07-31 20:23                               ` Lee Schermerhorn
2007-07-31 20:48                         ` [PATCH] Document Linux Memory Policy - V3 Lee Schermerhorn
2007-08-03 13:52                           ` Mel Gorman
2007-07-28  7:28                 ` NUMA policy issues with ZONE_MOVABLE KAMEZAWA Hiroyuki
2007-07-28 11:57                   ` Mel Gorman
2007-07-28 14:10                     ` KAMEZAWA Hiroyuki
2007-07-28 14:21                       ` KAMEZAWA Hiroyuki
2007-07-30 12:41                         ` Mel Gorman
2007-07-30 18:06                           ` Christoph Lameter
2007-07-27 14:24           ` Lee Schermerhorn
2007-08-01 18:59           ` Lee Schermerhorn
2007-08-02  0:36             ` KAMEZAWA Hiroyuki
2007-08-02 17:10             ` Mel Gorman
2007-08-02 17:51               ` Lee Schermerhorn
2007-07-26 18:09       ` Lee Schermerhorn
2007-08-02 14:09     ` Mel Gorman
2007-08-02 18:56       ` Christoph Lameter
2007-08-02 19:42         ` Mel Gorman
2007-08-02 19:52           ` Christoph Lameter
2007-08-03  9:32             ` Mel Gorman
2007-08-03 16:36               ` Christoph Lameter
2007-07-25 14:27 ` Lee Schermerhorn
2007-07-25 17:39   ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070727174622.GD646@skynet.ie \
    --to=mel@skynet.ie \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=pj@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.