From: Andrew Morton <akpm@linux-foundation.org>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, GOTO <y-goto@jp.fujitsu.com>
Subject: Re: [RFC][PATCH] syctl for selecting global zonelist[] order
Date: Wed, 25 Apr 2007 00:42:14 -0700
Message-ID: <20070425004214.e21da2d8.akpm@linux-foundation.org>
In-Reply-To: <20070425121946.9eb27a79.kamezawa.hiroyu@jp.fujitsu.com>

On Wed, 25 Apr 2007 12:19:46 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Make zonelist policy selectable from sysctl.
> 
> Assume 2 node NUMA, only node(0) has ZONE_DMA (ZONE_DMA32).
> 
> In this case, default (node0's) zonelist order is
> 
> Node(0)'s NORMAL -> Node(0)'s DMA -> Node(1)'s NORMAL.
> 
> This means Node(0)'s DMA is used before Node(1)'s NORMAL.
> 
> On some servers, an application makes large memory allocations.
> These exhaust memory in the above order.
> Then an OOM kill can sometimes occur when a 32-bit device requires memory.
> 
> This patch adds sysctl for rebuilding zonelist after boot and doesn't change
> default zonelist order.

hm.  Why don't we use that ordering all the time?  Does the present ordering have
any advantage?

> command:
> %echo 0 > /proc/sys/vm/better_locality

Who could resist having better locality? ;)

> Will rebuild zonelist in following order.
> 
> Node(0)'s NORMAL -> Node(1)'s NORMAL -> Node(0)'s DMA.
> 
> if set better_locality == 1 (default), zonelist is
> Node(0)'s NORMAL -> Node(0)'s DMA -> Node(1)'s NORMAL.
> 
> May be useful for users with heavy memory pressure and mlocked memory.
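
To make the two orderings concrete, here is a minimal userspace sketch of the two-node example quoted above (only node 0 has a DMA zone). It is a toy model, not the kernel's zonelist code: struct zone_ref, node_has_dma[], build_node_order(), build_zone_order() and format_zonelist() are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
enum zone_type { ZONE_DMA, ZONE_NORMAL };

struct zone_ref {
	int node;
	enum zone_type type;
};

#define NR_NODES  2
#define MAX_ZONES (2 * NR_NODES)

/* Two-node example from the patch description: only node 0 has DMA. */
static const int node_has_dma[NR_NODES] = { 1, 0 };

/* Node-first: use every zone of the nearest node before moving on. */
static int build_node_order(int local, struct zone_ref *out)
{
	int n = 0;

	for (int i = 0; i < NR_NODES; i++) {
		int node = (local + i) % NR_NODES;

		out[n++] = (struct zone_ref){ node, ZONE_NORMAL };
		if (node_has_dma[node])
			out[n++] = (struct zone_ref){ node, ZONE_DMA };
	}
	assert(n <= MAX_ZONES);
	return n;
}

/* Zone-first: NORMAL zones of all nodes first, DMA zones last. */
static int build_zone_order(int local, struct zone_ref *out)
{
	int n = 0;

	for (int i = 0; i < NR_NODES; i++)
		out[n++] = (struct zone_ref){ (local + i) % NR_NODES, ZONE_NORMAL };
	for (int i = 0; i < NR_NODES; i++) {
		int node = (local + i) % NR_NODES;

		if (node_has_dma[node])
			out[n++] = (struct zone_ref){ node, ZONE_DMA };
	}
	assert(n <= MAX_ZONES);
	return n;
}

/* Render a zonelist as "N0:NORMAL -> N0:DMA -> ..." into buf. */
static void format_zonelist(const struct zone_ref *zl, int n,
			    char *buf, size_t len)
{
	buf[0] = '\0';
	for (int i = 0; i < n; i++) {
		char item[32];

		snprintf(item, sizeof(item), "%sN%d:%s", i ? " -> " : "",
			 zl[i].node, zl[i].type == ZONE_DMA ? "DMA" : "NORMAL");
		strncat(buf, item, len - strlen(buf) - 1);
	}
}
```

Calling build_node_order(0, zl) models the current default for node 0, and build_zone_order(0, zl) models the order the patch selects with better_locality == 0, where node 0's DMA zone is the last fallback.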
> 
> ...
>
>  extern int percpu_pagelist_fraction;
>  extern int compat_log;
> +#ifdef CONFIG_NUMA
> +extern int sysctl_better_locality;
> +#endif

The ifdef isn't needed here.  If something went wrong, we'll find out at
link-time.
  
>  /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
>  static int maxolduid = 65535;
> @@ -845,6 +848,15 @@ static ctl_table vm_table[] = {
>  		.extra1		= &zero,
>  		.extra2		= &one_hundred,
>  	},
> +	{
> +		.ctl_name	= VM_BETTER_LOCALITY,

Please don't add new numbered sysctls: use CTL_UNNUMBERED here.
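
A rough sketch of the table entry with that change, written as a userspace mock so it compiles on its own. struct mock_ctl_table, mock_proc_handler and the stub handler are stand-ins invented here, and the value given to CTL_UNNUMBERED is a placeholder; the real definitions are in include/linux/sysctl.h. The .strategy member is left out of the mock for brevity.

```c
/* Placeholder value for this sketch only; see include/linux/sysctl.h. */
#define CTL_UNNUMBERED (-2)

/* Simplified stand-in for the kernel's proc handler type. */
typedef int mock_proc_handler(void);

/* Mock of the ctl_table members used by the quoted hunk. */
struct mock_ctl_table {
	int ctl_name;
	const char *procname;
	void *data;
	int maxlen;
	unsigned short mode;
	mock_proc_handler *proc_handler;
};

static int sysctl_better_locality = 1;

/* Stub; the patch's handler rebuilds the zonelists on write. */
static int sysctl_better_locality_handler(void)
{
	return 0;
}

static struct mock_ctl_table vm_table[] = {
	{
		/* No binary sysctl number is allocated for this entry. */
		.ctl_name	= CTL_UNNUMBERED,
		.procname	= "better_locality",
		.data		= &sysctl_better_locality,
		.maxlen		= sizeof(sysctl_better_locality),
		.mode		= 0644,
		.proc_handler	= &sysctl_better_locality_handler,
	},
	{ .ctl_name = 0 }	/* table terminator */
};
```

With the entry unnumbered, the tunable is reached only by name through /proc/sys/vm/better_locality, so no VM_* enum value is required.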

> +		.procname	= "better_locality",
> +		.data		= &sysctl_better_locality,
> +		.maxlen		= sizeof(sysctl_better_locality),
> +		.mode		= 0644,
> +		.proc_handler	= &sysctl_better_locality_handler,
> +		.strategy	= &sysctl_intvec,
> +	},
>
> ..
>
> +static void build_zonelists(pg_data_t *pgdat)
> +{
> +	if (sysctl_better_locality) {
> +		build_zonelists_locality_aware(pgdat);
> +	} else {
> +		build_zonelists_zone_aware(pgdat);
> +	}

Remove all the braces please.
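
That is, something along these lines. The typedef and the two builder stubs below are stand-ins so the fragment compiles in userspace; in the patch they are the real pg_data_t and the real zonelist builders.

```c
/* Stand-in type; the kernel's pg_data_t is far larger. */
typedef struct {
	int last_builder;	/* records which stub ran, for illustration */
} pg_data_t;

static int sysctl_better_locality = 1;

/* Stubs standing in for the two builders named in the patch. */
static void build_zonelists_locality_aware(pg_data_t *pgdat)
{
	pgdat->last_builder = 1;
}

static void build_zonelists_zone_aware(pg_data_t *pgdat)
{
	pgdat->last_builder = 2;
}

/* Single-statement branches take no braces in kernel coding style. */
static void build_zonelists(pg_data_t *pgdat)
{
	if (sysctl_better_locality)
		build_zonelists_locality_aware(pgdat);
	else
		build_zonelists_zone_aware(pgdat);
}
```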

> @@ -207,6 +207,7 @@ enum
>  	VM_PANIC_ON_OOM=33,	/* panic at out-of-memory */
>  	VM_VDSO_ENABLED=34,	/* map VDSO into new processes? */
>  	VM_MIN_SLAB=35,		 /* Percent pages ignored by zone reclaim */
> +	VM_BETTER_LOCALITY=36,	 /* create locality-preference zonelist */

This can go away.
