public inbox for linux-kernel@vger.kernel.org
* [PATCH 1/3] Zone reclaim V3: main patch
@ 2005-12-08 20:37 Christoph Lameter
  2005-12-08 20:37 ` [PATCH 2/3] Zone reclaim V3: Remove debris from old zone reclaim Christoph Lameter
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Christoph Lameter @ 2005-12-08 20:37 UTC (permalink / raw)
  To: akpm
  Cc: Christoph Hellwig, linux-ia64, steiner, linux-kernel, ak,
	Wu Fengguang, Christoph Lameter

Zone reclaim allows pages to be reclaimed from a zone when its number of free
pages falls below the watermark, even if other zones still have enough pages
available. Zone reclaim is of particular importance for NUMA machines: it can
be more beneficial to reclaim a page than to take the performance penalty
that comes with allocating a page from a remote zone.

The patch replaces Martin Hicks' zone reclaim function (which never
worked properly).

Zone reclaim is enabled if the maximum distance to another node is higher
than RECLAIM_DISTANCE, which may be defined by an arch. By default
RECLAIM_DISTANCE is 20 meaning the distance to another node in the
same component (enclosure or motherboard).
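
As a rough illustration of the rule above, here is a minimal userspace sketch.
The helper name reclaim_mode_for_node() and the flat distance array are
assumptions made for this example; in the patch the check is done inline in
build_zonelists() using node_distance().

```c
#include <stddef.h>

#define LOCAL_DISTANCE   10
#define REMOTE_DISTANCE  20
#define RECLAIM_DISTANCE 20	/* default from the patch; an arch may override it */

/*
 * Hypothetical helper (not part of the patch): returns 1 if any distance
 * in the given row is larger than RECLAIM_DISTANCE, which is the condition
 * under which the patch sets zone_reclaim_mode.
 */
static int reclaim_mode_for_node(const int *distances, size_t nr_nodes)
{
	size_t i;

	for (i = 0; i < nr_nodes; i++)
		if (distances[i] > RECLAIM_DISTANCE)
			return 1;
	return 0;
}
```

With the default threshold, a row of {10, 20} leaves zone reclaim off, while
a row of {10, 20, 40} turns it on.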

V2->V3:
- At Andi Kleen's suggestion: Use distance information to determine zone
  reclaim behavior instead of using an arch specific function.
- Do not compile zone_reclaim logic if this is not a NUMA system
- Limit number of unsuccessful reclaim attempts

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.15-rc4/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc4.orig/mm/page_alloc.c	2005-11-30 22:25:15.000000000 -0800
+++ linux-2.6.15-rc4/mm/page_alloc.c	2005-12-08 12:30:29.000000000 -0800
@@ -842,7 +842,9 @@ get_page_from_freelist(gfp_t gfp_mask, u
 				mark = (*z)->pages_high;
 			if (!zone_watermark_ok(*z, order, mark,
 				    classzone_idx, alloc_flags))
-				continue;
+				if (!zone_reclaim_mode ||
+			        	!zone_reclaim(*z, gfp_mask, order))
+						continue;
 		}
 
 		page = buffered_rmqueue(*z, order, gfp_mask);
@@ -1559,13 +1561,22 @@ static void __init build_zonelists(pg_da
 	prev_node = local_node;
 	nodes_clear(used_mask);
 	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
+		int distance = node_distance(local_node, node);
+		
+		/*
+		 * If another node is sufficiently far away then it is better
+		 * to reclaim pages in a zone before going off node.
+		 */
+		if (distance > RECLAIM_DISTANCE)
+			zone_reclaim_mode = 1;
+
 		/*
 		 * We don't want to pressure a particular node.
 		 * So adding penalty to the first node in same
 		 * distance group to make it round-robin.
 		 */
-		if (node_distance(local_node, node) !=
-				node_distance(local_node, prev_node))
+
+		if (distance != node_distance(local_node, prev_node))
 			node_load[node] += load;
 		prev_node = node;
 		load--;
Index: linux-2.6.15-rc4/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc4.orig/mm/vmscan.c	2005-11-30 22:25:15.000000000 -0800
+++ linux-2.6.15-rc4/mm/vmscan.c	2005-12-08 12:30:29.000000000 -0800
@@ -1354,6 +1354,14 @@ static int __init kswapd_init(void)
 
 module_init(kswapd_init)
 
+#ifdef CONFIG_NUMA
+/*
+ * Zone reclaim mode
+ *
+ * If non-zero call zone_reclaim when the number of free pages falls below
+ * the watermarks.
+ */
+int zone_reclaim_mode __read_mostly;
 
 /*
  * Try to free up some pages from this zone through reclaim.
@@ -1362,12 +1370,13 @@ int zone_reclaim(struct zone *zone, gfp_
 {
 	struct scan_control sc;
 	int nr_pages = 1 << order;
-	int total_reclaimed = 0;
+	struct task_struct *p = current;
+	struct reclaim_state reclaim_state;
 
-	/* The reclaim may sleep, so don't do it if sleep isn't allowed */
-	if (!(gfp_mask & __GFP_WAIT))
-		return 0;
-	if (zone->all_unreclaimable)
+	if (!(gfp_mask & __GFP_WAIT) ||
+	    zone->zone_pgdat->node_id != numa_node_id() ||
+	    zone->all_unreclaimable ||
+	    atomic_read(&zone->reclaim_in_progress) > 0)
 		return 0;
 
 	sc.gfp_mask = gfp_mask;
@@ -1376,25 +1385,22 @@ int zone_reclaim(struct zone *zone, gfp_
 	sc.nr_mapped = read_page_state(nr_mapped);
 	sc.nr_scanned = 0;
 	sc.nr_reclaimed = 0;
-	/* scan at the highest priority */
 	sc.priority = 0;
 	disable_swap_token();
 
-	if (nr_pages > SWAP_CLUSTER_MAX)
-		sc.swap_cluster_max = nr_pages;
-	else
-		sc.swap_cluster_max = SWAP_CLUSTER_MAX;
-
-	/* Don't reclaim the zone if there are other reclaimers active */
-	if (atomic_read(&zone->reclaim_in_progress) > 0)
-		goto out;
+	sc.swap_cluster_max = max(nr_pages, SWAP_CLUSTER_MAX);
 
+	cond_resched();
+	p->flags |= PF_MEMALLOC;
+	reclaim_state.reclaimed_slab = 0;
+	p->reclaim_state = &reclaim_state;
 	shrink_zone(zone, &sc);
-	total_reclaimed = sc.nr_reclaimed;
-
- out:
-	return total_reclaimed;
+	p->reclaim_state = NULL;
+	current->flags &= ~PF_MEMALLOC;
+	cond_resched();
+	return sc.nr_reclaimed >= (1 << order);
 }
+#endif
 
 asmlinkage long sys_set_zone_reclaim(unsigned int node, unsigned int zone,
 				     unsigned int state)
Index: linux-2.6.15-rc4/include/linux/swap.h
===================================================================
--- linux-2.6.15-rc4.orig/include/linux/swap.h	2005-11-30 22:25:15.000000000 -0800
+++ linux-2.6.15-rc4/include/linux/swap.h	2005-12-08 12:30:57.000000000 -0800
@@ -172,7 +172,17 @@ extern void swap_setup(void);
 
 /* linux/mm/vmscan.c */
 extern int try_to_free_pages(struct zone **, gfp_t);
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
+#else
+#define zone_reclaim_mode 0
+static inline int zone_reclaim(struct zone *z, gfp_t mask,
+				unsigned int order)
+{
+	return 0;
+}
+#endif
 extern int shrink_all_memory(int);
 extern int vm_swappiness;
 
Index: linux-2.6.15-rc4/include/linux/topology.h
===================================================================
--- linux-2.6.15-rc4.orig/include/linux/topology.h	2005-11-30 22:25:15.000000000 -0800
+++ linux-2.6.15-rc4/include/linux/topology.h	2005-12-08 12:30:29.000000000 -0800
@@ -56,6 +56,9 @@
 #define REMOTE_DISTANCE		20
 #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
 #endif
+#ifndef RECLAIM_DISTANCE
+#define RECLAIM_DISTANCE 20
+#endif
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS	(1)
 #endif
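
The change to get_page_from_freelist() can be pictured with a small userspace
model. Everything here (struct toy_zone, toy_zone_reclaim(), toy_pick_zone(),
and the simplified watermark test) is a stand-in invented for illustration
and not kernel code; it only mirrors the control flow of the hunk above.

```c
#include <stddef.h>

/* Simplified stand-in for struct zone; all field names are illustrative. */
struct toy_zone {
	int node;		/* node the zone belongs to */
	long free_pages;	/* pages currently free */
	long watermark;		/* zone is "low" when free_pages <= watermark */
	long reclaimable;	/* pages a reclaim pass could free */
};

/* Reclaim only on the local node; report success if nr_pages were freed. */
static int toy_zone_reclaim(struct toy_zone *z, int local_node, long nr_pages)
{
	if (z->node != local_node || z->reclaimable < nr_pages)
		return 0;
	z->reclaimable -= nr_pages;
	z->free_pages += nr_pages;
	return 1;
}

/* Walk the zonelist in order and return the index of the zone to use. */
static int toy_pick_zone(struct toy_zone *zones, size_t n, int local_node,
			 int reclaim_mode, long nr_pages)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct toy_zone *z = &zones[i];

		if (z->free_pages <= z->watermark) {
			/* Same shape as the patched test: skip the zone
			 * unless reclaim is enabled and succeeds. */
			if (!reclaim_mode ||
			    !toy_zone_reclaim(z, local_node, nr_pages))
				continue;
		}
		return (int)i;
	}
	return -1;	/* no usable zone */
}
```

In this model, with reclaim mode off a local zone below its watermark is
skipped in favour of the next zone in the list; with reclaim mode on the local
zone is still used as long as the reclaim pass frees enough pages.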


* [PATCH 2/3] Zone reclaim V3: Remove debris from old zone reclaim
  2005-12-08 20:37 [PATCH 1/3] Zone reclaim V3: main patch Christoph Lameter
@ 2005-12-08 20:37 ` Christoph Lameter
  2005-12-08 20:37 ` [PATCH 3/3] Zone reclaim V3: Frequency of failed reclaim attempts Christoph Lameter
  2005-12-08 21:08 ` [PATCH 1/3] Zone reclaim V3: main patch Andi Kleen
  2 siblings, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2005-12-08 20:37 UTC (permalink / raw)
  To: akpm
  Cc: Christoph Hellwig, linux-ia64, steiner, linux-kernel, ak,
	Wu Fengguang, Christoph Lameter

Remove debris of old zone reclaim

Removes the leftovers from prior attempts to implement Zone reclaim.

sys_set_zone_reclaim is not reachable in 2.6.14.

The reclaim_pages field in struct zone is only used by sys_set_zone_reclaim.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.15-rc4/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc4.orig/include/linux/mmzone.h	2005-11-30 22:25:15.000000000 -0800
+++ linux-2.6.15-rc4/include/linux/mmzone.h	2005-12-08 09:35:29.000000000 -0800
@@ -150,11 +150,6 @@ struct zone {
 	unsigned long		pages_scanned;	   /* since last reclaim */
 	int			all_unreclaimable; /* All pages pinned */
 
-	/*
-	 * Does the allocator try to reclaim pages from the zone as soon
-	 * as it fails a watermark_ok() in __alloc_pages?
-	 */
-	int			reclaim_pages;
 	/* A count of how many reclaimers are scanning this zone */
 	atomic_t		reclaim_in_progress;
 
Index: linux-2.6.15-rc4/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc4.orig/mm/vmscan.c	2005-12-08 09:23:59.000000000 -0800
+++ linux-2.6.15-rc4/mm/vmscan.c	2005-12-08 09:35:29.000000000 -0800
@@ -1402,33 +1402,3 @@ int zone_reclaim(struct zone *zone, gfp_
 }
 #endif
 
-asmlinkage long sys_set_zone_reclaim(unsigned int node, unsigned int zone,
-				     unsigned int state)
-{
-	struct zone *z;
-	int i;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EACCES;
-
-	if (node >= MAX_NUMNODES || !node_online(node))
-		return -EINVAL;
-
-	/* This will break if we ever add more zones */
-	if (!(zone & (1<<ZONE_DMA|1<<ZONE_NORMAL|1<<ZONE_HIGHMEM)))
-		return -EINVAL;
-
-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		if (!(zone & 1<<i))
-			continue;
-
-		z = &NODE_DATA(node)->node_zones[i];
-
-		if (state)
-			z->reclaim_pages = 1;
-		else
-			z->reclaim_pages = 0;
-	}
-
-	return 0;
-}


* [PATCH 3/3] Zone reclaim V3: Frequency of failed reclaim attempts
  2005-12-08 20:37 [PATCH 1/3] Zone reclaim V3: main patch Christoph Lameter
  2005-12-08 20:37 ` [PATCH 2/3] Zone reclaim V3: Remove debris from old zone reclaim Christoph Lameter
@ 2005-12-08 20:37 ` Christoph Lameter
  2005-12-08 20:52   ` Andi Kleen
  2005-12-08 21:08 ` [PATCH 1/3] Zone reclaim V3: main patch Andi Kleen
  2 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2005-12-08 20:37 UTC (permalink / raw)
  To: akpm
  Cc: Christoph Hellwig, linux-ia64, steiner, linux-kernel, ak,
	Wu Fengguang, Christoph Lameter

Reduce frequency of unsuccessful zone reclaim attempts

It is unlikely that zone reclaim is successful once it has failed. The
performance of the page allocator will sink significantly for off-node
allocation if every page allocation attempt first requires a zone reclaim
scan to establish that no local memory is available.

This patch limits the number of unsuccessful zone reclaim attempts to one
per tick by remembering the last time a zone reclaim failed on a zone.
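
The throttle can be sketched in userspace as follows. The struct and function
names are invented for this example, and the tick counter is passed in as a
plain argument rather than read from jiffies.

```c
/* Illustrative stand-in for the per-zone state added by this patch. */
struct throttled_zone {
	unsigned long last_unsuccessful_reclaim; /* tick of last failed attempt */
	unsigned long scans;			 /* expensive scans performed */
};

/*
 * now:         current tick (stand-in for the kernel's tick counter)
 * reclaimable: pages a scan would free at this moment
 *
 * Returns the number of pages reclaimed, or 0 if the scan failed or was
 * skipped because an attempt already failed during the same tick.
 */
static unsigned long throttled_reclaim(struct throttled_zone *z,
				       unsigned long now,
				       unsigned long reclaimable)
{
	if (z->last_unsuccessful_reclaim == now)
		return 0;		/* failed earlier in this tick: skip */

	z->scans++;			/* the costly part we want to limit */
	if (reclaimable == 0)
		z->last_unsuccessful_reclaim = now;
	return reclaimable;
}
```

A side effect visible in this sketch is that a second call in the same tick
is skipped even if pages have become reclaimable in the meantime; the next
tick scans again.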

Note that this approach may become unnecessary once we have per zone
statistics on the number of unmapped (==easily reclaimable) pages. I am
working on a statistics patch that may allow keeping track of unmapped pages
per zone. A check of that number may then allow an easy determination of
whether it makes sense to run zone reclaim.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.15-rc4/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc4.orig/mm/vmscan.c	2005-12-08 11:10:14.000000000 -0800
+++ linux-2.6.15-rc4/mm/vmscan.c	2005-12-08 12:04:38.000000000 -0800
@@ -1379,6 +1379,16 @@ int zone_reclaim(struct zone *zone, gfp_
 	    atomic_read(&zone->reclaim_in_progress) > 0)
 		return 0;
 
+	/*
+	 * If an unsuccessful zone reclaim occurred in this tick then we
+	 * already needed to go off before. Our local purity is already
+	 * tainted and its likely that the scan for easily reclaimable pages
+	 * will be a waste of time. Continue off node allocations for the
+	 * duration of this tick.
+	 */
+	if (zone->last_unsuccessful_zone_reclaim == get_jiffies_64())
+		return 0;
+
 	sc.gfp_mask = gfp_mask;
 	sc.may_writepage = 0;
 	sc.may_swap = 0;
@@ -1397,6 +1407,8 @@ int zone_reclaim(struct zone *zone, gfp_
 	shrink_zone(zone, &sc);
 	p->reclaim_state = NULL;
 	current->flags &= ~PF_MEMALLOC;
+	if (sc.nr_reclaimed == 0)
+		zone->last_unsuccessful_zone_reclaim = get_jiffies_64();
 	cond_resched();
 	return sc.nr_reclaimed >= (1 << order);
 }
Index: linux-2.6.15-rc4/include/linux/mmzone.h
===================================================================
--- linux-2.6.15-rc4.orig/include/linux/mmzone.h	2005-12-08 11:10:14.000000000 -0800
+++ linux-2.6.15-rc4/include/linux/mmzone.h	2005-12-08 12:00:43.000000000 -0800
@@ -153,6 +153,8 @@ struct zone {
 	/* A count of how many reclaimers are scanning this zone */
 	atomic_t		reclaim_in_progress;
 
+	unsigned long		last_unsuccessful_zone_reclaim;
+
 	/*
 	 * prev_priority holds the scanning priority for this zone.  It is
 	 * defined as the scanning priority at which we achieved our reclaim


* Re: [PATCH 3/3] Zone reclaim V3: Frequency of failed reclaim attempts
  2005-12-08 20:37 ` [PATCH 3/3] Zone reclaim V3: Frequency of failed reclaim attempts Christoph Lameter
@ 2005-12-08 20:52   ` Andi Kleen
  2005-12-08 21:08     ` Christoph Lameter
  2005-12-08 21:08     ` Christoph Lameter
  0 siblings, 2 replies; 15+ messages in thread
From: Andi Kleen @ 2005-12-08 20:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Christoph Hellwig, linux-ia64, steiner, linux-kernel, ak,
	Wu Fengguang

> +	if (zone->last_unsuccessful_zone_reclaim == get_jiffies_64())
> +		return 0;


and

>  
> +	unsigned long		last_unsuccessful_zone_reclaim;

For an unsigned long you don't need get_jiffies_64. On 32bit it would be
32bit anyway and on 64bit even normal jiffies is 64bit. So normal
jiffies would suffice.
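
A small sketch of the width mismatch, using fixed-width types so that it
behaves the same on any host. Here uint32_t stands in for a 32-bit unsigned
long and uint64_t for the value returned by get_jiffies_64(); the function
names are made up for the example.

```c
#include <stdint.h>

/*
 * Models storing a 64-bit tick value in a 32-bit field and later comparing
 * the field against a fresh 64-bit value, as the posted patch would do on a
 * 32-bit build.
 */
static int same_tick_truncated(uint64_t stored_at, uint64_t now)
{
	uint32_t field = (uint32_t)stored_at;	/* upper 32 bits are lost */

	return field == now;	/* field is widened back to 64 bits here */
}

/* Models using the native-width counter for both the store and the compare. */
static int same_tick_native(uint32_t stored_at, uint32_t now)
{
	return stored_at == now;
}
```

Once the 64-bit value no longer fits in 32 bits, same_tick_truncated() never
reports a match, even for the very same tick, whereas the native-width
comparison stays consistent.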

But I suspect it would be better to just merge the proper patch
with the full accounting instead of this kludge.

-Andi


* Re: [PATCH 3/3] Zone reclaim V3: Frequency of failed reclaim attempts
  2005-12-08 20:52   ` Andi Kleen
@ 2005-12-08 21:08     ` Christoph Lameter
  2005-12-08 21:08     ` Christoph Lameter
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2005-12-08 21:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: akpm, Christoph Hellwig, linux-ia64, steiner, linux-kernel,
	Wu Fengguang

On Thu, 8 Dec 2005, Andi Kleen wrote:

> For an unsigned long you don't need get_jiffies_64. On 32bit it would be
> 32bit anyway and on 64bit even normal jiffies is 64bit. So normal
> jiffies would suffice.

Patch follows.

> But I suspect it would be better to just merge the proper patch
> with the full accounting instead of this kludge.

I would also like to see the full accounting patch to fix this in the 
right way but on the other hand I would like to disentangle different 
patchsets as much as possible. The accounting patch may touch many 
critical code paths.


* Re: [PATCH 1/3] Zone reclaim V3: main patch
  2005-12-08 20:37 [PATCH 1/3] Zone reclaim V3: main patch Christoph Lameter
  2005-12-08 20:37 ` [PATCH 2/3] Zone reclaim V3: Remove debris from old zone reclaim Christoph Lameter
  2005-12-08 20:37 ` [PATCH 3/3] Zone reclaim V3: Frequency of failed reclaim attempts Christoph Lameter
@ 2005-12-08 21:08 ` Andi Kleen
  2005-12-08 21:23   ` Christoph Lameter
  2 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2005-12-08 21:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Christoph Hellwig, linux-ia64, steiner, linux-kernel, ak,
	Wu Fengguang

> Zone reclaim is enabled if the maximum distance to another node is higher
> than RECLAIM_DISTANCE, which may be defined by an arch. By default
> RECLAIM_DISTANCE is 20 meaning the distance to another node in the
> same component (enclosure or motherboard).

Sorry, I made a mistake here earlier. On checking the ACPI spec
again, it's valid to have distances < 20 (e.g. for a NUMA factor of 1.5
the distance would legally be 15).

So better just check > LOCAL_DISTANCE, not >= 20.
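
To make the difference between the two thresholds concrete, here is a small
sketch. It assumes, as described above, that a distance is roughly the NUMA
factor scaled so that local access is 10; the function names are invented for
the example.

```c
#define LOCAL_DISTANCE   10
#define RECLAIM_DISTANCE 20	/* default proposed in the patch */

/* Assumed mapping: distance = NUMA factor * 10, rounded to nearest integer. */
static int slit_distance(double numa_factor)
{
	return (int)(numa_factor * LOCAL_DISTANCE + 0.5);
}

/* Check as posted in the patch. */
static int reclaim_by_reclaim_distance(int distance)
{
	return distance > RECLAIM_DISTANCE;
}

/* Alternative check suggested in this reply. */
static int reclaim_by_local_distance(int distance)
{
	return distance > LOCAL_DISTANCE;
}
```

Under that assumption a NUMA factor of 1.5 gives a distance of 15, which
enables zone reclaim with the > LOCAL_DISTANCE check but not with the default
RECLAIM_DISTANCE of 20. A distance of exactly 20 does not enable it with the
posted check either, since the comparison is strict.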

Also a lot of Opteron BIOS get that wrong, but I'm adding some
sanity checking now so it should work in future.

-Andi


* Re: [PATCH 3/3] Zone reclaim V3: Frequency of failed reclaim attempts
  2005-12-08 20:52   ` Andi Kleen
  2005-12-08 21:08     ` Christoph Lameter
@ 2005-12-08 21:08     ` Christoph Lameter
  2005-12-08 21:10       ` Andi Kleen
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2005-12-08 21:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: akpm, Christoph Hellwig, linux-ia64, steiner, linux-kernel,
	Wu Fengguang

Patch:

Index: linux-2.6.15-rc4/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc4.orig/mm/vmscan.c	2005-12-08 12:31:56.000000000 -0800
+++ linux-2.6.15-rc4/mm/vmscan.c	2005-12-08 13:07:05.000000000 -0800
@@ -1386,7 +1386,7 @@ int zone_reclaim(struct zone *zone, gfp_
 	 * will be a waste of time. Continue off node allocations for the
 	 * duration of this tick.
 	 */
-	if (zone->last_unsuccessful_zone_reclaim == get_jiffies_64())
+	if (zone->last_unsuccessful_zone_reclaim == jiffies)
 		return 0;
 
 	sc.gfp_mask = gfp_mask;
@@ -1408,7 +1408,7 @@ int zone_reclaim(struct zone *zone, gfp_
 	p->reclaim_state = NULL;
 	current->flags &= ~PF_MEMALLOC;
 	if (sc.nr_reclaimed == 0)
-		zone->last_unsuccessful_zone_reclaim = get_jiffies_64();
+		zone->last_unsuccessful_zone_reclaim = jiffies;
 	cond_resched();
 	return sc.nr_reclaimed >= (1 << order);
 }


* Re: [PATCH 3/3] Zone reclaim V3: Frequency of failed reclaim attempts
  2005-12-08 21:08     ` Christoph Lameter
@ 2005-12-08 21:10       ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2005-12-08 21:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, akpm, Christoph Hellwig, linux-ia64, steiner,
	linux-kernel, Wu Fengguang

On Thu, Dec 08, 2005 at 01:08:50PM -0800, Christoph Lameter wrote:
> Patch:

Looks good thanks.

I hope this will help Opteron users a lot; they have always been
complaining about this too.

-Andi



* Re: [PATCH 1/3] Zone reclaim V3: main patch
  2005-12-08 21:08 ` [PATCH 1/3] Zone reclaim V3: main patch Andi Kleen
@ 2005-12-08 21:23   ` Christoph Lameter
  2005-12-08 22:51     ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2005-12-08 21:23 UTC (permalink / raw)
  To: Andi Kleen
  Cc: akpm, Christoph Hellwig, linux-ia64, steiner, linux-kernel,
	Wu Fengguang

On Thu, 8 Dec 2005, Andi Kleen wrote:

> Sorry, I made a mistake here earlier. On checking the ACPI spec
> again, it's valid to have distances < 20 (e.g. for a NUMA factor of 1.5
> the distance would legally be 15).

Saw that too.

> So better just check > LOCAL_DISTANCE, not >= 20.

For Altix 20 means that the other node is remote but in the same 
enclosure / motherboard. Latency is very low in these cases. I think in 
these small configurations it is better to go off node rather than using 
the reclaim logic.

Other small configurations may have the same issues.

RECLAIM_DISTANCE can be set per arch if the default is not okay.

> Also a lot of Opteron BIOS get that wrong, but I'm adding some
> sanity checking now so it should work in future.

Great!


* Re: [PATCH 1/3] Zone reclaim V3: main patch
  2005-12-08 21:23   ` Christoph Lameter
@ 2005-12-08 22:51     ` Andi Kleen
  2005-12-08 23:19       ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2005-12-08 22:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, akpm, Christoph Hellwig, linux-ia64, steiner,
	linux-kernel, Wu Fengguang, discuss

> For Altix 20 means that the other node is remote but in the same 
> enclosure / motherboard. Latency is very low in these cases. I think in 
> these small configurations it is better to go off node rather than using 
> the reclaim logic.

On Opterons the NUMA factors are usually < 2, more towards 1, but people
definitely note a difference between node and off node.
So I don't think that's a good heuristic. 

I would use > LOCAL_DISTANCE or perhaps if you really want
a new constant with value 12-15. 

> RECLAIM_DISTANCE can be set per arch if the default is not okay.

Well if anything it would be per system - perhaps need to make
it a boot option or somesuch later. 

-Andi


* Re: [PATCH 1/3] Zone reclaim V3: main patch
  2005-12-08 22:51     ` Andi Kleen
@ 2005-12-08 23:19       ` Christoph Lameter
  2005-12-08 23:28         ` [discuss] " Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2005-12-08 23:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: akpm, Christoph Hellwig, linux-ia64, steiner, linux-kernel,
	Wu Fengguang, discuss

On Thu, 8 Dec 2005, Andi Kleen wrote:

> I would use > LOCAL_DISTANCE or perhaps if you really want
> a new constant with value 12-15. 

One may define RECLAIM_DISTANCE to be 12 for x86_64 in topology.h
in order to get zone reclaim earlier for the opteron clusters. I would 
think though that large opteron clusters also have distances > 20.

My experience is that at 20 systems do not need zone reclaim yet.
 
> > RECLAIM_DISTANCE can be set per arch if the default is not okay.
> 
> Well if anything it would be per system - perhaps need to make
> it a boot option or somesuch later. 

The idea here was to avoid any manual configuration. The numa distances
must relate in some real way to performance (at least per arch) in order
for the automatic determination of zone reclaim to make sense. We could
have a boot time override but then RECLAIM_DISTANCE needs to be a
variable, not a macro.




* Re: [discuss] Re: [PATCH 1/3] Zone reclaim V3: main patch
  2005-12-08 23:19       ` Christoph Lameter
@ 2005-12-08 23:28         ` Andi Kleen
  2005-12-08 23:35           ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2005-12-08 23:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, akpm, Christoph Hellwig, linux-ia64, steiner,
	linux-kernel, Wu Fengguang, discuss

On Thu, Dec 08, 2005 at 03:19:36PM -0800, Christoph Lameter wrote:
> On Thu, 8 Dec 2005, Andi Kleen wrote:
> 
> > I would use > LOCAL_DISTANCE or perhaps if you really want
> > a new constant with value 12-15. 
> 
> One may define RECLAIM_DISTANCE to be 12 for x86_64 in topology.h
> in order to get zone reclaim earlier for the opteron clusters. I would 
> think though that large opteron clusters also have distances > 20.
> 
> My experience is that at 20 systems do not need zone reclaim yet.

I really cannot confirm your experience here.

>  
> > > RECLAIM_DISTANCE can be set per arch if the default is not okay.
> > 
> > Well if anything it would be per system - perhaps need to make
> > it a boot option or somesuch later. 
> 
> The idea here was to avoid any manual configuration. The numa distances 

Sure as a default this makes sense.

I'm just questioning your default values.

> must relate in some real way to performance (at least per arch) in order 
> for the automatic determination of zone reclaim to make sense. We could 
> have a boot time override but then RECLAIM_DISTANCE needs to be a 
> variable not a macro.

The macro can always be defined to a variable later, no problem.
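
A minimal sketch of that idea follows. The variable name reclaim_distance and
the helper wants_zone_reclaim() are hypothetical, and how the variable would
be set (for example from a boot option) is left out.

```c
/* Hypothetical tunable; the default matches the macro value in the patch. */
static int reclaim_distance = 20;

/*
 * Existing users keep writing RECLAIM_DISTANCE; the macro now expands to
 * the variable, so an "#ifndef RECLAIM_DISTANCE" guard still sees it as
 * defined.
 */
#define RECLAIM_DISTANCE reclaim_distance

/* Same comparison as in the patch, now evaluated at run time. */
static int wants_zone_reclaim(int distance)
{
	return distance > RECLAIM_DISTANCE;
}
```

With the default of 20 a distance of 15 does not trigger zone reclaim; after
setting reclaim_distance to 12 the same call does, without touching the code
that uses the macro.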

-Andi



* Re: [discuss] Re: [PATCH 1/3] Zone reclaim V3: main patch
  2005-12-08 23:28         ` [discuss] " Andi Kleen
@ 2005-12-08 23:35           ` Christoph Lameter
  2005-12-08 23:40             ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2005-12-08 23:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: akpm, Christoph Hellwig, linux-ia64, steiner, linux-kernel,
	Wu Fengguang, discuss

On Fri, 9 Dec 2005, Andi Kleen wrote:

> > My experience is that at 20 systems do not need zone reclaim yet.
> 
> I really cannot confirm your experience here.

Maybe the meaning of these numbers varies? I know that 10 is a local
access but the assumption in include/linux/topology.h that 20 is a remote
access is probably already a guess.

I know that our Altix machines seem to use 10 for a local and 20 for 
nonlocal but same box. The distances then increase from there.



* Re: [discuss] Re: [PATCH 1/3] Zone reclaim V3: main patch
  2005-12-08 23:35           ` Christoph Lameter
@ 2005-12-08 23:40             ` Andi Kleen
  2005-12-09  0:10               ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2005-12-08 23:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, akpm, Christoph Hellwig, linux-ia64, steiner,
	linux-kernel, Wu Fengguang, discuss

On Thu, Dec 08, 2005 at 03:35:05PM -0800, Christoph Lameter wrote:
> On Fri, 9 Dec 2005, Andi Kleen wrote:
> 
> > > My experience is that at 20 systems do not need zone reclaim yet.
> > 
> > I really cannot confirm your experience here.
> 
> Maybe the meaning of these numbers varies? I know that 10 is a local
> access but the assumption in include/linux/topology.h that 20 is a remote
> access is probably already a guess.

The spec seems to suggest it's roughly the NUMA factor scaled (so for 1.4
you would get 14). But I haven't actually seen an Opteron with a correct
SLIT yet so I don't know what they use ...

> I know that our Altix machines seem to use 10 for a local and 20 for 
> nonlocal but same box. The distances then increase from there.

Unless non local same box is 2 times as slow as the local I wouldn't
consider that correct.  (I would expect the Altix to do better than that) 
-Andi


* Re: [discuss] Re: [PATCH 1/3] Zone reclaim V3: main patch
  2005-12-08 23:40             ` Andi Kleen
@ 2005-12-09  0:10               ` Christoph Lameter
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2005-12-09  0:10 UTC (permalink / raw)
  To: steiner
  Cc: akpm, Christoph Hellwig, linux-ia64, Andi Kleen, linux-kernel,
	Wu Fengguang, discuss

On Fri, 9 Dec 2005, Andi Kleen wrote:

> Unless non local same box is 2 times as slow as the local I wouldn't
> consider that correct.  (I would expect the Altix to do better than that) 

Maybe Jack could give us a hint how these SLIT numbers relate to
reality?

