public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* lowmem_reserve (replaces protection)
@ 2004-10-25 17:01 Andrea Arcangeli
  2004-10-26  1:48 ` Rik van Riel
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-25 17:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Nick Piggin

This is a forward port to 2.6 CVS of the lowmem_reserve VM feature in
the 2.4 kernel.

	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/lowmem_reserve-1

Lack of this feature might explain out of memory related killing or
deadlocks hit on >2G boxes. so if anybody is having trouble with oom
conditions this is a patch to try.

this is only slightly tested but works for me so far.

This is the first of a series of oom related fix I'm going to do and
test within the next weeks to attempt to cure various oom regressions in
2.6 (deadlocks turned into crazy early oom kills and the like).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-25 17:01 lowmem_reserve (replaces protection) Andrea Arcangeli
@ 2004-10-26  1:48 ` Rik van Riel
  2004-10-26  1:58   ` Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Rik van Riel @ 2004-10-26  1:48 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linux-kernel, Nick Piggin

On Mon, 25 Oct 2004, Andrea Arcangeli wrote:

> This is a forward port to 2.6 CVS of the lowmem_reserve VM feature in
> the 2.4 kernel.
> 
> 	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/lowmem_reserve-1

-       unsigned long           protection[MAX_NR_ZONES];
+       unsigned long           lowmem_reserve[MAX_NR_ZONES];

The gratituous renaming of variable and function names makes
it hard to see what this patch actually changed.  Hard enough
that I'm not sure what the behavioural difference is supposed
to be.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-26  1:48 ` Rik van Riel
@ 2004-10-26  1:58   ` Andrea Arcangeli
  2004-10-26  3:48     ` Nick Piggin
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-26  1:58 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, Nick Piggin

On Mon, Oct 25, 2004 at 09:48:25PM -0400, Rik van Riel wrote:
> On Mon, 25 Oct 2004, Andrea Arcangeli wrote:
> 
> > This is a forward port to 2.6 CVS of the lowmem_reserve VM feature in
> > the 2.4 kernel.
> > 
> > 	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/lowmem_reserve-1
> 
> -       unsigned long           protection[MAX_NR_ZONES];
> +       unsigned long           lowmem_reserve[MAX_NR_ZONES];
> 
> The gratituous renaming of variable and function names makes
> it hard to see what this patch actually changed.  Hard enough
> that I'm not sure what the behavioural difference is supposed
> to be.

the behavioural difference is the API and the fact the feaure is now
enabled with sane values (the previous code was disabled by default and
it was unusable with that API). besides fixing the API the patch nukes
dozens of useless lines of code and a buffer overflow.  The sysctl
definitely needs renaming or it'd break the ABI with userspace, it's far
from a gratituous rename. since I was foroced to change the sysctl name
accordingly with the new 2.4 API, I thought renaming the variable that
is set by the sysctl was also required, otherwise the sysctl is called
lowmem_reserve and the variable is still called protection. Clearly it's
much cleaner if _both_ sysctl and variable are called lowmem_reserve.

I could have used protection2 to still use the "protection" name, but
lowmem_reserve (btw, the same name I used first in 2.4, before
protection ever existed in 2.6) looks nicer to me.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-26  1:58   ` Andrea Arcangeli
@ 2004-10-26  3:48     ` Nick Piggin
  2004-10-26  4:04       ` Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Piggin @ 2004-10-26  3:48 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, Andrew Morton, linux-kernel

Andrea Arcangeli wrote:
> On Mon, Oct 25, 2004 at 09:48:25PM -0400, Rik van Riel wrote:
> 
>>On Mon, 25 Oct 2004, Andrea Arcangeli wrote:
>>
>>
>>>This is a forward port to 2.6 CVS of the lowmem_reserve VM feature in
>>>the 2.4 kernel.
>>>
>>>	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/lowmem_reserve-1
>>
>>-       unsigned long           protection[MAX_NR_ZONES];
>>+       unsigned long           lowmem_reserve[MAX_NR_ZONES];
>>
>>The gratituous renaming of variable and function names makes
>>it hard to see what this patch actually changed.  Hard enough
>>that I'm not sure what the behavioural difference is supposed
>>to be.
> 
> 
> the behavioural difference is the API and the fact the feaure is now
> enabled with sane values (the previous code was disabled by default and
> it was unusable with that API). besides fixing the API the patch nukes
> dozens of useless lines of code and a buffer overflow.  The sysctl
> definitely needs renaming or it'd break the ABI with userspace, it's far
> from a gratituous rename. since I was foroced to change the sysctl name
> accordingly with the new 2.4 API, I thought renaming the variable that
> is set by the sysctl was also required, otherwise the sysctl is called
> lowmem_reserve and the variable is still called protection. Clearly it's
> much cleaner if _both_ sysctl and variable are called lowmem_reserve.
> 
> I could have used protection2 to still use the "protection" name, but
> lowmem_reserve (btw, the same name I used first in 2.4, before
> protection ever existed in 2.6) looks nicer to me.
> 

I'd say go with the name change. "protection" is fairly vague...
OTOH, lowmem_reserve doesn't quite carry the meaning that it is
_protecting_ lower zones from higher zone allocations... maybe
lowmem_protection? But I don't mind too much.

I see classzone_idx snuck in, can we leave that as alloc_type please?

Otherwise, looks great.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-26  3:48     ` Nick Piggin
@ 2004-10-26  4:04       ` Andrea Arcangeli
  2004-10-26  4:17         ` Nick Piggin
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-26  4:04 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Rik van Riel, Andrew Morton, linux-kernel

On Tue, Oct 26, 2004 at 01:48:02PM +1000, Nick Piggin wrote:
> I see classzone_idx snuck in, can we leave that as alloc_type please?

when I wrote that code in 2.4 it was called class_idx. Just to show it
was not an opaque type, in this 2.6 I called it classzone_idx but it's
the same as class_idx. If you feel classzone_idx is too long I'm sure
fine to rename to class_idx like plain 2.4.

The reason I renamed it is that alloc_type tells nothing to who's
reading the code. That value in the opaque "alloc_type" variable, is
really the classzone_idx that identify the classzone we have to allocate
memory from. Classzone 2 means "all ram is good", classzone 2 means
"zone-normal + zone-dma is good", classzone 0 means "zone-dma is good".

alloc_type means very very little to me, calling it with a meaningful
name made the code more readable for me. gfp_mask describes other alloc
types as well that are not less singificant than the classzone_idx, so I
don't see why we should go back to the opaque variable name when we can
have a more descriptive one.

I'm not going to nitpick further on this detail though, all I care about
is the asm generated and not breaking ABIs with userspace (i.e. at least
one forced rename to the sysctl).

thanks for the review!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-26  4:04       ` Andrea Arcangeli
@ 2004-10-26  4:17         ` Nick Piggin
  2004-10-27  0:25           ` Andrea Arcangeli
  2004-10-27  0:31           ` Rik van Riel
  0 siblings, 2 replies; 32+ messages in thread
From: Nick Piggin @ 2004-10-26  4:17 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, Andrew Morton, linux-kernel

Andrea Arcangeli wrote:
> On Tue, Oct 26, 2004 at 01:48:02PM +1000, Nick Piggin wrote:
> 
>>I see classzone_idx snuck in, can we leave that as alloc_type please?
> 
> 
> when I wrote that code in 2.4 it was called class_idx. Just to show it
> was not an opaque type, in this 2.6 I called it classzone_idx but it's
> the same as class_idx. If you feel classzone_idx is too long I'm sure
> fine to rename to class_idx like plain 2.4.
> 
> The reason I renamed it is that alloc_type tells nothing to who's
> reading the code. That value in the opaque "alloc_type" variable, is
> really the classzone_idx that identify the classzone we have to allocate
> memory from. Classzone 2 means "all ram is good", classzone 2 means
> "zone-normal + zone-dma is good", classzone 0 means "zone-dma is good".
> 

OK that makes sense... it isn't the length of the name, but the fact
that that naming convention hasn't proliferated thoughout the 2.6 tree;
maybe could you add a little comment along the lines of the above?
Thanks

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-26  4:17         ` Nick Piggin
@ 2004-10-27  0:25           ` Andrea Arcangeli
  2004-10-27  0:42             ` Andrew Morton
                               ` (2 more replies)
  2004-10-27  0:31           ` Rik van Riel
  1 sibling, 3 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  0:25 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Rik van Riel, Andrew Morton, linux-kernel

On Tue, Oct 26, 2004 at 02:17:33PM +1000, Nick Piggin wrote:
> OK that makes sense... it isn't the length of the name, but the fact
> that that naming convention hasn't proliferated thoughout the 2.6 tree;

I don't see any other equivalent teminology besides my "classzone" word
existing, if we standardize on "alloc_type" to only mean a classzone
then I'd be fine to giveup on my wording, I don't care to retain my
wording, but to me classzone sounds more self explanatory than
alloc_type (though I must be biased having invented that word).

> maybe could you add a little comment along the lines of the above?

sure. done here:

	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/lowmem_reserve-2

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-26  4:17         ` Nick Piggin
  2004-10-27  0:25           ` Andrea Arcangeli
@ 2004-10-27  0:31           ` Rik van Riel
  2004-10-27  0:54             ` Andrea Arcangeli
  1 sibling, 1 reply; 32+ messages in thread
From: Rik van Riel @ 2004-10-27  0:31 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrea Arcangeli, Andrew Morton, linux-kernel

On Tue, 26 Oct 2004, Nick Piggin wrote:

> OK that makes sense... it isn't the length of the name, but the fact
> that that naming convention hasn't proliferated thoughout the 2.6 tree;

Speaking about not proliferating...

One thing we need to make sure of is that the lower zone
protection stuff doesn't put the allocation threshold
higher than kswapd's freeing threshold.

Otherwise on a 1GB system, we'll end up cycling most of
userspace allocations through the 128MB highmem zone,
instead of falling back to the other zones.

Note that simply having a small lower zone protection isn't
enough, if we still cross the kswapd threshold.  I suspect
the lower zone protection should just be off by default, 
except if highmem is at least 5x (or 8x?) bigger than lowmem...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  0:25           ` Andrea Arcangeli
@ 2004-10-27  0:42             ` Andrew Morton
  2004-10-27  0:48               ` Andrea Arcangeli
  2004-10-27  2:06             ` Nick Piggin
  2004-10-28  0:26             ` Andrea Arcangeli
  2 siblings, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2004-10-27  0:42 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: nickpiggin, riel, linux-kernel

Andrea Arcangeli <andrea@novell.com> wrote:
>
> I don't see any other equivalent teminology besides my "classzone" word
> existing,

I'll confess that I've never understood what "classzone" _means_.  Is it "a
zone from amongst several classes" or what?

If it was "zone_class" then it might mean "a particular classification of
zones".  Maybe that's what you meant?

I think a lot of other mm hackers share my confusion, which is why
"classzone" has been trickling away.  But yeah, we haven't been replacing it
with anything very useful.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  0:42             ` Andrew Morton
@ 2004-10-27  0:48               ` Andrea Arcangeli
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  0:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, riel, linux-kernel

On Tue, Oct 26, 2004 at 05:42:37PM -0700, Andrew Morton wrote:
> Andrea Arcangeli <andrea@novell.com> wrote:
> >
> > I don't see any other equivalent teminology besides my "classzone" word
> > existing,
> 
> I'll confess that I've never understood what "classzone" _means_.  Is it "a
> zone from amongst several classes" or what?
> 
> If it was "zone_class" then it might mean "a particular classification of
> zones".  Maybe that's what you meant?
> 
> I think a lot of other mm hackers share my confusion, which is why
> "classzone" has been trickling away.  But yeah, we haven't been replacing it
> with anything very useful.

it's very easy to explain it now that you spontanously used that concept
in 2.6: classzone is the piece of ram represented by what you previously
called alloc_type in 2.6.9 mainline alloc_pages function. You wrote that
code so you know what it means. If you prefer to still call that
alloc_type that's fine with me.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  0:31           ` Rik van Riel
@ 2004-10-27  0:54             ` Andrea Arcangeli
  2004-10-27  0:56               ` Andrea Arcangeli
                                 ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  0:54 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Nick Piggin, Andrew Morton, linux-kernel

On Tue, Oct 26, 2004 at 08:31:32PM -0400, Rik van Riel wrote:
> On Tue, 26 Oct 2004, Nick Piggin wrote:
> 
> > OK that makes sense... it isn't the length of the name, but the fact
> > that that naming convention hasn't proliferated thoughout the 2.6 tree;
> 
> Speaking about not proliferating...
> 
> One thing we need to make sure of is that the lower zone
> protection stuff doesn't put the allocation threshold
> higher than kswapd's freeing threshold.

I agree. I didn't introduce that bug, the very same problem would happen
with the previous protection code. So this is not a regression, I'm far
from finished... I'm just trying to post orthogonal patches, since Hugh
had a much better merging success rate with small patches (though I find
very hard to produce small patches myself when there's more than one
thing to fix in the same file).

the per-classzone kswapd treshold was very well taken care of in 2.4,
thanks the watermarks embedding the low/min/high and the classzone being
passed up to the kswapd wakeup function.

> Otherwise on a 1GB system, we'll end up cycling most of
> userspace allocations through the 128MB highmem zone,
> instead of falling back to the other zones.

that's the side effect of the per-zone lru too (though I'm not going to
change the lru).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  0:54             ` Andrea Arcangeli
@ 2004-10-27  0:56               ` Andrea Arcangeli
  2004-10-27  1:35                 ` Andrea Arcangeli
  2004-10-27  1:00               ` Rik van Riel
  2004-10-27  2:05               ` Nick Piggin
  2 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  0:56 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Nick Piggin, Andrew Morton, linux-kernel

On Wed, Oct 27, 2004 at 02:54:25AM +0200, Andrea Arcangeli wrote:
> I agree. I didn't introduce that bug, the very same problem would happen

btw, the severity of such performance bug in the LRU screwup, is minor
compared to the severity of the functional deadlocks or early oom kills
potentially generated by the lack of lowmem_reserve enabled with proper
defaults but sure we need to fix the performance LRU issue too.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  0:54             ` Andrea Arcangeli
  2004-10-27  0:56               ` Andrea Arcangeli
@ 2004-10-27  1:00               ` Rik van Riel
  2004-10-27  1:10                 ` Andrea Arcangeli
  2004-10-27  2:05               ` Nick Piggin
  2 siblings, 1 reply; 32+ messages in thread
From: Rik van Riel @ 2004-10-27  1:00 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Nick Piggin, Andrew Morton, linux-kernel

On Wed, 27 Oct 2004, Andrea Arcangeli wrote:

> the per-classzone kswapd treshold was very well taken care of in 2.4,
> thanks the watermarks embedding the low/min/high and the classzone being
> passed up to the kswapd wakeup function.

And it works fine on non-numa x86.

However, doesn't the protection also serve a purpose for
NUMA fallback ?  How is that handled by your code ?

NUMA wasn't very important a few years ago, but with AMD64
systems being common, it is an important consideration now.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  1:00               ` Rik van Riel
@ 2004-10-27  1:10                 ` Andrea Arcangeli
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  1:10 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Nick Piggin, Andrew Morton, linux-kernel

On Tue, Oct 26, 2004 at 09:00:11PM -0400, Rik van Riel wrote:
> However, doesn't the protection also serve a purpose for
> NUMA fallback ?  How is that handled by your code ?

It doesn't.

The only reason any NUMA folk were interested in this stuff was to nuke
the incremental min weak band-aid that Linus introduced in 2.4 while
rejecting my real lowmem_reserve algorithm (that has been reinvented in
2.6.5 first and against 2.6.7 with the name of protection but still
buggy and disabled by default, so not very useful.., especially after
even the incremental-min was gone from 2.6.8+).

NUMA folks only want to nuke the incremental-min hack since that
escalated to huge waste of ram on the hundred nodes.

the major point of the rework in the protection algorithm happened
between 2.6.5 and 2.6.8 was to drop any NUMA effect from the algorithm
that guarantees DMA allocations on x86-64 and NORMAL allocations on x86
PAE. So being it unrelated to NUMA is a feature as far as I can tell.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  0:56               ` Andrea Arcangeli
@ 2004-10-27  1:35                 ` Andrea Arcangeli
  2004-10-27  2:08                   ` Andrew Morton
  2004-10-27  2:56                   ` Nick Piggin
  0 siblings, 2 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  1:35 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Nick Piggin, Andrew Morton, linux-kernel

this _incremental_ 2/? patch should fix the longtanding kswapd issue
vs protection algorithm, now lowmem_reserve (partly hidden by the lack
of lowmem_reserve/protection or equivalent band-aid enabled in 2.6.9).

--- 2-kswapd-balance/include/linux/mmzone.h.~1~	2004-10-27 03:17:07.207812600 +0200
+++ 2-kswapd-balance/include/linux/mmzone.h	2004-10-27 03:26:22.673369000 +0200
@@ -273,7 +273,7 @@ void __get_zone_counts(unsigned long *ac
 void get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free);
 void build_all_zonelists(void);
-void wakeup_kswapd(struct zone *zone);
+void wakeup_kswapd(struct zone *zone, int classzone_idx);
 
 /*
  * zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc.
--- 2-kswapd-balance/mm/page_alloc.c.~1~	2004-10-27 03:17:07.215811384 +0200
+++ 2-kswapd-balance/mm/page_alloc.c	2004-10-27 03:24:31.351292528 +0200
@@ -641,7 +641,7 @@ __alloc_pages(unsigned int gfp_mask, uns
 	}
 
 	for (i = 0; (z = zones[i]) != NULL; i++)
-		wakeup_kswapd(z);
+		wakeup_kswapd(z, classzone_idx);
 
 	/*
 	 * Go through the zonelist again. Let __GFP_HIGH and allocations
--- 2-kswapd-balance/mm/vmscan.c.~1~	2004-10-27 03:14:22.563842288 +0200
+++ 2-kswapd-balance/mm/vmscan.c	2004-10-27 03:26:57.462080312 +0200
@@ -1169,11 +1169,11 @@ static int kswapd(void *p)
 /*
  * A zone is low on free memory, so wake its kswapd task to service it.
  */
-void wakeup_kswapd(struct zone *zone)
+void wakeup_kswapd(struct zone *zone, int classzone_idx)
 {
 	if (zone->present_pages == 0)
 		return;
-	if (zone->free_pages > zone->pages_low)
+	if (zone->free_pages > zone->pages_low + zone->lowmem_reserve[classzone_idx])
 		return;
 	if (!waitqueue_active(&zone->zone_pgdat->kswapd_wait))
 		return;

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  0:54             ` Andrea Arcangeli
  2004-10-27  0:56               ` Andrea Arcangeli
  2004-10-27  1:00               ` Rik van Riel
@ 2004-10-27  2:05               ` Nick Piggin
  2004-10-27  2:29                 ` Andrea Arcangeli
  2 siblings, 1 reply; 32+ messages in thread
From: Nick Piggin @ 2004-10-27  2:05 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, Andrew Morton, linux-kernel

Andrea Arcangeli wrote:

> the per-classzone kswapd treshold was very well taken care of in 2.4,
> thanks the watermarks embedding the low/min/high and the classzone being
> passed up to the kswapd wakeup function.
> 

Kswapd actually should take care of this properly: see the
initial loop before the real scanning loop.

I thought this was a bit subtle, but it seems to work fine,
and Andrew likes it.

I have a patch to explicitly have kswapd use the lower zone
protection watermarks but I haven't really demonstrated it is
better than what is currently there (other than being simpler).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  0:25           ` Andrea Arcangeli
  2004-10-27  0:42             ` Andrew Morton
@ 2004-10-27  2:06             ` Nick Piggin
  2004-10-28  0:26             ` Andrea Arcangeli
  2 siblings, 0 replies; 32+ messages in thread
From: Nick Piggin @ 2004-10-27  2:06 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, Andrew Morton, linux-kernel

Andrea Arcangeli wrote:
> On Tue, Oct 26, 2004 at 02:17:33PM +1000, Nick Piggin wrote:

>>maybe could you add a little comment along the lines of the above?
> 
> 
> sure. done here:
> 
> 	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/lowmem_reserve-2
> 

Thanks Andrea.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  1:35                 ` Andrea Arcangeli
@ 2004-10-27  2:08                   ` Andrew Morton
  2004-10-27  2:31                     ` Andrea Arcangeli
  2004-10-27  2:56                   ` Nick Piggin
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2004-10-27  2:08 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: riel, nickpiggin, linux-kernel

Andrea Arcangeli <andrea@novell.com> wrote:
>
> this _incremental_ 2/? patch should fix the longtanding kswapd issue
>  vs protection algorithm

Could you please email the patch which this depends on?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  2:05               ` Nick Piggin
@ 2004-10-27  2:29                 ` Andrea Arcangeli
  2004-10-27  3:01                   ` Nick Piggin
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  2:29 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Rik van Riel, Andrew Morton, linux-kernel

On Wed, Oct 27, 2004 at 12:05:19PM +1000, Nick Piggin wrote:
> Andrea Arcangeli wrote:
> 
> >the per-classzone kswapd treshold was very well taken care of in 2.4,
> >thanks the watermarks embedding the low/min/high and the classzone being
> >passed up to the kswapd wakeup function.
> >
> 
> Kswapd actually should take care of this properly: see the
> initial loop before the real scanning loop.

I don't see how it can take care of it, if it doesn't even get a wakeup?
kswapd just sleeps.

kswapd is just an optimization, we try provide async allocation if the
kswapd watermarks can it keep up between low and high (and we hang
synchronosly while hitting the wall at min), so it's not fatal that
kswapd sleeps, but it can be fixed easily with the patch I just posted
some minute ago.

> I thought this was a bit subtle, but it seems to work fine,
> and Andrew likes it.
>
> I have a patch to explicitly have kswapd use the lower zone
> protection watermarks but I haven't really demonstrated it is
> better than what is currently there (other than being simpler).

how can it ever wakeup if nobody ever calls
wake_up_interruptible(&zone->zone_pgdat->kswapd_wait)?


Now the patch I posted fixes the wakeup side, but I'm not sure that the
"stop" is properly done yet. Peraphs that's what you mean, that the
kswapd-stop decision already works despite not being aware of the
low memory ram reservation?

Following the code I see nr_pages is 0 in balance_pgdat when invoked
from kswapd. nr_pages then goes here:

		if (nr_pages == 0) {
			/*
			 * Scan in the highmem->dma direction for the
			 * highest
			 * zone which needs scanning
			 */
			for (i = pgdat->nr_zones - 1; i >= 0; i--) {
				struct zone *zone = pgdat->node_zones + i;

				if (zone->all_unreclaimable &&
						priority != DEF_PRIORITY)
					continue;

				if (zone->free_pages <= zone->pages_high) {
					end_zone = i;
					goto scan;
				}
			}
			goto out;


shouldn't we take the full watermarks into account in the above too?
I think so, otherwise we wakeup but it goes back to sleep, i.e.
overscheduling but nothing done still, so I'm going to fixup the stop as
well.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  2:08                   ` Andrew Morton
@ 2004-10-27  2:31                     ` Andrea Arcangeli
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  2:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: riel, nickpiggin, linux-kernel

On Tue, Oct 26, 2004 at 07:08:56PM -0700, Andrew Morton wrote:
> Andrea Arcangeli <andrea@novell.com> wrote:
> >
> > this _incremental_ 2/? patch should fix the longtanding kswapd issue
> >  vs protection algorithm
> 
> Could you please email the patch which this depends on?

it's against this one:

http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/lowmem_reserve-2

(the patch only address the wakeup issue, the stop is still unsolved,
but it can be done in a separate patch)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  1:35                 ` Andrea Arcangeli
  2004-10-27  2:08                   ` Andrew Morton
@ 2004-10-27  2:56                   ` Nick Piggin
  1 sibling, 0 replies; 32+ messages in thread
From: Nick Piggin @ 2004-10-27  2:56 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, Andrew Morton, linux-kernel

Andrea Arcangeli wrote:
> this _incremental_ 2/? patch should fix the longtanding kswapd issue
> vs protection algorithm, now lowmem_reserve (partly hidden by the lack
> of lowmem_reserve/protection or equivalent band-aid enabled in 2.6.9).
> 
> --- 2-kswapd-balance/include/linux/mmzone.h.~1~	2004-10-27 03:17:07.207812600 +0200
> +++ 2-kswapd-balance/include/linux/mmzone.h	2004-10-27 03:26:22.673369000 +0200
> @@ -273,7 +273,7 @@ void __get_zone_counts(unsigned long *ac
>  void get_zone_counts(unsigned long *active, unsigned long *inactive,
>  			unsigned long *free);
>  void build_all_zonelists(void);
> -void wakeup_kswapd(struct zone *zone);
> +void wakeup_kswapd(struct zone *zone, int classzone_idx);
>  
>  /*
>   * zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc.
> --- 2-kswapd-balance/mm/page_alloc.c.~1~	2004-10-27 03:17:07.215811384 +0200
> +++ 2-kswapd-balance/mm/page_alloc.c	2004-10-27 03:24:31.351292528 +0200
> @@ -641,7 +641,7 @@ __alloc_pages(unsigned int gfp_mask, uns
>  	}
>  
>  	for (i = 0; (z = zones[i]) != NULL; i++)
> -		wakeup_kswapd(z);
> +		wakeup_kswapd(z, classzone_idx);
>  
>  	/*
>  	 * Go through the zonelist again. Let __GFP_HIGH and allocations
> --- 2-kswapd-balance/mm/vmscan.c.~1~	2004-10-27 03:14:22.563842288 +0200
> +++ 2-kswapd-balance/mm/vmscan.c	2004-10-27 03:26:57.462080312 +0200
> @@ -1169,11 +1169,11 @@ static int kswapd(void *p)
>  /*
>   * A zone is low on free memory, so wake its kswapd task to service it.
>   */
> -void wakeup_kswapd(struct zone *zone)
> +void wakeup_kswapd(struct zone *zone, int classzone_idx)
>  {
>  	if (zone->present_pages == 0)
>  		return;
> -	if (zone->free_pages > zone->pages_low)
> +	if (zone->free_pages > zone->pages_low + zone->lowmem_reserve[classzone_idx])
>  		return;
>  	if (!waitqueue_active(&zone->zone_pgdat->kswapd_wait))
>  		return;

I don't think this is required, because by the time __alloc_pages
reaches wakeup_kswapd, it would have checked one zone with a
->lowmem_reserve of 0, and found it to be low on pages. Thus
wakeup_kswapd will wake it up.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  2:29                 ` Andrea Arcangeli
@ 2004-10-27  3:01                   ` Nick Piggin
  2004-10-27  3:23                     ` Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Piggin @ 2004-10-27  3:01 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, Andrew Morton, linux-kernel

Andrea Arcangeli wrote:
> On Wed, Oct 27, 2004 at 12:05:19PM +1000, Nick Piggin wrote:
> 
>>Andrea Arcangeli wrote:
>>
>>
>>>the per-classzone kswapd treshold was very well taken care of in 2.4,
>>>thanks the watermarks embedding the low/min/high and the classzone being
>>>passed up to the kswapd wakeup function.
>>>
>>
>>Kswapd actually should take care of this properly: see the
>>initial loop before the real scanning loop.
> 
> 
> I don't see how it can take care of it, if it doesn't even get a wakeup?
> kswapd just sleeps.
> 
> kswapd is just an optimization, we try provide async allocation if the
> kswapd watermarks can it keep up between low and high (and we hang
> synchronosly while hitting the wall at min), so it's not fatal that
> kswapd sleeps, but it can be fixed easily with the patch I just posted
> some minute ago.
> 

See my reply.

> 
>>I thought this was a bit subtle, but it seems to work fine,
>>and Andrew likes it.
>>
>>I have a patch to explicitly have kswapd use the lower zone
>>protection watermarks but I haven't really demonstrated it is
>>better than what is currently there (other than being simpler).
> 

[snip]

> shouldn't we take the full watermarks into account in the above too?
> I think so, otherwise we wakeup but it goes back to sleep, i.e.
> overscheduling but nothing done still, so I'm going to fixup the stop as
> well.
> 

Please don't just yet. I've already got this patch to do it, but
I've got more kswapd work underneath that (which makes kswapd
work properly with higher order allocations).

Currently it does not overschedule, because one zone is always
going to be low by the time kswapd wakes up. This causes all zones
below to be scanned as well.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  3:01                   ` Nick Piggin
@ 2004-10-27  3:23                     ` Andrea Arcangeli
  2004-10-27  3:34                       ` Nick Piggin
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  3:23 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Rik van Riel, Andrew Morton, linux-kernel

On Wed, Oct 27, 2004 at 01:01:54PM +1000, Nick Piggin wrote:
> Currently it does not overschedule, because one zone is always
> going to be low by the time kswapd wakes up. This causes all zones
> below to be scanned as well.

that's quite subtle (after all it's needed only for the numa pgdats) and
I agree on the wakeup side, the one thing that is wrong instead is the
kswapd-stop side. On that side you really need to know every single zone
that has to be balanced.  So whatever, you can't just use pages_high
there. I'm creating a zone->max_lowmem_reserve to fix that efficiently
(that's recalculated every time with the sysctl and at boot).

However my patch to wakeup_kswapd sure wouldn't hurt there.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  3:23                     ` Andrea Arcangeli
@ 2004-10-27  3:34                       ` Nick Piggin
  2004-10-27  3:43                         ` Andrew Morton
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Piggin @ 2004-10-27  3:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, Andrew Morton, linux-kernel

Andrea Arcangeli wrote:
> On Wed, Oct 27, 2004 at 01:01:54PM +1000, Nick Piggin wrote:
> 
>>Currently it does not overschedule, because one zone is always
>>going to be low by the time kswapd wakes up. This causes all zones
>>below to be scanned as well.
> 
> 
> that's quite subtle (after all it's needed only for the numa pgdats) and
> I agree on the wakeup side, the one thing that is wrong instead is the
> kswapd-stop side. On that side you really need to know every single zone
> that has to be balanced.  So whatever, you can't just use pages_high
> there. I'm creating a zone->max_lowmem_reserve to fix that efficiently
> (that's recalculated every time with the sysctl and at boot).
> 
> However my patch to wakeup_kswapd sure wouldn't hurt there.
> 

You do though... it is right as is (sort of).

It actually can overscan lower zones a little bit, because
whenever any higher zone in the pgdat is low on memory, then
it and all zones below it get scanned too.

So if HIGHMEM is low on memory, NORMAL and DMA get scanned
as well regardless. If NORMAL is low, then DMA gets scanned
as well.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  3:34                       ` Nick Piggin
@ 2004-10-27  3:43                         ` Andrew Morton
  2004-10-27  4:44                           ` Andrea Arcangeli
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2004-10-27  3:43 UTC (permalink / raw)
  To: Nick Piggin; +Cc: andrea, riel, linux-kernel

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> It actually can overscan lower zones a little bit, because
>  whenever any higher zone in the pgdat is low on memory, then
>  it and all zones below it get scanned too.

Because we know that all of the eligible zones are below pages_low.  kswapd
will then work to bring all the relevant zones back to pages_high.

When working on this code it is very very easy to break the zone levelling:
you *have* to run a workload mix and monitor the numbers in /proc/vmstat to
ensure that all zones are undergoing page scanning at frequencies which are
proportional to their sizes.  It's easy to screw up the zone levelling so
all allocations end up coming from ZONE_NORMAL and pagecache pages in, say,
ZONE_DMA end up just sitting there.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  3:43                         ` Andrew Morton
@ 2004-10-27  4:44                           ` Andrea Arcangeli
  2004-10-27  4:51                             ` Rik van Riel
  2004-10-27  5:33                             ` Andrew Morton
  0 siblings, 2 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  4:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, riel, linux-kernel

On Tue, Oct 26, 2004 at 08:43:08PM -0700, Andrew Morton wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >
> > It actually can overscan lower zones a little bit, because
> >  whenever any higher zone in the pgdat is low on memory, then
> >  it and all zones below it get scanned too.
> 
> Because we know that all of the eligible zones are below pages_low.  kswapd
> will then work to bring all the relevant zones back to pages_high.

entering the loop is easy and shrinking all zones is easy.

is how much to free from each zone in kswapd that won't work as 2.4 did
with only pages_high.

what we'll happen is that we'll blindly free a few pages from each zone,
but then we'll be allowed to allocate the highmem pages, and not the
normal/dma pages. So after allocating the highmem pages we invoke kswapd
again and it frees again some highmem/normal/dma pages but we keep only
using the highmem ones.  So for a while we may be rolling over only the
highmem lru and ignoring all freed pages from the normal/dma zones.

2.4 would have freed all pages from the lower zones first (without
this additional rolling over the highmem zone). Though it's a bit hard
to compare since 2.4 had a global lru.

The bigger the highmem zone, the less it will matter the stuff that can
be cached in the low zones. And the smaller the highemm zone the less
imbalancing will be generated since the reservation will be minuscle on
the lower zones. So perhaps it already works fine, but this is not 2.4
was working (I'm sure to make it work like 2.4 you'd need to add the
lowmem_reserve in the when-to-stop-kswapd math). But maybe it works even
better this way?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  4:44                           ` Andrea Arcangeli
@ 2004-10-27  4:51                             ` Rik van Riel
  2004-10-27  5:05                               ` Andrea Arcangeli
  2004-10-27  5:33                             ` Andrew Morton
  1 sibling, 1 reply; 32+ messages in thread
From: Rik van Riel @ 2004-10-27  4:51 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, Nick Piggin, linux-kernel

On Wed, 27 Oct 2004, Andrea Arcangeli wrote:

> what we'll happen is that we'll blindly free a few pages from each zone,
> but then we'll be allowed to allocate the highmem pages, and not the
> normal/dma pages. So after allocating the highmem pages we invoke kswapd
> again and it frees again some highmem/normal/dma pages but we keep only
> using the highmem ones.  So for a while we may be rolling over only the
> highmem lru and ignoring all freed pages from the normal/dma zones.

This is not how the page allocator in 2.6 works.

It will not wake up kswapd until it is past the low
watermark in all zones.


-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  4:51                             ` Rik van Riel
@ 2004-10-27  5:05                               ` Andrea Arcangeli
  2004-10-27  5:50                                 ` Nick Piggin
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  5:05 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Morton, Nick Piggin, linux-kernel

On Wed, Oct 27, 2004 at 12:51:11AM -0400, Rik van Riel wrote:
> On Wed, 27 Oct 2004, Andrea Arcangeli wrote:
> 
> > what we'll happen is that we'll blindly free a few pages from each zone,
> > but then we'll be allowed to allocate the highmem pages, and not the
> > normal/dma pages. So after allocating the highmem pages we invoke kswapd
> > again and it frees again some highmem/normal/dma pages but we keep only
> > using the highmem ones.  So for a while we may be rolling over only the
> > highmem lru and ignoring all freed pages from the normal/dma zones.
> 
> This is not how the page allocator in 2.6 works.
> 
> It will not wake up kswapd until it is past the low
> watermark in all zones.

how else the page allocator should work if not waiting all zones to hit
the low+reserve watermarks? the page allocator works the same in 2.4
and 2.6, so I don't understand your point.

what changes is kswapd stop algorithm and the fact the lru is per-zone,
but I think I described correctly what happens in 2.6 kswapd-stop.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  4:44                           ` Andrea Arcangeli
  2004-10-27  4:51                             ` Rik van Riel
@ 2004-10-27  5:33                             ` Andrew Morton
  2004-10-27  6:11                               ` Andrea Arcangeli
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2004-10-27  5:33 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: nickpiggin, riel, linux-kernel

Andrea Arcangeli <andrea@novell.com> wrote:
>
> So after allocating the highmem pages we invoke kswapd
>  again

Nope.  We don't wake kswapd until the lower zones are below pages_low as
well.  Then we rebalance all the zones which are below pages_high.

As I say: run a workload mix and monitor /proc/vmstat:

	pgscan_kswapd_high
	pgscan_kswapd_normal
	pgscan_kswapd_dma

These should be increasing at rates which are proportional to their size,
if most allocations have __GFP_HIGHMEM.

The lower-zone protection thing means that the scanning rate really should
be proportional to (zone size - (protection from __GFP_HIGHMEM
allocations)), but it's not too far off at present.

The balancing of the zone scanning rates is subtle and easy to break.  As is
the balancing of that against slab scanning.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  5:05                               ` Andrea Arcangeli
@ 2004-10-27  5:50                                 ` Nick Piggin
  0 siblings, 0 replies; 32+ messages in thread
From: Nick Piggin @ 2004-10-27  5:50 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, Andrew Morton, linux-kernel

Andrea Arcangeli wrote:
> On Wed, Oct 27, 2004 at 12:51:11AM -0400, Rik van Riel wrote:
> 
>>On Wed, 27 Oct 2004, Andrea Arcangeli wrote:
>>
>>
>>>what we'll happen is that we'll blindly free a few pages from each zone,
>>>but then we'll be allowed to allocate the highmem pages, and not the
>>>normal/dma pages. So after allocating the highmem pages we invoke kswapd
>>>again and it frees again some highmem/normal/dma pages but we keep only
>>>using the highmem ones.  So for a while we may be rolling over only the
>>>highmem lru and ignoring all freed pages from the normal/dma zones.
>>

That's what I mean when I say it can overscan the lower zones.
You don't want to change the stop condition here though because
obviously the highmem zone still has work to do - you just want
to stop scanning the lower zones.

I don't think it is a big problem though.

I do have a patch to make it use the lower zone protection watermarks
explicitly, but it is sitting on top of other stuff which I'm going
to try to get merged first...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  5:33                             ` Andrew Morton
@ 2004-10-27  6:11                               ` Andrea Arcangeli
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-27  6:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, riel, linux-kernel

On Tue, Oct 26, 2004 at 10:33:35PM -0700, Andrew Morton wrote:
> Andrea Arcangeli <andrea@novell.com> wrote:
> >
> > So after allocating the highmem pages we invoke kswapd
> >  again
> 
> Nope.  We don't wake kswapd until the lower zones are below pages_low as
> well.  Then we rebalance all the zones which are below pages_high.

the lower zones will always be below pages_low+lowmem_reserve forever
since the reservation is 800M of ram.

The allocator isn't using pages_low, the allocator is always using
pages_low+lowmem_reserve. This is why only highmem will be allocated and
then immediatly kswapd will be invoked again.

the lower zones are above pages_high all the time, but they're also
below pages_low+lowmem_reserve all the time.

I'm not talking about 2.6.9 with protection disabled, I'm talking about
2.6.9+lowmem_reserve enabled.

So there will definitely be a rolling on the highmem, but it's probably
ok see below:

> As I say: run a workload mix and monitor /proc/vmstat:
> 
> 	pgscan_kswapd_high
> 	pgscan_kswapd_normal
> 	pgscan_kswapd_dma
> 
> These should be increasing at rates which are proportional to their size,
> if most allocations have __GFP_HIGHMEM.

they should yes. And this is also why it shouldn't be a problem this
behaviour of kswapd despite the rolling (after all we're shrinking cache
from the lower zones too, except the highmem cache will be rolled).

I'll try to measure the above soon (and I already applied it to all
relevant trees so it'll get some beating, since the aging details are
somewhat less severe than potential deadlocks or oom kills with lots of
ram and ptes, espcially now without the incremental min we have to
enable it).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: lowmem_reserve (replaces protection)
  2004-10-27  0:25           ` Andrea Arcangeli
  2004-10-27  0:42             ` Andrew Morton
  2004-10-27  2:06             ` Nick Piggin
@ 2004-10-28  0:26             ` Andrea Arcangeli
  2 siblings, 0 replies; 32+ messages in thread
From: Andrea Arcangeli @ 2004-10-28  0:26 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Rik van Riel, Andrew Morton, linux-kernel

On Wed, Oct 27, 2004 at 02:25:37AM +0200, Andrea Arcangeli wrote:
> On Tue, Oct 26, 2004 at 02:17:33PM +1000, Nick Piggin wrote:
> > OK that makes sense... it isn't the length of the name, but the fact
> > that that naming convention hasn't proliferated thoughout the 2.6 tree;
> 
> I don't see any other equivalent teminology besides my "classzone" word
> existing, if we standardize on "alloc_type" to only mean a classzone
> then I'd be fine to giveup on my wording, I don't care to retain my
> wording, but to me classzone sounds more self explanatory than
> alloc_type (though I must be biased having invented that word).
> 
> > maybe could you add a little comment along the lines of the above?
> 
> sure. done here:
> 
> 	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/lowmem_reserve-2

sorry there was a small buglet, when testing I must have tweaked the
sysctl before sysrq+m so I overlooked it until now when I was playing
with a new feature (the other version of the patch against 2.6.5 that I
tested more closely didn't miss the below)

So it was disabled by default *still* ;)

--- 1-lowmem_reserve/mm/page_alloc.c.~1~	2004-10-28 01:56:50.000000000 +0200
+++ 1-lowmem_reserve/mm/page_alloc.c	2004-10-28 02:18:39.160567304 +0200
@@ -1916,6 +1916,7 @@ static int __init init_per_zone_pages_mi
 	if (min_free_kbytes > 16384)
 		min_free_kbytes = 16384;
 	setup_per_zone_pages_min();
+	setup_per_zone_lowmem_reserve();
 	return 0;
 }
 module_init(init_per_zone_pages_min)


full update ready to be merged is here:

	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/lowmem_reserve-3

thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2004-10-28  0:32 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-10-25 17:01 lowmem_reserve (replaces protection) Andrea Arcangeli
2004-10-26  1:48 ` Rik van Riel
2004-10-26  1:58   ` Andrea Arcangeli
2004-10-26  3:48     ` Nick Piggin
2004-10-26  4:04       ` Andrea Arcangeli
2004-10-26  4:17         ` Nick Piggin
2004-10-27  0:25           ` Andrea Arcangeli
2004-10-27  0:42             ` Andrew Morton
2004-10-27  0:48               ` Andrea Arcangeli
2004-10-27  2:06             ` Nick Piggin
2004-10-28  0:26             ` Andrea Arcangeli
2004-10-27  0:31           ` Rik van Riel
2004-10-27  0:54             ` Andrea Arcangeli
2004-10-27  0:56               ` Andrea Arcangeli
2004-10-27  1:35                 ` Andrea Arcangeli
2004-10-27  2:08                   ` Andrew Morton
2004-10-27  2:31                     ` Andrea Arcangeli
2004-10-27  2:56                   ` Nick Piggin
2004-10-27  1:00               ` Rik van Riel
2004-10-27  1:10                 ` Andrea Arcangeli
2004-10-27  2:05               ` Nick Piggin
2004-10-27  2:29                 ` Andrea Arcangeli
2004-10-27  3:01                   ` Nick Piggin
2004-10-27  3:23                     ` Andrea Arcangeli
2004-10-27  3:34                       ` Nick Piggin
2004-10-27  3:43                         ` Andrew Morton
2004-10-27  4:44                           ` Andrea Arcangeli
2004-10-27  4:51                             ` Rik van Riel
2004-10-27  5:05                               ` Andrea Arcangeli
2004-10-27  5:50                                 ` Nick Piggin
2004-10-27  5:33                             ` Andrew Morton
2004-10-27  6:11                               ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox