All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: Andrew Morton <akpm@osdl.org>,
	Linux Memory Management <linux-mm@kvack.org>,
	Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH 2/3] higher order watermarks
Date: Thu, 04 Nov 2004 23:20:54 +1100	[thread overview]
Message-ID: <418A1EA6.70500@yahoo.com.au> (raw)
In-Reply-To: <20041104085745.GA7186@logos.cnet>

Marcelo Tosatti wrote:
> On Wed, Oct 27, 2004 at 06:02:12PM +1000, Nick Piggin wrote:
> 
>>2/3
> 
> 
>>
>>Move the watermark checking code into a single function. Extend it to account
>>for the order of the allocation and the number of free pages that could satisfy
>>such a request.
>>
>>Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
> 
> 
> Hi Nick,
> 
> I have a few comments and doubts.
> 

Hi Marcelo,
Thanks for the comments and review. It is always very helpful to
have more eyes on this area of code especially. Let's see...

> 
>> linux-2.6-npiggin/include/linux/mmzone.h |    2 +
>> linux-2.6-npiggin/mm/page_alloc.c        |   58 ++++++++++++++++++++-----------
>> 2 files changed, 41 insertions(+), 19 deletions(-)
>>
>>diff -puN mm/page_alloc.c~vm-alloc-order-watermarks mm/page_alloc.c
>>--- linux-2.6/mm/page_alloc.c~vm-alloc-order-watermarks	2004-10-27 16:41:32.000000000 +1000
>>+++ linux-2.6-npiggin/mm/page_alloc.c	2004-10-27 17:53:33.000000000 +1000
>>@@ -586,6 +586,37 @@ buffered_rmqueue(struct zone *zone, int 
>> }
>> 
>> /*
>>+ * Return 1 if free pages are above 'mark'. This takes into account the order
>>+ * of the allocation.
>>+ */
>>+int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
>>+		int alloc_type, int can_try_harder, int gfp_high)
>>+{
>>+	/* free_pages my go negative - that's OK */
>>+	long min = mark, free_pages = z->free_pages - (1 << order) + 1;
>>+	int o;
>>+
>>+	if (gfp_high)
>>+		min -= min / 2;
>>+	if (can_try_harder)
>>+		min -= min / 4;
>>+
>>+	if (free_pages <= min + z->protection[alloc_type])
>>+		return 0;
>>+	for (o = 0; o < order; o++) {
>>+		/* At the next order, this order's pages become unavailable */
>>+		free_pages -= z->free_area[order].nr_free << o;
>>+
>>+		/* Require fewer higher order pages to be free */
>>+		min >>= 1;
> 
> 
> I can't understand this. You decrease from free_pages 
> nr_order_free_pages << o, in a loop, and divide min by two.
> 
> What is the meaning of "nr_free_pages[order] << o" ? Its only meaningful
> when o == order?
> 
> You're multiplying the number of free pages of the order the allocation
> wants by "0, 1..order". The two values have different meanings, until 
> o == order.
> 
> In the first iteration of the loop, order is 0, so you decrease from free_pages 
> "z->free_area[order].nr_free". Again, the two values mean different things.
> 
> Can you enlight me?
> 
> I see you're trying to have some kind of extra protection, but the calculation 
> is difficult to understand for me.
> 

OK, we store the number of "order-pages" free for each order, so for
example, 16K worth of order-2 pages (on a 4K page architecture) will
count towards just 1 nr_free.

So now what we need to do in order to calculate, say the amount of memory
that will satisfy order-2 *and above* (this is important) is the following:

	z->free_pages - (order[0].nr_free << 0) - (order[1].nr_free << 1)

to find order-3 and above, you also need to subtract (order[2].nr_free << 2).

I quite liked this method because it has progressively less cost on lower
order allocations, and for order-0 we don't need to do any calculation.

Of course it is slightly racy, which is why I say free_pages can go negative,
but that should be OK.

Probably the comment there is woefully inadequate? - I sometimes forget that
people can't read my mind :\

> 
>>+
>>+		if (free_pages <= min)
>>+			return 0;
>>+	}
>>+	return 1;
>>+}
>>+
>>+/*
>>  * This is the 'heart' of the zoned buddy allocator.
>>  *
>>  * Herein lies the mysterious "incremental min".  That's the
>>@@ -606,7 +637,6 @@ __alloc_pages(unsigned int gfp_mask, uns
>> 		struct zonelist *zonelist)
>> {
>> 	const int wait = gfp_mask & __GFP_WAIT;
>>-	unsigned long min;
>> 	struct zone **zones, *z;
>> 	struct page *page;
>> 	struct reclaim_state reclaim_state;
>>@@ -636,9 +666,9 @@ __alloc_pages(unsigned int gfp_mask, uns
>> 
>> 	/* Go through the zonelist once, looking for a zone with enough free */
>> 	for (i = 0; (z = zones[i]) != NULL; i++) {
>>-		min = z->pages_low + (1<<order) + z->protection[alloc_type];
>> 
>>-		if (z->free_pages < min)
>>+		if (!zone_watermark_ok(z, order, z->pages_low,
>>+				alloc_type, 0, 0))
> 
> 
> 
> The original code didnt had the can_try_harder/gfp_high decrease 
> which is now on zone_watermark_ok. 
> 
> Means that those allocations will now be successful earlier, instead
> of going to the next zonelist iteration. kswapd will not be awake
> when it used to be.
> 
> Hopefully it doesnt matter that much. You did this by intention?
> 

That should be OK: the last two zero arguments mean that doesn't
get evaluated; so it should work as you'd expect I think?

Thanks,
Nick
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  reply	other threads:[~2004-11-04 12:20 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-10-27  8:00 [PATCH 0/3] teach kswapd about higher order allocations Nick Piggin
2004-10-27  8:00 ` [PATCH 1/3] keep count of free areas Nick Piggin
2004-10-27  8:02   ` [PATCH 2/3] higher order watermarks Nick Piggin
2004-10-27  8:02     ` [PATCH 3/3] teach kswapd about higher order areas Nick Piggin
2004-10-27  8:13       ` Nick Piggin
2004-11-04  8:57     ` [PATCH 2/3] higher order watermarks Marcelo Tosatti
2004-11-04 12:20       ` Nick Piggin [this message]
2004-11-04  9:55         ` Marcelo Tosatti
2004-11-05  1:06           ` Nick Piggin
2004-11-04 22:47             ` Marcelo Tosatti
2004-11-05  2:08               ` Andrew Morton
2004-11-05  2:14               ` Nick Piggin
2004-11-04 10:02         ` Marcelo Tosatti
2004-11-05  1:12           ` Nick Piggin
2004-11-10 16:23       ` Marcelo Tosatti
2004-11-11  1:41         ` Nick Piggin
2004-11-11 10:18           ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=418A1EA6.70500@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@osdl.org \
    --cc=linux-mm@kvack.org \
    --cc=marcelo.tosatti@cyclades.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.