From: Andy Whitcroft <apw@shadowen.org>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Mel Gorman <mel@skynet.ie>,
Nicolas Mailhot <nicolas.mailhot@laposte.net>,
Christoph Lameter <clameter@sgi.com>,
akpm@linux-foundation.org,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
Date: Thu, 17 May 2007 13:22:09 +0100
Message-ID: <464C48F1.3060903@shadowen.org>
In-Reply-To: <464BFF9D.809@yahoo.com.au>
Nick Piggin wrote:
> Mel Gorman wrote:
>> On (17/05/07 01:44), Nick Piggin didst pronounce:
>
>>>> If the watermark was totally ignored with the second patch, I would
>>>> understand but they are still obeyed. Even if it is an ALLOC_HIGH or
>>>> ALLOC_HARDER allocation, the watermarks are obeyed for order-0 so memory
>>>> does not get exhausted as that could cause a host of problems. The
>>>> difference is if this is a HIGH or HARDER allocation and the memory can be
>>>> granted without going below the order-0 watermarks, it'll succeed. Would it
>>>> be better if the lack of ALLOC_CPUSET was used to determine when only
>>>> order-0 watermarks should be obeyed?
>>>
>>> But I don't know why you want to disobey higher order watermarks in the
>>> first place.
>>
>>
>> Because the original problem was bio_alloc() allocations failing and the OOM
>> log showed that the higher-order pages were available. Patch 2 addressed it
>> by succeeding these allocations if the min watermark was not breached with
>> the knowledge that kswapd was awake and reclaiming at the relevant order.
>> I think it may even have solved it without the kswapd change but the kswapd
>> change seemed sensible.
>
> But that just breaks the watermarks.
>
> It could be that the actual values of the watermarks as they are now are
> not very good ones, which is where the problem is coming from.
>
>
>>> *Those* are exactly the things that are going to be helpful
>>> to fix this problem of atomic higher order allocations failing or non
>>> atomic ones going into direct reclaim.
>>>
>>
>>
>> And the intention was that non-atomic ones would go into direct reclaim
>> after kicking kswapd but the atomic allocations would at least succeed if
>> the memory was there as long as they don't totally mess up watermarks.
>
> But we have 3 levels of watermarks, so you can keep a reserve for atomic
> allocations _and_ a buffer between the reclaim watermark and the direct
> reclaim watermark.
>
>
>>>> Raising watermarks is no guarantee that a high-order allocation that can
>>>> sleep will occur at the right time to kick kswapd awake and that it'll get
>>>> back from whatever it's doing in time to spot the new order and start
>>>> reclaiming again.
>>>
>>> You don't *need* a higher order allocation that can sleep in order
>>> to kick kswapd. Crikey, I keep saying this.
>>>
>>
>>
>> Indeed, we seem to have got stuck in a loop of sorts.
>>
>> I understand that kswapd gets kicked awake either way but there must be a
>> timing issue. Let's say we had a situation like
>>
>> order-0 alloc
>> watermark hit => wake kswapd
>> order-0 alloc                            kswapd reclaiming order 0
>> order-0 alloc                            kswapd reclaiming order 0
>> order-3 alloc => kick kswapd for order 3
>> order-0 alloc                            kswapd reclaiming order 0
>> order-3 alloc                            kswapd reclaiming order 0
>> order-3 alloc                            kswapd reclaiming order 0
>> order-3 alloc => highorder mark hit, fail
>>
>> kswapd will keep reclaiming at order-0 until it completes a reclaim cycle
>> and spots the new order and starts over again. So there is a potentially
>> sizable window there where problems can hit. Right?
>
> Take a look at the code. wakeup_kswapd and __alloc_pages.
>
> First, assume the zone is above high watermarks for order-0 and order-1.
> order-0 allocs...
> order-1 low watermark hit => don't care, not allocing order-1
> order-0 low watermark hit => wake kswapd reclaim order 0
> order-1 alloc => wakeup_kswapd raises kswapd_max_order to 1
> order-1 allocs continue to succeed until the min watermark is hit
> order-1 *atomic* allocs continue until the atomic reserve is hit
> order-1 memalloc allocs continue until no more order-1 pages left.
This represents the ideal. However, we never consider the reserves at
order-1 unless we get an order-1 allocation. With lots of order-0
allocations (the norm) we can run the order-1 availability well below
even the atomic reserve without anyone noticing, while the total reserve
stays above the order-0 low watermark. kswapd has been idle throughout,
as there is only order-0 activity and we have enough order-0 pages. THEN
an order-1 allocation comes in: we are below the order-1 low watermark,
so we wake kswapd, retry, discover we are below the atomic threshold, and
_fail_ the allocation.
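To make that concrete, here is a rough userspace model of an order-aware
watermark check. It is only a sketch loosely following the shape of the
mainline check as I understand it: free blocks below the requested order
are discounted and the mark is halved at each step. The struct, the
function names, MODEL_MAX_ORDER and all the numbers are made up for
illustration and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ORDER 4	/* hypothetical; kept small for illustration */

/* Hypothetical, simplified stand-in for a zone's per-order free lists. */
struct model_zone {
	unsigned long nr_free[MODEL_MAX_ORDER];	/* free blocks at each order */
};

/* Total free base pages across all orders. */
static long model_free_pages(const struct model_zone *z)
{
	long total = 0;

	for (int o = 0; o < MODEL_MAX_ORDER; o++)
		total += (long)(z->nr_free[o] << o);
	return total;
}

/*
 * Order-aware check against 'mark' (in base pages). Pages sitting in
 * blocks smaller than 'order' cannot satisfy the request, so they are
 * subtracted, and the requirement is halved for each order stepped over.
 */
static bool model_watermark_ok(const struct model_zone *z, int order, long mark)
{
	assert(order >= 0 && order < MODEL_MAX_ORDER);

	long free_pages = model_free_pages(z) - (1L << order) + 1;
	long min = mark;

	if (free_pages <= min)
		return false;
	for (int o = 0; o < order; o++) {
		free_pages -= (long)(z->nr_free[o] << o);
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}
```

Take a fragmented zone with 100 free order-0 pages and 2 free order-1
blocks, and an assumed low mark of 32 pages. The order-0 check passes, so
order-0 traffic never wakes kswapd. The first order-1 request fails the
check at the low mark and also at a much smaller (again hypothetical)
atomic mark of 6, even though two order-1 blocks are sitting on the free
list.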
>
> There really is (or should be) a proper watermarking system in place that
> provides the right buffering for higher order allocations.
I think this is a "should be", not an "is".
>>> Working out why it apparently isn't working, first. Then maybe look at
>>> raising watermarks (they get reduced fairly rapidly as the order increases,
>>> so it might just be that there is not enough at order-3).
>>>
>>
>>
>> I believe it failed to work due to a combination of kswapd reclaiming at
>> the wrong order for a while and the fact that the watermarks are pretty
>> aggressive when it comes to higher orders. I'm trying to think of
>> alternative fixes but keep coming back to the current fix using
>> !(alloc_flags & ALLOC_CPUSET) to allow !wait allocations to succeed if
>> the memory is there and above min watermarks at order-0.
>
> kswapd reclaiming at the wrong order should be a bug. It should start
> reclaiming at the right order as soon as an allocation (atomic or not)
> goes through the "start reclaiming now" watermark.
>
> Now this is just looking at mainline code that has the kswapd_max_order,
> and kswapd doesn't actually reclaim "at" any order -- it just uses the
> kswapd_max_order to know when the required "stop reclaiming now" marks
> have been hit. If lumpy reclaim is not reclaiming at the right order,
> then it means it isn't refreshing from kswapd_max_order enough.
Yes, I believe all of this is working as designed. The problem is that
we treat order-0 and order-1 allocations as independent. We do not take
into account that we split order-1 blocks to make order-0 pages. We do
not check the order-1 reserve on order-0 allocations, and so do not wake
kswapd early enough. Given the interdependent nature of the current
calculation, it is very hard to detect transitions at _other_ orders when
we allocate at any specific order.
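Purely to illustrate the missing check, and not as something any patch in
this thread does, a hypothetical hook on the order-0 path might look like
the userspace sketch below. Every name here (sketch_zone, sketch_mark_ok,
sketch_wake_kswapd, sketch_order0_alloc_check, watch_order) is invented;
the only behaviour borrowed from the discussion is that a wakeup raises
the maximum order kswapd is asked to balance.

```c
#include <stdbool.h>

#define SKETCH_MAX_ORDER 4	/* hypothetical; kept small for illustration */

/* Hypothetical, simplified zone state. */
struct sketch_zone {
	unsigned long nr_free[SKETCH_MAX_ORDER];	/* free blocks per order */
	long low_mark;		/* assumed low watermark, in base pages */
	int kswapd_max_order;	/* highest order kswapd was asked to balance */
	bool kswapd_awake;
};

/* Is the zone above its low mark when viewed at 'order'? */
static bool sketch_mark_ok(const struct sketch_zone *z, int order)
{
	long free_pages = 0;
	long min = z->low_mark;

	for (int o = 0; o < SKETCH_MAX_ORDER; o++)
		free_pages += (long)(z->nr_free[o] << o);
	if (free_pages <= min)
		return false;
	for (int o = 0; o < order; o++) {
		/* blocks smaller than 'order' do not count at this order */
		free_pages -= (long)(z->nr_free[o] << o);
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}

/* Wake the (modelled) kswapd and remember the highest order requested. */
static void sketch_wake_kswapd(struct sketch_zone *z, int order)
{
	if (z->kswapd_max_order < order)
		z->kswapd_max_order = order;
	z->kswapd_awake = true;
}

/*
 * Hypothetical hook for the order-0 allocation path: besides the order-0
 * mark, also look at every order up to 'watch_order' and wake kswapd for
 * the first orders that have fallen below their mark.
 */
static void sketch_order0_alloc_check(struct sketch_zone *z, int watch_order)
{
	for (int order = 0; order <= watch_order; order++)
		if (!sketch_mark_ok(z, order))
			sketch_wake_kswapd(z, order);
}
```

With 100 free order-0 pages, 2 free order-1 blocks and an assumed low mark
of 32, an order-0 allocation running this hook with watch_order = 1 would
wake kswapd at order 1 before any order-1 request arrives. The obvious
cost is walking the free lists on every order-0 allocation, which is part
of why this is hard to do cheaply.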
Hmmmmmm.
-apw