From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Nick Piggin <npiggin@suse.de>,
Chris Mason <chris.mason@oracle.com>,
Jens Axboe <jens.axboe@oracle.com>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure
Date: Mon, 15 Mar 2010 15:45:49 +0100
Message-ID: <4B9E481D.5020709@linux.vnet.ibm.com>
In-Reply-To: <20100315122948.GJ18274@csn.ul.ie>
Mel Gorman wrote:
> On Fri, Mar 12, 2010 at 09:37:55AM -0500, Andrew Morton wrote:
>> On Fri, 12 Mar 2010 13:15:05 +0100 Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> wrote:
>>
>>>> It still feels a bit unnatural though that the page allocator waits on
>>>> congestion when what it really cares about is watermarks. Even if this
>>>> patch works for Christian, I think it still has merit so will kick it a
>>>> few more times.
>>> In whatever way I can look at it watermark_wait should be superior to
>>> congestion_wait. Because as Mel points out waiting for watermarks is
>>> what is semantically correct there.
>> If a direct-reclaimer waits for some thresholds to be achieved then what
>> task is doing reclaim?
>>
>> Ultimately, kswapd.
>
> Well, not quite. The direct reclaimer will still wake up after a timeout
> and try again regardless of whether watermarks have been met or not. The
> intention is to back off after direct reclaim has failed. Granted, the
> window during which a direct reclaim finishes and an allocation attempt
> occurs is unnecessarily large. This may be addressed by the patch that
> changes where cond_resched() is called.
>
>> This will introduce a hard dependency upon kswapd
>> activity. This might introduce scalability problems. And latency
>> problems if kswapd is off doodling with a slow device (say), or doing a
>> journal commit. And perhaps deadlocks if kswapd tries to take a lock
>> which one of the waiting-for-watermark direct reclaimers holds.
>>
>
> What lock could they be holding? Even if that is the case, the direct
> reclaimers do not wait indefinitely.
>
>> Generally, kswapd is an optional, best-effort latency optimisation
>> thing and we haven't designed for it to be a critical service.
>> Probably stuff would break were we to do so.
>>
>
> No disagreements there.
>
>> This is one of the reasons why we avoided creating such dependencies in
>> reclaim. Instead, what we do when a reclaimer is encountering lots of
>> dirty or in-flight pages is
>>
>> msleep(100);
>>
>> then try again. We're waiting for the disks, not kswapd.
>>
>> Only the hard-wired 100 is a bit silly, so we made the "100" variable,
>> inversely dependent upon the number of disks and their speed. If you
>> have more and faster disks then you sleep for less time.
>>
>> And that's what congestion_wait() does, in a very simplistic fashion.
>> It's a facility which direct-reclaimers use to ratelimit themselves in
>> inverse proportion to the speed with which the system can retire writes.
>>
>
> The problem being hit is when a direct reclaimer goes to sleep waiting
> on congestion when in reality there were not lots of dirty or in-flight
> pages. It goes to sleep for the wrong reasons and doesn't get woken up
> again until the timeout expires.
>
> Bear in mind that even if congestion clears, it just means that dirty
> pages are now clean although I admit that the next direct reclaim it
> does is going to encounter clean pages and should succeed.
>
> Let's see how the other patch that changes when cond_resched() gets called
> gets on. If it also works out, then it's harder to justify this patch.
> If it doesn't work out then it'll need to be kicked another few times.
>
Unfortunately, "page-allocator: Attempt page allocation immediately after
direct reclaim" doesn't help. There is no improvement in the regression
that the watermark wait patch had fixed.
-> *kick*^^
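
For comparison, the watermark-based wait discussed in this thread can be
sketched in the same toy style. Again this is only an illustrative userspace
model under stated assumptions, not the code from the patch:
`sim_watermark_wait()`, its parameters and the fixed `freed_per_tick` rate are
hypothetical. It shows the two properties mentioned above: the waiter does not
sleep at all if the watermark is already met, and it still wakes up after the
timeout even if the watermark is never reached.

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code: wait until the number of free pages
 * reaches the watermark, or until timeout_ticks have passed.
 * freed_per_tick stands in for whatever is freeing pages in the background.
 * Returns the number of ticks spent asleep.
 */
int sim_watermark_wait(long free_pages, long watermark,
                       long freed_per_tick, int timeout_ticks)
{
    assert(timeout_ticks >= 0 && freed_per_tick >= 0);

    int tick = 0;
    /* Sleep only while the watermark is unmet and the timeout has not expired. */
    while (free_pages < watermark && tick < timeout_ticks) {
        free_pages += freed_per_tick;
        tick++;
    }
    return tick;
}
```

With a watermark of 100 pages, a caller that already sees 200 free pages
returns immediately, one that gains 10 pages per tick from zero returns after
10 ticks, and one that gains nothing returns when the timeout expires.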
--
Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance