From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
To: Nick Piggin <npiggin@suse.de>, Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, Chris Mason <chris.mason@oracle.com>,
Jens Axboe <jens.axboe@oracle.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] page-allocator: Under memory pressure, wait on pressure to relieve instead of congestion
Date: Tue, 09 Mar 2010 16:42:50 +0100 [thread overview]
Message-ID: <4B966C7A.5040706@linux.vnet.ibm.com> (raw)
In-Reply-To: <20100309150332.GP8653@laptop>
Nick Piggin wrote:
> On Tue, Mar 09, 2010 at 02:17:13PM +0000, Mel Gorman wrote:
>> On Wed, Mar 10, 2010 at 12:35:13AM +1100, Nick Piggin wrote:
>>> On Mon, Mar 08, 2010 at 11:48:21AM +0000, Mel Gorman wrote:
>>>> Under heavy memory pressure, the page allocator may call congestion_wait()
>>>> to wait for IO congestion to clear or a timeout. This is not as sensible
>>>> a choice as it first appears. There is no guarantee that BLK_RW_ASYNC is
>>>> even congested as the pressure could have been due to a large number of
>>>> SYNC reads and the allocator waits for the entire timeout, possibly uselessly.
>>>>
>>>> At the point of congestion_wait(), the allocator is struggling to get the
>>>> pages it needs and it should back off. This patch puts the allocator to sleep
>>>> on a zone->pressure_wq for either a timeout or until a direct reclaimer or
>>>> kswapd brings the zone over the low watermark, whichever happens first.
>>>>
>>>> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
>>>> ---
>>>> include/linux/mmzone.h | 3 ++
>>>> mm/internal.h | 4 +++
>>>> mm/mmzone.c | 47 +++++++++++++++++++++++++++++++++++++++++++++
>>>> mm/page_alloc.c | 50 +++++++++++++++++++++++++++++++++++++++++++----
>>>> mm/vmscan.c | 2 +
>>>> 5 files changed, 101 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index 30fe668..72465c1 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
[...]
>>>> +{
>>>> + /* If no process is waiting, nothing to do */
>>>> + if (!waitqueue_active(zone->pressure_wq))
>>>> + return;
>>>> +
>>>> + /* Check if the high watermark is ok for order 0 */
>>>> + if (zone_watermark_ok(zone, 0, low_wmark_pages(zone), 0, 0))
>>>> + wake_up_interruptible(zone->pressure_wq);
>>>> +}
>>> If you were to do this under the zone lock (in your subsequent patch),
>>> then it could avoid races. I would suggest doing it all as a single
>>> patch and not doing the pressure checks in reclaim at all.
>>>
>> That is reasonable. I've already dropped the checks in reclaim because as you
>> say, if the free path check is cheap enough, it's also sufficient. Checking
>> in the reclaim paths as well is redundant.
>>
>> I'll move the call to check_zone_pressure() within the zone lock to avoid
>> races.
>>
Mel, we talked about a thundering herd issue that might come up here in
very constraint cases.
So wherever you end up putting that wake_up call, how about being extra
paranoid about a thundering herd flagging them WQ_FLAG_EXCLUSIVE and
waking them with something like that:
wake_up_interruptible_nr(zone->pressure_wq, #nrofpagesabovewatermark#);
That should be an easy to calculate sane max of waiters to wake up.
On the other hand it might be over-engineered and it implies the need to
reconsider when it would be best to wake up the rest.
Get me right - I don't really have a hard requirement or need for that,
I just wanted to mention it early on to hear your opinions about it.
looking forward to test the v2 patch series, adapted to all the good
stuff already discussed.
--
Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
WARNING: multiple messages have this Message-ID (diff)
From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
To: Nick Piggin <npiggin@suse.de>, Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, Chris Mason <chris.mason@oracle.com>,
Jens Axboe <jens.axboe@oracle.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] page-allocator: Under memory pressure, wait on pressure to relieve instead of congestion
Date: Tue, 09 Mar 2010 16:42:50 +0100 [thread overview]
Message-ID: <4B966C7A.5040706@linux.vnet.ibm.com> (raw)
In-Reply-To: <20100309150332.GP8653@laptop>
Nick Piggin wrote:
> On Tue, Mar 09, 2010 at 02:17:13PM +0000, Mel Gorman wrote:
>> On Wed, Mar 10, 2010 at 12:35:13AM +1100, Nick Piggin wrote:
>>> On Mon, Mar 08, 2010 at 11:48:21AM +0000, Mel Gorman wrote:
>>>> Under heavy memory pressure, the page allocator may call congestion_wait()
>>>> to wait for IO congestion to clear or a timeout. This is not as sensible
>>>> a choice as it first appears. There is no guarantee that BLK_RW_ASYNC is
>>>> even congested as the pressure could have been due to a large number of
>>>> SYNC reads and the allocator waits for the entire timeout, possibly uselessly.
>>>>
>>>> At the point of congestion_wait(), the allocator is struggling to get the
>>>> pages it needs and it should back off. This patch puts the allocator to sleep
>>>> on a zone->pressure_wq for either a timeout or until a direct reclaimer or
>>>> kswapd brings the zone over the low watermark, whichever happens first.
>>>>
>>>> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
>>>> ---
>>>> include/linux/mmzone.h | 3 ++
>>>> mm/internal.h | 4 +++
>>>> mm/mmzone.c | 47 +++++++++++++++++++++++++++++++++++++++++++++
>>>> mm/page_alloc.c | 50 +++++++++++++++++++++++++++++++++++++++++++----
>>>> mm/vmscan.c | 2 +
>>>> 5 files changed, 101 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index 30fe668..72465c1 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
[...]
>>>> +{
>>>> + /* If no process is waiting, nothing to do */
>>>> + if (!waitqueue_active(zone->pressure_wq))
>>>> + return;
>>>> +
>>>> + /* Check if the high watermark is ok for order 0 */
>>>> + if (zone_watermark_ok(zone, 0, low_wmark_pages(zone), 0, 0))
>>>> + wake_up_interruptible(zone->pressure_wq);
>>>> +}
>>> If you were to do this under the zone lock (in your subsequent patch),
>>> then it could avoid races. I would suggest doing it all as a single
>>> patch and not doing the pressure checks in reclaim at all.
>>>
>> That is reasonable. I've already dropped the checks in reclaim because as you
>> say, if the free path check is cheap enough, it's also sufficient. Checking
>> in the reclaim paths as well is redundant.
>>
>> I'll move the call to check_zone_pressure() within the zone lock to avoid
>> races.
>>
Mel, we talked about a thundering herd issue that might come up here in
very constraint cases.
So wherever you end up putting that wake_up call, how about being extra
paranoid about a thundering herd flagging them WQ_FLAG_EXCLUSIVE and
waking them with something like that:
wake_up_interruptible_nr(zone->pressure_wq, #nrofpagesabovewatermark#);
That should be an easy to calculate sane max of waiters to wake up.
On the other hand it might be over-engineered and it implies the need to
reconsider when it would be best to wake up the rest.
Get me right - I don't really have a hard requirement or need for that,
I just wanted to mention it early on to hear your opinions about it.
looking forward to test the v2 patch series, adapted to all the good
stuff already discussed.
--
Grusse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-03-09 15:42 UTC|newest]
Thread overview: 136+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-08 11:48 [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure Mel Gorman
2010-03-08 11:48 ` Mel Gorman
2010-03-08 11:48 ` [PATCH 1/3] page-allocator: Under memory pressure, wait on pressure to relieve instead of congestion Mel Gorman
2010-03-08 11:48 ` Mel Gorman
2010-03-09 13:35 ` Nick Piggin
2010-03-09 13:35 ` Nick Piggin
2010-03-09 14:17 ` Mel Gorman
2010-03-09 14:17 ` Mel Gorman
2010-03-09 15:03 ` Nick Piggin
2010-03-09 15:03 ` Nick Piggin
2010-03-09 15:42 ` Christian Ehrhardt [this message]
2010-03-09 15:42 ` Christian Ehrhardt
2010-03-09 18:22 ` Mel Gorman
2010-03-09 18:22 ` Mel Gorman
2010-03-10 2:38 ` Nick Piggin
2010-03-10 2:38 ` Nick Piggin
2010-03-09 17:35 ` Mel Gorman
2010-03-09 17:35 ` Mel Gorman
2010-03-10 2:35 ` Nick Piggin
2010-03-10 2:35 ` Nick Piggin
2010-03-09 15:50 ` Christoph Lameter
2010-03-09 15:50 ` Christoph Lameter
2010-03-09 15:56 ` Christian Ehrhardt
2010-03-09 15:56 ` Christian Ehrhardt
2010-03-09 16:09 ` Christoph Lameter
2010-03-09 16:09 ` Christoph Lameter
2010-03-09 17:01 ` Mel Gorman
2010-03-09 17:01 ` Mel Gorman
2010-03-09 17:11 ` Christoph Lameter
2010-03-09 17:11 ` Christoph Lameter
2010-03-09 17:30 ` Mel Gorman
2010-03-09 17:30 ` Mel Gorman
2010-03-08 11:48 ` [PATCH 2/3] page-allocator: Check zone pressure when batch of pages are freed Mel Gorman
2010-03-08 11:48 ` Mel Gorman
2010-03-09 9:53 ` Nick Piggin
2010-03-09 9:53 ` Nick Piggin
2010-03-09 10:08 ` Mel Gorman
2010-03-09 10:08 ` Mel Gorman
2010-03-09 10:23 ` Nick Piggin
2010-03-09 10:23 ` Nick Piggin
2010-03-09 10:36 ` Mel Gorman
2010-03-09 10:36 ` Mel Gorman
2010-03-09 11:11 ` Nick Piggin
2010-03-09 11:11 ` Nick Piggin
2010-03-09 11:29 ` Mel Gorman
2010-03-09 11:29 ` Mel Gorman
2010-03-08 11:48 ` [PATCH 3/3] vmscan: Put kswapd to sleep on its own waitqueue, not congestion Mel Gorman
2010-03-08 11:48 ` Mel Gorman
2010-03-09 10:00 ` Nick Piggin
2010-03-09 10:00 ` Nick Piggin
2010-03-09 10:21 ` Mel Gorman
2010-03-09 10:21 ` Mel Gorman
2010-03-09 10:32 ` Nick Piggin
2010-03-09 10:32 ` Nick Piggin
2010-03-11 23:41 ` [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure Andrew Morton
2010-03-11 23:41 ` Andrew Morton
2010-03-12 6:39 ` Christian Ehrhardt
2010-03-12 6:39 ` Christian Ehrhardt
2010-03-12 7:05 ` Andrew Morton
2010-03-12 7:05 ` Andrew Morton
2010-03-12 10:47 ` Mel Gorman
2010-03-12 10:47 ` Mel Gorman
2010-03-12 12:15 ` Christian Ehrhardt
2010-03-12 12:15 ` Christian Ehrhardt
2010-03-12 14:37 ` Andrew Morton
2010-03-12 14:37 ` Andrew Morton
2010-03-15 12:29 ` Mel Gorman
2010-03-15 12:29 ` Mel Gorman
2010-03-15 14:45 ` Christian Ehrhardt
2010-03-15 14:45 ` Christian Ehrhardt
2010-03-15 12:34 ` Christian Ehrhardt
2010-03-15 12:34 ` Christian Ehrhardt
2010-03-15 20:09 ` Andrew Morton
2010-03-15 20:09 ` Andrew Morton
2010-03-16 10:11 ` Mel Gorman
2010-03-16 10:11 ` Mel Gorman
2010-03-18 17:42 ` Mel Gorman
2010-03-18 17:42 ` Mel Gorman
2010-03-22 23:50 ` Mel Gorman
2010-03-22 23:50 ` Mel Gorman
2010-03-23 14:35 ` Christian Ehrhardt
2010-03-23 14:35 ` Christian Ehrhardt
2010-03-23 21:35 ` Corrado Zoccolo
2010-03-23 21:35 ` Corrado Zoccolo
2010-03-24 11:48 ` Mel Gorman
2010-03-24 11:48 ` Mel Gorman
2010-03-24 12:56 ` Corrado Zoccolo
2010-03-24 12:56 ` Corrado Zoccolo
2010-03-23 22:29 ` Rik van Riel
2010-03-23 22:29 ` Rik van Riel
2010-03-24 14:50 ` Mel Gorman
2010-03-24 14:50 ` Mel Gorman
2010-04-19 12:22 ` Christian Ehrhardt
2010-04-19 12:22 ` Christian Ehrhardt
2010-04-19 21:44 ` Johannes Weiner
2010-04-19 21:44 ` Johannes Weiner
2010-04-20 7:20 ` Christian Ehrhardt
2010-04-20 7:20 ` Christian Ehrhardt
2010-04-20 8:54 ` Christian Ehrhardt
2010-04-20 8:54 ` Christian Ehrhardt
2010-04-20 15:32 ` Johannes Weiner
2010-04-20 15:32 ` Johannes Weiner
2010-04-20 17:22 ` Rik van Riel
2010-04-20 17:22 ` Rik van Riel
2010-04-21 4:23 ` Christian Ehrhardt
2010-04-21 4:23 ` Christian Ehrhardt
2010-04-21 7:35 ` Christian Ehrhardt
2010-04-21 7:35 ` Christian Ehrhardt
2010-04-21 13:19 ` Rik van Riel
2010-04-21 13:19 ` Rik van Riel
2010-04-22 6:21 ` Christian Ehrhardt
2010-04-22 6:21 ` Christian Ehrhardt
2010-04-26 10:59 ` Subject: [PATCH][RFC] mm: make working set portion that is protected tunable v2 Christian Ehrhardt
2010-04-26 10:59 ` Christian Ehrhardt
2010-04-26 11:59 ` KOSAKI Motohiro
2010-04-26 11:59 ` KOSAKI Motohiro
2010-04-26 12:43 ` Christian Ehrhardt
2010-04-26 12:43 ` Christian Ehrhardt
2010-04-26 14:20 ` Rik van Riel
2010-04-26 14:20 ` Rik van Riel
2010-04-27 14:00 ` Christian Ehrhardt
2010-04-27 14:00 ` Christian Ehrhardt
2010-04-21 9:03 ` [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure Johannes Weiner
2010-04-21 9:03 ` Johannes Weiner
2010-04-21 13:20 ` Rik van Riel
2010-04-21 13:20 ` Rik van Riel
2010-04-20 14:40 ` Rik van Riel
2010-04-20 14:40 ` Rik van Riel
2010-03-24 2:38 ` Greg KH
2010-03-24 2:38 ` Greg KH
2010-03-24 11:49 ` Mel Gorman
2010-03-24 11:49 ` Mel Gorman
2010-03-24 13:13 ` Johannes Weiner
2010-03-24 13:13 ` Johannes Weiner
2010-03-12 9:09 ` Mel Gorman
2010-03-12 9:09 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B966C7A.5040706@linux.vnet.ibm.com \
--to=ehrhardt@linux.vnet.ibm.com \
--cc=chris.mason@oracle.com \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=npiggin@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.