linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>,
	linux-mm@kvack.org, Chris Mason <chris.mason@oracle.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] page-allocator: Under memory pressure, wait on pressure to relieve instead of congestion
Date: Tue, 9 Mar 2010 18:22:28 +0000	[thread overview]
Message-ID: <20100309182228.GJ4883@csn.ul.ie> (raw)
In-Reply-To: <4B966C7A.5040706@linux.vnet.ibm.com>

On Tue, Mar 09, 2010 at 04:42:50PM +0100, Christian Ehrhardt wrote:
>
>
> Nick Piggin wrote:
>> On Tue, Mar 09, 2010 at 02:17:13PM +0000, Mel Gorman wrote:
>>> On Wed, Mar 10, 2010 at 12:35:13AM +1100, Nick Piggin wrote:
>>>> On Mon, Mar 08, 2010 at 11:48:21AM +0000, Mel Gorman wrote:
>>>>> Under heavy memory pressure, the page allocator may call congestion_wait()
>>>>> to wait for IO congestion to clear or a timeout. This is not as sensible
>>>>> a choice as it first appears. There is no guarantee that BLK_RW_ASYNC is
>>>>> even congested as the pressure could have been due to a large number of
>>>>> SYNC reads and the allocator waits for the entire timeout, possibly uselessly.
>>>>>
>>>>> At the point of congestion_wait(), the allocator is struggling to get the
>>>>> pages it needs and it should back off. This patch puts the allocator to sleep
>>>>> on a zone->pressure_wq for either a timeout or until a direct reclaimer or
>>>>> kswapd brings the zone over the low watermark, whichever happens first.
>>>>>
>>>>> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
>>>>> ---
>>>>>  include/linux/mmzone.h |    3 ++
>>>>>  mm/internal.h          |    4 +++
>>>>>  mm/mmzone.c            |   47 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>  mm/page_alloc.c        |   50 +++++++++++++++++++++++++++++++++++++++++++----
>>>>>  mm/vmscan.c            |    2 +
>>>>>  5 files changed, 101 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>>> index 30fe668..72465c1 100644
>>>>> --- a/include/linux/mmzone.h
>>>>> +++ b/include/linux/mmzone.h
> [...]
>>>>> +{
>>>>> +	/* If no process is waiting, nothing to do */
>>>>> +	if (!waitqueue_active(zone->pressure_wq))
>>>>> +		return;
>>>>> +
>>>>> +	/* Check if the high watermark is ok for order 0 */
>>>>> +	if (zone_watermark_ok(zone, 0, low_wmark_pages(zone), 0, 0))
>>>>> +		wake_up_interruptible(zone->pressure_wq);
>>>>> +}
>>>> If you were to do this under the zone lock (in your subsequent patch),
>>>> then it could avoid races. I would suggest doing it all as a single
>>>> patch and not doing the pressure checks in reclaim at all.
>>>>
>>> That is reasonable. I've already dropped the checks in reclaim because as you
>>> say, if the free path check is cheap enough, it's also sufficient. Checking
>>> in the reclaim paths as well is redundant.
>>>
>>> I'll move the call to check_zone_pressure() within the zone lock to avoid
>>> races.
>>>
>
> Mel, we talked about a thundering herd issue that might come up here in  
> very constraint cases.
> So wherever you end up putting that wake_up call, how about being extra  
> paranoid about a thundering herd flagging them WQ_FLAG_EXCLUSIVE and  
> waking them with something like that:
>
> wake_up_interruptible_nr(zone->pressure_wq, #nrofpagesabovewatermark#);
>
> That should be an easy to calculate sane max of waiters to wake up.
> On the other hand it might be over-engineered and it implies the need to  
> reconsider when it would be best to wake up the rest.
>

It seems over-engineering considering that they wake up after a timeout
unconditionally. I think it's best for the moment to let them wake up in
a herd and recheck their zonelists as they'll go back to sleep if
necessary.

> Get me right - I don't really have a hard requirement or need for that,  
> I just wanted to mention it early on to hear your opinions about it.
>
> looking forward to test the v2 patch series, adapted to all the good  
> stuff already discussed.
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-03-09 18:22 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-08 11:48 [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure Mel Gorman
2010-03-08 11:48 ` [PATCH 1/3] page-allocator: Under memory pressure, wait on pressure to relieve instead of congestion Mel Gorman
2010-03-09 13:35   ` Nick Piggin
2010-03-09 14:17     ` Mel Gorman
2010-03-09 15:03       ` Nick Piggin
2010-03-09 15:42         ` Christian Ehrhardt
2010-03-09 18:22           ` Mel Gorman [this message]
2010-03-10  2:38             ` Nick Piggin
2010-03-09 17:35         ` Mel Gorman
2010-03-10  2:35           ` Nick Piggin
2010-03-09 15:50   ` Christoph Lameter
2010-03-09 15:56     ` Christian Ehrhardt
2010-03-09 16:09       ` Christoph Lameter
2010-03-09 17:01         ` Mel Gorman
2010-03-09 17:11           ` Christoph Lameter
2010-03-09 17:30             ` Mel Gorman
2010-03-08 11:48 ` [PATCH 2/3] page-allocator: Check zone pressure when batch of pages are freed Mel Gorman
2010-03-09  9:53   ` Nick Piggin
2010-03-09 10:08     ` Mel Gorman
2010-03-09 10:23       ` Nick Piggin
2010-03-09 10:36         ` Mel Gorman
2010-03-09 11:11           ` Nick Piggin
2010-03-09 11:29             ` Mel Gorman
2010-03-08 11:48 ` [PATCH 3/3] vmscan: Put kswapd to sleep on its own waitqueue, not congestion Mel Gorman
2010-03-09 10:00   ` Nick Piggin
2010-03-09 10:21     ` Mel Gorman
2010-03-09 10:32       ` Nick Piggin
2010-03-11 23:41 ` [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure Andrew Morton
2010-03-12  6:39   ` Christian Ehrhardt
2010-03-12  7:05     ` Andrew Morton
2010-03-12 10:47       ` Mel Gorman
2010-03-12 12:15         ` Christian Ehrhardt
2010-03-12 14:37           ` Andrew Morton
2010-03-15 12:29             ` Mel Gorman
2010-03-15 14:45               ` Christian Ehrhardt
2010-03-15 12:34             ` Christian Ehrhardt
2010-03-15 20:09               ` Andrew Morton
2010-03-16 10:11                 ` Mel Gorman
2010-03-18 17:42                 ` Mel Gorman
2010-03-22 23:50                 ` Mel Gorman
2010-03-23 14:35                   ` Christian Ehrhardt
2010-03-23 21:35                   ` Corrado Zoccolo
2010-03-24 11:48                     ` Mel Gorman
2010-03-24 12:56                       ` Corrado Zoccolo
2010-03-23 22:29                   ` Rik van Riel
2010-03-24 14:50                     ` Mel Gorman
2010-04-19 12:22                       ` Christian Ehrhardt
2010-04-19 21:44                         ` Johannes Weiner
2010-04-20  7:20                           ` Christian Ehrhardt
2010-04-20  8:54                             ` Christian Ehrhardt
2010-04-20 15:32                             ` Johannes Weiner
2010-04-20 17:22                               ` Rik van Riel
2010-04-21  4:23                                 ` Christian Ehrhardt
2010-04-21  7:35                                   ` Christian Ehrhardt
2010-04-21 13:19                                     ` Rik van Riel
2010-04-22  6:21                                       ` Christian Ehrhardt
2010-04-26 10:59                                         ` Subject: [PATCH][RFC] mm: make working set portion that is protected tunable v2 Christian Ehrhardt
2010-04-26 11:59                                           ` KOSAKI Motohiro
2010-04-26 12:43                                             ` Christian Ehrhardt
2010-04-26 14:20                                               ` Rik van Riel
2010-04-27 14:00                                                 ` Christian Ehrhardt
2010-04-21  9:03                                   ` [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure Johannes Weiner
2010-04-21 13:20                                   ` Rik van Riel
2010-04-20 14:40                           ` Rik van Riel
2010-03-24  2:38                   ` Greg KH
2010-03-24 11:49                     ` Mel Gorman
2010-03-24 13:13                   ` Johannes Weiner
2010-03-12  9:09   ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100309182228.GJ4883@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=chris.mason@oracle.com \
    --cc=ehrhardt@linux.vnet.ibm.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).