Message-ID: <4B9E481D.5020709@linux.vnet.ibm.com>
Date: Mon, 15 Mar 2010 15:45:49 +0100
From: Christian Ehrhardt
To: Mel Gorman
CC: Andrew Morton, linux-mm@kvack.org, Nick Piggin, Chris Mason, Jens Axboe, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure
References: <1268048904-19397-1-git-send-email-mel@csn.ul.ie>
 <20100311154124.e1e23900.akpm@linux-foundation.org>
 <4B99E19E.6070301@linux.vnet.ibm.com>
 <20100312020526.d424f2a8.akpm@linux-foundation.org>
 <20100312104712.GB18274@csn.ul.ie>
 <4B9A3049.7010602@linux.vnet.ibm.com>
 <20100312093755.b2393b33.akpm@linux-foundation.org>
 <20100315122948.GJ18274@csn.ul.ie>
In-Reply-To: <20100315122948.GJ18274@csn.ul.ie>
X-Mailing-List: linux-kernel@vger.kernel.org

Mel Gorman wrote:
> On Fri, Mar 12, 2010 at 09:37:55AM -0500, Andrew Morton wrote:
>> On Fri, 12 Mar 2010 13:15:05 +0100 Christian Ehrhardt wrote:
>>
>>>> It still feels a bit unnatural though that the page allocator waits on
>>>> congestion when what it really cares about is watermarks. Even if this
>>>> patch works for Christian, I think it still has merit so will kick it a
>>>> few more times.
>>> In whatever way I can look at it watermark_wait should be superior to
>>> congestion_wait. Because as Mel points out waiting for watermarks is
>>> what is semantically correct there.
>> If a direct-reclaimer waits for some thresholds to be achieved then what
>> task is doing reclaim?
>>
>> Ultimately, kswapd.
>
> Well, not quite. The direct reclaimer will still wake up after a timeout
> and try again regardless of whether watermarks have been met or not. The
> intention is to back off after direct reclaim has failed. Granted, the
> window during which a direct reclaim finishes and an allocation attempt
> occurs is unnecessarily large. This may be addressed by the patch that
> changes where cond_resched() is called.
>
>> This will introduce a hard dependency upon kswapd
>> activity. This might introduce scalability problems. And latency
>> problems if kswapd is off doodling with a slow device (say), or doing a
>> journal commit. And perhaps deadlocks if kswapd tries to take a lock
>> which one of the waiting-for-watermark direct reclaimers holds.
>>
>
> What lock could they be holding? Even if that is the case, the direct
> reclaimers do not wait indefinitely.
>
>> Generally, kswapd is an optional, best-effort latency optimisation
>> thing and we haven't designed for it to be a critical service.
>> Probably stuff would break were we to do so.
>>
>
> No disagreements there.
>
>> This is one of the reasons why we avoided creating such dependencies in
>> reclaim. Instead, what we do when a reclaimer is encountering lots of
>> dirty or in-flight pages is
>>
>>     msleep(100);
>>
>> then try again. We're waiting for the disks, not kswapd.
>>
>> Only the hard-wired 100 is a bit silly, so we made the "100" variable,
>> inversely dependent upon the number of disks and their speed. If you
>> have more and faster disks then you sleep for less time.
>>
>> And that's what congestion_wait() does, in a very simplistic fashion.
>> It's a facility which direct-reclaimers use to ratelimit themselves in
>> inverse proportion to the speed with which the system can retire writes.
>>
>
> The problem being hit is when a direct reclaimer goes to sleep waiting
> on congestion when in reality there were not lots of dirty or in-flight
> pages. It goes to sleep for the wrong reasons and doesn't get woken up
> again until the timeout expires.
>
> Bear in mind that even if congestion clears, it just means that dirty
> pages are now clean although I admit that the next direct reclaim it
> does is going to encounter clean pages and should succeed.
>
> Let's see how the other patch that changes when cond_resched() gets called
> gets on. If it also works out, then it's harder to justify this patch.
> If it doesn't work out then it'll need to be kicked another few times.
>

Unfortunately "page-allocator: Attempt page allocation immediately after
direct reclaim" doesn't help. There is no improvement in the regression we
had fixed with the watermark wait patch.

-> *kick*^^

-- 

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
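
P.S. For readers following the thread, the failure mode Mel describes (a direct reclaimer sleeping on congestion that is not actually there) can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: the names `bounded_wait`, `congestion_style_wait` and `watermark_style_wait`, the tick-based notion of time, and the `NEVER` sentinel are all hypothetical and exist only for this example. Both waits are bounded by the same timeout; they differ only in which event is allowed to end the sleep early.

```c
#include <assert.h>

/* Hypothetical sentinel: the awaited event never fires in this model. */
#define NEVER (-1)

/*
 * Ticks spent asleep when waiting for an event that fires at event_tick,
 * with the sleep bounded by timeout ticks.
 */
static int bounded_wait(int event_tick, int timeout)
{
    assert(timeout >= 0);
    if (event_tick == NEVER || event_tick > timeout)
        return timeout;   /* nothing woke us early: sleep the full timeout */
    return event_tick;    /* woken as soon as the event fired */
}

/* Toy stand-in for sleeping until congestion clears (or the timeout). */
static int congestion_style_wait(int congestion_clear_tick, int timeout)
{
    return bounded_wait(congestion_clear_tick, timeout);
}

/* Toy stand-in for sleeping until the watermark is met (or the timeout). */
static int watermark_style_wait(int watermark_ok_tick, int timeout)
{
    return bounded_wait(watermark_ok_tick, timeout);
}
```

If there was never any congestion, no "congestion cleared" event arrives, so `congestion_style_wait(NEVER, 100)` returns the full 100 ticks. If the watermark is restored after 5 ticks, `watermark_style_wait(5, 100)` returns 5. In both cases the sleeper still wakes after the timeout at the latest, which is the point made above about direct reclaimers not waiting indefinitely.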