From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 6 Feb 2007 11:51:13 -0800
From: Andrew Morton
Subject: Re: [RFC 0/7] Move mlocked pages off the LRU and track them
Message-Id: <20070206115113.4a5db10c.akpm@linux-foundation.org>
In-Reply-To: <1170777882.4945.31.camel@localhost>
References: <20070205205235.4500.54958.sendpatchset@schroedinger.engr.sgi.com>
	<1170777882.4945.31.camel@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Lee Schermerhorn
Cc: Christoph Lameter, linux-mm@kvack.org, Christoph Hellwig,
	Arjan van de Ven, Nigel Cunningham, "Martin J. Bligh",
	Peter Zijlstra, Nick Piggin, Matt Mackall, Rik van Riel,
	KAMEZAWA Hiroyuki, Larry Woodman
List-ID:

On Tue, 06 Feb 2007 11:04:42 -0500 Lee Schermerhorn wrote:

> Note that anon [and shmem] pages in excess of available swap are
> effectively mlock()ed.  In the field, we have seen non-NUMA x86_64
> systems with 64-128GB [16-32 million 4k pages] with little to no
> swap--big data base servers.  The majority of the memory is dedicated
> to large data base shared memory areas.  The remainder is divided
> between program anon and page cache [executable, libs] pages and any
> other page cache pages used by data base utilities, system daemons, ...
>
> The system runs fine until someone runs a backup [or multiple, as
> there are multiple data base instances running].  This overcommits
> memory and we end up with all cpus in reclaim, contending for the zone
> lru lock, and walking an active list of tens of millions of pages
> looking for pages to reclaim.  The reclaim logic spends a lot of time
> walking the lru lists, nominating shmem pages [the majority of pages
> on the list] for reclaim, only to find in shrink_page_list() that it
> can't move the page to swap.  So, it puts the page back on the list to
> be retried by the other cpus once they obtain the zone lru lock.  The
> system appears to be hung for long periods of time.
>
> There are a lot of behaviors in the reclaim code that exacerbate the
> problems when we get into this mode, but the long lists of unswappable
> anon/shmem pages are the major culprit.  One of the guys at Red Hat
> has tried a "proof of concept" patch to move all anon/shmem pages in
> excess of swap space to a "wired list" [currently global; a per
> node/zone version is in progress] and it seems to alleviate the
> problem.
>
> So, Christoph's patch addresses a real problem that we've seen.
> Unfortunately, not all data base applications lock their shmem areas
> into memory.  Excluding pages from consideration for reclaim that
> can't possibly be swapped out due to lack of swap space seems a
> natural extension of this concept.  I expect that many of Christoph's
> customers run with swap space that is much smaller than system memory
> and would benefit from this extension.

Yeah.  The scanner at present tries to handle out-of-swap by moving
these pages onto the active list (shrink_page_list) and then keeping
them there (shrink_active_list), so it _should_ be the case that the
performance problems which you're observing are due to active-list
scanning.

Is that correct?  If not, something's busted.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
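
The scan-to-reclaim blow-up described in the thread above can be seen in
isolation with a small userspace toy model.  This is not kernel code:
every structure, name and number below is invented purely for
illustration.  It fills a fake LRU almost entirely with anon/shmem pages
that have no swap behind them, then makes a few reclaim passes and counts
how many pages are scanned for each page actually freed.

/*
 * Toy userspace model (not kernel code) of the behaviour described in
 * the thread: when almost every page on the LRU is anon/shmem and there
 * is no swap to put it in, a reclaim pass scans enormous numbers of
 * pages for each page it frees, and simply rotates the rest back onto
 * the list for the next pass (or the next CPU) to scan again.
 */
#include <stdio.h>
#include <stdlib.h>

struct toy_page {
	int anon;	/* anonymous or shmem page */
	int freed;	/* already reclaimed in an earlier pass */
};

int main(void)
{
	const long npages = 4000000;	/* far fewer than the 16-32 million in the report */
	const long npagecache = 40000;	/* ~1% reclaimable page cache */
	const long total_swap_pages = 0;	/* swapless system */
	struct toy_page *lru = calloc(npages, sizeof(*lru));
	long scanned = 0, reclaimed = 0, rotated = 0, i, pass;

	if (!lru)
		return 1;
	for (i = 0; i < npages; i++)
		lru[i].anon = (i >= npagecache);	/* mostly anon/shmem */

	/* A few full passes over the fake LRU, as competing CPUs would make. */
	for (pass = 0; pass < 3; pass++) {
		for (i = 0; i < npages; i++) {
			if (lru[i].freed)
				continue;
			scanned++;
			if (lru[i].anon && total_swap_pages == 0)
				rotated++;	/* nowhere to put it: back on the list */
			else {
				reclaimed++;	/* clean page cache: drop it */
				lru[i].freed = 1;
			}
		}
	}

	printf("scanned %ld pages to reclaim %ld (%ld rotations)\n",
	       scanned, reclaimed, rotated);
	free(lru);
	return 0;
}

With these made-up numbers, three passes scan roughly 300 pages for every
page they manage to reclaim; with tens of millions of pages and every CPU
contending for the zone lru lock, as in the report above, the ratio and
the time spent holding the lock are correspondingly worse.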