From mboxrd@z Thu Jan 1 00:00:00 1970 Subject: Re: [RFC] Remove unswappable anonymous pages off the LRU From: Lee Schermerhorn In-Reply-To: <45D4E3B6.8050009@redhat.com> References: <45D4DF28.7070409@redhat.com> <45D4E3B6.8050009@redhat.com> Content-Type: text/plain Date: Thu, 15 Feb 2007 18:20:58 -0500 Message-Id: <1171581658.5114.76.camel@localhost> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Rik van Riel Cc: Christoph Lameter , akpm@linux-foundation.org, linux-mm@kvack.org, Nick Piggin , Peter Zijlstra , KAMEZAWA Hiroyuki , "Martin J. Bligh" , Larry Woodman List-ID: On Thu, 2007-02-15 at 17:50 -0500, Rik van Riel wrote: > Christoph Lameter wrote: > > On Thu, 15 Feb 2007, Rik van Riel wrote: > > > >> Running out of swap is a temporary condition. > >> You need to have some way for those pages to > >> make it back onto the LRU list when swap > >> becomes available. > > > > Yup any ideas how? > > Not really. > > >> For example, we could try to reclaim the swap > >> space of every page that we scan on the active > >> list - when swap space starts getting tight. > > > > Good idea. > > I suspect this will be a better approach. That way > the least used pages can cycle into swap space, and > the more used pages can be in RAM. > > The only reason pages are unswappable when we run > out of swap is that we don't free up the swap space > used by pages that are in memory. Many large memory systems [e.g., 64G-128G x86_64] running large database servers run with little [~2G] to no swap. Most of physical memory is allocated to large shared memory areas which are never expected to swap out [even tho' some db apps may not lock the shmem down :-(]. In these systems, removing the shared memory pages from reclaim consideration may alleviate some nasty lockups we've seen when one of these systems gets pushed into reclaim because, e.g., someone ran a backup that filled the page cache. We find all of the cpus walking the LRU list [millions of pages] to find eligible reclaim candidates. [Almost] none of the shmem pages are reclaimable because of insufficient swap, and we don't want them swapped anyway. Now one could argue that this is an application error, because it doesn't lock the shared memory regions that it doesn't want swapped anyway. This doesn't help the customers in the short term. They're looking for a way to take control outside of the application and make their needs known to the system. Needs like, never push out shmem [and maybe even anon] memory to make room for page cache pages. This, I believe, is the motivation behind the "limit the page cache" patches/requests that we keep seeing. An idea for handling these: With the addition of Christoph's patch to move mlock()ed pages out of the LRU, we could add a mechanism to automagically lock shared memory regions that either exceed some tunable threshold or that exceed the available amount of swap. Larry Woodman at Red Hat has been experimenting with patches to move shmem [and anon?] pages in excess of swap to a separate "wired list". This has alleviated part of the problems [apparent system hangs]. There are other issues, some that have been discussed on the mailing lists recently, with page cache pages messing up the LRU-ness of the active and inactive lists; vmscan not being proactive enough in keeping available memory [limits too low for large systems]; etc. Those issues are exacerbated by a long active list with a high fraction of unreclaimable pages. Lee -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org