linux-mm.kvack.org archive mirror
From: Larry Woodman <lwoodman@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Christoph Lameter <clameter@sgi.com>,
	linux-mm@kvack.org, Christoph Hellwig <hch@infradead.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Nigel Cunningham <nigel@nigel.suspend2.net>,
	"Martin J. Bligh" <mbligh@mbligh.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Matt Mackall <mpm@selenic.com>, Rik van Riel <riel@redhat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC 0/7] Move mlocked pages off the LRU and track them
Date: Wed, 07 Feb 2007 05:51:17 -0500
Message-ID: <45C9AF25.9040107@redhat.com>
In-Reply-To: <20070206115113.4a5db10c.akpm@linux-foundation.org>

Andrew Morton wrote:

>On Tue, 06 Feb 2007 11:04:42 -0500
>Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
>
>>Note that anon [and shmem] pages in excess of available swap are
>>effectively mlocked().  In the field, we have seen non-NUMA x86_64
>>systems with 64-128GB [16-32million 4k pages] with little to no
>>swap--big data base servers.  The majority of the memory is dedicated to
>>large data base shared memory areas.  The remaining is divided between
>>program anon and page cache [executable, libs] pages and any other page
>>cache pages used by data base utilities, system daemons, ...
>>
>>The system runs fine until someone runs a backup [or multiple, as there
>>are multiple data base instances running].  This over commits memory and
>>we end up with all cpus in reclaim, contending for the zone lru lock,
>>and walking an active list of 10s of millions of pages looking for pages
>>to reclaim.  The reclaim logic spends a lot of time walking the lru
>>lists, nominating shmem pages [the majority of pages on the list] for
>>reclaim, only to find in shrink_pages() that it can't move the page to
>>swap.  So, it puts it back on the list to be retried by the other cpus
>>once they obtain the zone lru lock.  System appears to be hung for long
>>periods of time.
>>
>>There are a lot of behaviors in the reclaim code that exacerbate the
>>problems when we get into this mode, but the long lists of unswappable
>>anon/shmem pages is the major culprit.  One of the guys at Red Hat has
>>tried a "proof of concept" patch to move all anon/shmem pages in excess
>>of swap space to "wired list" [currently global, per node/zone in
>>progress] and it seems to alleviate the problem.  
>>
>>So, Christoph's patch addresses a real problem that we've seen.
>>Unfortunately, not all data base applications lock their shmem areas
>>into memory.  Excluding pages from consideration for reclaim that can't
>>possibly be swapped out due to lack of swap space seems a natural
>>extension of this concept.  I expect that many Christoph's customers run
>>with swap space that is much smaller than system memory and would
>>benefit from this extension.
>
>Yeah.
>
>The scanner at present tries to handle out-of-swap by moving these pages
>onto the active list (shrink_page_list) then keeping them there
>(shrink_active_list) so it _should_ be the case that the performance
>problems which you're observing are due to active list scanning.  Is that
>correct?
>
>If not, something's busted.
>

This is true, but when mark_page_accessed() activates referenced pagecache
pages, it mixes them with the non-swappable anonymous and System V shared
memory pages on the active list.  This, combined with heavy filesystem
writing, prevents kswapd from keeping up with memory demand, so the free
list(s) fall below zone->pages_min and every call to __alloc_pages() ends
up calling try_to_free_pages().  Once all CPUs are scanning and trying to
reclaim, the system chokes, especially on systems with lots of CPUs and
lots of RAM.
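
To make the scanning cost concrete, here is a rough toy model in Python.
It is not kernel code and not a definitive description of the reclaim
path: the names Page, build_lru(), scan_for_reclaim() and
needs_direct_reclaim() are hypothetical, and the single pages_min
comparison is a simplification of the real watermark checks.  It only
illustrates why a list dominated by unreclaimable pages forces each
scanner to walk many more entries per page freed.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    # False models an anon/shmem page with no swap space left to back it.
    reclaimable: bool


def needs_direct_reclaim(free_pages: int, pages_min: int) -> bool:
    """Simplified watermark test (assumption): below pages_min the
    allocating task has to reclaim for itself instead of relying on kswapd."""
    return free_pages < pages_min


def build_lru(total: int, reclaimable_every: int) -> deque:
    """Build a list of `total` pages where one page in every
    `reclaimable_every` is reclaimable pagecache; the rest are stuck."""
    return deque(Page(i % reclaimable_every == 0) for i in range(total))


def scan_for_reclaim(lru: deque, want: int, max_scan: int):
    """Walk the list from the tail and return (scanned, reclaimed).

    Pages that cannot be reclaimed are put back at the head of the list,
    so later scanners will meet them again.
    """
    scanned = reclaimed = 0
    while lru and reclaimed < want and scanned < max_scan:
        page = lru.pop()
        scanned += 1
        if page.reclaimable:
            reclaimed += 1          # freed: the page leaves the list
        else:
            lru.appendleft(page)    # stuck: goes back on the list
    return scanned, reclaimed


if __name__ == "__main__":
    # All pages reclaimable: 32 pages freed for 32 pages scanned.
    print(scan_for_reclaim(build_lru(1000, 1), want=32, max_scan=1000))
    # Only 1 page in 10 reclaimable: 320 pages scanned to free the same 32.
    print(scan_for_reclaim(build_lru(1000, 10), want=32, max_scan=1000))
```

In this model the scan cost per freed page grows with the share of
unreclaimable pages on the list, and every CPU that enters direct reclaim
repeats the same walk while holding the (here unmodeled) zone lru lock.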

Larry Woodman




Thread overview: 12+ messages
2007-02-05 20:52 [RFC 0/7] Move mlocked pages off the LRU and track them Christoph Lameter
2007-02-05 20:52 ` [RFC 1/7] Make try_to_unmap return a special exit code Christoph Lameter
2007-02-05 20:52 ` [RFC 2/7] Add PageMlocked() page state bit and lru infrastructure Christoph Lameter
2007-02-05 20:52 ` [RFC 3/7] Add NR_MLOCK ZVC Christoph Lameter
2007-02-05 20:52 ` [RFC 4/7] Logic to move mlocked pages Christoph Lameter
2007-02-05 20:53 ` [RFC 5/7] Consolidate new anonymous page code paths Christoph Lameter
2007-02-05 20:53 ` [RFC 6/7] Avoid putting new mlocked anonymous pages on LRU Christoph Lameter
2007-02-05 20:53 ` [RFC 7/7] Opportunistically move mlocked pages off the LRU Christoph Lameter
2007-02-06 16:04 ` [RFC 0/7] Move mlocked pages off the LRU and track them Lee Schermerhorn
2007-02-06 16:50   ` Larry Woodman
2007-02-06 19:51   ` Andrew Morton
2007-02-07 10:51     ` Larry Woodman [this message]
