From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Rik van Riel <riel@redhat.com>,
	Eric Whitney <eric.whitney@hp.com>
Subject: Re: [patch 00/19] VM pageout scalability improvements
Date: Thu, 03 Jan 2008 11:52:08 -0500
Message-ID: <1199379128.5295.21.camel@localhost>
In-Reply-To: <20080102224144.885671949@redhat.com>

On Wed, 2008-01-02 at 17:41 -0500, linux-kernel@vger.kernel.org wrote:
> On large memory systems, the VM can spend way too much time scanning
> through pages that it cannot (or should not) evict from memory. Not
> only does it use up CPU time, but it also provokes lock contention
> and can leave large systems under memory pressure in a catatonic state.
> 
> Against 2.6.24-rc6-mm1
> 
> This patch series improves VM scalability by:
> 
> 1) making the locking a little more scalable
> 
> 2) putting filesystem backed, swap backed and non-reclaimable pages
>    onto their own LRUs, so the system only scans the pages that it
>    can/should evict from memory
> 
> 3) switching to SEQ replacement for the anonymous LRUs, so the
>    number of pages that need to be scanned when the system
>    starts swapping is bound to a reasonable number
> 
> The noreclaim patches come verbatim from Lee Schermerhorn and
> Nick Piggin.  I have made a few small fixes to them and left out
> the bits that are no longer needed with split file/anon lists.
> 
> The exception is "Scan noreclaim list for reclaimable pages",
> which should not be needed but could be a useful debugging tool.

Note that patch 14/19 [SHM_LOCK/UNLOCK handling] depends on the
infrastructure introduced by the "Scan noreclaim list for reclaimable
pages" patch.  When SHM_UNLOCKing a shm segment, we call a new
scan_mapping_noreclaim_page() function to check all of the pages in the
segment for reclaimability.  There might be other reasons for the pages
to remain non-reclaimable (e.g., they might also be mlocked), so we can't
just move them all back to the normal LRU lists.

So we can't merge 14/19 as is without at least part of patch 12.  We can probably
eliminate the sysctl and per node sysfs attributes to force a scan.
But, as Rik says, this has been useful for debugging--e.g., periodically
forcing a full rescan while running a stress load.
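To make the SHM_UNLOCK case concrete, here is a minimal user-space model
(not kernel code) of what a scan like scan_mapping_noreclaim_page() has to
do.  The structures, fields and function names below are hypothetical;
only the idea comes from the patch: clear the segment's lock state, then
recheck each page and move it back to the normal LRU only if nothing else
still keeps it non-reclaimable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a page; NOT the kernel's struct page. */
struct model_page {
    bool on_noreclaim;   /* currently parked on the noreclaim list */
    bool shm_locked;     /* belongs to an SHM_LOCKed segment */
    bool mlocked;        /* also mapped by an mlock()ed vma */
    bool ramfs;          /* ramfs-backed, never reclaimable */
};

/* Hypothetical model of a mapping: just an array of its pages. */
struct model_mapping {
    struct model_page *pages;
    size_t nr_pages;
};

/* Reclaimable only if no reason to keep the page off the LRU remains. */
static bool model_page_reclaimable(const struct model_page *p)
{
    return !p->shm_locked && !p->mlocked && !p->ramfs;
}

/*
 * Model of the SHM_UNLOCK path: drop the SHM lock state on every page,
 * then recheck each one.  Pages still held for another reason (e.g.
 * mlocked) stay on the noreclaim list.  Returns the number of pages
 * moved back to the normal LRU.
 */
static size_t model_shm_unlock_and_rescan(struct model_mapping *m)
{
    size_t rescued = 0;

    assert(m != NULL);
    for (size_t i = 0; i < m->nr_pages; i++) {
        struct model_page *p = &m->pages[i];

        p->shm_locked = false;
        if (p->on_noreclaim && model_page_reclaimable(p)) {
            p->on_noreclaim = false;   /* "putback" to the regular LRU */
            rescued++;
        }
    }
    return rescued;
}
```

With three locked pages, one of which is also mlocked, the rescan moves
two pages back and leaves the mlocked one on the noreclaim list.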

Also, I should point out that the full noreclaim series includes a
couple of other patches NOT posted here by Rik:

1) treat swap backed pages as nonreclaimable when no swap space is
available.  This addresses a problem we've seen in real life, with
vmscan spending a lot of time trying to reclaim anon/shmem/tmpfs/...
pages only to find that there is no swap space--add_to_swap() fails.
Maybe not a problem with Rik's new anon page handling.  We'll see.  If
we did want to add this filter, we'd need a way to bring back pages
from the noreclaim list that are there only for lack of swap space when
space is added or becomes available.

2) treat anon pages with "excessively long" anon_vma lists as
nonreclaimable.  "Excessively long" here is a sysctl-tunable threshold.
This also addresses problems we've seen with benchmarks and stress
tests--all CPUs spinning on some anon_vma lock.  In "real life", we've
seen this behavior with file backed pages--spinning on the
i_mmap_lock--running Oracle workloads with user counts in the few
thousands.  Again, something we may not need with Rik's vmscan rework.
If we did want to do this, we'd probably want to address file backed
pages and add support to bring the pages back from the noreclaim list
when the number of "mappers" drops below the threshold.  My current
patch leaves anon pages as non-reclaimable until they're freed, or
manually scanned via the mechanism introduced by patch 12.
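Both filters share one requirement: a page culled for a reason that can
later go away must be rescued when it does, without releasing pages that
are still held for some other reason.  A minimal user-space sketch of
that, assuming a hypothetical per-page reason bitmask (the actual patches
may track this differently) and a hypothetical max_anon_mappers variable
standing in for the sysctl, whose real name isn't given here.
"Excessively long" is taken to mean strictly greater than the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reasons a page may be parked on the noreclaim list. */
#define NR_NO_SWAP      (1u << 0)  /* 1) swap-backed, but no swap space free */
#define NR_MANY_MAPPERS (1u << 1)  /* 2) anon_vma list longer than threshold */
#define NR_OTHER        (1u << 2)  /* any unrelated reason, e.g. mlock */

/* Hypothetical stand-in for the sysctl tunable. */
static unsigned long max_anon_mappers = 64;

struct nr_page {
    bool swap_backed;          /* anon, shmem, tmpfs, ... */
    unsigned long nr_mappers;  /* length of the page's anon_vma list */
    unsigned reasons;          /* 0 means the page belongs on a normal LRU */
};

/* vmscan side: cull the page instead of failing add_to_swap() or
 * spinning on a heavily shared anon_vma lock. */
static void nr_scan_page(struct nr_page *p, long nr_free_swap)
{
    if (p->swap_backed && nr_free_swap <= 0)
        p->reasons |= NR_NO_SWAP;
    if (p->nr_mappers > max_anon_mappers)
        p->reasons |= NR_MANY_MAPPERS;
}

/* Clear one reason on each page; return how many pages became fully
 * reclaimable.  Pages still held for another reason stay put. */
static size_t nr_clear_reason(struct nr_page *pages, size_t n, unsigned reason)
{
    size_t rescued = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(pages[i].reasons & reason))
            continue;
        pages[i].reasons &= ~reason;
        if (pages[i].reasons == 0)
            rescued++;
    }
    return rescued;
}

/* 1) Called when swap space is added or becomes available. */
static size_t nr_swap_became_available(struct nr_page *pages, size_t n)
{
    return nr_clear_reason(pages, n, NR_NO_SWAP);
}

/* 2) Called on unmap: drop one mapper and rescue the page once the count
 * is back at or below the threshold.  Returns true if it became reclaimable. */
static bool nr_unmap_one(struct nr_page *p)
{
    if (p->nr_mappers > 0)
        p->nr_mappers--;
    if (p->nr_mappers <= max_anon_mappers)
        return nr_clear_reason(p, 1, NR_MANY_MAPPERS) == 1;
    return false;
}
```

Keeping a reason per page is what makes the rescue safe: clearing the
no-swap reason alone does not release a page that is also held for an
unrelated reason.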

Lee