From: Lee Schermerhorn <lee.schermerhorn@hp.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, mel@csn.ul.ie, clameter@sgi.com,
	riel@redhat.com, balbir@linux.vnet.ibm.com, andrea@suse.de,
	a.p.zijlstra@chello.nl, eric.whitney@hp.com, npiggin@suse.de
Subject: [PATCH/RFC 0/14] Page Reclaim Scalability
Date: Fri, 14 Sep 2007 16:53:59 -0400
Message-ID: <20070914205359.6536.98017.sendpatchset@localhost>

As I discussed with some of you in Cambridge:


[PATCH/RFC] 0/14 Page Reclaim Scalability Patches:

The objective of this series of patches is not to make page reclaim
"smarter"--e.g., by improving the heuristics or using a new replacement
algorithm.  Rather, the objective is to make the existing algorithm
more effective by removing from consideration those pages that are 
difficult or impossible to reclaim so that the page reclaim algorithm 
can concentrate on those pages which have a good chance of being 
reclaimed.  This is especially important for servers with large
amounts of memory--in the millions of pages.  Doing this should benefit
any future improvements to the reclaim algorithm itself.

Some of the conditions that make pages difficult or impossible to reclaim:
  1) page is a ramdisk page.
  2) page is anon or shmem, but no swap space is available.
  3) page is mlocked into memory, including SHM_LOCKed shmem pages.
  4) page is anon with an excessive number of related vmas [on the
     anon_vma list]; or is a file-backed page, with an excessive
     number of vmas mapping the page.

Pages that fall in categories 1-3 above remain on the LRU lists,
despite being non-reclaimable.  vmscan can spend a great deal of time
shuffling these pages around the lists.  Pages in category 4 are
theoretically reclaimable, but the system can enter livelock, with
all cpus spinning on the respective anon_vma lock or i_mmap_lock.

The basic mechanism employed to achieve the stated objective is to
manage "non-reclaimable" pages off the LRU active and inactive lists
on a separate "noreclaim" list.  The "noreclaim" list is based on a
patch by Larry Woodman of Red Hat.  I have enhanced this concept to
make the noreclaim list a peer of the LRU active and inactive lists--
i.e., yet another LRU list.  This approach simplifies the management
of noreclaim pages, as we have well established protocols for managing
pages on the LRU.  From my discussions with developers who attended
the VM Summit in Cambridge ~2-3Sept, I understand that there is some
agreement with this approach.

This series, although very much still a work in progress, has been
running in various forms for several months on test machines at HP--
fairly large ia64 NUMA servers--under reasonably high stress loads.
I have posted a previous version of a subset of these patches on
linux-mm.  The current version has undergone a fair amount of rework
based on discussions with vm developers, but there is still much to
be done.  I'm reposting the new series in hopes of kick-starting
the discussion to either progress this series to acceptance, or
to kill it off so that we can direct our attentions to some other
approach.

Here is a brief [promise I'll try] summary of the patches to
follow.  More details and discussion in the patch descriptions.
The patch names are taken from the file names in my series.

Currently atop 2.6.23-rc4-mm1:

1) make-anon_vma-lock-rw
2) make-i_mmap_lock-rw

The first two patches are not part of the noreclaim infrastructure.
Rather, these patches improve parallelism in shrink_page_list()--
specifically in page_referenced() and try_to_unmap()--by making the
anon_vma lock and the i_mmap_lock reader/writer spinlocks.  
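
A rough user-space illustration of why a reader/writer lock helps
here; this is not code from the patches.  The assumption is that the
reclaim paths only walk the vma lists and can therefore share the lock
in read mode, while list modifications still need exclusive access.
The model_rwlock type and the model_* helpers are hypothetical names
built on C11 atomics; the kernel uses its own rwlock primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: state is -1 for one writer, 0 when free,
 * and n > 0 when n readers hold the lock. */
struct model_rwlock {
	atomic_int state;
};

/* Readers are admitted as long as no writer holds the lock. */
static bool model_read_trylock(struct model_rwlock *l)
{
	int s = atomic_load(&l->state);

	while (s >= 0) {
		/* On failure, s is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
			return true;
	}
	return false;
}

static void model_read_unlock(struct model_rwlock *l)
{
	atomic_fetch_sub(&l->state, 1);
}

/* A writer is admitted only when the lock is completely free. */
static bool model_write_trylock(struct model_rwlock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void model_write_unlock(struct model_rwlock *l)
{
	atomic_store(&l->state, 0);
}
```

With a plain spinlock, the second model_read_trylock() in a row would
fail; in this model any number of readers succeed, and only writers
are excluded.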

3) move-and-rework-isolate_lru_page

From Nick Piggin's "keep mlocked pages off LRU" patch, this patch
moves the "isolate_lru_page()" function from mm/migrate.c
to mm/vmscan.c from where it is used by both the page migration
code and this noreclaim series [mlock patches below].

4) introduce-page_anon-function

Extracted from Rik van Riel's "split LRU" patch.  Used by
noreclaim series to detect swap-backed pages.

Aside:  at one point, I had this series working with Rik's
split LRU patch in the same tree.  I have separated them
for now, but plan to remerge at some point for further testing.
Rik's patch is more of a "make reclaim smarter" patch.

5) use-indexed-array-of-lru-lists

Christoph Lameter's cleanup of per zone LRU list handling.
Useful here as noreclaim adds an additional "LRU" list.  Will
also be useful with Rik's "split LRU" mechanism.
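
To illustrate why an indexed array makes an extra list cheap to add,
here is a minimal user-space sketch.  It is not taken from the patch:
the enum values, struct model_zone, struct model_page and the model_*
helpers are all assumed names for illustration.  The point is that
adding a noreclaim list becomes one more enum entry rather than a new
set of per-list fields and helper functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed list indices; the noreclaim list is just one more entry. */
enum lru_list { LRU_INACTIVE, LRU_ACTIVE, LRU_NORECLAIM, NR_LRU_LISTS };

struct list_node {
	struct list_node *prev, *next;
};

struct model_page {
	struct list_node lru;
};

/* Hypothetical stand-in for the per-zone LRU state. */
struct model_zone {
	struct list_node list[NR_LRU_LISTS];
	unsigned long nr[NR_LRU_LISTS];
};

static void model_zone_init(struct model_zone *z)
{
	for (int l = 0; l < NR_LRU_LISTS; l++) {
		z->list[l].prev = z->list[l].next = &z->list[l];
		z->nr[l] = 0;
	}
}

/* Add a page at the head of the list selected by index. */
static void model_lru_add(struct model_zone *z, struct model_page *p,
			  enum lru_list l)
{
	struct list_node *head = &z->list[l];

	p->lru.next = head->next;
	p->lru.prev = head;
	head->next->prev = &p->lru;
	head->next = &p->lru;
	z->nr[l]++;
}

static void model_lru_del(struct model_zone *z, struct model_page *p,
			  enum lru_list l)
{
	assert(z->nr[l] > 0);
	p->lru.prev->next = p->lru.next;
	p->lru.next->prev = p->lru.prev;
	p->lru.prev = p->lru.next = NULL;
	z->nr[l]--;
}

/* One helper serves every pair of lists, including noreclaim. */
static void model_lru_move(struct model_zone *z, struct model_page *p,
			   enum lru_list from, enum lru_list to)
{
	model_lru_del(z, p, from);
	model_lru_add(z, p, to);
}
```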

Aside:  I note that in 23-rc4-mm1, the memory controller has 
its own active and inactive list.  It may also benefit from
use of Christoph's patch.  Further, we'll need to consider 
whether memory controllers should maintain separate noreclaim
lists.

6) noreclaim-01-no-reclaim-infrastructure

This patch provides the basic noreclaim list mechanism and a
skeletal "page_reclaimable()" predicate function to test whether
a page should be diverted to the noreclaim list.  Subsequent
patches add tests to page_reclaimable().
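
As a reading aid, the tests that the later patches add could be
modelled in user space roughly as follows.  This is only a sketch
under stated assumptions, not the predicate from the patch: struct
model_page, struct model_sys and their fields are hypothetical, and
the function is named model_page_reclaimable() to keep it distinct
from the real page_reclaimable(), whose signature is not shown here.

```c
#include <stdbool.h>

/* Hypothetical per-page state; the kernel derives these from
 * struct page, its mapping and its vmas. */
struct model_page {
	bool ramdisk;			/* page belongs to a ramdisk */
	bool swap_backed;		/* anon or shmem page */
	bool mlocked;			/* mlock()ed or SHM_LOCKed */
	unsigned int nr_related_vmas;	/* vmas that map the page */
};

/* Hypothetical system-wide state consulted by the predicate. */
struct model_sys {
	long nr_free_swap_pages;
	unsigned int vma_limit;		/* tunable threshold */
};

/* Return false if the page should go to the noreclaim list. */
static bool model_page_reclaimable(const struct model_page *page,
				   const struct model_sys *sys)
{
	if (page->ramdisk)
		return false;
	if (page->swap_backed && sys->nr_free_swap_pages <= 0)
		return false;
	if (page->mlocked)
		return false;
	if (page->nr_related_vmas > sys->vma_limit)
		return false;
	return true;
}
```

The skeletal version in this patch would correspond to a function
that has none of the tests yet and always reports the page as
reclaimable.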

7) noreclaim-02-report-nonreclaimable-memory

Provides basic accounting/statistics for non-reclaimable
pages.

8) noreclaim-03-ramdisk-pages-are-nonreclaimable

Enhances page_reclaimable() to detect ramdisk pages and
"just say no".  See the patch description for details.

9) noreclaim-04-SHM_LOCKed-pages-are-nonreclaimable

Similarly, declare pages in SHM_LOCKed shmem segments as
non-reclaimable.

10) noreclaim-05-track-anon_vma-related-vmas

Reference count the anon_vma, i.e., track the number of vmas on its list.
Declare anon pages non-reclaimable if the count exceeds a 
tunable threshold.
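
A minimal user-space sketch of the counting idea, with assumed names
throughout (struct model_anon_vma, model_vma_limit and the model_*
helpers are not from the patch).  In the kernel the count would
presumably be updated under the anon_vma lock when vmas are linked
and unlinked; that locking is left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the counter added to the anon_vma. */
struct model_anon_vma {
	unsigned int nr_vmas;	/* vmas currently on the list */
};

/* Hypothetical tunable; the value is arbitrary. */
static unsigned int model_vma_limit = 64;

static void model_anon_vma_link(struct model_anon_vma *av)
{
	av->nr_vmas++;
}

static void model_anon_vma_unlink(struct model_anon_vma *av)
{
	assert(av->nr_vmas > 0);
	av->nr_vmas--;
}

/* Pages are treated as non-reclaimable once the count exceeds
 * the threshold, and as reclaimable again when it drops back. */
static bool model_anon_vma_reclaimable(const struct model_anon_vma *av)
{
	return av->nr_vmas <= model_vma_limit;
}
```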

TODO:  similar for file-backed pages.  No such patch yet.

11) noreclaim-06-unswappable-anon-and-shmem

Using Rik's page_anon() function, declare swap-backed
pages as non-reclaimable when no swap space exists.

TODO:  bring the pages back when [sufficient] swap space is
freed or added.  See patch description.

12) noreclaim-07.1-prepare-for-mlocked-pages
13) noreclaim-07.2-move-mlocked-pages-off-the-LRU

These two patches are a rework of Nick Piggin's series to
do the same thing--move mlocked pages off the LRU.  The rework
eliminates the use of one of the lru list links as the mlock
count, so that these pages can be maintained on the noreclaim
list.  The count is replaced by a single page flag that is
maintained by mlock/munlock/munmap code.

14) noreclaim-08-cull-nonreclaimable-anon-pages-in-fault-path

This is an optional patch, inspired by Nick's mlock patch.  It 
checks for nonreclaimable anon pages created by copy-on-write
and diverts them to the noreclaim list so that vmscan never
sees them.  Without this patch, shrink_active_list() will see
these pages once and move them to the noreclaim list.  

-----------
A note to reviewers:  these patches contain intentional, glaring
style violations:  use of '//TODO' comments.  I KNOW that these are
style violations and will remove them as the questions they raise
are resolved.  I want them to stand out in hopes that you'll read
the contents.

Thanks,
Lee


