[patch 3/3][rfc] vmscan: batched swap slot allocation

From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hugh@veritas.com>
Subject: [patch 3/3][rfc] vmscan: batched swap slot allocation
Date: Mon, 20 Apr 2009 22:24:45 +0200
Message-ID: <1240259085-25872-3-git-send-email-hannes@cmpxchg.org>
In-Reply-To: <1240259085-25872-1-git-send-email-hannes@cmpxchg.org>

Every swap slot allocation tries to directly follow the previous one,
to help keep the LRU order of anon pages intact when they are
swapped out.

With an increasing number of concurrent reclaimers, the average
distance between two subsequent slot allocations of one reclaimer
increases as well.  The contiguous LRU list chunks each reclaimer
swaps out get 'multiplexed' on the swap space as they allocate the
slots concurrently.

	2 processes isolating 15 pages each and allocating swap slots
	concurrently:

	#0			#1

	page 0 slot 0		page 15 slot 1
	page 1 slot 2		page 16 slot 3
	page 2 slot 4		page 17 slot 5
	...

	-> average slot distance of 2

If all reclaimers are equally fast, this becomes a problem only when
the total number of concurrent reclaimers gets so high that even equal
distribution makes the average distance between the slots of one
reclaimer too wide for optimistic swap-in to compensate.

But right now, one reclaimer can take much longer than another
because its pages are mapped into more page tables and it thus has
more work to do.  The faster reclaimer will then allocate multiple
swap slots between two slot allocations of the slower one.

This patch makes shrink_page_list() allocate swap slots in batches,
collecting all the anonymous memory pages in a list without
rescheduling or doing actual reclaim in between.  Only after all anon
pages are in the swap cache do unmap and write-out start for them.

While this does not fix the fundamental issue of slot distance
increasing with the number of reclaimers, it mitigates the problem by
balancing the resulting fragmentation equally between the allocators.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
---
 mm/vmscan.c |   49 +++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 70092fa..b3823fe 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -592,24 +592,42 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 					enum pageout_io sync_writeback)
 {
 	LIST_HEAD(ret_pages);
+	LIST_HEAD(swap_pages);
 	struct pagevec freed_pvec;
-	int pgactivate = 0;
+	int pgactivate = 0, restart = 0;
 	unsigned long nr_reclaimed = 0;
 
 	cond_resched();
 
 	pagevec_init(&freed_pvec, 1);
+restart:
 	while (!list_empty(page_list)) {
 		struct address_space *mapping;
 		struct page *page;
 		int may_enter_fs;
 		int referenced;
 
-		cond_resched();
+		if (list_empty(&swap_pages))
+			cond_resched();
 
 		page = lru_to_page(page_list);
 		list_del(&page->lru);
 
+		if (restart) {
+			/*
+			 * We are allowed to do IO when we restart for
+			 * swap pages.
+			 */
+			may_enter_fs = 1;
+			/*
+			 * Referenced pages will be sorted out by
+			 * try_to_unmap() and unmapped (anon!) pages
+			 * are not to be referenced anymore.
+			 */
+			referenced = 0;
+			goto reclaim;
+		}
+
 		if (!trylock_page(page))
 			goto keep;
 
@@ -655,14 +673,24 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
 		 */
-		if (PageAnon(page) && !PageSwapCache(page)) {
-			if (!(sc->gfp_mask & __GFP_IO))
-				goto keep_locked;
-			if (!add_to_swap(page))
-				goto activate_locked;
-			may_enter_fs = 1;
+		if (PageAnon(page)) {
+			if (!PageSwapCache(page)) {
+				if (!(sc->gfp_mask & __GFP_IO))
+					goto keep_locked;
+				if (!add_to_swap(page))
+					goto activate_locked;
+			} else if (!may_enter_fs)
+				/*
+				 * It's no use to batch when we are
+				 * not allocating swap for this GFP
+				 * mask.
+				 */
+				goto reclaim;
+			list_add(&page->lru, &swap_pages);
+			continue;
 		}
 
+	reclaim:
 		mapping = page_mapping(page);
 
 		/*
@@ -794,6 +822,11 @@ keep:
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
 	}
+	if (!list_empty(&swap_pages)) {
+		list_splice_init(&swap_pages, page_list);
+		restart = 1;
+		goto restart;
+	}
 	list_splice(&ret_pages, page_list);
 	if (pagevec_count(&freed_pvec))
 		__pagevec_free(&freed_pvec);
-- 
1.6.2.1.135.gde769
