From: Rik van Riel <riel@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Subject: [PATCH -mm 07/24] vmscan: second chance replacement for anonymous pages
Date: Wed, 11 Jun 2008 14:42:21 -0400
Message-ID: <20080611184339.224527417@redhat.com>
In-Reply-To: <20080611184214.605110868@redhat.com>
From: Rik van Riel <riel@redhat.com>
We avoid evicting and scanning anonymous pages for the most part, but
under some workloads we can end up with most of memory filled with
anonymous pages. At that point, we suddenly need to clear the referenced
bits on all of memory, which can take ages on very large memory systems.
We can reduce the maximum number of pages that need to be scanned by
not taking the referenced state into account when deactivating an
anonymous page. After all, every anonymous page starts out referenced,
so why check?
If an anonymous page gets referenced again before it reaches the end
of the inactive list, we move it back to the active list.
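As an illustration only (a hypothetical simplification, not the kernel code), the new aging policy for active pages can be sketched like this: referenced file pages stay active, while anonymous pages are always deactivated and get their second chance on the inactive list:

```python
def age_active_page(page, is_file, was_referenced, l_active, l_inactive):
    """Sketch of the changed shrink_active_list() policy:
    only referenced *file* pages stay on the active list; anonymous
    pages are deactivated regardless of the referenced bit."""
    if was_referenced and is_file:
        l_active.append(page)
    else:
        l_inactive.append(page)

active, inactive = [], []
age_active_page("anon-page", is_file=False, was_referenced=True,
                l_active=active, l_inactive=inactive)
age_active_page("file-page", is_file=True, was_referenced=True,
                l_active=active, l_inactive=inactive)
# The referenced anon page is deactivated anyway; the file page stays active.
```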
To keep the maximum amount of necessary work reasonable, we scale the
active to inactive ratio with the size of memory, using the formula
active:inactive ratio = sqrt(memory in GB * 10).
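For illustration, the target ratio can be sketched in Python, mirroring the kernel's setup_per_zone_inactive_ratio() (math.isqrt stands in for the kernel's int_sqrt()):

```python
import math

def inactive_ratio(mem_gb):
    """Target active:inactive anon ratio for a zone of mem_gb gigabytes:
    int_sqrt(10 * gb), with a floor of 1 for small zones."""
    return max(math.isqrt(10 * mem_gb), 1)

# A 1 GB zone targets 3:1 (25% of anon pages inactive); 100 GB targets 31:1.
print(inactive_ratio(1))    # 3
print(inactive_ratio(100))  # 31
```

These values match the table in the setup_per_zone_inactive_ratio() comment below.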
Kswapd CPU use now appears to scale with the amount of pageout bandwidth,
rather than with the amount of memory present in the system.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/mm_inline.h | 19 +++++++++++++++++
include/linux/mmzone.h | 6 +++++
mm/page_alloc.c | 41 ++++++++++++++++++++++++++++++++++++++
mm/vmscan.c | 49 +++++++++++++++++++++++++++++++++++++++-------
mm/vmstat.c | 6 +++--
5 files changed, 112 insertions(+), 9 deletions(-)
Index: linux-2.6.26-rc5-mm2/include/linux/mm_inline.h
===================================================================
--- linux-2.6.26-rc5-mm2.orig/include/linux/mm_inline.h 2008-06-10 13:35:23.000000000 -0400
+++ linux-2.6.26-rc5-mm2/include/linux/mm_inline.h 2008-06-10 13:35:58.000000000 -0400
@@ -117,4 +117,23 @@ static inline enum lru_list page_lru(str
return lru;
}
+/**
+ * inactive_anon_is_low - check if anonymous pages need to be deactivated
+ * @zone: zone to check
+ *
+ * Returns true if the zone does not have enough inactive anon pages,
+ * meaning some active anon pages need to be deactivated.
+ */
+static inline int inactive_anon_is_low(struct zone *zone)
+{
+ unsigned long active, inactive;
+
+ active = zone_page_state(zone, NR_ACTIVE_ANON);
+ inactive = zone_page_state(zone, NR_INACTIVE_ANON);
+
+ if (inactive * zone->inactive_ratio < active)
+ return 1;
+
+ return 0;
+}
#endif
Index: linux-2.6.26-rc5-mm2/include/linux/mmzone.h
===================================================================
--- linux-2.6.26-rc5-mm2.orig/include/linux/mmzone.h 2008-06-10 13:35:23.000000000 -0400
+++ linux-2.6.26-rc5-mm2/include/linux/mmzone.h 2008-06-10 13:35:58.000000000 -0400
@@ -323,6 +323,12 @@ struct zone {
*/
int prev_priority;
+ /*
+ * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
+ * this zone's LRU. Maintained by the pageout code.
+ */
+ unsigned int inactive_ratio;
+
ZONE_PADDING(_pad2_)
/* Rarely used or read-mostly fields */
Index: linux-2.6.26-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.26-rc5-mm2.orig/mm/page_alloc.c 2008-06-10 13:35:23.000000000 -0400
+++ linux-2.6.26-rc5-mm2/mm/page_alloc.c 2008-06-10 13:35:58.000000000 -0400
@@ -4194,6 +4194,46 @@ void setup_per_zone_pages_min(void)
calculate_totalreserve_pages();
}
+/**
+ * setup_per_zone_inactive_ratio - called when min_free_kbytes changes.
+ *
+ * The inactive anon list should be small enough that the VM never has to
+ * do too much work, but large enough that each inactive page has a chance
+ * to be referenced again before it is swapped out.
+ *
+ * The inactive_anon ratio is the target ratio of ACTIVE_ANON to
+ * INACTIVE_ANON pages on this zone's LRU, maintained by the
+ * pageout code. A zone->inactive_ratio of 3 means 3:1 or 25% of
+ * the anonymous pages are kept on the inactive list.
+ *
+ * total target max
+ * memory ratio inactive anon
+ * -------------------------------------
+ * 10MB 1 5MB
+ * 100MB 1 50MB
+ * 1GB 3 250MB
+ * 10GB 10 0.9GB
+ * 100GB 31 3GB
+ * 1TB 101 10GB
+ * 10TB 320 32GB
+ */
+void setup_per_zone_inactive_ratio(void)
+{
+ struct zone *zone;
+
+ for_each_zone(zone) {
+ unsigned int gb, ratio;
+
+ /* Zone size in gigabytes */
+ gb = zone->present_pages >> (30 - PAGE_SHIFT);
+ ratio = int_sqrt(10 * gb);
+ if (!ratio)
+ ratio = 1;
+
+ zone->inactive_ratio = ratio;
+ }
+}
+
/*
* Initialise min_free_kbytes.
*
@@ -4231,6 +4271,7 @@ static int __init init_per_zone_pages_mi
min_free_kbytes = 65536;
setup_per_zone_pages_min();
setup_per_zone_lowmem_reserve();
+ setup_per_zone_inactive_ratio();
return 0;
}
module_init(init_per_zone_pages_min)
Index: linux-2.6.26-rc5-mm2/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc5-mm2.orig/mm/vmscan.c 2008-06-10 13:35:23.000000000 -0400
+++ linux-2.6.26-rc5-mm2/mm/vmscan.c 2008-06-10 13:35:58.000000000 -0400
@@ -1056,17 +1056,32 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
spin_unlock_irq(&zone->lru_lock);
+ pgmoved = 0;
while (!list_empty(&l_hold)) {
cond_resched();
page = lru_to_page(&l_hold);
list_del(&page->lru);
- if (page_referenced(page, 0, sc->mem_cgroup))
- list_add(&page->lru, &l_active);
- else
+ if (page_referenced(page, 0, sc->mem_cgroup)) {
+ pgmoved++;
+ if (file) {
+ /* Referenced file pages stay active. */
+ list_add(&page->lru, &l_active);
+ } else {
+ /* Anonymous pages always get deactivated. */
+ list_add(&page->lru, &l_inactive);
+ }
+ } else
list_add(&page->lru, &l_inactive);
}
/*
+ * Count the referenced pages as rotated, even when they are moved
+ * to the inactive list. This helps balance scan pressure between
+ * file and anonymous pages in get_scan_ratio.
+ */
+ zone->recent_rotated[!!file] += pgmoved;
+
+ /*
* Now put the pages back on the appropriate [file or anon] inactive
* and active lists.
*/
@@ -1127,7 +1142,6 @@ static void shrink_active_list(unsigned
}
}
__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
- zone->recent_rotated[!!file] += pgmoved;
__count_zone_vm_events(PGREFILL, zone, pgscanned);
__count_vm_events(PGDEACTIVATE, pgdeactivate);
@@ -1143,7 +1157,13 @@ static unsigned long shrink_list(enum lr
{
int file = is_file_lru(lru);
- if (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE) {
+ if (lru == LRU_ACTIVE_FILE) {
+ shrink_active_list(nr_to_scan, zone, sc, priority, file);
+ return 0;
+ }
+
+ if (lru == LRU_ACTIVE_ANON &&
+ (!scan_global_lru(sc) || inactive_anon_is_low(zone))) {
shrink_active_list(nr_to_scan, zone, sc, priority, file);
return 0;
}
@@ -1277,8 +1297,8 @@ static unsigned long shrink_zone(int pri
}
}
- while (nr[LRU_ACTIVE_ANON] || nr[LRU_INACTIVE_ANON] ||
- nr[LRU_ACTIVE_FILE] || nr[LRU_INACTIVE_FILE]) {
+ while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
+ nr[LRU_INACTIVE_FILE]) {
for_each_lru(l) {
if (nr[l]) {
nr_to_scan = min(nr[l],
@@ -1291,6 +1311,13 @@ static unsigned long shrink_zone(int pri
}
}
+ /*
+ * Even if we did not try to evict anon pages at all, we want to
+ * rebalance the anon lru active/inactive ratio.
+ */
+ if (scan_global_lru(sc) && inactive_anon_is_low(zone))
+ shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
+
throttle_vm_writeout(sc->gfp_mask);
return nr_reclaimed;
}
@@ -1584,6 +1611,14 @@ loop_again:
priority != DEF_PRIORITY)
continue;
+ /*
+ * Do some background aging of the anon list, to give
+ * pages a chance to be referenced before reclaiming.
+ */
+ if (inactive_anon_is_low(zone))
+ shrink_active_list(SWAP_CLUSTER_MAX, zone,
+ &sc, priority, 0);
+
if (!zone_watermark_ok(zone, order, zone->pages_high,
0, 0)) {
end_zone = i;
Index: linux-2.6.26-rc5-mm2/mm/vmstat.c
===================================================================
--- linux-2.6.26-rc5-mm2.orig/mm/vmstat.c 2008-06-10 13:35:23.000000000 -0400
+++ linux-2.6.26-rc5-mm2/mm/vmstat.c 2008-06-10 13:35:58.000000000 -0400
@@ -721,10 +721,12 @@ static void zoneinfo_show_print(struct s
seq_printf(m,
"\n all_unreclaimable: %u"
"\n prev_priority: %i"
- "\n start_pfn: %lu",
+ "\n start_pfn: %lu"
+ "\n inactive_ratio: %u",
zone_is_all_unreclaimable(zone),
zone->prev_priority,
- zone->zone_start_pfn);
+ zone->zone_start_pfn,
+ zone->inactive_ratio);
seq_putc(m, '\n');
}
--
All Rights Reversed