From: Andrew Morton <akpm@linux-foundation.org>
To: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org, lee.schermerhorn@hp.com,
kosaki.motohiro@jp.fujitsu.com
Subject: Re: [PATCH -mm 07/25] second chance replacement for anonymous pages
Date: Fri, 6 Jun 2008 18:04:43 -0700
Message-ID: <20080606180443.43f782e2.akpm@linux-foundation.org>
In-Reply-To: <20080606202858.744030156@redhat.com>
On Fri, 06 Jun 2008 16:28:45 -0400
Rik van Riel <riel@redhat.com> wrote:
> From: Rik van Riel <riel@redhat.com>
>
> We avoid evicting and scanning anonymous pages for the most part, but
> under some workloads we can end up with most of memory filled with
> anonymous pages. At that point, we suddenly need to clear the referenced
> bits on all of memory, which can take ages on very large memory systems.
>
> We can reduce the maximum number of pages that need to be scanned by
> not taking the referenced state into account when deactivating an
> anonymous page. After all, every anonymous page starts out referenced,
> so why check?
>
> If an anonymous page gets referenced again before it reaches the end
> of the inactive list, we move it back to the active list.
>
> To keep the maximum amount of necessary work reasonable, we scale the
> active to inactive ratio with the size of memory, using the formula
> active:inactive ratio = sqrt(memory in GB * 10).
Should be scaled by PAGE_SIZE?
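To make the question concrete, here is a userspace sketch (not kernel code) of what the formula does for different page sizes. isqrt() stands in for the kernel's int_sqrt(), and ratio_for_pages() and max_inactive_pages() are hypothetical helpers that follow the steps in setup_per_zone_inactive_ratio(). The ratio depends only on the zone size in bytes, while the resulting inactive list, counted in pages, shrinks as the page size grows.

```c
/*
 * Userspace sketch, not kernel code.  isqrt() stands in for the kernel's
 * int_sqrt(); ratio_for_pages() and max_inactive_pages() are hypothetical
 * helpers for a zone of present_pages pages of (1 << page_shift) bytes.
 */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	/* Largest r with r * r <= x; fine for the small inputs used here. */
	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* Same steps as the patch: zone size in whole GB, then sqrt(10 * GB), min 1. */
static unsigned int ratio_for_pages(unsigned long present_pages,
				    unsigned int page_shift)
{
	unsigned long gb = present_pages >> (30 - page_shift);
	unsigned int ratio = (unsigned int)isqrt(10 * gb);

	return ratio ? ratio : 1;
}

/* Inactive anon list at the target ratio, in pages, if all pages are anon. */
static unsigned long max_inactive_pages(unsigned long present_pages,
					unsigned int page_shift)
{
	return present_pages / (ratio_for_pages(present_pages, page_shift) + 1);
}
```

For a 16GB zone, 4KB pages (1UL << 22 pages, shift 12) and 64KB pages (1UL << 18 pages, shift 16) both give a ratio of 12, but the inactive list that allows is 322638 pages in the first case and 20164 pages in the second.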
> Kswapd CPU use now seems to scale by the amount of pageout bandwidth,
> instead of by the amount of memory present in the system.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> ---
> include/linux/mm_inline.h | 12 ++++++++++++
> include/linux/mmzone.h | 5 +++++
> mm/page_alloc.c | 40 ++++++++++++++++++++++++++++++++++++++++
> mm/vmscan.c | 38 +++++++++++++++++++++++++++++++-------
> mm/vmstat.c | 6 ++++--
> 5 files changed, 92 insertions(+), 9 deletions(-)
>
> Index: linux-2.6.26-rc2-mm1/include/linux/mm_inline.h
> ===================================================================
> --- linux-2.6.26-rc2-mm1.orig/include/linux/mm_inline.h 2008-05-23 14:21:34.000000000 -0400
> +++ linux-2.6.26-rc2-mm1/include/linux/mm_inline.h 2008-05-28 12:09:06.000000000 -0400
> @@ -97,4 +97,16 @@ del_page_from_lru(struct zone *zone, str
> __dec_zone_state(zone, NR_INACTIVE_ANON + l);
> }
>
> +static inline int inactive_anon_low(struct zone *zone)
> +{
> + unsigned long active, inactive;
> +
> + active = zone_page_state(zone, NR_ACTIVE_ANON);
> + inactive = zone_page_state(zone, NR_INACTIVE_ANON);
> +
> + if (inactive * zone->inactive_ratio < active)
> + return 1;
> +
> + return 0;
> +}
inactive_anon_low: "number of inactive anonymous pages which are in lowmem"?
Nope.
Needs a comment. And maybe a better name, like inactive_anon_is_low.
Although making the return type a bool kind of does that.
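Something along these lines, perhaps: a sketch of the suggested name, a bool return and a comment. struct zone_model is a hypothetical userspace stand-in for struct zone and the two zone_page_state() lookups, so this is an illustration, not the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the counters and field the patch reads. */
struct zone_model {
	unsigned long nr_active_anon;	/* NR_ACTIVE_ANON */
	unsigned long nr_inactive_anon;	/* NR_INACTIVE_ANON */
	unsigned int inactive_ratio;	/* target active:inactive, as N:1 */
};

/*
 * inactive_anon_is_low - is the inactive anon list below its target size?
 *
 * Returns true when the inactive anonymous list holds fewer than
 * 1/inactive_ratio as many pages as the active anonymous list, i.e. when
 * active anonymous pages should be deactivated.
 */
static bool inactive_anon_is_low(const struct zone_model *zone)
{
	return zone->nr_inactive_anon * zone->inactive_ratio <
	       zone->nr_active_anon;
}
```

With a ratio of 3 and 300 active pages, 99 inactive pages is "low" (297 < 300) and 100 is not.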
> #endif
> Index: linux-2.6.26-rc2-mm1/include/linux/mmzone.h
> ===================================================================
> --- linux-2.6.26-rc2-mm1.orig/include/linux/mmzone.h 2008-05-23 14:21:34.000000000 -0400
> +++ linux-2.6.26-rc2-mm1/include/linux/mmzone.h 2008-05-28 12:09:06.000000000 -0400
> @@ -311,6 +311,11 @@ struct zone {
> */
> int prev_priority;
>
> + /*
> + * The ratio of active to inactive pages.
> + */
> + unsigned int inactive_ratio;
That comment needs a lot of help please. For a start, it's plain wrong
- inactive_ratio would need to be a float to be able to record that ratio.
The comment should describe the units too.
Now poor-old-reviewer has to go off and work out what this thing is.
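For instance, a comment that states what is stored and in what units, based on the description in setup_per_zone_inactive_ratio() ("a ratio of 3 means 3:1 or 25% of the anonymous pages are on the inactive list"). struct zone_ratio_model and target_inactive_anon() are hypothetical userspace stand-ins used only to show the wording and the arithmetic.

```c
/* Hypothetical stand-in for the new struct zone field. */
struct zone_ratio_model {
	/*
	 * inactive_ratio: target ratio of active to inactive anonymous
	 * pages, stored as the integer N of N:1 (dimensionless, >= 1).
	 * A value of 3 means one inactive anon page per three active
	 * ones, i.e. 25% of the anonymous pages on the inactive list.
	 */
	unsigned int inactive_ratio;
};

/* Target inactive anon list size, in pages, for a given active count. */
static unsigned long target_inactive_anon(const struct zone_ratio_model *z,
					  unsigned long nr_active_anon)
{
	return nr_active_anon / z->inactive_ratio;
}
```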
>
> ZONE_PADDING(_pad2_)
> /* Rarely used or read-mostly fields */
> Index: linux-2.6.26-rc2-mm1/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.26-rc2-mm1.orig/mm/page_alloc.c 2008-05-23 14:21:34.000000000 -0400
> +++ linux-2.6.26-rc2-mm1/mm/page_alloc.c 2008-05-28 12:09:06.000000000 -0400
> @@ -4269,6 +4269,45 @@ void setup_per_zone_pages_min(void)
> calculate_totalreserve_pages();
> }
>
> +/**
> + * setup_per_zone_inactive_ratio - called when min_free_kbytes changes.
> + *
> + * The inactive anon list should be small enough that the VM never has to
> + * do too much work, but large enough that each inactive page has a chance
> + * to be referenced again before it is swapped out.
> + *
> + * The inactive_anon ratio is the ratio of active to inactive anonymous
target ratio? Desired ratio?
> + * pages. Ie. a ratio of 3 means 3:1 or 25% of the anonymous pages are
> + * on the inactive list.
> + *
> + *   total      return     max
> + *   memory     value      inactive anon
This function doesn't "return" a "value".
> + * -------------------------------------
> + *    10MB        1          5MB
> + *   100MB        1         50MB
> + *     1GB        3        250MB
> + *    10GB       10        0.9GB
> + *   100GB       31          3GB
> + *     1TB      101         10GB
> + *    10TB      320         32GB
> + */
> +void setup_per_zone_inactive_ratio(void)
> +{
> + struct zone *zone;
> +
> + for_each_zone(zone) {
> + unsigned int gb, ratio;
> +
> + /* Zone size in gigabytes */
> + gb = zone->present_pages >> (30 - PAGE_SHIFT);
> + ratio = int_sqrt(10 * gb);
> + if (!ratio)
> + ratio = 1;
> +
> + zone->inactive_ratio = ratio;
> + }
> +}
OK, so inactive_ratio is an integer 1 .. N which determines our target
number of inactive pages according to the formula
nr_inactive = nr_active / inactive_ratio
yes?
Can nr_inactive get larger than this? I assume so. I guess that
doesn't matter much, except that the problems which you're trying to
solve here could then recur. What would I need to do to trigger that?
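If nr_inactive = nr_active / inactive_ratio and every page in the zone is anonymous, then nr_active + nr_inactive = total, so nr_inactive = total / (inactive_ratio + 1). A userspace sketch that recomputes the table in the patch's comment this way; isqrt() stands in for int_sqrt(), and ratio_for_mb() and max_inactive_mb() are hypothetical helpers taking the zone size in megabytes.

```c
/*
 * Userspace recomputation of the table, not kernel code.  Sizes are in
 * megabytes and every page in the zone is assumed to be anonymous.
 */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	/* Largest r with r * r <= x. */
	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* inactive_ratio for a zone of mb megabytes: int_sqrt(10 * GB), min 1. */
static unsigned int ratio_for_mb(unsigned long mb)
{
	unsigned long gb = mb >> 10;	/* whole gigabytes, rounded down */
	unsigned int ratio = (unsigned int)isqrt(10 * gb);

	return ratio ? ratio : 1;
}

/* active:inactive = ratio:1, so the inactive share is 1 / (ratio + 1). */
static unsigned long max_inactive_mb(unsigned long mb)
{
	return mb / (ratio_for_mb(mb) + 1);
}
```

The ratios match every row of the table. Taking 1GB as 1024MB, the 1GB row works out to 256MB rather than 250MB, and 10GB gives 930MB, which is the 0.9GB shown.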
> /*
> * Initialise min_free_kbytes.
> *
> @@ -4306,6 +4345,7 @@ static int __init init_per_zone_pages_mi
> min_free_kbytes = 65536;
> setup_per_zone_pages_min();
> setup_per_zone_lowmem_reserve();
> + setup_per_zone_inactive_ratio();
> return 0;
> }
> module_init(init_per_zone_pages_min)
> Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
> ===================================================================
> --- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-23 14:21:34.000000000 -0400
> +++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-28 12:11:38.000000000 -0400
> @@ -114,7 +114,7 @@ struct scan_control {
> /*
> * From 0 .. 100. Higher means more swappy.
> */
> -int vm_swappiness = 60;
> +int vm_swappiness = 20;
<goes back to check the changelog>
Whoa. Where'd this come from?
> long vm_total_pages; /* The total number of pages which the VM controls */
>
> static LIST_HEAD(shrinker_list);
> @@ -1008,7 +1008,7 @@ static inline int zone_is_near_oom(struc
> static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
> struct scan_control *sc, int priority, int file)
> {
> - unsigned long pgmoved;
> + unsigned long pgmoved = 0;
> int pgdeactivate = 0;
> unsigned long pgscanned;
> LIST_HEAD(l_hold); /* The pages which were snipped off */
> @@ -1036,17 +1036,32 @@ static void shrink_active_list(unsigned
> __mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
> spin_unlock_irq(&zone->lru_lock);
>
> + pgmoved = 0;
didn't we just do that?
> while (!list_empty(&l_hold)) {
> cond_resched();
> page = lru_to_page(&l_hold);
> list_del(&page->lru);
> - if (page_referenced(page, 0, sc->mem_cgroup))
> - list_add(&page->lru, &l_active);
> - else
> + if (page_referenced(page, 0, sc->mem_cgroup)) {
> + if (file) {
> + /* Referenced file pages stay active. */
> + list_add(&page->lru, &l_active);
> + } else {
> + /* Anonymous pages always get deactivated. */
hm. That's going to make the machine swap like hell. I guess I don't
understand all this yet.
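A userspace model of how the patched loop sorts pages may help here. struct page_model, struct shrink_result and classify() are hypothetical; a bool stands in for the result of page_referenced(). Referenced file pages stay active, anon pages all go to the inactive list, and the referenced anon ones are counted (the pgmoved++ above) as rotated. The second chance described in the changelog, moving an anon page back to the active list if it is referenced again before it reaches the end of the inactive list, happens elsewhere and is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page: only what this loop looks at. */
struct page_model {
	bool referenced;	/* what page_referenced() would report */
};

struct shrink_result {
	size_t to_active;	/* pages put back on the active list */
	size_t to_inactive;	/* pages moved to the inactive list */
	size_t rotated;		/* referenced anon pages deactivated anyway */
};

/* Mirrors the patched loop in shrink_active_list() for one list type. */
static struct shrink_result classify(const struct page_model *pages,
				     size_t n, bool file)
{
	struct shrink_result r = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (pages[i].referenced) {
			if (file) {
				r.to_active++;		/* file: stays active */
			} else {
				r.to_inactive++;	/* anon: always deactivated */
				r.rotated++;		/* ...but counted */
			}
		} else {
			r.to_inactive++;
		}
	}
	return r;
}
```

With two referenced and two unreferenced pages, the anon case sends all four to the inactive list and counts two as rotated; the file case keeps the two referenced pages active.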
> + list_add(&page->lru, &l_inactive);
> + pgmoved++;
> + }
> + } else
> list_add(&page->lru, &l_inactive);
> }
>
> /*
> + * Count the referenced anon pages as rotated, to balance pageout
> + * scan pressure between file and anonymous pages in get_sacn_ratio.
tpyo
Thread overview: 102+ messages
2008-06-06 20:28 [PATCH -mm 00/25] VM pageout scalability improvements (V10) Rik van Riel, Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 01/25] move isolate_lru_page() to vmscan.c Rik van Riel, Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 02/25] Use an indexed array for LRU variables Rik van Riel, Rik van Riel
2008-06-07 1:04 ` Andrew Morton
2008-06-07 5:43 ` KOSAKI Motohiro
2008-06-07 14:47 ` Rik van Riel
2008-06-08 11:22 ` KOSAKI Motohiro
2008-06-07 18:42 ` Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 03/25] use an array for the LRU pagevecs Rik van Riel, Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 04/25] free swap space on swap-in/activation Rik van Riel, Rik van Riel
2008-06-07 1:04 ` Andrew Morton
2008-06-07 19:56 ` Rik van Riel
2008-06-09 2:14 ` MinChan Kim
2008-06-09 2:42 ` Rik van Riel
2008-06-09 13:38 ` KOSAKI Motohiro
2008-06-10 2:30 ` MinChan Kim
2008-06-06 20:28 ` [PATCH -mm 05/25] define page_file_cache() function Rik van Riel, Rik van Riel
2008-06-07 1:04 ` Andrew Morton
2008-06-07 23:38 ` Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 06/25] split LRU lists into anon & file sets Rik van Riel, Rik van Riel
2008-06-07 1:04 ` Andrew Morton
2008-06-07 1:22 ` Rik van Riel
2008-06-07 1:52 ` Andrew Morton
2008-06-06 20:28 ` [PATCH -mm 07/25] second chance replacement for anonymous pages Rik van Riel, Rik van Riel
2008-06-07 1:04 ` Andrew Morton [this message]
2008-06-07 6:03 ` KOSAKI Motohiro
2008-06-07 6:43 ` Andrew Morton
2008-06-08 15:04 ` Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 08/25] add some sanity checks to get_scan_ratio Rik van Riel, Rik van Riel
2008-06-07 1:04 ` Andrew Morton
2008-06-08 15:11 ` Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 09/25] fix pagecache reclaim referenced bit check Rik van Riel, Rik van Riel
2008-06-07 1:04 ` Andrew Morton
2008-06-07 1:08 ` Rik van Riel
2008-06-08 10:02 ` Peter Zijlstra
2008-06-06 20:28 ` [PATCH -mm 10/25] add newly swapped in pages to the inactive list Rik van Riel, Rik van Riel
2008-06-07 1:04 ` Andrew Morton
2008-06-06 20:28 ` [PATCH -mm 11/25] more aggressively use lumpy reclaim Rik van Riel, Rik van Riel
2008-06-07 1:05 ` Andrew Morton
2008-06-06 20:28 ` [PATCH -mm 12/25] pageflag helpers for configed-out flags Rik van Riel, Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 13/25] Noreclaim LRU Infrastructure Rik van Riel, Rik van Riel
2008-06-07 1:05 ` Andrew Morton
2008-06-08 20:34 ` Rik van Riel
2008-06-08 20:57 ` Andrew Morton
2008-06-08 21:32 ` Rik van Riel
2008-06-08 21:43 ` Ray Lee
2008-06-08 23:22 ` Andrew Morton
2008-06-08 23:34 ` Rik van Riel
2008-06-08 23:54 ` Andrew Morton
2008-06-09 0:56 ` Rik van Riel
2008-06-09 6:10 ` Andrew Morton
2008-06-09 13:44 ` Rik van Riel
2008-06-09 2:58 ` Rik van Riel
2008-06-09 5:44 ` Andrew Morton
2008-06-10 19:17 ` Christoph Lameter
2008-06-10 19:37 ` Rik van Riel
2008-06-10 21:33 ` Andrew Morton
2008-06-10 21:48 ` Andi Kleen
2008-06-10 22:05 ` Dave Hansen
2008-06-11 5:09 ` Paul Mundt
2008-06-11 6:16 ` Andrew Morton
2008-06-11 6:29 ` Paul Mundt
2008-06-11 12:06 ` Andi Kleen
2008-06-11 14:09 ` Removing node flags from page->flags was Re: [PATCH -mm 13/25] Noreclaim LRU Infrastructure II Andi Kleen
2008-06-11 19:03 ` [PATCH -mm 13/25] Noreclaim LRU Infrastructure Andy Whitcroft
2008-06-11 20:52 ` Andi Kleen
2008-06-11 23:25 ` Christoph Lameter
2008-06-08 22:03 ` Rik van Riel
2008-06-08 21:07 ` KOSAKI Motohiro
2008-06-10 20:09 ` Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 14/25] Noreclaim LRU Page Statistics Rik van Riel, Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 15/25] Ramfs and Ram Disk pages are non-reclaimable Rik van Riel, Rik van Riel
2008-06-07 1:05 ` Andrew Morton
2008-06-08 4:32 ` Greg KH
2008-06-06 20:28 ` [PATCH -mm 16/25] SHM_LOCKED " Rik van Riel, Rik van Riel
2008-06-07 1:05 ` Andrew Morton
2008-06-07 5:21 ` KOSAKI Motohiro
2008-06-10 21:03 ` Rik van Riel
2008-06-10 21:22 ` Lee Schermerhorn
2008-06-10 21:49 ` Andrew Morton
2008-06-06 20:28 ` [PATCH -mm 17/25] Mlocked Pages " Rik van Riel, Rik van Riel
2008-06-07 1:07 ` Andrew Morton
2008-06-07 5:38 ` KOSAKI Motohiro
2008-06-10 3:31 ` Nick Piggin
2008-06-10 12:50 ` Rik van Riel
2008-06-10 21:14 ` Rik van Riel
2008-06-10 21:43 ` Lee Schermerhorn
2008-06-10 21:57 ` Andrew Morton
2008-06-11 16:01 ` Lee Schermerhorn
2008-06-10 23:48 ` Rik van Riel
2008-06-11 15:29 ` Lee Schermerhorn
2008-06-11 1:00 ` Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 18/25] Downgrade mmap sem while populating mlocked regions Rik van Riel, Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 19/25] Handle mlocked pages during map, remap, unmap Rik van Riel, Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 20/25] Mlocked Pages statistics Rik van Riel, Rik van Riel
2008-06-06 20:28 ` [PATCH -mm 21/25] Cull non-reclaimable pages in fault path Rik van Riel, Rik van Riel
2008-06-06 20:29 ` [PATCH -mm 22/25] Noreclaim and Mlocked pages vm events Rik van Riel, Rik van Riel
2008-06-06 20:29 ` [PATCH -mm 23/25] Noreclaim LRU scan sysctl Rik van Riel, Rik van Riel
2008-06-06 20:29 ` [PATCH -mm 24/25] Mlocked Pages: count attempts to free mlocked page Rik van Riel, Rik van Riel
2008-06-06 20:29 ` [PATCH -mm 25/25] Noreclaim LRU and Mlocked Pages Documentation Rik van Riel, Rik van Riel
2008-06-06 21:02 ` [PATCH -mm 00/25] VM pageout scalability improvements (V10) Andrew Morton
2008-06-06 21:08 ` Rik van Riel