From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@osdl.org>,
Christoph Lameter <christoph@lameter.com>,
Rik van Riel <riel@redhat.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Nick Piggin <npiggin@suse.de>, Andrea Arcangeli <andrea@suse.de>,
Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
Magnus Damm <magnus.damm@gmail.com>,
Wu Fengguang <wfg@mail.ustc.edu.cn>
Subject: [PATCH 02/12] mm: supporting variables and functions for balanced zone aging
Date: Thu, 01 Dec 2005 18:18:12 +0800
Message-ID: <20051201101933.936973000@localhost.localdomain>
In-Reply-To: 20051201101810.837245000@localhost.localdomain
[-- Attachment #1: mm-balance-zone-aging-supporting-facilities.patch --]
[-- Type: text/plain, Size: 5239 bytes --]
The zone aging rates are currently imbalanced: the gap between zones can be
as large as 3 times, which can severely damage read-ahead requests and
shorten their effective lifetime.
This patch adds three variables to struct zone
- aging_total
- aging_milestone
- page_age
to keep track of the page aging rate, and keeps them up to date at page
reclaim time.
aging_total is just a per-zone counterpart to the per-cpu
pgscan_{kswapd,direct}_{zone name} counters. However, it is not directly
comparable between zones, so aging_milestone and page_age are maintained
based on aging_total.
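To illustrate why the raw counter cannot be compared but the derived value
can, here is a minimal userspace sketch of the normalization. It mirrors
update_zone_age() from this patch, but struct zone_age is a hypothetical
stand-in for the new struct zone fields, and PAGE_AGE_SHIFT is assumed to be
the 32-bit, non-HIGHMEM64G value.

```c
/* Assumption: the 32-bit, non-HIGHMEM64G shift from this patch. */
#define PAGE_AGE_SHIFT 12

/* Hypothetical userspace stand-in for the new struct zone fields. */
struct zone_age {
	unsigned long nr_inactive;	/* length of the inactive list */
	unsigned long aging_total;	/* raw scan counter */
	unsigned long aging_milestone;	/* aging_total at the last full pass */
	unsigned long page_age;		/* normalized progress in this pass */
};

/* Same arithmetic as update_zone_age() in the patch. */
static void update_zone_age(struct zone_age *z, unsigned long nr_scan)
{
	unsigned long len = z->nr_inactive | 1;	/* avoid dividing by zero */

	z->aging_total += nr_scan;

	/* More than one full inactive list scanned: advance the milestone. */
	if (z->aging_total - z->aging_milestone > len)
		z->aging_milestone += len;

	/* Fraction of the current pass, scaled by 2^PAGE_AGE_SHIFT. */
	z->page_age = ((z->aging_total - z->aging_milestone)
				<< PAGE_AGE_SHIFT) / len;
}
```

With nr_inactive = 1000 and 250 pages scanned in one zone, and
nr_inactive = 8000 and 2000 pages scanned in another, the aging_total values
differ by a factor of 8, yet both zones have scanned about a quarter of
their inactive lists and their page_age values land within one unit of each
other.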
page_age is a normalized value that can be directly compared between zones
with the helper macro pages_more_aged(). The goal of the balancing logic is
to keep this normalized value in sync between zones.
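The wrap-around behaviour of that comparison can be sketched in userspace as
follows. This is an illustration only: page_age_more_aged() and
PAGE_AGE_WINDOW are hypothetical names, the function takes plain values
instead of struct zone pointers, and PAGE_AGE_SHIFT is assumed to be the
32-bit, non-HIGHMEM64G value. The expression itself is the one used by
pages_more_aged().

```c
#include <stdbool.h>

/* Assumption: the 32-bit, non-HIGHMEM64G shift from this patch. */
#define PAGE_AGE_SHIFT	12
#define PAGE_AGE_MASK	((1UL << PAGE_AGE_SHIFT) - 1)
/* Hypothetical name for the "close enough" window: 1/8 of the range. */
#define PAGE_AGE_WINDOW	(1UL << (PAGE_AGE_SHIFT - 3))

/*
 * True when age a is ahead of age b, modulo 2^PAGE_AGE_SHIFT, by at least
 * 1 and at most PAGE_AGE_WINDOW. Ages further apart than that are treated
 * as out of sync and the comparison yields false in both directions.
 */
static bool page_age_more_aged(unsigned long a, unsigned long b)
{
	return ((b - a) & PAGE_AGE_MASK) > PAGE_AGE_MASK - PAGE_AGE_WINDOW;
}
```

For example, with these values an age of 5 counts as more aged than an age
of 4090, because 5 is only 11 steps ahead once the counter has wrapped,
while 2000 versus 100 compares false in both directions because the two
ages are too far apart.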
One can check the balanced aging progress by running:
tar c / | cat > /dev/null &
watch -n1 'grep "age " /proc/zoneinfo'
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---
include/linux/mmzone.h | 14 ++++++++++++++
mm/page_alloc.c | 11 +++++++++++
mm/vmscan.c | 39 +++++++++++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+)
--- linux.orig/include/linux/mmzone.h
+++ linux/include/linux/mmzone.h
@@ -149,6 +149,20 @@ struct zone {
unsigned long pages_scanned; /* since last reclaim */
int all_unreclaimable; /* All pages pinned */
+ /* Fields for balanced page aging:
+ * aging_total - The accumulated number of activities that may
+ * cause page aging, that is, move some pages closer
+ * to the tail of inactive_list.
+ * aging_milestone - A snapshot of aging_total every time a full
+ * inactive_list of pages becomes aged.
+ * page_age - A normalized value showing the percentage of pages
+ * that have been aged. It is compared between zones
+ * to balance the rate of page aging.
+ */
+ unsigned long aging_total;
+ unsigned long aging_milestone;
+ unsigned long page_age;
+
/*
* Does the allocator try to reclaim pages from the zone as soon
* as it fails a watermark_ok() in __alloc_pages?
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -123,6 +123,44 @@ static long total_memory;
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
+#ifdef CONFIG_HIGHMEM64G
+#define PAGE_AGE_SHIFT 8
+#elif BITS_PER_LONG == 32
+#define PAGE_AGE_SHIFT 12
+#elif BITS_PER_LONG == 64
+#define PAGE_AGE_SHIFT 20
+#else
+#error unknown BITS_PER_LONG
+#endif
+#define PAGE_AGE_MASK ((1 << PAGE_AGE_SHIFT) - 1)
+
+/*
+ * The simplified code is: (a->page_age > b->page_age)
+ * The extra complexity deals with the wrap-around problem.
+ * Two page ages that are not close enough are also ignored:
+ * they are out of sync and the comparison would be meaningless.
+ */
+#define pages_more_aged(a, b) \
+ ((((b)->page_age - (a)->page_age) & PAGE_AGE_MASK) > \
+ PAGE_AGE_MASK - (1 << (PAGE_AGE_SHIFT - 3)))
+
+/*
+ * Keep track of the percent of cold pages that have been scanned / aged.
+ * It's not really ##%, but a high resolution normalized value.
+ */
+static inline void update_zone_age(struct zone *z, int nr_scan)
+{
+ unsigned long len = z->nr_inactive | 1;
+
+ z->aging_total += nr_scan;
+
+ if (z->aging_total - z->aging_milestone > len)
+ z->aging_milestone += len;
+
+ z->page_age = ((z->aging_total - z->aging_milestone)
+ << PAGE_AGE_SHIFT) / len;
+}
+
/*
* Add a shrinker callback to be called from the vm
*/
@@ -888,6 +926,7 @@ static void shrink_cache(struct zone *zo
&page_list, &nr_scan);
zone->nr_inactive -= nr_taken;
zone->pages_scanned += nr_scan;
+ update_zone_age(zone, nr_scan);
spin_unlock_irq(&zone->lru_lock);
if (nr_taken == 0)
--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -1521,6 +1521,8 @@ void show_free_areas(void)
" active:%lukB"
" inactive:%lukB"
" present:%lukB"
+ " aging:%lukB"
+ " age:%lu"
" pages_scanned:%lu"
" all_unreclaimable? %s"
"\n",
@@ -1532,6 +1534,8 @@ void show_free_areas(void)
K(zone->nr_active),
K(zone->nr_inactive),
K(zone->present_pages),
+ K(zone->aging_total),
+ zone->page_age,
zone->pages_scanned,
(zone->all_unreclaimable ? "yes" : "no")
);
@@ -2145,6 +2149,9 @@ static void __init free_area_init_core(s
zone->nr_scan_inactive = 0;
zone->nr_active = 0;
zone->nr_inactive = 0;
+ zone->aging_total = 0;
+ zone->aging_milestone = 0;
+ zone->page_age = 0;
atomic_set(&zone->reclaim_in_progress, 0);
if (!size)
continue;
@@ -2293,6 +2300,8 @@ static int zoneinfo_show(struct seq_file
"\n high %lu"
"\n active %lu"
"\n inactive %lu"
+ "\n aging %lu"
+ "\n age %lu"
"\n scanned %lu (a: %lu i: %lu)"
"\n spanned %lu"
"\n present %lu",
@@ -2302,6 +2311,8 @@ static int zoneinfo_show(struct seq_file
zone->pages_high,
zone->nr_active,
zone->nr_inactive,
+ zone->aging_total,
+ zone->page_age,
zone->pages_scanned,
zone->nr_scan_active, zone->nr_scan_inactive,
zone->spanned_pages,
--