From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@osdl.org>,
	Christoph Lameter <christoph@lameter.com>,
	Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Nick Piggin <npiggin@suse.de>, Andrea Arcangeli <andrea@suse.de>,
	Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
	Magnus Damm <magnus.damm@gmail.com>,
	Wu Fengguang <wfg@mail.ustc.edu.cn>
Subject: [PATCH 02/12] mm: supporting variables and functions for balanced zone aging
Date: Thu, 01 Dec 2005 18:18:12 +0800
Message-ID: <20051201101933.936973000@localhost.localdomain>
In-Reply-To: 20051201101810.837245000@localhost.localdomain

The zone aging rates are currently imbalanced: the gap between zones can be
as large as 3 times, which can severely damage read-ahead requests and
shorten their effective lifetime.

This patch adds three variables to struct zone
	- aging_total
	- aging_milestone
	- page_age
to keep track of the page aging rate, and updates them at page reclaim time.

The aging_total is just a per-zone counterpart to the per-cpu
pgscan_{kswapd,direct}_{zone name}. But it is not directly comparable between
zones, so aging_milestone and page_age are maintained based on aging_total.
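
As a rough illustration, the bookkeeping can be modelled in user space. This
is only a sketch: struct zone_model and model_update_zone_age() are
hypothetical names, not kernel code, and PAGE_AGE_SHIFT is fixed at the value
the patch uses when BITS_PER_LONG == 32.

```c
#define PAGE_AGE_SHIFT	12	/* assumed: the patch's 32-bit value */

/* Hypothetical user-space stand-in for the new struct zone fields. */
struct zone_model {
	unsigned long nr_inactive;
	unsigned long aging_total;
	unsigned long aging_milestone;
	unsigned long page_age;
};

/* Mirrors the arithmetic of update_zone_age() in this patch. */
static void model_update_zone_age(struct zone_model *z, unsigned long nr_scan)
{
	unsigned long len = z->nr_inactive | 1;	/* never zero */

	/* Raw scan count: grows faster in zones that are scanned more. */
	z->aging_total += nr_scan;

	/* Advance the milestone once a full list length has been scanned. */
	if (z->aging_total - z->aging_milestone > len)
		z->aging_milestone += len;

	/* Progress since the milestone, as a fixed-point fraction of len. */
	z->page_age = ((z->aging_total - z->aging_milestone)
					<< PAGE_AGE_SHIFT) / len;
}
```

In this model, a zone with 1001 inactive pages after 500 scanned pages and a
zone with 3003 inactive pages after 1500 scanned pages have different
aging_total values but the same page_age (2045), which is what makes the
normalized value usable across zones of different sizes.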

The page_age is a normalized value that can be directly compared between
zones with the helper macro pages_more_aged(). The goal of the balancing
logic is to keep this normalized value in sync between zones.
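
The wrap-around comparison can be sketched as a plain function. This is an
illustration under assumptions, not the macro itself: age_more_aged() is a
hypothetical name, it takes two ages rather than two zones, and
PAGE_AGE_SHIFT is fixed at the value used when BITS_PER_LONG == 32.

```c
#define PAGE_AGE_SHIFT	12	/* assumed: the patch's 32-bit value */
#define PAGE_AGE_MASK	((1UL << PAGE_AGE_SHIFT) - 1)

/*
 * Function form of the pages_more_aged() test on two plain ages.
 * Returns nonzero when age_a is ahead of age_b, modulo the age range,
 * by at least 1 and at most 1/8 of that range.  Larger gaps are treated
 * as out of sync and compare false in both directions.
 */
static int age_more_aged(unsigned long age_a, unsigned long age_b)
{
	/* Unsigned subtraction wraps, so the masked difference is mod 2^SHIFT. */
	return ((age_b - age_a) & PAGE_AGE_MASK) >
		PAGE_AGE_MASK - (1UL << (PAGE_AGE_SHIFT - 3));
}
```

With a 12-bit range, an age of 10 counts as more aged than 4090 because it
has wrapped and is only 16 steps ahead, while ages 2000 and 100 are too far
apart and neither is reported as more aged than the other.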

One can check the balanced aging progress by running:
                        tar c / | cat > /dev/null &
                        watch -n1 'grep "age " /proc/zoneinfo'

Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---

 include/linux/mmzone.h |   14 ++++++++++++++
 mm/page_alloc.c        |   11 +++++++++++
 mm/vmscan.c            |   39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+)

--- linux.orig/include/linux/mmzone.h
+++ linux/include/linux/mmzone.h
@@ -149,6 +149,20 @@ struct zone {
 	unsigned long		pages_scanned;	   /* since last reclaim */
 	int			all_unreclaimable; /* All pages pinned */
 
+	/* Fields for balanced page aging:
+	 * aging_total     - The accumulated number of activities that may
+	 *                   cause page aging, that is, make some pages closer
+	 *                   to the tail of inactive_list.
+	 * aging_milestone - A snapshot of aging_total every time a full
+	 *                   inactive_list of pages becomes aged.
+	 * page_age        - A normalized value showing the percent of pages
+	 *                   that have been aged.  It is compared between zones to
+	 *                   balance the rate of page aging.
+	 */
+	unsigned long		aging_total;
+	unsigned long		aging_milestone;
+	unsigned long		page_age;
+
 	/*
 	 * Does the allocator try to reclaim pages from the zone as soon
 	 * as it fails a watermark_ok() in __alloc_pages?
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -123,6 +123,44 @@ static long total_memory;
 static LIST_HEAD(shrinker_list);
 static DECLARE_RWSEM(shrinker_rwsem);
 
+#ifdef CONFIG_HIGHMEM64G
+#define		PAGE_AGE_SHIFT  8
+#elif BITS_PER_LONG == 32
+#define		PAGE_AGE_SHIFT  12
+#elif BITS_PER_LONG == 64
+#define		PAGE_AGE_SHIFT  20
+#else
+#error unknown BITS_PER_LONG
+#endif
+#define		PAGE_AGE_MASK   ((1 << PAGE_AGE_SHIFT) - 1)
+
+/*
+ * The simplified code is: (a->page_age > b->page_age)
+ * The complexity deals with the wrap-around problem.
+ * Two page ages not close enough should also be ignored:
+ * they are out of sync and the comparison may be nonsense.
+ */
+#define pages_more_aged(a, b) 						\
+	((((b)->page_age - (a)->page_age) & PAGE_AGE_MASK) >		\
+			PAGE_AGE_MASK - (1 << (PAGE_AGE_SHIFT - 3)))
+
+/*
+ * Keep track of the percent of cold pages that have been scanned / aged.
+ * It's not really ##%, but a high resolution normalized value.
+ */
+static inline void update_zone_age(struct zone *z, int nr_scan)
+{
+	unsigned long len = z->nr_inactive | 1;
+
+	z->aging_total += nr_scan;
+
+	if (z->aging_total - z->aging_milestone > len)
+		z->aging_milestone += len;
+
+	z->page_age = ((z->aging_total - z->aging_milestone)
+						<< PAGE_AGE_SHIFT) / len;
+}
+
 /*
  * Add a shrinker callback to be called from the vm
  */
@@ -888,6 +926,7 @@ static void shrink_cache(struct zone *zo
 					     &page_list, &nr_scan);
 		zone->nr_inactive -= nr_taken;
 		zone->pages_scanned += nr_scan;
+		update_zone_age(zone, nr_scan);
 		spin_unlock_irq(&zone->lru_lock);
 
 		if (nr_taken == 0)
--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -1521,6 +1521,8 @@ void show_free_areas(void)
 			" active:%lukB"
 			" inactive:%lukB"
 			" present:%lukB"
+			" aging:%lukB"
+			" age:%lu"
 			" pages_scanned:%lu"
 			" all_unreclaimable? %s"
 			"\n",
@@ -1532,6 +1534,8 @@ void show_free_areas(void)
 			K(zone->nr_active),
 			K(zone->nr_inactive),
 			K(zone->present_pages),
+			K(zone->aging_total),
+			zone->page_age,
 			zone->pages_scanned,
 			(zone->all_unreclaimable ? "yes" : "no")
 			);
@@ -2145,6 +2149,9 @@ static void __init free_area_init_core(s
 		zone->nr_scan_inactive = 0;
 		zone->nr_active = 0;
 		zone->nr_inactive = 0;
+		zone->aging_total = 0;
+		zone->aging_milestone = 0;
+		zone->page_age = 0;
 		atomic_set(&zone->reclaim_in_progress, 0);
 		if (!size)
 			continue;
@@ -2293,6 +2300,8 @@ static int zoneinfo_show(struct seq_file
 			   "\n        high     %lu"
 			   "\n        active   %lu"
 			   "\n        inactive %lu"
+			   "\n        aging    %lu"
+			   "\n        age      %lu"
 			   "\n        scanned  %lu (a: %lu i: %lu)"
 			   "\n        spanned  %lu"
 			   "\n        present  %lu",
@@ -2302,6 +2311,8 @@ static int zoneinfo_show(struct seq_file
 			   zone->pages_high,
 			   zone->nr_active,
 			   zone->nr_inactive,
+			   zone->aging_total,
+			   zone->page_age,
 			   zone->pages_scanned,
 			   zone->nr_scan_active, zone->nr_scan_inactive,
 			   zone->spanned_pages,
