public inbox for linux-kernel@vger.kernel.org
From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@osdl.org>,
	Christoph Lameter <christoph@lameter.com>,
	Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
	Magnus Damm <magnus.damm@gmail.com>,
	Nick Piggin <npiggin@suse.de>, Andrea Arcangeli <andrea@suse.de>,
	Wu Fengguang <wfg@mail.ustc.edu.cn>
Subject: [PATCH 05/16] mm: balance zone aging in kswapd reclaim path
Date: Wed, 07 Dec 2005 18:48:00 +0800
Message-ID: <20051207105004.018561000@localhost.localdomain>
In-Reply-To: 20051207104755.177435000@localhost.localdomain

[-- Attachment #1: mm-balance-zone-aging-in-kswapd-reclaim.patch --]
[-- Type: text/plain, Size: 4431 bytes --]

The vm subsystem is rather complex. System memory is divided into zones, and
lower zones act as fallbacks for higher zones in memory allocation. The page
reclaim algorithm should generally keep zone aging rates in sync. But if a
zone under its watermark has many unreclaimable pages, it has to be scanned
much more to get enough free pages. While doing this,

- lower zones should also be scanned more, since their pages are also usable
  for higher zone allocations.
- higher zones should not be scanned just to keep the aging in sync, which
  can evict a large number of pages without solving the problem (and may well
  worsen it).

With that in mind, the patch does the rebalance in kswapd as follows:
1) reclaim from the lowest zone when it is either
	- under pages_high
	- under pages_high+lowmem_reserve, and less/equal aged than the
	  highest zone (or out of sync with it)
2) reclaim from a higher zone when it is
	- under pages_high+lowmem_reserve, and less/equal aged than its
	  immediate lower neighbor (or out of sync with it)
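
For illustration only, the two rules above can be sketched in standalone C.
This is not the kernel code: struct zone_model, kswapd_should_reclaim() and
the needs_catch_up flag are hypothetical names, the allocation order is
ignored, and the result of the age comparison is assumed to be computed
elsewhere and passed in.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the fields the rules look at. */
struct zone_model {
	long free_pages;
	long pages_high;
	long lowmem_reserve;	/* reserve against highest-zone allocations */
	bool needs_catch_up;	/* less/equal aged than, or out of sync with,
				 * the zone it is compared against (assumed
				 * to be computed by the caller) */
};

/* Returns true when the rules above would ask kswapd to scan this zone. */
static bool kswapd_should_reclaim(const struct zone_model *z, bool is_lowest)
{
	/* Above the recommended watermark: enough free pages, no reclaim. */
	if (z->free_pages > z->pages_high + z->lowmem_reserve)
		return false;

	/* Rule 1, first case: the lowest zone must reach pages_high. */
	if (is_lowest && z->free_pages <= z->pages_high)
		return true;

	/* Otherwise reclaim only to catch up with the reference zone. */
	return z->needs_catch_up;
}
```

In this sketch a higher zone that is below pages_high but already more aged
than its lower neighbor is left alone, which is the intended difference from
the behavior before the patch.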

Note that the zone age is a normalized value in the range 0-4096 on i386/4G;
4096 corresponds to a full scan of one zone. A comparison of ages is only
deemed valid if the gap is less than 4096/8; otherwise the two zones are
regarded as out of sync.
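
A rough standalone sketch of such a comparison follows. It is not the helper
code from this series (age_ge()/age_gt() are introduced in an earlier patch
and are not reproduced here). The names AGE_FULL_SCAN, AGE_SYNC_GAP and
age_compare() are made up, and the sketch assumes that ages wrap around
modulo 4096.

```c
#define AGE_FULL_SCAN	4096u			/* one full scan of a zone */
#define AGE_MASK	(AGE_FULL_SCAN - 1)
#define AGE_SYNC_GAP	(AGE_FULL_SCAN / 8)	/* 512 */

enum age_order { AGE_EQUAL, AGE_AHEAD, AGE_BEHIND, AGE_OUT_OF_SYNC };

/*
 * Compare two normalized ages, assuming they wrap around at 4096.
 * The result describes a relative to b.
 */
static enum age_order age_compare(unsigned int a, unsigned int b)
{
	unsigned int fwd = (a - b) & AGE_MASK;	/* how far a leads b */
	unsigned int back = (b - a) & AGE_MASK;	/* how far b leads a */

	if (fwd == 0)
		return AGE_EQUAL;
	if (fwd < AGE_SYNC_GAP)
		return AGE_AHEAD;
	if (back < AGE_SYNC_GAP)
		return AGE_BEHIND;
	return AGE_OUT_OF_SYNC;	/* gap too large to order the two ages */
}
```

Under the wrap-around assumption, an age of 10 counts as slightly ahead of
4090, while ages of 0 and 2048 are too far apart to be ordered at all.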

On exit, the code ensures:
1) the lowest zone will be pages_high ok
2) at least one zone will be pages_high+lowmem_reserve ok
3) a very strong force of rebalancing, with the exception that
	- some lower zones are unreclaimable: we must let them go ahead
	  alone, leaving higher zones behind
	- shrink_zone() scans too much and creates a huge imbalance in one
	  run (Nick is working on this)
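
The first two guarantees can be written down as a small standalone check.
This is only a sketch: struct zone_marks and exit_guarantees_hold() are
hypothetical names, zones[0] is assumed to be the lowest populated zone, and
the allocation order is ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one populated zone. */
struct zone_marks {
	long free_pages;
	long pages_high;
	long lowmem_reserve;
};

/*
 * Check guarantees 1) and 2) for an array of populated zones ordered
 * from lowest to highest.
 */
static bool exit_guarantees_hold(const struct zone_marks *zones, size_t n)
{
	size_t i;

	if (n == 0)
		return true;	/* nothing to balance */

	/* 1) the lowest zone is pages_high ok */
	if (zones[0].free_pages <= zones[0].pages_high)
		return false;

	/* 2) at least one zone is pages_high+lowmem_reserve ok */
	for (i = 0; i < n; i++) {
		if (zones[i].free_pages >
		    zones[i].pages_high + zones[i].lowmem_reserve)
			return true;
	}
	return false;
}
```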

The logic can deal with known normal/abnormal situations gracefully:
1) Normal case
	- zone ages are cyclically tied together: overtaking each other, and
	  keeping close enough

2) A zone is unreclaimable, scanned much more, and becomes out of sync
	- if ever a troublesome zone is being overscanned, the logic brings
	  its lower neighbors ahead together, leaving higher neighbors behind.
	- the aging tie between the two groups is broken, and the relevant
	  zones are reclaimed when pages_high+lowmem_reserve is not ok, just
	  as before the patch.
	- at some time the zone ages meet again and things get back to normal
	- a possibly better strategy, as soon as the pressure has disappeared,
	  might be to be reluctant to reclaim from the already overscanned
	  lower group, and let the higher group slowly catch up.

3) A zone is truncated
	- it will not be reclaimed from until it falls under the watermark

With this patch, the meaning of zone->pages_high+lowmem_reserve changes from
the _required_ watermark to the _recommended_ watermark. Someone might want
to increase these values.

Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---

 mm/vmscan.c |   34 +++++++++++++++++++++++++++++-----
 1 files changed, 29 insertions(+), 5 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1364,6 +1364,7 @@ static int balance_pgdat(pg_data_t *pgda
 	int total_scanned, total_reclaimed;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc;
+	struct zone *prev_zone = pgdat->node_zones;
 
 loop_again:
 	total_scanned = 0;
@@ -1379,6 +1380,9 @@ loop_again:
 		struct zone *zone = pgdat->node_zones + i;
 
 		zone->temp_priority = DEF_PRIORITY;
+
+		if (populated_zone(zone))
+			prev_zone = zone;
 	}
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
@@ -1409,14 +1413,34 @@ loop_again:
 			if (!populated_zone(zone))
 				continue;
 
-			if (nr_pages == 0) {	/* Not software suspend */
-				if (zone_watermark_ok(zone, order,
-					zone->pages_high, 0, 0))
-					continue;
+			if (nr_pages) 	/* software suspend */
+				goto scan_swspd;
 
-				all_zones_ok = 0;
+			if (zone_watermark_ok(zone, order,
+						zone->pages_high,
+						pgdat->nr_zones - 1, 0)) {
+				/* free pages enough, no reclaim */
+			} else if (zone < prev_zone) {
+				if (!zone_watermark_ok(zone, order,
+						zone->pages_high, 0, 0)) {
+					/* have to scan for free pages */
+					goto scan;
+				}
+				if (age_ge(prev_zone, zone)) {
+					/* catch up if falls behind */
+					goto scan;
+				}
+			} else if (!age_gt(zone, prev_zone)) {
+				/* catch up if falls behind or out of sync */
+				goto scan;
 			}
 
+			prev_zone = zone;
+			continue;
+scan:
+			prev_zone = zone;
+			all_zones_ok = 0;
+scan_swspd:
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;
 

--

