From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@osdl.org>,
Christoph Lameter <christoph@lameter.com>,
Rik van Riel <riel@redhat.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
Magnus Damm <magnus.damm@gmail.com>,
Nick Piggin <npiggin@suse.de>, Andrea Arcangeli <andrea@suse.de>,
Wu Fengguang <wfg@mail.ustc.edu.cn>
Subject: [PATCH 06/16] mm: balance slab aging
Date: Wed, 07 Dec 2005 18:48:01 +0800 [thread overview]
Message-ID: <20051207105019.800865000@localhost.localdomain> (raw)
In-Reply-To: 20051207104755.177435000@localhost.localdomain
[-- Attachment #1: mm-balance-slab-aging.patch --]
[-- Type: text/plain, Size: 8590 bytes --]
The current slab shrinking code is way too fragile.
Let it manage aging pace by itself, and provide a simple and robust interface.
The design considerations:
- use the same syncing facilities as that of the zones
- keep the age of slabs in line with that of the largest zone
this in effect makes aging rate of slabs follow that of the most aged node.
- reserve a minimal number of unused slabs
the size of reservation depends on vm pressure
- shrink more slab caches only when vm pressure is high
the old logic, `mmap pages found' - `shrink more caches' - `avoid swapping',
sounds not quite logical, so the code is removed.
- let sc->nr_scanned record the exact number of cold pages scanned
it is no longer used by the slab cache shrinking algorithm, but good for other
algorithms(e.g. the active_list/inactive_list balancing).
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---
fs/drop-pagecache.c | 2
include/linux/mm.h | 7 +--
mm/vmscan.c | 106 +++++++++++++++++++++-------------------------------
3 files changed, 48 insertions(+), 67 deletions(-)
--- linux.orig/include/linux/mm.h
+++ linux/include/linux/mm.h
@@ -798,7 +798,9 @@ struct shrinker {
shrinker_t shrinker;
struct list_head list;
int seeks; /* seeks to recreate an obj */
- long nr; /* objs pending delete */
+ unsigned long aging_total;
+ unsigned long aging_milestone;
+ unsigned long page_age;
struct shrinker_stats *s_stats;
};
@@ -1080,8 +1082,7 @@ int in_gate_area_no_task(unsigned long a
int drop_pagecache_sysctl_handler(struct ctl_table *, int, struct file *,
void __user *, size_t *, loff_t *);
-int shrink_slab(unsigned long scanned, gfp_t gfp_mask,
- unsigned long lru_pages);
+int shrink_slab(struct zone *zone, int priority, gfp_t gfp_mask);
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -170,6 +170,18 @@ static inline void update_zone_age(struc
<< PAGE_AGE_SHIFT) / len;
}
+static inline void update_slab_age(struct shrinker *s,
+ unsigned long len, int nr_scan)
+{
+ s->aging_total += nr_scan;
+
+ if (s->aging_total - s->aging_milestone > len)
+ s->aging_milestone += len;
+
+ s->page_age = ((s->aging_total - s->aging_milestone)
+ << PAGE_AGE_SHIFT) / len;
+}
+
/*
* Add a shrinker callback to be called from the vm
*/
@@ -181,7 +193,9 @@ struct shrinker *set_shrinker(int seeks,
if (shrinker) {
shrinker->shrinker = theshrinker;
shrinker->seeks = seeks;
- shrinker->nr = 0;
+ shrinker->aging_total = 0;
+ shrinker->aging_milestone = 0;
+ shrinker->page_age = 0;
shrinker->s_stats = alloc_percpu(struct shrinker_stats);
if (!shrinker->s_stats) {
kfree(shrinker);
@@ -209,6 +223,7 @@ void remove_shrinker(struct shrinker *sh
EXPORT_SYMBOL(remove_shrinker);
#define SHRINK_BATCH 128
+#define SLAB_RESERVE 1000
/*
* Call the shrink functions to age shrinkable caches
*
@@ -217,76 +232,49 @@ EXPORT_SYMBOL(remove_shrinker);
* percentages of the lru and ageable caches. This should balance the seeks
* generated by these structures.
*
- * If the vm encounted mapped pages on the LRU it increase the pressure on
- * slab to avoid swapping.
+ * @priority reflects the vm pressure, the lower the value, the more to
+ * shrink.
*
- * We do weird things to avoid (scanned*seeks*entries) overflowing 32 bits.
- *
- * `lru_pages' represents the number of on-LRU pages in all the zones which
- * are eligible for the caller's allocation attempt. It is used for balancing
- * slab reclaim versus page reclaim.
+ * @zone is better to be the least over-scanned one (normally the highest
+ * zone).
*
* Returns the number of slab objects which we shrunk.
*/
-int shrink_slab(unsigned long scanned, gfp_t gfp_mask, unsigned long lru_pages)
+int shrink_slab(struct zone *zone, int priority, gfp_t gfp_mask)
{
struct shrinker *shrinker;
int ret = 0;
- if (scanned == 0)
- scanned = SWAP_CLUSTER_MAX;
-
if (!down_read_trylock(&shrinker_rwsem))
return 1; /* Assume we'll be able to shrink next time */
list_for_each_entry(shrinker, &shrinker_list, list) {
- unsigned long long delta;
- unsigned long total_scan;
- unsigned long max_pass = (*shrinker->shrinker)(0, gfp_mask);
-
- delta = (4 * scanned) / shrinker->seeks;
- delta *= max_pass;
- do_div(delta, lru_pages + 1);
- shrinker->nr += delta;
- if (shrinker->nr < 0) {
- printk(KERN_ERR "%s: nr=%ld\n",
- __FUNCTION__, shrinker->nr);
- shrinker->nr = max_pass;
- }
-
- /*
- * Avoid risking looping forever due to too large nr value:
- * never try to free more than twice the estimate number of
- * freeable entries.
- */
- if (shrinker->nr > max_pass * 2)
- shrinker->nr = max_pass * 2;
-
- total_scan = shrinker->nr;
- shrinker->nr = 0;
-
- while (total_scan >= SHRINK_BATCH) {
- long this_scan = SHRINK_BATCH;
- int shrink_ret;
+ while (!zone || age_gt(zone, shrinker)) {
int nr_before;
+ int nr_after;
nr_before = (*shrinker->shrinker)(0, gfp_mask);
- shrink_ret = (*shrinker->shrinker)(this_scan, gfp_mask);
- if (shrink_ret == -1)
+ if (nr_before <= SLAB_RESERVE * priority / DEF_PRIORITY)
+ break;
+
+ nr_after = (*shrinker->shrinker)(SHRINK_BATCH, gfp_mask);
+ if (nr_after == -1)
break;
- if (shrink_ret < nr_before) {
- ret += nr_before - shrink_ret;
- shrinker_stat_add(shrinker, nr_freed,
- (nr_before - shrink_ret));
+
+ if (nr_after < nr_before) {
+ int nr_freed = nr_before - nr_after;
+
+ ret += nr_freed;
+ shrinker_stat_add(shrinker, nr_freed, nr_freed);
}
- shrinker_stat_add(shrinker, nr_req, this_scan);
- mod_page_state(slabs_scanned, this_scan);
- total_scan -= this_scan;
+ shrinker_stat_add(shrinker, nr_req, SHRINK_BATCH);
+ mod_page_state(slabs_scanned, SHRINK_BATCH);
+ update_slab_age(shrinker, nr_before * DEF_PRIORITY * 2,
+ SHRINK_BATCH * shrinker->seeks *
+ (DEF_PRIORITY + priority));
cond_resched();
}
-
- shrinker->nr += total_scan;
}
up_read(&shrinker_rwsem);
return ret;
@@ -492,11 +480,6 @@ static int shrink_list(struct list_head
BUG_ON(PageActive(page));
- sc->nr_scanned++;
- /* Double the slab pressure for mapped and swapcache pages */
- if (page_mapped(page) || PageSwapCache(page))
- sc->nr_scanned++;
-
if (PageWriteback(page))
goto keep_locked;
@@ -941,6 +924,7 @@ static void shrink_cache(struct zone *zo
goto done;
max_scan -= nr_scan;
+ sc->nr_scanned += nr_scan;
if (current_is_kswapd())
mod_page_state_zone(zone, pgscan_kswapd, nr_scan);
else
@@ -1259,7 +1243,6 @@ int try_to_free_pages(struct zone **zone
int total_scanned = 0, total_reclaimed = 0;
struct reclaim_state *reclaim_state = current->reclaim_state;
struct scan_control sc;
- unsigned long lru_pages = 0;
int i;
delay_prefetch();
@@ -1277,7 +1260,6 @@ int try_to_free_pages(struct zone **zone
continue;
zone->temp_priority = DEF_PRIORITY;
- lru_pages += zone->nr_active + zone->nr_inactive;
}
/* The added 10 priorities are for scan rate balancing */
@@ -1290,7 +1272,8 @@ int try_to_free_pages(struct zone **zone
if (!priority)
disable_swap_token();
shrink_caches(zones, &sc);
- shrink_slab(sc.nr_scanned, gfp_mask, lru_pages);
+ if (zone_idx(zones[0]))
+ shrink_slab(zones[0], priority, gfp_mask);
if (reclaim_state) {
sc.nr_reclaimed += reclaim_state->reclaimed_slab;
reclaim_state->reclaimed_slab = 0;
@@ -1386,8 +1369,6 @@ loop_again:
}
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
- unsigned long lru_pages = 0;
-
all_zones_ok = 1;
sc.nr_scanned = 0;
sc.nr_reclaimed = 0;
@@ -1447,7 +1428,6 @@ scan_swspd:
zone->temp_priority = priority;
if (zone->prev_priority > priority)
zone->prev_priority = priority;
- lru_pages += zone->nr_active + zone->nr_inactive;
shrink_zone(zone, &sc);
@@ -1456,7 +1436,7 @@ scan_swspd:
zone->all_unreclaimable = 1;
}
reclaim_state->reclaimed_slab = 0;
- shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
+ shrink_slab(prev_zone, priority, GFP_KERNEL);
sc.nr_reclaimed += reclaim_state->reclaimed_slab;
total_reclaimed += sc.nr_reclaimed;
total_scanned += sc.nr_scanned;
--- linux.orig/fs/drop-pagecache.c
+++ linux/fs/drop-pagecache.c
@@ -47,7 +47,7 @@ static void drop_slab(void)
int nr_objects;
do {
- nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
+ nr_objects = shrink_slab(NULL, 0, GFP_KERNEL);
} while (nr_objects > 10);
}
--
next prev parent reply other threads:[~2005-12-07 10:24 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-12-07 10:47 [PATCH 00/16] Balancing the scan rate of major caches V3 Wu Fengguang
2005-12-07 10:47 ` [PATCH 01/16] mm: restore sc.nr_to_reclaim Wu Fengguang
2005-12-07 10:47 ` [PATCH 02/16] mm: simplify kswapd reclaim code Wu Fengguang
2005-12-07 10:47 ` [PATCH 03/16] mm: supporting variables and functions for balanced zone aging Wu Fengguang
2005-12-11 22:36 ` Marcelo Tosatti
2005-12-12 2:53 ` Wu Fengguang
2005-12-07 10:47 ` [PATCH 04/16] mm: balance zone aging in direct reclaim path Wu Fengguang
2005-12-07 10:48 ` [PATCH 05/16] mm: balance zone aging in kswapd " Wu Fengguang
2005-12-07 10:58 ` Wu Fengguang
2005-12-07 13:32 ` Wu Fengguang
2005-12-07 10:48 ` Wu Fengguang [this message]
2005-12-07 11:08 ` [PATCH 06/16] mm: balance slab aging Wu Fengguang
2005-12-07 11:34 ` Nick Piggin
2005-12-07 12:59 ` Wu Fengguang
2005-12-07 10:48 ` [PATCH 07/16] mm: balance active/inactive list scan rates Wu Fengguang
2005-12-07 10:48 ` [PATCH 08/16] mm: fine grained scan priority Wu Fengguang
2005-12-07 10:48 ` [PATCH 09/16] mm: remove unnecessary variable and loop Wu Fengguang
2006-01-05 19:21 ` Marcelo Tosatti
2006-01-06 8:58 ` Wu Fengguang
2005-12-07 10:48 ` [PATCH 10/16] mm: remove swap_cluster_max from scan_control Wu Fengguang
2005-12-07 10:48 ` [PATCH 11/16] mm: let sc.nr_scanned/sc.nr_reclaimed accumulate Wu Fengguang
2005-12-07 10:48 ` [PATCH 12/16] mm: fold sc.may_writepage and sc.may_swap into sc.flags Wu Fengguang
2005-12-07 10:36 ` Nick Piggin
2005-12-07 11:11 ` Wu Fengguang
2005-12-07 11:12 ` Nick Piggin
2005-12-07 13:01 ` Wu Fengguang
2005-12-07 11:15 ` Wu Fengguang
2005-12-07 17:02 ` Martin Hicks
2005-12-07 23:15 ` Andrew Morton
2005-12-07 10:48 ` [PATCH 13/16] mm: fix minor scan count bugs Wu Fengguang
2005-12-07 10:32 ` Nick Piggin
2005-12-07 11:02 ` Wu Fengguang
2005-12-07 10:48 ` [PATCH 14/16] mm: zone aging rounds accounting Wu Fengguang
2005-12-07 10:48 ` [PATCH 15/16] mm: add page reclaim debug traces Wu Fengguang
2005-12-07 10:48 ` [PATCH 16/16] mm: kswapd reclaim debug trace Wu Fengguang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20051207105019.800865000@localhost.localdomain \
--to=wfg@mail.ustc.edu.cn \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@osdl.org \
--cc=andrea@suse.de \
--cc=christoph@lameter.com \
--cc=linux-kernel@vger.kernel.org \
--cc=magnus.damm@gmail.com \
--cc=marcelo.tosatti@cyclades.com \
--cc=npiggin@suse.de \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox