From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 14/28] readahead: state based method
Date: Wed, 15 Nov 2006 15:50:21 +0800
Message-ID: <363577025.21912@ustc.edu.cn>
Message-ID: <20061115075028.829507795@localhost.localdomain>
In-Reply-To: 20061115075007.832957580@localhost.localdomain
[-- Attachment #1: readahead-state-based-method.patch --]
[-- Type: text/plain, Size: 6894 bytes --]
This is the fast code path of adaptive read-ahead.
MAJOR STEPS
===========
- estimate a thrashing-safe ra_size;
- assemble the next read-ahead request in file_ra_state;
- submit it.
THE REFERENCE MODEL
===================
1. inactive list has constant length and page flow speed
2. the observed stream receives a steady flow of read requests
3. no page activation, so that the inactive list forms a pipe
With that we get the picture shown below.
|<------------------------- constant length ------------------------->|
<<<<<<<<<<<<<<<<<<<<<<<<< steady flow of pages <<<<<<<<<<<<<<<<<<<<<<<<
+---------------------------------------------------------------------+
|tail                        inactive list                        head|
|     =======                  ==========----                         |
|     chunk A(stale pages)     chunk B(stale + fresh pages)           |
+---------------------------------------------------------------------+
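One way to see why this model yields a usable estimate is sketched below. It
holds only under the three assumptions above; the symbols L, v, r and T are
introduced here for illustration and do not appear in the patch, while
stream_shift and global_shift are the quantities used in the code.

```latex
% L: length of the inactive list in pages (constant, assumption 1)
% v: speed at which pages flow through the list (constant, assumption 1)
% r: rate at which the stream consumes pages (steady, assumption 2)
\begin{align*}
T &= \frac{L}{v}
  && \text{time a page takes to travel from head to tail} \\
\text{thrashing\_threshold} &= r\,T = L\,\frac{r}{v}
  && \text{pages the stream can consume before eviction} \\
\frac{r}{v} &= \frac{r\,\Delta t}{v\,\Delta t}
  = \frac{\text{stream\_shift}}{\text{global\_shift}}
  && \text{both shifts observed over the same interval } \Delta t
\end{align*}
```

Read-ahead pages beyond that threshold would reach the tail of the list
before the stream gets to them.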
REAL WORLD ISSUES
=================
Real-world workloads always fluctuate, which violates assumptions 1 and 2.
To counteract this, a tunable parameter, readahead_ratio, is introduced to
make the estimate conservative enough. Violating assumption 3 does not lead
to thrashing; the assumption is there only to simplify the discussion.
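As a rough userspace illustration of how a percentage such as readahead_ratio
scales the estimate down, consider the sketch below. It is not the kernel
code: safe_ra_size() and its parameter names are hypothetical, it assumes the
intermediate products fit in 64 bits, and it divides by 100 exactly, whereas
the function in this patch uses a shift-and-multiply approximation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * inactive_pages: length of the inactive list (the "pipe")
 * stream_shift:   pages the stream consumed during the sample interval
 * global_shift:   pages that flowed through the list in the same interval
 * ratio_percent:  safety margin in percent (the role of readahead_ratio)
 */
static unsigned long safe_ra_size(unsigned long inactive_pages,
				  unsigned long stream_shift,
				  unsigned long global_shift,
				  unsigned int ratio_percent)
{
	uint64_t ll;

	global_shift |= 1UL;	/* same trick as the patch: never zero */

	ll = (uint64_t)stream_shift * inactive_pages * ratio_percent;
	ll /= (uint64_t)global_shift * 100;

	return (unsigned long)ll;
}

/* Worked example: halving the ratio halves the estimate. */
static void safe_ra_size_example(void)
{
	assert(safe_ra_size(99900, 32, 999, 100) == 3200);
	assert(safe_ra_size(99900, 32, 999, 50) == 1600);
}
```

With 99900 inactive pages and a stream that read 32 pages while 999 pages
flowed past, the unscaled estimate is 3200 pages; a ratio of 50 leaves 1600.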
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
DESC
readahead: state based method - stand-alone size limit code
EDESC
From: Wu Fengguang <wfg@mail.ustc.edu.cn>
Separate out the code that limits the readahead/lookahead sizes and put it
into a stand-alone limit_rala() function.
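The limiting step can be tried outside the kernel. The sketch below copies
the arithmetic of limit_rala() from the diff into a userspace function; the
names limit_rala_sketch() and limit_rala_example() are hypothetical, and the
numbers are made up for illustration.

```c
#include <assert.h>

/* Userspace copy of the size-limiting arithmetic, for experimentation. */
static void limit_rala_sketch(unsigned long ra_max, unsigned long la_old,
			      unsigned long *ra_size, unsigned long *la_size)
{
	unsigned long stream_shift;

	/* Basic upper limits: ra_size <= ra_max, la_size <= ra_size. */
	if (*ra_size > ra_max)
		*ra_size = ra_max;
	if (*la_size > *ra_size)
		*la_size = *ra_size;

	/* Keep stream_shift at no less than a quarter of ra_size. */
	stream_shift = la_old + (*ra_size - *la_size);
	if (stream_shift < *ra_size / 4)
		*la_size -= (*ra_size / 4 - stream_shift);
}

static void limit_rala_example(void)
{
	unsigned long ra = 1000, la = 500;

	/* Both sizes are clamped to 256, then la gives up 64 pages. */
	limit_rala_sketch(256, 0, &ra, &la);
	assert(ra == 256 && la == 192);

	/* Already within limits and stream_shift is large: no change. */
	ra = 100;
	la = 12;
	limit_rala_sketch(256, 8, &ra, &la);
	assert(ra == 100 && la == 12);
}
```

In the first case the look-ahead would otherwise cover the whole request, so
it is cut back until the stream advances by at least ra_size / 4 pages before
the next read-ahead is triggered.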
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@osdl.org>
--- linux-2.6.19-rc5-mm2.orig/mm/readahead.c
+++ linux-2.6.19-rc5-mm2/mm/readahead.c
@@ -18,6 +18,8 @@
#include <linux/pagevec.h>
#include <linux/buffer_head.h>
+#include <asm/div64.h>
+
/*
* Convienent macros for min/max read-ahead pages.
* Note that MAX_RA_PAGES is rounded down, while MIN_RA_PAGES is rounded up.
@@ -964,6 +966,161 @@ static int ra_submit(struct file_ra_stat
}
/*
+ * Deduce the read-ahead/look-ahead size from primitive values.
+ *
+ * Input:
+ * - @ra_size stores the estimated thrashing-threshold.
+ * - @la_size stores the look-ahead size of the previous request.
+ */
+static int adjust_rala(unsigned long ra_max,
+				unsigned long *ra_size, unsigned long *la_size)
+{
+	/*
+	 * Subtract the old look-ahead to get the real safe size for the next
+	 * read-ahead request.
+	 */
+	if (*ra_size > *la_size)
+		*ra_size -= *la_size;
+	else {
+		ra_account(NULL, RA_EVENT_READAHEAD_SHRINK, *ra_size);
+		return 0;
+	}
+
+	/*
+	 * Set new la_size according to the (still large) ra_size.
+	 */
+	*la_size = *ra_size / LOOKAHEAD_RATIO;
+
+	return 1;
+}
+
+static void limit_rala(unsigned long ra_max, unsigned long la_old,
+				unsigned long *ra_size, unsigned long *la_size)
+{
+	unsigned long stream_shift;
+
+	/*
+	 * Apply basic upper limits.
+	 */
+	if (*ra_size > ra_max)
+		*ra_size = ra_max;
+	if (*la_size > *ra_size)
+		*la_size = *ra_size;
+
+	/*
+	 * Make sure stream_shift is not too small.
+	 * (So that the next global_shift will not be too small.)
+	 */
+	stream_shift = la_old + (*ra_size - *la_size);
+	if (stream_shift < *ra_size / 4)
+		*la_size -= (*ra_size / 4 - stream_shift);
+}
+
+/*
+ * The function estimates two values:
+ * 1. thrashing-threshold for the current stream
+ * It is returned to make the next read-ahead request.
+ * 2. the remaining safe space for the current chunk
+ * It will be checked to ensure that the current chunk is safe.
+ *
+ * The computation will be pretty accurate under heavy load, and will
+ * fluctuate more under light load (with small global_shift), so the growth
+ * rate of ra_size must be limited, and a moderately large stream_shift must
+ * be ensured.
+ *
+ * The following figure illustrates the formula used in the function:
+ * While the stream reads stream_shift pages inside the chunks,
+ * the chunks are shifted global_shift pages inside inactive_list.
+ * So
+ * thrashing_threshold = free_mem * stream_shift / global_shift;
+ *
+ *
+ *           chunk A                    chunk B
+ *                     |<=============== global_shift ================|
+ *       +-------------+         +-------------------+                |
+ *       |       #     |         |           #       | inactive_list  |
+ *       +-------------+         +-------------------+           head |
+ *               |---->|         |---------->|
+ *                  |                  |
+ *                  +-- stream_shift --+
+ */
+static unsigned long compute_thrashing_threshold(struct file_ra_state *ra,
+							unsigned long *remain)
+{
+	unsigned long global_size;
+	unsigned long global_shift;
+	unsigned long stream_shift;
+	unsigned long ra_size;
+	uint64_t ll;
+
+	global_size = nr_free_inactive_pages_node(numa_node_id());
+	global_shift = nr_scanned_pages_node(numa_node_id()) - ra->age;
+	global_shift |= 1UL;	/* make sure the divisor is never zero */
+	stream_shift = ra_invoke_interval(ra);
+
+	/*
+	 * future safe space
+	 * (global_size >> 9) * 5 is roughly global_size / 100
+	 */
+	ll = (uint64_t) stream_shift * (global_size >> 9) * readahead_ratio * 5;
+	do_div(ll, global_shift);
+	ra_size = ll;
+
+	/* remaining safe space */
+	if (global_size > global_shift) {
+		ll = (uint64_t) stream_shift * (global_size - global_shift);
+		do_div(ll, global_shift);
+		*remain = ll;
+	} else
+		*remain = 0;
+
+	ddprintk("compute_thrashing_threshold: "
+			"at %lu ra %lu=%lu*%lu/%lu, remain %lu for %lu\n",
+			ra->readahead_index, ra_size,
+			stream_shift, global_size, global_shift,
+			*remain, ra_lookahead_size(ra));
+
+	return ra_size;
+}
+
+/*
+ * Main function for file_ra_state based read-ahead.
+ */
+static unsigned long
+state_based_readahead(struct address_space *mapping, struct file *filp,
+		struct file_ra_state *ra,
+		struct page *page, pgoff_t offset,
+		unsigned long req_size, unsigned long ra_max)
+{
+	unsigned long ra_old, ra_size;
+	unsigned long la_old, la_size;
+	unsigned long remain_space;
+	unsigned long growth_limit;
+
+	la_old = la_size = ra->readahead_index - offset;
+	ra_old = ra_readahead_size(ra);
+	ra_size = compute_thrashing_threshold(ra, &remain_space);
+
+	if (page && remain_space <= la_size) {
+		rescue_pages(page, la_size);
+		return 0;
+	}
+
+	growth_limit = req_size;
+	growth_limit += ra_max / 16;
+	growth_limit += (2 + readahead_ratio / 64) * ra_old;
+	if (growth_limit > ra_max)
+		growth_limit = ra_max;
+
+	if (!adjust_rala(growth_limit, &ra_size, &la_size))
+		return 0;
+
+	limit_rala(growth_limit, la_old, &ra_size, &la_size);
+
+	ra_set_class(ra, RA_CLASS_STATE);
+	ra_set_index(ra, offset, ra->readahead_index);
+	ra_set_size(ra, ra_size, la_size);
+
+	return ra_submit(ra, mapping, filp);
+}
+
+/*
* ra_min is mainly determined by the size of cache memory. Reasonable?
*
* Table of concrete numbers for 4KB page size:
--