From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 14/28] readahead: state based method
Date: Wed, 15 Nov 2006 15:50:21 +0800
Message-ID: <363577025.21912@ustc.edu.cn>
Message-ID: <20061115075028.829507795@localhost.localdomain>
In-Reply-To: 20061115075007.832957580@localhost.localdomain
[-- Attachment #1: readahead-state-based-method.patch --]
[-- Type: text/plain, Size: 6894 bytes --]
This is the fast code path of adaptive read-ahead.
MAJOR STEPS
===========
- estimate a thrashing-safe ra_size;
- assemble the next read-ahead request in file_ra_state;
- submit it.
THE REFERENCE MODEL
===================
1. the inactive list has a constant length and a constant page flow speed
2. the observed stream receives a steady flow of read requests
3. there is no page activation, so the inactive list forms a pipe
With that we get the picture shown below.
|<------------------------- constant length ------------------------->|
<<<<<<<<<<<<<<<<<<<<<<<<< steady flow of pages <<<<<<<<<<<<<<<<<<<<<<<<
+---------------------------------------------------------------------+
|tail                        inactive list                        head|
|   =======                  ==========----                           |
|   chunk A(stale pages)     chunk B(stale + fresh pages)             |
+---------------------------------------------------------------------+
REAL WORLD ISSUES
=================
Real-world workloads always fluctuate (violating assumptions 1 and 2). To
counteract this, a tunable parameter readahead_ratio is introduced to make
the estimation conservative enough. Violating assumption 3 will not lead to
thrashing; the assumption is there just to simplify the discussion.
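As a rough illustration of how readahead_ratio scales the estimate down, the
userspace sketch below mirrors the formula
thrashing_threshold = free_mem * stream_shift / global_shift used by
compute_thrashing_threshold() in the patch. The name estimate_ra_size(), its
parameters and any sample numbers are hypothetical, and the sketch assumes
readahead_ratio is a percentage; the patch itself uses
(global_size >> 9) * 5, which is close to dividing by 100.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the estimate; not the kernel function.
 * All sizes are in pages. ratio_percent is assumed to be a percentage.
 */
static unsigned long estimate_ra_size(unsigned long free_inactive,
				      unsigned long stream_shift,
				      unsigned long global_shift,
				      unsigned int ratio_percent)
{
	uint64_t ll;

	assert(ratio_percent <= 100);

	/* Force the divisor to be odd, hence non-zero, as the patch does. */
	global_shift |= 1UL;

	/* Scale the free inactive memory by stream progress per page flow. */
	ll = (uint64_t)stream_shift * free_inactive * ratio_percent / 100;
	return (unsigned long)(ll / global_shift);
}
```

With a ratio below 100 the result is proportionally smaller than the raw
model value, which is the margin that absorbs fluctuations in the inactive
list length and in the request rate.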
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
DESC
readahead: state based method - stand-alone size limit code
EDESC
From: Wu Fengguang <wfg@mail.ustc.edu.cn>
Separate out the code that limits the readahead/lookahead sizes, and put it
into a stand-alone limit_rala() function.
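The effect of the separated limiting step can be tried outside the kernel.
The sketch below restates the logic of limit_rala() from the patch on a small
value struct; struct rala and limit_rala_sketch() are hypothetical names used
only for this illustration.

```c
#include <assert.h>

/* Hypothetical pair of sizes, in pages. */
struct rala {
	unsigned long ra_size;	/* read-ahead size */
	unsigned long la_size;	/* look-ahead size */
};

static struct rala limit_rala_sketch(unsigned long ra_max,
				     unsigned long la_old, struct rala r)
{
	unsigned long stream_shift;

	/* Apply the basic upper limits. */
	if (r.ra_size > ra_max)
		r.ra_size = ra_max;
	if (r.la_size > r.ra_size)
		r.la_size = r.ra_size;

	/* Shrink the look-ahead until stream_shift reaches ra_size / 4. */
	stream_shift = la_old + (r.ra_size - r.la_size);
	if (stream_shift < r.ra_size / 4)
		r.la_size -= r.ra_size / 4 - stream_shift;

	assert(r.la_size <= r.ra_size);
	return r;
}
```

For instance, with ra_max = 64, la_old = 0 and both sizes requested as 200,
the sizes are first clamped to 64 each, then la_size is reduced to 48 so
that stream_shift becomes 16, one quarter of ra_size.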
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@osdl.org>
--- linux-2.6.19-rc5-mm2.orig/mm/readahead.c
+++ linux-2.6.19-rc5-mm2/mm/readahead.c
@@ -18,6 +18,8 @@
#include <linux/pagevec.h>
#include <linux/buffer_head.h>
+#include <asm/div64.h>
+
/*
* Convienent macros for min/max read-ahead pages.
* Note that MAX_RA_PAGES is rounded down, while MIN_RA_PAGES is rounded up.
@@ -964,6 +966,161 @@ static int ra_submit(struct file_ra_stat
}
/*
+ * Deduce the read-ahead/look-ahead size from primitive values.
+ *
+ * Input:
+ * - @ra_size stores the estimated thrashing-threshold.
+ * - @la_size stores the look-ahead size of previous request.
+ */
+static int adjust_rala(unsigned long ra_max,
+			unsigned long *ra_size, unsigned long *la_size)
+{
+	/*
+	 * Subtract the old look-ahead to get the real safe size for the next
+	 * read-ahead request.
+	 */
+	if (*ra_size > *la_size)
+		*ra_size -= *la_size;
+	else {
+		ra_account(NULL, RA_EVENT_READAHEAD_SHRINK, *ra_size);
+		return 0;
+	}
+
+	/*
+	 * Set new la_size according to the (still large) ra_size.
+	 */
+	*la_size = *ra_size / LOOKAHEAD_RATIO;
+
+	return 1;
+}
+
+static void limit_rala(unsigned long ra_max, unsigned long la_old,
+			unsigned long *ra_size, unsigned long *la_size)
+{
+	unsigned long stream_shift;
+
+	/*
+	 * Apply basic upper limits.
+	 */
+	if (*ra_size > ra_max)
+		*ra_size = ra_max;
+	if (*la_size > *ra_size)
+		*la_size = *ra_size;
+
+	/*
+	 * Make sure stream_shift is not too small.
+	 * (So that the next global_shift will not be too small.)
+	 */
+	stream_shift = la_old + (*ra_size - *la_size);
+	if (stream_shift < *ra_size / 4)
+		*la_size -= (*ra_size / 4 - stream_shift);
+}
+
+/*
+ * The function estimates two values:
+ * 1. thrashing-threshold for the current stream
+ *    It is returned to make the next read-ahead request.
+ * 2. the remaining safe space for the current chunk
+ *    It will be checked to ensure that the current chunk is safe.
+ *
+ * The computation will be pretty accurate under heavy load, and will fluctuate
+ * more under light load (with a small global_shift), so the growth speed of
+ * ra_size must be limited, and stream_shift must be kept moderately large.
+ *
+ * The following figure illustrates the formula used in the function:
+ * While the stream reads stream_shift pages inside the chunks,
+ * the chunks are shifted global_shift pages inside inactive_list.
+ * So
+ *	thrashing_threshold = free_mem * stream_shift / global_shift;
+ *
+ *
+ *      chunk A                   chunk B
+ *                          |<=============== global_shift ================|
+ *  +-------------+         +-------------------+                          |
+ *  |       #     |         |           #       |            inactive_list |
+ *  +-------------+         +-------------------+                     head |
+ *          |---->|         |---------->|
+ *             |                  |
+ *             +-- stream_shift --+
+ */
+static unsigned long compute_thrashing_threshold(struct file_ra_state *ra,
+							unsigned long *remain)
+{
+	unsigned long global_size;
+	unsigned long global_shift;
+	unsigned long stream_shift;
+	unsigned long ra_size;
+	uint64_t ll;
+
+	global_size = nr_free_inactive_pages_node(numa_node_id());
+	global_shift = nr_scanned_pages_node(numa_node_id()) - ra->age;
+	global_shift |= 1UL;	/* never divide by zero */
+	stream_shift = ra_invoke_interval(ra);
+
+	/* future safe space; (x >> 9) * 5 ~= x / 100 */
+	ll = (uint64_t) stream_shift * (global_size >> 9) * readahead_ratio * 5;
+	do_div(ll, global_shift);
+	ra_size = ll;
+
+	/* remaining safe space */
+	if (global_size > global_shift) {
+		ll = (uint64_t) stream_shift * (global_size - global_shift);
+		do_div(ll, global_shift);
+		*remain = ll;
+	} else
+		*remain = 0;
+
+	ddprintk("compute_thrashing_threshold: "
+			"at %lu ra %lu=%lu*%lu/%lu, remain %lu for %lu\n",
+			ra->readahead_index, ra_size,
+			stream_shift, global_size, global_shift,
+			*remain, ra_lookahead_size(ra));
+
+	return ra_size;
+}
+
+/*
+ * Main function for file_ra_state based read-ahead.
+ */
+static unsigned long
+state_based_readahead(struct address_space *mapping, struct file *filp,
+		struct file_ra_state *ra,
+		struct page *page, pgoff_t offset,
+		unsigned long req_size, unsigned long ra_max)
+{
+	unsigned long ra_old, ra_size;
+	unsigned long la_old, la_size;
+	unsigned long remain_space;
+	unsigned long growth_limit;
+
+	la_old = la_size = ra->readahead_index - offset;
+	ra_old = ra_readahead_size(ra);
+	ra_size = compute_thrashing_threshold(ra, &remain_space);
+
+	if (page && remain_space <= la_size) {
+		rescue_pages(page, la_size);
+		return 0;
+	}
+
+	growth_limit = req_size;
+	growth_limit += ra_max / 16;
+	growth_limit += (2 + readahead_ratio / 64) * ra_old;
+	if (growth_limit > ra_max)
+		growth_limit = ra_max;
+
+	if (!adjust_rala(growth_limit, &ra_size, &la_size))
+		return 0;
+
+	limit_rala(growth_limit, la_old, &ra_size, &la_size);
+
+	ra_set_class(ra, RA_CLASS_STATE);
+	ra_set_index(ra, offset, ra->readahead_index);
+	ra_set_size(ra, ra_size, la_size);
+
+	return ra_submit(ra, mapping, filp);
+}
+
+/*
* ra_min is mainly determined by the size of cache memory. Reasonable?
*
* Table of concrete numbers for 4KB page size:
--