From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, Wu Fengguang <wfg@mail.ustc.edu.cn>
Subject: [PATCH 16/33] readahead: state based method
Date: Fri, 26 May 2006 19:39:22 +0800
Message-ID: <348644382.05317@ustc.edu.cn>
Message-ID: <20060526115307.794859372@localhost.localdomain>
In-Reply-To: 20060526113906.084341801@localhost.localdomain
[-- Attachment #1: readahead-method-stateful.patch --]
[-- Type: text/plain, Size: 6157 bytes --]
This is the fast code path of adaptive read-ahead.
MAJOR STEPS
===========
- estimate a thrashing-safe ra_size;
- assemble the next read-ahead request in file_ra_state;
- submit it.
THE REFERENCE MODEL
===================
1. inactive list has constant length and page flow speed
2. the observed stream receives a steady flow of read requests
3. no page activation, so that the inactive list forms a pipe
With that we get the picture shown below.
|<------------------------- constant length ------------------------->|
<<<<<<<<<<<<<<<<<<<<<<<<< steady flow of pages <<<<<<<<<<<<<<<<<<<<<<<<
+---------------------------------------------------------------------+
|tail                        inactive list                        head|
|   =======                  ==========----                           |
|   chunk A(stale pages)     chunk B(stale + fresh pages)             |
+---------------------------------------------------------------------+
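
The proportion implied by this pipe model can be sketched in plain userspace
C. This is an illustration only, not the kernel code: the function name
pipe_safe_size and its parameters are hypothetical. Under assumptions 1-3, a
page survives until global_size pages have flowed past it, and the stream
consumes stream_shift pages for every global_shift pages of global flow, so it
can read roughly stream_shift * global_size / global_shift pages before its
oldest read-ahead page reaches the tail.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: estimate how many pages a
 * stream can consume while one page travels the whole inactive list.
 *
 * stream_shift - pages this stream consumed during the observation
 * global_size  - length of the inactive list, in pages
 * global_shift - pages that flowed through the list in the same period
 */
static uint64_t pipe_safe_size(uint64_t stream_shift, uint64_t global_size,
			       uint64_t global_shift)
{
	/* Guard chosen for this sketch: treat an idle list as one page of flow. */
	if (global_shift == 0)
		global_shift = 1;

	return stream_shift * global_size / global_shift;
}
```

For example, with a 100000-page inactive list, a stream that read 32 pages
while 1000 pages flowed through the list gets an estimate of 3200 pages.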
REAL WORLD ISSUES
=================
Real-world workloads will always fluctuate (violating assumptions 1 and 2).
To counteract this, a tunable parameter, readahead_ratio, is introduced to
make the estimation conservative enough. Violation of assumption 3 will not
lead to thrashing; the assumption is there just for simplicity of discussion.
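
To make the role of readahead_ratio concrete, here is a minimal userspace
sketch. It assumes readahead_ratio is a percentage, which is how the code in
this patch appears to use it: the factor (global_size >> 9) * readahead_ratio * 5
is close to global_size * readahead_ratio / 100, because 5/512 is roughly
1/102. The function names scale_exact and scale_shift are hypothetical.

```c
#include <stdint.h>

/* Exact percentage scaling; "ratio is in percent" is an assumption. */
static uint64_t scale_exact(uint64_t global_size, unsigned int ratio)
{
	return global_size * ratio / 100;
}

/* Shift-based approximation in the style of the patch: x * 5 / 512. */
static uint64_t scale_shift(uint64_t global_size, unsigned int ratio)
{
	return (global_size >> 9) * ratio * 5;
}
```

With a 102400-page list and a ratio of 50, the exact form gives 51200 pages
and the shift form gives 50000, so the approximation errs slightly on the
conservative side while avoiding a division.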
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---
mm/readahead.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 147 insertions(+)
--- linux-2.6.17-rc4-mm3.orig/mm/readahead.c
+++ linux-2.6.17-rc4-mm3/mm/readahead.c
@@ -1038,6 +1038,153 @@ static int ra_dispatch(struct file_ra_st
 }
 
 /*
+ * Deduce the read-ahead/look-ahead size from primitive values.
+ *
+ * Input:
+ * - @ra_size stores the estimated thrashing-threshold.
+ * - @la_size stores the look-ahead size of previous request.
+ */
+static int adjust_rala(unsigned long ra_max,
+				unsigned long *ra_size, unsigned long *la_size)
+{
+	unsigned long stream_shift = *la_size;
+
+	/*
+	 * Subtract the old look-ahead to get real safe size for the next
+	 * read-ahead request.
+	 */
+	if (*ra_size > *la_size)
+		*ra_size -= *la_size;
+	else {
+		ra_account(NULL, RA_EVENT_READAHEAD_SHRINK, *ra_size);
+		return 0;
+	}
+
+	/*
+	 * Set new la_size according to the (still large) ra_size.
+	 */
+	*la_size = *ra_size / LOOKAHEAD_RATIO;
+
+	/*
+	 * Apply upper limits.
+	 */
+	if (*ra_size > ra_max)
+		*ra_size = ra_max;
+	if (*la_size > *ra_size)
+		*la_size = *ra_size;
+
+	/*
+	 * Make sure stream_shift is not too small.
+	 * (So that the next global_shift will not be too small.)
+	 */
+	stream_shift += (*ra_size - *la_size);
+	if (stream_shift < *ra_size / 4)
+		*la_size -= (*ra_size / 4 - stream_shift);
+
+	return 1;
+}
+
+/*
+ * The function estimates two values:
+ * 1. thrashing-threshold for the current stream
+ *    It is returned to make the next read-ahead request.
+ * 2. the remaining safe space for the current chunk
+ *    It will be checked to ensure that the current chunk is safe.
+ *
+ * The computation is fairly accurate under heavy load and fluctuates more
+ * under light load (with small global_shift), so the growth speed of ra_size
+ * must be limited, and a moderately large stream_shift must be ensured.
+ *
+ * This figure illustrates the formula used in the function:
+ * While the stream reads stream_shift pages inside the chunks,
+ * the chunks are shifted global_shift pages inside inactive_list.
+ *
+ *      chunk A                    chunk B
+ *                          |<=============== global_shift ================|
+ *  +-------------+         +-------------------+                          |
+ *  |       #     |         |           #       |     inactive_list        |
+ *  +-------------+         +-------------------+          head            |
+ *          |---->|         |---------->|
+ *             |                  |
+ *             +-- stream_shift --+
+ */
+static unsigned long compute_thrashing_threshold(struct file_ra_state *ra,
+						unsigned long *remain)
+{
+	unsigned long global_size;
+	unsigned long global_shift;
+	unsigned long stream_shift;
+	unsigned long ra_size;
+	uint64_t ll;
+
+	global_size = node_free_and_cold_pages();
+	global_shift = node_readahead_aging() - ra->age;
+	global_shift |= 1UL;
+	stream_shift = ra_invoke_interval(ra);
+
+	/* future safe space */
+	ll = (uint64_t) stream_shift * (global_size >> 9) * readahead_ratio * 5;
+	do_div(ll, global_shift);
+	ra_size = ll;
+
+	/* remaining safe space */
+	if (global_size > global_shift) {
+		ll = (uint64_t) stream_shift * (global_size - global_shift);
+		do_div(ll, global_shift);
+		*remain = ll;
+	} else
+		*remain = 0;
+
+	ddprintk("compute_thrashing_threshold: "
+			"at %lu ra %lu=%lu*%lu/%lu, remain %lu for %lu\n",
+			ra->readahead_index, ra_size,
+			stream_shift, global_size, global_shift,
+			*remain, ra_lookahead_size(ra));
+
+	return ra_size;
+}
+
+/*
+ * Main function for file_ra_state based read-ahead.
+ */
+static unsigned long
+state_based_readahead(struct address_space *mapping, struct file *filp,
+		struct file_ra_state *ra,
+		struct page *page, pgoff_t index,
+		unsigned long req_size, unsigned long ra_max)
+{
+	unsigned long ra_old;
+	unsigned long ra_size;
+	unsigned long la_size;
+	unsigned long remain_space;
+	unsigned long growth_limit;
+
+	la_size = ra->readahead_index - index;
+	ra_size = compute_thrashing_threshold(ra, &remain_space);
+
+	if (page && remain_space <= la_size && la_size > 1) {
+		rescue_pages(page, la_size);
+		return 0;
+	}
+
+	ra_old = ra_readahead_size(ra);
+	growth_limit = req_size;
+	growth_limit += ra_max / 16;
+	growth_limit += (2 + readahead_ratio / 64) * ra_old;
+	if (growth_limit > ra_max)
+		growth_limit = ra_max;
+
+	if (!adjust_rala(growth_limit, &ra_size, &la_size))
+		return 0;
+
+	ra_set_class(ra, RA_CLASS_STATE);
+	ra_set_index(ra, index, ra->readahead_index);
+	ra_set_size(ra, ra_size, la_size);
+
+	return ra_dispatch(ra, mapping, filp);
+}
+
+/*
  * ra_min is mainly determined by the size of cache memory. Reasonable?
  *
  * Table of concrete numbers for 4KB page size:
--