From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, Wu Fengguang <wfg@mail.ustc.edu.cn>
Subject: [PATCH 16/33] readahead: state based method
Date: Fri, 26 May 2006 19:39:22 +0800
Message-ID: <348644382.05317@ustc.edu.cn>
Message-ID: <20060526115307.794859372@localhost.localdomain>
In-Reply-To: 20060526113906.084341801@localhost.localdomain

[-- Attachment #1: readahead-method-stateful.patch --]
[-- Type: text/plain, Size: 6157 bytes --]

This is the fast code path of adaptive read-ahead.

MAJOR STEPS
===========

        - estimate a thrashing safe ra_size;
        - assemble the next read-ahead request in file_ra_state;
        - submit it.
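
The second step can be tried out in userspace. The sketch below follows the
arithmetic of adjust_rala() from this patch, with the ra_account() call
dropped. The name split_rala() is hypothetical, and the value 8 for
LOOKAHEAD_RATIO is only an assumption for illustration; the real constant is
defined elsewhere in the series.

```c
#include <stdbool.h>

/* Assumed value for illustration only; not taken from this patch. */
#define LOOKAHEAD_RATIO 8

/*
 * Hypothetical userspace copy of the adjust_rala() arithmetic.
 * On entry *ra_size is the estimated thrashing threshold and *la_size is
 * the look-ahead size of the previous request. On success they hold the
 * read-ahead and look-ahead sizes of the next request.
 */
static bool split_rala(unsigned long ra_max,
		       unsigned long *ra_size, unsigned long *la_size)
{
	unsigned long stream_shift = *la_size;

	/* The old look-ahead pages already occupy part of the safe space. */
	if (*ra_size <= *la_size)
		return false;
	*ra_size -= *la_size;

	/* Derive the new look-ahead from the still uncapped read-ahead. */
	*la_size = *ra_size / LOOKAHEAD_RATIO;

	/* Apply upper limits. */
	if (*ra_size > ra_max)
		*ra_size = ra_max;
	if (*la_size > *ra_size)
		*la_size = *ra_size;

	/* Keep the stream advancing by at least ra_size / 4 per request. */
	stream_shift += *ra_size - *la_size;
	if (stream_shift < *ra_size / 4)
		*la_size -= *ra_size / 4 - stream_shift;

	return true;
}
```

With a threshold of 1000 pages, an old look-ahead of 16 pages and a limit of
256 pages, this yields a 256 page request with 123 look-ahead pages.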


THE REFERENCE MODEL
===================

        1. inactive list has constant length and page flow speed
        2. the observed stream receives a steady flow of read requests
        3. no page activation, so that the inactive list forms a pipe

With that we get the picture shown below.

|<------------------------- constant length ------------------------->|
<<<<<<<<<<<<<<<<<<<<<<<<< steady flow of pages <<<<<<<<<<<<<<<<<<<<<<<<
+---------------------------------------------------------------------+
|tail                        inactive list                        head|
|   =======                  ==========----                           |
|   chunk A(stale pages)     chunk B(stale + fresh pages)             |
+---------------------------------------------------------------------+
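
Under these three assumptions the estimate reduces to a simple proportion:
while global_shift pages flow through a list of global_size pages, the stream
consumes stream_shift pages, so roughly stream_shift * global_size /
global_shift pages can be read ahead before the oldest of them reaches the
tail. A minimal userspace sketch follows; the helper name
pipe_model_threshold() is hypothetical and not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: thrashing threshold under the reference model.
 *
 * global_size:  length of the inactive list, in pages
 * global_shift: pages that flowed through the list while being observed
 * stream_shift: pages the stream consumed over the same period
 */
static unsigned long pipe_model_threshold(unsigned long global_size,
					  unsigned long global_shift,
					  unsigned long stream_shift)
{
	uint64_t pages;

	/*
	 * Guard against division by zero. The patch appears to serve the
	 * same purpose with "global_shift |= 1UL".
	 */
	if (global_shift == 0)
		global_shift = 1;

	/* 64-bit intermediate so the product cannot overflow on 32-bit. */
	pages = (uint64_t)stream_shift * global_size / global_shift;
	return (unsigned long)pages;
}
```

For example, a stream that reads 32 pages while 1000 pages flow through a
100000 page list gets an estimate of 3200 pages.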


REAL WORLD ISSUES
=================

Real-world workloads always fluctuate, which violates assumptions 1 and 2.
To counteract this, a tunable parameter, readahead_ratio, is introduced to
make the estimate conservative enough. Violating assumption 3 does not lead
to thrashing; the assumption is there only to simplify the discussion.
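
To illustrate how readahead_ratio scales the estimate, the sketch below
compares an exact percentage with the shift-based form used in the patch,
(global_size >> 9) * readahead_ratio * 5. It assumes readahead_ratio is a
percentage, which the scaling suggests, since (x >> 9) * 5 is about x / 102.4.
Both helper names are hypothetical, and global_shift must be non-zero.

```c
#include <stdint.h>

/* Exact form: scale the pipe-model estimate by ratio percent. */
static unsigned long scaled_exact(unsigned long global_size,
				  unsigned long global_shift,
				  unsigned long stream_shift,
				  unsigned long ratio)
{
	uint64_t ll = (uint64_t)stream_shift * global_size * ratio / 100;

	return (unsigned long)(ll / global_shift);
}

/*
 * Shift-based form, as in the patch: the division by 100 is replaced by
 * a right shift by 9 and a multiplication by 5 (5 / 512 is about 1 / 102.4),
 * which slightly underestimates the exact value.
 */
static unsigned long scaled_shift(unsigned long global_size,
				  unsigned long global_shift,
				  unsigned long stream_shift,
				  unsigned long ratio)
{
	uint64_t ll = (uint64_t)stream_shift * (global_size >> 9) * ratio * 5;

	return (unsigned long)(ll / global_shift);
}
```

With global_size = 512000, global_shift = 1000, stream_shift = 32 and a ratio
of 50, the exact form gives 8192 pages and the shift-based form gives 8000.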

Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---

 mm/readahead.c |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 147 insertions(+)

--- linux-2.6.17-rc4-mm3.orig/mm/readahead.c
+++ linux-2.6.17-rc4-mm3/mm/readahead.c
@@ -1038,6 +1038,153 @@ static int ra_dispatch(struct file_ra_st
 }
 
 /*
+ * Deduce the read-ahead/look-ahead size from primitive values.
+ *
+ * Input:
+ *	- @ra_size stores the estimated thrashing-threshold.
+ *	- @la_size stores the look-ahead size of previous request.
+ */
+static int adjust_rala(unsigned long ra_max,
+				unsigned long *ra_size, unsigned long *la_size)
+{
+	unsigned long stream_shift = *la_size;
+
+	/*
+	 * Subtract the old look-ahead to get real safe size for the next
+	 * read-ahead request.
+	 */
+	if (*ra_size > *la_size)
+		*ra_size -= *la_size;
+	else {
+		ra_account(NULL, RA_EVENT_READAHEAD_SHRINK, *ra_size);
+		return 0;
+	}
+
+	/*
+	 * Set new la_size according to the (still large) ra_size.
+	 */
+	*la_size = *ra_size / LOOKAHEAD_RATIO;
+
+	/*
+	 * Apply upper limits.
+	 */
+	if (*ra_size > ra_max)
+		*ra_size = ra_max;
+	if (*la_size > *ra_size)
+		*la_size = *ra_size;
+
+	/*
+	 * Make sure stream_shift is not too small.
+	 * (So that the next global_shift will not be too small.)
+	 */
+	stream_shift += (*ra_size - *la_size);
+	if (stream_shift < *ra_size / 4)
+		*la_size -= (*ra_size / 4 - stream_shift);
+
+	return 1;
+}
+
+/*
+ * The function estimates two values:
+ * 1. thrashing-threshold for the current stream
+ *    It is returned to make the next read-ahead request.
+ * 2. the remaining safe space for the current chunk
+ *    It will be checked to ensure that the current chunk is safe.
+ *
+ * The computation is fairly accurate under heavy load and fluctuates more
+ * under light load (with a small global_shift), so the growth of ra_size
+ * must be limited and a moderately large stream_shift must be ensured.
+ *
+ * This figure illustrates the formula used in the function:
+ * While the stream reads stream_shift pages inside the chunks,
+ * the chunks are shifted global_shift pages inside inactive_list.
+ *
+ *      chunk A                    chunk B
+ *                          |<=============== global_shift ================|
+ *  +-------------+         +-------------------+                          |
+ *  |       #     |         |           #       |            inactive_list |
+ *  +-------------+         +-------------------+                     head |
+ *          |---->|         |---------->|
+ *             |                  |
+ *             +-- stream_shift --+
+ */
+static unsigned long compute_thrashing_threshold(struct file_ra_state *ra,
+							unsigned long *remain)
+{
+	unsigned long global_size;
+	unsigned long global_shift;
+	unsigned long stream_shift;
+	unsigned long ra_size;
+	uint64_t ll;
+
+	global_size = node_free_and_cold_pages();
+	global_shift = node_readahead_aging() - ra->age;
+	global_shift |= 1UL;
+	stream_shift = ra_invoke_interval(ra);
+
+	/* future safe space */
+	ll = (uint64_t) stream_shift * (global_size >> 9) * readahead_ratio * 5;
+	do_div(ll, global_shift);
+	ra_size = ll;
+
+	/* remaining safe space */
+	if (global_size > global_shift) {
+		ll = (uint64_t) stream_shift * (global_size - global_shift);
+		do_div(ll, global_shift);
+		*remain = ll;
+	} else
+		*remain = 0;
+
+	ddprintk("compute_thrashing_threshold: "
+			"at %lu ra %lu=%lu*%lu/%lu, remain %lu for %lu\n",
+			ra->readahead_index, ra_size,
+			stream_shift, global_size, global_shift,
+			*remain, ra_lookahead_size(ra));
+
+	return ra_size;
+}
+
+/*
+ * Main function for file_ra_state based read-ahead.
+ */
+static unsigned long
+state_based_readahead(struct address_space *mapping, struct file *filp,
+			struct file_ra_state *ra,
+			struct page *page, pgoff_t index,
+			unsigned long req_size, unsigned long ra_max)
+{
+	unsigned long ra_old;
+	unsigned long ra_size;
+	unsigned long la_size;
+	unsigned long remain_space;
+	unsigned long growth_limit;
+
+	la_size = ra->readahead_index - index;
+	ra_size = compute_thrashing_threshold(ra, &remain_space);
+
+	if (page && remain_space <= la_size && la_size > 1) {
+		rescue_pages(page, la_size);
+		return 0;
+	}
+
+	ra_old = ra_readahead_size(ra);
+	growth_limit = req_size;
+	growth_limit += ra_max / 16;
+	growth_limit += (2 + readahead_ratio / 64) * ra_old;
+	if (growth_limit > ra_max)
+		growth_limit = ra_max;
+
+	if (!adjust_rala(growth_limit, &ra_size, &la_size))
+		return 0;
+
+	ra_set_class(ra, RA_CLASS_STATE);
+	ra_set_index(ra, index, ra->readahead_index);
+	ra_set_size(ra, ra_size, la_size);
+
+	return ra_dispatch(ra, mapping, filp);
+}
+
+/*
  * ra_min is mainly determined by the size of cache memory. Reasonable?
  *
  * Table of concrete numbers for 4KB page size:
