From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, Wu Fengguang <wfg@mail.ustc.edu.cn>
Subject: [PATCH 15/32] readahead: state based method
Date: Sat, 27 May 2006 23:49:04 +0800
Message-ID: <348745092.16246@ustc.edu.cn>
Message-ID: <20060527155133.216888332@localhost.localdomain>
In-Reply-To: 20060527154849.927021763@localhost.localdomain

[-- Attachment #1: readahead-method-stateful.patch --]
[-- Type: text/plain, Size: 6174 bytes --]

This is the fast code path of adaptive read-ahead.

MAJOR STEPS
===========

        - estimate a thrashing safe ra_size;
        - assemble the next read-ahead request in file_ra_state;
        - submit it.
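
A rough user-space sketch of how the second step sizes the request, mirroring
adjust_rala() and the growth limit in state_based_readahead() from the patch
below. The sketch_ names are hypothetical, the ra_account() call is left out,
and the value 8 for LOOKAHEAD_RATIO is an assumption, since the macro is
defined elsewhere in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value; the real LOOKAHEAD_RATIO is defined elsewhere in the series. */
#define SKETCH_LOOKAHEAD_RATIO 8UL

/* Cap on how far the next request may grow, as in state_based_readahead(). */
static unsigned long sketch_growth_limit(unsigned long req_size,
					 unsigned long ra_old,
					 unsigned long ra_max,
					 unsigned long readahead_ratio)
{
	unsigned long limit = req_size + ra_max / 16 +
			      (2 + readahead_ratio / 64) * ra_old;

	return limit > ra_max ? ra_max : limit;
}

/*
 * Mirror of adjust_rala() without event accounting.
 * On entry *ra_size is the estimated thrashing threshold and *la_size the
 * look-ahead of the previous request. Returns false if nothing is safe to read.
 */
static bool sketch_adjust_rala(unsigned long ra_max,
			       unsigned long *ra_size, unsigned long *la_size)
{
	unsigned long stream_shift;

	assert(ra_size && la_size);
	stream_shift = *la_size;

	/* Subtract the old look-ahead to get the real safe size. */
	if (*ra_size <= *la_size)
		return false;
	*ra_size -= *la_size;

	/* New look-ahead from the (still uncapped) read-ahead size. */
	*la_size = *ra_size / SKETCH_LOOKAHEAD_RATIO;

	/* Apply upper limits. */
	if (*ra_size > ra_max)
		*ra_size = ra_max;
	if (*la_size > *ra_size)
		*la_size = *ra_size;

	/* Keep stream_shift at no less than a quarter of ra_size. */
	stream_shift += *ra_size - *la_size;
	if (stream_shift < *ra_size / 4)
		*la_size -= *ra_size / 4 - stream_shift;

	return true;
}
```

For example, with req_size 4, ra_old 32, ra_max 256 and readahead_ratio 50 the
growth limit is 84 pages; a threshold estimate of 200 pages with an old
look-ahead of 8 is then trimmed to ra_size 84 and la_size 24.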


THE REFERENCE MODEL
===================

        1. inactive list has constant length and page flow speed
        2. the observed stream receives a steady flow of read requests
        3. no page activation, so that the inactive list forms a pipe

With that we get the picture shown below.

|<------------------------- constant length ------------------------->|
<<<<<<<<<<<<<<<<<<<<<<<<< steady flow of pages <<<<<<<<<<<<<<<<<<<<<<<<
+---------------------------------------------------------------------+
|tail                        inactive list                        head|
|   =======                  ==========----                           |
|   chunk A(stale pages)     chunk B(stale + fresh pages)             |
+---------------------------------------------------------------------+
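
Under this model the estimate reduces to a proportion: if the stream advanced
stream_shift pages while the inactive list advanced global_shift pages, then a
list of global_size pages lets the stream read about
stream_shift * global_size / global_shift pages before its oldest page falls
off the tail. A minimal user-space sketch of that arithmetic, following
compute_thrashing_threshold() in the patch below but leaving out the
readahead_ratio scaling; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_estimate {
	uint64_t threshold;	/* pages the stream may read ahead safely */
	uint64_t remain;	/* safe pages left for the current chunk */
};

static struct sketch_estimate
sketch_thrashing_threshold(uint64_t global_size, uint64_t global_shift,
			   uint64_t stream_shift)
{
	struct sketch_estimate e;

	/* The patch forces global_shift odd; this sketch just rejects zero. */
	assert(global_shift != 0);

	/* Future safe space under the ideal pipe model. */
	e.threshold = stream_shift * global_size / global_shift;

	/* Remaining safe space: the chunk already moved global_shift pages. */
	if (global_size > global_shift)
		e.remain = stream_shift * (global_size - global_shift) /
			   global_shift;
	else
		e.remain = 0;

	return e;
}
```

With an inactive list of 100000 pages that moved 2000 pages while the stream
read 32, the sketch gives a threshold of 1600 pages and 1568 pages of
remaining safe space.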


REAL WORLD ISSUES
=================

Real-world workloads will always fluctuate, violating assumptions 1 and 2. To
counteract this, a tunable parameter, readahead_ratio, is introduced to keep
the estimate conservative. Violating assumption 3 will not lead to thrashing;
the assumption is there only to simplify the discussion.
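
To illustrate how readahead_ratio scales the estimate down: the patch below
multiplies by (global_size >> 9) * readahead_ratio * 5, and since 5/512 is
close to 1/100 this reads as treating readahead_ratio roughly as a percentage.
That reading is an interpretation of the code, not something stated in this
message. A small user-space sketch with a hypothetical function name:

```c
#include <stdint.h>

/*
 * Conservative ra_size as computed for the "future safe space" in the patch:
 * roughly stream_shift * global_size / global_shift * readahead_ratio / 100.
 */
static uint64_t sketch_safe_ra_size(uint64_t global_size,
				    uint64_t global_shift,
				    uint64_t stream_shift,
				    unsigned int readahead_ratio)
{
	uint64_t ll;

	/* As in the patch: make global_shift odd so it is never zero. */
	global_shift |= 1;

	/* (x >> 9) * 5 is x * 5 / 512, i.e. about x / 102.4. */
	ll = stream_shift * (global_size >> 9) * readahead_ratio * 5;

	return ll / global_shift;
}
```

For global_size 102400, global_shift 2000, stream_shift 32 and
readahead_ratio 50, this yields 799 pages, about half of the unscaled
32 * 102400 / 2000 = 1638.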

Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---

 mm/readahead.c |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 147 insertions(+)

--- linux-2.6.17-rc4-mm3.orig/mm/readahead.c
+++ linux-2.6.17-rc4-mm3/mm/readahead.c
@@ -1002,6 +1002,153 @@ static int ra_dispatch(struct file_ra_st
 }
 
 /*
+ * Deduce the read-ahead/look-ahead size from primitive values.
+ *
+ * Input:
+ *	- @ra_size stores the estimated thrashing-threshold.
+ *	- @la_size stores the look-ahead size of previous request.
+ */
+static int adjust_rala(unsigned long ra_max,
+				unsigned long *ra_size, unsigned long *la_size)
+{
+	unsigned long stream_shift = *la_size;
+
+	/*
+	 * Subtract the old look-ahead to get real safe size for the next
+	 * read-ahead request.
+	 */
+	if (*ra_size > *la_size)
+		*ra_size -= *la_size;
+	else {
+		ra_account(NULL, RA_EVENT_READAHEAD_SHRINK, *ra_size);
+		return 0;
+	}
+
+	/*
+	 * Set new la_size according to the (still large) ra_size.
+	 */
+	*la_size = *ra_size / LOOKAHEAD_RATIO;
+
+	/*
+	 * Apply upper limits.
+	 */
+	if (*ra_size > ra_max)
+		*ra_size = ra_max;
+	if (*la_size > *ra_size)
+		*la_size = *ra_size;
+
+	/*
+	 * Make sure stream_shift is not too small.
+	 * (So that the next global_shift will not be too small.)
+	 */
+	stream_shift += (*ra_size - *la_size);
+	if (stream_shift < *ra_size / 4)
+		*la_size -= (*ra_size / 4 - stream_shift);
+
+	return 1;
+}
+
+/*
+ * The function estimates two values:
+ * 1. thrashing-threshold for the current stream
+ *    It is returned to make the next read-ahead request.
+ * 2. the remaining safe space for the current chunk
+ *    It will be checked to ensure that the current chunk is safe.
+ *
+ * The computation will be pretty accurate under heavy load, and will fluctuate
+ * more under light load (with a small global_shift), so the growth speed of
+ * ra_size must be limited, and a moderately large stream_shift must be ensured.
+ *
+ * This figure illustrates the formula used in the function:
+ * While the stream reads stream_shift pages inside the chunks,
+ * the chunks are shifted global_shift pages inside inactive_list.
+ *
+ *      chunk A                    chunk B
+ *                          |<=============== global_shift ================|
+ *  +-------------+         +-------------------+                          |
+ *  |       #     |         |           #       |            inactive_list |
+ *  +-------------+         +-------------------+                     head |
+ *          |---->|         |---------->|
+ *             |                  |
+ *             +-- stream_shift --+
+ */
+static unsigned long compute_thrashing_threshold(struct file_ra_state *ra,
+							unsigned long *remain)
+{
+	unsigned long global_size;
+	unsigned long global_shift;
+	unsigned long stream_shift;
+	unsigned long ra_size;
+	uint64_t ll;
+
+	global_size = nr_free_inactive_pages_node(numa_node_id());
+	global_shift = node_readahead_aging() - ra->age;
+	global_shift |= 1UL;
+	stream_shift = ra_invoke_interval(ra);
+
+	/* future safe space */
+	ll = (uint64_t) stream_shift * (global_size >> 9) * readahead_ratio * 5;
+	do_div(ll, global_shift);
+	ra_size = ll;
+
+	/* remaining safe space */
+	if (global_size > global_shift) {
+		ll = (uint64_t) stream_shift * (global_size - global_shift);
+		do_div(ll, global_shift);
+		*remain = ll;
+	} else
+		*remain = 0;
+
+	ddprintk("compute_thrashing_threshold: "
+			"at %lu ra %lu=%lu*%lu/%lu, remain %lu for %lu\n",
+			ra->readahead_index, ra_size,
+			stream_shift, global_size, global_shift,
+			*remain, ra_lookahead_size(ra));
+
+	return ra_size;
+}
+
+/*
+ * Main function for file_ra_state based read-ahead.
+ */
+static unsigned long
+state_based_readahead(struct address_space *mapping, struct file *filp,
+			struct file_ra_state *ra,
+			struct page *page, pgoff_t index,
+			unsigned long req_size, unsigned long ra_max)
+{
+	unsigned long ra_old;
+	unsigned long ra_size;
+	unsigned long la_size;
+	unsigned long remain_space;
+	unsigned long growth_limit;
+
+	la_size = ra->readahead_index - index;
+	ra_size = compute_thrashing_threshold(ra, &remain_space);
+
+	if (page && remain_space <= la_size && la_size > 1) {
+		rescue_pages(page, la_size);
+		return 0;
+	}
+
+	ra_old = ra_readahead_size(ra);
+	growth_limit = req_size;
+	growth_limit += ra_max / 16;
+	growth_limit += (2 + readahead_ratio / 64) * ra_old;
+	if (growth_limit > ra_max)
+		growth_limit = ra_max;
+
+	if (!adjust_rala(growth_limit, &ra_size, &la_size))
+		return 0;
+
+	ra_set_class(ra, RA_CLASS_STATE);
+	ra_set_index(ra, index, ra->readahead_index);
+	ra_set_size(ra, ra_size, la_size);
+
+	return ra_dispatch(ra, mapping, filp);
+}
+
+/*
  * ra_min is mainly determined by the size of cache memory. Reasonable?
  *
  * Table of concrete numbers for 4KB page size:

--