From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>,
Chris Frost <frost@cs.ucla.edu>,
Steve VanDeBogart <vandebo@cs.ucla.edu>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Wu Fengguang <fengguang.wu@intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Olivier Galibert <galibert@pobox.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 02/11] readahead: retain inactive lru pages to be accessed soon
Date: Sun, 07 Feb 2010 12:10:15 +0800
Message-ID: <20100207041042.996584378@intel.com>
In-Reply-To: 20100207041013.891441102@intel.com
[-- Attachment #1: readahead-retain-pages-find_get_page.patch --]
[-- Type: text/plain, Size: 3571 bytes --]
From: Chris Frost <frost@cs.ucla.edu>
Ensure that cached pages in the inactive list are not prematurely evicted;
move such pages to the LRU head when they are covered by
- in-kernel heuristic readahead
- a posix_fadvise(POSIX_FADV_WILLNEED) hint from an application
Before this patch, pages already in core may be evicted before the
pages that are covered by the same prefetch scan but were not yet in
core. This behavior may force many small read requests on the disk.
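The eviction order described above can be illustrated with a small
userspace toy model. This is only a sketch, not kernel code: struct
toy_lru and all toy_* helpers are hypothetical names invented for this
illustration, and the "inactive list" is a plain array whose slot 0 is
the head and whose last slot is the next eviction victim.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_LRU_MAX 16

/* Toy inactive list: slot 0 is the head, slot len - 1 is the tail (next victim). */
struct toy_lru {
	long pages[TOY_LRU_MAX];
	size_t len;
};

/* Return the slot holding page, or -1 if it is not on the list. */
static int toy_lru_find(const struct toy_lru *lru, long page)
{
	for (size_t i = 0; i < lru->len; i++)
		if (lru->pages[i] == page)
			return (int)i;
	return -1;
}

/* Insert page at the head; returns false if the list is full. */
static bool toy_lru_add_head(struct toy_lru *lru, long page)
{
	if (lru->len == TOY_LRU_MAX)
		return false;
	memmove(&lru->pages[1], &lru->pages[0],
		lru->len * sizeof(lru->pages[0]));
	lru->pages[0] = page;
	lru->len++;
	return true;
}

/* Evict the tail page and return it, or -1 if the list is empty. */
static long toy_lru_evict_tail(struct toy_lru *lru)
{
	if (lru->len == 0)
		return -1;
	return lru->pages[--lru->len];
}

/* Move an already listed page to the head: delete it, then re-add it. */
static void toy_lru_retain(struct toy_lru *lru, long page)
{
	int slot = toy_lru_find(lru, page);

	if (slot < 0)
		return;
	memmove(&lru->pages[slot], &lru->pages[slot + 1],
		(lru->len - slot - 1) * sizeof(lru->pages[0]));
	lru->len--;
	toy_lru_add_head(lru, page);
}

/*
 * Prefetch [start, start + nr): pages missing from the list are added at
 * the head; cached pages are moved to the head only when retain is true.
 */
static void toy_prefetch(struct toy_lru *lru, long start, int nr, bool retain)
{
	for (int i = 0; i < nr; i++) {
		long page = start + i;

		if (toy_lru_find(lru, page) < 0)
			toy_lru_add_head(lru, page);
		else if (retain)
			toy_lru_retain(lru, page);
	}
}
```

With pages 0 and 1 sitting at the tail, toy_prefetch(&lru, 0, 4, false)
leaves them as the next two victims even though pages 2 and 3 of the
same range were just read in; with retain set to true, the two oldest
unrelated pages are evicted first instead.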
In particular, posix_fadvise(... POSIX_FADV_WILLNEED) on an in-core page
has no effect on the page's location in the LRU list, even if it is the
next victim on the inactive list.
This change helps address the performance problems we encountered
while modifying SQLite and the GIMP to use large file prefetching.
Overall, these prefetching techniques improved the runtime of large
benchmarks by 10-17x for these applications. For details, see the
publication _Reducing Seek Overhead with Application-Directed
Prefetching_ in USENIX ATC 2009 and http://libprefetch.cs.ucla.edu/.
Signed-off-by: Chris Frost <frost@cs.ucla.edu>
Signed-off-by: Steve VanDeBogart <vandebo@cs.ucla.edu>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/readahead.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
--- linux.orig/mm/readahead.c 2010-02-01 10:18:57.000000000 +0800
+++ linux/mm/readahead.c 2010-02-01 10:20:51.000000000 +0800
@@ -9,7 +9,9 @@
#include <linux/kernel.h>
#include <linux/fs.h>
+#include <linux/memcontrol.h>
#include <linux/mm.h>
+#include <linux/mm_inline.h>
#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
@@ -133,6 +135,40 @@ out:
}
/*
+ * The file range is expected to be accessed in the near future. Move pages
+ * (possibly at the inactive lru tail) to the lru head, so that they are
+ * retained in memory for some reasonable time.
+ */
+static void retain_inactive_pages(struct address_space *mapping,
+				  pgoff_t index, int len)
+{
+	int i;
+	struct page *page;
+	struct zone *zone;
+
+	for (i = 0; i < len; i++) {
+		page = find_get_page(mapping, index + i);
+		if (!page)
+			continue;
+
+		zone = page_zone(page);
+		spin_lock_irq(&zone->lru_lock);
+
+		if (PageLRU(page) &&
+		    !PageActive(page) &&
+		    !PageUnevictable(page)) {
+			int lru = page_lru_base_type(page);
+
+			del_page_from_lru_list(zone, page, lru);
+			add_page_to_lru_list(zone, page, lru);
+		}
+
+		spin_unlock_irq(&zone->lru_lock);
+		put_page(page);
+	}
+}
+
+/*
* __do_page_cache_readahead() actually reads a chunk of disk. It allocates all
* the pages first, then submits them all for I/O. This avoids the very bad
* behaviour which would occur if page allocations are causing VM writeback.
@@ -184,6 +220,14 @@ __do_page_cache_readahead(struct address
}
/*
+ * Normally readahead will auto stop on cached segments, so we won't
+ * hit many cached pages. If it does happen, bring the inactive pages
+ * adjacent to the newly prefetched ones (if any).
+ */
+ if (ret < nr_to_read)
+ retain_inactive_pages(mapping, offset, page_idx);
+
+ /*
* Now start the IO. We ignore I/O errors - if the page is not
* uptodate then the caller will launch readpage again, and
* will then handle the error.