public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, npiggin@suse.de
Subject: [PATCH] Improve heuristic detecting sequential reads
Date: Tue, 10 Apr 2007 17:54:11 +0200	[thread overview]
Message-ID: <20070410155411.GF13445@duck.suse.cz> (raw)

[-- Attachment #1: Type: text/plain, Size: 614 bytes --]

  Hello!

  In thread http://lkml.org/lkml/2007/3/9/403, we discussed a problem
with the current heuristic for detecting sequential IO in
do_generic_mapping_read() - for small files a page is marked as accessed
only once which can cause a performance problems.
  Attached is a patch that improves the heuristic by introducing a
ra.offset variable so now we can reliably detect sequetial reads.
The patch replaces Nick's patch from http://lkml.org/lkml/2007/3/15/177
(Nick has acked to replace the patch). Could you please put the patch
into -mm?

							Bye
								Honza

-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

[-- Attachment #2: readahead-2.6.21-rc6.diff --]
[-- Type: text/x-patch, Size: 3581 bytes --]

Introduce ra.offset and store in it an offset where the previous read ended. This way
we can detect whether reads are really sequential (and thus we should not mark the page
as accessed repeatedly) or whether they are random and just happen to be in the same page
(and the page should really be marked accessed again).

Signed-off-by: Jan Kara <jack@suse.cz>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.21-rc6/include/linux/fs.h linux-2.6.21-rc6-1-readahead/include/linux/fs.h
--- linux-2.6.21-rc6/include/linux/fs.h	2007-04-10 17:09:58.000000000 +0200
+++ linux-2.6.21-rc6-1-readahead/include/linux/fs.h	2007-04-10 17:14:42.000000000 +0200
@@ -702,6 +702,7 @@ struct file_ra_state {
 	unsigned long ra_pages;		/* Maximum readahead window */
 	unsigned long mmap_hit;		/* Cache hit stat for mmap accesses */
 	unsigned long mmap_miss;	/* Cache miss stat for mmap accesses */
+	unsigned int offset;		/* Offset where last read() ended in a page */
 };
 #define RA_FLAG_MISS 0x01	/* a cache miss occured against this file */
 #define RA_FLAG_INCACHE 0x02	/* file is already in cache */
diff -rupX /home/jack/.kerndiffexclude linux-2.6.21-rc6/mm/filemap.c linux-2.6.21-rc6-1-readahead/mm/filemap.c
--- linux-2.6.21-rc6/mm/filemap.c	2007-04-10 17:10:00.000000000 +0200
+++ linux-2.6.21-rc6-1-readahead/mm/filemap.c	2007-04-10 17:18:10.000000000 +0200
@@ -868,6 +868,7 @@ void do_generic_mapping_read(struct addr
 	unsigned long last_index;
 	unsigned long next_index;
 	unsigned long prev_index;
+	unsigned int prev_offset;
 	loff_t isize;
 	struct page *cached_page;
 	int error;
@@ -877,6 +878,7 @@ void do_generic_mapping_read(struct addr
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	next_index = index;
 	prev_index = ra.prev_page;
+	prev_offset = ra.offset;
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
@@ -924,10 +926,10 @@ page_ok:
 			flush_dcache_page(page);
 
 		/*
-		 * When (part of) the same page is read multiple times
-		 * in succession, only mark it as accessed the first time.
+		 * When a sequential read accesses a page several times,
+		 * only mark it as accessed the first time.
 		 */
-		if (prev_index != index)
+		if (prev_index != index || offset != prev_offset)
 			mark_page_accessed(page);
 		prev_index = index;
 
@@ -945,6 +947,7 @@ page_ok:
 		offset += ret;
 		index += offset >> PAGE_CACHE_SHIFT;
 		offset &= ~PAGE_CACHE_MASK;
+		prev_offset = ra.offset = offset;
 
 		page_cache_release(page);
 		if (ret == nr && desc->count)
diff -rupX /home/jack/.kerndiffexclude linux-2.6.21-rc6/mm/readahead.c linux-2.6.21-rc6-1-readahead/mm/readahead.c
--- linux-2.6.21-rc6/mm/readahead.c	2007-04-10 17:10:00.000000000 +0200
+++ linux-2.6.21-rc6-1-readahead/mm/readahead.c	2007-04-10 17:14:42.000000000 +0200
@@ -207,6 +207,8 @@ out:
  *              If page_cache_readahead sees that it is again being called for
  *              a page which it just looked at, it can return immediately without
  *              making any state changes.
+ * offset:      Offset in the prev_page where the last read ended - used for
+ *              detection of sequential file reading.
  * ahead_start,
  * ahead_size:  Together, these form the "ahead window".
  * ra_pages:	The externally controlled max readahead for this fd.
@@ -473,6 +475,7 @@ page_cache_readahead(struct address_spac
 	/* Note that prev_page == -1 if it is a first read */
 	sequential = (offset == ra->prev_page + 1);
 	ra->prev_page = offset;
+	ra->offset = 0;
 
 	max = get_max_readahead(ra);
 	newsize = min(req_size, max);

             reply	other threads:[~2007-04-10 15:46 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-10 15:54 Jan Kara [this message]
2007-04-10 22:56 ` [PATCH] Improve heuristic detecting sequential reads Andi Kleen
2007-04-10 23:45   ` Andrew Morton
     [not found]     ` <20070411035448.GA4724@mail.ustc.edu.cn>
2007-04-11  3:54       ` WU Fengguang
2007-04-11 10:20     ` Andi Kleen
2007-04-11  2:27 ` Andrew Morton
2007-04-11 12:17   ` Jan Kara
2007-04-12 15:57   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070410155411.GF13445@duck.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox