public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/2] readahead: improve sequential read detection
@ 2005-03-02 19:08 Oleg Nesterov
  2005-03-03  2:01 ` Andrew Morton
  2005-03-09 23:52 ` Ram
  0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2005-03-02 19:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ram Pai, Steven Pratt, Andrew Morton

1. Current code can't always detect sequential reading, in case
   when read size is not PAGE_CACHE_SIZE aligned.

   If application reads the file by 4096+512 chunks, we have:
   1st read: first read detected, prev_page = 2.
   2nd read: offset == 2, the read is considered random.

   page_cache_readahead() should treat prev_page == offset as
   sequential access. In this case it is better to ++offset,
   because of blockable_page_cache_readahead(offset, size).

2. If application reads 4096 bytes with *ppos == 512, we have to
   read 2 pages, but req_size == 1 in do_generic_mapping_read().

   Usually it's not a problem. But in random read case it results
   in unnecessary page cache misses.

~$ time dd conv=notrunc if=/tmp/GIG of=/tmp/dummy bs=$((4096+512))

2.6.11-clean:	real=370.35 user=0.16 sys=14.66
2.6.11-patched:	real=234.49 user=0.19 sys=12.41

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 2.6.11/mm/readahead.c~	2005-02-27 21:57:43.000000000 +0300
+++ 2.6.11/mm/readahead.c	2005-03-01 23:00:21.000000000 +0300
@@ -443,26 +443,26 @@ page_cache_readahead(struct address_spac
 		     struct file *filp, unsigned long offset,
 		     unsigned long req_size)
 {
-	unsigned long max, newsize = req_size;
-	int sequential = (offset == ra->prev_page + 1);
+	unsigned long max, newsize;
+	int sequential;
 
 	/*
 	 * Here we detect the case where the application is performing
 	 * sub-page sized reads.  We avoid doing extra work and bogusly
 	 * perturbing the readahead window expansion logic.
-	 * If size is zero, there is no read ahead window so we need one
 	 */
-	if (offset == ra->prev_page && req_size == 1)
-		goto out;
+	if (offset == ra->prev_page && --req_size)
+		++offset;
 
+	sequential = (offset == ra->prev_page + 1);
 	ra->prev_page = offset;
+
 	max = get_max_readahead(ra);
 	newsize = min(req_size, max);
 
-	if (newsize == 0 || (ra->flags & RA_FLAG_INCACHE)) {
-		newsize = 1;
-		goto out;	/* No readahead or file already in cache */
-	}
+	/* No readahead or file already in cache or sub-page sized read */
+	if (newsize == 0 || (ra->flags & RA_FLAG_INCACHE))
+		goto out;
 
 	ra->prev_page += newsize - 1;
 
@@ -527,7 +527,7 @@ page_cache_readahead(struct address_spac
 	}
 
 out:
-	return newsize;
+	return ra->prev_page + 1;
 }
 
 /*
--- 2.6.11/mm/filemap.c~	2005-02-25 14:32:52.000000000 +0300
+++ 2.6.11/mm/filemap.c	2005-03-01 22:26:10.000000000 +0300
@@ -691,7 +691,7 @@ void do_generic_mapping_read(struct addr
 	unsigned long index;
 	unsigned long end_index;
 	unsigned long offset;
-	unsigned long req_size;
+	unsigned long last_index;
 	unsigned long next_index;
 	unsigned long prev_index;
 	loff_t isize;
@@ -703,7 +703,7 @@ void do_generic_mapping_read(struct addr
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	next_index = index;
 	prev_index = ra.prev_page;
-	req_size = (desc->count + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
 	isize = i_size_read(inode);
@@ -713,7 +713,7 @@ void do_generic_mapping_read(struct addr
 	end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
 	for (;;) {
 		struct page *page;
-		unsigned long ret_size, nr, ret;
+		unsigned long nr, ret;
 
 		/* nr is the maximum number of bytes to copy from this page */
 		nr = PAGE_CACHE_SIZE;
@@ -728,12 +728,9 @@ void do_generic_mapping_read(struct addr
 		nr = nr - offset;
 
 		cond_resched();
-		if (index == next_index && req_size) {
-			ret_size = page_cache_readahead(mapping, &ra,
-					filp, index, req_size);
-			next_index += ret_size;
-			req_size -= ret_size;
-		}
+		if (index == next_index)
+			next_index = page_cache_readahead(mapping, &ra, filp,
+					index, last_index - index);
 
 find_page:
 		page = find_get_page(mapping, index);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] readahead: improve sequential read detection
  2005-03-02 19:08 [PATCH 2/2] readahead: improve sequential read detection Oleg Nesterov
@ 2005-03-03  2:01 ` Andrew Morton
  2005-03-09 23:52 ` Ram
  1 sibling, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2005-03-03  2:01 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, linuxram, slpratt

Oleg Nesterov <oleg@tv-sign.ru> wrote:
>
> ~$ time dd conv=notrunc if=/tmp/GIG of=/tmp/dummy bs=$((4096+512))
> 
>  2.6.11-clean:	real=370.35 user=0.16 sys=14.66
>  2.6.11-patched:	real=234.49 user=0.19 sys=12.41

whoa, nice.  Ram, can you put this through the torture-test sometime?

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] readahead: improve sequential read detection
  2005-03-02 19:08 [PATCH 2/2] readahead: improve sequential read detection Oleg Nesterov
  2005-03-03  2:01 ` Andrew Morton
@ 2005-03-09 23:52 ` Ram
  2005-03-10  0:03   ` Steven Pratt
  2005-03-10 17:14   ` Oleg Nesterov
  1 sibling, 2 replies; 6+ messages in thread
From: Ram @ 2005-03-09 23:52 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Steven Pratt, Andrew Morton

On Wed, 2005-03-02 at 11:08, Oleg Nesterov wrote:


..snip...

> @@ -527,7 +527,7 @@ page_cache_readahead(struct address_spac
>  	}
>  
>  out:
> -	return newsize;
> +	return ra->prev_page + 1;

This change introduces one key behavioural change in
page_cache_readahead(). Instead of returning the number-of-pages
successfully read, it now returns the next-page-index which is yet to be
read. Was this essential? 

At least, a comment towards this effect at the top of the function is
worth adding.

RP


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] readahead: improve sequential read detection
  2005-03-09 23:52 ` Ram
@ 2005-03-10  0:03   ` Steven Pratt
  2005-03-10  0:06     ` Ram
  2005-03-10 17:14   ` Oleg Nesterov
  1 sibling, 1 reply; 6+ messages in thread
From: Steven Pratt @ 2005-03-10  0:03 UTC (permalink / raw)
  To: Ram; +Cc: Oleg Nesterov, linux-kernel, Andrew Morton

Ram wrote:

>On Wed, 2005-03-02 at 11:08, Oleg Nesterov wrote:
>
>
>..snip...
>
>  
>
>>@@ -527,7 +527,7 @@ page_cache_readahead(struct address_spac
>> 	}
>> 
>> out:
>>-	return newsize;
>>+	return ra->prev_page + 1;
>>    
>>
>
>This change introduces one key behavioural change in
>page_cache_readahead(). Instead of returning the number-of-pages
>successfully read, it now returns the next-page-index which is yet to be
>read. Was this essential? 
>
>  
>
and unless filmap.c was changed accordingly this is broken..  need. to 
look at this more.

Steve

>At least, a comment towards this effect at the top of the function is
>worth adding.
>
>RP
>  
>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] readahead: improve sequential read detection
  2005-03-10  0:03   ` Steven Pratt
@ 2005-03-10  0:06     ` Ram
  0 siblings, 0 replies; 6+ messages in thread
From: Ram @ 2005-03-10  0:06 UTC (permalink / raw)
  To: Steven Pratt; +Cc: Oleg Nesterov, linux-kernel, Andrew Morton

On Wed, 2005-03-09 at 16:03, Steven Pratt wrote:
> Ram wrote:
> 
> >On Wed, 2005-03-02 at 11:08, Oleg Nesterov wrote:
> >
> >
> >..snip...
> >
> >  
> >
> >>@@ -527,7 +527,7 @@ page_cache_readahead(struct address_spac
> >> 	}
> >> 
> >> out:
> >>-	return newsize;
> >>+	return ra->prev_page + 1;
> >>    
> >>
> >
> >This change introduces one key behavioural change in
> >page_cache_readahead(). Instead of returning the number-of-pages
> >successfully read, it now returns the next-page-index which is yet to be
> >read. Was this essential? 
> >
> >  
> >
> and unless filmap.c was changed accordingly this is broken..  need. to 
> look at this more.


Oleg has taken care of it in filemap.c whereever page_cache_readahead()
is called. So its not a bug as such. But I dont think it was necessary.
Doing so takes care of 1-line code reduction in filemap.c which is the
only plus point.

RP
 
> >  
> >
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] readahead: improve sequential read detection
  2005-03-09 23:52 ` Ram
  2005-03-10  0:03   ` Steven Pratt
@ 2005-03-10 17:14   ` Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2005-03-10 17:14 UTC (permalink / raw)
  To: Ram; +Cc: linux-kernel, Steven Pratt, Andrew Morton

Ram wrote:
>
> On Wed, 2005-03-02 at 11:08, Oleg Nesterov wrote:
> >
> >  out:
> > - return newsize;
> > + return ra->prev_page + 1;
>
> This change introduces one key behavioural change in
> page_cache_readahead(). Instead of returning the number-of-pages
> successfully read, it now returns the next-page-index which is yet to be
> read. Was this essential?

The problem is that with this change:
+       if (offset == ra->prev_page && --req_size)
+               ++offset;
we can't just return newsize.

Because the caller of page_cache_readahead(offset, req_size) expects
that returned value is the number-of-pages successfully read from
this original offset.

Consider do_generic_mapping_read() reading two pages at offset 10,
with ra->prev_page == 10.

1st page, index == 10:
	ret_size = page_cache_readahead(10, 2);		// returns 1
	next_index += ret_size;				// next_index == 11

2nd page, index == 11:
	if (index == next_index)			// Yes
		page_cache_readahead(11, 1);		// BOGUS!

It can be solved without behavioural change of course, but it will be
more complex.

> At least, a comment towards this effect at the top of the function is
> worth adding.

Yes, it's my fault. I should have added comment in changelog at least.

Oleg.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-03-10 16:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-03-02 19:08 [PATCH 2/2] readahead: improve sequential read detection Oleg Nesterov
2005-03-03  2:01 ` Andrew Morton
2005-03-09 23:52 ` Ram
2005-03-10  0:03   ` Steven Pratt
2005-03-10  0:06     ` Ram
2005-03-10 17:14   ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox