* [PATCH 00/14] filemap and readahead fixes
@ 2009-04-07 7:17 Wu Fengguang
0 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 7:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ying Han, LKML, linux-fsdevel, linux-mm
Andrew,
This is a set of fixes and cleanups for filemap and readahead.
They are for 2.6.29-rc8-mm1 and have been carefully tested.
filemap VM_FAULT_RETRY fixes
----------------------------
[PATCH 01/14] mm: fix find_lock_page_retry() return value parsing
[PATCH 02/14] mm: fix major/minor fault accounting on retried fault
[PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
[PATCH 04/14] mm: reduce duplicate page fault code
[PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY
readahead fixes
---------------
minor cleanups:
[PATCH 06/14] readahead: move max_sane_readahead() calls into force_page_cache_readahead()
[PATCH 07/14] readahead: apply max_sane_readahead() limit in ondemand_readahead()
[PATCH 08/14] readahead: remove one unnecessary radix tree lookup
behavior changes necessary for the following mmap readahead:
[PATCH 09/14] readahead: increase interleaved readahead size
[PATCH 10/14] readahead: remove sync/async readahead call dependency
mmap readaround/readahead
-------------------------
major cleanups from Linus:
(the cleanups automatically fix a PGMAJFAULT accounting bug in VM_RAND_READ case)
[PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead
and my further steps:
[PATCH 12/14] readahead: sequential mmap readahead
[PATCH 13/14] readahead: enforce full readahead size on async mmap readahead
[PATCH 14/14] readahead: record mmap read-around states in file_ra_state
Thanks,
Fengguang
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 00/14] filemap and readahead fixes
@ 2009-04-07 11:50 Wu Fengguang
2009-04-07 11:50 ` [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing Wu Fengguang
` (14 more replies)
0 siblings, 15 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, David Rientjes, Hugh Dickins, Ingo Molnar,
Lee Schermerhorn, Mike Waychison, Nick Piggin, Peter Zijlstra,
Rohit Seth, Edwin, H. Peter Anvin, Ying Han, LKML, linux-mm,
linux-fsdevel
* [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 02/14] mm: fix major/minor fault accounting on retried fault Wu Fengguang
` (13 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Ying Han, Wu Fengguang, David Rientjes,
Hugh Dickins, Ingo Molnar, Lee Schermerhorn, Mike Waychison,
Nick Piggin, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
LKML, linux-mm, linux-fsdevel
[-- Attachment #1: filemap-fault-fix.patch --]
[-- Type: text/plain, Size: 2136 bytes --]
find_lock_page_retry() won't touch the *ppage value when returning
VM_FAULT_RETRY. So in the case of filemap_fault():no_cached_page,
the 'page' could be undefined after calling find_lock_page_retry().
Fix it by checking the VM_FAULT_RETRY case first.
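As a rough illustration of the ordering problem (a user-space sketch, not
the kernel code; every sketch_* name below is a hypothetical stand-in),
the helper leaves its out-parameter untouched on retry, so the caller has
to test the return value before it looks at 'page':

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FAULT_RETRY 0x1	/* hypothetical stand-in for VM_FAULT_RETRY */

struct sketch_page { int id; };

enum sketch_next {
	SKETCH_RETURN_RETRY,
	SKETCH_NO_CACHED_PAGE,
	SKETCH_HAVE_PAGE
};

/*
 * Mimics the documented contract: when the page is locked by someone
 * else, return the retry hint and do NOT write *ppage.
 */
int sketch_lock_page_retry(struct sketch_page *cached, int locked_by_other,
			   struct sketch_page **ppage)
{
	assert(ppage != NULL);
	if (cached && locked_by_other)
		return SKETCH_FAULT_RETRY;	/* *ppage left untouched */
	*ppage = cached;			/* NULL if not in the cache */
	return 0;
}

/* The fixed order: look at the return value first, then at the page. */
enum sketch_next sketch_classify(int ret, struct sketch_page *page)
{
	if (ret == SKETCH_FAULT_RETRY)
		return SKETCH_RETURN_RETRY;
	if (!page)
		return SKETCH_NO_CACHED_PAGE;
	return SKETCH_HAVE_PAGE;
}
```

With the old order (!page tested first), a stale or uninitialized 'page'
decides the branch before the retry hint is ever examined.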
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/filemap.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -759,7 +759,7 @@ EXPORT_SYMBOL(find_lock_page);
* @retry: 1 indicate caller tolerate a retry.
*
* If retry flag is on, and page is already locked by someone else, return
- * a hint of retry.
+ * a hint of retry and leave *ppage untouched.
*
* Return *ppage==NULL if page is not in pagecache. Otherwise return *ppage
* points to the page in the pagecache with ret=VM_FAULT_RETRY indicate a
@@ -1575,10 +1575,10 @@ retry_find_nopage:
vmf->pgoff, 1);
retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
vma, &page, retry_flag);
- if (!page)
- goto no_cached_page;
if (retry_ret == VM_FAULT_RETRY)
return retry_ret;
+ if (!page)
+ goto no_cached_page;
}
if (PageReadahead(page)) {
page_cache_async_readahead(mapping, ra, file, page,
@@ -1617,10 +1617,10 @@ retry_find_nopage:
}
retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
vma, &page, retry_flag);
- if (!page)
- goto no_cached_page;
if (retry_ret == VM_FAULT_RETRY)
return retry_ret;
+ if (!page)
+ goto no_cached_page;
}
if (!did_readaround)
@@ -1672,10 +1672,10 @@ no_cached_page:
retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
vma, &page, retry_flag);
+ if (retry_ret == VM_FAULT_RETRY)
+ return retry_ret;
if (!page)
goto retry_find_nopage;
- else if (retry_ret == VM_FAULT_RETRY)
- return retry_ret;
else
goto retry_page_update;
}
* [PATCH 02/14] mm: fix major/minor fault accounting on retried fault
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
2009-04-07 11:50 ` [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code Wu Fengguang
` (12 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Ying Han, Wu Fengguang, David Rientjes,
Hugh Dickins, Ingo Molnar, Lee Schermerhorn, Mike Waychison,
Nick Piggin, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
LKML, linux-mm, linux-fsdevel
[-- Attachment #1: filemap-major-fault-retry.patch --]
[-- Type: text/plain, Size: 1783 bytes --]
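VM_FAULT_RETRY does make major/minor fault accounting a bit twisted: the
retried pass would otherwise be counted a second time. So when the x86 fault
handler retries, adjust the task counters (maj_flt++, min_flt--), and make
handle_mm_fault() count PGFAULT only when it does not return VM_FAULT_RETRY.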
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
arch/x86/mm/fault.c | 2 ++
mm/memory.c | 22 ++++++++++++++--------
2 files changed, 16 insertions(+), 8 deletions(-)
--- mm.orig/arch/x86/mm/fault.c
+++ mm/arch/x86/mm/fault.c
@@ -1160,6 +1160,8 @@ good_area:
if (fault & VM_FAULT_RETRY) {
if (retry_flag) {
retry_flag = 0;
+ tsk->maj_flt++;
+ tsk->min_flt--;
goto retry;
}
BUG();
--- mm.orig/mm/memory.c
+++ mm/mm/memory.c
@@ -2882,26 +2882,32 @@ int handle_mm_fault(struct mm_struct *mm
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
+ int ret;
__set_current_state(TASK_RUNNING);
- count_vm_event(PGFAULT);
-
- if (unlikely(is_vm_hugetlb_page(vma)))
- return hugetlb_fault(mm, vma, address, write_access);
+ if (unlikely(is_vm_hugetlb_page(vma))) {
+ ret = hugetlb_fault(mm, vma, address, write_access);
+ goto out;
+ }
+ ret = VM_FAULT_OOM;
pgd = pgd_offset(mm, address);
pud = pud_alloc(mm, pgd, address);
if (!pud)
- return VM_FAULT_OOM;
+ goto out;
pmd = pmd_alloc(mm, pud, address);
if (!pmd)
- return VM_FAULT_OOM;
+ goto out;
pte = pte_alloc_map(mm, pmd, address);
if (!pte)
- return VM_FAULT_OOM;
+ goto out;
- return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
+ ret = handle_pte_fault(mm, vma, address, pte, pmd, write_access);
+out:
+ if (!(ret & VM_FAULT_RETRY))
+ count_vm_event(PGFAULT);
+ return ret;
}
#ifndef __PAGETABLE_PUD_FOLDED
* [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
2009-04-07 11:50 ` [PATCH 01/14] mm: fix find_lock_page_retry() return value parsing Wu Fengguang
2009-04-07 11:50 ` [PATCH 02/14] mm: fix major/minor fault accounting on retried fault Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 04/14] mm: reduce duplicate page fault code Wu Fengguang
` (11 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Ying Han, Wu Fengguang, David Rientjes,
Hugh Dickins, Ingo Molnar, Lee Schermerhorn, Mike Waychison,
Nick Piggin, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
LKML, linux-mm, linux-fsdevel
[-- Attachment #1: memory-fault-retry-simp.patch --]
[-- Type: text/plain, Size: 910 bytes --]
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/memory.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
--- mm.orig/mm/memory.c
+++ mm/mm/memory.c
@@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
{
pgoff_t pgoff = (((address & PAGE_MASK)
- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
- int write = write_access & ~FAULT_FLAG_RETRY;
- unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
+ unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
- flags |= (write_access & FAULT_FLAG_RETRY);
pte_unmap(page_table);
return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
}
* [PATCH 04/14] mm: reduce duplicate page fault code
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (2 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY Wu Fengguang
` (10 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Ying Han, Wu Fengguang, David Rientjes,
Hugh Dickins, Ingo Molnar, Lee Schermerhorn, Mike Waychison,
Nick Piggin, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
LKML, linux-mm, linux-fsdevel
[-- Attachment #1: filemap-fault-cleanup.patch --]
[-- Type: text/plain, Size: 2155 bytes --]
Restore the simplicity of the filemap_fault():no_cached_page block.
The VM_FAULT_RETRY case is not all that different.
No readahead/readaround will be performed after no_cached_page,
because no_cached_page either means MADV_RANDOM or some error condition.
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/filemap.c | 22 +++-------------------
1 file changed, 3 insertions(+), 19 deletions(-)
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1565,7 +1565,6 @@ int filemap_fault(struct vm_area_struct
retry_find:
page = find_lock_page(mapping, vmf->pgoff);
-retry_find_nopage:
/*
* For sequential accesses, we use the generic readahead logic.
*/
@@ -1615,6 +1614,7 @@ retry_find_nopage:
start = vmf->pgoff - ra_pages / 2;
do_page_cache_readahead(mapping, file, start, ra_pages);
}
+retry_find_retry:
retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
vma, &page, retry_flag);
if (retry_ret == VM_FAULT_RETRY)
@@ -1626,7 +1626,6 @@ retry_find_nopage:
if (!did_readaround)
ra->mmap_miss--;
-retry_page_update:
/*
* We have a locked page in the page cache, now we need to check
* that it's up-to-date. If not, it is going to be due to an error.
@@ -1662,23 +1661,8 @@ no_cached_page:
* In the unlikely event that someone removed it in the
* meantime, we'll just come back here and read it again.
*/
- if (error >= 0) {
- /*
- * If caller cannot tolerate a retry in the ->fault path
- * go back to check the page again.
- */
- if (!retry_flag)
- goto retry_find;
-
- retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
- vma, &page, retry_flag);
- if (retry_ret == VM_FAULT_RETRY)
- return retry_ret;
- if (!page)
- goto retry_find_nopage;
- else
- goto retry_page_update;
- }
+ if (error >= 0)
+ goto retry_find_retry;
/*
* An error return from page_cache_read can result if the
* [PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (3 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 04/14] mm: reduce duplicate page fault code Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 06/14] readahead: move max_sane_readahead() calls into force_page_cache_readahead() Wu Fengguang
` (9 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Wu Fengguang, David Rientjes,
Hugh Dickins, Ingo Molnar, Lee Schermerhorn, Mike Waychison,
Nick Piggin, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
Ying Han, LKML, linux-mm, linux-fsdevel
[-- Attachment #1: readahead-mmap_miss-retry.patch --]
[-- Type: text/plain, Size: 2127 bytes --]
The VM_FAULT_RETRY case introduced a performance bug that leads to
excessive/unconditional mmap readarounds for wild random mmap reads.
A retried page fault means an mmap readahead miss (mmap_miss++) followed by
a hit (mmap_miss--) on the same page. This keeps mmap_miss stuck, and thus
stops mmap readaround from being turned off for wild random reads. Fix it
with an extra mmap_miss increment to counteract the hit that follows.
Also make mmap_miss a more robust 'unsigned int', so that if mmap_miss ever
goes out of range, it only creates _temporary_ performance impacts.
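To make the arithmetic concrete, here is a small user-space model of the
counter (a sketch only; the sketch_* names and the one-miss-then-one-hit
model of a retried fault are assumptions drawn from the description above,
not the kernel code):

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LOTSAMISS 100	/* mirrors MMAP_LOTSAMISS */

struct sketch_ra {
	unsigned int mmap_miss;
};

/* One random-read fault that goes through a VM_FAULT_RETRY round trip. */
void sketch_retried_fault(struct sketch_ra *ra, int counteract)
{
	assert(ra != NULL);
	ra->mmap_miss++;		/* first pass: page not cached, a miss */
	if (counteract)
		ra->mmap_miss++;	/* the fix: pre-pay for the coming hit */
	ra->mmap_miss--;		/* retry pass: same page, now a "hit" */
}

/* Read-around stays on until the miss count exceeds the threshold. */
int sketch_readaround_enabled(const struct sketch_ra *ra)
{
	return ra->mmap_miss <= SKETCH_LOTSAMISS;
}

/* Run a number of retried faults and report the final miss count. */
unsigned int sketch_run(int faults, int counteract)
{
	struct sketch_ra ra = { 0 };
	int i;

	for (i = 0; i < faults; i++)
		sketch_retried_fault(&ra, counteract);
	return ra.mmap_miss;
}
```

Without the extra increment the counter nets to zero per fault and never
crosses the threshold; with it, each retried fault counts as one miss.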
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
include/linux/fs.h | 2 +-
mm/filemap.c | 8 ++++++--
2 files changed, 7 insertions(+), 3 deletions(-)
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1574,8 +1574,10 @@ retry_find:
vmf->pgoff, 1);
retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
vma, &page, retry_flag);
- if (retry_ret == VM_FAULT_RETRY)
+ if (retry_ret == VM_FAULT_RETRY) {
+ ra->mmap_miss++; /* counteract the followed retry hit */
return retry_ret;
+ }
if (!page)
goto no_cached_page;
}
@@ -1617,8 +1619,10 @@ retry_find:
retry_find_retry:
retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
vma, &page, retry_flag);
- if (retry_ret == VM_FAULT_RETRY)
+ if (retry_ret == VM_FAULT_RETRY) {
+ ra->mmap_miss++; /* counteract the followed retry hit */
return retry_ret;
+ }
if (!page)
goto no_cached_page;
}
--- mm.orig/include/linux/fs.h
+++ mm/include/linux/fs.h
@@ -824,7 +824,7 @@ struct file_ra_state {
there are only # of pages ahead */
unsigned int ra_pages; /* Maximum readahead window */
- int mmap_miss; /* Cache miss stat for mmap accesses */
+ unsigned int mmap_miss; /* Cache miss stat for mmap accesses */
loff_t prev_pos; /* Cache last read() position */
};
* [PATCH 06/14] readahead: move max_sane_readahead() calls into force_page_cache_readahead()
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (4 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 07/14] readahead: apply max_sane_readahead() limit in ondemand_readahead() Wu Fengguang
` (8 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Nick Piggin, Linus Torvalds, Wu Fengguang,
David Rientjes, Hugh Dickins, Ingo Molnar, Lee Schermerhorn,
Mike Waychison, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
Ying Han, LKML, linux-mm, linux-fsdevel
[-- Attachment #1: readahead-move-max_sane_readahead.patch --]
[-- Type: text/plain, Size: 1843 bytes --]
Impact: code simplification.
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/fadvise.c | 2 +-
mm/filemap.c | 3 +--
mm/madvise.c | 3 +--
mm/readahead.c | 1 +
4 files changed, 4 insertions(+), 5 deletions(-)
--- mm.orig/mm/fadvise.c
+++ mm/mm/fadvise.c
@@ -101,7 +101,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, lof
ret = force_page_cache_readahead(mapping, file,
start_index,
- max_sane_readahead(nrpages));
+ nrpages);
if (ret > 0)
ret = 0;
break;
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1458,8 +1458,7 @@ do_readahead(struct address_space *mappi
if (!mapping || !mapping->a_ops || !mapping->a_ops->readpage)
return -EINVAL;
- force_page_cache_readahead(mapping, filp, index,
- max_sane_readahead(nr));
+ force_page_cache_readahead(mapping, filp, index, nr);
return 0;
}
--- mm.orig/mm/madvise.c
+++ mm/mm/madvise.c
@@ -123,8 +123,7 @@ static long madvise_willneed(struct vm_a
end = vma->vm_end;
end = ((end - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
- force_page_cache_readahead(file->f_mapping,
- file, start, max_sane_readahead(end - start));
+ force_page_cache_readahead(file->f_mapping, file, start, end - start);
return 0;
}
--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -223,6 +223,7 @@ int force_page_cache_readahead(struct ad
if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
return -EINVAL;
+ nr_to_read = max_sane_readahead(nr_to_read);
while (nr_to_read) {
int err;
* [PATCH 07/14] readahead: apply max_sane_readahead() limit in ondemand_readahead()
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (5 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 06/14] readahead: move max_sane_readahead() calls into force_page_cache_readahead() Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 08/14] readahead: remove one unnecessary radix tree lookup Wu Fengguang
` (7 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Nick Piggin, Linus Torvalds, Wu Fengguang,
David Rientjes, Hugh Dickins, Ingo Molnar, Lee Schermerhorn,
Mike Waychison, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
Ying Han, LKML, linux-mm, linux-fsdevel
[-- Attachment #1: readahead-sane-max.patch --]
[-- Type: text/plain, Size: 852 bytes --]
Just in case someone aggressively sets a huge readahead size.
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/readahead.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -382,7 +382,7 @@ ondemand_readahead(struct address_space
bool hit_readahead_marker, pgoff_t offset,
unsigned long req_size)
{
- int max = ra->ra_pages; /* max readahead pages */
+ unsigned long max = max_sane_readahead(ra->ra_pages);
pgoff_t prev_offset;
int sequential;
* [PATCH 08/14] readahead: remove one unnecessary radix tree lookup
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (6 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 07/14] readahead: apply max_sane_readahead() limit in ondemand_readahead() Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 09/14] readahead: increase interleaved readahead size Wu Fengguang
` (6 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Wu Fengguang, David Rientjes,
Hugh Dickins, Ingo Molnar, Lee Schermerhorn, Mike Waychison,
Nick Piggin, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
Ying Han, LKML, linux-mm, linux-fsdevel
[-- Attachment #1: readahead-interleaved-offset.patch --]
[-- Type: text/plain, Size: 884 bytes --]
(hit_readahead_marker != 0) means the page at @offset is present,
so we can search for the first non-present page starting from @offset+1.
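A quick way to see that the two calls agree is a flat-array model of the
hole search (a sketch only: sketch_next_hole() is a hypothetical stand-in
for radix_tree_next_hole(), and treating indices past the array as holes
is an assumption of this model):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan at most max_scan slots starting at index and return the first
 * slot that is not present. If no hole is found, the returned index
 * is the start index plus max_scan.
 */
size_t sketch_next_hole(const unsigned char *present, size_t npages,
			size_t index, size_t max_scan)
{
	size_t i;

	assert(present != NULL);
	for (i = 0; i < max_scan; i++) {
		if (index >= npages || !present[index])
			break;
		index++;
	}
	return index;
}
```

When the page at @offset is known to be present, scanning (offset, max+1)
and (offset+1, max) cover the same last slot and return the same index;
the second form just skips one lookup.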
Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/readahead.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -420,7 +420,7 @@ ondemand_readahead(struct address_space
pgoff_t start;
rcu_read_lock();
- start = radix_tree_next_hole(&mapping->page_tree, offset,max+1);
+ start = radix_tree_next_hole(&mapping->page_tree, offset+1,max);
rcu_read_unlock();
if (!start || start - offset > max)
* [PATCH 09/14] readahead: increase interleaved readahead size
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (7 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 08/14] readahead: remove one unnecessary radix tree lookup Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 10/14] readahead: remove sync/async readahead call dependency Wu Fengguang
` (5 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Wu Fengguang, David Rientjes,
Hugh Dickins, Ingo Molnar, Lee Schermerhorn, Mike Waychison,
Nick Piggin, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
Ying Han, LKML, linux-mm, linux-fsdevel
[-- Attachment #1: readahead-interleaved-size.patch --]
[-- Type: text/plain, Size: 808 bytes --]
Make sure the interleaved readahead size is larger than the request size.
This also makes the readahead window grow more quickly.
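A worked example of the size computation, as a user-space sketch. The
sketch_next_ra_size() ramp-up (4x below max/16, else 2x, capped at max) is
my assumption about how get_next_ra_size() behaves in this tree, and the
other sketch_* names are hypothetical as well:

```c
#include <assert.h>

/* Assumed ramp-up rule: quadruple small windows, double larger ones. */
unsigned long sketch_next_ra_size(unsigned long cur, unsigned long max)
{
	unsigned long newsize = (cur < max / 16) ? 4 * cur : 2 * cur;

	return newsize < max ? newsize : max;
}

/*
 * Size of the next window for an interleaved stream that hit the
 * readahead marker at 'offset', with the first hole found at 'start'.
 */
unsigned long sketch_interleaved_size(unsigned long offset,
				      unsigned long start,
				      unsigned long req_size,
				      unsigned long max,
				      int add_req_size)
{
	unsigned long size;

	assert(start >= offset);
	size = start - offset;		/* old async_size */
	if (add_req_size)
		size += req_size;	/* the one-line change */
	return sketch_next_ra_size(size, max);
}
```

With offset=100, start=102, req_size=8 and max=32, the old computation
yields a 4-page window, smaller than the request; adding req_size yields
20 pages.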
Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/readahead.c | 1 +
1 file changed, 1 insertion(+)
--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -428,6 +428,7 @@ ondemand_readahead(struct address_space
ra->start = start;
ra->size = start - offset; /* old async_size */
+ ra->size += req_size;
ra->size = get_next_ra_size(ra, max);
ra->async_size = ra->size;
goto readit;
* [PATCH 10/14] readahead: remove sync/async readahead call dependency
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (8 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 09/14] readahead: increase interleaved readahead size Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead Wu Fengguang
` (4 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Nick Piggin, Linus Torvalds, Wu Fengguang,
David Rientjes, Hugh Dickins, Ingo Molnar, Lee Schermerhorn,
Mike Waychison, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
Ying Han, LKML, linux-mm, linux-fsdevel
[-- Attachment #1: readahead-remove-call-dependancy.patch --]
[-- Type: text/plain, Size: 2013 bytes --]
The readahead call scheme is error-prone in that it relies on the call sites
to check for async readahead after doing any sync one, i.e.
        if (!page)
                page_cache_sync_readahead();
        page = find_get_page();
        if (page && PageReadahead(page))
                page_cache_async_readahead();
This is because PG_readahead could be set by a sync readahead for the _current_
newly faulted in page, and the readahead code simply expects one more callback
on the same page to start the async readahead. If the caller fails to do so, it
will miss the PG_readahead bits and never be able to start an async readahead.
Eliminate this insane constraint by piggy-backing the async part into the
current readahead window.
Now if an async readahead should be started immediately after a sync one,
the readahead logic itself will do it. So the following code becomes valid:
(the 'else' in particular)
        if (!page)
                page_cache_sync_readahead();
        else if (PageReadahead(page))
                page_cache_async_readahead();
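The window merge can be tried out in user space with a sketch like the one
below. It mirrors the condition added at the readit: label; the struct, the
sketch_* names and the ramp-up rule in sketch_next_ra_size() are assumptions
made for illustration, not the kernel definitions:

```c
#include <assert.h>

struct sketch_ra {
	unsigned long start;		/* first page of the window */
	unsigned long size;		/* pages in the window */
	unsigned long async_size;	/* pages left when async kicks in */
};

/* Assumed ramp-up rule: quadruple small windows, double larger ones. */
unsigned long sketch_next_ra_size(const struct sketch_ra *ra,
				  unsigned long max)
{
	unsigned long cur = ra->size;
	unsigned long newsize = (cur < max / 16) ? 4 * cur : 2 * cur;

	return newsize < max ? newsize : max;
}

/*
 * If this read would hit the readahead marker it is about to set
 * (the marker sits on the very first page of the window), fold the
 * next window into the current one instead of waiting for a callback.
 */
void sketch_merge_async(struct sketch_ra *ra, unsigned long offset,
			unsigned long max)
{
	if (offset == ra->start && ra->size == ra->async_size) {
		ra->async_size = sketch_next_ra_size(ra, max);
		ra->size += ra->async_size;
	}
	assert(ra->async_size <= ra->size);
}
```

For a window of {start=10, size=4, async_size=4} read at offset 10 with
max=32, the merge gives size=12 and async_size=8, so the marker moves off
the page being faulted in and the caller no longer needs the second check.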
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/readahead.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -446,6 +446,16 @@ ondemand_readahead(struct address_space
ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
readit:
+ /*
+ * Will this read hit the readahead marker made by itself?
+ * If so, trigger the readahead marker hit now, and merge
+ * the resulted next readahead window into the current one.
+ */
+ if (offset == ra->start && ra->size == ra->async_size) {
+ ra->async_size = get_next_ra_size(ra, max);
+ ra->size += ra->async_size;
+ }
+
return ra_submit(ra, mapping, filp);
}
* [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (9 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 10/14] readahead: remove sync/async readahead call dependency Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 15:50 ` Linus Torvalds
2009-04-07 11:50 ` [PATCH 12/14] readahead: sequential mmap readahead Wu Fengguang
` (3 subsequent siblings)
14 siblings, 1 reply; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Pavel Levshin, wli, Nick Piggin,
Wu Fengguang, Linus Torvalds, David Rientjes, Hugh Dickins,
Ingo Molnar, Lee Schermerhorn, Mike Waychison, Peter Zijlstra,
Rohit Seth, Edwin, H. Peter Anvin, Ying Han, LKML, linux-mm,
linux-fsdevel
[-- Attachment #1: readahead-mmap-split-code-and-cleanup.patch --]
[-- Type: text/plain, Size: 9841 bytes --]
From: Linus Torvalds <torvalds@linux-foundation.org>
This shouldn't really change behavior all that much, but the single
rather complex function with read-ahead inside a loop etc is broken up
into more manageable pieces.
The behaviour is also less subtle, with the read-ahead being done up-front
rather than inside some subtle loop and thus avoiding the now unnecessary
extra state variables (ie "did_readaround" is gone).
Fengguang: the code split in fact fixed a bug reported by Pavel Levshin:
the PGMAJFAULT accounting used to be bypassed when MADV_RANDOM is set, in
which case the original code will directly jump to no_cached_page reading.
Cc: Pavel Levshin <lpk@581.spb.su>
Cc: wli@movementarian.org
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
Ok, so this is something I did in Mexico when I wasn't scuba-diving, and
was "watching" the kids at the pool. It was brought on by looking at git
mmap file behaviour under cold-cache behaviour: git does ok, but my laptop
disk is really slow, and I tried to verify that the kernel did a
reasonable job of read-ahead when taking page faults.
I think it did, but quite frankly, the filemap_fault() code was totally
unreadable. So this separates out the read-ahead cases, and adds more
comments, and also changes it so that we do asynchronous read-ahead
*before* we actually wait for the page we are waiting for to become
unlocked.
Not that it seems to make any real difference on my laptop, but I really
hated how it was doing a
page = get_lock_page(..)
and then doing read-ahead after that: which just guarantees that we have
to wait for any out-standing IO on "page" to complete before we can even
submit any new read-ahead! That just seems totally broken!
So it replaces the "get_lock_page()" at the top with a broken-out page
cache lookup, which allows us to look at the page state flags and make
appropriate decisions on what we should do without waiting for the locked
bit to clear.
It does add many more lines than it removes:
mm/filemap.c | 192 +++++++++++++++++++++++++++++++++++++++-------------------
1 files changed, 130 insertions(+), 62 deletions(-)
but that's largely due to (a) the new function headers etc due to the
split-up and (b) new or extended comments especially about the helper
functions. The code, in many ways, is actually simpler, apart from the
fairly trivial expansion of the equivalent of "get_lock_page()" into the
function.
Comments? I tried to avoid changing the read-ahead logic itself, although
the old code did some strange things like doing *both* async readahead and
then looking up the page and doing sync readahead (which I think was just
due to the code being so damn messily organized, not on purpose).
Linus
---
mm/filemap.c | 164 ++++++++++++++++++++++++++-----------------------
1 file changed, 90 insertions(+), 74 deletions(-)
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1524,6 +1524,68 @@ static int page_cache_read(struct file *
#define MMAP_LOTSAMISS (100)
+/*
+ * Synchronous readahead happens when we don't even find
+ * a page in the page cache at all.
+ */
+static void do_sync_mmap_readahead(struct vm_area_struct *vma,
+ struct file_ra_state *ra,
+ struct file *file,
+ pgoff_t offset)
+{
+ unsigned long ra_pages;
+ struct address_space *mapping = file->f_mapping;
+
+ /* If we don't want any read-ahead, don't bother */
+ if (VM_RandomReadHint(vma))
+ return;
+
+ if (VM_SequentialReadHint(vma)) {
+ page_cache_sync_readahead(mapping, ra, file, offset, 1);
+ return;
+ }
+
+ if (ra->mmap_miss < INT_MAX)
+ ra->mmap_miss++;
+
+ /*
+ * Do we miss much more than hit in this file? If so,
+ * stop bothering with read-ahead. It will only hurt.
+ */
+ if (ra->mmap_miss > MMAP_LOTSAMISS)
+ return;
+
+ ra_pages = max_sane_readahead(ra->ra_pages);
+ if (ra_pages) {
+ pgoff_t start = 0;
+
+ if (offset > ra_pages / 2)
+ start = offset - ra_pages / 2;
+ do_page_cache_readahead(mapping, file, start, ra_pages);
+ }
+}
+
+/*
+ * Asynchronous readahead happens when we find the page and PG_readahead,
+ * so we want to possibly extend the readahead further..
+ */
+static void do_async_mmap_readahead(struct vm_area_struct *vma,
+ struct file_ra_state *ra,
+ struct file *file,
+ struct page *page,
+ pgoff_t offset)
+{
+ struct address_space *mapping = file->f_mapping;
+
+ /* If we don't want any read-ahead, don't bother */
+ if (VM_RandomReadHint(vma))
+ return;
+ if (ra->mmap_miss > 0)
+ ra->mmap_miss--;
+ if (PageReadahead(page))
+ page_cache_async_readahead(mapping, ra, file, page, offset, 1);
+}
+
/**
* filemap_fault - read in file data for page fault handling
* @vma: vma in which the fault was taken
@@ -1543,80 +1605,43 @@ int filemap_fault(struct vm_area_struct
struct address_space *mapping = file->f_mapping;
struct file_ra_state *ra = &file->f_ra;
struct inode *inode = mapping->host;
+ pgoff_t offset = vmf->pgoff;
struct page *page;
pgoff_t size;
- int did_readaround = 0;
int ret = 0;
int retry_flag = vmf->flags & FAULT_FLAG_RETRY;
int retry_ret;
size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- if (vmf->pgoff >= size)
+ if (offset >= size)
return VM_FAULT_SIGBUS;
- /* If we don't want any read-ahead, don't bother */
- if (VM_RandomReadHint(vma))
- goto no_cached_page;
-
/*
* Do we have something in the page cache already?
*/
-retry_find:
- page = find_lock_page(mapping, vmf->pgoff);
-
- /*
- * For sequential accesses, we use the generic readahead logic.
- */
- if (VM_SequentialReadHint(vma)) {
- if (!page) {
- page_cache_sync_readahead(mapping, ra, file,
- vmf->pgoff, 1);
- retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
- vma, &page, retry_flag);
- if (retry_ret == VM_FAULT_RETRY) {
- ra->mmap_miss++; /* counteract the followed retry hit */
- return retry_ret;
- }
- if (!page)
- goto no_cached_page;
- }
- if (PageReadahead(page)) {
- page_cache_async_readahead(mapping, ra, file, page,
- vmf->pgoff, 1);
- }
- }
-
- if (!page) {
- unsigned long ra_pages;
-
- ra->mmap_miss++;
+ page = find_get_page(mapping, offset);
+ if (likely(page)) {
/*
- * Do we miss much more than hit in this file? If so,
- * stop bothering with read-ahead. It will only hurt.
+ * We found the page, so try async readahead before
+ * waiting for the lock.
*/
- if (ra->mmap_miss > MMAP_LOTSAMISS)
- goto no_cached_page;
+ do_async_mmap_readahead(vma, ra, file, page, offset);
+ lock_page(page);
- /*
- * To keep the pgmajfault counter straight, we need to
- * check did_readaround, as this is an inner loop.
- */
- if (!did_readaround) {
- ret = VM_FAULT_MAJOR;
- count_vm_event(PGMAJFAULT);
- }
- did_readaround = 1;
- ra_pages = max_sane_readahead(file->f_ra.ra_pages);
- if (ra_pages) {
- pgoff_t start = 0;
-
- if (vmf->pgoff > ra_pages / 2)
- start = vmf->pgoff - ra_pages / 2;
- do_page_cache_readahead(mapping, file, start, ra_pages);
+ /* Did it get truncated? */
+ if (unlikely(page->mapping != mapping)) {
+ unlock_page(page);
+ put_page(page);
+ goto no_cached_page;
}
-retry_find_retry:
- retry_ret = find_lock_page_retry(mapping, vmf->pgoff,
+ } else {
+ /* No page in the page cache at all */
+ do_sync_mmap_readahead(vma, ra, file, offset);
+ count_vm_event(PGMAJFAULT);
+ ret = VM_FAULT_MAJOR;
+retry_find:
+ retry_ret = find_lock_page_retry(mapping, offset,
vma, &page, retry_flag);
if (retry_ret == VM_FAULT_RETRY) {
ra->mmap_miss++; /* counteract the followed retry hit */
@@ -1626,9 +1651,6 @@ retry_find_retry:
goto no_cached_page;
}
- if (!did_readaround)
- ra->mmap_miss--;
-
/*
* We have a locked page in the page cache, now we need to check
* that it's up-to-date. If not, it is going to be due to an error.
@@ -1636,19 +1658,19 @@ retry_find_retry:
if (unlikely(!PageUptodate(page)))
goto page_not_uptodate;
- /* Must recheck i_size under page lock */
+ /*
+ * Found the page and have a reference on it.
+ * We must recheck i_size under page lock.
+ */
size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- if (unlikely(vmf->pgoff >= size)) {
+ if (unlikely(offset >= size)) {
unlock_page(page);
page_cache_release(page);
return VM_FAULT_SIGBUS;
}
- /*
- * Found the page and have a reference on it.
- */
update_page_reclaim_stat(page);
- ra->prev_pos = (loff_t)page->index << PAGE_CACHE_SHIFT;
+ ra->prev_pos = (loff_t)offset << PAGE_CACHE_SHIFT;
vmf->page = page;
return ret | VM_FAULT_LOCKED;
@@ -1657,7 +1679,7 @@ no_cached_page:
* We're only likely to ever get here if MADV_RANDOM is in
* effect.
*/
- error = page_cache_read(file, vmf->pgoff);
+ error = page_cache_read(file, offset);
/*
* The page we want has now been added to the page cache.
@@ -1665,7 +1687,7 @@ no_cached_page:
* meantime, we'll just come back here and read it again.
*/
if (error >= 0)
- goto retry_find_retry;
+ goto retry_find;
/*
* An error return from page_cache_read can result if the
@@ -1677,12 +1699,6 @@ no_cached_page:
return VM_FAULT_SIGBUS;
page_not_uptodate:
- /* IO error path */
- if (!did_readaround) {
- ret = VM_FAULT_MAJOR;
- count_vm_event(PGMAJFAULT);
- }
-
/*
* Umm, take care of errors if the page isn't up-to-date.
* Try to re-read it _once_. We do this synchronously,
--
* [PATCH 12/14] readahead: sequential mmap readahead
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (10 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 13/14] readahead: enforce full readahead size on async " Wu Fengguang
` (2 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Nick Piggin, Linus Torvalds, Wu Fengguang,
David Rientjes, Hugh Dickins, Ingo Molnar, Lee Schermerhorn,
Mike Waychison, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
Ying Han, LKML, linux-mm, linux-fsdevel
[-- Attachment #1: readahead-mmap-sequential-readahead.patch --]
[-- Type: text/plain, Size: 2021 bytes --]
Auto-detect sequential mmap reads and do readahead for them.
The sequential mmap readahead will be triggered when
- sync readahead: it's a major fault and (prev_offset == offset-1);
- async readahead: minor fault on PG_readahead page with valid readahead state.
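The sync trigger listed above can be sketched as a standalone predicate. This is a simplified userspace model, not the kernel code: `model_is_sequential_miss` and `MODEL_PAGE_SHIFT` are hypothetical names, 4KB pages are assumed, and the sketch treats a fault at page 0 as non-sequential purely to avoid unsigned wrap-around (that choice is an assumption of the sketch, not a statement about kernel behaviour).

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4KB pages */

/*
 * Return true when a page cache miss at page index `offset` directly
 * follows the page containing the previously accessed byte `prev_pos`.
 */
bool model_is_sequential_miss(uint64_t prev_pos, uint64_t offset)
{
	if (offset == 0)
		return false;	/* sketch-only choice, avoids offset - 1 wrapping */
	return offset - 1 == (prev_pos >> MODEL_PAGE_SHIFT);
}
```

With the previous access in page 5, a miss on page 6 counts as sequential, while a miss on page 7 or a repeated miss on page 5 does not.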
The benefits of doing readahead instead of read-around:
- less I/O wait thanks to async readahead
- double real I/O size and no more cache hits
The single stream case is improved a little.
For 100,000 sequential mmap reads:

                                       user    system  cpu      total
(1-1) plain -mm, 128KB readaround:     3.224   2.554   48.40%   11.838
(1-2) plain -mm, 256KB readaround:     3.170   2.392   46.20%   11.976
(2)   patched -mm, 128KB readahead:    3.117   2.448   47.33%   11.607

The patched (2) has the smallest total time, since it has no cache hit overhead
and less I/O block time (thanks to async readahead). Here the I/O size
makes little difference, since there is only a single stream.
Note that (1-1)'s real I/O size is 64KB and (1-2)'s real I/O size is 128KB,
since half of the read-around pages will be readahead cache hits.
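The "real I/O size" figures can be checked with a little arithmetic. The sketch below is a hedged userspace model: the function names are hypothetical, 4KB pages are assumed (so 128KB is 32 pages), and it assumes a purely sequential stream in which everything below `cached_end` is already in the page cache.

```c
/* First page of a read-around window centred on the faulting page. */
unsigned long model_readaround_start(unsigned long offset,
				     unsigned long ra_pages)
{
	return offset > ra_pages / 2 ? offset - ra_pages / 2 : 0;
}

/*
 * Pages of the read-around window that still need real I/O, assuming
 * every page below `cached_end` is already cached and nothing at or
 * above it is.
 */
unsigned long model_readaround_new_pages(unsigned long offset,
					 unsigned long ra_pages,
					 unsigned long cached_end)
{
	unsigned long start = model_readaround_start(offset, ra_pages);
	unsigned long end = start + ra_pages;

	if (end <= cached_end)
		return 0;		/* whole window already cached */
	if (start >= cached_end)
		return ra_pages;	/* nothing cached yet */
	return end - cached_end;	/* only the upper part is new */
}
```

For a 32-page window, a miss at page 116 with pages below 116 cached yields 16 new pages (64KB); for a 64-page window it yields 32 new pages (128KB), in line with the note above.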
This is going to make _real_ differences for _concurrent_ IO streams.
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/filemap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1540,7 +1540,8 @@ static void do_sync_mmap_readahead(struc
if (VM_RandomReadHint(vma))
return;
- if (VM_SequentialReadHint(vma)) {
+ if (VM_SequentialReadHint(vma) ||
+ offset - 1 == (ra->prev_pos >> PAGE_CACHE_SHIFT)) {
page_cache_sync_readahead(mapping, ra, file, offset, 1);
return;
}
--
* [PATCH 13/14] readahead: enforce full readahead size on async mmap readahead
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (11 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 12/14] readahead: sequential mmap readahead Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-07 11:50 ` [PATCH 14/14] readahead: record mmap read-around states in file_ra_state Wu Fengguang
2009-04-10 4:36 ` [PATCH 00/14] filemap and readahead fixes Andrew Morton
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Linus Torvalds, Nick Piggin, Wu Fengguang,
David Rientjes, Hugh Dickins, Ingo Molnar, Lee Schermerhorn,
Mike Waychison, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
Ying Han, LKML, linux-mm, linux-fsdevel
[-- Attachment #1: readahead-mmap-full-async-readahead-size.patch --]
[-- Type: text/plain, Size: 2711 bytes --]
We need this in one particular case and two more general ones.
Now we do async readahead for sequential mmap reads, and do it with the help of
PG_readahead. For normal reads, PG_readahead is a sufficient condition for doing
a sequential readahead. But unfortunately, for mmap reads, there is a tiny nuisance:
[11736.998347] readahead-init0(process: sh/23926, file: sda1/w3m, offset=0:4503599627370495, ra=0+4-3) = 4
[11737.014985] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=290+32-0) = 17
[11737.019488] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=118+32-0) = 32
[11737.024921] readahead-interleaved(process: w3m/23926, file: sda1/w3m, offset=0:2, ra=4+6-6) = 6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~
An unfavorably small readahead. The original dumb read-around size could be more efficient.
That happened because ld-linux.so does a read(832) in L1 before mmap(),
which triggers a 4-page readahead, with the second page tagged PG_readahead.
L0: open("/lib/libc.so.6", O_RDONLY) = 3
L1: read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\342"..., 832) = 832
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L2: fstat(3, {st_mode=S_IFREG|0755, st_size=1420624, ...}) = 0
L3: mmap(NULL, 3527256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac6e51d000
L4: mprotect(0x7fac6e671000, 2097152, PROT_NONE) = 0
L5: mmap(0x7fac6e871000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x154000) = 0x7fac6e871000
L6: mmap(0x7fac6e876000, 16984, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac6e876000
L7: close(3) = 0
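The "second page tagged PG_readahead" remark can be reproduced from the first trace line, under the assumption that the trace prints the window as `ra=start+size-async_size`. The sketch below is a userspace model with a hypothetical function name; it assumes the marker lands `async_size` pages before the end of the window and that a window with `async_size` of zero gets no marker.

```c
#include <stdbool.h>

/*
 * Compute which page index would carry the readahead marker for a window
 * of `size` pages starting at `start`, with the last `async_size` pages
 * forming the look-ahead part. Returns false when no page is marked.
 */
bool model_readahead_mark_index(unsigned long start, unsigned long size,
				unsigned long async_size,
				unsigned long *index)
{
	if (async_size == 0 || async_size > size)
		return false;
	*index = start + size - async_size;
	return true;
}
```

For `ra=0+4-3` this gives index 1, the second page of the file, which is the page the later mmap fault then hits.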
In general, the PG_readahead flag will also be hit in these cases:
- sequential reads
- clustered random reads
A full readahead size is desirable in both cases.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/filemap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1584,7 +1584,8 @@ static void do_async_mmap_readahead(stru
if (ra->mmap_miss > 0)
ra->mmap_miss--;
if (PageReadahead(page))
- page_cache_async_readahead(mapping, ra, file, page, offset, 1);
+ page_cache_async_readahead(mapping, ra, file,
+ page, offset, ra->ra_pages);
}
/**
--
* [PATCH 14/14] readahead: record mmap read-around states in file_ra_state
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (12 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 13/14] readahead: enforce full readahead size on async " Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
2009-04-10 4:36 ` [PATCH 00/14] filemap and readahead fixes Andrew Morton
14 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, Nick Piggin, Linus Torvalds,
David Rientjes, Hugh Dickins, Ingo Molnar, Lee Schermerhorn,
Mike Waychison, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
Ying Han, LKML, linux-mm, linux-fsdevel
[-- Attachment #1: readahead-mmap-readaround-use-ra_submit.patch --]
[-- Type: text/plain, Size: 4123 bytes --]
Mmap read-around now shares the same code style and data structure
with readahead code.
This also removes do_page_cache_readahead().
Its last user, mmap read-around, has been changed to call ra_submit().
The no-readahead-if-congested logic is dropped along the way.
Users will be pretty sensitive about the slow loading of executables,
so it is undesirable to disable mmap read-around on a congested queue.
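Recording the read-around window as readahead state can be sketched in userspace as follows. `struct model_ra_state` and `model_setup_readaround` are hypothetical stand-ins that mirror only the three fields this patch sets; the clamp at zero is written as an explicit comparison here so that the unsigned subtraction cannot wrap.

```c
/* Hypothetical mirror of the three readahead state fields used here. */
struct model_ra_state {
	unsigned long start;		/* first page of the window */
	unsigned long size;		/* number of pages in the window */
	unsigned long async_size;	/* look-ahead part; 0 for read-around */
};

/* Describe a read-around window of ra_pages centred on `offset`. */
void model_setup_readaround(struct model_ra_state *ra,
			    unsigned long offset, unsigned long ra_pages)
{
	ra->start = offset > ra_pages / 2 ? offset - ra_pages / 2 : 0;
	ra->size = ra_pages;
	ra->async_size = 0;	/* read-around has no async look-ahead part */
}
```

A fault at page 100 with a 32-page window gives a window starting at page 84; a fault at page 5 is clamped to start at page 0.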
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
include/linux/mm.h | 5 +++--
mm/filemap.c | 12 +++++++-----
mm/readahead.c | 23 ++---------------------
3 files changed, 12 insertions(+), 28 deletions(-)
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1556,13 +1556,15 @@ static void do_sync_mmap_readahead(struc
if (ra->mmap_miss > MMAP_LOTSAMISS)
return;
+ /*
+ * mmap read-around
+ */
ra_pages = max_sane_readahead(ra->ra_pages);
if (ra_pages) {
- pgoff_t start = 0;
-
- if (offset > ra_pages / 2)
- start = offset - ra_pages / 2;
- do_page_cache_readahead(mapping, file, start, ra_pages);
+ ra->start = max_t(long, 0, offset - ra_pages/2);
+ ra->size = ra_pages;
+ ra->async_size = 0;
+ ra_submit(ra, mapping, file);
}
}
--- mm.orig/include/linux/mm.h
+++ mm/include/linux/mm.h
@@ -1183,8 +1183,6 @@ void task_dirty_inc(struct task_struct *
#define VM_MAX_READAHEAD 128 /* kbytes */
#define VM_MIN_READAHEAD 16 /* kbytes (includes current page) */
-int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
- pgoff_t offset, unsigned long nr_to_read);
int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
pgoff_t offset, unsigned long nr_to_read);
@@ -1202,6 +1200,9 @@ void page_cache_async_readahead(struct a
unsigned long size);
unsigned long max_sane_readahead(unsigned long nr);
+unsigned long ra_submit(struct file_ra_state *ra,
+ struct address_space *mapping,
+ struct file *filp);
/* Do stack extension */
extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
--- mm.orig/mm/readahead.c
+++ mm/mm/readahead.c
@@ -146,15 +146,12 @@ out:
}
/*
- * do_page_cache_readahead actually reads a chunk of disk. It allocates all
+ * __do_page_cache_readahead() actually reads a chunk of disk. It allocates all
* the pages first, then submits them all for I/O. This avoids the very bad
* behaviour which would occur if page allocations are causing VM writeback.
* We really don't want to intermingle reads and writes like that.
*
* Returns the number of pages requested, or the maximum amount of I/O allowed.
- *
- * do_page_cache_readahead() returns -1 if it encountered request queue
- * congestion.
*/
static int
__do_page_cache_readahead(struct address_space *mapping, struct file *filp,
@@ -245,22 +242,6 @@ int force_page_cache_readahead(struct ad
}
/*
- * This version skips the IO if the queue is read-congested, and will tell the
- * block layer to abandon the readahead if request allocation would block.
- *
- * force_page_cache_readahead() will ignore queue congestion and will block on
- * request queues.
- */
-int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
- pgoff_t offset, unsigned long nr_to_read)
-{
- if (bdi_read_congested(mapping->backing_dev_info))
- return -1;
-
- return __do_page_cache_readahead(mapping, filp, offset, nr_to_read, 0);
-}
-
-/*
* Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
* sensible upper limit.
*/
@@ -285,7 +266,7 @@ subsys_initcall(readahead_init);
/*
* Submit IO for the read-ahead request in file_ra_state.
*/
-static unsigned long ra_submit(struct file_ra_state *ra,
+unsigned long ra_submit(struct file_ra_state *ra,
struct address_space *mapping, struct file *filp)
{
int actual;
--
* Re: [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead
2009-04-07 11:50 ` [PATCH 11/14] readahead: clean up and simplify the code for filemap page fault readahead Wu Fengguang
@ 2009-04-07 15:50 ` Linus Torvalds
0 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2009-04-07 15:50 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Benjamin Herrenschmidt, Pavel Levshin, wli,
Nick Piggin, David Rientjes, Hugh Dickins, Ingo Molnar,
Lee Schermerhorn, Mike Waychison, Peter Zijlstra, Rohit Seth,
Edwin, H. Peter Anvin, Ying Han, LKML, linux-mm, linux-fsdevel
On Tue, 7 Apr 2009, Wu Fengguang wrote:
>
> From: Linus Torvalds <torvalds@linux-foundation.org>
>
> This shouldn't really change behavior all that much, but the single
> rather complex function with read-ahead inside a loop etc is broken up
> into more manageable pieces.
Heh. That's an old patch.
Anyway, ACK on the whole series (or at least the pieces of it that were
cc'd to me). Looks like sane cleanups, and I don't mean just my own old
patch ;)
Linus
* Re: [PATCH 00/14] filemap and readahead fixes
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
` (13 preceding siblings ...)
2009-04-07 11:50 ` [PATCH 14/14] readahead: record mmap read-around states in file_ra_state Wu Fengguang
@ 2009-04-10 4:36 ` Andrew Morton
2009-04-10 4:54 ` Wu Fengguang
14 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2009-04-10 4:36 UTC (permalink / raw)
To: Wu Fengguang
Cc: Benjamin Herrenschmidt, David Rientjes, Hugh Dickins, Ingo Molnar,
Lee Schermerhorn, Mike Waychison, Nick Piggin, Peter Zijlstra,
Rohit Seth, Edwin, H. Peter Anvin, Ying Han, LKML, linux-mm,
linux-fsdevel
On Tue, 07 Apr 2009 19:50:39 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> This is a set of fixes and cleanups for filemap and readahead.
Unfortunately page_fault-retry-with-nopage_retry.patch got dropped so
the first five patches are no longer applicable. Patch #11 also died.
Can you please respin the remains against current mainline?
* Re: [PATCH 00/14] filemap and readahead fixes
2009-04-10 4:36 ` [PATCH 00/14] filemap and readahead fixes Andrew Morton
@ 2009-04-10 4:54 ` Wu Fengguang
2009-04-10 5:08 ` Andrew Morton
0 siblings, 1 reply; 21+ messages in thread
From: Wu Fengguang @ 2009-04-10 4:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, David Rientjes, Hugh Dickins, Ingo Molnar,
Lee Schermerhorn, Mike Waychison, Nick Piggin, Peter Zijlstra,
Rohit Seth, Edwin, H. Peter Anvin, Ying Han, LKML,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
On Fri, Apr 10, 2009 at 12:36:43PM +0800, Andrew Morton wrote:
> On Tue, 07 Apr 2009 19:50:39 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > This is a set of fixes and cleanups for filemap and readahead.
>
> Unfortunately page_fault-retry-with-nopage_retry.patch got dropped so
> the first five patches are no longer applicable. Patch #11 also died.
>
> Can you please respin the remains against current mainline?
Do you mean rebase them onto linux-next, bypassing Ying Han's patches?
Thanks,
Fengguang
* Re: [PATCH 00/14] filemap and readahead fixes
2009-04-10 4:54 ` Wu Fengguang
@ 2009-04-10 5:08 ` Andrew Morton
2009-04-10 5:53 ` Wu Fengguang
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2009-04-10 5:08 UTC (permalink / raw)
To: Wu Fengguang
Cc: Benjamin Herrenschmidt, David Rientjes, Hugh Dickins, Ingo Molnar,
Lee Schermerhorn, Mike Waychison, Nick Piggin, Peter Zijlstra,
Rohit Seth, Edwin, H. Peter Anvin, Ying Han, LKML,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
On Fri, 10 Apr 2009 12:54:40 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Fri, Apr 10, 2009 at 12:36:43PM +0800, Andrew Morton wrote:
> > On Tue, 07 Apr 2009 19:50:39 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> >
> > > This is a set of fixes and cleanups for filemap and readahead.
> >
> > Unfortunately page_fault-retry-with-nopage_retry.patch got dropped so
> > the first five patches are no longer applicable. Patch #11 also died.
> >
> > Can you please respin the remains against current mainline?
>
> Do you mean rebase them onto linux-next, bypassing Ying Han's patches?
>
Those patches are still several akpm-hours ahead in my backlog queue.
They don't seem to have generated much attention and someone (ie: you)
had substantial comments which haven't been replied to yet. So I'd
expect another version to be forthcoming.
But I don't mind either way. I guess the main question here is: do we
see a need to squeeze any of these things into 2.6.30?
* Re: [PATCH 00/14] filemap and readahead fixes
2009-04-10 5:08 ` Andrew Morton
@ 2009-04-10 5:53 ` Wu Fengguang
0 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2009-04-10 5:53 UTC (permalink / raw)
To: Andrew Morton
Cc: Benjamin Herrenschmidt, David Rientjes, Hugh Dickins, Ingo Molnar,
Lee Schermerhorn, Mike Waychison, Nick Piggin, Peter Zijlstra,
Rohit Seth, Edwin, H. Peter Anvin, Ying Han, LKML,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
On Fri, Apr 10, 2009 at 01:08:15PM +0800, Andrew Morton wrote:
> On Fri, 10 Apr 2009 12:54:40 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > On Fri, Apr 10, 2009 at 12:36:43PM +0800, Andrew Morton wrote:
> > > On Tue, 07 Apr 2009 19:50:39 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> > >
> > > > This is a set of fixes and cleanups for filemap and readahead.
> > >
> > > Unfortunately page_fault-retry-with-nopage_retry.patch got dropped so
> > > the first five patches are no longer applicable. Patch #11 also died.
> > >
> > > Can you please respin the remains against current mainline?
> >
> > Do you mean rebase them onto linux-next, bypassing Ying Han's patches?
> >
>
> Those patches are still several akpm-hours ahead in my backlog queue.
> They don't seem to have generated much attention and someone (ie: you)
> had substantial comments which haven't been replied to yet. So I'd
> expect another version to be forthcoming.
OK. The truth on my part is that I would be able to submit my patches
much earlier if her patches were not in the way. I have to understand
what her patches do, resolve conflicts, retest, and resolve the
side effects.
So I became a big tester of her patches and cleared several bugs out
of them. Now I feel confident in the fault-retry patches.
I can imagine one workload that benefits greatly: concurrent threads
reading the same mmapped file. Before her patches:
- minor faults are blocked waiting for major faults doing IO
- major faults block each other, leading to _serialized_ IO
So her patches not only reduce unnecessarily long lock waits,
but also make _parallel_ IO happen.
> But I don't mind either way. I guess the main question here is: do we
> see a need to squeeze any of these things into 2.6.30?
Linus's and my filemap/readahead patches are in fact pretty old bug
fixes and cleanups; they should be safe for 2.6.30 :-)
Thanks,
Fengguang