* [RFC][PATCH] vfs: check inode size on no_cached_page
@ 2009-04-12 7:16 Wu Fengguang
2009-04-15 0:11 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Wu Fengguang @ 2009-04-12 7:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: LKML, linux-fsdevel, Chenfeng Xu
[This patch may not necessarily be merged, but at least we should
be aware of the problem.]
When user space requests past-EOF data, do_generic_file_read() will
issue a bonus readpage call, which may be unfavorable.
do_generic_file_read:
    -> find_page:
        -> find_get_page() = NULL
        -> page_cache_sync_readahead()
        -> find_get_page() = NULL
    -> no_cached_page:
    -> readpage:
        -> nfs_readpage() = error
    -> readpage_error:
Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/filemap.c | 5 +++++
1 file changed, 5 insertions(+)
--- mm.orig/mm/filemap.c
+++ mm/mm/filemap.c
@@ -1269,6 +1269,11 @@ readpage_error:
 		goto out;
 
 no_cached_page:
+		isize = i_size_read(inode);
+		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+		if (unlikely(!isize || index > end_index))
+			goto out;
+
 		/*
 		 * Ok, it wasn't cached, so we need to create a new
 		 * page..
^ permalink raw reply [flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] vfs: check inode size on no_cached_page
2009-04-12 7:16 [RFC][PATCH] vfs: check inode size on no_cached_page Wu Fengguang
@ 2009-04-15 0:11 ` Andrew Morton
[not found] ` <20090415012027.GA4731@ThinkPad>
[not found] ` <20090414171114.04a47932.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
0 siblings, 2 replies; 6+ messages in thread
From: Andrew Morton @ 2009-04-15 0:11 UTC (permalink / raw)
To: Wu Fengguang; +Cc: linux-kernel, linux-fsdevel, xcf

On Sun, 12 Apr 2009 15:16:05 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> [This patch may not necessarily be merged, but at least we should
> be aware of the problem.]
>
> When user space requests past-EOF data, do_generic_file_read() will
> issue a bonus readpage call, which may be unfavorable.
>
> do_generic_file_read:
>     -> find_page:
>         -> find_get_page() = NULL
>         -> page_cache_sync_readahead()
>         -> find_get_page() = NULL
>     -> no_cached_page:
>     -> readpage:
>         -> nfs_readpage() = error
>     -> readpage_error:
>
> Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
> mm/filemap.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> --- mm.orig/mm/filemap.c
> +++ mm/mm/filemap.c
> @@ -1269,6 +1269,11 @@ readpage_error:
>  		goto out;
>  
>  no_cached_page:
> +		isize = i_size_read(inode);
> +		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> +		if (unlikely(!isize || index > end_index))
> +			goto out;
> +
>  		/*
>  		 * Ok, it wasn't cached, so we need to create a new
>  		 * page..

Is this a problem which needs to be solved? userspace does something
silly and the kernel behaves a bit suboptimally?

If that's the only problem here then it's not worth adding fastpath
cycles to fix it?

^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20090415012027.GA4731@ThinkPad>]
* Re: [RFC][PATCH] vfs: check inode size on no_cached_page
[not found] ` <20090415012027.GA4731@ThinkPad>
@ 2009-04-15 1:22 ` Chenfeng Xu
2009-04-15 1:22 ` Chenfeng Xu
1 sibling, 0 replies; 6+ messages in thread
From: Chenfeng Xu @ 2009-04-15 1:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel, fengguang.wu

On Tue, Apr 14, 2009 at 05:11:14PM -0700, Andrew Morton wrote:
> On Sun, 12 Apr 2009 15:16:05 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > [This patch may not necessarily be merged, but at least we should
> > be aware of the problem.]
> >
> > When user space requests past-EOF data, do_generic_file_read() will
> > issue a bonus readpage call, which may be unfavorable.
> >
> > do_generic_file_read:
> >     -> find_page:
> >         -> find_get_page() = NULL
> >         -> page_cache_sync_readahead()
> >         -> find_get_page() = NULL
> >     -> no_cached_page:
> >     -> readpage:
> >         -> nfs_readpage() = error
> >     -> readpage_error:
> >
> > Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> > mm/filemap.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > --- mm.orig/mm/filemap.c
> > +++ mm/mm/filemap.c
> > @@ -1269,6 +1269,11 @@ readpage_error:
> >  		goto out;
> >  
> >  no_cached_page:
> > +		isize = i_size_read(inode);
> > +		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> > +		if (unlikely(!isize || index > end_index))
> > +			goto out;
> > +
> >  		/*
> >  		 * Ok, it wasn't cached, so we need to create a new
> >  		 * page..
>
> Is this a problem which needs to be solved? userspace does something
> silly and the kernel behaves a bit suboptimally?
>
> If that's the only problem here then it's not worth adding fastpath
> cycles to fix it?
>

Currently, it is the only problem I have found. It is caused by 'dd',
which tries to access a page whose index is > end_index. However, since
Linux-based operating systems have many zero-size configuration files,
'isize == 0' could be the more common case. Without this fast-path check,
'no_cached_page' will create many unusable pages.

I added the following lines and booted a small system in KVM.

 		if (unlikely(!isize || index > end_index))
+		{
+			printk(KERN_DEBUG "over read: %s %lu/%lu\n",
+				filp->f_path.dentry->d_name.name, index, end_index);
 			goto out;
+		}

The 'over read' messages at boot time:

Apr 6 23:57:43 kvm kernel: [ 4.334520] over read: locale 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 4.350895] over read: locale 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 4.365575] over read: locale 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 4.695195] over read: mtab 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 5.326886] over read: ifstate 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 5.326912] over read: ifstate 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 5.396599] over read: portmap_mapping 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 6.076429] over read: random-seed 1/0
Apr 6 23:57:43 kvm kernel: [ 6.152477] over read: utmp 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 6.152492] over read: utmp 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 6.367410] over read: etab 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 6.426226] over read: xtab.tmp 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 6.426229] over read: xtab 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 6.533658] over read: locale 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 6.550716] over read: locale 0/18446744073709551615
Apr 6 23:57:43 kvm kernel: [ 6.565761] over read: locale 0/18446744073709551615
Apr 6 23:57:47 kvm kernel: [ 10.122621] over read: environment 0/18446744073709551615
Apr 6 23:57:47 kvm kernel: [ 10.122751] over read: locale 0/18446744073709551615
Apr 6 23:58:49 kvm kernel: [ 72.388823] over read: etab.tmp 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 6.184260] over read: locale 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 6.221483] over read: locale 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 6.260493] over read: locale 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 7.185390] over read: mtab 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 8.960259] over read: ifstate 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 8.960322] over read: ifstate 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 9.155492] over read: portmap_mapping 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 10.922985] over read: random-seed 1/0
Apr 6 23:59:17 kvm kernel: [ 11.151462] over read: utmp 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 11.151497] over read: utmp 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 11.708856] over read: etab 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 11.810557] over read: xtab.tmp 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 11.810561] over read: xtab 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 11.954552] over read: locale 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 11.975360] over read: locale 0/18446744073709551615
Apr 6 23:59:17 kvm kernel: [ 11.994969] over read: locale 0/18446744073709551615
Apr 6 23:59:22 kvm kernel: [ 16.132338] over read: environment 0/18446744073709551615
Apr 6 23:59:22 kvm kernel: [ 16.132471] over read: locale 0/18446744073709551615

^ permalink raw reply [flat|nested] 6+ messages in thread
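The end_index value 18446744073709551615 in the log above follows from the patch's arithmetic for an empty file: isize - 1 is -1, the shift leaves it at -1, and storing that in an unsigned long gives the largest unsigned value on a 64-bit build. That is why the !isize test is needed; index > end_index can never be true in that case. The "random-seed 1/0" lines are the other branch: a file of at most one page being read at page index 1. Below is a minimal user-space sketch of the same check. It assumes 4 KiB pages (PAGE_CACHE_SHIFT of 12) and a compiler that shifts negative values arithmetically, as GCC does; end_index_for(), past_eof() and print_over_read() are hypothetical names used only for illustration, not kernel functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_CACHE_SHIFT 12

/*
 * Same arithmetic as the patch; int64_t stands in for loff_t.
 * Right-shifting a negative value is implementation-defined in C;
 * GCC shifts arithmetically, so -1 >> 12 stays -1.
 */
unsigned long end_index_for(int64_t isize)
{
	return (unsigned long)((isize - 1) >> PAGE_CACHE_SHIFT);
}

/* Returns 1 when the patched no_cached_page path would skip readpage. */
int past_eof(int64_t isize, unsigned long index)
{
	return !isize || index > end_index_for(isize);
}

/* Prints index/end_index in the same form as the debug printk. */
void print_over_read(const char *name, int64_t isize, unsigned long index)
{
	if (past_eof(isize, index))
		printf("over read: %s %lu/%lu\n",
		       name, index, end_index_for(isize));
}
```

On an LP64 system, print_over_read("locale", 0, 0) would print the same 0/18446744073709551615 pair seen in the boot log.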
[parent not found: <20090414171114.04a47932.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>]
* Re: [RFC][PATCH] vfs: check inode size on no_cached_page
[not found] ` <20090414171114.04a47932.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2009-04-15 1:36 ` Wu Fengguang
2009-04-15 2:39 ` Wu Fengguang
0 siblings, 1 reply; 6+ messages in thread
From: Wu Fengguang @ 2009-04-15 1:36 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, xcf@ustc.edu.cn, linux-nfs, Trond Myklebust

On Wed, Apr 15, 2009 at 08:11:14AM +0800, Andrew Morton wrote:
> On Sun, 12 Apr 2009 15:16:05 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > [This patch may not necessarily be merged, but at least we should
> > be aware of the problem.]
> >
> > When user space requests past-EOF data, do_generic_file_read() will
> > issue a bonus readpage call, which may be unfavorable.
> >
> > do_generic_file_read:
> >     -> find_page:
> >         -> find_get_page() = NULL
> >         -> page_cache_sync_readahead()
> >         -> find_get_page() = NULL
> >     -> no_cached_page:
> >     -> readpage:
> >         -> nfs_readpage() = error
> >     -> readpage_error:

Sorry, nfs_readpage() will actually return 0 now. See below.

> >
> > Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> > mm/filemap.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > --- mm.orig/mm/filemap.c
> > +++ mm/mm/filemap.c
> > @@ -1269,6 +1269,11 @@ readpage_error:
> >  		goto out;
> >  
> >  no_cached_page:
> > +		isize = i_size_read(inode);
> > +		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> > +		if (unlikely(!isize || index > end_index))
> > +			goto out;
> > +
> >  		/*
> >  		 * Ok, it wasn't cached, so we need to create a new
> >  		 * page..
>
> Is this a problem which needs to be solved? userspace does something
> silly and the kernel behaves a bit suboptimally?
>
> If that's the only problem here then it's not worth adding fastpath
> cycles to fix it?

Yeah, just some inefficiency in theory, so no fix is necessary.

The underlying fs code should be able to do the right thing, just
as if a concurrent truncate had happened.

The NFS case goes like this:

nfs_readpage()
{
        # some bonus accounting:
        nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
        nfs_add_stats(inode, NFSIOS_READPAGES, 1);

        nfs_readpage_async(page)
                nfs_return_empty_page(page)
                        zero_user(page)         # will zero the page

        return 0;
}

After it returns 0, do_generic_file_read() will goto page_ok, check
i_size there, and free the past-EOF page.

I wonder if NFS could be improved to:
- move the NFSIOS_READPAGES accounting _after_ a successful read
- return AOP_TRUNCATED_PAGE instead of zeroing the past-EOF page

The following untested patch demonstrates the ideas.

Thanks,
Fengguang
---

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 96c4ebf..6688b46 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -76,15 +76,6 @@ void nfs_readdata_release(void *data)
 	nfs_readdata_free(rdata);
 }
 
-static
-int nfs_return_empty_page(struct page *page)
-{
-	zero_user(page, 0, PAGE_CACHE_SIZE);
-	SetPageUptodate(page);
-	unlock_page(page);
-	return 0;
-}
-
 static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
 {
 	unsigned int remainder = data->args.count - data->res.count;
@@ -123,7 +114,8 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 
 	len = nfs_page_length(page);
 	if (len == 0)
-		return nfs_return_empty_page(page);
+		return AOP_TRUNCATED_PAGE;
+
 	new = nfs_create_request(ctx, inode, page, 0, len);
 	if (IS_ERR(new)) {
 		unlock_page(page);
@@ -516,7 +508,6 @@ int nfs_readpage(struct file *file, struct page *page)
 	dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
 		page, PAGE_CACHE_SIZE, page->index);
 	nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
-	nfs_add_stats(inode, NFSIOS_READPAGES, 1);
 
 	/*
 	 * Try to flush any pending writes to the file..
@@ -550,6 +541,8 @@ int nfs_readpage(struct file *file, struct page *page)
 	}
 
 	error = nfs_readpage_async(ctx, inode, page);
+	if (!error)
+		nfs_add_stats(inode, NFSIOS_READPAGES, 1);
 
 out:
 	put_nfs_open_context(ctx);
@@ -575,7 +568,7 @@ readpage_async_filler(void *data, struct page *page)
 
 	len = nfs_page_length(page);
 	if (len == 0)
-		return nfs_return_empty_page(page);
+		return AOP_TRUNCATED_PAGE;
 
 	new = nfs_create_request(desc->ctx, inode, page, 0, len);
 	if (IS_ERR(new))

^ permalink raw reply related [flat|nested] 6+ messages in thread
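The NFS flow described in the message above can be modelled in user space. This is a rough sketch, not kernel code: model_readpage_empty() stands in for nfs_return_empty_page(), and model_page_ok_bytes() follows the i_size check that do_generic_file_read() is described as making at page_ok. The partial-last-page arithmetic is an assumption about how mm/filemap.c of that era computes the copy length and should be checked against the tree. All model_* names and the 4 KiB page size are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)

struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
	int uptodate;
};

/* Counts how many readpage calls the model has issued. */
int model_readpage_calls;

/* Stand-in for nfs_return_empty_page(): zero the page, report success. */
int model_readpage_empty(struct model_page *page)
{
	memset(page->data, 0, sizeof(page->data));
	page->uptodate = 1;
	model_readpage_calls++;
	return 0;
}

/*
 * Stand-in for the page_ok check: returns how many bytes of this page
 * may be copied to user space, or 0 if the page lies past EOF and is
 * simply dropped.
 */
size_t model_page_ok_bytes(long long isize, unsigned long index, size_t offset)
{
	unsigned long end_index =
		(unsigned long)((isize - 1) >> MODEL_PAGE_SHIFT);
	size_t nr = MODEL_PAGE_SIZE;

	if (!isize || index > end_index)
		return 0;
	if (index == end_index) {
		/* Last page: only the bytes up to i_size are valid. */
		nr = (size_t)((isize - 1) & (long long)(MODEL_PAGE_SIZE - 1)) + 1;
		if (nr <= offset)
			return 0;
	}
	return nr - offset;
}

/* Uncached read of one page without the proposed no_cached_page check. */
size_t model_read(long long isize, unsigned long index, size_t offset)
{
	struct model_page page;

	if (model_readpage_empty(&page) != 0)
		return 0;
	return model_page_ok_bytes(isize, index, offset);
}
```

In this model a read of an empty file still costs one readpage call and one zeroed page, and the page_ok step then discards the result. That matches the "inefficiency in theory" characterisation: the outcome is correct, only the work is wasted.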
* Re: [RFC][PATCH] vfs: check inode size on no_cached_page
2009-04-15 1:36 ` Wu Fengguang
@ 2009-04-15 2:39 ` Wu Fengguang
0 siblings, 0 replies; 6+ messages in thread
From: Wu Fengguang @ 2009-04-15 2:39 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, xcf@ustc.edu.cn, linux-nfs, Trond Myklebust

On Wed, Apr 15, 2009 at 09:36:34AM +0800, Wu Fengguang wrote:
> On Wed, Apr 15, 2009 at 08:11:14AM +0800, Andrew Morton wrote:
> > On Sun, 12 Apr 2009 15:16:05 +0800
> > Wu Fengguang <fengguang.wu@intel.com> wrote:
> >
> > > [This patch may not necessarily be merged, but at least we should
> > > be aware of the problem.]
> > >
> > > When user space requests past-EOF data, do_generic_file_read() will
> > > issue a bonus readpage call, which may be unfavorable.
> > >
> > > do_generic_file_read:
> > >     -> find_page:
> > >         -> find_get_page() = NULL
> > >         -> page_cache_sync_readahead()
> > >         -> find_get_page() = NULL
> > >     -> no_cached_page:
> > >     -> readpage:
> > >         -> nfs_readpage() = error
> > >     -> readpage_error:
>
> Sorry, nfs_readpage() will actually return 0 now. See below.
>
> > >
> > > Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
> > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > ---
> > > mm/filemap.c | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > --- mm.orig/mm/filemap.c
> > > +++ mm/mm/filemap.c
> > > @@ -1269,6 +1269,11 @@ readpage_error:
> > >  		goto out;
> > >  
> > >  no_cached_page:
> > > +		isize = i_size_read(inode);
> > > +		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> > > +		if (unlikely(!isize || index > end_index))
> > > +			goto out;
> > > +
> > >  		/*
> > >  		 * Ok, it wasn't cached, so we need to create a new
> > >  		 * page..
> >
> > Is this a problem which needs to be solved? userspace does something
> > silly and the kernel behaves a bit suboptimally?
> >
> > If that's the only problem here then it's not worth adding fastpath
> > cycles to fix it?
>
> Yeah, just some inefficiency in theory, so no fix is necessary.
>
> The underlying fs code should be able to do the right thing, just
> as if a concurrent truncate had happened.
>
> The NFS case goes like this:
>
> nfs_readpage()
> {
>         # some bonus accounting:
>         nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
>         nfs_add_stats(inode, NFSIOS_READPAGES, 1);
>
>         nfs_readpage_async(page)
>                 nfs_return_empty_page(page)
>                         zero_user(page)         # will zero the page
>
>         return 0;
> }
>
> After it returns 0, do_generic_file_read() will goto page_ok, check
> i_size there, and free the past-EOF page.
>
> I wonder if NFS could be improved to:
> - move the NFSIOS_READPAGES accounting _after_ a successful read
> - return AOP_TRUNCATED_PAGE instead of zeroing the past-EOF page

Ah, AOP_TRUNCATED_PAGE actually tells the caller to retry the readpage()
call. Returning AOP_TRUNCATED_PAGE from nfs_readpage() in this case would
create an infinite loop...

Thanks,
Fengguang

> The following untested patch demonstrates the ideas.
>
> Thanks,
> Fengguang
> ---
>
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index 96c4ebf..6688b46 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -76,15 +76,6 @@ void nfs_readdata_release(void *data)
>  	nfs_readdata_free(rdata);
>  }
>  
> -static
> -int nfs_return_empty_page(struct page *page)
> -{
> -	zero_user(page, 0, PAGE_CACHE_SIZE);
> -	SetPageUptodate(page);
> -	unlock_page(page);
> -	return 0;
> -}
> -
>  static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
>  {
>  	unsigned int remainder = data->args.count - data->res.count;
> @@ -123,7 +114,8 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>  
>  	len = nfs_page_length(page);
>  	if (len == 0)
> -		return nfs_return_empty_page(page);
> +		return AOP_TRUNCATED_PAGE;
> +
>  	new = nfs_create_request(ctx, inode, page, 0, len);
>  	if (IS_ERR(new)) {
>  		unlock_page(page);
> @@ -516,7 +508,6 @@ int nfs_readpage(struct file *file, struct page *page)
>  	dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
>  		page, PAGE_CACHE_SIZE, page->index);
>  	nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
> -	nfs_add_stats(inode, NFSIOS_READPAGES, 1);
>  
>  	/*
>  	 * Try to flush any pending writes to the file..
> @@ -550,6 +541,8 @@ int nfs_readpage(struct file *file, struct page *page)
>  	}
>  
>  	error = nfs_readpage_async(ctx, inode, page);
> +	if (!error)
> +		nfs_add_stats(inode, NFSIOS_READPAGES, 1);
>  
>  out:
>  	put_nfs_open_context(ctx);
> @@ -575,7 +568,7 @@ readpage_async_filler(void *data, struct page *page)
>  
>  	len = nfs_page_length(page);
>  	if (len == 0)
> -		return nfs_return_empty_page(page);
> +		return AOP_TRUNCATED_PAGE;
>  
>  	new = nfs_create_request(desc->ctx, inode, page, 0, len);
>  	if (IS_ERR(new))

^ permalink raw reply [flat|nested] 6+ messages in thread
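The retry problem raised in the follow-up can be illustrated with a small user-space model. It assumes, as the message states, that the generic read path treats AOP_TRUNCATED_PAGE as "drop the page and look it up again", so a readpage implementation that returns it for every past-EOF page is asked again indefinitely. The model_* names, the constant's value and the max_tries cap are illustrative assumptions; the cap exists only so the model terminates.

```c
/* Assumption: 4 KiB pages; the constant is a model value only. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_AOP_TRUNCATED_PAGE 0x80001

typedef int (*model_readpage_fn)(long long isize, unsigned long index);

/* Existing behaviour as described: hand back a zeroed page, return 0. */
int model_readpage_zero(long long isize, unsigned long index)
{
	(void)isize;
	(void)index;
	return 0;
}

/* Proposed behaviour: ask the caller to retry for a past-EOF page. */
int model_readpage_truncated(long long isize, unsigned long index)
{
	unsigned long end_index =
		(unsigned long)((isize - 1) >> MODEL_PAGE_SHIFT);

	if (!isize || index > end_index)
		return MODEL_AOP_TRUNCATED_PAGE;
	return 0;
}

/*
 * Model of the find_page -> no_cached_page -> readpage cycle.
 * Returns the number of readpage calls made, capped at max_tries.
 */
int model_read_loop(model_readpage_fn readpage, long long isize,
		    unsigned long index, int max_tries)
{
	int calls = 0;

	while (calls < max_tries) {
		int error = readpage(isize, index);

		calls++;
		if (error == MODEL_AOP_TRUNCATED_PAGE)
			continue;	/* release page, look it up again */
		break;			/* success or a real error */
	}
	return calls;
}
```

With model_readpage_zero the loop ends after one call. With model_readpage_truncated and an empty file it runs until the cap, since nothing changes between retries; in the kernel there is no such cap.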
end of thread, other threads:[~2009-04-15 2:40 UTC | newest]
Thread overview: 6+ messages
2009-04-12 7:16 [RFC][PATCH] vfs: check inode size on no_cached_page Wu Fengguang
2009-04-15 0:11 ` Andrew Morton
[not found] ` <20090415012027.GA4731@ThinkPad>
2009-04-15 1:22 ` Chenfeng Xu
2009-04-15 1:22 ` Chenfeng Xu
[not found] ` <20090414171114.04a47932.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2009-04-15 1:36 ` Wu Fengguang
2009-04-15 2:39 ` Wu Fengguang