From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"xcf-pcTkQGq1WEvM1kAEIRd3EQ@public.gmane.org"
<xcf-pcTkQGq1WEvM1kAEIRd3EQ@public.gmane.org>,
linux-nfs@vger.kernel.org,
Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: Re: [RFC][PATCH] vfs: check inode size on no_cached_page
Date: Wed, 15 Apr 2009 09:36:34 +0800 [thread overview]
Message-ID: <20090415013634.GB6143@localhost> (raw)
In-Reply-To: <20090414171114.04a47932.akpm@linux-foundation.org>
On Wed, Apr 15, 2009 at 08:11:14AM +0800, Andrew Morton wrote:
> On Sun, 12 Apr 2009 15:16:05 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > [This patch may not necessarily be merged, but at least we should
> > be aware of the problem.]
> >
> > When user space requests past-EOF data, do_generic_file_read() will
> > issue a bonus readpage call, which may be unfavorable.
> >
> > do_generic_file_read:
> > -> find_page:
> > -> find_get_page() = NULL
> > -> page_cache_sync_readahead()
> > -> find_get_page() = NULL
> > -> no_cached_page:
> > -> readpage:
> > -> nfs_readpage() = error
> > -> readpage_error:
Sorry nfs_readpage() will actually return 0 now. See below.
> >
> > Reported-by: Xu Chenfeng <xcf-pcTkQGq1WEvM1kAEIRd3EQ@public.gmane.org>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> > mm/filemap.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > --- mm.orig/mm/filemap.c
> > +++ mm/mm/filemap.c
> > @@ -1269,6 +1269,11 @@ readpage_error:
> > goto out;
> >
> > no_cached_page:
> > + isize = i_size_read(inode);
> > + end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> > + if (unlikely(!isize || index > end_index))
> > + goto out;
> > +
> > /*
> > * Ok, it wasn't cached, so we need to create a new
> > * page..
>
> Is this a problem which needs to be solved? userspace does something
> silly and the kernel behaves a bit suboptimally?
>
> If thats the only problem here then it's not worth adding fastpath
> cycles to fix it?
Yeah just some inefficiency in theory, so no fixing is necessary.
The underlying fs code shall be able to do the right thing - just
as if a concurrent truncate happened.
The NFS case goes like this:
nfs_readpage()
{
# some bonus accountings:
nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
nfs_add_stats(inode, NFSIOS_READPAGES, 1);
nfs_readpage_async(page)
nfs_return_empty_page(page)
zero_user(page) # will zero the page
return 0;
}
After it returns 0, do_generic_file_read() will goto page_ok and check
i_size there, and free the past-EOF page.
I wonder if NFS could be improved to:
- move the NFSIOS_READPAGES accounting _after_ a successful read
- return AOP_TRUNCATED_PAGE instead of zeroing the past-EOF page
The following untested patch demonstrates the ideas.
Thanks,
Fengguang
---
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 96c4ebf..6688b46 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -76,15 +76,6 @@ void nfs_readdata_release(void *data)
nfs_readdata_free(rdata);
}
-static
-int nfs_return_empty_page(struct page *page)
-{
- zero_user(page, 0, PAGE_CACHE_SIZE);
- SetPageUptodate(page);
- unlock_page(page);
- return 0;
-}
-
static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
{
unsigned int remainder = data->args.count - data->res.count;
@@ -123,7 +114,8 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
len = nfs_page_length(page);
if (len == 0)
- return nfs_return_empty_page(page);
+ return AOP_TRUNCATED_PAGE;
+
new = nfs_create_request(ctx, inode, page, 0, len);
if (IS_ERR(new)) {
unlock_page(page);
@@ -516,7 +508,6 @@ int nfs_readpage(struct file *file, struct page *page)
dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
page, PAGE_CACHE_SIZE, page->index);
nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
- nfs_add_stats(inode, NFSIOS_READPAGES, 1);
/*
* Try to flush any pending writes to the file..
@@ -550,6 +541,8 @@ int nfs_readpage(struct file *file, struct page *page)
}
error = nfs_readpage_async(ctx, inode, page);
+ if (!error)
+ nfs_add_stats(inode, NFSIOS_READPAGES, 1);
out:
put_nfs_open_context(ctx);
@@ -575,7 +568,7 @@ readpage_async_filler(void *data, struct page *page)
len = nfs_page_length(page);
if (len == 0)
- return nfs_return_empty_page(page);
+ return AOP_TRUNCATED_PAGE;
new = nfs_create_request(desc->ctx, inode, page, 0, len);
if (IS_ERR(new))
WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"xcf@ustc.edu.cn" <xcf@ustc.edu.cn>,
linux-nfs@vger.kernel.org,
Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: Re: [RFC][PATCH] vfs: check inode size on no_cached_page
Date: Wed, 15 Apr 2009 09:36:34 +0800 [thread overview]
Message-ID: <20090415013634.GB6143@localhost> (raw)
In-Reply-To: <20090414171114.04a47932.akpm@linux-foundation.org>
On Wed, Apr 15, 2009 at 08:11:14AM +0800, Andrew Morton wrote:
> On Sun, 12 Apr 2009 15:16:05 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > [This patch may not necessarily be merged, but at least we should
> > be aware of the problem.]
> >
> > When user space requests past-EOF data, do_generic_file_read() will
> > issue a bonus readpage call, which may be unfavorable.
> >
> > do_generic_file_read:
> > -> find_page:
> > -> find_get_page() = NULL
> > -> page_cache_sync_readahead()
> > -> find_get_page() = NULL
> > -> no_cached_page:
> > -> readpage:
> > -> nfs_readpage() = error
> > -> readpage_error:
Sorry nfs_readpage() will actually return 0 now. See below.
> >
> > Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> > mm/filemap.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > --- mm.orig/mm/filemap.c
> > +++ mm/mm/filemap.c
> > @@ -1269,6 +1269,11 @@ readpage_error:
> > goto out;
> >
> > no_cached_page:
> > + isize = i_size_read(inode);
> > + end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> > + if (unlikely(!isize || index > end_index))
> > + goto out;
> > +
> > /*
> > * Ok, it wasn't cached, so we need to create a new
> > * page..
>
> Is this a problem which needs to be solved? userspace does something
> silly and the kernel behaves a bit suboptimally?
>
> If thats the only problem here then it's not worth adding fastpath
> cycles to fix it?
Yeah just some inefficiency in theory, so no fixing is necessary.
The underlying fs code shall be able to do the right thing - just
as if a concurrent truncate happened.
The NFS case goes like this:
nfs_readpage()
{
# some bonus accountings:
nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
nfs_add_stats(inode, NFSIOS_READPAGES, 1);
nfs_readpage_async(page)
nfs_return_empty_page(page)
zero_user(page) # will zero the page
return 0;
}
After it returns 0, do_generic_file_read() will goto page_ok and check
i_size there, and free the past-EOF page.
I wonder if NFS could be improved to:
- move the NFSIOS_READPAGES accounting _after_ a successful read
- return AOP_TRUNCATED_PAGE instead of zeroing the past-EOF page
The following untested patch demonstrates the ideas.
Thanks,
Fengguang
---
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 96c4ebf..6688b46 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -76,15 +76,6 @@ void nfs_readdata_release(void *data)
nfs_readdata_free(rdata);
}
-static
-int nfs_return_empty_page(struct page *page)
-{
- zero_user(page, 0, PAGE_CACHE_SIZE);
- SetPageUptodate(page);
- unlock_page(page);
- return 0;
-}
-
static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
{
unsigned int remainder = data->args.count - data->res.count;
@@ -123,7 +114,8 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
len = nfs_page_length(page);
if (len == 0)
- return nfs_return_empty_page(page);
+ return AOP_TRUNCATED_PAGE;
+
new = nfs_create_request(ctx, inode, page, 0, len);
if (IS_ERR(new)) {
unlock_page(page);
@@ -516,7 +508,6 @@ int nfs_readpage(struct file *file, struct page *page)
dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
page, PAGE_CACHE_SIZE, page->index);
nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
- nfs_add_stats(inode, NFSIOS_READPAGES, 1);
/*
* Try to flush any pending writes to the file..
@@ -550,6 +541,8 @@ int nfs_readpage(struct file *file, struct page *page)
}
error = nfs_readpage_async(ctx, inode, page);
+ if (!error)
+ nfs_add_stats(inode, NFSIOS_READPAGES, 1);
out:
put_nfs_open_context(ctx);
@@ -575,7 +568,7 @@ readpage_async_filler(void *data, struct page *page)
len = nfs_page_length(page);
if (len == 0)
- return nfs_return_empty_page(page);
+ return AOP_TRUNCATED_PAGE;
new = nfs_create_request(desc->ctx, inode, page, 0, len);
if (IS_ERR(new))
WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: "linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"xcf-pcTkQGq1WEvM1kAEIRd3EQ@public.gmane.org"
<xcf-pcTkQGq1WEvM1kAEIRd3EQ@public.gmane.org>,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Trond Myklebust
<Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
Subject: Re: [RFC][PATCH] vfs: check inode size on no_cached_page
Date: Wed, 15 Apr 2009 09:36:34 +0800 [thread overview]
Message-ID: <20090415013634.GB6143@localhost> (raw)
In-Reply-To: <20090414171114.04a47932.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
On Wed, Apr 15, 2009 at 08:11:14AM +0800, Andrew Morton wrote:
> On Sun, 12 Apr 2009 15:16:05 +0800
> Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>
> > [This patch may not necessarily be merged, but at least we should
> > be aware of the problem.]
> >
> > When user space requests past-EOF data, do_generic_file_read() will
> > issue a bonus readpage call, which may be unfavorable.
> >
> > do_generic_file_read:
> > -> find_page:
> > -> find_get_page() = NULL
> > -> page_cache_sync_readahead()
> > -> find_get_page() = NULL
> > -> no_cached_page:
> > -> readpage:
> > -> nfs_readpage() = error
> > -> readpage_error:
Sorry nfs_readpage() will actually return 0 now. See below.
> >
> > Reported-by: Xu Chenfeng <xcf-pcTkQGq1WEvM1kAEIRd3EQ@public.gmane.org>
> > Signed-off-by: Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > ---
> > mm/filemap.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > --- mm.orig/mm/filemap.c
> > +++ mm/mm/filemap.c
> > @@ -1269,6 +1269,11 @@ readpage_error:
> > goto out;
> >
> > no_cached_page:
> > + isize = i_size_read(inode);
> > + end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> > + if (unlikely(!isize || index > end_index))
> > + goto out;
> > +
> > /*
> > * Ok, it wasn't cached, so we need to create a new
> > * page..
>
> Is this a problem which needs to be solved? userspace does something
> silly and the kernel behaves a bit suboptimally?
>
> If thats the only problem here then it's not worth adding fastpath
> cycles to fix it?
Yeah just some inefficiency in theory, so no fixing is necessary.
The underlying fs code shall be able to do the right thing - just
as if a concurrent truncate happened.
The NFS case goes like this:
nfs_readpage()
{
# some bonus accountings:
nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
nfs_add_stats(inode, NFSIOS_READPAGES, 1);
nfs_readpage_async(page)
nfs_return_empty_page(page)
zero_user(page) # will zero the page
return 0;
}
After it returns 0, do_generic_file_read() will goto page_ok and check
i_size there, and free the past-EOF page.
I wonder if NFS could be improved to:
- move the NFSIOS_READPAGES accounting _after_ a successful read
- return AOP_TRUNCATED_PAGE instead of zeroing the past-EOF page
The following untested patch demonstrates the ideas.
Thanks,
Fengguang
---
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 96c4ebf..6688b46 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -76,15 +76,6 @@ void nfs_readdata_release(void *data)
nfs_readdata_free(rdata);
}
-static
-int nfs_return_empty_page(struct page *page)
-{
- zero_user(page, 0, PAGE_CACHE_SIZE);
- SetPageUptodate(page);
- unlock_page(page);
- return 0;
-}
-
static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
{
unsigned int remainder = data->args.count - data->res.count;
@@ -123,7 +114,8 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
len = nfs_page_length(page);
if (len == 0)
- return nfs_return_empty_page(page);
+ return AOP_TRUNCATED_PAGE;
+
new = nfs_create_request(ctx, inode, page, 0, len);
if (IS_ERR(new)) {
unlock_page(page);
@@ -516,7 +508,6 @@ int nfs_readpage(struct file *file, struct page *page)
dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
page, PAGE_CACHE_SIZE, page->index);
nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
- nfs_add_stats(inode, NFSIOS_READPAGES, 1);
/*
* Try to flush any pending writes to the file..
@@ -550,6 +541,8 @@ int nfs_readpage(struct file *file, struct page *page)
}
error = nfs_readpage_async(ctx, inode, page);
+ if (!error)
+ nfs_add_stats(inode, NFSIOS_READPAGES, 1);
out:
put_nfs_open_context(ctx);
@@ -575,7 +568,7 @@ readpage_async_filler(void *data, struct page *page)
len = nfs_page_length(page);
if (len == 0)
- return nfs_return_empty_page(page);
+ return AOP_TRUNCATED_PAGE;
new = nfs_create_request(desc->ctx, inode, page, 0, len);
if (IS_ERR(new))
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2009-04-15 1:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-12 7:16 [RFC][PATCH] vfs: check inode size on no_cached_page Wu Fengguang
2009-04-15 0:11 ` Andrew Morton
2009-04-15 1:22 ` Chenfeng Xu
2009-04-15 1:22 ` Chenfeng Xu
2009-04-15 1:36 ` Wu Fengguang [this message]
2009-04-15 1:36 ` Wu Fengguang
2009-04-15 1:36 ` Wu Fengguang
2009-04-15 2:39 ` Wu Fengguang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090415013634.GB6143@localhost \
--to=fengguang.wu@intel.com \
--cc=Trond.Myklebust@netapp.com \
--cc=akpm@linux-foundation.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=xcf-pcTkQGq1WEvM1kAEIRd3EQ@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.