linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Fwd: fsx-linux failing with latest cifs-2.6 git tree
       [not found]             ` <524f69650811211218v78295682lcf6dce842327b097@mail.gmail.com>
@ 2008-11-21 20:38               ` Steve French
  2008-11-21 20:41                 ` Dave Kleikamp
                                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Steve French @ 2008-11-21 20:38 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Nick Piggin, linux-fsdevel, pbadari, Jeff Layton,
	linux-cifs-client@lists.samba.org

[-- Attachment #1: Type: text/plain, Size: 2352 bytes --]

Fix attached.

Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?

On Fri, Nov 21, 2008 at 2:18 PM, Steve French <smfrench@gmail.com> wrote:
> Looks like the following change to cifs_write_begin does fix it ...
> thanks Shaggy ...
>
> @@ -2062,8 +2074,10 @@ static int cifs_write_begin(struct file *file,
> struct address_space *mapping,
>  {
>        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
>        loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
> +       loff_t page_start = pos & PAGE_MASK;
>
>       cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
> @@ -2081,13 +2095,14 @@ static int cifs_write_begin(struct file *file,
> struct address_space *mapping,
>                int rc;
>
>                /* might as well read a page, it is fast enough */
> -               rc = cifs_readpage_worker(file, *pagep, &offset);
> +               rc = cifs_readpage_worker(file, *pagep, &page_start);
>
>                /* we do not need to pass errors back
>                   e.g. if we do not have read access to the file
>                   because cifs_write_end will attempt synchronous writes
>                   -- shaggy */
>
>
> On Fri, Nov 21, 2008 at 1:41 PM, Dave Kleikamp
> <shaggy@linux.vnet.ibm.com> wrote:
>> On Fri, 2008-11-21 at 13:13 -0600, Steve French wrote:
>>> Looks like this section of code is wrong in cifs_write_begin:
>>>
>>>         if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
>>>                 int rc;
>>>
>>>                 /* might as well read a page, it is fast enough */
>>>                 rc = cifs_readpage_worker(file, *pagep, &offset);
>>>
>>>                 /* we do not need to pass errors back
>>>                    e.g. if we do not have read access to the file
>>>                    because cifs_write_end will attempt synchronous writes
>>>                    -- shaggy */
>>>
>>>
>>> We see a case in which a write begins at offset 0x2e42f but the range
>>> immediately before it is unitialized in write_begin
>>>
>>> shouldn't we be doing a read of the whole page?
>>
>> What cifs_write_begin() passes in as offset is completely wrong.  It
>> should be the file offset of the beginning of the page rather than some
>> offset within the page.
>> --
>> David Kleikamp
>> IBM Linux Technology Center
>>
>>
>
>
>
> --
> Thanks,
>
> Steve
>



-- 
Thanks,

Steve

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: fsx-fail-fix.patch --]
[-- Type: text/x-diff; name=fsx-fail-fix.patch, Size: 852 bytes --]

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index b691b89..726d437 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2061,7 +2061,7 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 			struct page **pagep, void **fsdata)
 {
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
-	loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
+	loff_t offset_of_page_start = pos & PAGE_MASK;
 
 	cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
 
@@ -2081,7 +2081,7 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 		int rc;
 
 		/* might as well read a page, it is fast enough */
-		rc = cifs_readpage_worker(file, *pagep, &offset);
+		rc = cifs_readpage_worker(file, *pagep, &offset_of_page_start);
 
 		/* we do not need to pass errors back
 		   e.g. if we do not have read access to the file

[-- Attachment #3: Type: text/plain, Size: 172 bytes --]

_______________________________________________
linux-cifs-client mailing list
linux-cifs-client@lists.samba.org
https://lists.samba.org/mailman/listinfo/linux-cifs-client

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: Fwd: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 20:38               ` Fwd: fsx-linux failing with latest cifs-2.6 git tree Steve French
@ 2008-11-21 20:41                 ` Dave Kleikamp
  2008-11-21 21:02                   ` Steve French
  2008-11-21 23:44                   ` Nick Piggin
  2008-11-21 20:50                 ` Jeff Layton
  2008-11-21 22:50                 ` Jeff Layton
  2 siblings, 2 replies; 38+ messages in thread
From: Dave Kleikamp @ 2008-11-21 20:41 UTC (permalink / raw)
  To: Steve French
  Cc: Nick Piggin, Jeff Layton, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, 2008-11-21 at 14:38 -0600, Steve French wrote:
> Fix attached.
> 
> Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?

Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>

> 
> On Fri, Nov 21, 2008 at 2:18 PM, Steve French <smfrench@gmail.com> wrote:
> > Looks like the following change to cifs_write_begin does fix it ...
> > thanks Shaggy ...
> >
> > @@ -2062,8 +2074,10 @@ static int cifs_write_begin(struct file *file,
> > struct address_space *mapping,
> >  {
> >        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
> >        loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
> > +       loff_t page_start = pos & PAGE_MASK;
> >
> >       cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
> > @@ -2081,13 +2095,14 @@ static int cifs_write_begin(struct file *file,
> > struct address_space *mapping,
> >                int rc;
> >
> >                /* might as well read a page, it is fast enough */
> > -               rc = cifs_readpage_worker(file, *pagep, &offset);
> > +               rc = cifs_readpage_worker(file, *pagep, &page_start);
> >
> >                /* we do not need to pass errors back
> >                   e.g. if we do not have read access to the file
> >                   because cifs_write_end will attempt synchronous writes
> >                   -- shaggy */
> >
> >

-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 20:38               ` Fwd: fsx-linux failing with latest cifs-2.6 git tree Steve French
  2008-11-21 20:41                 ` Dave Kleikamp
@ 2008-11-21 20:50                 ` Jeff Layton
  2008-11-21 22:50                 ` Jeff Layton
  2 siblings, 0 replies; 38+ messages in thread
From: Jeff Layton @ 2008-11-21 20:50 UTC (permalink / raw)
  To: Steve French
  Cc: Dave Kleikamp, Nick Piggin, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, 21 Nov 2008 14:38:18 -0600
"Steve French" <smfrench@gmail.com> wrote:

> Fix attached.
> 
> Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?
> 
> On Fri, Nov 21, 2008 at 2:18 PM, Steve French <smfrench@gmail.com> wrote:
> > Looks like the following change to cifs_write_begin does fix it ...
> > thanks Shaggy ...
> >
> > @@ -2062,8 +2074,10 @@ static int cifs_write_begin(struct file *file,
> > struct address_space *mapping,
> >  {
> >        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
> >        loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
> > +       loff_t page_start = pos & PAGE_MASK;
> >
> >       cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
> > @@ -2081,13 +2095,14 @@ static int cifs_write_begin(struct file *file,
> > struct address_space *mapping,
> >                int rc;
> >
> >                /* might as well read a page, it is fast enough */
> > -               rc = cifs_readpage_worker(file, *pagep, &offset);
> > +               rc = cifs_readpage_worker(file, *pagep, &page_start);
> >
> >                /* we do not need to pass errors back
> >                   e.g. if we do not have read access to the file
> >                   because cifs_write_end will attempt synchronous writes
> >                   -- shaggy */
> >
> >
> > On Fri, Nov 21, 2008 at 1:41 PM, Dave Kleikamp
> > <shaggy@linux.vnet.ibm.com> wrote:
> >> On Fri, 2008-11-21 at 13:13 -0600, Steve French wrote:
> >>> Looks like this section of code is wrong in cifs_write_begin:
> >>>
> >>>         if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
> >>>                 int rc;
> >>>
> >>>                 /* might as well read a page, it is fast enough */
> >>>                 rc = cifs_readpage_worker(file, *pagep, &offset);
> >>>
> >>>                 /* we do not need to pass errors back
> >>>                    e.g. if we do not have read access to the file
> >>>                    because cifs_write_end will attempt synchronous writes
> >>>                    -- shaggy */
> >>>
> >>>
> >>> We see a case in which a write begins at offset 0x2e42f but the range
> >>> immediately before it is unitialized in write_begin
> >>>
> >>> shouldn't we be doing a read of the whole page?
> >>
> >> What cifs_write_begin() passes in as offset is completely wrong.  It
> >> should be the file offset of the beginning of the page rather than some
> >> offset within the page.
> >> --
> >> David Kleikamp
> >> IBM Linux Technology Center
> >>

Good catch...

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fwd: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 20:41                 ` Dave Kleikamp
@ 2008-11-21 21:02                   ` Steve French
  2008-11-21 23:44                   ` Nick Piggin
  1 sibling, 0 replies; 38+ messages in thread
From: Steve French @ 2008-11-21 21:02 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Nick Piggin, Jeff Layton, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

thx ... the only other thing I noticed was that the original patch removed

2067         if ((offset >= i_size) ||
2068             ((from == 0) && (offset + to) >= i_size)) {
2069                 /*
2070                  * We don't need to read data beyond the end of the file.
2071                  * zero it, and set the page uptodate
2072                  */
2073                 simple_prepare_write(file, page, from, to);
2074                 SetPageUptodate(page);

which means we do a lot of extra reads (which return no bytes) when we
write near or at end of file



On Fri, Nov 21, 2008 at 2:41 PM, Dave Kleikamp
<shaggy@linux.vnet.ibm.com> wrote:
> On Fri, 2008-11-21 at 14:38 -0600, Steve French wrote:
>> Fix attached.
>>
>> Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?
>
> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
>
>>
>> On Fri, Nov 21, 2008 at 2:18 PM, Steve French <smfrench@gmail.com> wrote:
>> > Looks like the following change to cifs_write_begin does fix it ...
>> > thanks Shaggy ...
>> >
>> > @@ -2062,8 +2074,10 @@ static int cifs_write_begin(struct file *file,
>> > struct address_space *mapping,
>> >  {
>> >        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
>> >        loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
>> > +       loff_t page_start = pos & PAGE_MASK;
>> >
>> >       cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
>> > @@ -2081,13 +2095,14 @@ static int cifs_write_begin(struct file *file,
>> > struct address_space *mapping,
>> >                int rc;
>> >
>> >                /* might as well read a page, it is fast enough */
>> > -               rc = cifs_readpage_worker(file, *pagep, &offset);
>> > +               rc = cifs_readpage_worker(file, *pagep, &page_start);
>> >
>> >                /* we do not need to pass errors back
>> >                   e.g. if we do not have read access to the file
>> >                   because cifs_write_end will attempt synchronous writes
>> >                   -- shaggy */
>> >
>> >
>
> --
> David Kleikamp
> IBM Linux Technology Center
>
>



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 20:38               ` Fwd: fsx-linux failing with latest cifs-2.6 git tree Steve French
  2008-11-21 20:41                 ` Dave Kleikamp
  2008-11-21 20:50                 ` Jeff Layton
@ 2008-11-21 22:50                 ` Jeff Layton
  2008-11-21 23:02                   ` Dave Kleikamp
  2008-11-21 23:53                   ` Nick Piggin
  2 siblings, 2 replies; 38+ messages in thread
From: Jeff Layton @ 2008-11-21 22:50 UTC (permalink / raw)
  To: Steve French
  Cc: Dave Kleikamp, Nick Piggin, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, 21 Nov 2008 14:38:18 -0600
"Steve French" <smfrench@gmail.com> wrote:

> Fix attached.
> 
> Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?
> 

Talking with Steve on IRC, we thought it might be better to optimize
away the read when possible. I think this patch should do it. We skip
the read if the write starts past the current end of the file, or if
the offset into the page of the beginning of the write is 0 and we're
writing past the current end of the file. In those situations we just
zero out the rest of the page.

Combined patch inlined below. I also took the liberty of adding a page
pointer to make the code look a little cleaner.

Thoughts?

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@us.ibm.com>
---
 fs/cifs/file.c |   42 ++++++++++++++++++++++++++++++------------
 1 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index b691b89..1d252b8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2062,26 +2062,43 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 {
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
 	loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
+	loff_t page_start = pos & PAGE_MASK;
+	loff_t i_size;
+	struct page *page;
+	char *buf;
+	int rc = 0;
 
 	cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
 
-	*pagep = __grab_cache_page(mapping, index);
-	if (!*pagep)
-		return -ENOMEM;
+	page = __grab_cache_page(mapping, index);
+	if (!page) {
+		rc = -ENOMEM;
+		goto out;
+	}
 
-	if (PageUptodate(*pagep))
-		return 0;
+	if (PageUptodate(page))
+		goto out;
 
 	/* If we are writing a full page it will be up to date,
 	   no need to read from the server */
 	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
-		return 0;
-
-	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
-		int rc;
+		goto out;
 
+	i_size = i_size_read(page->mapping->host);
+	if (pos >= i_size || ((offset == 0) && ((pos + len) >= i_size))) {
+		/*
+		 * optimize away the read when the unwritten part of the page
+		 * isn't expected to have any existing file data. Just zero
+		 * out the unused parts.
+		 */
+		buf = kmap(page);
+		memset(buf, 0, offset);
+		memset(buf + offset + len, 0, PAGE_CACHE_SIZE - (offset + len));
+		kunmap(page);
+		SetPageUptodate(page);
+	} else if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
 		/* might as well read a page, it is fast enough */
-		rc = cifs_readpage_worker(file, *pagep, &offset);
+		cifs_readpage_worker(file, page, &page_start);
 
 		/* we do not need to pass errors back
 		   e.g. if we do not have read access to the file
@@ -2093,8 +2110,9 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 		   racing with this read? In any case
 		   this will be written out by write_end so is fine */
 	}
-
-	return 0;
+out:
+	*pagep = page;
+	return rc;
 }
 
 const struct address_space_operations cifs_addr_ops = {
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 22:50                 ` Jeff Layton
@ 2008-11-21 23:02                   ` Dave Kleikamp
  2008-11-21 23:25                     ` Jeff Layton
  2008-11-21 23:53                   ` Nick Piggin
  1 sibling, 1 reply; 38+ messages in thread
From: Dave Kleikamp @ 2008-11-21 23:02 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, Nick Piggin, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, 2008-11-21 at 17:50 -0500, Jeff Layton wrote:
> On Fri, 21 Nov 2008 14:38:18 -0600
> "Steve French" <smfrench@gmail.com> wrote:
> 
> > Fix attached.
> > 
> > Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?
> > 
> 
> Talking with Steve on IRC, we thought it might be better to optimize
> away the read when possible. I think this patch should do it. We skip
> the read if the write starts past the current end of the file, or if
> the offset into the page of the beginning of the write is 0 and we're
> writing past the current end of the file. In those situations we just
> zero out the rest of the page.
> 
> Combined patch inlined below. I also took the liberty of adding a page
> pointer to make the code look a little cleaner.
> 
> Thoughts?
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> Signed-off-by: Steve French <smfrench@us.ibm.com>
> ---
>  fs/cifs/file.c |   42 ++++++++++++++++++++++++++++++------------
>  1 files changed, 30 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index b691b89..1d252b8 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2062,26 +2062,43 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
>  {
>  	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
>  	loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
> +	loff_t page_start = pos & PAGE_MASK;
> +	loff_t i_size;
> +	struct page *page;
> +	char *buf;
> +	int rc = 0;
> 
>  	cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
> 
> -	*pagep = __grab_cache_page(mapping, index);
> -	if (!*pagep)
> -		return -ENOMEM;
> +	page = __grab_cache_page(mapping, index);
> +	if (!page) {
> +		rc = -ENOMEM;
> +		goto out;
> +	}
> 
> -	if (PageUptodate(*pagep))
> -		return 0;
> +	if (PageUptodate(page))
> +		goto out;
> 
>  	/* If we are writing a full page it will be up to date,
>  	   no need to read from the server */
>  	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
> -		return 0;
> -
> -	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
> -		int rc;
> +		goto out;
> 
> +	i_size = i_size_read(page->mapping->host);
> +	if (pos >= i_size || ((offset == 0) && ((pos + len) >= i_size))) {

I don't think the (pos >= i_size) part of this test is correct.  We
would still need to preserve the data between the start of the page and
i_size.

The second part of the test makes sense.

> +		/*
> +		 * optimize away the read when the unwritten part of the page
> +		 * isn't expected to have any existing file data. Just zero
> +		 * out the unused parts.
> +		 */
> +		buf = kmap(page);
> +		memset(buf, 0, offset);
> +		memset(buf + offset + len, 0, PAGE_CACHE_SIZE - (offset + len));
> +		kunmap(page);
> +		SetPageUptodate(page);
> +	} else if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
>  		/* might as well read a page, it is fast enough */
> -		rc = cifs_readpage_worker(file, *pagep, &offset);
> +		cifs_readpage_worker(file, page, &page_start);
> 
>  		/* we do not need to pass errors back
>  		   e.g. if we do not have read access to the file
> @@ -2093,8 +2110,9 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
>  		   racing with this read? In any case
>  		   this will be written out by write_end so is fine */
>  	}
> -
> -	return 0;
> +out:
> +	*pagep = page;
> +	return rc;
>  }
> 
>  const struct address_space_operations cifs_addr_ops = {
-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 23:02                   ` Dave Kleikamp
@ 2008-11-21 23:25                     ` Jeff Layton
  2008-11-22  1:04                       ` Steve French
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-11-21 23:25 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Steve French, Nick Piggin, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, 21 Nov 2008 23:02:48 +0000
Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:
> 
> I don't think the (pos >= i_size) part of this test is correct.  We
> would still need to preserve the data between the start of the page and
> i_size.
> 
> The second part of the test makes sense.
> 

Thanks Dave, good catch.

I think this is what we want. Skip the read if the entire page is
beyond the end of file, or if the write will clobber all of the data
that would be read in.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@us.ibm.com>
---
 fs/cifs/file.c |   42 ++++++++++++++++++++++++++++++------------
 1 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index b691b89..0c7ef2a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2062,26 +2062,43 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 {
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
 	loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
+	loff_t page_start = pos & PAGE_MASK;
+	loff_t i_size;
+	struct page *page;
+	char *buf;
+	int rc = 0;
 
 	cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
 
-	*pagep = __grab_cache_page(mapping, index);
-	if (!*pagep)
-		return -ENOMEM;
+	page = __grab_cache_page(mapping, index);
+	if (!page) {
+		rc = -ENOMEM;
+		goto out;
+	}
 
-	if (PageUptodate(*pagep))
-		return 0;
+	if (PageUptodate(page))
+		goto out;
 
 	/* If we are writing a full page it will be up to date,
 	   no need to read from the server */
 	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
-		return 0;
-
-	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
-		int rc;
+		goto out;
 
+	i_size = i_size_read(page->mapping->host);
+	if (page_start >= i_size || (offset == 0 && (pos + len) >= i_size)) {
+		/*
+		 * optimize away the read when the unwritten part of the page
+		 * isn't expected to have any existing file data. Just zero
+		 * out the unused parts.
+		 */
+		buf = kmap(page);
+		memset(buf, 0, offset);
+		memset(buf + offset + len, 0, PAGE_CACHE_SIZE - (offset + len));
+		kunmap(page);
+		SetPageUptodate(page);
+	} else if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
 		/* might as well read a page, it is fast enough */
-		rc = cifs_readpage_worker(file, *pagep, &offset);
+		cifs_readpage_worker(file, page, &page_start);
 
 		/* we do not need to pass errors back
 		   e.g. if we do not have read access to the file
@@ -2093,8 +2110,9 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 		   racing with this read? In any case
 		   this will be written out by write_end so is fine */
 	}
-
-	return 0;
+out:
+	*pagep = page;
+	return rc;
 }
 
 const struct address_space_operations cifs_addr_ops = {
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: Fwd: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 20:41                 ` Dave Kleikamp
  2008-11-21 21:02                   ` Steve French
@ 2008-11-21 23:44                   ` Nick Piggin
  1 sibling, 0 replies; 38+ messages in thread
From: Nick Piggin @ 2008-11-21 23:44 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Steve French, Jeff Layton, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, Nov 21, 2008 at 02:41:41PM -0600, Dave Kleikamp wrote:
> On Fri, 2008-11-21 at 14:38 -0600, Steve French wrote:
> > Fix attached.
> > 
> > Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?
> 
> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
> 
> > 
> > On Fri, Nov 21, 2008 at 2:18 PM, Steve French <smfrench@gmail.com> wrote:
> > > Looks like the following change to cifs_write_begin does fix it ...
> > > thanks Shaggy ...
> > >
> > > @@ -2062,8 +2074,10 @@ static int cifs_write_begin(struct file *file,
> > > struct address_space *mapping,
> > >  {
> > >        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
> > >        loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
> > > +       loff_t page_start = pos & PAGE_MASK;
> > >
> > >       cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
> > > @@ -2081,13 +2095,14 @@ static int cifs_write_begin(struct file *file,
> > > struct address_space *mapping,
> > >                int rc;
> > >
> > >                /* might as well read a page, it is fast enough */
> > > -               rc = cifs_readpage_worker(file, *pagep, &offset);
> > > +               rc = cifs_readpage_worker(file, *pagep, &page_start);
> > >
> > >                /* we do not need to pass errors back
> > >                   e.g. if we do not have read access to the file
> > >                   because cifs_write_end will attempt synchronous writes
> > >                   -- shaggy */
> > >
> > >

Thanks guys. Yeah that is how I thought cifs_readpage_worker should
work when making that change, but obiously didn't have an eye for
details. Thanks Shaggy!

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 22:50                 ` Jeff Layton
  2008-11-21 23:02                   ` Dave Kleikamp
@ 2008-11-21 23:53                   ` Nick Piggin
  2008-11-22  1:51                     ` Jeff Layton
  1 sibling, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2008-11-21 23:53 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, Dave Kleikamp, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, Nov 21, 2008 at 05:50:17PM -0500, Jeff Layton wrote:
> On Fri, 21 Nov 2008 14:38:18 -0600
> "Steve French" <smfrench@gmail.com> wrote:
> 
> > Fix attached.
> > 
> > Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?
> > 
> 
> Talking with Steve on IRC, we thought it might be better to optimize
> away the read when possible. I think this patch should do it. We skip
> the read if the write starts past the current end of the file, or if
> the offset into the page of the beginning of the write is 0 and we're
> writing past the current end of the file. In those situations we just
> zero out the rest of the page.
> 
> Combined patch inlined below. I also took the liberty of adding a page
> pointer to make the code look a little cleaner.
> 
> Thoughts?

You just have to be very careful when marking a page uptodate. This
is why I removed that earlier hunk.

1) the actual write may not cover as much space as we were told here.
2) the page no wlooks like this I think?

0                       offset+len          PAGE_CACHE_SIZE      
|---- uninitialized data ---|---- zeroes ---|

Then if you SetPageUptodate, if you are using the generic_mapping_read,
it can come and read the page even without locking it. If you are
not using the generic pagecache operations at all, then you could do
something like this if you are careful, but it still seems a bit
risky.

Or am I wrong about the data being uninitialized?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 23:25                     ` Jeff Layton
@ 2008-11-22  1:04                       ` Steve French
  2008-11-22  1:50                         ` Jeff Layton
  0 siblings, 1 reply; 38+ messages in thread
From: Steve French @ 2008-11-22  1:04 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Dave Kleikamp, Nick Piggin, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

Don't we also have to add the check for
if (mapping->host->clientCanCacheRead) before we decide to skip doing
the cifs_readpage_worker ... otherwise we aren't 100% sure of the file
size on the server (although it is probably right since we just did
revalidate)

so something like

 if (mapping->host->clientCanCacheRead && (page_start >= i_size ||
(offset == 0 && (pos + len) >= i_size)))

On Fri, Nov 21, 2008 at 5:25 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Fri, 21 Nov 2008 23:02:48 +0000
> Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:
>>
>> I don't think the (pos >= i_size) part of this test is correct.  We
>> would still need to preserve the data between the start of the page and
>> i_size.
>>
>> The second part of the test makes sense.
>>
>
> Thanks Dave, good catch.
>
> I think this is what we want. Skip the read if the entire page is
> beyond the end of file, or if the write will clobber all of the data
> that would be read in.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> Signed-off-by: Steve French <smfrench@us.ibm.com>
> ---
>  fs/cifs/file.c |   42 ++++++++++++++++++++++++++++++------------
>  1 files changed, 30 insertions(+), 12 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index b691b89..0c7ef2a 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2062,26 +2062,43 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
>  {
>        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
>        loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
> +       loff_t page_start = pos & PAGE_MASK;
> +       loff_t i_size;
> +       struct page *page;
> +       char *buf;
> +       int rc = 0;
>
>        cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
>
> -       *pagep = __grab_cache_page(mapping, index);
> -       if (!*pagep)
> -               return -ENOMEM;
> +       page = __grab_cache_page(mapping, index);
> +       if (!page) {
> +               rc = -ENOMEM;
> +               goto out;
> +       }
>
> -       if (PageUptodate(*pagep))
> -               return 0;
> +       if (PageUptodate(page))
> +               goto out;
>
>        /* If we are writing a full page it will be up to date,
>           no need to read from the server */
>        if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
> -               return 0;
> -
> -       if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
> -               int rc;
> +               goto out;
>
> +       i_size = i_size_read(page->mapping->host);
> +       if (page_start >= i_size || (offset == 0 && (pos + len) >= i_size)) {
> +               /*
> +                * optimize away the read when the unwritten part of the page
> +                * isn't expected to have any existing file data. Just zero
> +                * out the unused parts.
> +                */
> +               buf = kmap(page);
> +               memset(buf, 0, offset);
> +               memset(buf + offset + len, 0, PAGE_CACHE_SIZE - (offset + len));
> +               kunmap(page);
> +               SetPageUptodate(page);
> +       } else if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
>                /* might as well read a page, it is fast enough */
> -               rc = cifs_readpage_worker(file, *pagep, &offset);
> +               cifs_readpage_worker(file, page, &page_start);
>
>                /* we do not need to pass errors back
>                   e.g. if we do not have read access to the file
> @@ -2093,8 +2110,9 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
>                   racing with this read? In any case
>                   this will be written out by write_end so is fine */
>        }
> -
> -       return 0;
> +out:
> +       *pagep = page;
> +       return rc;
>  }
>
>  const struct address_space_operations cifs_addr_ops = {
> --
> 1.5.5.1
>
>



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-22  1:04                       ` Steve French
@ 2008-11-22  1:50                         ` Jeff Layton
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff Layton @ 2008-11-22  1:50 UTC (permalink / raw)
  To: Steve French
  Cc: Dave Kleikamp, Nick Piggin, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, 21 Nov 2008 19:04:04 -0600
"Steve French" <smfrench@gmail.com> wrote:

> Don't we also have to add the check for
> if (mapping->host->clientCanCacheRead) before we decide to skip doing
> the cifs_readpage_worker ... otherwise we aren't 100% sure of the file
> size on the server (although it is probably right since we just did
> revalidate)
> 
> so something like
> 
>  if (mapping->host->clientCanCacheRead && (page_start >= i_size ||
> (offset == 0 && (pos + len) >= i_size)))
> 

That certainly would be safer. I'll plan to add that in once I'm more
clear on the page flag handling here...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-21 23:53                   ` Nick Piggin
@ 2008-11-22  1:51                     ` Jeff Layton
  2008-11-22  2:02                       ` Steve French
  2008-11-26 13:02                       ` Nick Piggin
  0 siblings, 2 replies; 38+ messages in thread
From: Jeff Layton @ 2008-11-22  1:51 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Steve French, Dave Kleikamp, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Sat, 22 Nov 2008 00:53:46 +0100
Nick Piggin <npiggin@suse.de> wrote:

> On Fri, Nov 21, 2008 at 05:50:17PM -0500, Jeff Layton wrote:
> > On Fri, 21 Nov 2008 14:38:18 -0600
> > "Steve French" <smfrench@gmail.com> wrote:
> > 
> > > Fix attached.
> > > 
> > > Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?
> > > 
> > 
> > Talking with Steve on IRC, we thought it might be better to optimize
> > away the read when possible. I think this patch should do it. We skip
> > the read if the write starts past the current end of the file, or if
> > the offset into the page of the beginning of the write is 0 and we're
> > writing past the current end of the file. In those situations we just
> > zero out the rest of the page.
> > 
> > Combined patch inlined below. I also took the liberty of adding a page
> > pointer to make the code look a little cleaner.
> > 
> > Thoughts?
> 
> You just have to be very careful when marking a page uptodate. This
> is why I removed that earlier hunk.
> 
> 1) the actual write may not cover as much space as we were told here.

Under what circumstances would that occur? The places where write_begin
and write_end get called look like they use the same lengths...

> 2) the page no wlooks like this I think?
> 
> 0                       offset+len          PAGE_CACHE_SIZE      
> |---- uninitialized data ---|---- zeroes ---|
> 
> Then if you SetPageUptodate, if you are using the generic_mapping_read,
> it can come and read the page even without locking it. If you are
> not using the generic pagecache operations at all, then you could do
> something like this if you are careful, but it still seems a bit
> risky.
> 
> Or am I wrong about the data being uninitialized?

Bear with me here -- page flag handling makes me dizzy...

The latest patch that I sent only skips the read if:

1) the page lies beyond the current end of the file

2) if the page sits on top of the end of the file and the write would
overwrite all of the data that we would read in.

The first part should be OK I think. A pagecache read should not be
able to go beyond EOF, should it? The second one is a problem. Ideally
we'd like to be able to get away w/o doing a read in that case, but I
guess we'd have to delay setting the page uptodate until after the
write data has been copied to it. cifs_write_end seems to expect
that the page is uptodate when it gets it though.

For now, we can probably do the safe thing and perform a read in that
case, but it would certainly be nice to avoid it. Is there some way that
we can?

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-22  1:51                     ` Jeff Layton
@ 2008-11-22  2:02                       ` Steve French
  2008-11-22  4:47                         ` Dave Kleikamp
  2008-11-26 13:02                       ` Nick Piggin
  1 sibling, 1 reply; 38+ messages in thread
From: Steve French @ 2008-11-22  2:02 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Nick Piggin, Dave Kleikamp, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, Nov 21, 2008 at 7:51 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Sat, 22 Nov 2008 00:53:46 +0100
> Nick Piggin <npiggin@suse.de> wrote:
>
>> On Fri, Nov 21, 2008 at 05:50:17PM -0500, Jeff Layton wrote:
>> > On Fri, 21 Nov 2008 14:38:18 -0600
>> > "Steve French" <smfrench@gmail.com> wrote:
>> >
>> > > Fix attached.
>> > >
>> > > Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?
>> > >
>> >
>> > Talking with Steve on IRC, we thought it might be better to optimize
>> > away the read when possible. I think this patch should do it. We skip
>> > the read if the write starts past the current end of the file, or if
>> > the offset into the page of the beginning of the write is 0 and we're
>> > writing past the current end of the file. In those situations we just
>> > zero out the rest of the page.
>> >
>> > Combined patch inlined below. I also took the liberty of adding a page
>> > pointer to make the code look a little cleaner.
>> >
>> > Thoughts?
>>
>> You just have to be very careful when marking a page uptodate. This
>> is why I removed that earlier hunk.
>>
>> 1) the actual write may not cover as much space as we were told here.
>
> Under what circumstances would that occur? The places where write_begin
> and write_end get called look like they use the same lengths...
>
>> 2) the page no wlooks like this I think?
>>
>> 0                       offset+len          PAGE_CACHE_SIZE
>> |---- uninitialized data ---|---- zeroes ---|
>>
>> Then if you SetPageUptodate, if you are using the generic_mapping_read,
>> it can come and read the page even without locking it. If you are
>> not using the generic pagecache operations at all, then you could do
>> something like this if you are careful, but it still seems a bit
>> risky.
>>
>> Or am I wrong about the data being uninitialized?
>
> Bear with me here -- page flag handling makes me dizzy...
>
> The latest patch that I sent only skips the read if:
>
> 1) the page lies beyond the current end of the file
>
> 2) if the page sits on top of the end of the file and the write would
> overwrite all of the data that we would read in.
>
> The first part should be OK I think. A pagecache read should not be
> able to go beyond EOF, should it? The second one is a problem. Ideally
> we'd like to be able to get away w/o doing a read in that case, but I
> guess we'd have to delay setting the page uptodate until after the
> write data has been copied to it. cifs_write_end seems to expect
> that the page is uptodate when it gets it though.
If cifs_write_end gets a page that is not uptodate (e.g. if we don't
have readaccess to the file) then it could be really slowww ...
because write end will have to send each write to the server
individually (even if the app is only 1 byte at a time) ... I don't
mind threads racing with each other doing reads/writes as long as they
get valid data ... but wish we could mark them up to date when we
finish copying the write data into them so it wouldn't be so slow

But ... we also need to be checking to make sure we can cache reads
(at least) since otherwise we could have an i_size that is up to 1
second old


Thanks,

Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-22  2:02                       ` Steve French
@ 2008-11-22  4:47                         ` Dave Kleikamp
  2008-11-22 15:39                           ` [linux-cifs-client] " Jeff Layton
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Kleikamp @ 2008-11-22  4:47 UTC (permalink / raw)
  To: Steve French
  Cc: Jeff Layton, Nick Piggin, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

On Fri, 2008-11-21 at 20:02 -0600, Steve French wrote:
> On Fri, Nov 21, 2008 at 7:51 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > On Sat, 22 Nov 2008 00:53:46 +0100
> > Nick Piggin <npiggin@suse.de> wrote:
> >
> >> On Fri, Nov 21, 2008 at 05:50:17PM -0500, Jeff Layton wrote:
i
> >> > Thoughts?
> >>
> >> You just have to be very careful when marking a page uptodate. This
> >> is why I removed that earlier hunk.
> >>
> >> 1) the actual write may not cover as much space as we were told here.
> >
> > Under what circumstances would that occur? The places where write_begin
> > and write_end get called look like they use the same lengths...
> >
> >> 2) the page no wlooks like this I think?
> >>
> >> 0                       offset+len          PAGE_CACHE_SIZE
> >> |---- uninitialized data ---|---- zeroes ---|
> >>
> >> Then if you SetPageUptodate, if you are using the generic_mapping_read,
> >> it can come and read the page even without locking it. If you are
> >> not using the generic pagecache operations at all, then you could do
> >> something like this if you are careful, but it still seems a bit
> >> risky.
> >>
> >> Or am I wrong about the data being uninitialized?
> >
> > Bear with me here -- page flag handling makes me dizzy...
> >
> > The latest patch that I sent only skips the read if:
> >
> > 1) the page lies beyond the current end of the file
> >
> > 2) if the page sits on top of the end of the file and the write would
> > overwrite all of the data that we would read in.
> >
> > The first part should be OK I think. A pagecache read should not be
> > able to go beyond EOF, should it? The second one is a problem. Ideally
> > we'd like to be able to get away w/o doing a read in that case, but I
> > guess we'd have to delay setting the page uptodate until after the
> > write data has been copied to it. cifs_write_end seems to expect
> > that the page is uptodate when it gets it though.
> If cifs_write_end gets a page that is not uptodate (e.g. if we don't
> have readaccess to the file) then it could be really slowww ...
> because write end will have to send each write to the server
> individually (even if the app is only 1 byte at a time) ... I don't
> mind threads racing with each other doing reads/writes as long as they
> get valid data ... but wish we could mark them up to date when we
> finish copying the write data into them so it wouldn't be so slow

Nick's point is that it isn't really safe to mark the page Uptodate
until cifs_write_end(), since the data being overwritten hasn't yet
been.

How about cifs_write_begin() marking the page PageChecked instead of
PageUptodate, of course leaving it PageUptodate if it already is?  Then
cifs_write_end() only has to handle the slow path when the page is
neither PageUptodate or PageChecked.

PageChecked == PG_owner_priv_1, which is reserved for whatever the
filesystem wants to use it for.

> But ... we also need to be checking to make sure we can cache reads
> (at least) since otherwise we could have an i_size that is up to 1
> second old
> 
> 
> Thanks,
> 
> Steve
-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-22  4:47                         ` Dave Kleikamp
@ 2008-11-22 15:39                           ` Jeff Layton
  2008-11-22 20:27                             ` Dave Kleikamp
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-11-22 15:39 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Steve French, Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Fri, 21 Nov 2008 22:47:09 -0600
Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:

> Nick's point is that it isn't really safe to mark the page Uptodate
> until cifs_write_end(), since the data being overwritten hasn't yet
> been.
> 
> How about cifs_write_begin() marking the page PageChecked instead of
> PageUptodate, of course leaving it PageUptodate if it already is?  Then
> cifs_write_end() only has to handle the slow path when the page is
> neither PageUptodate or PageChecked.
> 
> PageChecked == PG_owner_priv_1, which is reserved for whatever the
> filesystem wants to use it for.
> 

Right -- Nick's point about the uptodate flag was definitely correct,
though I'm not sure I understand when we'd get less data eventually
copied to the page than we expected in write_begin...

I had the same thought last night about using a different flag. I was
considering PG_private, but PG_checked sounds like a better choice.
Thoughts on this patch? I've run some basic regression tests against
it and it seems to work.

---------------------[snip]-------------------
>From 2ac44918f498f722ccfd3e5293ad0b966a9261ca Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@redhat.com>
Date: Sat, 22 Nov 2008 10:12:49 -0500
Subject: [PATCH] cifs: fix regression in cifs_write_begin/cifs_write_end

The conversion to write_begin/write_end interfaces had a bug where we
were passing a bad parameter to cifs_readpage_worker. Rather than
passing the page offset of the start of the write, we needed to pass the
offset of the beginning of the page. This was reliably showing up as
data corruption in the fsx-linux test from LTP.

It also became evident that this code was occasionally doing unnecessary
read calls. Optimize those away by co-opting the PG_checked flag and
using that in place of PG_uptodate to indicate whether cifs_write_end
should do a sync write.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@us.ibm.com>
---
 fs/cifs/file.c |   83 +++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 61 insertions(+), 22 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index b691b89..f890e08 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1475,10 +1475,11 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
 	cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
 		 page, pos, copied));
 
-	if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
-		SetPageUptodate(page);
-
-	if (!PageUptodate(page)) {
+	/*
+	 * if PG_checked is set, then the parts of the page that weren't
+	 * copied over are expected to be up to date.
+	 */
+	if (!PageChecked(page)) {
 		char *page_data;
 		unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
 		int xid;
@@ -1496,6 +1497,8 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
 
 		FreeXid(xid);
 	} else {
+		SetPageUptodate(page);
+		ClearPageChecked(page);
 		rc = copied;
 		pos += copied;
 		set_page_dirty(page);
@@ -2056,45 +2059,81 @@ bool is_size_safe_to_change(struct cifsInodeInfo *cifsInode, __u64 end_of_file)
 		return true;
 }
 
+/*
+ * Prepare a page for writing. cifs_write_end expects that the page will have
+ * PG_checked set if the unwritten parts of the page are up to date.
+ */
 static int cifs_write_begin(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata)
 {
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
 	loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
+	loff_t page_start = pos & PAGE_MASK;
+	loff_t i_size;
+	struct inode *inode;
+	struct page *page;
+	char *page_data;
+	int rc = 0;
 
 	cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
 
-	*pagep = __grab_cache_page(mapping, index);
-	if (!*pagep)
-		return -ENOMEM;
+	page = __grab_cache_page(mapping, index);
+	if (!page) {
+		rc = -ENOMEM;
+		goto out;
+	}
 
-	if (PageUptodate(*pagep))
-		return 0;
+	if (PageUptodate(page)) {
+		SetPageChecked(page);
+		goto out;
+	}
 
 	/* If we are writing a full page it will be up to date,
 	   no need to read from the server */
-	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
-		return 0;
-
-	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
-		int rc;
+	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE) {
+		SetPageChecked(page);
+		goto out;
+	}
 
-		/* might as well read a page, it is fast enough */
-		rc = cifs_readpage_worker(file, *pagep, &offset);
+	/*
+	 * optimize away the read when we have an oplock, and we're not
+	 * expecting to use any of the data we'd be reading in. That is, when
+	 * the page lies beyond the EOF, or straddles the EOF and the write
+	 * would cover all of the existing data in the file.
+	 */
+	inode = page->mapping->host;
+	if (CIFS_I(inode)->clientCanCacheRead) {
+		i_size = i_size_read(inode);
+		if (page_start >= i_size ||
+		    (offset == 0 && (pos + len) >= i_size)) {
+			page_data = kmap(page);
+			memset(page_data, 0, offset);
+			memset(page_data + offset + len, 0,
+			       PAGE_CACHE_SIZE - (offset + len));
+			kunmap(page);
+			SetPageChecked(page);
+			goto out;
+		}
+	}
 
-		/* we do not need to pass errors back
-		   e.g. if we do not have read access to the file
-		   because cifs_write_end will attempt synchronous writes
-		   -- shaggy */
+	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
+		/*
+		 * might as well read a page, it is fast enough. If we get
+		 * an error, we don't need to return it. cifs_write_end can
+		 * do a sync write instead.
+		 */
+		if (!cifs_readpage_worker(file, page, &page_start))
+			SetPageChecked(page);
 	} else {
 		/* we could try using another file handle if there is one -
 		   but how would we lock it to prevent close of that handle
 		   racing with this read? In any case
 		   this will be written out by write_end so is fine */
 	}
-
-	return 0;
+out:
+	*pagep = page;
+	return rc;
 }
 
 const struct address_space_operations cifs_addr_ops = {
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-22 15:39                           ` [linux-cifs-client] " Jeff Layton
@ 2008-11-22 20:27                             ` Dave Kleikamp
  2008-11-23 11:57                               ` Jeff Layton
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Kleikamp @ 2008-11-22 20:27 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Sat, 2008-11-22 at 10:39 -0500, Jeff Layton wrote:
> On Fri, 21 Nov 2008 22:47:09 -0600
> Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:
> 
> > Nick's point is that it isn't really safe to mark the page Uptodate
> > until cifs_write_end(), since the data being overwritten hasn't yet
> > been.
> > 
> > How about cifs_write_begin() marking the page PageChecked instead of
> > PageUptodate, of course leaving it PageUptodate if it already is?  Then
> > cifs_write_end() only has to handle the slow path when the page is
> > neither PageUptodate or PageChecked.
> > 
> > PageChecked == PG_owner_priv_1, which is reserved for whatever the
> > filesystem wants to use it for.
> > 
> 
> Right -- Nick's point about the uptodate flag was definitely correct,
> though I'm not sure I understand when we'd get less data eventually
> copied to the page than we expected in write_begin...
> 
> I had the same thought last night about using a different flag. I was
> considering PG_private, but PG_checked sounds like a better choice.
> Thoughts on this patch? I've run some basic regression tests against
> it and it seems to work.

Just took a real quick look, but I have one comment.  I'd leave the page
alone if it's already PageUptodate.  I wouldn't bother setting
PageChecked and let cifs_write_end leave it alone when its already
uptodate.
-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-22 20:27                             ` Dave Kleikamp
@ 2008-11-23 11:57                               ` Jeff Layton
  2008-11-24  2:32                                 ` Steve French
  2008-11-24 20:00                                 ` Dave Kleikamp
  0 siblings, 2 replies; 38+ messages in thread
From: Jeff Layton @ 2008-11-23 11:57 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Steve French, Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Sat, 22 Nov 2008 14:27:44 -0600
Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:

> Just took a real quick look, but I have one comment.  I'd leave the page
> alone if it's already PageUptodate.  I wouldn't bother setting
> PageChecked and let cifs_write_end leave it alone when its already
> uptodate.

Ok, it makes sense not to flip page bits unless we need to. How's this
instead? It also switches the code to use zero_user_segments() instead
of calling kmap/memset directly.

I've dropped Steve's SoB for now since this has morphed quite a bit from
his original patch. Steve, feel free to add it back if you approve.

--------------------[snip]--------------------

>From da222d5125ec4095f10e9a4c984b6bd6b0cacf18 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@redhat.com>
Date: Sun, 23 Nov 2008 06:54:42 -0500
Subject: [PATCH] cifs: fix regression in cifs_write_begin/cifs_write_end

The conversion to write_begin/write_end interfaces had a bug where we
were passing a bad parameter to cifs_readpage_worker. Rather than
passing the page offset of the start of the write, we needed to pass the
offset of the beginning of the page. This was reliably showing up as
data corruption in the fsx-linux test from LTP.

It also became evident that this code was occasionally doing unnecessary
read calls. Optimize those away by using the PG_checked flag to indicate
that the unwritten part of the page has been initialized.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/file.c |   68 +++++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index b691b89..9114ea5 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1475,7 +1475,10 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
 	cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
 		 page, pos, copied));
 
-	if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
+	if (PageChecked(page)) {
+		SetPageUptodate(page);
+		ClearPageChecked(page);
+	} else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
 		SetPageUptodate(page);
 
 	if (!PageUptodate(page)) {
@@ -2062,39 +2065,68 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 {
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
 	loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
+	loff_t page_start = pos & PAGE_MASK;
+	loff_t i_size;
+	struct inode *inode;
+	struct page *page;
+	int rc = 0;
 
 	cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
 
-	*pagep = __grab_cache_page(mapping, index);
-	if (!*pagep)
-		return -ENOMEM;
+	page = __grab_cache_page(mapping, index);
+	if (!page) {
+		rc = -ENOMEM;
+		goto out;
+	}
 
-	if (PageUptodate(*pagep))
-		return 0;
+	if (PageUptodate(page))
+		goto out;
 
 	/* If we are writing a full page it will be up to date,
 	   no need to read from the server */
 	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
-		return 0;
-
-	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
-		int rc;
+		goto out;
 
-		/* might as well read a page, it is fast enough */
-		rc = cifs_readpage_worker(file, *pagep, &offset);
+	/*
+	 * optimize away the read when we have an oplock, and we're not
+	 * expecting to use any of the data we'd be reading in. That is, when
+	 * the page lies beyond the EOF, or straddles the EOF and the write
+	 * would cover all of the existing data in the file.
+	 */
+	inode = page->mapping->host;
+	if (CIFS_I(inode)->clientCanCacheRead) {
+		i_size = i_size_read(inode);
+		if (page_start >= i_size ||
+		    (offset == 0 && (pos + len) >= i_size)) {
+			zero_user_segments(page, 0, offset, offset + len,
+					   PAGE_CACHE_SIZE);
+			/*
+			 * PageChecked means that the parts of the page to
+			 * which we're not writing are considered up to date.
+			 * Once the write data is copied to the page, it can
+			 * be set uptodate.
+			 */
+			SetPageChecked(page);
+			goto out;
+		}
+	}
 
-		/* we do not need to pass errors back
-		   e.g. if we do not have read access to the file
-		   because cifs_write_end will attempt synchronous writes
-		   -- shaggy */
+	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
+		/*
+		 * might as well read a page, it is fast enough. If we get
+		 * an error, we don't need to return it. cifs_write_end will
+		 * do a sync write instead since PG_uptodate isn't set.
+		 */
+		cifs_readpage_worker(file, page, &page_start);
 	} else {
 		/* we could try using another file handle if there is one -
 		   but how would we lock it to prevent close of that handle
 		   racing with this read? In any case
 		   this will be written out by write_end so is fine */
 	}
-
-	return 0;
+out:
+	*pagep = page;
+	return rc;
 }
 
 const struct address_space_operations cifs_addr_ops = {
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-23 11:57                               ` Jeff Layton
@ 2008-11-24  2:32                                 ` Steve French
  2008-11-24 11:19                                   ` [linux-cifs-client] " Jeff Layton
  2008-11-24 20:00                                 ` Dave Kleikamp
  1 sibling, 1 reply; 38+ messages in thread
From: Steve French @ 2008-11-24  2:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

Looks fine to me, you can add in the acked-by ... if Dave or Nick
agree and it tests out ok then I will push to cifs-2.6.git within next
day or so.

On Sun, Nov 23, 2008 at 5:57 AM, Jeff Layton <jlayton@redhat.com> wrote:
> On Sat, 22 Nov 2008 14:27:44 -0600
> Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:
>
>> Just took a real quick look, but I have one comment.  I'd leave the page
>> alone if it's already PageUptodate.  I wouldn't bother setting
>> PageChecked and let cifs_write_end leave it alone when its already
>> uptodate.
>
> Ok, it makes sense not to flip page bits unless we need to. How's this
> instead? It also switches the code to use zero_user_segments() instead
> of calling kmap/memset directly.
>
> I've dropped Steve's SoB for now since this has morphed quite a bit from
> his original patch. Steve, feel free to add it back if you approve.
>
> --------------------[snip]--------------------
>
> From da222d5125ec4095f10e9a4c984b6bd6b0cacf18 Mon Sep 17 00:00:00 2001
> From: Jeff Layton <jlayton@redhat.com>
> Date: Sun, 23 Nov 2008 06:54:42 -0500
> Subject: [PATCH] cifs: fix regression in cifs_write_begin/cifs_write_end
>
> The conversion to write_begin/write_end interfaces had a bug where we
> were passing a bad parameter to cifs_readpage_worker. Rather than
> passing the page offset of the start of the write, we needed to pass the
> offset of the beginning of the page. This was reliably showing up as
> data corruption in the fsx-linux test from LTP.
>
> It also became evident that this code was occasionally doing unnecessary
> read calls. Optimize those away by using the PG_checked flag to indicate
> that the unwritten part of the page has been initialized.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/cifs/file.c |   68 +++++++++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 50 insertions(+), 18 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index b691b89..9114ea5 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1475,7 +1475,10 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
>        cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
>                 page, pos, copied));
>
> -       if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
> +       if (PageChecked(page)) {
> +               SetPageUptodate(page);
> +               ClearPageChecked(page);
> +       } else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
>                SetPageUptodate(page);
>
>        if (!PageUptodate(page)) {
> @@ -2062,39 +2065,68 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
>  {
>        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
>        loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
> +       loff_t page_start = pos & PAGE_MASK;
> +       loff_t i_size;
> +       struct inode *inode;
> +       struct page *page;
> +       int rc = 0;
>
>        cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
>
> -       *pagep = __grab_cache_page(mapping, index);
> -       if (!*pagep)
> -               return -ENOMEM;
> +       page = __grab_cache_page(mapping, index);
> +       if (!page) {
> +               rc = -ENOMEM;
> +               goto out;
> +       }
>
> -       if (PageUptodate(*pagep))
> -               return 0;
> +       if (PageUptodate(page))
> +               goto out;
>
>        /* If we are writing a full page it will be up to date,
>           no need to read from the server */
>        if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
> -               return 0;
> -
> -       if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
> -               int rc;
> +               goto out;
>
> -               /* might as well read a page, it is fast enough */
> -               rc = cifs_readpage_worker(file, *pagep, &offset);
> +       /*
> +        * optimize away the read when we have an oplock, and we're not
> +        * expecting to use any of the data we'd be reading in. That is, when
> +        * the page lies beyond the EOF, or straddles the EOF and the write
> +        * would cover all of the existing data in the file.
> +        */
> +       inode = page->mapping->host;
> +       if (CIFS_I(inode)->clientCanCacheRead) {
> +               i_size = i_size_read(inode);
> +               if (page_start >= i_size ||
> +                   (offset == 0 && (pos + len) >= i_size)) {
> +                       zero_user_segments(page, 0, offset, offset + len,
> +                                          PAGE_CACHE_SIZE);
> +                       /*
> +                        * PageChecked means that the parts of the page to
> +                        * which we're not writing are considered up to date.
> +                        * Once the write data is copied to the page, it can
> +                        * be set uptodate.
> +                        */
> +                       SetPageChecked(page);
> +                       goto out;
> +               }
> +       }
>
> -               /* we do not need to pass errors back
> -                  e.g. if we do not have read access to the file
> -                  because cifs_write_end will attempt synchronous writes
> -                  -- shaggy */
> +       if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
> +               /*
> +                * might as well read a page, it is fast enough. If we get
> +                * an error, we don't need to return it. cifs_write_end will
> +                * do a sync write instead since PG_uptodate isn't set.
> +                */
> +               cifs_readpage_worker(file, page, &page_start);
>        } else {
>                /* we could try using another file handle if there is one -
>                   but how would we lock it to prevent close of that handle
>                   racing with this read? In any case
>                   this will be written out by write_end so is fine */
>        }
> -
> -       return 0;
> +out:
> +       *pagep = page;
> +       return rc;
>  }
>
>  const struct address_space_operations cifs_addr_ops = {
> --
> 1.5.5.1
>
>



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-24  2:32                                 ` Steve French
@ 2008-11-24 11:19                                   ` Jeff Layton
  2008-11-26  4:04                                     ` Steve French
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-11-24 11:19 UTC (permalink / raw)
  To: Steve French
  Cc: Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Sun, 23 Nov 2008 20:32:05 -0600
"Steve French" <smfrench@gmail.com> wrote:

> Looks fine to me, you can add in the acked-by ... if Dave or Nick
> agree and it tests out ok then I will push to cifs-2.6.git within next
> day or so.
> 

Testing seems fine with this patch. Connectathon suite still passes,
and fsx and fsstress both ran overnight without errors.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-23 11:57                               ` Jeff Layton
  2008-11-24  2:32                                 ` Steve French
@ 2008-11-24 20:00                                 ` Dave Kleikamp
  1 sibling, 0 replies; 38+ messages in thread
From: Dave Kleikamp @ 2008-11-24 20:00 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Sun, 2008-11-23 at 06:57 -0500, Jeff Layton wrote:
> On Sat, 22 Nov 2008 14:27:44 -0600
> Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:
> 
> > Just took a real quick look, but I have one comment.  I'd leave the page
> > alone if it's already PageUptodate.  I wouldn't bother setting
> > PageChecked and let cifs_write_end leave it alone when its already
> > uptodate.
> 
> Ok, it makes sense not to flip page bits unless we need to. How's this
> instead? It also switches the code to use zero_user_segments() instead
> of calling kmap/memset directly.

Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>

> 
> I've dropped Steve's SoB for now since this has morphed quite a bit from
> his original patch. Steve, feel free to add it back if you approve.

-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-24 11:19                                   ` [linux-cifs-client] " Jeff Layton
@ 2008-11-26  4:04                                     ` Steve French
  2008-11-26 11:54                                       ` Jeff Layton
  2008-11-26 12:11                                       ` Jeff Layton
  0 siblings, 2 replies; 38+ messages in thread
From: Steve French @ 2008-11-26  4:04 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

Do you know why someone added the AOP_FLAG_UNITERRUPTIBLE check in
cifs_write_begin instead of always marking a page up to date if we are
writing the whole page?  How often would that flag be set - I only see
it in the path which calls generic_file_buffered_write

 	/* If we are writing a full page it will be up to date,
 	   no need to read from the server */
 	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)

It seems restrictive
On Mon, Nov 24, 2008 at 5:19 AM, Jeff Layton <jlayton@redhat.com> wrote:
> On Sun, 23 Nov 2008 20:32:05 -0600
> "Steve French" <smfrench@gmail.com> wrote:
>
>> Looks fine to me, you can add in the acked-by ... if Dave or Nick
>> agree and it tests out ok then I will push to cifs-2.6.git within next
>> day or so.
>>
>
> Testing seems fine with this patch. Connectathon suite still passes,
> and fsx and fsstress both ran overnight without errors.
>
> --
> Jeff Layton <jlayton@redhat.com>
>



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-26  4:04                                     ` Steve French
@ 2008-11-26 11:54                                       ` Jeff Layton
  2008-11-26 12:11                                       ` Jeff Layton
  1 sibling, 0 replies; 38+ messages in thread
From: Jeff Layton @ 2008-11-26 11:54 UTC (permalink / raw)
  To: Steve French
  Cc: Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Tue, 25 Nov 2008 22:04:04 -0600
"Steve French" <smfrench@gmail.com> wrote:

> Do you know why someone added the AOP_FLAG_UNITERRUPTIBLE check in
> cifs_write_begin instead of always marking a page up to date if we are
> writing the whole page?  How often would that flag be set - I only see
> it in the path which calls generic_file_buffered_write
> 
>  	/* If we are writing a full page it will be up to date,
>  	   no need to read from the server */
>  	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
> 
> It seems restrictive

Good Q. This article is informative...

http://lwn.net/Articles/254856/

...so I take this to mean that when this flag isn't set you might be
able to interrupt the copy of the data to the page, and leave part of
it in an uninitialized state.

Hmm...given that, we should probably not optimize the read away at all
if that flag isn't set. Let me see about respinning that patch...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-26  4:04                                     ` Steve French
  2008-11-26 11:54                                       ` Jeff Layton
@ 2008-11-26 12:11                                       ` Jeff Layton
  2008-11-26 13:09                                         ` [linux-cifs-client] " Nick Piggin
  1 sibling, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-11-26 12:11 UTC (permalink / raw)
  To: Steve French
  Cc: Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Tue, 25 Nov 2008 22:04:04 -0600
"Steve French" <smfrench@gmail.com> wrote:

> Do you know why someone added the AOP_FLAG_UNITERRUPTIBLE check in
> cifs_write_begin instead of always marking a page up to date if we are
> writing the whole page?  How often would that flag be set - I only see
> it in the path which calls generic_file_buffered_write
> 
>  	/* If we are writing a full page it will be up to date,
>  	   no need to read from the server */
>  	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
> 
> It seems restrictive

Given that I now understand what AOP_FLAG_UNINTERRUPTIBLE is supposed
to do, this patch is probably what we need. Running tests on it now.

-------------[snip]------------

From 366afe94ecfb314f3126bc9a668b6be61469089f Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@redhat.com>
Date: Wed, 26 Nov 2008 07:01:33 -0500
Subject: [PATCH] cifs: fix regression in cifs_write_begin/cifs_write_end

The conversion to write_begin/write_end interfaces had a bug where we
were passing a bad parameter to cifs_readpage_worker. Rather than
passing the page offset of the start of the write, we needed to pass the
offset of the beginning of the page. This was reliably showing up as
data corruption in the fsx-linux test from LTP.

It also became evident that this code was occasionally doing unnecessary
read calls. Optimize those away by using the PG_checked flag to indicate
that the unwritten part of the page has been initialized.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/file.c |   80 +++++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index b691b89..37567c9 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1475,7 +1475,10 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
 	cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
 		 page, pos, copied));
 
-	if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
+	if (PageChecked(page)) {
+		SetPageUptodate(page);
+		ClearPageChecked(page);
+	} else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
 		SetPageUptodate(page);
 
 	if (!PageUptodate(page)) {
@@ -2062,39 +2065,74 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 {
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
 	loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
+	loff_t page_start = pos & PAGE_MASK;
+	loff_t i_size;
+	struct inode *inode;
+	struct page *page;
+	int rc = 0;
 
 	cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
 
-	*pagep = __grab_cache_page(mapping, index);
-	if (!*pagep)
-		return -ENOMEM;
-
-	if (PageUptodate(*pagep))
-		return 0;
+	page = __grab_cache_page(mapping, index);
+	if (!page) {
+		rc = -ENOMEM;
+		goto out;
+	}
 
-	/* If we are writing a full page it will be up to date,
-	   no need to read from the server */
-	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
-		return 0;
+	if (PageUptodate(page))
+		goto out;
 
-	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
-		int rc;
+	/* only optimize away the read if the copy's not interruptible */
+	if (flags & AOP_FLAG_UNINTERRUPTIBLE) {
+		/*
+		 * If we are writing a full page it will be up to date,
+		 * no need to read from the server
+		 */
+		if (len == PAGE_CACHE_SIZE)
+			goto out;
 
-		/* might as well read a page, it is fast enough */
-		rc = cifs_readpage_worker(file, *pagep, &offset);
+		/*
+		 * optimize away the read when we have an oplock, and we're not
+		 * expecting to use any of the data we'd be reading in. That
+		 * is, when the page lies beyond the EOF, or straddles the EOF
+		 * and the write will cover all of the existing data.
+		 */
+		inode = page->mapping->host;
+		if (CIFS_I(inode)->clientCanCacheRead) {
+			i_size = i_size_read(inode);
+			if (page_start >= i_size ||
+			    (offset == 0 && (pos + len) >= i_size)) {
+				zero_user_segments(page, 0, offset,
+						   offset + len,
+						   PAGE_CACHE_SIZE);
+				/*
+				 * PageChecked means that the parts of the page
+				 * to which we're not writing are considered up
+				 * to date. Once the data is copied to the
+				 * page, it can be set uptodate.
+				 */
+				SetPageChecked(page);
+				goto out;
+			}
+		}
+	}
 
-		/* we do not need to pass errors back
-		   e.g. if we do not have read access to the file
-		   because cifs_write_end will attempt synchronous writes
-		   -- shaggy */
+	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
+		/*
+		 * might as well read a page, it is fast enough. If we get
+		 * an error, we don't need to return it. cifs_write_end will
+		 * do a sync write instead since PG_uptodate isn't set.
+		 */
+		cifs_readpage_worker(file, page, &page_start);
 	} else {
 		/* we could try using another file handle if there is one -
 		   but how would we lock it to prevent close of that handle
 		   racing with this read? In any case
 		   this will be written out by write_end so is fine */
 	}
-
-	return 0;
+out:
+	*pagep = page;
+	return rc;
 }
 
 const struct address_space_operations cifs_addr_ops = {
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-22  1:51                     ` Jeff Layton
  2008-11-22  2:02                       ` Steve French
@ 2008-11-26 13:02                       ` Nick Piggin
  1 sibling, 0 replies; 38+ messages in thread
From: Nick Piggin @ 2008-11-26 13:02 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, Dave Kleikamp, pbadari, linux-fsdevel,
	linux-cifs-client@lists.samba.org

Guys, sorry for the late reply. Had a lot of travelling to do...

I'll comment here because you raise most of your questions here, but at
the other end of the thread, it looks like you have a pretty nice patch
so I'll check if I can make any more comments on that.


On Fri, Nov 21, 2008 at 08:51:51PM -0500, Jeff Layton wrote:
> On Sat, 22 Nov 2008 00:53:46 +0100
> Nick Piggin <npiggin@suse.de> wrote:
> 
> > On Fri, Nov 21, 2008 at 05:50:17PM -0500, Jeff Layton wrote:
> > > On Fri, 21 Nov 2008 14:38:18 -0600
> > > "Steve French" <smfrench@gmail.com> wrote:
> > > 
> > > > Fix attached.
> > > > 
> > > > Shaggy/Jeff/Nick etc. do you want to review/ack it since it is late in the rc?
> > > > 
> > > 
> > > Talking with Steve on IRC, we thought it might be better to optimize
> > > away the read when possible. I think this patch should do it. We skip
> > > the read if the write starts past the current end of the file, or if
> > > the offset into the page of the beginning of the write is 0 and we're
> > > writing past the current end of the file. In those situations we just
> > > zero out the rest of the page.
> > > 
> > > Combined patch inlined below. I also took the liberty of adding a page
> > > pointer to make the code look a little cleaner.
> > > 
> > > Thoughts?
> > 
> > You just have to be very careful when marking a page uptodate. This
> > is why I removed that earlier hunk.
> > 
> > 1) the actual write may not cover as much space as we were told here.
> 
> Under what circumstances would that occur? The places where write_begin
> and write_end get called look like they use the same lengths...

mm/filemap.c:generic_perform_write().

They both get the same lengths in their 4th arguments "I want to copy
this much data into the filesystem's pagecache", but write_end also
has a "how much was I actually able to copy this time" as its 5th
argument.

As to why it can happen, because copy_from_user could take a page fault
on the app's source address (eg. to write(2)). In order to complete the
copy, we must handle the page fault, but that has to take mmap_sem,
which gives an ABBA deadlock with the page lock (which we're holding).
And handling the page fault also might have to take another page lock
(another ABBA), or if we're really tricky, it might have to take the
same page lock (lock recursion deadlock).

So it does a copy_from_user_atomic, which can return a short copy.


> > 2) the page no wlooks like this I think?
> > 
> > 0                       offset+len          PAGE_CACHE_SIZE      
> > |---- uninitialized data ---|---- zeroes ---|
> > 
> > Then if you SetPageUptodate, if you are using the generic_mapping_read,
> > it can come and read the page even without locking it. If you are
> > not using the generic pagecache operations at all, then you could do
> > something like this if you are careful, but it still seems a bit
> > risky.
> > 
> > Or am I wrong about the data being uninitialized?
> 
> Bear with me here -- page flag handling makes me dizzy...
> 
> The latest patch that I sent only skips the read if:
> 
> 1) the page lies beyond the current end of the file
> 
> 2) if the page sits on top of the end of the file and the write would
> overwrite all of the data that we would read in.
> 
> The first part should be OK I think. A pagecache read should not be
> able to go beyond EOF, should it?

You _might_ be OK there, but it's not a great idea to SetPageUptodate
first ;) Aside from the problem of the short-copy, SetPageUptodate
actually has a memory barrier in it to ensure the data stored into the
page to bring it uptodate is actually visible before the PageUptodate
flag is. Again, if you are doing DMAs rather than cache coherent stores
to initialise the page, maybe you can get away without that barrier...
But it's just bad practice.


> The second one is a problem. Ideally
> we'd like to be able to get away w/o doing a read in that case, but I
> guess we'd have to delay setting the page uptodate until after the
> write data has been copied to it. cifs_write_end seems to expect
> that the page is uptodate when it gets it though.
> 
> For now, we can probably do the safe thing and perform a read in that
> case, but it would certainly be nice to avoid it. Is there some way that
> we can?

I can't remember the CIFS code very well, but in several of the new
aops conversions I did, I added something like a BUG_ON(!PageUptodate())
in the write_end methods to ensure I wasn't missing some key part of
the logic. It's entirely possible that cifs is almost ready to handle
a !uptodate page in write_end...


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-26 12:11                                       ` Jeff Layton
@ 2008-11-26 13:09                                         ` Nick Piggin
  2008-11-26 15:08                                           ` Jeff Layton
  0 siblings, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2008-11-26 13:09 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Wed, Nov 26, 2008 at 07:11:46AM -0500, Jeff Layton wrote:
> On Tue, 25 Nov 2008 22:04:04 -0600
> "Steve French" <smfrench@gmail.com> wrote:
> 
> > Do you know why someone added the AOP_FLAG_UNITERRUPTIBLE check in
> > cifs_write_begin instead of always marking a page up to date if we are
> > writing the whole page?  How often would that flag be set - I only see
> > it in the path which calls generic_file_buffered_write
> > 
> >  	/* If we are writing a full page it will be up to date,
> >  	   no need to read from the server */
> >  	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
> > 
> > It seems restrictive
> 
> Given that I now understand what AOP_FLAG_UNINTERRUPTIBLE is supposed
> to do, this patch is probably what we need. Running tests on it now.

That seems pretty reasonable, although keep in mind that
AOP_FLAG_UNINTERRUPTIBLE is not going to be the common case (unless
you're running loop or nfsd or something on the filesystem).

It would be really nice to figure out a way to avoid the reads in
the interruptible case as well.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-26 13:09                                         ` [linux-cifs-client] " Nick Piggin
@ 2008-11-26 15:08                                           ` Jeff Layton
  2008-11-26 15:23                                             ` Nick Piggin
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-11-26 15:08 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Steve French, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Wed, 26 Nov 2008 14:09:43 +0100
Nick Piggin <npiggin@suse.de> wrote:


> As to why it can happen, because copy_from_user could take a page fault
> on the app's source address (eg. to write(2)).

Yep, I figured this out after stumbling across the LWN article.

> You _might_ be OK there, but it's not a great idea to SetPageUptodate
> first ;) Aside from the problem of the short-copy, SetPageUptodate
> actually has a memory barrier in it to ensure the data stored into the
> page to bring it uptodate is actually visible before the PageUptodate
> flag is. Again, if you are doing DMAs rather than cache coherent stores
> to initialise the page, maybe you can get away without that barrier...
> But it's just bad practice.
> 

Gotcha. I think the current patch takes care of this (we're using
PageChecked to indicate that the uninitialized parts of the page were
written to).

The problem I suppose is that we could end up getting a short write in
write_end. I guess this means that we need to modify the patch a bit
further and only set PageUptodate in write_end if copied == len.

> > Given that I now understand what AOP_FLAG_UNINTERRUPTIBLE is supposed
> > to do, this patch is probably what we need. Running tests on it now.
> 
> That seems pretty reasonable, although keep in mind that
> AOP_FLAG_UNINTERRUPTIBLE is not going to be the common case (unless
> you're running loop or nfsd or something on the filesystem).
> 
> It would be really nice to figure out a way to avoid the reads in
> the interruptible case as well.

True. For now though I think we need to start with slow and safe and see
if we can optimize it further later...

> I can't remember the CIFS code very well, but in several of the new
> aops conversions I did, I added something like a BUG_ON(!PageUptodate())
> in the write_end methods to ensure I wasn't missing some key part of
> the logic. It's entirely possible that cifs is almost ready to handle
> a !uptodate page in write_end...

Well, CIFS is "special". Rather than just updating the pagecache, we
can fall back to doing a sync write instead. So I don't think we want
to BUG if the page isn't up to date. It's not ideal, but I think it's a
situation we can deal with if necessary.

Steve, I think I may have (at least) one more respin coming your way...
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-26 15:08                                           ` Jeff Layton
@ 2008-11-26 15:23                                             ` Nick Piggin
  2008-11-26 16:37                                               ` Jeff Layton
  2008-11-26 19:46                                               ` Steve French
  0 siblings, 2 replies; 38+ messages in thread
From: Nick Piggin @ 2008-11-26 15:23 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Wed, Nov 26, 2008 at 10:08:57AM -0500, Jeff Layton wrote:
> On Wed, 26 Nov 2008 14:09:43 +0100
> Nick Piggin <npiggin@suse.de> wrote:
> 
> 
> > As to why it can happen, because copy_from_user could take a page fault
> > on the app's source address (eg. to write(2)).
> 
> Yep, I figured this out after stumbling across the LWN article.
> 
> > You _might_ be OK there, but it's not a great idea to SetPageUptodate
> > first ;) Aside from the problem of the short-copy, SetPageUptodate
> > actually has a memory barrier in it to ensure the data stored into the
> > page to bring it uptodate is actually visible before the PageUptodate
> > flag is. Again, if you are doing DMAs rather than cache coherent stores
> > to initialise the page, maybe you can get away without that barrier...
> > But it's just bad practice.
> > 
> 
> Gotcha. I think the current patch takes care of this (we're using
> PageChecked to indicate that the uninitialized parts of the page were
> written to).
> 
> The problem I suppose is that we could end up getting a short write in
> write_end. I guess this means that we need to modify the patch a bit
> further and only set PageUptodate in write_end if copied == len.
> 
> > > Given that I now understand what AOP_FLAG_UNINTERRUPTIBLE is supposed
> > > to do, this patch is probably what we need. Running tests on it now.
> > 
> > That seems pretty reasonable, although keep in mind that
> > AOP_FLAG_UNINTERRUPTIBLE is not going to be the common case (unless
> > you're running loop or nfsd or something on the filesystem).
> > 
> > It would be really nice to figure out a way to avoid the reads in
> > the interruptible case as well.
> 
> True. For now though I think we need to start with slow and safe and see
> if we can optimize it further later...

Yes definitely. Thanks for cleaning up my mess!

 
> > I can't remember the CIFS code very well, but in several of the new
> > aops conversions I did, I added something like a BUG_ON(!PageUptodate())
> > in the write_end methods to ensure I wasn't missing some key part of
> > the logic. It's entirely possible that cifs is almost ready to handle
> > a !uptodate page in write_end...
> 
> Well, CIFS is "special". Rather than just updating the pagecache, we
> can fall back to doing a sync write instead. So I don't think we want
> to BUG if the page isn't up to date. It's not ideal, but I think it's a
> situation we can deal with if necessary.

Yes, that would be better. That sync write fallback is quite clever I
think...


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-26 15:23                                             ` Nick Piggin
@ 2008-11-26 16:37                                               ` Jeff Layton
  2008-11-27  8:33                                                 ` Nick Piggin
  2008-11-26 19:46                                               ` Steve French
  1 sibling, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-11-26 16:37 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Steve French, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Wed, 26 Nov 2008 16:23:32 +0100
Nick Piggin <npiggin@suse.de> wrote:
> 
> > > That seems pretty reasonable, although keep in mind that
> > > AOP_FLAG_UNINTERRUPTIBLE is not going to be the common case (unless
> > > you're running loop or nfsd or something on the filesystem).
> > > 
> > > It would be really nice to figure out a way to avoid the reads in
> > > the interruptible case as well.
> > 
>  
> > > I can't remember the CIFS code very well, but in several of the new
> > > aops conversions I did, I added something like a BUG_ON(!PageUptodate())
> > > in the write_end methods to ensure I wasn't missing some key part of
> > > the logic. It's entirely possible that cifs is almost ready to handle
> > > a !uptodate page in write_end...
> > 
> > Well, CIFS is "special". Rather than just updating the pagecache, we
> > can fall back to doing a sync write instead. So I don't think we want
> > to BUG if the page isn't up to date. It's not ideal, but I think it's a
> > situation we can deal with if necessary.
> 
> Yes, that would be better. That sync write fallback is quite clever I
> think...
> 

Since we're able to do the sync write fallback here, I think we can
probably not bother with checking for AOP_FLAG_UNINTERRUPTIBLE at all.
As long as we only set the page uptodate in write_end in the case that
the write isn't short, then we'll be ok I think even when it's
interruptible.

This does make the assumption that the interrupted case is somewhat
rare...

Thoughts on this patch? 

------------------[snip]-------------------

>From a8e5ea4d9859f590b5c04954e8e2a13023f690f5 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@redhat.com>
Date: Wed, 26 Nov 2008 11:32:09 -0500
Subject: [PATCH] cifs: fix regression in cifs_write_begin/cifs_write_end

The conversion to write_begin/write_end interfaces had a bug where we
were passing a bad parameter to cifs_readpage_worker. Rather than
passing the page offset of the start of the write, we needed to pass the
offset of the beginning of the page. This was reliably showing up as
data corruption in the fsx-linux test from LTP.

It also became evident that this code was occasionally doing unnecessary
read calls. Optimize those away by using the PG_checked flag to indicate
that the unwritten part of the page has been initialized.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/file.c |   79 +++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index b691b89..49d4a97 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1475,7 +1475,11 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
 	cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
 		 page, pos, copied));
 
-	if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
+	if (PageChecked(page)) {
+		if (copied == len)
+			SetPageUptodate(page);
+		ClearPageChecked(page);
+	} else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
 		SetPageUptodate(page);
 
 	if (!PageUptodate(page)) {
@@ -2062,39 +2066,72 @@ static int cifs_write_begin(struct file *file, struct address_space *mapping,
 {
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
 	loff_t offset = pos & (PAGE_CACHE_SIZE - 1);
+	loff_t page_start = pos & PAGE_MASK;
+	loff_t i_size;
+	struct inode *inode;
+	struct page *page;
+	int rc = 0;
 
 	cFYI(1, ("write_begin from %lld len %d", (long long)pos, len));
 
-	*pagep = __grab_cache_page(mapping, index);
-	if (!*pagep)
-		return -ENOMEM;
-
-	if (PageUptodate(*pagep))
-		return 0;
+	page = __grab_cache_page(mapping, index);
+	if (!page) {
+		rc = -ENOMEM;
+		goto out;
+	}
 
-	/* If we are writing a full page it will be up to date,
-	   no need to read from the server */
-	if (len == PAGE_CACHE_SIZE && flags & AOP_FLAG_UNINTERRUPTIBLE)
-		return 0;
+	if (PageUptodate(page))
+		goto out;
 
-	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
-		int rc;
+	/*
+	 * If we write a full page it will be up to date, no need to read from
+	 * the server. If the write is short, we'll end up doing a sync write
+	 * instead.
+	 */
+	if (len == PAGE_CACHE_SIZE)
+		goto out;
 
-		/* might as well read a page, it is fast enough */
-		rc = cifs_readpage_worker(file, *pagep, &offset);
+	/*
+	 * optimize away the read when we have an oplock, and we're not
+	 * expecting to use any of the data we'd be reading in. That
+	 * is, when the page lies beyond the EOF, or straddles the EOF
+	 * and the write will cover all of the existing data.
+	 */
+	inode = page->mapping->host;
+	if (CIFS_I(inode)->clientCanCacheRead) {
+		i_size = i_size_read(inode);
+		if (page_start >= i_size ||
+		    (offset == 0 && (pos + len) >= i_size)) {
+			zero_user_segments(page, 0, offset,
+					   offset + len,
+					   PAGE_CACHE_SIZE);
+			/*
+			 * PageChecked means that the parts of the page
+			 * to which we're not writing are considered up
+			 * to date. Once the data is copied to the
+			 * page, it can be set uptodate.
+			 */
+			SetPageChecked(page);
+			goto out;
+		}
+	}
 
-		/* we do not need to pass errors back
-		   e.g. if we do not have read access to the file
-		   because cifs_write_end will attempt synchronous writes
-		   -- shaggy */
+	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
+		/*
+		 * might as well read a page, it is fast enough. If we get
+		 * an error, we don't need to return it. cifs_write_end will
+		 * do a sync write instead since PG_uptodate isn't set.
+		 */
+		cifs_readpage_worker(file, page, &page_start);
 	} else {
 		/* we could try using another file handle if there is one -
 		   but how would we lock it to prevent close of that handle
 		   racing with this read? In any case
 		   this will be written out by write_end so is fine */
 	}
-
-	return 0;
+out:
+	*pagep = page;
+	return rc;
 }
 
 const struct address_space_operations cifs_addr_ops = {
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-26 15:23                                             ` Nick Piggin
  2008-11-26 16:37                                               ` Jeff Layton
@ 2008-11-26 19:46                                               ` Steve French
  1 sibling, 0 replies; 38+ messages in thread
From: Steve French @ 2008-11-26 19:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jeff Layton, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

I put Jeff's latest patch in cifs-2.6.git - last chance to review
before I request the push upstream:
http://git.kernel.org/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commit;h=a98ee8c1c707fe3210b00ef9f806ba8e2bf35504

Running some quick tests with iozone, the performance improvements on
the write subtests were spectacular.   In some cases (presumably due
to caching and journalling effects) beating ext3

On Wed, Nov 26, 2008 at 9:23 AM, Nick Piggin <npiggin@suse.de> wrote:
> On Wed, Nov 26, 2008 at 10:08:57AM -0500, Jeff Layton wrote:
>> On Wed, 26 Nov 2008 14:09:43 +0100
>> Nick Piggin <npiggin@suse.de> wrote:
>>
>>
>> > As to why it can happen, because copy_from_user could take a page fault
>> > on the app's source address (eg. to write(2)).
>>
>> Yep, I figured this out after stumbling across the LWN article.
>>
>> > You _might_ be OK there, but it's not a great idea to SetPageUptodate
>> > first ;) Aside from the problem of the short-copy, SetPageUptodate
>> > actually has a memory barrier in it to ensure the data stored into the
>> > page to bring it uptodate is actually visible before the PageUptodate
>> > flag is. Again, if you are doing DMAs rather than cache coherent stores
>> > to initialise the page, maybe you can get away without that barrier...
>> > But it's just bad practice.
>> >
>>
>> Gotcha. I think the current patch takes care of this (we're using
>> PageChecked to indicate that the uninitialized parts of the page were
>> written to).
>>
>> The problem I suppose is that we could end up getting a short write in
>> write_end. I guess this means that we need to modify the patch a bit
>> further and only set PageUptodate in write_end if copied == len.
>>
>> > > Given that I now understand what AOP_FLAG_UNINTERRUPTIBLE is supposed
>> > > to do, this patch is probably what we need. Running tests on it now.
>> >
>> > That seems pretty reasonable, although keep in mind that
>> > AOP_FLAG_UNINTERRUPTIBLE is not going to be the common case (unless
>> > you're running loop or nfsd or something on the filesystem).
>> >
>> > It would be really nice to figure out a way to avoid the reads in
>> > the interruptible case as well.
>>
>> True. For now though I think we need to start with slow and safe and see
>> if we can optimize it further later...
>
> Yes definitely. Thanks for cleaning up my mess!
>
>
>> > I can't remember the CIFS code very well, but in several of the new
>> > aops conversions I did, I added something like a BUG_ON(!PageUptodate())
>> > in the write_end methods to ensure I wasn't missing some key part of
>> > the logic. It's entirely possible that cifs is almost ready to handle
>> > a !uptodate page in write_end...
>>
>> Well, CIFS is "special". Rather than just updating the pagecache, we
>> can fall back to doing a sync write instead. So I don't think we want
>> to BUG if the page isn't up to date. It's not ideal, but I think it's a
>> situation we can deal with if necessary.
>
> Yes, that would be better. That sync write fallback is quite clever I
> think...
>
>



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-26 16:37                                               ` Jeff Layton
@ 2008-11-27  8:33                                                 ` Nick Piggin
  2008-11-28 12:18                                                   ` Jeff Layton
  0 siblings, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2008-11-27  8:33 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Wed, Nov 26, 2008 at 11:37:58AM -0500, Jeff Layton wrote:
> On Wed, 26 Nov 2008 16:23:32 +0100
> Nick Piggin <npiggin@suse.de> wrote:
> > 
> > > > That seems pretty reasonable, although keep in mind that
> > > > AOP_FLAG_UNINTERRUPTIBLE is not going to be the common case (unless
> > > > you're running loop or nfsd or something on the filesystem).
> > > > 
> > > > It would be really nice to figure out a way to avoid the reads in
> > > > the interruptible case as well.
> > > 
> >  
> > > > I can't remember the CIFS code very well, but in several of the new
> > > > aops conversions I did, I added something like a BUG_ON(!PageUptodate())
> > > > in the write_end methods to ensure I wasn't missing some key part of
> > > > the logic. It's entirely possible that cifs is almost ready to handle
> > > > a !uptodate page in write_end...
> > > 
> > > Well, CIFS is "special". Rather than just updating the pagecache, we
> > > can fall back to doing a sync write instead. So I don't think we want
> > > to BUG if the page isn't up to date. It's not ideal, but I think it's a
> > > situation we can deal with if necessary.
> > 
> > Yes, that would be better. That sync write fallback is quite clever I
> > think...
> > 
> 
> Since we're able to do the sync write fallback here, I think we can
> probably not bother with checking for AOP_FLAG_UNINTERRUPTIBLE at all.
> As long as we only set the page uptodate in write_end in the case that
> the write isn't short, then we'll be ok I think even when it's
> interruptible.

Yes, that would make sense. The AOP_FLAG_UNINTERRUPTIBLE is really just
a hint in case the filesystem can avoid some really expensive operation.
In reality, ~AOP_FLAG_UNINTERRUPTIBLE is going to be the most common
case for 99% of users, so having any tricky optimisations for it is
probably not worth the hassle.


> This does make the assumption that the interrupted case is somewhat
> rare...

It should be somewhat rare. The kernel prefaults the user memory before
it attempts to copy. For regular write(2), this means a short write is
almost never going to happen. For writev(2), we only fault in the first
iovec, so short writes could be much more common -- if this ever becomes
a problem then we could easily look at prefetching more than one iov.


> Thoughts on this patch? 

Looks pretty good.


> ------------------[snip]-------------------
> 
> From a8e5ea4d9859f590b5c04954e8e2a13023f690f5 Mon Sep 17 00:00:00 2001
> From: Jeff Layton <jlayton@redhat.com>
> Date: Wed, 26 Nov 2008 11:32:09 -0500
> Subject: [PATCH] cifs: fix regression in cifs_write_begin/cifs_write_end
> 
> The conversion to write_begin/write_end interfaces had a bug where we
> were passing a bad parameter to cifs_readpage_worker. Rather than
> passing the page offset of the start of the write, we needed to pass the
> offset of the beginning of the page. This was reliably showing up as
> data corruption in the fsx-linux test from LTP.
> 
> It also became evident that this code was occasionally doing unnecessary
> read calls. Optimize those away by using the PG_checked flag to indicate
> that the unwritten part of the page has been initialized.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/cifs/file.c |   79 +++++++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 58 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index b691b89..49d4a97 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1475,7 +1475,11 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
>  	cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
>  		 page, pos, copied));
>  
> -	if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
> +	if (PageChecked(page)) {
> +		if (copied == len)
> +			SetPageUptodate(page);
> +		ClearPageChecked(page);
> +	} else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
>  		SetPageUptodate(page);

One minor thing -- you could do the !PageUptodate check first? If the
page is already uptodate, then everything is much simpler I think? (and
PageChecked should not be set).

if (!PageUptodate(page)) {
    if (PageChecked(page)) {
        if (copied == len)
            SetPageUptodate(page);
        ClearPageChecked(page);
    } else if (copied == PAGE_CACHE_SIZE)
        SetPageUptodate(page);
}

I don't know if you think that's better or not, but I really like to
make it clear that this is the !PageUptodate logic, and we never try
to SetPageUptodate on an already uptodate page.

But I guess it is just a matter of style. So go with whatever you like
best.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-27  8:33                                                 ` Nick Piggin
@ 2008-11-28 12:18                                                   ` Jeff Layton
  2008-11-30 21:44                                                     ` Steve French
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-11-28 12:18 UTC (permalink / raw)
  To: Steve French
  Cc: Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

> 
> One minor thing -- you could do the !PageUptodate check first? If the
> page is already uptodate, then everything is much simpler I think? (and
> PageChecked should not be set).
> 
> if (!PageUptodate(page)) {
>     if (PageChecked(page)) {
>         if (copied == len)
>             SetPageUptodate(page);
>         ClearPageChecked(page);
>     } else if (copied == PAGE_CACHE_SIZE)
>         SetPageUptodate(page);
> }
> 
> I don't know if you think that's better or not, but I really like to
> make it clear that this is the !PageUptodate logic, and we never try
> to SetPageUptodate on an already uptodate page.
> 
> But I guess it is just a matter of style. So go with whatever you like
> best.

Sounds reasonable. It might even be more efficient to clean up all of
the logic in cifs_write_end at some point so that we only check
PageUptodate once. That'll take some re-org though, and the perf gain
would be minimal (if any).

This patch should apply cleanly on top of the patch already in Steve's
tree...

--------------[snip]---------------

>From d4bd64bb2168a1833fa37a0e0eed8782afc633cc Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@redhat.com>
Date: Fri, 28 Nov 2008 07:08:59 -0500
Subject: [PATCH] cifs: clean up conditionals in cifs_write_end

Make it clear that the conditionals at the start of cifs_write_end are
just for the situation when the page is not uptodate.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/file.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index f0a81e6..202a20f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1475,12 +1475,14 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
 	cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
 		 page, pos, copied));
 
-	if (PageChecked(page)) {
-		if (copied == len)
+	if (!PageUptodate(page)) {
+		if (PageChecked(page)) {
+			if (copied == len)
+				SetPageUptodate(page);
+			ClearPageChecked(page);
+		} else if (copied == PAGE_CACHE_SIZE)
 			SetPageUptodate(page);
-		ClearPageChecked(page);
-	} else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
-		SetPageUptodate(page);
+	}
 
 	if (!PageUptodate(page)) {
 		char *page_data;
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-28 12:18                                                   ` Jeff Layton
@ 2008-11-30 21:44                                                     ` Steve French
  2008-11-30 22:17                                                       ` Jeff Layton
  0 siblings, 1 reply; 38+ messages in thread
From: Steve French @ 2008-11-30 21:44 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Fri, Nov 28, 2008 at 6:18 AM, Jeff Layton <jlayton@redhat.com> wrote:
>> One minor thing -- you could do the !PageUptodate check first? If the
>> page is already uptodate, then everything is much simpler I think? (and
>> PageChecked should not be set).
>>
>> if (!PageUptodate(page)) {
>>     if (PageChecked(page)) {
>>         if (copied == len)
>>             SetPageUptodate(page);
>>         ClearPageChecked(page);
>>     } else if (copied == PAGE_CACHE_SIZE)
>>         SetPageUptodate(page);
>> }
>>
>> I don't know if you think that's better or not, but I really like to
>> make it clear that this is the !PageUptodate logic, and we never try
>> to SetPageUptodate on an already uptodate page.
>>
>> But I guess it is just a matter of style. So go with whatever you like
>> best.
> --------------[snip]---------------
> Subject: [PATCH] cifs: clean up conditionals in cifs_write_end
>
> Make it clear that the conditionals at the start of cifs_write_end are
> just for the situation when the page is not uptodate.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/cifs/file.c |   12 +++++++-----
>  1 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index f0a81e6..202a20f 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1475,12 +1475,14 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
>        cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
>                 page, pos, copied));
>
> -       if (PageChecked(page)) {
> -               if (copied == len)
> +       if (!PageUptodate(page)) {
> +               if (PageChecked(page)) {
> +                       if (copied == len)
> +                               SetPageUptodate(page);
> +                       ClearPageChecked(page);
> +               } else if (copied == PAGE_CACHE_SIZE)
>                        SetPageUptodate(page);
> -               ClearPageChecked(page);
> -       } else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
> -               SetPageUptodate(page);
> +       }
>
>        if (!PageUptodate(page)) {
>                char *page_data;

Jeff and I just talked about his patch above, and decided not to make
his minor change above.  Moving PageUptodate check earlier would
complicate things in one way ... if PageChecked were ever set at the
same time as PageUptodate then PageChecked would stay set.  That is
probably not an issue but that is clearer with the original.

-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-30 21:44                                                     ` Steve French
@ 2008-11-30 22:17                                                       ` Jeff Layton
  2008-12-01  8:44                                                         ` Nick Piggin
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-11-30 22:17 UTC (permalink / raw)
  To: Steve French
  Cc: Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Sun, 30 Nov 2008 15:44:21 -0600
"Steve French" <smfrench@gmail.com> wrote:

> On Fri, Nov 28, 2008 at 6:18 AM, Jeff Layton <jlayton@redhat.com> wrote:
> >> One minor thing -- you could do the !PageUptodate check first? If the
> >> page is already uptodate, then everything is much simpler I think? (and
> >> PageChecked should not be set).
> >>
> >> if (!PageUptodate(page)) {
> >>     if (PageChecked(page)) {
> >>         if (copied == len)
> >>             SetPageUptodate(page);
> >>         ClearPageChecked(page);
> >>     } else if (copied == PAGE_CACHE_SIZE)
> >>         SetPageUptodate(page);
> >> }
> >>
> >> I don't know if you think that's better or not, but I really like to
> >> make it clear that this is the !PageUptodate logic, and we never try
> >> to SetPageUptodate on an already uptodate page.
> >>
> >> But I guess it is just a matter of style. So go with whatever you like
> >> best.
> > --------------[snip]---------------
> > Subject: [PATCH] cifs: clean up conditionals in cifs_write_end
> >
> > Make it clear that the conditionals at the start of cifs_write_end are
> > just for the situation when the page is not uptodate.
> >
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  fs/cifs/file.c |   12 +++++++-----
> >  1 files changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index f0a81e6..202a20f 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -1475,12 +1475,14 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
> >        cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
> >                 page, pos, copied));
> >
> > -       if (PageChecked(page)) {
> > -               if (copied == len)
> > +       if (!PageUptodate(page)) {
> > +               if (PageChecked(page)) {
> > +                       if (copied == len)
> > +                               SetPageUptodate(page);
> > +                       ClearPageChecked(page);
> > +               } else if (copied == PAGE_CACHE_SIZE)
> >                        SetPageUptodate(page);
> > -               ClearPageChecked(page);
> > -       } else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
> > -               SetPageUptodate(page);
> > +       }
> >
> >        if (!PageUptodate(page)) {
> >                char *page_data;
> 
> Jeff and I just talked about his patch above, and decided not to make
> his minor change above.  Moving PageUptodate check earlier would
> complicate things in one way ... if PageChecked were ever set at the
> same time as PageUptodate then PageChecked would stay set.  That is
> probably not an issue but that is clearer with the original.
> 

I think it actually is a problem. Suppose PageChecked is never cleared
like you say, we flush the page and then do a partial page write again.
We do a readpage this time and it fails, but the copy of data to the
page works. Now we hit cifs_write_end and PageChecked is set, but
the unwritten parts of the page actually aren't up to date. Data
corruption ensues...

I agree that we should drop that patch. We might be able to make
cifs_write_end more efficient, but we'll need to be more careful
with PageChecked.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-11-30 22:17                                                       ` Jeff Layton
@ 2008-12-01  8:44                                                         ` Nick Piggin
  2008-12-01 11:28                                                           ` Jeff Layton
  0 siblings, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2008-12-01  8:44 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Sun, Nov 30, 2008 at 05:17:34PM -0500, Jeff Layton wrote:
> On Sun, 30 Nov 2008 15:44:21 -0600
> "Steve French" <smfrench@gmail.com> wrote:
> 
> > On Fri, Nov 28, 2008 at 6:18 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > >> One minor thing -- you could do the !PageUptodate check first? If the
> > >> page is already uptodate, then everything is much simpler I think? (and
> > >> PageChecked should not be set).
> > >>
> > >> if (!PageUptodate(page)) {
> > >>     if (PageChecked(page)) {
> > >>         if (copied == len)
> > >>             SetPageUptodate(page);
> > >>         ClearPageChecked(page);
> > >>     } else if (copied == PAGE_CACHE_SIZE)
> > >>         SetPageUptodate(page);
> > >> }
> > >>
> > >> I don't know if you think that's better or not, but I really like to
> > >> make it clear that this is the !PageUptodate logic, and we never try
> > >> to SetPageUptodate on an already uptodate page.
> > >>
> > >> But I guess it is just a matter of style. So go with whatever you like
> > >> best.
> > > --------------[snip]---------------
> > > Subject: [PATCH] cifs: clean up conditionals in cifs_write_end
> > >
> > > Make it clear that the conditionals at the start of cifs_write_end are
> > > just for the situation when the page is not uptodate.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > ---
> > >  fs/cifs/file.c |   12 +++++++-----
> > >  1 files changed, 7 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > > index f0a81e6..202a20f 100644
> > > --- a/fs/cifs/file.c
> > > +++ b/fs/cifs/file.c
> > > @@ -1475,12 +1475,14 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
> > >        cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
> > >                 page, pos, copied));
> > >
> > > -       if (PageChecked(page)) {
> > > -               if (copied == len)
> > > +       if (!PageUptodate(page)) {
> > > +               if (PageChecked(page)) {
> > > +                       if (copied == len)
> > > +                               SetPageUptodate(page);
> > > +                       ClearPageChecked(page);
> > > +               } else if (copied == PAGE_CACHE_SIZE)
> > >                        SetPageUptodate(page);
> > > -               ClearPageChecked(page);
> > > -       } else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
> > > -               SetPageUptodate(page);
> > > +       }
> > >
> > >        if (!PageUptodate(page)) {
> > >                char *page_data;
> > 
> > Jeff and I just talked about his patch above, and decided not to make
> > his minor change above.  Moving PageUptodate check earlier would
> > complicate things in one way ... if PageChecked were ever set at the
> > same time as PageUptodate then PageChecked would stay set.  That is
> > probably not an issue but that is clearer with the original.
> > 
> 
> I think it actually is a problem. Suppose PageChecked is never cleared
> like you say, we flush the page and then do a partial page write again.
> We do a readpage this time and it fails, but the copy of data to the
> page works. Now we hit cifs_write_end and PageChecked is set, but
> the unwritten parts of the page actually aren't up to date. Data
> corruption ensues...
> 
> I agree that we should drop that patch. We might be able to make
> cifs_write_end more efficient, but we'll need to be more careful
> with PageChecked.

Oh? I admittedly haven't looked at the source code after applying
your latest patch, but I thought it should not be possible to have
a leaking PageChecked. The page is under the page lock the whole
time, so a concurrent write should not be an issue...?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-12-01  8:44                                                         ` Nick Piggin
@ 2008-12-01 11:28                                                           ` Jeff Layton
  2008-12-01 11:32                                                             ` Nick Piggin
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-12-01 11:28 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Steve French, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Mon, 1 Dec 2008 09:44:35 +0100
Nick Piggin <npiggin@suse.de> wrote:

> On Sun, Nov 30, 2008 at 05:17:34PM -0500, Jeff Layton wrote:
> > On Sun, 30 Nov 2008 15:44:21 -0600
> > "Steve French" <smfrench@gmail.com> wrote:
> > 
> > > On Fri, Nov 28, 2008 at 6:18 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > >> One minor thing -- you could do the !PageUptodate check first? If the
> > > >> page is already uptodate, then everything is much simpler I think? (and
> > > >> PageChecked should not be set).
> > > >>
> > > >> if (!PageUptodate(page)) {
> > > >>     if (PageChecked(page)) {
> > > >>         if (copied == len)
> > > >>             SetPageUptodate(page);
> > > >>         ClearPageChecked(page);
> > > >>     } else if (copied == PAGE_CACHE_SIZE)
> > > >>         SetPageUptodate(page);
> > > >> }
> > > >>
> > > >> I don't know if you think that's better or not, but I really like to
> > > >> make it clear that this is the !PageUptodate logic, and we never try
> > > >> to SetPageUptodate on an already uptodate page.
> > > >>
> > > >> But I guess it is just a matter of style. So go with whatever you like
> > > >> best.
> > > > --------------[snip]---------------
> > > > Subject: [PATCH] cifs: clean up conditionals in cifs_write_end
> > > >
> > > > Make it clear that the conditionals at the start of cifs_write_end are
> > > > just for the situation when the page is not uptodate.
> > > >
> > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > > ---
> > > >  fs/cifs/file.c |   12 +++++++-----
> > > >  1 files changed, 7 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > > > index f0a81e6..202a20f 100644
> > > > --- a/fs/cifs/file.c
> > > > +++ b/fs/cifs/file.c
> > > > @@ -1475,12 +1475,14 @@ static int cifs_write_end(struct file *file, struct address_space *mapping,
> > > >        cFYI(1, ("write_end for page %p from pos %lld with %d bytes",
> > > >                 page, pos, copied));
> > > >
> > > > -       if (PageChecked(page)) {
> > > > -               if (copied == len)
> > > > +       if (!PageUptodate(page)) {
> > > > +               if (PageChecked(page)) {
> > > > +                       if (copied == len)
> > > > +                               SetPageUptodate(page);
> > > > +                       ClearPageChecked(page);
> > > > +               } else if (copied == PAGE_CACHE_SIZE)
> > > >                        SetPageUptodate(page);
> > > > -               ClearPageChecked(page);
> > > > -       } else if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
> > > > -               SetPageUptodate(page);
> > > > +       }
> > > >
> > > >        if (!PageUptodate(page)) {
> > > >                char *page_data;
> > > 
> > > Jeff and I just talked about his patch above, and decided not to make
> > > his minor change above.  Moving PageUptodate check earlier would
> > > complicate things in one way ... if PageChecked were ever set at the
> > > same time as PageUptodate then PageChecked would stay set.  That is
> > > probably not an issue but that is clearer with the original.
> > > 
> > 
> > I think it actually is a problem. Suppose PageChecked is never cleared
> > like you say, we flush the page and then do a partial page write again.
> > We do a readpage this time and it fails, but the copy of data to the
> > page works. Now we hit cifs_write_end and PageChecked is set, but
> > the unwritten parts of the page actually aren't up to date. Data
> > corruption ensues...
> > 
> > I agree that we should drop that patch. We might be able to make
> > cifs_write_end more efficient, but we'll need to be more careful
> > with PageChecked.
> 
> Oh? I admittedly haven't looked at the source code after applying
> your latest patch, but I thought it should not be possible to have
> a leaking PageChecked. The page is under the page lock the whole
> time, so a concurrent write should not be an issue...?
> 

But a concurrent write and read is, right?

Suppose we do a successful cifs_write_begin and set PageChecked. Another
thread then incurs a page fault and does a readpage before we copy the
data to the page. Won't we then call write_end with both PageChecked and
PageUptodate set?

That write will be fine, of course. PageChecked is still true though,
and I think that sets up the problem I was describing...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-12-01 11:28                                                           ` Jeff Layton
@ 2008-12-01 11:32                                                             ` Nick Piggin
  2008-12-01 11:55                                                               ` Jeff Layton
  0 siblings, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2008-12-01 11:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Mon, Dec 01, 2008 at 06:28:49AM -0500, Jeff Layton wrote:
> On Mon, 1 Dec 2008 09:44:35 +0100
> Nick Piggin <npiggin@suse.de> wrote:
> > > I think it actually is a problem. Suppose PageChecked is never cleared
> > > like you say, we flush the page and then do a partial page write again.
> > > We do a readpage this time and it fails, but the copy of data to the
> > > page works. Now we hit cifs_write_end and PageChecked is set, but
> > > the unwritten parts of the page actually aren't up to date. Data
> > > corruption ensues...
> > > 
> > > I agree that we should drop that patch. We might be able to make
> > > cifs_write_end more efficient, but we'll need to be more careful
> > > with PageChecked.
> > 
> > Oh? I admittedly haven't looked at the source code after applying
> > your latest patch, but I thought it should not be possible to have
> > a leaking PageChecked. The page is under the page lock the whole
> > time, so a concurrent write should not be an issue...?
> > 
> 
> But a concurrent write and read is, right?
> 
> Suppose we do a successful cifs_write_begin and set PageChecked. Another
> thread then incurs a page fault and does a readpage before we copy the
> data to the page. Won't we then call write_end with both PageChecked and
> PageUptodate set?
> 
> That write will be fine, of course. PageChecked is still true though,
> and I think that sets up the problem I was describing...

Unless cifs is doing something different from the usual case, it should
lock the page over the readpage operation (the end IO handler would
typically unlock the page after doing a SetPageUptodate).

So concurrent reads should be protected with page lock as well.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-12-01 11:32                                                             ` Nick Piggin
@ 2008-12-01 11:55                                                               ` Jeff Layton
  2008-12-01 17:43                                                                 ` Steve French
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Layton @ 2008-12-01 11:55 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Steve French, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Mon, 1 Dec 2008 12:32:26 +0100
Nick Piggin <npiggin@suse.de> wrote:

> On Mon, Dec 01, 2008 at 06:28:49AM -0500, Jeff Layton wrote:
> > On Mon, 1 Dec 2008 09:44:35 +0100
> > Nick Piggin <npiggin@suse.de> wrote:
> > > > I think it actually is a problem. Suppose PageChecked is never cleared
> > > > like you say, we flush the page and then do a partial page write again.
> > > > We do a readpage this time and it fails, but the copy of data to the
> > > > page works. Now we hit cifs_write_end and PageChecked is set, but
> > > > the unwritten parts of the page actually aren't up to date. Data
> > > > corruption ensues...
> > > > 
> > > > I agree that we should drop that patch. We might be able to make
> > > > cifs_write_end more efficient, but we'll need to be more careful
> > > > with PageChecked.
> > > 
> > > Oh? I admittedly haven't looked at the source code after applying
> > > your latest patch, but I thought it should not be possible to have
> > > a leaking PageChecked. The page is under the page lock the whole
> > > time, so a concurrent write should not be an issue...?
> > > 
> > 
> > But a concurrent write and read is, right?
> > 
> > Suppose we do a successful cifs_write_begin and set PageChecked. Another
> > thread then incurs a page fault and does a readpage before we copy the
> > data to the page. Won't we then call write_end with both PageChecked and
> > PageUptodate set?
> > 
> > That write will be fine, of course. PageChecked is still true though,
> > and I think that sets up the problem I was describing...
> 
> Unless cifs is doing something different from the usual case, it should
> lock the page over the readpage operation (the end IO handler would
> typically unlock the page after doing a SetPageUptodate).
> 
> So concurrent reads should be protected with page lock as well.
> 

Ahh good point...the page would be locked there. If it's impossible for
PageUptodate to be flipped on while the page lock is held then this is
probably safe enough.

I'd still prefer that we handle the situation where both bits are set
in cifs_write_end. Some defensive coding is warranted here I think.
That can wait until 2.6.29 though. For now, the patch in Steve's tree
should be fine, IMO.

--
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [linux-cifs-client] Re: fsx-linux failing with latest cifs-2.6 git tree
  2008-12-01 11:55                                                               ` Jeff Layton
@ 2008-12-01 17:43                                                                 ` Steve French
  0 siblings, 0 replies; 38+ messages in thread
From: Steve French @ 2008-12-01 17:43 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Nick Piggin, linux-fsdevel, pbadari,
	linux-cifs-client@lists.samba.org

On Mon, Dec 1, 2008 at 5:55 AM, Jeff Layton <jlayton@redhat.com> wrote:
> On Mon, 1 Dec 2008 12:32:26 +0100
> Nick Piggin <npiggin@suse.de> wrote:
>
>> On Mon, Dec 01, 2008 at 06:28:49AM -0500, Jeff Layton wrote:
>> > On Mon, 1 Dec 2008 09:44:35 +0100
>> > Nick Piggin <npiggin@suse.de> wrote:
>> > > > I think it actually is a problem. Suppose PageChecked is never cleared
>> > > > like you say, we flush the page and then do a partial page write again.
>> > > > We do a readpage this time and it fails, but the copy of data to the
>> > > > page works. Now we hit cifs_write_end and PageChecked is set, but
>> > > > the unwritten parts of the page actually aren't up to date. Data
>> > > > corruption ensues...
>> > > >
>> > > > I agree that we should drop that patch. We might be able to make
>> > > > cifs_write_end more efficient, but we'll need to be more careful
>> > > > with PageChecked.
>> > >
>> > > Oh? I admittedly haven't looked at the source code after applying
>> > > your latest patch, but I thought it should not be possible to have
>> > > a leaking PageChecked. The page is under the page lock the whole
>> > > time, so a concurrent write should not be an issue...?
>> > >
>> >
>> > But a concurrent write and read is, right?
>> >
>> > Suppose we do a successful cifs_write_begin and set PageChecked. Another
>> > thread then incurs a page fault and does a readpage before we copy the
>> > data to the page. Won't we then call write_end with both PageChecked and
>> > PageUptodate set?
>> >
>> > That write will be fine, of course. PageChecked is still true though,
>> > and I think that sets up the problem I was describing...
>>
>> Unless cifs is doing something different from the usual case, it should
>> lock the page over the readpage operation (the end IO handler would
>> typically unlock the page after doing a SetPageUptodate).
>>
>> So concurrent reads should be protected with page lock as well.
>>
>
> Ahh good point...the page would be locked there. If it's impossible for
> PageUptodate to be flipped on while the page lock is held then this is
> probably safe enough.
>
> I'd still prefer that we handle the situation where both bits are set
> in cifs_write_end. Some defensive coding is warranted here I think.
> That can wait until 2.6.29 though. For now, the patch in Steve's tree
> should be fine, IMO.

This (why this defensive code is fine) is similar to what we
discussed.  I agree that we can leave it as is.


-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2008-12-01 17:43 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20081121105613.09a8cb8e@tleilax.poochiereds.net>
     [not found] ` <524f69650811210820s549de2bah3181cbc0c5633091@mail.gmail.com>
     [not found]   ` <20081121112249.0b408b55@tleilax.poochiereds.net>
     [not found]     ` <524f69650811210846q7502fd99m6f4d335bb6ac1b65@mail.gmail.com>
     [not found]       ` <524f69650811211109w659e5decoa34a8e0f907772a3@mail.gmail.com>
     [not found]         ` <524f69650811211113q4fffcc70of88cb85db531c358@mail.gmail.com>
     [not found]           ` <1227296476.20845.8.camel@norville.austin.ibm.com>
     [not found]             ` <524f69650811211218v78295682lcf6dce842327b097@mail.gmail.com>
2008-11-21 20:38               ` Fwd: fsx-linux failing with latest cifs-2.6 git tree Steve French
2008-11-21 20:41                 ` Dave Kleikamp
2008-11-21 21:02                   ` Steve French
2008-11-21 23:44                   ` Nick Piggin
2008-11-21 20:50                 ` Jeff Layton
2008-11-21 22:50                 ` Jeff Layton
2008-11-21 23:02                   ` Dave Kleikamp
2008-11-21 23:25                     ` Jeff Layton
2008-11-22  1:04                       ` Steve French
2008-11-22  1:50                         ` Jeff Layton
2008-11-21 23:53                   ` Nick Piggin
2008-11-22  1:51                     ` Jeff Layton
2008-11-22  2:02                       ` Steve French
2008-11-22  4:47                         ` Dave Kleikamp
2008-11-22 15:39                           ` [linux-cifs-client] " Jeff Layton
2008-11-22 20:27                             ` Dave Kleikamp
2008-11-23 11:57                               ` Jeff Layton
2008-11-24  2:32                                 ` Steve French
2008-11-24 11:19                                   ` [linux-cifs-client] " Jeff Layton
2008-11-26  4:04                                     ` Steve French
2008-11-26 11:54                                       ` Jeff Layton
2008-11-26 12:11                                       ` Jeff Layton
2008-11-26 13:09                                         ` [linux-cifs-client] " Nick Piggin
2008-11-26 15:08                                           ` Jeff Layton
2008-11-26 15:23                                             ` Nick Piggin
2008-11-26 16:37                                               ` Jeff Layton
2008-11-27  8:33                                                 ` Nick Piggin
2008-11-28 12:18                                                   ` Jeff Layton
2008-11-30 21:44                                                     ` Steve French
2008-11-30 22:17                                                       ` Jeff Layton
2008-12-01  8:44                                                         ` Nick Piggin
2008-12-01 11:28                                                           ` Jeff Layton
2008-12-01 11:32                                                             ` Nick Piggin
2008-12-01 11:55                                                               ` Jeff Layton
2008-12-01 17:43                                                                 ` Steve French
2008-11-26 19:46                                               ` Steve French
2008-11-24 20:00                                 ` Dave Kleikamp
2008-11-26 13:02                       ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).