* [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
@ 2006-10-04 17:04 Jeff Moyer
2006-10-04 17:25 ` Andrew Morton
0 siblings, 1 reply; 16+ messages in thread
From: Jeff Moyer @ 2006-10-04 17:04 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, zach.brown
Hi,
It seems that Oracle creates sparse files when doing table creates, and
then populates those files using O_DIRECT I/O. That means that every I/O to
the sparse file falls back to buffered I/O. Currently, such a sequential
O_DIRECT write to a sparse file will end up populating the page cache. The
problem is that we don't invalidate the page cache pages used to perform
the buffered fallback. After talking this over with Zach, we agreed that
there should be a call to truncate_inode_pages_range after the buffered I/O
fallback.
Attached is a patch which addresses the problem in my testing. I wrote a
simple test program that creates a sparse file and issues sequential DIO
writes to it. Before the patch, the page cache would grow as the file was
written. With the patch, the page cache does not grow.
Comments welcome.
Cheers,
Jeff
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
--- linux-2.6.18.i686/mm/filemap.c.orig 2006-10-02 12:59:25.000000000 -0400
+++ linux-2.6.18.i686/mm/filemap.c 2006-10-04 12:54:51.000000000 -0400
@@ -2350,7 +2350,7 @@ __generic_file_aio_write_nolock(struct k
unsigned long nr_segs, loff_t *ppos)
{
struct file *file = iocb->ki_filp;
- const struct address_space * mapping = file->f_mapping;
+ struct address_space * mapping = file->f_mapping;
size_t ocount; /* original count */
size_t count; /* after file limit checks */
struct inode *inode = mapping->host;
@@ -2417,6 +2417,15 @@ __generic_file_aio_write_nolock(struct k
written = generic_file_buffered_write(iocb, iov, nr_segs,
pos, ppos, count, written);
+
+ /*
+ * When falling through to buffered I/O, we need to ensure that the
+ * page cache pages are written to disk and invalidated to preserve
+ * the expected O_DIRECT semantics.
+ */
+ if (unlikely(file->f_flags & O_DIRECT))
+ truncate_inode_pages_range(mapping, pos, pos + count - 1);
+
out:
current->backing_dev_info = NULL;
return written ? written : err;
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 17:04 [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path Jeff Moyer
@ 2006-10-04 17:25 ` Andrew Morton
2006-10-04 17:51 ` Zach Brown
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-10-04 17:25 UTC (permalink / raw)
To: Jeff Moyer; +Cc: linux-kernel, zach.brown
On Wed, 04 Oct 2006 13:04:42 -0400
Jeff Moyer <jmoyer@redhat.com> wrote:
> Hi,
>
> It seems that Oracle creates sparse files when doing table creates, and
> then populates those files using O_DIRECT I/O. That means that every I/O to
> the sparse file falls back to buffered I/O. Currently, such a sequential
> O_DIRECT write to a sparse file will end up populating the page cache. The
> problem is that we don't invalidate the page cache pages used to perform
> the buffered fallback.
Why is this a problem? It's just like someone did a write(), and we'll
invalidate the pagecache on the next direct-io operation.
> After talking this over with Zach, we agreed that
> there should be a call to truncate_inode_pages_range after the buffered I/O
> fallback.
>
> Attached is a patch which addresses the problem in my testing. I wrote a
> simple test program that creates a sparse file and issues sequential DIO
> writes to it. Before the patch, the page cache would grow as the file was
> written. With the patch, the page cache does not grow.
>
> Comments welcome.
>
> Cheers,
>
> Jeff
>
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>
> --- linux-2.6.18.i686/mm/filemap.c.orig 2006-10-02 12:59:25.000000000 -0400
> +++ linux-2.6.18.i686/mm/filemap.c 2006-10-04 12:54:51.000000000 -0400
> @@ -2350,7 +2350,7 @@ __generic_file_aio_write_nolock(struct k
> unsigned long nr_segs, loff_t *ppos)
> {
> struct file *file = iocb->ki_filp;
> - const struct address_space * mapping = file->f_mapping;
> + struct address_space * mapping = file->f_mapping;
> size_t ocount; /* original count */
> size_t count; /* after file limit checks */
> struct inode *inode = mapping->host;
> @@ -2417,6 +2417,15 @@ __generic_file_aio_write_nolock(struct k
>
> written = generic_file_buffered_write(iocb, iov, nr_segs,
> pos, ppos, count, written);
> +
> + /*
> + * When falling through to buffered I/O, we need to ensure that the
> + * page cache pages are written to disk and invalidated to preserve
> + * the expected O_DIRECT semantics.
> + */
> + if (unlikely(file->f_flags & O_DIRECT))
> + truncate_inode_pages_range(mapping, pos, pos + count - 1);
> +
> out:
> current->backing_dev_info = NULL;
> return written ? written : err;
eek. truncate_inode_pages() will throw away dirty data. Very dangerous,
much chin-scratching needed.
It also has security implications: if a user can force this shootdown to
happen against dirty data at the right time (perhaps with multiple
processes) then a subsequent read of that part of the file will yield
uninitialised disk data.
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 17:25 ` Andrew Morton
@ 2006-10-04 17:51 ` Zach Brown
2006-10-04 17:53 ` Jeff Moyer
0 siblings, 1 reply; 16+ messages in thread
From: Zach Brown @ 2006-10-04 17:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jeff Moyer, linux-kernel
> Why is this a problem? It's just like someone did a write(), and we'll
> invalidate the pagecache on the next direct-io operation.
This was noticed as a distro regression as they moved from the kernels
that used to invalidate the entire address space on direct io ops to
more modern ones that only invalidate the region being written.
You can end up with significant memory pressure after this change with a
large enough working set on disk.
> eek. truncate_inode_pages() will throw away dirty data. Very dangerous,
> much chin-scratching needed.
Yeah, I failed to tell Jeff that it should be calling
filemap_fdatawrite() first to get things into writeback. (And
presumably not truncating if that returns an error.)
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 17:51 ` Zach Brown
@ 2006-10-04 17:53 ` Jeff Moyer
2006-10-04 18:16 ` Andrew Morton
0 siblings, 1 reply; 16+ messages in thread
From: Jeff Moyer @ 2006-10-04 17:53 UTC (permalink / raw)
To: Zach Brown; +Cc: Andrew Morton, linux-kernel
==> Regarding Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path; Zach Brown <zach.brown@oracle.com> adds:
>> Why is this a problem? It's just like someone did a write(), and we'll
>> invalidate the pagecache on the next direct-io operation.
zach.brown> This was noticed as a distro regression as they moved from the
zach.brown> kernels that used to invalidate the entire address space on
zach.brown> direct io ops to more modern ones that only invalidate the
zach.brown> region being written.
Right.
zach.brown> You can end up with significant memory pressure after this
zach.brown> change with a large enough working set on disk.
>> eek. truncate_inode_pages() will throw away dirty data. Very
>> dangerous, much chin-scratching needed.
zach.brown> Yeah, I failed to tell Jeff that it should be calling
zach.brown> filemap_fdatawrite() first to get things into writeback. (And
zach.brown> presumably not truncating if that returns an error.)
Ahh, that explains it. The strange thing is that my test validates the
file afterwards, and I was seeing correct data.
I'll repost after another round of testing.
Thanks!
Jeff
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 17:53 ` Jeff Moyer
@ 2006-10-04 18:16 ` Andrew Morton
2006-10-04 18:40 ` Zach Brown
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-10-04 18:16 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Zach Brown, linux-kernel
On Wed, 04 Oct 2006 13:53:32 -0400
Jeff Moyer <jmoyer@redhat.com> wrote:
> ==> Regarding Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path; Zach Brown <zach.brown@oracle.com> adds:
>
> >> Why is this a problem? It's just like someone did a write(), and we'll
> >> invalidate the pagecache on the next direct-io operation.
>
> zach.brown> This was noticed as a distro regression as they moved from the
> zach.brown> kernels that used to invalidate the entire address space on
> zach.brown> direct io ops to more modern ones that only invalidate the
> zach.brown> region being written.
>
> Right.
Change app to use sync_file_range() followed by
posix_fadvise(POSIX_FADV_DONTNEED). Problem solved.
We have lots of nice new tools in-kernel which permit applications to
manipulate and to invalidate pagecache. Please, start using them rather
than pushing bits of oracle into the core vfs ;)
> zach.brown> You can end up with significant memory pressure after this
> zach.brown> change with a large enough working set on disk.
>
> >> eek. truncate_inode_pages() will throw away dirty data. Very
> >> dangerous, much chin-scratching needed.
>
> zach.brown> Yeah, I failed to tell Jeff that it should be calling
> zach.brown> filemap_fdatawrite() first to get things into writeback. (And
> zach.brown> presumably not truncating if that returns an error.)
>
> Ahh, that explains it. The strange thing is that my test validates the
> file afterwards, and I was seeing correct data.
That is strange, because the truncate_inode_pages() will throw away the
dirty pagecache pages which the application just wrote to. Maybe the file
was opened O_SYNC or something.
> I'll repost after another round of testing.
Please, no truncate_inode_pages. For this application, the far-safer
invalidate_inode_pages() would suffice.
However no kernel change is needed.
And no sneaking changes like this into vendor kernels either! Fix Oracle.
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 18:16 ` Andrew Morton
@ 2006-10-04 18:40 ` Zach Brown
2006-10-04 19:16 ` Andrew Morton
0 siblings, 1 reply; 16+ messages in thread
From: Zach Brown @ 2006-10-04 18:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jeff Moyer, linux-kernel
> We have lots of nice new tools in-kernel which permit applications to
> manipulate and to invalidate pagecache. Please, start using them rather
> than pushing bits of oracle into the core vfs ;)
And apps that were written before they were available? :) We're OK with
their behaviour changing under newer kernels because they now have a
new source of memory pressure that they didn't have before?
It seems a bit much to suggest that retaining the previous behaviour of
avoiding memory pressure by using the O_DIRECT API is somehow "pushing
bits of oracle into the core vfs" :).
Maybe that aspect of the API was unintentional, though. That would be a
shame. I suspect Oracle isn't alone in relying on it.
> Please, no truncate_inode_pages. For this application, the far-safer
> invalidate_inode_pages() would suffice.
So now apps always have to pay the cost of looking up pages to
invalidate on the off chance that they wrote to a sparse region, based
on the current implementation detail that sparse regions fall back to
buffered?
- z
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 18:40 ` Zach Brown
@ 2006-10-04 19:16 ` Andrew Morton
2006-10-04 20:53 ` Jeff Moyer
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-10-04 19:16 UTC (permalink / raw)
To: Zach Brown; +Cc: Jeff Moyer, linux-kernel
On Wed, 04 Oct 2006 11:40:52 -0700
Zach Brown <zach.brown@oracle.com> wrote:
>
> > We have lots of nice new tools in-kernel which permit applications to
> > manipulate and to invalidate pagecache. Please, start using them rather
> > than pushing bits of oracle into the core vfs ;)
>
> And apps that were written before they were available? :) We're OK with
> their behaviour changing under newer kernels because they now have a
> new source of memory pressure that they didn't have before?
>
> It seems a bit much to suggest that retaining the previous behaviour of
> avoiding memory pressure by using the O_DIRECT API is somehow "pushing
> bits of oracle into the core vfs" :).
>
> Maybe that aspect of the API was unintentional, though. That would be a
> shame. I suspect Oracle isn't alone in relying on it.
Why do so many patches degenerate into these little question-and-answer
sessions? It's like pulling teeth. Please completely describe the
problem.
What "newer kernels"? The kernel has had the fall-back-to-buffered
behaviour for *ages*.
What's so bad about having a bit of pagecache floating about during what
is, I expect, a fairly rare operation?
What are the user-visible effects of this pagecache?
Please don't just sling a patch at us and leave us to guess what it's for.
> > Please, no truncate_inode_pages. For this application, the far-safer
> > invalidate_inode_pages() would suffice.
>
> So now apps always have to pay the cost of looking up pages to
> invalidate on the off chance that they wrote to a sparse region, based
> on the current implementation detail that sparse regions fall back to
> buffered?
"current"? It was current about two years ago!
What I've thus far been able to decrypt from this exchange has been that
you think that the Linux direct-IO API should, as a side-effect, guarantee
that a direct-io write (and read?) of a file area should not leave that
part of the file in pagecache.
Well we _could_ define the API in that fashion. It'd need to be carefully
documented somewhere. But it's a fairly bizarre requirement, especially as
userspace already has quite rich ways of manipulating pagecache.
To do this right without modifying userspace we're going to need to sync
these pages to disk within the write() syscall and then invalidate them.
That might screw somebody else up, but I think it could be justified on the
grounds that direct-io is a synchronous operation, so we should still be
synchronous if we went the buffered route.
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 19:16 ` Andrew Morton
@ 2006-10-04 20:53 ` Jeff Moyer
2006-10-04 21:22 ` Jeff Moyer
2006-10-04 23:55 ` Andrew Morton
0 siblings, 2 replies; 16+ messages in thread
From: Jeff Moyer @ 2006-10-04 20:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Zach Brown, linux-kernel
==> Regarding Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path; Andrew Morton <akpm@osdl.org> adds:
akpm> On Wed, 04 Oct 2006 11:40:52 -0700
akpm> Zach Brown <zach.brown@oracle.com> wrote:
>> > We have lots of nice new tools in-kernel which permit applications to
>> > manipulate and to invalidate pagecache. Please, start using them
>> rather > than pushing bits of oracle into the core vfs ;)
>>
>> And apps that were written before they were available? :) We're OK with
>> their behaviour changing under newer kernels because they now have a
>> new source of memory pressure that they didn't have before?
>>
>> It seems a bit much to suggest that retaining the previous behaviour of
>> avoiding memory pressure by using the O_DIRECT API is somehow "pushing
>> bits of oracle into the core vfs" :).
>>
>> Maybe that aspect of the API was unintentional, though. That would be a
>> shame. I suspect Oracle isn't alone in relying on it.
akpm> Why do so many patches degenerate into these little
akpm> question-and-answer sessions? It's like pulling teeth. Please
akpm> completely describe the problem.
akpm> What "newer kernels"? The kernel has had the fall-back-to-buffered
akpm> behaviour for *ages*.
akpm> What's so bad about having a bit of pagecache floating about during
akpm> what is, I expect, a fairly rare operation?
akpm> What are the user-visible effects of this pagecache?
akpm> Please don't just sling a patch at us and leave us to guess what it's
akpm> for.
This is my fault and I apologize.
Starting again from the beginning, Oracle table creates will create sparse
redo logs and database files, and then populate them sequentially. A table
create can happen at any time, though it is arguably a rare occurrence
(some administrators have creates scripted to run nightly, others much
less). Allocating memory to the page cache for these operations is
sub-optimal, since Oracle maintains its own cache layer. The pages will
eventually need to be reaped, anyway, and this doesn't happen until there
is memory pressure. I asked for performance numbers showing the impact of
this behaviour, and I'm working on producing some numbers here. I'll
post the results when they are available.
So, how did this work in the past? In kernels prior to 2.6.12,
generic_file_direct_IO looked like so:
retval = mapping->a_ops->direct_IO(rw, iocb, iov,
offset, nr_segs);
if (rw == WRITE && mapping->nrpages) {
int err = invalidate_inode_pages2(mapping);
if (err)
retval = err;
}
If this is a write to a hole in a file, retval will be 0 and we fall back
to buffered I/O. In this case, the invalidate_inode_pages2 call may not
invalidate any pages at all. Here is the caller:
if (unlikely(file->f_flags & O_DIRECT)) {
written = generic_file_direct_write(iocb, iov,
&nr_segs, pos, ppos, count, ocount);
if (written < 0 || written == count)
goto out;
/*
* direct-io write to a hole: fall through to buffered I/O
* for completing the rest of the request.
*/
pos += written;
count -= written;
}
written = generic_file_buffered_write(iocb, iov, nr_segs,
pos, ppos, count, written);
Here, we end up populating the page cache. Notice that, just like in
current kernels, there is no invalidation logic here. The catch is that on
the next I/O to this file, all of the pages associated with the file will
be invalidated by the call to invalidate_inode_pages2 in
generic_file_direct_IO. So, each I/O clears the page cache pages used by
the one before it.
In 2.6.12, the call to invalidate_inode_pages2 in generic_file_direct_IO was
changed to invalidate_inode_pages2_range:
retval = mapping->a_ops->direct_IO(rw, iocb, iov,
offset, nr_segs);
if (rw == WRITE && mapping->nrpages) {
pgoff_t end = (offset + write_len - 1)
>> PAGE_CACHE_SHIFT;
int err = invalidate_inode_pages2_range(mapping,
offset >> PAGE_CACHE_SHIFT, end);
if (err)
retval = err;
}
So this means that we no longer invalidate the pages used by the previous
I/O. This is the source of the regression.
akpm> What I've thus far been able to decrypt from this exchange has been
akpm> that you think that the Linux direct-IO API should, as a side-effect,
akpm> guarantee that a direct-io write (and read?) of a file area should
akpm> not leave that part of the file in pagecache.
I think it is expected that a direct I/O should not populate the page
cache. I think your statement is correct for writes. For reads, I don't
think we need to shoot down the page cache (we do a filemap_write_and_wait
on the mapping before issuing the I/O, so the on-disk copy should be up to
date).
The man page for open states:
O_DIRECT
Try to minimize cache effects of the I/O to and from this file.
I think that invalidating the page cache pages we use when falling back to
buffered I/O stays true to the above description.
akpm> Well we _could_ define the API in that fashion. It'd need to be
akpm> carefully documented somewhere. But it's a fairly bizarre
akpm> requirement, especially as userspace already has quite rich ways of
akpm> manipulating pagecache.
I don't think it's a bizarre requirement that we invalidate the page cache
that we may have used while falling back to buffered I/O.
akpm> To do this right without modifying userspace we're going to need to
akpm> sync these pages to disk within the write() syscall and then
akpm> invalidate them. That might screw somebody else up, but I think it
akpm> could be justified on the grounds that direct-io is a synchronous
akpm> operation, so we should still be synchronous if we went the buffered
akpm> route.
I'm sure I'm missing something. But I think that if I can get a suitable
patch posted for this issue, then there is no further work needed.
Again, sorry that I didn't include a better description. I've attached my
reproducer to this message.
akpm> And no sneaking changes like this into vendor kernels either! Fix
akpm> Oracle.
'Sneaking' is certainly not my intention. =)
Cheers,
Jeff
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 20:53 ` Jeff Moyer
@ 2006-10-04 21:22 ` Jeff Moyer
2006-10-04 23:55 ` Andrew Morton
1 sibling, 0 replies; 16+ messages in thread
From: Jeff Moyer @ 2006-10-04 21:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: Zach Brown, linux-kernel
==> Regarding Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path; Jeff Moyer <jmoyer@redhat.com> adds:
jmoyer> Again, sorry that I didn't include a better description. I've
jmoyer> attached my reproducer to this message.
Oops, forgot to attach it.
-Jeff
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
int
make_sparse(const char *filename, off_t len)
{
int fd;
fd = open(filename, O_WRONLY | O_CREAT, 0644);
if (fd < 0) {
perror("open");
return -1;
}
if (ftruncate(fd, 0) != 0) {
perror("ftruncate");
return -1;
}
if (ftruncate(fd, len) != 0) {
perror("ftruncate");
return -1;
}
close(fd);
return 0;
}
int
main(int argc, char **argv)
{
int fd, i, nr_pages, ret;
off_t len;
int page_size = getpagesize();
char *buf;
char *pattern;
if (argc < 3) {
printf("Usage: %s <filename> <size-in-megabytes>\n", argv[0]);
exit(1);
}
/* convert length from megabytes to bytes */
len = strtoul(argv[2], NULL, 10);
len <<= 20;
if (make_sparse(argv[1], len) < 0)
exit(4);
nr_pages = len / page_size;
if ((ret = posix_memalign((void **)&buf, 1024, page_size)) != 0) {
errno = ret;
perror("posix_memalign");
exit(2);
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_DIRECT, 0644);
if (fd < 0) {
perror("open");
exit(3);
}
memset(buf, 'b', page_size);
for (i = 0; i < nr_pages; i++) {
ret = write(fd, buf, page_size);
if (ret < 0) {
perror("write");
exit(5);
}
}
close(fd);
/* now read the data back in and make sure it hit the disk! */
pattern = malloc(page_size);
if (!pattern) {
perror("malloc");
exit(1);
}
memset(pattern, 'b', page_size);
fd = open(argv[1], O_RDONLY);
if (fd < 0) {
perror("open");
exit(1);
}
i = 0; /* reset: i still holds the count from the write loop */
while (i < len) {
memset(buf, 0, page_size);
ret = read(fd, buf, page_size);
if (ret != page_size) {
if (ret < 0)
perror("read");
else
fprintf(stderr, "short read of %d\n", ret);
exit(1);
}
if (memcmp(buf, pattern, page_size)) {
fprintf(stderr, "Invalid Data!\n");
exit(1);
}
i += page_size;
}
exit(0);
}
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 20:53 ` Jeff Moyer
2006-10-04 21:22 ` Jeff Moyer
@ 2006-10-04 23:55 ` Andrew Morton
2006-10-05 19:31 ` Jeff Moyer
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-10-04 23:55 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Zach Brown, linux-kernel
On Wed, 04 Oct 2006 16:53:53 -0400
Jeff Moyer <jmoyer@redhat.com> wrote:
> The man page for open states:
>
> O_DIRECT
> Try to minimize cache effects of the I/O to and from this file.
>
> I think that invalidating the page cache pages we use when falling back to
> buffered I/O stays true to the above description.
What the manpage forgot to mention is "direct-io is synchronous".
Except it isn't, when we fall back to buffered-IO. So yup, I think we
could justify this (sort of) change on those grounds alone: preserving the
synchronous semantics.
I'd propose that we do this via
generic_file_buffered_write(...);
do_sync_file_range(..., SYNC_FILE_RANGE_WAIT_BEFORE|
SYNC_FILE_RANGE_WRITE|
SYNC_FILE_RANGE_WAIT_AFTER)
invalidate_mapping_pages(...);
There is a slight inefficiency here: generic_file_direct_IO() does
invalidate_inode_pages2_range(), then we go and instantiate some pagecache,
then we strip it away again with invalidate_mapping_pages(). That first
invalidate_inode_pages2_range() was somewhat of a waste of cycles.
But we expect that the next call to generic_file_direct_IO() won't actually
call invalidate_inode_pages2_range(), because mapping->nrpages is usually
zero.
Well, it would have been, back in the days when we were invalidating the
whole file. Now we are more efficient and only invalidate the specific
segment of that file. So if there's a stray pagecache page somewhere at the
far end of the file, we'll pointlessly call invalidate_inode_pages2_range() every
time. Oh well.
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-04 23:55 ` Andrew Morton
@ 2006-10-05 19:31 ` Jeff Moyer
2006-10-06 20:11 ` Andrew Morton
0 siblings, 1 reply; 16+ messages in thread
From: Jeff Moyer @ 2006-10-05 19:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: Zach Brown, linux-kernel
==> Regarding Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path; Andrew Morton <akpm@osdl.org> adds:
akpm> On Wed, 04 Oct 2006 16:53:53 -0400
akpm> Jeff Moyer <jmoyer@redhat.com> wrote:
>> The man page for open states:
>>
>> O_DIRECT
>> Try to minimize cache effects of the I/O to and from this file.
>>
>> I think that invalidating the page cache pages we use when falling back to
>> buffered I/O stays true to the above description.
akpm> What the manpage forgot to mention is "direct-io is synchronous".
akpm> Except it isn't, when we fall back to buffered-IO. So yup, I think
akpm> we could justify this (sort of) change on those grounds alone:
akpm> preserving the synchronous semantics.
akpm> I'd propose that we do this via
akpm> generic_file_buffered_write(...);
akpm> do_sync_file_range(..., SYNC_FILE_RANGE_WAIT_BEFORE|
akpm> SYNC_FILE_RANGE_WRITE|
akpm> SYNC_FILE_RANGE_WAIT_AFTER)
akpm> invalidate_mapping_pages(...);
OK, patch attached. It's a bit awkward that some interfaces take a page
index while others take an offset in bytes. I tested it with the code I
sent earlier in this thread. I'll get results from oast and some dbcreate
runs comparing kernels with and without this change and post them when
available (probably in the next day or two). Feel free to hold off
committing this until that data is presented.
akpm> There is a slight inefficiency here: generic_file_direct_IO() does
akpm> invalidate_inode_pages2_range(), then we go and instantiate some
akpm> pagecache, then we strip it away again with
akpm> invalidate_mapping_pages(). That first
akpm> invalidate_inode_pages2_range() was somewhat of a waste of cycles.
akpm> But we expect that the next call to generic_file_direct_IO() won't
akpm> actually call invalidate_inode_pages2_range(), because
akpm> mapping->nrpages is usually zero.
akpm> Well, it would have been, back in the days when we were invalidating
akpm> the whole file. Now we are more efficient and only invalidate the
akpm> specific segment of that file. So if there's a stray pagecache page
akpm> somewhere at the far end of the file, we'll pointlessly call
akpm> invalidate_inode_pages2_range() every time. Oh well.
I don't think it's worth losing sleep over. I wonder how many applications
actually mix buffered and unbuffered I/O to the same file.
Cheers,
Jeff
--- linux-2.6.18.i686/mm/filemap.c.orig 2006-10-02 12:59:25.000000000 -0400
+++ linux-2.6.18.i686/mm/filemap.c 2006-10-05 15:06:26.000000000 -0400
@@ -2350,13 +2350,13 @@ __generic_file_aio_write_nolock(struct k
unsigned long nr_segs, loff_t *ppos)
{
struct file *file = iocb->ki_filp;
- const struct address_space * mapping = file->f_mapping;
+ struct address_space * mapping = file->f_mapping;
size_t ocount; /* original count */
size_t count; /* after file limit checks */
struct inode *inode = mapping->host;
unsigned long seg;
loff_t pos;
- ssize_t written;
+ ssize_t written, written_direct;
ssize_t err;
ocount = 0;
@@ -2386,7 +2386,7 @@ __generic_file_aio_write_nolock(struct k
/* We can write back this queue in page reclaim */
current->backing_dev_info = mapping->backing_dev_info;
- written = 0;
+ written = written_direct = 0;
err = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
if (err)
@@ -2413,10 +2413,32 @@ __generic_file_aio_write_nolock(struct k
*/
pos += written;
count -= written;
+
+ written_direct = written;
}
written = generic_file_buffered_write(iocb, iov, nr_segs,
pos, ppos, count, written);
+
+ /*
+ * When falling through to buffered I/O, we need to ensure that the
+ * page cache pages are written to disk and invalidated to preserve
+ * the expected O_DIRECT semantics.
+ */
+ if (unlikely(file->f_flags & O_DIRECT)) {
+ pgoff_t endbyte = pos + count - 1;
+
+ err = do_sync_file_range(file, pos, endbyte,
+ SYNC_FILE_RANGE_WAIT_BEFORE|
+ SYNC_FILE_RANGE_WRITE|
+ SYNC_FILE_RANGE_WAIT_AFTER);
+ if (err == 0)
+ invalidate_mapping_pages(mapping,
+ pos >> PAGE_CACHE_SHIFT,
+ endbyte >> PAGE_CACHE_SHIFT);
+ else
+ written = written_direct;
+ }
out:
current->backing_dev_info = NULL;
return written ? written : err;
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-05 19:31 ` Jeff Moyer
@ 2006-10-06 20:11 ` Andrew Morton
2006-10-11 16:48 ` jmoyer
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-10-06 20:11 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Zach Brown, linux-kernel
On Thu, 05 Oct 2006 15:31:43 -0400
Jeff Moyer <jmoyer@redhat.com> wrote:
> akpm> I'd propose that we do this via
>
> akpm> generic_file_buffered_write(...);
> akpm> do_sync_file_range(..., SYNC_FILE_RANGE_WAIT_BEFORE|
> akpm> SYNC_FILE_RANGE_WRITE|
> akpm> SYNC_FILE_RANGE_WAIT_AFTER)
>
> akpm> invalidate_mapping_pages(...);
>
> OK, patch attached.
I mangled your patch rather a lot.
- coding style (multiple declarations and multiple assignments)
- `endbyte' should be loff_t, not pgoff_t.
- sync the region (pos, pos+written), not (pos, pos+count).
- attempt to return the correct value from __generic_file_aio_write_nolock
in all circumstances.
- Avoid testing the O_DIRECT flag twice - just call
generic_file_buffered_write() from two sites. Simpler and cleaner that way.
Patch is below. The end result looks like:
	/* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
	if (unlikely(file->f_flags & O_DIRECT)) {
		loff_t endbyte;
		ssize_t written_buffered;

		written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
							ppos, count, ocount);
		if (written < 0 || written == count)
			goto out;
		/*
		 * direct-io write to a hole: fall through to buffered I/O
		 * for completing the rest of the request.
		 */
		pos += written;
		count -= written;
		written_buffered = generic_file_buffered_write(iocb, iov,
						nr_segs, pos, ppos, count,
						written);
		/*
		 * We need to ensure that the page cache pages are written to
		 * disk and invalidated to preserve the expected O_DIRECT
		 * semantics.
		 */
		endbyte = pos + written_buffered - 1;
		err = do_sync_file_range(file, pos, endbyte,
					 SYNC_FILE_RANGE_WAIT_BEFORE|
					 SYNC_FILE_RANGE_WRITE|
					 SYNC_FILE_RANGE_WAIT_AFTER);
		if (err == 0) {
			written += written_buffered;
			invalidate_mapping_pages(mapping,
						 pos >> PAGE_CACHE_SHIFT,
						 endbyte >> PAGE_CACHE_SHIFT);
		} else {
			/*
			 * We don't know how much we wrote, so just return
			 * the number of bytes which were direct-written
			 */
		}
	} else {
		written = generic_file_buffered_write(iocb, iov, nr_segs,
						pos, ppos, count, written);
	}
out:
	current->backing_dev_info = NULL;
	return written ? written : err;
}
Which I think is closer to correct, but boy it needs a lot of reviewing and
testing.
diff -puN mm/filemap.c~direct-io-sync-and-invalidate-file-region-when-falling-back-to-buffered-write-fixes mm/filemap.c
--- a/mm/filemap.c~direct-io-sync-and-invalidate-file-region-when-falling-back-to-buffered-write-fixes
+++ a/mm/filemap.c
@@ -2228,7 +2228,7 @@ __generic_file_aio_write_nolock(struct k
struct inode *inode = mapping->host;
unsigned long seg;
loff_t pos;
- ssize_t written, written_direct;
+ ssize_t written;
ssize_t err;
ocount = 0;
@@ -2258,7 +2258,7 @@ __generic_file_aio_write_nolock(struct k
/* We can write back this queue in page reclaim */
current->backing_dev_info = mapping->backing_dev_info;
- written = written_direct = 0;
+ written = 0;
err = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
if (err)
@@ -2275,8 +2275,11 @@ __generic_file_aio_write_nolock(struct k
/* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
if (unlikely(file->f_flags & O_DIRECT)) {
- written = generic_file_direct_write(iocb, iov,
- &nr_segs, pos, ppos, count, ocount);
+ loff_t endbyte;
+ ssize_t written_buffered;
+
+ written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
+ ppos, count, ocount);
if (written < 0 || written == count)
goto out;
/*
@@ -2285,31 +2288,34 @@ __generic_file_aio_write_nolock(struct k
*/
pos += written;
count -= written;
+ written_buffered = generic_file_buffered_write(iocb, iov,
+ nr_segs, pos, ppos, count,
+ written);
- written_direct = written;
- }
-
- written = generic_file_buffered_write(iocb, iov, nr_segs,
- pos, ppos, count, written);
-
- /*
- * When falling through to buffered I/O, we need to ensure that the
- * page cache pages are written to disk and invalidated to preserve
- * the expected O_DIRECT semantics.
- */
- if (unlikely(file->f_flags & O_DIRECT)) {
- pgoff_t endbyte = pos + count - 1;
-
+ /*
+ * We need to ensure that the page cache pages are written to
+ * disk and invalidated to preserve the expected O_DIRECT
+ * semantics.
+ */
+ endbyte = pos + written_buffered - 1;
err = do_sync_file_range(file, pos, endbyte,
SYNC_FILE_RANGE_WAIT_BEFORE|
SYNC_FILE_RANGE_WRITE|
SYNC_FILE_RANGE_WAIT_AFTER);
- if (err == 0)
+ if (err == 0) {
+ written += written_buffered;
invalidate_mapping_pages(mapping,
pos >> PAGE_CACHE_SHIFT,
endbyte >> PAGE_CACHE_SHIFT);
- else
- written = written_direct;
+ } else {
+ /*
+ * We don't know how much we wrote, so just return
+ * the number of bytes which were direct-written
+ */
+ }
+ } else {
+ written = generic_file_buffered_write(iocb, iov, nr_segs,
+ pos, ppos, count, written);
}
out:
current->backing_dev_info = NULL;
_
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-06 20:11 ` Andrew Morton
@ 2006-10-11 16:48 ` jmoyer
2006-10-11 18:37 ` Andrew Morton
0 siblings, 1 reply; 16+ messages in thread
From: jmoyer @ 2006-10-11 16:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: Zach Brown, linux-kernel
==> Regarding Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path; Andrew Morton <akpm@osdl.org> adds:
akpm> Patch is below. The end result looks like:
akpm> /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
akpm> if (unlikely(file->f_flags & O_DIRECT)) {
akpm> loff_t endbyte;
akpm> ssize_t written_buffered;
akpm> written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
akpm> ppos, count, ocount);
akpm> if (written < 0 || written == count)
akpm> goto out;
akpm> /*
akpm> * direct-io write to a hole: fall through to buffered I/O
akpm> * for completing the rest of the request.
akpm> */
akpm> pos += written;
akpm> count -= written;
akpm> written_buffered = generic_file_buffered_write(iocb, iov,
akpm> nr_segs, pos, ppos, count,
akpm> written);
akpm> /*
akpm> * We need to ensure that the page cache pages are written to
akpm> * disk and invalidated to preserve the expected O_DIRECT
akpm> * semantics.
akpm> */
akpm> endbyte = pos + written_buffered - 1;
We probably want to handle the case where generic_file_buffered_write
returns an error or nothing written.
akpm> err = do_sync_file_range(file, pos, endbyte,
akpm> SYNC_FILE_RANGE_WAIT_BEFORE|
akpm> SYNC_FILE_RANGE_WRITE|
akpm> SYNC_FILE_RANGE_WAIT_AFTER);
akpm> if (err == 0) {
akpm> written += written_buffered;
akpm> invalidate_mapping_pages(mapping,
akpm> pos >> PAGE_CACHE_SHIFT,
akpm> endbyte >> PAGE_CACHE_SHIFT);
generic_file_buffered_write takes written as an argument, and returns that
amount plus whatever it managed to write. As such, you don't want to add
written_buffered to written. Instead, you want written = written_buffered.
The endbyte calculation has to be altered in kind.
Incremental, locally tested patch attached. Comments are welcome as
always. Once there is consensus, I'll send this off for testing with
Oracle again.
-Jeff
--- linux-2.6.18.i686/mm/filemap.c.orig 2006-10-11 11:58:29.000000000 -0400
+++ linux-2.6.18.i686/mm/filemap.c 2006-10-11 12:31:11.000000000 -0400
@@ -2419,19 +2419,21 @@ __generic_file_aio_write_nolock(struct k
written_buffered = generic_file_buffered_write(iocb, iov,
nr_segs, pos, ppos, count,
written);
+ if (written_buffered < 0 || written_buffered == written)
+ goto out;
/*
* We need to ensure that the page cache pages are written to
* disk and invalidated to preserve the expected O_DIRECT
* semantics.
*/
- endbyte = pos + written_buffered - 1;
+ endbyte = pos + written_buffered - written - 1;
err = do_sync_file_range(file, pos, endbyte,
SYNC_FILE_RANGE_WAIT_BEFORE|
SYNC_FILE_RANGE_WRITE|
SYNC_FILE_RANGE_WAIT_AFTER);
if (err == 0) {
- written += written_buffered;
+ written = written_buffered;
invalidate_mapping_pages(mapping,
pos >> PAGE_CACHE_SHIFT,
endbyte >> PAGE_CACHE_SHIFT);
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-11 16:48 ` jmoyer
@ 2006-10-11 18:37 ` Andrew Morton
2006-10-12 22:01 ` Jeff Moyer
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-10-11 18:37 UTC (permalink / raw)
To: jmoyer; +Cc: Zach Brown, linux-kernel
On Wed, 11 Oct 2006 12:48:57 -0400
jmoyer@redhat.com wrote:
> ==> Regarding Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path; Andrew Morton <akpm@osdl.org> adds:
>
> akpm> Patch is below. The end result looks like:
>
>
> akpm> /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
> akpm> if (unlikely(file->f_flags & O_DIRECT)) {
> akpm> loff_t endbyte;
> akpm> ssize_t written_buffered;
>
> akpm> written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
> akpm> ppos, count, ocount);
> akpm> if (written < 0 || written == count)
> akpm> goto out;
> akpm> /*
> akpm> * direct-io write to a hole: fall through to buffered I/O
> akpm> * for completing the rest of the request.
> akpm> */
> akpm> pos += written;
> akpm> count -= written;
> akpm> written_buffered = generic_file_buffered_write(iocb, iov,
> akpm> nr_segs, pos, ppos, count,
> akpm> written);
>
> akpm> /*
> akpm> * We need to ensure that the page cache pages are written to
> akpm> * disk and invalidated to preserve the expected O_DIRECT
> akpm> * semantics.
> akpm> */
> akpm> endbyte = pos + written_buffered - 1;
>
> We probably want to handle the case where generic_file_buffered_write
> returns an error or nothing written.
If generic_file_buffered_write() returned an error, we want to save that
error code somewhere.
If generic_file_buffered_write() didn't actually write anything then that's
pretty weird, but the following code should handle that correctly and do
the right thing. So perhaps that case is so rare it isn't worth the
test-n-branch to optimise for it?
> akpm> err = do_sync_file_range(file, pos, endbyte,
> akpm> SYNC_FILE_RANGE_WAIT_BEFORE|
> akpm> SYNC_FILE_RANGE_WRITE|
> akpm> SYNC_FILE_RANGE_WAIT_AFTER);
> akpm> if (err == 0) {
> akpm> written += written_buffered;
> akpm> invalidate_mapping_pages(mapping,
> akpm> pos >> PAGE_CACHE_SHIFT,
> akpm> endbyte >> PAGE_CACHE_SHIFT);
>
> generic_file_buffered_write takes written as an argument, and returns that
> amount plus whatever it managed to write.
How weird.
> As such, you don't want to add
> written_buffered to written. Instead, you want written = written_buffered.
> The endbyte calculation has to be altered in kind.
OK.
> Incremental, locally tested patch attached. Comments are welcome as
> always. Once there is consensus, I'll send this off for testing with
> Oracle again.
So I'd propose:
diff -puN mm/filemap.c~direct-io-sync-and-invalidate-file-region-when-falling-back-to-buffered-write-fix mm/filemap.c
--- a/mm/filemap.c~direct-io-sync-and-invalidate-file-region-when-falling-back-to-buffered-write-fix
+++ a/mm/filemap.c
@@ -2291,19 +2291,30 @@ __generic_file_aio_write_nolock(struct k
written_buffered = generic_file_buffered_write(iocb, iov,
nr_segs, pos, ppos, count,
written);
+ /*
+ * If generic_file_buffered_write() returned a synchronous error
+ * then we want to return the number of bytes which were
+ * direct-written, or the error code if that was zero. Note
+ * that this differs from normal direct-io semantics, which
+ * will return -EFOO even if some bytes were written.
+ */
+ if (written_buffered < 0) {
+ err = written_buffered;
+ goto out;
+ }
/*
* We need to ensure that the page cache pages are written to
* disk and invalidated to preserve the expected O_DIRECT
* semantics.
*/
- endbyte = pos + written_buffered - 1;
+ endbyte = pos + written_buffered - written - 1;
err = do_sync_file_range(file, pos, endbyte,
SYNC_FILE_RANGE_WAIT_BEFORE|
SYNC_FILE_RANGE_WRITE|
SYNC_FILE_RANGE_WAIT_AFTER);
if (err == 0) {
- written += written_buffered;
+ written = written_buffered;
invalidate_mapping_pages(mapping,
pos >> PAGE_CACHE_SHIFT,
endbyte >> PAGE_CACHE_SHIFT);
_
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-11 18:37 ` Andrew Morton
@ 2006-10-12 22:01 ` Jeff Moyer
2006-10-12 22:37 ` Andrew Morton
0 siblings, 1 reply; 16+ messages in thread
From: Jeff Moyer @ 2006-10-12 22:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: Zach Brown, linux-kernel
==> Regarding Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path; Andrew Morton <akpm@osdl.org> adds:
akpm> So I'd propose:
> diff -puN mm/filemap.c~direct-io-sync-and-invalidate-file-region-when-falling-back-to-buffered-write-fix mm/filemap.c
> --- a/mm/filemap.c~direct-io-sync-and-invalidate-file-region-when-falling-back-to-buffered-write-fix
> +++ a/mm/filemap.c
> @@ -2291,19 +2291,30 @@ __generic_file_aio_write_nolock(struct k
> written_buffered = generic_file_buffered_write(iocb, iov,
> nr_segs, pos, ppos, count,
> written);
> + /*
> + * If generic_file_buffered_write() returned a synchronous error
> + * then we want to return the number of bytes which were
> + * direct-written, or the error code if that was zero. Note
> + * that this differs from normal direct-io semantics, which
> + * will return -EFOO even if some bytes were written.
> + */
> + if (written_buffered < 0) {
> + err = written_buffered;
> + goto out;
> + }
> /*
> * We need to ensure that the page cache pages are written to
> * disk and invalidated to preserve the expected O_DIRECT
> * semantics.
> */
> - endbyte = pos + written_buffered - 1;
> + endbyte = pos + written_buffered - written - 1;
> err = do_sync_file_range(file, pos, endbyte,
> SYNC_FILE_RANGE_WAIT_BEFORE|
> SYNC_FILE_RANGE_WRITE|
> SYNC_FILE_RANGE_WAIT_AFTER);
> if (err == 0) {
> - written += written_buffered;
> + written = written_buffered;
> invalidate_mapping_pages(mapping,
> pos >> PAGE_CACHE_SHIFT,
> endbyte >> PAGE_CACHE_SHIFT);
> _
This passes my tests and the Oracle tests that triggered the problem in the
first place. Thanks!
Acked-by: Jeff Moyer <jmoyer@redhat.com>
* Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
2006-10-12 22:01 ` Jeff Moyer
@ 2006-10-12 22:37 ` Andrew Morton
0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2006-10-12 22:37 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Zach Brown, linux-kernel
On Thu, 12 Oct 2006 18:01:43 -0400
Jeff Moyer <jmoyer@redhat.com> wrote:
> This passes my tests and the Oracle tests that triggered the problem in the
> first place. Thanks!
>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>
OK, thanks for testing.
We'll have added at least one bug. We always do in there :(
I'll do some fsx-linux-with-O_DIRECT testing..
Thread overview: 16+ messages
2006-10-04 17:04 [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path Jeff Moyer
2006-10-04 17:25 ` Andrew Morton
2006-10-04 17:51 ` Zach Brown
2006-10-04 17:53 ` Jeff Moyer
2006-10-04 18:16 ` Andrew Morton
2006-10-04 18:40 ` Zach Brown
2006-10-04 19:16 ` Andrew Morton
2006-10-04 20:53 ` Jeff Moyer
2006-10-04 21:22 ` Jeff Moyer
2006-10-04 23:55 ` Andrew Morton
2006-10-05 19:31 ` Jeff Moyer
2006-10-06 20:11 ` Andrew Morton
2006-10-11 16:48 ` jmoyer
2006-10-11 18:37 ` Andrew Morton
2006-10-12 22:01 ` Jeff Moyer
2006-10-12 22:37 ` Andrew Morton