linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [rfc][patch] mm: direct io less aggressive syncs and invalidates
@ 2008-10-28 15:54 Nick Piggin
  2008-10-28 21:11 ` Jeff Moyer
  2008-10-29 14:02 ` Mikulas Patocka
  0 siblings, 2 replies; 12+ messages in thread
From: Nick Piggin @ 2008-10-28 15:54 UTC (permalink / raw)
  To: Andrew Morton, linux-fsdevel; +Cc: mpatocka


Direct IO can invalidate and sync a lot of pagecache pages in the mapping. A
4K direct IO will actually try to sync and/or invalidate the pagecache of the
entire file, for example (which might be many GB or TB large).

Improve this by doing range syncs. Also, memory no longer has to be unmapped
to catch the dirty bits for syncing, as dirty bits would remain coherent due to
dirty mmap accounting.

This should fix the immediate DM deadlocks when doing direct IO reads to
block device with a mounted filesystem, if only by papering over the problem
somewhat rather than addressing the fsync starvation cases. Not that the
patch itself is a hack, but for this particular problem it is not really
the correct solution IMO. But anyway, this might be more appropriate to go
into stable kernels if this DM deadlock is biting users.

Yes, I still need to put more time into finishing my pagecache tag based
sync solution. Sorry :(


---
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
+++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
@@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
 			goto out; /* skip atime */
 		size = i_size_read(inode);
 		if (pos < size) {
-			retval = filemap_write_and_wait(mapping);
-			if (!retval) {
-				retval = mapping->a_ops->direct_IO(READ, iocb,
+			retval = mapping->a_ops->direct_IO(READ, iocb,
 							iov, pos, nr_segs);
-			}
 			if (retval > 0)
 				*ppos = pos + retval;
 			if (retval) {
@@ -2110,18 +2107,10 @@ generic_file_direct_write(struct kiocb *
 	if (count != ocount)
 		*nr_segs = iov_shorten((struct iovec *)iov, *nr_segs, count);
 
-	/*
-	 * Unmap all mmappings of the file up-front.
-	 *
-	 * This will cause any pte dirty bits to be propagated into the
-	 * pageframes for the subsequent filemap_write_and_wait().
-	 */
 	write_len = iov_length(iov, *nr_segs);
 	end = (pos + write_len - 1) >> PAGE_CACHE_SHIFT;
-	if (mapping_mapped(mapping))
-		unmap_mapping_range(mapping, pos, write_len, 0);
 
-	written = filemap_write_and_wait(mapping);
+	written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
 	if (written)
 		goto out;
 
@@ -2507,7 +2496,8 @@ generic_file_buffered_write(struct kiocb
 	 * the file data here, to try to honour O_DIRECT expectations.
 	 */
 	if (unlikely(file->f_flags & O_DIRECT) && written)
-		status = filemap_write_and_wait(mapping);
+		status = filemap_write_and_wait_range(mapping,
+					pos, pos + written - 1);
 
 	return written ? written : status;
 }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-28 15:54 [rfc][patch] mm: direct io less aggressive syncs and invalidates Nick Piggin
@ 2008-10-28 21:11 ` Jeff Moyer
  2008-10-28 23:52   ` Nick Piggin
  2008-10-29  0:56   ` Jamie Lokier
  2008-10-29 14:02 ` Mikulas Patocka
  1 sibling, 2 replies; 12+ messages in thread
From: Jeff Moyer @ 2008-10-28 21:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-fsdevel, mpatocka

Nick Piggin <npiggin@suse.de> writes:

> Direct IO can invalidate and sync a lot of pagecache pages in the mapping. A
> 4K direct IO will actually try to sync and/or invalidate the pagecache of the
> entire file, for example (which might be many GB or TB large).
>
> Improve this by doing range syncs. Also, memory no longer has to be unmapped
> to catch the dirty bits for syncing, as dirty bits would remain coherent due to
> dirty mmap accounting.
>
> This should fix the immediate DM deadlocks when doing direct IO reads to
> block device with a mounted filesystem, if only by papering over the problem
> somewhat rather than addressing the fsync starvation cases. Not that the
> patch itself is a hack, but for this particular problem it is not really
> the correct solution IMO. But anyway, this might be more appropriate to go
> into stable kernels if this DM deadlock is biting users.
>
> Yes, I still need to put more time into finishing my pagecache tag based
> sync solution. Sorry :(
>
>
> ---
> Index: linux-2.6/mm/filemap.c
> ===================================================================
> --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
> +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
> @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
>  			goto out; /* skip atime */
>  		size = i_size_read(inode);
>  		if (pos < size) {
> -			retval = filemap_write_and_wait(mapping);
> -			if (!retval) {
> -				retval = mapping->a_ops->direct_IO(READ, iocb,
> +			retval = mapping->a_ops->direct_IO(READ, iocb,
>  							iov, pos, nr_segs);
> -			}

So why is it safe to get rid of this?  Can't this result in reading
stale data from disk?

The rest looks good to me.  I ran the aio-dio-regress tests against this
kernel on a UP machine, and they all passed.  The kernel didn't boot on
my SMP box, though.  Nick, any chance you could grab that test suite and
run it on an smp system?
  http://git.kernel.org/?p=linux/kernel/git/zab/aio-dio-regress.git;a=summary

Thanks,

Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-28 21:11 ` Jeff Moyer
@ 2008-10-28 23:52   ` Nick Piggin
  2008-10-29 13:12     ` Jeff Moyer
  2008-10-29  0:56   ` Jamie Lokier
  1 sibling, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2008-10-28 23:52 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Andrew Morton, linux-fsdevel, mpatocka

On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
> Nick Piggin <npiggin@suse.de> writes:
> 
> > Direct IO can invalidate and sync a lot of pagecache pages in the mapping. A
> > 4K direct IO will actually try to sync and/or invalidate the pagecache of the
> > entire file, for example (which might be many GB or TB large).
> >
> > Improve this by doing range syncs. Also, memory no longer has to be unmapped
> > to catch the dirty bits for syncing, as dirty bits would remain coherent due to
> > dirty mmap accounting.
> >
> > This should fix the immediate DM deadlocks when doing direct IO reads to
> > block device with a mounted filesystem, if only by papering over the problem
> > somewhat rather than addressing the fsync starvation cases. Not that the
> > patch itself is a hack, but for this particular problem it is not really
> > the correct solution IMO. But anyway, this might be more appropriate to go
> > into stable kernels if this DM deadlock is biting users.
> >
> > Yes, I still need to put more time into finishing my pagecache tag based
> > sync solution. Sorry :(
> >
> >
> > ---
> > Index: linux-2.6/mm/filemap.c
> > ===================================================================
> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
> >  			goto out; /* skip atime */
> >  		size = i_size_read(inode);
> >  		if (pos < size) {
> > -			retval = filemap_write_and_wait(mapping);
> > -			if (!retval) {
> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
> >  							iov, pos, nr_segs);
> > -			}
> 
> So why is it safe to get rid of this?  Can't this result in reading
> stale data from disk?

AFAIKS, __blockdev_direct_IO is doing the same thing for us, when it
encounters a READ. I should have documented this change. This is one
thing I'm not *quite* sure of there  might be a path do the block device
that I haven't considered, and which does not do the sync...

 
> The rest looks good to me.  I ran the aio-dio-regress tests against this
> kernel on a UP machine, and they all passed.  The kernel didn't boot on
> my SMP box, though.  Nick, any chance you could grab that test suite and
> run it on an smp system?
>   http://git.kernel.org/?p=linux/kernel/git/zab/aio-dio-regress.git;a=summary

Yeah I could give that a shot and repost the patch for Andrew in a day or
two. Thanks for looking a it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-28 21:11 ` Jeff Moyer
  2008-10-28 23:52   ` Nick Piggin
@ 2008-10-29  0:56   ` Jamie Lokier
  2008-10-29 13:30     ` Jeff Moyer
  1 sibling, 1 reply; 12+ messages in thread
From: Jamie Lokier @ 2008-10-29  0:56 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Nick Piggin, Andrew Morton, linux-fsdevel, mpatocka

Jeff Moyer wrote:
> > Index: linux-2.6/mm/filemap.c
> > ===================================================================
> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
> >  			goto out; /* skip atime */
> >  		size = i_size_read(inode);
> >  		if (pos < size) {
> > -			retval = filemap_write_and_wait(mapping);
> > -			if (!retval) {
> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
> >  							iov, pos, nr_segs);
> > -			}
> 
> So why is it safe to get rid of this?  Can't this result in reading
> stale data from disk?

It seems that could be easily tested in one of the test suites, by
writing a page without O_DIRECT to make a dirty page, then reading the
same page with O_DIRECT.  Do it a few times to be sure.

Do the test suites verify O_DIRECT / page-cache coherency?

Cheers,
-- Jamie

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-28 23:52   ` Nick Piggin
@ 2008-10-29 13:12     ` Jeff Moyer
  2008-10-29 21:47       ` Dave Chinner
  2008-10-30  2:11       ` Nick Piggin
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff Moyer @ 2008-10-29 13:12 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-fsdevel, mpatocka

Nick Piggin <npiggin@suse.de> writes:

> On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
>> Nick Piggin <npiggin@suse.de> writes:

>> > Index: linux-2.6/mm/filemap.c
>> > ===================================================================
>> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
>> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
>> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
>> >  			goto out; /* skip atime */
>> >  		size = i_size_read(inode);
>> >  		if (pos < size) {
>> > -			retval = filemap_write_and_wait(mapping);
>> > -			if (!retval) {
>> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
>> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
>> >  							iov, pos, nr_segs);
>> > -			}
>> 
>> So why is it safe to get rid of this?  Can't this result in reading
>> stale data from disk?
>
> AFAIKS, __blockdev_direct_IO is doing the same thing for us, when it
> encounters a READ. I should have documented this change. This is one
> thing I'm not *quite* sure of there  might be a path do the block device
> that I haven't considered, and which does not do the sync...

Well, that's if dio_lock_type != DIO_NO_LOCKING.  cscope shows the
following callers of blockdev_direct_IO_no_locking:

gfs2_direct_IO
ocfs2_direct_IO
xfs_vm_direct_IO

and of course

blkdev_direct_IO

I can't say whether all of these callers are safe.  They certainly don't
appear to be safe to me.

Cheers,

Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-29  0:56   ` Jamie Lokier
@ 2008-10-29 13:30     ` Jeff Moyer
  2008-10-29 21:48       ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Moyer @ 2008-10-29 13:30 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Nick Piggin, Andrew Morton, linux-fsdevel, mpatocka

Jamie Lokier <jamie@shareable.org> writes:

> Jeff Moyer wrote:
>> > Index: linux-2.6/mm/filemap.c
>> > ===================================================================
>> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
>> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
>> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
>> >  			goto out; /* skip atime */
>> >  		size = i_size_read(inode);
>> >  		if (pos < size) {
>> > -			retval = filemap_write_and_wait(mapping);
>> > -			if (!retval) {
>> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
>> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
>> >  							iov, pos, nr_segs);
>> > -			}
>> 
>> So why is it safe to get rid of this?  Can't this result in reading
>> stale data from disk?
>
> It seems that could be easily tested in one of the test suites, by
> writing a page without O_DIRECT to make a dirty page, then reading the
> same page with O_DIRECT.  Do it a few times to be sure.

Sure, are you volunteering to write this? =)  For completeness, it
should probably do this via mmap, too.

> Do the test suites verify O_DIRECT / page-cache coherency?

aio-dio-regress has a couple of targeted tests, but nothing I would call
comprehensive.  I know that during the development of the direct I/O
code, there were some tests kicking around for this.  The small test
cases I can find include:

/*
 * aiodio_sparse - issue async O_DIRECT writes to holes is a file while
 *      concurrently reading the file and checking that the read never reads
 *      uninitailized data.
 */

dio_sparse -- same theme but no async I/O

aiodio_append -- this seems to fork a child which continually seeks to
the end of the file and tries to read data.  Meanwhile, the parent
continually appends zeroed data to the file using async direct I/O.  If
the child ever reads anything other than zeroes, it kicks an error.

dio_truncate -- 
        /*
         * Parent creates a zero file using DIO.
         * Truncates it to zero
         * Create another file with '0xaa'
         */
I have no idea why it creates a different file and fills it with 0xaa.
The child process continually reads the first file, ensuring that all of
the returned data is zeroes.

I'll see about incorporating the above into aio-dio-regress.

Cheers,

Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-28 15:54 [rfc][patch] mm: direct io less aggressive syncs and invalidates Nick Piggin
  2008-10-28 21:11 ` Jeff Moyer
@ 2008-10-29 14:02 ` Mikulas Patocka
  2008-10-30  2:08   ` Nick Piggin
  1 sibling, 1 reply; 12+ messages in thread
From: Mikulas Patocka @ 2008-10-29 14:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-fsdevel



On Tue, 28 Oct 2008, Nick Piggin wrote:

> Improve this by doing range syncs. Also, memory no longer has to be 
> unmapped to catch the dirty bits for syncing, as dirty bits would remain 
> coherent due to dirty mmap accounting.

mmaped memory doesn't have have to by synchronized at all with read/write. 
It has to be only synchonized if msync or munmap is called --- that's what 
Posix says.

So direct i/o implementation that ignores mmap would be correct. Direct 
i/o implementation that is not synchronized w.r.t. nondirect i/o wouldn't.

Mikulas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-29 13:12     ` Jeff Moyer
@ 2008-10-29 21:47       ` Dave Chinner
  2008-10-30  2:11       ` Nick Piggin
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2008-10-29 21:47 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Nick Piggin, Andrew Morton, linux-fsdevel, mpatocka

On Wed, Oct 29, 2008 at 09:12:24AM -0400, Jeff Moyer wrote:
> Nick Piggin <npiggin@suse.de> writes:
> 
> > On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
> >> Nick Piggin <npiggin@suse.de> writes:
> 
> >> > Index: linux-2.6/mm/filemap.c
> >> > ===================================================================
> >> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
> >> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
> >> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
> >> >  			goto out; /* skip atime */
> >> >  		size = i_size_read(inode);
> >> >  		if (pos < size) {
> >> > -			retval = filemap_write_and_wait(mapping);
> >> > -			if (!retval) {
> >> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
> >> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
> >> >  							iov, pos, nr_segs);
> >> > -			}
> >> 
> >> So why is it safe to get rid of this?  Can't this result in reading
> >> stale data from disk?
> >
> > AFAIKS, __blockdev_direct_IO is doing the same thing for us, when it
> > encounters a READ. I should have documented this change. This is one
> > thing I'm not *quite* sure of there  might be a path do the block device
> > that I haven't considered, and which does not do the sync...
> 
> Well, that's if dio_lock_type != DIO_NO_LOCKING.  cscope shows the
> following callers of blockdev_direct_IO_no_locking:
> 
> gfs2_direct_IO
> ocfs2_direct_IO
> xfs_vm_direct_IO

XFS does it's own flush and invalidate before calling into the
generic direct I/O code, so the above patch is safe from an
XFS perspective.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-29 13:30     ` Jeff Moyer
@ 2008-10-29 21:48       ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2008-10-29 21:48 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Jamie Lokier, Nick Piggin, Andrew Morton, linux-fsdevel, mpatocka

On Wed, Oct 29, 2008 at 09:30:12AM -0400, Jeff Moyer wrote:
> Jamie Lokier <jamie@shareable.org> writes:
> 
> > Jeff Moyer wrote:
> >> > Index: linux-2.6/mm/filemap.c
> >> > ===================================================================
> >> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
> >> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
> >> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
> >> >  			goto out; /* skip atime */
> >> >  		size = i_size_read(inode);
> >> >  		if (pos < size) {
> >> > -			retval = filemap_write_and_wait(mapping);
> >> > -			if (!retval) {
> >> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
> >> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
> >> >  							iov, pos, nr_segs);
> >> > -			}
> >> 
> >> So why is it safe to get rid of this?  Can't this result in reading
> >> stale data from disk?
> >
> > It seems that could be easily tested in one of the test suites, by
> > writing a page without O_DIRECT to make a dirty page, then reading the
> > same page with O_DIRECT.  Do it a few times to be sure.
> 
> Sure, are you volunteering to write this? =)  For completeness, it
> should probably do this via mmap, too.

Making fsx mix buffered and direct I/O is the anwer to this
problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-29 14:02 ` Mikulas Patocka
@ 2008-10-30  2:08   ` Nick Piggin
  0 siblings, 0 replies; 12+ messages in thread
From: Nick Piggin @ 2008-10-30  2:08 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Andrew Morton, linux-fsdevel

On Wed, Oct 29, 2008 at 10:02:43AM -0400, Mikulas Patocka wrote:
> 
> 
> On Tue, 28 Oct 2008, Nick Piggin wrote:
> 
> > Improve this by doing range syncs. Also, memory no longer has to be 
> > unmapped to catch the dirty bits for syncing, as dirty bits would remain 
> > coherent due to dirty mmap accounting.
> 
> mmaped memory doesn't have have to by synchronized at all with read/write. 
> It has to be only synchonized if msync or munmap is called --- that's what 
> Posix says.
> 
> So direct i/o implementation that ignores mmap would be correct. Direct 
> i/o implementation that is not synchronized w.r.t. nondirect i/o wouldn't.

Right, but our implementation tries not to ignore mmap.

Our mmap is coherent with buffered IO, and buffered IO (kinda) coherent
with direct IO... least surprise says direct IO should be kinda coherent
with mmap ;)

At least now there is basically no extra cost to it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-29 13:12     ` Jeff Moyer
  2008-10-29 21:47       ` Dave Chinner
@ 2008-10-30  2:11       ` Nick Piggin
  2008-10-30 19:14         ` Jeff Moyer
  1 sibling, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2008-10-30  2:11 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Andrew Morton, linux-fsdevel, mpatocka

On Wed, Oct 29, 2008 at 09:12:24AM -0400, Jeff Moyer wrote:
> Nick Piggin <npiggin@suse.de> writes:
> 
> > On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
> >> Nick Piggin <npiggin@suse.de> writes:
> 
> >> > Index: linux-2.6/mm/filemap.c
> >> > ===================================================================
> >> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
> >> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
> >> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
> >> >  			goto out; /* skip atime */
> >> >  		size = i_size_read(inode);
> >> >  		if (pos < size) {
> >> > -			retval = filemap_write_and_wait(mapping);
> >> > -			if (!retval) {
> >> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
> >> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
> >> >  							iov, pos, nr_segs);
> >> > -			}
> >> 
> >> So why is it safe to get rid of this?  Can't this result in reading
> >> stale data from disk?
> >
> > AFAIKS, __blockdev_direct_IO is doing the same thing for us, when it
> > encounters a READ. I should have documented this change. This is one
> > thing I'm not *quite* sure of there  might be a path do the block device
> > that I haven't considered, and which does not do the sync...
> 
> Well, that's if dio_lock_type != DIO_NO_LOCKING.  cscope shows the
> following callers of blockdev_direct_IO_no_locking:
> 
> gfs2_direct_IO
> ocfs2_direct_IO
> xfs_vm_direct_IO
> 
> and of course
> 
> blkdev_direct_IO
> 
> I can't say whether all of these callers are safe.  They certainly don't
> appear to be safe to me.

Ah OK of course you're right. I'll need to take another look at that
and probably send any improvement as another patch.

My test SMP system just started getting memory errors for some reason
so I haven't been able to boot it :( Will try to resurrect it or find
another before resending...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
  2008-10-30  2:11       ` Nick Piggin
@ 2008-10-30 19:14         ` Jeff Moyer
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Moyer @ 2008-10-30 19:14 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-fsdevel, mpatocka

Nick Piggin <npiggin@suse.de> writes:

> On Wed, Oct 29, 2008 at 09:12:24AM -0400, Jeff Moyer wrote:
>> Nick Piggin <npiggin@suse.de> writes:
>> 
>> > On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
>> >> Nick Piggin <npiggin@suse.de> writes:
>> 
>> >> > Index: linux-2.6/mm/filemap.c
>> >> > ===================================================================
>> >> > --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
>> >> > +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
>> >> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
>> >> >  			goto out; /* skip atime */
>> >> >  		size = i_size_read(inode);
>> >> >  		if (pos < size) {
>> >> > -			retval = filemap_write_and_wait(mapping);
>> >> > -			if (!retval) {
>> >> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
>> >> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
>> >> >  							iov, pos, nr_segs);
>> >> > -			}
>> >> 
>> >> So why is it safe to get rid of this?  Can't this result in reading
>> >> stale data from disk?
>> >
>> > AFAIKS, __blockdev_direct_IO is doing the same thing for us, when it
>> > encounters a READ. I should have documented this change. This is one
>> > thing I'm not *quite* sure of there  might be a path do the block device
>> > that I haven't considered, and which does not do the sync...
>> 
>> Well, that's if dio_lock_type != DIO_NO_LOCKING.  cscope shows the
>> following callers of blockdev_direct_IO_no_locking:
>> 
>> gfs2_direct_IO
>> ocfs2_direct_IO
>> xfs_vm_direct_IO
>> 
>> and of course
>> 
>> blkdev_direct_IO
>> 
>> I can't say whether all of these callers are safe.  They certainly don't
>> appear to be safe to me.
>
> Ah OK of course you're right. I'll need to take another look at that
> and probably send any improvement as another patch.
>
> My test SMP system just started getting memory errors for some reason
> so I haven't been able to boot it :( Will try to resurrect it or find
> another before resending...

OK, I got a kernel running on an smp system for testing.  I modified
your patch to do a filemap_write_and_wait_range in the read case.  The
aio-dio-regress test suite (with a few added programs to check for
buffered vs. direct I/O) passed without problems.  One of those programs
did not work with your initial patch, since it opened the block device
and mixed buffered and direct I/O.

Cheers,

Jeff

diff --git a/mm/filemap.c b/mm/filemap.c
index ab85536..76de63e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1317,11 +1317,11 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 			goto out; /* skip atime */
 		size = i_size_read(inode);
 		if (pos < size) {
-			retval = filemap_write_and_wait(mapping);
-			if (!retval) {
+			retval = filemap_write_and_wait_range(mapping, pos,
+					pos + iov_length(iov, nr_segs) - 1);
+			if (!retval)
 				retval = mapping->a_ops->direct_IO(READ, iocb,
 							iov, pos, nr_segs);
-			}
 			if (retval > 0)
 				*ppos = pos + retval;
 			if (retval) {
@@ -2123,18 +2123,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	if (count != ocount)
 		*nr_segs = iov_shorten((struct iovec *)iov, *nr_segs, count);
 
-	/*
-	 * Unmap all mmappings of the file up-front.
-	 *
-	 * This will cause any pte dirty bits to be propagated into the
-	 * pageframes for the subsequent filemap_write_and_wait().
-	 */
 	write_len = iov_length(iov, *nr_segs);
 	end = (pos + write_len - 1) >> PAGE_CACHE_SHIFT;
-	if (mapping_mapped(mapping))
-		unmap_mapping_range(mapping, pos, write_len, 0);
 
-	written = filemap_write_and_wait(mapping);
+	written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
 	if (written)
 		goto out;
 
@@ -2520,7 +2512,8 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	 * the file data here, to try to honour O_DIRECT expectations.
 	 */
 	if (unlikely(file->f_flags & O_DIRECT) && written)
-		status = filemap_write_and_wait(mapping);
+		status = filemap_write_and_wait_range(mapping,
+					pos, pos + written - 1);
 
 	return written ? written : status;
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-10-30 19:14 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-28 15:54 [rfc][patch] mm: direct io less aggressive syncs and invalidates Nick Piggin
2008-10-28 21:11 ` Jeff Moyer
2008-10-28 23:52   ` Nick Piggin
2008-10-29 13:12     ` Jeff Moyer
2008-10-29 21:47       ` Dave Chinner
2008-10-30  2:11       ` Nick Piggin
2008-10-30 19:14         ` Jeff Moyer
2008-10-29  0:56   ` Jamie Lokier
2008-10-29 13:30     ` Jeff Moyer
2008-10-29 21:48       ` Dave Chinner
2008-10-29 14:02 ` Mikulas Patocka
2008-10-30  2:08   ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).