* [rfc][patch] mm: direct io less aggressive syncs and invalidates
@ 2008-10-28 15:54 Nick Piggin
2008-10-28 21:11 ` Jeff Moyer
2008-10-29 14:02 ` Mikulas Patocka
0 siblings, 2 replies; 12+ messages in thread
From: Nick Piggin @ 2008-10-28 15:54 UTC (permalink / raw)
To: Andrew Morton, linux-fsdevel; +Cc: mpatocka
Direct IO can invalidate and sync a lot of pagecache pages in the mapping. A
4K direct IO will actually try to sync and/or invalidate the pagecache of the
entire file, for example (which might be many GB or TB large).
Improve this by doing range syncs. Also, memory no longer has to be unmapped
to catch the dirty bits for syncing, as dirty bits would remain coherent due to
dirty mmap accounting.
This should fix the immediate DM deadlocks when doing direct IO reads to
a block device with a mounted filesystem, if only by papering over the
problem somewhat rather than addressing the fsync starvation cases. Not
that the patch itself is a hack, but for this particular problem it is not
really the correct solution IMO. Still, it might be more appropriate to go
into stable kernels if this DM deadlock is biting users.
Yes, I still need to put more time into finishing my pagecache tag based
sync solution. Sorry :(
---
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c 2008-10-03 11:21:31.000000000 +1000
+++ linux-2.6/mm/filemap.c 2008-10-03 12:00:17.000000000 +1000
@@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
goto out; /* skip atime */
size = i_size_read(inode);
if (pos < size) {
- retval = filemap_write_and_wait(mapping);
- if (!retval) {
- retval = mapping->a_ops->direct_IO(READ, iocb,
+ retval = mapping->a_ops->direct_IO(READ, iocb,
iov, pos, nr_segs);
- }
if (retval > 0)
*ppos = pos + retval;
if (retval) {
@@ -2110,18 +2107,10 @@ generic_file_direct_write(struct kiocb *
if (count != ocount)
*nr_segs = iov_shorten((struct iovec *)iov, *nr_segs, count);
- /*
- * Unmap all mmappings of the file up-front.
- *
- * This will cause any pte dirty bits to be propagated into the
- * pageframes for the subsequent filemap_write_and_wait().
- */
write_len = iov_length(iov, *nr_segs);
end = (pos + write_len - 1) >> PAGE_CACHE_SHIFT;
- if (mapping_mapped(mapping))
- unmap_mapping_range(mapping, pos, write_len, 0);
- written = filemap_write_and_wait(mapping);
+ written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
if (written)
goto out;
@@ -2507,7 +2496,8 @@ generic_file_buffered_write(struct kiocb
* the file data here, to try to honour O_DIRECT expectations.
*/
if (unlikely(file->f_flags & O_DIRECT) && written)
- status = filemap_write_and_wait(mapping);
+ status = filemap_write_and_wait_range(mapping,
+ pos, pos + written - 1);
return written ? written : status;
}
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Jeff Moyer @ 2008-10-28 21:11 UTC
To: Nick Piggin; +Cc: Andrew Morton, linux-fsdevel, mpatocka

Nick Piggin <npiggin@suse.de> writes:

> Direct IO can invalidate and sync a lot of pagecache pages in the
> mapping. A 4K direct IO will actually try to sync and/or invalidate the
> pagecache of the entire file, for example (which might be many GB or TB
> large).
>
> Improve this by doing range syncs. Also, memory no longer has to be
> unmapped to catch the dirty bits for syncing, as dirty bits would remain
> coherent due to dirty mmap accounting.
>
> [...]
>
> --- linux-2.6.orig/mm/filemap.c	2008-10-03 11:21:31.000000000 +1000
> +++ linux-2.6/mm/filemap.c	2008-10-03 12:00:17.000000000 +1000
> @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
>  		goto out; /* skip atime */
>  	size = i_size_read(inode);
>  	if (pos < size) {
> -		retval = filemap_write_and_wait(mapping);
> -		if (!retval) {
> -			retval = mapping->a_ops->direct_IO(READ, iocb,
> +		retval = mapping->a_ops->direct_IO(READ, iocb,
>  					iov, pos, nr_segs);
> -		}

So why is it safe to get rid of this?  Can't this result in reading
stale data from disk?

The rest looks good to me.  I ran the aio-dio-regress tests against this
kernel on a UP machine, and they all passed.  The kernel didn't boot on
my SMP box, though.  Nick, any chance you could grab that test suite and
run it on an SMP system?

http://git.kernel.org/?p=linux/kernel/git/zab/aio-dio-regress.git;a=summary

Thanks,
Jeff
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Nick Piggin @ 2008-10-28 23:52 UTC
To: Jeff Moyer; +Cc: Andrew Morton, linux-fsdevel, mpatocka

On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
> Nick Piggin <npiggin@suse.de> writes:
>
> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
> >  		goto out; /* skip atime */
> >  	size = i_size_read(inode);
> >  	if (pos < size) {
> > -		retval = filemap_write_and_wait(mapping);
> > -		if (!retval) {
> > -			retval = mapping->a_ops->direct_IO(READ, iocb,
> > +		retval = mapping->a_ops->direct_IO(READ, iocb,
> >  					iov, pos, nr_segs);
> > -		}
>
> So why is it safe to get rid of this?  Can't this result in reading
> stale data from disk?

AFAIKS, __blockdev_direct_IO is doing the same thing for us when it
encounters a READ.  I should have documented this change.  This is one
thing I'm not *quite* sure of: there might be a path to the block device
that I haven't considered, and which does not do the sync...

> The rest looks good to me.  I ran the aio-dio-regress tests against this
> kernel on a UP machine, and they all passed.  The kernel didn't boot on
> my SMP box, though.  Nick, any chance you could grab that test suite and
> run it on an SMP system?
>
> http://git.kernel.org/?p=linux/kernel/git/zab/aio-dio-regress.git;a=summary

Yeah, I could give that a shot and repost the patch for Andrew in a day
or two.  Thanks for looking at it.
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Jeff Moyer @ 2008-10-29 13:12 UTC
To: Nick Piggin; +Cc: Andrew Morton, linux-fsdevel, mpatocka

Nick Piggin <npiggin@suse.de> writes:

> On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
>> So why is it safe to get rid of this?  Can't this result in reading
>> stale data from disk?
>
> AFAIKS, __blockdev_direct_IO is doing the same thing for us when it
> encounters a READ.  I should have documented this change.  This is one
> thing I'm not *quite* sure of: there might be a path to the block device
> that I haven't considered, and which does not do the sync...

Well, that's only if dio_lock_type != DIO_NO_LOCKING.  cscope shows the
following callers of blockdev_direct_IO_no_locking:

	gfs2_direct_IO
	ocfs2_direct_IO
	xfs_vm_direct_IO

and of course

	blkdev_direct_IO

I can't say whether all of these callers are safe.  They certainly don't
appear to be safe to me.

Cheers,
Jeff
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Dave Chinner @ 2008-10-29 21:47 UTC
To: Jeff Moyer; +Cc: Nick Piggin, Andrew Morton, linux-fsdevel, mpatocka

On Wed, Oct 29, 2008 at 09:12:24AM -0400, Jeff Moyer wrote:
> Well, that's only if dio_lock_type != DIO_NO_LOCKING.  cscope shows the
> following callers of blockdev_direct_IO_no_locking:
>
> 	gfs2_direct_IO
> 	ocfs2_direct_IO
> 	xfs_vm_direct_IO

XFS does its own flush and invalidate before calling into the generic
direct I/O code, so the above patch is safe from an XFS perspective.

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Nick Piggin @ 2008-10-30 2:11 UTC
To: Jeff Moyer; +Cc: Andrew Morton, linux-fsdevel, mpatocka

On Wed, Oct 29, 2008 at 09:12:24AM -0400, Jeff Moyer wrote:
> Well, that's only if dio_lock_type != DIO_NO_LOCKING.  cscope shows the
> following callers of blockdev_direct_IO_no_locking:
>
> 	gfs2_direct_IO
> 	ocfs2_direct_IO
> 	xfs_vm_direct_IO
>
> and of course
>
> 	blkdev_direct_IO
>
> I can't say whether all of these callers are safe.  They certainly don't
> appear to be safe to me.

Ah OK, of course you're right.  I'll need to take another look at that
and probably send any improvement as another patch.

My test SMP system just started getting memory errors for some reason,
so I haven't been able to boot it :(  Will try to resurrect it or find
another before resending...
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Jeff Moyer @ 2008-10-30 19:14 UTC
To: Nick Piggin; +Cc: Andrew Morton, linux-fsdevel, mpatocka

Nick Piggin <npiggin@suse.de> writes:

> My test SMP system just started getting memory errors for some reason,
> so I haven't been able to boot it :(  Will try to resurrect it or find
> another before resending...

OK, I got a kernel running on an SMP system for testing.  I modified
your patch to do a filemap_write_and_wait_range in the read case.  The
aio-dio-regress test suite (with a few added programs to check for
buffered vs. direct I/O) passed without problems.  One of those programs
did not work with your initial patch, since it opened the block device
and mixed buffered and direct I/O.

Cheers,
Jeff

diff --git a/mm/filemap.c b/mm/filemap.c
index ab85536..76de63e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1317,11 +1317,11 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 		goto out; /* skip atime */
 	size = i_size_read(inode);
 	if (pos < size) {
-		retval = filemap_write_and_wait(mapping);
-		if (!retval) {
+		retval = filemap_write_and_wait_range(mapping, pos,
+					pos + iov_length(iov, nr_segs) - 1);
+		if (!retval)
 			retval = mapping->a_ops->direct_IO(READ, iocb,
 						iov, pos, nr_segs);
-		}
 		if (retval > 0)
 			*ppos = pos + retval;
 		if (retval) {
@@ -2123,18 +2123,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	if (count != ocount)
 		*nr_segs = iov_shorten((struct iovec *)iov, *nr_segs, count);

-	/*
-	 * Unmap all mmappings of the file up-front.
-	 *
-	 * This will cause any pte dirty bits to be propagated into the
-	 * pageframes for the subsequent filemap_write_and_wait().
-	 */
 	write_len = iov_length(iov, *nr_segs);
 	end = (pos + write_len - 1) >> PAGE_CACHE_SHIFT;

-	if (mapping_mapped(mapping))
-		unmap_mapping_range(mapping, pos, write_len, 0);
-
-	written = filemap_write_and_wait(mapping);
+	written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
 	if (written)
 		goto out;
@@ -2520,7 +2512,8 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	 * the file data here, to try to honour O_DIRECT expectations.
 	 */
 	if (unlikely(file->f_flags & O_DIRECT) && written)
-		status = filemap_write_and_wait(mapping);
+		status = filemap_write_and_wait_range(mapping,
+					pos, pos + written - 1);

 	return written ? written : status;
 }
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Jamie Lokier @ 2008-10-29 0:56 UTC
To: Jeff Moyer; +Cc: Nick Piggin, Andrew Morton, linux-fsdevel, mpatocka

Jeff Moyer wrote:
> So why is it safe to get rid of this?  Can't this result in reading
> stale data from disk?

It seems that could be easily tested in one of the test suites, by
writing a page without O_DIRECT to make a dirty page, then reading the
same page with O_DIRECT.  Do it a few times to be sure.

Do the test suites verify O_DIRECT / page-cache coherency?

Cheers,
-- Jamie
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Jeff Moyer @ 2008-10-29 13:30 UTC
To: Jamie Lokier; +Cc: Nick Piggin, Andrew Morton, linux-fsdevel, mpatocka

Jamie Lokier <jamie@shareable.org> writes:

> It seems that could be easily tested in one of the test suites, by
> writing a page without O_DIRECT to make a dirty page, then reading the
> same page with O_DIRECT.  Do it a few times to be sure.

Sure, are you volunteering to write this? =)  For completeness, it
should probably do this via mmap, too.

> Do the test suites verify O_DIRECT / page-cache coherency?

aio-dio-regress has a couple of targeted tests, but nothing I would call
comprehensive.  I know that during the development of the direct I/O
code, there were some tests kicking around for this.  The small test
cases I can find include:

/*
 * aiodio_sparse - issue async O_DIRECT writes to holes in a file while
 * concurrently reading the file and checking that the read never reads
 * uninitialized data.
 */

dio_sparse -- same theme, but no async I/O

aiodio_append -- this seems to fork a child which continually seeks to
the end of the file and tries to read data.  Meanwhile, the parent
continually appends zeroed data to the file using async direct I/O.  If
the child ever reads anything other than zeroes, it kicks an error.

dio_truncate --
/*
 * Parent creates a zero file using DIO.
 * Truncates it to zero.
 * Creates another file with '0xaa'.
 */
I have no idea why it creates a different file and fills it with 0xaa.
The child process continually reads the first file, ensuring that all of
the returned data is zeroes.

I'll see about incorporating the above into aio-dio-regress.

Cheers,
Jeff
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Dave Chinner @ 2008-10-29 21:48 UTC
To: Jeff Moyer
Cc: Jamie Lokier, Nick Piggin, Andrew Morton, linux-fsdevel, mpatocka

On Wed, Oct 29, 2008 at 09:30:12AM -0400, Jeff Moyer wrote:
> Jamie Lokier <jamie@shareable.org> writes:
>
> > It seems that could be easily tested in one of the test suites, by
> > writing a page without O_DIRECT to make a dirty page, then reading the
> > same page with O_DIRECT.  Do it a few times to be sure.
>
> Sure, are you volunteering to write this? =)  For completeness, it
> should probably do this via mmap, too.

Making fsx mix buffered and direct I/O is the answer to this problem.

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Mikulas Patocka @ 2008-10-29 14:02 UTC
To: Nick Piggin; +Cc: Andrew Morton, linux-fsdevel

On Tue, 28 Oct 2008, Nick Piggin wrote:

> Improve this by doing range syncs. Also, memory no longer has to be
> unmapped to catch the dirty bits for syncing, as dirty bits would remain
> coherent due to dirty mmap accounting.

mmapped memory doesn't have to be synchronized at all with read/write.
It only has to be synchronized if msync or munmap is called --- that's
what POSIX says.

So a direct I/O implementation that ignores mmap would be correct.  A
direct I/O implementation that is not synchronized w.r.t. non-direct I/O
wouldn't be.

Mikulas
* Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
From: Nick Piggin @ 2008-10-30 2:08 UTC
To: Mikulas Patocka; +Cc: Andrew Morton, linux-fsdevel

On Wed, Oct 29, 2008 at 10:02:43AM -0400, Mikulas Patocka wrote:
> mmapped memory doesn't have to be synchronized at all with read/write.
> It only has to be synchronized if msync or munmap is called --- that's
> what POSIX says.
>
> So a direct I/O implementation that ignores mmap would be correct.  A
> direct I/O implementation that is not synchronized w.r.t. non-direct I/O
> wouldn't be.

Right, but our implementation tries not to ignore mmap.  Our mmap is
coherent with buffered IO, and buffered IO is (kinda) coherent with
direct IO... least surprise says direct IO should be kinda coherent with
mmap ;)  At least now there is basically no extra cost to it.