From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Moyer
Subject: Re: [rfc][patch] mm: direct io less aggressive syncs and invalidates
Date: Thu, 30 Oct 2008 15:14:40 -0400
Message-ID:
References: <20081028155421.GC3082@wotan.suse.de> <20081028235221.GB15599@wotan.suse.de> <20081030021144.GE18041@wotan.suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Andrew Morton , linux-fsdevel@vger.kernel.org, mpatocka@redhat.com
To: Nick Piggin
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:37005 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750914AbYJ3TO5 (ORCPT ); Thu, 30 Oct 2008 15:14:57 -0400
In-Reply-To: <20081030021144.GE18041@wotan.suse.de> (Nick Piggin's message of "Thu, 30 Oct 2008 03:11:45 +0100")
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Nick Piggin writes:

> On Wed, Oct 29, 2008 at 09:12:24AM -0400, Jeff Moyer wrote:
>> Nick Piggin writes:
>>
>> > On Tue, Oct 28, 2008 at 05:11:02PM -0400, Jeff Moyer wrote:
>> >> Nick Piggin writes:
>> >>
>> >> > Index: linux-2.6/mm/filemap.c
>> >> > ===================================================================
>> >> > --- linux-2.6.orig/mm/filemap.c 2008-10-03 11:21:31.000000000 +1000
>> >> > +++ linux-2.6/mm/filemap.c 2008-10-03 12:00:17.000000000 +1000
>> >> > @@ -1304,11 +1304,8 @@ generic_file_aio_read(struct kiocb *iocb
>> >> >  			goto out; /* skip atime */
>> >> >  		size = i_size_read(inode);
>> >> >  		if (pos < size) {
>> >> > -			retval = filemap_write_and_wait(mapping);
>> >> > -			if (!retval) {
>> >> > -				retval = mapping->a_ops->direct_IO(READ, iocb,
>> >> > +			retval = mapping->a_ops->direct_IO(READ, iocb,
>> >> >  							iov, pos, nr_segs);
>> >> > -			}
>> >>
>> >> So why is it safe to get rid of this?  Can't this result in reading
>> >> stale data from disk?
>> >
>> > AFAIKS, __blockdev_direct_IO is doing the same thing for us, when it
>> > encounters a READ. I should have documented this change. This is one
>> > thing I'm not *quite* sure of there might be a path do the block device
>> > that I haven't considered, and which does not do the sync...
>>
>> Well, that's if dio_lock_type != DIO_NO_LOCKING.  cscope shows the
>> following callers of blockdev_direct_IO_no_locking:
>>
>> gfs2_direct_IO
>> ocfs2_direct_IO
>> xfs_vm_direct_IO
>>
>> and of course
>>
>> blkdev_direct_IO
>>
>> I can't say whether all of these callers are safe.  They certainly don't
>> appear to be safe to me.
>
> Ah OK of course you're right. I'll need to take another look at that
> and probably send any improvement as another patch.
>
> My test SMP system just started getting memory errors for some reason
> so I haven't been able to boot it :( Will try to resurrect it or find
> another before resending...

OK, I got a kernel running on an smp system for testing.  I modified
your patch to do a filemap_write_and_wait_range in the read case.  The
aio-dio-regress test suite (with a few added programs to check for
buffered vs. direct I/O) passed without problems.  One of those programs
did not work with your initial patch, since it opened the block device
and mixed buffered and direct I/O.

Cheers,

Jeff

diff --git a/mm/filemap.c b/mm/filemap.c
index ab85536..76de63e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1317,11 +1317,11 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 			goto out; /* skip atime */
 		size = i_size_read(inode);
 		if (pos < size) {
-			retval = filemap_write_and_wait(mapping);
-			if (!retval) {
+			retval = filemap_write_and_wait_range(mapping, pos,
+					pos + iov_length(iov, nr_segs) - 1);
+			if (!retval)
 				retval = mapping->a_ops->direct_IO(READ, iocb,
 							iov, pos, nr_segs);
-			}
 			if (retval > 0)
 				*ppos = pos + retval;
 			if (retval) {
@@ -2123,18 +2123,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	if (count != ocount)
 		*nr_segs = iov_shorten((struct iovec *)iov, *nr_segs, count);
 
-	/*
-	 * Unmap all mmappings of the file up-front.
-	 *
-	 * This will cause any pte dirty bits to be propagated into the
-	 * pageframes for the subsequent filemap_write_and_wait().
-	 */
 	write_len = iov_length(iov, *nr_segs);
 	end = (pos + write_len - 1) >> PAGE_CACHE_SHIFT;
-	if (mapping_mapped(mapping))
-		unmap_mapping_range(mapping, pos, write_len, 0);
 
-	written = filemap_write_and_wait(mapping);
+	written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1);
 	if (written)
 		goto out;
 
@@ -2520,7 +2512,8 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	 * the file data here, to try to honour O_DIRECT expectations.
 	 */
 	if (unlikely(file->f_flags & O_DIRECT) && written)
-		status = filemap_write_and_wait(mapping);
+		status = filemap_write_and_wait_range(mapping,
+					pos, pos + written - 1);
 
 	return written ? written : status;
 }
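
For anyone checking the range arithmetic in the patch: all three call sites pass an inclusive end offset, pos + len - 1, and the direct write path shifts that same value by PAGE_CACHE_SHIFT to get the last page index. Below is a small userspace sketch of that arithmetic, not kernel code. The names io_byte_range, byte_range_to_pages and SKETCH_PAGE_SHIFT are made up for illustration, the 4 KiB page size is an assumption, and rejecting a zero-length request is a choice made in this sketch only; it says nothing about how the kernel treats that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Hypothetical stand-in for PAGE_CACHE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Byte range with both ends inclusive, as passed to the _range call. */
struct byte_range {
	int64_t start;
	int64_t end;
};

/* Page index range with both ends inclusive. */
struct page_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Compute the inclusive byte range [pos, pos + len - 1] for an I/O.
 * Returns false for a negative offset or a zero-length request, since
 * the end would then fall before the start (sketch-only policy).
 */
static bool io_byte_range(int64_t pos, uint64_t len, struct byte_range *out)
{
	if (pos < 0 || len == 0)
		return false;
	out->start = pos;
	out->end = pos + (int64_t)len - 1;
	return true;
}

/* Map an inclusive byte range to the inclusive range of pages it touches. */
static struct page_range byte_range_to_pages(struct byte_range r)
{
	struct page_range p;

	p.first = (uint64_t)r.start >> SKETCH_PAGE_SHIFT;
	p.last = (uint64_t)r.end >> SKETCH_PAGE_SHIFT;
	return p;
}
```

With these assumptions, a 4096-byte I/O at offset 4096 stays within page 1, while a 2-byte I/O at offset 4095 straddles pages 0 and 1. Using the exclusive end (pos + len) would pull in one page too many whenever the I/O ends exactly on a page boundary.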