From: Zach Brown <zach.brown@oracle.com>
To: Karl Schendel <kschendel@datallegro.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Benjamin LaHaise <bcrl@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Nick Piggin <nickpiggin@yahoo.com.au>,
Chris Mason <chris.mason@oracle.com>
Subject: Re: [PATCH] Fix bad data from non-direct-io read after direct-io write
Date: Tue, 30 Oct 2007 11:45:46 -0700 [thread overview]
Message-ID: <47277BDA.6070702@oracle.com> (raw)
In-Reply-To: <47227813.7040604@datallegro.com>
> Yes, I do - I'd been tripping over it once every couple weeks,
> and I finally figured out how to hold my mouth right so that it
> fails (almost) every time.
OK, I tested and verified Karl's fix and wrote some commentary around it.
(Would a aio-dio git repo on kernel.org for these kind of fixes be well
received?)
----
dio: fix cache invalidation after sync writes
Commit commit 65b8291c4000e5f38fc94fb2ca0cb7e8683c8a1b ("dio: invalidate clean
pages before dio write") introduced a bug which stopped dio from ever
invalidating the page cache after writes. It still invalidated it before
writes so most users were fine.
Karl Schendel reported hitting this bug ( http://lkml.org/lkml/2007/10/26/481 )
when he had a buffered reader immediately reading file data after an O_DIRECT
wirter had written the data. The kernel issued read-ahead beyond the position
of the reader which overlapped with the O_DIRECT writer. The failure to
invalidate after writes caused the reader to see stale data from the
read-ahead.
The following patch is originally from Karl. The following commentary is his:
The below 3rd try takes on your suggestion of just invalidating
no matter what the retval from the direct_IO call. I ran it
thru the test-case several times and it has worked every time.
The post-invalidate is probably still too early for async-directio,
but I don't have a testcase for that; just sync. And, this
won't be any worse in the async case.
I added a test to the aio-dio-regress repository which mimics Karl's IO
pattern. It verifed the bad behaviour and that the patch fixed it. I agree
with Karl, this still doesn't help the case where a buffered reader follows an
AIO O_DIRECT writer. That will require a bit more work.
This gives up on the idea of returning EIO to indicate to userspace that stale
data remains if the invalidation failed.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
--- linux-2.6.23.1-base/mm/filemap.c 2007-10-12 12:43:44.000000000 -0400
+++ linux-2.6.23.1/mm/filemap.c 2007-10-26 19:21:20.000000000 -0400
@@ -2194,21 +2194,17 @@ generic_file_direct_IO(int rw, struct ki
}
retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs);
- if (retval)
- goto out;
/*
* Finally, try again to invalidate clean pages which might have been
- * faulted in by get_user_pages() if the source of the write was an
- * mmap()ed region of the file we're writing. That's a pretty crazy
- * thing to do, so we don't support it 100%. If this invalidation
- * fails and we have -EIOCBQUEUED we ignore the failure.
+ * cached by non-direct readahead, or faulted in by get_user_pages()
+ * if the source of the write was an mmap'ed region of the file
+ * we're writing. Either one is a pretty crazy thing to do,
+ * so we don't support it 100%. If this invalidation
+ * fails, tough, the write still worked...
*/
if (rw == WRITE && mapping->nrpages) {
- int err = invalidate_inode_pages2_range(mapping,
- offset >> PAGE_CACHE_SHIFT, end);
- if (err && retval >= 0)
- retval = err;
+ invalidate_inode_pages2_range(mapping, offset >> PAGE_CACHE_SHIFT, end);
}
out:
return retval;
next prev parent reply other threads:[~2007-10-30 18:45 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-26 21:12 [PATCH] Fix bad data from non-direct-io read after direct-io write Karl Schendel
2007-10-26 21:34 ` Linus Torvalds
2007-10-26 22:10 ` Karl Schendel
2007-10-26 22:30 ` Zach Brown
2007-10-26 22:41 ` Karl Schendel
2007-10-26 22:42 ` Linus Torvalds
2007-10-26 22:54 ` Zach Brown
2007-10-26 23:14 ` Linus Torvalds
2007-10-26 23:28 ` Karl Schendel
2007-10-30 18:45 ` Zach Brown [this message]
2007-10-30 19:11 ` Linus Torvalds
2007-10-26 23:38 ` Zach Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47277BDA.6070702@oracle.com \
--to=zach.brown@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=bcrl@kvack.org \
--cc=chris.mason@oracle.com \
--cc=kschendel@datallegro.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nickpiggin@yahoo.com.au \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.