From: Nick Piggin <nickpiggin@yahoo.com.au>
To: David Chinner <dgc@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
    linux-kernel@vger.kernel.org, xfs@oss.sgi.com, akpm@osdl.org
Subject: Re: [PATCH 1/2]: Fix BUG in cancel_dirty_pages on XFS
Date: Thu, 25 Jan 2007 11:47:24 +1100
Message-ID: <45B7FE1C.3070807@yahoo.com.au>
In-Reply-To: <20070125003536.GS33919298@melbourne.sgi.com>

David Chinner wrote:
> On Thu, Jan 25, 2007 at 11:12:41AM +1100, Nick Piggin wrote:
>>... so surely if you do a direct read followed by a buffered read,
>>you should *not* get the same data if there has been some activity
>>to modify that part of the file in the meantime (whether that be a
>>buffered or direct write).
>
>
> Right. And that is what happens in XFS because it purges the
> caches on direct I/O and forces data to be re-read from disk.

And that is critical for direct IO writes, of course.

> Effectively, if you are mixing direct I/O with other types of I/O
> (buffered or mmap) then the application really needs to be certain
> it is doing the right thing because there are races that can occur
> below the filesystem. All we care about in the filesystem is that
> what we cache is the same as what is on disk, and that implies that
> direct I/O needs to purge the cache regardless of the state it is in....
>
> Hence we need to unmap pages and use truncate semantics on them to
> ensure they are removed from the page cache....

OK, I understand that this does need to happen (at least for writes),
so you need to fix it regardless of the DIO read issue.

But I'm just interested in DIO reads. I think you can get pretty
reasonable semantics without discarding pagecache, but the semantics
are weaker in one respect.

DIO read
1. writeback page
2. read from disk

Now your read will pick up data no older than step 1. And if a buffered
write happens after step 2, then there is no problem either.

So if you are doing a buffered write and DIO read concurrently, you
want synchronisation so the buffered write happens either before 1
or after 2 -- the DIO read will see either all or none of the write.

Supposing your pagecache isn't invalidated and a buffered write
(from mmap, if XFS doesn't allow write(2)) comes in between 1 and 2,
then the DIO read will find either none, some, or all of that write.

So I guess what you are preventing is the "some" case. Am I right?
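
The interleavings above can be sketched with a small toy model. This is
illustration only, not kernel or XFS code: the function names, the
flush_after parameter, and the idea that background writeback flushes a
prefix of the newly dirtied pages are all assumptions made for the sketch.
It models a DIO read that does not invalidate pagecache, with a buffered
write landing between step 1 and step 2.

```python
# Toy model only. dio_read, run and flush_after are hypothetical names
# for this sketch; nothing here corresponds to real kernel interfaces.

def dio_read(disk, cache, dirty, write=None, flush_after=0):
    """Model a DIO read over all pages without invalidating pagecache.

    write:       optional list of new page values that a racing buffered
                 (or mmap) write stores into the pagecache between
                 step 1 and step 2.
    flush_after: how many of those newly dirtied pages background
                 writeback happens to push to disk before step 2.
    """
    # Step 1: write back every page that is dirty at this point.
    for i in sorted(dirty):
        disk[i] = cache[i]
    dirty.clear()

    # The racing buffered write lands in the pagecache only.
    if write is not None:
        for i, val in enumerate(write):
            cache[i] = val
            dirty.add(i)
        # Assumed: writeback may flush any prefix of the dirty pages.
        for i in sorted(dirty)[:flush_after]:
            disk[i] = cache[i]
            dirty.discard(i)

    # Step 2: read straight from disk, bypassing the pagecache.
    return list(disk)


def run(flush_after):
    """Four-page file, all 'old'; a racing write sets every page to 'new'."""
    disk = ["old"] * 4
    cache = list(disk)
    return dio_read(disk, cache, set(), write=["new"] * 4,
                    flush_after=flush_after)


if __name__ == "__main__":
    for n in (0, 2, 4):
        print(n, run(n))
```

With flush_after set to 0, 2 and 4 the read returns none, some, and all
of the racing write respectively. The middle case is the torn read.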
--
SUSE Labs, Novell Inc.