From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: mmap writes vs truncate causing data corruption
Date: Tue, 23 Sep 2014 14:27:54 +0200
Message-ID: <20140923122754.GE2359@quack.suse.cz>
In-Reply-To: <20140917092805.GT4322@dastard>
Hi,
On Wed 17-09-14 19:28:05, Dave Chinner wrote:
> Brian, Eric and I have been tracking down a set of data corruption
> problems on XFS over the past couple of days. The one that is
> important to the wider developer community is the truncate/mmap
> write issue that Eric isolated from a real-world application that
> was triggering it.
>
> The corruption only affects block size smaller than page size
> configurations and is caused by mmapped writes to the EOF page
> which has been partially truncated. If we then extend the file
> again, the region of the page that was truncated and had blocks
> punched out of it can be written to via mapped writes without blocks
> being allocated for the hole. Hence while the page is in the page
> cache, the contents of the file look OK. Unmount/mount the
> filesystem, then re-read the page from disk and it will contain
> zeros because there is a hole rather than data blocks.
Hum, this is what we already discussed in
http://lists.openwall.net/linux-ext4/2014/03/13/23, isn't it? I never
thought about using mremap() in the test cases. That even makes it a
POSIX-valid test case... Nasty.
> In the XFS case, the bug was that the filesystem truncate code is
> not cleaning the partial page fully during the truncate down or up,
> and hence the pte remains mapped dirty in the TLB. Hence when new
> data is written to the page, it doesn't trigger a write fault,
> ->page_mkwrite is not called and hence blocks are not allocated over
> the hole. I chose to fix it on the truncate up as it was the lesser
> of two evils - we can't actually fix the problem entirely because we
> can't serialise page faults against truncate.
Actually, as I mentioned in the above email, exactly the same problem
happens when the file gets extended by a write beyond EOF (just swap the
truncate up for a pwrite in your test cases). You didn't handle that case in
your XFS patch AFAICS.
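To make the two variants concrete, here is a minimal userspace sketch of the sequence as I read it from the description above. It is not the actual test case from this thread: the function names, the fill bytes and the choice of offsets (half a page) are my own assumptions. The corruption itself only becomes visible on a filesystem with block size smaller than page size, and only once the page is re-read from disk.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the truncate-down / extend / mapped-write
 * sequence on fd. If extend_with_pwrite is nonzero, the file is extended
 * by a pwrite() beyond EOF instead of a truncate up.
 * Returns 0 on success, -1 if any system call fails.
 */
static int run_sequence(int fd, int extend_with_pwrite)
{
	long psize = sysconf(_SC_PAGESIZE);
	size_t half;
	char *buf, *map;

	assert(fd >= 0);
	if (psize <= 0)
		return -1;
	half = (size_t)psize / 2;

	/* Fill the first page so that all of its blocks are allocated. */
	buf = malloc((size_t)psize);
	if (!buf)
		return -1;
	memset(buf, 'X', (size_t)psize);
	if (pwrite(fd, buf, (size_t)psize, 0) != (ssize_t)psize) {
		free(buf);
		return -1;
	}
	free(buf);

	map = mmap(NULL, (size_t)psize, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, 0);
	if (map == MAP_FAILED)
		return -1;

	/* Dirty the page through the mapping; the pte is now writable. */
	memset(map + half, 'Z', half);

	/* Truncate down into the middle of the page. */
	if (ftruncate(fd, (off_t)half) != 0)
		goto fail;

	/* Extend the file again, leaving a hole in the tail of the page. */
	if (extend_with_pwrite) {
		if (pwrite(fd, "E", 1, (off_t)psize) != 1)
			goto fail;
	} else if (ftruncate(fd, (off_t)psize) != 0) {
		goto fail;
	}

	/* Mapped write into the region that is now a hole on disk. */
	memset(map + half, 'Y', half);

	if (msync(map, (size_t)psize, MS_SYNC) != 0)
		goto fail;
	return munmap(map, (size_t)psize);
fail:
	munmap(map, (size_t)psize);
	return -1;
}

/* Count bytes in the second half of the first page that are not 'Y'. */
static long count_bad_bytes(int fd)
{
	long psize = sysconf(_SC_PAGESIZE);
	long bad = 0;
	char c;

	for (long off = psize / 2; off < psize; off++)
		if (pread(fd, &c, 1, (off_t)off) != 1 || c != 'Y')
			bad++;
	return bad;
}
```

While the page is still in the page cache, count_bad_bytes() is expected to report no bad bytes even on an affected filesystem, so the check only means something after an unmount/mount cycle.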
> Initially I couldn't reproduce the data corruptions on ext4, but
> Eric came to my rescue and provided me with an updated mremap test
> that triggered corruptions. I also added another variant to the
> plain truncate/mwrite test and so now that test also reliably
> produces data corruptions on ext4. I suspect the ext4 issue is
> similar to the XFS case (i.e. no page_mkwrite call), but I can't
> follow the ext4 code with any level of cluefulness....
I'm surprised ext4 is vulnerable. When I checked a few years back
(2009 or so) it was not, because when we found dirty buffers not marked
delalloc we just bit the bullet and tried allocating blocks. But probably
this got broken... checking the code... yeah, I broke that when rewriting
the ext4 writeback path :-|
> That is, if two filesystems that support block size smaller than
> page size have similar data corruptions when exercising the same
> generic code paths in similar ways, then it is likely that other
> filesystems have similar problems and need to be checked.
Frankly, I'd like to handle the problem in generic code rather than
having hacks in various filesystems. I have a patch from back in 2009 that
implements a helper function which gets called when creating a hole (either
from ->setattr or ->write_end) and which handles this. It also has various
optimizations built in - it doesn't do anything when blocksize == pagesize
or when no hole block is actually created. Also, it doesn't do any IO as you
do in XFS - it only write-protects the page. I'll port the patch and try it
out with ext4.
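The two early exits can be sketched in userspace as a pure function of the old EOF, the new size, the block size and the page size. This is only an illustration of the conditions named above, not the patch itself: the function name and parameters are hypothetical, and the exact reading of "no hole block is actually created" is my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical check: does extending a file from old size 'from' to new
 * size 'to' create a hole block inside the page that held the old EOF,
 * so that the page would need to be write-protected?
 */
static bool extension_needs_writeprotect(uint64_t from, uint64_t to,
					 uint64_t bsize, uint64_t psize)
{
	uint64_t rounded_from;

	/* Both sizes must be powers of two, with bsize <= psize. */
	assert(bsize && !(bsize & (bsize - 1)));
	assert(psize && !(psize & (psize - 1)) && bsize <= psize);

	/* Not an extension, or one block per page: nothing to do. */
	if (from >= to || bsize == psize)
		return false;

	/*
	 * The new size stays within the block holding the old EOF, or that
	 * block ends exactly at a page boundary: no hole block is created
	 * in the old EOF page.
	 */
	rounded_from = round_up_pow2(from, bsize);
	if (to <= rounded_from || (rounded_from & (psize - 1)) == 0)
		return false;

	return true;
}
```

With 1k blocks and 4k pages, extending from 2048 to 5120 would need the write-protect, while extending from 3500 to 8192 would not, because the old EOF block already ends at the page boundary.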
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR