From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: mmap writes vs truncate causing data corruption
Date: Tue, 23 Sep 2014 23:18:16 +1000
Message-ID: <20140923131816.GQ4267@dastard>
In-Reply-To: <20140923122754.GE2359@quack.suse.cz>

On Tue, Sep 23, 2014 at 02:27:54PM +0200, Jan Kara wrote:
> Hi,
>
> On Wed 17-09-14 19:28:05, Dave Chinner wrote:
> > Brian, Eric and I have been tracking down a set of data corruption
> > problems on XFS over the past couple of days. The one that is
> > important to the wider developer community is the truncate/mmap
> > write issue that Eric isolated from a real-world application that
> > was triggering it.
> >
> > The corruption only affects block size smaller than page size
> > configurations and is caused by mmapped writes to the EOF page
> > which has been partially truncated. If we then extend the file
> > again, the region of the page that was truncated and had blocks
> > punched out of it can be written to via mapped writes without blocks
> > being allocated for the hole. Hence while the page is in the page
> > cache, the contents of the file look OK. Unmount/mount the
> > filesystem, then re-read the page from disk and it will contain
> > zeros because there is a hole rather than data blocks.
> Hum, this is what we already discussed in
> http://lists.openwall.net/linux-ext4/2014/03/13/23, isn't it? I never
> thought about using mremap() in the test cases. That makes it even a POSIX
> valid test case... Nasty.

Yup, and the test case came from an application rather than being
something that was thought up in a drunken rampage of random
syscalls....

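
For anyone following along, the sequence of operations described above
looks roughly like the sketch below. It is a simplified illustration and
not Eric's test case: it leaves out mremap(), the function name and the
offsets are made up for the example, and it only walks through the
syscalls. On its own it cannot show the corruption, because the data
still looks fine while the page is in the page cache; you would need a
filesystem with block size smaller than page size and an unmount/mount
cycle before re-reading the file.

```python
import mmap
import os
import tempfile


def mmap_truncate_sequence(directory=None):
    """Run the truncate-down / truncate-up / mmap-write sequence once.

    Hypothetical helper for illustration only. Returns the bytes read
    back through the file descriptor after the mapped write.
    """
    page = mmap.PAGESIZE
    payload = b"after-extend"
    off = page // 2  # lands in the part of the page that gets truncated away

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Start with exactly one page of data on disk.
        os.write(fd, b"a" * page)

        with mmap.mmap(fd, page, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            # Dirty the page through the mapping so it is mapped writable.
            m[0:4] = b"init"

            # Truncate down into the middle of the (now EOF) page ...
            os.ftruncate(fd, page // 4)
            # ... then extend the file again over the truncated region.
            os.ftruncate(fd, page)

            # Mapped write into the region that was truncated away.
            m[off:off + len(payload)] = payload
            m.flush()

        # While the page is cached this read returns the new data.
        return os.pread(fd, len(payload), off)
    finally:
        os.close(fd)
        os.unlink(path)
```

Pass a directory on the filesystem under test as the argument; by
default the file is created in the system temporary directory.
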
> > In the XFS case, the bug was that the filesystem truncate code is
> > not cleaning the partial page fully during the truncate down or up,
> > and hence the pte remains mapped dirty in the TLB. Hence when new
> > data is written to the page, it doesn't trigger a write fault,
> > ->page_mkwrite is not called and hence blocks are not allocated over
> > the hole. I chose to fix it on the truncate up as it was the lesser
> > of two evils - we can't actually fix the problem entirely because we
> > can't serialise page faults against truncate.
> Actually, as I mentioned in the above email, exactly the same problem
> happens when file gets extended because of a write beyond EOF (just change
> truncate up for pwrite in your test cases). You didn't handle that case in
> your XFS patch AFAICS.

That's because XFS already does tail block zeroing on truncate up
earlier in the truncate code (the xfs_zero_eof() call). The patch I
wrote simply stabilises the page so that a new page fault occurs
and remaps it correctly.

> > That is, if two filesystems that support block size smaller than
> > page size have similar data corruptions when exercising the same
> > generic code paths in similar ways, then it is likely that other
> > filesystems have similar problems and need to be checked.
> Frankly, I'd like to handle the problem in the generic code rather than
> having hacks in various filesystems. I have a patch back from 2009 which
> implements a helper function which gets called when creating a hole (either
> from ->setattr or ->write_end) and which handles this. It also has various
> optimizations built in - it doesn't do anything when blocksize == pagesize
> or when no hole block is actually created. Also it doesn't do any IO as you
> do in XFS - it only writeprotects the page. I'll port the patch and try it
> out with ext4.

I'm not sure exactly how that helps - I'll understand better when I
see the code ;)
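
As a rough illustration of the two optimizations mentioned in the quoted
text (do nothing when blocksize == pagesize, do nothing when no hole
block is actually created), the check could be read as something like
the sketch below. This is only one interpretation of that description
and is not taken from the 2009 patch; the function name and parameters
are hypothetical.

```python
def eof_page_needs_write_protect(old_size, new_size, block_size, page_size):
    """Decide whether extending a file from old_size to new_size can
    expose a hole block inside the page that contained the old EOF.

    Hypothetical helper; sizes are in bytes and block_size is assumed
    to divide page_size.
    """
    # Not an extension, or one block per page: nothing to do.
    if new_size <= old_size or block_size == page_size:
        return False

    # End of the block that contains the old EOF (round up).
    rounded_old = -(-old_size // block_size) * block_size

    # The extension stays inside the old EOF block, or that block ends
    # exactly on a page boundary: no hole block lands in the old EOF page.
    if new_size <= rounded_old or rounded_old % page_size == 0:
        return False

    return True
```

With 1k blocks and 4k pages, extending from 1024 to 4096 bytes returns
True, while extending from 1000 to 1020 bytes stays within one block and
returns False.
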

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com