From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>, Dmitry Monakhov <dmonakhov@openvz.org>,
linux-ext4@vger.kernel.org
Subject: Re: Uninitialized extent races
Date: Fri, 21 Dec 2012 23:49:47 +0100
Message-ID: <20121221224947.GA23652@quack.suse.cz>
In-Reply-To: <20121221180243.GB31731@thunk.org>
On Fri 21-12-12 13:02:43, Ted Tso wrote:
> On Fri, Dec 21, 2012 at 05:19:29PM +0100, Jan Kara wrote:
> > No, I'm speaking about merging currently uninitialized extents. I.e.
> > suppose someone does the following on a filesystem with dioread_nolock so
> > that writeback happens via unwritten extents:
> >   fd = open("file", O_RDWR);
> >   pwrite(fd, buf, 4096, 0);
> >                                   flusher thread starts writing
> >                                   we create uninitialized extent for
> >                                   range 0-4096
> >   fallocate(fd, 0, 4096, 4096);
> >     - we merge extents and now have just 1 uninitialized extent for
> >       range 0-8192
> >                                   ext4_convert_unwritten_extents() now
> >                                   has to split the extent to finish
> >                                   the IO.
>
> Ah, I see. Disabling the merging that might take place as a
> result of the fallocate. Yes, I agree that's a completely sane thing
> to do.
OK, I'll write some patches.
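
For anyone who wants to poke at this, the userspace half of the sequence
quoted above can be sketched roughly as follows. This is only a minimal
sketch: the helper name write_then_fallocate() and the O_CREAT | O_TRUNC
flags are my additions so that it runs on a fresh file, and it merely issues
the syscalls. Whether the flusher thread actually starts writeback between
the two calls is a matter of timing, and the filesystem must be ext4 mounted
with dioread_nolock for writeback to go through unwritten extents at all.

```c
#define _GNU_SOURCE		/* needed for fallocate() */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096

/*
 * Hypothetical helper: buffered 4 KiB write at offset 0, followed by a
 * preallocation of the next 4 KiB (range 4096-8192).
 * Returns 0 on success, -1 with errno set on failure.
 */
int write_then_fallocate(const char *path)
{
	char buf[BLK];
	ssize_t n;
	int fd, saved;

	memset(buf, 'a', sizeof(buf));

	/* O_CREAT | O_TRUNC are assumptions; the original used O_RDWR only. */
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* Dirty the first block; writeback happens later, asynchronously. */
	n = pwrite(fd, buf, BLK, 0);
	if (n != BLK) {
		if (n >= 0)
			errno = EIO;	/* short write */
		goto fail;
	}

	/* Preallocate the adjacent block as an uninitialized extent. */
	if (fallocate(fd, 0, BLK, BLK) != 0)
		goto fail;

	return close(fd);

fail:
	saved = errno;
	close(fd);
	errno = saved;
	return -1;
}
```

Calling write_then_fallocate("file") in a loop while writeback is running
would be one way to try to hit the window, but I have not verified that it
reliably does.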
> The alternate approach would be to add a flag in the extent status
> tree indicating that an unwritten conversion is pending, but that
> would add more complexity.
>
> Hmmm.... do we need that complexity anyway? What happens if we have a
> race between a punch (or truncate) and the flusher thread, so there is
> pending write. There are two things that would be of concern. (1)
> Will convert_unwritten_extents do the right thing if the extent in
> question has disappeared, and (2) what if the block gets reused for
> some other inode in the interim?
>
> I _think_ we're OK in the case of (2), since we're not using FUA writes
> for anything other than the commit block, so there shouldn't be any way
> that a write for the new inode could complete before the pending write
> finishes up. And (1) should be OK, although it may end up triggering a
> WARN_ON and a scary ext4_msg() in ext4_convert_unwritten_extents().
> But it made me stop and think....
It's actually simpler than that. We wait for any pending DIO using
inode_dio_wait(), and i_mutex prevents new writes from being submitted. So
that takes care of one possibility. truncate_inode_pages() waits for the
PageWriteback bit to clear, so that handles waiting for the IO itself. After
I change ext4 to convert extents before clearing PageWriteback, this will
also take care of extent conversion. Currently, a call to
ext4_flush_unwritten_io() in ext4_ext_truncate() resolves the problem. It's
called after invalidating the page cache, so we know all the pending IO for
the truncated / punched area is finished; only a conversion may still be
pending.
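
To make the ordering explicit, here is a toy userspace model of the
truncate / punch path as described above. It is not kernel code: the
model_*() functions are hypothetical stand-ins that only record the order of
the steps, and placing the DIO wait before the page cache truncation is my
assumption about how the pieces line up.

```c
#include <stddef.h>

#define MAX_STEPS 8

/* Records the order in which the modelled steps run. */
struct step_log {
	const char *steps[MAX_STEPS];
	size_t n;
};

static void record(struct step_log *log, const char *what)
{
	if (log->n < MAX_STEPS)
		log->steps[log->n++] = what;
}

/* Stand-in for inode_dio_wait(): wait for pending direct IO. */
static void model_inode_dio_wait(struct step_log *log)
{
	record(log, "inode_dio_wait");
}

/* Stand-in for truncate_inode_pages(): waits for PageWriteback. */
static void model_truncate_inode_pages(struct step_log *log)
{
	record(log, "truncate_inode_pages");
}

/* Stand-in for ext4_flush_unwritten_io(): wait for pending conversions. */
static void model_flush_unwritten_io(struct step_log *log)
{
	record(log, "ext4_flush_unwritten_io");
}

/*
 * Model of the truncate / punch path. The caller is assumed to hold
 * i_mutex, so no new writes can be submitted while this runs.
 */
void model_truncate(struct step_log *log)
{
	log->n = 0;
	model_inode_dio_wait(log);		/* no DIO in flight */
	model_truncate_inode_pages(log);	/* buffered IO finished */
	model_flush_unwritten_io(log);		/* conversions finished */
	/* Only now is it safe to remove extents for the range. */
}
```

The point is just that the flush of unwritten conversions comes last, after
both kinds of IO have been waited for.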
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR