From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>, Dmitry Monakhov <dmonakhov@openvz.org>,
linux-ext4@vger.kernel.org
Subject: Re: Uninitialized extent races
Date: Fri, 21 Dec 2012 23:49:47 +0100
Message-ID: <20121221224947.GA23652@quack.suse.cz>
In-Reply-To: <20121221180243.GB31731@thunk.org>
On Fri 21-12-12 13:02:43, Ted Tso wrote:
> On Fri, Dec 21, 2012 at 05:19:29PM +0100, Jan Kara wrote:
> > No, I'm speaking about merging currently uninitialized extents. I.e.
> > suppose someone does the following on a filesystem with dioread_nolock so
> > that writeback happens via unwritten extents:
> >   fd = open("file", O_RDWR);
> >   pwrite(fd, buf, 4096, 0);
> >                                 flusher thread starts writing
> >                                 we create uninitialized extent for
> >                                 range 0-4096
> >   fallocate(fd, 0, 4096, 4096);
> >     - we merge extents and now have just 1 uninitialized extent for
> >       range 0-8192
> >                                 ext4_convert_unwritten_extents() now
> >                                 has to split the extent to finish
> >                                 the IO.
>
> Ah, I see. Disabling the merging that might take place as a
> result of the fallocate. Yes, I agree that's a completely sane thing
> to do.
OK, I'll write some patches.
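To make the scenario quoted above concrete, here is a minimal user-space toy
model in C. It is only a sketch and not ext4 code: struct toy_extent and the
toy_* helpers are hypothetical names invented for illustration, and the
"merge_allowed" switch merely stands in for whether fallocate is permitted to
merge with an extent that still has writeback in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy extent: a byte range plus an "unwritten" (uninitialized) flag. */
struct toy_extent {
	unsigned long start;	/* first byte covered */
	unsigned long len;	/* number of bytes covered */
	bool unwritten;		/* true until IO completion converts it */
};

/*
 * Merge b into a when both are unwritten and b begins exactly where a
 * ends. Returns true if the merge happened.
 */
static bool toy_try_merge(struct toy_extent *a, const struct toy_extent *b)
{
	assert(a && b);
	if (!a->unwritten || !b->unwritten)
		return false;
	if (a->start + a->len != b->start)
		return false;
	a->len += b->len;
	return true;
}

/*
 * Converting the IO range [start, start + len) needs a split whenever
 * the extent found in the tree does not match that range exactly.
 */
static bool toy_conversion_needs_split(const struct toy_extent *ex,
				       unsigned long start, unsigned long len)
{
	return ex->start != start || ex->len != len;
}

/*
 * Replay the timeline: writeback creates 0-4096, fallocate creates
 * 4096-8192, then IO completion converts 0-4096.
 * Returns true if the conversion has to split an extent.
 */
static bool toy_replay(bool merge_allowed)
{
	struct toy_extent wb = { 0, 4096, true };	/* from the flusher */
	struct toy_extent fa = { 4096, 4096, true };	/* from fallocate */

	if (merge_allowed)
		toy_try_merge(&wb, &fa);
	return toy_conversion_needs_split(&wb, 0, 4096);
}
```

In this model toy_replay(true) reports that a split is needed at IO
completion, while toy_replay(false) does not, which is the whole point of
not merging in this case.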
> The alternate approach would be to add a flag in the extent status
> tree indicating that an unwritten conversion is pending, but that
> would add more complexity.
>
> Hmmm.... do we need that complexity anyway? What happens if we have a
> race between a punch (or truncate) and the flusher thread, so there is
> pending write. There are two things that would be of concern. (1)
> Will convert_unwritten_extents do the right thing if the extent in
> question has disappeared, and (2) what if the block gets reused for
> some other inode in the interim?
>
> I _think_ we're OK in the case of (2), since we're not using FUA writes
> for anything other than the commit block, so there shouldn't be any way
> that a write for the new inode could complete before the pending write
> finishes up. And (1) should be OK, although it may end up triggering a
> WARN_ON and a scary ext4_msg() in ext4_convert_unwritten_extents().
> But it made me stop and think....
It's actually simpler than that. We wait for any pending DIO using
inode_dio_wait(), and i_mutex prevents new writes from being submitted. So
that takes care of one possibility. truncate_inode_pages() waits for the
PageWriteback bit, so that handles waiting for the IO itself. After I change
ext4 to convert extents before clearing PageWriteback, this will also take
care of extent conversion. Right now a call to ext4_flush_unwritten_io() in
ext4_ext_truncate() resolves the problems. It's called after invalidating
the page cache, so we know all the pending IO for the truncated / punched
area is finished; only a conversion may still be pending.
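
The ordering argument above can be sketched as a small user-space model in C.
This is an illustration under stated assumptions, not kernel code: struct
toy_inode, its fields, and toy_truncate() are hypothetical stand-ins, and
where the real functions block until work finishes, the model simply drains
counters. The "convert_before_clear" flag models the planned change of
converting extents before the writeback bit is cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode state discussed above. */
struct toy_inode {
	int dio_in_flight;		/* pending direct IO requests */
	int pages_under_writeback;	/* pages with the writeback bit set */
	int conversions_pending;	/* unwritten extents awaiting conversion */
	bool convert_before_clear;	/* models the planned ext4 change */
};

/* Finish writeback of one page. */
static void toy_end_writeback(struct toy_inode *in)
{
	assert(in->pages_under_writeback > 0);
	/* With the planned change, conversion completes before the bit clears. */
	if (in->convert_before_clear && in->conversions_pending > 0)
		in->conversions_pending--;
	in->pages_under_writeback--;
}

/*
 * Model of the truncate / punch path. Returns how many conversions were
 * still pending when the final flush step ran.
 */
static int toy_truncate(struct toy_inode *in)
{
	int flushed;

	/* Step 1: stand-in for waiting on pending direct IO. */
	in->dio_in_flight = 0;

	/* Step 2: stand-in for page cache truncation waiting on writeback. */
	while (in->pages_under_writeback > 0)
		toy_end_writeback(in);

	/* Step 3: stand-in for flushing conversions that are still pending. */
	flushed = in->conversions_pending;
	in->conversions_pending = 0;
	return flushed;
}
```

With convert_before_clear unset, step 3 still finds conversions to flush
after the IO itself has finished; with it set, step 2 already covers them and
step 3 has nothing left to do.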
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR