From: Theodore Ts'o <tytso@mit.edu>
To: Jan Kara <jack@suse.cz>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>, linux-ext4@vger.kernel.org
Subject: Re: Uninitialized extent races
Date: Fri, 21 Dec 2012 18:03:35 -0500
Message-ID: <20121221230335.GH31731@thunk.org>
In-Reply-To: <20121221224947.GA23652@quack.suse.cz>

On Fri, Dec 21, 2012 at 11:49:47PM +0100, Jan Kara wrote:
> It's actually simpler than that. We wait for any pending DIO using
> inode_dio_wait() and i_mutex protects from new writes to be submitted. So
> that takes care of one possibility. truncate_inode_pages() waits for
> PageWriteback bit so that handles waiting for IO itself.

Hmm, yes, I should have known/remembered that. I've seen cases where
very rarely, it's possible for an unlink() or truncate() call to stall
for multiple minutes(!). This can happen if you have writeback
happening in a container which has a very small (low priority)
constraint on its block I/O bandwidth. If you try to delete an inode
which has writeback work pending, it's possible for the writeback to
take a looong time, which in turn causes the unlink to take a long
time.

It becomes worse if the process doing the unlink is a high priority
process (say, the cluster management daemon who is cleaning up after
said low-priority job has completed), but the writeback is happening
in the context of a low priority cgroup. You can end up with a nasty
priority inversion.
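
To get a feel for this from userspace, here is a minimal sketch that times
an unlink() issued right after buffered writes. The helper names
make_dirty_file and timed_unlink are made up for illustration and are not
from this thread. On an unthrottled machine the call will usually return
quickly; the stall described above needs pages that are actually under
writeback in a bandwidth-constrained cgroup, which this sketch does not set
up.

```python
import os
import tempfile
import time


def make_dirty_file(directory, size_bytes):
    """Write size_bytes with no fsync, so dirty pages may still be pending."""
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size_bytes)
    return path


def timed_unlink(path):
    """Unlink path and return the wall-clock seconds the call took."""
    start = time.monotonic()
    os.unlink(path)
    return time.monotonic() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        victim = make_dirty_file(d, 16 * 1024 * 1024)
        print(f"unlink() took {timed_unlink(victim):.6f}s")
```

Running the same thing from a high priority process against a file written
by a throttled job is where the numbers would be expected to get ugly.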

And there's not a lot we can do at the kernel level. We could
dispatch the truncate to a workqueue and just make sure the file name
has disappeared from the file system name space before the unlink()
returns to userspace, but then the disk space gets released only some
time after the unlink() call returns, which can cause other problems.
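
A rough userspace analogue of that trade-off, as a sketch only: rename the
file to a hidden sibling so the original name is gone immediately, then do
the real unlink() on a background thread. The function name deferred_unlink
and the ".deleted-" prefix are made up for illustration. Unlike the kernel
idea, the hidden name is still visible in the directory until the worker
finishes, and it has the same drawback: the space is freed only after the
background unlink completes.

```python
import os
import tempfile
import threading


def deferred_unlink(path):
    """Hide path under a temporary name, then unlink it in the background.

    Returns the worker thread so the caller can join() it if it needs to
    know when the disk space has actually been released.
    """
    directory, name = os.path.split(path)
    hidden = os.path.join(directory, f".deleted-{name}-{os.getpid()}")
    os.rename(path, hidden)  # the original name is gone once this returns
    worker = threading.Thread(target=os.unlink, args=(hidden,))
    worker.start()
    return worker


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        victim = os.path.join(d, "job-output")
        with open(victim, "wb") as f:
            f.write(b"\0" * (16 * 1024 * 1024))
        worker = deferred_unlink(victim)
        print("name still present:", os.path.exists(victim))
        worker.join()  # only now is the space guaranteed to be released
```

The caller returns as soon as the rename is done, so a slow unlink() stalls
the worker thread rather than the process doing the cleanup.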
- Ted