From: Jan Kara <jack@suse.cz>
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, jack@suse.cz,
	lczerner@redhat.com
Subject: Re: [PATCH 04/10] ext4: completed_io locking cleanup V3
Date: Wed, 26 Sep 2012 15:42:12 +0200
Message-ID: <20120926134212.GE10145@quack.suse.cz>
In-Reply-To: <1348487060-19598-5-git-send-email-dmonakhov@openvz.org>
On Mon 24-09-12 15:44:14, Dmitry Monakhov wrote:
> The current unwritten extent conversion state machine is very fuzzy.
> - For an unknown reason it wants to perform the conversion under i_mutex.
>   What for? It was initially added by Theodore. Please comment on your
>   initial assumption.
>   My diagnosis:
>   We already protect the extent tree with i_data_sem, and truncate should
>   wait for DIO in flight, so the only data we have to protect is the
>   io->flags modification. But only flush_completed_IO and the work item
>   modify these flags, and we can serialize them via i_completed_io_lock.
>
> Currently all these games with mutex_trylock result in the following deadlock:
>  truncate:                          kworker:
>   ext4_setattr                       ext4_end_io_work
>   mutex_lock(i_mutex)
>   inode_dio_wait(inode)  ->BLOCK
>                              DEADLOCK<- mutex_trylock()
>                                         inode_dio_done()
> #TEST_CASE1_BEGIN
> MNT=/mnt_scrach
> unlink $MNT/file
> fallocate -l $((1024*1024*1024)) $MNT/file
> aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file
> sleep 2
> truncate -s 0 $MNT/file
> #TEST_CASE1_END
>
> Or use xfstests test 286: https://github.com/dmonakhov/xfstests/blob/devel/286
>
> This patch makes the state machine simple and clean:
> (1) ext4_end_io_work is responsible for handling all pending
>     end_io from ei->i_completed_io_list (the per-inode list)
>     NOTE1: i_completed_io_lock is acquired only once
>     NOTE2: i_mutex is not required because it does not protect
>     any data guarded by i_mutex any more
>
> (2) xxx_end_io schedules end_io context completion simply by pushing it
>     to the inode's list.
>     NOTE1: because of (1), work should be queued only if
>     ->i_completed_io_list was empty at that moment; otherwise the
>     work is already scheduled.
>
> (3) No one is able to free an inode's blocks while a pending io_completion
>     exists, otherwise this may result in blocks beyond EOF. This is
>     guaranteed by the fact that all truncate routines wait for
>     all pending unwritten requests in flight.
>
> (4) Replace flush_completed_io() with ext4_unwritten_wait(). This
>     allows us to greatly simplify the state machine because the end_io
>     context will be destroyed in only one place (end_io_work).
>
>
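The rule in (1) and (2) above can be illustrated outside the kernel. The following is only a user-space sketch, assuming a pthread mutex in place of i_completed_io_lock and a hand-rolled singly linked list in place of i_completed_io_list; the names inode_model, add_complete_io and end_io_work are hypothetical stand-ins and are not the functions from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the structures from the patch. */
struct io_end {
	struct io_end *next;
};

struct inode_model {
	pthread_mutex_t completed_io_lock;	/* models i_completed_io_lock */
	struct io_end *head, *tail;		/* models i_completed_io_list */
};

/*
 * Producer side: push one completion onto the per-inode list.
 * Returns true only if the list was empty, i.e. only then does the
 * caller need to queue the work item; otherwise it is already queued.
 */
static bool add_complete_io(struct inode_model *ino, struct io_end *io)
{
	bool was_empty;

	assert(io != NULL);
	io->next = NULL;
	pthread_mutex_lock(&ino->completed_io_lock);
	was_empty = (ino->head == NULL);
	if (was_empty)
		ino->head = io;
	else
		ino->tail->next = io;
	ino->tail = io;
	pthread_mutex_unlock(&ino->completed_io_lock);
	return was_empty;
}

/*
 * Worker side: take the lock once, detach the whole list, then walk
 * it without the lock held. Returns the number of entries handled.
 */
static int end_io_work(struct inode_model *ino)
{
	struct io_end *list;
	int handled = 0;

	pthread_mutex_lock(&ino->completed_io_lock);
	list = ino->head;
	ino->head = ino->tail = NULL;
	pthread_mutex_unlock(&ino->completed_io_lock);

	for (; list != NULL; list = list->next)
		handled++;	/* a real worker would convert extents here */
	return handled;
}
```

Because the worker empties the list in one step, a producer that finds the list non-empty can rely on a worker run still being pending, which is why a single "was it empty" check is enough to decide whether to queue work.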
> - Remove the EXT4_IO_END_QUEUED and EXT4_IO_END_FSYNC flags because
>   end_io is now destroyed from a known context
> - Improve SMP scalability by removing the useless i_mutex, which does not
>   protect io->flags anymore.
> - Reduce lock contention on i_completed_io_lock by optimizing the list walk.
> - Move open-coded logic from various xx_end_xx routines to ext4_add_complete_io()
>
> Changes since V2:
> Fix use-after-free caused by a race between truncate and end_io_work
Nice work! Some comments below:
...
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 9970022..fa69bba 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -57,6 +57,29 @@ void ext4_ioend_wait(struct inode *inode)
>  	wait_event(*wq, (atomic_read(&EXT4_I(inode)->i_ioend_count) == 0));
>  }
>
> +void ext4_unwritten_wait(struct inode *inode)
> +{
> +	wait_queue_head_t *wq = ext4_ioend_wq(inode);
> +
> +	wait_event(*wq, (atomic_read(&EXT4_I(inode)->i_unwritten) == 0));
> +}
I would add WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex)) here because
without i_mutex this could easily be livelocked... Also I'm somewhat uneasy
that we wait for the worker to do the work, but it can be rather busy
completing work for other inodes. So won't this slow down e.g. fsync() or
truncate() when there is heavy writing to other inodes? I guess some
numbers would be appropriate here...
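To make the suggestion concrete, here is a rough user-space model of the wait with the assertion added. It is a sketch under assumptions, not kernel code: a pthread condition variable stands in for the ext4_ioend_wq() wait queue, a plain int under a lock stands in for the atomic i_unwritten counter, and model_mutex_is_locked() is a hypothetical analogue of mutex_is_locked() built on pthread_mutex_trylock(). The reading of the livelock concern is also an assumption: presumably, without i_mutex held, new writers could keep raising the counter so the waiter never sees zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model of the inode fields involved. */
struct inode_model {
	pthread_mutex_t i_mutex;	/* models inode->i_mutex */
	pthread_mutex_t wq_lock;	/* protects i_unwritten */
	pthread_cond_t wq;		/* models ext4_ioend_wq(inode) */
	int i_unwritten;		/* models EXT4_I(inode)->i_unwritten */
};

static int warn_count;	/* how many times the one-shot warning fired */

/* Analogue of mutex_is_locked(): true if any thread holds the mutex. */
static bool model_mutex_is_locked(pthread_mutex_t *m)
{
	if (pthread_mutex_trylock(m) == 0) {
		pthread_mutex_unlock(m);
		return false;
	}
	return true;
}

/* Wait until no unwritten conversions are pending for this inode. */
static void unwritten_wait(struct inode_model *ino)
{
	static bool warned;	/* gives WARN_ON_ONCE-like behaviour */

	if (!model_mutex_is_locked(&ino->i_mutex) && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "unwritten_wait called without i_mutex\n");
	}

	pthread_mutex_lock(&ino->wq_lock);
	while (ino->i_unwritten != 0)
		pthread_cond_wait(&ino->wq, &ino->wq_lock);
	pthread_mutex_unlock(&ino->wq_lock);
}

/* Completion side: drop one pending conversion and wake waiters at zero. */
static void unwritten_done(struct inode_model *ino)
{
	pthread_mutex_lock(&ino->wq_lock);
	assert(ino->i_unwritten > 0);
	if (--ino->i_unwritten == 0)
		pthread_cond_broadcast(&ino->wq);
	pthread_mutex_unlock(&ino->wq_lock);
}
```

Like mutex_is_locked(), the check only says that somebody holds the lock, not that the caller does, so it catches obviously unlocked callers rather than proving correct locking.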
> @@ -83,12 +106,7 @@ void ext4_free_io_end(ext4_io_end_t *io)
>  	kmem_cache_free(io_end_cachep, io);
>  }
>
> -/*
> - * check a range of space and convert unwritten extents to written.
> - *
> - * Called with inode->i_mutex; we depend on this when we manipulate
> - * io->flag, since we could otherwise race with ext4_flush_completed_IO()
> - */
> +/* check a range of space and convert unwritten extents to written. */
>  int ext4_end_io_nolock(ext4_io_end_t *io)
>  {
>  	struct inode *inode = io->inode;
ext4_end_io_nolock() is a misnomer now. So just make it ext4_end_io() and
make it static.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR