From: Dmitry Monakhov <dmonakhov@openvz.org>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, jack@suse.cz,
wenqing.lz@taobao.com
Subject: Re: [PATCH 2/7] ext4: completed_io locking cleanup
Date: Mon, 10 Sep 2012 14:19:52 +0400 [thread overview]
Message-ID: <87ehmagoxj.fsf@openvz.org> (raw)
In-Reply-To: <20120910092312.GD22903@quack.suse.cz>
On Mon, 10 Sep 2012 11:23:12 +0200, Jan Kara <jack@suse.cz> wrote:
> On Sun 09-09-12 21:27:09, Dmitry Monakhov wrote:
> > Current unwritten extent conversion state-machine is very fuzzy.
> > - By unknown reason it want perform conversion under i_mutex. What for?
> > All this games with mutex_trylock result in following deadlock
> > truncate: kworker:
> > ext4_setattr ext4_end_io_work
> > mutex_lock(i_mutex)
> > inode_dio_wait(inode) ->BLOCK
> > DEADLOCK<- mutex_trylock()
> > inode_dio_done()
> > #TEST_CASE1_BEGIN
> > MNT=/mnt_scrach
> > unlink $MNT/file
> > fallocate -l $((1024*1024*1024)) $MNT/file
> > aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file
> > sleep 2
> > truncate -s 0 $MNT/file
> > #TEST_CASE1_END
> >
> > This patch makes state machine simple and clean:
> >
> > (1) ext4_flush_completed_IO is responsible for handling all pending
> > end_io from ei->i_completed_io_list(per inode list)
> > NOTE1: i_completed_io_lock is acquired only once
> > NOTE2: i_mutex is not required because it does not protect
> > any data guarded by i_mutex
> Removing of i_mutex might be OK but you really need to add more
> justification (both into the changelog and end_io code) than just "it does
> not protect any data guarded by i_mutex". So far i_mutex is supposed to
> protect all modifications of extent tree, now that won't be true anymore.
> We have i_data_sem protecting extent tree as well but that is acquired for
> single extent operation only so it cannot be used for guarding "global
> properties of the extent tree" - e.g. if end_io code would race with
> truncate it could happen that end_io would create some extents beyond EOF.
> Now maybe that cannot happen because of other synchronization mechanisms
> (e.g. inode_dio_wait(), or waiting for PageWriteback while invalidating
> page cache - although extra care needs to be taken when extents straddle
> i_size and there can be IO running on the part of the extent before
> i_size). So I don't immediaetly see a problem but please add more
> justification to convince me (and future readers of the code) here...
Ok you right. Actually I've already tried to figure out correct BUG_ON
condition to spot extent beyond EOF, but this condition is not trivial
and need justification. My current assumption:
Since EXT4_GET_BLOCKS_UNINIT_EXT is used only then file size is not
changed it seems to be correct to prohibit
ext4_convert_unwritten_extents() to handle blocks beyond
EXT4_I(inode)->i_disksize (inode->i_size is not suitable here
because truncate may already reduce it, and now wait for existing dio)
As a naive word in my defence i should say that blocks beyond EOF issue easily
triggered on current kernel via xfstests, but not happen w/ my patch-queue.
next prev parent reply other threads:[~2012-09-10 10:19 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-09 17:27 [PATCH 0/7] ext4: Bunch of DIO/AIO fixes Dmitry Monakhov
2012-09-09 17:27 ` [PATCH 1/7] ext4: ext4_inode_info diet Dmitry Monakhov
2012-09-13 10:50 ` Zheng Liu
2012-09-13 11:15 ` Dmitry Monakhov
2012-09-15 15:53 ` Theodore Ts'o
2012-09-09 17:27 ` [PATCH 2/7] ext4: completed_io locking cleanup Dmitry Monakhov
2012-09-10 9:23 ` Jan Kara
2012-09-10 10:19 ` Dmitry Monakhov [this message]
2012-09-13 10:48 ` Zheng Liu
2012-09-09 17:27 ` [PATCH 3/7] ext4: serialize dio nolocked reads with defrag workers V2 Dmitry Monakhov
2012-09-10 9:31 ` Jan Kara
2012-09-10 10:00 ` Jan Kara
2012-09-09 17:27 ` [PATCH 4/7] ext4: fsync should wait for DIO writers Dmitry Monakhov
2012-09-10 9:51 ` Jan Kara
2012-09-10 10:56 ` Dmitry Monakhov
2012-09-12 14:02 ` Jan Kara
2012-09-12 5:40 ` Zheng Liu
2012-09-13 10:46 ` Zheng Liu
2012-09-13 11:01 ` Dmitry Monakhov
2012-09-13 12:36 ` Zheng Liu
2012-09-09 17:27 ` [PATCH 5/7] ext4: serialize unlocked dio reads with truncate Dmitry Monakhov
2012-09-10 9:54 ` Jan Kara
2012-09-09 17:27 ` [PATCH 6/7] ext4: endless truncate due to nonlocked dio readers V2 Dmitry Monakhov
2012-09-13 10:41 ` Zheng Liu
2012-09-13 12:07 ` Jan Kara
2012-09-13 12:57 ` Zheng Liu
2012-09-13 14:34 ` Jan Kara
2012-09-13 23:31 ` Zheng Liu
2012-09-09 17:27 ` [PATCH 7/7] ext4: serialize truncate with owerwrite DIO workers V2 Dmitry Monakhov
2012-09-13 10:37 ` Zheng Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ehmagoxj.fsf@openvz.org \
--to=dmonakhov@openvz.org \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=wenqing.lz@taobao.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.