linux-ext4.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, tytso@mit.edu, wenqing.lz@taobao.com
Subject: Re: [PATCH 4/7] ext4: fsync should wait for DIO writers
Date: Wed, 12 Sep 2012 16:02:18 +0200
Message-ID: <20120912140218.GC5726@quack.suse.cz>
In-Reply-To: <87bohegn97.fsf@openvz.org>

On Mon 10-09-12 14:56:04, Dmitry Monakhov wrote:
> On Mon, 10 Sep 2012 11:51:35 +0200, Jan Kara <jack@suse.cz> wrote:
> > > Even more, i_mutex is not held during punch_hole, which obviously
> > > results in dangerous data corruption due to write-after-free.
> >   Yes, that's a bug. I also noticed that but didn't get to fixing it (I'm
> > actually working on a longer-term fix using range locking, but that's
> > more of a research project, so having at least the most blatant locking
> > problems fixed somehow is good).
> Yes, you're right. In order to do things right we should block:
> 1) direct I/O
> 2) pagecache/mmap users (writeback, readpage)
> 
> I assume I've fixed (1), but (2) still exists.
> 
> My current plan is to do something similar to writeback:
> 
>    down_write(&EXT4_I(inode)->i_data_sem);
>    while (index <= end && pagevec_lookup(&pvec, mapping, index,...)) {
>       for (i = 0; i < pagevec_count(&pvec); i++) {
>         lock_page(pvec.pages[i]);
  Here you need to use trylock to avoid possible deadlocks...

>         zero_user(pvec.pages[i], 0, PAGE_SIZE);
>         ret = try_to_release_page(pvec.pages[i], GFP_NOFS);
>         index = pvec.pages[i]->index + 1;
>       }
>    }
>    /* At this moment we know that we have locked all pages in the range.
>     * NOTE!!!! Currently ext4_ext_remove_space() may drop i_data_sem
>     * internally, so it should be modified to exit once i_mutex was dropped.
>     */
>    ret = ext4_ext_remove_space(inode, from, to, NO_RELOCK);
>    for (i = 0; i < pagevec_count(&pvec); i++)
>          unlock_page(pvec.pages[i]);
>    up_write(&EXT4_I(inode)->i_data_sem);
> 
> The number of locked pages should not be too large.
> Or, even better, instead of massive page locking we can lock pages
> one by one and simulate a fake writeback, so all new writers will
> wait on the writeback bit, but readers will see zeroes.
>    down_write(&EXT4_I(inode)->i_data_sem);
>    while (index <= end && pagevec_lookup(&pvec, mapping, index,...)) {
>       for (i = 0; i < pagevec_count(&pvec); i++) {
>         lock_page(pvec.pages[i]);
>         zero_user(pvec.pages[i], 0, PAGE_SIZE);
>         ret = try_to_release_page(pvec.pages[i], GFP_NOFS);
>         set_page_writeback(pvec.pages[i]);
>         unlock_page(pvec.pages[i]);
>         index = pvec.pages[i]->index + 1;
>       }
>    }
>
>    ret = ext4_ext_remove_space(inode, from, to, NO_RELOCK);
>    for (i = 0; i < pagevec_count(&pvec); i++)
>          end_page_writeback(pvec.pages[i]);
>    up_write(&EXT4_I(inode)->i_data_sem);
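The fake-writeback variant quoted above relies on new writers waiting while a page's writeback bit is set, while readers go ahead and see the zeroed contents. A minimal userspace model of that behaviour follows, under the assumption (taken from the proposal, not verified against ext4's write paths) that writers always wait on the bit. wb_page and its helpers are hypothetical, and a pthread condition variable stands in for the kernel's page wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical page with a writeback flag and a wait queue for it. */
struct wb_page {
	pthread_mutex_t lock;
	pthread_cond_t wb_done;   /* signalled when writeback ends */
	bool writeback;
	int data;
};

static void wb_page_init(struct wb_page *pg, int data)
{
	pthread_mutex_init(&pg->lock, NULL);
	pthread_cond_init(&pg->wb_done, NULL);
	pg->writeback = false;
	pg->data = data;
}

/* Zero the page and flag it, like zeroing + set_page_writeback() under the page lock. */
static void start_fake_writeback(struct wb_page *pg)
{
	pthread_mutex_lock(&pg->lock);
	pg->data = 0;
	pg->writeback = true;
	pthread_mutex_unlock(&pg->lock);
}

/* Clear the flag and wake any waiting writers, like end_page_writeback(). */
static void end_fake_writeback(struct wb_page *pg)
{
	pthread_mutex_lock(&pg->lock);
	assert(pg->writeback);   /* ending a writeback that never started is a bug */
	pg->writeback = false;
	pthread_cond_broadcast(&pg->wb_done);
	pthread_mutex_unlock(&pg->lock);
}

/* Writers sleep until the fake writeback is over. */
static void page_write(struct wb_page *pg, int val)
{
	pthread_mutex_lock(&pg->lock);
	while (pg->writeback)
		pthread_cond_wait(&pg->wb_done, &pg->lock);
	pg->data = val;
	pthread_mutex_unlock(&pg->lock);
}

/* Readers never wait, so during the punch they observe zeroes. */
static int page_read(struct wb_page *pg)
{
	pthread_mutex_lock(&pg->lock);
	int val = pg->data;
	pthread_mutex_unlock(&pg->lock);
	return val;
}
```

One side effect the model leaves out: in the kernel, other code that waits on the writeback bit, such as fsync waiting for in-flight I/O, would presumably also wait on these fake writebacks.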
  Oh, that's a hack. Please don't do that. Using page locks is cleaner,
although I agree it's not very good either. That's why I decided not to
lose time on suboptimal solutions and to look into range locking instead...
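The range locking mentioned above is not shown in this thread. As a hedged illustration only, here is a very small exclusive range lock: a caller blocks while any already-held range overlaps the one it wants, so a hole punch over [from, to] would exclude DIO and page cache users of the same bytes without blocking the rest of the file. All names are hypothetical; a real implementation would likely use an interval tree and shared/exclusive modes rather than a linked list with exclusive ranges only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* One held range, inclusive on both ends. */
struct range {
	unsigned long start, end;
	struct range *next;
};

/* Hypothetical per-inode set of held ranges. */
struct range_lock_set {
	pthread_mutex_t mu;
	pthread_cond_t released;   /* broadcast whenever a range is unlocked */
	struct range *held;
};

static void range_lock_set_init(struct range_lock_set *s)
{
	pthread_mutex_init(&s->mu, NULL);
	pthread_cond_init(&s->released, NULL);
	s->held = NULL;
}

/* Caller must hold s->mu. */
static bool overlaps_held(const struct range_lock_set *s,
			  unsigned long start, unsigned long end)
{
	for (const struct range *r = s->held; r != NULL; r = r->next)
		if (r->start <= end && start <= r->end)
			return true;
	return false;
}

/* Block until [start, end] overlaps no held range, then hold it. */
static struct range *range_lock(struct range_lock_set *s,
				unsigned long start, unsigned long end)
{
	assert(start <= end);
	struct range *r = malloc(sizeof(*r));
	if (r == NULL)
		return NULL;
	r->start = start;
	r->end = end;

	pthread_mutex_lock(&s->mu);
	while (overlaps_held(s, start, end))
		pthread_cond_wait(&s->released, &s->mu);
	r->next = s->held;
	s->held = r;
	pthread_mutex_unlock(&s->mu);
	return r;
}

static void range_unlock(struct range_lock_set *s, struct range *r)
{
	pthread_mutex_lock(&s->mu);
	for (struct range **pp = &s->held; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == r) {
			*pp = r->next;
			break;
		}
	}
	pthread_cond_broadcast(&s->released);   /* waiters re-check for overlap */
	pthread_mutex_unlock(&s->mu);
	free(r);
}

/* Non-blocking query: does [start, end] overlap any held range? */
static bool range_is_locked(struct range_lock_set *s,
			    unsigned long start, unsigned long end)
{
	pthread_mutex_lock(&s->mu);
	bool busy = overlaps_held(s, start, end);
	pthread_mutex_unlock(&s->mu);
	return busy;
}
```

Because waiters only re-check for overlap, disjoint ranges proceed in parallel, which is the property page-at-a-time locking approximates less cleanly.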

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

Thread overview: 30+ messages
2012-09-09 17:27 [PATCH 0/7] ext4: Bunch of DIO/AIO fixes Dmitry Monakhov
2012-09-09 17:27 ` [PATCH 1/7] ext4: ext4_inode_info diet Dmitry Monakhov
2012-09-13 10:50   ` Zheng Liu
2012-09-13 11:15     ` Dmitry Monakhov
2012-09-15 15:53       ` Theodore Ts'o
2012-09-09 17:27 ` [PATCH 2/7] ext4: completed_io locking cleanup Dmitry Monakhov
2012-09-10  9:23   ` Jan Kara
2012-09-10 10:19     ` Dmitry Monakhov
2012-09-13 10:48   ` Zheng Liu
2012-09-09 17:27 ` [PATCH 3/7] ext4: serialize dio nolocked reads with defrag workers V2 Dmitry Monakhov
2012-09-10  9:31   ` Jan Kara
2012-09-10 10:00     ` Jan Kara
2012-09-09 17:27 ` [PATCH 4/7] ext4: fsync should wait for DIO writers Dmitry Monakhov
2012-09-10  9:51   ` Jan Kara
2012-09-10 10:56     ` Dmitry Monakhov
2012-09-12 14:02       ` Jan Kara [this message]
2012-09-12  5:40     ` Zheng Liu
2012-09-13 10:46   ` Zheng Liu
2012-09-13 11:01     ` Dmitry Monakhov
2012-09-13 12:36       ` Zheng Liu
2012-09-09 17:27 ` [PATCH 5/7] ext4: serialize unlocked dio reads with truncate Dmitry Monakhov
2012-09-10  9:54   ` Jan Kara
2012-09-09 17:27 ` [PATCH 6/7] ext4: endless truncate due to nonlocked dio readers V2 Dmitry Monakhov
2012-09-13 10:41   ` Zheng Liu
2012-09-13 12:07     ` Jan Kara
2012-09-13 12:57       ` Zheng Liu
2012-09-13 14:34         ` Jan Kara
2012-09-13 23:31           ` Zheng Liu
2012-09-09 17:27 ` [PATCH 7/7] ext4: serialize truncate with owerwrite DIO workers V2 Dmitry Monakhov
2012-09-13 10:37   ` Zheng Liu