From: Jan Kara <jack@suse.cz>
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, jack@suse.cz,
	wenqing.lz@taobao.com
Subject: Re: [PATCH 4/7] ext4: fsync should wait for DIO writers
Date: Mon, 10 Sep 2012 11:51:35 +0200
Message-ID: <20120910095135.GF22903@quack.suse.cz>
In-Reply-To: <1347211634-11509-5-git-send-email-dmonakhov@openvz.org>

On Sun 09-09-12 21:27:11, Dmitry Monakhov wrote:
> fsync and punch_hole are the places where we have to wait for all
> existing writers (writeback, aio, dio), but currently we simply
> flush pending end_io requests, which is not sufficient.
  Why not? I guess you mean that there can be DIO in flight for which
end_io() has not been called yet, so it is not in the queue yet? But that
is OK - we have not yet called aio_complete() for that IO, so for userspace
the write has not happened yet. Thus there's no need to flush it to disk -
fsync() does not say anything about writes that are still in progress while
fsync is called.
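
To make the ordering argument concrete, here is a minimal userspace sketch
(not part of the patch; the helper name write_then_fsync() is made up for
illustration). It uses plain synchronous write(), where "the write has
happened" simply means write() has returned. For AIO the analogous point
would be the moment the completion event has been delivered. fsync() is only
expected to cover writes that reached that point before it was called.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes of buf to path, then fsync.
 * Returns 0 on success, -1 on any error.
 *
 * The fsync() below covers the data only because write() has already
 * returned, i.e. userspace has been told the write completed.
 */
int write_then_fsync(const char *path, const char *buf, size_t len)
{
	int fd;
	ssize_t done;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* From userspace's point of view the write "happens" here. */
	done = write(fd, buf, len);
	if (done < 0 || (size_t)done != len) {
		close(fd);
		return -1;
	}

	/* Only writes completed before this call are covered by it. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}

	return close(fd);
}
```

A write that another thread submits while fsync() is still running has not
been reported as complete to anybody, so nothing in this contract requires
fsync() to wait for it.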

> Even more, i_mutex is not held during punch_hole, which obviously
> results in dangerous data corruption due to write-after-free.
  Yes, that's a bug. I also noticed that but didn't get to fixing it (I'm
actually working on a more long-term fix using range locking, but that's
more of a research project, so having at least the most blatant locking
problems fixed somehow is good).
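
For readers who have not looked at inode_dio_wait(), the idea of draining
in-flight DIO before freeing blocks can be modelled in userspace roughly as
below. This is only a sketch with pthreads: struct dio_model and all
dio_model_*() names are hypothetical, and the in_flight field merely stands
in for the inode's in-flight DIO counter as I understand it. It is not how
the kernel implements the wait.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for an inode's DIO accounting. */
struct dio_model {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when in_flight drops to 0 */
	int in_flight;		/* models the in-flight DIO counter */
};

void dio_model_init(struct dio_model *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->idle, NULL);
	m->in_flight = 0;
}

/* A DIO request has been submitted. */
void dio_model_begin(struct dio_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->in_flight++;
	pthread_mutex_unlock(&m->lock);
}

/* A DIO request has completed; wake waiters once none are left. */
void dio_model_done(struct dio_model *m)
{
	pthread_mutex_lock(&m->lock);
	assert(m->in_flight > 0);
	if (--m->in_flight == 0)
		pthread_cond_broadcast(&m->idle);
	pthread_mutex_unlock(&m->lock);
}

/* Block until no DIO is in flight (the role inode_dio_wait() plays). */
void dio_model_wait(struct dio_model *m)
{
	pthread_mutex_lock(&m->lock);
	while (m->in_flight > 0)
		pthread_cond_wait(&m->idle, &m->lock);
	pthread_mutex_unlock(&m->lock);
}

/* Simulated writer thread: completes one previously started request. */
void *dio_model_writer(void *arg)
{
	dio_model_done(arg);
	return NULL;
}
```

In this model a punch-hole style operation would first take whatever lock
keeps new writers out, then call dio_model_wait(), and only then free the
range. Without the first step new requests could start right after the wait
returns, which is why the wait alone does not close the race.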

								Honza

> 
> This patch performs the following changes:
> 
> - Guard punch_hole with i_mutex
> - fsync and punch_hole now wait for all writers in flight
>   NOTE: XXX write-after-free race is still possible because
>   truncate_pagecache_range() is not completely reliable and there
>   is no easy way to stop writeback while punch_hole is in progress.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/extents.c |   10 ++++++++--
>  fs/ext4/fsync.c   |    1 +
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index e993879..8252651 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4845,6 +4845,7 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  			return err;
>  	}
>  
> +	mutex_lock(&inode->i_mutex);
>  	/* Now release the pages */
>  	if (last_page_offset > first_page_offset) {
>  		truncate_pagecache_range(inode, first_page_offset,
> @@ -4852,12 +4853,15 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	}
>  
>  	/* finish any pending end_io work */
> +	inode_dio_wait(inode);
>  	ext4_flush_completed_IO(inode);
>  
>  	credits = ext4_writepage_trans_blocks(inode);
>  	handle = ext4_journal_start(inode, credits);
> -	if (IS_ERR(handle))
> -		return PTR_ERR(handle);
> +	if (IS_ERR(handle)) {
> +		err = PTR_ERR(handle);
> +		goto out_mutex;
> +	}
>  
>  	err = ext4_orphan_add(handle, inode);
>  	if (err)
> @@ -4951,6 +4955,8 @@ out:
>  	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
>  	ext4_mark_inode_dirty(handle, inode);
>  	ext4_journal_stop(handle);
> +out_mutex:
> +	mutex_unlock(&inode->i_mutex);
>  	return err;
>  }
>  int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 24f3719..290c5cf 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -204,6 +204,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  	if (inode->i_sb->s_flags & MS_RDONLY)
>  		goto out;
>  
> +	inode_dio_wait(inode);
>  	ret = ext4_flush_completed_IO(inode);
>  	if (ret < 0)
>  		goto out;
> -- 
> 1.7.7.6
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
