linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 7/7] writeback: Avoid iput() from flusher thread
Date: Wed, 21 Mar 2012 10:50:05 +1100	[thread overview]
Message-ID: <20120320235005.GT5091@dastard> (raw)
In-Reply-To: <1332284191-21076-8-git-send-email-jack@suse.cz>

On Tue, Mar 20, 2012 at 11:56:31PM +0100, Jan Kara wrote:
> Doing iput() from flusher thread (writeback_sb_inodes()) can create problems
> because iput() can do a lot of work - for example truncate the inode if it's
> the last iput on unlinked file. Some filesystems depend on flusher thread
> progressing (e.g. because they need to flush delay allocated blocks to reduce
> allocation uncertainty) and so flusher thread doing truncate creates
> interesting dependencies and possibilities for deadlocks.
> 
> We get rid of iput() in flusher thread by using the fact that I_SYNC inode
> flag effectively pins the inode in memory. So if we take care to either hold
> i_lock or have I_SYNC set, we can get away without taking inode reference
> in writeback_sb_inodes().
> 
> To make things work we have to move waiting for I_SYNC from end_writeback() to
> evict() just before calling of ->evict_inode. This is because several
> filesystems call end_writeback() after they have deleted the inode (btrfs,
> gfs2, ...) and other filesystems (ext3, ext4, reiserfs, ...) can deadlock when
> waiting for I_SYNC because they call end_writeback() from within a transaction.
> Both were not really a problem previously because flusher thread and
> ->evict_inode() could not run in parallel but now these two could race.
> So moving of I_SYNC wait prevents possible races..
> 
> As a side effect of these changes, we also fix possible use-after-free in
> wb_writeback() because inode_wait_for_writeback() call could try to reacquire
> i_lock on the inode that was already free.
.....

> diff --git a/fs/inode.c b/fs/inode.c
> index d3ebdbe..3869714 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -510,7 +510,6 @@ void end_writeback(struct inode *inode)
>  	BUG_ON(!list_empty(&inode->i_data.private_list));
>  	BUG_ON(!(inode->i_state & I_FREEING));
>  	BUG_ON(inode->i_state & I_CLEAR);
> -	inode_sync_wait(inode);
>  	/* don't need i_lock here, no concurrent mods to i_state */
>  	inode->i_state = I_FREEING | I_CLEAR;
>  }
> @@ -541,6 +540,18 @@ static void evict(struct inode *inode)
>  
>  	inode_sb_list_del(inode);
>  
> +	/*
> +	 * Wait for flusher thread to be done with the inode so that filesystem
> +	 * does not start destroying it while writeback is still running. Since
> +	 * the inode has I_FREEING set, flusher thread won't start new work on
> +	 * the inode.  We just have to wait for running writeback to finish. We
> +	 * must use i_lock here because flusher thread might be working with
> +	 * the inode without I_SYNC set but under i_lock.
> +	 */
> +	spin_lock(&inode->i_lock);
> +	inode_wait_for_writeback(inode);
> +	spin_unlock(&inode->i_lock);
> +

Why move this wait from end_writeback() to here?  The whole point
of end_writeback() is to provide a barrier that guarantees that
there is no async writeback running when it returns, so it seems
strange to move the barrier out of the function that is supposed to
provide the barrier....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2012-03-20 23:50 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-20 22:56 [PATCH 0/7 v2] writeback: Avoid iput() from flusher thread Jan Kara
2012-03-20 22:56 ` [PATCH 1/7] writeback: Move clearing of I_SYNC into inode_sync_complete() Jan Kara
2012-04-30 14:36   ` Christoph Hellwig
2012-04-30 21:30     ` Jan Kara
2012-03-20 22:56 ` [PATCH 2/7] writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() Jan Kara
2012-04-30 14:38   ` Christoph Hellwig
2012-03-20 22:56 ` [PATCH 3/7] writeback: Move I_DIRTY_PAGES handling Jan Kara
2012-03-22  2:41   ` Fengguang Wu
2012-03-22  8:35     ` Jan Kara
2012-03-28  3:11       ` Fengguang Wu
2012-03-28 15:12         ` Christoph Hellwig
2012-04-30 14:39           ` Christoph Hellwig
2012-04-30 14:41   ` Christoph Hellwig
2012-04-30 21:21     ` Jan Kara
2012-03-20 22:56 ` [PATCH 4/7] writeback: Separate inode requeueing after writeback Jan Kara
2012-04-30 14:43   ` Christoph Hellwig
2012-04-30 21:42     ` Jan Kara
2012-03-20 22:56 ` [PATCH 5/7] writeback: Remove wb->list_lock from writeback_single_inode() Jan Kara
2012-04-30 14:44   ` Christoph Hellwig
2012-03-20 22:56 ` [PATCH 6/7] writeback: Refactor writeback_single_inode() Jan Kara
2012-03-20 23:35   ` Dave Chinner
2012-03-21 10:03     ` Jan Kara
2012-04-30 14:46   ` Christoph Hellwig
2012-03-20 22:56 ` [PATCH 7/7] writeback: Avoid iput() from flusher thread Jan Kara
2012-03-20 23:50   ` Dave Chinner [this message]
2012-03-21 10:25     ` Jan Kara
2012-03-22  3:03       ` Fengguang Wu
2012-03-22  6:27       ` Dave Chinner
2012-03-22  9:50         ` Jan Kara
2012-03-22  3:01   ` Fengguang Wu
2012-04-30 15:30   ` Christoph Hellwig
2012-04-30 21:58     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120320235005.GT5091@dastard \
    --to=david@fromorbit.com \
    --cc=fengguang.wu@intel.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).