All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: NeilBrown <neilb@suse.com>
Cc: Jan Kara <jack@suse.cz>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>, Tejun Heo <tj@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>,
	NeilBrown <neilb@suse.de>
Subject: Re: [PATCH 07/10] writeback: Implement reliable switching to default writeback structure
Date: Fri, 10 Feb 2017 14:20:07 +0100	[thread overview]
Message-ID: <20170210132007.GA5491@quack2.suse.cz> (raw)
In-Reply-To: <87zihuu6rj.fsf@notabene.neil.brown.name>

On Fri 10-02-17 13:19:44, NeilBrown wrote:
> On Thu, Feb 09 2017, Jan Kara wrote:
> 
> > Currently switching of inode between different writeback structures is
> > asynchronous and not guaranteed to succeed. Add a variant of switching
> > that is synchronous and reliable so that it can reliably move inode to
> > the default writeback structure (bdi->wb) when writeback on bdi is going
> > to be shutdown.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/fs-writeback.c         | 60 ++++++++++++++++++++++++++++++++++++++++-------
> >  include/linux/fs.h        |  3 ++-
> >  include/linux/writeback.h |  6 +++++
> >  3 files changed, 60 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 23dc97cf2a50..52992a1036b1 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -332,14 +332,11 @@ struct inode_switch_wbs_context {
> >  	struct work_struct	work;
> >  };
> >  
> > -static void inode_switch_wbs_work_fn(struct work_struct *work)
> > +static void do_inode_switch_wbs(struct inode *inode,
> > +				struct bdi_writeback *new_wb)
> >  {
> > -	struct inode_switch_wbs_context *isw =
> > -		container_of(work, struct inode_switch_wbs_context, work);
> > -	struct inode *inode = isw->inode;
> >  	struct address_space *mapping = inode->i_mapping;
> >  	struct bdi_writeback *old_wb = inode->i_wb;
> > -	struct bdi_writeback *new_wb = isw->new_wb;
> >  	struct radix_tree_iter iter;
> >  	bool switched = false;
> >  	void **slot;
> > @@ -436,15 +433,29 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
> >  	spin_unlock(&new_wb->list_lock);
> >  	spin_unlock(&old_wb->list_lock);
> >  
> > +	/*
> > +	 * Make sure waitqueue_active() check in wake_up_bit() cannot happen
> > +	 * before I_WB_SWITCH is cleared. Pairs with the barrier in
> > +	 * set_task_state() after wait_on_bit() added waiter to the wait queue.
> 
> I think you mean "set_current_state()" ??

Yes, I'll fix that.

> It's rather a trap for the unwary, this need for a smp_mb().
> Greping for wake_up_bit(), I find quite a few places with barriers -
> sometimes clear_bit_unlock() or spin_unlock() - but
> 
> fs/block_dev.c-         whole->bd_claiming = NULL;
> fs/block_dev.c:         wake_up_bit(&whole->bd_claiming, 0);
> 
> fs/cifs/connect.c-      clear_bit(TCON_LINK_PENDING, &tlink->tl_flags);
> fs/cifs/connect.c:      wake_up_bit(&tlink->tl_flags, TCON_LINK_PENDING);
> 
> fs/cifs/misc.c-                 clear_bit(CIFS_INODE_PENDING_WRITERS, &cinode->flags);
> fs/cifs/misc.c:                 wake_up_bit(&cinode->flags, CIFS_INODE_PENDING_WRITERS);
> 
> (several more in cifs)
> 
> net/sunrpc/xprt.c-      clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> net/sunrpc/xprt.c-      xprt->ops->close(xprt);
> net/sunrpc/xprt.c-      xprt_release_write(xprt, NULL);
> net/sunrpc/xprt.c:      wake_up_bit(&xprt->state, XPRT_LOCKED);
> (there might be a barrier in ->close or xprt_release_write() I guess)
> 
> security/keys/gc.c-             clear_bit(KEY_GC_REAPING_KEYTYPE, &key_gc_flags);
> security/keys/gc.c:             wake_up_bit(&key_gc_flags, KEY_GC_REAPING_KEYTYPE);

Yup, the above look like bugs.

> I wonder if there is a good way to make this less error-prone.
> I would suggest that wake_up_bit() should always have a barrier, and
> __wake_up_bit() is needed to avoid it, but there is already a
> __wake_up_bit() with a slightly different interface.

Yeah, it is error-prone as all waitqueue_active() optimizations...
 
> In this case, you have a spin_unlock() just before the wake_up_bit().
> It is my understand that it would provide enough of a barrier (all
> writes before are globally visible after), so do you really need
> the barrier here?

I believe I do. spin_unlock() is a semi-permeable barrier - i.e., any read
or write from "outside" can be moved inside. So CPU is free to prefetch
values for waitqueue active checks before the spinlock is unlocked or even
before clearing I_WB_SWITCH bit.

> > +	 */
> > +	smp_mb();
> > +	wake_up_bit(&inode->i_state, __I_WB_SWITCH);
> > +
> >  	if (switched) {
> >  		wb_wakeup(new_wb);
> >  		wb_put(old_wb);
> >  	}
> > -	wb_put(new_wb);
> > +}
> >  
> > -	iput(inode);
> > -	kfree(isw);
> > +static void inode_switch_wbs_work_fn(struct work_struct *work)
> > +{
> > +	struct inode_switch_wbs_context *isw =
> > +		container_of(work, struct inode_switch_wbs_context, work);
> >  
> > +	do_inode_switch_wbs(isw->inode, isw->new_wb);
> > +	wb_put(isw->new_wb);
> > +	iput(isw->inode);
> > +	kfree(isw);
> >  	atomic_dec(&isw_nr_in_flight);
> >  }
> >  
> > @@ -521,6 +532,39 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
> >  }
> >  
> >  /**
> > + * inode_switch_to_default_wb_sync - change the wb association of an inode to
> > + *	the default writeback structure synchronously
> > + * @inode: target inode
> > + *
> > + * Switch @inode's wb association to the default writeback structure (bdi->wb).
> > + * Unlike inode_switch_wbs() the switching is performed synchronously and we
> > + * guarantee the inode is switched to the default writeback structure when this
> > + * function returns. Nothing prevents from someone else switching inode to
> > + * another writeback structure just when we are done though. Preventing that is
> > + * upto the caller if needed.
> > + */
> > +void inode_switch_to_default_wb_sync(struct inode *inode)
> > +{
> > +	struct backing_dev_info *bdi = inode_to_bdi(inode);
> > +
> > +	/* while holding I_WB_SWITCH, no one else can update the association */
> > +	spin_lock(&inode->i_lock);
> > +	if (WARN_ON_ONCE(inode->i_state & I_FREEING) ||
> > +	    !inode_to_wb_is_valid(inode) || inode_to_wb(inode) == &bdi->wb) {
> > +		spin_unlock(&inode->i_lock);
> > +		return;
> > +	}
> > +	__inode_wait_for_state_bit(inode, __I_WB_SWITCH);
> 
> I note that __inode_wait_for_state_bit() can drop and reclaim ->i_lock.
> is it possible that:
>   !inode_to_wb_is_valid(inode) || inode_to_wb(inode) == &bdi->wb)
> 
> could change while ->i_lock is unlocked?
> It would be particular unfortunate if inode_to_wb(inode) became &bdi->wb
> due to some thing thread, as do_inode_switch_wbs() will deadlock if
>   inode_to_wb(inode) == &bdi->wb
> 
> i.e. do you need to repeat the test?

That's a very good question and I think you are right that I need to repeat
the checks for inode->i_wb. Will fix. Thanks for review!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2017-02-10 13:20 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
2017-02-09 12:44 ` [PATCH 01/10] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
2017-02-12  3:58   ` Tejun Heo
2017-02-20 14:53     ` Jan Kara
2017-02-09 12:44 ` [PATCH 02/10] block: Unhash also block device inode for the whole device Jan Kara
2017-02-12  4:16   ` Tejun Heo
2017-02-09 12:44 ` [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire() Jan Kara
2017-02-09 15:54   ` Jan Kara
2017-02-12  4:22     ` Tejun Heo
2017-02-09 12:44 ` [PATCH 04/10] block: Move bdi_unregister() to del_gendisk() Jan Kara
2017-02-10  2:21   ` NeilBrown
2017-02-12  4:31   ` Tejun Heo
2017-02-09 12:44 ` [PATCH 05/10] writeback: Generalize and standardize I_SYNC waiting function Jan Kara
2017-02-12  4:32   ` Tejun Heo
2017-02-09 12:44 ` [PATCH 06/10] writeback: Move __inode_wait_for_state_bit Jan Kara
2017-02-09 12:44 ` [PATCH 07/10] writeback: Implement reliable switching to default writeback structure Jan Kara
2017-02-10  2:19   ` NeilBrown
2017-02-10 13:20     ` Jan Kara [this message]
2017-02-09 12:44 ` [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara
2017-02-12  4:40   ` Tejun Heo
2017-02-20 16:58     ` Jan Kara
2017-02-09 12:44 ` [PATCH 09/10] kobject: Export kobject_get_unless_zero() Jan Kara
2017-02-12  4:41   ` Tejun Heo
2017-02-09 12:44 ` [PATCH 10/10] block: Fix oops scsi_disk_get() Jan Kara
2017-02-12  4:43   ` Tejun Heo
2017-02-09 14:52 ` [PATCH 0/10] block: Fix block device shutdown related races Thiago Jung Bauermann
2017-02-09 15:48   ` Jan Kara
2017-02-13 14:27 ` Thiago Jung Bauermann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170210132007.GA5491@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=axboe@kernel.dk \
    --cc=bauerman@linux.vnet.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=neilb@suse.com \
    --cc=neilb@suse.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.