linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: Miao Xie <miaox@cn.fujitsu.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v3] Btrfs: fix race of using total_bytes_pinned
Date: Wed, 2 Jul 2014 15:52:55 +0800	[thread overview]
Message-ID: <20140702075254.GA20612@localhost.localdomain> (raw)
In-Reply-To: <53B36E41.4030601@cn.fujitsu.com>

On Wed, Jul 02, 2014 at 10:28:17AM +0800, Miao Xie wrote:
> On Tue, 1 Jul 2014 17:28:46 +0800, Liu Bo wrote:
> > This percpu counter @total_bytes_pinned is introduced to skip unnecessary
> > operations of 'commit transaction', it accounts for those space we may free
> > but are stuck in delayed refs.
> > 
> > And we zero out @space_info->total_bytes_pinned every transaction period so
> > we have a better idea of how much space we'll actually free up by committing
> > this transaction.  However, we do the 'zero out' part a little earlier, before
> > we actually unpin space, so we end up returning ENOSPC when we actually have
> > free space that's just unpinned from committing transaction.
> > 
> > xfstests/generic/074 complained then.
> > 
> > This fixes it by actually accounting the percpu pinned number when 'unpin',
> > and when finding space for writing, if it finds that pinned bytes is less
> > than needed, we first try again to check if we have space now, as someone
> > may have committed transaction, yes means we're good, while no means we
> > really have run out of space.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> > v3: Really account the percpu pinned number when unpin, instead of zeroing
> >     it out, as we set transaction with UNBLOCKED before 'unpin', zeroing it
> >     out may end up with messing percpu pinned number(suggested by Miao).
> > v2: Add missing brakets for if statement
> > 
> >  fs/btrfs/extent-tree.c | 20 ++++++++++++--------
> >  1 file changed, 12 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 99c2539..f36fb13 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -3685,7 +3685,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
> >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> >  	struct btrfs_fs_info *fs_info = root->fs_info;
> >  	u64 used;
> > -	int ret = 0, committed = 0, alloc_chunk = 1;
> > +	int ret = 0, committed = 0, alloc_chunk = 1, check_pinned = 0;
> >  
> >  	/* make sure bytes are sectorsize aligned */
> >  	bytes = ALIGN(bytes, root->sectorsize);
> > @@ -3756,11 +3756,18 @@ alloc:
> >  		 * allocation don't bother committing the transaction.
> >  		 */
> >  		if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
> > -					   bytes) < 0)
> > -			committed = 1;
> > +					   bytes) < 0) {
> > +			if (check_pinned) {
> > +				committed = 1;	/* really run out of space. */
> > +			} else {
> > +				check_pinned = 1;
> > +				spin_unlock(&data_sinfo->lock);
> > +				goto again;
> > +			}
> > +		}
> 
> Is there anything wrong in the original code? I think we needn't change the code here
> because we are not sure anyone would commit the transaction, and though someone is
> committing the transaction, the process is slow, we still need commit the transaction
> by ourselves mostly(In fact, this commit is just to wait the transaction to complete)
> (The above analysis has not been confirmed)

oops, I agree that they're no more needed.

I made these changes based on zeroing pinned number in
btrfs_prepare_extent_commit(), if someone has committed the transaction and
been running btrfs_prepare_extent_commit(), the pinned number is here 0 byte
but we'll own free space as soon as finishing btrfs_finish_extent_commit()
later, so I made it 'goto again' to see if we can skip committing transaction.

However, we don't need it any more, now pinned number is calculated within
@sinfo->lock in unpin_extent_range(), (its value < bytes) means we have no
enough space for this write.

-liubo

> 
> Thanks
> Miao
> 
> >  		spin_unlock(&data_sinfo->lock);
> >  
> > -		/* commit the current transaction and try again */
> > +	/* commit the current transaction and try again */
> >  commit_trans:
> >  		if (!committed &&
> >  		    !atomic_read(&root->fs_info->open_ioctl_trans)) {
> > @@ -5678,7 +5685,6 @@ void btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
> >  	struct btrfs_caching_control *next;
> >  	struct btrfs_caching_control *caching_ctl;
> >  	struct btrfs_block_group_cache *cache;
> > -	struct btrfs_space_info *space_info;
> >  
> >  	down_write(&fs_info->commit_root_sem);
> >  
> > @@ -5701,9 +5707,6 @@ void btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
> >  
> >  	up_write(&fs_info->commit_root_sem);
> >  
> > -	list_for_each_entry_rcu(space_info, &fs_info->space_info, list)
> > -		percpu_counter_set(&space_info->total_bytes_pinned, 0);
> > -
> >  	update_global_block_rsv(fs_info);
> >  }
> >  
> > @@ -5741,6 +5744,7 @@ static int unpin_extent_range(struct btrfs_root *root, u64 start, u64 end)
> >  		spin_lock(&cache->lock);
> >  		cache->pinned -= len;
> >  		space_info->bytes_pinned -= len;
> > +		percpu_counter_add(&space_info->total_bytes_pinned, -len);
> >  		if (cache->ro) {
> >  			space_info->bytes_readonly += len;
> >  			readonly = true;
> > 
> 

      reply	other threads:[~2014-07-02  7:53 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-01  9:28 [PATCH v3] Btrfs: fix race of using total_bytes_pinned Liu Bo
2014-07-02  2:28 ` Miao Xie
2014-07-02  7:52   ` Liu Bo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140702075254.GA20612@localhost.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=miaox@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).