From: Omar Sandoval <osandov@osandov.com>
To: Nikolay Borisov <nborisov@suse.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>
Subject: Re: [PATCH v2 08/12] Btrfs: fix ENOSPC caused by orphan items reservations
Date: Thu, 10 May 2018 23:51:22 -0700
Message-ID: <20180511065122.GB30748@vader>
In-Reply-To: <687716e4-9def-2990-b1f2-f757ccbc4ee9@suse.com>
On Fri, May 11, 2018 at 09:38:15AM +0300, Nikolay Borisov wrote:
> 
> 
> On 11.05.2018 03:11, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Currently, we keep space reserved for all inode orphan items until the
> > inode is evicted (i.e., all references to it are dropped). We hit an
> > issue where an application would keep a bunch of deleted files open (by
> > design) and thus keep a large amount of space reserved, causing ENOSPC
> > errors when other operations tried to reserve space. This long-standing
> > reservation isn't absolutely necessary for a couple of reasons:
> > 
> > - We can almost always make the reservation we need or steal from the
> >   global reserve for the orphan item
> > - If we can't, it's not the end of the world if we drop the orphan item
> >   on the floor and let the next mount clean it up
> > 
> > So, get rid of the persistent reservation and just reserve space in
> > btrfs_evict_inode().
> > 
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > ---
> >  fs/btrfs/btrfs_inode.h |  19 +++---
> >  fs/btrfs/inode.c       | 142 ++++++++++++-----------------------------
> >  2 files changed, 50 insertions(+), 111 deletions(-)
> > 
> > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> > index 234bae55b85d..2f466cf55790 100644
> > --- a/fs/btrfs/btrfs_inode.h
> > +++ b/fs/btrfs/btrfs_inode.h
> > @@ -20,16 +20,15 @@
> >   * new data the application may have written before commit.
> >   */
> >  #define BTRFS_INODE_ORDERED_DATA_CLOSE		0
> > -#define BTRFS_INODE_ORPHAN_META_RESERVED	1
> > -#define BTRFS_INODE_DUMMY			2
> > -#define BTRFS_INODE_IN_DEFRAG			3
> > -#define BTRFS_INODE_HAS_ORPHAN_ITEM		4
> > -#define BTRFS_INODE_HAS_ASYNC_EXTENT		5
> > -#define BTRFS_INODE_NEEDS_FULL_SYNC		6
> > -#define BTRFS_INODE_COPY_EVERYTHING		7
> > -#define BTRFS_INODE_IN_DELALLOC_LIST		8
> > -#define BTRFS_INODE_READDIO_NEED_LOCK		9
> > -#define BTRFS_INODE_HAS_PROPS		        10
> > +#define BTRFS_INODE_DUMMY			1
> > +#define BTRFS_INODE_IN_DEFRAG			2
> > +#define BTRFS_INODE_HAS_ORPHAN_ITEM		3
> > +#define BTRFS_INODE_HAS_ASYNC_EXTENT		4
> > +#define BTRFS_INODE_NEEDS_FULL_SYNC		5
> > +#define BTRFS_INODE_COPY_EVERYTHING		6
> > +#define BTRFS_INODE_IN_DELALLOC_LIST		7
> > +#define BTRFS_INODE_READDIO_NEED_LOCK		8
> > +#define BTRFS_INODE_HAS_PROPS		        9
> >  
> >  /* in memory btrfs inode */
> >  struct btrfs_inode {
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 348dc57920f5..b9a046b8c72c 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -3343,88 +3343,25 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
> >  /*
> >   * This creates an orphan entry for the given inode in case something goes wrong
> >   * in the middle of an unlink.
> > - *
> > - * NOTE: caller of this function should reserve 5 units of metadata for
> > - *	 this function.
> >   */
> >  int btrfs_orphan_add(struct btrfs_trans_handle *trans,
> >  		struct btrfs_inode *inode)
> >  {
> > -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> >  	struct btrfs_root *root = inode->root;
> > -	struct btrfs_block_rsv *block_rsv = NULL;
> > -	int reserve = 0;
> > -	bool insert = false;
> >  	int ret;
> >  
> > -	if (!root->orphan_block_rsv) {
> > -		block_rsv = btrfs_alloc_block_rsv(fs_info,
> > -						  BTRFS_BLOCK_RSV_TEMP);
> > -		if (!block_rsv)
> > -			return -ENOMEM;
> > -	}
> > -
> > -	if (!test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > -			      &inode->runtime_flags))
> > -		insert = true;
> > -
> > -	if (!test_and_set_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> > -			      &inode->runtime_flags))
> > -		reserve = 1;
> > -
> > -	spin_lock(&root->orphan_lock);
> > -	/* If someone has created ->orphan_block_rsv, be happy to use it. */
> > -	if (!root->orphan_block_rsv) {
> > -		root->orphan_block_rsv = block_rsv;
> > -	} else if (block_rsv) {
> > -		btrfs_free_block_rsv(fs_info, block_rsv);
> > -		block_rsv = NULL;
> > -	}
> > +	if (test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > +			     &inode->runtime_flags))
> > +		return 0;
> 
> How come this can return true? Shouldn't btrfs_orphan_add always be
> called for an inode which doesn't have an orphan item? Having this check
> seems to indicate there is some non-determinism in the lifetime of
> orphan items.
Another great point. This was needed back when we also inserted orphan
items for truncate, but it isn't needed anymore. I'll clean that up, too.
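A minimal userspace sketch of what dropping that check amounts to. It
assumes C11 atomics stand in for the kernel's test_and_set_bit() and
test_and_clear_bit(), and `struct model_inode`, `HAS_ORPHAN_ITEM`,
`model_orphan_add()` and `model_orphan_del()` are hypothetical names for
illustration, not btrfs code. Once callers guarantee that an add is
always followed by exactly one del, the early return in
btrfs_orphan_add() (and its counterpart in btrfs_orphan_del()) turns into
an assertion of that invariant rather than a silent no-op.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for BTRFS_INODE_HAS_ORPHAN_ITEM (bit 3 after this patch). */
#define HAS_ORPHAN_ITEM 3

/* Hypothetical model of the per-inode state touched by orphan add/del. */
struct model_inode {
	atomic_ulong runtime_flags;
};

/* Models root->orphan_inodes; static storage starts at zero. */
static atomic_int orphan_inodes;

/* Userspace analogue of test_and_set_bit(): returns the old bit value. */
static int model_test_and_set_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Userspace analogue of test_and_clear_bit(): returns the old bit value. */
static int model_test_and_clear_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

/*
 * Callers only add an orphan item for an inode that has none, so the old
 * bit must be clear; a set bit would be a caller bug, not a benign race.
 */
static void model_orphan_add(struct model_inode *inode)
{
	int had_item = model_test_and_set_bit(HAS_ORPHAN_ITEM,
					      &inode->runtime_flags);

	assert(!had_item);
	(void)had_item; /* silence unused warning when NDEBUG is set */
	atomic_fetch_add(&orphan_inodes, 1);
}

/* Deletion always follows a successful add, so the bit must be set. */
static void model_orphan_del(struct model_inode *inode)
{
	int had_item = model_test_and_clear_bit(HAS_ORPHAN_ITEM,
						&inode->runtime_flags);

	assert(had_item);
	(void)had_item;
	atomic_fetch_sub(&orphan_inodes, 1);
}
```

Calling model_orphan_add() twice on the same inode now trips the
assertion instead of quietly returning 0, which is exactly the lifetime
bug the question above is probing for. A kernel-side version would
presumably use btrfs's ASSERT() or a WARN_ON() rather than assert(3).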
> > -	if (insert)
> > -		atomic_inc(&root->orphan_inodes);
> > -	spin_unlock(&root->orphan_lock);
> > +	atomic_inc(&root->orphan_inodes);
> >  
> > -	/* grab metadata reservation from transaction handle */
> > -	if (reserve) {
> > -		ret = btrfs_orphan_reserve_metadata(trans, inode);
> > -		ASSERT(!ret);
> > -		if (ret) {
> > -			/*
> > -			 * dec doesn't need spin_lock as ->orphan_block_rsv
> > -			 * would be released only if ->orphan_inodes is
> > -			 * zero.
> > -			 */
> > -			atomic_dec(&root->orphan_inodes);
> > -			clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> > -				  &inode->runtime_flags);
> > -			if (insert)
> > -				clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > -					  &inode->runtime_flags);
> > -			return ret;
> > -		}
> > -	}
> > -
> > -	/* insert an orphan item to track this unlinked file */
> > -	if (insert) {
> > -		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
> > -		if (ret && ret != -EEXIST) {
> > -			if (reserve) {
> > -				clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> > -					  &inode->runtime_flags);
> > -				btrfs_orphan_release_metadata(inode);
> > -			}
> > -			/*
> > -			 * btrfs_orphan_commit_root may race with us and set
> > -			 * ->orphan_block_rsv to zero, in order to avoid that,
> > -			 * decrease ->orphan_inodes after everything is done.
> > -			 */
> > -			atomic_dec(&root->orphan_inodes);
> > -			clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > -				  &inode->runtime_flags);
> > -			btrfs_abort_transaction(trans, ret);
> > -			return ret;
> > -		}
> > +	ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
> > +	if (ret && ret != -EEXIST) {
> > +		atomic_dec(&root->orphan_inodes);
> > +		clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, &inode->runtime_flags);
> > +		btrfs_abort_transaction(trans, ret);
> > +		return ret;
> >  	}
> >  
> >  	return 0;
> > @@ -3438,27 +3375,16 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
> >  			    struct btrfs_inode *inode)
> >  {
> >  	struct btrfs_root *root = inode->root;
> > -	int delete_item = 0;
> >  	int ret = 0;
> >  
> > -	if (test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > -			       &inode->runtime_flags))
> > -		delete_item = 1;
> > +	if (!test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > +				&inode->runtime_flags))
> > +		return 0;
> 
> Similar comment as in btrfs_orphan_add. Shouldn't this always follow a
> successful btrfs_orphan_add, guaranteeing there is an item for this
> inode? Are there some "benign" races that could happen in the meantime
> which could trigger this check?
Same thing here, and it's even easier to convince myself that it's
unnecessary: btrfs_evict_inode() ignores errors if something unexpected
does happen, and for an O_TMPFILE in btrfs_link() we really do expect the
orphan item to be there.
Thread overview: 20+ messages
2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 01/12] Btrfs: remove stale comment referencing vmtruncate() Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 02/12] Btrfs: fix error handling in btrfs_truncate_inode_items() Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 03/12] Btrfs: don't BUG_ON() " Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 04/12] Btrfs: stop creating orphan items for truncate Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 05/12] Btrfs: don't release reserve or decrement orphan count if orphan item already existed Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 06/12] Btrfs: don't return ino to ino cache if inode item removal fails Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 07/12] Btrfs: refactor btrfs_evict_inode() reserve refill dance Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 08/12] Btrfs: fix ENOSPC caused by orphan items reservations Omar Sandoval
2018-05-11  6:38   ` Nikolay Borisov
2018-05-11  6:51     ` Omar Sandoval [this message]
2018-05-11  0:11 ` [PATCH v2 09/12] Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock Omar Sandoval
2018-05-11  6:44   ` Nikolay Borisov
2018-05-11  6:48     ` Omar Sandoval
2018-05-11  7:20       ` Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 10/12] Btrfs: get rid of btrfs_orphan_commit_root() and root->orphan_inodes Omar Sandoval
2018-05-11  7:01   ` Nikolay Borisov
2018-05-11  7:05     ` Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 11/12] Btrfs: simplify error handling in btrfs_evict_inode() Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 12/12] Btrfs: reserve space for O_TMPFILE orphan item deletion Omar Sandoval