* [PATCH v4 00/12] Btrfs: orphan and truncate fixes
@ 2018-05-11 20:13 Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 01/12] Btrfs: update stale comments referencing vmtruncate() Omar Sandoval
                   ` (12 more replies)
  0 siblings, 13 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
Hi,
This is the fourth (and hopefully final) version of the orphan item
early ENOSPC and related fixes.
Changes since v3:
- Changed another stale comment in patch 1
- Moved BTRFS_INODE_ORPHAN_META_RESERVED flag removal to patch 10
  instead of patch 9
- Moved inode runtime flag renumbering to a separate patch (patch 11)
- Added some more reviewed-bys
Changes since v2:
- Add patch 5 to get rid of BTRFS_INODE_HAS_ORPHAN_ITEM
- Move patch 10 to patch 6
- Got rid of patch 5; the bug goes away in the process of removing code
  for patches 9 and 10
- Rename patch 10 batch to what it was called in v1
Changes since v1:
- Added two extra cleanups, patches 10 and 11
- Added a forgotten clear of the orphan bit in patch 8
- Reworded titles of patches 6 and 9
- Added people's reviewed-bys
Cover letter from v1:
At Facebook we hit an early ENOSPC issue which we tracked down to the
reservations for orphan items of deleted-but-still-open files. The
primary function of this series is to fix that bug, but I ended up
uncovering a pile of other issues in the process, most notably that the
orphan items we create for truncate are useless.
I've also posted an xfstest that reproduces this bug.
Thanks!
*** BLURB HERE ***
Omar Sandoval (12):
  Btrfs: update stale comments referencing vmtruncate()
  Btrfs: fix error handling in btrfs_truncate_inode_items()
  Btrfs: don't BUG_ON() in btrfs_truncate_inode_items()
  Btrfs: stop creating orphan items for truncate
  Btrfs: get rid of BTRFS_INODE_HAS_ORPHAN_ITEM
  Btrfs: delete dead code in btrfs_orphan_commit_root()
  Btrfs: don't return ino to ino cache if inode item removal fails
  Btrfs: refactor btrfs_evict_inode() reserve refill dance
  Btrfs: fix ENOSPC caused by orphan items reservations
  Btrfs: get rid of unused orphan infrastructure
  Btrfs: renumber BTRFS_INODE_ runtime flags
  Btrfs: reserve space for O_TMPFILE orphan item deletion
 fs/btrfs/btrfs_inode.h      |  18 +-
 fs/btrfs/ctree.h            |   8 -
 fs/btrfs/disk-io.c          |   9 -
 fs/btrfs/extent-tree.c      |  38 ---
 fs/btrfs/free-space-cache.c |   6 +-
 fs/btrfs/inode.c            | 580 ++++++++++--------------------------
 fs/btrfs/transaction.c      |   1 -
 7 files changed, 172 insertions(+), 488 deletions(-)
-- 
2.17.0
^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH v4 01/12] Btrfs: update stale comments referencing vmtruncate()
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 02/12] Btrfs: fix error handling in btrfs_truncate_inode_items() Omar Sandoval
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
Commit a41ad394a03b ("Btrfs: convert to the new truncate sequence")
changed btrfs_setsize() to call truncate_setsize() instead of
vmtruncate() but didn't update the comment above it. truncate_setsize()
never fails (the IS_SWAPFILE() check happens elsewhere), so remove the
comment.
Additionally, the comment above btrfs_page_mkwrite() references
vmtruncate(), but truncate_setsize() does the size write and page
locking now.
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d241285a0d2a..0c644ad7e1cb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5106,7 +5106,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		if (ret)
 			return ret;
 
-		/* we don't support swapfiles, so vmtruncate shouldn't fail */
 		truncate_setsize(inode, newsize);
 
 		/* Disable nonlocked read DIO to avoid the end less truncate */
@@ -8868,8 +8867,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
  *
  * We are not allowed to take the i_mutex here so we have to play games to
  * protect against truncate races as the page could now be beyond EOF.  Because
- * vmtruncate() writes the inode size before removing pages, once we have the
- * page lock we can determine safely if the page is beyond EOF. If it is not
+ * truncate_setsize() writes the inode size before removing pages, once we have
+ * the page lock we can determine safely if the page is beyond EOF. If it is not
  * beyond EOF, then the page is guaranteed safe against truncation until we
  * unlock the page.
  */
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 02/12] Btrfs: fix error handling in btrfs_truncate_inode_items()
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 01/12] Btrfs: update stale comments referencing vmtruncate() Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 03/12] Btrfs: don't BUG_ON() " Omar Sandoval
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
btrfs_truncate_inode_items() uses two variables for error handling, ret
and err. These are not handled consistently, leading to a couple of
bugs.
- Errors from btrfs_del_items() are handled but not propagated to the
  caller
- If btrfs_run_delayed_refs() fails and aborts the transaction, we
  continue running
Just use ret everywhere and simplify things a bit, fixing both of these
issues.
Fixes: 79787eaab461 ("btrfs: replace many BUG_ONs with proper error handling")
Fixes: 1262133b8d6f ("Btrfs: account for crcs in delayed ref processing")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 55 ++++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 27 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0c644ad7e1cb..bfa0e094a60e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4442,7 +4442,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	int pending_del_slot = 0;
 	int extent_type = -1;
 	int ret;
-	int err = 0;
 	u64 ino = btrfs_ino(BTRFS_I(inode));
 	u64 bytes_deleted = 0;
 	bool be_nice = false;
@@ -4494,22 +4493,19 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	 * up a huge file in a single leaf.  Most of the time that
 	 * bytes_deleted is > 0, it will be huge by the time we get here
 	 */
-	if (be_nice && bytes_deleted > SZ_32M) {
-		if (btrfs_should_end_transaction(trans)) {
-			err = -EAGAIN;
-			goto error;
-		}
+	if (be_nice && bytes_deleted > SZ_32M &&
+	    btrfs_should_end_transaction(trans)) {
+		ret = -EAGAIN;
+		goto out;
 	}
 
-
 	path->leave_spinning = 1;
 	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
-	if (ret < 0) {
-		err = ret;
+	if (ret < 0)
 		goto out;
-	}
 
 	if (ret > 0) {
+		ret = 0;
 		/* there are no items in the tree for us to truncate, we're
 		 * done
 		 */
@@ -4620,7 +4616,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				 * We have to bail so the last_size is set to
 				 * just before this extent.
 				 */
-				err = NEED_TRUNCATE_BLOCK;
+				ret = NEED_TRUNCATE_BLOCK;
 				break;
 			}
 
@@ -4687,7 +4683,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 						pending_del_nr);
 				if (ret) {
 					btrfs_abort_transaction(trans, ret);
-					goto error;
+					break;
 				}
 				pending_del_nr = 0;
 			}
@@ -4698,8 +4694,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 					trans->delayed_ref_updates = 0;
 					ret = btrfs_run_delayed_refs(trans,
 								   updates * 2);
-					if (ret && !err)
-						err = ret;
+					if (ret)
+						break;
 				}
 			}
 			/*
@@ -4707,8 +4703,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 			 * and let the transaction restart
 			 */
 			if (should_end) {
-				err = -EAGAIN;
-				goto error;
+				ret = -EAGAIN;
+				break;
 			}
 			goto search_again;
 		} else {
@@ -4716,32 +4712,37 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 		}
 	}
 out:
-	if (pending_del_nr) {
-		ret = btrfs_del_items(trans, root, path, pending_del_slot,
+	if (ret >= 0 && pending_del_nr) {
+		int err;
+
+		err = btrfs_del_items(trans, root, path, pending_del_slot,
 				      pending_del_nr);
-		if (ret)
-			btrfs_abort_transaction(trans, ret);
+		if (err) {
+			btrfs_abort_transaction(trans, err);
+			ret = err;
+		}
 	}
-error:
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
 		ASSERT(last_size >= new_size);
-		if (!err && last_size > new_size)
+		if (!ret && last_size > new_size)
 			last_size = new_size;
 		btrfs_ordered_update_i_size(inode, last_size, NULL);
 	}
 
 	btrfs_free_path(path);
 
-	if (be_nice && bytes_deleted > SZ_32M) {
+	if (be_nice && bytes_deleted > SZ_32M && (ret >= 0 || ret == -EAGAIN)) {
 		unsigned long updates = trans->delayed_ref_updates;
+		int err;
+
 		if (updates) {
 			trans->delayed_ref_updates = 0;
-			ret = btrfs_run_delayed_refs(trans, updates * 2);
-			if (ret && !err)
-				err = ret;
+			err = btrfs_run_delayed_refs(trans, updates * 2);
+			if (err)
+				ret = err;
 		}
 	}
-	return err;
+	return ret;
 }
 
 /*
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 03/12] Btrfs: don't BUG_ON() in btrfs_truncate_inode_items()
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 01/12] Btrfs: update stale comments referencing vmtruncate() Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 02/12] Btrfs: fix error handling in btrfs_truncate_inode_items() Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-17 17:18   ` David Sterba
  2018-05-11 20:13 ` [PATCH v4 04/12] Btrfs: stop creating orphan items for truncate Omar Sandoval
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
btrfs_free_extent() can fail because of ENOMEM. There's no reason to
panic here, we can just abort the transaction.
Fixes: f4b9aa8d3b87 ("btrfs_truncate")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bfa0e094a60e..fa1da1991001 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4655,7 +4655,10 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 						extent_num_bytes, 0,
 						btrfs_header_owner(leaf),
 						ino, extent_offset);
-			BUG_ON(ret);
+			if (ret) {
+				btrfs_abort_transaction(trans, ret);
+				break;
+			}
 			if (btrfs_should_throttle_delayed_refs(trans, fs_info))
 				btrfs_async_run_delayed_refs(fs_info,
 					trans->delayed_ref_updates * 2,
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 04/12] Btrfs: stop creating orphan items for truncate
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (2 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 03/12] Btrfs: don't BUG_ON() " Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 05/12] Btrfs: get rid of BTRFS_INODE_HAS_ORPHAN_ITEM Omar Sandoval
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
Currently, we insert an orphan item during a truncate so that if there's
a crash, we don't leak extents past the on-disk i_size. However, since
commit 7f4f6e0a3f6d ("Btrfs: only update disk_i_size as we remove
extents"), we keep disk_i_size in sync with the extent items as we
truncate, so orphan cleanup will never have any extents to remove. Don't
bother with the superfluous orphan item.
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/free-space-cache.c |   6 +-
 fs/btrfs/inode.c            | 159 +++++++++++-------------------------
 2 files changed, 51 insertions(+), 114 deletions(-)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index e5b569bebc73..d5f80cb300be 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -253,10 +253,8 @@ int btrfs_truncate_free_space_cache(struct btrfs_trans_handle *trans,
 	truncate_pagecache(inode, 0);
 
 	/*
-	 * We don't need an orphan item because truncating the free space cache
-	 * will never be split across transactions.
-	 * We don't need to check for -EAGAIN because we're a free space
-	 * cache inode
+	 * We skip the throttling logic for free space cache inodes, so we don't
+	 * need to check for -EAGAIN.
 	 */
 	ret = btrfs_truncate_inode_items(trans, root, inode,
 					 0, BTRFS_EXTENT_DATA_KEY);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fa1da1991001..110ccd40987e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3341,8 +3341,8 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 }
 
 /*
- * This creates an orphan entry for the given inode in case something goes
- * wrong in the middle of an unlink/truncate.
+ * This creates an orphan entry for the given inode in case something goes wrong
+ * in the middle of an unlink.
  *
  * NOTE: caller of this function should reserve 5 units of metadata for
  *	 this function.
@@ -3405,7 +3405,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	/* insert an orphan item to track this unlinked/truncated file */
+	/* insert an orphan item to track this unlinked file */
 	if (insert) {
 		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
 		if (ret) {
@@ -3434,8 +3434,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 }
 
 /*
- * We have done the truncate/delete so we can go ahead and remove the orphan
- * item for this particular inode.
+ * We have done the delete so we can go ahead and remove the orphan item for
+ * this particular inode.
  */
 static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
 			    struct btrfs_inode *inode)
@@ -3479,7 +3479,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 	struct btrfs_trans_handle *trans;
 	struct inode *inode;
 	u64 last_objectid = 0;
-	int ret = 0, nr_unlink = 0, nr_truncate = 0;
+	int ret = 0, nr_unlink = 0;
 
 	if (cmpxchg(&root->orphan_cleanup_state, 0, ORPHAN_CLEANUP_STARTED))
 		return 0;
@@ -3579,12 +3579,31 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 				key.offset = found_key.objectid - 1;
 				continue;
 			}
+
 		}
+
 		/*
-		 * Inode is already gone but the orphan item is still there,
-		 * kill the orphan item.
+		 * If we have an inode with links, there are a couple of
+		 * possibilities. Old kernels (before v3.12) used to create an
+		 * orphan item for truncate indicating that there were possibly
+		 * extent items past i_size that needed to be deleted. In v3.12,
+		 * truncate was changed to update i_size in sync with the extent
+		 * items, but the (useless) orphan item was still created. Since
+		 * v4.18, we don't create the orphan item for truncate at all.
+		 *
+		 * So, this item could mean that we need to do a truncate, but
+		 * only if this filesystem was last used on a pre-v3.12 kernel
+		 * and was not cleanly unmounted. The odds of that are quite
+		 * slim, and it's a pain to do the truncate now, so just delete
+		 * the orphan item.
+		 *
+		 * It's also possible that this orphan item was supposed to be
+		 * deleted but wasn't. The inode number may have been reused,
+		 * but either way, we can delete the orphan item.
 		 */
-		if (ret == -ENOENT) {
+		if (ret == -ENOENT || inode->i_nlink) {
+			if (!ret)
+				iput(inode);
 			trans = btrfs_start_transaction(root, 1);
 			if (IS_ERR(trans)) {
 				ret = PTR_ERR(trans);
@@ -3608,34 +3627,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 			&BTRFS_I(inode)->runtime_flags);
 		atomic_inc(&root->orphan_inodes);
 
-		/* if we have links, this was a truncate, lets do that */
-		if (inode->i_nlink) {
-			if (WARN_ON(!S_ISREG(inode->i_mode))) {
-				iput(inode);
-				continue;
-			}
-			nr_truncate++;
-
-			/* 1 for the orphan item deletion. */
-			trans = btrfs_start_transaction(root, 1);
-			if (IS_ERR(trans)) {
-				iput(inode);
-				ret = PTR_ERR(trans);
-				goto out;
-			}
-			ret = btrfs_orphan_add(trans, BTRFS_I(inode));
-			btrfs_end_transaction(trans);
-			if (ret) {
-				iput(inode);
-				goto out;
-			}
-
-			ret = btrfs_truncate(inode, false);
-			if (ret)
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-		} else {
-			nr_unlink++;
-		}
+		nr_unlink++;
 
 		/* this will do delete_inode and everything for us */
 		iput(inode);
@@ -3660,8 +3652,6 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 
 	if (nr_unlink)
 		btrfs_debug(fs_info, "unlinked %d orphans", nr_unlink);
-	if (nr_truncate)
-		btrfs_debug(fs_info, "truncated %d orphans", nr_truncate);
 
 out:
 	if (ret)
@@ -5087,29 +5077,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 			set_bit(BTRFS_INODE_ORDERED_DATA_CLOSE,
 				&BTRFS_I(inode)->runtime_flags);
 
-		/*
-		 * 1 for the orphan item we're going to add
-		 * 1 for the orphan item deletion.
-		 */
-		trans = btrfs_start_transaction(root, 2);
-		if (IS_ERR(trans))
-			return PTR_ERR(trans);
-
-		/*
-		 * We need to do this in case we fail at _any_ point during the
-		 * actual truncate.  Once we do the truncate_setsize we could
-		 * invalidate pages which forces any outstanding ordered io to
-		 * be instantly completed which will give us extents that need
-		 * to be truncated.  If we fail to get an orphan inode down we
-		 * could have left over extents that were never meant to live,
-		 * so we need to guarantee from this point on that everything
-		 * will be consistent.
-		 */
-		ret = btrfs_orphan_add(trans, BTRFS_I(inode));
-		btrfs_end_transaction(trans);
-		if (ret)
-			return ret;
-
 		truncate_setsize(inode, newsize);
 
 		/* Disable nonlocked read DIO to avoid the end less truncate */
@@ -5121,29 +5088,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		if (ret && inode->i_nlink) {
 			int err;
 
-			/* To get a stable disk_i_size */
-			err = btrfs_wait_ordered_range(inode, 0, (u64)-1);
-			if (err) {
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-				return err;
-			}
-
 			/*
-			 * failed to truncate, disk_i_size is only adjusted down
-			 * as we remove extents, so it should represent the true
-			 * size of the inode, so reset the in memory size and
-			 * delete our orphan entry.
+			 * Truncate failed, so fix up the in-memory size. We
+			 * adjusted disk_i_size down as we removed extents, so
+			 * wait for disk_i_size to be stable and then update the
+			 * in-memory size to match.
 			 */
-			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans)) {
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-				return ret;
-			}
-			i_size_write(inode, BTRFS_I(inode)->disk_i_size);
-			err = btrfs_orphan_del(trans, BTRFS_I(inode));
+			err = btrfs_wait_ordered_range(inode, 0, (u64)-1);
 			if (err)
-				btrfs_abort_transaction(trans, err);
-			btrfs_end_transaction(trans);
+				return err;
+			i_size_write(inode, BTRFS_I(inode)->disk_i_size);
 		}
 	}
 
@@ -9048,39 +9002,31 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 	}
 
 	/*
-	 * Yes ladies and gentlemen, this is indeed ugly.  The fact is we have
-	 * 3 things going on here
-	 *
-	 * 1) We need to reserve space for our orphan item and the space to
-	 * delete our orphan item.  Lord knows we don't want to have a dangling
-	 * orphan item because we didn't reserve space to remove it.
+	 * Yes ladies and gentlemen, this is indeed ugly.  We have a couple of
+	 * things going on here:
 	 *
-	 * 2) We need to reserve space to update our inode.
+	 * 1) We need to reserve space to update our inode.
 	 *
-	 * 3) We need to have something to cache all the space that is going to
+	 * 2) We need to have something to cache all the space that is going to
 	 * be free'd up by the truncate operation, but also have some slack
 	 * space reserved in case it uses space during the truncate (thank you
 	 * very much snapshotting).
 	 *
-	 * And we need these to all be separate.  The fact is we can use a lot of
+	 * And we need these to be separate.  The fact is we can use a lot of
 	 * space doing the truncate, and we have no earthly idea how much space
 	 * we will use, so we need the truncate reservation to be separate so it
-	 * doesn't end up using space reserved for updating the inode or
-	 * removing the orphan item.  We also need to be able to stop the
-	 * transaction and start a new one, which means we need to be able to
-	 * update the inode several times, and we have no idea of knowing how
-	 * many times that will be, so we can't just reserve 1 item for the
-	 * entirety of the operation, so that has to be done separately as well.
-	 * Then there is the orphan item, which does indeed need to be held on
-	 * to for the whole operation, and we need nobody to touch this reserved
-	 * space except the orphan code.
+	 * doesn't end up using space reserved for updating the inode.  We also
+	 * need to be able to stop the transaction and start a new one, which
+	 * means we need to be able to update the inode several times, and we
+	 * have no idea of knowing how many times that will be, so we can't just
+	 * reserve 1 item for the entirety of the operation, so that has to be
+	 * done separately as well.
 	 *
 	 * So that leaves us with
 	 *
-	 * 1) root->orphan_block_rsv - for the orphan deletion.
-	 * 2) rsv - for the truncate reservation, which we will steal from the
+	 * 1) rsv - for the truncate reservation, which we will steal from the
 	 * transaction reservation.
-	 * 3) fs_info->trans_block_rsv - this will have 1 items worth left for
+	 * 2) fs_info->trans_block_rsv - this will have 1 items worth left for
 	 * updating the inode.
 	 */
 	rsv = btrfs_alloc_block_rsv(fs_info, BTRFS_BLOCK_RSV_TEMP);
@@ -9168,13 +9114,6 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 		btrfs_ordered_update_i_size(inode, inode->i_size, NULL);
 	}
 
-	if (ret == 0 && inode->i_nlink > 0) {
-		trans->block_rsv = root->orphan_block_rsv;
-		ret = btrfs_orphan_del(trans, BTRFS_I(inode));
-		if (ret)
-			err = ret;
-	}
-
 	if (trans) {
 		trans->block_rsv = &fs_info->trans_block_rsv;
 		ret = btrfs_update_inode(trans, root, inode);
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 05/12] Btrfs: get rid of BTRFS_INODE_HAS_ORPHAN_ITEM
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (3 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 04/12] Btrfs: stop creating orphan items for truncate Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 06/12] Btrfs: delete dead code in btrfs_orphan_commit_root() Omar Sandoval
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
Now that we don't add orphan items for truncate, there can't be races on
adding or deleting an orphan item, so this bit is unnecessary.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/btrfs_inode.h |  1 -
 fs/btrfs/inode.c       | 76 +++++++++++-------------------------------
 2 files changed, 20 insertions(+), 57 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 234bae55b85d..cb7dc0aa4253 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -23,7 +23,6 @@
 #define BTRFS_INODE_ORPHAN_META_RESERVED	1
 #define BTRFS_INODE_DUMMY			2
 #define BTRFS_INODE_IN_DEFRAG			3
-#define BTRFS_INODE_HAS_ORPHAN_ITEM		4
 #define BTRFS_INODE_HAS_ASYNC_EXTENT		5
 #define BTRFS_INODE_NEEDS_FULL_SYNC		6
 #define BTRFS_INODE_COPY_EVERYTHING		7
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 110ccd40987e..9ef20d28fa9e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3354,7 +3354,6 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 	struct btrfs_root *root = inode->root;
 	struct btrfs_block_rsv *block_rsv = NULL;
 	int reserve = 0;
-	bool insert = false;
 	int ret;
 
 	if (!root->orphan_block_rsv) {
@@ -3364,10 +3363,6 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 			return -ENOMEM;
 	}
 
-	if (!test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-			      &inode->runtime_flags))
-		insert = true;
-
 	if (!test_and_set_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
 			      &inode->runtime_flags))
 		reserve = 1;
@@ -3381,8 +3376,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 		block_rsv = NULL;
 	}
 
-	if (insert)
-		atomic_inc(&root->orphan_inodes);
+	atomic_inc(&root->orphan_inodes);
 	spin_unlock(&root->orphan_lock);
 
 	/* grab metadata reservation from transaction handle */
@@ -3398,36 +3392,28 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 			atomic_dec(&root->orphan_inodes);
 			clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
 				  &inode->runtime_flags);
-			if (insert)
-				clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-					  &inode->runtime_flags);
 			return ret;
 		}
 	}
 
 	/* insert an orphan item to track this unlinked file */
-	if (insert) {
-		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
-		if (ret) {
-			if (reserve) {
-				clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-					  &inode->runtime_flags);
-				btrfs_orphan_release_metadata(inode);
-			}
-			/*
-			 * btrfs_orphan_commit_root may race with us and set
-			 * ->orphan_block_rsv to zero, in order to avoid that,
-			 * decrease ->orphan_inodes after everything is done.
-			 */
-			atomic_dec(&root->orphan_inodes);
-			if (ret != -EEXIST) {
-				clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-					  &inode->runtime_flags);
-				btrfs_abort_transaction(trans, ret);
-				return ret;
-			}
+	ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
+	if (ret) {
+		if (reserve) {
+			clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
+				  &inode->runtime_flags);
+			btrfs_orphan_release_metadata(inode);
+		}
+		/*
+		 * btrfs_orphan_commit_root may race with us and set
+		 * ->orphan_block_rsv to zero, in order to avoid that,
+		 * decrease ->orphan_inodes after everything is done.
+		 */
+		atomic_dec(&root->orphan_inodes);
+		if (ret != -EEXIST) {
+			btrfs_abort_transaction(trans, ret);
+			return ret;
 		}
-		ret = 0;
 	}
 
 	return 0;
@@ -3441,14 +3427,9 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
 			    struct btrfs_inode *inode)
 {
 	struct btrfs_root *root = inode->root;
-	int delete_item = 0;
 	int ret = 0;
 
-	if (test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-			       &inode->runtime_flags))
-		delete_item = 1;
-
-	if (delete_item && trans)
+	if (trans)
 		ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
 
 	if (test_and_clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
@@ -3460,8 +3441,7 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
 	 * to zero, in order to avoid that, decrease ->orphan_inodes after
 	 * everything is done.
 	 */
-	if (delete_item)
-		atomic_dec(&root->orphan_inodes);
+	atomic_dec(&root->orphan_inodes);
 
 	return ret;
 }
@@ -3619,12 +3599,6 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 			continue;
 		}
 
-		/*
-		 * add this inode to the orphan list so btrfs_orphan_del does
-		 * the proper thing when we hit it
-		 */
-		set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-			&BTRFS_I(inode)->runtime_flags);
 		atomic_inc(&root->orphan_inodes);
 
 		nr_unlink++;
@@ -5264,11 +5238,8 @@ void btrfs_evict_inode(struct inode *inode)
 
 	btrfs_free_io_failure_record(BTRFS_I(inode), 0, (u64)-1);
 
-	if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
-		BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-				 &BTRFS_I(inode)->runtime_flags));
+	if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
 		goto no_delete;
-	}
 
 	if (inode->i_nlink > 0) {
 		BUG_ON(btrfs_root_refs(&root->root_item) != 0 &&
@@ -9265,13 +9236,6 @@ void btrfs_destroy_inode(struct inode *inode)
 	if (!root)
 		goto free;
 
-	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-		     &BTRFS_I(inode)->runtime_flags)) {
-		btrfs_info(fs_info, "inode %llu still on the orphan list",
-			   btrfs_ino(BTRFS_I(inode)));
-		atomic_dec(&root->orphan_inodes);
-	}
-
 	while (1) {
 		ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
 		if (!ordered)
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 06/12] Btrfs: delete dead code in btrfs_orphan_commit_root()
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (4 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 05/12] Btrfs: get rid of BTRFS_INODE_HAS_ORPHAN_ITEM Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 07/12] Btrfs: don't return ino to ino cache if inode item removal fails Omar Sandoval
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
btrfs_orphan_commit_root() tries to delete an orphan item for a
subvolume in the tree root, but we don't actually insert that item in
the first place. See commit 0a0d4415e338 ("Btrfs: delete dead code in
btrfs_orphan_add()"). We can get rid of it.
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 12 ------------
 1 file changed, 12 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9ef20d28fa9e..84d7dd3a30f9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3302,7 +3302,6 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_block_rsv *block_rsv;
-	int ret;
 
 	if (atomic_read(&root->orphan_inodes) ||
 	    root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE)
@@ -3323,17 +3322,6 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 	root->orphan_block_rsv = NULL;
 	spin_unlock(&root->orphan_lock);
 
-	if (test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state) &&
-	    btrfs_root_refs(&root->root_item) > 0) {
-		ret = btrfs_del_orphan_item(trans, fs_info->tree_root,
-					    root->root_key.objectid);
-		if (ret)
-			btrfs_abort_transaction(trans, ret);
-		else
-			clear_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED,
-				  &root->state);
-	}
-
 	if (block_rsv) {
 		WARN_ON(block_rsv->size > 0);
 		btrfs_free_block_rsv(fs_info, block_rsv);
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 07/12] Btrfs: don't return ino to ino cache if inode item removal fails
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (5 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 06/12] Btrfs: delete dead code in btrfs_orphan_commit_root() Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 08/12] Btrfs: refactor btrfs_evict_inode() reserve refill dance Omar Sandoval
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
In btrfs_evict_inode(), if btrfs_truncate_inode_items() fails, the inode
item will still be in the tree but we still return the ino to the ino
cache. That will blow up later when someone tries to allocate that ino,
so don't return it to the cache.
Fixes: 581bb050941b ("Btrfs: Cache free inode numbers in memory")
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 84d7dd3a30f9..ad4b7fb62f46 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5330,13 +5330,18 @@ void btrfs_evict_inode(struct inode *inode)
 		trans->block_rsv = rsv;
 
 		ret = btrfs_truncate_inode_items(trans, root, inode, 0, 0);
-		if (ret != -ENOSPC && ret != -EAGAIN)
+		if (ret) {
+			trans->block_rsv = &fs_info->trans_block_rsv;
+			btrfs_end_transaction(trans);
+			btrfs_btree_balance_dirty(fs_info);
+			if (ret != -ENOSPC && ret != -EAGAIN) {
+				btrfs_orphan_del(NULL, BTRFS_I(inode));
+				btrfs_free_block_rsv(fs_info, rsv);
+				goto no_delete;
+			}
+		} else {
 			break;
-
-		trans->block_rsv = &fs_info->trans_block_rsv;
-		btrfs_end_transaction(trans);
-		trans = NULL;
-		btrfs_btree_balance_dirty(fs_info);
+		}
 	}
 
 	btrfs_free_block_rsv(fs_info, rsv);
@@ -5345,12 +5350,8 @@ void btrfs_evict_inode(struct inode *inode)
 	 * Errors here aren't a big deal, it just means we leave orphan items
 	 * in the tree.  They will be cleaned up on the next mount.
 	 */
-	if (ret == 0) {
-		trans->block_rsv = root->orphan_block_rsv;
-		btrfs_orphan_del(trans, BTRFS_I(inode));
-	} else {
-		btrfs_orphan_del(NULL, BTRFS_I(inode));
-	}
+	trans->block_rsv = root->orphan_block_rsv;
+	btrfs_orphan_del(trans, BTRFS_I(inode));
 
 	trans->block_rsv = &fs_info->trans_block_rsv;
 	if (!(root == fs_info->tree_root ||
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 08/12] Btrfs: refactor btrfs_evict_inode() reserve refill dance
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (6 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 07/12] Btrfs: don't return ino to ino cache if inode item removal fails Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 09/12] Btrfs: fix ENOSPC caused by orphan items reservations Omar Sandoval
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
The truncate loop in btrfs_evict_inode() does two things at once:
- It refills the temporary block reserve, potentially stealing from the
  global reserve or committing
- It calls btrfs_truncate_inode_items()
The tangle of continues hides the fact that these two steps are actually
separate. Split the first step out into a separate function both for
clarity and so that we can reuse it in a later patch.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 113 ++++++++++++++++++-----------------------------
 1 file changed, 42 insertions(+), 71 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ad4b7fb62f46..0a753f3a3321 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5189,13 +5189,52 @@ static void evict_inode_truncate_pages(struct inode *inode)
 	spin_unlock(&io_tree->lock);
 }
 
+static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
+							struct btrfs_block_rsv *rsv,
+							u64 min_size)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	int failures = 0;
+
+	for (;;) {
+		struct btrfs_trans_handle *trans;
+		int ret;
+
+		ret = btrfs_block_rsv_refill(root, rsv, min_size,
+					     BTRFS_RESERVE_FLUSH_LIMIT);
+
+		if (ret && ++failures > 2) {
+			btrfs_warn(fs_info,
+				   "could not allocate space for a delete; will truncate on mount");
+			return ERR_PTR(-ENOSPC);
+		}
+
+		trans = btrfs_join_transaction(root);
+		if (IS_ERR(trans) || !ret)
+			return trans;
+
+		/*
+		 * Try to steal from the global reserve if there is space for
+		 * it.
+		 */
+		if (!btrfs_check_space_for_delayed_refs(trans, fs_info) &&
+		    !btrfs_block_rsv_migrate(global_rsv, rsv, min_size, 0))
+			return trans;
+
+		/* If not, commit and try again. */
+		ret = btrfs_commit_transaction(trans);
+		if (ret)
+			return ERR_PTR(ret);
+	}
+}
+
 void btrfs_evict_inode(struct inode *inode)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_block_rsv *rsv, *global_rsv;
-	int steal_from_global = 0;
+	struct btrfs_block_rsv *rsv;
 	u64 min_size;
 	int ret;
 
@@ -5248,85 +5287,17 @@ void btrfs_evict_inode(struct inode *inode)
 	}
 	rsv->size = min_size;
 	rsv->failfast = 1;
-	global_rsv = &fs_info->global_block_rsv;
 
 	btrfs_i_size_write(BTRFS_I(inode), 0);
 
-	/*
-	 * This is a bit simpler than btrfs_truncate since we've already
-	 * reserved our space for our orphan item in the unlink, so we just
-	 * need to reserve some slack space in case we add bytes and update
-	 * inode item when doing the truncate.
-	 */
 	while (1) {
-		ret = btrfs_block_rsv_refill(root, rsv, min_size,
-					     BTRFS_RESERVE_FLUSH_LIMIT);
-
-		/*
-		 * Try and steal from the global reserve since we will
-		 * likely not use this space anyway, we want to try as
-		 * hard as possible to get this to work.
-		 */
-		if (ret)
-			steal_from_global++;
-		else
-			steal_from_global = 0;
-		ret = 0;
-
-		/*
-		 * steal_from_global == 0: we reserved stuff, hooray!
-		 * steal_from_global == 1: we didn't reserve stuff, boo!
-		 * steal_from_global == 2: we've committed, still not a lot of
-		 * room but maybe we'll have room in the global reserve this
-		 * time.
-		 * steal_from_global == 3: abandon all hope!
-		 */
-		if (steal_from_global > 2) {
-			btrfs_warn(fs_info,
-				   "Could not get space for a delete, will truncate on mount %d",
-				   ret);
-			btrfs_orphan_del(NULL, BTRFS_I(inode));
-			btrfs_free_block_rsv(fs_info, rsv);
-			goto no_delete;
-		}
-
-		trans = btrfs_join_transaction(root);
+		trans = evict_refill_and_join(root, rsv, min_size);
 		if (IS_ERR(trans)) {
 			btrfs_orphan_del(NULL, BTRFS_I(inode));
 			btrfs_free_block_rsv(fs_info, rsv);
 			goto no_delete;
 		}
 
-		/*
-		 * We can't just steal from the global reserve, we need to make
-		 * sure there is room to do it, if not we need to commit and try
-		 * again.
-		 */
-		if (steal_from_global) {
-			if (!btrfs_check_space_for_delayed_refs(trans, fs_info))
-				ret = btrfs_block_rsv_migrate(global_rsv, rsv,
-							      min_size, 0);
-			else
-				ret = -ENOSPC;
-		}
-
-		/*
-		 * Couldn't steal from the global reserve, we have too much
-		 * pending stuff built up, commit the transaction and try it
-		 * again.
-		 */
-		if (ret) {
-			ret = btrfs_commit_transaction(trans);
-			if (ret) {
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-				btrfs_free_block_rsv(fs_info, rsv);
-				goto no_delete;
-			}
-			continue;
-		} else {
-			steal_from_global = 0;
-		}
-
 		trans->block_rsv = rsv;
 
 		ret = btrfs_truncate_inode_items(trans, root, inode, 0, 0);
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 09/12] Btrfs: fix ENOSPC caused by orphan items reservations
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (7 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 08/12] Btrfs: refactor btrfs_evict_inode() reserve refill dance Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 10/12] Btrfs: get rid of unused orphan infrastructure Omar Sandoval
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
Currently, we keep space reserved for all inode orphan items until the
inode is evicted (i.e., all references to it are dropped). We hit an
issue where an application would keep a bunch of deleted files open (by
design) and thus keep a large amount of space reserved, causing ENOSPC
errors when other operations tried to reserve space. This long-standing
reservation isn't absolutely necessary for a couple of reasons:
- We can almost always make the reservation we need or steal from the
  global reserve for the orphan item
- If we can't, it's not the end of the world if we drop the orphan item
  on the floor and let the next mount clean it up
So, get rid of persistent reservation and just reserve space in
btrfs_evict_inode().
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 158 ++++++++++++-----------------------------------
 1 file changed, 38 insertions(+), 120 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0a753f3a3321..efa67284ebb6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3331,77 +3331,16 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 /*
  * This creates an orphan entry for the given inode in case something goes wrong
  * in the middle of an unlink.
- *
- * NOTE: caller of this function should reserve 5 units of metadata for
- *	 this function.
  */
 int btrfs_orphan_add(struct btrfs_trans_handle *trans,
-		struct btrfs_inode *inode)
+		     struct btrfs_inode *inode)
 {
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
-	struct btrfs_root *root = inode->root;
-	struct btrfs_block_rsv *block_rsv = NULL;
-	int reserve = 0;
 	int ret;
 
-	if (!root->orphan_block_rsv) {
-		block_rsv = btrfs_alloc_block_rsv(fs_info,
-						  BTRFS_BLOCK_RSV_TEMP);
-		if (!block_rsv)
-			return -ENOMEM;
-	}
-
-	if (!test_and_set_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-			      &inode->runtime_flags))
-		reserve = 1;
-
-	spin_lock(&root->orphan_lock);
-	/* If someone has created ->orphan_block_rsv, be happy to use it. */
-	if (!root->orphan_block_rsv) {
-		root->orphan_block_rsv = block_rsv;
-	} else if (block_rsv) {
-		btrfs_free_block_rsv(fs_info, block_rsv);
-		block_rsv = NULL;
-	}
-
-	atomic_inc(&root->orphan_inodes);
-	spin_unlock(&root->orphan_lock);
-
-	/* grab metadata reservation from transaction handle */
-	if (reserve) {
-		ret = btrfs_orphan_reserve_metadata(trans, inode);
-		ASSERT(!ret);
-		if (ret) {
-			/*
-			 * dec doesn't need spin_lock as ->orphan_block_rsv
-			 * would be released only if ->orphan_inodes is
-			 * zero.
-			 */
-			atomic_dec(&root->orphan_inodes);
-			clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-				  &inode->runtime_flags);
-			return ret;
-		}
-	}
-
-	/* insert an orphan item to track this unlinked file */
-	ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
-	if (ret) {
-		if (reserve) {
-			clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-				  &inode->runtime_flags);
-			btrfs_orphan_release_metadata(inode);
-		}
-		/*
-		 * btrfs_orphan_commit_root may race with us and set
-		 * ->orphan_block_rsv to zero, in order to avoid that,
-		 * decrease ->orphan_inodes after everything is done.
-		 */
-		atomic_dec(&root->orphan_inodes);
-		if (ret != -EEXIST) {
-			btrfs_abort_transaction(trans, ret);
-			return ret;
-		}
+	ret = btrfs_insert_orphan_item(trans, inode->root, btrfs_ino(inode));
+	if (ret && ret != -EEXIST) {
+		btrfs_abort_transaction(trans, ret);
+		return ret;
 	}
 
 	return 0;
@@ -3414,24 +3353,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
 			    struct btrfs_inode *inode)
 {
-	struct btrfs_root *root = inode->root;
-	int ret = 0;
-
-	if (trans)
-		ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
-
-	if (test_and_clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-			       &inode->runtime_flags))
-		btrfs_orphan_release_metadata(inode);
-
-	/*
-	 * btrfs_orphan_commit_root may race with us and set ->orphan_block_rsv
-	 * to zero, in order to avoid that, decrease ->orphan_inodes after
-	 * everything is done.
-	 */
-	atomic_dec(&root->orphan_inodes);
-
-	return ret;
+	return btrfs_del_orphan_item(trans, inode->root, btrfs_ino(inode));
 }
 
 /*
@@ -3587,8 +3509,6 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 			continue;
 		}
 
-		atomic_inc(&root->orphan_inodes);
-
 		nr_unlink++;
 
 		/* this will do delete_inode and everything for us */
@@ -5255,10 +5175,8 @@ void btrfs_evict_inode(struct inode *inode)
 	     btrfs_is_free_space_inode(BTRFS_I(inode))))
 		goto no_delete;
 
-	if (is_bad_inode(inode)) {
-		btrfs_orphan_del(NULL, BTRFS_I(inode));
+	if (is_bad_inode(inode))
 		goto no_delete;
-	}
 	/* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
 	if (!special_file(inode->i_mode))
 		btrfs_wait_ordered_range(inode, 0, (u64)-1);
@@ -5275,16 +5193,12 @@ void btrfs_evict_inode(struct inode *inode)
 	}
 
 	ret = btrfs_commit_inode_delayed_inode(BTRFS_I(inode));
-	if (ret) {
-		btrfs_orphan_del(NULL, BTRFS_I(inode));
+	if (ret)
 		goto no_delete;
-	}
 
 	rsv = btrfs_alloc_block_rsv(fs_info, BTRFS_BLOCK_RSV_TEMP);
-	if (!rsv) {
-		btrfs_orphan_del(NULL, BTRFS_I(inode));
+	if (!rsv)
 		goto no_delete;
-	}
 	rsv->size = min_size;
 	rsv->failfast = 1;
 
@@ -5292,46 +5206,50 @@ void btrfs_evict_inode(struct inode *inode)
 
 	while (1) {
 		trans = evict_refill_and_join(root, rsv, min_size);
-		if (IS_ERR(trans)) {
-			btrfs_orphan_del(NULL, BTRFS_I(inode));
-			btrfs_free_block_rsv(fs_info, rsv);
-			goto no_delete;
-		}
+		if (IS_ERR(trans))
+			goto free_rsv;
 
 		trans->block_rsv = rsv;
 
 		ret = btrfs_truncate_inode_items(trans, root, inode, 0, 0);
-		if (ret) {
-			trans->block_rsv = &fs_info->trans_block_rsv;
-			btrfs_end_transaction(trans);
-			btrfs_btree_balance_dirty(fs_info);
-			if (ret != -ENOSPC && ret != -EAGAIN) {
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-				btrfs_free_block_rsv(fs_info, rsv);
-				goto no_delete;
-			}
-		} else {
+		trans->block_rsv = &fs_info->trans_block_rsv;
+		btrfs_end_transaction(trans);
+		btrfs_btree_balance_dirty(fs_info);
+		if (ret && ret != -ENOSPC && ret != -EAGAIN)
+			goto free_rsv;
+		else if (!ret)
 			break;
-		}
 	}
 
-	btrfs_free_block_rsv(fs_info, rsv);
-
 	/*
-	 * Errors here aren't a big deal, it just means we leave orphan items
-	 * in the tree.  They will be cleaned up on the next mount.
+	 * Errors here aren't a big deal, it just means we leave orphan items in
+	 * the tree. They will be cleaned up on the next mount. If the inode
+	 * number gets reused, cleanup deletes the orphan item without doing
+	 * anything, and unlink reuses the existing orphan item.
+	 *
+	 * If it turns out that we are dropping too many of these, we might want
+	 * to add a mechanism for retrying these after a commit.
 	 */
-	trans->block_rsv = root->orphan_block_rsv;
-	btrfs_orphan_del(trans, BTRFS_I(inode));
+	trans = evict_refill_and_join(root, rsv, min_size);
+	if (!IS_ERR(trans)) {
+		trans->block_rsv = rsv;
+		btrfs_orphan_del(trans, BTRFS_I(inode));
+		trans->block_rsv = &fs_info->trans_block_rsv;
+		btrfs_end_transaction(trans);
+	}
 
-	trans->block_rsv = &fs_info->trans_block_rsv;
 	if (!(root == fs_info->tree_root ||
 	      root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID))
 		btrfs_return_ino(root, btrfs_ino(BTRFS_I(inode)));
 
-	btrfs_end_transaction(trans);
-	btrfs_btree_balance_dirty(fs_info);
+free_rsv:
+	btrfs_free_block_rsv(fs_info, rsv);
 no_delete:
+	/*
+	 * If we didn't successfully delete, the orphan item will still be in
+	 * the tree and we'll retry on the next mount. Again, we might also want
+	 * to retry these periodically in the future.
+	 */
 	btrfs_remove_delayed_node(BTRFS_I(inode));
 	clear_inode(inode);
 }
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 10/12] Btrfs: get rid of unused orphan infrastructure
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (8 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 09/12] Btrfs: fix ENOSPC caused by orphan items reservations Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-11 20:13 ` [PATCH v4 11/12] Btrfs: renumber BTRFS_INODE_ runtime flags Omar Sandoval
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
Now that we don't keep long-standing reservations for orphan items,
root->orphan_block_rsv isn't used. We can git rid of it, along with:
- root->orphan_lock, which was used to protect root->orphan_block_rsv
- root->orphan_inodes, which was used as a refcount for root->orphan_block_rsv
- BTRFS_INODE_ORPHAN_META_RESERVED, which was used to track reservations
  in root->orphan_block_rsv
- btrfs_orphan_commit_root(), which was the last user of any of these
  and does nothing else
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/btrfs_inode.h |  1 -
 fs/btrfs/ctree.h       |  8 --------
 fs/btrfs/disk-io.c     |  9 ---------
 fs/btrfs/extent-tree.c | 38 -------------------------------------
 fs/btrfs/inode.c       | 43 +-----------------------------------------
 fs/btrfs/transaction.c |  1 -
 6 files changed, 1 insertion(+), 99 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index cb7dc0aa4253..4807cde0313d 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -20,7 +20,6 @@
  * new data the application may have written before commit.
  */
 #define BTRFS_INODE_ORDERED_DATA_CLOSE		0
-#define BTRFS_INODE_ORPHAN_META_RESERVED	1
 #define BTRFS_INODE_DUMMY			2
 #define BTRFS_INODE_IN_DEFRAG			3
 #define BTRFS_INODE_HAS_ASYNC_EXTENT		5
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2771cc56a622..51408de11af2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1219,9 +1219,6 @@ struct btrfs_root {
 	spinlock_t log_extents_lock[2];
 	struct list_head logged_list[2];
 
-	spinlock_t orphan_lock;
-	atomic_t orphan_inodes;
-	struct btrfs_block_rsv *orphan_block_rsv;
 	int orphan_cleanup_state;
 
 	spinlock_t inode_lock;
@@ -2764,9 +2761,6 @@ void btrfs_delalloc_release_space(struct inode *inode,
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 					    u64 len);
 void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
-int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
-				  struct btrfs_inode *inode);
-void btrfs_orphan_release_metadata(struct btrfs_inode *inode);
 int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 				     struct btrfs_block_rsv *rsv,
 				     int nitems,
@@ -3238,8 +3232,6 @@ int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans,
 int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 		struct btrfs_inode *inode);
 int btrfs_orphan_cleanup(struct btrfs_root *root);
-void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
-			      struct btrfs_root *root);
 int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size);
 void btrfs_invalidate_inodes(struct btrfs_root *root);
 void btrfs_add_delayed_iput(struct inode *inode);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60caa68c3618..4a40bfdddabc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1185,7 +1185,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	root->inode_tree = RB_ROOT;
 	INIT_RADIX_TREE(&root->delayed_nodes_tree, GFP_ATOMIC);
 	root->block_rsv = NULL;
-	root->orphan_block_rsv = NULL;
 
 	INIT_LIST_HEAD(&root->dirty_list);
 	INIT_LIST_HEAD(&root->root_list);
@@ -1195,7 +1194,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	INIT_LIST_HEAD(&root->ordered_root);
 	INIT_LIST_HEAD(&root->logged_list[0]);
 	INIT_LIST_HEAD(&root->logged_list[1]);
-	spin_lock_init(&root->orphan_lock);
 	spin_lock_init(&root->inode_lock);
 	spin_lock_init(&root->delalloc_lock);
 	spin_lock_init(&root->ordered_extent_lock);
@@ -1216,7 +1214,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	atomic_set(&root->log_commit[1], 0);
 	atomic_set(&root->log_writers, 0);
 	atomic_set(&root->log_batch, 0);
-	atomic_set(&root->orphan_inodes, 0);
 	refcount_set(&root->refs, 1);
 	atomic_set(&root->will_be_snapshotted, 0);
 	root->log_transid = 0;
@@ -3674,8 +3671,6 @@ static void free_fs_root(struct btrfs_root *root)
 {
 	iput(root->ino_cache_inode);
 	WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
-	btrfs_free_block_rsv(root->fs_info, root->orphan_block_rsv);
-	root->orphan_block_rsv = NULL;
 	if (root->anon_dev)
 		free_anon_bdev(root->anon_dev);
 	if (root->subv_writers)
@@ -3766,7 +3761,6 @@ int btrfs_commit_super(struct btrfs_fs_info *fs_info)
 
 void close_ctree(struct btrfs_fs_info *fs_info)
 {
-	struct btrfs_root *root = fs_info->tree_root;
 	int ret;
 
 	set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
@@ -3861,9 +3855,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 	btrfs_free_stripe_hash_table(fs_info);
 	btrfs_free_ref_cache(fs_info);
 
-	__btrfs_free_block_rsv(root->orphan_block_rsv);
-	root->orphan_block_rsv = NULL;
-
 	while (!list_empty(&fs_info->pinned_chunks)) {
 		struct extent_map *em;
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 51b5e2da708c..3f2e026bc206 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5949,44 +5949,6 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
 	trans->chunk_bytes_reserved = 0;
 }
 
-/* Can only return 0 or -ENOSPC */
-int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
-				  struct btrfs_inode *inode)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
-	struct btrfs_root *root = inode->root;
-	/*
-	 * We always use trans->block_rsv here as we will have reserved space
-	 * for our orphan when starting the transaction, using get_block_rsv()
-	 * here will sometimes make us choose the wrong block rsv as we could be
-	 * doing a reloc inode for a non refcounted root.
-	 */
-	struct btrfs_block_rsv *src_rsv = trans->block_rsv;
-	struct btrfs_block_rsv *dst_rsv = root->orphan_block_rsv;
-
-	/*
-	 * We need to hold space in order to delete our orphan item once we've
-	 * added it, so this takes the reservation so we can release it later
-	 * when we are truly done with the orphan item.
-	 */
-	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
-
-	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
-			num_bytes, 1);
-	return btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, 1);
-}
-
-void btrfs_orphan_release_metadata(struct btrfs_inode *inode)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
-	struct btrfs_root *root = inode->root;
-	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
-
-	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
-			num_bytes, 0);
-	btrfs_block_rsv_release(fs_info, root->orphan_block_rsv, num_bytes);
-}
-
 /*
  * btrfs_subvolume_reserve_metadata() - reserve space for subvolume operation
  * root: the root of the parent directory
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index efa67284ebb6..5499b4e8a522 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3292,42 +3292,6 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
 	spin_unlock(&fs_info->delayed_iput_lock);
 }
 
-/*
- * This is called in transaction commit time. If there are no orphan
- * files in the subvolume, it removes orphan item and frees block_rsv
- * structure.
- */
-void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
-			      struct btrfs_root *root)
-{
-	struct btrfs_fs_info *fs_info = root->fs_info;
-	struct btrfs_block_rsv *block_rsv;
-
-	if (atomic_read(&root->orphan_inodes) ||
-	    root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE)
-		return;
-
-	spin_lock(&root->orphan_lock);
-	if (atomic_read(&root->orphan_inodes)) {
-		spin_unlock(&root->orphan_lock);
-		return;
-	}
-
-	if (root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) {
-		spin_unlock(&root->orphan_lock);
-		return;
-	}
-
-	block_rsv = root->orphan_block_rsv;
-	root->orphan_block_rsv = NULL;
-	spin_unlock(&root->orphan_lock);
-
-	if (block_rsv) {
-		WARN_ON(block_rsv->size > 0);
-		btrfs_free_block_rsv(fs_info, block_rsv);
-	}
-}
-
 /*
  * This creates an orphan entry for the given inode in case something goes wrong
  * in the middle of an unlink.
@@ -3521,12 +3485,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 
 	root->orphan_cleanup_state = ORPHAN_CLEANUP_DONE;
 
-	if (root->orphan_block_rsv)
-		btrfs_block_rsv_release(fs_info, root->orphan_block_rsv,
-					(u64)-1);
-
-	if (root->orphan_block_rsv ||
-	    test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state)) {
+	if (test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state)) {
 		trans = btrfs_join_transaction(root);
 		if (!IS_ERR(trans))
 			btrfs_end_transaction(trans);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c944b4769e3c..44af1edf15d1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1250,7 +1250,6 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
 
 			btrfs_free_log(trans, root);
 			btrfs_update_reloc_root(trans, root);
-			btrfs_orphan_commit_root(trans, root);
 
 			btrfs_save_ino_cache(root, trans);
 
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 11/12] Btrfs: renumber BTRFS_INODE_ runtime flags
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (9 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 10/12] Btrfs: get rid of unused orphan infrastructure Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-14 15:17   ` David Sterba
  2018-05-11 20:13 ` [PATCH v4 12/12] Btrfs: reserve space for O_TMPFILE orphan item deletion Omar Sandoval
  2018-05-18 15:14 ` [PATCH v4 00/12] Btrfs: orphan and truncate fixes David Sterba
  12 siblings, 1 reply; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
We got rid of BTRFS_INODE_HAS_ORPHAN_ITEM and
BTRFS_INODE_ORPHAN_META_RESERVED, so we can renumber the flags to make
them consecutive again.
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/btrfs_inode.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4807cde0313d..bbbe7f308d68 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -20,14 +20,14 @@
  * new data the application may have written before commit.
  */
 #define BTRFS_INODE_ORDERED_DATA_CLOSE		0
-#define BTRFS_INODE_DUMMY			2
-#define BTRFS_INODE_IN_DEFRAG			3
-#define BTRFS_INODE_HAS_ASYNC_EXTENT		5
-#define BTRFS_INODE_NEEDS_FULL_SYNC		6
-#define BTRFS_INODE_COPY_EVERYTHING		7
-#define BTRFS_INODE_IN_DELALLOC_LIST		8
-#define BTRFS_INODE_READDIO_NEED_LOCK		9
-#define BTRFS_INODE_HAS_PROPS		        10
+#define BTRFS_INODE_DUMMY			1
+#define BTRFS_INODE_IN_DEFRAG			2
+#define BTRFS_INODE_HAS_ASYNC_EXTENT		3
+#define BTRFS_INODE_NEEDS_FULL_SYNC		4
+#define BTRFS_INODE_COPY_EVERYTHING		5
+#define BTRFS_INODE_IN_DELALLOC_LIST		6
+#define BTRFS_INODE_READDIO_NEED_LOCK		7
+#define BTRFS_INODE_HAS_PROPS		        8
 
 /* in memory btrfs inode */
 struct btrfs_inode {
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* [PATCH v4 12/12] Btrfs: reserve space for O_TMPFILE orphan item deletion
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (10 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 11/12] Btrfs: renumber BTRFS_INODE_ runtime flags Omar Sandoval
@ 2018-05-11 20:13 ` Omar Sandoval
  2018-05-18 15:14 ` [PATCH v4 00/12] Btrfs: orphan and truncate fixes David Sterba
  12 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-11 20:13 UTC (permalink / raw)
  To: linux-btrfs
  Cc: kernel-team, Chris Mason, Josef Bacik, Nikolay Borisov,
	David Sterba
From: Omar Sandoval <osandov@fb.com>
btrfs_link() calls btrfs_orphan_del() if it's linking an O_TMPFILE but
it doesn't reserve space to do so. Even before the removal of the
orphan_block_rsv it wasn't using it.
Fixes: ef3b9af50bfa ("Btrfs: implement inode_operations callback tmpfile")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5499b4e8a522..fefa665e64da 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6465,8 +6465,9 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	 * 2 items for inode and inode ref
 	 * 2 items for dir items
 	 * 1 item for parent inode
+	 * 1 item for orphan item deletion if O_TMPFILE
 	 */
-	trans = btrfs_start_transaction(root, 5);
+	trans = btrfs_start_transaction(root, inode->i_nlink ? 5 : 6);
 	if (IS_ERR(trans)) {
 		err = PTR_ERR(trans);
 		trans = NULL;
-- 
2.17.0
^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH v4 11/12] Btrfs: renumber BTRFS_INODE_ runtime flags
  2018-05-11 20:13 ` [PATCH v4 11/12] Btrfs: renumber BTRFS_INODE_ runtime flags Omar Sandoval
@ 2018-05-14 15:17   ` David Sterba
  0 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2018-05-14 15:17 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-btrfs, kernel-team, Chris Mason, Josef Bacik,
	Nikolay Borisov, David Sterba
On Fri, May 11, 2018 at 01:13:39PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> We got rid of BTRFS_INODE_HAS_ORPHAN_ITEM and
> BTRFS_INODE_ORPHAN_META_RESERVED, so we can renumber the flags to make
> them consecutive again.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/btrfs_inode.h | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 4807cde0313d..bbbe7f308d68 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -20,14 +20,14 @@
>   * new data the application may have written before commit.
>   */
>  #define BTRFS_INODE_ORDERED_DATA_CLOSE		0
> -#define BTRFS_INODE_DUMMY			2
> -#define BTRFS_INODE_IN_DEFRAG			3
> -#define BTRFS_INODE_HAS_ASYNC_EXTENT		5
> -#define BTRFS_INODE_NEEDS_FULL_SYNC		6
> -#define BTRFS_INODE_COPY_EVERYTHING		7
> -#define BTRFS_INODE_IN_DELALLOC_LIST		8
> -#define BTRFS_INODE_READDIO_NEED_LOCK		9
> -#define BTRFS_INODE_HAS_PROPS		        10
> +#define BTRFS_INODE_DUMMY			1
> +#define BTRFS_INODE_IN_DEFRAG			2
> +#define BTRFS_INODE_HAS_ASYNC_EXTENT		3
> +#define BTRFS_INODE_NEEDS_FULL_SYNC		4
> +#define BTRFS_INODE_COPY_EVERYTHING		5
> +#define BTRFS_INODE_IN_DELALLOC_LIST		6
> +#define BTRFS_INODE_READDIO_NEED_LOCK		7
> +#define BTRFS_INODE_HAS_PROPS		        8
I'll update it to something like this, as we want the auto-numbering
from enums:
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -19,17 +19,19 @@
 * ordered operations list so that we make sure to flush out any
 *   * new data the application may have written before commit.
 */
 -#define BTRFS_INODE_ORDERED_DATA_CLOSE         0
 -#define BTRFS_INODE_ORPHAN_META_RESERVED       1
 -#define BTRFS_INODE_DUMMY                      2
 -#define BTRFS_INODE_IN_DEFRAG                  3
 -#define BTRFS_INODE_HAS_ORPHAN_ITEM            4
 -#define BTRFS_INODE_HAS_ASYNC_EXTENT           5
 -#define BTRFS_INODE_NEEDS_FULL_SYNC            6
 -#define BTRFS_INODE_COPY_EVERYTHING            7
 -#define BTRFS_INODE_IN_DELALLOC_LIST           8
 -#define BTRFS_INODE_READDIO_NEED_LOCK          9
 -#define BTRFS_INODE_HAS_PROPS                  10
 +enum {
 +       BTRFS_INODE_ORDERED_DATA_CLOSE = 0,
 +       BTRFS_INODE_ORPHAN_META_RESERVED,
 +       BTRFS_INODE_DUMMY,
 +       BTRFS_INODE_IN_DEFRAG,
 +       BTRFS_INODE_HAS_ORPHAN_ITEM,
 +       BTRFS_INODE_HAS_ASYNC_EXTENT,
 +       BTRFS_INODE_NEEDS_FULL_SYNC,
 +       BTRFS_INODE_COPY_EVERYTHING,
 +       BTRFS_INODE_IN_DELALLOC_LIST,
 +       BTRFS_INODE_READDIO_NEED_LOCK,
 +       BTRFS_INODE_HAS_PROPS,
 +};
^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v4 03/12] Btrfs: don't BUG_ON() in btrfs_truncate_inode_items()
  2018-05-11 20:13 ` [PATCH v4 03/12] Btrfs: don't BUG_ON() " Omar Sandoval
@ 2018-05-17 17:18   ` David Sterba
  0 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2018-05-17 17:18 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-btrfs, kernel-team, Chris Mason, Josef Bacik,
	Nikolay Borisov, David Sterba
On Fri, May 11, 2018 at 01:13:31PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> btrfs_free_extent() can fail because of ENOMEM. There's no reason to
> panic here, we can just abort the transaction.
> 
> Fixes: f4b9aa8d3b87 ("btrfs_truncate")
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v4 00/12] Btrfs: orphan and truncate fixes
  2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (11 preceding siblings ...)
  2018-05-11 20:13 ` [PATCH v4 12/12] Btrfs: reserve space for O_TMPFILE orphan item deletion Omar Sandoval
@ 2018-05-18 15:14 ` David Sterba
  2018-05-18 17:38   ` Omar Sandoval
  12 siblings, 1 reply; 17+ messages in thread
From: David Sterba @ 2018-05-18 15:14 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-btrfs, kernel-team, Chris Mason, Josef Bacik,
	Nikolay Borisov, David Sterba
On Fri, May 11, 2018 at 01:13:28PM -0700, Omar Sandoval wrote:
> This is the fourth (and hopefully final) version of the orphan item
> early ENOSPC and related fixes.
> 
> Changes since v3:
> 
> - Changed another stale comment in patch 1
> - Moved BTRFS_INODE_ORPHAN_META_RESERVED flag removal to patch 10
>   instead of patch 9
> - Moved inode runtime flag renumbering to a separate patch (patch 11)
> - Added some more reviewed-bys
Patchset added to misc-next, thanks. Besides fstests, I've hammered it
with some stress testing, all fine.
> Changes since v2:
> 
> - Add patch 5 to get rid of BTRFS_INODE_HAS_ORPHAN_ITEM
> - Move patch 10 to patch 6
> - Got rid of patch 5; the bug goes away in the process of removing code
>   for patches 9 and 10
> - Rename patch 10 batch to what it was called in v1
> 
> Changes since v1:
> 
> - Added two extra cleanups, patches 10 and 11
> - Added a forgotten clear of the orphan bit in patch 8
> - Reworded titles of patches 6 and 9
> - Added people's reviewed-bys
> 
> Cover letter from v1:
> 
> At Facebook we hit an early ENOSPC issue which we tracked down to the
> reservations for orphan items of deleted-but-still-open files. The
> primary function of this series is to fix that bug, but I ended up
> uncovering a pile of other issues in the process, most notably that the
> orphan items we create for truncate are useless.
> 
> I've also posted an xfstest that reproduces this bug.
> 
> Thanks!
> 
> *** BLURB HERE ***
Blurb!
^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v4 00/12] Btrfs: orphan and truncate fixes
  2018-05-18 15:14 ` [PATCH v4 00/12] Btrfs: orphan and truncate fixes David Sterba
@ 2018-05-18 17:38   ` Omar Sandoval
  0 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2018-05-18 17:38 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team, Chris Mason, Josef Bacik,
	Nikolay Borisov, David Sterba
On Fri, May 18, 2018 at 05:14:35PM +0200, David Sterba wrote:
> On Fri, May 11, 2018 at 01:13:28PM -0700, Omar Sandoval wrote:
> > This is the fourth (and hopefully final) version of the orphan item
> > early ENOSPC and related fixes.
> > 
> > Changes since v3:
> > 
> > - Changed another stale comment in patch 1
> > - Moved BTRFS_INODE_ORPHAN_META_RESERVED flag removal to patch 10
> >   instead of patch 9
> > - Moved inode runtime flag renumbering to a separate patch (patch 11)
> > - Added some more reviewed-bys
> 
> Patchset added to misc-next, thanks. Besides fstests, I've hammered it
> with some stress testing, all fine.
Thanks, Dave!
> > Changes since v2:
> > 
> > - Add patch 5 to get rid of BTRFS_INODE_HAS_ORPHAN_ITEM
> > - Move patch 10 to patch 6
> > - Got rid of patch 5; the bug goes away in the process of removing code
> >   for patches 9 and 10
> > - Rename patch 10 batch to what it was called in v1
> > 
> > Changes since v1:
> > 
> > - Added two extra cleanups, patches 10 and 11
> > - Added a forgotten clear of the orphan bit in patch 8
> > - Reworded titles of patches 6 and 9
> > - Added people's reviewed-bys
> > 
> > Cover letter from v1:
> > 
> > At Facebook we hit an early ENOSPC issue which we tracked down to the
> > reservations for orphan items of deleted-but-still-open files. The
> > primary function of this series is to fix that bug, but I ended up
> > uncovering a pile of other issues in the process, most notably that the
> > orphan items we create for truncate are useless.
> > 
> > I've also posted an xfstest that reproduces this bug.
> > 
> > Thanks!
> > 
> > *** BLURB HERE ***
> 
> Blurb!
Oops ;)
^ permalink raw reply	[flat|nested] 17+ messages in thread
end of thread, other threads:[~2018-05-18 17:38 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-05-11 20:13 [PATCH v4 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 01/12] Btrfs: update stale comments referencing vmtruncate() Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 02/12] Btrfs: fix error handling in btrfs_truncate_inode_items() Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 03/12] Btrfs: don't BUG_ON() " Omar Sandoval
2018-05-17 17:18   ` David Sterba
2018-05-11 20:13 ` [PATCH v4 04/12] Btrfs: stop creating orphan items for truncate Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 05/12] Btrfs: get rid of BTRFS_INODE_HAS_ORPHAN_ITEM Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 06/12] Btrfs: delete dead code in btrfs_orphan_commit_root() Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 07/12] Btrfs: don't return ino to ino cache if inode item removal fails Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 08/12] Btrfs: refactor btrfs_evict_inode() reserve refill dance Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 09/12] Btrfs: fix ENOSPC caused by orphan items reservations Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 10/12] Btrfs: get rid of unused orphan infrastructure Omar Sandoval
2018-05-11 20:13 ` [PATCH v4 11/12] Btrfs: renumber BTRFS_INODE_ runtime flags Omar Sandoval
2018-05-14 15:17   ` David Sterba
2018-05-11 20:13 ` [PATCH v4 12/12] Btrfs: reserve space for O_TMPFILE orphan item deletion Omar Sandoval
2018-05-18 15:14 ` [PATCH v4 00/12] Btrfs: orphan and truncate fixes David Sterba
2018-05-18 17:38   ` Omar Sandoval
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).